Preface
Here, "lossless internal recording" means recording only the audio played inside the process itself, without picking up the microphone (or audio from other processes).
Why write this up? I added the feature to a browser product I build for fun on the side, and according to the event counts Firebase collected over the last three months, almost nobody uses it. That may be related to where my users come from, but I have lost interest and plan to simply remove the feature. Implementing it took a fair amount of time, though, and I don't want that experience to just evaporate, so I'm writing down and sharing the rough outline here.
The technique itself is neutral. It is well suited, for example, to real-time translation of video playing in the browser or to recording the contents of an online meeting, with no permissions required and no need to ship an extra browser engine weighing tens or hundreds of megabytes. I don't want to open-source it outright, though, because it carries the risk of being abused to record copyrighted music. After publishing to the Play Store I also occasionally worried about a takedown or even an account ban, so I added a disclaimer and various limits, such as a cap on recording time.
About internal audio recording in general: to record other apps, Android 10 added a dedicated API. It requires the recording permission as well as the consent of the app being recorded (depending on whether that app declares `allowAudioPlaybackCapture` in its manifest, whose default depends on its targetSdk version, and on whether it sets a capture policy via `setAllowedCapturePolicy` when playing audio).
The hook framework used below is ShadowHook; the Chromium source referenced is version 126.0.6429.1; the minimum Android version considered is API 26; and the minimum Chromium version taken into account is 65.0.3325.229. I have never built and run Chromium from source, so some of the theory below is inferred from reading the source and observing behavior. There may be bugs; pointers and corrections are welcome and appreciated.
Which APIs to hook
On Android there are a few main APIs for playing audio: `AudioTrack`, `OpenSL ES`, and `AAudio`. The browser engine is implemented in C++, so it seems unlikely to output audio through the Java-level `AudioTrack`. Enough guessing; let's look at the source directly, `media/audio/audio_manager_base.cc`:
```c++
AudioOutputStream* AudioManagerBase::MakeAudioOutputStream(
    const AudioParameters& params,
    const std::string& device_id,
    const LogCallback& log_callback) {
  // ... omitted
  AudioOutputStream* stream;
  switch (params.format()) {
    case AudioParameters::AUDIO_PCM_LINEAR:
      DCHECK(AudioDeviceDescription::IsDefaultDevice(device_id))
          << "AUDIO_PCM_LINEAR supports only the default device.";
      stream = MakeLinearOutputStream(params, log_callback);
      break;
    case AudioParameters::AUDIO_PCM_LOW_LATENCY:
      stream = MakeLowLatencyOutputStream(params, device_id, log_callback);
      break;
    case AudioParameters::AUDIO_BITSTREAM_AC3:
    case AudioParameters::AUDIO_BITSTREAM_EAC3:
    case AudioParameters::AUDIO_BITSTREAM_DTS:
    case AudioParameters::AUDIO_BITSTREAM_DTS_HD:
    case AudioParameters::AUDIO_BITSTREAM_DTSX_P2:
    case AudioParameters::AUDIO_BITSTREAM_IEC61937:
      stream = MakeBitstreamOutputStream(params, device_id, log_callback);
      break;
    case AudioParameters::AUDIO_FAKE:
      stream = FakeAudioOutputStream::MakeFakeStream(this, params);
      break;
    default:
      stream = nullptr;
      break;
  }
  // ... omitted
  return stream;
}
```
As the code above shows, the concrete audio output implementation depends on the `params` argument: `MakeLinearOutputStream`, `MakeLowLatencyOutputStream`, `MakeBitstreamOutputStream`, or `FakeAudioOutputStream::MakeFakeStream`.

The first three are delegated to subclasses of `AudioManagerBase`, with a different subclass per platform. Going straight to the `AudioManagerAndroid` class, its `MakeLowLatencyOutputStream` is shown below; `MakeLinearOutputStream` is much the same internally:
```c++
AudioOutputStream* AudioManagerAndroid::MakeLowLatencyOutputStream(
    const AudioParameters& params,
    const std::string& device_id,
    const LogCallback& log_callback) {
  DCHECK_EQ(AudioParameters::AUDIO_PCM_LOW_LATENCY, params.format());

  if (__builtin_available(android AAUDIO_MIN_API, *)) {
    if (UseAAudioOutput()) {
      const aaudio_usage_t usage = communication_mode_is_on_
                                       ? AAUDIO_USAGE_VOICE_COMMUNICATION
                                       : AAUDIO_USAGE_MEDIA;
      return new AAudioOutputStream(this, params, usage);
    }
  }

  // Set stream type which matches the current system-wide audio mode used by
  // the Android audio manager.
  const SLint32 stream_type = communication_mode_is_on_
                                  ? SL_ANDROID_STREAM_VOICE
                                  : SL_ANDROID_STREAM_MEDIA;
  return new OpenSLESOutputStream(this, params, stream_type);
}
```
The `AAUDIO_MIN_API` macro is defined as follows:
```cpp
// For use with REQUIRES_ANDROID_API() and __builtin_available().
// We need APIs that weren't added until API Level 28. Also, AAudio crashes
// on P, so only consider Q and above.
#define AAUDIO_MIN_API 29
```
Now we can say that Android 10 and above prefer AAudio while older Android versions use OpenSL ES. Except it is not quite that clean: many devices do not have the AAudio library at all, for example Huawei devices running Android 12. My little project's online analytics track whether a 64-bit/32-bit aaudio library is present on each device, which is what that observation is based on.

Either way, both OpenSL ES and AAudio have to be hooked.
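As an aside, the presence check behind that statistic can be as simple as probing the library with `dlopen`. A minimal sketch (the helper name is mine, not the production code):

```c++
#include <dlfcn.h>

// Returns true if the AAudio runtime library can be loaded in this process.
// dlopen answers per-ABI: a 32-bit process probes the 32-bit libaaudio.so,
// a 64-bit process probes the 64-bit one.
bool HasAAudioLibrary() {
    void *handle = dlopen("libaaudio.so", RTLD_NOW);
    if (handle == nullptr) {
        return false;
    }
    dlclose(handle);
    return true;
}
```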
Next, `MakeBitstreamOutputStream`:
```cpp
AudioOutputStream* AudioManagerAndroid::MakeBitstreamOutputStream(
    const AudioParameters& params,
    const std::string& device_id,
    const LogCallback& log_callback) {
  DCHECK(params.IsBitstreamFormat());
  return new AudioTrackOutputStream(this, params);
}
```
```cpp
bool AudioTrackOutputStream::Open() {
  DCHECK(audio_manager_->GetTaskRunner()->BelongsToCurrentThread());
  JNIEnv* env = AttachCurrentThread();
  j_audio_output_stream_.Reset(Java_AudioTrackOutputStream_create(env));
  // ... omitted
  return Java_AudioTrackOutputStream_open(env, j_audio_output_stream_,
                                          params_.channels(),
                                          params_.sample_rate(), format);
}
```
So there is also an implementation that calls up into the Java-level `AudioTrack` API via JNI. My personal guess is that this path has limited playback efficiency; and across a lot of testing I never found a website that made Chromium output a bitstream, though I have not located the deciding clue in the code either. // TODO ..
As for `MakeFakeStream`, it hardly needs a look: it is normally reached only when the audio format is invalid or audio output has been disabled. For reference:
```cpp
// If audio has been disabled force usage of a fake audio stream.
if (base::CommandLine::ForCurrentProcess()->HasSwitch(
        switches::kDisableAudioOutput)) {
  output_params.set_format(AudioParameters::AUDIO_FAKE);
}
```
```cpp
// We've received invalid audio output parameters, so switch to a mock
// output device based on the input parameters. This may happen if the OS
// provided us junk values for the hardware configuration.
LOG(ERROR) << "Invalid audio output parameters received; using fake "
           << "audio path: " << output_params.AsHumanReadableString();

// Tell the AudioManager to create a fake output device.
output_params = params;
output_params.set_format(AudioParameters::AUDIO_FAKE);
uma_stream_format = STREAM_FORMAT_PCM_LOW_LATENCY_FALLBACK_TO_FAKE;
```
Hooking Chromium-internal functions is not a good fit here. The `.so` runs to a hundred-plus megabytes, so its symbols are certainly stripped to save size; Chromium versions across devices and OS builds are many and varied, and devices abroad mostly update the Chromium engine through the Play Store, which makes byte-pattern signatures hard to pin down. Hooking the APIs exported by AAudio and OpenSL ES is therefore just about the ideal approach.
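Before any of those hooks can be installed, ShadowHook itself needs a one-time initialization. A minimal sketch, assuming the native `shadowhook_init` entry point (initializing through the Java-side `ShadowHook.init` wrapper works as well):

```c++
#include "shadowhook.h"

// One-time ShadowHook initialization; must happen before installing any hook.
// SHARED mode tolerates coexisting with other inline-hook frameworks.
bool InitHookFramework() {
    int error = shadowhook_init(SHADOWHOOK_MODE_SHARED, /* debuggable */ false);
    return error == 0;  // 0 means OK
}
```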
Why hooking works at all
People often say that since Android 8 the WebView runs in a sandboxed process by default, so why can the audio APIs still be hooked from within our own app process? See the description in `media/README.md`:

- audio/ - Code for audio input and output. Includes platform specific output and input implementations. Due to use of platform APIs, can not normally be used from within a sandboxed process.

That is, because the code under audio/ uses platform APIs, it normally cannot run inside a sandboxed process; audio output runs in our own (browser) process, which is exactly why we can hook it there.
AAudio
AAudio is the newer audio playback API introduced in Android 8. Docs: developer.android.com/ndk/guides/...

To capture the PCM data AAudio plays, we naturally need to understand how it is used first. Straight to the source, `audio/android/aaudio_stream_wrapper.cc`:
```c++
bool AAudioStreamWrapper::Open() {
  DCHECK_CALLED_ON_VALID_SEQUENCE(sequence_checker_);
  CHECK(!is_closed_);

  AAudioStreamBuilder* builder;
  auto result = AAudio_createStreamBuilder(&builder);
  if (AAUDIO_OK != result) {
    return false;
  }

  // Parameters
  AAudioStreamBuilder_setDirection(
      builder, (stream_type_ == StreamType::kInput ? AAUDIO_DIRECTION_INPUT
                                                   : AAUDIO_DIRECTION_OUTPUT));
  AAudioStreamBuilder_setSampleRate(builder, params_.sample_rate());
  AAudioStreamBuilder_setChannelCount(builder, params_.channels());
  AAudioStreamBuilder_setFormat(builder, AAUDIO_FORMAT_PCM_FLOAT);
  AAudioStreamBuilder_setUsage(builder, usage_);
  AAudioStreamBuilder_setPerformanceMode(builder, performance_mode_);
  AAudioStreamBuilder_setFramesPerDataCallback(builder,
                                               params_.frames_per_buffer());

  if (stream_type_ == StreamType::kInput) {
    // Set AAUDIO_INPUT_PRESET_VOICE_COMMUNICATION when we need echo
    // cancellation. Otherwise, we use AAUDIO_INPUT_PRESET_CAMCORDER instead
    // of the platform default of AAUDIO_INPUT_PRESET_VOICE_RECOGNITION, since
    // it supposedly uses a wideband signal.
    //
    // We do not use AAUDIO_INPUT_PRESET_UNPROCESSED, even if
    // `params_.effects() == AudioParameters::NO_EFFECTS` because the lack of
    // automatic gain control results in quiet, sometimes silent, streams.
    AAudioStreamBuilder_setInputPreset(
        builder, params_.effects() & AudioParameters::ECHO_CANCELLER
                     ? AAUDIO_INPUT_PRESET_VOICE_COMMUNICATION
                     : AAUDIO_INPUT_PRESET_CAMCORDER);
  }

  // Callbacks
  AAudioStreamBuilder_setDataCallback(builder, OnAudioDataRequestedCallback,
                                      destruction_helper_.get());
  AAudioStreamBuilder_setErrorCallback(builder, OnStreamErrorCallback,
                                       destruction_helper_.get());

  result = AAudioStreamBuilder_openStream(builder, &aaudio_stream_);
  AAudioStreamBuilder_delete(builder);
  if (AAUDIO_OK != result) {
    CHECK(!aaudio_stream_);
    return false;
  }

  CHECK_EQ(AAUDIO_FORMAT_PCM_FLOAT, AAudioStream_getFormat(aaudio_stream_));

  // After opening the stream, sets the effective buffer size to 3X the burst
  // size to prevent glitching if the burst is small (e.g. < 128). On some
  // devices you can get by with 1X or 2X, but 3X is safer.
  int32_t frames_per_burst = AAudioStream_getFramesPerBurst(aaudio_stream_);
  int32_t size_requested = frames_per_burst * (frames_per_burst < 128 ? 3 : 2);
  AAudioStream_setBufferSizeInFrames(aaudio_stream_, size_requested);

  TRACE_EVENT2("audio", "AAudioStreamWrapper::Open", "params",
               params_.AsHumanReadableString(), "requested buffer size",
               size_requested);

  return true;
}
```
As the code shows, Chromium supplies PCM data through the callback registered via `AAudioStreamBuilder_setDataCallback`. We write a corresponding function to hook it, then hook the callback argument that was passed in, and from there we can obtain the actual PCM data.
```c++
shadowhook_hook_sym_name(
        Enc("libaaudio.so"),
        Enc("AAudioStreamBuilder_setDataCallback"),
        (void *) AAudioStreamBuilder_setDataCallback2,
        (void **) &setDataCallback_original
);
```
```c++
/* AAUDIO_API void AAudioStreamBuilder_setDataCallback(AAudioStreamBuilder* builder,
       AAudioStream_dataCallback callback, void *userData) __INTRODUCED_IN(26); */
void AAudioStreamBuilder_setDataCallback2(
        AAudioStreamBuilder *builder, AAudioStream_dataCallback callback, void *userData) {
    // Remember Chromium's original callback, then register our own in its place.
    *(void **) &dataCallback_original = (void *) callback;
    ((setDataCallback_type) setDataCallback_original)(builder, dataCallback, userData);
}
```
```c++
aaudio_data_callback_result_t dataCallback(AAudioStream *stream, void *userData,
                                           void *audioData, int32_t numFrames) {
    // Let Chromium's original callback fill audioData with PCM first.
    auto res = ((AAudioStream_dataCallback) dataCallback_original)(stream, userData, audioData, numFrames);
    // audioData now holds float PCM; `stream` exposes the audio parameters
    // (AAudioStream_getChannelCount etc.), so the recording logic goes here.
    return res;
}
```
One more thing: a single page may contain multiple audio sources, which means multiple AAudioStream objects, so you also need to build a table of your own associating each AAudioStream object with its callback, along the lines of the sketch below.
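A minimal sketch of that bookkeeping, assuming we additionally hook `AAudioStreamBuilder_openStream` so we learn which stream each builder produced (apart from the AAudio types, all names below are mine):

```c++
#include <aaudio/AAudio.h>
#include <mutex>
#include <unordered_map>

static std::mutex g_map_mutex;
// Filled by the setDataCallback hook: builder -> Chromium's original callback.
static std::unordered_map<AAudioStreamBuilder *, AAudioStream_dataCallback> g_builder_cb;
// Filled once openStream reveals which stream a builder produced.
static std::unordered_map<AAudioStream *, AAudioStream_dataCallback> g_stream_cb;

void RememberBuilderCallback(AAudioStreamBuilder *builder,
                             AAudioStream_dataCallback cb) {
    std::lock_guard<std::mutex> lock(g_map_mutex);
    g_builder_cb[builder] = cb;
}

// Call from a hook on AAudioStreamBuilder_openStream, after the original returns.
void BindStreamToBuilder(AAudioStreamBuilder *builder, AAudioStream *stream) {
    std::lock_guard<std::mutex> lock(g_map_mutex);
    auto it = g_builder_cb.find(builder);
    if (it != g_builder_cb.end()) {
        g_stream_cb[stream] = it->second;
        g_builder_cb.erase(it);
    }
}

// Used inside the shared dataCallback to dispatch to the right original callback.
AAudioStream_dataCallback LookupCallback(AAudioStream *stream) {
    std::lock_guard<std::mutex> lock(g_map_mutex);
    auto it = g_stream_cb.find(stream);
    return it != g_stream_cb.end() ? it->second : nullptr;
}
```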
OpenSLES
Before AAudio was introduced, the Android platform played sound through OpenSL ES. Android makes a few extensions to OpenSL ES, chiefly a new `formatType` value, `SL_ANDROID_DATAFORMAT_PCM_EX`; see the docs for more: developer.android.com/ndk/guides/...

With that rough picture in mind, the Chromium source `audio/android/opensles_output.cc` looks like this:
```c++
OpenSLESOutputStream::OpenSLESOutputStream(AudioManagerAndroid* manager,
                                           const AudioParameters& params,
                                           SLint32 stream_type)
    : audio_manager_(manager),
      stream_type_(stream_type),
      callback_(nullptr),
      player_(nullptr),
      simple_buffer_queue_(nullptr),
      audio_data_(),
      active_buffer_index_(0),
      started_(false),
      muted_(false),
      volume_(1.0),
      samples_per_second_(params.sample_rate()),
      sample_format_(kSampleFormatF32),
      bytes_per_frame_(params.GetBytesPerFrame(sample_format_)),
      buffer_size_bytes_(params.GetBytesPerBuffer(sample_format_)),
      performance_mode_(SL_ANDROID_PERFORMANCE_NONE),
      delay_calculator_(samples_per_second_) {
  DVLOG(2) << "OpenSLESOutputStream::OpenSLESOutputStream("
           << "stream_type=" << stream_type << ")";
  if (params.latency_tag() == AudioLatency::Type::kPlayback) {
    performance_mode_ = SL_ANDROID_PERFORMANCE_POWER_SAVING;
  } else if (params.latency_tag() == AudioLatency::Type::kRtc) {
    performance_mode_ = SL_ANDROID_PERFORMANCE_LATENCY_EFFECTS;
  }
  audio_bus_ = AudioBus::Create(params);

  float_format_.formatType = SL_ANDROID_DATAFORMAT_PCM_EX;
  float_format_.numChannels = static_cast<SLuint32>(params.channels());
  // Despite the name, this field is actually the sampling rate in millihertz.
  float_format_.sampleRate = static_cast<SLuint32>(samples_per_second_ * 1000);
  float_format_.bitsPerSample = float_format_.containerSize =
      SampleFormatToBitsPerChannel(sample_format_);
  float_format_.endianness = SL_BYTEORDER_LITTLEENDIAN;
  float_format_.channelMask = ChannelCountToSLESChannelMask(params.channels());
  float_format_.representation = SL_ANDROID_PCM_REPRESENTATION_FLOAT;
}
```
Hooking the OpenSL ES API is slightly more involved than AAudio. Here we first hook `slCreateEngine` to obtain the address of the `GetInterface` function, then hook each of them in turn. The key code:
```c++
shadowhook_hook_sym_name(
        Enc("libOpenSLES.so"),
        Enc("slCreateEngine"),
        (void *) slCreateEngine2,
        (void **) &slCreateEngineOriginal
);
```
```c++
SLresult slCreateEngine2(SLObjectItf *pEngine,
                         SLuint32 numOptions,
                         const SLEngineOption *pEngineOptions,
                         SLuint32 numInterfaces,
                         const SLInterfaceID *pInterfaceIds,
                         const SLboolean *pInterfaceRequired) {
    auto res = ((slCreateEngine_type) slCreateEngineOriginal)(
            pEngine, numOptions, pEngineOptions, numInterfaces, pInterfaceIds,
            pInterfaceRequired);
    if (GetInterfaceOriginal == nullptr) {
        // OpenSL ES is a C "object" API: methods live in per-object function
        // tables, so they are hooked by address rather than by symbol name.
        shadowhook_hook_func_addr((void *) ((*(*pEngine))->GetInterface),
                                  (void *) GetInterface2, (void **) &GetInterfaceOriginal);
        shadowhook_hook_func_addr((void *) ((*(*pEngine))->Destroy),
                                  (void *) Destroy2, (void **) &DestroyOriginal);
    }
    return res;
}
```
`Destroy` is hooked purely for my bookkeeping (telling apart PCM from different players, recording parameters such as each player's channel count), so I will skip it here. Look at `GetInterface` instead; inside it we hook `Enqueue` and `CreateAudioPlayer`:
```c++
SLresult GetInterface2(
        SLObjectItf self,
        const SLInterfaceID iid,
        void *pInterface) {
    auto res = ((GetInterface_Type) GetInterfaceOriginal)(self, iid, pInterface);
    if (iid == SL_IID_BUFFERQUEUE) {
        if (EnqueueOriginal == nullptr) {
            SLBufferQueueItf pInterface2 = *(SLBufferQueueItf *) pInterface;
            shadowhook_hook_sym_addr((void *) (*pInterface2)->Enqueue,
                                     (void *) Enqueue, (void **) &EnqueueOriginal);
        }
    } else if (iid == SL_IID_ENGINE) {
        if (slCreateAudioPlayerOriginal == nullptr) {
            SLEngineItf pInterface2 = *(SLEngineItf *) pInterface;
            shadowhook_hook_sym_addr((void *) (*pInterface2)->CreateAudioPlayer,
                                     (void *) slCreateAudioPlayer,
                                     (void **) &slCreateAudioPlayerOriginal);
        }
    }
    return res;
}
```
The `Enqueue` method is where the PCM data can be grabbed:
```c++
SLresult Enqueue(
        SLBufferQueueItf self,
        /*const */void *pBuffer,
        SLuint32 size) {
    // pBuffer holds `size` bytes of PCM that are about to be queued for
    // playback; copy them out here before forwarding to the original.
    return ((Enqueue_type) EnqueueOriginal)(self, pBuffer, size);
}
```
while the detailed PCM parameters have to be recorded in `CreateAudioPlayer`:
```c++
SLresult slCreateAudioPlayer(SLEngineItf self,
                             SLObjectItf *pPlayer,
                             SLDataSource *pAudioSrc,
                             SLDataSink *pAudioSnk,
                             SLuint32 numInterfaces,
                             const SLInterfaceID *pInterfaceIds,
                             const SLboolean *pInterfaceRequired) {
    // Source-management bookkeeping: associate self / pPlayer with the audio
    // parameters carried in pAudioSrc here, then forward to the original.
    return ((slCreateAudioPlayer_type) slCreateAudioPlayerOriginal)(
            self, pPlayer, pAudioSrc, pAudioSnk, numInterfaces, pInterfaceIds,
            pInterfaceRequired);
}
```
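For illustration, a hedged sketch of pulling those parameters out of `pAudioSrc` (the helper and struct names are mine; Chromium fills `pFormat` with the `SLAndroidDataFormat_PCM_EX` built in the constructor quoted above):

```c++
#include <SLES/OpenSLES.h>
#include <SLES/OpenSLES_Android.h>

struct PcmInfo {
    SLuint32 channels;
    SLuint32 sample_rate_hz;
    SLuint32 bits_per_sample;
    bool is_float;
};

bool ExtractPcmInfo(const SLDataSource *src, PcmInfo *out) {
    if (src == nullptr || src->pFormat == nullptr) return false;
    auto *fmt = (const SLAndroidDataFormat_PCM_EX *) src->pFormat;
    if (fmt->formatType != SL_ANDROID_DATAFORMAT_PCM_EX &&
        fmt->formatType != SL_DATAFORMAT_PCM) {
        return false;
    }
    out->channels = fmt->numChannels;
    // Despite the name, sampleRate is in millihertz (see the constructor above).
    out->sample_rate_hz = fmt->sampleRate / 1000;
    out->bits_per_sample = fmt->bitsPerSample;
    // Only PCM_EX carries the `representation` field, hence the short-circuit.
    out->is_float = fmt->formatType == SL_ANDROID_DATAFORMAT_PCM_EX &&
                    fmt->representation == SL_ANDROID_PCM_REPRESENTATION_FLOAT;
    return true;
}
```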
Audio format
PCM audio comes with several parameters: sample format (float, int16, and so on), channel count, and sample rate. If some of them can be proven fixed, the later work of playing back finished recordings and exporting MP3s shrinks a great deal.

Fortunately, the source proves that the PCM data Chromium outputs is always in float format.
See `AAudioStreamBuilder_setFormat(builder, AAUDIO_FORMAT_PCM_FLOAT);` inside the `AAudioStreamWrapper::Open()` function quoted above, and `float_format_.representation = SL_ANDROID_PCM_REPRESENTATION_FLOAT` in the `OpenSLESOutputStream` constructor.
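That guarantee means an int16 export path only ever needs a single conversion. A minimal sketch (the helper name is mine):

```c++
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Converts float PCM in [-1.0, 1.0] to 16-bit integer PCM. Works for any
// channel layout, since samples stay in place.
void FloatToInt16(const float *in, int16_t *out, size_t sample_count) {
    for (size_t i = 0; i < sample_count; ++i) {
        float v = std::min(1.0f, std::max(-1.0f, in[i]));
        out[i] = (int16_t) (v * 32767.0f);
    }
}
```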
Switching to OpenSL ES below Android 14
There were two reasons for wanting to switch audio output from AAudio over to OpenSL ES. One is that I own no device that does not use AAudio, which made it inconvenient to verify the capture quality of the OpenSL ES hooks. The other is that recordings of one particular website contained artifacts that disappeared after switching to OpenSL ES, so at first I wanted to force users' audio output API over to OpenSL ES.

Experimenting showed that simply hooking the `dlopen` function and making libaaudio fail to load is enough:
```c++
// shadowhook_hook_sym_name(
//         Enc("libdl.so"),
//         Enc("dlopen"),
//         (void *) dlopen2,
//         (void **) &dlopenOriginal
// );
```
```c++
void *dlopen2(const char *filename, int flag) {
    // Report failure for libaaudio.so so Chromium falls back to OpenSL ES.
    if (filename != nullptr && strcmp(filename, Enc("libaaudio.so")) == 0) {
        return nullptr;
    }
    return ((dlopenType) dlopenOriginal)(filename, flag);
}
```
But on my OnePlus device running Android 14 this method has no effect. Printing the process's `/proc/{pid}/maps` shows that `libaaudio.so` had already been loaded into memory very early; the interception came too late. I have not yet worked out how Chromium learns that `libaaudio` exists on the device. Hooking `dl_iterate_phdr` to filter out `libaaudio.so` did not work either. It feels tied to how dynamic libraries are loaded, so I should find time to study the linker systematically.
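For the record, the `dl_iterate_phdr` attempt looked roughly like the sketch below (a reconstruction, not my exact code): wrap the caller's callback and skip any entry whose name mentions libaaudio.

```c++
#include <link.h>
#include <cstring>

typedef int (*PhdrCallback)(dl_phdr_info *, size_t, void *);
// Filled by the hook framework when dl_iterate_phdr2 is installed.
static int (*dl_iterate_phdr_original)(PhdrCallback, void *);

struct FilterCtx {
    PhdrCallback inner;
    void *inner_data;
};

static int FilterCallback(dl_phdr_info *info, size_t size, void *data) {
    auto *ctx = (FilterCtx *) data;
    // Pretend libaaudio.so is absent from the link map.
    if (info->dlpi_name != nullptr && strstr(info->dlpi_name, "libaaudio.so")) {
        return 0;  // 0 = keep iterating; this entry is simply skipped
    }
    return ctx->inner(info, size, ctx->inner_data);
}

int dl_iterate_phdr2(PhdrCallback callback, void *data) {
    FilterCtx ctx{callback, data};
    return dl_iterate_phdr_original(FilterCallback, &ctx);
}
```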
As for the artifacts in AAudio-captured PCM, it sounded as though each chunk of PCM slightly overlapped with, or was missing a small piece of, the chunk before it. The solution I finally found was, once again, a hook:
```c++
void AAudioStreamBuilder_setPerformanceMode2(AAudioStreamBuilder *builder,
                                             aaudio_performance_mode_t mode) {
    // !! Passing AAUDIO_PERFORMANCE_MODE_POWER_SAVING here is what made the
    // recorded audio stutter.
    mode = AAUDIO_PERFORMANCE_MODE_LOW_LATENCY;
    ((setPerformanceMode_type) setPerformanceModeOrigin)(builder, mode);
}
```
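Installing that hook follows the same pattern as the `setDataCallback` hook earlier (a sketch; `setPerformanceModeOrigin` is the out-pointer used above):

```c++
shadowhook_hook_sym_name(
        Enc("libaaudio.so"),
        Enc("AAudioStreamBuilder_setPerformanceMode"),
        (void *) AAudioStreamBuilder_setPerformanceMode2,
        (void **) &setPerformanceModeOrigin
);
```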
Miscellaneous
That covers all the key code. To wrap up, here is a snippet that prepends a WAV header to PCM data to produce a WAV file, so the next time I work on audio I can copy it from here instead of searching for it again. (I forget whether it originally came from an AI or from searching code snippets on GitHub, but it was wrong as found and its parameters were awkward, so I reworked it myself; tested with float/int16 in mono and stereo, it works fine.)
```java
import java.io.IOException;
import java.io.OutputStream;

public class PcmToWavConverter {

    /**
     * Writes a WAV file header to the output stream.
     */
    public static void writeWavHeader(OutputStream out, boolean isFloat, int pcmDataLength, int sampleRate, int numChannels, int bitsPerSample) throws IOException {
        // Bytes per frame across all channels.
        int blockAlign = numChannels * (bitsPerSample / 8);
        // Size of the data subchunk, i.e. the size of the PCM data itself.
        int subChunk2Size = pcmDataLength; // The version I found was wrong here: it used (pcmDataLength * numChannels * (bitsPerSample / 8))
        // Size of the whole file excluding the RIFF ID and size fields themselves.
        int chunkSize = 36 + subChunk2Size;

        // RIFF header
        out.write(new byte[]{'R', 'I', 'F', 'F'});  // ChunkID - "RIFF" marks a RIFF-format file
        out.write(intToByteArray(chunkSize));       // ChunkSize - file size minus 8 bytes (ChunkID and ChunkSize)
        out.write(new byte[]{'W', 'A', 'V', 'E'});  // Format - "WAVE" marks the file as WAV

        // fmt subchunk
        out.write(new byte[]{'f', 'm', 't', ' '});  // Subchunk1ID - "fmt " marks the format descriptor
        out.write(intToByteArray(16));              // Subchunk1Size - always 16 bytes for PCM
        if (isFloat) {
            out.write(shortToByteArray((short) 3)); // AudioFormat - 3 = IEEE float
        } else {
            out.write(shortToByteArray((short) 1)); // AudioFormat - 1 = integer PCM
        }
        out.write(shortToByteArray((short) numChannels));   // NumChannels - 1 = mono, 2 = stereo
        out.write(intToByteArray(sampleRate));              // SampleRate - e.g. 44100 Hz
        out.write(intToByteArray(sampleRate * blockAlign)); // ByteRate = SampleRate * bytes per frame
        out.write(shortToByteArray((short) blockAlign));    // BlockAlign - bytes per frame
        out.write(shortToByteArray((short) bitsPerSample)); // BitsPerSample - e.g. 16

        // data subchunk
        out.write(new byte[]{'d', 'a', 't', 'a'});  // Subchunk2ID - "data" marks the start of audio data
        out.write(intToByteArray(subChunk2Size));   // SubChunk2Size - actual size of the PCM data
    }

    private static byte[] intToByteArray(int value) {
        return new byte[]{
                (byte) (value & 0xff),
                (byte) ((value >> 8) & 0xff),
                (byte) ((value >> 16) & 0xff),
                (byte) ((value >> 24) & 0xff),
        };
    }

    private static byte[] shortToByteArray(short value) {
        return new byte[]{
                (byte) (value & 0xff),
                (byte) ((value >> 8) & 0xff),
        };
    }
}
```
Lastly, a note on the LAME API for float PCM: when passing mono data, use `lame_encode_buffer_ieee_float` and simply pass the same array for both `pcm_l` and `pcm_r`. When passing stereo, that is, data where each sample alternates left channel and right channel, use `lame_encode_buffer_interleaved_ieee_float`, as sketched below.
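A short sketch of the two calls, assuming the usual `<lame/lame.h>` header and a `lame_t` that has already been configured with the matching sample rate and channel count:

```c++
#include <lame/lame.h>

// Mono float PCM: pass the same buffer as both left and right channels.
int EncodeMonoFloat(lame_t lame, const float *pcm, int frames,
                    unsigned char *mp3, int mp3_size) {
    return lame_encode_buffer_ieee_float(lame, pcm, pcm, frames, mp3, mp3_size);
}

// Interleaved stereo float PCM (L R L R ...): `frames` counts L/R pairs,
// not individual floats.
int EncodeStereoFloat(lame_t lame, const float *pcm, int frames,
                      unsigned char *mp3, int mp3_size) {
    return lame_encode_buffer_interleaved_ieee_float(lame, pcm, frames, mp3,
                                                     mp3_size);
}
```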