FFmpeg Development Notes (13): Capturing Microphone Audio with FFmpeg, Resampling the PCM for AAC, and Recording to an AAC File

This is an original article; please credit the original source when reposting.

Blog address of this article: https://hpzwl.blog.csdn.net/article/details/153191535

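Dear readers: knowledge is infinite but human effort is finite. Either change the requirements, find a professional, or research it yourself.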

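Changsha Hongpangzi Qt (Changsha Chuangwei Zhike) blog index: a collection of development techniques (Qt practical techniques, Raspberry Pi, 3D, OpenCV, OpenGL, ffmpeg, OSG, microcontrollers, hardware/software integration, and more), continuously updated...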

FFmpeg and SDL development column

Previous article: "FFmpeg Development Notes (12): FFmpeg Audio Processing, Capturing Microphone Audio and Recording to WAV"

Next article: stay tuned...

Preface

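Audio resampling is a core technique in FFmpeg. The previous article recorded raw PCM. Building on it, this article converts the PCM to AAC and saves the recording as an AAC file.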

About audio concepts

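See the "Audio" section of the previous article, "FFmpeg Development Notes (12): FFmpeg Audio Processing, Capturing Microphone Audio and Recording to WAV".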

Reading the two articles side by side is recommended to deepen your understanding of the concepts and principles.

Recording, resampling and encoding flow

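FFmpeg turns the audio source into the target audio file through this chain, which is more involved than the WAV case:

"input device → capture packet → decode the PCM packet into a raw PCM frame (decframe) → resample decframe into a raw frame in the AAC encoder's format (encframe) → encode into an AAC packet → mux into the file"

The steps are as follows: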

Step 1: Device probing and selection

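FFmpeg first probes the available audio input devices through the operating system's audio interface (DirectShow on Windows, ALSA on Linux, AVFoundation on macOS). The user selects the device on the command line, e.g. -f dshow -i audio="麦克风阵列" (the "Microphone Array" device).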

Step 2: Audio capture and raw data packet (AVPacket) acquisition

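Once the device is selected, FFmpeg reads raw PCM data packets from it using the specified sampling parameters (sample rate, bit depth, channel count). PCM is uncompressed, so this step is lossless: the data comes straight from the hardware or virtual device.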

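Step 3: Decoding AVPacket -> DecAVFrame

Resampling works on raw samples, so the captured AVPacket is first passed through the PCM decoder. This yields a raw PCM data frame (AVFrame).

Step 4: Resampling DecAVFrame -> EncAVFrame and encoding to AAC

SwrContext converts the decoded frame into the format the AAC encoder accepts (FLTP, 44100 Hz, stereo in this article). The frame is then sent to the AAC encoder. The encoder adjusts the compression according to the bit rate set by the user (192 kbps in the demo), balancing quality against file size.

Step 5: Muxing into the target file

The encoded audio stream is written into the chosen container (ADTS .aac here; MP4 also works, since MP4 carries AAC audio). The muxer also writes the file header, index and other metadata, producing a playable audio file.

Note: the previous article did no resampling or compression and stored WAV+PCM directly; this article adds both.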

When the recorded test file was played back, it contained noise.

Source of the noise

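The input frame size (1014 samples) does not match the output frame size (1024 samples) the AAC encoder works with. Samples therefore have to carry over from one frame into the next. If each frame is cut or filled at its boundary instead, the waveform becomes discontinuous there, which is heard as clicks and pops.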

Cause

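Each captured audio frame holds a fixed 1014 samples, while playback and later processing (the AAC encoder) require 1024 samples per frame. Cutting frames bluntly breaks the phase continuity and produces clicks and pops.

AVAudioFifo is FFmpeg's audio sample FIFO. It does no locking of its own, so it is not thread-safe by itself. It accepts streaming writes (1014, 1014, 1014, ...) and lets you read exactly the frame size you need (1024), so the output stays continuous with no gaps.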

Solutions

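Option 1: keep 1024 samples per frame at the capture side. At 44100 Hz, 1024 samples last about 23.2 ms. The buffer size, however, is set in whole milliseconds, so the system effectively uses 23 ms, which holds only 1014 samples (0.023 × 44100 ≈ 1014).
Option 2: resample the 1014 PCM samples into a 1014-sample frame in the AAC input format, then set samples 1014 to 1023 to 0 (0.0 in fltp). The pts then advances by 1014 per frame (note: not 1024). Players align the stream by pts, so the padded samples play as silence and timing errors do not build up over a long recording.
Option 3: treat the data as a stream. There are three ways to do this:
- feed the swr resampler several times without pulling output, and only pull a frame once 1024 samples are available;
- use a FIFO and convert once 1024 samples have accumulated;
- write your own buffer.

The core logic of the three is the same, but the APIs and code differ.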
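The "write your own buffer" variant of option 3 can be sketched in plain C++. This is a minimal illustration, neither the author's code nor FFmpeg's. PlanarFifo and pushAndDrain are hypothetical names. The data is assumed to be planar float (one array per channel, like FLTP), and the callback stands in for setting pts and calling avcodec_send_frame(). In real FFmpeg code, av_audio_fifo_write(), av_audio_fifo_size() and av_audio_fifo_read() play the roles of write(), size() and read().

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical planar sample FIFO: one queue of float samples per channel.
// Like AVAudioFifo, it does no locking, so it is not thread-safe.
class PlanarFifo
{
public:
    explicit PlanarFifo(std::size_t channels) : m_data(channels) {}

    // Append the first n samples of every channel plane.
    void write(const std::vector<std::vector<float>>& planes, std::size_t n)
    {
        const auto count = static_cast<std::ptrdiff_t>(n);
        for (std::size_t c = 0; c < m_data.size(); ++c)
            m_data[c].insert(m_data[c].end(), planes[c].begin(), planes[c].begin() + count);
    }

    // Number of samples currently buffered per channel.
    std::size_t size() const { return m_data.empty() ? 0 : m_data[0].size(); }

    // Remove up to n samples per channel and return them as planes.
    std::vector<std::vector<float>> read(std::size_t n)
    {
        const auto count = static_cast<std::ptrdiff_t>(std::min(n, size()));
        std::vector<std::vector<float>> out(m_data.size());
        for (std::size_t c = 0; c < m_data.size(); ++c)
        {
            out[c].assign(m_data[c].begin(), m_data[c].begin() + count);
            m_data[c].erase(m_data[c].begin(), m_data[c].begin() + count);
        }
        return out;
    }

private:
    std::vector<std::deque<float>> m_data;
};

// Push one captured block, then emit every complete frame of frameSize samples.
// Returns the pts to use for the next emitted frame (pts counted in samples).
template <typename OnFrame>
int64_t pushAndDrain(PlanarFifo& fifo, const std::vector<std::vector<float>>& captured,
                     std::size_t capturedSamples, std::size_t frameSize,
                     int64_t pts, OnFrame onFrame)
{
    fifo.write(captured, capturedSamples);
    while (fifo.size() >= frameSize)               // ">=": an exactly full frame is ready too
    {
        onFrame(fifo.read(frameSize), pts);
        pts += static_cast<int64_t>(frameSize);    // every emitted frame is exactly frameSize long
    }
    return pts;
}
```

Fed with 1014-sample captures, nothing comes out on the first write. One 1024-sample frame comes out on the second, leaving 1004 samples buffered. The remainder always stays below 1024, so no sample is dropped or padded, and pts advances by exactly 1024 per emitted frame.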

Solution flow

The original recording 1.wav had no noise, while the recording 1.aac did. After the fix, the recording 2.aac has no noise and sounds smooth.

Key steps for the AVAudioFifo buffer

Create the buffer


pAudioFifo = av_audio_fifo_alloc(AV_SAMPLE_FMT_FLTP, 2, 8192);

Write into the buffer


av_audio_fifo_write(pAudioFifo, (void **)pEncFrame->data, pEncFrame->nb_samples);

Query the number of buffered samples


av_audio_fifo_size(pAudioFifo)

Check the buffer level and read a frame from the buffer


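while (av_audio_fifo_size(pAudioFifo) >= 1024)   // ">=": an exactly full 1024-sample frame is ready too
{
  pEncFrame->nb_samples = 1024;                  // pEncFrame already owns a 1024-sample buffer (format/layout set)
  av_frame_make_writable(pEncFrame);             // the encoder may still reference the previous buffer
  av_audio_fifo_read(pAudioFifo, (void **)pEncFrame->data, 1024);   // read into data[], not into the AVFrame itself
  // ... set pEncFrame->pts, then send pEncFrame to the encoder
}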

Free the buffer


av_audio_fifo_free(pAudioFifo);
pAudioFifo = 0;

Demo (without buffering)


// FFmpeg 4.x headers (C API, hence extern "C"). LOG and QSTRING are logging/string
// macros from the author's Qt project template and are not shown here.
extern "C" {
#include <libavformat/avformat.h>
#include <libavdevice/avdevice.h>
#include <libavcodec/avcodec.h>
#include <libswresample/swresample.h>
#include <libavutil/audio_fifo.h>
#include <libavutil/channel_layout.h>
}

void FFmpegManager::testCaptureAudioForAac()
{
    // Command line: list the local audio devices
    // linux  :  ffmpeg -list_devices true -f alsa -i dummy
    //
    // windows:  ffmpeg -list_devices true -f dshow -i dummy
    //           (on Windows, audio devices are accessed through the DirectShow interface)
    //  "麦克风 (Realtek(R) Audio)"
    //  "麦克风 (USB Audio Device)"        <- the device used here
    //  "立体声混音 (Realtek(R) Audio)"
    //
    // Windows recording test: ffmpeg -f dshow -i audio="麦克风 (USB Audio Device)" output.wav
    //

    // Step 1: register FFmpeg components
    // (FFmpeg 4.x API: av_register_all() and avcodec_register_all() no longer exist in FFmpeg 5.0+)
    av_register_all();                              // registers formats (on its own, dshow is not found)
    avdevice_register_all();                        // explicitly registers all devices
    avcodec_register_all();                         // explicitly registers all codecs


    // Step 2: set the capture parameters
    AVDictionary* pAVDictionary = nullptr;
    av_dict_set(&pAVDictionary, "sample_rate", "44100", 0);
    av_dict_set(&pAVDictionary, "channels", "2", 0);
#if 0
    // 1024 samples at 44100 Hz is about 23.22 ms, but dshow takes whole milliseconds
    av_dict_set(&pAVDictionary, "audio_buffer_size", QSTRING("%1").arg(1024 * 1000 * 1.0 / 44100).toUtf8().data(), 0);
#else
    av_dict_set(&pAVDictionary, "audio_buffer_size", "23", 0);   // 23 ms, about 1014 samples per packet
#endif

    // Step 3: open the microphone device
    QString deviceStr = QSTRING("audio=%1").arg(QSTRING("麦克风 (USB Audio Device)"));
    AVFormatContext* pInAVFormatContext = nullptr;
    LOG << deviceStr;
    int ret = avformat_open_input(&pInAVFormatContext,
                                  deviceStr.toUtf8().data(),
                                  av_find_input_format("dshow"),
                                  &pAVDictionary);
    av_dict_free(&pAVDictionary);                   // free any options dshow did not consume
    if (ret < 0)
    {
        LOG << QSTRING("Cannot open the audio device") << ret;
        return;
    }

    // Step 4: find the audio stream index
    int audioStreamIndex = -1;
    for (unsigned int index = 0; index < pInAVFormatContext->nb_streams; index++)
    {
        if (pInAVFormatContext->streams[index]->codecpar->codec_type == AVMEDIA_TYPE_AUDIO)
        {
            audioStreamIndex = index;
            break;
        }
    }
    if (audioStreamIndex == -1)
    {
        LOG << QSTRING("Audio stream not found");
        return;
    }


    // Step 5: get the codec parameters of the audio stream
    AVCodecParameters* pInAVCodecParameters = pInAVFormatContext->streams[audioStreamIndex]->codecpar;


    // Step 6: find the decoder
    const AVCodec* pDecAVCodec = avcodec_find_decoder(pInAVCodecParameters->codec_id);
    if (!pDecAVCodec)
    {
        LOG << QSTRING("Decoder not found");
        return;
    }

    // Step 7: allocate and open the decoder context
    AVCodecContext* pDecAVCodecContext = avcodec_alloc_context3(pDecAVCodec);
    avcodec_parameters_to_context(pDecAVCodecContext, pInAVCodecParameters);
    ret = avcodec_open2(pDecAVCodecContext, pDecAVCodec, nullptr);
    if (ret < 0)
    {
        LOG << QSTRING("Cannot open the decoder");
        return;
    }

    // Step 8: create the output context (AAC in an ADTS container)
    AVFormatContext* pEncAVFormatContext = nullptr;
#if 1
    QString fileName = "1.aac";
    avformat_alloc_output_context2(&pEncAVFormatContext, nullptr, "adts", fileName.toUtf8().data());
    // Without "adts" the recorded file cannot be played
//    avformat_alloc_output_context2(&pEncAVFormatContext, 0, 0, fileName.toUtf8().data());
#endif
#if 0
    // MP4 carries AAC audio, so the AAC-encoded stream can also be written to an mp4 file
    QString fileName = "1.mp4";
    avformat_alloc_output_context2(&pEncAVFormatContext, nullptr, "mp4", fileName.toUtf8().data());
#endif
    if (!pEncAVFormatContext)
    {
        LOG << QSTRING("Cannot create the output context");
        return;
    }

    // Step 9: find the AAC encoder
    const AVCodec* pEncAVCodec = avcodec_find_encoder(AV_CODEC_ID_AAC);
    if (!pEncAVCodec)
    {
        LOG << QSTRING("AAC encoder not found");
        return;
    }

    // Step 10: create the output stream (output format context + encoder)
    AVStream* pEncAVStream = avformat_new_stream(pEncAVFormatContext, pEncAVCodec);
    if (!pEncAVStream)
    {
        LOG << QSTRING("Cannot create the output stream");
        return;
    }

    // Step 11: create the encoder context
    AVCodecContext* pEncAVCodecContext = avcodec_alloc_context3(pEncAVCodec);
    pEncAVCodecContext->sample_fmt     = AV_SAMPLE_FMT_FLTP;
    pEncAVCodecContext->sample_rate    = 44100;
    pEncAVCodecContext->channel_layout = av_get_default_channel_layout(2);
    pEncAVCodecContext->channels       = 2;
    pEncAVCodecContext->bit_rate       = 192000;
    pEncAVCodecContext->time_base      = AVRational{1, pEncAVCodecContext->sample_rate};  // pts counted in samples
    if (pEncAVFormatContext->oformat->flags & AVFMT_GLOBALHEADER)
    {
        pEncAVCodecContext->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
    }

    // Step 12: open the encoder
    if (avcodec_open2(pEncAVCodecContext, pEncAVCodec, nullptr) < 0)
    {
        LOG << QSTRING("Cannot open the AAC encoder");
        return;
    }

    // Step 13: copy the encoder parameters into the output stream
    avcodec_parameters_from_context(pEncAVStream->codecpar, pEncAVCodecContext);
    pEncAVStream->time_base = pEncAVCodecContext->time_base;   // a hint; the muxer may change it in write_header

    // Step 14: create and open the output file
    if (!(pEncAVFormatContext->oformat->flags & AVFMT_NOFILE))
    {
        if (avio_open(&pEncAVFormatContext->pb, fileName.toUtf8().data(), AVIO_FLAG_WRITE) < 0)
        {
            LOG << QSTRING("Cannot open the output file");
            return;
        }
    }

    // Step 15: write the file header
    if (avformat_write_header(pEncAVFormatContext, nullptr) < 0)
    {
        LOG << QSTRING("Cannot write the file header");
        return;
    }

#if 1
    // Print the audio parameters
    LOG << QSTRING("Audio info  sample rate: %1 Hz  channels: %2  sample format: %3")
           .arg(pDecAVCodecContext->sample_rate)
           .arg(pDecAVCodecContext->channels)
           .arg(av_get_sample_fmt_name(pDecAVCodecContext->sample_fmt));
    LOG << "convert to";
    LOG << QSTRING("Audio info  sample rate: %1 Hz  channels: %2  sample format: %3")
           .arg(pEncAVCodecContext->sample_rate)
           .arg(pEncAVCodecContext->channels)
           .arg(av_get_sample_fmt_name(pEncAVCodecContext->sample_fmt));
#endif
    SwrContext* pSwrContext = nullptr;
    AVAudioFifo* pAudioFifo = nullptr;              // allocated below but not used in this no-buffer demo

    AVPacket* pInPacket = av_packet_alloc();        // packets read from the capture device
    AVPacket* pEncPacket = av_packet_alloc();       // AAC packets produced by the encoder
    AVFrame* pDecFrame = av_frame_alloc();
    AVFrame* pEncFrame = av_frame_alloc();

    pEncFrame->format = pEncAVCodecContext->sample_fmt;
    pEncFrame->channel_layout = pEncAVCodecContext->channel_layout;
    pEncFrame->sample_rate = pEncAVCodecContext->sample_rate;
    pEncFrame->nb_samples = 1024;
    ret = av_frame_get_buffer(pEncFrame, 0);
    if (ret < 0)
    {
        LOG << QSTRING("Cannot allocate the output frame buffer");
        return;
    }
    LOG << QSTRING("Recording...");

    int64_t pts = 0;                                // encoder pts in samples (time_base 1/sample_rate)
    int index = 0;
    while (index++ < 300)                           // read 300 capture packets
    {
        ret = av_read_frame(pInAVFormatContext, pInPacket);
        if (ret < 0)
        {
            if (ret == AVERROR_EOF)
            {
                LOG;
                break;
            }
            LOG << QSTRING("Failed to read packet") << ret;
            av_packet_unref(pInPacket);
            continue;
        }

        if (pInPacket->stream_index == audioStreamIndex)
        {
            // Decode
            ret = avcodec_send_packet(pDecAVCodecContext, pInPacket);
            if (ret < 0)
            {
                LOG << QSTRING("Failed to send packet to the decoder") << ret;
                av_packet_unref(pInPacket);
                continue;
            }

            while (ret >= 0)
            {
                ret = avcodec_receive_frame(pDecAVCodecContext, pDecFrame);
                // Check the result BEFORE touching pDecFrame: on EAGAIN/EOF it holds no valid frame
                if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
                {
                    break;
                }
                if (ret < 0)
                {
                    LOG << QSTRING("Decoding error") << ret;
                    break;
                }
                if (!pDecFrame->nb_samples || !pDecFrame->data[0])
                {
                    LOG << QSTRING("Input frame has no data");
                    continue;
                }

                // The decoder on this machine leaves channel_layout at 0, which swr rejects (see pitfall 2)
                pDecFrame->channel_layout = av_get_default_channel_layout(pDecFrame->channels);

                if (!pSwrContext)
                {
                    pSwrContext = swr_alloc_set_opts(nullptr,
                                                     pEncAVCodecContext->channel_layout,
                                                     pEncAVCodecContext->sample_fmt,
                                                     pEncAVCodecContext->sample_rate,
                                                     pDecFrame->channel_layout,
                                                     (AVSampleFormat)pDecFrame->format,
                                                     pDecFrame->sample_rate,
                                                     0,
                                                     nullptr);
                    if (!pSwrContext || swr_init(pSwrContext) < 0)
                    {
                        LOG << QSTRING("Cannot initialize the resampling context");
                        return;
                    }
                    // The FIFO holds resampled data, so it uses the encoder's format (FLTP), not S16
                    pAudioFifo = av_audio_fifo_alloc(pEncAVCodecContext->sample_fmt,
                                                     pEncAVCodecContext->channels, 8192);
                }
#if 1
                // Print the converter input/output parameters
                LOG << QSTRING("Resampler source  sample rate: %1 Hz  channels: %2  sample format: %3")
                       .arg(pDecAVCodecContext->sample_rate)
                       .arg(pDecAVCodecContext->channels)
                       .arg(av_get_sample_fmt_name(pDecAVCodecContext->sample_fmt));
                LOG << "convert to";
                LOG << QSTRING("Resampler target  sample rate: %1 Hz  channels: %2  sample format: %3")
                       .arg(pEncAVCodecContext->sample_rate)
                       .arg(pEncAVCodecContext->channels)
                       .arg(av_get_sample_fmt_name(pEncAVCodecContext->sample_fmt));
                LOG << QSTRING("Decoded PCM frame  sample rate: %1 Hz  channels: %2  sample format: %3")
                       .arg(pDecFrame->sample_rate)
                       .arg(pDecFrame->channels)
                       .arg(av_get_sample_fmt_name((AVSampleFormat)pDecFrame->format));
                LOG << QSTRING("Decoded frame")
                    << "nb_samples =" << pDecFrame->nb_samples
                    << ", format =" << pDecFrame->format
                    << ", channels =" << pDecFrame->channels;
#endif
                // Resample (both sides are 44100 Hz stereo here, so this mainly converts the sample format to FLTP)
                pEncFrame->nb_samples = 1024;               // capacity of the buffer allocated above
                av_frame_make_writable(pEncFrame);          // the encoder may still reference the previous buffer
                ret = swr_convert_frame(pSwrContext, pEncFrame, pDecFrame);
                if (ret < 0)
                {
                    LOG << QSTRING("Resampling failed") << ret;
                    continue;
                }

#if 1
                LOG << QSTRING("After resampling: nb_samples =")
                    << pEncFrame->nb_samples
                    << ", format =" << pEncFrame->format
                    << ", channels =" << pEncFrame->channels;
#endif

                pEncFrame->pts = pts;
                pts += pEncFrame->nb_samples;               // advance by the samples actually produced (1014 here)

                ret = avcodec_send_frame(pEncAVCodecContext, pEncFrame);
                if (ret < 0)
                {
                    LOG << QSTRING("Failed to send frame to the encoder") << ret;
                    break;
                }

                while (ret >= 0)
                {
                    ret = avcodec_receive_packet(pEncAVCodecContext, pEncPacket);
                    if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
                    {
                        break;                              // encoder needs more input before it outputs a packet
                    }
                    if (ret < 0)
                    {
                        LOG << QSTRING("Failed to receive packet from the encoder") << ret;
                        break;
                    }
                    pEncPacket->stream_index = pEncAVStream->index;
                    av_packet_rescale_ts(pEncPacket, pEncAVCodecContext->time_base, pEncAVStream->time_base);
                    av_interleaved_write_frame(pEncAVFormatContext, pEncPacket);
                    av_packet_unref(pEncPacket);
                }
            }
        }

        av_packet_unref(pInPacket);
    }

    // Flush the encoder: a null frame switches it to draining mode
    avcodec_send_frame(pEncAVCodecContext, nullptr);
    while (avcodec_receive_packet(pEncAVCodecContext, pEncPacket) >= 0)
    {
        pEncPacket->stream_index = pEncAVStream->index;
        av_packet_rescale_ts(pEncPacket, pEncAVCodecContext->time_base, pEncAVStream->time_base);
        av_interleaved_write_frame(pEncAVFormatContext, pEncPacket);
        av_packet_unref(pEncPacket);
    }

    // Step 16: write the file trailer
    av_write_trailer(pEncAVFormatContext);

    // Step 17: free resources
    swr_free(&pSwrContext);
    av_audio_fifo_free(pAudioFifo);
    av_packet_free(&pInPacket);
    av_packet_free(&pEncPacket);
    av_frame_free(&pDecFrame);
    av_frame_free(&pEncFrame);
    avcodec_free_context(&pDecAVCodecContext);
    avcodec_free_context(&pEncAVCodecContext);
    avformat_close_input(&pInAVFormatContext);
    if (!(pEncAVFormatContext->oformat->flags & AVFMT_NOFILE))
    {
        avio_closep(&pEncAVFormatContext->pb);
    }
    avformat_free_context(pEncAVFormatContext);

    LOG << QSTRING("Recording finished: ") << fileName;
}

Demo (with buffering)

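Each capture yields 1014 samples, which are written into the buffer. Whenever the buffer holds at least 1024 samples, a 1024-sample frame is taken out and encoded.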

This involves some key points, so the code is not provided here.

Project template v1.7.0

Pitfalls

Pitfall 1: Cannot initialize the SwrContext

Problem

Cause

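This function stands alone: once the FFmpeg components are registered, it should not depend on any other code. Testing it in isolation worked, as shown in the figure below: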

Printing the values:

They were all 0, which pointed to channels as the culprit:

Solution

Pitfall 2: Nothing is written to the AAC file during recording

Problem

The program runs normally, but nothing is written to the aac file.

Attempt 1

Step-by-step troubleshooting showed that resampling was failing:

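The chain is: capture packet → decode into a PCM frame → resample → send to the encoder → receive AAC packets. Because resampling failed, the frame sent to the encoder contained no data. The encoder therefore had nothing to encode, and no encoded packets came out.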

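The decoded frame's parameters were used to check whether the resampling parameters were wrong. As the figure below shows, they were fine: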

Printing nb_samples showed the input was 22050, i.e. 22050 samples per frame,

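while the target is 1024. Further digging showed that this machine's microphone samples at 44100 Hz but delivers two frames per second, and the conversion target is 1024 samples, also at 44100 Hz. Each large frame therefore carries 0.5 s of data, while the target frame is 1024/44100 s, just over 23 ms. The initial conclusion was that this conversion is not supported (attempt 2 below shows the real cause).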
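The first fix idea below (splitting the big frame) is mostly bookkeeping. 22050 samples at 44100 Hz is 0.5 s, and 22050 = 21 × 1024 + 546. Each block therefore yields 21 or 22 full frames, and the leftover samples must be carried into the next block instead of being dropped. The sketch below shows only that bookkeeping. splitBlock() and Split are hypothetical names that count samples without copying data; a real implementation would move the samples too, for example through AVAudioFifo.

```cpp
#include <vector>

// Result of splitting one captured block into encoder-sized frames.
struct Split
{
    std::vector<int> frameSizes;  // sizes of the full frames produced
    int remainder;                // samples left over, to be carried into the next block
};

// Hypothetical helper: combine the carried-over samples with a new block and
// cut as many full frames of frameSize samples as possible.
Split splitBlock(int blockSamples, int carried, int frameSize)
{
    Split s{{}, carried + blockSamples};
    while (s.remainder >= frameSize)
    {
        s.frameSizes.push_back(frameSize);
        s.remainder -= frameSize;
    }
    return s;
}
```

For example, splitBlock(22050, 0, 1024) gives 21 frames with 546 samples left. Feeding the next 22050-sample block with those 546 carried over gives 22 frames and 68 left.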

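Possible fixes:

Idea 1: split the large frame in a loop into 1024-sample pieces, then convert each into a 1024-sample AAC frame (both sides are 44100 Hz).
Idea 2: make the microphone deliver 1024-sample frames, i.e. about one frame every 23 ms. Whether this works depends on the microphone hardware; we simply force the capture buffer size and test.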
The change is as follows:

Then:

But resampling still failed.

Attempt 2

After reworking the code and continuing the investigation with third-party sample code, resampling still failed.

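The parameters of the audio frame fed to the resampler (sample rate, channel layout, sample format) differed from those passed when initializing the SwrContext, so the resampler refused to process it.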

Printing the input parameters in real time to detect changes:

Digging into channel_layout:

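This shows that channel_layout has to be set by the user. Every AVFrame produced by the decoder had channel_layout 0 (unset), so the input was rejected as wrong. All of this happens after decoding and before the frame goes into the resampler.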

Going back to the code from attempt 1 and changing it accordingly, resampling succeeds too:

Cause

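After a frame is decoded, the AVFrame passed to the resampling function has channel_layout 0 and channels 2. channel_layout must be set by the user, and setting it fixes the problem. The author is not sure why code found elsewhere never sets it; on this machine, decoded frames carry no channel_layout (0 matches no layout value, hence the failure).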

Solution

Before every resampling call, assign the input frame's channel_layout.
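Below is a small sketch of that assignment as a check. It assumes you want to keep a valid layout and only fill in a default when the layout is missing or does not match the channel count. To stay self-contained it does not call FFmpeg. The constants mirror the bit values of FFmpeg's AV_CH_FRONT_LEFT, AV_CH_FRONT_RIGHT and AV_CH_FRONT_CENTER. defaultLayout() and fixLayout() are hypothetical helpers; defaultLayout() stands in for av_get_default_channel_layout() and covers only mono and stereo.

```cpp
#include <bitset>
#include <cstdint>

// Bit values mirroring FFmpeg's AV_CH_FRONT_LEFT / AV_CH_FRONT_RIGHT / AV_CH_FRONT_CENTER.
constexpr uint64_t kFrontLeft   = 0x1;
constexpr uint64_t kFrontRight  = 0x2;
constexpr uint64_t kFrontCenter = 0x4;

// Hypothetical stand-in for av_get_default_channel_layout(), mono and stereo only.
uint64_t defaultLayout(int channels)
{
    if (channels == 1) return kFrontCenter;                // mono
    if (channels == 2) return kFrontLeft | kFrontRight;    // stereo
    return 0;                                              // not handled in this sketch
}

// Keep the layout if it is set and has one bit per channel; otherwise derive
// the default layout from the channel count (the decoded frames here had 0).
uint64_t fixLayout(uint64_t layout, int channels)
{
    const bool consistent = layout != 0 &&
        static_cast<int>(std::bitset<64>(layout).count()) == channels;
    return consistent ? layout : defaultLayout(channels);
}
```

For the frames in this article (channel_layout 0, channels 2), fixLayout() returns the stereo mask 0x3. That mask is the value the fix assigns before each swr_convert_frame() call.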

Pitfall 3: Noise after resampling

Problem

Capture is 44100 Hz stereo (2 channels), converted to AAC at 44100 Hz stereo with 1024-sample frames. Playing the wav (PCM) has no noise, but playing the aac has noise.

Cause

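The actual input is 1014 samples per frame, while the output is 1024 samples per frame. The two do not line up, so the data becomes discontinuous at frame boundaries when it is stitched into encoder frames. This is heard as noise and clicks.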
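The mismatch can be quantified with a few lines of arithmetic. This is a sketch, and the function names are hypothetical. It assumes the dshow buffer of 23 ms yields floor(23 × 44100 / 1000) samples, which matches the 1014 observed in this article.

```cpp
#include <cstdint>

// Samples per captured packet for a buffer of `ms` milliseconds at `rate` Hz
// (integer division; gives 1014 for 23 ms at 44100 Hz).
int samplesPerPacket(int ms, int rate)
{
    return static_cast<int>(static_cast<int64_t>(ms) * rate / 1000);
}

// How many samples each encoder frame is short by when a packet is sent as-is.
int shortfallPerFrame(int captured, int encoderFrame)
{
    return encoderFrame - captured;
}

// Frame boundaries per second, i.e. how often a discontinuity can occur.
double boundariesPerSecond(int captured, int rate)
{
    return static_cast<double>(rate) / captured;
}

// Samples by which pts would run ahead of real time after `frames` frames
// if pts advanced by encoderFrame instead of the real captured count.
int64_t ptsDriftSamples(int64_t frames, int captured, int encoderFrame)
{
    return frames * (encoderFrame - captured);
}
```

With 1014-sample packets, each frame is 10 samples (about 0.23 ms) short, about 43 times per second. That is consistent with a steady clicking. Neither 23 ms (1014 samples) nor 24 ms (1058 samples) gives exactly 1024. Advancing pts by 1024 instead of 1014 would put the timestamps about 430 samples (roughly 9.8 ms) ahead after one second, which is why option 2 advances pts by 1014.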

Solution

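See the "Source of the noise" section. The actual fix uses a FIFO. Because it involves core techniques, the demo code for the fix is not provided in this article.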
