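FFmpeg + Qt: Camera Capture and RTSP Streaming with MP3 Background Music

Introduction

In live-streaming and video-surveillance scenarios, we often need to mix a camera feed with background music and push the result to an RTSP server. This article explains how to do that with the Qt framework and the FFmpeg 6.0.0 libraries. It covers:

Real-time camera capture (dshow on Windows, v4l2 on Linux)
Video preview (frames displayed as QImage)
H.264 video encoding and RTSP push
Reading audio from an MP3 file, resampling it, encoding it as AAC, and pushing it
Looping the audio file
Troubleshooting files with Chinese paths that fail to open

The article walks through the key code for each step and then gives the complete implementation of the CVideoCapTask class.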

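1. Overall Architecture

The program uses three threads:

Main thread: the Qt GUI thread, which starts and stops capture.
Video capture thread (a QThread subclass): reads packets from the camera, decodes them, produces preview images, then encodes the video frames and pushes them to the RTSP server. It also drains the queue of pending audio packets.
Audio thread (a separate QThread created with QThread::create): reads packets from the MP3 file in a loop, decodes and resamples them, encodes them as AAC, and puts the packets in a queue for the video capture thread to write to the RTSP stream.

This keeps audio decoding and resampling from blocking video capture. Because a single thread performs all writes to the muxer, audio and video packets can still be written interleaved.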

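2. Video Capture and Streaming

2.1 Opening the Camera Device

FFmpeg uses a different input format on each platform:

Windows: dshow, with a device name of the form video=<camera name>
Linux: v4l2, with a device path such as /dev/video0

Key code: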

const AVInputFormat* pInputFmt = nullptr;
#ifdef _WIN32
    pInputFmt = av_find_input_format("dshow");
#else
    pInputFmt = av_find_input_format("v4l2");
#endif

AVDictionary* pOptions = nullptr;
av_dict_set(&pOptions, "framerate", "30", 0);
av_dict_set(&pOptions, "rtbufsize", "512M", 0);   // larger real-time buffer to reduce dropped frames

avformat_open_input(&m_pFormatCtx, deviceName.toUtf8().constData(), pInputFmt, &pOptions);

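After opening the device, find the index of the video stream and create a decoder context for it. Cameras typically deliver MJPEG or YUYV; once decoded, frames arrive in a raw pixel format such as YUV420P, NV12, or YUYV422, depending on the device.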
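
The platform-specific naming rules above can be captured in one small helper. The sketch below is only an illustration: CaptureSource, Platform, and makeCaptureSource are hypothetical names that do not appear in the article's class, and the function only builds the two strings that would be passed to av_find_input_format and avformat_open_input.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper types, for illustration only.
struct CaptureSource {
    std::string format;  // name for av_find_input_format: "dshow" or "v4l2"
    std::string url;     // string for avformat_open_input
};

enum class Platform { Windows, Linux };

// Builds the input format name and device URL for a camera.
inline CaptureSource makeCaptureSource(Platform platform, const std::string& device)
{
    assert(!device.empty());
    if (platform == Platform::Windows) {
        // dshow expects "video=<camera name>"
        return CaptureSource{ "dshow", "video=" + device };
    }
    // v4l2 takes the device node path unchanged, e.g. /dev/video0
    return CaptureSource{ "v4l2", device };
}
```

Keeping this logic in one place avoids scattering #ifdef blocks through the capture code; the platform can still be chosen once with #ifdef _WIN32 at the call site.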

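2.2 Preview Display

A decoded AVFrame has to be converted to a QImage before it can be shown in a Qt UI. The function below uses sws_scale to convert the frame to AV_PIX_FMT_BGRA and then copies the pixels into a QImage. On little-endian machines the BGRA byte order matches QImage::Format_ARGB32: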

QImage CVideoCapTask::AVFrameToQImage(AVFrame* frame)
{
    int dstWidth = m_targetWidth > 0 ? m_targetWidth : frame->width;
    int dstHeight = m_targetHeight > 0 ? m_targetHeight : frame->height;
    AVPixelFormat dstFormat = AV_PIX_FMT_BGRA;

    SwsContext* swsCtx = sws_getContext(frame->width, frame->height, (AVPixelFormat)frame->format,
                                        dstWidth, dstHeight, dstFormat,
                                        SWS_FAST_BILINEAR, nullptr, nullptr, nullptr);
    uint8_t* dstData = (uint8_t*)av_malloc(dstWidth * 4 * dstHeight);
    uint8_t* dstPlane[1] = { dstData };
    int dstLinesize[1] = { dstWidth * 4 };
    sws_scale(swsCtx, frame->data, frame->linesize, 0, frame->height, dstPlane, dstLinesize);

    QImage image(dstData, dstWidth, dstHeight, QImage::Format_ARGB32);
    QImage result = image.copy();  // deep copy, because dstData is freed right below
    av_free(dstData);
    sws_freeContext(swsCtx);
    return result;
}

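Note: creating a new SwsContext for every conversion is expensive. In production code, cache the context and recreate it only when the source or destination size changes (the source pixel format should be part of that check as well).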
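
The caching rule can be expressed as a comparison of a small key. The following sketch uses hypothetical names (ScalerKey, ScalerCache) and plain integers in place of AVPixelFormat, so it runs without FFmpeg; it only decides when a rebuild is needed and leaves the actual sws_getContext call to the caller.

```cpp
#include <cassert>
#include <tuple>

// Parameters that determine a scaler context; the ints stand in for AVPixelFormat.
struct ScalerKey {
    int srcW, srcH, srcFmt;
    int dstW, dstH, dstFmt;

    bool operator==(const ScalerKey& o) const {
        return std::tie(srcW, srcH, srcFmt, dstW, dstH, dstFmt) ==
               std::tie(o.srcW, o.srcH, o.srcFmt, o.dstW, o.dstH, o.dstFmt);
    }
};

// Remembers the key of the last context and reports when it is stale.
class ScalerCache {
public:
    // Returns true when the caller must free the old context and create a new one.
    bool needsRebuild(const ScalerKey& key) {
        assert(key.srcW > 0 && key.srcH > 0 && key.dstW > 0 && key.dstH > 0);
        if (m_valid && key == m_key)
            return false;        // cached context still matches
        m_key = key;
        m_valid = true;
        ++m_rebuilds;
        return true;
    }

    int rebuilds() const { return m_rebuilds; }

private:
    ScalerKey m_key{};
    bool m_valid = false;
    int m_rebuilds = 0;
};
```

libswscale also offers sws_getCachedContext, which performs a comparable check internally and may be the simpler option when only one context is needed.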

2.3 H.264 Encoding and RTSP Push

Pushing requires an output context, an H.264 video encoder, and a written stream header.

avformat_alloc_output_context2(&m_pOutFormatCtx, nullptr, "rtsp", rtspUrl.toStdString().c_str());
// Look up the H.264 encoder and create the output video stream
const AVCodec* encoder = avcodec_find_encoder(AV_CODEC_ID_H264);
AVStream* outVideoStream = avformat_new_stream(m_pOutFormatCtx, nullptr);
// Configure the encoder
AVCodecContext* encCtx = avcodec_alloc_context3(encoder);
encCtx->width = width;
encCtx->height = height;
encCtx->pix_fmt = AV_PIX_FMT_YUV420P;
encCtx->time_base = AVRational{1, framerate};
encCtx->bit_rate = 2 * 1024 * 1024;
encCtx->gop_size = framerate;    // one keyframe per second
// ... other parameters
avcodec_open2(encCtx, encoder, nullptr);
avcodec_parameters_from_context(outVideoStream->codecpar, encCtx);
// Connect to the server and write the stream header
avformat_write_header(m_pOutFormatCtx, nullptr);

Each decoded raw frame (possibly NV12 or YUYV422) is first converted to YUV420P, the pixel format H.264 encoders most commonly expect, and then sent to the encoder:

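// Allocate a YUV420P frame
AVFrame* yuvFrame = av_frame_alloc();
yuvFrame->format = AV_PIX_FMT_YUV420P;
yuvFrame->width = frame->width;
yuvFrame->height = frame->height;
av_frame_get_buffer(yuvFrame, 0);
sws_scale(swsCtx, frame->data, frame->linesize, 0, frame->height,
          yuvFrame->data, yuvFrame->linesize);
yuvFrame->pts = ptsCounter++;   // PTS counts frames, in units of encCtx->time_base (1/framerate)
avcodec_send_frame(encCtx, yuvFrame);
// outStream is the output video AVStream created during initialization
AVPacket* pkt = av_packet_alloc();
while (avcodec_receive_packet(encCtx, pkt) == 0) {
    pkt->stream_index = outStream->index;
    av_packet_rescale_ts(pkt, encCtx->time_base, outStream->time_base);
    av_interleaved_write_frame(m_pOutFormatCtx, pkt);   // takes ownership of the packet data
}
av_packet_free(&pkt);
av_frame_free(&yuvFrame);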

Finally, call av_write_trailer to finish the output.
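
The av_packet_rescale_ts call in the write loop above converts timestamps from the encoder's time base to the stream's time base. The arithmetic behind it can be sketched without FFmpeg as follows. Rational and rescaleTs are hypothetical stand-ins for AVRational and av_rescale_q; this simplified version assumes non-negative timestamps and does not guard against overflow, and the 1/90000 time base in the usage note is an assumption about what the RTSP muxer typically picks for video.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for AVRational (illustration only).
struct Rational {
    int num;
    int den;
};

// Converts ts from time base `from` to time base `to`:
// ts * from / to, rounded to the nearest integer.
inline int64_t rescaleTs(int64_t ts, Rational from, Rational to)
{
    assert(ts >= 0);
    assert(from.num > 0 && from.den > 0 && to.num > 0 && to.den > 0);
    const int64_t num = static_cast<int64_t>(from.num) * to.den;
    const int64_t den = static_cast<int64_t>(from.den) * to.num;
    return (ts * num + den / 2) / den;
}
```

With a 30 fps encoder time base of 1/30 and a 1/90000 stream time base, frame number 1 maps to 3000 ticks and frame number 30 maps to 90000 ticks, that is, exactly one second.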

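3. Audio Capture and Streaming (MP3 File)

3.1 Opening and Decoding the MP3 File

This works like opening the camera, except that no input format has to be specified because FFmpeg probes it automatically.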

avformat_open_input(&m_pAudioFmtCtx, mp3FilePath.toUtf8().constData(), nullptr, nullptr);
avformat_find_stream_info(m_pAudioFmtCtx, nullptr);
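// Find the audio stream index
m_audioInStreamIndex = av_find_best_stream(m_pAudioFmtCtx, AVMEDIA_TYPE_AUDIO, -1, -1, nullptr, 0);
// Create and open the decoder
AVCodecParameters* codecpar = m_pAudioFmtCtx->streams[m_audioInStreamIndex]->codecpar;
const AVCodec* dec = avcodec_find_decoder(codecpar->codec_id);
m_pAudioDecCtx = avcodec_alloc_context3(dec);
avcodec_parameters_to_context(m_pAudioDecCtx, codecpar);
avcodec_open2(m_pAudioDecCtx, dec, nullptr);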

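3.2 Handling Chinese File Paths

A user reported the error Cannot open MP3 file: "E:/SoftWare/DownLoad/???????????-????.mp3". The first suspicion was that FFmpeg mishandles the Chinese characters passed in through toUtf8().constData(). If the path encoding really is the problem, there are several workarounds:

Rename the file to an ASCII-only name.
Open the file with QFile and feed FFmpeg through a custom AVIOContext with read callbacks. This works but is more complex.
On Windows, open the file with _wfopen and wrap it in a custom AVIOContext (created with avio_alloc_context), taking care to encapsulate the callbacks.

Recommended practice: when the MP3 path is set, check that the file exists and report an error if it does not; when opening it, make sure the path is passed as UTF-8. FFmpeg expects UTF-8 file names, including on Windows, so toUtf8() is the right conversion. If a path with Chinese characters still fails, open the file through a system API and hand the data to FFmpeg as described above.

In this case the question marks were misleading. The actual file name is "喀什噶尔胡杨-刀郎.mp3"; it appears as "?" in the log only because of the console's encoding. The decisive clue is the error text, No such file or directory, which means the path itself does not resolve. The user's code uses "E:/SoftWare/DownLoad/喀什噶尔胡杨-刀郎.mp3", but the file may be in a different directory or its name may not match exactly. Test with an ASCII-only path first to rule out encoding.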
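
When a log shows question marks, it helps to verify that the byte string handed to FFmpeg really is UTF-8 before blaming the library. The function below is a minimal sketch of such a check; isValidUtf8 is a hypothetical helper, not part of Qt or FFmpeg. It would be applied to the bytes returned by toUtf8(), or to a path that arrives as a raw char string from elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8. Rejects stray continuation bytes,
// truncated sequences, overlong encodings, surrogates, and values above U+10FFFF.
inline bool isValidUtf8(const std::string& s)
{
    static const unsigned int kMinCodePoint[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned int cp = 0;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;           // stray continuation byte or invalid lead byte
        assert(len >= 1 && len <= 4);
        if (i + len > n) return false;  // sequence cut off at the end
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (len > 1 && cp < kMinCodePoint[len]) return false;  // overlong form
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        i += len;
    }
    return true;
}
```

A path that fails this check was most likely produced in a legacy code page rather than UTF-8, which points at the calling code instead of FFmpeg.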

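3.3 Resampling (SwrContext)

Decoded MP3 audio may be in AV_SAMPLE_FMT_S16P (16-bit planar) at, for example, 44100 Hz, whereas the AAC encoder usually takes AV_SAMPLE_FMT_FLTP (32-bit float planar) and may require one of a fixed set of sample rates (such as 44100 or 48000). libswresample is therefore needed to convert between the two. Note that swr_alloc_set_opts, used below, is deprecated in FFmpeg 6.0 in favor of swr_alloc_set_opts2 and the AVChannelLayout API; it still compiles, with a deprecation warning.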

SwrContext* swr = swr_alloc_set_opts(nullptr,
    dst_ch_layout, dst_sample_fmt, dst_sample_rate,
    src_ch_layout, src_sample_fmt, src_sample_rate,
    0, nullptr);
swr_init(swr);

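The resampling flow: after decoding a raw AVFrame, push its samples into the resampler with swr_convert (filling the internal buffer). Then repeatedly pull frame_size samples, the number the encoder expects per frame, in the encoder's sample format, and assign a PTS to each output frame.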

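// Push the decoded samples into the resampler's internal buffer
swr_convert(swr_ctx, nullptr, 0, (const uint8_t**)rawFrame->data, rawFrame->nb_samples);
// Pull complete encoder frames only; leftover samples stay buffered for the next call.
// swr_get_out_samples() is an upper bound when the sample rates differ.
while (swr_get_out_samples(swr_ctx, 0) >= encCtx->frame_size) {
    AVFrame* outFrame = av_frame_alloc();
    // Describe the output frame so av_frame_get_buffer can allocate it
    outFrame->format = encCtx->sample_fmt;
    outFrame->sample_rate = encCtx->sample_rate;
    outFrame->channels = encCtx->channels;
    outFrame->channel_layout = encCtx->channel_layout;
    outFrame->nb_samples = encCtx->frame_size;
    av_frame_get_buffer(outFrame, 0);
    int samples = swr_convert(swr_ctx, outFrame->data, outFrame->nb_samples, nullptr, 0);
    if (samples <= 0) {
        av_frame_free(&outFrame);
        break;
    }
    outFrame->pts = audio_pts;   // in units of 1/sample_rate
    audio_pts += samples;
    EncodeAudioFrame(outFrame);  // encode and queue the resulting packets
    av_frame_free(&outFrame);
}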
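
The reason for buffering is that the decoder and the encoder use different frame sizes: an MPEG-1 Layer III frame carries 1152 samples per channel, while AAC-LC encodes 1024 samples per frame. The sketch below models the resampler's internal buffer as a plain sample counter to show how many encoder frames become available after each decoded frame. FrameChunker is a hypothetical name, and the model assumes the input and output sample rates are equal.

```cpp
#include <cassert>
#include <cstdint>

// Models the re-chunking of decoded samples into fixed-size encoder frames.
class FrameChunker {
public:
    explicit FrameChunker(int frameSize) : m_frameSize(frameSize)
    {
        assert(frameSize > 0);
    }

    // Adds `samples` decoded samples and returns how many complete
    // encoder frames can be pulled out now.
    int push(int samples)
    {
        assert(samples >= 0);
        m_buffered += samples;
        const int frames = static_cast<int>(m_buffered / m_frameSize);
        const int64_t consumed = static_cast<int64_t>(frames) * m_frameSize;
        m_buffered -= consumed;   // leftover stays for the next call
        m_nextPts += consumed;    // PTS advances by the samples actually emitted
        return frames;
    }

    int64_t buffered() const { return m_buffered; }
    int64_t nextPts() const { return m_nextPts; }

private:
    int m_frameSize;
    int64_t m_buffered = 0;
    int64_t m_nextPts = 0;
};
```

With 1152-sample input and 1024-sample output, each push yields one frame and leaves 128 more samples behind, until the eighth push yields two frames and empties the buffer (8 x 1152 = 9 x 1024 = 9216 samples).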

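3.4 Audio Encoding and Queue Hand-off

The AAC encoder is driven the same way as the video encoder, but the resulting AVPacket is not written directly. It goes into a mutex-protected queue (QQueue<AVPacket*>), and the video capture thread periodically calls WriteQueuedPackets() to write the queued packets. All muxer writes therefore happen on one thread, which lets av_interleaved_write_frame interleave audio and video packets. The queue lock is released while each packet is being written, so the audio thread is not blocked by network I/O.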

void CVideoCapTask::WriteQueuedPackets()
{
    QMutexLocker lock(&m_queueMutex);
    while (!m_audioPacketQueue.isEmpty()) {
        AVPacket* pkt = m_audioPacketQueue.dequeue();
        lock.unlock();
        av_interleaved_write_frame(m_pOutFormatCtx, pkt);
        av_packet_free(&pkt);
        lock.relock();
    }
}
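
The same hand-off pattern can be written with the standard library alone. The sketch below is not the article's Qt implementation; PacketQueue is a hypothetical template that uses std::mutex instead of QMutex. It swaps the whole queue out under the lock so that the consumer holds the lock only briefly, no matter how long writing each item takes.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <iterator>
#include <mutex>
#include <utility>
#include <vector>

// Minimal producer/consumer queue: the producer pushes items,
// the consumer periodically drains everything that has accumulated.
template <typename T>
class PacketQueue {
public:
    void push(T item)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_items.push_back(std::move(item));
    }

    // Moves all queued items out in FIFO order. The lock is held only
    // for the swap, not while the caller processes the items.
    std::vector<T> drain()
    {
        std::deque<T> taken;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            taken.swap(m_items);
        }
        return std::vector<T>(std::make_move_iterator(taken.begin()),
                              std::make_move_iterator(taken.end()));
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_items.size();
    }

private:
    mutable std::mutex m_mutex;
    std::deque<T> m_items;
};
```

Compared with unlocking and relocking around every packet, draining in one swap takes the mutex once per call; the trade-off is a temporary container per drain.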

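3.5 Looping the MP3 File

When av_read_frame returns AVERROR_EOF and the loop flag is set, the straightforward approach is to close the current input and decoder contexts, reopen the file, and keep reading.

if (ret == AVERROR_EOF && m_loopAudio) {
    CloseAudioCapture();   // close the old input, decoder, and resampler
    InitAudioCapture();    // reopen the file
    // CloseAudioCapture() also frees the resampler, so it has to be recreated here,
    // even when the decoder parameters are unchanged.
    continue;
}

Two caveats apply. First, InitAudioEncoder depends on m_pAudioDecCtx and creates the resampler, but it also adds a new output stream and a new encoder. Calling it again after avformat_write_header, as a shortcut for recreating the resampler, modifies the output context while the video thread is writing to it. Recreate only the resampler, or rebuild the whole audio chain while pushing is stopped. Second, the audio PTS counter should keep counting across loops instead of restarting from zero, because the muxer expects timestamps within a stream to keep increasing. A lighter alternative that avoids both problems is to rewind the open file with av_seek_frame and reset the decoder with avcodec_flush_buffers, leaving the resampler and the encoder untouched.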
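
A muxer generally expects the timestamps within one stream to keep increasing, so restarting the audio PTS from zero on every loop is risky. One way to keep them increasing, sketched below under the assumption that timestamps are counted in samples, is to add an offset that grows by the length of each completed pass. LoopTimeline is a hypothetical helper and is not part of the article's class.

```cpp
#include <cassert>
#include <cstdint>

// Maps timestamps of the current pass over the file to stream-wide timestamps.
class LoopTimeline {
public:
    // inPts: timestamp within the current pass; nbSamples: length of the frame.
    // Returns the timestamp to put on the output frame.
    int64_t toOutput(int64_t inPts, int64_t nbSamples)
    {
        assert(inPts >= 0 && nbSamples > 0);
        const int64_t out = m_offset + inPts;
        if (out + nbSamples > m_end)
            m_end = out + nbSamples;   // remember where the audio written so far ends
        return out;
    }

    // Call after rewinding the input: the next pass starts where the last one ended.
    void onLoop() { m_offset = m_end; }

private:
    int64_t m_offset = 0;
    int64_t m_end = 0;
};
```

If the PTS is a running sample counter, as in this article, the same effect is achieved simply by never resetting the counter; the offset form is useful when timestamps come from the input file.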

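4. Audio/Video Synchronization

In this design the video PTS simply counts frames (0, 1, 2, ...) in units of 1/framerate, and the audio PTS advances by the number of samples actually produced by the resampler. The two clocks share no common reference; an RTSP player such as VLC interprets each stream's timestamps using that stream's own time_base. As long as video and audio are both produced at their real-time rate, they usually stay close enough that no offset is noticeable. Over long runs they can drift apart, for example when the camera delivers frames at a rate slightly different from the nominal frame rate.

If precise synchronization is required, consider one of these approaches:

Use the audio clock as master: track the current audio PTS (or the system time), set each video frame's PTS from its actual capture time, and align it with the audio PTS.
Use absolute system time: record a reference time at program start, and for each encoded frame convert the time elapsed since that reference into a PTS in the encoder's time_base.

Background music rarely needs tight synchronization, so the simple scheme in this article is sufficient for basic use.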
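
The absolute-time approach boils down to one conversion: elapsed time multiplied by the time base denominator. The following sketch assumes a time base of the form 1/timeBaseDen; ptsFromElapsed and StreamClock are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Converts the time elapsed since stream start into a PTS
// expressed in units of 1/timeBaseDen seconds (rounded down).
inline int64_t ptsFromElapsed(std::chrono::microseconds elapsed, int timeBaseDen)
{
    assert(elapsed.count() >= 0 && timeBaseDen > 0);
    return static_cast<int64_t>(elapsed.count()) * timeBaseDen / 1000000;
}

// Records a reference time at construction and reports the current PTS.
class StreamClock {
public:
    StreamClock() : m_start(std::chrono::steady_clock::now()) {}

    int64_t now(int timeBaseDen) const
    {
        const auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
            std::chrono::steady_clock::now() - m_start);
        return ptsFromElapsed(elapsed, timeBaseDen);
    }

private:
    std::chrono::steady_clock::time_point m_start;
};
```

With a coarse time base such as 1/30, capture jitter can give two consecutive frames the same PTS, so a finer encoder time base (for example 1/1000) is a safer choice when timestamps come from the wall clock. steady_clock is used because it never jumps backward.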

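5. Debugging and Common Problems

5.1 No Sound

If the stream still has no sound after following the code above, check the following:

AAC support on the receiving side: some RTSP servers or players decode only PCMU/PCMA by default and need extra configuration. Test with ffplay rtsp://127.0.0.1:8554/test.
Audio encoder initialization: look for Audio encoder initialized successfully in the log. If it is missing, the AAC encoder may not have been found (confirm that the FFmpeg build includes libfdk_aac or the built-in aac encoder).
Audio queue writes: add a counter log in WriteQueuedPackets to confirm that packets are actually being written.
Resampling failures: check the return value of swr_convert and make sure the sample format conversion is correct. av_get_sample_fmt_name can print the source and destination formats.
MP3 read rate: reading the file too fast or too slowly leads to a backlog or a shortage of audio packets, although this usually does not decide whether any sound is heard at all. Because a file can be read far faster than real time, the audio thread should be paced against the wall clock to keep the queue bounded.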
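
For the last item, a simple way to keep the audio producer from running ahead is to compare the amount of audio already produced with the elapsed wall-clock time and sleep for the difference. The helper below is a sketch of that calculation only; pacingDelay and its maxLead parameter are assumptions, not part of the article's code.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Returns how long the producer should sleep so that the audio it has
// generated is at most `maxLead` ahead of real time.
inline std::chrono::milliseconds pacingDelay(int64_t samplesSent,
                                             int sampleRate,
                                             std::chrono::milliseconds elapsed,
                                             std::chrono::milliseconds maxLead)
{
    assert(samplesSent >= 0 && sampleRate > 0);
    assert(elapsed.count() >= 0 && maxLead.count() >= 0);
    // Duration of the audio produced so far
    const std::chrono::milliseconds produced(samplesSent * 1000 / sampleRate);
    const std::chrono::milliseconds lead = produced - elapsed;
    return lead > maxLead ? lead - maxLead : std::chrono::milliseconds(0);
}
```

In the audio loop, the result could be passed to std::this_thread::sleep_for after each encoded frame; a lead of a few hundred milliseconds keeps the queue short while absorbing scheduling jitter.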

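5.2 Performance Tuning

Preview scaling and color-space conversion can be offloaded to the GPU (for example with OpenGL).
Tune the video encoder: lower bit_rate, or choose a faster preset.
Reuse buffers in the audio resampling path instead of allocating a new AVFrame for every output frame.

5.3 Memory Leaks

Every AVFrame, AVPacket, and AVCodecContext must be released with its matching free function (av_frame_free, av_packet_free, avcodec_free_context) once it is no longer needed. Release whatever is still held in the class destructor.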
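
One way to make the matching free call automatic is to wrap each resource in a std::unique_ptr with a custom deleter. The sketch below demonstrates the pattern with a stand-in type so that it runs without FFmpeg: FakeFrame, fake_frame_alloc, and fake_frame_free are hypothetical and merely mimic the shape of av_frame_alloc and av_frame_free, which takes a pointer to the pointer.

```cpp
#include <cassert>
#include <memory>

// Stand-in for a C-style resource such as AVFrame (illustration only).
struct FakeFrame {
    int id;
};

// Counter of live frames, used to show that nothing leaks.
inline int& liveFrameCount()
{
    static int count = 0;
    return count;
}

inline FakeFrame* fake_frame_alloc()
{
    ++liveFrameCount();
    return new FakeFrame{ 0 };
}

// Same shape as av_frame_free(AVFrame**): frees and nulls the pointer.
inline void fake_frame_free(FakeFrame** frame)
{
    if (frame && *frame) {
        delete *frame;
        *frame = nullptr;
        --liveFrameCount();
    }
}

// Deleter that adapts the pointer-to-pointer free function to unique_ptr.
struct FakeFrameDeleter {
    void operator()(FakeFrame* frame) const { fake_frame_free(&frame); }
};

using FramePtr = std::unique_ptr<FakeFrame, FakeFrameDeleter>;

inline FramePtr makeFrame()
{
    return FramePtr(fake_frame_alloc());
}
```

With FFmpeg, the deleter body would call av_frame_free(&frame) instead, and the same pattern applies to AVPacket and AVCodecContext. Early returns in functions such as PushVideoFrame then no longer need an explicit free on every path.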

6. Complete Code Example

#include "../Include/VideoCapTask.h"
#include <QDebug>
#include <chrono>
#include <thread>

CVideoCapTask::CVideoCapTask(QObject* parent)
    : QThread(parent)
    , m_framerate(30)
    , m_targetWidth(0)
    , m_targetHeight(0)
    , m_srcWidth(0)
    , m_srcHeight(0)
    , m_dstWidth(0)
    , m_dstHeight(0)
    , m_running(false)
    , m_pFormatCtx(nullptr)
    , m_pCodecCtx(nullptr)
    , m_pSwsCtx(nullptr)
    , m_videoStreamIndex(-1)
    , m_pushEnabled(false)
    , m_pOutFormatCtx(nullptr)
    , m_pOutVideoCodecCtx(nullptr)
    , m_pOutVideoSwsCtx(nullptr)
    , m_outVideoStreamIndex(-1)
    , m_videoPtsCounter(0)
    , m_loopAudio(true)
    , m_audioFileDuration(0)
    , m_audioSampleRate(44100)
    , m_audioChannels(2)
    , m_audioBitrate(128000)
    , m_audioEnabled(false)
    , m_audioRunning(false)
    , m_pAudioThread(nullptr)
    , m_pAudioFmtCtx(nullptr)
    , m_pAudioDecCtx(nullptr)
    , m_pAudioSwrCtx(nullptr)
    , m_audioInStreamIndex(-1)
    , m_pOutAudioCodecCtx(nullptr)
    , m_outAudioStreamIndex(-1)
    , m_audioPtsCounter(0)
{
    InitFFmpeg();
}

CVideoCapTask::~CVideoCapTask()
{
    StopCapture();
    wait();
    CloseDevice();
    ClosePushStream();
    CloseAudioCapture();
}

void CVideoCapTask::SetDeviceName(const QString& deviceName)
{
    QMutexLocker locker(&m_mutex);
    m_deviceName = deviceName;
}

void CVideoCapTask::SetFramerate(int fps)
{
    QMutexLocker locker(&m_mutex);
    if (fps > 0) m_framerate = fps;
}

void CVideoCapTask::SetTargetSize(int width, int height)
{
    QMutexLocker locker(&m_mutex);
    m_targetWidth = width;
    m_targetHeight = height;
}

void CVideoCapTask::SetRtspUrl(const QString& url)
{
    QMutexLocker locker(&m_mutex);
    m_rtspUrl = url;
}

void CVideoCapTask::EnablePush(bool enable)
{
    QMutexLocker locker(&m_mutex);
    m_pushEnabled = enable;
}

void CVideoCapTask::SetMp3FilePath(const QString& filePath, bool loop)
{
    QMutexLocker locker(&m_mutex);
    m_mp3FilePath = filePath;
    m_loopAudio = loop;
    m_audioEnabled = !filePath.isEmpty();  // setting a file path enables audio automatically
}

void CVideoCapTask::SetAudioOptions(int sampleRate, int channels, int bitrate)
{
    QMutexLocker locker(&m_mutex);
    if (sampleRate > 0) m_audioSampleRate = sampleRate;
    if (channels > 0) m_audioChannels = channels;
    if (bitrate > 0) m_audioBitrate = bitrate;
}

void CVideoCapTask::EnableAudioPush(bool enable)
{
    QMutexLocker locker(&m_mutex);
    m_audioEnabled = enable;
}

void CVideoCapTask::StartCapture()
{
    if (m_running) return;
    if (m_deviceName.isEmpty()) {
        qWarning() << "Device name not set";
        return;
    }
    m_running = true;
    start();
}

void CVideoCapTask::StopCapture()
{
    if (!m_running) return;
    m_running = false;
    {
        QMutexLocker lock(&m_queueMutex);
        m_queueCond.wakeAll();
    }
    if (m_pAudioThread) {
        m_audioRunning = false;
        m_pAudioThread->quit();
        m_pAudioThread->wait();
        delete m_pAudioThread;
        m_pAudioThread = nullptr;
    }
}

bool CVideoCapTask::InitFFmpeg()
{
    static bool bRegistered = false;
    if (!bRegistered) {
        avdevice_register_all();
        avformat_network_init();
        bRegistered = true;
    }
    return true;
}

bool CVideoCapTask::OpenDevice()
{
    const AVInputFormat* pInputFmt = nullptr;
#ifdef _WIN32
    pInputFmt = av_find_input_format("dshow");
#else
    pInputFmt = av_find_input_format("v4l2");
#endif
    if (!pInputFmt) {
        qCritical() << "Cannot find input format";
        return false;
    }

    AVDictionary* pOptions = nullptr;
    av_dict_set(&pOptions, "framerate", QString::number(m_framerate).toUtf8().constData(), 0);
    av_dict_set(&pOptions, "rtbufsize", "512M", 0);

    int ret = avformat_open_input(&m_pFormatCtx, m_deviceName.toUtf8().constData(),
        pInputFmt, &pOptions);
    av_dict_free(&pOptions);
    if (ret < 0) {
        char errbuf[128];
        av_strerror(ret, errbuf, sizeof(errbuf));
        qCritical() << "Cannot open device:" << m_deviceName << errbuf;
        return false;
    }

    if (avformat_find_stream_info(m_pFormatCtx, nullptr) < 0) {
        qCritical() << "Cannot find stream info";
        return false;
    }

    m_videoStreamIndex = -1;
    for (unsigned i = 0; i < m_pFormatCtx->nb_streams; ++i) {
        if (m_pFormatCtx->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
            m_videoStreamIndex = i;
            break;
        }
    }
    if (m_videoStreamIndex == -1) {
        qCritical() << "No video stream found";
        return false;
    }

    AVCodecParameters* pCodecPar = m_pFormatCtx->streams[m_videoStreamIndex]->codecpar;
    const AVCodec* pDecoder = avcodec_find_decoder(pCodecPar->codec_id);
    if (!pDecoder) {
        qCritical() << "Decoder not found";
        return false;
    }

    m_pCodecCtx = avcodec_alloc_context3(pDecoder);
    if (avcodec_parameters_to_context(m_pCodecCtx, pCodecPar) < 0) {
        qCritical() << "Failed to copy codec parameters";
        return false;
    }

    if (avcodec_open2(m_pCodecCtx, pDecoder, nullptr) < 0) {
        qCritical() << "Cannot open decoder";
        return false;
    }

    qDebug() << "Opened device:" << m_deviceName
        << "size:" << pCodecPar->width << "x" << pCodecPar->height
        << "codec:" << avcodec_get_name(pCodecPar->codec_id);
    return true;
}

void CVideoCapTask::CloseDevice()
{
    if (m_pCodecCtx) {
        avcodec_free_context(&m_pCodecCtx);
        m_pCodecCtx = nullptr;
    }
    if (m_pFormatCtx) {
        avformat_close_input(&m_pFormatCtx);
        m_pFormatCtx = nullptr;
    }
    if (m_pSwsCtx) {
        sws_freeContext(m_pSwsCtx);
        m_pSwsCtx = nullptr;
    }
    m_videoStreamIndex = -1;
}

QImage CVideoCapTask::AVFrameToQImage(AVFrame* frame)
{
    if (!frame || frame->width <= 0 || frame->height <= 0)
        return QImage();

    int dstWidth = m_targetWidth > 0 ? m_targetWidth : frame->width;
    int dstHeight = m_targetHeight > 0 ? m_targetHeight : frame->height;
    AVPixelFormat dstFormat = AV_PIX_FMT_BGRA;

    if (!m_pSwsCtx ||
        m_srcWidth != frame->width || m_srcHeight != frame->height ||
        m_dstWidth != dstWidth || m_dstHeight != dstHeight) {
        if (m_pSwsCtx) sws_freeContext(m_pSwsCtx);
        m_pSwsCtx = sws_getContext(frame->width, frame->height, (AVPixelFormat)frame->format,
            dstWidth, dstHeight, dstFormat,
            SWS_FAST_BILINEAR, nullptr, nullptr, nullptr);
        if (!m_pSwsCtx) {
            qWarning() << "Cannot create SwsContext";
            return QImage();
        }
        m_srcWidth = frame->width;
        m_srcHeight = frame->height;
        m_dstWidth = dstWidth;
        m_dstHeight = dstHeight;
    }

    int dstStride = dstWidth * 4;
    uint8_t* dstData = (uint8_t*)av_malloc(dstStride * dstHeight);
    if (!dstData) return QImage();

    uint8_t* dstPlane[1] = { dstData };
    int dstLinesize[1] = { dstStride };
    sws_scale(m_pSwsCtx, frame->data, frame->linesize, 0, frame->height,
        dstPlane, dstLinesize);

    QImage image(dstData, dstWidth, dstHeight, dstStride, QImage::Format_ARGB32);
    QImage result = image.copy();
    av_free(dstData);
    return result;
}

//---------------------------------------------------------------------------
// Audio implementation (MP3 file input)
//---------------------------------------------------------------------------

bool CVideoCapTask::InitAudioCapture()
{
    if (!m_audioEnabled || m_mp3FilePath.isEmpty())
        return false;

    // Open the MP3 file
    int ret = avformat_open_input(&m_pAudioFmtCtx, m_mp3FilePath.toUtf8().constData(), nullptr, nullptr);
    if (ret < 0) {
        char errbuf[128];
        av_strerror(ret, errbuf, sizeof(errbuf));
        qCritical() << "Cannot open MP3 file:" << m_mp3FilePath << errbuf;
        return false;
    }

    // Read the stream information
    if (avformat_find_stream_info(m_pAudioFmtCtx, nullptr) < 0) {
        qCritical() << "Cannot find stream info in MP3 file";
        return false;
    }

    // Find the best audio stream
    m_audioInStreamIndex = av_find_best_stream(m_pAudioFmtCtx, AVMEDIA_TYPE_AUDIO, -1, -1, nullptr, 0);
    if (m_audioInStreamIndex < 0) {
        qCritical() << "No audio stream found in MP3 file";
        return false;
    }

    // Look up the decoder
    AVCodecParameters* codecpar = m_pAudioFmtCtx->streams[m_audioInStreamIndex]->codecpar;
    const AVCodec* dec = avcodec_find_decoder(codecpar->codec_id);
    if (!dec) {
        qCritical() << "Audio decoder not found for MP3";
        return false;
    }

    m_pAudioDecCtx = avcodec_alloc_context3(dec);
    if (avcodec_parameters_to_context(m_pAudioDecCtx, codecpar) < 0) {
        qCritical() << "Failed to copy audio codec params";
        return false;
    }
    if (avcodec_open2(m_pAudioDecCtx, dec, nullptr) < 0) {
        qCritical() << "Failed to open audio decoder";
        return false;
    }

    // Record the total duration of the audio file (for looping)
    if (m_pAudioFmtCtx->duration != AV_NOPTS_VALUE) {
        m_audioFileDuration = m_pAudioFmtCtx->duration;
    }

    qDebug() << "MP3 file opened:" << m_mp3FilePath
        << "sample_rate:" << m_pAudioDecCtx->sample_rate
        << "channels:" << m_pAudioDecCtx->channels;
    return true;
}

void CVideoCapTask::CloseAudioCapture()
{
    if (m_pAudioSwrCtx) {
        swr_free(&m_pAudioSwrCtx);
        m_pAudioSwrCtx = nullptr;
    }
    if (m_pAudioDecCtx) {
        avcodec_free_context(&m_pAudioDecCtx);
        m_pAudioDecCtx = nullptr;
    }
    if (m_pAudioFmtCtx) {
        avformat_close_input(&m_pAudioFmtCtx);
        m_pAudioFmtCtx = nullptr;
    }
    m_audioInStreamIndex = -1;
}

bool CVideoCapTask::InitAudioEncoder()
{
    if (!m_audioEnabled || !m_pOutFormatCtx)
        return false;

    const AVCodec* encoder = avcodec_find_encoder(AV_CODEC_ID_AAC);
    if (!encoder) {
        qWarning() << "AAC encoder not found";
        return false;
    }

    AVStream* outStream = avformat_new_stream(m_pOutFormatCtx, encoder);
    if (!outStream) return false;
    m_outAudioStreamIndex = outStream->index;

    m_pOutAudioCodecCtx = avcodec_alloc_context3(encoder);
    if (!m_pOutAudioCodecCtx) return false;

    // Configure the encoder
    m_pOutAudioCodecCtx->sample_fmt = encoder->sample_fmts[0];  // usually AV_SAMPLE_FMT_FLTP
    m_pOutAudioCodecCtx->sample_rate = m_audioSampleRate;
    m_pOutAudioCodecCtx->channels = m_audioChannels;
    m_pOutAudioCodecCtx->channel_layout = av_get_default_channel_layout(m_audioChannels);
    m_pOutAudioCodecCtx->bit_rate = m_audioBitrate;
    m_pOutAudioCodecCtx->time_base = AVRational{ 1, m_audioSampleRate };

    if (m_pOutFormatCtx->oformat->flags & AVFMT_GLOBALHEADER)
        m_pOutAudioCodecCtx->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;

    int ret = avcodec_open2(m_pOutAudioCodecCtx, encoder, nullptr);
    if (ret < 0) {
        char errbuf[128];
        av_strerror(ret, errbuf, sizeof(errbuf));
        qCritical() << "Cannot open audio encoder:" << errbuf;
        avcodec_free_context(&m_pOutAudioCodecCtx);
        return false;
    }

    ret = avcodec_parameters_from_context(outStream->codecpar, m_pOutAudioCodecCtx);
    if (ret < 0) {
        qCritical() << "Failed to copy audio encoder params";
        avcodec_free_context(&m_pOutAudioCodecCtx);
        return false;
    }
    outStream->time_base = m_pOutAudioCodecCtx->time_base;

    if (m_pOutAudioCodecCtx->frame_size <= 0)
        m_pOutAudioCodecCtx->frame_size = 1024;

    if (!m_pAudioDecCtx) {
        qWarning() << "Audio decoder context is null, cannot create resampler";
        return false;
    }

    AVSampleFormat srcFormat = m_pAudioDecCtx->sample_fmt;
    int srcSampleRate = m_pAudioDecCtx->sample_rate;
    uint64_t srcChannelLayout = m_pAudioDecCtx->channel_layout;
    if (srcChannelLayout == 0)
        srcChannelLayout = av_get_default_channel_layout(m_pAudioDecCtx->channels);

    AVSampleFormat dstFormat = m_pOutAudioCodecCtx->sample_fmt;
    int dstSampleRate = m_pOutAudioCodecCtx->sample_rate;
    uint64_t dstChannelLayout = m_pOutAudioCodecCtx->channel_layout;

    m_pAudioSwrCtx = swr_alloc_set_opts(nullptr,
        dstChannelLayout, dstFormat, dstSampleRate,
        srcChannelLayout, srcFormat, srcSampleRate,
        0, nullptr);
    if (!m_pAudioSwrCtx) {
        qCritical() << "Failed to allocate audio resampler";
        return false;
    }

    if (swr_init(m_pAudioSwrCtx) < 0) {
        qCritical() << "Failed to init audio resampler";
        swr_free(&m_pAudioSwrCtx);
        return false;
    }

    qDebug() << "Audio encoder initialized, sample_rate:" << m_audioSampleRate
        << "channels:" << m_audioChannels
        << "frame_size:" << m_pOutAudioCodecCtx->frame_size;
    return true;
}

bool CVideoCapTask::EncodeAudioFrame(AVFrame* frame)
{
    if (!m_pOutAudioCodecCtx) return false;

    int ret = avcodec_send_frame(m_pOutAudioCodecCtx, frame);
    if (ret < 0 && ret != AVERROR_EOF) {
        char errbuf[128];
        av_strerror(ret, errbuf, sizeof(errbuf));
        qWarning() << "Error sending audio frame to encoder:" << errbuf;
        return false;
    }

    while (true) {
        AVPacket* pkt = av_packet_alloc();
        ret = avcodec_receive_packet(m_pOutAudioCodecCtx, pkt);
        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) {
            av_packet_free(&pkt);
            break;
        }
        else if (ret < 0) {
            av_packet_free(&pkt);
            break;
        }

        pkt->stream_index = m_outAudioStreamIndex;
        av_packet_rescale_ts(pkt, m_pOutAudioCodecCtx->time_base,
            m_pOutFormatCtx->streams[m_outAudioStreamIndex]->time_base);

        QMutexLocker lock(&m_queueMutex);
        m_audioPacketQueue.enqueue(pkt);
        m_queueCond.wakeOne();
    }
    return true;
}

void CVideoCapTask::AudioCaptureLoop()
{
    AVPacket* pkt = av_packet_alloc();
    AVFrame* rawFrame = av_frame_alloc();

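    // Rewind helper for looped playback: seek back to the start and flush the decoder.
    // The encoder, the resampler, and the audio PTS counter are left untouched, so no
    // new output stream is added after the header has been written and the audio
    // timestamps keep increasing across loops.
    auto reopenFile = [this]() -> bool {
        int ret = av_seek_frame(m_pAudioFmtCtx, m_audioInStreamIndex, 0, AVSEEK_FLAG_BACKWARD);
        if (ret < 0) {
            char errbuf[128];
            av_strerror(ret, errbuf, sizeof(errbuf));
            qCritical() << "Failed to rewind MP3 file for loop:" << errbuf;
            m_audioRunning = false;
            return false;
        }
        avcodec_flush_buffers(m_pAudioDecCtx);
        return true;
        };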

    while (m_audioRunning) {
        int ret = av_read_frame(m_pAudioFmtCtx, pkt);
        if (ret < 0) {
            if (ret == AVERROR_EOF) {
                if (m_loopAudio) {
                    qDebug() << "MP3 file ended, restarting loop";
                    if (!reopenFile()) break;
                    continue;  // go back to reading
                }
                else {
                    qDebug() << "MP3 file ended, audio capture stopping";
                    break;
                }
            }
            else if (ret == AVERROR(EAGAIN)) {
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
                continue;
            }
            else {
                char errbuf[128];
                av_strerror(ret, errbuf, sizeof(errbuf));
                qWarning() << "av_read_frame error:" << errbuf;
                break;
            }
        }

        if (pkt->stream_index == m_audioInStreamIndex) {
            ret = avcodec_send_packet(m_pAudioDecCtx, pkt);
            if (ret < 0) {
                av_packet_unref(pkt);
                continue;
            }

            while (true) {
                ret = avcodec_receive_frame(m_pAudioDecCtx, rawFrame);
                if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
                    break;
                if (ret < 0) {
                    qWarning() << "Audio decode error";
                    break;
                }

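                if (m_pAudioSwrCtx) {
                    // Push the decoded frame into the resampler's internal buffer
                    if (swr_convert(m_pAudioSwrCtx, nullptr, 0,
                        (const uint8_t**)rawFrame->data, rawFrame->nb_samples) < 0) {
                        qWarning() << "swr_convert failed while buffering input";
                    }

                    // Pull complete encoder frames only; leftover samples stay buffered
                    // in the resampler. The frame is allocated per iteration because
                    // av_frame_unref() would reset format and nb_samples.
                    // swr_get_out_samples() is an upper bound when the sample rates differ.
                    const int frameSize = m_pOutAudioCodecCtx->frame_size;
                    while (swr_get_out_samples(m_pAudioSwrCtx, 0) >= frameSize) {
                        AVFrame* outFrame = av_frame_alloc();
                        if (!outFrame) break;
                        outFrame->format = m_pOutAudioCodecCtx->sample_fmt;
                        outFrame->sample_rate = m_pOutAudioCodecCtx->sample_rate;
                        outFrame->channels = m_pOutAudioCodecCtx->channels;
                        outFrame->channel_layout = m_pOutAudioCodecCtx->channel_layout;
                        outFrame->nb_samples = frameSize;

                        int out_samples = -1;
                        if (av_frame_get_buffer(outFrame, 0) == 0) {
                            out_samples = swr_convert(m_pAudioSwrCtx,
                                outFrame->data, outFrame->nb_samples,
                                nullptr, 0);
                        }
                        if (out_samples <= 0) {
                            av_frame_free(&outFrame);
                            break;
                        }
                        outFrame->pts = m_audioPtsCounter;
                        m_audioPtsCounter += out_samples;
                        EncodeAudioFrame(outFrame);
                        av_frame_free(&outFrame);
                    }
                }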
                av_frame_unref(rawFrame);
            }
        }
        av_packet_unref(pkt);
    }

    // Flush the encoder
    EncodeAudioFrame(nullptr);
    av_packet_free(&pkt);
    av_frame_free(&rawFrame);
    qDebug() << "Audio capture loop ended";
}

//---------------------------------------------------------------------------
// Video push streaming
//---------------------------------------------------------------------------

bool CVideoCapTask::InitPushStream()
{
    if (m_rtspUrl.isEmpty()) {
        qWarning() << "RTSP URL is empty, cannot push";
        return false;
    }

    int ret = avformat_alloc_output_context2(&m_pOutFormatCtx, nullptr, "rtsp", m_rtspUrl.toStdString().c_str());
    if (!m_pOutFormatCtx || ret < 0) {
        qCritical() << "Failed to allocate output context for RTSP";
        return false;
    }

    // Set up the video encoder and stream
    const AVCodec* videoEncoder = avcodec_find_encoder(AV_CODEC_ID_H264);
    if (!videoEncoder) {
        qCritical() << "H.264 encoder not found";
        return false;
    }

    AVStream* outVideoStream = avformat_new_stream(m_pOutFormatCtx, nullptr);
    if (!outVideoStream) {
        qCritical() << "Failed to create video output stream";
        return false;
    }
    m_outVideoStreamIndex = outVideoStream->index;

    m_pOutVideoCodecCtx = avcodec_alloc_context3(videoEncoder);
    if (!m_pOutVideoCodecCtx) return false;

    int width = m_pCodecCtx->width;
    int height = m_pCodecCtx->height;
    m_pOutVideoCodecCtx->width = width;
    m_pOutVideoCodecCtx->height = height;
    m_pOutVideoCodecCtx->pix_fmt = AV_PIX_FMT_YUV420P;
    m_pOutVideoCodecCtx->time_base = AVRational{ 1, m_framerate };
    m_pOutVideoCodecCtx->framerate = AVRational{ m_framerate, 1 };
    m_pOutVideoCodecCtx->bit_rate = 2 * 1024 * 1024;
    m_pOutVideoCodecCtx->gop_size = m_framerate;
    m_pOutVideoCodecCtx->max_b_frames = 0;
    if (m_pOutFormatCtx->oformat->flags & AVFMT_GLOBALHEADER)
        m_pOutVideoCodecCtx->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;

    ret = avcodec_open2(m_pOutVideoCodecCtx, videoEncoder, nullptr);
    if (ret < 0) {
        char errbuf[128];
        av_strerror(ret, errbuf, sizeof(errbuf));
        qCritical() << "Cannot open video encoder:" << errbuf;
        return false;
    }

    ret = avcodec_parameters_from_context(outVideoStream->codecpar, m_pOutVideoCodecCtx);
    if (ret < 0) {
        qCritical() << "Failed to copy video encoder parameters";
        return false;
    }
    outVideoStream->time_base = m_pOutVideoCodecCtx->time_base;

    m_pOutVideoSwsCtx = sws_getContext(width, height, m_pCodecCtx->pix_fmt,
        width, height, AV_PIX_FMT_YUV420P,
        SWS_FAST_BILINEAR, nullptr, nullptr, nullptr);
    if (!m_pOutVideoSwsCtx) {
        qCritical() << "Cannot create video SwsContext for encoder";
        return false;
    }

    // If audio is enabled, set up the audio encoder (needs the MP3 decoder to be open; the resampler is created inside)
    if (m_audioEnabled && !m_mp3FilePath.isEmpty()) {
        if (!InitAudioEncoder()) {
            qWarning() << "Failed to init audio encoder, continuing without audio";
            m_audioEnabled = false;
        }
        else {
            qDebug() << "Audio encoder initialized successfully";
        }
    }

    m_pOutFormatCtx->pb = nullptr;
    AVDictionary* opts = nullptr;
    av_dict_set(&opts, "rtsp_transport", "tcp", 0);

    ret = avformat_write_header(m_pOutFormatCtx, &opts);
    av_dict_free(&opts);
    if (ret < 0) {
        char errbuf[128];
        av_strerror(ret, errbuf, sizeof(errbuf));
        qCritical() << "Failed to write header:" << errbuf;
        return false;
    }

    m_videoPtsCounter = 0;
    qDebug() << "RTSP push stream initialized, URL:" << m_rtspUrl;
    return true;
}

void CVideoCapTask::ClosePushStream()
{
    QMutexLocker lock(&m_queueMutex);
    while (!m_audioPacketQueue.isEmpty()) {
        AVPacket* pkt = m_audioPacketQueue.dequeue();
        av_packet_unref(pkt);
        av_packet_free(&pkt);
    }
    lock.unlock();

    if (m_pOutVideoCodecCtx) {
        avcodec_free_context(&m_pOutVideoCodecCtx);
        m_pOutVideoCodecCtx = nullptr;
    }
    if (m_pOutAudioCodecCtx) {
        avcodec_free_context(&m_pOutAudioCodecCtx);
        m_pOutAudioCodecCtx = nullptr;
    }
    if (m_pOutVideoSwsCtx) {
        sws_freeContext(m_pOutVideoSwsCtx);
        m_pOutVideoSwsCtx = nullptr;
    }
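    if (m_pOutFormatCtx) {
        // The RTSP muxer is flagged AVFMT_NOFILE and manages its own connection, so pb is
        // normally null and cannot be used to decide whether to write the trailer.
        // m_pushEnabled is still true here only if InitPushStream() succeeded
        // (run() clears it on failure).
        if (m_pushEnabled)
            av_write_trailer(m_pOutFormatCtx);
        if (!(m_pOutFormatCtx->oformat->flags & AVFMT_NOFILE) && m_pOutFormatCtx->pb)
            avio_closep(&m_pOutFormatCtx->pb);
        avformat_free_context(m_pOutFormatCtx);
        m_pOutFormatCtx = nullptr;
    }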
    m_outVideoStreamIndex = -1;
    m_outAudioStreamIndex = -1;
    m_videoPtsCounter = 0;
    m_audioPtsCounter = 0;
}

bool CVideoCapTask::PushVideoFrame(AVFrame* frame)
{
    if (!m_pOutFormatCtx || !m_pOutVideoCodecCtx || !m_pOutVideoSwsCtx)
        return false;

    // Convert to YUV420P
    AVFrame* yuvFrame = av_frame_alloc();
    yuvFrame->format = AV_PIX_FMT_YUV420P;
    yuvFrame->width = frame->width;
    yuvFrame->height = frame->height;
    if (av_frame_get_buffer(yuvFrame, 0) < 0) {
        av_frame_free(&yuvFrame);
        qWarning() << "Failed to allocate YUV frame buffer";
        return false;
    }

    sws_scale(m_pOutVideoSwsCtx, frame->data, frame->linesize, 0, frame->height,
        yuvFrame->data, yuvFrame->linesize);

    yuvFrame->pts = m_videoPtsCounter++;

    int ret = avcodec_send_frame(m_pOutVideoCodecCtx, yuvFrame);
    if (ret < 0) {
        av_frame_free(&yuvFrame);
        char errbuf[128];
        av_strerror(ret, errbuf, sizeof(errbuf));
        qWarning() << "Error sending video frame to encoder:" << errbuf;
        return false;
    }

    // av_init_packet() and stack-allocated AVPacket are deprecated; allocate on the heap
    AVPacket* pkt = av_packet_alloc();
    if (!pkt) {
        av_frame_free(&yuvFrame);
        return false;
    }

    while (true) {
        ret = avcodec_receive_packet(m_pOutVideoCodecCtx, pkt);
        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
            break;
        else if (ret < 0) {
            qWarning() << "Error receiving video packet";
            break;
        }

        pkt->stream_index = m_outVideoStreamIndex;
        av_packet_rescale_ts(pkt, m_pOutVideoCodecCtx->time_base,
            m_pOutFormatCtx->streams[m_outVideoStreamIndex]->time_base);
        ret = av_interleaved_write_frame(m_pOutFormatCtx, pkt);
        if (ret < 0)
            qWarning() << "Error writing video packet";
        av_packet_unref(pkt);
    }

    av_packet_free(&pkt);
    av_frame_free(&yuvFrame);
    return true;
}

void CVideoCapTask::WriteQueuedPackets()
{
    if (!m_pOutFormatCtx) return;

    QMutexLocker lock(&m_queueMutex);
    while (!m_audioPacketQueue.isEmpty()) {
        AVPacket* pkt = m_audioPacketQueue.dequeue();
        lock.unlock();
        int ret = av_interleaved_write_frame(m_pOutFormatCtx, pkt);
        if (ret < 0) {
            char errbuf[128];
            av_strerror(ret, errbuf, sizeof(errbuf));
            qWarning() << "Error writing audio packet:" << errbuf;
        }
        av_packet_unref(pkt);
        av_packet_free(&pkt);
        lock.relock();
    }
}

//---------------------------------------------------------------------------
// Video capture thread entry point (QThread::run)
//---------------------------------------------------------------------------

void CVideoCapTask::run()
{
    if (!OpenDevice()) {
        qCritical() << "Failed to open device, cannot run";
        m_running = false;
        return;
    }

    // 1. Initialize audio capture (open the MP3 file and create the decoder)
    if (m_audioEnabled && !m_mp3FilePath.isEmpty()) {
        if (!InitAudioCapture()) {
            qWarning() << "Failed to init MP3 audio capture, audio disabled";
            m_audioEnabled = false;
        }
        else {
            qDebug() << "MP3 audio capture initialized";
        }
    }

    // 2. Initialize the push stream (this creates the audio encoder and resampler, so m_pAudioDecCtx must already exist)
    if (m_pushEnabled && !m_rtspUrl.isEmpty()) {
        if (!InitPushStream()) {
            qWarning() << "Failed to initialize RTSP push stream, continuing without push";
            m_pushEnabled = false;
        }
    }

    // 3. Start the audio capture thread (m_pAudioSwrCtx and the encoder are ready at this point)
    if (m_audioEnabled && m_pAudioDecCtx && m_pOutAudioCodecCtx) {
        m_audioRunning = true;
        m_pAudioThread = QThread::create([this]() { AudioCaptureLoop(); });
        m_pAudioThread->start();
        qDebug() << "Audio capture thread started";
    }
    else if (m_audioEnabled) {
        qWarning() << "Audio decoder or encoder not ready, audio disabled";
        m_audioEnabled = false;
    }

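    AVPacket packet;
    AVFrame* pFrame = av_frame_alloc();
    if (!pFrame) {
        qCritical() << "Failed to allocate frame";
        // Stop the audio thread first so it never touches contexts that are about to be freed
        if (m_pAudioThread) {
            m_audioRunning = false;
            m_pAudioThread->wait();
            delete m_pAudioThread;
            m_pAudioThread = nullptr;
        }
        CloseDevice();
        ClosePushStream();
        CloseAudioCapture();
        m_running = false;
        return;
    }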

    int frame_count = 0;
    while (m_running) {
        // Write any audio packets waiting in the queue
        if (m_pushEnabled && m_pOutFormatCtx && m_audioEnabled)
            WriteQueuedPackets();

        if (++frame_count % 30 == 0)
            qDebug() << "Video loop alive, frame_count:" << frame_count;

        int ret = av_read_frame(m_pFormatCtx, &packet);
        if (ret < 0) {
            if (ret == AVERROR(EAGAIN)) {
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
                continue;
            }
            else {
                char errbuf[128];
                av_strerror(ret, errbuf, sizeof(errbuf));
                qWarning() << "av_read_frame error:" << errbuf;
                std::this_thread::sleep_for(std::chrono::milliseconds(10));
                continue;
            }
        }

        if (packet.stream_index == m_videoStreamIndex) {
            ret = avcodec_send_packet(m_pCodecCtx, &packet);
            if (ret < 0) {
                av_packet_unref(&packet);
                continue;
            }

            while (true) {
                ret = avcodec_receive_frame(m_pCodecCtx, pFrame);
                if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
                    break;
                else if (ret < 0) {
                    char errbuf[128];
                    av_strerror(ret, errbuf, sizeof(errbuf));
                    qWarning() << "Decode error:" << errbuf;
                    break;
                }

                // Preview: emit every 3rd decoded frame
                static int previewFrameCount = 0;
                if (++previewFrameCount % 3 == 0) {
                    QImage img = AVFrameToQImage(pFrame);
                    if (!img.isNull())
                        emit SigFrameCaptured(img);
                }

                if (m_pushEnabled && m_pOutFormatCtx) {
                    if (!PushVideoFrame(pFrame)) {
                        qWarning() << "PushVideoFrame failed";
                    }
                }

                av_frame_unref(pFrame);
            }
        }
        av_packet_unref(&packet);
    }

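    // Stop the audio thread. Only the pointer is checked, so the thread is still
    // joined and deleted if the audio loop has already cleared m_audioRunning on its own.
    if (m_pAudioThread) {
        m_audioRunning = false;
        // Threads made with QThread::create() run no event loop, so quit() is not needed
        m_pAudioThread->wait();
        delete m_pAudioThread;
        m_pAudioThread = nullptr;
    }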

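    // Flush the video encoder
    if (m_pushEnabled && m_pOutVideoCodecCtx && m_pOutFormatCtx) {
        avcodec_send_frame(m_pOutVideoCodecCtx, nullptr);
        // avcodec_receive_packet() unrefs the packet before filling it, so an
        // uninitialized stack AVPacket is not safe here; allocate one instead.
        AVPacket* pkt = av_packet_alloc();
        while (pkt && avcodec_receive_packet(m_pOutVideoCodecCtx, pkt) == 0) {
            pkt->stream_index = m_outVideoStreamIndex;
            av_packet_rescale_ts(pkt, m_pOutVideoCodecCtx->time_base,
                m_pOutFormatCtx->streams[m_outVideoStreamIndex]->time_base);
            av_interleaved_write_frame(m_pOutFormatCtx, pkt);
            av_packet_unref(pkt);
        }
        av_packet_free(&pkt);
    }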

    av_frame_free(&pFrame);
    CloseDevice();
    ClosePushStream();
    CloseAudioCapture();
    qDebug() << "CVideoCapTask stopped";
}
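
The av_packet_rescale_ts call in PushVideoFrame converts timestamps from the encoder's time base to the output stream's time base. As a rough illustration of the arithmetic only, the sketch below redoes that conversion with plain integers. The Rational struct and the rescaleTs function are hypothetical stand-ins, not FFmpeg APIs, and the 1/30 and 1/90000 time bases are assumptions: 30 fps matches the framerate option used when opening the camera, and 90 kHz is the usual RTP video clock, but the real value is whatever the muxer sets on the stream after avformat_write_header.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for AVRational: one tick lasts num/den seconds.
struct Rational {
    std::int64_t num;
    std::int64_t den;
};

// Convert a timestamp from time base `from` to time base `to`, rounding to
// the nearest tick with ties away from zero.
// Assumption: both time bases are positive and the products fit in 64 bits;
// this sketch has no overflow protection.
std::int64_t rescaleTs(std::int64_t ts, Rational from, Rational to) {
    const std::int64_t b = from.num * to.den;  // numerator of from/to
    const std::int64_t c = from.den * to.num;  // denominator of from/to
    assert(b > 0 && c > 0);
    if (ts >= 0)
        return (ts * b + c / 2) / c;
    return -((-ts * b + c / 2) / c);
}
```

Under these assumptions, rescaleTs(1, {1, 30}, {1, 90000}) gives 3000, so each video frame advances the stream clock by 3000 ticks. One 1024-sample AAC frame at 44100 Hz, rescaled from {1, 44100} to {1, 90000}, gives 2090 ticks (rounded from about 2089.8).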

Usage example:

cpp

m_videoCapTask.SetDeviceName(qstrVideoName);
//m_videoCapTask.SetAudioDeviceName(ui->comboBox_Audio->currentText());
//QString qstrAudio = ui->comboBox_Audio->currentText();
//m_videoCapTask.SetAudioDeviceName("@device_pnp_\\\\?\\usb#vid_13d3&pid_56ba&mi_00#6&222db1a5&0&0000#{65e8773d-8f56-11d0-a3b9-00a0c9223196}\\global");
m_videoCapTask.SetMp3FilePath(TransString2Unicode("E:/SoftWare/DownLoad/喀什噶尔胡杨-刀郎.mp3"), true); // loop the MP3 file
m_videoCapTask.SetAudioOptions(44100, 2, 128000);       // optional: set the AAC encoding parameters
m_videoCapTask.EnableAudioPush(true);
m_videoCapTask.SetRtspUrl(ui->lineEdit_LivePath->text());
m_videoCapTask.EnablePush(true);
m_videoCapTask.StartCapture();

7. Summary

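This article covered the key techniques for capturing camera video with FFmpeg and Qt and pushing it to an RTSP server together with MP3 background audio. Video and audio processing run on separate threads, a queue hands the encoded audio packets to the video thread, and that thread interleaves both streams into a single output. Resampling and looped playback were covered along the way.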
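
The queue hand-off between the audio and video threads can be sketched without FFmpeg or Qt. This is a minimal illustration, not the class from this article: Packet, PacketQueue, and writeQueued are hypothetical names, std::mutex stands in for QMutex, and a vector stands in for the call to av_interleaved_write_frame. It also swaps the whole queue out in one step, whereas WriteQueuedPackets dequeues one packet at a time with unlock/relock; both approaches keep the slow write outside the lock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical stand-in for an encoded AAC packet (the real code queues AVPacket*).
struct Packet {
    std::int64_t pts;
};

class PacketQueue {
public:
    // Producer side: the audio thread would call this after encoding a frame.
    void push(Packet pkt) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_items.push_back(pkt);
    }

    // Consumer side: take everything queued so far in one short critical
    // section, so no write ever happens while the mutex is held.
    std::deque<Packet> drain() {
        std::deque<Packet> out;
        std::lock_guard<std::mutex> lock(m_mutex);
        out.swap(m_items);
        return out;
    }

private:
    std::mutex m_mutex;
    std::deque<Packet> m_items;
};

// One consumer step, as the video thread would run it once per loop iteration.
// `sink` stands in for the muxer; returns the number of packets written.
std::size_t writeQueued(PacketQueue& queue, std::vector<std::int64_t>& sink) {
    std::deque<Packet> batch = queue.drain();
    for (const Packet& pkt : batch) {
        // A muxer expects non-decreasing timestamps within one stream.
        assert(sink.empty() || sink.back() <= pkt.pts);
        sink.push_back(pkt.pts);
    }
    return batch.size();
}
```

In a real pipeline the audio thread would call push() while the video thread calls writeQueued() at the top of each capture iteration, which keeps all writes to the output context on a single thread.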

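With these pieces in place, the same structure can be extended to more demanding live-streaming applications, such as multiple cameras, text or image watermark overlays, and switching the background music at runtime.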

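References:

FFmpeg official documentation: https://ffmpeg.org/doxygen/trunk/
Qt multithreading (QThread): https://doc.qt.io/qt-6/qthread.html
RTSP protocol: RFC 2326 (RTSP 1.0, the version FFmpeg's RTSP muxer uses); RFC 7826 defines RTSP 2.0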

I hope this article is useful. Questions and discussion are welcome in the comments.
