[Audio/Video Development] 10. Encoding an ADTS Audio Stream with the FFmpeg API

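1. A brief introduction to audio formats

PCM

PCM (Pulse Code Modulation): a technique for converting an analog signal into a digital one; the result is uncompressed audio
packed format: samples of the left and right channels are interleaved, L1 R1 L2 R2 L3 R3 L4 R4 ......
planar format: each channel's samples are stored contiguously, L1 L2 L3 L4 ...... R1 R2 R3 R4 ......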
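
To make the two layouts concrete, here is a minimal sketch that converts a packed buffer into planar form. The function name `Deinterleave` and the use of `std::vector` are illustrative assumptions, not part of FFmpeg.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: split interleaved samples (L1 R1 L2 R2 ...) into one
// contiguous vector per channel (L1 L2 ... and R1 R2 ...).
template <typename Sample>
std::vector<std::vector<Sample>> Deinterleave(const std::vector<Sample> &packed, std::size_t nb_channels) {
    std::vector<std::vector<Sample>> planes(nb_channels);
    if (nb_channels == 0) {
        return planes;
    }
    // samples per channel; an incomplete trailing group is dropped
    const std::size_t nb_samples = packed.size() / nb_channels;
    for (std::size_t ch = 0; ch < nb_channels; ++ch) {
        planes[ch].reserve(nb_samples);
        for (std::size_t i = 0; i < nb_samples; ++i) {
            planes[ch].push_back(packed[i * nb_channels + ch]);
        }
    }
    return planes;
}
```

For stereo input, `planes[0]` ends up holding the left channel and `planes[1]` the right channel.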

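AAC

AAC: Advanced Audio Coding, a lossy audio compression format first standardized in MPEG-2 Part 7 and extended in MPEG-4 Part 3
ADTS: Audio Data Transport Stream, an AAC container format in which every frame carries its own header, so decoding can start at any frame
ADIF: Audio Data Interchange Format, an AAC container format with a single header at the start of the file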
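
Since every ADTS frame starts with its own header, it helps to see what those bytes look like. The sketch below builds the common 7-byte header (no CRC) in front of one raw AAC frame. `BuildAdtsHeader` and `SampleRateIndex` are hypothetical names, the `adts_profile` parameter is assumed to be the 2-bit ADTS profile field (audio object type minus 1, so 1 for AAC-LC), and only 1 to 6 channels are handled.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Map a sample rate to the 4-bit sampling_frequency_index, or -1 if it is not in the table.
int SampleRateIndex(int sample_rate) {
    static constexpr int kRates[] = {96000, 88200, 64000, 48000, 44100, 32000, 24000,
                                     22050, 16000, 12000, 11025, 8000,  7350};
    for (int i = 0; i < 13; ++i) {
        if (kRates[i] == sample_rate) {
            return i;
        }
    }
    return -1;
}

// Hypothetical helper: build a 7-byte ADTS header for one raw AAC frame of payload_len bytes.
std::optional<std::array<uint8_t, 7>> BuildAdtsHeader(int payload_len, int adts_profile, int sample_rate,
                                                      int nb_channels) {
    const int freq_idx = SampleRateIndex(sample_rate);
    const int frame_len = payload_len + 7; // the length field covers header + payload
    if (freq_idx < 0 || adts_profile < 0 || adts_profile > 3 || nb_channels < 1 || nb_channels > 6 ||
        payload_len < 0 || frame_len > 0x1FFF) {
        return std::nullopt;
    }
    std::array<uint8_t, 7> h{};
    h[0] = 0xFF;                                    // syncword, high 8 bits
    h[1] = 0xF1;                                    // syncword low 4 bits, MPEG-4, layer 0, no CRC
    h[2] = static_cast<uint8_t>((adts_profile << 6) | (freq_idx << 2) | (nb_channels >> 2));
    h[3] = static_cast<uint8_t>(((nb_channels & 0x3) << 6) | (frame_len >> 11));
    h[4] = static_cast<uint8_t>((frame_len >> 3) & 0xFF);
    h[5] = static_cast<uint8_t>(((frame_len & 0x7) << 5) | 0x1F); // buffer fullness 0x7FF, high bits
    h[6] = 0xFC;                                    // buffer fullness low bits, one raw data block
    return h;
}
```

The function returns an empty optional rather than a malformed header when the sample rate is not in the table or the frame does not fit the 13-bit length field.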

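2. Related data structures

Audio AVFrame

AVFrame mainly stores raw, uncompressed audio: the output of a decoder or, as here, the input of an encoder
It holds the plane data array, the line size array, the number of samples per channel, the format, and so on
plane: one contiguous buffer
.format is the sample format
.sample_rate is the audio sample rate
.nb_samples is the number of samples per channel
.ch_layout.nb_channels is the number of audio channels
.data is the array of plane (sample data) buffers
- packed audio: L and R are interleaved in data[0]
- planar audio: data[0] points to the L plane, data[1] points to the R plane
.linesize is the line size array
- for audio only linesize[0] is set; it is the size in bytes of one audio plane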

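Audio AVPacket

AVPacket mainly stores compressed audio data (the output of the encoder)
It holds the buffer information, the presentation timestamp (pts), the decoding timestamp (dts), and so on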

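Audio AVCodecContext

The AVCodecContext struct stores the parameters of the audio encoder
.sample_rate is the audio sample rate
.ch_layout.nb_channels is the number of audio channels
.sample_fmt is the sample format
.frame_size is the number of samples per channel in each AVFrame (set by the encoder when it is opened)
.bit_rate is the target bit rate of the encoder output
.profile is the codec profile (the set of coding tools, not a quality level); FF_PROFILE_AAC_LOW (AAC-LC, spelled AV_PROFILE_AAC_LOW in newer headers) is a good default
.flags holds the encoding flags; AV_CODEC_FLAG_GLOBAL_HEADER asks the encoder to put global headers in extradata instead of in the packets, so the AAC packets carry no ADTS header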

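3. Related APIs

Encoder configuration API

avcodec_find_encoder_by_name: look up an encoder by name
avcodec_find_encoder: look up an encoder by AVCodecID
avcodec_alloc_context3: allocate an encoder context for the given encoder
avcodec_free_context: free the encoder context
avcodec_get_supported_config: query the configurations an encoder supports, such as the lists of sample rates and sample formats
avcodec_open2: open the encoder with the parameters stored in the context; recent versions need no explicit close, because avcodec_free_context releases everything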

Sample format API

av_get_bytes_per_sample: get the number of bytes per sample
av_get_sample_fmt_name: get the name of a sample format
av_sample_fmt_is_planar: check whether a sample format is planar

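Audio encoding API

av_frame_make_writable: make sure the AVFrame is writable, copying its buffers if needed (the pcm data has to be written into it before encoding)
av_samples_fill_arrays: point the AVFrame data pointers and linesize at an existing sample buffer laid out in the given format (no data is copied)
avcodec_send_frame: pass uncompressed audio data to the encoder
avcodec_receive_packet: fetch compressed audio data from the encoder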

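4. Hands-on code: encoding an AAC audio stream

Requirements:
- Input: a .pcm file (packed format), the channel count, sample rate, sample format, bit rate, and encoder name
- Output: an .aac file in ADTS format
Approach:
- Generate the pcm data: ffmpeg -i av.mp4 -ar 48000 -ac 2 -f f32le audio.pcm
- Check that the encoder supports the requested format, then write the encoder parameters into the AVCodecContext
- Copy the encoder parameters into the AVFrame
- Read the pcm data in a loop, one frame at a time as dictated by the encoder parameters, and put it into the AVFrame
- send the AVFrame to the encoder
- receive AVPackets from the encoder in a loop
- Extract the encoded AAC frame from each AVPacket
- Prepend an ADTS header to each encoded AAC frame and write it to the file
- Test the result: ffplay audio.aac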
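
Before reading the full source, it may help to check the arithmetic behind "one frame at a time". The sketch below computes how many bytes one encoder frame consumes from the pcm file and how many frames a file yields. The struct and function names are hypothetical, and the figure of 1024 samples per frame is the usual AAC-LC frame size, taken here as an assumption.

```cpp
#include <cstdint>

// Hypothetical description of the raw pcm input.
struct PcmLayout {
    int bytes_per_sample; // 4 for f32le
    int nb_channels;      // 2 for stereo
    int frame_size;       // samples per channel per encoder frame
};

// Bytes to read from the pcm file for one encoder frame.
constexpr int64_t BytesPerFrame(const PcmLayout &l) {
    return int64_t{l.bytes_per_sample} * l.nb_channels * l.frame_size;
}

// Number of frames needed for a file; the last frame may be only partially filled.
constexpr int64_t FrameCount(int64_t file_size, const PcmLayout &l) {
    return (file_size + BytesPerFrame(l) - 1) / BytesPerFrame(l);
}

// Duration of one frame in milliseconds.
constexpr double FrameDurationMs(int frame_size, int sample_rate) {
    return 1000.0 * frame_size / sample_rate;
}
```

For the ffmpeg command above (f32le, stereo, 48000 Hz) one frame of 1024 samples is 8192 bytes and lasts about 21.3 ms.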
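Environment for the code example:
- Toolchain: VS2022, std=c++20
- Dependency 1: avcodec, avformat, avutil from ffmpeg 7.1
- Dependency 2: glog
- Note: the source below only declares GenerateHeaderADTS and does not implement the ADTS header packing; see "ADTS Header Packing" for the details
Source code: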

extern "C" {
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
}

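#include <chrono>
#include <cstdint> // uint8_t, int64_t
#include <cstring> // std::memset, std::memcpy
#include <fstream>
#include <memory> // std::make_unique
#include <sstream>
#include <string>
#include <string_view> // std=c++17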

#include <glog/logging.h>

static constexpr int ADTS_HEADER_LEN = 7;
static constexpr int kDefaultProfile = AV_PROFILE_AAC_LOW; // FF_PROFILE_AAC_LOW is the deprecated spelling
static constexpr int kDefaultCodecFlags = AV_CODEC_FLAG_GLOBAL_HEADER;
thread_local static char error_buffer[AV_ERROR_MAX_STRING_SIZE] = {}; // store FFmpeg error string

extern bool GenerateHeaderADTS(uint8_t *adts_header_buf, int data_len, int profile, int sample_rate, int nb_channels);

/**
 * @brief Convert FFmpeg error code to error string
 * @param error_code FFmpeg error code
 * @return error string
 */
static char *ErrorToString(const int error_code) {
    std::memset(error_buffer, 0, AV_ERROR_MAX_STRING_SIZE);
    return av_make_error_string(error_buffer, AV_ERROR_MAX_STRING_SIZE, error_code);
}

/**
 * @brief Set AVCodecContext->sample_fmt
 * @param codec AVCodec
 * @param codec_ctx AVCodecContext
 * @param sample_fmt sample format
 * @return true if success, false otherwise
 */
static bool SetSampleFormat(const AVCodec *codec, AVCodecContext *codec_ctx, const AVSampleFormat sample_fmt) {
    if (!codec || !codec_ctx) {
        return false;
    }

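    // query the sample formats supported by the encoder
    int nb_sample_fmts{};
    const void *sample_fmts = nullptr;
    if (avcodec_get_supported_config(nullptr, codec, AV_CODEC_CONFIG_SAMPLE_FORMAT, 0, &sample_fmts,
                                     &nb_sample_fmts) < 0) {
        LOG(ERROR) << "Failed to query supported sample formats";
        return false;
    }
    // a null list means the encoder does not restrict the sample format
    if (sample_fmts == nullptr) {
        codec_ctx->sample_fmt = sample_fmt;
        return true;
    }
    for (int i = 0; i < nb_sample_fmts; ++i) {
        if (sample_fmt == *(static_cast<const AVSampleFormat *>(sample_fmts) + i)) {
            codec_ctx->sample_fmt = sample_fmt;
            return true;
        }
    }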

    LOG(ERROR) << "Specified sample format \"" << av_get_sample_fmt_name(sample_fmt) << "\" is not supported by the \""
               << avcodec_get_name(codec->id) << "\" encoder";
    std::ostringstream oss;
    for (int i = 0; i < nb_sample_fmts; ++i) {
        oss << av_get_sample_fmt_name(*(static_cast<const AVSampleFormat *>(sample_fmts) + i)) << " ";
    }
    LOG(ERROR) << "Supported sample formats: " << oss.str();
    return false;
}

/**
 * @brief Set AVCodecContext->sample_rate
 * @param codec AVCodec
 * @param codec_ctx AVCodecContext
 * @param sample_rate sample rate
 * @return true if success, false otherwise
 */
static bool SetSampleRate(const AVCodec *codec, AVCodecContext *codec_ctx, int sample_rate) {
    if (!codec || !codec_ctx) {
        return false;
    }

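    // query the sample rates supported by the encoder
    int nb_sample_rates{};
    const void *sample_rates = nullptr;
    if (avcodec_get_supported_config(nullptr, codec, AV_CODEC_CONFIG_SAMPLE_RATE, 0, &sample_rates,
                                     &nb_sample_rates) < 0) {
        LOG(ERROR) << "Failed to query supported sample rates";
        return false;
    }
    // a null list means the encoder does not restrict the sample rate
    if (sample_rates == nullptr) {
        codec_ctx->sample_rate = sample_rate;
        return true;
    }
    for (int i = 0; i < nb_sample_rates; ++i) {
        if (sample_rate == *(static_cast<const int *>(sample_rates) + i)) {
            codec_ctx->sample_rate = sample_rate;
            return true;
        }
    }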

    LOG(ERROR) << "Specified sample rate " << sample_rate << " is not supported by the \""
               << avcodec_get_name(codec->id) << "\" encoder";
    std::ostringstream oss;
    for (int i = 0; i < nb_sample_rates; ++i) {
        oss << *(static_cast<const int *>(sample_rates) + i) << ", ";
    }
    LOG(ERROR) << "Supported sample rates: " << oss.str();
    return false;
}


/**
 * @brief Encode pcm to aac, write to file
 * @param codec_ctx codec context
 * @param frame pcm frame
 * @param pkt aac packet
 * @param ofs aac file stream
 * @return true if success, false otherwise
 */
static bool EncodeAndWrite(AVCodecContext *codec_ctx, AVFrame *frame, AVPacket *pkt, std::ofstream &ofs) {
    if (!codec_ctx || !pkt || !ofs) {
        return false;
    }

    int error_code{};
    int profile = codec_ctx->profile;
    int sample_rate = codec_ctx->sample_rate;
    int nb_channels = codec_ctx->ch_layout.nb_channels;

    // send the pcm frame to the encoder (a null frame starts draining)
    if ((error_code = avcodec_send_frame(codec_ctx, frame)) < 0) {
        if (error_code != AVERROR(EAGAIN) && error_code != AVERROR_EOF) {
            LOG(ERROR) << "Failed to send frame to encoder: " << ErrorToString(error_code);
            return false;
        }
    }

    // receive aac packets until the encoder needs more input (EAGAIN) or is fully drained (EOF)
    while ((error_code = avcodec_receive_packet(codec_ctx, pkt)) == 0) {
        if (ofs) {
            // with the global-header flag set the packet is a raw AAC frame, so prepend an ADTS header
            if ((codec_ctx->flags & AV_CODEC_FLAG_GLOBAL_HEADER)) {
                uint8_t adts_header[ADTS_HEADER_LEN];
                if (GenerateHeaderADTS(adts_header, pkt->size, profile, sample_rate, nb_channels)) {
                    ofs.write(reinterpret_cast<char *>(adts_header), ADTS_HEADER_LEN);
                }
            }
            ofs.write(reinterpret_cast<char *>(pkt->data), pkt->size);
        }
        av_packet_unref(pkt); // release the payload before the next receive
    }
    if (error_code != AVERROR(EAGAIN) && error_code != AVERROR_EOF) {
        LOG(ERROR) << "Failed to receive packet from encoder: " << ErrorToString(error_code);
        return false;
    }

    if (!ofs) {
        LOG(ERROR) << "Failed to write aac file, ofstream is broken";
        return false;
    }
    return true;
}

/**
 * @brief read pcm data, encode pcm to aac, write
 * @param codec_ctx codec context
 * @param frame pcm frame
 * @param ifs pcm file stream
 * @param ofs aac file stream
 * @return true if success, false otherwise
 */
static bool InnerEncodeAudioAAC(AVCodecContext *codec_ctx, AVFrame *frame, std::ifstream &ifs, std::ofstream &ofs) {
    if (!codec_ctx || !frame || !ifs || !ofs) {
        return false;
    }

    int error_code{};

    int bytes_per_sample = av_get_bytes_per_sample(codec_ctx->sample_fmt);
    if (bytes_per_sample <= 0) {
        LOG(ERROR) << "Failed to get bytes per sample";
        return false;
    }

    // copy the codec parameters into the AVFrame, then allocate its buffers (AVBufferRef[])
    frame->format = codec_ctx->sample_fmt;
    frame->nb_samples = codec_ctx->frame_size;
    frame->sample_rate = codec_ctx->sample_rate;
    if ((error_code = av_channel_layout_copy(&frame->ch_layout, &codec_ctx->ch_layout)) < 0) {
        LOG(ERROR) << "Failed to copy channel layout to AVFrame: " << ErrorToString(error_code);
        return false;
    }
    if ((error_code = av_frame_get_buffer(frame, 0)) < 0) {
        LOG(ERROR) << "Failed to allocate AVBufferRef[] in AVFrame: " << ErrorToString(error_code);
        return false;
    }

    // allocate AVPacket
    AVPacket *pkt = av_packet_alloc();
    if (pkt == nullptr) {
        LOG(ERROR) << "Failed to allocate AVPacket: av_packet_alloc()";
        return false;
    }

    bool ret = true;
    int64_t pts = 0;
    int nb_samples = frame->nb_samples;
    int nb_channels = frame->ch_layout.nb_channels;
    AVSampleFormat sample_fmt = codec_ctx->sample_fmt;
    int bytes_per_frame = bytes_per_sample * nb_channels * nb_samples;
    auto pcm_buffer_packed = std::make_unique<uint8_t[]>(bytes_per_frame);
    auto pcm_buffer_planar = std::make_unique<uint8_t[]>(bytes_per_frame);

    while (true) {
        // read pcm samples
        std::memset(pcm_buffer_packed.get(), 0, bytes_per_frame);
        if (!ifs.read(reinterpret_cast<char *>(pcm_buffer_packed.get()), bytes_per_frame)) {
            if (!ifs.eof()) {
                LOG(ERROR) << "Failed to read input file: ifstream is broken";
                ret = false;
                break;
            }
        }
        int bytes_read = static_cast<int>(ifs.gcount());
        int samples_read = bytes_read / bytes_per_sample;
        int nb_samples_read = samples_read / nb_channels;
        if (nb_samples_read <= 0) {
            break;
        }

        // convert pcm sample format
        uint8_t *data = nullptr;
        if (av_sample_fmt_is_planar(sample_fmt)) {
            data = pcm_buffer_planar.get();
            std::memset(data, 0, bytes_per_frame);
            for (int i = 0; i < nb_channels; ++i) {
                for (int j = i; j < samples_read; j += nb_channels) {
                    std::memcpy(data, pcm_buffer_packed.get() + j * bytes_per_sample, bytes_per_sample);
                    data += bytes_per_sample;
                }
            }
            data = pcm_buffer_planar.get();
        } else {
            data = pcm_buffer_packed.get();
        }

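        // attach the pcm buffer to the AVFrame (the data pointers are set, nothing is copied)
        if ((error_code = av_frame_make_writable(frame)) < 0) {
            LOG(ERROR) << "Failed to make AVFrame writable: " << ErrorToString(error_code);
            ret = false;
            break;
        }
        // the last frame may hold fewer samples than frame_size; align = 1 keeps the planes
        // tightly packed, which matches the layout produced by the conversion above
        frame->nb_samples = nb_samples_read;
        if ((error_code = av_samples_fill_arrays(frame->data, frame->linesize, data, nb_channels, nb_samples_read,
                                                 sample_fmt, 1)) < 0) {
            LOG(ERROR) << "Failed to fill AVFrame data: " << ErrorToString(error_code);
            ret = false;
            break;
        }
        frame->pts = pts; // counted in samples per channel, starting at 0
        pts += nb_samples_read;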

        // encode pcm to aac, write to file
        if (!EncodeAndWrite(codec_ctx, frame, pkt, ofs)) {
            ret = false;
            break;
        }

        if (ifs.eof()) {
            break;
        }
    }

    // if encode end, drain the encoder
    if (!EncodeAndWrite(codec_ctx, nullptr, pkt, ofs)) {
        ret = false;
    }

    av_packet_free(&pkt);
    return ret;
}

/**
 * @brief encode pcm to aac
 * @param nb_channels number of channels
 * @param sample_rate sample rate
 * @param sample_fmt sample format
 * @param bit_rate target audio bit rate
 * @param codec_name codec name
 * @param input_file input file
 * @param output_file output file
 */
void EncodeAudioAAC(int nb_channels, int sample_rate, AVSampleFormat sample_fmt, int64_t bit_rate,
                    std::string_view codec_name, std::string_view input_file, std::string_view output_file) {
    int error_code{};

    // find AVCodec, default aac encoder
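    // a string_view is not guaranteed to be null-terminated, so copy it into a std::string first
    const AVCodec *codec = avcodec_find_encoder_by_name(std::string{codec_name}.c_str());
    if (codec == nullptr) {
        LOG(WARNING) << "AVCodec \"" << codec_name << "\" not found, falling back to aac";
        codec_name = "aac";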
        codec = avcodec_find_encoder(AV_CODEC_ID_AAC);
        if (!codec) {
            LOG(ERROR) << "AVCodec \"aac\" not found";
            return;
        }
    }
    LOG(INFO) << "AVCodec \"" << codec_name << "\" found";

    // open input_file and output_file
    std::ifstream ifs(std::string{input_file}, std::ios::in | std::ios::binary);
    if (!ifs.is_open()) {
        LOG(ERROR) << "Failed to open input file: " << input_file;
        return;
    }
    std::ofstream ofs(std::string{output_file}, std::ios::out | std::ios::binary);
    if (!ofs.is_open()) {
        LOG(ERROR) << "Failed to open output file: " << output_file;
        return;
    }

    // allocate AVCodecContext
    AVCodecContext *codec_ctx = avcodec_alloc_context3(codec);
    if (codec_ctx == nullptr) {
        LOG(ERROR) << "Failed to allocate AVCodecContext: " << codec->id;
        return;
    }

    // initialize AVCodecContext
    if (!SetSampleFormat(codec, codec_ctx, sample_fmt) || !SetSampleRate(codec, codec_ctx, sample_rate)) {
        avcodec_free_context(&codec_ctx);
        return;
    }
    av_channel_layout_default(&codec_ctx->ch_layout, nb_channels);
    codec_ctx->bit_rate = bit_rate;
    codec_ctx->profile = kDefaultProfile;
    codec_ctx->flags |= kDefaultCodecFlags; // raw AAC packets, no adts header from the encoder

    // initialize AVCodec
    if ((error_code = avcodec_open2(codec_ctx, codec, nullptr)) < 0) {
        LOG(ERROR) << "Failed to init AVCodecContext: " << ErrorToString(error_code);
        avcodec_free_context(&codec_ctx);
        return;
    }
    LOG(INFO) << "AVCodec \"" << codec_name << "\" initialized: sample_fmt=\"" << av_get_sample_fmt_name(sample_fmt)
              << "\", sample_rate=" << sample_rate << ", nb_channels=" << nb_channels << ", bit_rate=" << bit_rate
              << ", frame_size=" << codec_ctx->frame_size;

    // allocate AVFrame
    AVFrame *frame = av_frame_alloc();
    if (frame == nullptr) {
        LOG(ERROR) << "Failed to allocate AVFrame: av_frame_alloc()";
        avcodec_free_context(&codec_ctx);
        return;
    }

    LOG(INFO) << "Start to encode audio";
    auto start = std::chrono::high_resolution_clock::now();
    if (!InnerEncodeAudioAAC(codec_ctx, frame, ifs, ofs)) {
        LOG(ERROR) << "Failed to encode audio";
    }
    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    LOG(INFO) << "End of encode audio, cost " << duration << " ms";

    av_frame_free(&frame);
    avcodec_free_context(&codec_ctx);
}

#if 0
int main(int argc, char *argv[]) {
    google::InitGoogleLogging(argv[0]);
    FLAGS_logtostderr = true;
    FLAGS_minloglevel = google::GLOG_INFO;

    EncodeAudioAAC(2, 48000, AV_SAMPLE_FMT_FLTP, 128 * 1024, "aac", "audio.pcm", "audio.aac");

    google::ShutdownGoogleLogging();
    return 0;
}
#endif