A Detailed Guide to Writing ADTS Headers Before AAC Packets

Introduction

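In audio processing, AAC (Advanced Audio Coding) is a widely used audio coding format, valued for its efficient compression and good sound quality. An AAC stream is usually wrapped in a container format such as MP4. When you want to extract the raw AAC data from an MP4 file and store it on its own, however, each AAC packet needs an ADTS (Audio Data Transport Stream) header in front of it.

ADTS is a framing format for AAC that gives a decoder what it needs to decode each packet. This article explains how to add an ADTS header before every AAC packet extracted from an MP4 file and gives a complete code example.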

What Is an ADTS Header?

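ADTS is a framing format for AAC streams that packs encoded audio data into frames. Each frame carries its own header, so frames can be transmitted independently and decoded correctly by the receiver. The header gives the decoder the information it needs about the frame, such as the sample rate, the channel configuration, and the frame size.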

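Without a CRC, the ADTS header is 7 bytes (56 bits) long and is laid out as follows:

Syncword (12 bits): marks the start of a frame; always 0xFFF.
ID (1 bit): MPEG version; 0 means MPEG-4, 1 means MPEG-2.
Layer (2 bits): always 00.
Protection_absent (1 bit): 1 means no CRC; 0 means a 16-bit CRC follows the header.
Profile (2 bits): the AAC profile, stored as the MPEG-4 audio object type minus 1 (for example, 1 for AAC LC).
Sampling_frequency_index (4 bits): index into the table of standard sample rates.
Private_bit (1 bit): private bit; usually ignored.
Channel_configuration (3 bits): the channel configuration.
Original/copy (1 bit): marks the audio as an original or a copy.
Home (1 bit): usually ignored.
Copyright_identification_bit (1 bit): usually ignored.
Copyright_identification_start (1 bit): usually ignored.
Aac_frame_length (13 bits): total frame length in bytes, including both the ADTS header and the AAC data.
Adts_buffer_fullness (11 bits): buffer fullness; 0x7FF signals a variable bit rate stream.
Number_of_raw_data_blocks_in_frame (2 bits): number of raw data blocks in the frame minus 1, so 0 means one block.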
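
To make the bit layout concrete, here is a minimal sketch that decodes the fixed 7-byte header back into its fields. The names AdtsFields and parseAdtsHeader are hypothetical helpers introduced for illustration and are not part of FFmpeg or of the program later in this article; the sketch ignores the optional CRC bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper type for illustration only.
struct AdtsFields {
    int id;                 // 0 = MPEG-4, 1 = MPEG-2
    bool protectionAbsent;  // true = no CRC, header is 7 bytes
    int profile;            // audio object type minus 1
    int samplingFreqIndex;  // index into the sample rate table
    int channelConfig;      // 3-bit channel configuration
    int frameLength;        // header + payload, in bytes
    int bufferFullness;     // 0x7FF usually means variable bit rate
    int rawBlocksMinusOne;  // 0 = one raw data block
};

// Decode the fixed part of an ADTS header.
// Returns false if the buffer is shorter than 7 bytes or the syncword is missing.
bool parseAdtsHeader(const uint8_t *p, std::size_t len, AdtsFields &out)
{
    if (p == nullptr || len < 7 || p[0] != 0xFF || (p[1] & 0xF0) != 0xF0)
        return false;

    out.id                = (p[1] >> 3) & 0x1;
    out.protectionAbsent  = (p[1] & 0x1) != 0;
    out.profile           = (p[2] >> 6) & 0x3;
    out.samplingFreqIndex = (p[2] >> 2) & 0xF;
    // channel_configuration straddles bytes 2 and 3
    out.channelConfig     = ((p[2] & 0x1) << 2) | ((p[3] >> 6) & 0x3);
    // aac_frame_length straddles bytes 3, 4 and 5
    out.frameLength       = ((p[3] & 0x3) << 11) | (p[4] << 3) | ((p[5] >> 5) & 0x7);
    // adts_buffer_fullness straddles bytes 5 and 6
    out.bufferFullness    = ((p[5] & 0x1F) << 6) | ((p[6] >> 2) & 0x3F);
    out.rawBlocksMinusOne = p[6] & 0x3;
    return true;
}
```

For the header bytes FF F1 50 80 0D 7F FC, this sketch yields profile 1 (AAC LC), sampling frequency index 4 (44100 Hz), channel configuration 2, and a frame length of 107 bytes.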

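Extracting AAC from MP4 and Adding ADTS Headers

AAC packets stored in an MP4 file do not carry ADTS headers. To save the audio as a standalone .aac file that remains decodable, we therefore have to add an ADTS header to every audio packet ourselves.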
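
The parameters that an ADTS header repeats in every frame are stored only once in an MP4 track, in the AudioSpecificConfig, which FFmpeg normally exposes as the stream's codecpar->extradata. As a rough sketch of where the header values come from, the following decodes the first two bytes of such a config. The names AacConfig and parseAudioSpecificConfig are hypothetical, and the sketch assumes the common short form only: it rejects the escaped object type (31) and the explicit sample rate (index 15).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper type for illustration only.
struct AacConfig {
    int audioObjectType;    // 2 = AAC LC; the ADTS profile field is this value minus 1
    int samplingFreqIndex;  // same index table as the ADTS header
    int channelConfig;      // same meaning as ADTS channel_configuration
};

// Parse the leading 13 bits of an AudioSpecificConfig:
// 5 bits object type, 4 bits sampling frequency index, 4 bits channel configuration.
// Returns false for inputs this sketch does not handle.
bool parseAudioSpecificConfig(const uint8_t *data, std::size_t size, AacConfig &out)
{
    if (data == nullptr || size < 2)
        return false;

    const int aot  = (data[0] >> 3) & 0x1F;
    const int idx  = ((data[0] & 0x07) << 1) | ((data[1] >> 7) & 0x01);
    const int chan = (data[1] >> 3) & 0x0F;

    // 31 escapes to a longer object type; 15 means a 24-bit explicit rate follows.
    if (aot == 31 || idx == 15)
        return false;

    out.audioObjectType   = aot;
    out.samplingFreqIndex = idx;
    out.channelConfig     = chan;
    return true;
}
```

The two config bytes 12 10 decode to object type 2 (AAC LC), sampling frequency index 4 (44100 Hz), and channel configuration 2 (stereo).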

Code Implementation

The following complete program extracts the AAC data from an MP4 file and adds an ADTS header to each packet:


#include <iostream>
#include <string>
#include <fstream>
extern "C"
{
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
#include <libavutil/log.h>
#include <libavutil/error.h>
}

#define GET_DIR(file) (std::string(file)).substr(0, std::string(file).find_last_of("/\\"))

// Map a sample rate in Hz to its ADTS sampling_frequency_index
int getSampleRateIndex(int sample_rate) {
    switch (sample_rate) {
        case 96000: return 0;
        case 88200: return 1;
        case 64000: return 2;
        case 48000: return 3;
        case 44100: return 4;
        case 32000: return 5;
        case 24000: return 6;
        case 22050: return 7;
        case 16000: return 8;
        case 12000: return 9;
        case 11025: return 10;
        case 8000: return 11;
        case 7350: return 12;
        default: return 15; // unsupported rate; 15 is not a valid index in an ADTS header
    }
}

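// Write a 7-byte ADTS header (no CRC) for one AAC packet of `size` bytes
void writeADTSHeader(std::ofstream &outFile, const AVCodecParameters *codecPar, const int size)
{
    uint8_t adts[7] = {0};

    // The 2-bit profile field holds the MPEG-4 audio object type minus 1.
    // FFmpeg's AAC profile constants already use that numbering (AAC LC = 1),
    // so no offset is added. Fall back to AAC LC when the profile is unknown
    // or does not fit in 2 bits (for example HE-AAC).
    int profile = codecPar->profile;
    if (profile < 0 || profile > 3)
        profile = 1;
    int freqIdx = getSampleRateIndex(codecPar->sample_rate);
    // The channel count equals channel_configuration for 1 to 6 channels;
    // 7.1 audio (8 channels) is signalled as configuration 7.
    int chanCfg = codecPar->ch_layout.nb_channels;
    if (chanCfg == 8)
        chanCfg = 7;
    const int frameLen = size + 7; // aac_frame_length includes the header itself

    adts[0] = 0xFF; // syncword, high 8 bits
    adts[1] = 0xF1; // syncword low 4 bits, ID = 0 (MPEG-4), layer = 00, protection_absent = 1
    adts[2] = (profile << 6) + (freqIdx << 2) + (chanCfg >> 2);
    adts[3] = ((chanCfg & 3) << 6) + (frameLen >> 11);
    adts[4] = (frameLen & 0x7FF) >> 3;
    adts[5] = ((frameLen & 7) << 5) + 0x1F; // low 3 length bits, high 5 bits of buffer fullness
    adts[6] = 0xFC; // low 6 bits of buffer fullness (0x7FF), raw data blocks field = 0 (one block)

    outFile.write((char *)adts, 7);
}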

int main(int argc, char *argv[])
{
    AVFormatContext *fmt_ctx = nullptr;
    av_log_set_level(AV_LOG_INFO);

    std::string input_file = GET_DIR(__FILE__) + "/../Res/video.mp4";
    std::string output_file = GET_DIR(__FILE__) + "/video.aac";

    av_log(nullptr, AV_LOG_INFO, "%s\n", input_file.c_str());
    char buf[1024];

    int ret = avformat_open_input(&fmt_ctx, input_file.c_str(), nullptr, nullptr);
    if (ret < 0)
    {
        av_strerror(ret, buf, sizeof(buf));
        av_log(nullptr, AV_LOG_ERROR, "can't open input : %s\n", buf);
        return -1;
    }
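    ret = avformat_find_stream_info(fmt_ctx, nullptr);
    if (ret < 0)
    {
        av_strerror(ret, buf, sizeof(buf));
        av_log(nullptr, AV_LOG_ERROR, "can't read stream info : %s\n", buf);
        avformat_close_input(&fmt_ctx);
        return -1;
    }
    // Dump after probing so that the printed stream details are complete
    av_dump_format(fmt_ctx, 0, input_file.c_str(), 0);

    int audio_index = av_find_best_stream(fmt_ctx, AVMEDIA_TYPE_AUDIO, -1, -1, nullptr, 0);
    if (audio_index < 0)
    {
        av_log(nullptr, AV_LOG_ERROR, "can't find best stream\n");
        avformat_close_input(&fmt_ctx);
        return -1;
    }

    AVPacket *packet = av_packet_alloc();
    if (!packet)
    {
        av_log(nullptr, AV_LOG_ERROR, "Memory allocation failed\n");
        avformat_close_input(&fmt_ctx);
        return -1;
    }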

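    // ADTS framing only applies to AAC, so reject any other audio codec
    if (fmt_ctx->streams[audio_index]->codecpar->codec_id != AV_CODEC_ID_AAC)
    {
        av_log(nullptr, AV_LOG_ERROR, "audio stream is not AAC\n");
        return -1;
    }
    const AVCodec *codec = avcodec_find_decoder(fmt_ctx->streams[audio_index]->codecpar->codec_id);
    if (!codec)
    {
        av_log(nullptr, AV_LOG_WARNING, "Decoder not found\n");
        return -1;
    }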

    AVCodecContext *codecCtx = avcodec_alloc_context3(codec);
    avcodec_parameters_to_context(codecCtx, fmt_ctx->streams[audio_index]->codecpar);

    if (avcodec_open2(codecCtx, codec, nullptr) < 0)
    {
        av_log(nullptr, AV_LOG_ERROR, "Failed to open codec");
        return -1;
    }

    std::ofstream outputFile(output_file.c_str(), std::ofstream::binary);
    if (!outputFile.is_open())
    {
        av_log(nullptr, AV_LOG_ERROR, "Failed to open output file");
        return -1;
    }

    while (av_read_frame(fmt_ctx, packet) >= 0)
    {
        if (packet->stream_index == audio_index)
        {
            // Write ADTS header
            writeADTSHeader(outputFile, fmt_ctx->streams[audio_index]->codecpar, packet->size);
            outputFile.write((char *)packet->data, packet->size);
            
        }
        av_packet_unref(packet);
    }
    avformat_close_input(&fmt_ctx);
    av_packet_free(&packet);
    avcodec_free_context(&codecCtx);
    outputFile.close();
    return 0;
}
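
One way to sanity check a file produced this way is to walk it frame by frame, using each header's aac_frame_length to jump to the next syncword. The sketch below does this over an in-memory buffer; countAdtsFrames is a hypothetical helper, and it assumes the whole file has already been read into a std::vector<uint8_t>.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Count back-to-back ADTS frames in a buffer.
// Returns -1 if a syncword is missing or a frame length does not fit the buffer.
int countAdtsFrames(const std::vector<uint8_t> &buf)
{
    std::size_t pos = 0;
    int frames = 0;

    while (pos < buf.size()) {
        if (buf.size() - pos < 7)
            return -1; // not enough bytes left for a header

        const uint8_t *p = &buf[pos];
        if (p[0] != 0xFF || (p[1] & 0xF0) != 0xF0)
            return -1; // syncword missing where a frame should start

        // protection_absent = 1 means a 7-byte header, 0 adds a 2-byte CRC
        const std::size_t headerLen = (p[1] & 0x01) ? 7 : 9;
        const std::size_t frameLen =
            (static_cast<std::size_t>(p[3] & 0x03) << 11) |
            (static_cast<std::size_t>(p[4]) << 3) |
            (static_cast<std::size_t>(p[5]) >> 5);

        if (frameLen < headerLen || frameLen > buf.size() - pos)
            return -1; // length field is inconsistent with the data

        pos += frameLen;
        ++frames;
    }
    return frames;
}
```

If the count matches the number of audio packets written by the program and the function never returns -1, the length fields and syncwords are at least self-consistent. This does not prove that the profile or sample rate fields are right; a decoder is still the real test.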

Code Walkthrough

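Getting the sample rate index: the ADTS header stores the sample rate as an index, with each standard rate mapped to its own value. getSampleRateIndex returns the index for a given sample rate.
Generating the ADTS header: writeADTSHeader builds the ADTS header and writes it to the output file. The header carries the profile, the sample rate index, the channel configuration, and the length of the current frame.
Extracting AAC data from the MP4 file: FFmpeg's avformat_open_input and av_find_best_stream open the MP4 file and locate the audio stream. av_read_frame then reads the file packet by packet.
Writing the ADTS header and AAC data: each audio packet that is read gets an ADTS header in front of it and is written to the output file. The result is an AAC file with ADTS headers that can be played directly or processed further.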
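
As a worked example of the bit packing, the following sketch builds the header as a pure function so the bytes can be inspected without FFmpeg or file I/O. makeAdtsHeader is a hypothetical helper, not the article's writeADTSHeader; it assumes a 7-byte header without CRC, one raw data block per frame, and a profile argument that is already the ADTS profile value (audio object type minus 1, so 1 for AAC LC).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Build a 7-byte ADTS header for a payload of payloadSize bytes.
std::array<uint8_t, 7> makeAdtsHeader(int profile, int freqIdx, int chanCfg,
                                      std::size_t payloadSize)
{
    const std::size_t frameLen = payloadSize + 7; // length field counts the header too
    std::array<uint8_t, 7> h{};

    h[0] = 0xFF; // syncword, high 8 bits
    h[1] = 0xF1; // syncword low 4 bits, MPEG-4, layer 00, no CRC
    h[2] = static_cast<uint8_t>(((profile & 0x3) << 6) |
                                ((freqIdx & 0xF) << 2) |
                                ((chanCfg >> 2) & 0x1));
    h[3] = static_cast<uint8_t>(((chanCfg & 0x3) << 6) |
                                ((frameLen >> 11) & 0x3));
    h[4] = static_cast<uint8_t>((frameLen >> 3) & 0xFF);
    h[5] = static_cast<uint8_t>(((frameLen & 0x7) << 5) | 0x1F);
    h[6] = 0xFC; // rest of buffer fullness (0x7FF), one raw data block
    return h;
}
```

For AAC LC (profile 1) at 44100 Hz (index 4) in stereo (configuration 2) with a 100-byte packet, the frame length is 107 and the header comes out as FF F1 50 80 0D 7F FC.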

Conclusion

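With the steps above, you can extract the raw AAC data from an MP4 file and add an ADTS header to each packet so that it stays decodable. Because the packets are copied without re-encoding, the audio quality is unchanged, and the resulting file should play in most players and decoders that support ADTS.

Knowing how to handle ADTS headers by hand is a useful skill for audio developers and multimedia application developers.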