Audio and Video Development: FFmpeg Audio Resampling Explained

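Audio resampling is the process of changing the sample rate of an audio signal. The sample rate is the number of audio samples captured per second, usually expressed in hertz (Hz) or samples per second. For example, the standard sample rate for CD audio is 44.1 kHz, meaning that 44,100 samples are captured every second.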

Contents

- When resampling is needed
- Resampling techniques
- Resampling an MP3 file with the FFmpeg command line
  - Parameters
  - A more detailed command
- Resampling an MP3 file in code with FFmpeg

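When resampling is needed

Device compatibility:

Different audio devices may support different sample rates. For example, some audio interfaces or playback devices only accept specific sample rates. In that case the audio file has to be resampled to a rate the device supports.

Saving storage space:

Audio at a higher sample rate takes more storage space; for uncompressed PCM the size grows in direct proportion to the rate. To save storage space or bandwidth, the audio can be resampled to a lower rate.

Audio quality trade-offs:

Sometimes quality has to be balanced against size. For example, an audio engineer may record at a very high sample rate to capture more detail, then choose a somewhat lower rate for mixing or release so that the material is easier to process and distribute.

Multimedia projects:

Audio files from different sources may have different sample rates. To keep a project consistent, they are usually resampled to a common rate.

Audio processing and analysis:

Some audio processing and analysis tools require a specific input sample rate, so the audio may need to be resampled before it is handed to them.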
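
To make the storage point concrete, here is a minimal sketch that computes the size of uncompressed PCM audio from its parameters. The helper name pcm_size_bytes is made up for this illustration, and it assumes raw PCM with no container overhead; compressed formats such as MP3 do not follow this formula.

```c
#include <stdint.h>

/* Size in bytes of uncompressed PCM audio.
   Hypothetical helper for illustration; ignores any container overhead. */
uint64_t pcm_size_bytes(uint32_t sample_rate, uint32_t channels,
                        uint32_t bytes_per_sample, uint32_t seconds)
{
    /* Every second holds sample_rate samples for each channel. */
    return (uint64_t)sample_rate * channels * bytes_per_sample * seconds;
}
```

For one minute of 16-bit stereo audio, pcm_size_bytes(44100, 2, 2, 60) gives 10,584,000 bytes, while the same minute at 48000 Hz takes 11,520,000 bytes.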

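Resampling techniques

Resampling relies on interpolation algorithms, for example:

Linear interpolation:

The simplest interpolation method, but it can introduce noticeable distortion.

Polynomial interpolation (such as cubic spline interpolation):

Offers higher accuracy at a higher computational cost.

Windowed interpolation (such as a Lanczos window):

Commonly used for high-quality audio resampling; it effectively reduces artifacts and distortion.

With these algorithms, audio can be converted between sample rates while preserving as much of the original signal quality as possible. When the rate is lowered, the signal also has to be low-pass filtered to below half the new sample rate, otherwise the higher frequencies fold back into the result as aliasing.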
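
As an illustration of the simplest method in the list, the sketch below resamples a mono float buffer by linear interpolation. The function names are invented for this example, and it is not how FFmpeg works internally: libswresample uses a more elaborate filter-based resampler, and this version applies no low-pass filter, so it is only suitable for experimenting.

```c
#include <stddef.h>

/* Number of output samples produced for in_len input samples. */
size_t resampled_length(size_t in_len, int in_rate, int out_rate)
{
    return (size_t)(((unsigned long long)in_len * (unsigned long long)out_rate)
                    / (unsigned long long)in_rate);
}

/* Resample mono float audio by linear interpolation.
   Writes at most out_cap samples and returns the number written. */
size_t resample_linear(const float *in, size_t in_len, int in_rate,
                       float *out, size_t out_cap, int out_rate)
{
    size_t out_len = resampled_length(in_len, in_rate, out_rate);
    if (in_len == 0)
        return 0;
    if (out_len > out_cap)
        out_len = out_cap;

    for (size_t i = 0; i < out_len; i++) {
        /* Position of output sample i on the input time axis. */
        double pos = (double)i * in_rate / out_rate;
        size_t idx = (size_t)pos;
        double frac = pos - (double)idx;

        /* Blend the two neighbouring input samples; repeat the last one at the end. */
        float a = in[idx];
        float b = (idx + 1 < in_len) ? in[idx + 1] : in[idx];
        out[i] = (float)(a + (b - a) * frac);
    }
    return out_len;
}
```

Doubling the rate of the ramp 0, 1, 2, 3 yields 0, 0.5, 1, 1.5, 2, 2.5, 3, 3: every second output sample is the average of its two input neighbours, and the final sample repeats the last input value.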

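Resampling an MP3 file with the FFmpeg command line

Suppose you have an MP3 file named input.mp3 and want to change its sample rate to 48 kHz (48000 Hz). You can use the following command: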

ffmpeg -i input.mp3 -ar 48000 output.mp3

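Parameters

-i input.mp3: specifies the input file.
-ar 48000: sets the new sample rate, here 48000 Hz.
output.mp3: specifies the output file name.

A more detailed command

If you also want to set the bitrate and the number of channels explicitly instead of leaving them to the encoder defaults, add more parameters: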

ffmpeg -i input.mp3 -ar 48000 -ab 192k -ac 2 output.mp3

-ab 192k: sets the audio bitrate to 192 kb/s (-ab is the older spelling of -b:a).
-ac 2: sets the number of audio channels to 2 (stereo).
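
For a constant-bitrate MP3, the output size is determined by the bitrate and the duration rather than by the sample rate. The sketch below shows the arithmetic; the helper name cbr_size_bytes is an assumption made for this example, and the result ignores ID3 tags and other header overhead.

```c
#include <stdint.h>

/* Approximate size in bytes of a constant-bitrate stream.
   Hypothetical helper; tags and headers are not counted. */
uint64_t cbr_size_bytes(uint32_t bitrate_bps, uint32_t seconds)
{
    return (uint64_t)bitrate_bps * seconds / 8; /* 8 bits per byte */
}
```

A three-minute track comes to about 4,320,000 bytes at 192 kb/s and about 7,200,000 bytes at 320 kb/s, whether it is stored at 44.1 kHz or 48 kHz.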

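Resampling an MP3 file in code with FFmpeg

Key steps:

Decoding the MP3 file

The flow is as follows:

Open the input file: call avformat_open_input to open the input file and obtain the format context fmt_ctx.

Read the stream information: call avformat_find_stream_info to read the file's stream information for later processing.

Find the best audio stream: call av_find_best_stream to locate the best audio stream in the input file; it returns the stream index (audio_stream_index) and the matching decoder (codec).

Allocate the decoder context: call avcodec_alloc_context3 to allocate the decoder context codec_ctx for that decoder.

Copy the stream parameters into the decoder context: call avcodec_parameters_to_context to copy the audio stream's parameters from the input file into the decoder context.

Open the decoder: call avcodec_open2 to open the decoder so that it is ready to decode audio data.

Implementation: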

int initialize_decoder(const char *input_filename, AVFormatContext **fmt_ctx, AVCodecContext **codec_ctx, int *audio_stream_index)
{
    AVCodec *codec = NULL;

    if (avformat_open_input(fmt_ctx, input_filename, NULL, NULL) < 0)
    {
        fprintf(stderr, "Could not open input file\n");
        return -1;
    }

    if (avformat_find_stream_info(*fmt_ctx, NULL) < 0)
    {
        fprintf(stderr, "Could not find stream information\n");
        return -1;
    }

    *audio_stream_index = av_find_best_stream(*fmt_ctx, AVMEDIA_TYPE_AUDIO, -1, -1, &codec, 0);
    if (*audio_stream_index < 0)
    {
        fprintf(stderr, "Could not find audio stream in input file\n");
        return -1;
    }

    *codec_ctx = avcodec_alloc_context3(codec);
    if (!(*codec_ctx))
    {
        fprintf(stderr, "Could not allocate audio codec context\n");
        return -1;
    }

    if (avcodec_parameters_to_context(*codec_ctx, (*fmt_ctx)->streams[*audio_stream_index]->codecpar) < 0)
    {
        fprintf(stderr, "Could not copy codec parameters to codec context\n");
        return -1;
    }

    if (avcodec_open2(*codec_ctx, codec, NULL) < 0)
    {
        fprintf(stderr, "Could not open codec\n");
        return -1;
    }


    return 0; // success
}

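Initializing the resampler

The flow is as follows:

Create the resampling context: call swr_alloc_set_opts to allocate a SwrContext and set the output and input channel layout, sample format, and sample rate. Here the output is stereo, AV_SAMPLE_FMT_FLTP, 48000 Hz, and the input values are taken from the decoder context.

Initialize the context: call swr_init once the parameters are set.

Allocate the output frame: call av_frame_alloc to create the frame that will hold the resampled data.

The code in this article is written against the FFmpeg 4.x API. Fields such as channel_layout and channels and functions such as swr_alloc_set_opts were deprecated in FFmpeg 5.1 in favor of AVChannelLayout and swr_alloc_set_opts2 and removed in later releases, so the code needs adapting for current versions.

Implementation: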

int initialize_resampler(SwrContext **swr_ctx, AVCodecContext *codec_ctx, AVFrame **swr_frame)
{
    *swr_ctx = swr_alloc_set_opts(NULL,                      // allocate a new context
                                  AV_CH_LAYOUT_STEREO,       // output channel layout
                                  AV_SAMPLE_FMT_FLTP,        // output sample format
                                  48000,                     // output sample rate
                                  codec_ctx->channel_layout, // input channel layout
                                  codec_ctx->sample_fmt,     // input sample format
                                  codec_ctx->sample_rate,    // input sample rate
                                  0, NULL);                  // logging offset and context
    if (!(*swr_ctx))
    {
        fprintf(stderr, "Could not allocate resampler context\n");
        return -1;
    }
    if (swr_init(*swr_ctx) < 0)
    {
        fprintf(stderr, "Could not initialize the resampling context\n");
        swr_free(swr_ctx);
        return -1;
    }
    *swr_frame = av_frame_alloc();
    if (!(*swr_frame))
    {
        fprintf(stderr, "Could not allocate resampled frame\n");
        swr_free(swr_ctx);
        return -1;
    }

    return 0; // success
}

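Initializing the encoder

The overall flow is as follows:

Determine the output format: call av_guess_format to guess the output container format from the output file name.

Allocate the output format context: call avformat_alloc_output_context2 to allocate a format context for the output file.

Create the output stream: call avformat_new_stream to add a new audio stream to the output file.

Find the encoder: call avcodec_find_encoder to look up the MP3 encoder. FFmpeg has no native MP3 encoder, so this only succeeds when the build includes an external one such as libmp3lame.

Create the encoder context: call avcodec_alloc_context3 to allocate a context for the encoder.

Set the encoder parameters: set the bitrate, sample format, sample rate, channel layout, and so on.

Open the encoder: call avcodec_open2.

Copy the encoder parameters to the output stream: call avcodec_parameters_from_context.

Open the output file: unless the output format has the AVFMT_NOFILE flag set (meaning the muxer handles its own I/O), call avio_open to open the output file.

Write the file header: call avformat_write_header to write the output file's header.

Allocate the output packet: call av_packet_alloc to create the packet that will hold the encoded audio data.

Implementation: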

int initialize_encoder(const char *output_filename, AVFormatContext **out_fmt_ctx, AVCodecContext **enc_ctx, AVStream **out_stream,AVPacket **outpack)
{
    AVOutputFormat *out_fmt = NULL;
    AVCodec *encoder = NULL;
    AVCodecContext *encoder_ctx = NULL;
    AVStream *stream = NULL;
    // Guess the output container format from the file name
    out_fmt = av_guess_format(NULL, output_filename, NULL);
    if (!out_fmt)
    {
        fprintf(stderr, "could not guess file format\n");
        return -1;
    }
    // Allocate the output format context
    if (avformat_alloc_output_context2(out_fmt_ctx, out_fmt, NULL, output_filename) < 0)
    {
        fprintf(stderr, "could not create output context\n");
        return -1;
    }
    // Create the output stream
    stream = avformat_new_stream(*out_fmt_ctx, NULL);
    if (!stream)
    {
        fprintf(stderr, "could not create output stream\n");
        return -1;
    }
    // Find the MP3 encoder (needs an FFmpeg build with one, such as libmp3lame)
    encoder = avcodec_find_encoder(AV_CODEC_ID_MP3);
    if (!encoder)
    {
        fprintf(stderr, "Codec not found\n");
        return -1;
    }
    // Allocate the encoder context
    encoder_ctx = avcodec_alloc_context3(encoder);
    if (!encoder_ctx)
    {
        fprintf(stderr, "Could not allocate audio codec context\n");
        return -1;
    }

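    // Set the encoder parameters
    encoder_ctx->bit_rate = 192000;                    // 192 kb/s
    encoder_ctx->sample_fmt = AV_SAMPLE_FMT_FLTP;      // sample format (planar float)
    encoder_ctx->sample_rate = 48000;                  // sample rate in Hz
    encoder_ctx->channel_layout = AV_CH_LAYOUT_STEREO; // stereo channel layout
    encoder_ctx->channels = 2;                         // must agree with the channel layout
    encoder_ctx->time_base = av_make_q(1, 48000);      // timestamps are counted in samples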

    // Open the encoder
    if (avcodec_open2(encoder_ctx, encoder, NULL) < 0)
    {
        fprintf(stderr, "Could not open codec\n");
        avcodec_free_context(&encoder_ctx);
        return -1;
    }
    // Copy the encoder parameters to the output stream
    if (avcodec_parameters_from_context(stream->codecpar, encoder_ctx) < 0)
    {
        fprintf(stderr, "Failed to copy encoder parameters to output stream\n");
        avcodec_free_context(&encoder_ctx);
        return -1;
    }
    // Open the output file unless the muxer handles its own I/O (AVFMT_NOFILE)
    if (!((*out_fmt_ctx)->oformat->flags & AVFMT_NOFILE))
    {
        if (avio_open(&(*out_fmt_ctx)->pb, output_filename, AVIO_FLAG_WRITE) < 0)
        {
            fprintf(stderr, "Could not open output file\n");
            return -1;
        }
    }
    // Write the file header
    if (avformat_write_header((*out_fmt_ctx), NULL) < 0)
    {
        fprintf(stderr, "Error occurred when opening output file\n");
        return -1;
    }
    *enc_ctx = encoder_ctx;
    *out_stream = stream;
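    // Allocate the packet that will receive the encoded data
    *outpack = av_packet_alloc();
    if (!(*outpack))
    {
        fprintf(stderr, "Could not allocate output packet\n");
        return -1;
    }
    return 0; // success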
}

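Converting audio frames

The frame handed to the encoder must match the encoder's parameters (sample format, sample rate, channel layout, and number of samples per frame); otherwise the encoder cannot encode it.

The key function is swr_convert_frame.

swr_convert_frame is the FFmpeg function that converts an audio frame from one format to another. It uses a SwrContext (resampling context) to perform the conversion, which can cover the sample rate, the channel layout, and the sample format.

Function definition

int swr_convert_frame(SwrContext *swr_ctx, AVFrame *out_frame, const AVFrame *in_frame);

Parameters

swr_ctx: pointer to the SwrContext that holds the resampling state. It has to be set up beforehand with swr_alloc_set_opts and swr_init.
out_frame: pointer to the destination AVFrame that receives the converted audio. The frame must already exist and have its sample rate, channel layout, and sample format set. If its data buffers are not allocated, the function allocates them; if they are allocated, at most nb_samples samples are written and the remainder stays buffered inside the SwrContext.
in_frame: pointer to the source AVFrame that holds the audio to convert. Passing NULL flushes the samples still buffered in the context.

Return value

Returns 0 on success and a negative AVERROR code on failure.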
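
Because the sample rate changes, a converted frame does not hold the same number of samples as its input. The sketch below computes an upper bound on the output count with plain integer arithmetic. The function name max_output_samples is made up for this example; in real FFmpeg code the same estimate is usually obtained from swr_get_delay together with av_rescale_rnd.

```c
#include <stdint.h>

/* Upper bound on the samples per channel that one conversion can produce:
   ceil((buffered + in_samples) * out_rate / in_rate).
   buffered counts input-rate samples still held inside the resampler. */
int64_t max_output_samples(int64_t buffered, int64_t in_samples,
                           int in_rate, int out_rate)
{
    int64_t total = buffered + in_samples;
    return (total * out_rate + in_rate - 1) / in_rate; /* integer ceiling */
}
```

A decoded MP3 frame of 1152 samples at 44100 Hz becomes up to 1254 samples at 48000 Hz, which is more than the 1152 samples an MP3 encoder takes per frame. The surplus has to be buffered and carried over to the next frame.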

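Usage steps

Initialize the SwrContext: use swr_alloc_set_opts and swr_init to set up the resampling context with the input and output sample rate, channel layout, and sample format.

Set up the AVFrames: allocate the input and output AVFrame and make sure the parameters of out_frame agree with the output parameters of the SwrContext.

Call swr_convert_frame: convert the audio data.

Process the converted audio: after the conversion, the data in out_frame is available for further processing. Since 1152 input samples at 44100 Hz turn into about 1254 samples at 48000 Hz while the encoder takes frame_size samples per frame, the converted samples are first appended to an AVAudioFifo (declared in libavutil/audio_fifo.h) and then read back in blocks of frame_size. Capping the output frame at frame_size instead would leave the surplus inside the SwrContext, where it accumulates and is lost at the end of the file.

// Send one frame to the encoder (NULL flushes it) and write every packet that comes out
static int encode_and_write(AVCodecContext *enc_ctx, AVFrame *frame, AVPacket *pkt,
                            AVFormatContext *out_fmt_ctx, AVStream *out_stream)
{
    int ret = avcodec_send_frame(enc_ctx, frame);
    if (ret < 0)
    {
        fprintf(stderr, "Error sending frame to the encoder\n");
        return ret;
    }
    while ((ret = avcodec_receive_packet(enc_ctx, pkt)) == 0)
    {
        // Set the stream index and convert timestamps to the output stream's time base
        pkt->stream_index = out_stream->index;
        av_packet_rescale_ts(pkt, enc_ctx->time_base, out_stream->time_base);
        if (av_interleaved_write_frame(out_fmt_ctx, pkt) < 0)
        {
            fprintf(stderr, "Error while writing output packet\n");
            return -1;
        }
    }
    // EAGAIN (more input needed) and EOF (fully flushed) are the normal ways out of the loop
    return (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) ? 0 : ret;
}

// Convert one decoded frame (NULL flushes the resampler) and append the result to the FIFO
static int resample_to_fifo(SwrContext *swr_ctx, AVFrame *out, const AVFrame *in, AVAudioFifo *fifo)
{
    // Describe the output; the buffer is left unallocated so that swr_convert_frame
    // sizes it to hold everything the resampler can produce
    out->channel_layout = AV_CH_LAYOUT_STEREO;
    out->format = AV_SAMPLE_FMT_FLTP;
    out->sample_rate = 48000;

    int ret = swr_convert_frame(swr_ctx, out, in);
    if (ret < 0)
        fprintf(stderr, "Error while converting\n");
    else if (out->nb_samples > 0 &&
             av_audio_fifo_write(fifo, (void **)out->extended_data, out->nb_samples) < 0)
    {
        fprintf(stderr, "Could not write samples to the FIFO\n");
        ret = -1;
    }
    av_frame_unref(out); // release the buffer so the frame can be reused
    return ret;
}

// Encode the queued samples in frames of frame_size; with flush set, the final short frame is sent too
static int encode_from_fifo(AVAudioFifo *fifo, AVCodecContext *enc_ctx, AVFrame *out, AVPacket *pkt,
                            AVFormatContext *out_fmt_ctx, AVStream *out_stream, int64_t *next_pts, int flush)
{
    int frame_size = enc_ctx->frame_size > 0 ? enc_ctx->frame_size : 1152;
    int available;
    while ((available = av_audio_fifo_size(fifo)) >= frame_size || (flush && available > 0))
    {
        out->channel_layout = AV_CH_LAYOUT_STEREO;
        out->format = AV_SAMPLE_FMT_FLTP;
        out->sample_rate = 48000;
        out->nb_samples = available < frame_size ? available : frame_size; // match the encoder frame size

        if (av_frame_get_buffer(out, 0) < 0 ||
            av_audio_fifo_read(fifo, (void **)out->extended_data, out->nb_samples) < out->nb_samples)
        {
            fprintf(stderr, "Could not fill output frame\n");
            av_frame_unref(out);
            return -1;
        }
        out->pts = *next_pts; // timestamp in samples
        *next_pts += out->nb_samples;

        int ret = encode_and_write(enc_ctx, out, pkt, out_fmt_ctx, out_stream);
        av_frame_unref(out); // release the buffer so the frame can be reused
        if (ret < 0)
            return ret;
    }
    return 0;
}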
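
The frame-size mismatch between resampler and encoder is usually solved with a sample queue: variable-sized blocks go in, fixed-sized frames come out. The sketch below shows that idea in isolation for mono float samples. SampleFifo, fifo_write, and fifo_read are names invented for this example; they are a simplified stand-in for, not a copy of, FFmpeg's AVAudioFifo.

```c
#include <stddef.h>
#include <string.h>

#define FIFO_CAPACITY 8192 /* arbitrary size chosen for this sketch */

typedef struct {
    float data[FIFO_CAPACITY];
    size_t count; /* number of samples currently stored */
} SampleFifo;

/* Append n samples; returns 0 on success, -1 if they do not fit. */
int fifo_write(SampleFifo *f, const float *src, size_t n)
{
    if (f->count + n > FIFO_CAPACITY)
        return -1;
    memcpy(f->data + f->count, src, n * sizeof(float));
    f->count += n;
    return 0;
}

/* Remove exactly n samples from the front; returns 0 on success,
   -1 if fewer than n samples are stored. */
int fifo_read(SampleFifo *f, float *dst, size_t n)
{
    if (f->count < n)
        return -1;
    memcpy(dst, f->data, n * sizeof(float));
    /* Shift the remaining samples to the front of the buffer. */
    memmove(f->data, f->data + n, (f->count - n) * sizeof(float));
    f->count -= n;
    return 0;
}
```

Writing three blocks of 1254 samples and reading 1152 whenever enough are queued yields three full frames and leaves 306 samples waiting for the next block.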

Complete code

extern "C"
{
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
#include <libavutil/channel_layout.h>
#include <libswresample/swresample.h>
}
#include <cstdio> // fprintf

int initialize_encoder(const char *output_filename, AVFormatContext **out_fmt_ctx, AVCodecContext **enc_ctx, AVStream **out_stream,AVPacket **outpack)
{
    AVOutputFormat *out_fmt = NULL;
    AVCodec *encoder = NULL;
    AVCodecContext *encoder_ctx = NULL;
    AVStream *stream = NULL;

    // Guess the output container format from the file name
    out_fmt = av_guess_format(NULL, output_filename, NULL);
    if (!out_fmt)
    {
        fprintf(stderr, "could not guess file format\n");
        return -1;
    }

    // Allocate the output format context
    if (avformat_alloc_output_context2(out_fmt_ctx, out_fmt, NULL, output_filename) < 0)
    {
        fprintf(stderr, "could not create output context\n");
        return -1;
    }

    // Create the output stream
    stream = avformat_new_stream(*out_fmt_ctx, NULL);
    if (!stream)
    {
        fprintf(stderr, "could not create output stream\n");
        return -1;
    }

    // Find the MP3 encoder (needs an FFmpeg build with one, such as libmp3lame)
    encoder = avcodec_find_encoder(AV_CODEC_ID_MP3);
    if (!encoder)
    {
        fprintf(stderr, "Codec not found\n");
        return -1;
    }

    // Allocate the encoder context
    encoder_ctx = avcodec_alloc_context3(encoder);
    if (!encoder_ctx)
    {
        fprintf(stderr, "Could not allocate audio codec context\n");
        return -1;
    }

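    // Set the encoder parameters
    encoder_ctx->bit_rate = 192000;                    // 192 kb/s
    encoder_ctx->sample_fmt = AV_SAMPLE_FMT_FLTP;      // sample format (planar float)
    encoder_ctx->sample_rate = 48000;                  // sample rate in Hz
    encoder_ctx->channel_layout = AV_CH_LAYOUT_STEREO; // stereo channel layout
    encoder_ctx->channels = 2;                         // must agree with the channel layout
    encoder_ctx->time_base = av_make_q(1, 48000);      // timestamps are counted in samples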

    // Open the encoder
    if (avcodec_open2(encoder_ctx, encoder, NULL) < 0)
    {
        fprintf(stderr, "Could not open codec\n");
        avcodec_free_context(&encoder_ctx);
        return -1;
    }

    // Copy the encoder parameters to the output stream
    if (avcodec_parameters_from_context(stream->codecpar, encoder_ctx) < 0)
    {
        fprintf(stderr, "Failed to copy encoder parameters to output stream\n");
        avcodec_free_context(&encoder_ctx);
        return -1;
    }

    // Open the output file unless the muxer handles its own I/O (AVFMT_NOFILE)
    if (!((*out_fmt_ctx)->oformat->flags & AVFMT_NOFILE))
    {
        if (avio_open(&(*out_fmt_ctx)->pb, output_filename, AVIO_FLAG_WRITE) < 0)
        {
            fprintf(stderr, "Could not open output file\n");
            return -1;
        }
    }

    // Write the file header
    if (avformat_write_header((*out_fmt_ctx), NULL) < 0)
    {
        fprintf(stderr, "Error occurred when opening output file\n");
        return -1;
    }

    *enc_ctx = encoder_ctx;
    *out_stream = stream;
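    // Allocate the packet that will receive the encoded data
    *outpack = av_packet_alloc();
    if (!(*outpack))
    {
        fprintf(stderr, "Could not allocate output packet\n");
        return -1;
    }
    return 0; // success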
}


int initialize_decoder(const char *input_filename, AVFormatContext **fmt_ctx, AVCodecContext **codec_ctx, int *audio_stream_index)
{
    AVCodec *codec = NULL;

    if (avformat_open_input(fmt_ctx, input_filename, NULL, NULL) < 0)
    {
        fprintf(stderr, "Could not open input file\n");
        return -1;
    }

    if (avformat_find_stream_info(*fmt_ctx, NULL) < 0)
    {
        fprintf(stderr, "Could not find stream information\n");
        return -1;
    }

    *audio_stream_index = av_find_best_stream(*fmt_ctx, AVMEDIA_TYPE_AUDIO, -1, -1, &codec, 0);
    if (*audio_stream_index < 0)
    {
        fprintf(stderr, "Could not find audio stream in input file\n");
        return -1;
    }

    *codec_ctx = avcodec_alloc_context3(codec);
    if (!(*codec_ctx))
    {
        fprintf(stderr, "Could not allocate audio codec context\n");
        return -1;
    }

    if (avcodec_parameters_to_context(*codec_ctx, (*fmt_ctx)->streams[*audio_stream_index]->codecpar) < 0)
    {
        fprintf(stderr, "Could not copy codec parameters to codec context\n");
        return -1;
    }

    if (avcodec_open2(*codec_ctx, codec, NULL) < 0)
    {
        fprintf(stderr, "Could not open codec\n");
        return -1;
    }


    return 0; // success
}

int initialize_resampler(SwrContext **swr_ctx, AVCodecContext *codec_ctx, AVFrame **swr_frame)
{
    *swr_ctx = swr_alloc_set_opts(NULL,                      // allocate a new context
                                  AV_CH_LAYOUT_STEREO,       // output channel layout
                                  AV_SAMPLE_FMT_FLTP,        // output sample format
                                  48000,                     // output sample rate
                                  codec_ctx->channel_layout, // input channel layout
                                  codec_ctx->sample_fmt,     // input sample format
                                  codec_ctx->sample_rate,    // input sample rate
                                  0, NULL);                  // logging offset and context
    if (!(*swr_ctx))
    {
        fprintf(stderr, "Could not allocate resampler context\n");
        return -1;
    }

    if (swr_init(*swr_ctx) < 0)
    {
        fprintf(stderr, "Could not initialize the resampling context\n");
        swr_free(swr_ctx);
        return -1;
    }

    *swr_frame = av_frame_alloc();
    if (!(*swr_frame))
    {
        fprintf(stderr, "Could not allocate resampled frame\n");
        swr_free(swr_ctx);
        return -1;
    }

    return 0; // success
}


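extern "C"
{
#include <libavutil/audio_fifo.h> // AVAudioFifo; normally listed with the other includes at the top of the file
}

// Send one frame to the encoder (NULL flushes it) and write every packet that comes out
static int encode_and_write(AVCodecContext *enc_ctx, AVFrame *frame, AVPacket *pkt,
                            AVFormatContext *out_fmt_ctx, AVStream *out_stream)
{
    int ret = avcodec_send_frame(enc_ctx, frame);
    if (ret < 0)
    {
        fprintf(stderr, "Error sending frame to the encoder\n");
        return ret;
    }
    while ((ret = avcodec_receive_packet(enc_ctx, pkt)) == 0)
    {
        // Set the stream index and convert timestamps to the output stream's time base
        pkt->stream_index = out_stream->index;
        av_packet_rescale_ts(pkt, enc_ctx->time_base, out_stream->time_base);
        if (av_interleaved_write_frame(out_fmt_ctx, pkt) < 0)
        {
            fprintf(stderr, "Error while writing output packet\n");
            return -1;
        }
    }
    // EAGAIN (more input needed) and EOF (fully flushed) are the normal ways out of the loop
    return (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) ? 0 : ret;
}

// Convert one decoded frame (NULL flushes the resampler) and append the result to the FIFO
static int resample_to_fifo(SwrContext *swr_ctx, AVFrame *out, const AVFrame *in, AVAudioFifo *fifo)
{
    // Describe the output; the buffer is left unallocated so that swr_convert_frame
    // sizes it to hold everything the resampler can produce
    out->channel_layout = AV_CH_LAYOUT_STEREO;
    out->format = AV_SAMPLE_FMT_FLTP;
    out->sample_rate = 48000;

    int ret = swr_convert_frame(swr_ctx, out, in);
    if (ret < 0)
        fprintf(stderr, "Error while converting\n");
    else if (out->nb_samples > 0 &&
             av_audio_fifo_write(fifo, (void **)out->extended_data, out->nb_samples) < 0)
    {
        fprintf(stderr, "Could not write samples to the FIFO\n");
        ret = -1;
    }
    av_frame_unref(out); // release the buffer so the frame can be reused
    return ret;
}

// Encode the queued samples in frames of frame_size; with flush set, the final short frame is sent too
static int encode_from_fifo(AVAudioFifo *fifo, AVCodecContext *enc_ctx, AVFrame *out, AVPacket *pkt,
                            AVFormatContext *out_fmt_ctx, AVStream *out_stream, int64_t *next_pts, int flush)
{
    int frame_size = enc_ctx->frame_size > 0 ? enc_ctx->frame_size : 1152;
    int available;
    while ((available = av_audio_fifo_size(fifo)) >= frame_size || (flush && available > 0))
    {
        out->channel_layout = AV_CH_LAYOUT_STEREO;
        out->format = AV_SAMPLE_FMT_FLTP;
        out->sample_rate = 48000;
        out->nb_samples = available < frame_size ? available : frame_size; // match the encoder frame size

        if (av_frame_get_buffer(out, 0) < 0 ||
            av_audio_fifo_read(fifo, (void **)out->extended_data, out->nb_samples) < out->nb_samples)
        {
            fprintf(stderr, "Could not fill output frame\n");
            av_frame_unref(out);
            return -1;
        }
        out->pts = *next_pts; // timestamp in samples
        *next_pts += out->nb_samples;

        int ret = encode_and_write(enc_ctx, out, pkt, out_fmt_ctx, out_stream);
        av_frame_unref(out); // release the buffer so the frame can be reused
        if (ret < 0)
            return ret;
    }
    return 0;
}

int main()
{
    AVFormatContext *fmt_ctx = NULL;     // input format context
    AVFormatContext *out_fmt_ctx = NULL; // output format context
    AVCodecContext *codec_ctx = NULL;    // decoder context
    AVCodecContext *encodec_ctx = NULL;  // encoder context
    AVStream *out_stream = NULL;         // output stream
    AVPacket *packet = NULL;             // packet read from the input file
    AVPacket *out_packet = NULL;         // packet produced by the encoder
    AVFrame *frame = NULL;               // decoded frame
    AVFrame *swr_frame = NULL;           // resampled frame
    SwrContext *swr_ctx = NULL;          // resampling context
    AVAudioFifo *fifo = NULL;            // resampled samples waiting for the encoder
    int audio_stream_index = -1;
    int64_t next_pts = 0;                // pts of the next encoder frame, in samples
    int ret;

    // Initialize the encoder and the output file
    ret = initialize_encoder("output.mp3", &out_fmt_ctx, &encodec_ctx, &out_stream, &out_packet);
    if (ret != 0)
    {
        fprintf(stderr, "init encode failed\n");
        return -1;
    }

    // Initialize the decoder
    ret = initialize_decoder("test.mp3", &fmt_ctx, &codec_ctx, &audio_stream_index);
    if (ret != 0)
    {
        fprintf(stderr, "init decode failed\n");
        return -1;
    }

    packet = av_packet_alloc();
    frame = av_frame_alloc();
    if (!packet || !frame)
    {
        fprintf(stderr, "Could not allocate packet or frame\n");
        return -1;
    }

    // Initialize the resampling context
    ret = initialize_resampler(&swr_ctx, codec_ctx, &swr_frame);
    if (ret != 0)
    {
        fprintf(stderr, "init resampler failed\n");
        return -1;
    }

    // Queue between resampler and encoder; it grows automatically when more samples are written
    fifo = av_audio_fifo_alloc(encodec_ctx->sample_fmt, encodec_ctx->channels, 1152);
    if (!fifo)
    {
        fprintf(stderr, "Could not allocate audio FIFO\n");
        return -1;
    }

    // Decode -> resample -> encode
    while (ret >= 0 && av_read_frame(fmt_ctx, packet) >= 0)
    {
        if (packet->stream_index == audio_stream_index && avcodec_send_packet(codec_ctx, packet) == 0)
        {
            while (ret >= 0 && avcodec_receive_frame(codec_ctx, frame) == 0)
            {
                ret = resample_to_fifo(swr_ctx, swr_frame, frame, fifo);
                if (ret >= 0)
                    ret = encode_from_fifo(fifo, encodec_ctx, swr_frame, out_packet,
                                           out_fmt_ctx, out_stream, &next_pts, 0);
            }
        }
        av_packet_unref(packet);
    }

    // End of input: flush the decoder, the resampler, the FIFO remainder, and finally the encoder
    if (ret >= 0)
    {
        avcodec_send_packet(codec_ctx, NULL);
        while (ret >= 0 && avcodec_receive_frame(codec_ctx, frame) == 0)
            ret = resample_to_fifo(swr_ctx, swr_frame, frame, fifo);
    }
    if (ret >= 0)
        ret = resample_to_fifo(swr_ctx, swr_frame, NULL, fifo);
    if (ret >= 0)
        ret = encode_from_fifo(fifo, encodec_ctx, swr_frame, out_packet,
                               out_fmt_ctx, out_stream, &next_pts, 1);
    if (ret >= 0)
        ret = encode_and_write(encodec_ctx, NULL, out_packet, out_fmt_ctx, out_stream);

    // Write the file trailer
    if (av_write_trailer(out_fmt_ctx) < 0)
    {
        fprintf(stderr, "Error writing trailer of the output file\n");
    }

    // Close the output file and free the output context
    if (!(out_fmt_ctx->oformat->flags & AVFMT_NOFILE))
    {
        avio_closep(&out_fmt_ctx->pb);
    }
    avformat_free_context(out_fmt_ctx);

    // Release the remaining resources
    av_audio_fifo_free(fifo);
    av_packet_free(&out_packet);
    av_frame_free(&frame);
    av_frame_free(&swr_frame);
    av_packet_free(&packet);
    avcodec_free_context(&codec_ctx);
    avcodec_free_context(&encodec_ctx);
    avformat_close_input(&fmt_ctx);
    swr_free(&swr_ctx);

    return ret < 0 ? -1 : 0;
}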
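
The listing converts packet timestamps with av_packet_rescale_ts before writing them. The underlying arithmetic is a change of units between two time bases, sketched below with a made-up helper named rescale_ts. It is a simplified stand-in: it assumes non-negative timestamps and no 64-bit overflow, whereas FFmpeg's own functions also guard against overflow and leave AV_NOPTS_VALUE untouched.

```c
#include <stdint.h>

/* Convert ts from the time base src_num/src_den to dst_num/dst_den,
   rounding to the nearest integer. Assumes ts >= 0 and no overflow. */
int64_t rescale_ts(int64_t ts, int src_num, int src_den, int dst_num, int dst_den)
{
    int64_t num = (int64_t)src_num * dst_den;
    int64_t den = (int64_t)src_den * dst_num;
    return (ts * num + den / 2) / den;
}
```

With timestamps counted in samples at 48000 Hz (time base 1/48000), a frame starting at sample 1152 maps to 24 in a millisecond time base (1/1000), and sample 48000 maps to 1000.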

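Results

ffmpeg can display the parameters of an MP3 file.

Command: ffmpeg -i test.mp3

Stream #0:0:

Stream: indicates that this is a stream.
#0:0: the first stream of the first input file. Stream indices follow the order in the container and are not tied to a media type; in an MP3 file the audio is normally stream 0.

Audio: mp3:

Audio: indicates that this is an audio stream.
mp3: the audio codec is MP3 (MPEG Audio Layer III).

44100 Hz:

The audio sample rate is 44.1 kHz (44100 Hz), meaning 44100 audio samples per second.

stereo:

The audio is stereo (two channels), meaning the signal contains two independent channels.

fltp:

The sample format is float planar: each sample is stored as a floating-point number, and the data of each channel is stored in its own plane.

320 kb/s:

The audio bitrate is 320 kb/s (kilobits per second), meaning 320000 bits of data per second.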
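
The difference between planar and interleaved (packed) storage is easiest to see in code. The sketch below splits interleaved stereo samples into two planes, which is the memory layout that fltp describes; in FFmpeg the packed float counterpart is called flt. The function name deinterleave_stereo is chosen for this example only.

```c
#include <stddef.h>

/* Split interleaved stereo samples (L R L R ...) into two planes
   (L L L ... and R R R ...), as used by planar formats such as fltp. */
void deinterleave_stereo(const float *interleaved, size_t frames,
                         float *left, float *right)
{
    for (size_t i = 0; i < frames; i++) {
        left[i]  = interleaved[2 * i];     /* even positions: left channel */
        right[i] = interleaved[2 * i + 1]; /* odd positions: right channel */
    }
}
```

The interleaved sequence 1, 2, 3, 4, 5, 6 becomes the left plane 1, 3, 5 and the right plane 2, 4, 6.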

After resampling, check the parameters again:

The audio sample rate has changed to 48 kHz (48000 Hz), and the file plays back normally.