FFmpeg Study Notes (5): Audio and Video Data Conversion

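0. Preface

In audio and video development we often run into situations like these: decoding yields video frames in YUV420P, but a processing algorithm expects RGBA input; or a media file carries four-channel audio, but the playback device only supports two-channel output. In such cases the source data has to be converted so that it matches the input expected by the later stages of the pipeline. This post introduces how FFmpeg converts audio and video frame data.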

1. Background

1.1 libswscale

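libswscale is the FFmpeg library dedicated to image data conversion. It provides two main kinds of functionality:

(1) Color space (or pixel format) conversion, e.g. between RGBA and YUV420P images.

(2) Image resizing, with a choice of scaling (sampling/interpolation) algorithms.

libswscale is driven by a context struct, SwsContext. Its members are not exposed in the public API, and the internal implementation is fairly complex, so the details can only be seen in the FFmpeg source code. We can, however, get a rough idea of what it holds from the function that creates it.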


// The commonly used creation function
struct SwsContext *sws_getContext(int srcW, int srcH, enum AVPixelFormat srcFormat,
                                  int dstW, int dstH, enum AVPixelFormat dstFormat,
                                  int flags, SwsFilter *srcFilter,
                                  SwsFilter *dstFilter, const double *param);

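// Some of the information it holds
struct SwsContext {
    // ...
    int srcW;   // width of the source image
    int srcH;   // height of the source image
    int dstH;   // height of the destination image
    int dstW;   // width of the destination image
    enum AVPixelFormat dstFormat;  // pixel format of the destination image
    enum AVPixelFormat srcFormat;  // pixel format of the source image
    // ...
};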
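To make the first kind of conversion (pixel format) more concrete, here is a minimal sketch of what converting a single NV12 pixel to RGBA involves. It is not libswscale's implementation: the helper name nv12PixelToRgba is hypothetical, and the BT.601 limited-range coefficients are an assumption (the matrix that applies in practice depends on the colorspace of the video).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct Rgba {
    uint8_t r, g, b, a;
};

// Round to nearest and clamp into the 0..255 range of one 8-bit channel.
static uint8_t clampToByte(double value) {
    const long rounded = std::lround(value);
    return static_cast<uint8_t>(std::min(255L, std::max(0L, rounded)));
}

// Hypothetical helper: convert the pixel at (x, y) of an NV12 image to RGBA.
// NV12 stores a full-resolution Y plane followed by one half-resolution plane
// of interleaved U,V byte pairs; each pair is shared by a 2x2 block of pixels.
Rgba nv12PixelToRgba(const uint8_t *yPlane, int yStride,
                     const uint8_t *uvPlane, int uvStride, int x, int y) {
    assert(x >= 0 && y >= 0);
    const int luma = yPlane[y * yStride + x];
    const uint8_t *uv = uvPlane + (y / 2) * uvStride + (x / 2) * 2;
    // BT.601 limited-range coefficients (assumed; other matrices exist).
    const double c = 1.164 * (luma - 16);
    const double d = uv[0] - 128.0;  // U (Cb)
    const double e = uv[1] - 128.0;  // V (Cr)
    return Rgba{clampToByte(c + 1.596 * e),
                clampToByte(c - 0.392 * d - 0.813 * e),
                clampToByte(c + 2.017 * d),
                255};
}
```

libswscale performs this kind of per-pixel arithmetic (plus scaling) for the whole image, with optimized code paths, which is why in practice we only describe the source and destination formats and let the library do the work.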

1.2 libswresample

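libswresample is the FFmpeg library dedicated to converting audio PCM data. It supports conversion along the following dimensions:

(1) Sample size: e.g. converting each sample from 16 bits (2 bytes) to 32 bits (4 bytes).

(2) Channel layout / channel count: e.g. converting stereo (two channels) to mono.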

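(3) Sample endianness: e.g. converting PCM samples from big-endian to little-endian. (Note that AVSampleFormat itself always describes native-endian data, so in practice byte order is usually handled by the PCM decoder/encoder, such as pcm_s16be, rather than by the resampler.)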

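(4) Sample data type: e.g. signed to unsigned, or integer to floating point.

(5) Sample rate: e.g. from 48000 Hz to 16000 Hz.

Conversion in libswresample relies on the SwrContext struct. Like SwsContext, it does not expose its internals; instead, the input and output audio parameters are set when the context is created.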


struct SwrContext {
    // ...
    enum AVSampleFormat  in_sample_fmt;  // input sample format
    enum AVSampleFormat out_sample_fmt;  // output sample format
    AVChannelLayout  in_ch_layout;       // input channel layout
    AVChannelLayout out_ch_layout;       // output channel layout
    int      in_sample_rate;             // input sample rate
    int     out_sample_rate;             // output sample rate
    // ...
};
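As a rough illustration of two of the dimensions listed above (sample data type and channel layout), the sketch below converts interleaved s16 stereo samples to float mono by hand. The function name is hypothetical, and the plain left/right average is an assumption made for clarity; libswresample uses its own mixing matrix and conversion routines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: interleaved s16 stereo (L R L R ...) -> float mono.
// Each s16 sample is scaled into [-1.0, 1.0), then the two channels are averaged.
std::vector<float> s16StereoToFloatMono(const std::vector<int16_t> &interleaved) {
    assert(interleaved.size() % 2 == 0);  // one left and one right sample per frame
    std::vector<float> mono;
    mono.reserve(interleaved.size() / 2);
    for (std::size_t i = 0; i + 1 < interleaved.size(); i += 2) {
        const float left = interleaved[i] / 32768.0f;
        const float right = interleaved[i + 1] / 32768.0f;
        mono.push_back(0.5f * (left + right));
    }
    return mono;
}
```

With SwrContext, the same intent is expressed declaratively: the input side is described as AV_SAMPLE_FMT_S16 with a stereo layout, the output side as AV_SAMPLE_FMT_FLT with a mono layout, and the library picks the conversion steps.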

2. Related APIs

2.1 Image conversion


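// Create an SwsContext
// srcW : width of the source image
// srcH : height of the source image
// srcFormat : pixel format of the source image
// dstW : width of the destination image
// dstH : height of the destination image
// dstFormat : pixel format of the destination image
// flags : scaling algorithm, e.g. SWS_BILINEAR (bilinear) or SWS_BICUBIC (bicubic)
// srcFilter : filter applied to the input image; rarely used, can be NULL
// dstFilter : filter applied to the output image; rarely used, can be NULL
// param : extra tuning parameters for the scaling algorithm; can be NULL
// Returns NULL on failure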
struct SwsContext *sws_getContext(int srcW, int srcH, enum AVPixelFormat srcFormat,
                                  int dstW, int dstH, enum AVPixelFormat dstFormat,
                                  int flags, SwsFilter *srcFilter,
                                  SwsFilter *dstFilter, const double *param);

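// Convert an image using an SwsContext
// c : the SwsContext (conversion context)
// srcSlice : pointers to the planes of the source image; for an AVFrame this is data
// srcStride : stride (bytes per row) of each source plane; for an AVFrame this is linesize
// srcSliceY : row index of the first row of the slice to process. Set it when starting from part of the image; normally 0, i.e. the whole image is processed
// srcSliceH : height of the slice (number of rows); when processing the whole image, pass the source height
// dst : pointers to the planes of the destination image; for an AVFrame this is data
// dstStride : stride of each destination plane; for an AVFrame this is linesize
// Returns the height of the output slice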
int sws_scale(struct SwsContext *c, const uint8_t *const srcSlice[],
              const int srcStride[], int srcSliceY, int srcSliceH,
              uint8_t *const dst[], const int dstStride[]);
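The srcStride/dstStride parameters matter because a row in memory may be longer than the visible row of pixels (the buffer can be padded for alignment). A minimal sketch of stride-aware row copying follows; copyPlane is a hypothetical helper written for illustration (libavutil offers av_image_copy_plane() for the same job).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy `rows` rows of `rowBytes` visible bytes each
// between two planes whose strides (bytes from one row start to the next)
// may differ and may include padding.
void copyPlane(uint8_t *dst, int dstStride,
               const uint8_t *src, int srcStride,
               int rowBytes, int rows) {
    assert(rowBytes <= dstStride && rowBytes <= srcStride);
    for (int r = 0; r < rows; ++r) {
        std::memcpy(dst + static_cast<std::ptrdiff_t>(r) * dstStride,
                    src + static_cast<std::ptrdiff_t>(r) * srcStride,
                    static_cast<std::size_t>(rowBytes));
    }
}
```

This is the same reason the file-reading code in the example section reads the image row by row into data[i] + row * linesize[i] instead of reading the whole plane in one call.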

2.2 Audio conversion


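// Create an SwrContext and set its parameters
// ps : pointer to the SwrContext pointer to be set
// out_ch_layout : output channel layout
// out_sample_fmt : output sample format
// out_sample_rate : output sample rate
// in_ch_layout : input channel layout
// in_sample_fmt : input sample format
// in_sample_rate : input sample rate
// log_offset, log_ctx : logging-related; can be 0 and NULL
// Returns 0 on success, a negative error code on failure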
int swr_alloc_set_opts2(struct SwrContext **ps,
                        const AVChannelLayout *out_ch_layout, enum AVSampleFormat out_sample_fmt, int out_sample_rate,
                        const AVChannelLayout *in_ch_layout, enum AVSampleFormat  in_sample_fmt, int  in_sample_rate,
                        int log_offset, void *log_ctx);

// Initialize the SwrContext (call this after the parameters have been set)
int swr_init(struct SwrContext *s);

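// Convert data using an SwrContext
// s : an SwrContext that has been created and initialized
// out : output buffers (uint8_t **). For packed formats only out[0] is used; for planar formats one pointer per channel is needed. For an AVFrame this corresponds to AVFrame::data
// out_count : space available in the output buffer, in samples per channel; for an AVFrame this corresponds to nb_samples
// in : input buffers, laid out like out; when NULL, the samples still buffered inside the SwrContext are flushed
// in_count : number of input samples per channel; pass 0 (together with in = NULL) to flush the SwrContext
// Returns the number of samples output per channel, or a negative error code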
int swr_convert(struct SwrContext *s, uint8_t **out, int out_count,
                                const uint8_t **in , int in_count);

// Convert a whole AVFrame
int swr_convert_frame(SwrContext *swr,
                      AVFrame *output, const AVFrame *input);
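When the sample rate changes, the number of output samples differs from the number of input samples, so out_count has to be sized accordingly. A common idiom in FFmpeg code is av_rescale_rnd(swr_get_delay(swr, in_rate) + in_samples, out_rate, in_rate, AV_ROUND_UP). The sketch below reproduces just that arithmetic without FFmpeg; the function name and the bufferedInputSamples parameter (standing in for the value of swr_get_delay()) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: upper bound on the samples (per channel) that one
// conversion call can produce.
//   bufferedInputSamples : samples already held inside the resampler
//   inSamples            : samples per channel fed in by this call
// Computes ceil((buffered + in) * outRate / inRate) in integer arithmetic.
int64_t maxOutputSamples(int64_t bufferedInputSamples, int64_t inSamples,
                         int inRate, int outRate) {
    assert(inRate > 0 && outRate > 0);
    assert(bufferedInputSamples >= 0 && inSamples >= 0);
    const int64_t total = bufferedInputSamples + inSamples;
    return (total * outRate + inRate - 1) / inRate;
}
```

For 1024 input samples at 48000 Hz converted to 16000 Hz this gives 342, so a 1024-sample output buffer is more than enough when downsampling. When upsampling (e.g. 44100 Hz to 48000 Hz) the same input yields up to 1115 samples, and an output buffer with the same sample capacity as the input would be too small.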

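3. Workflow

3.1 Image conversion

The image conversion flow is shown in the figure below. It is not complicated: create an SwsContext with sws_getContext(), convert the image with sws_scale(), and release the context with sws_freeContext() when done.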

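3.2 Audio conversion

The audio conversion flow is shown in the figure below: create the context with swr_alloc_set_opts2(), initialize it with swr_init(), call swr_convert() in a loop, and finally release the context with swr_free(). Note that after the last input block you must call swr_convert(swrCtx, &outBuf, OUT_SAMPLES_PER_CHANNEL, NULL, 0); with a NULL input to flush the samples still buffered in the SwrContext.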
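To see why the final flush call is needed, consider a toy 3:1 decimator (the ratio of 48000 Hz to 16000 Hz). This is deliberately not libswresample's algorithm, and the class ToyDecimator3 is purely hypothetical: it simply averages every three input samples. Whenever the input length is not a multiple of three, the leftover samples stay inside the object until flush() is called, much like samples buffered inside an SwrContext.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy mono s16 decimator: one output sample per three input samples.
class ToyDecimator3 {
public:
    // Feed input; returns only the output samples that are already complete.
    std::vector<int16_t> convert(const std::vector<int16_t> &in) {
        pending_.insert(pending_.end(), in.begin(), in.end());
        std::vector<int16_t> out;
        std::size_t used = 0;
        for (; used + 3 <= pending_.size(); used += 3) {
            out.push_back(average(used, 3));
        }
        // Keep the incomplete group for the next call.
        pending_.erase(pending_.begin(),
                       pending_.begin() + static_cast<std::ptrdiff_t>(used));
        return out;
    }

    // Emit whatever is still buffered (the analogue of passing NULL input).
    std::vector<int16_t> flush() {
        std::vector<int16_t> out;
        if (!pending_.empty()) {
            out.push_back(average(0, pending_.size()));
            pending_.clear();
        }
        return out;
    }

private:
    int16_t average(std::size_t first, std::size_t count) const {
        assert(count > 0 && first + count <= pending_.size());
        int sum = 0;
        for (std::size_t k = 0; k < count; ++k) {
            sum += pending_[first + k];
        }
        return static_cast<int16_t>(sum / static_cast<int>(count));
    }

    std::vector<int16_t> pending_;  // samples received but not yet output
};
```

Without the flush() call the tail of the audio would be silently dropped, which is the same failure mode as omitting the final swr_convert() call with a NULL input.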

4. Examples

4.1 Image conversion

As usual, see the example in the 10_SWScale folder of MyFFmpegDemo. In it, we write an nv12toRGBA() function that converts an NV12 image into an RGBA image.


#include "swscale.h"

#include <cstdio>
#include <string>

extern "C" {
#include "libavutil/log.h"
#include "libswscale/swscale.h"
#include "libavutil/frame.h"
}

const int FRAME_WIDTH = 640;
const int FRAME_HEIGHT = 360;

// The NV12 file can be viewed with this command:
// ffplay -pixel_format nv12 -f rawvideo -video_size 640x360 test_pic_640x360.nv12
void readNv12ToAVFrame(AVFrame* &frame, std::string srcNv12) {
    // Describe the frame and allocate its buffers
    frame->width = FRAME_WIDTH;
    frame->height= FRAME_HEIGHT;
    frame->format= AV_PIX_FMT_NV12;

    auto ret = av_frame_get_buffer(frame, 0);
    if(ret < 0) {
        av_log(NULL, AV_LOG_ERROR, "allocate frame buffer failed\n");
        return;
    }

    ret = av_frame_make_writable(frame);
    if(ret < 0) {
        av_log(NULL, AV_LOG_ERROR, "av frame make writable failed\n");
        return;
    }

    FILE *file = fopen(srcNv12.c_str(), "rb"); // open the file in binary mode
    if (file == NULL) {
        av_log(NULL, AV_LOG_ERROR, "Failed to open file\n");
        return;
    }

    // Determine the file size so we can verify that everything was read
    fseek(file, 0, SEEK_END);
    long length = ftell(file);
    fseek(file, 0, SEEK_SET);

    size_t bytes_read = 0;

    // Y plane: read row by row, because linesize may be larger than width
    for(int i = 0; i < FRAME_HEIGHT; ++i) {
        bytes_read += fread(frame->data[0] + i * frame->linesize[0], 1, frame->width, file);
    }
    // Interleaved UV plane: height / 2 rows of width bytes each
    for(int i = 0; i < FRAME_HEIGHT / 2; ++i) {
        bytes_read += fread(frame->data[1] + i * frame->linesize[1], 1, frame->width, file);
    }

    // fread returns the number of bytes actually read, so a short or
    // oversized file is detected here
    if (length < 0 || bytes_read != (size_t)length) {
        av_log(NULL, AV_LOG_ERROR, "Failed to read file\n");
        fclose(file);
        return;
    }

    fclose(file);
}

// ffplay -pixel_format rgba -f rawvideo -video_size 1280x720 test_pic_1280x640.rgba
void writeRGBAToFile(const AVFrame *frame, std::string dstRGBA) {
    FILE *file = fopen(dstRGBA.c_str(), "wb");
    if (file == NULL) {
        av_log(NULL, AV_LOG_ERROR, "Failed to open rgba file\n");
        return;
    }

    size_t bytes_write = 0;
    for(int i = 0; i < frame->height; ++i) {
        fwrite(frame->data[0] + i * frame->linesize[0], frame->width * 4, 1, file);
        bytes_write += frame->width * 4;
    }

    av_log(NULL, AV_LOG_INFO, "write data : %zu bytes\n", bytes_write);

    fclose(file);
}

// Convert an NV12 image to RGBA
void nv12toRGBA(std::string dst, std::string src) {
    SwsContext *swsCtx = sws_getContext(FRAME_WIDTH,      // source width
                                        FRAME_HEIGHT,     // source height
                                        AV_PIX_FMT_NV12,  // source pixel format
                                        FRAME_WIDTH * 2,  // destination width, scaled up 2x here
                                        FRAME_HEIGHT * 2, // destination height, scaled up 2x
                                        AV_PIX_FMT_RGBA,  // destination pixel format
                                        SWS_BILINEAR,     // scaling algorithm
                                        NULL, NULL, NULL);// source filter, destination filter, scaler tuning parameters
    if(swsCtx == nullptr) {
        av_log(NULL, AV_LOG_ERROR, "create sws context failed\n");
        return;
    }

    // Allocate the source NV12 frame; its data is read from the file
    AVFrame *srcFrame = av_frame_alloc();
    if(srcFrame == nullptr) {
        av_log(NULL, AV_LOG_ERROR, "allocate the src frame failed\n");
        return;
    }
    readNv12ToAVFrame(srcFrame, src);

    // Allocate the destination RGBA frame
    AVFrame *dstFrame = av_frame_alloc();
    if(dstFrame == nullptr) {
        av_log(NULL, AV_LOG_ERROR, "allocate the dst frame failed\n");
        return;
    }
    dstFrame->width = 2 * FRAME_WIDTH;
    dstFrame->height = 2 * FRAME_HEIGHT;
    dstFrame->format = AV_PIX_FMT_RGBA;
    auto ret = av_frame_get_buffer(dstFrame, 0);
    if(ret < 0) {
        av_log(NULL, AV_LOG_ERROR, "allocate frame buffer failed\n");
        return;
    }

    // Convert
    sws_scale(swsCtx,
              srcFrame->data,
              srcFrame->linesize,
              0,
              srcFrame->height,
              dstFrame->data,
              dstFrame->linesize);

    // Write the RGBA data to the output file
    writeRGBAToFile(dstFrame, dst);

    av_frame_free(&srcFrame);
    av_frame_free(&dstFrame);
    sws_freeContext(swsCtx);
}
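As a sanity check on the file sizes involved in this example, the sketch below computes the size of a tightly packed NV12 frame and of an RGBA frame. The helper names are hypothetical; in FFmpeg code av_image_get_buffer_size() serves this purpose for arbitrary pixel formats.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes in a tightly packed NV12 frame.
// Y plane: width * height bytes. UV plane: one U,V byte pair per 2x2 block
// of pixels, i.e. width * height / 2 bytes.
std::size_t nv12FrameBytes(int width, int height) {
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    const std::size_t luma = static_cast<std::size_t>(width) * height;
    return luma + luma / 2;
}

// Hypothetical helper: bytes in a tightly packed RGBA frame (4 bytes per pixel).
std::size_t rgbaFrameBytes(int width, int height) {
    assert(width > 0 && height > 0);
    return static_cast<std::size_t>(width) * height * 4;
}
```

For the 640x360 NV12 input this gives 345600 bytes, which is the file size readNv12ToAVFrame() expects, and for the 1280x720 RGBA output it gives 3686400 bytes, which should match the byte count logged by writeRGBAToFile().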

4.2 Audio conversion

For audio conversion, see the example in the 10_SWScale folder of MyFFmpegDemo. In this example we convert PCM data from s16le, 48000 Hz, stereo to s16le, 16000 Hz, mono.


#include "swresample.h"

#include <cstdio>
#include <cstdlib>
#include <string>

extern "C" {
#include "libswresample/swresample.h"
#include "libavutil/log.h"
}

const int IN_CHANNEL_NB = 2;
const int OUT_CHANNEL_NB = 1;
const int IN_SAMPLE_RATE = 48000;
const int OUT_SAMPLE_RATE = 16000;
const int IN_SAMPLE_SIZE = 2;
const int OUT_SAMPLE_SIZE = 2;

const int IN_SAMPLES_PER_CHANNEL = 1024;
const int OUT_SAMPLES_PER_CHANNEL = 1024;

// Play the input  : ffplay -ar 48000 -ac 2 -f s16le -i test_audio.pcm
// Play the output : ffplay -ar 16000 -ac 1 -f s16le -i output.pcm
void resample(const std::string &dst, const std::string &src) {
    SwrContext *swrCtx = nullptr;
    AVChannelLayout out_ch = AV_CHANNEL_LAYOUT_MONO;
    AVChannelLayout in_ch = AV_CHANNEL_LAYOUT_STEREO;
    int ret = swr_alloc_set_opts2(&swrCtx,
                                  &out_ch,              // output channel layout
                                  AV_SAMPLE_FMT_S16,    // output sample format
                                  OUT_SAMPLE_RATE,      // output sample rate
                                  &in_ch,               // input channel layout
                                  AV_SAMPLE_FMT_S16,    // input sample format
                                  IN_SAMPLE_RATE,       // input sample rate
                                  0, NULL);             // logging-related
    if(ret < 0) {
        av_log(NULL, AV_LOG_ERROR, "allocate the swrcontext failed\n");
        return;
    }

    ret = swr_init(swrCtx);
    if(ret < 0) {
        av_log(NULL, AV_LOG_ERROR, "init swrctx failed\n");
        return;
    }

    FILE *srcFile = fopen(src.c_str(), "rb");
    FILE *dstFile = fopen(dst.c_str(), "wb");
    if(srcFile == nullptr || dstFile == nullptr) {
        av_log(NULL, AV_LOG_ERROR, "open file failed\n");
        return;
    }

    // Buffer sizes. Each iteration reads 1024 samples per channel from the input ----
    // 1024 samples per read, which is
    // 1024 * 2(ch) * 2(bytes) = 4096 bytes in total
    const int IN_BUF_LEN = IN_SAMPLES_PER_CHANNEL * IN_SAMPLE_SIZE * IN_CHANNEL_NB;
    uint8_t *inBuf = (uint8_t *)malloc(IN_BUF_LEN);

    // Room for up to 1024 output samples per call (enough here, because each
    // call takes in at most 1024 samples and we are downsampling), which is
    // 1024 * 1(ch) * 2(bytes) = 2048 bytes in total
    const int OUT_BUF_LEN = OUT_SAMPLES_PER_CHANNEL * OUT_SAMPLE_SIZE * OUT_CHANNEL_NB;
    uint8_t *outBuf = (uint8_t *)malloc(OUT_BUF_LEN);
    // ------------------------------------------------------------------------

    int bytes_all_read = 0;
    int bytes_all_write = 0;
    // Loop: read a block of samples and convert it
    while(true) {
        // Read at most IN_BUF_LEN (4096) bytes. The last block of the file may
        // be shorter, so always use the actual count in bytes_read
        int bytes_read = fread(inBuf, 1, IN_BUF_LEN, srcFile);
        // End of file (or read error): stop. The SwrContext may still hold
        // buffered samples; they are flushed after the loop
        if(bytes_read <= 0) {
            break;
        }
        bytes_all_read += bytes_read;
        // Samples per channel actually read, normally 4096 / 2 / 2 = 1024;
        // at the end of the file it can be fewer
        int read_samples_per_channel = bytes_read / IN_CHANNEL_NB / IN_SAMPLE_SIZE;

        const uint8_t *p = inBuf;
        int converted_samples = swr_convert(swrCtx, &outBuf, OUT_SAMPLES_PER_CHANNEL, &p, read_samples_per_channel);
        if(converted_samples < 0) {
            av_log(NULL, AV_LOG_ERROR, "swr_convert failed\n");
            break;
        }
        if(converted_samples > 0) {
            int write_size = converted_samples * OUT_CHANNEL_NB * OUT_SAMPLE_SIZE;
            int bytes_write = fwrite(outBuf, 1, write_size, dstFile);
            bytes_all_write += bytes_write;
        }
    }

    // Flush: pass NULL and 0 as the input of swr_convert
    int converted_samples = swr_convert(swrCtx, &outBuf, OUT_SAMPLES_PER_CHANNEL, NULL, 0);
    if(converted_samples > 0) {
        int write_size = converted_samples * OUT_CHANNEL_NB * OUT_SAMPLE_SIZE;
        int bytes_write = fwrite(outBuf, 1, write_size, dstFile);
        bytes_all_write += bytes_write;
    }

    av_log(NULL, AV_LOG_INFO, "bytes read : %d, bytes write : %d\n", bytes_all_read, bytes_all_write);

    // Release everything: files, context, and buffers
    fclose(srcFile);
    fclose(dstFile);
    swr_free(&swrCtx);
    free(inBuf);
    free(outBuf);

    return;
}
