Audio and Video Basics: AAC Series (11): An Introduction to AudioSpecificConfig

I. Introduction

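MPEG-4 defines a system for handling a group of different audio formats in a uniform way. Each format is identified by a unique Audio Object Type (AOT). The global header format shared by all Audio Object Types is called the Audio Specific Config. Put simply, the Audio Specific Config is the global header of MPEG-4 audio: it carries the key parameters a decoder needs, such as the codec type, the sampling frequency, and the number of channels. For example, if the audio in an FLV file is AAC, the file normally contains an Audio Tag that carries the Audio Specific Config.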

II. AudioSpecificConfig

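Pages 52 to 55 of ISO14496-3-2009.pdf define AudioSpecificConfig. The fields covered in this article are audioObjectType, samplingFrequencyIndex (with the optional samplingFrequency), and channelConfiguration: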

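audioObjectType: the audio object type, which can be thought of as the audio compression coding format. It is defined on page 35 of ISO14496-3-2009.pdf and takes the following values: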

0: Null

1: AAC Main

2: AAC LC (Low Complexity)

3: AAC SSR (Scalable Sample Rate)

4: AAC LTP (Long Term Prediction)

5: SBR (Spectral Band Replication)

6: AAC Scalable

7: TwinVQ

8: CELP (Code Excited Linear Prediction)

9: HVXC (Harmonic Vector eXcitation Coding)

10: Reserved

11: Reserved

12: TTSI (Text-To-Speech Interface)

13: Main Synthesis

14: Wavetable Synthesis

15: General MIDI

16: Algorithmic Synthesis and Audio Effects

17: ER (Error Resilient) AAC LC

18: Reserved

19: ER AAC LTP

20: ER AAC Scalable

21: ER TwinVQ

22: ER BSAC (Bit-Sliced Arithmetic Coding)

23: ER AAC LD (Low Delay)

24: ER CELP

25: ER HVXC

26: ER HILN (Harmonic and Individual Lines plus Noise)

27: ER Parametric

28: SSC (SinuSoidal Coding)

29: PS (Parametric Stereo)

30: MPEG Surround

31: (Escape value)

32: Layer-1

33: Layer-2

34: Layer-3

35: DST (Direct Stream Transfer)

36: ALS (Audio Lossless)

37: SLS (Scalable LosslesS)

38: SLS non-core

39: ER AAC ELD (Enhanced Low Delay)

40: SMR (Symbolic Music Representation) Simple

41: SMR Main

42: USAC (Unified Speech and Audio Coding) (no SBR)

43: SAOC (Spatial Audio Object Coding)

44: LD MPEG Surround

45: USAC

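According to page 55 of ISO14496-3-2009.pdf, audioObjectType occupies either 5 or 11 bits. If the first 5 bits hold a value from 0 to 30, that value is the audioObjectType and the field is 5 bits long. If the first 5 bits hold 31 (0b11111), the field is 11 bits long: the 5 escape bits are followed by a 6-bit audioObjectTypeExt, and audioObjectType = 32 + audioObjectTypeExt, so audioObjectTypeExt equals audioObjectType minus 32. For example, DST (Direct Stream Transfer) has audioObjectType 35 in the list above. Since 35 is greater than 30, the first 5 bits are 0b11111 and the following 6 bits are 35 - 32 = 3, that is 0b000011, so the field is stored as 0b11111000011.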
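The escape rule above can be sketched in a few lines of Python. This is a minimal illustration and not code from the standard: the function names `encode_audio_object_type` and `decode_audio_object_type` are hypothetical, and bits are modelled as a string of `'0'`/`'1'` characters purely for readability.

```python
ESCAPE = 31  # 0b11111 in the first 5 bits signals the 11-bit form


def encode_audio_object_type(aot: int) -> str:
    """Return audioObjectType as a bit string of length 5 or 11."""
    if 0 <= aot < ESCAPE:
        return format(aot, "05b")
    if 32 <= aot <= 32 + 63:  # audioObjectTypeExt is 6 bits wide
        return format(ESCAPE, "05b") + format(aot - 32, "06b")
    raise ValueError(f"audioObjectType {aot} cannot be encoded")


def decode_audio_object_type(bits: str):
    """Return (audioObjectType, number of bits consumed)."""
    value = int(bits[:5], 2)
    if value == ESCAPE:
        # 5 escape bits followed by the 6-bit audioObjectTypeExt
        return 32 + int(bits[5:11], 2), 11
    return value, 5


print(encode_audio_object_type(35))             # 11111000011
print(decode_audio_object_type("11111000011"))  # (35, 11)
```

Note that the value 31 itself can never be an audioObjectType, because it is reserved as the escape marker.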

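samplingFrequencyIndex: 4 bits. According to page 59 of ISO14496-3-2009.pdf, samplingFrequencyIndex selects the audio sampling frequency from a table of predefined rates (for example, index 4 means 44100 Hz).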

According to pages 52 and 59 of ISO14496-3-2009.pdf, if samplingFrequencyIndex is 15 (0x0F), a 24-bit (3-byte) samplingFrequency field follows samplingFrequencyIndex, and the actual sampling rate in Hz is given directly by the value of samplingFrequency.
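As a rough sketch of how a parser might handle both cases, the snippet below looks up the predefined rate for indices 0 to 12 and reads the explicit 24-bit value when the index is 15. The table lists the standard MPEG-4 sampling frequency index values; the function name `read_sampling_frequency` and the bit-string representation are assumptions made for this example.

```python
# Predefined sampling rates in Hz for samplingFrequencyIndex 0..12.
# Indices 13 and 14 are reserved; 15 (0x0F) is the escape value.
SAMPLING_FREQUENCIES = [
    96000, 88200, 64000, 48000, 44100, 32000, 24000,
    22050, 16000, 12000, 11025, 8000, 7350,
]


def read_sampling_frequency(bits: str, pos: int = 0):
    """Return (sampling rate in Hz, position after the field(s))."""
    index = int(bits[pos:pos + 4], 2)
    pos += 4
    if index == 0x0F:
        # Escape: the rate is stored explicitly in the next 24 bits.
        return int(bits[pos:pos + 24], 2), pos + 24
    if index >= len(SAMPLING_FREQUENCIES):
        raise ValueError(f"samplingFrequencyIndex {index} is reserved")
    return SAMPLING_FREQUENCIES[index], pos


print(read_sampling_frequency("0100"))                          # (44100, 4)
print(read_sampling_frequency("1111" + format(37800, "024b")))  # (37800, 28)
```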

channelConfiguration: 4 bits. According to page 60 of ISO14496-3-2009.pdf, channelConfiguration indicates the number of audio channels and their speaker layout:

1: mono (center front speaker)

2: stereo (left, right front speakers)

3: three channels (center, left, right front speakers)

4: four channels (center, left, right front speakers, rear surround speakers)

5: five channels (center, left, right front speakers, left surround, right surround rear speakers)

6: 5.1 channels (center, left, right front speakers, left surround, right surround rear speakers, front low frequency effects speaker)

7: 7.1 channels (center, left, right center front speakers, left, right outside front speakers, left surround, right surround rear speakers, front low frequency effects speaker)

8 to 15: reserved
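The mapping from channelConfiguration to a channel count can be written as a small lookup, sketched below. The helper name `channel_count` is hypothetical. One detail goes beyond the values listed above and is stated here as an assumption of the sketch: the value 0 is treated as "layout described elsewhere" (in MPEG-4 Audio it defers to the AOT-specific configuration), so the function returns `None` for it.

```python
# channelConfiguration -> number of channels (the LFE channel is counted).
CHANNEL_COUNTS = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 8}


def channel_count(channel_configuration: int):
    """Return the channel count, or None when the layout is not given here."""
    if channel_configuration == 0:
        # Assumption: the layout is carried by the AOT-specific config.
        return None
    if channel_configuration in CHANNEL_COUNTS:
        return CHANNEL_COUNTS[channel_configuration]
    raise ValueError(f"channelConfiguration {channel_configuration} is reserved")


print(channel_count(2))  # 2
print(channel_count(6))  # 6 (5.1)
print(channel_count(7))  # 8 (7.1)
```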

III. A Worked AudioSpecificConfig Example

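Following the method described in "Audio and Video Basics: FLV Series (4): Analyzing FLV Files with the flvAnalyser Tool", open an FLV file whose audio is AAC-encoded in flvAnalyser and inspect one of its Audio Tags that contains an AudioSpecificConfig. In this example the AudioSpecificConfig is the two bytes 0x12 0x10, which is 0b0001001000010000 in binary.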

audioObjectType: the first 5 bits of 0b0001001000010000 are 0b00010, which is 2 in decimal, so the audio coding format is AAC LC (Low Complexity).

samplingFrequencyIndex: the next 4 bits are 0b0100, which is 4 in decimal, so the sampling frequency is 44100 Hz.

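channelConfiguration: the next 4 bits are 0b0010, which is 2 in decimal, so the audio is stereo. The remaining 3 bits (0b000) are not part of these three fields; they belong to the AOT-specific configuration that follows them.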
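The manual decoding above can be reproduced with a short script. The following is a minimal sketch that reads only the three leading fields discussed in this article and ignores everything after channelConfiguration; the function name `parse_audio_specific_config` and the dictionary keys are chosen for this example, and the input is assumed to be long enough to hold the fields.

```python
def parse_audio_specific_config(data: bytes) -> dict:
    """Parse the leading fields of an AudioSpecificConfig."""
    bits = "".join(format(byte, "08b") for byte in data)
    pos = 0

    # audioObjectType: 5 bits, or 5 + 6 bits when the first 5 bits are 31.
    aot = int(bits[pos:pos + 5], 2)
    pos += 5
    if aot == 31:
        aot = 32 + int(bits[pos:pos + 6], 2)
        pos += 6

    # samplingFrequencyIndex: 4 bits, plus 24 explicit bits when it is 15.
    sf_index = int(bits[pos:pos + 4], 2)
    pos += 4
    explicit_frequency = None
    if sf_index == 0x0F:
        explicit_frequency = int(bits[pos:pos + 24], 2)
        pos += 24

    # channelConfiguration: 4 bits.
    channel_configuration = int(bits[pos:pos + 4], 2)
    pos += 4

    return {
        "audioObjectType": aot,
        "samplingFrequencyIndex": sf_index,
        "samplingFrequency": explicit_frequency,  # None unless index is 15
        "channelConfiguration": channel_configuration,
        "bitsConsumed": pos,
    }


config = parse_audio_specific_config(bytes([0x12, 0x10]))
print(config["audioObjectType"])         # 2 (AAC LC)
print(config["samplingFrequencyIndex"])  # 4 (44100 Hz)
print(config["channelConfiguration"])    # 2 (stereo)
```

For the bytes 0x12 0x10 the three fields consume 13 of the 16 bits, which matches the bit-by-bit reading above.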

IV. References

Wikipedia, "MPEG-4 Audio"