Introduction to WebRTC NetEq

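NetEq is the core WebRTC module responsible for **audio jitter buffering and packet loss concealment (PLC)**. Its main job is to take RTP audio packets that arrive out of order, late, or not at all, and turn them into smooth, continuous PCM audio for playout.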

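I. Main functions of NetEq

• **Jitter buffer**: Dynamically adjusts the buffer size. It uses statistics on packet inter-arrival time (IAT), such as a high percentile of the observed delays (for example the 95th), to estimate the optimal playout delay and smooth out network jitter.
• **Packet loss concealment (PLC)**: When packets are lost, generates replacement audio through interpolation, pitch-period extension (for example WSOLA-style overlap-add), noise filling, or AR-model prediction, avoiding audible pops or gaps of silence.
• **Time stretching**: Speeds up (<10 ms) or slows down (>30 ms) playout of already decoded audio without changing its pitch, to drain a backlog or fill a gap in the buffer while keeping latency low and playout continuous.
• **Adaptive decision control**: DelayManager (which estimates network delay) and DecisionLogic (which looks at the buffer level, loss history, and the previous operation) dynamically choose among operations such as Normal, Expand, Accelerate, PreemptiveExpand, and Merge.
• **Seamless splicing and comfort noise**: Cross-fades between PLC-generated frames and normal frames (Merge), and generates background comfort noise (CN) during silent periods to improve perceived quality.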

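To make the jitter-buffer idea above concrete, here is a minimal sketch of picking a target delay from an inter-arrival-time histogram. It is not the WebRTC DelayManager: the function names, the bucket layout (one bucket per packet duration), and the quantile value are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of WebRTC. histogram[i] holds the number of
// packets whose inter-arrival time was i packet durations. Returns the
// smallest bucket index whose cumulative count reaches `quantile` of the
// total number of observations.
int QuantileBucket(const std::vector<int>& histogram, double quantile) {
  assert(quantile > 0.0 && quantile <= 1.0);
  long long total = 0;
  for (int count : histogram) total += count;
  if (total == 0) return 0;  // No observations yet.

  long long cumulative = 0;
  for (std::size_t i = 0; i < histogram.size(); ++i) {
    cumulative += histogram[i];
    if (static_cast<double>(cumulative) >=
        quantile * static_cast<double>(total)) {
      return static_cast<int>(i);
    }
  }
  return static_cast<int>(histogram.size()) - 1;
}

// Target delay in ms: the quantile bucket multiplied by the packet duration.
int TargetDelayFromHistogramMs(const std::vector<int>& histogram,
                               double quantile, int packet_duration_ms) {
  return QuantileBucket(histogram, quantile) * packet_duration_ms;
}
```

With 20 ms packets and a histogram where most packets arrive one packet duration apart but a few arrive later, a high quantile pushes the target delay up to cover those late arrivals, while a lower quantile trades more concealment for less delay.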
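II. NetEq data structures

2.1 NetEq Config

Mainly used to initialize a NetEq object; it holds the parameters needed to instantiate NetEq.

• sample_rate_hz: Initial sample rate (e.g. 16000, 48000).

• enable_post_decode_vad: Whether to enable post-decode voice activity detection.

• max_packets_in_buffer: Maximum number of packets in the buffer.

• max_delay_ms / min_delay_ms: Maximum/minimum allowed buffering delay.

• enable_fast_accelerate: Whether to allow fast accelerated playout to catch up on delay.

• enable_muted_state: Whether to mark the output as muted when NetEq has been in loss-concealment expansion for a long time.

• enable_rtx_handling: Whether to handle retransmitted (RTX) packets.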

```cpp
struct Config {
  Config();
  Config(const Config&);
  Config(Config&&);
  ~Config();
  Config& operator=(const Config&);
  Config& operator=(Config&&);

  std::string ToString() const;

  int sample_rate_hz = 16000;  // Initial value. Will change with input data.
  bool enable_post_decode_vad = false;
  size_t max_packets_in_buffer = 200;
  int max_delay_ms = 0;
  int min_delay_ms = 0;
  bool enable_fast_accelerate = false;
  bool enable_muted_state = false;
  bool enable_rtx_handling = false;
  absl::optional<AudioCodecPairId> codec_pair_id;
  bool for_test_no_time_stretching = false;  // Use only for testing.
};
```

2.2 NetEqNetworkStatistics

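Periodic statistics (reset after each call to NetworkStatistics). Used to monitor current network conditions and NetEq's behavior. The rate fields are fixed-point fractions in Q14.

• current_buffer_size_ms: Current jitter buffer size in milliseconds.

• preferred_buffer_size_ms: Target buffer size.

• packet_loss_rate: Packet loss rate (network loss plus late packets).

• expand_rate / speech_expand_rate: Fraction of audio synthesized through loss concealment (expansion).

• accelerate_rate: Fraction of audio removed through accelerated playout (to reduce delay).

• preemptive_rate: Fraction of audio inserted through preemptive expansion (stretching audio when the buffer level is too low, to build up delay).

• mean_waiting_time_ms: Average time packets wait in the buffer.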

```cpp
struct NetEqNetworkStatistics {
  uint16_t current_buffer_size_ms;    // Current jitter buffer size in ms.
  uint16_t preferred_buffer_size_ms;  // Target buffer size in ms.
  uint16_t jitter_peaks_found;        // 1 if adding extra delay due to peaky
                                      // jitter; 0 otherwise.
  uint16_t packet_loss_rate;          // Loss rate (network + late) in Q14.
  uint16_t expand_rate;         // Fraction (of original stream) of synthesized
                                // audio inserted through expansion (in Q14).
  uint16_t speech_expand_rate;  // Fraction (of original stream) of synthesized
                                // speech inserted through expansion (in Q14).
  uint16_t preemptive_rate;     // Fraction of data inserted through pre-emptive
                                // expansion (in Q14).
  uint16_t accelerate_rate;     // Fraction of data removed through acceleration
                                // (in Q14).
  uint16_t secondary_decoded_rate;    // Fraction of data coming from FEC/RED
                                      // decoding (in Q14).
  uint16_t secondary_discarded_rate;  // Fraction of discarded FEC/RED data (in
                                      // Q14).
  size_t added_zero_samples;  // Number of zero samples added in "off" mode.
  // Statistics for packet waiting times, i.e., the time between a packet
  // arrives until it is decoded.
  int mean_waiting_time_ms;
  int median_waiting_time_ms;
  int min_waiting_time_ms;
  int max_waiting_time_ms;
};
```
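Several fields above are documented as Q14 values, meaning 16384 represents 1.0. A small sketch of turning such a value into a percentage for logging or dashboards; the helper name is hypothetical and not part of NetEq.

```cpp
#include <cstdint>

// Hypothetical helper: converts a Q14 fixed-point fraction (16384 == 1.0),
// such as expand_rate or accelerate_rate, into a percentage.
double Q14ToPercent(std::uint16_t q14) {
  return static_cast<double>(q14) * 100.0 / 16384.0;
}
```

For example, an expand_rate of 8192 corresponds to half of the audio in the reporting interval having been synthesized.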

2.3 NetEqLifetimeStatistics

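Lifetime statistics (accumulated since creation, never reset). Used for long-term quality assessment and reported through the getStats() API.

• total_samples_received: Total number of samples received.

• concealed_samples: Total number of samples synthesized through loss concealment.

• jitter_buffer_delay_ms: Accumulated jitter buffer delay.

• fec_packets_received / fec_packets_discarded: Number of forward error correction (FEC) packets received/discarded.

• interruption_count: Number of long interruptions, i.e. loss-concealment events lasting at least 150 ms.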

```cpp
// NetEq statistics that persist over the lifetime of the class.
// These metrics are never reset.
struct NetEqLifetimeStatistics {
  // Stats below correspond to similarly-named fields in the WebRTC stats spec.
  // https://w3c.github.io/webrtc-stats/#dom-rtcmediastreamtrackstats
  uint64_t total_samples_received = 0;
  uint64_t concealed_samples = 0;
  uint64_t concealment_events = 0;
  uint64_t jitter_buffer_delay_ms = 0;
  uint64_t jitter_buffer_emitted_count = 0;
  uint64_t jitter_buffer_target_delay_ms = 0;
  uint64_t inserted_samples_for_deceleration = 0;
  uint64_t removed_samples_for_acceleration = 0;
  uint64_t silent_concealed_samples = 0;
  uint64_t fec_packets_received = 0;
  uint64_t fec_packets_discarded = 0;
  // Below stats are not part of the spec.
  uint64_t delayed_packet_outage_samples = 0;
  // This is sum of relative packet arrival delays of received packets so far.
  // Since end-to-end delay of a packet is difficult to measure and is not
  // necessarily useful for measuring jitter buffer performance, we report a
  // relative packet arrival delay. The relative packet arrival delay of a
  // packet is defined as the arrival delay compared to the first packet
  // received, given that it had zero delay. To avoid clock drift, the "first"
  // packet can be made dynamic.
  uint64_t relative_packet_arrival_delay_ms = 0;
  uint64_t jitter_buffer_packets_received = 0;
  // An interruption is a loss-concealment event lasting at least 150 ms. The
  // two stats below count the number of such events and the total duration of
  // these events.
  int32_t interruption_count = 0;
  int32_t total_interruption_duration_ms = 0;
};
```
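Because these counters only ever grow, they are usually turned into ratios before being reported. The sketch below derives two common ones. `LifetimeStatsSnapshot` is a trimmed-down stand-in defined here for illustration, not the real struct, and the averaging follows the convention of the WebRTC stats spec (accumulated delay divided by emitted count), which is assumed to apply to these fields as well.

```cpp
#include <cstdint>

// Hypothetical stand-in holding only the fields used below.
struct LifetimeStatsSnapshot {
  std::uint64_t total_samples_received = 0;
  std::uint64_t concealed_samples = 0;
  std::uint64_t jitter_buffer_delay_ms = 0;
  std::uint64_t jitter_buffer_emitted_count = 0;
};

// Fraction of all samples that were produced by loss concealment.
double ConcealmentRatio(const LifetimeStatsSnapshot& s) {
  if (s.total_samples_received == 0) return 0.0;
  return static_cast<double>(s.concealed_samples) /
         static_cast<double>(s.total_samples_received);
}

// Average time spent in the jitter buffer, in ms.
double AverageJitterBufferDelayMs(const LifetimeStatsSnapshot& s) {
  if (s.jitter_buffer_emitted_count == 0) return 0.0;
  return static_cast<double>(s.jitter_buffer_delay_ms) /
         static_cast<double>(s.jitter_buffer_emitted_count);
}
```

Both helpers return 0 when nothing has been received yet, which avoids a division by zero right after startup.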

2.4 NetEqOperationsAndState

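Statistics on internal operations and state.

• preemptive_samples / accelerate_samples: Cumulative number of samples handled by preemptive expansion and acceleration.

• packet_buffer_flushes: Number of times the buffer was flushed (typically on severe reordering or timestamp jumps).

• current_buffer_size_ms: Current buffer size (the sum of the packet buffer and the jitter buffer).

• next_packet_available: Whether the next packet is ready.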

```cpp
// Metrics that describe the operations performed in NetEq, and the internal
// state.
struct NetEqOperationsAndState {
  // These sample counters are cumulative, and don't reset. As a reference, the
  // total number of output samples can be found in
  // NetEqLifetimeStatistics::total_samples_received.
  uint64_t preemptive_samples = 0;
  uint64_t accelerate_samples = 0;
  // Count of the number of buffer flushes.
  uint64_t packet_buffer_flushes = 0;
  // The number of primary packets that were discarded.
  uint64_t discarded_primary_packets = 0;
  // The statistics below are not cumulative.
  // The waiting time of the last decoded packet.
  uint64_t last_waiting_time_ms = 0;
  // The sum of the packet and jitter buffer size in ms.
  uint64_t current_buffer_size_ms = 0;
  // The current frame size in ms.
  uint64_t current_frame_size_ms = 0;
  // Flag to indicate that the next packet is available.
  bool next_packet_available = false;
};
```
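Since the sample counters in this struct are cumulative, per-interval activity has to be computed by subtracting two snapshots. A minimal sketch, using a hypothetical `OperationCounters` struct that copies only the cumulative fields:

```cpp
#include <cstdint>

// Hypothetical stand-in for the cumulative part of NetEqOperationsAndState.
struct OperationCounters {
  std::uint64_t preemptive_samples = 0;
  std::uint64_t accelerate_samples = 0;
  std::uint64_t packet_buffer_flushes = 0;
};

// Returns how much each counter grew between two snapshots. Assumes
// `current` was taken after `previous` on the same NetEq instance, so no
// counter has decreased.
OperationCounters Delta(const OperationCounters& previous,
                        const OperationCounters& current) {
  OperationCounters delta;
  delta.preemptive_samples =
      current.preemptive_samples - previous.preemptive_samples;
  delta.accelerate_samples =
      current.accelerate_samples - previous.accelerate_samples;
  delta.packet_buffer_flushes =
      current.packet_buffer_flushes - previous.packet_buffer_flushes;
  return delta;
}
```

Polling once per second and logging the delta gives a rough picture of how often NetEq is stretching or compressing audio at any given moment.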

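III. Core interfaces

3.1 Creation and destruction

1. static NetEq* Create(...):

Factory method that creates a new NetEq instance. It takes a config, a clock object, and a decoder factory.

2. virtual ~NetEq():

Virtual destructor; releases resources.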

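3.2 Data input

1. virtual int InsertPacket(...):

• The main entry point. Inserts an RTP packet (header and payload) into NetEq's jitter buffer.

• NetEq sorts packets by sequence number, removes duplicates, and handles reordering.

• Returns kOK (0) or kFail (-1).

2. virtual void InsertEmptyPacket(...):

• Inserts a packet with an empty payload. Typically used for bandwidth probing: it tells NetEq that these sequence numbers are taken, without decoding anything.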
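The sorting and de-duplication step described for InsertPacket can be sketched as follows. This is a simplified illustration, not the NetEq packet buffer: the class and function names are hypothetical, only sequence numbers are stored, and the real implementation may also take RTP timestamps into account.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// True if sequence number `a` is newer than `b`, taking 16-bit wraparound
// into account (e.g. 0 is newer than 65535).
bool IsNewerSequenceNumber(std::uint16_t a, std::uint16_t b) {
  return a != b && static_cast<std::uint16_t>(a - b) < 0x8000;
}

// Hypothetical minimal buffer that keeps sequence numbers sorted oldest
// first and rejects duplicates.
class SketchPacketBuffer {
 public:
  // Returns false if `seq` is already in the buffer.
  bool Insert(std::uint16_t seq) {
    // Walk from the back, since packets mostly arrive in order.
    std::size_t pos = packets_.size();
    while (pos > 0) {
      if (packets_[pos - 1] == seq) return false;  // Duplicate.
      if (IsNewerSequenceNumber(seq, packets_[pos - 1])) break;
      --pos;
    }
    packets_.insert(packets_.begin() + static_cast<std::ptrdiff_t>(pos), seq);
    return true;
  }

  const std::vector<std::uint16_t>& packets() const { return packets_; }

 private:
  std::vector<std::uint16_t> packets_;
};
```

Inserting 65534, 0, 65535, 0, 1 in that order leaves the buffer holding 65534, 65535, 0, 1: the late 65535 is slotted in before 0, and the second 0 is rejected.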

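3.3 Data output

1. virtual int GetAudio(...):

• The main exit point. Asks NetEq to output 10 ms of audio.

• audio_frame: Output parameter that is filled with PCM data.

• muted: Output parameter. If true, NetEq is currently in the muted state (after prolonged packet loss).

• action_override: For testing. Forces NetEq to perform a specific operation (such as a forced expand or accelerate).

• Internal logic: if the buffer has a packet, decode it; if there is no packet (loss), run PLC (for example waveform overlap-add or noise generation); if the delay is too high, accelerate playout; if the delay is too low, slow playout down.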
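Because every GetAudio call yields 10 ms of audio, the number of samples in the returned frame follows directly from the sample rate and channel count. A tiny sketch of that arithmetic, with a hypothetical helper name:

```cpp
#include <cstddef>

// Hypothetical helper: total number of interleaved samples in one 10 ms
// frame at the given sample rate and channel count.
std::size_t SamplesPer10Ms(int sample_rate_hz, std::size_t num_channels) {
  return static_cast<std::size_t>(sample_rate_hz / 100) * num_channels;
}
```

At 48 kHz mono this is 480 samples per call; at 16 kHz stereo it is 320.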

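3.4 Codec management

1. virtual void SetCodecs(...):

• Replaces the current decoder map in one go.

2. virtual bool RegisterPayloadType(...):

• Registers an RTP payload type for a specific audio format (such as OPUS or PCMU). When a packet with that payload type arrives, NetEq instantiates the corresponding decoder.

3. virtual int RemovePayloadType(...):

• Removes the given payload type.

4. virtual void RemoveAllPayloadTypes():

• Clears all registered codecs.

5. virtual absl::optional<DecoderFormat> GetDecoderFormat(...):

• Looks up the decoder format (sample rate, number of channels, and so on) for the given payload type.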

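3.5 Delay control

NetEq's core goal is to balance latency against smooth playout.

1. virtual bool SetMinimumDelay(int delay_ms):

• Sets the minimum buffering delay. NetEq tries to keep at least this much delay to absorb jitter.

2. virtual bool SetMaximumDelay(int delay_ms):

• Sets the maximum buffering delay. Even under heavy jitter, the delay will not exceed this value (which may lead to more packet loss or accelerated playout).

3. virtual bool SetBaseMinimumDelayMs(int delay_ms):

• Sets the base minimum delay. The value set through SetMinimumDelay cannot go below it.

4. virtual int GetBaseMinimumDelayMs() const:

• Returns the current base minimum delay.

5. virtual int TargetDelayMs() const:

• Returns the current target delay, the ideal delay NetEq computes dynamically from network conditions.

6. virtual int FilteredCurrentDelayMs() const:

• Returns the current actual delay after smoothing. It is the sum of the packet buffer and the sync buffer.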
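How the three limits interact can be sketched as a simple clamp. This is an illustration of the rules stated above rather than NetEq's actual code; `DelayBounds` and both functions are hypothetical, and treating a maximum of 0 as "no upper limit" is an assumption.

```cpp
#include <algorithm>

// Hypothetical container for the three limits discussed above.
struct DelayBounds {
  int base_minimum_delay_ms = 0;
  int minimum_delay_ms = 0;
  int maximum_delay_ms = 0;  // Assumption: 0 means no upper limit.
};

// The minimum delay cannot go below the base minimum delay.
int EffectiveMinimumDelayMs(const DelayBounds& bounds) {
  return std::max(bounds.base_minimum_delay_ms, bounds.minimum_delay_ms);
}

// Clamps a network-derived delay estimate to the configured bounds.
int ClampTargetDelayMs(int estimated_delay_ms, const DelayBounds& bounds) {
  int target = std::max(estimated_delay_ms, EffectiveMinimumDelayMs(bounds));
  if (bounds.maximum_delay_ms > 0) {
    target = std::min(target, bounds.maximum_delay_ms);
  }
  return target;
}
```

With a base minimum of 40 ms, a minimum of 20 ms, and a maximum of 200 ms, an estimate of 10 ms is raised to 40 ms and an estimate of 500 ms is capped at 200 ms.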

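3.6 Statistics

1. virtual int NetworkStatistics(...):

• Returns the current network statistics (NetEqNetworkStatistics). Note that the statistics are reset after the call.

2. virtual NetEqLifetimeStatistics GetLifetimeStatistics() const:

• Returns the statistics accumulated over the lifetime of the instance.

3. virtual NetEqOperationsAndState GetOperationsAndState() const:

• Returns internal operation state and cumulative counters.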

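3.7 VAD (voice activity detection)

1. virtual void EnableVad() / DisableVad():

• Enables/disables post-decode VAD. When enabled, GetAudio can flag output that contains no speech, which the upper layer can use to save power or bandwidth.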

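3.8 Other helper functions

1. virtual absl::optional<uint32_t> GetPlayoutTimestamp() const:

• Returns the RTP timestamp of the most recently output audio frame. Used for audio/video synchronization.

2. virtual int last_output_sample_rate_hz() const:

• Returns the sample rate of the most recently output audio.

3. virtual void FlushBuffers():

• Empties the buffers. Typically called on a stream switch, a seek, or a serious error; all data that has not been played yet is dropped.

4. virtual void EnableNack(size_t max_nack_list_size) / DisableNack():

• Enables/disables the NACK (negative acknowledgement) mechanism. When NACK is enabled, NetEq keeps track of lost packets.

5. virtual std::vector<uint16_t> GetNackList(int64_t round_trip_time_ms) const:

• Returns the list of RTP sequence numbers that should be retransmitted, based on the estimated RTT. The upper layer sends this list to the remote side in an RTCP NACK packet.

6. virtual std::vector<uint32_t> LastDecodedTimestamps() const:

• (For testing) Returns the timestamps of the packets decoded in the last GetAudio call.

7. virtual int SyncBufferSizeMs() const:

• (For testing) Returns the amount of audio still waiting to be played in the sync buffer.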
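The starting point for any NACK list is noticing a gap in the sequence numbers. The sketch below shows only that first step; it is not how NetEq builds the list returned by GetNackList, which also depends on the round-trip time. Both function names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// True if `a` is newer than `b` under 16-bit wraparound.
bool IsNewer(std::uint16_t a, std::uint16_t b) {
  return a != b && static_cast<std::uint16_t>(a - b) < 0x8000;
}

// Hypothetical helper: returns the sequence numbers strictly between the
// last received packet and a newly received one. Returns an empty list if
// the new packet is not newer (a duplicate or a late arrival).
std::vector<std::uint16_t> MissingBetween(std::uint16_t last_received,
                                          std::uint16_t newly_received) {
  std::vector<std::uint16_t> missing;
  if (!IsNewer(newly_received, last_received)) return missing;
  for (std::uint16_t seq = static_cast<std::uint16_t>(last_received + 1);
       seq != newly_received; ++seq) {
    missing.push_back(seq);
  }
  return missing;
}
```

A real tracker would additionally drop entries once the missing packet arrives and skip packets that could no longer arrive in time for playout given the RTT.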

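IV. Workflow

The network layer receives an RTP packet -> calls InsertPacket.
The audio thread requests audio every 10 ms -> calls GetAudio.
Inside GetAudio, NetEq decides:

• Packet available? -> decode -> output.

• No packet (loss)? -> PLC (Expand, then Merge once real packets arrive again) -> output synthesized audio.

• Delay too high? -> Accelerate (speed up playout by dropping a small number of samples).

• Delay too low? -> Preemptive Expand (stretch the audio to increase delay).

The upper layer monitors quality through NetworkStatistics and adjusts the strategy through SetMinimumDelay.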
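The decision step in this workflow can be written down as a small function. This is a deliberately simplified sketch, not the real DecisionLogic: the struct, the function, and especially the thresholds (1.5x and 0.5x of the target level) are assumptions chosen only to make the branches visible.

```cpp
enum class Operation {
  kNormal,
  kExpand,
  kMerge,
  kAccelerate,
  kPreemptiveExpand
};

// Hypothetical snapshot of what the decision needs to know.
struct BufferState {
  bool packet_available;      // Is the next packet in the buffer?
  bool last_was_expand;       // Was the previous output concealment?
  int buffer_level_ms;        // Audio currently buffered.
  int target_level_ms;        // Delay the buffer is aiming for.
};

Operation Decide(const BufferState& state) {
  // No packet to decode: conceal the loss.
  if (!state.packet_available) return Operation::kExpand;
  // First real packet after concealment: cross-fade back in.
  if (state.last_was_expand) return Operation::kMerge;
  // Illustrative thresholds, not the values WebRTC uses.
  if (state.buffer_level_ms > state.target_level_ms * 3 / 2) {
    return Operation::kAccelerate;
  }
  if (state.buffer_level_ms < state.target_level_ms / 2) {
    return Operation::kPreemptiveExpand;
  }
  return Operation::kNormal;
}
```

Checking packet availability first reflects the ordering in the list above: loss handling takes priority, and time stretching only happens when there is real audio to stretch.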