Streaming Audio over SSE in uni-app: Technical Design

Technical Design for Streaming Audio Playback in Mini Programs

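1. Background and Goals

AI chat scenarios call for streaming audio that plays while it is still being generated:

The server pushes PCM audio data as an SSE stream
The mini program receives the data in real time and plays it immediately, so the user hears the reply as it is spoken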

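2. Technology Selection

Approach	Pros	Cons
InnerAudioContext (direct URL playback)	Simple to implement	Needs a complete audio file, so it cannot play a stream
WebAudioContext + PCM	Supports streaming, precise scheduling, and gapless concatenation	Requires handling binary data

Final choice: WebAudioContext + streaming PCM data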

3. System Architecture

┌──────────────────────────────────────────────────────────────────────┐
│                                Server                                │
│  ┌──────────┐    ┌────────────────────────────────────────────────┐  │
│  │ AI model │───►│ SSE stream output (text/event-stream)          │  │
│  └──────────┘    │  ├── event: reply (text)                       │  │
│                  │  ├── event: audio (base64 PCM)                 │  │
│                  │  └── event: audio_done (full audio URL)        │  │
│                  └────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼ HTTP chunked transfer
┌──────────────────────────────────────────────────────────────────────┐
│                         Mini program client                          │
│  ┌─────────────────┐    ┌────────────────────┐    ┌───────────────┐  │
│  │ uni.request     │───►│ StreamDecoder      │───►│ SSE parser    │  │
│  │ enableChunked   │    │ ArrayBuffer→string │    │               │  │
│  │ onChunkReceived │    │ multi-byte safe    │    │               │  │
│  └─────────────────┘    └────────────────────┘    └───────┬───────┘  │
│                                                           │          │
│                     ┌─────────────────────────────────────┘          │
│                     ▼                                                │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │                 Audio player (useAudioPlayer)                  │  │
│  │  ┌───────────────────────┐  ┌───────────────────────────────┐  │  │
│  │  │ WebAudioContext       │  │ InnerAudioContext (fallback)  │  │  │
│  │  │ - streaming playback  │  │ - merges chunks into a WAV    │  │  │
│  │  │ - precise scheduling  │  │ - if WebAudio unavailable     │  │  │
│  │  │ - 16 kHz PCM decoding │  │                               │  │  │
│  │  └───────────────────────┘  └───────────────────────────────┘  │  │
│  └────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────┘

4. Core Implementation

4.1 SSE Streaming Request

Use the chunked mode of uni.request to receive the data:

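const task = uni.request({
  url: `${baseUrl}chat`,
  method: 'POST',
  header: {
    'Content-Type': 'application/json',
    'Accept': 'text/event-stream',
    'Authorization': token,
  },
  data: JSON.stringify({ ...params, stream: true }),
  enableChunked: true,           // enable chunked transfer
  responseType: 'arraybuffer',   // receive the body as binary
  // uni.request returns a Promise instead of a RequestTask unless at least
  // one of success / fail / complete is passed
  complete: () => {},
})

// Called for every chunk as it arrives
task.onChunkReceived((res: { data: ArrayBuffer }) => {
  const chunk = streamDecoder.decode(res.data)
  // Parse SSE events...
})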
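
The parsing step left as a comment in the callback could look like the following. This is a minimal sketch rather than the project's actual parser: `SseParser`, `SseEvent`, and `push` are hypothetical names, and it assumes events are separated by a blank line and use `event:` and `data:` fields. Unlike a browser `EventSource`, it also emits events that carry no `data` line, on the assumption that an event such as `done` arrives without a payload.

```typescript
// Hypothetical event shape used only for this sketch.
interface SseEvent {
  event: string
  data: string
}

class SseParser {
  // Text received so far that does not yet form a complete event
  private pending = ''

  // Feed decoded text; returns every event completed by this chunk.
  push(text: string): SseEvent[] {
    // Normalize CRLF after concatenation so a pair split across chunks is handled
    this.pending = (this.pending + text).replace(/\r\n/g, '\n')
    const events: SseEvent[] = []
    let sep = this.pending.indexOf('\n\n')
    while (sep !== -1) {
      const block = this.pending.slice(0, sep)
      this.pending = this.pending.slice(sep + 2)
      const parsed = this.parseBlock(block)
      if (parsed) events.push(parsed)
      sep = this.pending.indexOf('\n\n')
    }
    return events
  }

  private parseBlock(block: string): SseEvent | null {
    let event = 'message'
    let hasField = false
    const data: string[] = []
    for (const line of block.split('\n')) {
      if (line.startsWith(':')) continue // comment or keep-alive line
      const colon = line.indexOf(':')
      const field = colon === -1 ? line : line.slice(0, colon)
      let value = colon === -1 ? '' : line.slice(colon + 1)
      if (value.startsWith(' ')) value = value.slice(1) // one optional leading space
      if (field === 'event') { event = value; hasField = true }
      else if (field === 'data') { data.push(value); hasField = true }
    }
    return hasField ? { event, data: data.join('\n') } : null
  }
}
```

Inside `onChunkReceived`, each decoded string would be passed to `push`, and the returned events handed to the audio and text handlers. Keeping the unfinished tail in `pending` matters because a network chunk can end in the middle of an event.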

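4.2 StreamDecoder: Handling Split Multi-byte Characters

SSE data can be cut at a chunk boundary in the middle of a multi-byte UTF-8 sequence, which garbles Chinese text and emoji.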

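export class StreamDecoder {
  // Bytes of an incomplete trailing character, carried over to the next chunk
  private buffer: Uint8Array = new Uint8Array(0)
  // Native decoder if the runtime provides TextDecoder, otherwise null
  private decoder: TextDecoder | null =
    typeof TextDecoder !== 'undefined' ? new TextDecoder('utf-8') : null

  public decode(data: ArrayBuffer): string {
    const chunk = new Uint8Array(data)

    // Prefer the native TextDecoder; { stream: true } holds back partial characters
    if (this.decoder) {
      return this.decoder.decode(chunk, { stream: true })
    }

    // Fallback: append to the carried-over bytes and split on UTF-8 boundaries
    const fullBuffer = new Uint8Array(this.buffer.length + chunk.length)
    fullBuffer.set(this.buffer)
    fullBuffer.set(chunk, this.buffer.length)

    // Parse UTF-8 characters byte by byte...
    // (`out` collects the decoded text; `i` ends at the first byte of any incomplete character)
    this.buffer = fullBuffer.slice(i)  // keep the incomplete bytes
    return out
  }
}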
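
The byte-by-byte fallback is elided above. One way it could be written is sketched below as a standalone class. `ManualUtf8Decoder` is a hypothetical name, and the sketch assumes well-formed input: it derives the sequence length from the lead byte and does not validate continuation bytes. Invalid lead bytes are replaced with U+FFFD.

```typescript
class ManualUtf8Decoder {
  // Bytes of an incomplete trailing sequence, kept for the next call
  private rest: Uint8Array = new Uint8Array(0)

  decode(data: ArrayBuffer): string {
    const chunk = new Uint8Array(data)
    const bytes = new Uint8Array(this.rest.length + chunk.length)
    bytes.set(this.rest)
    bytes.set(chunk, this.rest.length)

    let out = ''
    let i = 0
    while (i < bytes.length) {
      const lead = bytes[i]
      // Sequence length from the lead byte; 0 marks an invalid lead byte
      const len =
        lead < 0x80 ? 1 :
        lead < 0xc0 ? 0 :
        lead < 0xe0 ? 2 :
        lead < 0xf0 ? 3 :
        lead < 0xf8 ? 4 : 0
      if (len === 0) {
        out += '\ufffd'
        i += 1
        continue
      }
      // The character continues in the next chunk: stop and keep its bytes
      if (i + len > bytes.length) break
      // Payload bits of the lead byte, then 6 bits per continuation byte
      let cp = len === 1 ? lead : lead & (0xff >> (len + 1))
      for (let k = 1; k < len; k++) {
        cp = (cp << 6) | (bytes[i + k] & 0x3f)
      }
      out += cp <= 0x10ffff ? String.fromCodePoint(cp) : '\ufffd'
      i += len
    }
    this.rest = bytes.slice(i)
    return out
  }
}
```

The important property is that a character whose bytes straddle two chunks produces no output on the first call and the full character on the second, which is the same behavior `TextDecoder` gives with `{ stream: true }`.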

4.3 Streaming Playback with WebAudioContext

Initialization (once only)

let mpAudioCtx: any = null
let mpNextStartTime = 0  // time at which the next chunk should start playing

// Create the context on first use
if (!mpAudioCtx) {
  mpAudioCtx = wx.createWebAudioContext()
  mpNextStartTime = mpAudioCtx.currentTime
}

PCM playback function

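function mpPlayPcmViaWebAudio(pcm: ArrayBuffer) {
  // 1. Int16 PCM → Float32Array
  const int16 = new Int16Array(pcm)
  const samples = new Float32Array(int16.length)
  for (let i = 0; i < int16.length; i++) {
    samples[i] = int16[i] / 32768  // normalize to [-1, 1)
  }

  // 2. Fade in/out at both ends to remove clicks at chunk boundaries
  applyFade(samples, 80)  // 5 ms @ 16 kHz

  // 3. Create the AudioBuffer
  const buf = mpAudioCtx.createBuffer(1, samples.length, 16000)
  const ch = buf.getChannelData(0)
  ch.set(samples)

  // 4. Create an AudioBufferSourceNode and schedule it
  const src = mpAudioCtx.createBufferSource()
  src.buffer = buf
  src.connect(mpAudioCtx.destination)
  // If the stream stalled and the cursor fell behind the clock, restart from now;
  // otherwise late chunks would all start immediately and overlap
  mpNextStartTime = Math.max(mpNextStartTime, mpAudioCtx.currentTime)
  src.start(mpNextStartTime)  // precise time scheduling

  // 5. Advance the cursor by this chunk's duration
  mpNextStartTime += samples.length / 16000
}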
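
The scheduling arithmetic can be isolated from the audio API to make it easier to reason about. The sketch below is illustrative only: `ChunkScheduler` and `leadTime` are hypothetical, and the clock value is passed in by the caller (in the player it would be `mpAudioCtx.currentTime`). Each chunk starts where the previous one ends, and the cursor is moved forward only when it has fallen behind the clock.

```typescript
class ChunkScheduler {
  private nextStart = 0
  private readonly sampleRate: number
  private readonly leadTime: number

  // leadTime is a small safety margin (seconds) used when restarting after a stall
  constructor(sampleRate = 16000, leadTime = 0.05) {
    this.sampleRate = sampleRate
    this.leadTime = leadTime
  }

  // Returns the start time for a chunk of `sampleCount` samples,
  // then advances the cursor by the chunk's duration.
  schedule(now: number, sampleCount: number): number {
    if (this.nextStart < now) {
      // Underrun: the previous audio already finished, so restart just ahead of the clock
      this.nextStart = now + this.leadTime
    }
    const startAt = this.nextStart
    this.nextStart += sampleCount / this.sampleRate
    return startAt
  }
}
```

At 16 kHz a 1600-sample chunk lasts 0.1 s, so two chunks scheduled back to back start 0.1 s apart no matter how quickly the second one arrives over the network.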

Fade-in and fade-out

function applyFade(data: Float32Array, fadeLen = 80) {
  // Never fade more than a quarter of the chunk at each end
  const f = Math.min(fadeLen, Math.floor(data.length / 4))
  if (f <= 0) return

  for (let i = 0; i < f; i++) {
    const gain = i / f
    data[i] *= gain                    // fade in at the start
    data[data.length - 1 - i] *= gain  // fade out at the end
  }
}

4.4 Fallback

When wx.createWebAudioContext() is unavailable, all chunks are merged into a single WAV file and played through InnerAudioContext:

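function onStreamDone() {
  if (mpAudioCtx) {
    // WebAudioContext mode: estimate the remaining playback time, then clear the state
    const remaining = Math.max(0, mpNextStartTime - mpAudioCtx.currentTime)
    setTimeout(() => {
      activeAudioId.value = null
    }, remaining * 1000 + 300)
  }
  else if (mpFallbackChunks.length > 0) {
    // Fallback mode: merge the chunks into a WAV file
    const wavBuffer = createWavFile(mpFallbackChunks)
    const path = `${wx.env.USER_DATA_PATH}/audio_${Date.now()}.wav`
    const fs = wx.getFileSystemManager()

    fs.writeFile({
      filePath: path,
      data: wavBuffer,
      success: () => {
        // Since base library 2.3.0 the per-context obeyMuteSwitch is ignored,
        // so the global option is set as well
        wx.setInnerAudioOption({ obeyMuteSwitch: false })
        const audio = wx.createInnerAudioContext()
        audio.obeyMuteSwitch = false
        audio.src = path
        // Release the player and delete the temporary file after playback
        audio.onEnded(() => {
          audio.destroy()
          fs.unlink({ filePath: path })
        })
        audio.play()
      }
    })
  }
}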
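
`createWavFile` is called above but not shown. A plausible version is sketched here, assuming the chunks are raw 16-bit little-endian PCM with the sample rate and channel count used elsewhere in this design (16 kHz, mono). The `sampleRate` and `channels` parameters are additions for illustration. The function prepends the standard 44-byte RIFF/WAVE header to the concatenated samples.

```typescript
function createWavFile(
  chunks: ArrayBuffer[],
  sampleRate = 16000,
  channels = 1,
): ArrayBuffer {
  const bitsPerSample = 16
  const blockAlign = (channels * bitsPerSample) / 8
  const dataLength = chunks.reduce((sum, c) => sum + c.byteLength, 0)

  const wav = new ArrayBuffer(44 + dataLength)
  const view = new DataView(wav)
  const writeAscii = (offset: number, text: string) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i))
    }
  }

  writeAscii(0, 'RIFF')
  view.setUint32(4, 36 + dataLength, true)          // file size minus the first 8 bytes
  writeAscii(8, 'WAVE')
  writeAscii(12, 'fmt ')
  view.setUint32(16, 16, true)                      // size of the fmt chunk
  view.setUint16(20, 1, true)                       // audio format 1 = linear PCM
  view.setUint16(22, channels, true)
  view.setUint32(24, sampleRate, true)
  view.setUint32(28, sampleRate * blockAlign, true) // byte rate
  view.setUint16(32, blockAlign, true)
  view.setUint16(34, bitsPerSample, true)
  writeAscii(36, 'data')
  view.setUint32(40, dataLength, true)

  // Copy the PCM chunks right after the header, in arrival order
  const bytes = new Uint8Array(wav)
  let offset = 44
  for (const c of chunks) {
    bytes.set(new Uint8Array(c), offset)
    offset += c.byteLength
  }
  return wav
}
```

All multi-byte header fields are little-endian, hence the `true` argument on every `DataView` setter.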

5. SSE Event Types

Event	Data format	Description
`session`	`{ session_id, msg_id }`	Session created
`reply`	`{ content: string }`	Incremental text content
`reasoning`	`{ content: string }`	Reasoning process
`audio`	`{ chunk: base64 }`	PCM audio chunk
`audio_done`	`{ audio_url: string }`	URL of the complete audio
`audio_error`	`{ error: string }`	Audio error
`done`	-	End of the text stream
`error`	`{ content: string }`	Error message
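
To illustrate how these events might be routed, the sketch below decodes the base64 payload of an `audio` event and dispatches a few event types to handlers. `handleSseEvent` and the `AudioHandlers` interface are hypothetical, and only three events are covered. The base64 decoder is written out so the example is self-contained; where the platform's `uni.base64ToArrayBuffer` is available it can be used instead.

```typescript
const B64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

function base64ToArrayBuffer(b64: string): ArrayBuffer {
  const clean = b64.replace(/[^A-Za-z0-9+/]/g, '') // drop padding and whitespace
  const buf = new ArrayBuffer(Math.floor((clean.length * 3) / 4))
  const out = new Uint8Array(buf)
  let acc = 0  // bit accumulator
  let bits = 0 // number of valid bits in acc
  let o = 0
  for (let i = 0; i < clean.length; i++) {
    acc = (acc << 6) | B64_ALPHABET.indexOf(clean[i])
    bits += 6
    if (bits >= 8) {
      bits -= 8
      out[o++] = (acc >> bits) & 0xff
      acc &= (1 << bits) - 1 // keep only the bits not yet emitted
    }
  }
  return buf
}

// Hypothetical handler set; the real player exposes its own callbacks.
interface AudioHandlers {
  onReply(text: string): void
  onAudio(pcm: ArrayBuffer): void
  onAudioDone(url: string): void
}

function handleSseEvent(event: string, data: string, h: AudioHandlers): void {
  const payload = data ? JSON.parse(data) : {}
  switch (event) {
    case 'reply':
      h.onReply(payload.content)
      break
    case 'audio':
      h.onAudio(base64ToArrayBuffer(payload.chunk))
      break
    case 'audio_done':
      h.onAudioDone(payload.audio_url)
      break
    default:
      break // session, reasoning, done, error, ... are omitted in this sketch
  }
}
```

The `ArrayBuffer` passed to `onAudio` is the raw Int16 PCM that the playback function expects.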

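6. State Management

// Streaming playback state
let streamStopped = false   // whether playback was stopped deliberately
let streamDone = false       // whether the stream has ended

// WebAudioContext state
let mpAudioCtx: any = null    // singleton context
let mpNextStartTime = 0      // scheduled start time of the next chunk
let mpFallbackChunks: ArrayBuffer[] = []  // chunk buffer for fallback mode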

7. Lifecycle

┌──────────────────────────────────────────────────────────────────────┐
│                         SSE audio lifecycle                          │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  sendChat()                                                          │
│     │                                                                │
│     ▼                                                                │
│  resetStream()  ──► reset all state                                  │
│     │                                                                │
│     ▼                                                                │
│  sendChatSSE() ──► send request; onChunkReceived starts receiving    │
│     │                                                                │
│     ├──► onAudio(chunk) ──► onStreamChunk() ──► play PCM             │
│     │                                                                │
│     └──► onAudioDone ──► onStreamDone() ──► close after est. delay   │
│                                                                      │
│  stopAll() / cancelGeneration()                                      │
│     │                                                                │
│     ▼                                                                │
│  cleanupStream() ──► abort playback, release resources               │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘

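8. Key Parameters

Parameter	Value	Description
Sample rate	16000 Hz	16 kHz audio sampling
Bit depth	16-bit	Int16 PCM format
Channels	1	Mono
Fade duration	5 ms (80 samples)	Removes clicks at chunk boundaries
State-clear delay	remaining * 1000 + 300 ms	Waits for the last chunk to finish playing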

9. Notes

9.1 iOS Compatibility

iOS allows audio playback only after a user interaction
Call mpAudioCtx.resume() before the first playback
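
A small guard around `resume()` could look like this. It is a sketch under the assumption that the context exposes a `state` property and a promise-returning `resume()`, as the Web Audio API does. `ensureAudioRunning` and `ResumableContext` are hypothetical names, and the function is meant to be called from a tap handler so that the user-interaction requirement is met.

```typescript
// Minimal shape of the context that this sketch relies on.
interface ResumableContext {
  state: string
  resume(): Promise<void>
}

// Resolves to true when the context is running, false if it could not be resumed.
async function ensureAudioRunning(ctx: ResumableContext): Promise<boolean> {
  if (ctx.state === 'running') return true
  try {
    await ctx.resume()
    return ctx.state === 'running'
  } catch {
    // Resume was rejected, for example because no user gesture has occurred yet
    return false
  }
}
```

A caller could fall back to the WAV path when this resolves to `false`, instead of scheduling chunks on a suspended context.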

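9.2 Android Compatibility

WebAudioContext availability varies across devices
The fallback path keeps basic playback working

9.3 Memory Management

Release each chunk's buffer as soon as it has been played
Clean up the WebAudioContext when the SSE stream ends
Delete the fallback mode's temporary file after playback

9.4 Interruption Handling

Save the msg_id when the user leaves the page
Try to resume the SSE stream when the user returns
If resuming fails, mark the message status accordingly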

10. Related Files

File	Responsibility
`src/api/chat.ts`	Sends the SSE request and parses the stream
`src/utils/streamDecoder.ts`	ArrayBuffer → string decoding
`src/composables/useAudioPlayer.ts`	Core audio player logic
`src/composables/useChatSSE.ts`	Connects SSE events to the player
`src/pages/chat/index.vue`	Chat page integration