A Deep Dive into the Web Speech API: Speech Recognition and Synthesis

When the browser can speak and understand speech, the interaction boundaries of web apps are redrawn

1. Introduction

1.1 What Is the Web Speech API

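The Web Speech API is a browser-native JavaScript interface, defined in a W3C draft specification rather than a finished Recommendation, that gives web pages speech recognition (turning speech into text) and speech synthesis (turning text into speech). Unlike speech solutions built on a backend service, an AI model, or a third-party SDK, it needs no API key, no server code of your own, and no dependencies. "Client-side" describes your code, not always the processing: synthesis can use voices installed on the device, while Chrome's recognition sends audio to Google's servers by default (see 5.2).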

// Three lines of code to make a page speak
const utterance = new SpeechSynthesisUtterance("你好，世界！"); // "Hello, world!" in Chinese
utterance.lang = 'zh-CN';
speechSynthesis.speak(utterance);

The API consists of two core modules:

Module	Function	Analogy
SpeechRecognition	Speech → text	The ears: listening to the user
SpeechSynthesis	Text → speech	The mouth: answering the user

1.2 What Problem It Solves

Before the Web Speech API, developers who wanted speech features had two options:

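Call a cloud ASR/TTS API (such as Google Cloud Speech or Azure Speech): this requires a server and payment, and you have to deal with network latency and privacy concerns
Use a Flash or Java applet plugin: both have been removed from modern browsers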

The Web Speech API resolves this dilemma by giving web apps a lightweight, free, and responsive way to add voice interaction.

1.3 Why It Deserves Attention

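This is a classic piece of "native API trivia": powerful, yet little known. Most front-end developers don't realize the browser already ships speech capabilities, and those who do often hold back over compatibility concerns. For the following scenarios, though, the Web Speech API is close to a zero-cost solution: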

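Quickly prototyping voice interaction
Building accessibility features
Local voice interaction in Electron desktop apps (synthesis works; verify recognition in your Electron build)
Offline speech output in PWAs (synthesis with locally installed voices; recognition usually needs a network connection)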

2. Core Features in Detail

2.1 Speech Synthesis

2.1.1 API Structure and Core Methods

The speech synthesis API is compact. Its entry point is window.speechSynthesis, which speaks SpeechSynthesisUtterance objects:

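// TypeScript type definitions (simplified for reference; TypeScript's lib.dom.d.ts already declares these)
interface SpeechSynthesis extends EventTarget {
  readonly pending: boolean;   // utterances are waiting in the queue
  readonly speaking: boolean;  // an utterance is currently being spoken
  readonly paused: boolean;    // playback is paused
  onvoiceschanged: ((this: SpeechSynthesis, ev: Event) => void) | null;
  speak(utterance: SpeechSynthesisUtterance): void;  // add an utterance to the queue
  cancel(): void;                                    // clear the queue and stop speaking
  pause(): void;
  resume(): void;
  getVoices(): SpeechSynthesisVoice[];               // may be empty until voiceschanged fires
}

interface SpeechSynthesisUtterance extends EventTarget {
  text: string;           // text to speak
  lang: string;           // BCP-47 language tag, e.g. 'zh-CN'
  voice: SpeechSynthesisVoice | null;  // specific voice to use
  volume: number;         // volume 0-1, default 1
  rate: number;           // speaking rate 0.1-10, default 1
  pitch: number;          // pitch 0-2, default 1

  // Events
  onstart: ((this: SpeechSynthesisUtterance, ev: SpeechSynthesisEvent) => void) | null;
  onend: ((this: SpeechSynthesisUtterance, ev: SpeechSynthesisEvent) => void) | null;
  onerror: ((this: SpeechSynthesisUtterance, ev: SpeechSynthesisErrorEvent) => void) | null;
  onpause: ((this: SpeechSynthesisUtterance, ev: SpeechSynthesisEvent) => void) | null;
  onresume: ((this: SpeechSynthesisUtterance, ev: SpeechSynthesisEvent) => void) | null;
  onmark: ((this: SpeechSynthesisUtterance, ev: SpeechSynthesisEvent) => void) | null;
  onboundary: ((this: SpeechSynthesisUtterance, ev: SpeechSynthesisEvent) => void) | null;
}

interface SpeechSynthesisVoice {
  voiceURI: string;
  name: string;
  lang: string;
  localService: boolean;  // provided by a local synthesis service rather than a remote one
  default: boolean;       // the platform's default voice
}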
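Because getVoices() can return dozens of voices, an app usually needs a rule for choosing one. The sketch below assumes a simple ranking: an exact language-tag match first, then the same primary language, preferring local voices and then the platform default. VoiceInfo, normalizeTag and pickVoice are hypothetical names; real SpeechSynthesisVoice objects have the same fields, so they can be passed in directly.

```typescript
// Structural stand-in for SpeechSynthesisVoice so the helper can run outside a browser.
interface VoiceInfo {
  name: string;
  lang: string;
  localService: boolean;
  default: boolean;
}

// Normalize a language tag for comparison: lowercase, and defensively accept '_' as a separator.
function normalizeTag(tag: string): string {
  return tag.replace(/_/g, '-').toLowerCase();
}

// Rank voices: exact tag (100) beats same primary language (50);
// within a tier, local voices (+10) beat remote ones, and the default voice (+1) breaks ties.
function pickVoice<V extends VoiceInfo>(voices: readonly V[], lang: string): V | undefined {
  const wanted = normalizeTag(lang);
  const primary = wanted.split('-')[0];

  const score = (v: V): number => {
    const tag = normalizeTag(v.lang);
    let s: number;
    if (tag === wanted) s = 100;
    else if (tag.split('-')[0] === primary) s = 50;
    else return -1; // wrong language: never selected
    if (v.localService) s += 10;
    if (v.default) s += 1;
    return s;
  };

  let best: V | undefined;
  let bestScore = -1;
  for (const v of voices) {
    const s = score(v);
    if (s > bestScore) {
      // strict '>' keeps the first voice on ties
      best = v;
      bestScore = s;
    }
  }
  return best;
}
```

In a browser: `utterance.voice = pickVoice(speechSynthesis.getVoices(), 'zh-CN') ?? null;`. Returning undefined for an unmatched language lets the caller fall back to the default voice instead of reading Chinese text with, say, an English voice.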

2.1.2 Basic Usage Example

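// Basic usage
if (!('speechSynthesis' in window)) {
  // Check support and bail out if the API is missing
  console.error('Speech synthesis is not supported in this browser');
} else {
  const synth = window.speechSynthesis;

  // Create an utterance ("Welcome to the speech synthesis feature")
  const utterance = new SpeechSynthesisUtterance('欢迎使用语音合成功能');

  // Configure parameters
  utterance.lang = 'zh-CN';
  utterance.rate = 1.0;   // normal rate
  utterance.pitch = 1.0;  // normal pitch
  utterance.volume = 1.0; // full volume

  // Register event handlers before speak() so no event is missed
  utterance.onstart = () => console.log('Speaking started');
  utterance.onend = () => console.log('Speaking finished');
  utterance.onerror = (e) => console.error('Speaking failed:', e.error);

  // Start speaking
  synth.speak(utterance);
}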

2.1.3 Wrapping It as a Vue 3 Composable

// composables/useSpeechSynthesis.ts
import { ref, onMounted, onUnmounted } from 'vue';

export interface SpeechOptions {
  lang?: string;
  rate?: number;
  pitch?: number;
  volume?: number;
  voice?: SpeechSynthesisVoice;
}

export function useSpeechSynthesis() {
  const isSupported = ref(false);
  const isSpeaking = ref(false);
  const voices = ref<SpeechSynthesisVoice[]>([]);

  // Refresh the list of available voices
  const loadVoices = () => {
    voices.value = window.speechSynthesis.getVoices();
  };

  // Feature detection and initialization
  onMounted(() => {
    isSupported.value = 'speechSynthesis' in window;
    if (!isSupported.value) return;

    loadVoices();
    // Chrome loads voices asynchronously and fires voiceschanged when they are ready;
    // addEventListener (unlike assigning onvoiceschanged) doesn't overwrite other listeners
    window.speechSynthesis.addEventListener('voiceschanged', loadVoices);
  });

  // Speak text
  const speak = (text: string, options: SpeechOptions = {}) => {
    if (!isSupported.value) {
      console.warn('Speech synthesis is unavailable');
      return;
    }

    // Cancel anything currently speaking or queued (speak() would otherwise enqueue)
    window.speechSynthesis.cancel();

    const utterance = new SpeechSynthesisUtterance(text);

    if (options.lang) utterance.lang = options.lang;
    if (options.rate !== undefined) utterance.rate = options.rate;
    if (options.pitch !== undefined) utterance.pitch = options.pitch;
    if (options.volume !== undefined) utterance.volume = options.volume;
    if (options.voice) utterance.voice = options.voice;

    utterance.onstart = () => { isSpeaking.value = true; };
    utterance.onend = () => { isSpeaking.value = false; };
    utterance.onerror = () => { isSpeaking.value = false; };

    window.speechSynthesis.speak(utterance);
  };

  // Stop speaking
  const stop = () => {
    window.speechSynthesis?.cancel();
    isSpeaking.value = false;
  };

  // Pause / resume
  const pause = () => window.speechSynthesis?.pause();
  const resume = () => window.speechSynthesis?.resume();

  // Cleanup: remove the listener and stop any speech
  onUnmounted(() => {
    if (isSupported.value) {
      window.speechSynthesis.removeEventListener('voiceschanged', loadVoices);
    }
    stop();
  });

  return {
    isSupported,
    isSpeaking,
    voices,
    speak,
    stop,
    pause,
    resume
  };
}

<!-- Usage in a component -->
<template>
  <div class="speech-demo">
    <textarea v-model="text" placeholder="Enter text to read aloud..."></textarea>
    <select v-model="selectedVoice">
      <option v-for="v in voices" :key="v.voiceURI" :value="v">
        {{ v.name }} ({{ v.lang }})
      </option>
    </select>
    <div class="controls">
      <button @click="handleSpeak" :disabled="!isSupported || isSpeaking">🔊 Speak</button>
      <button @click="stop">⏹ Stop</button>
    </div>
  </div>
</template>

<script setup lang="ts">
import { ref } from 'vue';
import { useSpeechSynthesis } from './composables/useSpeechSynthesis';

const text = ref('你好，这是一段测试文本。'); // "Hello, this is a test sentence."
const selectedVoice = ref<SpeechSynthesisVoice | null>(null);

const { isSupported, isSpeaking, voices, speak, stop } = useSpeechSynthesis();

// Speak with the selected voice, or fall back to the default voice for zh-CN
const handleSpeak = () => {
  speak(text.value, {
    lang: selectedVoice.value?.lang ?? 'zh-CN',
    voice: selectedVoice.value ?? undefined
  });
};
</script>

2.1.4 Configurable Parameters

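Parameter	Range	Default	Description
`rate`	0.1 - 10	1	Speaking rate; 0.5 is half speed, 2 is double speed
`pitch`	0 - 2	1	Pitch; 0.5 sounds lower, 2 higher
`volume`	0 - 1	1	Volume; 0 is silent
`lang`	BCP-47 tag	Document language, else browser default	e.g. 'zh-CN', 'en-US', 'ja-JP'
`voice`	SpeechSynthesisVoice	System default	Selects a specific voice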
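The ranges in this table can be enforced before values reach the engine, rather than relying on each browser's handling of out-of-range input. A minimal sketch; SpeechParams, clampOr and normalizeSpeechParams are hypothetical names, and the defaults mirror the table:

```typescript
// Options mirroring the numeric utterance properties in the table above.
interface SpeechParams {
  rate?: number;
  pitch?: number;
  volume?: number;
}

// Clamp a value into [min, max]; undefined or NaN falls back to the default.
function clampOr(value: number | undefined, min: number, max: number, fallback: number): number {
  if (value === undefined || Number.isNaN(value)) return fallback;
  return Math.min(max, Math.max(min, value));
}

// Normalize user-supplied parameters to the documented ranges before assigning them to an utterance.
function normalizeSpeechParams(p: SpeechParams): Required<SpeechParams> {
  return {
    rate: clampOr(p.rate, 0.1, 10, 1),
    pitch: clampOr(p.pitch, 0, 2, 1),
    volume: clampOr(p.volume, 0, 1, 1),
  };
}
```

Usage: `Object.assign(utterance, normalizeSpeechParams({ rate: sliderValue }))`, where sliderValue stands for any user-controlled input.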

2.2 Speech Recognition

2.2.1 API Structure and Core Methods

Recognition is more involved than synthesis because it requires microphone access and state management:

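// TypeScript type definitions (simplified; not part of TypeScript's built-in DOM typings)
interface SpeechRecognition extends EventTarget {
  grammars: SpeechGrammarList;
  lang: string;                    // recognition language
  continuous: boolean;             // keep listening after the first final result
  interimResults: boolean;         // report interim (non-final) results
  maxAlternatives: number;         // maximum number of alternatives per result
  start(): void;                   // start listening (triggers the microphone permission prompt)
  stop(): void;                    // stop and return results for the audio captured so far
  abort(): void;                   // stop immediately without returning results

  // Events
  onaudiostart: ((this: SpeechRecognition, ev: Event) => void) | null;
  onaudioend: ((this: SpeechRecognition, ev: Event) => void) | null;
  onspeechstart: ((this: SpeechRecognition, ev: Event) => void) | null;
  onspeechend: ((this: SpeechRecognition, ev: Event) => void) | null;
  onstart: ((this: SpeechRecognition, ev: Event) => void) | null;
  onend: ((this: SpeechRecognition, ev: Event) => void) | null;
  onerror: ((this: SpeechRecognition, ev: SpeechRecognitionErrorEvent) => void) | null;
  onnomatch: ((this: SpeechRecognition, ev: SpeechRecognitionEvent) => void) | null;
  onresult: ((this: SpeechRecognition, ev: SpeechRecognitionEvent) => void) | null;
}

interface SpeechRecognitionEvent extends Event {
  resultIndex: number;                   // index of the first result that changed
  results: SpeechRecognitionResultList;
}

interface SpeechRecognitionResultList {
  readonly length: number;
  [index: number]: SpeechRecognitionResult;
}

interface SpeechRecognitionResult {
  readonly isFinal: boolean;             // whether this result is final
  readonly length: number;               // number of alternatives
  [index: number]: SpeechRecognitionAlternative;
}

interface SpeechRecognitionAlternative {
  transcript: string;      // recognized text
  confidence: number;      // confidence 0-1
}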

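Note: Chrome and Edge expose the constructor only with a prefix, as webkitSpeechRecognition. Firefox does not ship the API, and Safari (14.1+) provides a prefixed implementation that differs in places.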
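Because the constructor name differs between browsers, it helps to resolve it in one place. A small sketch, assuming a hypothetical getRecognitionConstructor helper that accepts any object exposing the two possible names (in a browser, pass window); it prefers the unprefixed constructor and returns null when neither is a function:

```typescript
// Anything that may expose the standard or the webkit-prefixed constructor (e.g. window).
type RecognitionScope = {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
};

// Prefer the unprefixed constructor, fall back to the prefixed one, else return null.
function getRecognitionConstructor<T = unknown>(scope: RecognitionScope): T | null {
  const ctor = scope.SpeechRecognition ?? scope.webkitSpeechRecognition;
  return typeof ctor === 'function' ? (ctor as T) : null;
}
```

In a browser: `const Ctor = getRecognitionConstructor<new () => SpeechRecognition>(window as any); const recognition = Ctor ? new Ctor() : null;`. Taking the scope as a parameter keeps the helper testable without a DOM.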

2.2.2 Basic Usage Example

// Feature detection (Chromium-based browsers expose only the webkit-prefixed constructor)
const SpeechRecognitionCtor = window.SpeechRecognition || window.webkitSpeechRecognition;

function startRecognition() {
  if (!SpeechRecognitionCtor) {
    console.error('Speech recognition is not supported in this browser');
    return;
  }

  // Create a recognizer
  const recognition = new SpeechRecognitionCtor();

  // Configure parameters
  recognition.lang = 'zh-CN';           // recognition language
  recognition.continuous = false;       // single-shot recognition
  recognition.interimResults = true;    // report interim results
  recognition.maxAlternatives = 1;      // keep only the best alternative

  // Handle results
  recognition.onresult = (event: SpeechRecognitionEvent) => {
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const transcript = event.results[i][0].transcript;
      const isFinal = event.results[i].isFinal;

      console.log(`Result [${isFinal ? 'final' : 'interim'}]: ${transcript}`);

      if (isFinal) {
        // Final result: hand it off (handleCommand is your own, hypothetical handler)
        handleCommand(transcript);
      }
    }
  };

  // Handle errors
  recognition.onerror = (event: SpeechRecognitionErrorEvent) => {
    console.error('Recognition error:', event.error);

    switch (event.error) {
      case 'no-speech':
        console.warn('No speech detected');
        break;
      case 'audio-capture':
        console.error('Cannot access the microphone');
        break;
      case 'not-allowed':
        console.error('Microphone permission denied');
        break;
      case 'network':
        console.error('Network error');
        break;
    }
  };

  // Session ended
  recognition.onend = () => {
    console.log('Recognition ended');
  };

  // Start listening only after all handlers are attached
  recognition.start();
}
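The loop inside onresult is the part most worth isolating: it walks from resultIndex to the end of the list and separates finalized text from interim text, reading only the top alternative ([0]). A sketch with hypothetical structural types (AlternativeLike, ResultLike, ResultListLike) and a hypothetical splitResults function, so it can be tested without a browser; a real event's results list satisfies these types:

```typescript
interface AlternativeLike {
  transcript: string;
  confidence: number;
}
interface ResultLike {
  readonly isFinal: boolean;
  readonly length: number;
  readonly [index: number]: AlternativeLike;
}
interface ResultListLike {
  readonly length: number;
  readonly [index: number]: ResultLike;
}

// Split one result event into newly finalized text and the current interim text.
function splitResults(results: ResultListLike, resultIndex: number): { finalText: string; interimText: string } {
  let finalText = '';
  let interimText = '';
  for (let i = resultIndex; i < results.length; i++) {
    const result = results[i];
    if (result.length === 0) continue; // defensive: a result with no alternatives
    const text = result[0].transcript;  // best alternative only
    if (result.isFinal) finalText += text;
    else interimText += text;
  }
  return { finalText, interimText };
}
```

Inside the handler: `const { finalText, interimText } = splitResults(event.results, event.resultIndex);`, then append finalText to the stored transcript and replace the displayed interim text.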

2.2.3 Wrapping It as a Vue 3 Composable (with Full Error Handling)

// composables/useSpeechRecognition.ts
import { ref, onUnmounted } from 'vue';

export type RecognitionStatus = 'idle' | 'listening' | 'processing' | 'error';

export interface UseSpeechRecognitionOptions {
  lang?: string;
  continuous?: boolean;
  interimResults?: boolean;
  onResult?: (transcript: string, isFinal: boolean) => void;
  onError?: (error: string) => void;
}

export function useSpeechRecognition(options: UseSpeechRecognitionOptions = {}) {
  const status = ref<RecognitionStatus>('idle');
  const transcript = ref('');
  const interimTranscript = ref('');
  const error = ref<string | null>(null);
  const isSupported = ref(false);
  
  // Feature detection (prefixed in Chromium-based browsers)
  const SpeechRecognitionAPI = window.SpeechRecognition || window.webkitSpeechRecognition;
  let recognition: SpeechRecognition | null = null;
  
  if (SpeechRecognitionAPI) {
    isSupported.value = true;
    recognition = new SpeechRecognitionAPI();
    
    // Default configuration
    recognition.lang = options.lang || 'zh-CN';
    recognition.continuous = options.continuous ?? false;
    recognition.interimResults = options.interimResults ?? true;
    recognition.maxAlternatives = 1;
    
    // Event handlers
    recognition.onstart = () => {
      status.value = 'listening';
      error.value = null;
    };
    
    recognition.onresult = (event: SpeechRecognitionEvent) => {
      interimTranscript.value = '';
      
      for (let i = event.resultIndex; i < event.results.length; i++) {
        const result = event.results[i];
        const text = result[0].transcript;
        
        if (result.isFinal) {
          transcript.value += text;
          interimTranscript.value = '';
          options.onResult?.(text, true);
        } else {
          interimTranscript.value += text;
          options.onResult?.(text, false);
        }
      }
    };
    
    recognition.onerror = (event: SpeechRecognitionErrorEvent) => {
      status.value = 'error';
      error.value = event.error;
      options.onError?.(event.error);
    };
    
    recognition.onend = () => {
      if (status.value !== 'error') {
        status.value = 'idle';
      }
    };
  }
  
  // Control methods
  const start = () => {
    if (!recognition) {
      error.value = 'not-supported';
      return;
    }
    
    try {
      interimTranscript.value = '';
      transcript.value = '';
      recognition.start();
    } catch (e) {
      // start() throws (InvalidStateError) if a session is already running
      console.error('Failed to start recognition:', e);
    }
  };
  
  const stop = () => {
    recognition?.stop();
  };
  
  const abort = () => {
    recognition?.abort();
  };
  
  const reset = () => {
    transcript.value = '';
    interimTranscript.value = '';
    error.value = null;
    status.value = 'idle';
  };
  
  // Cleanup
  onUnmounted(() => {
    abort();
  });
  
  return {
    status,
    transcript,
    interimTranscript,
    error,
    isSupported,
    start,
    stop,
    abort,
    reset
  };
}

<!-- Usage in a component -->
<template>
  <div class="voice-input">
    <div class="status">
      Status: <span :class="status">{{ statusText }}</span>
    </div>
    
    <div class="transcript-display">
      <p class="final">{{ transcript }}</p>
      <p class="interim" v-if="interimTranscript">{{ interimTranscript }}</p>
    </div>
    
    <div class="controls">
      <button 
        @click="isListening ? stop() : start()" 
        :disabled="!isSupported"
        :class="{ recording: isListening }"
      >
        {{ isListening ? '⏹ Stop' : '🎤 Start listening' }}
      </button>
      <button @click="reset" :disabled="!transcript">Clear</button>
    </div>
    
    <p v-if="error" class="error">{{ errorMessage }}</p>
  </div>
</template>

<script setup lang="ts">
import { computed } from 'vue';
import { useSpeechRecognition } from './composables/useSpeechRecognition';

const {
  status,
  transcript,
  interimTranscript,
  error,
  isSupported,
  start,
  stop,
  reset
} = useSpeechRecognition({
  lang: 'zh-CN',
  continuous: false,
  interimResults: true,
  onResult: (text, isFinal) => {
    console.log(`[${isFinal ? 'final' : 'interim'}] ${text}`);
  },
  onError: (err) => {
    console.error('Recognition error:', err);
  }
});

const isListening = computed(() => status.value === 'listening');
const statusText = computed(() => {
  const map = {
    idle: 'Idle',
    listening: 'Listening...',
    processing: 'Processing',
    error: 'Error'
  };
  return map[status.value];
});

const errorMessage = computed(() => {
  const map: Record<string, string> = {
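    'no-speech': 'No speech detected, please try again',
    'audio-capture': 'Cannot access the microphone',
    'not-allowed': 'Microphone permission denied, please allow access',
    'network': 'Network error, please check your connection',
    'not-supported': 'Speech recognition is not supported in this browser'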
  };
  return map[error.value || ''] || 'An unknown error occurred';
});
</script>

2.2.4 Advanced Usage: Grammars and Contextual Biasing

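For domain-specific vocabulary, the specification lets you supply grammar rules intended to improve accuracy. In practice, current engines such as Chrome are reported to largely ignore grammars, so treat them as a hint rather than a guarantee; Chrome's newer, still-experimental phrases list (contextual biasing) is the more promising route: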

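// Define a grammar for color words (JSGF format)
const colors = ['red', 'blue', 'green', 'yellow', 'black', 'white', 'orange'];
const grammar = `#JSGF V1.0; grammar colors; public <color> = ${colors.join(' | ')};`;

// Chromium exposes both constructors only with a webkit prefix
const GrammarListCtor = (window as any).SpeechGrammarList || (window as any).webkitSpeechGrammarList;
const RecognitionCtor = (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;

const speechRecognitionList = new GrammarListCtor();
speechRecognitionList.addFromString(grammar, 1);  // weight 1 (valid range 0-1)

const recognition = new RecognitionCtor();
recognition.grammars = speechRecognitionList;

// Newer Chrome versions experiment with phrase-based contextual biasing.
// Recent spec drafts describe entries as SpeechRecognitionPhrase objects (phrase, boost);
// the API is still changing, so feature-detect both pieces before using it.
const PhraseCtor = (window as any).SpeechRecognitionPhrase;
if ('phrases' in recognition && typeof PhraseCtor === 'function') {
  recognition.phrases = [
    new PhraseCtor('TypeScript', 5.0),
    new PhraseCtor('JavaScript', 3.0),
    new PhraseCtor('Vue', 4.0)
  ];
}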
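Building the JSGF string by hand gets error-prone once the word list comes from data. A sketch of a hypothetical buildJsgfGrammar helper that emits a single public rule in the same format as the colors example; it deduplicates and trims words, and rejects characters with special meaning in JSGF rather than trying to escape them. Since engines may ignore grammars, this only guarantees a well-formed string, not better recognition:

```typescript
// Build "#JSGF V1.0; grammar <name>; public <rule> = w1 | w2 ...;" from a word list.
function buildJsgfGrammar(grammarName: string, ruleName: string, words: readonly string[]): string {
  const identifier = /^[A-Za-z_][A-Za-z0-9_.]*$/;
  if (!identifier.test(grammarName) || !identifier.test(ruleName)) {
    throw new Error('grammar and rule names must be simple identifiers');
  }

  // Characters that are JSGF syntax; rejected instead of escaped to keep the sketch simple.
  const special = /[;|<>=()\[\]{}*+"\/\\]/;

  // Trim, drop empty entries, and deduplicate while keeping first-seen order.
  const cleaned = Array.from(new Set(words.map((w) => w.trim()).filter((w) => w.length > 0)));
  if (cleaned.length === 0) throw new Error('at least one word is required');
  for (const w of cleaned) {
    if (special.test(w)) throw new Error(`unsupported character in token: ${w}`);
  }

  return `#JSGF V1.0; grammar ${grammarName}; public <${ruleName}> = ${cleaned.join(' | ')};`;
}
```

The result can be passed to addFromString exactly like the hand-written grammar above.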

3. Use Cases

3.1 Accessibility (Augmenting Screen Readers)

For users with visual impairments, speech synthesis can read page content aloud:

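// Read the main page content aloud
const readPageContent = () => {
  // innerText (unlike textContent) skips hidden elements and <script>/<style> contents
  const root = document.querySelector('main') ?? document.body;
  const mainContent = root.innerText.trim();
  if (mainContent) {
    const utterance = new SpeechSynthesisUtterance(mainContent);
    utterance.lang = 'zh-CN';
    utterance.rate = 0.9;  // slightly slower for easier comprehension
    speechSynthesis.cancel(); // don't queue behind earlier utterances
    speechSynthesis.speak(utterance);
  }
};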
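Passing an entire page to a single utterance is fragile: very long utterances are widely reported to stop early in some Chrome versions, and one utterance cannot be paused or skipped section by section. A minimal sketch, assuming a hypothetical splitForSpeech helper that packs sentences (split on Chinese and Western end punctuation) into chunks of at most maxLen characters, hard-splitting any sentence that is still too long:

```typescript
// Split text into speakable chunks of at most maxLen UTF-16 code units.
function splitForSpeech(text: string, maxLen = 200): string[] {
  if (maxLen < 1) throw new RangeError('maxLen must be at least 1');

  // A sentence is a run of non-terminators plus its trailing punctuation; newlines act as separators.
  const sentences = (text.match(/[^。！？；!?;.\n]+[。！？；!?;.]*|[。！？；!?;.]+/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks: string[] = [];
  let current = '';
  for (const sentence of sentences) {
    // Hard-split sentences that exceed the limit on their own.
    for (let i = 0; i < sentence.length; i += maxLen) {
      const piece = sentence.slice(i, i + maxLen);
      const candidate = current ? `${current} ${piece}` : piece;
      if (candidate.length <= maxLen) {
        current = candidate; // still fits: keep packing
      } else {
        chunks.push(current); // flush the full chunk and start a new one
        current = piece;
      }
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk becomes its own utterance, and speak() queues them in order: `for (const chunk of splitForSpeech(text)) speechSynthesis.speak(new SpeechSynthesisUtterance(chunk));`. The splitting is naive; it also breaks at periods inside numbers and abbreviations.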

3.2 Voice Search and Commands

A voice search feature in the spirit of Google Assistant:

// Voice command handling. The trigger words are Chinese: 搜索 = "search", 打开 = "open".
// `router` is assumed to be your app's router instance (e.g. Vue Router).
const handleVoiceSearch = (query: string) => {
  if (query.includes('搜索')) {
    const keyword = query.replace('搜索', '').trim();
    window.location.href = `/search?q=${encodeURIComponent(keyword)}`;
  } else if (query.includes('打开')) {
    const page = query.replace('打开', '').trim();
    router.push(`/${page}`);
  }
};
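Matching with includes() fires on any transcript that merely contains a keyword. A sketch that separates parsing from navigation: a hypothetical parseVoiceCommand returns a small tagged union that the caller then routes with window.location or the router. It matches a trigger only at the start of the transcript, strips trailing punctuation the recognizer may append, and accepts English triggers as an assumed addition to the original Chinese ones:

```typescript
type VoiceCommand =
  | { kind: 'search'; keyword: string }
  | { kind: 'open'; page: string }
  | { kind: 'unknown'; text: string };

const SEARCH_TRIGGERS = ['搜索', 'search']; // 搜索 = "search"
const OPEN_TRIGGERS = ['打开', 'open'];     // 打开 = "open"

// Return the argument following a leading trigger word, or null if no trigger matches.
function stripTrigger(text: string, triggers: readonly string[]): string | null {
  const lower = text.toLowerCase();
  for (const t of triggers) {
    if (!lower.startsWith(t)) continue;
    const rest = text.slice(t.length);
    // Latin triggers must be followed by whitespace ("search x", not "searchlight").
    if (/^[a-z]/.test(t) && !/^\s/.test(rest)) continue;
    const arg = rest.trim();
    if (arg) return arg;
  }
  return null;
}

function parseVoiceCommand(transcript: string): VoiceCommand {
  // Trim whitespace and any sentence-final punctuation the recognizer appended.
  const text = transcript.trim().replace(/[。.!！?？]+$/, '');
  const keyword = stripTrigger(text, SEARCH_TRIGGERS);
  if (keyword !== null) return { kind: 'search', keyword };
  const page = stripTrigger(text, OPEN_TRIGGERS);
  if (page !== null) return { kind: 'open', page };
  return { kind: 'unknown', text };
}
```

Routing then becomes a switch on command.kind, and the parser can be tested without a microphone.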

3.3 Reading Content in Multiple Languages

Reading content aloud in the user's language is particularly useful on multilingual sites:

const speakLocalized = (content: Record<string, string>) => {
  const lang = navigator.language || 'en';
  const text = content[lang] || content.en;

  const utterance = new SpeechSynthesisUtterance(text);
  // Use the language of the text actually chosen, not just the requested locale
  utterance.lang = content[lang] ? lang : 'en';

  // Adjust the rate per language
  if (utterance.lang.startsWith('ja')) {
    utterance.rate = 0.8;  // slightly slower for Japanese
  }

  speechSynthesis.speak(utterance);
};
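Falling straight back to English skips closer matches, such as zh text for a zh-CN user. A sketch of a hypothetical resolveLocalized helper that tries the exact tag, then the primary language subtag, then a fallback key, and returns the tag it actually used so utterance.lang always matches the text:

```typescript
// Resolve localized text with the chain: exact tag -> primary language -> fallback key.
function resolveLocalized(
  content: Readonly<Record<string, string>>,
  requested: string,
  fallback = 'en',
): { lang: string; text: string } | null {
  const candidates = [requested, requested.split('-')[0], fallback];
  for (const lang of candidates) {
    const text = content[lang];
    // typeof check also ignores inherited keys such as 'constructor'
    if (typeof text === 'string' && text.length > 0) return { lang, text };
  }
  return null;
}
```

Usage: `const hit = resolveLocalized(content, navigator.language); if (hit) { const u = new SpeechSynthesisUtterance(hit.text); u.lang = hit.lang; speechSynthesis.speak(u); }`.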

3.4 Real-Time Captions

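Transcribe speech into live captions, for example during a meeting. By default the recognizer listens to the microphone, so audio from a video playing in the page is only captured if the microphone can hear it: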

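// createCaptionElement and updateInterimCaption are app-specific (hypothetical) DOM helpers
const generateLiveCaption = () => {
  const RecognitionCtor = (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
  const recognition = new RecognitionCtor();
  recognition.continuous = true;
  recognition.interimResults = true;

  const captions: HTMLElement[] = [];

  recognition.onresult = (event: SpeechRecognitionEvent) => {
    let interim = '';

    for (let i = event.resultIndex; i < event.results.length; i++) {
      const transcript = event.results[i][0].transcript;

      if (event.results[i].isFinal) {
        // Final result: append it to the caption history
        captions.push(createCaptionElement(transcript, true));
      } else {
        // Concatenate every pending interim segment
        interim += transcript;
      }
    }

    // Update the interim caption line
    updateInterimCaption(interim);
  };

  recognition.start();
  return recognition; // lets the caller stop() it later
};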
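The captions array above grows for as long as the session runs. A sketch of a hypothetical CaptionBuffer that keeps only the most recent finalized lines plus the current interim line, which suits an on-screen caption overlay; it expects the finalized and interim text collected from each result event:

```typescript
// Holds the last maxLines finalized captions plus the current interim line.
class CaptionBuffer {
  private readonly maxLines: number;
  private finals: string[] = [];
  private interim = '';

  constructor(maxLines = 3) {
    if (maxLines < 1) throw new RangeError('maxLines must be at least 1');
    this.maxLines = maxLines;
  }

  // Feed one recognition update: finalized text (may be empty) and the latest interim text.
  update(finalText: string, interimText: string): void {
    const done = finalText.trim();
    if (done) {
      this.finals.push(done);
      // Drop the oldest lines so the overlay doesn't grow without bound.
      if (this.finals.length > this.maxLines) {
        this.finals.splice(0, this.finals.length - this.maxLines);
      }
    }
    this.interim = interimText.trim();
  }

  // Lines to render, oldest first; the interim line (if any) comes last.
  lines(): string[] {
    return this.interim ? [...this.finals, this.interim] : [...this.finals];
  }
}
```

Each result event calls update() once and re-renders lines(), replacing the per-caption DOM helpers.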

3.5 Voice Interaction in Electron Desktop Apps

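Electron apps run a known Chromium version, so there are no Safari compatibility concerns, and speech synthesis works as it does in Chrome. Speech recognition is less certain: Chrome's implementation relies on Google's cloud service, and stock Electron builds are commonly reported to fail with a 'network' error, so verify it in your target build before depending on it. Microphone access is granted through a session permission handler: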

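// Main process: grant microphone access when the renderer requests it.
// (webPreferences has no "permissions" option; Electron uses a session-level handler.)
import { app, BrowserWindow, session } from 'electron';

app.whenReady().then(() => {
  session.defaultSession.setPermissionRequestHandler((_webContents, permission, callback) => {
    // 'media' covers camera/microphone requests; a real app should also check the requesting origin
    callback(permission === 'media');
  });

  const mainWindow = new BrowserWindow({
    webPreferences: {
      nodeIntegration: false,
      contextIsolation: true
    }
  });
});

// Renderer process (a separate file)
const enableVoiceControl = () => {
  const recognition = new webkitSpeechRecognition();
  recognition.continuous = true;
  recognition.interimResults = false;

  // Even in continuous mode the session eventually ends (silence, errors),
  // so restart it while voice control is on (appState is an assumed app-level store)
  recognition.onend = () => {
    if (appState.voiceEnabled) {
      setTimeout(() => recognition.start(), 100);
    }
  };

  recognition.start();
};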
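Restarting unconditionally in onend can become a tight loop: if a session ends immediately (for example because microphone permission was denied), onend fires again right away. A sketch of a hypothetical RestartGuard that allows at most maxRestarts restarts within a sliding time window; the default limits are assumptions:

```typescript
// Allows at most maxRestarts restarts within any windowMs-long sliding window.
class RestartGuard {
  private readonly maxRestarts: number;
  private readonly windowMs: number;
  private stamps: number[] = [];

  constructor(maxRestarts = 5, windowMs = 60_000) {
    this.maxRestarts = maxRestarts;
    this.windowMs = windowMs;
  }

  // Returns true (and records the restart) if another restart is allowed at time `now` in ms.
  tryRestart(now: number = Date.now()): boolean {
    // Forget restarts that have slid out of the window.
    this.stamps = this.stamps.filter((t) => now - t < this.windowMs);
    if (this.stamps.length >= this.maxRestarts) return false;
    this.stamps.push(now);
    return true;
  }
}
```

In onend: `if (appState.voiceEnabled && guard.tryRestart()) setTimeout(() => recognition.start(), 100);`. Errors such as not-allowed should disable auto-restart outright rather than merely slowing it down.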

3.6 Other Scenarios

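Scenario	Description
Education	Pronunciation practice for language learners, reading listening material aloud
Games / entertainment	Voice-controlled characters, interactive stories
Form input	Filling in forms by voice, voice search
Notifications	Reading important messages aloud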

4. Advantages

4.1 Native Browser Support, No Third-Party Libraries

This is the Web Speech API's most obvious advantage: there is no SDK to install and no external dependency to load:

<!-- Only the browser is needed; the native API is used directly -->
<script src="app.js"></script>
<!-- No <script src="speech-sdk.js"></script> required -->

4.2 Cross-Platform Reach

Write once and run on every platform that supports the API:

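Desktop browsers on Windows, macOS, and Linux
Chrome on Android
Safari on iOS (mainly synthesis; recognition support is partial)
Electron apps (synthesis; verify recognition in your build)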

4.3 Free to Use

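Unlike paid services such as Google Cloud Speech or Azure Speech, the Web Speech API costs developers nothing and has no published usage quotas. Chrome's server-backed recognition may still apply undocumented limits, although typical usage generally stays within them.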

4.4 Low Latency

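Latency depends on where the processing happens:

Speech synthesis: with a locally installed voice there is no network round trip, so playback usually starts almost immediately (the exact delay depends on platform and voice)
Speech recognition: close to real time in Chromium browsers, although audio is streamed to a server by default, so network conditions matter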

4.5 Seamless Integration with the Web Platform

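// Combining with other Web APIs (recognition, ws and selectedVoice are assumed to be set up elsewhere)
const demo = (recognition: SpeechRecognition, ws: WebSocket, selectedVoice: SpeechSynthesisVoice | null) => {
  // Speech recognition + WebSocket: send each final transcript to a translation service
  recognition.onresult = (event) => {
    for (let i = event.resultIndex; i < event.results.length; i++) {
      if (event.results[i].isFinal) {
        ws.send(JSON.stringify({ text: event.results[i][0].transcript, lang: 'zh' }));
      }
    }
  };

  // Speech synthesis: read the translation that comes back
  ws.onmessage = (msg) => {
    const utterance = new SpeechSynthesisUtterance(String(msg.data));
    utterance.voice = selectedVoice;
    // The synthesized audio is not exposed as a stream, so it cannot be routed
    // through the Web Audio API (AudioContext) for effects such as voice changing
    speechSynthesis.speak(utterance);
  };
};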

4.6 TypeScript Type Support

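Speech synthesis types ship with TypeScript's built-in DOM library (lib.dom.d.ts). Speech recognition types are not included there; the community package @types/dom-speech-recognition provides them as global declarations, so they are referenced rather than imported: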

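// After `npm install --save-dev @types/dom-speech-recognition`, reference the global
// declarations (or add "dom-speech-recognition" to compilerOptions.types in tsconfig.json)
/// <reference types="dom-speech-recognition" />

// Synthesis types come from lib.dom.d.ts; recognition types from the package above
let utterance: SpeechSynthesisUtterance;
let recognition: SpeechRecognition;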

5. Drawbacks and Limitations

5.1 Browser Compatibility

This is the Web Speech API's biggest pain point:

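Feature	Chrome	Edge	Firefox	Safari	iOS Safari
SpeechSynthesis	✅ 33+	✅ 14+	✅ 49+	✅ 7+	✅ 7+
SpeechRecognition	✅ 25+ (prefixed)	✅ 79+ (prefixed)	❌ (experimental, behind a flag)	⚠️ 14.1+ (prefixed, partial)	⚠️ 14.5+ (prefixed, partial)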

Speech recognition is unavailable in Firefox and only partially supported in Safari, which is a major limitation.

5.2 Speech Recognition Needs a Network Connection (in Some Cases)

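By default, Chrome's speech recognition relies on Google's servers:

⚠️ On some browsers and devices, speech recognition sends audio to the cloud for processing.

Workaround: recent Chrome versions are adding on-device speech recognition. The API below comes from the latest specification draft and is still experimental, so feature-detect it: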

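const RecognitionCtor = (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
const recognition = new RecognitionCtor();
recognition.processLocally = true;  // process on the device (experimental)

// Check whether an on-device language pack is available
if (typeof RecognitionCtor.available === 'function') {
  RecognitionCtor.available({
    langs: ['zh-CN'],
    processLocally: true
  }).then((status: string) => {
    if (status === 'available') {
      // recognition can run offline
    }
  });
}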

5.3 Recognition Accuracy Depends on the Environment

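Factor	Impact
Background noise	Accuracy drops sharply once the signal-to-noise ratio falls below roughly 15 dB
Microphone quality	Directional microphones outperform omnidirectional ones
Speaking distance	Recognition degrades beyond about 1.5 m
Accent	Dialects and non-native accents can cause misrecognition
Technical terms	Out-of-vocabulary words are hard to recognize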

5.4 Limited Language Support

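Chrome's recognition covers many languages, but other browsers and devices may support only a subset. There is no standard way to enumerate recognition languages; for synthesis, you can at least check which languages the installed voices cover: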

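// List the primary languages covered by the installed synthesis voices.
// getVoices() may return an empty array until the voiceschanged event has fired.
const getSupportedLanguages = () => {
  const voices = speechSynthesis.getVoices();
  const langs = new Set(voices.map(v => v.lang.split('-')[0]));
  return Array.from(langs);
};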

5.5 Privacy and Security

Voice data may contain sensitive information:

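Chrome: by default, audio is sent to Google's servers for processing
Data retention: it is unclear whether the server retains the audio
Eavesdropping risk: malicious sites could abuse speech recognition (the browser's microphone permission prompt is the main safeguard)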

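Best practice: serve the page over HTTPS (microphone access requires a secure context) and tell users clearly how their voice data is handled.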

5.6 Performance Overhead

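Microphone access: keeping the microphone open drains the battery faster
Audio processing: long recognition sessions can lead to memory growth
CPU usage: continuous recognition can use a noticeable amount of CPU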

6. Best Practices and Caveats

6.1 Feature Detection and Fallbacks

// Layered detection
const checkSpeechSupport = () => {
  // Layer 1: check the global objects
  const SpeechRecognitionAPI = window.SpeechRecognition || window.webkitSpeechRecognition;
  const SpeechSynthesisAPI = 'speechSynthesis' in window;

  // Layer 2: check that the constructor actually works
  // (constructing does not request the microphone; only start() does)
  let canInstantiate = false;
  try {
    if (SpeechRecognitionAPI) {
      new SpeechRecognitionAPI();
      canInstantiate = true;
    }
  } catch (e) {
    console.warn('Unable to instantiate SpeechRecognition');
  }

  return {
    synthesis: SpeechSynthesisAPI,
    recognition: canInstantiate
  };
};

// Fallback strategy
const speakWithFallback = (text: string, voice?: SpeechSynthesisVoice) => {
  if ('speechSynthesis' in window) {
    const utterance = new SpeechSynthesisUtterance(text);
    if (voice) utterance.voice = voice;
    speechSynthesis.speak(utterance);
  } else {
    // Fallback: use something other than the Web Speech API,
    // e.g. suggest the operating system's accessibility features
    console.warn('Speech synthesis is not supported in this environment');
  }
};

6.2 Error Handling

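// Comprehensive error handling (showToast is an assumed UI helper)
let retryCount = 0;      // reset this in onresult after a successful result
const MAX_RETRIES = 3;

recognition.onerror = (event: SpeechRecognitionErrorEvent) => {
  const errorMessages: Record<string, string> = {
    'no-speech': 'No speech detected; move closer to the microphone and try again',
    'audio-capture': 'Cannot access the microphone; check that it is connected',
    'not-allowed': 'Microphone permission denied; allow access in your settings',
    'network': 'Network connection failed; check your connection and try again',
    'bad-grammar': 'Invalid grammar',
    'language-not-supported': 'This language is not supported; please switch languages',
    'service-not-allowed': 'The speech service is not available',
    'aborted': 'Recognition was aborted'
  };

  const message = errorMessages[event.error] || 'An unknown error occurred';
  console.error(`Speech recognition error [${event.error}]:`, message);

  // User-friendly error message
  showToast(message, 'error');

  // Retry automatically after transient errors, with a cap to avoid an endless loop
  if (['no-speech', 'network'].includes(event.error) && retryCount < MAX_RETRIES) {
    retryCount++;
    setTimeout(() => recognition.start(), 2000);
  }
};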
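A fixed two-second retry treats every transient failure the same. A sketch of a hypothetical retryDelay policy with capped exponential backoff: it returns the delay before the next attempt, or null when the error is not retryable or the attempt budget is spent. Treating only no-speech and network as retryable follows the example above; the numeric defaults are assumptions:

```typescript
// Error codes treated as transient, matching the example above.
const RETRYABLE_ERRORS = new Set(['no-speech', 'network']);

// attempt = number of retries already made (0-based).
// Returns the delay in ms before the next retry, or null to give up.
function retryDelay(
  errorCode: string,
  attempt: number,
  baseMs = 1000,
  maxAttempts = 3,
  capMs = 8000,
): number | null {
  if (!RETRYABLE_ERRORS.has(errorCode)) return null; // e.g. not-allowed: retrying won't help
  if (attempt >= maxAttempts) return null;           // budget exhausted
  return Math.min(capMs, baseMs * 2 ** attempt);     // 1 s, 2 s, 4 s, ... capped
}
```

Reset the attempt counter after a successful result. Scheduling start() from onend rather than directly from onerror avoids starting a new session while the previous one is still shutting down.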

6.3 UX Improvements

// 1. Clear prompts for the user
const VoicePrompt = {
  INITIAL: 'Click the button and start speaking',
  LISTENING: '🎤 Listening...',
  PROCESSING: 'Recognizing...',
  RESULT: 'Result',
  ERROR: 'Recognition failed, please try again'
};

// 2. Visual feedback
const visualFeedback = () => {
  // Toggle a CSS class while audio is being captured
  recognition.onaudiostart = () => {
    document.body.classList.add('listening');
  };

  recognition.onaudioend = () => {
    document.body.classList.remove('listening');
  };
};

// 3. Confirm before stopping (transcript is the accumulated text, assumed to be in scope)
const confirmBeforeStop = () => {
  // For long continuous sessions, ask the user to confirm
  if (recognition.continuous && transcript.length > 100) {
    if (!confirm('Stop speech recognition?')) {
      return;
    }
  }
  recognition.stop();
};
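Keeping the prompt logic in a pure transition function makes the event handlers trivial and the behavior testable. A sketch, assuming a hypothetical nextPrompt that maps the current VoicePrompt key and a recognition event name to the next key:

```typescript
// UI states, matching the VoicePrompt keys above.
type PromptState = 'INITIAL' | 'LISTENING' | 'PROCESSING' | 'RESULT' | 'ERROR';

// Subset of recognition event names the UI reacts to.
type RecognitionEventName = 'start' | 'speechend' | 'result' | 'error' | 'end';

function nextPrompt(state: PromptState, event: RecognitionEventName, isFinal = false): PromptState {
  switch (event) {
    case 'start':
      return 'LISTENING';
    case 'speechend':
      // The user stopped talking; the engine is still working on the result.
      return state === 'LISTENING' ? 'PROCESSING' : state;
    case 'result':
      return isFinal ? 'RESULT' : state;
    case 'error':
      return 'ERROR';
    case 'end':
      // Keep a result or error on screen; otherwise return to the initial prompt.
      return state === 'RESULT' || state === 'ERROR' ? state : 'INITIAL';
  }
}
```

Each handler then becomes one line that updates the state and shows VoicePrompt[state].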

6.4 Performance Tips

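import { onUnmounted } from 'vue';

// 1. Parameter tuning
const optimizedRecognition = () => {
  const RecognitionCtor = (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
  const recognition = new RecognitionCtor();

  // Single-shot mode releases the microphone after the first final result
  recognition.continuous = false;

  // Request interim results only if you display live text
  recognition.interimResults = false;

  // Limit the number of alternatives
  recognition.maxAlternatives = 1;

  // Set the language explicitly: the API does not auto-detect it, and an unset
  // lang falls back to the document's language or the browser default
  recognition.lang = 'zh-CN';

  return recognition;
};

// 2. Speech synthesis tuning
const optimizedSynthesis = (text: string, lang = 'zh-CN') => {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.lang = lang;

  // Prefer a local voice for the target language to avoid network latency
  const localVoice = speechSynthesis.getVoices().find(v => v.localService && v.lang === lang);
  if (localVoice) utterance.voice = localVoice;

  utterance.rate = 1.0;  // the default; avoids speech that is too fast or too slow
  return utterance;
};

// 3. Clean up promptly (call from a Vue component's setup, since it uses onUnmounted)
const cleanup = (recognition: SpeechRecognition | null) => {
  // Stop recognition when the page is hidden to save resources
  const onVisibilityChange = () => {
    if (document.hidden) recognition?.stop();
  };
  document.addEventListener('visibilitychange', onVisibilityChange);

  // Release everything when the component unmounts
  onUnmounted(() => {
    document.removeEventListener('visibilitychange', onVisibilityChange);
    recognition?.abort();
    speechSynthesis.cancel();
  });
};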
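The cleanup above depends on Vue's onUnmounted and on remembering every listener to remove. A framework-neutral sketch, assuming a hypothetical createDisposer that collects teardown callbacks and runs them in reverse order, continuing past (and returning) any errors:

```typescript
// Collects teardown callbacks so everything registered during setup can be released at once.
function createDisposer() {
  const cleanups: Array<() => void> = [];
  return {
    // Register a cleanup callback.
    add(fn: () => void): void {
      cleanups.push(fn);
    },
    // Run cleanups in reverse registration order; one failure doesn't stop the rest.
    dispose(): Error[] {
      const errors: Error[] = [];
      while (cleanups.length > 0) {
        const fn = cleanups.pop()!;
        try {
          fn();
        } catch (e) {
          errors.push(e instanceof Error ? e : new Error(String(e)));
        }
      }
      return errors;
    },
  };
}
```

Register each resource as it is created, e.g. `disposer.add(() => recognition.abort())`, then call `disposer.dispose()` from onUnmounted, a React effect cleanup, or a pagehide handler.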

7. Browser Compatibility

7.1 Support by Browser

Speech Synthesis

Browser	Minimum version	Notes
Chrome	33+	Full support
Edge	14+	Full support
Firefox	49+	Full support
Safari	7+	Full support
Opera	21+	Full support
iOS Safari	7+	Full support
Android Chrome	33+	Full support

Speech Recognition

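Browser	Minimum version	Notes
Chrome	25+	Supported (prefixed); needs a network connection (some devices support on-device recognition)
Edge	79+	Supported (prefixed)
Firefox	Experimental	Behind a flag; not usable by default
Safari	14.1+	Partial (prefixed webkitSpeechRecognition)
iOS Safari	14.5+	Partial (prefixed webkitSpeechRecognition)
Opera	15+	Full support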

7.2 Mobile vs. Desktop

Platform	Speech Synthesis	Speech Recognition
Android Chrome	✅	✅
iOS Safari	✅	⚠️ Partial (iOS 14.5+, prefixed)
macOS Safari	✅	⚠️ Partial (Safari 14.1+, prefixed)
Windows Chrome	✅	✅
Linux Chrome	✅	✅

7.3 Compatibility Code Template

// Complete compatibility template
const createSpeechObjects = () => {
  // Speech synthesis
  const synthesis = 'speechSynthesis' in window 
    ? window.speechSynthesis 
    : null;
  
  // Speech recognition (with the vendor-prefixed fallback)
  const SpeechRecognitionAPI = 
    window.SpeechRecognition || 
    (window as any).webkitSpeechRecognition || 
    null;
  
  let recognition = null;
  if (SpeechRecognitionAPI) {
    try {
      recognition = new SpeechRecognitionAPI();
      // Default configuration
      recognition.continuous = false;
      recognition.interimResults = false;
    } catch (e) {
      console.error('Failed to create a speech recognition instance:', e);
    }
  }
  
  return { synthesis, recognition };
};

// Usage example
const { synthesis, recognition } = createSpeechObjects();

if (!synthesis) {
  console.warn('Speech synthesis is not supported in this browser');
}

if (!recognition) {
  console.warn('Speech recognition is not supported in this browser; Chrome or Edge is recommended');
}

8. Summary

8.1 Key Takeaways

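Point	Summary
API components	SpeechRecognition (listening) + SpeechSynthesis (speaking)
Biggest strength	Browser-native, zero dependencies, free
Biggest limitation	Speech recognition: no Firefox support and only partial Safari support
Good fit	Prototypes, accessibility features, Electron apps, Chrome extensions
Poor fit	Audiences on Safari, production environments that demand high accuracy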

8.2 Choosing the Right Tool

When the Web Speech API is a good choice:

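✅ Quickly validating a voice interaction concept
✅ Electron desktop apps (no Safari concerns)
✅ Chrome extensions
✅ Accessibility features
✅ Internal tools (where you control the browser)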

When a cloud API is the better choice:

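❌ Reliable Safari support is required
❌ Higher accuracy is required
❌ Guaranteed offline recognition is required (some vendors offer on-device SDKs)
❌ Custom speech models are required
❌ Enterprise production environments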

8.3 Outlook

The Web Speech API is still evolving:

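On-device recognition: Chrome is moving toward fully offline speech recognition
Firefox support: Mozilla has an experimental speech recognition implementation behind a flag
New event model: finer-grained control over events
Richer grammar support: more powerful contextual biasing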

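For front-end developers, knowing the Web Speech API is a real advantage in voice interaction. Whether for today's prototypes or tomorrow's interaction ideas, this "speech engine hidden in the browser" is well worth your time.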


This article was compiled with AI assistance.