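A complete approach to integrating speech recognition in a Flutter HarmonyOS project

Who this is for

Anyone who wants to add speech recognition to a Flutter HarmonyOS project
Anyone who can already write a MethodChannel but has not yet gotten the voice pipeline working end to end
Anyone who wants to understand how permissions, the engine, callbacks, and page state fit together
Developers who want to chain speech recognition into an AI conversation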

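Background

Speech recognition looks like a single-button feature, but it is really a complete pipeline spanning three layers:

Flutter page layer: must be able to start and stop recognition, and give the user visual feedback while "listening"
Flutter coordination layer: receives the recognition result and decides what happens next (fill the input box? send it straight to the AI?)
HarmonyOS native layer: obtains microphone permission first, then creates the recognition engine, registers callbacks, and finally releases resources when done

If these layers are not separated up front, the result is a feature that "recognizes speech but is unpleasant to use": an undeclared permission causes a crash, an engine that was never cleaned up makes the second call fail, or page state tangled with recognition state makes the UI flicker.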

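A real scenario from the project

In 食界探味, speech recognition serves the AI assistant (AI 探味助手). On the AI assistant page the user can "hold to talk"; on release, the recognized text is submitted to the AI automatically, so the experience feels as natural as talking to a voice assistant.

The code for this pipeline is spread across six files: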

app/lib/features/ai_assistant/screens/ai_assistant_screen.dart   ← page layer (UI + interaction)
app/lib/core/ai/ai_explore_coordinator.dart                      ← coordination layer (state orchestration)
app/lib/core/ai/models/ai_session_state.dart                     ← state model
app/lib/core/platform/speech_recognition_channel.dart            ← platform channel (Flutter side)
app/ohos/entry/src/main/ets/plugins/SpeechRecognitionPlugin.ets  ← HarmonyOS plugin (ArkTS side)
app/ohos/entry/src/main/module.json5                             ← project config (permission declaration)

The page layer never touches Core Speech Kit directly, and does not even call the channel directly; everything goes through AiExploreCoordinator.

Architecture at a glance

┌─────────────────────────────────────────────────────────────┐
│                    Flutter (Dart)                            │
│                                                             │
│  AiAssistantScreen                                          │
│  ┌──────────────┐    ┌─────────────────────────────────┐    │
│  │ Hold to talk │───▶│ AiExploreCoordinator            │    │
│  │ Let go: stop │    │  startVoiceInput()              │    │
│  └──────────────┘    │  stopVoiceInput()               │    │
│                      │  text → submitQuery()           │    │
│                      └──────────┬──────────────────────┘    │
│                                 │                            │
│                      ┌──────────▼──────────────────────┐    │
│                      │ SpeechRecognitionChannel        │    │
│                      │  startListening() → Future<Str> │    │
│                      │  stopListening() → Future<void> │    │
│                      └──────────┬──────────────────────┘    │
│ ────────────────────────────────┼───────────────────────────│
│     MethodChannel('com.foodvoyage.speech_recognition')      │
│ ────────────────────────────────┼───────────────────────────│
│                   HarmonyOS (ArkTS)                          │
│                      ┌──────────▼──────────────────────┐    │
│                      │ SpeechRecognitionPlugin         │    │
│                      │  1. requestMicrophonePermission │    │
│                      │  2. createEngine()              │    │
│                      │  3. setupListener()             │    │
│                      │  4. startListening()            │    │
│                      │  5. onResult → success(text)    │    │
│                      │  6. shutdownEngine()            │    │
│                      └──────────┬──────────────────────┘    │
│                                 │                            │
│                      ┌──────────▼──────────────────────┐    │
│                      │ Core Speech Kit                  │    │
│                      │  speechRecognizer.createEngine  │    │
│                      │ RecognitionListener callbacks   │    │
│                      └─────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘

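Core implementation

The sequence for the whole pipeline is:

The Flutter page detects that the user has pressed the voice button and calls coordinator.startVoiceInput()
The coordinator switches the state to listening and calls SpeechRecognitionChannel.startListening()
The ArkTS plugin receives the call and first requests ohos.permission.MICROPHONE
Once permission is granted, it creates the speechRecognizer engine
The engine registers a listener and starts listening to audio
The user releases the button; the page calls coordinator.stopVoiceInput() → stopListening()
The plugin calls asrEngine.finish(), which triggers the engine callbacks
The onResult callback receives the final text and returns it to Flutter through pendingResult.success()
The coordinator receives the text and automatically calls submitQuery() to send it to the AI
The page state moves from listening to parsing → searching → responding → idle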

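The most important point in this design is separating "recognition itself" from "the business action that follows recognition". The speech recognition layer is only responsible for bringing the text back; whether to send it straight to the AI or show it in the input box is a decision for the Flutter business layer.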
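
The separation described above can be sketched as a small routing function. This is a minimal illustration, not the project's code: TextSinks, AfterRecognition, and routeRecognizedText are hypothetical names, and the only thing taken from the article is the idea that the recognizer hands back text while the business layer chooses between submitting it and filling the input box.

```typescript
// Hypothetical sinks: what the business layer may do with recognized text.
interface TextSinks {
  submitQuery(text: string): void; // send straight to the AI
  fillInput(text: string): void;   // show in the input box for editing
}

// Hypothetical policy switch owned by the business layer, not by the recognizer.
type AfterRecognition = "submit" | "fillInput";

// The recognizer only produces `text`; this function decides what happens next.
function routeRecognizedText(
  text: string,
  policy: AfterRecognition,
  sinks: TextSinks,
): void {
  const trimmed = text.trim();
  if (trimmed.length === 0) {
    return; // nothing usable was recognized, so no business action runs
  }
  if (policy === "submit") {
    sinks.submitQuery(trimmed);
  } else {
    sinks.fillInput(trimmed);
  }
}
```

Because the policy lives outside the recognition layer, switching from "submit automatically" to "let the user edit first" would not require touching the plugin or the channel.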

State transitions

The core states involved in recognition are defined in ai_session_state.dart:

enum AiSessionStatus {
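  idle,        // idle
  listening,   // listening (speech recognition in progress)
  parsing,     // understanding the user's request
  searching,   // searching for dishes
  responding,  // AI is replying
  speaking,    // TTS playback in progress
  error,       // an error occurred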
}

State transitions related to speech recognition:

idle ──(press to talk)──▶ listening ──(text received)──▶ parsing ──▶ searching ──▶ responding ──▶ idle
                            │                                                                      ▲
                            │──(unclear/error)──▶ error ──(retry)──────────────────────────────────┘
                            │
                            └──(user releases)──▶ finish recognition ──▶ onResult returns the text

The page layer only needs to watch state changes to drive the UI; it does not need to manage the recognition lifecycle itself.
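
To make the diagram above concrete, here is a minimal sketch of the same transitions as a lookup table. The status names mirror the AiSessionStatus enum; the table itself, canTransition, and isValidPath are hypothetical and only cover the edges drawn in the diagram, so the real coordinator may permit more.

```typescript
// Status names mirror the AiSessionStatus enum shown in the article.
type Status =
  | "idle" | "listening" | "parsing" | "searching"
  | "responding" | "speaking" | "error";

// Hypothetical transition table for the voice path only. The edges follow
// the diagram: error returns to idle on retry; speaking is assumed to
// return to idle when playback ends.
const allowedTransitions: Record<Status, Status[]> = {
  idle: ["listening"],
  listening: ["parsing", "error"],
  parsing: ["searching", "error"],
  searching: ["responding", "error"],
  responding: ["idle", "error"],
  speaking: ["idle"],
  error: ["idle"],
};

function canTransition(from: Status, to: Status): boolean {
  return allowedTransitions[from].indexOf(to) !== -1;
}

// Walks a sequence of statuses and reports whether every hop is allowed.
function isValidPath(path: Status[]): boolean {
  let prev: Status | undefined;
  for (const next of path) {
    if (prev !== undefined && !canTransition(prev, next)) {
      return false;
    }
    prev = next;
  }
  return true;
}
```

A table like this is mainly useful in tests or debug assertions: it documents which hops the UI has to render and catches accidental jumps such as idle going straight to responding.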

Key code locations

app/lib/core/platform/speech_recognition_channel.dart: Flutter-side MethodChannel wrapper
app/lib/core/ai/ai_explore_coordinator.dart: coordinator that links voice input to the AI conversation
app/lib/core/ai/models/ai_session_state.dart: session status enum and model
app/lib/features/ai_assistant/screens/ai_assistant_screen.dart: AI assistant page UI
app/ohos/entry/src/main/ets/plugins/SpeechRecognitionPlugin.ets: HarmonyOS-side recognition plugin
app/ohos/entry/src/main/module.json5: microphone permission declaration

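HarmonyOS side

The HarmonyOS side centers on SpeechRecognitionPlugin.ets. The whole plugin is built around one core idea: one recognition request corresponds to one engine lifecycle.

Plugin internal state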

export default class SpeechRecognitionPlugin implements FlutterPlugin, MethodCallHandler {
  private channel: MethodChannel | null = null;
  private asrEngine: speechRecognizer.SpeechRecognitionEngine | null = null;
  private sessionId: string = '10000';
  private pendingResult: MethodResult | null = null;  // key: the MethodResult left pending
}

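pendingResult is the heart of the plugin. On the Flutter side startListening() is a Future; on the ArkTS side its counterpart is this MethodResult. It stays pending until recognition completes: only when the onResult callback receives the final text does success() hand the result back to Flutter.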
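
The "one request, one reply" idea behind pendingResult can be sketched in isolation. This is a minimal sketch under stated assumptions: Reply is a hypothetical stand-in for MethodResult, and PendingReply is not part of the plugin. The begin() guard against overlapping requests is an addition for illustration; the plugin code in this article simply assigns the field.

```typescript
// Hypothetical stand-in for the MethodResult handed to the plugin.
interface Reply {
  success(value: string): void;
  error(code: string, message: string): void;
}

// Holds at most one outstanding reply and answers it at most once.
class PendingReply {
  private pending: Reply | null = null;

  // Returns false when a request is already in flight, so the caller can
  // reject the new one instead of silently overwriting the old one.
  begin(reply: Reply): boolean {
    if (this.pending !== null) {
      return false;
    }
    this.pending = reply;
    return true;
  }

  succeed(text: string): void {
    const reply = this.take();
    if (reply !== null) {
      reply.success(text);
    }
  }

  fail(code: string, message: string): void {
    const reply = this.take();
    if (reply !== null) {
      reply.error(code, message);
    }
  }

  // Clears the slot before answering, so a late callback finds nothing.
  private take(): Reply | null {
    const reply = this.pending;
    this.pending = null;
    return reply;
  }
}

// Usage: record what a fake Flutter side would receive.
const received: string[] = [];
const fakeReply: Reply = {
  success: (value) => { received.push(`ok:${value}`); },
  error: (code) => { received.push(`err:${code}`); },
};
```

Taking the reply out of the slot before invoking it means a second callback from the engine (for example onComplete after onResult) finds an empty slot and does nothing, which is the behavior the plugin relies on.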

Method entry points: only two

onMethodCall(call: MethodCall, result: MethodResult): void {
  switch (call.method) {
    case 'startListening':
      this.handleStartListening(call, result);
      break;
    case 'stopListening':
      this.handleStopListening(result);
      break;
    default:
      result.notImplemented();
      break;
  }
}

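The plugin exposes only startListening and stopListening, which means permission requests, engine creation, and listener registration are all sealed inside the plugin; the Flutter side never needs to know about them.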

Startup flow: permission → engine → listener → start

private async handleStartListening(call: MethodCall, result: MethodResult): Promise<void> {
  this.pendingResult = result;

  // Step 1: request microphone permission
  const hasPermission = await this.requestMicrophonePermission();
  if (!hasPermission) {
    this.pendingResult = null;
    result.error('PERMISSION_DENIED', 'Microphone permission denied', null);
    return;
  }

  // Step 2: create the engine
  // Step 3: register the listener
  // Step 4: start listening
  try {
    await this.createEngine();
    this.setupListener();
    this.startListening();
  } catch (err) {
    this.pendingResult = null;
    const error = err as BusinessError;
    result.error('ASR_ERROR', `Failed to start speech recognition: ${error.message}`, null);
  }
}

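The order matters: permission first, then the engine, then the listener, and only then start. If engine creation fails, the permission has already been granted and will not prompt again; if permission is denied, the engine is never created, so no resources are wasted.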
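
The ordering argument above can be checked with a small simulation. This is a simplified sketch, not the plugin: the real steps are asynchronous and talk to HarmonyOS, whereas StartupSteps, runStartup, and recordingSteps here are hypothetical synchronous stand-ins that only record which steps ran.

```typescript
// Hypothetical synchronous stand-ins for the plugin's startup steps.
interface StartupSteps {
  requestPermission(): boolean;
  createEngine(): void; // may throw
  setupListener(): void;
  startListening(): void;
}

type StartupOutcome =
  | { ok: true }
  | { ok: false; code: "PERMISSION_DENIED" | "ASR_ERROR" };

// Runs the steps in the article's order and stops at the first failure.
function runStartup(steps: StartupSteps): StartupOutcome {
  if (!steps.requestPermission()) {
    return { ok: false, code: "PERMISSION_DENIED" }; // engine never created
  }
  try {
    steps.createEngine();
    steps.setupListener();
    steps.startListening();
    return { ok: true };
  } catch (_err) {
    return { ok: false, code: "ASR_ERROR" };
  }
}

// Builds fake steps that log the order in which they run.
function recordingSteps(
  log: string[],
  granted: boolean,
  engineFails: boolean = false,
): StartupSteps {
  return {
    requestPermission: () => { log.push("permission"); return granted; },
    createEngine: () => {
      log.push("engine");
      if (engineFails) {
        throw new Error("engine unavailable");
      }
    },
    setupListener: () => { log.push("listener"); },
    startListening: () => { log.push("start"); },
  };
}
```

With permission denied, the log contains only the permission step; with a failing engine, it stops after the engine step and the listener is never registered.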

Requesting permission

private async requestMicrophonePermission(): Promise<boolean> {
  const atManager = abilityAccessCtrl.createAtManager();
  const permissions: Permissions[] = ['ohos.permission.MICROPHONE'];
  const context = getContext(this);
  const grantResult = await atManager.requestPermissionsFromUser(context, permissions);
  return grantResult.authResults.every(
    status => status === abilityAccessCtrl.GrantStatus.PERMISSION_GRANTED
  );
}

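Note: requesting a permission at runtime and declaring it in the project are two separate things. module.json5 declares "I need this permission", while requestPermissionsFromUser here is what actually shows the prompt to the user.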

Engine creation parameters

private createEngine(): Promise<void> {
  return new Promise((resolve, reject) => {
    const extraParam: Record<string, Object> = {
      'locate': 'CN',
      'recognizerMode': 'short'
    };
    const initParams: speechRecognizer.CreateEngineParams = {
      language: 'zh-CN',
      online: 1,
      extraParams: extraParam
    };

    speechRecognizer.createEngine(initParams, (err, engine) => {
      if (!err) {
        this.asrEngine = engine;
        resolve();
      } else {
        reject(err);
      }
    });
  });
}

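Key parameters:

language: 'zh-CN' selects Chinese recognition
online: 1 sets the recognition mode. It is used here as the online mode (higher accuracy), but check the Core Speech Kit reference for your SDK version, since the documented meaning of this value may be the offline (on-device) mode
recognizerMode: 'short' selects short-utterance mode, which suits the hold-to-talk scenario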

Listener: reply only on the final result

private setupListener(): void {
  const listener: speechRecognizer.RecognitionListener = {
    onStart: (sessionId, eventMessage) => {
      console.info(TAG, `onStart sessionId: ${sessionId}`);
    },
    onResult: (sessionId, result) => {
      // Only the final result is returned to Flutter
      if (result.isLast && this.pendingResult) {
        this.pendingResult.success(result.result);
        this.pendingResult = null;
        this.shutdownEngine();  // release immediately
      }
    },
    onComplete: (sessionId, eventMessage) => {
      // Fallback: if onResult never delivered isLast, settle the call here
      if (this.pendingResult) {
        this.pendingResult.success('');
        this.pendingResult = null;
      }
      this.shutdownEngine();
    },
    onError: (sessionId, errorCode, errorMessage) => {
      if (this.pendingResult) {
        this.pendingResult.error('ASR_ERROR', errorMessage, null);
        this.pendingResult = null;
      }
      this.shutdownEngine();
    }
  };
  this.asrEngine?.setListener(listener);  // asrEngine is nullable, so guard the call
}

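The key design choice here: reply only when result.isLast is true. Intermediate fragments (partial results) are all ignored, so the Flutter side receives a single clean final string. This is much simpler than pushing every partial result to Flutter, because the page does not need any logic for "showing a partial recognition result".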

Three exits, one cleanup

Whether recognition succeeds, completes, or fails, shutdownEngine() is always called at the end:

private shutdownEngine(): void {
  if (this.asrEngine) {
    this.asrEngine.shutdown();
    this.asrEngine = null;
  }
}

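This means the engine is destroyed after every recognition and recreated on the next startListening call. This "use once, discard" strategy suits short voice input well: it avoids the resource leaks and muddled state that come with holding an engine for a long time.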
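
The combination of "reply only on isLast" and "every exit cleans up" can be exercised without a device. The following is a minimal sketch, assuming a fake engine: FakeEngine, AsrResult, ReplyFn, and RecognitionSession are hypothetical names, and the single settle() exit is one possible way to arrange what the plugin does inline in its three callbacks.

```typescript
// Hypothetical minimal engine: only counts how often shutdown() runs.
class FakeEngine {
  shutdownCalls = 0;
  shutdown(): void {
    this.shutdownCalls++;
  }
}

// Shape assumed from the article's use of result.result and result.isLast.
interface AsrResult {
  result: string;
  isLast: boolean;
}

type ReplyFn = (text: string | null, error?: string) => void;

class RecognitionSession {
  private engine: FakeEngine | null;
  private reply: ReplyFn | null;

  constructor(engine: FakeEngine, reply: ReplyFn) {
    this.engine = engine;
    this.reply = reply;
  }

  onResult(r: AsrResult): void {
    if (!r.isLast) {
      return; // partial results are dropped
    }
    this.settle(r.result);
  }

  onComplete(): void {
    this.settle(""); // fallback when no final result arrived
  }

  onError(message: string): void {
    this.settle(null, message);
  }

  // Single exit: answer at most once, then release the engine at most once.
  private settle(text: string | null, error?: string): void {
    const reply = this.reply;
    this.reply = null;
    if (reply !== null) {
      reply(text, error);
    }
    const engine = this.engine;
    this.engine = null;
    if (engine !== null) {
      engine.shutdown();
    }
  }
}
```

Feeding the session a partial result, a final result, and then a stray onComplete and onError yields exactly one reply and exactly one shutdown, which is the property the three-exit design is meant to guarantee.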

Stopping recognition

private handleStopListening(result: MethodResult): void {
  if (this.asrEngine) {
    this.asrEngine.finish(this.sessionId);
  }
  result.success(null);
}

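finish() tells the engine that the user has finished speaking; the engine then returns the final recognition result through the onResult callback. Note that stopListening itself does not return the recognized text: it only triggers the engine to finish, and the actual text still arrives through pendingResult.success().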

Project configuration as the safety net

module.json5 must declare the microphone permission:

{
  "name": "ohos.permission.MICROPHONE",
  "reason": "$string:mic_reason",
  "usedScene": {
    "abilities": ["EntryAbility"],
    "when": "inuse"
  }
}

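when: "inuse" means the permission is used only while the app is in use. Users accept this more readily than always, and it is in line with HarmonyOS privacy guidelines.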

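Flutter side

The design principle on the Flutter side: keep the interface extremely thin and leave all the complexity to the HarmonyOS plugin.

Platform channel wrapper

speech_recognition_channel.dart is only 19 lines: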

class SpeechRecognitionChannel {
  static const _channel = MethodChannel('com.foodvoyage.speech_recognition');

  /// Starts speech recognition and returns the final recognized text
  static Future<String> startListening({String language = 'zh-CN'}) async {
    final result = await _channel.invokeMethod<String>(
      'startListening',
      {'language': language},
    );
    return result ?? '';
  }

  /// Stops speech recognition
  static Future<void> stopListening() async {
    await _channel.invokeMethod<void>('stopListening');
  }
}

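startListening() exposes a Future that reads like a synchronous call: the caller awaits one line and gets the final text, with no need to know that callbacks and listeners exist.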

Coordinator: linking voice input to the AI conversation

AiExploreCoordinator is the bridge between speech recognition and the AI. What it does is simple:

/// Voice input
Future<void> startVoiceInput() async {
  if (!mounted) return;
  state = state.copyWith(
    status: AiSessionStatus.listening,
    errorMessage: null,
  );

  try {
    final text = await SpeechRecognitionChannel.startListening();
    if (!mounted) return;
    if (text.isEmpty) {
      state = state.copyWith(
        status: AiSessionStatus.error,
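        errorMessage: "Didn't catch that, please say it again",
      );
      return;
    }
    // Once the text arrives, submit it to the AI automatically
    await submitQuery(text);
  } catch (e) {
    AppLogger.error('[AI assistant] Speech recognition failed: $e');
    if (!mounted) return;
    state = state.copyWith(
      status: AiSessionStatus.error,
      errorMessage: 'Speech recognition failed, please type instead',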
    );
  }
}

/// Stops voice input
Future<void> stopVoiceInput() async {
  try {
    await SpeechRecognitionChannel.stopListening();
  } catch (_) {}
}

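There are three tiers of error handling here:

Empty recognition result → prompt the user to say it again ("Didn't catch that, please say it again"); the state goes to error
Exception during recognition → prompt the user to type instead ("Speech recognition failed, please type instead"), falling back to text input
Exception while stopping → swallowed silently (stopping is only a wrap-up step and should not affect the flow)

A mounted check also runs throughout: startListening() is asynchronous, so the user may leave the page while recognition is in progress. By then the coordinator has been disposed, and every state update is skipped.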
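
The first two tiers and the mounted check can be condensed into one pure function, which makes the decision table easy to test. This is a sketch under assumptions: Outcome, SessionState, and nextStateAfterRecognition are hypothetical names, and moving to parsing on success stands in for whatever submitQuery() does first in the real coordinator.

```typescript
// Hypothetical description of how the awaited startListening() call ended.
type Outcome =
  | { kind: "text"; text: string }       // the Future completed with a string
  | { kind: "failure"; reason: string }; // the Future threw

// Reduced view of the session state, limited to what this decision touches.
interface SessionState {
  status: "parsing" | "error";
  errorMessage: string | null;
}

// Returns the next state, or null when the coordinator is no longer mounted
// and the update must be skipped entirely.
function nextStateAfterRecognition(
  outcome: Outcome,
  mounted: boolean,
): SessionState | null {
  if (!mounted) {
    return null;
  }
  if (outcome.kind === "failure") {
    return {
      status: "error",
      errorMessage: "Speech recognition failed, please type instead",
    };
  }
  if (outcome.text.length === 0) {
    return {
      status: "error",
      errorMessage: "Didn't catch that, please say it again",
    };
  }
  // Assumption: a non-empty text moves on to query handling.
  return { status: "parsing", errorMessage: null };
}
```

Keeping the decision separate from the state assignment also makes the disposed case explicit: a null result means "do nothing", rather than relying on scattered early returns.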

Page layer: hold to talk, release to stop

In ai_assistant_screen.dart, the voice button uses GestureDetector's pan gestures:

GestureDetector(
  onPanDown: (_) => onVoiceStart(),    // press → start recognition
  onPanEnd: (_) => onVoiceEnd(),       // release → stop recognition
  onPanCancel: () => onVoiceEnd(),     // gesture cancelled → stop recognition
  child: Container(
    // ... button UI
    child: const Text('Hold to talk'),
  ),
)

The corresponding callbacks:

onVoiceStart: () => coordinator.startVoiceInput(),
onVoiceEnd: () => coordinator.stopVoiceInput(),

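The page does not need to care about the details of the recognition engine. It only needs to know two things:

Call startVoiceInput() on press
Call stopVoiceInput() on release

The remaining state changes (listening → parsing → searching → responding → idle) drive UI updates automatically through Riverpod's ref.watch(aiExploreCoordinatorProvider).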

In the input bar, the isListening state changes the button's appearance:

AiInputBar(
  onVoiceStart: () => coordinator.startVoiceInput(),
  onVoiceEnd: () => coordinator.stopVoiceInput(),
  isListening: sessionState.status == AiSessionStatus.listening,
  // ...
)

That is all the page-layer code for the whole speech recognition pipeline: not a single line touches the engine, permissions, or callbacks.

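Common pitfalls

Requesting the permission without declaring it: requestPermissionsFromUser is called at runtime, but module.json5 does not list ohos.permission.MICROPHONE, so the permission request fails outright
Returning partial and final results through the same path: onResult skips the isLast check and calls success() on every callback, so the Flutter side receives multiple results or pendingResult is invoked repeatedly
Letting the page layer depend on the recognition engine directly: MethodChannel calls end up scattered across several widgets, and replacing the implementation later (for example with a third-party SDK) has no clean place to start
Not cleaning up the engine after a failure: onError reports the error but never calls shutdownEngine(), so the engine stays alive and the next createEngine may conflict
Not clearing pendingResult on every exit: the success path clears it, but the onComplete and onError paths are forgotten, so a reused pendingResult behaves unpredictably
Skipping the mounted check in asynchronous code: the user leaves the page during recognition, the coordinator keeps updating state after being disposed, and a StateError is thrown
Expecting the result directly from stop: stopListening() returns void, and the recognized text arrives through the earlier pendingResult.success(); mixing the two up means the text is never received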

Reusable templates

Flutter-side Channel wrapper

import 'package:flutter/services.dart';

class SpeechRecognitionChannel {
  static const _channel = MethodChannel('com.yourapp.speech_recognition');

  static Future<String> startListening({String language = 'zh-CN'}) async {
    final result = await _channel.invokeMethod<String>(
      'startListening',
      {'language': language},
    );
    return result ?? '';
  }

  static Future<void> stopListening() async {
    await _channel.invokeMethod<void>('stopListening');
  }
}

Flutter-side coordinator call

Future<void> startVoiceInput() async {
  if (!mounted) return;
  state = state.copyWith(status: AiSessionStatus.listening);

  try {
    final text = await SpeechRecognitionChannel.startListening();
    if (!mounted) return;
    if (text.isEmpty) {
      state = state.copyWith(status: AiSessionStatus.error, errorMessage: 'Not heard clearly');
      return;
    }
    await submitQuery(text);  // the business layer decides what happens next
  } catch (e) {
    if (!mounted) return;
    state = state.copyWith(status: AiSessionStatus.error, errorMessage: 'Recognition failed');
  }
}

ArkTS-side plugin skeleton

private async handleStartListening(call: MethodCall, result: MethodResult): Promise<void> {
  this.pendingResult = result;

  const hasPermission = await this.requestMicrophonePermission();
  if (!hasPermission) {
    this.pendingResult = null;
    result.error('PERMISSION_DENIED', 'Microphone permission denied', null);
    return;
  }

  try {
    await this.createEngine();
    this.setupListener();
    this.startListening();
  } catch (err) {
    this.pendingResult = null;
    result.error('ASR_ERROR', 'Failed to start', null);
  }
}

// Listener core: reply only on isLast, clean up on every exit
private setupListener(): void {
  const listener: speechRecognizer.RecognitionListener = {
    onResult: (sessionId, result) => {
      if (result.isLast && this.pendingResult) {
        this.pendingResult.success(result.result);
        this.pendingResult = null;
        this.shutdownEngine();
      }
    },
    onError: (sessionId, code, msg) => {
      if (this.pendingResult) {
        this.pendingResult.error('ASR_ERROR', msg, null);
        this.pendingResult = null;
      }
      this.shutdownEngine();
    },
    // onComplete fallback...
  };
  this.asrEngine?.setListener(listener);  // asrEngine is nullable, so guard the call
}

Project configuration

// module.json5
"requestPermissions": [
  {
    "name": "ohos.permission.MICROPHONE",
    "reason": "$string:mic_reason",
    "usedScene": {
      "abilities": ["EntryAbility"],
      "when": "inuse"
    }
  }
]

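Summary

Speech recognition is not a single-button feature; it is a complete call chain spanning both Flutter and HarmonyOS
The HarmonyOS side handles permission requests, engine creation, listener registration, result delivery, and resource cleanup, with all the complexity sealed inside the plugin
The Flutter side handles interaction decisions and state orchestration, exposing an awaitable, synchronous-looking interface through a very thin Channel wrapper
The three key design decisions: ① pendingResult ties one request to one reply; ② only the final text is returned, on isLast; ③ every exit goes through shutdownEngine()
Draw the boundaries of this pipeline first, then connect business features such as AI or search, and the whole thing will be much more stable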