Text-to-Speech & Speech-to-Text

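I. Implementing text-to-speech

AVSpeechSynthesizer, the speech-synthesis class in AVFoundation, uses the system's built-in engine, so it converts text to speech in real time, efficiently and without any network request.

It is simple to use, so let's go straight to the implementation steps.

Import the AVFoundation framework
```swift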
import AVFoundation
```
Create an AVSpeechSynthesizer instance

AVSpeechSynthesizer is the AVFoundation class for text-to-speech; the conversion is performed through an instance of it.
```swift
let speechSynthesizer = AVSpeechSynthesizer()
```

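Configure an AVSpeechUtterance object

AVSpeechUtterance is the AVFoundation class that represents one piece of text to be spoken. It holds the text together with the properties that control how it is spoken: voice, pitch, rate, and so on.

AVSpeechSynthesisVoice is the class that describes a voice, including its language and gender.

AVSpeechSynthesisVoice.speechVoices() returns an array of all available voices.

```swift
let text = "Hello, world" // sample text to speak
let speechUtterance = AVSpeechUtterance(string: text)
speechUtterance.rate = 0.5 // speech rate (0.5 equals AVSpeechUtteranceDefaultSpeechRate)
speechUtterance.pitchMultiplier = 1.2 // pitch, valid range 0.5...2.0

// Pick a voice by language and gender
let availableVoices = AVSpeechSynthesisVoice.speechVoices()
if let englishVoice = availableVoices.first(where: { $0.language == "en-US" && $0.gender == .female }) {
    speechUtterance.voice = englishVoice
} else {
    print("No suitable voice found")
}
```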
Play the speech

Once the AVSpeechUtterance object is configured, call the speak(_:) method of the AVSpeechSynthesizer instance to play it.
```swift
speechSynthesizer.speak(speechUtterance)
```
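
The steps so far can be put together roughly as follows. This is only a sketch: the `Speaker` class and its `makeUtterance` helper are names made up for illustration, and the voice is chosen with `AVSpeechSynthesisVoice(language:)` instead of filtering `speechVoices()`. The synthesizer is stored in a property because a synthesizer that is released before it finishes is a commonly reported cause of silent playback.

```swift
import AVFoundation

/// Hypothetical wrapper around the steps above; names are illustrative.
final class Speaker {
    // Keep the synthesizer alive for as long as speech should play.
    private let synthesizer = AVSpeechSynthesizer()

    /// Builds an utterance with the default rate and pitch for the given language.
    static func makeUtterance(_ text: String, language: String) -> AVSpeechUtterance {
        let utterance = AVSpeechUtterance(string: text)
        utterance.rate = AVSpeechUtteranceDefaultSpeechRate
        utterance.pitchMultiplier = 1.0
        // Returns nil when no voice exists for the language; the system default is used then.
        utterance.voice = AVSpeechSynthesisVoice(language: language)
        return utterance
    }

    /// Queues the text for playback.
    func speak(_ text: String, language: String = "en-US") {
        synthesizer.speak(Speaker.makeUtterance(text, language: language))
    }
}
```

Usage would then be a single call such as `speaker.speak("Hello")`, with `speaker` held by a view controller or another long-lived object.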

Playback control and state

State: whether speech is currently playing or paused

```swift
open var isSpeaking: Bool { get }

open var isPaused: Bool { get }
```

Pausing and stopping

```swift
open func stopSpeaking(at boundary: AVSpeechBoundary) -> Bool

open func pauseSpeaking(at boundary: AVSpeechBoundary) -> Bool
```

The enum that selects where to stop

```swift
public enum AVSpeechBoundary : Int, @unchecked Sendable {
  // Stop immediately
  case immediate = 0
  // Stop after the current word finishes
  case word = 1
}
```
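
As a small illustration of how these properties and methods fit together, a pause/resume toggle might look like the sketch below. `togglePause` and `PlaybackAction` are hypothetical names; `continueSpeaking()` is the AVSpeechSynthesizer method that resumes a paused utterance. `isPaused` is checked first on the assumption that a paused synthesizer still reports `isSpeaking` as true.

```swift
import AVFoundation

/// What the toggle ended up doing (illustrative type).
enum PlaybackAction {
    case resumed, paused, none
}

/// Hypothetical helper: resumes when paused, pauses when speaking, otherwise does nothing.
func togglePause(_ synthesizer: AVSpeechSynthesizer,
                 at boundary: AVSpeechBoundary = .word) -> PlaybackAction {
    if synthesizer.isPaused {
        return synthesizer.continueSpeaking() ? .resumed : .none
    }
    if synthesizer.isSpeaking {
        return synthesizer.pauseSpeaking(at: boundary) ? .paused : .none
    }
    return .none
}
```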

AVSpeechSynthesizerDelegate: observing playback

For example, it reports four state changes: paused, cancelled, finished, and started.

```swift
// Paused
func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didPause utterance: AVSpeechUtterance) {
}

// Finished
func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
}

// didCancel, didStart, and the remaining callbacks follow the same pattern
```
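
A fuller delegate could be sketched as below. The `SpeechController` class and its `stoppedManually` flag are assumptions added for illustration: the flag records that `stop()` was requested, so whichever end-of-speech callback fires can tell a manual stop from natural completion.

```swift
import AVFoundation

/// Hypothetical controller; class and property names are illustrative.
final class SpeechController: NSObject, AVSpeechSynthesizerDelegate {
    private let synthesizer = AVSpeechSynthesizer()
    /// True after stop() was requested for the current utterance.
    private(set) var stoppedManually = false

    override init() {
        super.init()
        synthesizer.delegate = self
    }

    func speak(_ text: String) {
        stoppedManually = false
        synthesizer.speak(AVSpeechUtterance(string: text))
    }

    func stop() {
        // Only mark a manual stop when something is actually playing.
        guard synthesizer.isSpeaking else { return }
        stoppedManually = true
        _ = synthesizer.stopSpeaking(at: .immediate)
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didStart utterance: AVSpeechUtterance) {
        print("started")
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didPause utterance: AVSpeechUtterance) {
        print("paused")
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didCancel utterance: AVSpeechUtterance) {
        handleEnd()
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        handleEnd()
    }

    /// Shared end-of-speech handling for both callbacks.
    private func handleEnd() {
        print(stoppedManually ? "stopped by request" : "finished naturally")
    }
}
```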

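Notes:

Calling stopSpeaking and finishing playback normally can both trigger the didFinish delegate method, so you have to tell the two cases apart yourself.
AVSpeechSynthesizer does not detect the language of the input text. When the text does not match the configured voice's language, or contains emoji, the spoken output can be wrong.
If the device is muted and AVAudioSession is not configured separately, AVSpeechSynthesizer produces no sound.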
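
For the emoji issue, one possible workaround is to filter the text before creating the utterance. The helper below is a deliberately simplified sketch (`strippingEmoji` is a made-up name): it drops only scalars whose Unicode `Emoji_Presentation` property is set, so text-style symbols and leftover joiners from composite emoji sequences are not handled.

```swift
/// Hypothetical helper: removes scalars that render as emoji by default.
/// Simplified: variation selectors and zero-width joiners are left in place.
func strippingEmoji(_ text: String) -> String {
    var kept = String.UnicodeScalarView()
    for scalar in text.unicodeScalars where !scalar.properties.isEmojiPresentation {
        kept.append(scalar)
    }
    return String(kept)
}

let cleaned = strippingEmoji("Hi 😀!")
print(cleaned) // prints "Hi !"
```

Digits and punctuation are kept because their default presentation is text, even though Unicode classifies some of them as emoji-capable.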

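II. Implementing audio recording

The system offers several ways to record audio. The three most common, each with its own characteristics, are outlined below:

`AVFAudio Framework`'s `AVAudioRecorder`

Plain recording: it writes the recording to a file at a given path in the sandbox, and the audio data cannot be read in real time.

```swift
import AVFoundation
import UIKit

class YourViewController: UIViewController {
    // The AVAudioRecorder instance, created in setupAudioRecorder()
    var audioRecorder: AVAudioRecorder?

    // Returns the app's Documents directory in the sandbox
    func getDocumentsDirectory() -> URL {
        FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
    }

    func setupAudioRecorder() {
        let audioFilename = getDocumentsDirectory().appendingPathComponent("recordedAudio.wav")

        let settings: [String: Any] = [
            AVFormatIDKey: kAudioFormatLinearPCM,
            AVSampleRateKey: 44100.0,
            AVNumberOfChannelsKey: 1,
            AVEncoderAudioQualityKey: AVAudioQuality.high.rawValue
        ]

        do {
            let recorder = try AVAudioRecorder(url: audioFilename, settings: settings)
            recorder.delegate = self
            recorder.prepareToRecord()
            audioRecorder = recorder
        } catch {
            // Handle the failure to create the AVAudioRecorder instance
            print("Failed to create AVAudioRecorder: \(error.localizedDescription)")
        }
    }

    // Start and stop recording
    func startRecording() {
        if let recorder = audioRecorder, !recorder.isRecording {
            recorder.record()
        }
    }

    func stopRecording() {
        if let recorder = audioRecorder, recorder.isRecording {
            recorder.stop()
        }
    }
}

// Handle the end of recording
extension YourViewController: AVAudioRecorderDelegate {
    func audioRecorderDidFinishRecording(_ recorder: AVAudioRecorder, successfully flag: Bool) {
        if flag {
            // Recording finished; process the file here, e.g. store its path or read its data
        } else {
            // Recording failed
        }
    }
}
```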
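
Whichever recording API is used, the app needs microphone permission first. A minimal sketch of the request is shown below; `requestMicrophoneAccess` is a hypothetical helper name. It relies on `AVAudioSession.requestRecordPermission`, which newer SDKs mark as deprecated in favor of an `AVAudioApplication` equivalent, so treat the exact call as something to verify against your deployment target.

```swift
import AVFoundation

/// Hypothetical helper: asks for microphone access and reports the answer on the main queue.
func requestMicrophoneAccess(_ completion: @escaping (Bool) -> Void) {
    AVAudioSession.sharedInstance().requestRecordPermission { granted in
        // The callback may arrive on a background queue; hop to main before touching UI.
        DispatchQueue.main.async {
            completion(granted)
        }
    }
}
```

A caller would start recording only inside the completion block, and only when the value is true.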

`AVFoundation`'s `AVCaptureAudioDataOutput`

It delivers audio data buffers through a delegate, much like the delegate method used when capturing camera frames.

```swift
import AVFoundation

// AudioCapture is a placeholder name for whatever class owns the session
class AudioCapture: NSObject, AVCaptureAudioDataOutputSampleBufferDelegate {
    // Create the AVCaptureSession
    let captureSession = AVCaptureSession()

    override init() {
        super.init()

        // Set up the audio input device (microphone)
        if let audioDevice = AVCaptureDevice.default(for: .audio) {
            do {
                let audioInput = try AVCaptureDeviceInput(device: audioDevice)
                if captureSession.canAddInput(audioInput) {
                    captureSession.addInput(audioInput)
                }
            } catch {
                print("Error setting up audio input: \(error.localizedDescription)")
            }
        }

        // Set up the audio data output
        let audioOutput = AVCaptureAudioDataOutput()
        audioOutput.setSampleBufferDelegate(self, queue: DispatchQueue.global(qos: .utility))
        if captureSession.canAddOutput(audioOutput) {
            captureSession.addOutput(audioOutput)
        }
    }

    func startCapturing() {
        if !captureSession.isRunning {
            captureSession.startRunning()
        }
    }

    func stopCapturing() {
        if captureSession.isRunning {
            captureSession.stopRunning()
        }
    }

    // AVCaptureAudioDataOutputSampleBufferDelegate method: receives the audio data buffers
    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        // Read and process the audio data in sampleBuffer here
    }
}
```
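
Inside `captureOutput`, the raw bytes can be copied out of the sample buffer. The function below is a sketch under the assumption that you simply want the buffer's contents as `Data`; `audioData(from:)` is a made-up name, and interpreting the bytes still requires the buffer's format description, which is not covered here.

```swift
import AVFoundation
import CoreMedia

/// Hypothetical helper: copies the raw audio bytes of a sample buffer into Data.
func audioData(from sampleBuffer: CMSampleBuffer) -> Data? {
    guard let blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer) else { return nil }
    let length = CMBlockBufferGetDataLength(blockBuffer)
    guard length > 0 else { return nil }

    var data = Data(count: length)
    let status: OSStatus = data.withUnsafeMutableBytes { rawBuffer in
        guard let base = rawBuffer.baseAddress else { return OSStatus(-1) }
        // Copies even when the block buffer is not stored contiguously.
        return CMBlockBufferCopyDataBytes(blockBuffer, atOffset: 0, dataLength: length, destination: base)
    }
    return status == kCMBlockBufferNoErr ? data : nil
}
```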

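`AVFoundation`'s `AVAudioEngine`

A concise approach: the audio data buffers are delivered directly in a block callback.

```swift
import AVFoundation

class EngineRecorder { // placeholder name for the owning class
    let audioEngine = AVAudioEngine()
    let bufferSize = 1024 // requested frames per callback (example value)

    func startRecording() throws {
        audioEngine.inputNode.installTap(onBus: 0, bufferSize: AVAudioFrameCount(bufferSize), format: audioEngine.inputNode.outputFormat(forBus: 0)) { [weak self] (buffer, time) in
            guard self != nil else { return }
            // Process the audio data
        }
        // Start the audio input engine
        try audioEngine.start()
    }

    func stopRecording() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
    }
}
```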
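
As one example of what "processing the audio data" in the tap block can mean, the sketch below computes an RMS level that could drive a volume meter. `rmsLevel` is a hypothetical helper, and it assumes the tap delivers float samples (`floatChannelData`), which is the usual format of the input node but worth checking in your setup.

```swift
#if canImport(AVFoundation)
import AVFoundation
#endif

/// Root-mean-square level of a block of samples; 0 for an empty block.
func rmsLevel(of samples: [Float]) -> Float {
    guard !samples.isEmpty else { return 0 }
    let sumOfSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumOfSquares / Float(samples.count)).squareRoot()
}

#if canImport(AVFoundation)
/// Convenience overload for tap buffers; reads the first channel only.
func rmsLevel(of buffer: AVAudioPCMBuffer) -> Float {
    guard let channel = buffer.floatChannelData?[0] else { return 0 }
    let samples = Array(UnsafeBufferPointer(start: channel, count: Int(buffer.frameLength)))
    return rmsLevel(of: samples)
}
#endif

print(rmsLevel(of: [1, -1, 1, -1])) // prints "1.0"
```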

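Summary: audio processing is a very complex topic. We ended up choosing AVAudioEngine for recording because its code is the most concise. Some articles say its latency is higher than that of AVCaptureAudioDataOutput, but since we do very little processing on the audio, the difference is not noticeable in practice.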

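III. Configuring AVAudioSession

Activate the AVAudioSession and set its Category and Options. The Category and Option values used here mean the following:

AVAudioSessionCategoryPlayAndRecord: supports both recording and playback. With no external audio device connected, sound plays through the receiver (earpiece) by default, and it is not affected by the phone's silent switch.

AVAudioSessionCategoryOptionDefaultToSpeaker: changes the session's default output route to the built-in speaker (hands-free); only valid with AVAudioSessionCategoryPlayAndRecord.

```swift
// AVAudioSession is global, so other code can easily override this configuration
let audioSession = AVAudioSession.sharedInstance()
do {
    try audioSession.setCategory(.playAndRecord, options: .defaultToSpeaker)
    try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
} catch {
    print("Failed to configure AVAudioSession: \(error.localizedDescription)")
}
```

Note: an invalid Category/Option combination makes setCategory fail, which crashes the app if the error is not handled. AVAudioSession is also tied to the hardware: whether another part of your own app reconfigures it, or another app sets its own Category and Options, your audio behavior is affected.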
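
Because the session can be changed behind your back, one defensive pattern is to re-check it right before every recording or playback. The sketch below assumes that approach; `ensurePlayAndRecordSession` is a hypothetical helper, and it only uses the session's `category` and `categoryOptions` properties to decide whether the configuration must be re-applied.

```swift
import AVFoundation

/// Hypothetical helper: re-applies the play-and-record configuration if something changed it.
/// Returns false when the session could not be configured or activated.
@discardableResult
func ensurePlayAndRecordSession() -> Bool {
    let session = AVAudioSession.sharedInstance()
    do {
        let needsUpdate = session.category != .playAndRecord
            || !session.categoryOptions.contains(.defaultToSpeaker)
        if needsUpdate {
            try session.setCategory(.playAndRecord, options: .defaultToSpeaker)
        }
        try session.setActive(true)
        return true
    } catch {
        // Throwing is handled here instead of crashing the app.
        print("AVAudioSession setup failed: \(error.localizedDescription)")
        return false
    }
}
```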

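IV. Speech-to-text

The system's Speech framework can recognize the content of spoken audio. By default, SFSpeechRecognizer needs a network connection to recognize speech.

A quick look at the speech-recognition classes

SFSpeechRecognizer (speech recognizer): handles speech recognition tasks. It takes the audio supplied by a recognition request and sends it to Apple's servers for processing. It creates recognition tasks and reports which languages are supported.

SFSpeechAudioBufferRecognitionRequest (speech recognition request): inherits from SFSpeechRecognitionRequest and represents one recognition request. It defines the parameters of the recognition task.

SFSpeechRecognitionTask (speech recognition task): controls and manages the execution of a recognition task, including cancelling and finishing it.

SFSpeechRecognitionResult (speech recognition result): contains the recognized text, confidence scores, and related information.

Steps to implement speech-to-text

Request permission

Add the usage description Privacy - Speech Recognition Usage Description (NSSpeechRecognitionUsageDescription) to Info.plist, then request authorization in code. Recording from the microphone additionally requires Privacy - Microphone Usage Description (NSMicrophoneUsageDescription).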
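```swift
SFSpeechRecognizer.requestAuthorization { authStatus in
    switch authStatus {
    case .authorized:
        // The user granted access; speech recognition can start
        print("Speech recognition authorized")
    case .denied, .restricted:
        // Access was denied or is unavailable on this device; handle the error
        print("Speech recognition unavailable")
    case .notDetermined:
        // The user has not been asked yet
        print("Speech recognition not determined")
    @unknown default:
        break
    }
}
```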
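
The authorization handler is not guaranteed to run on the main queue, so UI updates should be dispatched back to it. A compact wrapper could look like the sketch below; `requestSpeechAccess` is a hypothetical name, and collapsing every non-authorized status into `false` is a simplification.

```swift
import Speech

/// Hypothetical helper: reports on the main queue whether recognition may start.
func requestSpeechAccess(_ completion: @escaping (Bool) -> Void) {
    SFSpeechRecognizer.requestAuthorization { status in
        DispatchQueue.main.async {
            completion(status == .authorized)
        }
    }
}
```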

Import the framework and create the instances

```swift
import Speech

let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
let request = SFSpeechAudioBufferRecognitionRequest()
```

Feed the audio into the recognition request

Append the audio buffers captured by AVAudioEngine to the SFSpeechAudioBufferRecognitionRequest instance.

```swift
audioEngine.inputNode.installTap(onBus: 0, bufferSize: AVAudioFrameCount(bufferSize), format: audioEngine.inputNode.outputFormat(forBus: 0)) { [weak self] (buffer, time) in
    self?.request.append(buffer)
}
```

Start the recognition task

The SFSpeechRecognizer instance starts recognition and returns an SFSpeechRecognitionTask; the resultHandler parameter delivers the recognition result (SFSpeechRecognitionResult).
```swift
recognitionTask = speechRecognizer?.recognitionTask(with: request, resultHandler: { [weak self] result, error in
    // result is an optional SFSpeechRecognitionResult instance
})
```
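Read the recognition result

An SFSpeechRecognitionResult may actually contain several candidates (exposed as var transcriptions: [SFTranscription]); we usually use the most confident one, var bestTranscription: SFTranscription.
```swift
// Extract the recognized text (result is optional inside resultHandler)
let transcription = result?.bestTranscription.formattedString
```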
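
Putting the steps together, a minimal end-to-end sketch might look like this. The `SpeechToText` class, its error type, and the `onText` callback are all names invented for illustration; permissions and AVAudioSession setup are assumed to have been handled already, and error handling is reduced to a print.

```swift
import AVFoundation
import Speech

enum SpeechToTextError: Error {
    case recognizerUnavailable
}

/// Hypothetical wrapper tying the steps together; names are illustrative.
final class SpeechToText {
    private let audioEngine = AVAudioEngine()
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private var request: SFSpeechAudioBufferRecognitionRequest?
    private var task: SFSpeechRecognitionTask?

    /// Starts recording and recognition. onText receives each transcript and whether it is final.
    func start(onText: @escaping (String, Bool) -> Void) throws {
        guard let recognizer = recognizer, recognizer.isAvailable else {
            throw SpeechToTextError.recognizerUnavailable
        }
        // Drop any previous session before starting a new one.
        task?.cancel()
        stop()

        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true
        self.request = request

        // Feed microphone buffers into the request.
        let inputNode = audioEngine.inputNode
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: inputNode.outputFormat(forBus: 0)) { buffer, _ in
            request.append(buffer)
        }

        task = recognizer.recognitionTask(with: request) { result, error in
            if let result = result {
                onText(result.bestTranscription.formattedString, result.isFinal)
            } else if let error = error {
                print("Recognition failed: \(error.localizedDescription)")
            }
        }

        audioEngine.prepare()
        do {
            try audioEngine.start()
        } catch {
            stop()
            throw error
        }
    }

    /// Stops recording and tells the recognizer that no more audio is coming.
    func stop() {
        if audioEngine.isRunning {
            audioEngine.stop()
        }
        audioEngine.inputNode.removeTap(onBus: 0)
        request?.endAudio()
        request = nil
    }
}
```

The callback deliberately leaves stopping to the caller, so a late callback from a cancelled task cannot tear down a newer session.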

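Notes:

Speech recognition requires a network connection.
If the language of the recorded audio does not match the SFSpeechRecognizer locale, the speech is not recognized.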
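
The network requirement applies to server-based recognition. On iOS 13 and later, the Speech framework also exposes on-device recognition for some locales, so a request can be built to stay offline where that is supported. The sketch below assumes this is acceptable for your use case; `makeRequest(for:)` is a hypothetical helper, and whether on-device results are good enough for a given language is something to test.

```swift
import Speech

/// Hypothetical helper: builds a request that stays on device when the recognizer supports it.
func makeRequest(for recognizer: SFSpeechRecognizer) -> SFSpeechAudioBufferRecognitionRequest {
    let request = SFSpeechAudioBufferRecognitionRequest()
    if #available(iOS 13.0, macOS 10.15, *), recognizer.supportsOnDeviceRecognition {
        // No audio leaves the device; recognition fails instead of falling back to the server.
        request.requiresOnDeviceRecognition = true
    }
    return request
}
```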
