SenseVoice Proper-Noun Recognition Fine-Tuning: A Complete Tutorial

A Beginner's Guide for Apple M-series MacBook Pro

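📚 Table of Contents

Introduction
Prerequisites
Environment Setup
Project Structure
Data Preparation
Model Download and Testing
Model Training
Model Evaluation
Model Deployment
Worked Examples
FAQ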

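📖 Introduction

What is SenseVoice?

SenseVoice is an open-source speech recognition model from Alibaba's DAMO Academy. It is particularly strong at Chinese recognition. This tutorial combines it with a hotword (proper-noun) list and fine-tuning to improve recognition of domain terms.

Goals of this tutorial

✅ Set up a working development environment on an Apple M-series MacBook Pro
✅ Learn the full data preparation workflow
✅ Train a custom model that recognizes proper nouns
✅ Deploy the model and test it on real audio

Time required

Environment setup: 30 minutes
Data preparation: 2-3 hours (depending on data volume)
Model training: 2-8 hours (depending on data volume)
Deployment and testing: 30 minutes

Hardware requirements

✅ MacBook Pro with an Apple Silicon chip
✅ 64GB of memory (32GB also works, but the batch size must be reduced)
✅ At least 50GB of free storage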

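🎓 Prerequisites

Concepts you need to know

1. What is fine-tuning?

In short: it is like teaching domain terminology to someone who already speaks Chinese.

The base model can already recognize everyday speech
We teach it the proper nouns of a specific domain (company names, product names, technical terms)

2. What are hotwords?

Hotwords are the list of proper nouns you want the model to favor during recognition, for example:

Company names: 阿里巴巴 (Alibaba), 腾讯 (Tencent), 字节跳动 (ByteDance)
Product names: 通义千问 (Tongyi Qianwen), ChatGPT, SenseVoice
Technical terms: 机器学习 (machine learning), 神经网络 (neural network), Transformer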

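3. Apple Silicon vs. CUDA

Feature	NVIDIA CUDA	Apple MPS
Hardware	NVIDIA GPUs	Apple M-series chips
Acceleration framework	CUDA	Metal Performance Shaders
PyTorch device	`device='cuda'`	`device='mps'`

Important: Apple Silicon does not support CUDA. GPU acceleration goes through PyTorch's MPS backend instead. Operators that MPS does not implement raise an error unless you set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1, which runs them on the CPU.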
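
The device choice described above can be sketched as a small helper. This is a minimal illustration: `pick_device` is a hypothetical name, not part of PyTorch or FunASR, and the availability flags are passed in so the logic can be checked without a GPU.

```python
def pick_device(mps_available: bool, cuda_available: bool = False) -> str:
    """Return a PyTorch device string, preferring MPS, then CUDA, then CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    try:
        import torch  # only needed to query the real hardware
        device = pick_device(
            torch.backends.mps.is_available(),
            torch.cuda.is_available(),
        )
    except ImportError:
        # PyTorch is not installed yet, so fall back to CPU
        device = pick_device(False, False)
    print(f"Selected device: {device}")
```

The resulting string is what the later scripts pass as `device=` when loading the model.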

🔧 Environment Setup

Step 1: Install Homebrew (if not already installed)

Open Terminal and run:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Step 2: Install Miniconda

# Download Miniconda (Apple Silicon build)
curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh

# Install
bash Miniconda3-latest-MacOSX-arm64.sh

# Follow the prompts to finish the installation, then restart the terminal

Step 3: Create the Python environment

# Create a dedicated environment
conda create -n sensevoice python=3.10 -y

# Activate it
conda activate sensevoice

# Verify the Python version
python --version  # should print Python 3.10.x

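Step 4: Install the core dependencies

# Install PyTorch (with MPS support)
pip install torch torchvision torchaudio

# Verify that MPS is available
python -c "import torch; print('MPS available:', torch.backends.mps.is_available())"
# Expected output: MPS available: True

# Install FunASR and ModelScope
pip install funasr
pip install modelscope

# Install the audio processing libraries
pip install librosa soundfile
pip install pydub

# Install the remaining tools
pip install tqdm    # progress bars
pip install pandas  # data handling
pip install flask   # web service (used for deployment)
pip install pyyaml  # config parsing (imported as yaml by scripts/train.py)
pip install psutil  # resource monitoring (used by scripts/monitor.py)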

Step 5: Verify the installation

Create a test script:

cat > test_env.py << 'EOF'
import torch
import librosa
import soundfile
from funasr import AutoModel

print("="*50)
print("Environment check")
print("="*50)
print(f"PyTorch version: {torch.__version__}")
print(f"MPS available: {torch.backends.mps.is_available()}")
print(f"Librosa version: {librosa.__version__}")
print("✅ All dependencies imported successfully!")
EOF

python test_env.py

If the script runs without errors, the environment is ready.

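📁 Project Structure

Create the project directories

# Create the main project folder
mkdir ~/SenseVoice_Project
cd ~/SenseVoice_Project

# Create the subdirectories
mkdir -p data/raw
mkdir -p data/processed
mkdir -p data/train/audio
mkdir -p data/train/text
mkdir -p data/dev/audio
mkdir -p data/dev/text
mkdir -p data/test/audio
mkdir -p data/test/text
mkdir -p models/pretrained
mkdir -p output/checkpoints
mkdir -p scripts
mkdir -p logs

# Show the directory tree (tree is not preinstalled on macOS: brew install tree)
tree -L 3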

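Final directory structure

SenseVoice_Project/
├── data/                      # Data
│   ├── raw/                   # Original recordings
│   ├── processed/             # Audio after preprocessing
│   ├── train/                 # Training data
│   │   ├── audio/            # Audio files (.wav)
│   │   └── text/             # Text annotations
│   ├── dev/                   # Validation data
│   └── test/                  # Test data
├── models/                    # Model storage
│   └── pretrained/           # Pretrained models
├── output/                    # Training output
│   └── checkpoints/          # Model checkpoints
├── scripts/                   # Scripts
├── logs/                      # Log files
├── hotwords.txt              # Hotword list
└── config.yaml               # Configuration file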

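📊 Data Preparation

Data requirements

Audio requirements

Format: WAV (recommended) or MP3
Sample rate: 16000 Hz (16 kHz)
Channels: mono
Duration: 2-30 seconds per clip is recommended
Quality: clear speech without obvious noise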
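
The requirements above can be checked automatically before any training starts. The following is a minimal sketch using only the standard library `wave` module, so it handles uncompressed PCM WAV files only; `check_wav` and its default thresholds are illustrative names taken from the list above, not part of any library.

```python
import wave


def check_wav(path, expected_sr=16000, expected_channels=1,
              min_sec=2.0, max_sec=30.0):
    """Return a list of problems found in a PCM WAV file (empty list = OK)."""
    problems = []
    with wave.open(path, "rb") as wf:
        sample_rate = wf.getframerate()
        channels = wf.getnchannels()
        duration = wf.getnframes() / float(sample_rate)

    if sample_rate != expected_sr:
        problems.append(f"sample rate is {sample_rate} Hz, expected {expected_sr} Hz")
    if channels != expected_channels:
        problems.append(f"{channels} channels, expected {expected_channels}")
    if not (min_sec <= duration <= max_sec):
        problems.append(f"duration {duration:.2f}s outside {min_sec}-{max_sec}s")
    return problems
```

Calling `check_wav("data/processed/001.wav")` on each file and printing any non-empty result gives a quick report of clips that still need resampling or trimming.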

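Recommended data volume

Task size	Minimum samples	Recommended samples	Expected result
Smoke test	50	100	Initial validation
Small	200	500	Basically usable
Medium	1000	2000	Good results
Production	5000+	10000+	Excellent results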

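Step 1: Collect the raw audio

Option A: Use existing recordings

If you already have recordings:

# Copy the audio files into the project
mkdir -p ~/SenseVoice_Project/data/raw
cp /path/to/your/audio/*.wav ~/SenseVoice_Project/data/raw/

Option B: Record new audio

Use the built-in macOS Voice Memos app, or install a recording tool:

# Install SoX (audio processing tool)
brew install sox

# Record audio (press Ctrl+C to stop)
rec -r 16000 -c 1 output.wav

Option C: Extract audio from video

# Install ffmpeg
brew install ffmpeg

# Extract the audio track from a video
ffmpeg -i input.mp4 -ar 16000 -ac 1 output.wav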

Step 2: Preprocess the audio

Create the audio preprocessing script:

cat > scripts/preprocess_audio.py << 'EOF'
#!/usr/bin/env python3
"""
Audio preprocessing script.
Resamples, downmixes to mono, and converts to WAV.
"""

import os
import librosa
import soundfile as sf
from pathlib import Path
from tqdm import tqdm

# Resolve paths from the project root so the script works from any directory
PROJECT_ROOT = Path(__file__).resolve().parent.parent

def preprocess_audio(input_path, output_path, target_sr=16000):
    """
    Preprocess a single audio file.

    Args:
        input_path: path of the input audio
        output_path: path of the output audio
        target_sr: target sample rate (default 16 kHz)
    """
    try:
        # Load the audio, resampled to target_sr and downmixed to mono
        audio, sr = librosa.load(input_path, sr=target_sr, mono=True)

        # Save as WAV
        sf.write(output_path, audio, target_sr)
        return True
    except Exception as e:
        print(f"Failed to process {input_path}: {e}")
        return False

def batch_preprocess(input_dir, output_dir, target_sr=16000):
    """
    Preprocess every audio file in a directory.

    Args:
        input_dir: input directory
        output_dir: output directory
        target_sr: target sample rate
    """
    # Create the output directory
    os.makedirs(output_dir, exist_ok=True)

    # Supported audio formats
    audio_extensions = ['.wav', '.mp3', '.m4a', '.flac', '.ogg']

    # Find all audio files (recursively)
    input_path = Path(input_dir)
    audio_files = []
    for ext in audio_extensions:
        audio_files.extend(input_path.glob(f'**/*{ext}'))

    print(f"Found {len(audio_files)} audio files")

    # Process them one by one
    success_count = 0
    for audio_file in tqdm(audio_files, desc="Processing audio"):
        # Output name: same stem, .wav extension
        output_file = Path(output_dir) / f"{audio_file.stem}.wav"

        if preprocess_audio(str(audio_file), str(output_file), target_sr):
            success_count += 1

    print(f"\n✅ Processed {success_count}/{len(audio_files)} files")
    print(f"Output directory: {output_dir}")

if __name__ == "__main__":
    batch_preprocess(
        input_dir=PROJECT_ROOT / "data" / "raw",         # raw audio
        output_dir=PROJECT_ROOT / "data" / "processed",  # preprocessed output
        target_sr=16000                                  # 16 kHz sample rate
    )
EOF

# Run the preprocessing
python scripts/preprocess_audio.py

Step 3: Prepare the text annotations

Annotation format

Each line of an annotation file (for example train.txt) has the form:

audio_filename|transcript

Example:

001.wav|欢迎使用阿里巴巴的通义千问
002.wav|SenseVoice是达摩院开发的语音识别模型
003.wav|我们公司使用ModelScope平台进行模型训练
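
A small parser makes the `filename|text` format concrete and catches malformed lines early. This is a sketch under the format described above; `parse_label_lines` is a hypothetical helper name. It splits on the first `|` only, so a transcript that itself contains `|` is preserved.

```python
def parse_label_lines(lines):
    """Parse 'filename|text' lines into a dict and collect malformed lines.

    Returns (labels, errors), where errors is a list of (line_number, reason).
    """
    labels, errors = {}, []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        name, sep, text = line.partition("|")  # split on the first '|' only
        name, text = name.strip(), text.strip()
        if not sep or not name or not text:
            errors.append((lineno, "expected 'filename|text'"))
        elif name in labels:
            errors.append((lineno, f"duplicate entry for {name}"))
        else:
            labels[name] = text
    return labels, errors
```

For example, `parse_label_lines(open("data/train/text/train.txt", encoding="utf-8"))` returns the mapping plus a list of line numbers to fix.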

Create the annotation file

cat > data/train/text/train.txt << 'EOF'
001.wav|欢迎使用阿里巴巴的通义千问
002.wav|SenseVoice是达摩院开发的语音识别模型
003.wav|我们使用FunASR框架进行训练
004.wav|这是一个专有名词识别的示例
005.wav|Apple M4 Max芯片性能非常强大
EOF

Step 4: Prepare the hotword file

Create the list of proper nouns:

cat > hotwords.txt << 'EOF'
阿里巴巴
达摩院
通义千问
SenseVoice
ModelScope
FunASR
ChatGPT
Transformer
Apple
M4 Max
机器学习
深度学习
神经网络
自然语言处理
语音识别
EOF

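Tips for building the hotword list:

One term per line
Mixed Chinese, English, and digit terms are allowed
Prioritize proper nouns that the model tends to misrecognize
Aim for 50-200 hotwords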
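
The tips above translate into a short loader that strips whitespace, drops blank lines, and removes duplicates while keeping the original order. `load_hotwords` is an illustrative helper, and the 200-term limit is simply the upper bound suggested above, not a limit imposed by the model.

```python
def load_hotwords(lines, max_words=200):
    """Return a de-duplicated list of hotwords, one per input line."""
    seen = set()
    hotwords = []
    for raw in lines:
        word = raw.strip()
        if not word or word in seen:
            continue  # skip blank lines and repeated terms
        seen.add(word)
        hotwords.append(word)
    if len(hotwords) > max_words:
        print(f"Warning: {len(hotwords)} hotwords exceeds the suggested {max_words}")
    return hotwords
```

Keeping the terms in a list (rather than joining them immediately) avoids miscounting multi-word entries such as "M4 Max".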

Step 5: Split the dataset

Create the split script:

cat > scripts/split_dataset.py << 'EOF'
#!/usr/bin/env python3
"""
Dataset split script.
Splits the data into training (80%), validation (10%), and test (10%) sets.
"""

import os
import shutil
import random
from pathlib import Path

# Resolve paths from the project root so the script works from any directory
PROJECT_ROOT = Path(__file__).resolve().parent.parent

def split_dataset(data_dir, train_ratio=0.8, dev_ratio=0.1, test_ratio=0.1, seed=42):
    """
    Split the dataset.

    Args:
        data_dir: data directory
        train_ratio: fraction used for training
        dev_ratio: fraction used for validation
        test_ratio: fraction used for testing
        seed: random seed, so the split is reproducible
    """
    # The ratios must sum to 1
    assert abs(train_ratio + dev_ratio + test_ratio - 1.0) < 0.01

    # Collect all preprocessed audio files (sorted for a stable starting order)
    audio_dir = Path(data_dir) / "processed"
    audio_files = sorted(audio_dir.glob("*.wav"))

    # Shuffle with a fixed seed
    random.seed(seed)
    random.shuffle(audio_files)

    # Compute the split points
    total = len(audio_files)
    train_end = int(total * train_ratio)
    dev_end = train_end + int(total * dev_ratio)

    # Split
    train_files = audio_files[:train_end]
    dev_files = audio_files[train_end:dev_end]
    test_files = audio_files[dev_end:]

    print(f"Total files: {total}")
    print(f"Training set: {len(train_files)}")
    print(f"Validation set: {len(dev_files)}")
    print(f"Test set: {len(test_files)}")

    # Copy the files into the matching directories
    for split, files in [("train", train_files), ("dev", dev_files), ("test", test_files)]:
        target_dir = Path(data_dir) / split / "audio"
        target_dir.mkdir(parents=True, exist_ok=True)

        for audio_file in files:
            shutil.copy(audio_file, target_dir / audio_file.name)

    print("✅ Dataset split complete!")

if __name__ == "__main__":
    split_dataset(PROJECT_ROOT / "data")
EOF

python scripts/split_dataset.py
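
The split script copies only the audio, while the manifest step in Step 6 reads one annotation file per split (`data/<split>/text/<split>.txt`). A minimal sketch for producing those files follows. It assumes all annotations live in a single master file; the function name `write_split_labels` and the master file path are hypothetical and should be adapted to your layout.

```python
from pathlib import Path


def write_split_labels(master_labels, data_dir, splits=("train", "dev", "test")):
    """Write data/<split>/text/<split>.txt with the annotations of that split.

    master_labels: path to a 'filename|text' file covering all audio clips.
    Returns a dict mapping split name to the number of lines written.
    """
    data_dir = Path(data_dir)

    # Read the whole master file first, so it may safely be one of the outputs
    entries = {}
    for line in Path(master_labels).read_text(encoding="utf-8").splitlines():
        name, sep, text = line.strip().partition("|")
        if sep and name.strip() and text.strip():
            entries[name.strip()] = text.strip()

    counts = {}
    for split in splits:
        audio_dir = data_dir / split / "audio"
        # Keep only the clips that were copied into this split and are annotated
        names = sorted(p.name for p in audio_dir.glob("*.wav") if p.name in entries)
        text_dir = data_dir / split / "text"
        text_dir.mkdir(parents=True, exist_ok=True)
        out_file = text_dir / f"{split}.txt"
        out_file.write_text(
            "".join(f"{n}|{entries[n]}\n" for n in names), encoding="utf-8"
        )
        counts[split] = len(names)
    return counts
```

Run it once after the split, for example `write_split_labels("data/train/text/train.txt", "data")` from the project root if the master annotations were written to train.txt as in Step 3.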

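Step 6: Generate the training manifests

Create the manifest script:

cat > scripts/create_manifest.py << 'EOF'
#!/usr/bin/env python3
"""
Generate manifest files (JSONL format).
Note: the field names below are the ones used throughout this tutorial.
FunASR's own fine-tuning recipes may expect different fields, so check
the recipe you use before training.
"""

import json
import os
from pathlib import Path
import librosa

# Resolve paths from the project root so the script works from any directory
PROJECT_ROOT = Path(__file__).resolve().parent.parent

def create_manifest(audio_dir, text_file, output_file):
    """
    Create a manifest.

    Args:
        audio_dir: directory with the audio files
        text_file: annotation file ('filename|text' per line)
        output_file: manifest file to write
    """
    if not Path(text_file).exists():
        print(f"⚠️ Annotation file not found, skipping: {text_file}")
        return

    # Read the annotations (split on the first '|' only)
    text_dict = {}
    with open(text_file, 'r', encoding='utf-8') as f:
        for line in f:
            parts = line.strip().split('|', 1)
            if len(parts) == 2:
                text_dict[parts[0]] = parts[1]

    # Build the manifest
    manifest_data = []
    audio_path = Path(audio_dir)

    for audio_file in sorted(audio_path.glob("*.wav")):
        filename = audio_file.name

        if filename in text_dict:
            # Audio duration in seconds
            duration = librosa.get_duration(path=str(audio_file))

            manifest_data.append({
                "audio_filepath": str(audio_file.absolute()),
                "text": text_dict[filename],
                "duration": round(duration, 2)
            })

    # Save as JSONL
    with open(output_file, 'w', encoding='utf-8') as f:
        for item in manifest_data:
            f.write(json.dumps(item, ensure_ascii=False) + '\n')

    print(f"✅ Wrote manifest: {output_file}")
    print(f"   {len(manifest_data)} entries")

if __name__ == "__main__":
    # One manifest each for the training, validation, and test sets
    data_dir = PROJECT_ROOT / "data"
    for split in ['train', 'dev', 'test']:
        create_manifest(
            audio_dir=data_dir / split / "audio",
            text_file=data_dir / split / "text" / f"{split}.txt",
            output_file=data_dir / f"{split}_manifest.jsonl"
        )
EOF

python scripts/create_manifest.py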

Data preparation checklist

When you are done, check the results:

# Check the file structure
tree data -L 2

# Inspect the manifest
head -n 3 data/train_manifest.jsonl

# Count the entries
wc -l data/train_manifest.jsonl
wc -l data/dev_manifest.jsonl
wc -l data/test_manifest.jsonl
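
Beyond counting lines, each manifest entry can be validated programmatically. The sketch below checks the three fields written by the manifest script in Step 6 (`audio_filepath`, `text`, `duration`); `validate_manifest` is a hypothetical helper name.

```python
import json
import os

REQUIRED_KEYS = ("audio_filepath", "text", "duration")


def validate_manifest(path, check_files=True):
    """Return a list of (line_number, problem) tuples for a JSONL manifest."""
    problems = []
    with open(path, "r", encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                item = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(item, dict):
                problems.append((lineno, "entry is not a JSON object"))
                continue
            missing = [k for k in REQUIRED_KEYS if k not in item]
            if missing:
                problems.append((lineno, "missing keys: " + ", ".join(missing)))
                continue
            if not str(item["text"]).strip():
                problems.append((lineno, "empty text"))
            if check_files and not os.path.exists(item["audio_filepath"]):
                problems.append((lineno, "audio file not found"))
    return problems
```

An empty list from `validate_manifest("data/train_manifest.jsonl")` means every entry is well-formed and points at an existing file.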

🚀 Model Download and Testing

Step 1: Download the pretrained model

Create the download script:

cat > scripts/download_model.py << 'EOF'
#!/usr/bin/env python3
"""
Download the pretrained SenseVoice model
"""

from modelscope import snapshot_download
import os

# Cache directory for the download
cache_dir = os.path.expanduser("~/SenseVoice_Project/models/pretrained")
os.makedirs(cache_dir, exist_ok=True)

print("Downloading the SenseVoice model...")
print("This may take a few minutes, please wait...")

# Download the model
model_dir = snapshot_download(
    'iic/SenseVoiceSmall',  # model ID
    cache_dir=cache_dir,
    revision='master'
)

print(f"\n✅ Model download complete!")
print(f"Model location: {model_dir}")
EOF

python scripts/download_model.py

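Step 2: Test the pretrained model

Create the test script:

cat > scripts/test_pretrained.py << 'EOF'
#!/usr/bin/env python3
"""
Test the pretrained model
"""

import os
from pathlib import Path
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess
import torch

# Resolve paths from the project root so the script works from any directory
PROJECT_ROOT = Path(__file__).resolve().parent.parent

print("="*60)
print("SenseVoice pretrained model test")
print("="*60)

# Detect the device
device = "mps" if torch.backends.mps.is_available() else "cpu"
print(f"\nUsing device: {device}")

# Load the model
print("\nLoading the model...")
model = AutoModel(
    model="iic/SenseVoiceSmall",
    device=device,
    disable_update=True
)
print("✅ Model loaded!")

# Test recognition (requires a test audio file)
print("\n" + "="*60)
print("Testing recognition")
print("="*60)

test_audio = str(PROJECT_ROOT / "data" / "test" / "audio" / "001.wav")

if os.path.exists(test_audio):
    print(f"\nTranscribing: {test_audio}")

    result = model.generate(
        input=test_audio,
        batch_size=1,
    )

    # SenseVoice output carries tags such as <|zh|>; strip them for display
    print(f"\nResult: {rich_transcription_postprocess(result[0]['text'])}")
else:
    print(f"\n⚠️ Test audio not found: {test_audio}")
    print("Prepare a test audio file and run the script again")

print("\n" + "="*60)
print("Test complete!")
print("="*60)
EOF

python scripts/test_pretrained.py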

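Step 3: Test the hotword feature

cat > scripts/test_hotword.py << 'EOF'
#!/usr/bin/env python3
"""
Test hotword recognition.
Note: FunASR passes the hotword argument through to the model. Whether a
given model actually uses it depends on the model, so compare the two
outputs below rather than assuming an effect.
"""

from pathlib import Path
from funasr import AutoModel
import torch

# Resolve paths from the project root so the script works from any directory
PROJECT_ROOT = Path(__file__).resolve().parent.parent

# Device
device = "mps" if torch.backends.mps.is_available() else "cpu"

# Load the model
model = AutoModel(
    model="iic/SenseVoiceSmall",
    device=device
)

# Load the hotwords as one space-separated string, as in the other scripts
with open(PROJECT_ROOT / "hotwords.txt", 'r', encoding='utf-8') as f:
    hotwords = ' '.join(line.strip() for line in f if line.strip())

print("Hotword list:")
print(hotwords)
print("\n" + "="*60)

# Test audio
test_audio = str(PROJECT_ROOT / "data" / "test" / "audio" / "001.wav")

# Without hotwords
print("Without hotwords:")
result1 = model.generate(input=test_audio)
print(result1[0]['text'])

print("\n" + "="*60)

# With hotwords
print("With hotwords:")
result2 = model.generate(
    input=test_audio,
    hotword=hotwords
)
print(result2[0]['text'])

print("\n" + "="*60)
print("Compare the two results to see the effect of the hotwords")
EOF

python scripts/test_hotword.py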
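
Comparing the two transcripts by eye gets tedious with a long hotword list. A minimal sketch of an automatic comparison follows; both function names are hypothetical, and matching is a plain case-insensitive substring test, which is a simplification.

```python
def hotword_hits(transcript, hotwords):
    """Return the hotwords that appear verbatim (case-insensitive) in the text."""
    lowered = transcript.lower()
    return [w for w in hotwords if w and w.lower() in lowered]


def compare_hotword_hits(baseline, biased, hotwords):
    """Report which hotwords were gained, lost, or kept between two transcripts."""
    before = set(hotword_hits(baseline, hotwords))
    after = set(hotword_hits(biased, hotwords))
    return {
        "gained": sorted(after - before),
        "lost": sorted(before - after),
        "kept": sorted(before & after),
    }
```

Feed it the text printed for the run without hotwords and the run with hotwords. An empty "gained" list means the hotword list made no visible difference on that clip.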

🎯 Model Training

Training configuration file

Create the configuration file:

cat > config.yaml << 'EOF'
# SenseVoice training configuration

# Model
model:
  name: "iic/SenseVoiceSmall"
  device: "mps"  # Apple Silicon uses MPS

# Data
data:
  train_manifest: "data/train_manifest.jsonl"
  dev_manifest: "data/dev_manifest.jsonl"
  test_manifest: "data/test_manifest.jsonl"
  hotwords: "hotwords.txt"

# Training parameters
training:
  output_dir: "output"
  batch_size: 4          # 4-8 works with 64GB of memory; reduce for 32GB
  num_epochs: 20
  learning_rate: 1.0e-5
  warmup_steps: 100
  gradient_accumulation_steps: 2  # gradient accumulation
  save_steps: 500        # save a checkpoint every 500 steps
  logging_steps: 50      # log every 50 steps

# Hotwords
hotword:
  weight: 10.0           # hotword weight (1-20)

# Optimizer
optimizer:
  type: "AdamW"
  weight_decay: 0.01

# Learning rate schedule
scheduler:
  type: "cosine"
  num_warmup_steps: 100
EOF
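
To make the numbers in this configuration concrete, the sketch below computes the effective batch size and a learning rate curve with linear warmup followed by cosine decay. This is the common textbook form of a "cosine" schedule and is an assumption here: the exact formula used by a given training framework may differ. The function names and the `total_steps` value are illustrative.

```python
import math


def effective_batch_size(batch_size=4, gradient_accumulation_steps=2):
    """Number of samples that contribute to each optimizer update."""
    return batch_size * gradient_accumulation_steps


def steps_per_epoch(num_samples, batch_size=4, gradient_accumulation_steps=2):
    """Optimizer updates per epoch, counting a final partial batch."""
    return math.ceil(num_samples / effective_batch_size(batch_size, gradient_accumulation_steps))


def lr_at_step(step, base_lr=1.0e-5, warmup_steps=100, total_steps=1000):
    """Linear warmup to base_lr, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

With the values above, each update sees 4 x 2 = 8 samples, so 500 training clips give 63 updates per epoch, and the learning rate reaches its peak of 1.0e-5 at step 100.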

Full training script

cat > scripts/train.py << 'EOF'
#!/usr/bin/env python3
"""
SenseVoice training script (tuned for Apple Silicon)
"""

import os
import yaml
import torch
import json
from pathlib import Path
from datetime import datetime
from funasr import AutoModel
from tqdm import tqdm

class SenseVoiceTrainer:
    def __init__(self, config_path="config.yaml"):
        # Load the configuration
        with open(config_path, 'r', encoding='utf-8') as f:
            self.config = yaml.safe_load(f)

        # Create the output directory and the log file first:
        # _setup_device() calls self.log(), which needs self.log_file
        self.output_dir = self.config['training']['output_dir']
        os.makedirs(self.output_dir, exist_ok=True)
        self.log_file = f"{self.output_dir}/training_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"

        # Select the device
        self.device = self._setup_device()

        self.log("="*60)
        self.log("SenseVoice training started")
        self.log("="*60)
        self.log(f"Config file: {config_path}")
        self.log(f"Output directory: {self.output_dir}")
        self.log(f"Device: {self.device}")
    
    def _setup_device(self):
        """Select the compute device"""
        if torch.backends.mps.is_available():
            device = "mps"
            self.log("✅ Using Apple MPS acceleration")
        elif torch.cuda.is_available():
            device = "cuda"
            self.log("✅ Using CUDA acceleration")
        else:
            device = "cpu"
            self.log("⚠️ Using CPU (slow)")
        return device

    def log(self, message):
        """Print a message and append it to the log file"""
        print(message)
        with open(self.log_file, 'a', encoding='utf-8') as f:
            f.write(f"{message}\n")

    def load_model(self):
        """Load the model"""
        self.log("\nLoading the model...")

        model_config = self.config['model']
        self.model = AutoModel(
            model=model_config['name'],
            device=self.device,
            disable_update=True,
            disable_pbar=True
        )

        self.log("✅ Model loaded")

    def load_hotwords(self):
        """Load the hotwords"""
        hotword_file = self.config['data']['hotwords']

        if os.path.exists(hotword_file):
            with open(hotword_file, 'r', encoding='utf-8') as f:
                words = [line.strip() for line in f if line.strip()]
            self.hotwords = ' '.join(words)

            # Count lines, not space-separated tokens ("M4 Max" is one hotword)
            self.log(f"✅ Loaded {len(words)} hotwords")
        else:
            self.hotwords = ""
            self.log("⚠️ Hotword file not found")

    def load_data(self):
        """Report the size of each dataset"""
        self.log("\nLoading data...")

        data_config = self.config['data']

        # Count the entries per split
        for split in ['train', 'dev', 'test']:
            manifest_file = data_config[f'{split}_manifest']
            if os.path.exists(manifest_file):
                with open(manifest_file, 'r', encoding='utf-8') as f:
                    count = sum(1 for _ in f)
                self.log(f"{split.capitalize():5s} set: {count:4d} entries")
            else:
                self.log(f"⚠️ No data found for the {split} set")

    def train(self):
        """Train the model"""
        self.log("\n" + "="*60)
        self.log("Training")
        self.log("="*60)

        training_config = self.config['training']
        data_config = self.config['data']

        try:
            # Assumes the installed FunASR version exposes AutoModel.finetune
            # with these arguments; if it does not, the AttributeError branch runs
            self.model.finetune(
                train_data=data_config['train_manifest'],
                dev_data=data_config.get('dev_manifest'),
                output_dir=self.output_dir,
                batch_size=training_config['batch_size'],
                max_epoch=training_config['num_epochs'],
                lr=training_config['learning_rate'],
                warmup_steps=training_config['warmup_steps'],
                save_checkpoint_steps=training_config['save_steps'],
                log_interval=training_config['logging_steps'],
                hotword=self.hotwords if self.hotwords else None,
            )

            self.log("\n✅ Training complete!")

        except AttributeError:
            # finetune is not available: fall back to the simplified flow
            self.log("⚠️ FunASR finetune is not available, using the simplified training flow")
            self.simple_train()
        except Exception as e:
            self.log(f"\n❌ Training failed: {e}")
            import traceback
            self.log(traceback.format_exc())

    def simple_train(self):
        """Simplified training flow (fallback, not implemented)"""
        self.log("\nUsing the simplified training flow...")

        # A simple training loop, or another training method,
        # could be implemented here

        self.log("⚠️ The simplified training flow must be implemented for your setup")
        self.log("Suggestion: use the full FunASR training flow")
    def evaluate(self):
        """Evaluate the model"""
        # Local import so this method works on its own
        from funasr.utils.postprocess_utils import rich_transcription_postprocess

        self.log("\n" + "="*60)
        self.log("Model evaluation")
        self.log("="*60)

        test_manifest = self.config['data']['test_manifest']

        if not os.path.exists(test_manifest):
            self.log("⚠️ Test set not found")
            return

        # Load the test data
        with open(test_manifest, 'r', encoding='utf-8') as f:
            test_data = [json.loads(line) for line in f if line.strip()]

        self.log(f"Test samples: {len(test_data)}")
        if not test_data:
            self.log("⚠️ Test set is empty")
            return

        # Evaluate
        correct = 0
        total = len(test_data)

        for item in tqdm(test_data, desc="Evaluating"):
            result = self.model.generate(
                input=item['audio_filepath'],
                hotword=self.hotwords,
                batch_size=1,
            )

            # Strip SenseVoice tags such as <|zh|> before comparing
            predicted = rich_transcription_postprocess(result[0]['text'])
            reference = item['text']

            if predicted == reference:
                correct += 1

        accuracy = correct / total * 100
        self.log(f"\nExact-match accuracy: {accuracy:.2f}%")

    def save_config(self):
        """Save the training configuration"""
        config_save_path = f"{self.output_dir}/config.yaml"
        with open(config_save_path, 'w', encoding='utf-8') as f:
            yaml.dump(self.config, f, allow_unicode=True)
        self.log(f"Config saved: {config_save_path}")

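def main():
    # Create the trainer. Paths in config.yaml are relative to the
    # project root, so run this script from ~/SenseVoice_Project
    trainer = SenseVoiceTrainer("config.yaml")

    # Load the model
    trainer.load_model()

    # Load the hotwords
    trainer.load_hotwords()

    # Load the data
    trainer.load_data()

    # Save the configuration
    trainer.save_config()

    # Train
    trainer.train()

    # Evaluate
    # trainer.evaluate()

if __name__ == "__main__":
    main()
EOF

chmod +x scripts/train.py

Start training

# Make sure you are in the project root
cd ~/SenseVoice_Project

# Activate the environment
conda activate sensevoice

# Start training
python scripts/train.py

# While training runs, follow the log from another terminal
tail -f output/training_*.log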
Monitoring training

Create the monitoring script:

cat > scripts/monitor.py << 'EOF'
#!/usr/bin/env python3
"""
Resource monitor for training runs (requires: pip install psutil)
"""

import psutil
import time
import os

def monitor():
    print("="*60)
    print("System resource monitor")
    print("="*60)

    while True:
        # CPU
        cpu_percent = psutil.cpu_percent(interval=1)

        # Memory
        mem = psutil.virtual_memory()
        mem_used_gb = mem.used / 1024**3
        mem_total_gb = mem.total / 1024**3
        mem_percent = mem.percent

        # Display on a single, continuously updated line
        print(f"\r"
              f"CPU: {cpu_percent:5.1f}% | "
              f"Memory: {mem_used_gb:.1f}/{mem_total_gb:.1f}GB ({mem_percent:.1f}%)",
              end='', flush=True)

        time.sleep(2)

if __name__ == "__main__":
    try:
        monitor()
    except KeyboardInterrupt:
        print("\nMonitoring stopped")
EOF

# Run in another terminal
python scripts/monitor.py

📈 Model Evaluation

Evaluation script

cat > scripts/evaluate.py << 'EOF'
#!/usr/bin/env python3
"""
Model evaluation script
"""

import json
import os
from pathlib import Path
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess
from tqdm import tqdm
import torch
from jiwer import wer, cer  # requires: pip install jiwer

# Resolve paths from the project root so the script works from any directory
PROJECT_ROOT = Path(__file__).resolve().parent.parent

class ModelEvaluator:
    def __init__(self, model_path, hotwords_path=None):
        # Device
        self.device = "mps" if torch.backends.mps.is_available() else "cpu"

        # Short name used to keep result files of different models apart
        self.name = Path(str(model_path)).name

        # Load the model
        print(f"Loading model: {model_path}")
        self.model = AutoModel(
            model=str(model_path),
            device=self.device,
            disable_pbar=True
        )

        # Load the hotwords
        self.hotwords = ""
        if hotwords_path and os.path.exists(hotwords_path):
            with open(hotwords_path, 'r', encoding='utf-8') as f:
                self.hotwords = ' '.join(line.strip() for line in f if line.strip())

    def evaluate(self, test_manifest):
        """Evaluate the model"""
        test_manifest = str(test_manifest)

        # Load the test data
        with open(test_manifest, 'r', encoding='utf-8') as f:
            test_data = [json.loads(line) for line in f if line.strip()]

        print(f"\nTest samples: {len(test_data)}")
        print("="*60)
        if not test_data:
            print("⚠️ Test set is empty")
            return None

        # Collected results
        predictions = []
        references = []

        # Evaluate one sample at a time
        for item in tqdm(test_data, desc="Evaluating"):
            # Recognize
            result = self.model.generate(
                input=item['audio_filepath'],
                hotword=self.hotwords,
                batch_size=1,
            )

            # Strip SenseVoice tags such as <|zh|> before comparing
            pred_text = rich_transcription_postprocess(result[0]['text'])
            ref_text = item['text']

            predictions.append(pred_text)
            references.append(ref_text)

            # Show the first few results
            if len(predictions) <= 5:
                print(f"\nReference:  {ref_text}")
                print(f"Prediction: {pred_text}")
                print(f"Match: {'✅' if pred_text == ref_text else '❌'}")

        # Compute the metrics
        print("\n" + "="*60)
        print("Evaluation results")
        print("="*60)

        # Exact-match rate
        exact_match = sum(p == r for p, r in zip(predictions, references))
        exact_match_rate = exact_match / len(predictions) * 100

        print(f"Exact-match rate: {exact_match_rate:.2f}% ({exact_match}/{len(predictions)})")

        # WER (word error rate). jiwer splits on whitespace, so for Chinese
        # text without spaces a whole sentence counts as one word
        try:
            word_error_rate = wer(references, predictions) * 100
            print(f"Word error rate (WER): {word_error_rate:.2f}%")
        except Exception as e:
            print(f"Could not compute WER: {e}")

        # CER (character error rate): more meaningful for Chinese
        try:
            char_error_rate = cer(references, predictions) * 100
            print(f"Character error rate (CER): {char_error_rate:.2f}%")
        except Exception as e:
            print(f"Could not compute CER: {e}")

        # Save the detailed results
        self.save_results(predictions, references, test_manifest)

        return {
            'exact_match_rate': exact_match_rate,
            'total': len(predictions),
            'correct': exact_match
        }

    def save_results(self, predictions, references, test_manifest):
        """Save the evaluation results"""
        # One file per model, so a second evaluation does not overwrite the first
        output_file = test_manifest.replace('.jsonl', f'_results_{self.name}.json')

        results = []
        for pred, ref in zip(predictions, references):
            results.append({
                'prediction': pred,
                'reference': ref,
                'correct': pred == ref
            })

        with open(output_file, 'w', encoding='utf-8') as f:
            json.dump(results, f, ensure_ascii=False, indent=2)

        print(f"\nDetailed results saved: {output_file}")

if __name__ == "__main__":
    hotwords_file = PROJECT_ROOT / "hotwords.txt"
    test_manifest = PROJECT_ROOT / "data" / "test_manifest.jsonl"

    # Evaluate the pretrained model
    print("Evaluating the pretrained model:")
    evaluator1 = ModelEvaluator(
        model_path="iic/SenseVoiceSmall",
        hotwords_path=hotwords_file
    )
    evaluator1.evaluate(test_manifest)

    # Evaluate the fine-tuned model (if it exists)
    finetuned_model = PROJECT_ROOT / "output" / "final_model"
    if os.path.exists(finetuned_model):
        print("\n" + "="*60)
        print("Evaluating the fine-tuned model:")
        evaluator2 = ModelEvaluator(
            model_path=finetuned_model,
            hotwords_path=hotwords_file
        )
        evaluator2.evaluate(test_manifest)
EOF

# Install the evaluation tool
pip install jiwer

# Run the evaluation
python scripts/evaluate.py
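
To show what the character error rate measures, here is a dependency-free sketch: CER is the character-level edit distance (substitutions, insertions, deletions) divided by the number of reference characters. The function names are illustrative, and the evaluation script itself relies on jiwer rather than on this code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def char_error_rate(references, hypotheses):
    """Total character edits divided by total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars if total_chars else 0.0
```

For example, recognizing 通义千问 as 通义千文 is one substitution in four characters, a CER of 25%, whereas the exact-match rate would count the whole sentence as wrong.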

🚢 Model Deployment

Option 1: Command-line tool

cat > scripts/cli.py << 'EOF'
#!/usr/bin/env python3
"""
SenseVoice command-line recognition tool
"""

import argparse
import os
import torch
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess
import json
from datetime import datetime

def main():
    parser = argparse.ArgumentParser(
        description='SenseVoice speech recognition command-line tool',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python cli.py -a audio.wav                    # basic recognition
  python cli.py -a audio.wav -hw hotwords.txt   # with hotwords
  python cli.py -a audio.wav -o result.json     # save the result
  python cli.py -a audio.wav -m ./my_model      # use a custom model
        """
    )

    parser.add_argument('-a', '--audio', required=True,
                       help='path to the audio file')
    parser.add_argument('-m', '--model',
                       default='iic/SenseVoiceSmall',
                       help='model path (default: iic/SenseVoiceSmall)')
    parser.add_argument('-hw', '--hotwords',
                       help='path to the hotword file')
    parser.add_argument('-o', '--output',
                       help='output file path (JSON)')
    parser.add_argument('-v', '--verbose',
                       action='store_true',
                       help='show detailed information')

    args = parser.parse_args()

    # Check the audio file
    if not os.path.exists(args.audio):
        print(f"❌ Audio file not found: {args.audio}")
        return

    # Detect the device
    device = "mps" if torch.backends.mps.is_available() else "cpu"
    if args.verbose:
        print(f"Using device: {device}")

    # Load the model
    if args.verbose:
        print(f"Loading model: {args.model}")

    model = AutoModel(
        model=args.model,
        device=device,
        disable_pbar=not args.verbose,
        disable_log=not args.verbose
    )

    # Load the hotwords (one per line, blank lines ignored)
    hotwords = ""
    if args.hotwords and os.path.exists(args.hotwords):
        with open(args.hotwords, 'r', encoding='utf-8') as f:
            words = [line.strip() for line in f if line.strip()]
        hotwords = ' '.join(words)
        if args.verbose:
            print(f"Loaded {len(words)} hotwords")

    # Recognize
    if args.verbose:
        print(f"\nTranscribing: {args.audio}")

    result = model.generate(
        input=args.audio,
        hotword=hotwords if hotwords else None,
        batch_size=1,
    )

    # Show the result (SenseVoice tags such as <|zh|> are stripped)
    print("\n" + "="*60)
    print("Result:")
    print("="*60)
    print(rich_transcription_postprocess(result[0]['text']))
    print("="*60)

    # Save the result
    if args.output:
        output_data = {
            'timestamp': datetime.now().isoformat(),
            'audio_file': args.audio,
            'model': args.model,
            'result': result[0]
        }

        with open(args.output, 'w', encoding='utf-8') as f:
            json.dump(output_data, f, ensure_ascii=False, indent=2)

        print(f"\n✅ Result saved: {args.output}")

if __name__ == '__main__':
    main()
EOF

chmod +x scripts/cli.py

# Example
# python scripts/cli.py -a data/test/audio/001.wav -hw hotwords.txt

Option 2: Web API service

cat > scripts/api_server.py << 'EOF'
#!/usr/bin/env python3
"""
SenseVoice web API service
"""

from flask import Flask, request, jsonify, render_template_string
from funasr import AutoModel
import torch
import os
from werkzeug.utils import secure_filename
import uuid

app = Flask(__name__)
app.config['MAX_CONTENT_LENGTH'] = 50 * 1024 * 1024  # 50MB upload limit
app.config['UPLOAD_FOLDER'] = '/tmp/sensevoice_uploads'

# Create the upload directory
os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)

# Globals
MODEL = None
HOTWORDS = ""

def init_model(model_path='iic/SenseVoiceSmall', hotwords_path=None):
    """Initialize the model"""
    global MODEL, HOTWORDS

    device = "mps" if torch.backends.mps.is_available() else "cpu"
    print(f"🚀 Initializing the model on device: {device}")

    MODEL = AutoModel(
        model=model_path,
        device=device,
        disable_pbar=True,
        disable_log=True
    )

    # Load the hotwords
    if hotwords_path and os.path.exists(hotwords_path):
        with open(hotwords_path, 'r', encoding='utf-8') as f:
            HOTWORDS = ' '.join(line.strip() for line in f if line.strip())
        print(f"✅ Loaded {len(HOTWORDS.split())} hotwords")

    print("✅ Model initialized")
# HTML front-end page
HTML_TEMPLATE = """
<!DOCTYPE html>
<html>
<head>
    <title>SenseVoice Speech Recognition</title>
    <meta charset="utf-8">
    <style>
        body {
            font-family: Arial, sans-serif;
            max-width: 800px;
            margin: 50px auto;
            padding: 20px;
        }
        h1 { color: #333; }
        .upload-area {
            border: 2px dashed #ccc;
            border-radius: 10px;
            padding: 40px;
            text-align: center;
            margin: 20px 0;
        }
        .result {
            background: #f5f5f5;
            padding: 20px;
            border-radius: 5px;
            margin: 20px 0;
            min-height: 100px;
        }
        button {
            background: #007bff;
            color: white;
            border: none;
            padding: 10px 20px;
            border-radius: 5px;
            cursor: pointer;
            font-size: 16px;
        }
        button:hover { background: #0056b3; }
        .loading { display: none; }
    </style>
</head>
<body>
    <h1>🎤 SenseVoice Speech Recognition</h1>
    
    <div class="upload-area">
        <input type="file" id="audioFile" accept="audio/*">
        <br><br>
        <button onclick="transcribe()">Transcribe</button>
        <div class="loading" id="loading">Transcribing...</div>
    </div>
    
    <div class="result" id="result">
        <strong>Result:</strong>
        <p id="resultText">Waiting for an audio upload...</p>
    </div>
    
    <script>
        async function transcribe() {
            const fileInput = document.getElementById('audioFile');
            const file = fileInput.files[0];
            
            if (!file) {
                alert('Please choose an audio file first');
                return;
            }
            
            const formData = new FormData();
            formData.append('audio', file);
            
            document.getElementById('loading').style.display = 'block';
            document.getElementById('resultText').textContent = 'Transcribing...';
            
            try {
                const response = await fetch('/api/transcribe', {
                    method: 'POST',
                    body: formData
                });
                
                const data = await response.json();
                
                if (data.success) {
                    document.getElementById('resultText').textContent = data.text;
                } else {
                    document.getElementById('resultText').textContent = 'Error: ' + data.error;
                }
            } catch (error) {
                document.getElementById('resultText').textContent = 'Error: ' + error;
            } finally {
                document.getElementById('loading').style.display = 'none';
            }
        }
    </script>
</body>
</html>
"""

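@app.route('/')
def index():
    """Home page."""
    return render_template_string(HTML_TEMPLATE)

@app.route('/api/transcribe', methods=['POST'])
def transcribe():
    """Transcription endpoint."""
    filepath = None
    try:
        # Validate the upload
        if 'audio' not in request.files:
            return jsonify({'success': False, 'error': 'No audio file provided'}), 400

        file = request.files['audio']
        if file.filename == '':
            return jsonify({'success': False, 'error': 'Empty filename'}), 400

        # Save the upload under a unique temporary name
        filename = secure_filename(file.filename)
        unique_filename = f"{uuid.uuid4()}_{filename}"
        filepath = os.path.join(app.config['UPLOAD_FOLDER'], unique_filename)
        file.save(filepath)

        # Transcribe
        result = MODEL.generate(
            input=filepath,
            hotword=HOTWORDS if HOTWORDS else None,
            batch_size=1,
        )

        # Return the result
        return jsonify({
            'success': True,
            'text': result[0]['text'],
            'details': result[0]
        })

    except Exception as e:
        return jsonify({'success': False, 'error': str(e)}), 500

    finally:
        # Always delete the temporary file, even when transcription fails
        if filepath and os.path.exists(filepath):
            os.remove(filepath)

@app.route('/api/health')
def health():
    """Health check."""
    device = "mps" if torch.backends.mps.is_available() else "cpu"
    return jsonify({
        'status': 'healthy',
        'device': device,
        'model_loaded': MODEL is not None,
        'hotwords_count': len(HOTWORDS.split()) if HOTWORDS else 0
    })

if __name__ == '__main__':
    # Resolve hotwords.txt relative to this script (scripts/ -> project root),
    # so the server works no matter which directory it is started from
    project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

    # Initialize the model
    init_model(
        model_path='iic/SenseVoiceSmall',  # or the path to your fine-tuned model
        hotwords_path=os.path.join(project_root, 'hotwords.txt')
    )

    # Start the server
    print("\n" + "="*60)
    print("🚀 SenseVoice API server starting")
    print("="*60)
    print("Address: http://localhost:5000")
    print("Health check: http://localhost:5000/api/health")
    print("="*60 + "\n")

    # Note: on macOS 12 and later, AirPlay Receiver may already occupy port 5000.
    # If binding fails, change the port here or turn AirPlay Receiver off.
    app.run(
        host='0.0.0.0',  # listens on all interfaces; use '127.0.0.1' for local-only access
        port=5000,
        debug=False  # keep False in production
    )
EOF

# Start the server
python scripts/api_server.py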
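
To exercise the server from a script rather than the browser page, a small standard-library client can help. The sketch below assumes the server above is running on `localhost:5000` and accepts the file in the `audio` form field; `build_multipart` and `transcribe_file` are hypothetical helper names, not part of FunASR or Flask.

```python
import json
import os
import uuid
import urllib.request


def build_multipart(field_name, filename, payload, content_type="audio/wav"):
    """Encode one file as multipart/form-data; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def transcribe_file(path, url="http://localhost:5000/api/transcribe"):
    """POST an audio file to the server and return the decoded JSON reply."""
    with open(path, "rb") as f:
        body, content_type = build_multipart("audio", os.path.basename(path), f.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    # urlopen raises urllib.error.HTTPError for the 400/500 replies of the server
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `transcribe_file("meeting.wav")["text"]` would return the recognized text. File names containing double quotes are not escaped in this sketch.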

Option 3: Batch processing tool

bash

cat > scripts/batch_process.py << 'EOF'
#!/usr/bin/env python3
"""
Batch-transcribe audio files.
"""

import os
import json
from pathlib import Path
from funasr import AutoModel
import torch
from tqdm import tqdm
from datetime import datetime

class BatchProcessor:
    def __init__(self, model_path, hotwords_path=None):
        # Pick the device
        device = "mps" if torch.backends.mps.is_available() else "cpu"
        print(f"Using device: {device}")

        # Load the model
        print("Loading model...")
        self.model = AutoModel(
            model=model_path,
            device=device,
            ncpu=10  # CPU threads; chosen for a many-core chip such as M4 Max
        )

        # Load hotwords (one per line, blank lines skipped)
        self.hotwords = ""
        if hotwords_path and os.path.exists(hotwords_path):
            with open(hotwords_path, 'r', encoding='utf-8') as f:
                self.hotwords = ' '.join(line.strip() for line in f if line.strip())

    def process_directory(self, input_dir, output_file):
        """Transcribe every audio file under a directory."""
        # Find audio files (recursively), in a stable order
        audio_files = []
        for ext in ['.wav', '.mp3', '.m4a']:
            audio_files.extend(Path(input_dir).glob(f'**/*{ext}'))
        audio_files.sort()

        print(f"\nFound {len(audio_files)} audio files")

        # Process each file; a failure is recorded and does not stop the run
        results = []
        for audio_file in tqdm(audio_files, desc="Processing"):
            try:
                result = self.model.generate(
                    input=str(audio_file),
                    hotword=self.hotwords if self.hotwords else None,
                    batch_size=1,
                )

                results.append({
                    'file': str(audio_file),
                    'filename': audio_file.name,
                    'text': result[0]['text'],
                    'success': True,
                    'timestamp': datetime.now().isoformat()
                })
            except Exception as e:
                results.append({
                    'file': str(audio_file),
                    'filename': audio_file.name,
                    'error': str(e),
                    'success': False,
                    'timestamp': datetime.now().isoformat()
                })

        # Save the results
        with open(output_file, 'w', encoding='utf-8') as f:
            json.dump(results, f, ensure_ascii=False, indent=2)

        # Summary
        success = sum(1 for r in results if r['success'])
        print("\n✅ Processing complete!")
        print(f"   Succeeded: {success}/{len(results)}")
        print(f"   Results saved to: {output_file}")

        return results

if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description='Batch audio transcription')
    parser.add_argument('-i', '--input', required=True, help='input directory')
    parser.add_argument('-o', '--output', required=True, help='output JSON file')
    parser.add_argument('-m', '--model', default='iic/SenseVoiceSmall', help='model ID or path')
    parser.add_argument('-hw', '--hotwords', help='hotwords file')

    args = parser.parse_args()

    processor = BatchProcessor(
        model_path=args.model,
        hotwords_path=args.hotwords
    )

    processor.process_directory(
        input_dir=args.input,
        output_file=args.output
    )
EOF

# Usage example
# python scripts/batch_process.py -i data/test/audio -o results.json -hw hotwords.txt
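
After a batch run it is often useful to see which files failed and why. The sketch below reads the JSON written by `batch_process.py` and relies only on the `success`, `filename`, and `error` keys that the script stores; the function names `summarize_results` and `summarize_file` are illustrative.

```python
import json
from collections import Counter


def summarize_results(results):
    """Return success/failure counts and the failed files grouped by error message."""
    failed = [r for r in results if not r.get("success")]
    return {
        "total": len(results),
        "succeeded": len(results) - len(failed),
        "failed_files": [r["filename"] for r in failed],
        "errors": dict(Counter(r.get("error", "unknown") for r in failed)),
    }


def summarize_file(path):
    """Load a results file produced by batch_process.py and summarize it."""
    with open(path, "r", encoding="utf-8") as f:
        return summarize_results(json.load(f))
```

For example, `summarize_file("results.json")["failed_files"]` lists the files worth re-running.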

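💡 Hands-on examples

Example 1: Meeting recording transcription

bash

cat > scripts/meeting_transcribe.py << 'EOF'
#!/usr/bin/env python3
"""
Meeting recording transcription.
Splits long audio into segments and transcribes each one.
"""

from funasr import AutoModel
import torch
from pydub import AudioSegment
from pydub.silence import split_on_silence
import os
import tempfile

class MeetingTranscriber:
    def __init__(self, model_path, hotwords_path=None):
        device = "mps" if torch.backends.mps.is_available() else "cpu"

        self.model = AutoModel(
            model=model_path,
            device=device
        )

        # Load hotwords (one per line, blank lines skipped)
        self.hotwords = ""
        if hotwords_path and os.path.exists(hotwords_path):
            with open(hotwords_path, 'r', encoding='utf-8') as f:
                self.hotwords = ' '.join(line.strip() for line in f if line.strip())

    def split_audio(self, audio_file, chunk_length=30000):
        """
        Split long audio into segments.
        chunk_length: maximum segment length in milliseconds (default: 30 s)
        """
        audio = AudioSegment.from_file(audio_file)

        # Split on silence first
        pieces = split_on_silence(
            audio,
            min_silence_len=500,  # minimum silence length (ms)
            silence_thresh=-40,   # silence threshold (dBFS)
            keep_silence=200      # silence kept at each edge (ms)
        )

        # If nothing was found (e.g. the input is all silence), use the whole file
        if not pieces:
            pieces = [audio]

        # Cut any piece longer than chunk_length into fixed-length chunks
        chunks = []
        for piece in pieces:
            for start in range(0, len(piece), chunk_length):
                chunks.append(piece[start:start + chunk_length])

        return chunks

    def transcribe_meeting(self, audio_file, output_file):
        """Transcribe a meeting recording."""
        print(f"Processing meeting recording: {audio_file}")

        # Split the audio
        print("Splitting audio...")
        chunks = self.split_audio(audio_file)
        print(f"Split into {len(chunks)} segments")

        # Transcribe each segment; the temporary directory is removed
        # automatically, even if recognition fails midway
        full_text = []
        with tempfile.TemporaryDirectory(prefix="chunks_") as temp_dir:
            for i, chunk in enumerate(chunks, 1):
                # Save the segment as a temporary WAV file
                chunk_file = os.path.join(temp_dir, f"chunk_{i:03d}.wav")
                chunk.export(chunk_file, format="wav")

                # Transcribe
                result = self.model.generate(
                    input=chunk_file,
                    hotword=self.hotwords if self.hotwords else None,
                    batch_size=1
                )

                text = result[0]['text']
                full_text.append(f"[{i:03d}] {text}")

                print(f"Segment {i}/{len(chunks)}: {text}")

        # Save the transcript
        with open(output_file, 'w', encoding='utf-8') as f:
            f.write('\n'.join(full_text))

        print(f"\n✅ Transcription complete, saved to: {output_file}")

if __name__ == "__main__":
    # Resolve hotwords.txt relative to this script (scripts/ -> project root)
    project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

    transcriber = MeetingTranscriber(
        model_path="iic/SenseVoiceSmall",
        hotwords_path=os.path.join(project_root, "hotwords.txt")
    )

    transcriber.transcribe_meeting(
        audio_file="meeting.wav",
        output_file="meeting_transcript.txt"
    )
EOF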
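
The transcript above labels segments only by index. If time ranges are wanted instead, one option is to collect the segment boundaries in milliseconds (for example from pydub's `detect_nonsilent`, which returns `[start, end]` pairs) and format them. The sketch below covers only the formatting step; `format_timestamp` and `label_segments` are hypothetical helpers, and wiring them into `MeetingTranscriber` is left as an assumption.

```python
def format_timestamp(ms):
    """Convert milliseconds to an HH:MM:SS string (fractions of a second are dropped)."""
    total_seconds = ms // 1000
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def label_segments(ranges_ms, texts):
    """Pair each (start_ms, end_ms) range with its text as one transcript line."""
    if len(ranges_ms) != len(texts):
        raise ValueError("ranges_ms and texts must have the same length")
    return [
        f"[{format_timestamp(start)} - {format_timestamp(end)}] {text}"
        for (start, end), text in zip(ranges_ms, texts)
    ]
```

Joining the returned lines with `"\n"` gives a transcript in the same shape as the one the script writes, with time ranges in place of segment numbers.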

Example 2: Real-time speech recognition

bash

cat > scripts/realtime_transcribe.py << 'EOF'
#!/usr/bin/env python3
"""
Real-time speech recognition (from the microphone).
Requires: pip install pyaudio
(on macOS, install PortAudio first: brew install portaudio)
"""

import pyaudio
import wave
import os
from funasr import AutoModel
import torch

class RealtimeTranscriber:
    def __init__(self, model_path, hotwords_path=None):
        device = "mps" if torch.backends.mps.is_available() else "cpu"

        self.model = AutoModel(
            model=model_path,
            device=device
        )

        # Load hotwords (one per line, blank lines skipped)
        self.hotwords = ""
        if hotwords_path and os.path.exists(hotwords_path):
            with open(hotwords_path, 'r', encoding='utf-8') as f:
                self.hotwords = ' '.join(line.strip() for line in f if line.strip())

        # Audio parameters
        self.CHUNK = 1024
        self.FORMAT = pyaudio.paInt16
        self.CHANNELS = 1
        self.RATE = 16000
        self.RECORD_SECONDS = 3  # transcribe once every 3 seconds

    def record_and_transcribe(self):
        """Record from the microphone and transcribe in a loop."""
        p = pyaudio.PyAudio()
        temp_file = "temp_realtime.wav"

        print("Starting real-time recognition...")
        print("Press Ctrl+C to stop")

        try:
            while True:
                # Record one window
                stream = p.open(
                    format=self.FORMAT,
                    channels=self.CHANNELS,
                    rate=self.RATE,
                    input=True,
                    frames_per_buffer=self.CHUNK
                )

                print("\n🎤 Recording...")
                frames = []

                for _ in range(int(self.RATE / self.CHUNK * self.RECORD_SECONDS)):
                    # Tolerate input-buffer overflows instead of raising
                    data = stream.read(self.CHUNK, exception_on_overflow=False)
                    frames.append(data)

                stream.stop_stream()
                stream.close()

                # Save the window to a temporary WAV file
                with wave.open(temp_file, 'wb') as wf:
                    wf.setnchannels(self.CHANNELS)
                    wf.setsampwidth(p.get_sample_size(self.FORMAT))
                    wf.setframerate(self.RATE)
                    wf.writeframes(b''.join(frames))

                # Transcribe (audio spoken during this step is not captured)
                result = self.model.generate(
                    input=temp_file,
                    hotword=self.hotwords if self.hotwords else None,
                    batch_size=1
                )

                print(f"📝 {result[0]['text']}")

        except KeyboardInterrupt:
            print("\nRecognition stopped")
        finally:
            p.terminate()
            if os.path.exists(temp_file):
                os.remove(temp_file)

if __name__ == "__main__":
    # Resolve hotwords.txt relative to this script (scripts/ -> project root)
    project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

    transcriber = RealtimeTranscriber(
        model_path="iic/SenseVoiceSmall",
        hotwords_path=os.path.join(project_root, "hotwords.txt")
    )

    transcriber.record_and_transcribe()
EOF
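
The loop above sends every 3-second window to the model, including windows in which nobody speaks. A simple energy gate can skip those. The sketch below computes the RMS amplitude of 16-bit mono PCM, the format the script records; the threshold of 500 is an illustrative assumption that needs tuning per microphone, and `rms_int16` and `is_silent` are hypothetical helpers.

```python
import array
import math


def rms_int16(pcm_bytes):
    """Root-mean-square amplitude of 16-bit PCM samples in native byte order."""
    samples = array.array("h")
    samples.frombytes(pcm_bytes)  # length must be a multiple of 2 bytes
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_silent(pcm_bytes, threshold=500.0):
    """True when the buffer's RMS amplitude is below the (assumed) threshold."""
    return rms_int16(pcm_bytes) < threshold
```

In `record_and_transcribe`, a check such as `if is_silent(b''.join(frames)): continue` placed before the WAV file is written would skip recognition for quiet windows.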

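❓ FAQ

Q1: What if MPS is unavailable?

Check:

python

import torch
print(torch.backends.mps.is_available())  # True if MPS can be used right now
print(torch.backends.mps.is_built())      # True if this PyTorch build includes MPS support

Fix:

Make sure macOS is 12.3 or later
Make sure PyTorch is 1.12 or later and that Python is a native arm64 build (not an x86_64 build running under Rosetta)
Reinstall PyTorch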
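
The first two conditions can be checked without importing PyTorch at all. The sketch below is one way to do it with the standard `platform` module; `parse_version` and `mps_prerequisites` are hypothetical helper names, and the check covers only architecture and macOS version, not the PyTorch build itself.

```python
import platform


def parse_version(text):
    """Turn '12.3.1' into (12, 3, 1); empty input yields an empty tuple."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def mps_prerequisites(machine, macos_version):
    """Return a list of problems that would prevent MPS from working (empty if none)."""
    problems = []
    if machine != "arm64":
        problems.append(f"Python reports architecture '{machine}', expected 'arm64'")
    if parse_version(macos_version) < (12, 3):
        problems.append(f"macOS {macos_version or 'unknown'} is older than 12.3")
    return problems
```

On the machine in question, call `mps_prerequisites(platform.machine(), platform.mac_ver()[0])`; an empty list means both prerequisites are met.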

Q2: Out-of-memory errors

Solutions:

Reduce batch_size

yaml

training:
  batch_size: 2  # reduced from 4

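Use gradient accumulation, which keeps the effective batch size at batch_size × gradient_accumulation_steps while only one small batch is in memory at a time

yaml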

training:
  gradient_accumulation_steps: 4
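
To see why accumulation can stand in for a larger batch, here is a framework-free sketch with a one-parameter model. It assumes the training script applies `gradient_accumulation_steps` in the usual way, averaging the gradients of several equally sized micro-batches before one optimizer step; the toy loss and the function names are illustrative only.

```python
def mse_gradient(w, batch):
    """d/dw of mean((w*x - y)^2) over a batch of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_gradient(w, batch, micro_batch_size):
    """Average the micro-batch gradients, as done before a single optimizer step."""
    micro_batches = [
        batch[i:i + micro_batch_size]
        for i in range(0, len(batch), micro_batch_size)
    ]
    steps = len(micro_batches)
    total = 0.0
    for micro in micro_batches:
        # Each micro-batch contributes 1/steps of its mean gradient
        total += mse_gradient(w, micro) / steps
    return total, steps
```

With 8 samples and `micro_batch_size=2`, the function performs 4 accumulation steps and returns the same gradient as one pass over all 8 samples (up to floating-point rounding), which matches batch_size: 2 with gradient_accumulation_steps: 4.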

Free cached memory

python

import gc
import torch
gc.collect()
if torch.backends.mps.is_available():
    torch.mps.empty_cache()

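Q3: The model loads slowly

Fix: download the model once, then load it from the local path.

bash

# Download the model to a local directory and print where it was saved
python -c "
from modelscope import snapshot_download
print(snapshot_download('iic/SenseVoiceSmall', cache_dir='./models'))
"

python

# Load from the local path (use the directory printed by the command above)
from funasr import AutoModel
model = AutoModel(model='./models/iic/SenseVoiceSmall')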
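
Scripts that should work both before and after the download can choose between the local copy and the hub ID at start-up. A minimal sketch, assuming a non-empty directory means the download finished; `resolve_model` is a hypothetical helper.

```python
import os


def resolve_model(local_dir, hub_id="iic/SenseVoiceSmall"):
    """Return local_dir if it exists and is non-empty, otherwise the hub model ID."""
    if os.path.isdir(local_dir) and os.listdir(local_dir):
        return local_dir
    return hub_id
```

The result can be passed straight to the loader, for example `AutoModel(model=resolve_model('./models/iic/SenseVoiceSmall'))`.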

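Q4: Recognition quality is poor

Suggestions:

Add more training data
- At least 1,000 samples
- Cover a variety of scenarios
Improve data quality
- Clear recordings
- Accurate transcripts
- 16 kHz sample rate
Tune the hotwords
- Add more proper nouns
- Raise the hotword weight
Train for more epochs

yaml

training:
  num_epochs: 30  # increased to 30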
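
To judge whether any of these changes helps with proper nouns specifically, it is useful to measure how often hotwords survive recognition. The sketch below is a simple metric of my own construction, not part of FunASR: for every reference transcript that contains a hotword, it checks whether the matching hypothesis contains it too.

```python
def hotword_recall(references, hypotheses, hotwords):
    """Share of (utterance, hotword) pairs present in the reference that also
    appear in the corresponding hypothesis. Returns 0.0 if no hotword occurs."""
    expected = 0
    found = 0
    for ref, hyp in zip(references, hypotheses):
        for word in hotwords:
            if word in ref:
                expected += 1
                if word in hyp:
                    found += 1
    return found / expected if expected else 0.0
```

Comparing this value for the base model and the fine-tuned model on the same test set shows whether the hotwords themselves are recognized more often. Matching is exact and case-sensitive in this sketch.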

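Q5: How can I speed up training?

Settings tuned for M4 Max:

yaml

training:
  batch_size: 8      # take advantage of the large memory
  num_workers: 8     # use multiple cores for data loading
  pin_memory: false  # not needed on MPS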

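Q6: How do I export the model?

python

# Export to ONNX (optional). Assumes `model` is a funasr.AutoModel instance and
# that your installed FunASR version supports export for this model.
model.export(type="onnx", quantize=False)

# Or keep the PyTorch format by saving the weights of the underlying module
import torch
torch.save(model.model.state_dict(), "my_model.pt")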

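Q7: Which audio formats are supported?

Supported formats (compressed formats typically need ffmpeg for decoding, and pydub requires it for anything other than WAV: brew install ffmpeg):

✅ WAV (recommended)
✅ MP3
✅ M4A
✅ FLAC
✅ OGG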
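
A small extension check based on this list can reject unsupported uploads early or widen the file search in the batch tool, which currently matches only lowercase `.wav`, `.mp3`, and `.m4a`. This is a sketch; `SUPPORTED_EXTENSIONS` and `is_supported_audio` are hypothetical names, and the check looks only at the file name, not the actual content.

```python
from pathlib import Path

# Extensions taken from the list above
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}


def is_supported_audio(filename):
    """True if the file name ends with a supported extension (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `[p for p in Path("data").rglob("*") if is_supported_audio(p.name)]` collects all candidate files regardless of letter case.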

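Q8: How do I handle long audio?

Use the meeting transcription script from Example 1 (scripts/meeting_transcribe.py), which splits long audio into segments automatically.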

📚 Further learning resources

Official documentation

FunASR: https://github.com/alibaba-damo-academy/FunASR
ModelScope: https://modelscope.cn
SenseVoice: https://github.com/FunAudioLLM/SenseVoice
