**OmniVoice: A Zero-Shot Multilingual Voice-Cloning TTS Model Supporting 600+ Languages**

The k2-fsa team recently open-sourced the OmniVoice project on GitHub: a zero-shot multilingual text-to-speech (TTS) system built on a diffusion language model architecture, capable of high-quality speech generation, voice cloning, and voice design across more than 600 languages.
**Core Capabilities**

OmniVoice's main features include:

- Zero-shot voice cloning: clone a speaker's voice from just a short reference clip and its transcript.
- Voice design: control speaker attributes such as gender, age, pitch, dialect/accent, and whispering via natural-language instructions (e.g. "female, low pitch, British accent").
- Automatic voice mode: generate natural speech directly, with no reference audio or instruction.
- Multilingual support: covers 600+ languages; the full list is in docs/languages.md in the repository.
- Fast inference: real-time factor (RTF) as low as 0.025, roughly 40x faster than real time.
- Fine-grained control: inline non-verbal tags (e.g. [laughter], [surprise-oh]) and pronunciation control (toned pinyin for Chinese, the CMU dictionary for English).
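The real-time factor quoted above is easy to misread, so here is the arithmetic spelled out in plain Python (independent of the OmniVoice API): RTF is synthesis time divided by the duration of the generated audio, and its reciprocal is the real-time speedup.

```python
# RTF = time spent synthesizing / duration of the audio produced.
# RTF < 1 means faster than real time; 1/RTF is the speedup factor.
def rtf(synthesis_seconds: float, audio_seconds: float) -> float:
    return synthesis_seconds / audio_seconds

def realtime_speedup(rtf_value: float) -> float:
    return 1.0 / rtf_value

# Generating 10 s of audio in 0.25 s gives RTF 0.025,
# i.e. about 40x faster than real time.
print(rtf(0.25, 10.0))          # 0.025
print(realtime_speedup(0.025))  # 40.0
```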
The project's diffusion language model architecture achieves high inference efficiency without sacrificing generation quality. The model is open-sourced on Hugging Face (k2-fsa/OmniVoice), and the repository lives at https://github.com/k2-fsa/OmniVoice (currently around 745 stars, Apache-2.0 license).
**Installation**

A clean Python virtual environment is recommended. PyTorch must be installed first (NVIDIA GPU and Apple Silicon are both supported):

```bash
# NVIDIA GPU
pip install torch==2.8.0+cu128 torchaudio==2.8.0+cu128 --extra-index-url https://download.pytorch.org/whl/cu128

# Apple Silicon
pip install torch==2.8.0 torchaudio==2.8.0
```
Then install OmniVoice:

```bash
pip install omnivoice
# or install the latest version from GitHub
pip install git+https://github.com/k2-fsa/OmniVoice.git
```
**Quick Start**

Python API (voice cloning):
```python
from omnivoice import OmniVoice
import torch
import torchaudio

# Load the pretrained model onto the first GPU in half precision.
model = OmniVoice.from_pretrained(
    "k2-fsa/OmniVoice", device_map="cuda:0", dtype=torch.float16
)

# Clone the voice from the reference clip and its transcript.
audio = model.generate(
    text="Hello, this is a test of zero-shot voice cloning.",
    ref_audio="ref.wav",
    ref_text="Transcription of the reference audio.",
)
torchaudio.save("out.wav", audio[0], 24000)
```
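The cloning call above reads a reference clip from ref.wav. Before passing a clip in, a generic pre-flight check can catch format mistakes early. The sketch below uses only the stdlib wave module and is not part of the OmniVoice API; the mono/24 kHz expectation mirrors the output sample rate used above and is an assumption, not a documented requirement.

```python
import math
import struct
import wave

def write_test_wav(path, rate=24000, seconds=1.0, freq=440.0):
    """Create a mono 16-bit sine-wave WAV to stand in for a reference clip."""
    n = int(rate * seconds)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(rate)
        samples = (
            int(32767 * math.sin(2 * math.pi * freq * i / rate)) for i in range(n)
        )
        w.writeframes(b"".join(struct.pack("<h", s) for s in samples))

def check_reference(path, expected_rate=24000, min_seconds=0.5):
    """Return True if the clip is mono, at the expected rate, and long enough."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        seconds = w.getnframes() / rate
        channels = w.getnchannels()
    return channels == 1 and rate == expected_rate and seconds >= min_seconds

write_test_wav("ref_check.wav")
print(check_reference("ref_check.wav"))  # True
```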
Voice design mode:

```python
audio = model.generate(
    text="Hello, this is a test of zero-shot voice design.",
    instruct="female, low pitch, british accent",
)
```
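The instruct string is free-form natural language. If you build such strings programmatically, a tiny helper keeps the attribute phrasing consistent; this is a hypothetical convenience function, not part of the OmniVoice API.

```python
# Hypothetical helper (not part of the OmniVoice API): compose a
# voice-design instruction string from individual attribute choices.
def build_instruct(gender=None, pitch=None, accent=None):
    parts = []
    if gender:
        parts.append(gender)
    if pitch:
        parts.append(f"{pitch} pitch")
    if accent:
        parts.append(f"{accent} accent")
    return ", ".join(parts)

print(build_instruct(gender="female", pitch="low", accent="british"))
# female, low pitch, british accent
```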
The project also ships CLI tools (omnivoice-infer, omnivoice-infer-batch) and a local Web UI (omnivoice-demo) for quick testing and batch processing.
**Summary**

OmniVoice turns multilingual zero-shot TTS into a practical tool with a clean architecture, making it a good fit for speech synthesis, virtual streamers, voice assistants, and similar applications. Developers can install it via pip and integrate it quickly; see the repository README and the docs/ directory for more detail. The project is under active development, so check the GitHub page for the latest progress.