**OmniVoice: A Zero-Shot Multilingual Voice-Cloning TTS Model Supporting 600+ Languages**

The k2-fsa team recently open-sourced the OmniVoice project on GitHub: a zero-shot multilingual text-to-speech (TTS) system built on a diffusion language model architecture, capable of high-quality speech generation, voice cloning, and voice design across more than 600 languages.
**Core Capabilities**

OmniVoice's main features include:

- Zero-shot voice cloning: clone a speaker's voice from just a short reference clip and its transcript.
- Voice design: control speaker attributes such as gender, age, pitch, dialect/accent, and whispering via natural-language instructions (e.g. "female, low pitch, British accent").
- Automatic voice mode: generate natural speech directly, with no reference audio or instruction.
- Multilingual support: covers 600+ languages; the full list is in docs/languages.md in the repository.
- Fast inference: real-time factor (RTF) as low as 0.025, roughly 40x faster than real time.
- Fine-grained control: inline non-verbal tags (e.g. [laughter], [surprise-oh]) and pronunciation control (toned pinyin for Chinese, the CMU dictionary for English).
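The real-time factor quoted above is easy to misread, so here is the arithmetic spelled out in plain Python (independent of the OmniVoice API): RTF is synthesis time divided by the duration of the generated audio, and its reciprocal is the real-time speedup.

```python
# RTF = time spent synthesizing / duration of the audio produced.
# RTF < 1 means faster than real time; 1/RTF is the speedup factor.
def rtf(synthesis_seconds: float, audio_seconds: float) -> float:
    return synthesis_seconds / audio_seconds

def realtime_speedup(rtf_value: float) -> float:
    return 1.0 / rtf_value

# Generating 10 s of audio in 0.25 s gives RTF 0.025,
# i.e. about 40x faster than real time.
print(rtf(0.25, 10.0))          # 0.025
print(realtime_speedup(0.025))  # 40.0
```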
The project's diffusion language model architecture achieves high inference efficiency without sacrificing generation quality. The model is open-sourced on Hugging Face (k2-fsa/OmniVoice), and the repository lives at https://github.com/k2-fsa/OmniVoice (currently around 745 stars, Apache-2.0 license).
**Installation**

A clean Python virtual environment is recommended. PyTorch must be installed first (NVIDIA GPU and Apple Silicon are both supported):

```bash
# NVIDIA GPU
pip install torch==2.8.0+cu128 torchaudio==2.8.0+cu128 --extra-index-url https://download.pytorch.org/whl/cu128

# Apple Silicon
pip install torch==2.8.0 torchaudio==2.8.0
```
Then install OmniVoice:

```bash
pip install omnivoice
# or install the latest version from GitHub
pip install git+https://github.com/k2-fsa/OmniVoice.git
```
**Quick Start**

Python API (voice cloning):
```python
from omnivoice import OmniVoice
import torch
import torchaudio

# Load the pretrained model onto the first GPU in half precision.
model = OmniVoice.from_pretrained(
    "k2-fsa/OmniVoice", device_map="cuda:0", dtype=torch.float16
)

# Clone the voice from the reference clip and its transcript.
audio = model.generate(
    text="Hello, this is a test of zero-shot voice cloning.",
    ref_audio="ref.wav",
    ref_text="Transcription of the reference audio.",
)
torchaudio.save("out.wav", audio[0], 24000)
```
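The cloning call above reads a reference clip from ref.wav. Before passing a clip in, a generic pre-flight check can catch format mistakes early. The sketch below uses only the stdlib wave module and is not part of the OmniVoice API; the mono/24 kHz expectation mirrors the output sample rate used above and is an assumption, not a documented requirement.

```python
import math
import struct
import wave

def write_test_wav(path, rate=24000, seconds=1.0, freq=440.0):
    """Create a mono 16-bit sine-wave WAV to stand in for a reference clip."""
    n = int(rate * seconds)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(rate)
        samples = (
            int(32767 * math.sin(2 * math.pi * freq * i / rate)) for i in range(n)
        )
        w.writeframes(b"".join(struct.pack("<h", s) for s in samples))

def check_reference(path, expected_rate=24000, min_seconds=0.5):
    """Return True if the clip is mono, at the expected rate, and long enough."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        seconds = w.getnframes() / rate
        channels = w.getnchannels()
    return channels == 1 and rate == expected_rate and seconds >= min_seconds

write_test_wav("ref_check.wav")
print(check_reference("ref_check.wav"))  # True
```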
Voice design mode:

```python
audio = model.generate(
    text="Hello, this is a test of zero-shot voice design.",
    instruct="female, low pitch, british accent",
)
```
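The instruct string is free-form natural language. If you build such strings programmatically, a tiny helper keeps the attribute phrasing consistent; this is a hypothetical convenience function, not part of the OmniVoice API.

```python
# Hypothetical helper (not part of the OmniVoice API): compose a
# voice-design instruction string from individual attribute choices.
def build_instruct(gender=None, pitch=None, accent=None):
    parts = []
    if gender:
        parts.append(gender)
    if pitch:
        parts.append(f"{pitch} pitch")
    if accent:
        parts.append(f"{accent} accent")
    return ", ".join(parts)

print(build_instruct(gender="female", pitch="low", accent="british"))
# female, low pitch, british accent
```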
The project also ships CLI tools (omnivoice-infer, omnivoice-infer-batch) and a local Web UI (omnivoice-demo) for quick testing and batch processing.
**Summary**

OmniVoice turns multilingual zero-shot TTS into a practical tool with a clean architecture, making it a good fit for speech synthesis, virtual streamers, voice assistants, and similar applications. Developers can install it via pip and integrate it quickly; see the repository README and the docs/ directory for more detail. The project is under active development, so check the GitHub page for the latest progress.