【S2ST】PolyVoice: Language Models for Speech to Speech Translation

cxxx17 2024-03-13 16:45

PolyVoice: Language Models for Speech to Speech Translation

- Contributions
- Overview of PolyVoice

PolyVoice is a language-model (LM) based method for speech-to-speech translation (S2ST).

Contributions

A decoder-only model for speech-to-speech translation.
A unit-based audio LM that predicts SoundStream codec tokens.

Overview of PolyVoice

PolyVoice has two LM-based components: a speech-to-unit translation (S2UT) front-end for translation and a unit-to-speech (U2S) back-end for synthesis.

An additional language model handles duration prediction.
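The data flow between the three language models can be sketched as below. This is only an illustration of how the stages chain together, not PolyVoice's implementation: the function `translate_speech`, its parameter names, and the toy stand-in models are all hypothetical, and repeating each target unit by its predicted duration is an assumption about how the durations are consumed.

```python
from typing import Callable, List, Sequence

Units = List[int]


def translate_speech(
    source_units: Sequence[int],
    s2ut_lm: Callable[[Sequence[int]], Units],
    duration_lm: Callable[[Sequence[int]], List[int]],
    u2s_lm: Callable[[Sequence[int]], Units],
) -> Units:
    """Chain the stages: translate units, predict durations, synthesize codec tokens."""
    # S2UT front-end: source-language units -> target-language units.
    target_units = s2ut_lm(source_units)
    # Duration LM: one duration (in frames) per target unit.
    durations = duration_lm(target_units)
    if len(durations) != len(target_units):
        raise ValueError("need exactly one duration per target unit")
    # Assumption: each unit is repeated for its predicted number of frames.
    expanded = [u for u, d in zip(target_units, durations) for _ in range(d)]
    # U2S back-end: frame-level units -> acoustic codec tokens.
    return u2s_lm(expanded)


# Toy stand-ins so the sketch runs; real components would be trained LMs.
def toy_s2ut(units: Sequence[int]) -> Units:
    return [u + 100 for u in units]


def toy_duration(units: Sequence[int]) -> List[int]:
    return [2 for _ in units]


def toy_u2s(units: Sequence[int]) -> Units:
    return [u * 10 for u in units]


codec_tokens = translate_speech([1, 2, 3], toy_s2ut, toy_duration, toy_u2s)
print(codec_tokens)
```

Passing the models in as callables keeps the sketch independent of any particular framework; each stage only needs to map one integer sequence to another.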

Semantic units are extracted with mHuBERT.
Acoustic units are SoundStream codec tokens (produced by a residual vector quantizer), predicted with an autoregressive model and a non-autoregressive model.
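To make the residual vector quantizer concrete, here is a minimal pure-Python sketch of residual quantization in general, not SoundStream's actual code. The codebooks are tiny made-up examples, and the names `rvq_encode` and `rvq_decode` are hypothetical. Each stage quantizes whatever error the previous stages left behind, so one frame becomes one code index per codebook.

```python
from typing import List, Sequence

Vector = List[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Index of the codebook entry closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], x)),
    )


def rvq_encode(x: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Quantize x stage by stage; each stage encodes the remaining residual."""
    residual = list(x)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next stage sees only the error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> Vector:
    """Reconstruct the vector by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Made-up codebooks: a coarse first stage and a finer second stage.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25]],
]
codes = rvq_encode([1.2, 1.0], codebooks)
recon = rvq_decode(codes, codebooks)
print(codes, recon)
```

In VALL-E-style systems the autoregressive model usually predicts the first (coarsest) codebook and the non-autoregressive model fills in the remaining ones. These notes do not spell out how PolyVoice splits the work, so treat that as an assumption.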
