VITS2 Has Arrived

语音之家, 2023-08-20 19:45

**Paper:** VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design

**Demo:** https://vits-2.github.io/demo/

**Paper link:** https://arxiv.org/abs/2307.16430

Problems that still remain:

- Intermittent unnaturalness
- Low efficiency of the duration predictor
- A complex input format used to alleviate the limitations of alignment and duration modeling (use of a blank token)
- Insufficient speaker similarity in the multi-speaker model
- Slow training
- Strong dependence on phoneme conversion
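
To make the blank-token item concrete: VITS-style preprocessing commonly inserts a blank symbol before, between, and after the input symbols, which roughly doubles the sequence length seen by the aligner. The sketch below is a minimal illustration under that assumption. The function name `intersperse`, the blank ID `0`, and the example phoneme IDs are illustrative choices, not taken from this article.

```python
def intersperse(symbols, blank_id=0):
    """Return a new list with blank_id placed before, between, and after every symbol.

    Assumption: blank_id is an ID reserved for the blank token (0 here is hypothetical).
    """
    out = [blank_id] * (2 * len(symbols) + 1)
    out[1::2] = symbols  # original symbols land on the odd positions
    return out


phoneme_ids = [12, 7, 31]  # hypothetical phoneme IDs
print(intersperse(phoneme_ids))  # [0, 12, 0, 7, 0, 31, 0]
```

A sequence of N symbols becomes 2N + 1 symbols, which is the extra input complexity the list item refers to.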

Proposed methods:

- A stochastic duration predictor trained through adversarial learning
- Normalizing flows improved by utilizing the transformer block
- A speaker-conditioned text encoder to better model the characteristics of multiple speakers
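
As a rough illustration of the first item, adversarial training of a duration predictor can be written with a least-squares GAN objective: a discriminator scores real and predicted durations, and the predictor (generator) is trained to fool it, optionally with an added regression term. This is a sketch under assumptions, not the paper's code. The function names, the toy scores, and the `mse_weight` parameter are hypothetical, and the exact loss formulation should be checked against the paper.

```python
def discriminator_loss(scores_real, scores_fake):
    """Least-squares GAN loss for the discriminator.

    Pushes scores on ground-truth durations toward 1 and scores on
    predicted durations toward 0.
    """
    n = len(scores_real)
    return sum((r - 1.0) ** 2 + f ** 2 for r, f in zip(scores_real, scores_fake)) / n


def generator_loss(scores_fake, pred_dur, true_dur, mse_weight=1.0):
    """Least-squares GAN loss for the duration predictor, plus an MSE term.

    mse_weight is a hypothetical knob balancing the two terms.
    """
    adv = sum((f - 1.0) ** 2 for f in scores_fake) / len(scores_fake)
    mse = sum((p - t) ** 2 for p, t in zip(pred_dur, true_dur)) / len(true_dur)
    return adv + mse_weight * mse


# Toy values standing in for discriminator outputs and durations (in frames).
scores_real = [0.9, 0.8]
scores_fake = [0.2, 0.1]
pred_dur = [2.0, 3.5]
true_dur = [2.0, 3.0]

d_loss = discriminator_loss(scores_real, scores_fake)
g_loss = generator_loss(scores_fake, pred_dur, true_dur)
print(d_loss, g_loss)
```

In a real model the scores would come from a discriminator network conditioned on the text encoder output, and both losses would be minimized in alternating steps.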
