Speech Datasets 2: VCTK Multi-Speaker Audio

VCTK

VCTK stands for Voice Cloning Toolkit. The corpus is published by the Centre for Speech Technology Research (CSTR) as CSTR's VCTK Corpus.

1. Introduction

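The data comes from 109 native English speakers with various accents (110 in version 0.92). Each speaker reads about 400 sentences.

Most sentences were selected from The Herald (Glasgow) newspaper, plus the Rainbow Passage and an elicitation paragraph. For the newspaper part, each speaker reads a different set of sentences, chosen by a greedy algorithm that maximizes contextual and phonetic coverage. The Rainbow Passage and the elicitation paragraph are the same for all speakers.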
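The greedy selection mentioned above can be illustrated with a small sketch. The exact coverage units and scoring used for VCTK are not described here, so everything below is an assumption: the hypothetical `units` function uses letter bigrams as a stand-in for phonetic contexts, and `greedy_select` repeatedly picks the sentence that adds the most units not yet covered.

```python
def units(sentence):
    """Hypothetical coverage units: letter bigrams inside each word.

    A real selection would use phonetic contexts (e.g. diphones) instead.
    """
    out = set()
    for word in sentence.lower().split():
        word = "".join(ch for ch in word if ch.isalpha())
        out.update(word[i:i + 2] for i in range(len(word) - 1))
    return out


def greedy_select(sentences, k):
    """Pick up to k sentences, each time taking the one adding most new units."""
    covered = set()
    chosen = []
    remaining = list(sentences)
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda s: len(units(s) - covered))
        if not units(best) - covered:
            break  # no remaining sentence adds coverage
        chosen.append(best)
        covered |= units(best)
        remaining.remove(best)
    return chosen, covered


if __name__ == "__main__":
    pool = ["aa bb", "ab", "ab cd ef"]
    picked, covered = greedy_select(pool, k=2)
    print(picked, len(covered))
```

Greedy selection does not guarantee an optimal set, but it is a common and cheap approximation for coverage problems of this kind.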

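1.1 The Rainbow Passage

A classic passage in phonetics, widely used in speech research, speech therapy, speech synthesis, and speech recognition. The text offers varied phonetic and linguistic content.

Phonemes: covers a wide range of English phonemes (the smallest units of speech), including vowels, consonants, and their combinations.
Pronunciation variation: the sentence structure and content require different articulation patterns from the speaker, which exposes phonetic diversity.
Syntax: the passage contains complex structures such as compound sentences and subordinate clauses, which helps when studying speech in different grammatical contexts.
Vocabulary: covers a variety of words and expressions, suitable for testing fluency and accuracy.
Content: covers the rainbow as a natural phenomenon, cultural legends, historical explanations, and metaphor.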

Passage text:

When the sunlight strikes raindrops in the air, they act as a prism and form a rainbow. The rainbow is a division of white light into many beautiful colors. These take the shape of a long round arch, with its path high above, and its two ends apparently beyond the horizon. There is, according to legend, a boiling pot of gold at one end. People look, but no one ever finds it. When a man looks for something beyond his reach, his friends say he is looking for the pot of gold at the end of the rainbow.

Throughout the centuries people have explained the rainbow in various ways. Some have accepted it as a miracle without physical explanation. To the Hebrews it was a token that there would be no more universal floods. The Greeks used to imagine that it was a sign from the gods to foretell war or heavy rain. The Norsemen considered the rainbow as a bridge over which the gods passed from earth to their home in the sky. Others have tried to explain the phenomenon physically. Aristotle thought that the rainbow was caused by reflection of the sun's rays by the rain. Since then physicists have found that it is not reflection, but refraction by the raindrops which causes the rainbows. Many complicated ideas about the rainbow have been formed.

The difference in the rainbow depends considerably upon the size of the drops, and the width of the colored band increases as the size of the drops increases. The actual primary rainbow observed is said to be the effect of super-imposition of a number of bows. If the red of the second bow falls upon the green of the first, the result is to give a bow with an abnormally wide yellow band, since red and green light when mixed form yellow. This is a very common type of bow, one showing mainly red and yellow, with little or no green or blue.

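1.2 Elicitation Paragraph

A passage designed by linguists to elicit and analyze the features of a speaker's accent or dialect. The paragraph used in VCTK is the one from the Speech Accent Archive.

A typical elicitation paragraph includes:

Specific words and contractions whose pronunciation is a marker of dialect.
Changes in intonation and stress that reflect the features of a particular dialect.
Specific grammatical structures, phrases, and informal expressions, used to assess how an accent or dialect behaves in different contexts.

Passage text: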

Please call Stella. Ask her to bring these things with her from the store: 
six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob. 

We also need a small plastic snake and a big toy frog for the kids. 
She can scoop these things into three red bags, and we will go meet her Wednesday at the train station.

2. Data Details

2.1 Data Format

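Recording
- Two microphones were used: an omnidirectional microphone (DPA 4035) and a small-diaphragm condenser microphone with very wide bandwidth (Sennheiser MKH 800).
- Recordings were made at a 96 kHz sampling rate and 24-bit depth, in a hemi-anechoic chamber at the University of Edinburgh.
- Exception: two speakers (p280 and p315) had technical problems with their MKH 800 recordings.
Conversion
- All recordings were converted to 16 bits and downsampled to 48 kHz.
- The recordings were manually end-pointed, i.e. the silence at the start and end of each recording was removed.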
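As a rough illustration of the conversion step, the sketch below halves the sample rate (96 kHz to 48 kHz) and requantizes 24-bit samples to 16 bits. The tools and filters actually used for VCTK are not stated here, so the filter design (a short Hann-windowed sinc) and the function names are assumptions chosen only to keep the example self-contained. In practice a library resampler such as `scipy.signal.resample_poly` would be the usual choice.

```python
import math


def lowpass_taps(num_taps=31, cutoff=0.25):
    """Hann-windowed sinc low-pass FIR.

    cutoff is a fraction of the input sample rate; 0.25 corresponds to
    24 kHz at 96 kHz, the Nyquist frequency after halving the rate.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        hann = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * hann)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity gain at DC


def downsample_by_2(samples, taps):
    """Low-pass filter, then keep every second sample."""
    half = len(taps) // 2
    out = []
    for i in range(0, len(samples), 2):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + k - half
            if 0 <= j < len(samples):  # treat samples outside the signal as zero
                acc += t * samples[j]
        out.append(acc)
    return out


def to_int16(samples_24bit):
    """Scale 24-bit sample values down to the 16-bit range (no dither)."""
    out = []
    for s in samples_24bit:
        v = int(round(s / 256))
        out.append(max(-32768, min(32767, v)))
    return out


if __name__ == "__main__":
    signal_96k = [2560000] * 200  # constant 24-bit test signal
    signal_48k = to_int16(downsample_by_2(signal_96k, lowpass_taps()))
    print(len(signal_48k), signal_48k[50])
```

The low-pass filter matters because dropping every second sample without it would fold content above 24 kHz back into the audible band.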
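Transcripts
- Transcript files are provided for 109 of the 110 speakers and are stored in the '/txt' folder.
- Exception: the transcripts for 'p315' were lost due to a hard disk error.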
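When pairing audio with transcripts, the missing text for 'p315' has to be handled explicitly. The sketch below assumes the version 0.92 layout, with audio files named `<speaker>_<utterance>_mic<1|2>.flac` and transcripts stored as `txt/<speaker>/<speaker>_<utterance>.txt`. The file-name pattern is an assumption that is not stated in this article and should be checked against the downloaded archive. The helper names are hypothetical.

```python
import re
from pathlib import Path

# Assumed audio file-name pattern, e.g. "p225_001_mic1.flac".
AUDIO_NAME = re.compile(r"^([ps]\d+)_(\d+)_mic([12])\.flac$")

# Speakers reported above as having problems with the MKH 800 recordings.
MKH800_ISSUE_SPEAKERS = {"p280", "p315"}


def parse_audio_name(name):
    """Split an audio file name into speaker, utterance and microphone index."""
    m = AUDIO_NAME.match(name)
    if m is None:
        return None
    speaker, utterance, mic = m.groups()
    return {"speaker": speaker, "utterance": utterance, "mic": int(mic)}


def transcript_path(root, audio_name):
    """Return the matching transcript path, or None if it does not exist.

    Returning None covers p315, whose transcripts are missing.
    """
    info = parse_audio_name(audio_name)
    if info is None:
        return None
    path = (Path(root) / "txt" / info["speaker"]
            / f"{info['speaker']}_{info['utterance']}.txt")
    return path if path.exists() else None


if __name__ == "__main__":
    print(parse_audio_name("p225_001_mic1.flac"))
    print("p280" in MKH800_ISSUE_SPEAKERS)
```

Checking for the transcript on disk, instead of hard-coding the speaker ID, keeps the pairing logic valid if a later release restores the missing files.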

2.2 Derived Versions

Original VCTK (2019-11-13)

CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit (version 0.92)

version 0.92: 10.94 GB

Device Recorded VCTK (Small subset version, 2018-03-06)

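This version was recorded with inexpensive consumer devices instead of professional recording equipment.

It was recorded in ordinary office environments, which introduces device and environment effects such as background noise, reverberation, and the recording quality of the devices themselves.

DR-VCTK, 1.671 GB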

Noisy Reverberant Speech Database (2017-09-14)

A dataset for noise suppression, dereverberation, and text-to-speech (TTS).

The noisy reverberant speech is created from clean speech as follows:

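1. Convolve the clean speech signal with a room impulse response (RIR). This simulates how speech propagates and reflects in a particular room, producing reverberation.

2. Convolve the noise signal with an RIR, simulating how the noise propagates and reverberates in the room.

3. Add the reverberated speech signal and the reverberated noise signal to produce the final noisy reverberant speech signal.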
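The three steps above can be sketched in a few lines. This is a minimal illustration, not the pipeline used to build the database: the function names are hypothetical, and `noise_gain` is an assumed parameter for setting the noise level. The direct convolution is written out in plain Python for clarity, whereas real signals would normally go through `numpy.convolve` or `scipy.signal.fftconvolve`.

```python
def convolve(signal, impulse_response):
    """Full linear convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def make_noisy_reverberant(speech, noise, speech_rir, noise_rir, noise_gain=1.0):
    """Reverberate speech and noise separately, then add them."""
    rev_speech = convolve(speech, speech_rir)   # step 1
    rev_noise = convolve(noise, noise_rir)      # step 2
    n = min(len(rev_speech), len(rev_noise))    # align lengths before mixing
    return [rev_speech[k] + noise_gain * rev_noise[k] for k in range(n)]  # step 3


if __name__ == "__main__":
    speech = [1.0, 0.0, 0.0]
    noise = [0.0, 2.0, 0.0]
    mixed = make_noisy_reverberant(speech, noise,
                                   speech_rir=[1.0, 0.5], noise_rir=[1.0])
    print(mixed)
```

Using separate impulse responses for speech and noise reflects that the two sources can sit at different positions in the room.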

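Noisy speech database

A parallel dataset of clean and noisy speech.

Reverberant speech database

A dataset intended specifically for dereverberation.

96kHz version of the CSTR VCTK Corpus

The corpus at the original 96 kHz sampling rate.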

Reference

https://datashare.ed.ac.uk/handle/10283/3443