CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages (2019-03-27)

TL;DR

Describes the development of CSS10, a collection of single-speaker speech datasets for ten languages built from short LibriVox audiobook clips and their aligned texts, and trains two neural text-to-speech models on each dataset to validate its quality.

Abstract

We describe the development of CSS10, a collection of single-speaker speech datasets for ten languages. It consists of short audio clips from LibriVox audiobooks and their aligned texts. To validate its quality, we train two neural text-to-speech models on each dataset and then conduct Mean Opinion Score (MOS) tests on the synthesized speech samples. We make our datasets, pre-trained models, and test resources publicly available, and we hope they will be used in future speech tasks.
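The abstract does not say how the MOS results are aggregated. A Mean Opinion Score is conventionally the arithmetic mean of listener ratings on a 1-5 scale. The sketch below shows that arithmetic; the 1-5 scale, the function name `mean_opinion_score`, and the normal-approximation 95% confidence interval are all assumptions for illustration, not the authors' documented procedure.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (MOS, half-width of an approximate confidence interval).

    Assumption: ratings are integers on a 1-5 absolute category rating
    scale, and the interval uses a normal approximation (z = 1.96 for ~95%).
    """
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")

    # MOS is simply the mean of all listener ratings.
    mos = statistics.mean(ratings)
    if len(ratings) < 2:
        # A spread cannot be estimated from a single rating.
        return mos, float("nan")

    # Standard error of the mean: sample standard deviation / sqrt(n).
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, z * sem


if __name__ == "__main__":
    # Hypothetical ratings for one synthesized sample (not data from the paper).
    ratings = [4, 5, 3, 4, 4, 5, 3, 4]
    mos, half_width = mean_opinion_score(ratings)
    print(f"MOS = {mos:.2f} ± {half_width:.2f}")  # prints "MOS = 4.00 ± 0.52"
```

Reporting the interval alongside the mean makes it easier to judge whether differences between two models' scores are meaningful given the number of ratings collected.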

Authors

Kyubyong Park

Thomas Mulc
