Introduced in PromptTTS: Controllable Text-to-Speech with Text Descriptions (2022)
PromptSpeech is a dataset of speech paired with corresponding prompts. The speech is synthesized with a commercial TTS API and varies along 5 style factors: gender, pitch, speaking speed, volume, and emotion. The emotion factor has 5 categories and the gender factor has 2 categories.
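As a rough illustration of the pairing described above, one example could be modeled as a speech clip, its prompt, and one label per style factor. This is only a sketch: the class name, field names, and placeholder values are assumptions made for illustration and do not reflect the released file layout or label vocabulary, neither of which is given here.

```python
from dataclasses import dataclass

# The five style factors named in the dataset description.
STYLE_FACTORS = ("gender", "pitch", "speaking_speed", "volume", "emotion")

# Category counts stated in the description; counts for the other
# factors are not given there, so they are left out.
KNOWN_CATEGORY_COUNTS = {"gender": 2, "emotion": 5}


@dataclass(frozen=True)
class PromptSpeechExample:
    """Hypothetical record layout (assumed, not the released schema)."""

    audio_path: str        # path to the synthesized speech clip
    prompt: str            # text prompt paired with the clip
    style: dict[str, str]  # one label per style factor

    def __post_init__(self) -> None:
        # Require a label for every one of the five style factors.
        missing = set(STYLE_FACTORS) - set(self.style)
        if missing:
            raise ValueError(f"missing style factors: {sorted(missing)}")


# Placeholder values only; real paths, prompts, and labels are not shown here.
example = PromptSpeechExample(
    audio_path="clip_0001.wav",
    prompt="<prompt text>",
    style={factor: "<label>" for factor in STYLE_FACTORS},
)
```

Keeping the style labels in a dictionary keyed by factor name avoids hard-coding category values that the description does not list.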
Source: PromptTTS: Controllable Text-to-Speech with Text Descriptions
Image Source: https://arxiv.org/pdf/2211.12171v1.pdf