Zero-Shot Multi-Speaker TTS
The YourTTS model builds upon VITS and adds several novel modifications for zero-shot multi-speaker and multilingual training, achieving state-of-the-art (SOTA) results in zero-shot multi-speaker TTS and results comparable to SOTA in zero-shot voice conversion on the VCTK dataset.
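The core mechanism behind zero-shot multi-speaker TTS is conditioning synthesis on an embedding computed from a short reference utterance of an unseen speaker. The sketch below illustrates this idea only; the dimensions, the averaging speaker encoder, and the concatenation-based conditioning are illustrative assumptions, not the YourTTS architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

N_MELS = 80   # mel channels (illustrative assumption)
D_EMB = 64    # speaker-embedding size (illustrative assumption)

# Hypothetical speaker encoder: average mel frames, project, L2-normalize.
W = rng.normal(0, 0.1, (D_EMB, N_MELS))

def speaker_embedding(mel):
    """mel: (frames, N_MELS) reference spectrogram -> unit-norm embedding."""
    e = W @ mel.mean(axis=0)
    return e / np.linalg.norm(e)

def condition(text_enc, spk_emb):
    """Broadcast the speaker embedding onto every text-encoder frame."""
    n_frames = text_enc.shape[0]
    return np.concatenate([text_enc, np.tile(spk_emb, (n_frames, 1))], axis=1)

ref_mel = rng.normal(size=(120, N_MELS))   # reference audio of an unseen speaker
text_enc = rng.normal(size=(30, 256))      # encoder output for the input text
h = condition(text_enc, speaker_embedding(ref_mel))
assert h.shape == (30, 256 + D_EMB)        # decoder input now carries speaker identity
```

At inference time nothing about the new speaker is trained: only the reference embedding changes, which is what makes the setup "zero-shot".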
This paper proposes to improve the diversity of utterances by explicitly learning the distribution of fundamental frequency sequences (pitch contours) of each speaker during training using a stochastic flow-based pitch predictor, then conditioning the model on generated pitch contours during inference.
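A normalizing-flow pitch predictor can be sketched with a single invertible affine map: sampling latent noise and pushing it through the flow yields a stochastic pitch contour at inference, while the exact inverse (plus the log-determinant) is what makes maximum-likelihood training possible. Everything below is a minimal toy, not the paper's model: the dimensions, the speaker-conditioned scale/shift, and the single elementwise flow step are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 50       # frames in the pitch contour (illustrative assumption)
D_SPK = 16   # speaker-embedding size (illustrative assumption)

# Hypothetical conditioning weights: speaker embedding -> flow shift/scale.
W_s = rng.normal(0, 0.1, (D_SPK,))
W_t = rng.normal(0, 0.1, (D_SPK,))

def flow_forward(z, spk):
    """Map latent noise z to a pitch contour, conditioned on the speaker."""
    s = float(W_s @ spk)       # log-scale from the speaker embedding
    t = float(W_t @ spk)       # shift from the speaker embedding
    x = z * np.exp(s) + t
    logdet = s * len(z)        # log|det J| of this elementwise affine map
    return x, logdet

def flow_inverse(x, spk):
    """Invert the flow; used in training to score observed pitch contours."""
    s = float(W_s @ spk)
    t = float(W_t @ spk)
    return (x - t) * np.exp(-s)

spk = rng.normal(size=D_SPK)
z = rng.normal(size=T)                  # different z => different contour
pitch, logdet = flow_forward(z, spk)    # sampled contour conditions the TTS model
z_rec = flow_inverse(pitch, spk)
assert np.allclose(z, z_rec)            # the flow is exactly invertible
```

Because the latent `z` is resampled per utterance, repeated syntheses of the same text give varied pitch contours, which is the diversity mechanism the paper targets.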