Polyphone disambiguation is the component of a text-to-speech (TTS) front end that predicts the correct pronunciation of polyphonic characters (characters with more than one possible reading) in the input text.
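To make the task concrete, here is a minimal sketch of polyphone disambiguation as a lexicon lookup with word-level context. The `POLYPHONES` table and the `disambiguate` function are hypothetical names introduced for illustration, the entries are a tiny hand-picked sample, and real systems typically replace this rule-based lookup with a learned model.

```python
# Toy lexicon (an assumption for illustration, not a real resource):
# each polyphonic character has a default reading and word-level overrides.
# Readings are pinyin with tone numbers.
POLYPHONES = {
    "行": {"default": "xing2", "words": {"银行": "hang2", "行业": "hang2"}},
    "乐": {"default": "le4", "words": {"音乐": "yue4", "乐器": "yue4"}},
}


def disambiguate(text, index):
    """Return a reading for the polyphonic character at text[index]."""
    char = text[index]
    entry = POLYPHONES.get(char)
    if entry is None:
        raise KeyError(f"{char!r} is not in the toy lexicon")
    for word, reading in entry["words"].items():
        # Try every alignment of `word` that places `char` at `index`.
        for offset, candidate in enumerate(word):
            if candidate != char:
                continue
            start = index - offset
            if start >= 0 and text[start:start + len(word)] == word:
                return reading
    # No known word covers this position: fall back to the default reading.
    return entry["default"]
```

For example, `disambiguate("我去银行", 3)` matches the word 银行 and returns `"hang2"`, while `disambiguate("他行走", 1)` finds no override and falls back to `"xing2"`.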
Dict-TTS is a semantic-aware generative text-to-speech model that draws on an online website dictionary. It outperforms several strong baseline models in pronunciation accuracy and improves the prosody modeling of TTS systems.
This work proposes g2pW, an approach that uses learnable softmax weights to condition the outputs of BERT on the polyphonic character of interest and its part-of-speech (POS) tag. It outperforms existing methods on the public CPP dataset.
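The conditioning idea can be illustrated with a small sketch. This is not the g2pW implementation, and the paper's exact formulation may differ: the function `conditioned_distribution`, the weight vectors, and the label set are all assumptions made for illustration. The sketch treats the encoder output as a list of logits over pronunciation labels, builds a softmax weight vector from per-character and per-POS scores, and rescales the label distribution with it.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def conditioned_distribution(logits, char_weights, pos_weights):
    """Reweight label probabilities by a softmax over conditioning scores.

    logits:       encoder scores, one per pronunciation label (assumed input)
    char_weights: per-label scores tied to the target character (hypothetical)
    pos_weights:  per-label scores tied to the POS tag (hypothetical)
    """
    # Softmax weights derived from the character and POS conditions.
    condition = softmax([c + p for c, p in zip(char_weights, pos_weights)])
    base = softmax(logits)
    # Elementwise product, then renormalize to a probability distribution.
    scores = [b * w for b, w in zip(base, condition)]
    total = sum(scores)
    return [s / total for s in scores]


# Hypothetical label set; the last two readings do not belong to 行, so the
# character scores push their weights towards zero.
labels = ["hang2", "xing2", "yue4", "le4"]
probs = conditioned_distribution(
    logits=[0.2, 1.0, 2.0, 0.1],
    char_weights=[0.0, 0.0, -30.0, -30.0],
    pos_weights=[0.5, 0.0, 0.0, 0.0],
)
best = labels[probs.index(max(probs))]
```

In this toy setup the unconditioned logits favor `yue4`, but the character weights suppress readings that are invalid for the target character, so the prediction stays within its candidate set.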