3260 papers • 126 benchmarks • 313 datasets
Translate speech audio in one language into text in another language, using either an end-to-end or a cascaded approach.
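The difference between the two approaches can be sketched with toy stand-ins. Every function and lookup table below is hypothetical and exists only for illustration: a cascade system chains a speech recognizer and a text translator, whereas an end-to-end system maps audio to target-language text with a single model.

```python
# Toy illustration only: real systems use neural models, not lookup tables.
# "Audio" is represented here as a tuple of hypothetical acoustic unit IDs.

TOY_ASR = {(1, 2): "guten morgen", (3,): "danke"}  # audio -> source text
TOY_MT = {"guten morgen": "good morning", "danke": "thank you"}  # source -> target
TOY_E2E = {(1, 2): "good morning", (3,): "thank you"}  # audio -> target text


def cascade_translate(audio):
    """Cascade: transcribe first, then translate the transcript."""
    transcript = TOY_ASR[audio]
    return TOY_MT[transcript]


def end_to_end_translate(audio):
    """End-to-end: a single model maps audio straight to target text."""
    return TOY_E2E[audio]


if __name__ == "__main__":
    for audio in [(1, 2), (3,)]:
        print(cascade_translate(audio), "|", end_to_end_translate(audio))
```

In the cascade, any error in the intermediate transcript is passed on to the translation step; the end-to-end path has no intermediate transcript, but it generally needs audio paired with translated text for training.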
These leaderboards are used to track progress in speech-to-text translation.
Use these libraries to find speech-to-text translation models and implementations.
State-of-the-art RNN-based as well as Transformer-based models and open-source detailed training recipes are implemented and seamlessly integrated into S2T workflows for multi-task learning or transfer learning.
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages; at the time of its release, it was the largest open dataset in terms of total volume and language coverage.
By projecting audio and text features to a common semantic representation, Chimera unifies MT and ST tasks and boosts the performance on ST benchmarks, MuST-C and Augmented Librispeech, to a new state-of-the-art.
This paper presents a comprehensive analysis of adapters for multilingual speech translation (ST) and shows that adapters can efficiently specialize an ST model to specific language pairs at a low extra cost in parameters.
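To make the parameter cost concrete, here is a minimal sketch that assumes the common bottleneck adapter design: a down-projection, a ReLU, an up-projection, and a residual connection. The summary above does not specify the architecture, and the function names and dimensions below are hypothetical.

```python
def adapter_param_count(d_model, bottleneck):
    """Parameters in one bottleneck adapter: two linear layers with biases."""
    down = d_model * bottleneck + bottleneck  # down-projection weights + bias
    up = bottleneck * d_model + d_model       # up-projection weights + bias
    return down + up


def adapter_forward(x, w_down, b_down, w_up, b_up):
    """Compute y = x + (W_up * relu(W_down * x + b_down) + b_up) for one vector."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_down, b_down)
    ]
    update = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_up, b_up)
    ]
    return [xi + ui for xi, ui in zip(x, update)]


if __name__ == "__main__":
    # Hypothetical sizes: 512-dimensional model, 64-dimensional bottleneck.
    print(adapter_param_count(512, 64))
    # Tiny 2 -> 1 -> 2 adapter applied to a single feature vector.
    x = [2.0, 2.0]
    w_down, b_down = [[0.5, 0.5]], [0.0]
    w_up, b_up = [[1.0], [-1.0]], [0.0, 0.0]
    print(adapter_forward(x, w_down, b_down, w_up, b_up))
```

Under these assumed sizes, one adapter adds 66,112 parameters, and only the adapters for a given language pair would need to be trained and stored while the shared model stays fixed.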
This work proposes Supervised Hybrid Audio Segmentation (SHAS), a method that can effectively learn the optimal segmentation from any manually segmented speech corpus, and which retains 95-98% of the manual segmentation's BLEU score, compared to the 87-93% of the best existing methods.
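The retention figures can be read as a simple ratio: the BLEU score obtained with automatic segmentation divided by the BLEU score obtained with manual segmentation. A small sketch of that arithmetic follows; the function name is hypothetical and the scores are illustrative, not taken from the paper.

```python
def bleu_retention(bleu_automatic, bleu_manual):
    """Percentage of the manual-segmentation BLEU kept by an automatic segmenter."""
    if bleu_manual <= 0:
        raise ValueError("manual-segmentation BLEU must be positive")
    return 100.0 * bleu_automatic / bleu_manual


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    print(round(bleu_retention(24.0, 25.0), 1))  # 96.0
```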
This paper describes the design philosophy and core architecture of PaddleSpeech, which supports several essential speech-to-text and text-to-speech tasks and achieves competitive or state-of-the-art performance on various speech datasets.
This paper augments an existing (monolingual) corpus, LibriSpeech, which is derived from read audiobooks from the LibriVox project, and shows that the automatic alignment scores are reasonably correlated with human judgments of the bilingual alignment quality.