Simultaneous speech-to-text translation aims to produce the translation concurrently with the incoming source speech, rather than waiting for the utterance to finish. It is crucial because it enables real-time interpretation of conversations, lectures, and talks.
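The defining problem is the read/write policy: deciding after how much source input each target token may be emitted. A widely used baseline (not specific to any paper listed here) is the wait-k policy, which first reads k source segments and then alternates writing one target token per additional read. A minimal sketch of the resulting action schedule, with segment and token counts as abstract placeholders:

```python
def wait_k_actions(num_source, num_target, k):
    """Return the READ/WRITE action sequence of a wait-k policy:
    read k source segments up front, then alternate one WRITE per
    READ, and flush remaining WRITEs once the source is exhausted."""
    actions, read, written = [], 0, 0
    while written < num_target:
        # We may read ahead of the written prefix by at most k segments,
        # and never beyond the end of the source.
        if read < min(written + k, num_source):
            actions.append("READ")
            read += 1
        else:
            actions.append("WRITE")
            written += 1
    return actions
```

For 5 source segments, 5 target tokens, and k=2, this yields the characteristic staggered pattern READ, READ, WRITE, READ, WRITE, ..., with the final tokens flushed after the source ends. Larger k trades higher latency for more source context per emitted token.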
ESPnet-ST-v2 describes the overall design, example models for each task, and performance benchmarking; the toolkit is publicly available at https://github.com/espnet/espnet.
Learned Proportions (LeaP) and LeaPformers are proposed: they replace static positional representations with dynamic proportions derived via a compact module, enabling more flexible attention concentration patterns.
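To illustrate the general idea only (the actual LeaPformer module differs in its details, which are not given in this summary), a "compact module deriving dynamic proportions" can be pictured as a small learned map from each token's hidden vector to a proportion in (0, 1), replacing a static position/length ratio. A minimal sketch with a single linear layer plus sigmoid, all names hypothetical:

```python
import math

def dynamic_proportions(hidden_states, weights, bias):
    """Sketch: map each token's hidden vector to a proportion in (0, 1)
    via one linear layer followed by a sigmoid. Such per-token dynamic
    proportions can stand in for static positional ratios when shaping
    where attention concentrates."""
    proportions = []
    for h in hidden_states:
        z = sum(hi * wi for hi, wi in zip(h, weights)) + bias
        proportions.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid squashes to (0, 1)
    return proportions
```

Because the proportions depend on the content of each token rather than on its fixed index, the attention pattern can adapt to the input, which is the flexibility the summary refers to.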
StreamSpeech is proposed: a direct Simul-S2ST model that jointly learns translation and the simultaneous policy in a unified multi-task learning framework, and that can perform offline and simultaneous speech recognition, speech translation, and speech synthesis as an "All-in-One" seamless model.
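In a unified multi-task framework like the one the summary describes, the tasks are typically trained together by combining their per-task losses into a single objective, commonly as a weighted sum. A minimal sketch of that combination step (task names and weights are illustrative, not taken from the paper):

```python
def multitask_loss(task_losses, task_weights):
    """Sketch: combine per-task losses (e.g. recognition, translation,
    synthesis) into one training objective via a weighted sum, so one
    model is optimized for all tasks jointly."""
    return sum(task_weights[task] * loss for task, loss in task_losses.items())
```

For example, `multitask_loss({"asr": 1.0, "st": 2.0}, {"asr": 0.5, "st": 0.25})` returns 1.0. Tuning the weights trades off how strongly each auxiliary task shapes the shared representation.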