These leaderboards are used to track progress in Sequence-to-Sequence Speech Recognition.
No benchmarks available.
Use these libraries to find Sequence-to-Sequence Speech Recognition models and implementations.
No datasets available.
No subtasks available.
This document outlines the underlying design of Lingvo and serves as an introduction to the various pieces of the framework, with examples of advanced features that showcase its capabilities.
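As a rough sketch of the params-driven configuration style that Lingvo is built around, the snippet below defines a model entirely through a hierarchical parameter object. The `Params` and `Encoder` classes here are hypothetical stand-ins for illustration only, not the actual Lingvo API.

```python
# Minimal sketch of a Lingvo-style, params-driven model configuration.
# The Params and Encoder classes below are hypothetical stand-ins; they only
# illustrate the pattern of building models from one hierarchical config object.

class Params:
    """A tiny hierarchical configuration object."""
    def __init__(self):
        self._values = {}

    def Define(self, name, default, doc=""):
        self._values[name] = default

    def Set(self, **kwargs):
        for name, value in kwargs.items():
            if name not in self._values:
                raise KeyError(f"Unknown param: {name}")
            self._values[name] = value
        return self

    def Get(self, name):
        return self._values[name]


class Encoder:
    """Toy encoder whose entire structure is determined by its Params."""
    @classmethod
    def DefaultParams(cls):
        p = Params()
        p.Define("input_dim", 80, "Acoustic feature dimension.")
        p.Define("hidden_dim", 512, "Recurrent state size.")
        p.Define("num_layers", 4, "Number of stacked layers.")
        return p

    def __init__(self, params):
        self.params = params  # every hyperparameter lives in one object


# Experiments are defined by editing params, never by editing model code.
p = Encoder.DefaultParams().Set(hidden_dim=1024, num_layers=6)
encoder = Encoder(p)
```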
A recurrent encoder-decoder deep neural network architecture that directly translates speech in one language into text in another, illustrating the power of attention-based models.
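A minimal PyTorch sketch of such a recurrent encoder-decoder with additive attention is shown below; the layer sizes, single-utterance shapes, and class names are illustrative assumptions rather than the paper's exact configuration.

```python
# Hedged sketch of a recurrent encoder-decoder with additive attention.
# Dimensions and names are illustrative assumptions.
import torch
import torch.nn as nn


class AttentionDecoderStep(nn.Module):
    def __init__(self, enc_dim=512, dec_dim=512, vocab_size=8000):
        super().__init__()
        self.attn_score = nn.Linear(enc_dim + dec_dim, 1)   # additive attention
        self.cell = nn.LSTMCell(enc_dim + dec_dim, dec_dim)
        self.out = nn.Linear(dec_dim, vocab_size)

    def forward(self, enc_states, dec_state, dec_cell):
        # enc_states: (T, enc_dim); dec_state, dec_cell: (dec_dim,)
        T = enc_states.size(0)
        query = dec_state.unsqueeze(0).expand(T, -1)
        scores = self.attn_score(torch.cat([enc_states, query], dim=-1)).squeeze(-1)
        weights = torch.softmax(scores, dim=0)               # attention over time
        context = (weights.unsqueeze(-1) * enc_states).sum(dim=0)
        h, c = self.cell(
            torch.cat([context, dec_state], dim=-1).unsqueeze(0),
            (dec_state.unsqueeze(0), dec_cell.unsqueeze(0)),
        )
        return self.out(h.squeeze(0)), h.squeeze(0), c.squeeze(0)


# Bidirectional recurrent encoder over acoustic frames (120 frames of 80-dim features).
encoder = nn.LSTM(input_size=80, hidden_size=256, bidirectional=True)
frames = torch.randn(120, 1, 80)
enc_states, _ = encoder(frames)                              # (120, 1, 512)
step = AttentionDecoderStep()
logits, h, c = step(enc_states.squeeze(1), torch.zeros(512), torch.zeros(512))
```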
Sequence-to-sequence attention-based models have recently shown very promising results on automatic speech recognition (ASR) tasks, integrating the acoustic, pronunciation and language models into a single neural network. Among these models, the Transformer, a sequence-to-sequence attention-based model relying entirely on self-attention without using RNNs or convolutions, achieves a new single-model state-of-the-art BLEU score on neural machine translation (NMT) tasks. Given the outstanding performance of the Transformer, we extend it to speech and adopt it as the basic architecture of a sequence-to-sequence attention-based model for Mandarin Chinese ASR tasks. Furthermore, we compare a syllable-based model against a context-independent phoneme (CI-phoneme) based model with the Transformer in Mandarin Chinese. Additionally, a greedy cascading decoder with the Transformer is proposed for mapping CI-phoneme sequences and syllable sequences into word sequences. Experiments on HKUST datasets demonstrate that the syllable-based model with the Transformer performs better than its CI-phoneme-based counterpart and achieves a character error rate (CER) of 28.77%, which is competitive with the state-of-the-art CER of 28.0% obtained by the joint CTC-attention based encoder-decoder network.
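As a hedged illustration of greedy decoding with a Transformer over syllable (or CI-phoneme) targets, the sketch below runs a standard PyTorch `nn.Transformer` autoregressively. The vocabulary size, special-token ids, and model dimensions are assumptions; the cascading step described above would feed the resulting syllable sequence into a second model that emits words.

```python
# Hedged sketch of greedy Transformer decoding over syllable targets.
# Sizes and special-token ids are illustrative assumptions.
import torch
import torch.nn as nn

SOS, EOS, VOCAB, D_MODEL, MAX_LEN = 1, 2, 4000, 256, 50

embed = nn.Embedding(VOCAB, D_MODEL)
transformer = nn.Transformer(d_model=D_MODEL, nhead=4,
                             num_encoder_layers=3, num_decoder_layers=3,
                             batch_first=True)
proj = nn.Linear(D_MODEL, VOCAB)

def greedy_decode(acoustic_feats):
    """acoustic_feats: (1, T, D_MODEL) acoustic frames already projected to d_model."""
    memory = transformer.encoder(acoustic_feats)
    tokens = [SOS]
    for _ in range(MAX_LEN):
        tgt = embed(torch.tensor([tokens]))                          # (1, L, D)
        mask = transformer.generate_square_subsequent_mask(len(tokens))
        out = transformer.decoder(tgt, memory, tgt_mask=mask)
        next_id = proj(out[:, -1]).argmax(dim=-1).item()             # greedy choice
        if next_id == EOS:
            break
        tokens.append(next_id)
    return tokens[1:]                                                # syllable ids

syllables = greedy_decode(torch.randn(1, 30, D_MODEL))
```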
This paper proposes novel end-to-end multimodal ASR systems and compares them to the adaptive approach, using a range of visual representations obtained from state-of-the-art convolutional neural networks. It shows that adaptive training is effective for S2S models, leading to an absolute improvement of 1.4% in word error rate.
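One common way to realize such visual adaptation is to project a pooled CNN image embedding and add it to every acoustic frame before encoding. The sketch below shows this single variant under assumed feature sizes; it is illustrative and does not reproduce the specific fusion and adaptation schemes compared in the paper.

```python
# Hedged sketch of conditioning an S2S ASR encoder on a visual context vector.
# Feature sizes and module names are illustrative assumptions.
import torch
import torch.nn as nn

class VisuallyAdaptedEncoder(nn.Module):
    def __init__(self, acoustic_dim=80, visual_dim=2048, hidden=256):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, acoustic_dim)  # per-utterance shift
        self.rnn = nn.LSTM(acoustic_dim, hidden, bidirectional=True, batch_first=True)

    def forward(self, feats, visual_embedding):
        # feats: (B, T, acoustic_dim); visual_embedding: (B, visual_dim), e.g. a
        # pooled CNN feature for the image accompanying the utterance.
        shift = self.visual_proj(visual_embedding).unsqueeze(1)  # (B, 1, acoustic_dim)
        states, _ = self.rnn(feats + shift)                      # broadcast over time
        return states

enc = VisuallyAdaptedEncoder()
states = enc(torch.randn(2, 100, 80), torch.randn(2, 2048))      # (2, 100, 512)
```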
This work proposes three latency-reduction techniques for chunk-based incremental inference, evaluates their efficiency in terms of the accuracy-latency trade-off, and shows that the approach is also applicable to low-latency speech translation.
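A minimal pure-Python sketch of chunk-based incremental inference with a simple "hold back the last n tokens" stabilization rule is given below. The `decode` callable stands in for any full sequence-to-sequence decoder run on the audio received so far, and the chunking and `hold_n` choices are assumptions rather than the paper's exact techniques.

```python
# Hedged sketch of chunk-based incremental decoding with a hold-n rule.
# `decode` is any function mapping the audio seen so far to a token list.
from typing import Callable, Iterable, List

def incremental_decode(chunks: Iterable[List[float]],
                       decode: Callable[[List[float]], List[str]],
                       hold_n: int = 2):
    """Yield output tokens as soon as they are considered stable."""
    committed: List[str] = []   # tokens already emitted to the user
    audio: List[float] = []     # audio received so far
    for chunk in chunks:
        audio.extend(chunk)
        hypothesis = decode(audio)
        # Hold back the last hold_n tokens: they may still change once more
        # audio arrives, so only the earlier part of the hypothesis is emitted.
        stable = hypothesis[:max(len(hypothesis) - hold_n, 0)]
        if stable[:len(committed)] == committed:
            for token in stable[len(committed):]:
                yield token
            committed = stable
    # The final hypothesis is fully stable once all audio has been seen.
    for token in decode(audio)[len(committed):]:
        yield token
```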