Speech Recognition

3260 papers • 126 benchmarks • 313 datasets

Speech recognition is the task of converting spoken language into text: recognizing the words spoken in an audio recording and transcribing them into written form. The goal is accurate transcription, either in real time or from recorded audio, despite variation in accent, speaking speed, and background noise.

Benchmarks

These leaderboards are used to track progress in speech recognition.

LibriSpeech test-clean

LibriSpeech test-other

TIMIT

Libraries

Use these libraries to find speech recognition models and implementations.

msalhab96/SpeeQ

13 papers 29

Datasets

MNIST

LibriSpeech

Speech Commands

Common Voice

MuST-C

AISHELL-1

Subtasks

Automatic Speech Recognition (ASR)

Visual Speech Recognition

Robust Speech Recognition

Distant Speech Recognition

Most implemented papers

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Quoc V. Le, E. D. Cubuk, Barret Zoph, Yu Zhang, William Chan, Daniel S. Park, Chung-Cheng Chiu • Thu Apr 18 2019

This work presents SpecAugment, a simple data augmentation method for speech recognition that is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients). It achieves state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work.

3885
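To make "applied directly to the feature inputs" concrete, here is a minimal sketch of the masking part of SpecAugment-style augmentation. The summary above does not spell out the policy; the SpecAugment paper describes masking blocks of frequency channels and blocks of time steps, along with time warping, which is omitted here. The function name `mask_features`, its parameters, and the default widths are illustrative assumptions, not values taken from the paper or from any library.

```python
import random


def mask_features(features, max_freq_width=8, max_time_width=20, rng=None):
    """Return a copy of `features` with one frequency band and one time band zeroed.

    `features` is a list of rows, one row per frequency bin, each row holding
    one value per time step (hypothetical layout chosen for this sketch).
    """
    rng = rng or random.Random()
    masked = [row[:] for row in features]  # copy so the input is left untouched
    num_bins = len(masked)
    num_steps = len(masked[0]) if masked else 0

    # Frequency mask: zero a random block of consecutive frequency bins.
    f_width = rng.randint(0, min(max_freq_width, num_bins))
    f_start = rng.randint(0, num_bins - f_width)
    for f in range(f_start, f_start + f_width):
        masked[f] = [0.0] * num_steps

    # Time mask: zero a random block of consecutive time steps in every bin.
    t_width = rng.randint(0, min(max_time_width, num_steps))
    t_start = rng.randint(0, num_steps - t_width)
    for row in masked:
        row[t_start:t_start + t_width] = [0.0] * t_width

    return masked


if __name__ == "__main__":
    # Toy input: 10 frequency bins by 50 time steps, all ones.
    toy = [[1.0] * 50 for _ in range(10)]
    out = mask_features(toy, max_freq_width=3, max_time_width=5,
                        rng=random.Random(0))
    print(len(out), len(out[0]))
```

Zeroing is a reasonable fill value only if the features are mean-normalized; otherwise filling with the feature mean may be a better choice.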

Benchmarks (continued)

Switchboard + Hub500

Common Voice German

WSJ eval92

MediaSpeech

TUDA

SLUE

swb_hub_500 WER fullSWBCH

AISHELL-1

Common Voice Spanish

Common Voice French

WenetSpeech

Hub5'00 SwitchBoard

Libri-Light test-clean

Libri-Light test-other

EasyCom

WSJ dev93

WSJ eval93

Fongbe audio

CHiME-6 dev_gss12

Common Voice

VIVOS

Common Voice vi

AMI SDM1

Tedlium

Europarl-ASR EN Guest-test

Europarl-ASR EN MEP-test

Speech Commands

LRS3-TED

Switchboard (300hr)

Hub5'00 CallHome

Hub5'00 FISHER-SWBD

LibriSpeech train-clean-100 test-clean

LibriSpeech train-clean-100 test-other

Common Voice Portuguese

Common Voice Italian

SPGISpeech

GigaSpeech

GigaSpeech DEV

GigaSpeech TEST

AMI IMH

Switchboard SWBD

Switchboard CallHome

CHiME-6 eval

Google Speech Commands - Musan

Vox Populi

Artie Bias Corpus

Fleurs (English)

CHiME6

WSJ

AMI-IHM

CALLHOME

Switchboard corpus

CORAAL

LRS2

LibriCSS

Libraries (continued)

12 papers 6,221

PaddlePaddle/PaddleSpeech

11 papers 6,410

pytorch/fairseq

10 papers 21,285

mravanelli/pytorch-kaldi

8 papers 2,276

huggingface/transformers

7 papers 85,805

TensorSpeech/TensorFlowASR

6 papers 816

6 papers 277

facebookresearch/fairseq

5 papers 21,269

Alexander-H-Liu/End-to-end-ASR-Pyto…

5 papers 1,082

4 papers 5,913

rwth-i6/returnn

4 papers 335

microsoft/speecht5

3 papers 401

microsoft/unilm

2 papers 11,172

2 papers 2,775

alibaba-damo-academy/FunASR

2 papers 270

Datasets (continued)

Europarl

Libri-Light

LRS2

LibriCSS

Subtasks (continued)

Sequence-To-Sequence Speech Recognition

Target Speaker Extraction

Accented Speech Recognition

Noisy Speech Recognition

English Conversational Speech Recognition

0

Communication-Efficient Learning of Deep Networks from Decentralized Data

H. B. McMahan, Eider Moore, Daniel Ramage, S. Hampson, B. A. Y. Arcas • Tue Feb 16 2016

This work presents a practical method for the federated learning of deep networks based on iterative model averaging, and conducts an extensive empirical evaluation, considering five different model architectures and four datasets.

21041 0

Deep Speech: Scaling up end-to-end speech recognition

Awni Y. Hannun, Shubho Sengupta, A. Ng, Bryan Catanzaro, G. Diamos, Adam Coates, Carl Case, J. Casper, Erich Elsen, R. Prenger, S. Satheesh, Vinay Rao • Tue Dec 16 2014

Deep Speech, a state-of-the-art speech recognition system developed using end-to-end deep learning, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set.

2216 0

Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition

Pete Warden • Sun Apr 08 2018

This paper describes an audio dataset of spoken words designed to help train and evaluate keyword spotting systems, and suggests a methodology for reproducible and comparable accuracy metrics for this task.

1871 0

Recurrent Neural Network Regularization

O. Vinyals, I. Sutskever, Wojciech Zaremba • Sun Sep 07 2014

This paper shows how to correctly apply dropout to LSTMs, and shows that it substantially reduces overfitting on a variety of tasks.

2960 0

Conformer: Convolution-augmented Transformer for Speech Recognition

Niki Parmar, Ruoming Pang, Yu Zhang, Yonghui Wu, Wei Han, Chung-Cheng Chiu, Zhengdong Zhang, Anmol Gulati, James Qin, Jiahui Yu, Shibo Wang • Fri May 15 2020

This work proposes Conformer, a convolution-augmented Transformer for speech recognition, which significantly outperforms previous Transformer- and CNN-based models and achieves state-of-the-art accuracy.

3831 0

Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces

A. Coucke, Alaa Saade, Adrien Ball, Théodore Bluche, A. Caulier, David Leroy, Clément Doumouro, Thibault Gisselbrecht, F. Caltagirone, Thibaut Lavril, Maël Primet, J. Dureau • Thu May 24 2018

This paper presents the machine learning architecture of the Snips Voice Platform, a software solution that performs Spoken Language Understanding on microprocessors typical of IoT devices. It is fast and accurate while enforcing privacy by design, as no personal user data is ever collected.

888 0

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

Michael Auli, Abdel-rahman Mohamed, Alexei Baevski, Henry Zhou • Fri Jun 19 2020

This work shows for the first time that learning powerful representations from speech audio alone, followed by fine-tuning on transcribed speech, can outperform the best semi-supervised methods while being conceptually simpler.

7553 0
