3260 papers • 126 benchmarks • 313 datasets
Automatic Speech Recognition (ASR) is the task of converting spoken language into written text, often in real time, allowing people to interact with computers, mobile devices, and other technology using their voice. The goal of Automatic Speech Recognition is to transcribe speech accurately despite variations in accent, pronunciation, and speaking style, as well as background noise and other factors that can degrade speech quality.
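A minimal sketch of what an ASR system does in practice, assuming the Hugging Face transformers pipeline API and the public facebook/wav2vec2-base-960h checkpoint; the audio file name is a hypothetical placeholder.

```python
# Minimal ASR sketch: transcribe an audio file with a pretrained model.
from transformers import pipeline

# "facebook/wav2vec2-base-960h" is a public checkpoint; any ASR model can be swapped in.
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")

result = asr("example_utterance.wav")  # hypothetical 16 kHz mono recording
print(result["text"])                  # the recognized transcript
```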
These leaderboards are used to track progress in Automatic Speech Recognition.
Use these libraries to find Automatic Speech Recognition models and implementations.
This work presents SpecAugment, a simple data augmentation method for speech recognition that is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients) and achieves state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work.
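As a rough illustration of the idea, the sketch below applies SpecAugment-style frequency and time masking to a batch of filter bank features using torchaudio's masking transforms; the feature shape and mask sizes are illustrative assumptions rather than the paper's exact augmentation policy.

```python
# Sketch: SpecAugment-style masking applied directly to filter bank features.
import torch
import torchaudio.transforms as T

fbank = torch.randn(1, 80, 300)          # (batch, mel bins, frames): stand-in features

freq_mask = T.FrequencyMasking(freq_mask_param=27)   # mask up to 27 consecutive mel bins
time_mask = T.TimeMasking(time_mask_param=100)       # mask up to 100 consecutive frames

augmented = time_mask(freq_mask(fbank))  # masked copy used for training
```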
The machine learning architecture of the Snips Voice Platform is presented, a software solution to perform Spoken Language Understanding on microprocessors typical of IoT devices that is fast and accurate while enforcing privacy by design, as no personal user data is ever collected.
A simple baseline that utilizes probabilities from softmax distributions is presented, showing the effectiveness of this baseline across computer vision, natural language processing, and automatic speech recognition tasks, and it is shown that the baseline can sometimes be surpassed.
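A minimal sketch of that baseline, assuming a generic classifier's logits: the maximum softmax probability serves as a confidence score, and low-confidence inputs are flagged as likely errors or out-of-distribution examples. The tensor shapes and threshold are illustrative.

```python
# Sketch: maximum-softmax-probability confidence baseline.
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)                    # (batch, classes): stand-in model outputs
confidence, prediction = F.softmax(logits, dim=-1).max(dim=-1)

threshold = 0.5                                # hypothetical operating point
flag_for_review = confidence < threshold       # low confidence -> likely error / OOD
```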
This work proposes the convolution-augmented transformer for speech recognition, named Conformer, which significantly outperforms previous Transformer- and CNN-based models, achieving state-of-the-art accuracies.
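A heavily simplified sketch of a Conformer-style block follows: self-attention interleaved with a depthwise convolution between two half-step feed-forward layers. Relative positional encoding, GLU gating, and other details of the published architecture are omitted, and all dimensions are illustrative.

```python
# Sketch: convolution-augmented transformer ("Conformer"-style) block.
import torch
import torch.nn as nn

class ConformerBlockSketch(nn.Module):
    def __init__(self, dim=256, heads=4, kernel_size=31):
        super().__init__()
        self.ffn1 = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim),
                                  nn.SiLU(), nn.Linear(4 * dim, dim))
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.conv_norm = nn.LayerNorm(dim)
        self.depthwise = nn.Conv1d(dim, dim, kernel_size,
                                   padding=kernel_size // 2, groups=dim)
        self.ffn2 = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim),
                                  nn.SiLU(), nn.Linear(4 * dim, dim))
        self.out_norm = nn.LayerNorm(dim)

    def forward(self, x):                       # x: (batch, time, dim)
        x = x + 0.5 * self.ffn1(x)              # first (half-step) feed-forward
        h = self.attn_norm(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]     # self-attention
        h = self.conv_norm(x).transpose(1, 2)   # (batch, dim, time) for Conv1d
        x = x + self.depthwise(h).transpose(1, 2)              # convolution module
        x = x + 0.5 * self.ffn2(x)              # second (half-step) feed-forward
        return self.out_norm(x)

features = torch.randn(2, 300, 256)             # stand-in encoder features
out = ConformerBlockSketch()(features)
```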
This paper proposes a simple scaling method that scales the widths of ContextNet to achieve a good trade-off between computation and accuracy, and demonstrates that on the widely used LibriSpeech benchmark ContextNet achieves word error rates of 2.1%/4.6%.
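The scaling idea can be sketched as a single multiplier applied to every layer's channel count; the base widths and scaling factors below are hypothetical, not ContextNet's published configuration.

```python
# Sketch: width scaling by a single factor alpha to trade computation for accuracy.
base_channels = [256, 256, 512, 512, 640]        # hypothetical per-block widths

def scale_widths(channels, alpha):
    """Return channel counts scaled by alpha (rounded to integers)."""
    return [int(round(c * alpha)) for c in channels]

print(scale_widths(base_channels, 0.5))   # smaller, cheaper model
print(scale_widths(base_channels, 2.0))   # larger, more accurate model
```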
This work learns to listen and write characters with a joint Connectionist Temporal Classification (CTC) and attention-based encoder-decoder network and beats out traditional hybrid ASR systems on spontaneous Japanese and Chinese speech.
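A minimal sketch of a joint CTC/attention objective of this kind: the training loss is a weighted sum of a CTC loss on the encoder outputs and a cross-entropy loss on the attention decoder outputs. All tensors and the interpolation weight are illustrative stand-ins.

```python
# Sketch: joint CTC + attention training objective.
import torch
import torch.nn.functional as F

lam = 0.3                                            # CTC weight (hypothetical)

log_probs = torch.randn(50, 2, 30).log_softmax(-1)   # (time, batch, vocab): CTC head
targets = torch.randint(1, 30, (2, 12))              # (batch, label length)
input_lengths = torch.full((2,), 50)
target_lengths = torch.full((2,), 12)
ctc_loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank=0)

decoder_logits = torch.randn(2, 12, 30)              # (batch, label length, vocab): attention decoder
att_loss = F.cross_entropy(decoder_logits.reshape(-1, 30), targets.reshape(-1))

loss = lam * ctc_loss + (1 - lam) * att_loss         # joint objective
```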
This paper presents the Eesen framework which drastically simplifies the existing pipeline to build state-of-the-art ASR systems and achieves comparable word error rates (WERs), while at the same time speeding up decoding significantly.
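Eesen decodes CTC-trained acoustic models with weighted finite-state transducers; as a rough stand-in for the simplified pipeline, the sketch below shows only greedy CTC decoding (collapse repeats, drop blanks) on hypothetical frame-level outputs.

```python
# Sketch: greedy CTC decoding of frame-level acoustic model outputs.
import torch

def greedy_ctc_decode(log_probs, blank=0):
    """log_probs: (time, vocab) frame-level log-probabilities."""
    best = log_probs.argmax(dim=-1).tolist()     # most likely symbol per frame
    out, prev = [], blank
    for s in best:
        if s != prev and s != blank:             # collapse repeats, drop blanks
            out.append(s)
        prev = s
    return out

frames = torch.randn(100, 30).log_softmax(-1)    # stand-in acoustic model output
print(greedy_ctc_decode(frames))
```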
Three deep neural network architectures are adapted to energy disaggregation and it is found that all three neural nets achieve better F1 scores than either combinatorial optimisation or factorial hidden Markov models and that the neural net algorithms generalise well to an unseen house.
A variety of structural and optimization improvements to the Listen, Attend, and Spell model are explored, which significantly improve performance, and a multi-head attention architecture is introduced, which offers improvements over the commonly used single-head attention.
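A minimal sketch of the single-head versus multi-head change in an attention-based encoder-decoder of this style, using PyTorch's built-in attention module; the dimensions and tensors are illustrative.

```python
# Sketch: replacing single-head with multi-head attention over encoder states.
import torch
import torch.nn as nn

encoder_states = torch.randn(2, 120, 256)        # (batch, frames, dim): "listener" outputs
decoder_state = torch.randn(2, 1, 256)           # current "speller" query

single_head = nn.MultiheadAttention(256, num_heads=1, batch_first=True)
multi_head = nn.MultiheadAttention(256, num_heads=4, batch_first=True)

ctx_single, _ = single_head(decoder_state, encoder_states, encoder_states)
ctx_multi, _ = multi_head(decoder_state, encoder_states, encoder_states)
# With 4 heads, each head attends over a separate 64-dim slice of the encoder states.
```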