These leaderboards are used to track progress in Accented Speech Recognition.
Use these libraries to find Accented Speech Recognition models and implementations.
Deep Speech, a state-of-the-art speech recognition system developed using end-to-end deep learning, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set.
Pipelines for Goodness of Pronunciation (GoP) computation that solve the out-of-vocabulary (OOV) problem at test time using vocabulary/lexicon expansion techniques are proposed, and methods to remove UNK and SPN phonemes from the GoP output are implemented.
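As background for the summary above, a minimal sketch of the classic GoP score: the log posterior of the canonical phone for a segment, normalized against the best-scoring competing phone. This is an illustrative textbook formulation, not the pipeline from the paper; the `posteriors` dictionary and segment-level averaging are assumptions for the example.

```python
import math

def goodness_of_pronunciation(posteriors, canonical_phone):
    """Classic GoP score for one phone segment.

    posteriors: dict mapping phone symbol -> posterior probability,
    assumed already averaged over the segment's frames (hypothetical
    input format for this sketch).
    Returns log(P(canonical) / max_p P(p)); 0.0 when the canonical
    phone is the top-scoring one, negative otherwise.
    """
    p_canonical = posteriors[canonical_phone]
    p_best = max(posteriors.values())
    return math.log(p_canonical / p_best)
```

A well-pronounced phone (canonical phone is the model's top hypothesis) scores 0.0; stronger mismatches yield increasingly negative scores. OOV handling via lexicon expansion, as in the paper, would happen upstream of this scoring step.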
Accented speech poses significant challenges for state-of-the-art automatic speech recognition (ASR) systems. Accent is a property of speech that lasts throughout an utterance in varying degrees of strength. This makes it hard to isolate the influence of accent on individual speech sounds. We propose coupled training for encoder-decoder ASR models that acts on pairs of utterances corresponding to the same text spoken by speakers with different accents. This training regime introduces an L2 loss between the attention-weighted representations corresponding to pairs of utterances with the same text, thus acting as a regularizer and encouraging representations from the encoder to be more accent-invariant. We focus on recognizing accented English samples from the Mozilla Common Voice corpus. We obtain significant error rate reductions on accented samples from a large set of diverse accents using coupled training. We also show consistent improvements in performance on heavily accented samples (as determined by a standalone accent classifier).
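The coupled-training regularizer described above can be sketched as follows: pool each utterance's encoder states with its attention weights, then penalize the squared L2 distance between the pooled representations of two utterances of the same text. This is a simplified NumPy illustration under assumed shapes (per-utterance attention weights summing to 1), not the authors' implementation.

```python
import numpy as np

def attention_pooled(encoder_states, attn_weights):
    """Attention-weighted summary of encoder states.

    encoder_states: (T, D) frame-level encoder outputs
    attn_weights:   (T,) attention weights summing to 1
    Returns a (D,) pooled representation.
    """
    return attn_weights @ encoder_states

def coupled_l2_loss(states_a, attn_a, states_b, attn_b):
    """Squared L2 distance between the pooled representations of two
    utterances with the same text spoken in different accents; added
    to the usual ASR loss, it encourages accent-invariant encodings."""
    r_a = attention_pooled(states_a, attn_a)
    r_b = attention_pooled(states_b, attn_b)
    diff = r_a - r_b
    return float(diff @ diff)
```

In training, this term would be weighted and summed with the standard encoder-decoder objective, so the encoder is pushed to produce similar representations for the paired utterances.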