3260 papers • 126 benchmarks • 313 datasets
Distant speech recognition is the task of automatically transcribing speech captured by one or more microphones placed far from the speaker, where reverberation and background noise substantially degrade the signal.
These leaderboards are used to track progress in Distant Speech Recognition
Use these libraries to find Distant Speech Recognition models and implementations
Experiments conducted on several datasets and tasks show that PyTorch-Kaldi can be used effectively to develop modern, state-of-the-art speech recognizers.
A novel deep recurrent neural network architecture, the residual LSTM, is introduced; it separates the spatial shortcut path from the temporal one by using output layers, which helps avoid a conflict between spatial- and temporal-domain gradient flows.
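A minimal PyTorch sketch of the idea, assuming a simplified reading of the residual LSTM in which the depth-wise shortcut is added after a projection of the LSTM output (layer sizes and projection placement are illustrative, not the paper's exact architecture):

```python
# Simplified residual LSTM layer: the skip connection gives depth-wise
# ("spatial") gradients a path that bypasses the temporal recurrence.
import torch
import torch.nn as nn

class ResidualLSTMLayer(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, x):
        out, _ = self.lstm(x)        # temporal path: ordinary recurrence
        return self.proj(out) + x    # spatial path: shortcut from layer input

# Stacking layers lets gradients skip through depth.
net = nn.Sequential(*[ResidualLSTMLayer(256) for _ in range(3)])
y = net(torch.randn(4, 50, 256))     # (batch, time, features)
```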
A new, freely available corpus for German distant speech recognition is presented, and speaker-independent word error rate (WER) results are reported for two open-source speech recognizers trained on this corpus.
A first set of baseline results, obtained using different techniques including deep neural networks (DNNs) and aligned with the international state of the art, is reported.
This paper revisits this classical approach in the context of modern DNN-HMM systems and proposes the adoption of three methods, namely asymmetric context windowing, close-talk-based supervision, and close-talk-based pre-training, which are shown to yield a significant advantage.
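Of the three, asymmetric context windowing is the simplest to illustrate: the acoustic model's input stacks an unequal number of past and future frames around the current frame. A minimal sketch, assuming a window biased toward past frames; the left/right sizes and feature dimension are illustrative, not the paper's values:

```python
# Asymmetric context windowing: stack `left` past and `right` future frames
# (here left > right) around each frame before feeding a DNN-HMM model.
import numpy as np

def asymmetric_context(feats: np.ndarray, left: int = 10, right: int = 2):
    """feats: (num_frames, dim) -> (num_frames, (left + 1 + right) * dim)."""
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
    windows = [padded[i:i + num_frames] for i in range(left + 1 + right)]
    return np.concatenate(windows, axis=1)

feats = np.random.randn(100, 40)   # stand-in for 40-dim filterbank features
ctx = asymmetric_context(feats)    # shape (100, 13 * 40)
```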
It is shown that a quaternion long short-term memory network (QLSTM), trained on the concatenated multi-channel speech signals, outperforms an equivalent real-valued LSTM on two different multi-channel distant speech recognition tasks.
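The distinctive ingredient of a quaternion network is the Hamilton product, which replaces the ordinary matrix product and ties four input components (e.g., four microphone channels) together through shared weights. A hedged sketch of a quaternion linear layer, the building block from which a QLSTM's gates can be assembled; the initialization scale and feature layout are illustrative assumptions:

```python
# Quaternion linear layer: inputs are laid out as [r | i | j | k], and the
# Hamilton product couples all four components through the same four matrices.
import torch
import torch.nn as nn

class QuaternionLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        def w():  # one real matrix per quaternion component
            return nn.Parameter(torch.randn(in_features, out_features) * 0.05)
        self.r, self.i, self.j, self.k = w(), w(), w(), w()

    def forward(self, x):
        xr, xi, xj, xk = torch.chunk(x, 4, dim=-1)
        yr = xr @ self.r - xi @ self.i - xj @ self.j - xk @ self.k
        yi = xr @ self.i + xi @ self.r + xj @ self.k - xk @ self.j
        yj = xr @ self.j - xi @ self.k + xj @ self.r + xk @ self.i
        yk = xr @ self.k + xi @ self.j - xj @ self.i + xk @ self.r
        return torch.cat([yr, yi, yj, yk], dim=-1)

# E.g. 4 microphone channels -> one quaternion per feature dimension.
y = QuaternionLinear(40, 64)(torch.randn(8, 4 * 40))   # -> (8, 4 * 64)
```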
This paper proposes SincNet, a novel convolutional neural network that encourages the first layer to discover more meaningful filters by exploiting parametrized sinc functions, and shows that the proposed architecture converges faster, performs better, and is more interpretable than a standard CNN.
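SincNet's first layer is compact enough to sketch directly: each convolution kernel is a band-pass filter obtained as the difference of two windowed sinc low-pass filters, so only the two cutoff frequencies per filter are learned rather than every tap. The kernel size, sampling rate, and cutoffs below are illustrative, not the paper's exact configuration:

```python
# Band-pass kernel g[n] = 2*f2*sinc(2*f2*t) - 2*f1*sinc(2*f1*t), windowed.
import numpy as np

def sinc_kernel(f1: float, f2: float, kernel_size: int = 251, fs: int = 16000):
    """Difference of two low-pass sinc filters = band-pass over [f1, f2]."""
    t = (np.arange(kernel_size) - (kernel_size - 1) / 2) / fs   # seconds
    def lowpass(fc):
        # np.sinc is the normalized sinc: sin(pi * x) / (pi * x)
        return 2 * fc * np.sinc(2 * fc * t)
    return (lowpass(f2) - lowpass(f1)) * np.hamming(kernel_size)

band = sinc_kernel(f1=300.0, f2=3400.0)   # e.g. a telephone-band filter
```

In an actual SincNet layer, f1 and f2 would be learnable parameters updated by backpropagation, with the rest of the kernel derived from them.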
This work proposes MicRank, a learning-to-rank framework in which a neural network is trained to rank the available channels directly using recognition performance on the training set; the method is agnostic to array geometry and to the type of recognition back-end.
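A hedged sketch of the learning-to-rank idea: a small network scores each channel, and the scores are fitted to a recognition-derived relevance signal (e.g., negative WER per channel) with a listwise softmax loss. The scorer architecture, features, and loss below are assumptions for illustration, not the exact MicRank recipe:

```python
# Listwise channel ranking: fit channel scores to WER-derived relevance.
import torch
import torch.nn as nn
import torch.nn.functional as F

scorer = nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 1))

def listwise_loss(channel_feats, relevance):
    """channel_feats: (n_channels, dim); relevance: (n_channels,), e.g. -WER."""
    scores = scorer(channel_feats).squeeze(-1)
    return F.kl_div(F.log_softmax(scores, dim=0),
                    F.softmax(relevance, dim=0), reduction="sum")

feats = torch.randn(6, 40)                                  # 6 microphones
rel = -torch.tensor([0.32, 0.25, 0.41, 0.28, 0.30, 0.27])   # -WER per channel
listwise_loss(feats, rel).backward()
best = scorer(feats).squeeze(-1).argmax()   # channel selected at test time
```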
Experiments show that the proposed improved self-supervised method can learn transferable, robust, and problem-agnostic features that carry relevant information from the speech signal, such as speaker identity, phonemes, and even higher-level features such as emotional cues.
A new impulse response (IR) dataset called MeshRIR is introduced; it consists of IRs measured at positions obtained by finely discretizing a spatial region, and it is suitable for evaluating sound field analysis and synthesis methods.
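For context, the typical downstream use of such an IR dataset in distant speech recognition is data contamination: convolving dry close-talk speech with a measured IR to simulate the signal at a distant microphone position. A minimal sketch, with a toy synthetic IR standing in for a MeshRIR measurement:

```python
# Simulate a distant-microphone recording: dry speech convolved with an IR.
import numpy as np
from scipy.signal import fftconvolve

def reverberate(dry: np.ndarray, ir: np.ndarray) -> np.ndarray:
    wet = fftconvolve(dry, ir)[: len(dry)]
    return wet / (np.max(np.abs(wet)) + 1e-9)   # peak-normalize

dry = np.random.randn(16000)   # stand-in for 1 s of dry speech at 16 kHz
ir = np.random.randn(4000) * np.exp(-np.linspace(0.0, 8.0, 4000))  # toy IR
wet = reverberate(dry, ir)
```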