These leaderboards track progress in robust speech recognition (robust-speech-recognition-3).
Extends and optimises previous work on very deep convolutional neural networks for recognising noisy speech on the Aurora 4 task, and shows that state-level weighted log-likelihood score combination in a joint acoustic-model decoding scheme is highly effective.
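The score-combination idea can be sketched generically: each acoustic model emits per-frame, per-state log-likelihoods, and the decoder consumes a weighted interpolation of them. The array shapes and the weight value below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def combine_state_loglikes(ll_a, ll_b, weight=0.5):
    """State-level weighted log-likelihood combination for joint decoding.

    ll_a, ll_b: arrays of shape (frames, states) with log-likelihoods
                from two acoustic models for the same utterance.
    weight:     interpolation weight for model A (illustrative value).
    Returns the combined scores the decoder would search over.
    """
    ll_a = np.asarray(ll_a, dtype=float)
    ll_b = np.asarray(ll_b, dtype=float)
    if ll_a.shape != ll_b.shape:
        raise ValueError("both models must score the same frames and states")
    return weight * ll_a + (1.0 - weight) * ll_b
```

Because the combination is linear in log space, it corresponds to a per-state geometric mean of the two models' likelihoods.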
Presents a hierarchical sampling training algorithm that addresses limitations in runtime, memory, and hyperparameter optimization, together with a new visualization method for qualitatively evaluating interpretability and disentanglement.
Investigates GAN-based dereverberation front-ends for ASR, finding that an LSTM generator yields a significant improvement over feed-forward DNNs and CNNs on the dataset, and that updating the generator and discriminator on the same mini-batch during training is important.
Proposes a domain adaptation method based on generative adversarial nets (GANs) with disentangled representation learning to make ASR systems robust; the method can also be used for gender adaptation in gender-mismatched recognition.
Investigates the potential of stochastic neural networks for learning effective waveform-based acoustic models, and proposes an effective approximation based on Gauss–Hermite quadrature for regularization.
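How the paper applies the quadrature to regularize stochastic networks is not described here, so the sketch below shows only the standard Gauss–Hermite rule it builds on: approximating an expectation under a Gaussian, E[f(X)] for X ~ N(mu, sigma^2), with a small number of deterministic nodes instead of Monte Carlo samples.

```python
import numpy as np

def gauss_hermite_expectation(f, mu, sigma, order=10):
    """Approximate E[f(X)] for X ~ N(mu, sigma^2) via Gauss-Hermite quadrature.

    Uses the change of variables x = mu + sqrt(2)*sigma*t, giving
    E[f(X)] ~= (1/sqrt(pi)) * sum_i w_i * f(mu + sqrt(2)*sigma*t_i),
    where (t_i, w_i) are the Gauss-Hermite nodes and weights.
    """
    nodes, weights = np.polynomial.hermite.hermgauss(order)
    values = f(mu + np.sqrt(2.0) * sigma * nodes)
    return float(np.sum(weights * values) / np.sqrt(np.pi))
```

With `order=10` the rule is exact for polynomials up to degree 19, which is why a handful of nodes often suffices where sampling would need thousands of draws.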
PASE+ is proposed, an improved version of PASE that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks and learns transferable representations suitable for highly mismatched acoustic conditions.
Proposes a novel adaptation method for DNN acoustic models using class similarity, which outperforms fine-tuning with one-hot labels on both accent and noise adaptation tasks, especially when the source and target domains are highly mismatched.
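One common way to turn class similarity into training targets is to average a source model's posteriors over all frames of each class and use the resulting rows as soft labels instead of one-hot vectors. This is a generic sketch of that idea; the paper's exact construction may differ.

```python
import numpy as np

def soft_targets_from_class_similarity(source_posteriors, labels, num_classes):
    """Build per-class soft targets from a source model's posteriors.

    source_posteriors: (frames, num_classes) posterior probabilities
                       from the source-domain model.
    labels:            (frames,) integer class label per frame.
    Returns a (num_classes, num_classes) matrix whose row c is the
    soft target used in place of the one-hot vector for class c.
    """
    source_posteriors = np.asarray(source_posteriors, dtype=float)
    labels = np.asarray(labels)
    targets = np.zeros((num_classes, num_classes))
    for c in range(num_classes):
        # Average posteriors over every frame whose reference label is c.
        targets[c] = source_posteriors[labels == c].mean(axis=0)
    # Renormalize each row to a proper probability distribution.
    targets /= targets.sum(axis=1, keepdims=True)
    return targets
```

Fine-tuning against these rows (e.g. with cross-entropy against the soft target of each frame's class) preserves inter-class relationships that one-hot labels discard.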
A detailed comparison of speech enhancement-based techniques and three different model-based adaptation techniques covering data augmentation, multi-task learning, and adversarial learning for robust ASR suggests that knowledge of the underlying noise type can meaningfully inform the choice of adaptation technique.
Proposes an interactive feature fusion network (IFF-Net) for noise-robust speech recognition that learns complementary information from the enhanced and the original noisy features, recovering information lost in the over-suppressed enhanced feature.
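A minimal sketch of masked feature fusion, assuming a learned soft mask that mixes the enhanced and noisy features per dimension; this illustrates the general mechanism, not IFF-Net's actual architecture, and the projection `w` is a hypothetical stand-in for its learned layers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def masked_fusion(enhanced, noisy, w):
    """Fuse enhanced and original noisy features with a learned soft mask.

    enhanced, noisy: (frames, dim) feature matrices for one utterance.
    w:               (2*dim, dim) projection producing the mask logits
                     (stands in for learned fusion layers).
    The mask decides, per frame and dimension, how much to trust the
    enhancer versus the raw input, so over-suppressed content can be
    recovered from the noisy feature.
    """
    joint = np.concatenate([enhanced, noisy], axis=-1)  # (frames, 2*dim)
    m = sigmoid(joint @ w)                              # (frames, dim)
    return m * enhanced + (1.0 - m) * noisy
```

With zero weights the mask is uniformly 0.5 and the fusion reduces to a plain average of the two feature streams.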
Applies adaptive versions of state-of-the-art attacks, such as the Imperceptible ASR attack, to the proposed model, and shows that the strongest defense is robust to all attacks that use inaudible noise and can only be broken with very high distortion.