3260 papers • 126 benchmarks • 313 datasets
Removing reverberation from audio signals
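As a rough illustration of the problem, reverberant ("wet") audio is commonly modeled as the dry signal convolved with a room impulse response (IR); dereverberation then tries to recover the dry signal from the wet one. The sketch below only simulates the forward model. The helper names, the toy exponentially decaying IR, and the decay value are assumptions chosen for illustration and are not taken from any paper listed here.

```python
import math
import random


def synthetic_impulse_response(length, decay):
    """Toy IR (assumption): unit direct path plus exponentially decaying noise."""
    rng = random.Random(0)  # fixed seed so the example is repeatable
    ir = [1.0]
    for n in range(1, length):
        ir.append(0.5 * rng.uniform(-1.0, 1.0) * math.exp(-decay * n))
    return ir


def convolve(signal, ir):
    """Full linear convolution; output length is len(signal) + len(ir) - 1."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


dry = [0.0, 1.0, 0.5, -0.25, 0.0]                      # short "dry" signal
ir = synthetic_impulse_response(length=8, decay=0.4)   # toy room response
wet = convolve(dry, ir)                                # reverberant observation
```

The wet signal is longer than the dry one because the IR tail smears each sample forward in time; a longer reverberation time corresponds to a slower decay and a longer tail.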
Deformable convolution is proposed as a way to give temporal convolutional network (TCN) models dynamic receptive fields (RFs) that can adapt to various reverberation times for reverberant speech separation.
This work proposes a mesh-based neural network (MESH2IR) to generate acoustic impulse responses (IRs) for indoor 3D scenes represented using a mesh, and is the first neural-network-based approach to predict IRs from a given 3D scene mesh in real time.
This work presents a stochastic regeneration approach in which an estimate given by a predictive model is provided as a guide for further diffusion, and shows that this approach enables the use of lighter sampling schemes with fewer diffusion steps without sacrificing quality, thus reducing the computational burden by an order of magnitude.
An in-depth investigation of GAN-based dereverberation front-ends for ASR finds that an LSTM yields a significant improvement over feed-forward DNNs and CNNs on the dataset used, and that it is important to update the generator and the discriminator using the same mini-batch data during training.
This paper introduces HiFi-GAN, a deep learning method to transform recorded speech to sound as though it had been recorded in a studio, trained with multi-scale adversarial discriminators in both the time domain and the time-frequency domain.
A supervised dereverberation technique is proposed that uses fully convolutional encoder-decoder networks with layers arranged in the form of a U and connections that skip some layers; it improves quality, intelligibility and other performance metrics compared to the original U-net method and is on par with state-of-the-art GAN-based approaches.
A novel front-end design is presented that involves a recently proposed extension of the weighted prediction error (WPE) speech dereverberation algorithm, the virtual acoustic channel expansion (VACE)-WPE, and it is shown to outperform fully neural front-ends and other TSO methods in various cases.
An analysis of dereverberation performance as a function of model size and the receptive field (RF) of temporal convolutional networks (TCNs) shows that a larger RF can yield a significant improvement in performance when training smaller TCN models.
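To make the notion of a TCN receptive field concrete, the sketch below computes the RF (in input frames) of a stack of dilated 1-D convolutions, where each layer with kernel size `k` and dilation `d` widens the RF by `(k - 1) * d`. The function name and the layout assumed here (dilations doubling within a stack, stacks repeated) follow a common Conv-TasNet-style design and are illustrative assumptions, not the configuration of any paper listed above.

```python
def tcn_receptive_field(kernel_size, blocks_per_stack, num_stacks):
    """Receptive field, in frames, of a dilated TCN (hypothetical helper).

    Assumes dilations 1, 2, 4, ... within each stack and stride 1 throughout.
    """
    dilations = [2 ** i for i in range(blocks_per_stack)] * num_stacks
    # Each dilated convolution adds (kernel_size - 1) * dilation frames.
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Compare a few illustrative configurations (values are assumptions).
for stacks in (1, 2, 3):
    rf = tcn_receptive_field(kernel_size=3, blocks_per_stack=8, num_stacks=stacks)
    print(f"{stacks} stack(s) of 8 blocks: RF = {rf} frames")
```

Under these assumptions the RF grows linearly with the number of stacks but exponentially with the number of blocks per stack, so the RF can be enlarged considerably by changing dilations without adding many parameters.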