3260 papers • 126 benchmarks • 313 datasets
Speech Enhancement is a signal processing task that involves improving the quality of speech signals captured under noisy or degraded conditions. The goal is to make speech clearer, more intelligible, and more pleasant to listen to, for applications such as voice recognition, teleconferencing, and hearing aids. (Image credit: A Fully Convolutional Neural Network For Speech Enhancement)
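As a concrete illustration of the task, a classical spectral-subtraction baseline can be sketched as below. This is a minimal NumPy version not tied to any paper listed here; the frame size, hop, and spectral-floor values are arbitrary illustrative choices.

```python
import numpy as np

def spectral_subtraction(noisy, noise_estimate, n_fft=512, hop=128, floor=0.01):
    """Classical spectral-subtraction enhancement (illustrative baseline).

    noisy: 1-D noisy speech signal
    noise_estimate: 1-D noise-only segment used to estimate the noise spectrum
    """
    window = np.hanning(n_fft)

    def stft(x):
        frames = [x[i:i + n_fft] * window
                  for i in range(0, len(x) - n_fft + 1, hop)]
        return np.fft.rfft(np.array(frames), axis=1)

    X = stft(noisy)
    # Average magnitude spectrum of the noise-only segment
    noise_mag = np.abs(stft(noise_estimate)).mean(axis=0)

    mag = np.abs(X)
    phase = np.angle(X)
    # Subtract the noise magnitude, keeping a small spectral floor
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    S = clean_mag * np.exp(1j * phase)

    # Overlap-add inverse STFT back to a waveform
    out = np.zeros(len(noisy))
    for k, frame in enumerate(np.fft.irfft(S, n=n_fft, axis=1)):
        out[k * hop:k * hop + n_fft] += frame * window
    return out
```

Modern learned methods (several summarized below) replace this fixed subtraction rule with a trained network, but the noisy-spectrum-in, enhanced-spectrum-out framing is the same.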
These leaderboards are used to track progress in Speech Enhancement
Use these libraries to find Speech Enhancement models and implementations
A new family of policy gradient methods for reinforcement learning is proposed, which alternates between sampling data through interaction with the environment and optimizing a "surrogate" objective function using stochastic gradient ascent.
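The clipped surrogate objective at the heart of this method family (PPO) can be sketched in a few lines of NumPy; the function name and the eps = 0.2 default are illustrative choices, not taken from this page.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective from PPO (sketch).

    ratio: pi_theta(a|s) / pi_theta_old(a|s) for sampled actions
    advantage: estimated advantages for those actions
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    # Take the pessimistic (lower) bound, then average over samples
    return np.mean(np.minimum(unclipped, clipped))
```

The clipping removes the incentive to move the policy ratio outside [1 - eps, 1 + eps], which is what makes multiple gradient steps on the same batch safe.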
This work considers image transformation problems and proposes the use of perceptual loss functions for training feed-forward networks for such tasks; results on image style transfer show that a feed-forward network can be trained to solve the optimization problem proposed by Gatys et al. in real time.
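The perceptual-loss idea — measuring error in the feature space of a fixed network rather than on raw pixels — can be sketched as follows. The frozen random linear map here is only a stand-in for real pretrained activations (e.g. VGG features); everything in this snippet is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a fixed, pretrained feature extractor (e.g. VGG activations):
# a frozen random linear map followed by a ReLU.
W = rng.standard_normal((64, 256))

def features(x):
    return np.maximum(W @ x, 0.0)  # ReLU feature activations

def perceptual_loss(output, target):
    """Mean squared error in feature space rather than pixel space."""
    f_out, f_tgt = features(output), features(target)
    return np.mean((f_out - f_tgt) ** 2)
```

Because the extractor is fixed, gradients flow only into the network producing `output`, which is what lets a feed-forward model be trained against a perceptual criterion.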
This work proposes the use of generative adversarial networks for speech enhancement; it operates at the waveform level, trains the model end-to-end, and incorporates 28 speakers and 40 different noise conditions into the same model, so that model parameters are shared across them.
A novel loss function, the weighted source-to-distortion ratio (wSDR) loss, is proposed; it is designed to correlate directly with a quantitative evaluation measure and achieves state-of-the-art performance on all metrics.
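A weighted SDR-style loss can be sketched as an energy-weighted negative cosine similarity on the speech and residual-noise components. This follows the commonly cited formulation; the exact weighting and normalization in the paper may differ, so treat this as an illustrative sketch.

```python
import numpy as np

def neg_sdr(target, estimate, eps=1e-8):
    """Negative cosine similarity; bounded in [-1, 1] and differentiable."""
    return -np.dot(target, estimate) / (
        np.linalg.norm(target) * np.linalg.norm(estimate) + eps)

def wsdr_loss(mixture, clean, estimate):
    """Weighted SDR-style loss: weigh the speech term and the
    residual-noise term by their relative energies."""
    noise = mixture - clean          # true noise component
    noise_est = mixture - estimate   # estimated noise component
    alpha = np.sum(clean ** 2) / (np.sum(clean ** 2) + np.sum(noise ** 2) + 1e-8)
    return alpha * neg_sdr(clean, estimate) + (1 - alpha) * neg_sdr(noise, noise_est)
```

A perfect estimate drives both cosine terms to -1, so the loss is minimized at -1; correlating the loss with an SDR-like measure is the point of the design.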
A new network structure simulating complex-valued operations, called the Deep Complex Convolution Recurrent Network (DCCRN), is proposed, in which both the CNN and RNN structures can handle complex-valued operations.
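The complex-valued convolution that such structures simulate reduces to four real convolutions, following (Wr + iWi) * (Xr + iXi). A minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def complex_conv1d(x_re, x_im, w_re, w_im):
    """'Complex' convolution built from four real convolutions:
    real part  = Wr*Xr - Wi*Xi
    imag part  = Wr*Xi + Wi*Xr
    """
    out_re = (np.convolve(x_re, w_re, mode='same')
              - np.convolve(x_im, w_im, mode='same'))
    out_im = (np.convolve(x_re, w_im, mode='same')
              + np.convolve(x_im, w_re, mode='same'))
    return out_re, out_im
```

In a network like DCCRN this pattern is applied to the real and imaginary parts of the STFT, letting the model learn phase-aware filters instead of operating on magnitudes alone.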
The proposed network, the Redundant Convolutional Encoder-Decoder (R-CED), demonstrates that a convolutional network can be 12 times smaller than a recurrent network and yet achieve better performance, showing its applicability to an embedded system: hearing aids.
Experimental results show that full-band and sub-band information are complementary, that FullSubNet can effectively integrate them, and that its performance exceeds that of the top-ranked methods in the DNS Challenge (INTERSPEECH 2020).
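The sub-band idea can be illustrated as follows: each frequency bin, together with a few neighboring bins, forms one sub-band input unit, while the whole spectrum is the full-band input. The neighbor count and padding mode below are illustrative choices, not FullSubNet's exact configuration.

```python
import numpy as np

def subband_units(spec, n_neighbors=2):
    """Split a (freq, time) magnitude spectrogram into per-frequency
    sub-band units: each unit is a bin plus its n_neighbors on each side.

    Returns an array of shape (freq, 2*n_neighbors + 1, time).
    """
    F, T = spec.shape
    # Reflect-pad along the frequency axis so edge bins get full context
    padded = np.pad(spec, ((n_neighbors, n_neighbors), (0, 0)), mode='reflect')
    return np.stack([padded[f:f + 2 * n_neighbors + 1] for f in range(F)])
```

A sub-band model processes each of these units independently (sharing weights across frequencies), while a full-band model sees the whole spectrum; FullSubNet's contribution is combining the two views.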
A novel MetricGAN approach is proposed that optimizes the generator with respect to one or more evaluation metrics, which cannot be fully optimized by Lp or conventional adversarial losses.
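The core MetricGAN objectives can be sketched as simple regression losses: the discriminator learns to predict the (normalized) evaluation metric score, and the generator is pushed toward the maximum score. This is a schematic of the idea only, not the paper's full training recipe.

```python
import numpy as np

def discriminator_loss(d_score, metric_score):
    """Train D to regress the true (normalized) metric score, e.g. PESQ,
    computed on (enhanced, clean) pairs."""
    return np.mean((d_score - metric_score) ** 2)

def generator_loss(d_score_on_enhanced, target=1.0):
    """Push the generator toward the best attainable metric score
    (1.0 after normalization), as predicted by the learned D."""
    return np.mean((d_score_on_enhanced - target) ** 2)
```

Because the discriminator is a differentiable surrogate for the (non-differentiable) metric, the generator can be trained with ordinary gradients toward a target it could not otherwise optimize directly.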
A novel neural audio codec is presented that can efficiently compress speech, music, and general audio at bitrates normally targeted by speech-tailored codecs, and that can perform joint compression and enhancement at either the encoder or the decoder side with no additional latency.