Audio Super-Resolution (also known as speech bandwidth extension)
These leaderboards are used to track progress in Audio Super-Resolution.
Use these libraries to find Audio Super-Resolution models and implementations.
A new audio processing technique is introduced that increases the sampling rate of signals such as speech or music using deep convolutional neural networks, demonstrating the effectiveness of feed-forward convolutional architectures on an audio generation task.
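As a minimal sketch of this family of feed-forward convolutional upsamplers (not the paper's exact architecture; all layer sizes here are illustrative), the following PyTorch module performs 2x waveform upsampling with a sub-pixel (interleaving) output layer:

```python
# Minimal sketch of a feed-forward convolutional 2x audio upsampler.
# Hypothetical layer sizes; not the paper's exact model.
import torch
import torch.nn as nn

class SubPixelUpsampler1d(nn.Module):
    """Maps a low-rate waveform (B, 1, T) to (B, 1, 2*T)."""
    def __init__(self, channels: int = 64, upscale: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=9, padding=4),
            nn.ReLU(),
        )
        # Produce `upscale` candidate samples per input step, then interleave.
        self.head = nn.Conv1d(channels, upscale, kernel_size=9, padding=4)
        self.upscale = upscale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, t = x.shape
        y = self.head(self.features(x))                         # (B, upscale, T)
        return y.permute(0, 2, 1).reshape(b, 1, t * self.upscale)  # interleave

x = torch.randn(2, 1, 8000)            # e.g. 0.5 s of 16 kHz audio
print(SubPixelUpsampler1d()(x).shape)  # torch.Size([2, 1, 16000])
```

The sub-pixel reshape trades channels for time resolution, so the whole network stays feed-forward and fully convolutional.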
NU-Wave 2 is introduced: a diffusion model for neural audio upsampling that generates 48 kHz audio signals from inputs at various sampling rates with a single model, while requiring fewer parameters than competing models.
NU-Wave is the first diffusion probabilistic model for audio super-resolution. Built on neural vocoder architectures, it generates high-quality audio and performs strongly on signal-to-noise ratio (SNR), log-spectral distance (LSD), and ABX test accuracy.
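Both models sample within the diffusion probabilistic framework. As a rough illustration (not NU-Wave's actual noise schedule, step count, or network), here is a generic DDPM-style ancestral sampling loop conditioned on the low-resolution input; `model` is a hypothetical noise predictor:

```python
# Schematic DDPM-style conditional sampling loop for audio upsampling.
# Schedule and `model(x_t, lowres, t)` signature are assumptions.
import torch

@torch.no_grad()
def sample(model, lowres, steps: int = 50):
    # `lowres` is assumed already resampled to the target length.
    betas = torch.linspace(1e-4, 0.05, steps)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn_like(lowres)                      # start from pure noise
    for t in reversed(range(steps)):
        eps = model(x, lowres, torch.tensor([t]))     # predicted noise
        coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise       # ancestral step
    return x
```

Conditioning every denoising step on the band-limited input is what turns a generic diffusion sampler into an upsampler.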
A data augmentation strategy is proposed that applies multiple low-pass filters during training, improving generalization to unseen filtering conditions at test time, where a mismatched filter can otherwise yield outputs with a lower SNR than the band-limited input itself.
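A minimal sketch of such randomized low-pass augmentation, assuming SciPy; the filter families, orders, and cutoff jitter below are illustrative choices, not the paper's exact recipe:

```python
# Randomized low-pass filtering for training-time augmentation.
# Filter families/orders/cutoffs are illustrative assumptions.
import numpy as np
from scipy.signal import butter, cheby1, sosfiltfilt

def random_lowpass(x: np.ndarray, sr: int, target_sr: int) -> np.ndarray:
    nyq = sr / 2
    cutoff = (target_sr / 2) * np.random.uniform(0.9, 1.0)  # jitter cutoff
    order = int(np.random.choice([2, 4, 6, 8]))             # jitter roll-off
    if np.random.rand() < 0.5:
        sos = butter(order, cutoff / nyq, btype="low", output="sos")
    else:
        sos = cheby1(order, 0.1, cutoff / nyq, btype="low", output="sos")
    return sosfiltfilt(sos, x)  # zero-phase, length-preserving
```

Varying the filter per training example prevents the model from overfitting to one filter's transition band.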
This paper takes an in-depth look at CMGAN, conducting extensive ablation studies on model inputs and architectural design choices, and shows that CMGAN outperforms existing state-of-the-art methods on three major speech enhancement tasks: denoising, dereverberation, and super-resolution.
Temporal Feature-Wise Linear Modulation (TFiLM) is proposed, a novel architectural component inspired by adaptive batch normalization and its extensions that uses a recurrent neural network to alter the activations of a convolutional model.
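A compact PyTorch sketch of the TFiLM idea follows: pool convolutional activations into temporal blocks, run an RNN over the block sequence, and modulate each block with the RNN states. The max-pool/LSTM pairing matches the paper's description; the scale-and-shift head below is one illustrative parametrization:

```python
# Sketch of a TFiLM-style layer; parametrization is illustrative.
import torch
import torch.nn as nn

class TFiLM(nn.Module):
    def __init__(self, channels: int, block_size: int):
        super().__init__()
        self.block_size = block_size
        self.rnn = nn.LSTM(channels, channels, batch_first=True)
        self.to_scale_shift = nn.Linear(channels, 2 * channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, t = x.shape                 # assumes t divisible by block_size
        n = t // self.block_size
        blocks = x.view(b, c, n, self.block_size)
        pooled = blocks.max(dim=-1).values.permute(0, 2, 1)     # (B, n, C)
        h, _ = self.rnn(pooled)                                 # temporal context
        scale, shift = self.to_scale_shift(h).chunk(2, dim=-1)  # (B, n, C) each
        scale = scale.permute(0, 2, 1).unsqueeze(-1)            # (B, C, n, 1)
        shift = shift.permute(0, 2, 1).unsqueeze(-1)
        return (blocks * scale + shift).view(b, c, t)
```

The RNN runs over blocks rather than samples, so long-range dependencies are captured at a fraction of the sequential cost.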
A network architecture for audio super-resolution that combines convolution and self-attention is proposed; it outperforms existing approaches on standard benchmarks and allows for more parallelization, resulting in significantly faster training.
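An illustrative hybrid block in this spirit pairs local convolution with global self-attention; the layer sizes, normalization placement, and ordering below are assumptions, not the paper's exact design:

```python
# Illustrative conv + self-attention block (sizes/ordering assumed).
import torch
import torch.nn as nn

class ConvAttentionBlock(nn.Module):
    def __init__(self, channels: int = 64, heads: int = 4):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=7, padding=3)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(channels)
        self.norm2 = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, T)
        x = x + self.conv(x)                      # local features, residual
        seq = self.norm1(x.transpose(1, 2))       # (B, T, C) for attention
        attn_out, _ = self.attn(seq, seq, seq)    # global context, parallel
        seq = self.norm2(seq + attn_out)
        return seq.transpose(1, 2)                # back to (B, C, T)
```

Unlike a recurrent modulation layer, every attention step here is computed in parallel across time, which is the source of the training speedup the summary mentions.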
A block-online variant of the Temporal Feature-wise Linear Modulation (TFiLM) model is proposed for bandwidth extension; it simplifies the UNet backbone of TFiLM to reduce inference time and employs an efficient transformer at the bottleneck to alleviate the resulting performance degradation.
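A reduced sketch of the simplified-backbone idea, assuming a single down/up stage and a standard transformer layer in place of the paper's efficient-transformer bottleneck (all details here are illustrative):

```python
# Tiny UNet-style bandwidth-extension sketch with a transformer bottleneck.
# A standard TransformerEncoderLayer stands in for an efficient transformer.
import torch
import torch.nn as nn

class TinyUNetBE(nn.Module):
    def __init__(self, ch: int = 32):
        super().__init__()
        self.down = nn.Conv1d(1, ch, kernel_size=8, stride=4, padding=2)
        self.bottleneck = nn.TransformerEncoderLayer(
            d_model=ch, nhead=4, batch_first=True)
        self.up = nn.ConvTranspose1d(ch, 1, kernel_size=8, stride=4, padding=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, 1, T), T divisible by 4
        h = self.down(x)                                     # 4x downsample
        h = self.bottleneck(h.transpose(1, 2)).transpose(1, 2)
        return self.up(h) + x                                # residual output
```

Doing the expensive global modeling only at the downsampled bottleneck is what keeps inference cheap enough for block-online use.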
A method of implicit neural representation is proposed, coined Local Implicit representation for Super resolution of Arbitrary scale (LISA), which locally parameterizes a chunk of audio as a function of continuous time and represents each chunk with the local latent codes of neighboring chunks, so that the function can extrapolate the signal at any time coordinate.
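A schematic of the local implicit idea: an MLP decodes a sample value from a continuous time coordinate plus the latent codes of the neighboring chunks. The names, latent sizes, and two-neighbor conditioning below are hypothetical simplifications:

```python
# Schematic local implicit decoder (names/sizes are hypothetical).
import torch
import torch.nn as nn

class LocalImplicitDecoder(nn.Module):
    def __init__(self, latent_dim: int = 32, hidden: int = 128):
        super().__init__()
        # Input: latents of two neighboring chunks + a relative time coordinate.
        self.mlp = nn.Sequential(
            nn.Linear(2 * latent_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, z_left, z_right, t_rel):
        # z_left, z_right: (B, latent_dim); t_rel: (B, 1) in [0, 1)
        return self.mlp(torch.cat([z_left, z_right, t_rel], dim=-1))

z_l, z_r = torch.randn(4, 32), torch.randn(4, 32)
t = torch.rand(4, 1)
print(LocalImplicitDecoder()(z_l, z_r, t).shape)  # torch.Size([4, 1])
```

Because `t_rel` is continuous, the decoder can be queried at any sampling grid, which is what makes arbitrary-scale super-resolution possible.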