Audio Super-Resolution (also known as speech bandwidth extension)
These leaderboards are used to track progress in Audio Super-Resolution.
Use these libraries to find Audio Super-Resolution models and implementations.
A new audio processing technique is introduced that increases the sampling rate of signals such as speech or music using deep convolutional neural networks, demonstrating the effectiveness of feed-forward convolutional architectures on an audio generation task.
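As a minimal sketch of this family of feed-forward convolutional upsamplers (not the paper's exact architecture; all layer sizes here are illustrative), the following PyTorch module performs 2x waveform upsampling with a sub-pixel (interleaving) output layer:

```python
# Minimal sketch of a feed-forward convolutional 2x audio upsampler.
# Hypothetical layer sizes; not the paper's exact model.
import torch
import torch.nn as nn

class SubPixelUpsampler1d(nn.Module):
    """Maps a low-rate waveform (B, 1, T) to (B, 1, 2*T)."""
    def __init__(self, channels: int = 64, upscale: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=9, padding=4),
            nn.ReLU(),
        )
        # Produce `upscale` candidate samples per input step, then interleave.
        self.head = nn.Conv1d(channels, upscale, kernel_size=9, padding=4)
        self.upscale = upscale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, t = x.shape
        y = self.head(self.features(x))                         # (B, upscale, T)
        return y.permute(0, 2, 1).reshape(b, 1, t * self.upscale)  # interleave

x = torch.randn(2, 1, 8000)            # e.g. 0.5 s of 16 kHz audio
print(SubPixelUpsampler1d()(x).shape)  # torch.Size([2, 1, 16000])
```

The sub-pixel reshape trades channels for time resolution, so the whole network stays feed-forward and fully convolutional.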
NU-Wave 2 is introduced: a diffusion model for neural audio upsampling that generates 48 kHz audio signals from inputs at various sampling rates with a single model, while requiring fewer parameters than competing models.
NU-Wave is the first diffusion probabilistic model for audio super-resolution. Built on neural vocoder architectures, it generates high-quality audio and performs strongly on signal-to-noise ratio (SNR), log-spectral distance (LSD), and ABX test accuracy.
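Both models sample within the diffusion probabilistic framework. As a rough illustration (not NU-Wave's actual noise schedule, step count, or network), here is a generic DDPM-style ancestral sampling loop conditioned on the low-resolution input; `model` is a hypothetical noise predictor:

```python
# Schematic DDPM-style conditional sampling loop for audio upsampling.
# Schedule and `model(x_t, lowres, t)` signature are assumptions.
import torch

@torch.no_grad()
def sample(model, lowres, steps: int = 50):
    # `lowres` is assumed already resampled to the target length.
    betas = torch.linspace(1e-4, 0.05, steps)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn_like(lowres)                      # start from pure noise
    for t in reversed(range(steps)):
        eps = model(x, lowres, torch.tensor([t]))     # predicted noise
        coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise       # ancestral step
    return x
```

Conditioning every denoising step on the band-limited input is what turns a generic diffusion sampler into an upsampler.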
A data augmentation strategy is proposed that applies multiple low-pass filters during training, improving generalization to unseen filtering conditions at test time, where a mismatched filter can otherwise yield outputs with a lower SNR than the band-limited input itself.
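A minimal sketch of such randomized low-pass augmentation, assuming SciPy; the filter families, orders, and cutoff jitter below are illustrative choices, not the paper's exact recipe:

```python
# Randomized low-pass filtering for training-time augmentation.
# Filter families/orders/cutoffs are illustrative assumptions.
import numpy as np
from scipy.signal import butter, cheby1, sosfiltfilt

def random_lowpass(x: np.ndarray, sr: int, target_sr: int) -> np.ndarray:
    nyq = sr / 2
    cutoff = (target_sr / 2) * np.random.uniform(0.9, 1.0)  # jitter cutoff
    order = int(np.random.choice([2, 4, 6, 8]))             # jitter roll-off
    if np.random.rand() < 0.5:
        sos = butter(order, cutoff / nyq, btype="low", output="sos")
    else:
        sos = cheby1(order, 0.1, cutoff / nyq, btype="low", output="sos")
    return sosfiltfilt(sos, x)  # zero-phase, length-preserving
```

Varying the filter per training example prevents the model from overfitting to one filter's transition band.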
This paper takes an in-depth look at CMGAN, conducting extensive ablation studies on model inputs and architectural design choices, and shows that CMGAN outperforms existing state-of-the-art methods on three major speech enhancement tasks: denoising, dereverberation, and super-resolution.
Temporal Feature-Wise Linear Modulation (TFiLM) is proposed, a novel architectural component inspired by adaptive batch normalization and its extensions that uses a recurrent neural network to alter the activations of a convolutional model.
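A compact PyTorch sketch of the TFiLM idea follows: pool convolutional activations into temporal blocks, run an RNN over the block sequence, and modulate each block with the RNN states. The max-pool/LSTM pairing matches the paper's description; the scale-and-shift head below is one illustrative parametrization:

```python
# Sketch of a TFiLM-style layer; parametrization is illustrative.
import torch
import torch.nn as nn

class TFiLM(nn.Module):
    def __init__(self, channels: int, block_size: int):
        super().__init__()
        self.block_size = block_size
        self.rnn = nn.LSTM(channels, channels, batch_first=True)
        self.to_scale_shift = nn.Linear(channels, 2 * channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, t = x.shape                 # assumes t divisible by block_size
        n = t // self.block_size
        blocks = x.view(b, c, n, self.block_size)
        pooled = blocks.max(dim=-1).values.permute(0, 2, 1)     # (B, n, C)
        h, _ = self.rnn(pooled)                                 # temporal context
        scale, shift = self.to_scale_shift(h).chunk(2, dim=-1)  # (B, n, C) each
        scale = scale.permute(0, 2, 1).unsqueeze(-1)            # (B, C, n, 1)
        shift = shift.permute(0, 2, 1).unsqueeze(-1)
        return (blocks * scale + shift).view(b, c, t)
```

The RNN runs over blocks rather than samples, so long-range dependencies are captured at a fraction of the sequential cost.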
A network architecture for audio super-resolution that combines convolution and self-attention is proposed; it outperforms existing approaches on standard benchmarks and allows for more parallelization, resulting in significantly faster training.
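An illustrative hybrid block in this spirit pairs local convolution with global self-attention; the layer sizes, normalization placement, and ordering below are assumptions, not the paper's exact design:

```python
# Illustrative conv + self-attention block (sizes/ordering assumed).
import torch
import torch.nn as nn

class ConvAttentionBlock(nn.Module):
    def __init__(self, channels: int = 64, heads: int = 4):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=7, padding=3)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(channels)
        self.norm2 = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, T)
        x = x + self.conv(x)                      # local features, residual
        seq = self.norm1(x.transpose(1, 2))       # (B, T, C) for attention
        attn_out, _ = self.attn(seq, seq, seq)    # global context, parallel
        seq = self.norm2(seq + attn_out)
        return seq.transpose(1, 2)                # back to (B, C, T)
```

Unlike a recurrent modulation layer, every attention step here is computed in parallel across time, which is the source of the training speedup the summary mentions.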
A block-online variant of the Temporal Feature-wise Linear Modulation (TFiLM) model is proposed for bandwidth extension; it simplifies the UNet backbone of TFiLM to reduce inference time and employs an efficient transformer at the bottleneck to alleviate the resulting performance degradation.
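A reduced sketch of the simplified-backbone idea, assuming a single down/up stage and a standard transformer layer in place of the paper's efficient-transformer bottleneck (all details here are illustrative):

```python
# Tiny UNet-style bandwidth-extension sketch with a transformer bottleneck.
# A standard TransformerEncoderLayer stands in for an efficient transformer.
import torch
import torch.nn as nn

class TinyUNetBE(nn.Module):
    def __init__(self, ch: int = 32):
        super().__init__()
        self.down = nn.Conv1d(1, ch, kernel_size=8, stride=4, padding=2)
        self.bottleneck = nn.TransformerEncoderLayer(
            d_model=ch, nhead=4, batch_first=True)
        self.up = nn.ConvTranspose1d(ch, 1, kernel_size=8, stride=4, padding=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, 1, T), T divisible by 4
        h = self.down(x)                                     # 4x downsample
        h = self.bottleneck(h.transpose(1, 2)).transpose(1, 2)
        return self.up(h) + x                                # residual output
```

Doing the expensive global modeling only at the downsampled bottleneck is what keeps inference cheap enough for block-online use.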
A method of implicit neural representation is proposed, coined Local Implicit representation for Super resolution of Arbitrary scale (LISA), which locally parameterizes a chunk of audio as a function of continuous time and represents each chunk with the local latent codes of neighboring chunks, so that the function can extrapolate the signal at any time coordinate.
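A schematic of the local implicit idea: an MLP decodes a sample value from a continuous time coordinate plus the latent codes of the neighboring chunks. The names, latent sizes, and two-neighbor conditioning below are hypothetical simplifications:

```python
# Schematic local implicit decoder (names/sizes are hypothetical).
import torch
import torch.nn as nn

class LocalImplicitDecoder(nn.Module):
    def __init__(self, latent_dim: int = 32, hidden: int = 128):
        super().__init__()
        # Input: latents of two neighboring chunks + a relative time coordinate.
        self.mlp = nn.Sequential(
            nn.Linear(2 * latent_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, z_left, z_right, t_rel):
        # z_left, z_right: (B, latent_dim); t_rel: (B, 1) in [0, 1)
        return self.mlp(torch.cat([z_left, z_right, t_rel], dim=-1))

z_l, z_r = torch.randn(4, 32), torch.randn(4, 32)
t = torch.rand(4, 1)
print(LocalImplicitDecoder()(z_l, z_r, t).shape)  # torch.Size([4, 1])
```

Because `t_rel` is continuous, the decoder can be queried at any sampling grid, which is what makes arbitrary-scale super-resolution possible.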