3260 papers • 126 benchmarks • 313 datasets
Audio generation (synthesis) is the task of generating raw audio such as speech. (Image credit: MelNet)
These leaderboards are used to track progress in audio generation.
Use these libraries to find audio generation models and implementations.
WaveNet, a deep neural network for generating raw audio waveforms, is introduced; it is shown that it can be efficiently trained on data with tens of thousands of samples per second of audio, and can be employed as a discriminative model, returning promising results for phoneme recognition.
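The key to modeling raw waveforms autoregressively at tens of thousands of samples per second is a stack of dilated causal convolutions, whose receptive field grows exponentially with depth. A minimal sketch (plain numpy, not the paper's implementation; function names are illustrative):

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """1-D causal convolution: each output depends only on current and past
    samples, spaced `dilation` steps apart (zero-padded on the left)."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([sum(w[j] * xp[i + pad - j * dilation] for j in range(k))
                     for i in range(len(x))])

def receptive_field(kernel_size, dilations):
    """Number of past samples visible to a stack of dilated causal convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Doubling dilations (1, 2, 4, ..., 512) give exponentially growing context:
print(receptive_field(2, [2**i for i in range(10)]))  # -> 1024
```

With kernel size 2 and ten doubling layers, a single stack already covers 1024 samples of context, which is why depth (not kernel width) buys the long-range structure.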
Through extensive empirical investigations on the NSynth dataset, it is demonstrated that GANs are able to outperform strong WaveNet baselines on automated and human evaluation metrics, and efficiently generate audio several orders of magnitude faster than their autoregressive counterparts.
SaShiMi, a new multi-scale architecture for waveform modeling built around the recently introduced S4 model for long sequence modeling, is proposed, identifying that S4 can be unstable during autoregressive generation, and providing a simple improvement to its parameterization by drawing connections to Hurwitz matrices.
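The stability connection mentioned above rests on a standard fact: a state matrix is Hurwitz when every eigenvalue has a strictly negative real part, so the associated linear dynamics decay instead of blowing up during generation. A small sketch of that check (illustrative only; the construction below is a generic negative-diagonal-plus-skew-symmetric parameterization, not the paper's exact one):

```python
import numpy as np

def is_hurwitz(A):
    """True when all eigenvalues of A have negative real part, i.e. the
    continuous-time system dx/dt = A x is asymptotically stable."""
    return bool(np.all(np.linalg.eigvals(A).real < 0))

# Forcing a negative diagonal plus a skew-symmetric part guarantees the
# Hurwitz property, since the skew part only shifts eigenvalues imaginarily.
rng = np.random.default_rng(0)
S = rng.standard_normal((4, 4))
A = -np.diag(np.abs(rng.standard_normal(4)) + 0.1) + (S - S.T) / 2
print(is_hurwitz(A))  # -> True
```

Parameterizing the state matrix so it is Hurwitz by construction is what prevents the recurrence from diverging when unrolled sample by sample.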
This work designs a model capable of generating high-fidelity audio samples which capture structure at timescales that time-domain models have yet to achieve, and applies it to a variety of audio generation tasks, showing improvements over previous approaches in both density estimates and human judgments.
This work uses the discretized activations of a masked language model pre-trained on audio to capture long-term structure and the discrete codes produced by a neural audio codec to achieve high-quality synthesis.
It is shown that the model, which combines memory-less modules (autoregressive multilayer perceptrons) with stateful recurrent neural networks in a hierarchical structure, is able to capture underlying sources of variation in temporal sequences over very long time spans, on three datasets of different natures.
This work introduces a new audio processing technique that increases the sampling rate of signals such as speech or music using deep convolutional neural networks, demonstrating the effectiveness of feed-forward convolutional architectures on an audio generation task.
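The classical baseline for raising a signal's sampling rate is simple interpolation; the learned approach above improves on this by predicting the missing high-frequency content rather than smoothing between known samples. A minimal interpolation baseline for comparison (function name is illustrative):

```python
import numpy as np

def upsample_linear(x, r):
    """Naive super-resolution baseline: raise the sampling rate of `x` by an
    integer factor `r` via linear interpolation between known samples."""
    n = len(x)
    t_lo = np.arange(n)                    # original sample times
    t_hi = np.arange((n - 1) * r + 1) / r  # r-times denser time grid
    return np.interp(t_hi, t_lo, x)

x = np.array([0.0, 1.0, 0.0, -1.0])   # low-rate signal
print(upsample_linear(x, 2))           # -> [ 0.   0.5  1.   0.5  0.  -0.5 -1. ]
```

Linear interpolation can only reproduce frequencies present in the low-rate input, which is exactly the gap a trained convolutional upsampler is meant to close.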
This work introduces a high-fidelity universal neural audio compression algorithm that achieves ~90x compression of 44.1 kHz audio into tokens at just 8 kbps bandwidth by combining advances in high-fidelity audio generation with better vector quantization techniques from the image domain, along with improved adversarial and reconstruction losses.
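The ~90x figure follows directly from the bitrates involved, assuming 16-bit mono PCM as the uncompressed reference (the bit depth and channel count are assumptions here, not stated above):

```python
# Rough arithmetic behind the ~90x compression ratio quoted above.
sample_rate = 44_100          # Hz
bit_depth = 16                # bits per sample (assumed: 16-bit mono PCM)
pcm_kbps = sample_rate * bit_depth / 1000   # 705.6 kbps uncompressed
codec_kbps = 8                # compressed token bandwidth
print(round(pcm_kbps / codec_kbps, 1))      # -> 88.2, i.e. roughly 90x
```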
The proposed model generates notes as magnitude spectrograms from any probabilistic latent code sample, with expressive control of orchestral timbres and playing styles, and can be applied to other sound domains, including a user's libraries with custom sound tags that could be mapped to specific generative controls.