(Image credit: R-Transformer)
These leaderboards are used to track progress in Music Modeling.
Use these libraries to find Music Modeling models and implementations.
No subtasks available.
A systematic evaluation of generic convolutional and recurrent architectures for sequence modeling concludes that the common association between sequence modeling and recurrent networks should be reconsidered, and that convolutional networks should be regarded as a natural starting point for sequence modeling tasks.
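For orientation, here is a minimal PyTorch sketch of the causal, dilated 1-D convolution at the heart of such temporal convolutional networks; the layer sizes and stacking are illustrative, not the authors' reference implementation:

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1-D convolution that only looks at past timesteps (left padding)."""
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation  # pad left so output length == input length
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                            # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))      # pad only on the left (causality)
        return self.conv(x)

# Stacking layers with dilations 1, 2, 4, 8 grows the receptive field exponentially
# (here to 31 steps with kernel size 3).
tcn = nn.Sequential(*[nn.Sequential(CausalConv1d(32, kernel_size=3, dilation=2 ** i), nn.ReLU())
                      for i in range(4)])
y = tcn(torch.randn(8, 32, 100))  # -> (8, 32, 100)
```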
This paper presents the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. It observes that the studied hyperparameters are virtually independent, and derives guidelines for their efficient adjustment.
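For reference, the vanilla LSTM cell that the eight variants modify, for example by removing a gate or coupling the input and forget gates; a minimal NumPy sketch under standard notation, not the paper's code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One step of a vanilla LSTM. W, U, b hold the stacked i/f/o/g parameters.
    The studied variants ablate parts of this update, e.g. no input gate (i = 1),
    no forget gate (f = 1), coupled gates (f = 1 - i), or no output activation."""
    n = h.shape[0]
    z = W @ x + U @ h + b            # stacked pre-activations, shape (4*n,)
    i = sigmoid(z[0*n:1*n])          # input gate
    f = sigmoid(z[1*n:2*n])          # forget gate
    o = sigmoid(z[2*n:3*n])          # output gate
    g = np.tanh(z[3*n:4*n])          # candidate cell update
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# Illustrative usage with random parameters (hidden size 4, input size 3):
rng = np.random.default_rng(0)
h, c = np.zeros(4), np.zeros(4)
W, U, b = rng.normal(size=(16, 3)), rng.normal(size=(16, 4)), np.zeros(16)
h, c = lstm_step(rng.normal(size=3), h, c, W, U, b)
```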
Advanced recurrent units that implement a gating mechanism, such as the long short-term memory (LSTM) unit and the more recently proposed gated recurrent unit (GRU), are evaluated; the GRU is found to be comparable to the LSTM.
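A minimal sketch of the standard GRU update for comparison: it keeps a single state vector and two gates, versus the LSTM's separate cell state and three gates (sign conventions for the update gate vary across references):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """Standard GRU step: two gates, no separate cell state."""
    z = sigmoid(Wz @ x + Uz @ h)              # update gate: how much to overwrite
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate: how much past to expose
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
    return (1 - z) * h + z * h_tilde          # interpolate old and candidate state
```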
A Pop Music Transformer is built that composes pop piano music with better rhythmic structure than existing Transformer models by improving the way a musical score is converted into the event sequence fed to the Transformer.
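The improvement lies in the tokenization: the score is serialized with explicit metric tokens (bar and beat position) alongside note tokens, so the model can learn where beats fall. A hedged sketch of such a beat-based encoding follows; the token names and quantization are illustrative, not the exact vocabulary used in the paper:

```python
# Each note: (start_in_beats, midi_pitch, duration_in_beats); positions quantized to 16ths.
notes = [(0.0, 60, 1.0), (1.0, 64, 0.5), (4.0, 67, 2.0)]

def tokenize(notes, beats_per_bar=4, steps_per_beat=4):
    tokens, current_bar = [], -1
    for start, pitch, dur in sorted(notes):
        bar = int(start // beats_per_bar)
        if bar != current_bar:                      # emit a Bar token at each new bar
            tokens.append("Bar")
            current_bar = bar
        pos = int(round((start % beats_per_bar) * steps_per_beat))
        tokens += [f"Position_{pos}/16",            # metrical position inside the bar
                   f"NoteOn_{pitch}",
                   f"Duration_{int(round(dur * steps_per_beat))}"]
    return tokens

print(tokenize(notes))
# ['Bar', 'Position_0/16', 'NoteOn_60', 'Duration_4', 'Position_4/16', 'NoteOn_64',
#  'Duration_2', 'Bar', 'Position_0/16', 'NoteOn_67', 'Duration_8']
```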
By using notes as an intermediate representation, a suite of models capable of transcribing, composing, and synthesizing audio waveforms with coherent musical structure on timescales spanning six orders of magnitude is trained, a process the authors call Wave2Midi2Wave.
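A structural sketch of that factorization with toy stand-ins: the real stages are a transcription network (audio to notes), a language model over notes, and a conditional synthesizer (notes to audio); every function below is a hypothetical placeholder, not the paper's code:

```python
import random

def transcribe(audio):
    """Audio -> note events (onset_seconds, midi_pitch). Toy: one fixed note."""
    return [(0.0, 60)]

def train_and_sample(note_sequences, length=8):
    """Note-level language model. Toy: resample notes seen in the corpus."""
    pool = [n for seq in note_sequences for n in seq]
    return [random.choice(pool) for _ in range(length)]

def synthesize(notes, sr=16000):
    """Notes -> waveform. Toy: one second of silence per note."""
    return [0.0] * (sr * len(notes))

corpus = [[0.0] * 16000, [0.0] * 16000]            # raw audio corpus (toy)
midi_corpus = [transcribe(a) for a in corpus]      # Wave -> Midi
audio = synthesize(train_and_sample(midi_corpus))  # Midi -> Wave
```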
This model is an instance of orderless NADE, which allows more direct ancestral sampling; the authors find that Gibbs sampling greatly improves sample quality, and show that this is due to some conditional distributions being poorly modeled.
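A hedged sketch of the blocked Gibbs idea: repeatedly erase a random block of the piano roll and resample it from the model's conditionals, rather than sampling each variable exactly once in ancestral order. Here model_predict is a hypothetical stand-in for the trained network:

```python
import numpy as np

rng = np.random.default_rng(0)

def model_predict(piano_roll, mask):
    """Hypothetical stand-in for the trained orderless-NADE conditionals:
    returns P(cell = 1 | unmasked cells) for every masked cell."""
    return np.full(piano_roll.shape, 0.5)  # toy: uniform; a real model conditions on context

def gibbs_sample(shape=(16, 46), steps=100, block_frac=0.25):
    roll = rng.random(shape) < 0.5                              # random initialization
    for _ in range(steps):
        mask = rng.random(shape) < block_frac                   # choose a random block to erase
        probs = model_predict(np.where(mask, 0, roll), mask)
        roll = np.where(mask, rng.random(shape) < probs, roll)  # resample masked cells only
    return roll

sample = gibbs_sample()
```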
A new STAckable Recurrent cell (STAR) is proposed for recurrent neural networks (RNNs); it has fewer parameters than the widely used LSTM and GRU while being more robust against vanishing or exploding gradients.
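A minimal sketch of a single-gate recurrent step in the spirit of STAR, assuming the formulation where one gate blends the previous state with an input-driven candidate; the parameter names are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def star_step(x, h, Wx, Wh, Wz, bk, bz):
    """One STAR-style step: a single gate k blends the old state with a
    candidate z computed from the input alone (fewer parameters than LSTM/GRU)."""
    k = sigmoid(Wx @ x + Wh @ h + bk)      # the cell's only gate
    z = np.tanh(Wz @ x + bz)               # input-driven candidate
    return np.tanh((1 - k) * h + k * z)

# Illustrative usage with random parameters (hidden size 4, input size 3):
rng = np.random.default_rng(0)
Wx, Wh, Wz = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), rng.normal(size=(4, 3))
h = np.zeros(4)
for x in rng.normal(size=(10, 3)):
    h = star_step(x, h, Wx, Wh, Wz, np.zeros(4), np.zeros(4))
```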
This work introduces a search space of operations called XD-Operations that mimic the inductive bias of standard multi-channel convolutions while being much more expressive: the space provably includes many named operations across multiple application areas.