Music transcription is the task of converting an acoustic musical signal into some form of music notation. (Image credit: ISMIR 2015 Tutorial - Automatic Music Transcription)
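To make the task concrete, here is a minimal, hedged sketch of monophonic transcription: estimate a frame-level pitch track from audio and map it to note names. The file path `example.wav` is a placeholder, and real AMT systems additionally handle polyphony, onsets/offsets, and symbolic output formats.

```python
# Minimal monophonic sketch: pitch track -> note names per frame.
import librosa
import numpy as np

y, sr = librosa.load("example.wav", sr=None)  # hypothetical input file
f0, voiced_flag, _ = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)
notes = [
    librosa.hz_to_note(f) if v and not np.isnan(f) else "rest"
    for f, v in zip(f0, voiced_flag)
]
print(notes[:20])  # frame-level note labels, e.g. ['A4', 'A4', 'rest', ...]
```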
These leaderboards are used to track progress in music transcription.
Use these libraries to find music transcription models and implementations.
This work relies on complex convolutions, presents algorithms for complex batch normalization and complex weight initialization strategies for complex-valued neural nets, uses them in experiments with end-to-end training schemes, and demonstrates that such complex-valued models are competitive with their real-valued counterparts.
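A minimal sketch of the core building block, a complex-valued convolution composed from two real-valued convolutions via (W_r + iW_i)(x_r + ix_i) = (W_r x_r − W_i x_i) + i(W_r x_i + W_i x_r). Layer and parameter names are illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class ComplexConv1d(nn.Module):
    """Complex convolution realized as two real convolutions on (real, imag) parts."""
    def __init__(self, in_ch, out_ch, kernel_size, **kw):
        super().__init__()
        self.real = nn.Conv1d(in_ch, out_ch, kernel_size, **kw)
        self.imag = nn.Conv1d(in_ch, out_ch, kernel_size, **kw)

    def forward(self, x_real, x_imag):
        y_real = self.real(x_real) - self.imag(x_imag)
        y_imag = self.real(x_imag) + self.imag(x_real)
        return y_real, y_imag

# Example: apply to the real/imaginary parts of an STFT with 513 frequency bins.
conv = ComplexConv1d(513, 128, kernel_size=3, padding=1)
xr, xi = torch.randn(2, 513, 100), torch.randn(2, 513, 100)
yr, yi = conv(xr, xi)
```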
By using notes as an intermediate representation, a suite of models capable of transcribing, composing, and synthesizing audio waveforms with coherent musical structure on timescales spanning six orders of magnitude is trained, a process the authors call Wave2Midi2Wave.
A high-resolution AMT system trained by regressing precise onset and offset times of piano notes and pedal events is proposed, and it is shown that the system is robust to misaligned onset and offset labels compared to previous systems.
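A minimal sketch of the regression idea: instead of a binary onset/no-onset label per frame, frames near an onset receive a continuous target encoding their distance to the exact onset time. The triangular target shape and the hyperparameters here are illustrative assumptions.

```python
import numpy as np

def onset_regression_targets(onset_times, n_frames, hop_seconds, width=0.05):
    """Per-frame target in [0, 1]: 1.0 exactly at an onset, decaying linearly
    to 0 over +/- `width` seconds."""
    frame_times = np.arange(n_frames) * hop_seconds
    targets = np.zeros(n_frames)
    for t in onset_times:
        dist = np.abs(frame_times - t)
        targets = np.maximum(targets, np.clip(1.0 - dist / width, 0.0, 1.0))
    return targets

targets = onset_regression_targets([0.50, 1.23], n_frames=200, hop_seconds=0.01)
```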
A multi-label classification task to predict notes in musical recordings is defined, along with an evaluation protocol, and several machine learning architectures for this task are benchmarked.
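A minimal sketch of framing note prediction as multi-label classification: each audio frame is mapped to independent sigmoid outputs, one per pitch, trained with binary cross-entropy. The tiny MLP and the 128-pitch output range are illustrative assumptions, not the benchmarked architectures.

```python
import torch
import torch.nn as nn

n_bins, n_pitches = 229, 128  # e.g. mel bins in, MIDI pitches out
model = nn.Sequential(nn.Linear(n_bins, 512), nn.ReLU(), nn.Linear(512, n_pitches))
criterion = nn.BCEWithLogitsLoss()

frames = torch.randn(32, n_bins)                        # batch of spectrogram frames
labels = torch.randint(0, 2, (32, n_pitches)).float()   # active pitches per frame
loss = criterion(model(frames), labels)
loss.backward()
```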
This work builds and trains LSTM networks using approximately 23,000 music transcriptions expressed with a high-level vocabulary (ABC notation), and uses them to generate new transcriptions, creating music transcription models useful in particular contexts of music composition.
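A minimal sketch of a character-level LSTM over ABC-notation text, trained to predict the next character so that sampling from it yields new transcriptions. The toy corpus, vocabulary construction, and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

corpus = "X:1\nT:Example\nK:D\n|:DFA d2|..."   # placeholder ABC text
vocab = sorted(set(corpus))
stoi = {c: i for i, c in enumerate(vocab)}

class CharLSTM(nn.Module):
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 64)
        self.lstm = nn.LSTM(64, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, x, state=None):
        h, state = self.lstm(self.embed(x), state)
        return self.head(h), state

ids = torch.tensor([[stoi[c] for c in corpus]])
model = CharLSTM(len(vocab))
logits, _ = model(ids[:, :-1])  # predict the next character at every position
loss = nn.CrossEntropyLoss()(logits.reshape(-1, len(vocab)), ids[:, 1:].reshape(-1))
```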
The experiments show that adding the reconstruction loss generally improves note-level transcription accuracy compared to the same model without the reconstruction part, and boosts frame-level precision above that of state-of-the-art models.
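A minimal sketch of combining a transcription loss with a reconstruction term: the predicted piano roll is decoded back toward the input spectrogram, and both terms are summed. The stand-in linear modules and the loss weighting are illustrative assumptions.

```python
import torch
import torch.nn as nn

transcriber = nn.Linear(229, 88)     # spectrogram frame -> piano-roll frame (stand-in)
reconstructor = nn.Linear(88, 229)   # piano-roll frame -> spectrogram frame (stand-in)

spec = torch.randn(32, 229)
roll = torch.randint(0, 2, (32, 88)).float()

pred_roll_logits = transcriber(spec)
transcription_loss = nn.BCEWithLogitsLoss()(pred_roll_logits, roll)
reconstruction_loss = nn.MSELoss()(reconstructor(torch.sigmoid(pred_roll_logits)), spec)
loss = transcription_loss + 0.1 * reconstruction_loss  # weighting is an assumption
```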
This paper presents a simple and lightweight variant of the Shuffle-Exchange network, which is based on a residual network employing GELU and Layer Normalization and achieves state-of-the-art performance on the MusicNet dataset for music transcription while being efficient in the number of parameters.
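A minimal sketch of a residual block built from the named ingredients, GELU and Layer Normalization. The exact block layout is an illustrative assumption, not the paper's Shuffle-Exchange architecture.

```python
import torch
import torch.nn as nn

class ResidualGELUBlock(nn.Module):
    """Pre-norm residual block: x + FFN(LayerNorm(x)) with a GELU nonlinearity."""
    def __init__(self, dim, hidden=None):
        super().__init__()
        hidden = hidden or 2 * dim
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x):
        return x + self.ff(self.norm(x))

x = torch.randn(8, 100, 256)          # (batch, time, channels)
y = ResidualGELUBlock(256)(x)
```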
It is shown that equivalent performance can be achieved using a generic encoder-decoder Transformer with standard decoding methods, and it is demonstrated that the model can learn to translate spectrogram inputs directly to MIDI-like output events for several transcription tasks.
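A minimal sketch of the "MIDI-like output events" target format: notes are flattened into a token sequence of time shifts, note-ons, and note-offs that a generic encoder-decoder Transformer can emit. The vocabulary layout and the 10 ms time resolution are illustrative assumptions.

```python
def notes_to_events(notes, time_step=0.01):
    """notes: list of (onset_sec, offset_sec, pitch). Returns MIDI-like event tokens."""
    boundaries = []
    for onset, offset, pitch in notes:
        boundaries.append((onset, f"note_on_{pitch}"))
        boundaries.append((offset, f"note_off_{pitch}"))
    events, current = [], 0.0
    for t, name in sorted(boundaries):
        shift = round((t - current) / time_step)
        if shift > 0:
            events.append(f"time_shift_{shift}")
            current += shift * time_step
        events.append(name)
    return events

print(notes_to_events([(0.0, 0.5, 60), (0.25, 0.75, 64)]))
# ['note_on_60', 'time_shift_25', 'note_on_64', 'time_shift_25', 'note_off_60', ...]
```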
This paper introduces a simple method for scoring intervals using scaled inner product operations that resemble how attention scoring is done in transformers, and demonstrates that an encoder-only structured non-hierarchical transformer backbone is capable of transcribing piano notes and pedals with high accuracy and time precision.
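A minimal sketch of scoring intervals with scaled inner products: each frame gets an "onset" and an "offset" embedding, and the score for the interval from frame i to frame j is their dot product divided by sqrt(d), mirroring attention scoring. The projections and dimensions are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

d, n_frames = 64, 100
frame_features = torch.randn(n_frames, 256)
onset_proj = nn.Linear(256, d)
offset_proj = nn.Linear(256, d)

onset = onset_proj(frame_features)         # (n_frames, d)
offset = offset_proj(frame_features)       # (n_frames, d)
scores = onset @ offset.T / math.sqrt(d)   # scores[i, j] rates the interval i..j
```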
This work demonstrates that a general-purpose Transformer model can perform multi-task AMT, jointly transcribing arbitrary combinations of musical instruments across several transcription datasets, and shows this unified training framework achieves high-quality transcription results across a range of datasets.