3260 papers • 126 benchmarks • 313 datasets
Music source separation is the task of decomposing music into its constitutive components, e.g., yielding separated stems for the vocals, bass, and drums. (Image credit: SigSep)
The Wave-U-Net is proposed, an adaptation of the U-Net to the one-dimensional time domain, which repeatedly resamples feature maps to compute and combine features at different time scales; given the same data, its performance is comparable to a state-of-the-art spectrogram-based U-Net architecture.
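The multi-scale wiring described above can be sketched in a few lines. This is a minimal illustration, not the Wave-U-Net itself: the real model uses learned 1-D convolutions at every scale, whereas here plain decimation and linear interpolation stand in, and skip features are summed rather than concatenated. All function names are made up for the sketch.

```python
def downsample(x):
    """Halve the time resolution by decimation (learned convs in the paper)."""
    return x[::2]

def upsample(x):
    """Double the time resolution by linear interpolation."""
    out = []
    for i in range(len(x) - 1):
        out.append(x[i])
        out.append((x[i] + x[i + 1]) / 2)
    out.extend([x[-1], x[-1]])
    return out

def wave_u_net_pass(x, depth=3):
    """One pass through the U-shaped resampling structure."""
    skips = []
    for _ in range(depth):            # contracting path: coarser time scales
        skips.append(x)
        x = downsample(x)
    for skip in reversed(skips):      # expanding path: combine scales
        x = upsample(x)
        x = [a + b for a, b in zip(x, skip)]  # skip connection (sum here)
    return x

signal = [float(i) for i in range(16)]
out = wave_u_net_pass(signal)  # output has the same length as the input
```

The point of the structure is that features computed at coarse time scales are merged back with fine-scale features on the way up, which is what lets the model see long contexts while producing sample-level output.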
A novel network architecture that extends the recently developed densely connected convolutional network (DenseNet), takes advantage of long contextual information, and outperforms state-of-the-art results in the SiSEC 2016 competition by a large margin in terms of signal-to-distortion ratio.
Experimental results show that the performance of Open-Unmix (UMX), a well-known and state-of-the-art open-source library for music separation, can be improved by utilizing a multi-domain loss (MDL) and two combination schemes.
Band-split RNN (BSRNN) is proposed, a frequency-domain model that explicitly splits the spectrogram of the mixture into subbands and performs interleaved band-level and sequence-level modeling, along with a semi-supervised model fine-tuning pipeline that can further improve the model's performance.
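The band-split step itself is simple to picture: the frequency bins of each spectrogram frame are partitioned into subbands, each of which is then modeled separately before the band-level and sequence-level passes are interleaved. The sketch below shows only that splitting; the band edges and function name are assumptions for illustration, and the RNN modeling is omitted.

```python
def split_bands(spec, band_edges):
    """Partition a spectrogram into subbands along the frequency axis.

    spec: list of frames, each a list of frequency-bin values.
    band_edges: bin indices delimiting the subbands, e.g. [0, 2, 4, 8].
    Returns one sub-spectrogram (same number of frames) per band.
    """
    bands = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        bands.append([frame[lo:hi] for frame in spec])
    return bands

# 3 frames of an 8-bin spectrogram, split into 3 unequal subbands.
spec = [[float(b) for b in range(8)] for _ in range(3)]
bands = split_bands(spec, [0, 2, 4, 8])
```

Unequal band edges matter in practice: perceptually informed splits can give fine resolution to low frequencies, where sources like bass live, and coarse resolution higher up.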
This work adopts adversarial training for music source separation with the aim of driving the separator towards outputs deemed as realistic by discriminator networks that are trained to tell apart real from separator samples.
Spleeter is a new tool for music source separation based on TensorFlow that makes it possible to separate audio files into 2, 4, or 5 stems with a single command line, using pre-trained models.
A Wavenet-based model is proposed, and Wave-U-Net is shown to outperform DeepConvSep, a recent spectrogram-based deep learning model; the results confirm that waveform-based models can perform as well as (if not better than) spectrogram-based ones.
This paper proposes a first-order primal-dual algorithm for non-negative decomposition problems (one of the two factors is fixed) with the KL distance and provides an efficient heuristic way to select step-sizes.
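For context on the problem being solved, the sketch below is not the paper's primal-dual algorithm but the classical multiplicative-update rule for the same setting: minimizing the KL divergence D(V || WH) over H with the factor W held fixed. Variable names follow standard NMF notation; everything else is an assumption of the sketch.

```python
def mult_update_H(V, W, H, iters=50):
    """Lee-Seung multiplicative updates for KL-NMF with W fixed.

    V: m x n non-negative data matrix (lists of lists).
    W: m x r fixed non-negative dictionary.
    H: r x n initial non-negative activations (updated in place).
    """
    m, r, n = len(W), len(W[0]), len(V[0])
    for _ in range(iters):
        # Current reconstruction WH.
        WH = [[sum(W[i][k] * H[k][j] for k in range(r)) for j in range(n)]
              for i in range(m)]
        for k in range(r):
            colsum = sum(W[i][k] for i in range(m))
            for j in range(n):
                num = sum(W[i][k] * V[i][j] / WH[i][j] for i in range(m))
                H[k][j] *= num / colsum  # multiplicative step keeps H >= 0
    return H
```

The multiplicative rule is simple but can converge slowly, which is part of the motivation for first-order primal-dual methods with explicit step-size selection.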
A fairly straightforward approach for music source separation is to train independent models, wherein each model is dedicated to estimating only a specific source. Training a single model to estimate multiple sources generally does not perform as well as the independent dedicated models. However, Conditioned U-Net (C-U-Net) uses a control mechanism to train a single model for multi-source separation and attempts to achieve a performance comparable to that of the dedicated models. We propose a multi-channel U-Net (M-U-Net) trained using a weighted multi-task loss as an alternative to the C-U-Net. We investigate two weighting strategies for our multi-task loss: 1) Dynamic Weighted Average (DWA), and 2) Energy Based Weighting (EBW). DWA determines the weights by tracking the rate of change of the loss of each task during training. EBW aims to neutralize the training bias arising from the difference in energy levels of the sources in a mixture. Our methods provide three advantages over C-U-Net: 1) fewer effective training iterations per epoch, 2) fewer trainable network parameters (no control parameters), and 3) faster processing at inference. Our methods achieve performance comparable to that of C-U-Net and the dedicated U-Nets at a much lower training cost.
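The DWA strategy mentioned above can be written down compactly: each task's weight follows the recent ratio of its losses, passed through a temperature-scaled softmax and rescaled so the weights sum to the number of tasks. The sketch below is an illustration of that formula under assumed names (`dwa_weights`, temperature `T`), not the authors' code.

```python
import math

def dwa_weights(prev_losses, prev_prev_losses, T=2.0):
    """Dynamic Weight Average over K tasks.

    prev_losses[i] / prev_prev_losses[i] is task i's recent rate of loss
    change; tasks whose loss is shrinking more slowly get larger weights.
    """
    r = [l1 / l2 for l1, l2 in zip(prev_losses, prev_prev_losses)]
    e = [math.exp(ri / T) for ri in r]
    K = len(r)
    return [K * ei / sum(e) for ei in e]  # weights sum to K
```

When every task's loss changes at the same rate, the weights are all 1, recovering a plain unweighted multi-task loss; a task whose loss has stalled relative to the others is automatically up-weighted.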