Experiments demonstrate that the proposed MU-LLaMA model, trained on the designed MusicQA dataset, achieves outstanding performance in both music question answering and music caption generation across various metrics, outperforming current state-of-the-art (SOTA) models in both fields and offering a promising advance for T2M-Gen research.
This study introduces a method to systematically learn multimodal alignment between audio and lyrics through contrastive learning, paving the way for models to achieve deeper cross-modal coherence and thereby produce high-quality captions.
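The contrastive objective described above is commonly instantiated as a symmetric InfoNCE-style loss that pulls paired audio and lyric embeddings together while pushing mismatched pairs apart. A minimal sketch is shown below; the function names, the toy 2-D embeddings, and the temperature value are illustrative assumptions, not the paper's actual implementation.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce_loss(audio_embs, lyric_embs, temperature=0.1):
    """InfoNCE loss over a batch of paired audio/lyric embeddings.

    For each audio embedding i, the matching lyric embedding i is the
    positive; all other lyrics in the batch act as negatives.
    """
    n = len(audio_embs)
    loss = 0.0
    for i in range(n):
        sims = [cosine(audio_embs[i], lyric_embs[j]) / temperature
                for j in range(n)]
        log_denom = math.log(sum(math.exp(s) for s in sims))
        loss += -(sims[i] - log_denom)  # cross-entropy on the positive
    return loss / n

# Toy example: two orthogonal "audio" embeddings and their lyric pairs.
audio = [[1.0, 0.0], [0.0, 1.0]]
lyrics_matched = [[1.0, 0.0], [0.0, 1.0]]   # correctly aligned
lyrics_shuffled = [[0.0, 1.0], [1.0, 0.0]]  # misaligned pairs
```

Under correct alignment the loss is near zero, while shuffling the pairs drives it up, which is exactly the signal a training loop would minimize.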
A systematic evaluation of the large-scale music captioning dataset, using both quantitative metrics from natural language processing and human evaluation, shows that the proposed approach outperforms the supervised baseline model.
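The quantitative NLP metrics mentioned above (e.g. BLEU-style scores) boil down to n-gram overlap between a generated caption and a reference. A minimal sketch of clipped n-gram precision, the core ingredient of BLEU, is given below; the function name and the example captions are illustrative assumptions.

```python
from collections import Counter

def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision of a candidate caption against one reference.

    Counts candidate n-grams, clips each count by its count in the
    reference, and divides by the total number of candidate n-grams.
    """
    cand = candidate.split()
    ref = reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    overlap = sum(min(count, ref_ngrams[g]) for g, count in cand_ngrams.items())
    total = sum(cand_ngrams.values())
    return overlap / total if total else 0.0

# Hypothetical generated and reference captions for a music clip.
generated = "a calm piano melody"
reference = "a calm piano piece"
```

Here three of the four candidate unigrams ("a", "calm", "piano") appear in the reference, giving a unigram precision of 0.75; full BLEU additionally combines several n-gram orders and a brevity penalty.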
This work introduces the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality audio-caption pairs designed for the evaluation of music-and-language models, and benchmarks popular models on three key music-and-language tasks.
MusiLingo is a novel system for music caption generation and music-related query responses that bridges the gap between music audio and textual contexts; it also creates the MusicInstruct dataset, built from captions in the MusicCaps dataset and tailored for open-ended music inquiries.