3260 papers • 126 benchmarks • 313 datasets
The goal of acoustic scene classification is to classify a test recording into one of a set of predefined classes characterizing the environment in which it was recorded. Source: DCASE 2019, DCASE 2018
The receptive field (RF) of CNNs is analysed and the importance of the RF to the generalization capability of the models is demonstrated, showing that very small or very large RFs can cause performance degradation, but deep models can be made to generalize well by carefully choosing an appropriate RF size within a certain range.
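The receptive-field arithmetic behind this analysis can be sketched directly: for a stack of convolution/pooling layers, the RF grows by (kernel − 1) times the cumulative stride at each layer. The layer stack below is an illustrative VGG-like example, not a specific model from the paper.

```python
# Compute the receptive field of a stack of conv/pool layers.
# Each layer is (kernel_size, stride); dilation is assumed to be 1.

def receptive_field(layers):
    """Returns the RF in input samples for the given layer stack."""
    rf, jump = 1, 1  # jump = cumulative stride so far
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# Two 3x3 convs (stride 1), a stride-2 pooling, then another 3x3 conv:
print(receptive_field([(3, 1), (3, 1), (2, 2), (3, 1)]))  # → 10
```

Sweeping kernel sizes or strides through this function reproduces the kind of RF-size range the paper tunes over.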
Paper presented at the 15th Sound and Music Computing Conference (SMC2018) "Sonic crossing", held in Limassol, Cyprus, 4–7 July 2018.
The acoustic scene classification task of DCASE 2018 Challenge and the TUT Urban Acoustic Scenes 2018 dataset provided for the task are introduced, and the performance of a baseline system in the task is evaluated.
Results indicate that transfer learning is a powerful strategy in such scenarios, but prototypical networks show promising results when no external or validation data are available.
This paper performs a systematic investigation of different RF configurations for various CNN architectures on the DCASE 2019 Task 1.A dataset, introduces Frequency-Aware CNNs to compensate for the lack of frequency information caused by the restricted RF, and investigates whether, and in which RF ranges, they yield additional improvements.
This work proposes a novel method to optimize and regularize transformers on audio spectrograms that achieves a new state-of-the-art performance on Audioset and can be trained on a single consumer-grade GPU.
The proposed framework (SELD-TCN) outperforms the state-of-the-art SELDnet performance on four different datasets and achieves 4x faster training time per epoch and 40x faster inference time on an ordinary graphics processing unit (GPU).
The Qwen-Audio model is trained with a multi-task framework and achieves impressive performance across diverse benchmark tasks without requiring any task-specific fine-tuning, surpassing its counterparts.
A deep all-convolutional neural network with masked global pooling performs single-label classification for acoustic scene classification and multi-label classification for domestic audio tagging in the DCASE-2016 contest, improving on the baselines by relative margins of 17% and 19%, respectively.
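The masked-pooling idea above can be sketched in a few lines: clip-level predictions are averaged only over time frames marked valid, so padded frames do not dilute the result. This is a pure-Python stand-in for the convolutional feature maps used in the actual network, with hypothetical toy values.

```python
# Masked global average pooling over per-frame feature vectors.
# mask[i] = 1 marks a valid frame, 0 marks padding to be ignored.

def masked_global_avg_pool(frames, mask):
    """frames: list of per-frame feature vectors; returns pooled vector."""
    valid = [f for f, m in zip(frames, mask) if m]
    return [sum(xs) / len(valid) for xs in zip(*valid)]

frames = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]  # last frame is padding
print(masked_global_avg_pool(frames, [1, 1, 0]))  # → [2.0, 3.0]
```

Without the mask, the all-zero padded frame would pull both pooled values down, which is exactly the artifact masked pooling avoids on variable-length clips.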
The first method of unsupervised adversarial domain adaptation for acoustic scene classification is presented: a model pre-trained on data from one set of recording conditions is adapted, using data from another set of conditions, so that its output cannot be used to determine which set of conditions the input data belong to.