3260 papers • 126 benchmarks • 313 datasets
Environmental sound classification is the task of classifying environmental sounds, most often those found in urban settings. It is closely related to applications such as noise monitoring.
These leaderboards are used to track progress in Environmental Sound Classification.
Use these libraries to find Environmental Sound Classification models and implementations.
It is shown that the improved performance stems from the combination of a deep, high-capacity model and an augmented training set: this combination outperforms both the proposed CNN without augmentation and a “shallow” dictionary learning model with augmentation.
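The augmentation side of this result can be illustrated with a minimal sketch: deform each training clip (here, a random time shift plus additive noise at a target SNR) to multiply the effective training-set size. This is a numpy-only illustration of waveform deformations in general, not the paper's exact augmentation pipeline; all function names are hypothetical.

```python
import numpy as np

def time_shift(wave, max_frac=0.1, rng=None):
    """Circularly shift the waveform by up to max_frac of its length."""
    rng = rng or np.random.default_rng(0)
    limit = int(len(wave) * max_frac)
    return np.roll(wave, rng.integers(-limit, limit + 1))

def add_noise(wave, snr_db=20.0, rng=None):
    """Mix in white noise at a target signal-to-noise ratio (dB)."""
    rng = rng or np.random.default_rng(0)
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise

def augment(wave, n_copies=4, rng=None):
    """Produce n_copies deformed variants of a single clip."""
    rng = rng or np.random.default_rng(42)
    return [add_noise(time_shift(wave, rng=rng), rng=rng) for _ in range(n_copies)]

clip = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of a 440 Hz tone
variants = augment(clip)
```

Each variant keeps the original label, so a high-capacity model sees more diverse examples per class.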
An extension of the CLIP model that handles audio in addition to text and images, achieving new state-of-the-art results on the Environmental Sound Classification (ESC) task and outperforming prior approaches with accuracies of 97.15% on ESC-50 and 90.07% on UrbanSound8K.
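The CLIP-style inference step behind such models can be sketched as follows: an audio clip and each candidate class prompt are mapped into a shared embedding space, and the predicted label is the one whose text embedding is most cosine-similar to the audio embedding. The embeddings below are random placeholders standing in for real encoder outputs; this is not the model's actual API.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

rng = np.random.default_rng(0)
# Hypothetical joint-embedding outputs: one audio clip, three class prompts.
audio_emb = l2_normalize(rng.normal(size=512))
text_embs = l2_normalize(rng.normal(size=(3, 512)))
labels = ["dog bark", "siren", "jackhammer"]

# CLIP-style zero-shot prediction: highest cosine similarity wins.
scores = text_embs @ audio_emb
pred = labels[int(np.argmax(scores))]
```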
An end-to-end approach for environmental sound classification based on a 1D Convolutional Neural Network that learns a representation directly from the audio signal, outperforming most state-of-the-art approaches that use handcrafted features or 2D representations as input.
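The core idea of such end-to-end models is that the first layer convolves learnable filters directly against the raw waveform instead of a precomputed spectrogram. A minimal numpy sketch of one strided 1D convolution layer with ReLU (random filters standing in for learned ones; not the paper's architecture):

```python
import numpy as np

def conv1d(wave, kernels, stride=64):
    """Valid-mode strided 1D convolution with ReLU.

    Returns a (n_kernels, n_frames) feature map, one row per filter.
    """
    k = kernels.shape[1]
    starts = range(0, len(wave) - k + 1, stride)
    windows = np.stack([wave[s:s + k] for s in starts])  # (n_frames, k)
    return np.maximum(windows @ kernels.T, 0.0).T        # (n_kernels, n_frames)

rng = np.random.default_rng(0)
wave = rng.normal(size=16000)        # 1 s of audio at 16 kHz
kernels = rng.normal(size=(8, 256))  # 8 filters, each 16 ms long
features = conv1d(wave, kernels)
```

In a full model, further convolution and pooling layers would sit on top of this feature map, and the filters would be learned by backpropagation rather than fixed.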
It is shown that standard ImageNet-pretrained deep CNN models can be used as strong baseline networks for audio classification, and qualitative results of what the CNNs learn from spectrograms are presented by visualizing gradients.
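Reusing an ImageNet-pretrained CNN requires packaging the spectrogram as an image-like input: scale it to a fixed range and tile it across three channels to match the RGB layout the network expects. A hedged numpy sketch of that preprocessing step (the normalization scheme here is one common choice, not necessarily the paper's):

```python
import numpy as np

def spectrogram_to_image(spec):
    """Min-max scale a log-magnitude spectrogram to [0, 1] and tile it to
    3 channels, matching the (C, H, W) RGB input of ImageNet CNNs."""
    lo, hi = spec.min(), spec.max()
    norm = (spec - lo) / (hi - lo + 1e-8)
    return np.repeat(norm[np.newaxis, :, :], 3, axis=0)  # (3, H, W)

rng = np.random.default_rng(0)
log_spec = rng.normal(size=(128, 431))  # e.g. 128 mel bands x 431 frames
image = spectrogram_to_image(log_spec)
```

The resulting tensor can then be fed to any pretrained image classifier whose final layer has been replaced with one sized for the sound classes.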
MCLNN has achieved competitive results without augmentation, using 12% of the trainable parameters utilized by an equivalent model based on state-of-the-art Convolutional Neural Networks on UrbanSound8K.
This paper describes the CRNNs the authors used to participate in Task 5 of the DCASE 2020 challenge, which focuses on hierarchical multilabel urban sound tagging with spatiotemporal context.
The design philosophy and core architecture of PaddleSpeech is described to support several essential speech-to-text and text-to-speech tasks to achieve competitive or state-of-the-art performance on various speech datasets.
This work describes a novel, real-time, sound-based activity recognition system that starts by taking an existing, state-of-the-art sound labeling model, which is then tuned to classes of interest by drawing data from professional sound effect libraries traditionally used in the entertainment industry.
Preliminary work is presented that shows the feasibility of training the first layers of a deep convolutional neural network model to learn the commonly-used log-scaled mel-spectrogram transformation, and how this affects performance on the ESC-50 environmental sound classification dataset.
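The transformation those first layers are trained to approximate can be written out explicitly: frame the waveform, take the STFT power spectrum, project onto triangular mel filters, and apply log compression. A numpy-only sketch of the commonly used log-scaled mel-spectrogram (parameter values are illustrative defaults, not the paper's configuration):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping FFT bins to mel bands: (n_mels, n_fft//2+1)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    return fb

def log_mel_spectrogram(wave, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Framed STFT power -> mel filterbank -> log compression."""
    window = np.hanning(n_fft)
    frames = np.stack([wave[s:s + n_fft] * window
                       for s in range(0, len(wave) - n_fft + 1, hop)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2     # (T, n_fft//2+1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T    # (T, n_mels)
    return np.log(mel + 1e-10).T                         # (n_mels, T)

wave = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
spec = log_mel_spectrogram(wave)
```

Since every step here is differentiable apart from the fixed filter shapes, convolutional layers can plausibly learn an equivalent front-end directly from data, which is the feasibility question the paper studies.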