3260 papers • 126 benchmarks • 313 datasets
Environmental sound classification is the task of classifying environmental sounds, most often those found in urban settings. It is closely related to applications such as noise monitoring.
These leaderboards are used to track progress in Environmental Sound Classification.
Use these libraries to find Environmental Sound Classification models and implementations.
It is shown that the improved performance stems from the combination of a deep, high-capacity model and an augmented training set: this combination outperforms both the proposed CNN without augmentation and a “shallow” dictionary learning model with augmentation.
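The augmentation side of this result can be illustrated with a minimal sketch: deform each training clip (here, a random time shift plus additive noise at a target SNR) to multiply the effective training-set size. This is a numpy-only illustration of waveform deformations in general, not the paper's exact augmentation pipeline; all function names are hypothetical.

```python
import numpy as np

def time_shift(wave, max_frac=0.1, rng=None):
    """Circularly shift the waveform by up to max_frac of its length."""
    rng = rng or np.random.default_rng(0)
    limit = int(len(wave) * max_frac)
    return np.roll(wave, rng.integers(-limit, limit + 1))

def add_noise(wave, snr_db=20.0, rng=None):
    """Mix in white noise at a target signal-to-noise ratio (dB)."""
    rng = rng or np.random.default_rng(0)
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise

def augment(wave, n_copies=4, rng=None):
    """Produce n_copies deformed variants of a single clip."""
    rng = rng or np.random.default_rng(42)
    return [add_noise(time_shift(wave, rng=rng), rng=rng) for _ in range(n_copies)]

clip = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of a 440 Hz tone
variants = augment(clip)
```

Each variant keeps the original label, so a high-capacity model sees more diverse examples per class.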
An extension of the CLIP model that handles audio in addition to text and images, achieving new state-of-the-art results on the Environmental Sound Classification (ESC) task and outperforming prior approaches with accuracies of 97.15% on ESC-50 and 90.07% on UrbanSound8K.
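The CLIP-style inference step behind such models can be sketched as follows: an audio clip and each candidate class prompt are mapped into a shared embedding space, and the predicted label is the one whose text embedding is most cosine-similar to the audio embedding. The embeddings below are random placeholders standing in for real encoder outputs; this is not the model's actual API.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

rng = np.random.default_rng(0)
# Hypothetical joint-embedding outputs: one audio clip, three class prompts.
audio_emb = l2_normalize(rng.normal(size=512))
text_embs = l2_normalize(rng.normal(size=(3, 512)))
labels = ["dog bark", "siren", "jackhammer"]

# CLIP-style zero-shot prediction: highest cosine similarity wins.
scores = text_embs @ audio_emb
pred = labels[int(np.argmax(scores))]
```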
An end-to-end approach for environmental sound classification based on a 1D Convolutional Neural Network that learns a representation directly from the audio signal, outperforming most state-of-the-art approaches that use handcrafted features or 2D representations as input.
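The core idea of such end-to-end models is that the first layer convolves learnable filters directly against the raw waveform instead of a precomputed spectrogram. A minimal numpy sketch of one strided 1D convolution layer with ReLU (random filters standing in for learned ones; not the paper's architecture):

```python
import numpy as np

def conv1d(wave, kernels, stride=64):
    """Valid-mode strided 1D convolution with ReLU.

    Returns a (n_kernels, n_frames) feature map, one row per filter.
    """
    k = kernels.shape[1]
    starts = range(0, len(wave) - k + 1, stride)
    windows = np.stack([wave[s:s + k] for s in starts])  # (n_frames, k)
    return np.maximum(windows @ kernels.T, 0.0).T        # (n_kernels, n_frames)

rng = np.random.default_rng(0)
wave = rng.normal(size=16000)        # 1 s of audio at 16 kHz
kernels = rng.normal(size=(8, 256))  # 8 filters, each 16 ms long
features = conv1d(wave, kernels)
```

In a full model, further convolution and pooling layers would sit on top of this feature map, and the filters would be learned by backpropagation rather than fixed.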
It is shown that standard ImageNet-pretrained deep CNN models can be used as strong baseline networks for audio classification, and qualitative results of what the CNNs learn from spectrograms are presented by visualizing gradients.
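Reusing an ImageNet-pretrained CNN requires packaging the spectrogram as an image-like input: scale it to a fixed range and tile it across three channels to match the RGB layout the network expects. A hedged numpy sketch of that preprocessing step (the normalization scheme here is one common choice, not necessarily the paper's):

```python
import numpy as np

def spectrogram_to_image(spec):
    """Min-max scale a log-magnitude spectrogram to [0, 1] and tile it to
    3 channels, matching the (C, H, W) RGB input of ImageNet CNNs."""
    lo, hi = spec.min(), spec.max()
    norm = (spec - lo) / (hi - lo + 1e-8)
    return np.repeat(norm[np.newaxis, :, :], 3, axis=0)  # (3, H, W)

rng = np.random.default_rng(0)
log_spec = rng.normal(size=(128, 431))  # e.g. 128 mel bands x 431 frames
image = spectrogram_to_image(log_spec)
```

The resulting tensor can then be fed to any pretrained image classifier whose final layer has been replaced with one sized for the sound classes.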
MCLNN has achieved competitive results without augmentation, using 12% of the trainable parameters utilized by an equivalent model based on state-of-the-art Convolutional Neural Networks on UrbanSound8K.
This paper describes the CRNNs the authors used to participate in Task 5 of the DCASE 2020 challenge, which focuses on hierarchical multilabel urban sound tagging with spatiotemporal context.
The design philosophy and core architecture of PaddleSpeech is described to support several essential speech-to-text and text-to-speech tasks to achieve competitive or state-of-the-art performance on various speech datasets.
This work describes a novel, real-time, sound-based activity recognition system that starts by taking an existing, state-of-the-art sound labeling model, which is then tuned to classes of interest by drawing data from professional sound effect libraries traditionally used in the entertainment industry.
Preliminary work is presented that shows the feasibility of training the first layers of a deep convolutional neural network model to learn the commonly-used log-scaled mel-spectrogram transformation, and how this affects performance on the ESC-50 environmental sound classification dataset.
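The transformation those first layers are trained to approximate can be written out explicitly: frame the waveform, take the STFT power spectrum, project onto triangular mel filters, and apply log compression. A numpy-only sketch of the commonly used log-scaled mel-spectrogram (parameter values are illustrative defaults, not the paper's configuration):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping FFT bins to mel bands: (n_mels, n_fft//2+1)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    return fb

def log_mel_spectrogram(wave, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Framed STFT power -> mel filterbank -> log compression."""
    window = np.hanning(n_fft)
    frames = np.stack([wave[s:s + n_fft] * window
                       for s in range(0, len(wave) - n_fft + 1, hop)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2     # (T, n_fft//2+1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T    # (T, n_mels)
    return np.log(mel + 1e-10).T                         # (n_mels, T)

wave = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
spec = log_mel_spectrogram(wave)
```

Since every step here is differentiable apart from the fixed filter shapes, convolutional layers can plausibly learn an equivalent front-end directly from data, which is the feasibility question the paper studies.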