3260 papers • 126 benchmarks • 313 datasets
Sound Event Detection (SED) is the task of recognizing sound events and their respective temporal start and end times in a recording. Sound events in real life do not always occur in isolation; they often overlap considerably with each other. Recognizing such overlapping sound events is referred to as polyphonic SED. Source: A report on sound event detection with different binaural features
(Image credit: Papersgraph)
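To make the task definition concrete, here is a minimal sketch of how frame-wise SED output is typically turned into event lists with start and end times; the decision threshold and hop size are illustrative assumptions, not values from any particular system.

```python
# Minimal sketch: decode frame-wise multi-label probabilities into
# (event, onset, offset) tuples. Threshold and hop size are assumed values.
import numpy as np

def decode_events(probs, class_names, threshold=0.5, hop_s=0.02):
    """probs: (num_frames, num_classes) array of per-frame class probabilities."""
    events = []
    active = probs >= threshold                      # (frames, classes) bool mask
    for c, name in enumerate(class_names):
        col = active[:, c]
        # Find rising/falling edges of the binary activity track.
        edges = np.diff(col.astype(int), prepend=0, append=0)
        onsets = np.where(edges == 1)[0]
        offsets = np.where(edges == -1)[0]
        for on, off in zip(onsets, offsets):
            events.append((name, on * hop_s, off * hop_s))
    return events

# Overlapping ("polyphonic") events are natural here: several classes
# can be active in the same frame.
probs = np.zeros((10, 2)); probs[2:7, 0] = 0.9; probs[4:9, 1] = 0.8
print(decode_events(probs, ["speech", "dog_bark"]))
```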
These leaderboards are used to track progress in Sound Event Detection
Use these libraries to find Sound Event Detection models and implementations
No subtasks available.
This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
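As a rough illustration of the first-order adversary in question, here is a minimal projected gradient descent (PGD) sketch in PyTorch; the perturbation budget, step size, and step count are illustrative assumptions.

```python
# Minimal PGD sketch: maximize the loss within an L-infinity ball via
# projected gradient ascent. Hyperparameters (eps, alpha, steps) are
# illustrative assumptions, not canonical values.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)  # random start
    x_adv = x_adv.clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend along the gradient sign, then project back into the eps-ball.
        x_adv = (x_adv + alpha * grad.sign()).detach()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv
```

Robust (adversarial) training then amounts to replacing clean batches with the output of such an attack during optimization.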
The parameterization of hypercomplex convolutional layers is defined, and the family of parameterized hypercomplex neural networks (PHNNs), lightweight and efficient large-scale models, is introduced.
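A minimal sketch of the parameterized hypercomplex multiplication (PHM) idea underlying PHNNs, shown here for a linear layer for simplicity: the weight matrix is a learned sum of Kronecker products, which cuts parameters roughly by a factor of n. Shapes and initialization are illustrative assumptions.

```python
# Sketch of a PHM layer: W = sum_i kron(A_i, B_i), reducing parameters
# roughly by a factor of n relative to a dense layer.
import torch
import torch.nn as nn

class PHMLinear(nn.Module):
    def __init__(self, n, in_features, out_features):
        super().__init__()
        assert in_features % n == 0 and out_features % n == 0
        # n small "algebra rule" matrices, learned from data rather than
        # fixed in advance as in quaternion networks.
        self.A = nn.Parameter(torch.randn(n, n, n))
        self.B = nn.Parameter(torch.randn(n, out_features // n, in_features // n))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Build W of shape (out_features, in_features) from Kronecker products.
        W = sum(torch.kron(self.A[i], self.B[i]) for i in range(self.A.shape[0]))
        return x @ W.T + self.bias

layer = PHMLinear(n=4, in_features=64, out_features=128)
print(layer(torch.randn(2, 64)).shape)  # torch.Size([2, 128])
```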
This work introduces WavCaps, the first large-scale weakly-labelled audio captioning dataset, and proposes a three-stage processing pipeline for filtering noisy data and generating high-quality captions, where ChatGPT, a large language model, is leveraged to filter and transform raw descriptions automatically.
This paper presents an approach to polyphonic sound event detection in real life recordings based on bi-directional long short term memory (BLSTM) recurrent neural networks (RNNs).
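A minimal sketch of such a BLSTM tagger follows; the layer sizes and feature dimensions are assumptions, not the paper's exact configuration. The key point is one independent sigmoid per class and frame, so overlapping events can be active simultaneously.

```python
# Sketch of a BLSTM frame-level tagger for polyphonic SED.
# Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class BLSTMSed(nn.Module):
    def __init__(self, num_features=40, hidden=128, num_classes=10):
        super().__init__()
        self.blstm = nn.LSTM(num_features, hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                    # x: (batch, frames, num_features)
        h, _ = self.blstm(x)
        # Independent sigmoids allow several classes per frame (polyphony).
        return torch.sigmoid(self.head(h))   # (batch, frames, num_classes)

model = BLSTMSed()
probs = model(torch.randn(4, 500, 40))  # e.g. 500 frames of log-mel features
# Trained with frame-wise binary cross-entropy against multi-label targets.
```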
This paper treats SED as a multiple instance learning (MIL) problem, where training labels are static over a short excerpt, indicating the presence or absence of sound sources but not their temporal locality, and develops a family of adaptive pooling operators, referred to as autopool, which smoothly interpolate between common pooling operators and automatically adapt to the characteristics of the sound sources in question.
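A sketch of the autopool operator as described: a learnable scalar per class interpolates between mean pooling (alpha = 0) and max pooling (alpha approaching infinity) when aggregating frame-level predictions into a clip-level MIL prediction.

```python
# Sketch of auto-pooling: a learnable per-class alpha interpolates between
# mean pooling (alpha = 0) and max pooling (alpha -> infinity).
import torch
import torch.nn as nn

class AutoPool(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(num_classes))  # start at mean pooling

    def forward(self, p):                # p: (batch, frames, classes) in [0, 1]
        w = torch.softmax(self.alpha * p, dim=1)   # frame weights per class
        return (p * w).sum(dim=1)                  # clip-level (batch, classes)
```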
Experiments suggest that training with large amounts of noisy data can outperform training with smaller amounts of carefully labeled data, and it is shown that noise-robust loss functions can be effective in improving performance in the presence of corrupted labels.
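As one concrete example of a noise-robust loss (an illustration of the idea, not necessarily the exact loss used in the paper), the generalized cross-entropy (L_q) loss interpolates between cross-entropy (q near 0) and mean absolute error (q = 1):

```python
# Sketch of the generalized cross-entropy (L_q) loss: limits the gradient
# contribution from examples the model finds implausible, which are often
# the mislabeled ones. q = 0.7 is a commonly cited default, assumed here.
import torch
import torch.nn.functional as F

def lq_loss(logits, targets, q=0.7):
    """logits: (batch, classes); targets: (batch,) integer labels."""
    p = F.softmax(logits, dim=1)
    p_true = p.gather(1, targets.unsqueeze(1)).squeeze(1)  # prob of labeled class
    return ((1.0 - p_true.clamp_min(1e-7) ** q) / q).mean()
```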
The proposed framework (SELD-TCN) outperforms the state-of-the-art SELDnet performance on four different datasets and achieves 4x faster training time per epoch and 40x faster inference time on an ordinary graphics processing unit (GPU).
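A sketch of the kind of dilated-convolution block a TCN substitutes for recurrent layers; channel counts and kernel size are assumptions. Because the stack is fully convolutional, training and inference parallelize across time, which is where the reported speedups come from.

```python
# Sketch of a residual TCN block: stacking blocks with exponentially
# growing dilation gives a large temporal receptive field without recurrence.
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        pad = (kernel_size - 1) // 2 * dilation      # "same" padding
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=pad, dilation=dilation)
        self.norm = nn.BatchNorm1d(channels)

    def forward(self, x):                 # x: (batch, channels, frames)
        return x + torch.relu(self.norm(self.conv(x)))  # residual connection

tcn = nn.Sequential(*[TCNBlock(128, dilation=2 ** i) for i in range(4)])
print(tcn(torch.randn(2, 128, 600)).shape)  # torch.Size([2, 128, 600])
```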
In experimental evaluations with the DCASE 2020 Task 3 dataset, the ACCDOA representation outperformed the two-branch representation in SELD metrics with a smaller network size and performed better than state-of-the-art SELD systems in terms of localization and location-dependent detection.
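A sketch of the ACCDOA encoding: each class gets a single 3-D Cartesian vector whose direction is the DOA and whose length is the event activity, so detection and localization share one output instead of two branches. The decode threshold of 0.5 is an assumed value.

```python
# Sketch of activity-coupled Cartesian DOA (ACCDOA) encoding/decoding:
# activity is folded into the magnitude of the per-class DOA vector.
import numpy as np

def accdoa_encode(activity, doa_unit):
    """activity: (frames, classes) in [0, 1]; doa_unit: (frames, classes, 3) unit vectors."""
    return activity[..., None] * doa_unit          # (frames, classes, 3)

def accdoa_decode(vec, threshold=0.5):
    norm = np.linalg.norm(vec, axis=-1)            # activity = vector length
    active = norm > threshold                      # location-dependent detection
    doa = vec / np.maximum(norm[..., None], 1e-9)  # direction = normalized vector
    return active, doa
```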
An effective Couple Learning method is proposed that combines a well-trained model with a Mean Teacher model, increasing the amount of strongly and weakly labeled data and reducing the impact of noise in the pseudo-labels introduced by detection errors.
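A sketch of the Mean Teacher half of the scheme: the teacher is an exponential moving average (EMA) of the student and supplies pseudo-labels on unlabeled clips; the decay value is an assumption.

```python
# Sketch of the Mean Teacher EMA update: the teacher's weights track a
# slow moving average of the student's. decay = 0.999 is an assumed value.
import torch

@torch.no_grad()
def update_teacher(teacher, student, decay=0.999):
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)
# The student is additionally trained to be consistent with the teacher's
# predictions (pseudo-labels) on unlabeled or weakly labeled clips.
```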
A random consistency training (RCT) strategy is proposed and fused with the teacher-student model to stabilize training, and a hard mixup data augmentation is proposed to account for the additive property of sounds.
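A sketch of hard mixup exploiting the additive property of sounds: two waveforms are summed outright (rather than convexly interpolated) and the target becomes the union of both label sets, so the mix stays a physically plausible polyphonic recording.

```python
# Sketch of "hard" mixup for audio: sounds superpose additively, so the
# mixed clip simply contains both label sets.
import torch

def hard_mixup(wav_a, wav_b, labels_a, labels_b):
    """wav_*: (batch, samples); labels_*: (batch, classes) multi-hot targets."""
    mixed_wav = wav_a + wav_b                                 # additive mixing
    mixed_labels = torch.clamp(labels_a + labels_b, max=1.0)  # label union
    return mixed_wav, mixed_labels
```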