Detect fake speech synthesized using machine learning.
The Res2Net model consistently outperforms ResNet34 and ResNet50 by a large margin on both the physical access (PA) and logical access (LA) partitions of the ASVspoof 2019 corpus, and the constant-Q transform (CQT) front end achieves the most promising performance in both scenarios.
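What distinguishes the CQT front end from an ordinary STFT is its geometric frequency spacing: every bin shares the same quality factor Q, giving finer resolution in the low frequencies where many vocoder artifacts live. The sketch below (pure Python, illustrative only; the default `f_min` and bin counts are assumptions, not values from the paper) computes the CQT bin center frequencies and the shared Q factor:

```python
import math

def cqt_center_frequencies(f_min=32.70, bins_per_octave=12, n_bins=84):
    """Geometrically spaced center frequencies of a constant-Q transform.

    The k-th bin sits at f_min * 2**(k / bins_per_octave), so the ratio
    between adjacent bins is constant, unlike the linearly spaced bins
    of an STFT.
    """
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]

def q_factor(bins_per_octave=12):
    """Constant Q = f_k / bandwidth_k shared by all CQT bins."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
```

In practice one would use a library implementation (e.g. `librosa.cqt`) to compute the full transform; the point here is only the geometric spacing that makes the resolution trade-off different from the STFT.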
The UR-AIR system submission to the logical access (LA) and speech deepfake (DF) tracks of the ASVspoof 2021 Challenge is presented, and a channel-robust synthetic speech detection system is proposed for the challenge.
It is shown that, using only standard components, a lightweight neural network can outperform state-of-the-art methods on the ASVspoof 2019 challenge; the proposed model, built with ResNet- or Inception-style structures, is termed the Time-domain Synthetic Speech Detection Net (TSSDNet).
A challenging Mandarin dataset is constructed, and the audio track of the first fake-media forensics challenge of the China Society of Image and Graphics (FMFCC-A) is organized, filling the gap left by the lack of Mandarin datasets for synthetic speech detection.
Inspired by the promising performance of ConvNeXt in image classification, the ConvNeXt architecture is revised and a lightweight end-to-end anti-spoofing model is proposed that focuses on the most informative sub-bands of the speech representation and on samples that are hard to classify.
This work investigates the discriminative role of silenced parts of the signal in synthetic speech detection and shows how first-digit statistics extracted from MFCC coefficients enable efficient and robust detection.
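The idea behind first-digit statistics is a Benford-style analysis: take the first significant digit of each coefficient's magnitude and compare the resulting digit histogram against a reference distribution, with deviations serving as a forensic cue. A minimal sketch, assuming the MFCC coefficients are already available as a flat sequence of floats (the function names and the Benford reference are illustrative, not the paper's exact pipeline):

```python
import math
from collections import Counter

def first_digit_stats(coeffs):
    """Histogram of first significant digits of |coeffs| (zeros skipped).

    Returns relative frequencies for digits 1..9. Deviations from a
    Benford-like law can be used as a feature for detecting synthetic
    speech.
    """
    digits = []
    for c in coeffs:
        c = abs(c)
        if c == 0.0:
            continue
        # Shift the magnitude into [1, 10) and keep the integer part.
        exp = math.floor(math.log10(c))
        digits.append(int(c / 10 ** exp))
    counts = Counter(digits)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

def benford_probs():
    """Benford's law reference distribution P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1.0 + 1.0 / d) for d in range(1, 10)}
```

A simple detector could then score an utterance by a divergence (e.g. chi-squared) between `first_digit_stats(mfccs)` and `benford_probs()`; the actual statistics and classifier used in the paper may differ.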
The SingFake dataset is presented: the first curated in-the-wild dataset of 28.93 hours of bonafide and 29.40 hours of deepfake song clips in five languages from 40 singers. Four state-of-the-art speech countermeasure systems trained on speech utterances are evaluated on it.