Detect synthetic speech generated using machine learning, as opposed to bona fide human speech.
The Res2Net model consistently outperforms ResNet34 and ResNet50 by a large margin on both the physical access (PA) and logical access (LA) portions of the ASVspoof 2019 corpus, and the constant-Q transform (CQT) front-end achieves the most promising performance in both scenarios.
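As background for the CQT front-end mentioned above, here is a minimal sketch (not from the paper, and not a full transform) of the constant-Q property: filter center frequencies are geometrically spaced, so the ratio of center frequency to bandwidth (the Q factor) is identical for every bin. The parameter names below are illustrative, not taken from any specific implementation.

```python
import math

def cqt_center_frequencies(fmin: float, n_bins: int, bins_per_octave: int = 12):
    """Geometrically spaced center frequencies of a constant-Q filterbank:
    f_k = fmin * 2**(k / bins_per_octave)."""
    return [fmin * 2 ** (k / bins_per_octave) for k in range(n_bins)]

def q_factor(bins_per_octave: int = 12) -> float:
    """Quality factor shared by all filters: Q = f_k / bandwidth_k
    = 1 / (2**(1 / bins_per_octave) - 1)."""
    return 1.0 / (2 ** (1.0 / bins_per_octave) - 1.0)
```

Because the spacing is geometric, every adjacent pair of bins has the same frequency ratio, and each octave spans exactly `bins_per_octave` bins; this log-frequency resolution is what distinguishes CQT-based features from linearly spaced spectrogram features.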
The UR-AIR system submission to the logical access (LA) and speech deepfake (DF) tracks of the ASVspoof 2021 Challenge is presented, and a channel-robust synthetic speech detection system is proposed for the challenge.
Inspired by the promising performance of ConvNeXt in image classification tasks, the ConvNeXt network architecture is revised and a lightweight end-to-end anti-spoofing model is proposed that can focus on the most informative sub-bands of speech representations and on the difficult samples that are hard to classify.
A challenging Mandarin dataset is constructed and the accompanying audio track of the first fake media forensics challenge of the China Society of Image and Graphics (FMFCC-A) is organized, to address the lack of Mandarin datasets for synthetic speech detection.
It is shown that, using only standard components, a lightweight neural network can outperform the state-of-the-art methods on the ASVspoof 2019 challenge; the proposed model, built with ResNet- or Inception-style structures, is termed the Time-domain Synthetic Speech Detection Net (TSSDNet).
This work investigates the discriminative role of the silent parts of the signal in synthetic speech detection and shows how first-digit statistics extracted from MFCC coefficients can efficiently enable robust detection.
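To illustrate the first-digit statistics mentioned above, here is a minimal sketch (not the authors' implementation) that extracts the leading decimal digit of each coefficient and builds a normalized histogram over digits 1-9; in Benford-style analyses, deviations of such histograms from the expected log-law distribution serve as forensic features. All function names are illustrative.

```python
import math
from collections import Counter

def first_digit(x: float) -> int:
    """Return the most significant decimal digit of |x| (0 if x == 0)."""
    x = abs(x)
    if x == 0:
        return 0
    exp = math.floor(math.log10(x))  # scale x into [1, 10)
    return int(x / 10 ** exp)

def first_digit_histogram(coeffs):
    """Normalized histogram of first digits 1..9 over nonzero coefficients."""
    digits = [first_digit(c) for c in coeffs if c != 0]
    counts = Counter(digits)
    total = sum(counts.values()) or 1
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

def benford_expected(d: int) -> float:
    """Benford's law: expected frequency of leading digit d."""
    return math.log10(1 + 1 / d)
```

A detector in this spirit would compare the empirical histogram of MFCC first digits against `benford_expected`, on the hypothesis that processing artifacts in synthetic speech perturb the natural first-digit distribution.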
The SingFake dataset is presented: the first curated in-the-wild dataset, consisting of 28.93 hours of bonafide and 29.40 hours of deepfake song clips in five languages from 40 singers, on which four state-of-the-art speech countermeasure systems trained on speech utterances are evaluated.