3260 papers • 126 benchmarks • 313 datasets
Speaker identification is the task of determining which speaker, from a known set of speakers, produced a given speech sample.
These leaderboards are used to track progress in Speaker Identification.
Use these libraries to find Speaker Identification models and implementations.
This paper proposes SincNet, a novel CNN architecture whose first convolutional layer is constrained to learn parametrized sinc functions implementing band-pass filters, encouraging it to discover more meaningful filters.
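As a rough illustration of the idea, here is a minimal PyTorch sketch of sinc-parametrized band-pass filters; the filter count, cutoffs, and kernel size below are illustrative, not the paper's exact configuration:

```python
import torch

def sinc_bandpass_filters(low_hz, band_hz, kernel_size, sample_rate=16000):
    """Build band-pass filters from two learnable cutoffs, as in SincNet.

    low_hz, band_hz: 1-D tensors of per-filter low cutoff and bandwidth (Hz).
    Returns a (num_filters, kernel_size) tensor of time-domain filters.
    """
    high_hz = low_hz + band_hz
    # symmetric time axis in seconds (kernel_size assumed odd)
    n = torch.arange(-(kernel_size // 2), kernel_size // 2 + 1, dtype=torch.float32)
    t = n / sample_rate
    window = torch.hamming_window(kernel_size)

    def lowpass(f):
        # ideal low-pass impulse response with cutoff f:
        # 2f * sinc(2ft), using torch.sinc(x) = sin(pi x) / (pi x)
        return 2 * f.unsqueeze(1) * torch.sinc(2 * f.unsqueeze(1) * t)

    # difference of two low-pass filters = band-pass filter
    return (lowpass(high_hz) - lowpass(low_hz)) * window  # Hamming-windowed

# usage: 80 filters with 251 taps, applied to a raw waveform via conv1d
low = torch.linspace(30.0, 7000.0, 80)
band = torch.full((80,), 100.0)
kernels = sinc_bandpass_filters(low, band, kernel_size=251)
wave = torch.randn(1, 1, 16000)  # one second of dummy audio
out = torch.nn.functional.conv1d(wave, kernels.unsqueeze(1))
```

In training, `low_hz` and `band_hz` would be the layer's only learnable parameters, which is what keeps the first layer interpretable and compact.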
Results suggest that adapting a model trained on Mandarin can improve accuracy for English speaker recognition, and that Deep Speaker outperforms a DNN-based i-vector baseline.
This work addresses the problem of segment-level general audio SSL and proposes a new transformer-based teacher-student SSL model, named ATST, which achieves new state-of-the-art results on almost all of the downstream tasks.
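Teacher-student SSL setups of this kind typically keep the teacher as an exponential moving average of the student; a minimal sketch of that common update follows (the decay value is illustrative, not necessarily ATST's setting):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def ema_update(teacher: nn.Module, student: nn.Module, decay: float = 0.999):
    """Keep teacher weights as an exponential moving average of the student,
    a common pattern in transformer-based teacher-student SSL."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param, alpha=1 - decay)

# usage: the teacher starts as a copy of the student and is never backpropagated
student = nn.Linear(128, 128)
teacher = nn.Linear(128, 128)
teacher.load_state_dict(student.state_dict())
ema_update(teacher, student)
```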
Additive Margin MobileNet1D (AM-MobileNet1D), a portable model for speaker identification on mobile devices, is proposed; it takes only 11.6 megabytes of disk storage against 91.2 for the SincNet and AM-SincNet architectures, making it roughly seven times faster with eight times fewer parameters.
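The "additive margin" part refers to an additive-margin softmax loss on cosine similarities; a minimal PyTorch sketch, with margin and scale values chosen for illustration:

```python
import torch
import torch.nn.functional as F

def am_softmax_loss(embeddings, labels, weight, margin=0.2, scale=30.0):
    """Additive-margin softmax: cosine similarities, with the margin
    subtracted from each sample's true-class logit before cross-entropy."""
    # cosine similarity between L2-normalized embeddings and class weights
    cos = F.normalize(embeddings, dim=1) @ F.normalize(weight, dim=1).t()
    # subtract the margin only on the target class
    one_hot = F.one_hot(labels, num_classes=weight.size(0)).float()
    logits = scale * (cos - margin * one_hot)
    return F.cross_entropy(logits, labels)

# usage: 512-dim embeddings, 1000 speakers (shapes are illustrative)
emb = torch.randn(8, 512)
w = torch.randn(1000, 512)          # one weight row per speaker class
y = torch.randint(0, 1000, (8,))
loss = am_softmax_loss(emb, y, w)
```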
Results demonstrate that the CNN architectures derived by the proposed approach significantly outperform current speaker recognition systems based on VGG-M, ResNet-18, and ResNet-34 backbones, while enjoying lower model complexity.
This work proposes Audio ALBERT, a lite version of a self-supervised speech representation model, and applies the lightweight representation extractor to two downstream tasks, speaker classification and phoneme classification, showing performance comparable with massive pre-trained networks while using 91% fewer parameters.
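ALBERT-style "lite" models save most of their parameters by sharing one transformer layer across depth; a minimal sketch of that idea (dimensions and depth are illustrative, not Audio ALBERT's exact configuration):

```python
import torch
import torch.nn as nn

class SharedLayerEncoder(nn.Module):
    """ALBERT-style parameter sharing: one transformer layer is reused at
    every depth step, so 12 passes cost the parameters of a single layer."""
    def __init__(self, d_model=768, nhead=12, num_passes=12):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.num_passes = num_passes

    def forward(self, x):
        for _ in range(self.num_passes):
            x = self.layer(x)   # same weights applied at every depth
        return x

enc = SharedLayerEncoder()
frames = torch.randn(2, 100, 768)   # (batch, time, feature)
out = enc(frames)
```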
This paper applies multi-task learning to the current SSL framework for speaker representation learning, and proposes an utterance mixing strategy for data augmentation, in which additional overlapped utterances are created in an unsupervised manner and incorporated during training.
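A minimal sketch of one plausible utterance-mixing augmentation on waveform batches; the overlap ratio and gain range are illustrative assumptions, not the paper's exact recipe:

```python
import torch

def mix_utterances(batch, max_ratio=0.5, gain_db_range=(-5.0, 5.0)):
    """Overlap each utterance with a random segment of another utterance
    from the same batch, creating 'overlapped' examples without labels."""
    mixed = batch.clone()
    perm = torch.randperm(batch.size(0))        # pick interfering utterances
    for i, j in enumerate(perm.tolist()):
        if i == j:
            continue
        length = int(batch.size(1) * torch.rand(1).item() * max_ratio)
        if length == 0:
            continue
        start = torch.randint(0, batch.size(1) - length + 1, (1,)).item()
        gain_db = torch.empty(1).uniform_(*gain_db_range).item()
        gain = 10.0 ** (gain_db / 20.0)         # random relative level
        mixed[i, start:start + length] += gain * batch[j, :length]
    return mixed

waves = torch.randn(4, 16000)   # batch of 1-second utterances
augmented = mix_utterances(waves)
```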
Extensive evaluations show the superiority of the proposed SpeechT5 framework on a wide variety of spoken language processing tasks, including automatic speech recognition, speech synthesis, speech translation, voice conversion, speech enhancement, and speaker identification.
The Audio-MAE, a simple extension of image-based Masked Autoencoders to self-supervised representation learning from audio spectrograms, sets new state-of-the-art performance on six audio and speech classification tasks, outperforming other recent models that use external supervised pre-training.
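The core MAE ingredient is masking most spectrogram patches and encoding only the visible ones; a minimal sketch of the masking step (patch size and mask ratio below are illustrative):

```python
import torch

def random_mask_patches(spec, patch=16, mask_ratio=0.8):
    """Split a spectrogram into non-overlapping patches and keep a random
    subset; an MAE-style encoder would see only the visible patches."""
    # spec: (freq, time), both assumed divisible by `patch` for simplicity
    f, t = spec.shape
    patches = spec.reshape(f // patch, patch, t // patch, patch)
    patches = patches.permute(0, 2, 1, 3).reshape(-1, patch * patch)
    n = patches.size(0)
    keep = max(1, int(n * (1 - mask_ratio)))
    idx = torch.randperm(n)[:keep]          # indices of visible patches
    return patches[idx], idx                # feed these to the encoder

spec = torch.randn(128, 1024)   # mel bins x frames
visible, idx = random_mask_patches(spec)
```

A decoder would then reconstruct the masked patches from the encoded visible ones, which is what makes the pre-training self-supervised.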
This work learns representations that capture speaker identities by maximizing the mutual information between the encoded representations of chunks of speech randomly sampled from the same sentence.
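One common way to maximize mutual information between representations of chunks from the same utterance is an InfoNCE-style contrastive bound; the sketch below illustrates that idea and is not necessarily the paper's exact estimator:

```python
import torch
import torch.nn.functional as F

def chunk_contrastive_loss(anchor, positive, temperature=0.1):
    """InfoNCE-style lower bound on mutual information: two chunks cut from
    the same utterance are positives; chunks from other utterances in the
    batch act as negatives."""
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(a.size(0))       # diagonal entries are positives
    return F.cross_entropy(logits, targets)

# usage: encode two random chunks of each utterance with the same encoder;
# shapes and dimensions here are illustrative
enc_a = torch.randn(16, 256)   # embeddings of chunk 1 per utterance
enc_b = torch.randn(16, 256)   # embeddings of chunk 2 per utterance
loss = chunk_contrastive_loss(enc_a, enc_b)
```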