3260 papers • 126 benchmarks • 313 datasets
Voice Conversion is a technology that modifies the speech of a source speaker so that it sounds like the speech of a target speaker, without changing the linguistic information. Source: Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet
These leaderboards are used to track progress in Voice Conversion.
Use these libraries to find Voice Conversion models and implementations.
This paper proposed a novel one-shot VC approach which can perform conversion given only one example utterance each from the source and target speakers, neither of whom needs to have been seen during training.
This work uses a cycle-consistent adversarial network (CycleGAN) with gated convolutional neural networks (CNNs) and an identity-mapping loss to learn a mapping from source to target speech without relying on parallel data.
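The two losses named above can be illustrated numerically. This is a minimal sketch, not the paper's implementation: the toy linear "generators" stand in for the gated-CNN generators, and the discriminator/adversarial term is omitted.

```python
import numpy as np

# Hypothetical toy "generators"; in CycleGAN-VC these are gated CNNs
# mapping source spectral features to target features and back.
def G_src_to_tgt(x):
    return 1.1 * x + 0.2     # placeholder forward mapping

def G_tgt_to_src(y):
    return (y - 0.2) / 1.1   # placeholder inverse mapping

def l1(a, b):
    return float(np.mean(np.abs(a - b)))

x = np.linspace(-1.0, 1.0, 8)   # stand-in source features
y = G_src_to_tgt(x)             # stand-in target-domain features

# Cycle-consistency loss: source -> target -> source should reconstruct x,
# which is what lets training proceed without parallel data.
cycle_loss = l1(G_tgt_to_src(G_src_to_tgt(x)), x)

# Identity-mapping loss: feeding target-domain features through the
# source-to-target generator should leave them unchanged, which helps
# preserve linguistic content.
identity_loss = l1(G_src_to_tgt(y), y)
```

For this invertible toy mapping the cycle loss is zero while the identity loss is not, showing that the two terms constrain the generators in different ways.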
CycleGAN-VC2 is proposed, which is an improved version of CycleGAN-VC incorporating three new techniques: an improved objective (two-step adversarial losses), improved generator (2-1-2D CNN), and improved discriminator (PatchGAN).
Results confirm that the proposed deep learning-based assessment models could be used as a computational evaluator to measure the MOS of VC systems to reduce the need for expensive human rating.
SpeechSplit is among the first algorithms that can separately perform style transfer on timbre, pitch and rhythm without text labels and can blindly decompose speech into its four components by introducing three carefully designed information bottlenecks.
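The "information bottleneck" idea behind SpeechSplit can be sketched with a toy example. This is illustrative only, assuming a simple temporal-downsampling bottleneck rather than the paper's actual encoder architecture: constraining a representation's capacity forces it to discard fast-varying information while slow-varying structure survives.

```python
import numpy as np

def bottleneck(features, keep_every):
    """Toy bottleneck: downsample along time, then hold each kept value
    to restore the original length. Capacity drops by `keep_every`."""
    coarse = features[::keep_every]
    return np.repeat(coarse, keep_every)[: len(features)]

t = np.arange(32)
slow = np.sin(0.2 * t)   # slow-varying component (e.g. prosodic contour)
fast = np.sin(2.5 * t)   # fast-varying component (fine detail)

# The bottleneck reconstructs the slow signal far better than the fast one:
# whatever cannot squeeze through is lost, which is the mechanism used to
# force each encoder to carry only one factor of the speech signal.
err_slow = float(np.mean(np.abs(bottleneck(slow, 4) - slow)))
err_fast = float(np.mean(np.abs(bottleneck(fast, 4) - fast)))
```

Comparing `err_slow` and `err_fast` shows the asymmetry: the same bottleneck is nearly lossless for the slow component and destructive for the fast one.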
This paper uses self-supervised pre-trained models for MOS prediction, shows that their representations can distinguish between clean and noisy audio, and outperforms the two previous state-of-the-art models by a significant margin on Voice Conversion Challenge 2018.
Experimental results on the ASVspoof 2019 dataset demonstrate that high-level representations extracted by Mockingjay can prevent the transferability of adversarial examples, and successfully counter black-box attacks.
A spectral conversion (SC) framework based on a variational auto-encoder is proposed, which enables the exploitation of non-parallel corpora and removes the requirement of parallel corpora or phonetic alignments for training a spectral conversion system.
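The VAE objective such a framework optimizes can be sketched numerically. This is a toy sketch, not the paper's model: the encoder and decoder are replaced by linear maps, and the speaker code is a scalar; the point is the ELBO structure (reconstruction term plus closed-form KL) and the reparameterization trick.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(frame):
    # Toy "encoder": returns the mean and log-variance of q(z | x).
    mu = 0.5 * frame
    log_var = np.zeros_like(frame)
    return mu, log_var

def decode(z, speaker_code):
    # Toy speaker-conditioned "decoder": conditioning on a different
    # speaker code at inference time is what performs the conversion.
    return 2.0 * z + speaker_code

frame = rng.normal(size=16)   # stand-in spectral frame
speaker_code = 0.0            # source speaker's code during training

mu, log_var = encode(frame)
eps = rng.normal(size=16)
z = mu + np.exp(0.5 * log_var) * eps   # reparameterization trick

recon = decode(z, speaker_code)
recon_loss = float(np.mean((recon - frame) ** 2))

# KL( q(z|x) || N(0, I) ) in closed form for diagonal Gaussians.
kl = float(-0.5 * np.mean(1.0 + log_var - mu**2 - np.exp(log_var)))

elbo_loss = recon_loss + kl
```

Because the latent prior does not depend on the speaker, the same latent can be decoded with any speaker code, which is why no parallel corpora or alignments are needed.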