Emotional Voice Conversion: Theory, Databases and ESD (2021-05-31)

TL;DR

This paper motivates the development of a novel emotional speech database (ESD) that addresses a growing research need, and makes the database available to the research community.

Authors

Kun Zhou

Berrak Sisman

Rui Liu

Haizhou Li
