3260 papers • 126 benchmarks • 313 datasets
Talking face generation aims to synthesize a sequence of face images that correspond to given speech semantics. (Image credit: Talking Face Generation by Adversarially Disentangled Audio-Visual Representation)
These leaderboards are used to track progress in Talking Face Generation.
Use these libraries to find Talking Face Generation models and implementations.
This work investigates the problem of lip-syncing a talking face video of an arbitrary identity to match a target speech segment, identifies key reasons why existing approaches fail at this, and resolves them by learning from a powerful lip-sync discriminator.
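The lip-sync discriminator mentioned above typically scores how well an audio window matches the lip region of a frame. A minimal sketch of that training signal, assuming hypothetical precomputed audio and lip embeddings (the real discriminator and its embedding networks are learned):

```python
import numpy as np

def sync_confidence(audio_emb, video_emb, eps=1e-8):
    # Cosine similarity between an audio-window embedding and a
    # lip-region embedding; a lip-sync discriminator scores
    # in-sync vs. out-of-sync pairs with this kind of measure.
    a = audio_emb / (np.linalg.norm(audio_emb) + eps)
    v = video_emb / (np.linalg.norm(video_emb) + eps)
    return float(a @ v)

def sync_loss(audio_embs, video_embs):
    # Generator penalty: push each frame's sync confidence toward 1
    # (a cross-entropy-style form; illustrative, not the exact loss).
    losses = []
    for a, v in zip(audio_embs, video_embs):
        p = (sync_confidence(a, v) + 1.0) / 2.0  # map [-1, 1] -> [0, 1]
        losses.append(-np.log(np.clip(p, 1e-7, 1.0)))
    return float(np.mean(losses))
```

Minimizing this loss while the discriminator stays frozen is what drives the generated mouth shapes toward the target speech.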
A method that generates expressive talking heads from a single facial image, with audio as the only input. It synthesizes photorealistic videos of entire talking heads with a full range of motion, and can also animate artistic paintings, sketches, 2D cartoon characters, Japanese manga, and stylized caricatures in a single unified framework.
A novel conditional video generation network is proposed in which the audio input conditions a recurrent adversarial network, incorporating temporal dependency to realize smooth transitions in lip and facial movement.
This work finds that a talking face sequence is actually a composition of subject-related information and speech-related information, and learns a disentangled audio-visual representation with the advantage that either audio or video can serve as input for generation.
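The disentanglement described above means identity and speech content live in separate codes, and the speech code can come from either modality. A toy sketch with random linear maps standing in for the trained encoders and decoder (all weights here are illustrative placeholders, not the learned model):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy linear "encoders" and "decoder"; the real networks are learned
# adversarially so that the two codes stay disentangled.
W_id, W_speech_v, W_speech_a = rng.normal(size=(3, 8, 4))
W_dec = rng.normal(size=(8, 16))

def encode_identity(face):       # who is speaking
    return face @ W_id

def encode_speech_video(face):   # what is said, read from the lips
    return face @ W_speech_v

def encode_speech_audio(audio):  # what is said, read from the sound
    return audio @ W_speech_a

def generate(identity_code, speech_code):
    # The decoder composes the two disentangled factors into a frame code.
    return np.concatenate([identity_code, speech_code]) @ W_dec
```

Because `generate` only sees a speech code, swapping `encode_speech_video` for `encode_speech_audio` is exactly how the same model handles both video-driven and audio-driven generation.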
The proposed method, known as ReenactGAN, transfers facial movements and expressions from an arbitrary person's monocular video input to a target person's video, and can perform photo-realistic face reenactment.
A unique 4D face dataset is introduced, with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers, and VOCA (Voice Operated Character Animation) is learned: the only realistic 3D facial animation model that is readily applicable to unseen subjects without retargeting.
This work presents Neural Voice Puppetry, a novel approach for audio-driven facial video synthesis that generalizes across different people. It can synthesize videos of a target actor with the voice of any unknown source actor, or even with synthetic voices generated using standard text-to-speech approaches.
An end-to-end talking face generation system is designed that takes a speech utterance, a single face image, and a categorical emotion label as input, and renders a talking face video synchronized with the speech and expressing the conditioned emotion.
This work presents an unsupervised stochastic audio-to-video generation model that can capture multiple modes of the video distribution and does so through a principled multi-modal variational autoencoder framework.
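The "multiple modes" in the summary above come from sampling the latent variable of a variational autoencoder: the same conditioning can decode to different plausible videos. A minimal sketch of that sampling step, with a hypothetical linear decoder in place of the real video decoder:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    # VAE reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I),
    # which keeps sampling differentiable with respect to mu and log_var.
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def decode(z, W):
    # Hypothetical linear decoder standing in for the video frame decoder.
    return np.tanh(z @ W)

rng = np.random.default_rng(42)
W = rng.normal(size=(4, 6))
mu, log_var = np.zeros(4), np.zeros(4)

# Two draws from the same posterior decode to two different frames:
# this stochasticity is what lets the model capture multiple modes.
frame_a = decode(reparameterize(mu, log_var, rng), W)
frame_b = decode(reparameterize(mu, log_var, rng), W)
```

A deterministic audio-to-video mapping would collapse all of this variability into a single average output; sampling `z` is what preserves the distribution over videos.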
Experimental results demonstrate that the novel framework can produce high-fidelity and natural results, and support free adjustment of audio signals, viewing directions, and background images.