3260 papers • 126 benchmarks • 313 datasets
Talking head generation is the task of generating a talking face from a set of images of a person. (Image credit: Few-Shot Adversarial Learning of Realistic Neural Talking Head Models)
These leaderboards are used to track progress in talking head generation.
Use these libraries to find talking head generation models and implementations.
This work frames few- and one-shot learning of neural talking head models of previously unseen people as an adversarial training problem with high-capacity generators and discriminators, enabled by lengthy meta-learning on a large dataset of videos.
This work investigates the problem of lip-syncing a talking face video of an arbitrary identity to match a target speech segment, identifies key reasons why existing approaches fail at this, and resolves them by learning from a powerful lip-sync discriminator.
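To make the lip-sync discriminator idea concrete, here is a minimal sketch of the kind of sync score such an expert might produce: it compares video and audio embeddings by cosine similarity and penalizes pairs that are out of sync. The function names, the embedding-to-probability mapping, and the loss form are illustrative assumptions, not the paper's actual formulation.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv)

def sync_loss(video_embs, audio_embs, eps=1e-8):
    """Hypothetical lip-sync penalty: in-sync (video, audio) embedding
    pairs should have cosine similarity near 1; the similarity is mapped
    to [0, 1] and scored with a cross-entropy-style term."""
    losses = []
    for v, a in zip(video_embs, audio_embs):
        p = (cosine(v, a) + 1.0) / 2.0   # map [-1, 1] -> [0, 1]
        losses.append(-math.log(p + eps))
    return sum(losses) / len(losses)
```

A generator trained against such an expert minimizes this loss on its own outputs, pushing generated mouth motion toward the audio; in practice the embeddings would come from pretrained video and audio encoders.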
A method that generates expressive talking heads from a single facial image, with audio as the only input, synthesizing photorealistic videos of entire talking heads with a full range of motion; it can also animate artistic paintings, sketches, 2D cartoon characters, Japanese manga, and stylized caricatures in a single unified framework.
The proposed method, known as ReenactGAN, is capable of transferring facial movements and expressions from an arbitrary person's monocular video input to a target person's video, and can perform photo-realistic face reenactment.
This work proposes a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts).
This work presents Neural Voice Puppetry, a novel approach for audio-driven facial video synthesis that generalizes across different people, allowing it to synthesize videos of a target actor with the voice of any unknown source actor or even synthetic voices that can be generated utilizing standard text-to-speech approaches.
This work provides the necessary and sufficient conditions to maintain balance of the 3D Variable Height Inverted Pendulum (VHIP) with both fixed and variable CoP, and shows the generalization of the Divergent Component of Motion to the 3D VHIP.
This work presents a carefully-designed benchmark for evaluating talking-head video generation with standardized dataset pre-processing strategies, and aims to uncover the merits and drawbacks of current methods and point out promising directions for future work.
This work proposes a 3D-aware generative network along with a hybrid embedding module and a non-linear composition module that achieves controllable, photo-realistic, and temporally coherent talking-head videos with natural head movements.
A neural rendering-based system that creates head avatars from a single photograph by decomposing it into two layers, and is compared to analogous state-of-the-art systems in terms of visual quality and speed.