3260 papers • 126 benchmarks • 313 datasets
Talking head generation is the task of generating a talking face from a set of images of a person. (Image credit: Few-Shot Adversarial Learning of Realistic Neural Talking Head Models)
These leaderboards are used to track progress in talking head generation.
Use these libraries to find talking head generation models and implementations.
This work frames few- and one-shot learning of neural talking head models of previously unseen people as an adversarial training problem with high-capacity generators and discriminators, enabled by lengthy meta-learning on a large dataset of videos.
This work investigates the problem of lip-syncing a talking face video of an arbitrary identity to match a target speech segment, identifies key reasons why existing approaches fail at this, and resolves them by learning from a powerful lip-sync discriminator.
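To make the lip-sync discriminator idea concrete, here is a minimal sketch of the kind of sync score such an expert might produce: it compares video and audio embeddings by cosine similarity and penalizes pairs that are out of sync. The function names, the embedding-to-probability mapping, and the loss form are illustrative assumptions, not the paper's actual formulation.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv)

def sync_loss(video_embs, audio_embs, eps=1e-8):
    """Hypothetical lip-sync penalty: in-sync (video, audio) embedding
    pairs should have cosine similarity near 1; the similarity is mapped
    to [0, 1] and scored with a cross-entropy-style term."""
    losses = []
    for v, a in zip(video_embs, audio_embs):
        p = (cosine(v, a) + 1.0) / 2.0   # map [-1, 1] -> [0, 1]
        losses.append(-math.log(p + eps))
    return sum(losses) / len(losses)
```

A generator trained against such an expert minimizes this loss on its own outputs, pushing generated mouth motion toward the audio; in practice the embeddings would come from pretrained video and audio encoders.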
A method that generates expressive talking heads from a single facial image, with audio as the only input, synthesizing photorealistic videos of entire talking heads with a full range of motion; it can also animate artistic paintings, sketches, 2D cartoon characters, Japanese manga, and stylized caricatures in a single unified framework.
The proposed method, known as ReenactGAN, is capable of transferring facial movements and expressions from an arbitrary person's monocular video input to a target person's video, and can perform photo-realistic face reenactment.
This work proposes a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts).
This work presents Neural Voice Puppetry, a novel approach for audio-driven facial video synthesis that generalizes across different people, allowing it to synthesize videos of a target actor with the voice of any unknown source actor or even synthetic voices that can be generated utilizing standard text-to-speech approaches.
This work provides the necessary and sufficient conditions to maintain balance of the 3D Variable Height Inverted Pendulum (VHIP) with both fixed and variable CoP, and shows the generalization of the Divergent Component of Motion to the 3D VHIP.
This work presents a carefully-designed benchmark for evaluating talking-head video generation with standardized dataset pre-processing strategies, and aims to uncover the merits and drawbacks of current methods and point out promising directions for future work.
This work proposes a 3D-aware generative network along with a hybrid embedding module and a non-linear composition module that achieves controllable, photo-realistic, and temporally coherent talking-head videos with natural head movements.
A neural rendering-based system that creates head avatars from a single photograph by decomposing it into two layers, and is compared to analogous state-of-the-art systems in terms of visual quality and speed.