This task deals with lip-syncing a video or an image to a desired target speech. Approaches to this task work only for a specific identity or a limited set of identities, languages, and speakers/voices. See also: Unconstrained lip-synchronization - https://paperswithcode.com/task/lip-sync
Proposes a conditional video generation network in which the audio input conditions a recurrent adversarial network, incorporating temporal dependency to produce smooth transitions in lip and facial movement.
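A minimal PyTorch sketch of this idea, not the paper's code: per-frame audio features are fed as a condition to a recurrent generator, whose hidden state carries temporal context between frames and thus smooths the generated motion. All module names and layer sizes below are illustrative assumptions.

    # Sketch: audio-conditioned recurrent generator (illustrative sizes).
    import torch
    import torch.nn as nn

    class AudioConditionedGenerator(nn.Module):
        def __init__(self, audio_dim=128, noise_dim=64, hidden_dim=256, frame_dim=64 * 64):
            super().__init__()
            self.noise_dim = noise_dim
            # Recurrent core consumes per-frame audio condition plus noise.
            self.rnn = nn.GRU(audio_dim + noise_dim, hidden_dim, batch_first=True)
            # Decoder maps each hidden state to a flattened grayscale frame.
            self.decoder = nn.Sequential(nn.Linear(hidden_dim, frame_dim), nn.Tanh())

        def forward(self, audio_feats):
            # audio_feats: (batch, num_frames, audio_dim)
            b, t, _ = audio_feats.shape
            z = torch.randn(b, t, self.noise_dim, device=audio_feats.device)
            h, _ = self.rnn(torch.cat([audio_feats, z], dim=-1))
            frames = self.decoder(h)          # (batch, num_frames, frame_dim)
            return frames.view(b, t, 64, 64)  # (batch, num_frames, H, W)

    gen = AudioConditionedGenerator()
    fake_frames = gen(torch.randn(2, 25, 128))  # 2 clips, 25 frames each
    print(fake_frames.shape)                    # torch.Size([2, 25, 64, 64])

In an adversarial setup, a discriminator would score these frame sequences against real talking-face clips; that part is omitted here.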
This work presents an audio-to-video method for automating speech-to-lips alignment, stretching and compressing the audio signal to match the lip movements based on deep audio-visual features.
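The alignment idea can be illustrated with classic dynamic time warping over feature sequences. This is a substitute sketch, not the paper's deep audio-visual method: the per-frame audio and visual features are assumed to come from encoders defined elsewhere, and the warping path indicates where the audio must be stretched or compressed to match the lips.

    # Sketch: DTW alignment between audio and visual feature sequences.
    import numpy as np

    def dtw_path(audio_feats, visual_feats):
        # audio_feats: (Ta, d), visual_feats: (Tv, d)
        ta, tv = len(audio_feats), len(visual_feats)
        cost = np.full((ta + 1, tv + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, ta + 1):
            for j in range(1, tv + 1):
                d = np.linalg.norm(audio_feats[i - 1] - visual_feats[j - 1])
                cost[i, j] = d + min(cost[i - 1, j],      # audio stretched
                                     cost[i, j - 1],      # audio compressed
                                     cost[i - 1, j - 1])  # frames advance together
        # Backtrack to recover the warping path.
        path, i, j = [], ta, tv
        while i > 0 and j > 0:
            path.append((i - 1, j - 1))
            i, j = min([(i - 1, j), (i, j - 1), (i - 1, j - 1)],
                       key=lambda ij: cost[ij])
        return path[::-1]

    path = dtw_path(np.random.randn(40, 32), np.random.randn(30, 32))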
This paper presents the first publicly available set of Deepfake videos generated from videos in the VidTIMIT database, and demonstrates that GAN-generated Deepfake videos are challenging for both face recognition systems and existing detection methods.
A deep learning based interactive system that automatically generates live lip sync for layered 2D characters, using a Long Short-Term Memory (LSTM) model that takes streaming audio as input and produces viseme sequences with under 200 ms of latency.
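A minimal sketch of how such a model could work, not the authors' released system: a unidirectional LSTM maps per-frame audio features to viseme class scores, and carrying the hidden state across small chunks is what keeps streaming latency low. The feature dimension and viseme inventory below are assumptions.

    # Sketch: streaming LSTM viseme predictor (illustrative dimensions).
    import torch
    import torch.nn as nn

    class VisemePredictor(nn.Module):
        def __init__(self, feat_dim=26, hidden_dim=128, num_visemes=12):
            super().__init__()
            self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, num_visemes)

        def forward(self, feats, state=None):
            # feats: (batch, frames, feat_dim); state carries context between chunks
            out, state = self.lstm(feats, state)
            return self.head(out), state  # per-frame viseme logits

    model = VisemePredictor()
    state = None
    # Simulate streaming: feed small chunks, reusing the hidden state.
    for _ in range(5):
        chunk = torch.randn(1, 4, 26)       # 4 new audio frames
        logits, state = model(chunk, state)
        visemes = logits.argmax(dim=-1)     # predicted viseme per frame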
The proposed detection of deepfake videos, based on the dissimilarity between the audio and visual modalities and termed the Modality Dissonance Score (MDS), outperforms the state of the art by up to 7%.
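An illustrative sketch of the general idea behind a modality-dissonance style score, not the authors' exact formulation: embed temporally aligned audio and video chunks into a shared space and average the per-chunk distance. Real videos should score low and lip-synced fakes high; the encoders producing the embeddings are assumed to exist elsewhere.

    # Sketch: audio-visual dissonance score over aligned chunk embeddings.
    import torch
    import torch.nn.functional as F

    def modality_dissonance(audio_emb, video_emb):
        # audio_emb, video_emb: (num_chunks, dim) embeddings of aligned chunks
        audio_emb = F.normalize(audio_emb, dim=-1)
        video_emb = F.normalize(video_emb, dim=-1)
        # Per-chunk Euclidean distance, averaged over the clip.
        return (audio_emb - video_emb).norm(dim=-1).mean()

    # Toy usage with random "embeddings" standing in for real encoders.
    score = modality_dissonance(torch.randn(10, 256), torch.randn(10, 256))
    print(float(score))  # higher = more audio-visual mismatch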
The first architecture that generates both audio and synchronized photo-realistic lip-sync video from arbitrary new text is presented; the authors claim it is the first to be composed of fully trainable neural modules.