Given a silent video of a speaker, generate the corresponding speech that matches the lip movements.
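To make the task concrete, below is a minimal sketch (assumed PyTorch; module and parameter names are hypothetical, not from any cited paper) of the typical lip-to-speech pipeline: a visual encoder maps silent lip-region frames to a mel-spectrogram, which a separately trained neural vocoder would then convert to a waveform.

```python
import torch
import torch.nn as nn

class LipToSpeech(nn.Module):
    def __init__(self, frame_dim=512, mel_bins=80):
        super().__init__()
        # 3D conv front-end over (time, height, width) lip crops
        self.visual_encoder = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # keep the time axis, pool space away
        )
        self.temporal = nn.GRU(64, frame_dim, batch_first=True, bidirectional=True)
        self.mel_head = nn.Linear(2 * frame_dim, mel_bins)

    def forward(self, frames):
        # frames: (batch, 3, T, H, W) silent lip-region video
        feats = self.visual_encoder(frames)                     # (batch, 64, T, 1, 1)
        feats = feats.squeeze(-1).squeeze(-1).transpose(1, 2)   # (batch, T, 64)
        feats, _ = self.temporal(feats)                         # (batch, T, 2*frame_dim)
        return self.mel_head(feats)                             # predicted mel-spectrogram

# Usage: mel = LipToSpeech()(torch.randn(2, 3, 75, 96, 96))
# A vocoder (e.g. HiFi-GAN) would then turn `mel` into audible speech.
```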
These leaderboards are used to track progress in Visual Speech Recognition.
Use these libraries to find Visual Speech Recognition models and implementations.
This work proposes a novel approach with key design choices to achieve, for the first time, accurate and natural lip-to-speech synthesis in unconstrained scenarios, and shows that its method is four times more intelligible than previous works in this space.
A novel lip-to-speech generative adversarial network, Visual Context Attentional GAN (VCA-GAN), is introduced, which jointly models local and global lip movements during speech synthesis; synchronization learning is applied as a form of contrastive learning that guides the generator to synthesize speech in sync with the given input lip movements.
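The synchronization objective mentioned above can be illustrated with a minimal sketch (assumptions: PyTorch, a symmetric cosine-similarity InfoNCE loss; the actual VCA-GAN formulation may differ): generated-speech features should match the visual features of the same time step, with other time steps acting as negatives.

```python
import torch
import torch.nn.functional as F

def sync_contrastive_loss(visual_feats, speech_feats, temperature=0.1):
    # visual_feats, speech_feats: (T, D) per-time-step embeddings of one clip
    v = F.normalize(visual_feats, dim=-1)
    s = F.normalize(speech_feats, dim=-1)
    logits = v @ s.t() / temperature            # (T, T) similarity matrix
    targets = torch.arange(v.size(0))           # positives lie on the diagonal
    # symmetric InfoNCE: align visual->speech and speech->visual
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```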
This paper proposes to use quantized self-supervised speech representations, named speech units, as an additional prediction target for the lip-to-speech (L2S) model, and introduces a multi-input vocoder that can generate a clear waveform even from a blurry and noisy mel-spectrogram by referring to the speech units.
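A minimal sketch of that multi-task objective follows (assumed PyTorch; layer names and loss weighting are hypothetical): the model predicts both a mel-spectrogram and discrete self-supervised speech units (e.g. clustered codes from a self-supervised speech encoder) for the same frames.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MelAndUnitHeads(nn.Module):
    def __init__(self, feat_dim=512, mel_bins=80, num_units=200):
        super().__init__()
        self.mel_head = nn.Linear(feat_dim, mel_bins)    # continuous mel target
        self.unit_head = nn.Linear(feat_dim, num_units)  # discrete speech-unit target

    def forward(self, feats):
        # feats: (batch, T, feat_dim) from the visual encoder
        return self.mel_head(feats), self.unit_head(feats)

def l2s_loss(pred_mel, pred_unit_logits, target_mel, target_units, unit_weight=1.0):
    mel_loss = F.l1_loss(pred_mel, target_mel)
    # cross_entropy expects (batch, classes, T) logits and (batch, T) integer targets
    unit_loss = F.cross_entropy(pred_unit_logits.transpose(1, 2), target_units)
    return mel_loss + unit_weight * unit_loss
```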
This work presents a novel method, "Lip2Speech", in which the speaker's voice identity is captured through their facial characteristics (e.g., age, gender, ethnicity), which are conditioned along with the lip movements to generate speaker-identity-aware speech.
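One common way to realize such conditioning is sketched below (assumed PyTorch; embedding sizes and module names are hypothetical, not the paper's exact architecture): a face-derived speaker embedding is broadcast over time and concatenated with the lip features before decoding, so the synthesized voice reflects attributes inferred from the speaker's face.

```python
import torch
import torch.nn as nn

class SpeakerConditionedDecoder(nn.Module):
    def __init__(self, lip_dim=512, face_dim=256, mel_bins=80):
        super().__init__()
        self.face_proj = nn.Linear(face_dim, lip_dim)
        self.decoder = nn.GRU(2 * lip_dim, lip_dim, batch_first=True)
        self.mel_head = nn.Linear(lip_dim, mel_bins)

    def forward(self, lip_feats, face_embedding):
        # lip_feats: (batch, T, lip_dim); face_embedding: (batch, face_dim)
        spk = self.face_proj(face_embedding).unsqueeze(1)   # (batch, 1, lip_dim)
        spk = spk.expand(-1, lip_feats.size(1), -1)         # broadcast over time
        x = torch.cat([lip_feats, spk], dim=-1)             # condition every step
        out, _ = self.decoder(x)
        return self.mel_head(out)                           # (batch, T, mel_bins)
```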
A powerful Lip2Speech method that can reconstruct speech with correct content from the input lip movements, even in in-the-wild environments, is developed and verified on the LRS2, LRS3, and LRW datasets.
Adding a benchmark result helps the community track progress.