The aim is to recognise the words spoken by a talking face from the video alone, without the audio, in a controlled environment.