Audio-visual speech recognition (AVSR) is the task of transcribing speech into text from a paired audio stream and visual stream, typically video of the speaker's face or lips.
These leaderboards are used to track progress in audio-visual speech recognition.
Use these libraries to find audio-visual speech recognition models and implementations.