VGGSound consists of more than 210k videos spanning 310 audio classes. Source: VGGSound: A Large-scale Audio-Visual Dataset