Images, Videos, Texts, Audio

VTC

Videos, Titles and Comments

Introduced in VTC: Improving Video-Text Retrieval with User Comments (2022)

About this Dataset

VTC is a large-scale multimodal dataset containing video-caption pairs (~300k) alongside comments that can be used for multimodal representation learning.

Source: VTC: Improving Video-Text Retrieval with User Comments

Dataset Variants

VTC

Papers (1)

VTC: Improving Video-Text Retrieval with User Comments

This paper introduces a new dataset of videos, titles and comments and presents an attention-based mechanism that allows the model to learn from sometimes irrelevant data such as comments. It shows that, by using comments, the method learns better, more contextualised representations for image, video and audio.
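To make the idea of attending over noisy comments concrete, here is a minimal sketch in plain Python. It is a generic scaled dot-product attention example, not the paper's architecture: the function name `attend_to_comments`, the residual update, and the toy two-dimensional embeddings are all assumptions made for illustration. Comments whose embeddings align with the video embedding receive larger weights, so unrelated comments contribute less to the adapted representation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend_to_comments(video_emb, comment_embs):
    """Hypothetical helper: adapt a video embedding using its comments.

    Returns (adapted_embedding, attention_weights). With no comments,
    the video embedding is returned unchanged.
    """
    if not comment_embs:
        return list(video_emb), []
    scale = math.sqrt(len(video_emb))
    # Score each comment by its similarity to the video embedding.
    weights = softmax([dot(video_emb, c) / scale for c in comment_embs])
    # Weighted average of comment embeddings (the "context" vector).
    context = [
        sum(w * c[i] for w, c in zip(weights, comment_embs))
        for i in range(len(video_emb))
    ]
    # Residual update (an assumption): the video embedding plus the context.
    adapted = [v + ctx for v, ctx in zip(video_emb, context)]
    return adapted, weights


# Toy example: the first comment aligns with the video, the second does not.
video = [1.0, 0.0]
comments = [[4.0, 0.0], [0.0, 4.0]]
adapted, weights = attend_to_comments(video, comments)
```

In this toy setup the aligned comment gets most of the attention mass, which is the behaviour one would want from a mechanism meant to down-weight irrelevant comments.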

Tasks

Video Retrieval, Video Understanding, Video-Text Retrieval

Similar Datasets