Introduced by Song et al. in TVSum: Summarizing web videos using titles.
The TVSum dataset comprises 50 videos collected from YouTube, with durations ranging from 1 to 11 minutes. The videos span 10 categories drawn from the TRECVid MED task, with 5 videos per category. Categories cover activities such as changing a vehicle tire, making a sandwich, and flash mob gatherings. For annotation, each video was reviewed and rated by 20 users, who assigned frame-level importance scores on a scale from 1 (not important) to 5 (very important).
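Because each video carries scores from 20 annotators, a common first step is to collapse them into one importance score per frame. The following is a minimal sketch of that aggregation using a plain mean. The function name `mean_importance`, the nested-list layout, and the toy values are assumptions for illustration only; they do not reflect the dataset's actual file format or any official loader.

```python
def mean_importance(user_scores):
    """Average per-frame importance scores across annotators.

    user_scores: list with one entry per annotator; each entry is a list
    holding one score in the range 1..5 for every frame of the video.
    (Hypothetical in-memory layout, not the dataset's file format.)
    """
    if not user_scores:
        raise ValueError("need at least one annotator")

    n_frames = len(user_scores[0])
    for scores in user_scores:
        if len(scores) != n_frames:
            raise ValueError("all annotators must score the same frames")
        if any(not 1 <= s <= 5 for s in scores):
            raise ValueError("scores must lie between 1 and 5")

    # zip(*...) regroups the data frame by frame, so each tuple holds
    # every annotator's score for a single frame.
    return [sum(frame) / len(frame) for frame in zip(*user_scores)]


# Toy example: 3 annotators and 4 frames (the real dataset has 20 annotators).
toy_scores = [
    [1, 2, 5, 3],
    [1, 3, 5, 3],
    [1, 4, 5, 3],
]
consensus = mean_importance(toy_scores)
```

A mean is only one possible choice; a median or a per-annotator normalization could be substituted if annotators use the scale differently.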