The benchmark evaluates a generative Video Conversational Model with respect to Temporal Understanding. We curate a test set based on the ActivityNet-200 dataset, whose videos have rich, dense descriptive captions and human-annotated question-answer pairs. We develop an evaluation pipeline using the GPT-3.5 model that assigns a relative score to each generated prediction on a scale of 1 to 5.
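A minimal sketch of how such a GPT-judge scoring pipeline can be wired up. The prompt wording, function names, and score handling here are illustrative assumptions, not the benchmark's exact implementation; the actual call to the GPT-3.5 API is omitted, and only the prompt construction and score parsing are shown.

```python
"""Sketch of a judge-model scoring pipeline for temporal understanding.
Assumed, not the benchmark's exact code: prompt text, helper names,
and the clamping behavior."""
import re


def build_judge_prompt(question, correct_answer, predicted_answer):
    # Ask the judge model to rate the prediction on a 1-5 scale.
    return (
        "Evaluate the temporal understanding of the predicted answer.\n"
        f"Question: {question}\n"
        f"Correct answer: {correct_answer}\n"
        f"Predicted answer: {predicted_answer}\n"
        "Reply with a single integer score from 1 (poor) to 5 (excellent)."
    )


def parse_score(judge_reply, lo=1, hi=5):
    # Extract the first integer in the judge's reply and clamp it
    # to the valid scoring range.
    match = re.search(r"\d+", judge_reply)
    if match is None:
        raise ValueError(f"no score found in: {judge_reply!r}")
    return min(hi, max(lo, int(match.group())))


def mean_score(judge_replies):
    # Averaging the per-pair scores yields a benchmark-level number.
    scores = [parse_score(r) for r in judge_replies]
    return sum(scores) / len(scores)
```

In practice the reply string would come from the judge model (e.g. a chat-completion call with `build_judge_prompt(...)` as the user message); parsing defensively and clamping guards against replies that stray from the requested format.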