The benchmark evaluates a generative video conversational model on the correctness of the information in its responses. The test set is curated from the ActivityNet-200 dataset and features videos with rich, dense descriptive captions along with human-annotated question-answer pairs. An evaluation pipeline built on the GPT-3.5 model assigns each generated prediction a relative score on a scale of 1-5.
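The scoring step described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual implementation: the prompt wording and the score-parsing logic here are assumptions, and the call to the GPT-3.5 API is omitted.

```python
import re


def build_eval_prompt(question: str, answer: str, prediction: str) -> str:
    """Compose a judging prompt for the LLM evaluator.

    Illustrative format only; the benchmark's real prompt may differ.
    """
    return (
        "You are evaluating the factual correctness of a video "
        "question-answering prediction.\n"
        f"Question: {question}\n"
        f"Correct Answer: {answer}\n"
        f"Predicted Answer: {prediction}\n"
        "Rate the prediction's correctness on a scale of 1-5 and reply "
        "with only the number."
    )


def parse_score(reply: str):
    """Extract the first integer in the range 1-5 from the judge's
    reply; return None if no valid score is found."""
    match = re.search(r"\b([1-5])\b", reply)
    return int(match.group(1)) if match else None
```

In practice, `build_eval_prompt` would be sent to the GPT-3.5 API, and `parse_score` applied to the model's reply; averaging the parsed scores over the test set yields the benchmark result.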