
© 2026 Papersgraph. All rights reserved.

Spatio-Temporal Video Grounding

3260 papers • 126 benchmarks • 313 datasets

Spatio-temporal video grounding is a task at the intersection of computer vision and natural language processing (NLP). Given an untrimmed video and a textual query describing an object or event, the goal is to localize the matching content in both time and space: the temporal segment in which it occurs and the region of each frame it occupies within that segment. Applications include video summarization, content-based video retrieval, and video captioning.
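To make the input and output of the task concrete, the sketch below represents a grounding result as a "tube": a mapping from frame index to a bounding box. It then scores a predicted tube against a ground-truth tube using a temporal IoU and a tube-level IoU of the kind commonly reported on benchmarks for this task. The type aliases and the function names (box_iou, temporal_iou, tube_iou) are illustrative assumptions and do not come from this page or from any particular library; individual leaderboards may define their metrics differently.

```python
from typing import Dict, Tuple

# Assumed representation: a box is (x1, y1, x2, y2) in pixels,
# and a tube maps a frame index to the box in that frame.
Box = Tuple[float, float, float, float]
Tube = Dict[int, Box]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def temporal_iou(pred: Tube, gt: Tube) -> float:
    """Overlap of the two frame sets, ignoring where the boxes are."""
    pred_frames, gt_frames = set(pred), set(gt)
    union = pred_frames | gt_frames
    return len(pred_frames & gt_frames) / len(union) if union else 0.0


def tube_iou(pred: Tube, gt: Tube) -> float:
    """Sum of per-frame box IoU over shared frames, divided by the
    number of frames covered by either tube. Frames present in only
    one tube contribute zero, so temporal errors are penalized too."""
    shared = set(pred) & set(gt)
    union = set(pred) | set(gt)
    if not union:
        return 0.0
    return sum(box_iou(pred[t], gt[t]) for t in shared) / len(union)


# Hypothetical example: the query's target is visible in frames 0-3,
# while the model predicts frames 2-5 with perfectly placed boxes.
gt_tube: Tube = {t: (0.0, 0.0, 10.0, 10.0) for t in range(0, 4)}
pred_tube: Tube = {t: (0.0, 0.0, 10.0, 10.0) for t in range(2, 6)}
```

With these toy tubes, only two of the six frames covered by either tube are shared, so both scores are limited by the temporal misalignment even though the boxes match exactly in the shared frames.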

Benchmarks

These leaderboards are used to track progress in spatio-temporal video grounding.

Dataset

HC-STVG2

VidSTG

HC-STVG1

Libraries

Use these libraries to find spatio-temporal video grounding models and implementations.

Datasets

No datasets available.

Subtasks

No subtasks available.

Most implemented papers
