Person-centric visual grounding is the task of linking people named in a caption to people pictured in the corresponding image. Introduced in "Who's Waldo? Linking People Across Text and Images" (Cui et al., ICCV 2021).
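Conceptually, the task can be framed as a matching problem between name mentions in the caption and detected person boxes in the image. The sketch below is a minimal, hypothetical illustration of that linking step, assuming pre-computed embeddings for mentions and boxes; it is not the method of Cui et al., and the function and variable names are invented for illustration.

```python
# Hypothetical sketch: link name mentions to person boxes by maximising
# cosine similarity with a one-to-one (Hungarian) assignment.
import numpy as np
from scipy.optimize import linear_sum_assignment


def ground_people(name_embs: np.ndarray, box_embs: np.ndarray) -> list[tuple[int, int]]:
    """Return (mention index, box index) pairs linking each name mention
    to at most one detected person box."""
    # L2-normalise so the dot product equals cosine similarity.
    names = name_embs / np.linalg.norm(name_embs, axis=1, keepdims=True)
    boxes = box_embs / np.linalg.norm(box_embs, axis=1, keepdims=True)
    sim = names @ boxes.T                      # (num_mentions, num_boxes)
    rows, cols = linear_sum_assignment(-sim)   # maximise total similarity
    return list(zip(rows.tolist(), cols.tolist()))


# Toy example: 2 name mentions, 3 detected person boxes, 4-d embeddings.
rng = np.random.default_rng(0)
links = ground_people(rng.normal(size=(2, 4)), rng.normal(size=(3, 4)))
print(links)  # e.g. [(0, 2), (1, 0)]: mention i is grounded to box j
```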
These leaderboards are used to track progress in Person-centric Visual Grounding.
Use these libraries to find Person-centric Visual Grounding models and implementations.
No subtasks available.
TubeDETR is a transformer-based architecture, inspired by the recent success of transformer models for text-conditioned object detection, that combines an efficient video and text encoder modelling spatial multi-modal interactions over sparsely sampled frames with a space-time decoder that jointly performs spatio-temporal localization. A hedged sketch of this encoder-decoder layout follows below.
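To make that description concrete, here is a minimal, hypothetical PyTorch sketch of such an encoder-decoder layout. The feature dimensions, the one-query-per-frame design, and all module names are assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical TubeDETR-style sketch: frame and text features are fused by a
# transformer encoder; a space-time decoder turns one query per sampled frame
# into a box plus start/end logits for the spatio-temporal tube.
import torch
import torch.nn as nn


class TubeDETRSketch(nn.Module):
    def __init__(self, d_model: int = 256, num_frames: int = 8):
        super().__init__()
        self.frame_proj = nn.Linear(2048, d_model)   # e.g. pooled CNN feature per frame (assumed)
        self.text_proj = nn.Linear(768, d_model)     # e.g. token features from a text encoder (assumed)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=2)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True), num_layers=2)
        self.queries = nn.Embedding(num_frames, d_model)  # one query per sampled frame
        self.box_head = nn.Linear(d_model, 4)    # (cx, cy, w, h) per frame
        self.time_head = nn.Linear(d_model, 2)   # start / end logits per frame

    def forward(self, frame_feats, text_feats):
        # frame_feats: (B, T, 2048), text_feats: (B, L, 768)
        memory = torch.cat([self.frame_proj(frame_feats),
                            self.text_proj(text_feats)], dim=1)
        memory = self.encoder(memory)             # joint video-text memory
        q = self.queries.weight.unsqueeze(0).expand(frame_feats.size(0), -1, -1)
        hs = self.decoder(q, memory)              # space-time decoding
        return self.box_head(hs).sigmoid(), self.time_head(hs)


# Toy forward pass on random features: 8 sampled frames, 12 caption tokens.
model = TubeDETRSketch()
boxes, time_logits = model(torch.randn(1, 8, 2048), torch.randn(1, 12, 768))
print(boxes.shape, time_logits.shape)  # (1, 8, 4) and (1, 8, 2)
```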
Adding a benchmark result helps the community track progress.