3260 papers • 126 benchmarks • 313 datasets
Visual Grounding (VG) aims to locate the most relevant object or region in an image based on a natural language query. The query can be a phrase, a sentence, or even a multi-round dialogue. VG poses three main challenges: identifying the main focus of the query, understanding the image, and localizing the target object.
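One common way to address the localization challenge is to embed the query and a set of candidate regions into a shared space and select the region most similar to the query. The sketch below illustrates that idea with hand-made toy embeddings and plain cosine similarity; the `ground` function, the embedding values, and the box format are all illustrative assumptions, not any particular model's API.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv)

def ground(query_emb, regions):
    # regions: list of (box, region_emb) pairs, box as (x1, y1, x2, y2).
    # Return the box whose embedding is most similar to the query.
    return max(regions, key=lambda r: cosine(query_emb, r[1]))[0]

# Toy example: the query vector points in roughly the same
# direction as the second region's vector, so that box wins.
query = [0.9, 0.1, 0.0]
regions = [
    ((0, 0, 50, 50),    [0.0, 1.0, 0.0]),
    ((60, 10, 120, 80), [1.0, 0.2, 0.0]),
]
print(ground(query, regions))  # → (60, 10, 120, 80)
```

In a real system the embeddings would come from a vision-language model and the candidate boxes from a region-proposal stage, but the selection step reduces to this nearest-neighbor lookup.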
These leaderboards are used to track progress in Visual Grounding
Use these libraries to find Visual Grounding models and implementations