Described Object Detection (DOD) aims to detect every instance in an image that matches a flexible language description. It is a superset of Open-Vocabulary Object Detection (OVD) and Referring Expression Comprehension (REC): it generalizes OVD's category names to free-form language expressions, and it lifts REC's restriction of grounding only a single, presupposed object, since a description may match zero, one, or many instances. Works related to DOD are tracked in the awesome-DOD list on GitHub; a minimal inference sketch is given below.
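As a concrete illustration of the DOD interface (an image plus a free-form description in, all matching boxes out), here is a hedged sketch using Grounding DINO through the Hugging Face `transformers` library; the checkpoint name, image file, and description are assumptions, and any detector that accepts open-ended text prompts could be substituted.

```python
# Minimal DOD-style inference sketch (illustrative, not a reference implementation):
# one image plus a free-form description in, all matching boxes out.
import torch
from PIL import Image
from transformers import AutoProcessor, GroundingDinoForObjectDetection

checkpoint = "IDEA-Research/grounding-dino-tiny"   # assumed checkpoint
processor = AutoProcessor.from_pretrained(checkpoint)
model = GroundingDinoForObjectDetection.from_pretrained(checkpoint)

image = Image.open("street.jpg")                   # hypothetical input image
description = "a person riding a bicycle."         # flexible expression, not a fixed category name

inputs = processor(images=image, text=description, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Unlike REC, a description may match zero, one, or many instances; keep every box
# whose region-text alignment clears the (version-dependent) default threshold.
results = processor.post_process_grounded_object_detection(
    outputs, inputs.input_ids, target_sizes=[image.size[::-1]]
)[0]
for box, score in zip(results["boxes"], results["scores"]):
    print([round(v, 1) for v in box.tolist()], round(score.item(), 3))
```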
A grounded language-image pre-training model that learns object-level, language-aware, and semantically rich visual representations; it unifies object detection and phrase grounding during pre-training and can leverage massive image-text pairs by generating grounding boxes in a self-training fashion.
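A conceptual sketch of this unification (toy tensors, not the paper's actual code or API): per-class detection logits are replaced by alignment scores between region features and the token embeddings of a text prompt, so any phrase can act as a "class" and detection and phrase grounding share one formulation. All shapes below are assumptions.

```python
# Conceptual region-word alignment sketch for grounded pre-training (not an official API).
import torch

num_regions, num_tokens, dim = 900, 16, 256   # assumed toy sizes
region_feats = torch.randn(num_regions, dim)  # image-side features after cross-modal fusion
token_feats = torch.randn(num_tokens, dim)    # text-side features after cross-modal fusion

alignment = region_feats @ token_feats.T      # (regions, tokens) grounding scores
scores = alignment.sigmoid().amax(dim=1)      # best-matching prompt token per region
keep = scores > 0.5                           # regions grounded to some phrase in the prompt
print(int(keep.sum()), "regions matched the prompt")
```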
This paper proposes a strong recipe for transferring image-text models to open-vocabulary object detection, using a standard Vision Transformer architecture with minimal modifications, contrastive image-text pre-training, and end-to-end detection fine-tuning.
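This recipe corresponds to the OWL-ViT line of work; a minimal usage sketch with its Hugging Face `transformers` implementation follows. The checkpoint name, image file, query phrases, and threshold are assumptions and would need tuning in practice.

```python
# Open-vocabulary detection sketch with OWL-ViT: text queries stand in for a fixed label set.
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

image = Image.open("kitchen.jpg")                        # hypothetical input image
queries = [["a photo of a mug", "a photo of a kettle"]]  # open-vocabulary text queries

inputs = processor(text=queries, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Rescale boxes to the original image size and filter by confidence.
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(
    outputs=outputs, threshold=0.1, target_sizes=target_sizes
)[0]
for box, score, label in zip(results["boxes"], results["scores"], results["labels"]):
    print(queries[0][label], round(score.item(), 3), [round(v, 1) for v in box.tolist()])
```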
This work presents a universal instance perception model of the next generation, termed UNINEXT, which reformulates diverse instance perception tasks into a unified object discovery and retrieval paradigm and can flexibly perceive different types of objects by simply changing the input prompts.
MM-Grounding-DINO is presented, an open-source, comprehensive, and user-friendly baseline, which is built with the MMDetection toolbox, and outperforms the Grounding-DINO-Tiny baseline.
This work proposes CORA, a DETR-style framework that adapts CLIP for Open-vocabulary detection by Region prompting and Anchor pre-matching, which mitigates the whole-to-region distribution gap by prompting the region features of the CLIP-based region classifier.
SPHINX, a versatile multi-modal large language model (MLLM) with a joint mixing of model weights, tuning tasks, and visual embeddings is presented, and an efficient strategy aiming to better capture fine-grained appearances of high-resolution images is proposed.
This work presents FIBER (Fusion-In-the-Backbone-based transformER), a new vision-language model architecture that can seamlessly handle both image-level and region-level tasks, providing consistent performance improvements over strong baselines across all tasks and often outperforming methods trained on orders of magnitude more data.
A baseline is proposed that substantially improves on existing REC methods by reconstructing the training data and introducing a binary classification sub-task, outperforming prior approaches.