3260 papers • 126 benchmarks • 313 datasets
Zero-shot object detection (ZSD) is the task of object detection where no visual training data is available for some of the target object classes. (Image credit: Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts)
These leaderboards are used to track progress in zero-shot object detection.
Use these libraries to find zero-shot object detection models and implementations.
An open-set object detector, called Grounding DINO, is presented by marrying the Transformer-based detector DINO with grounded pre-training; it can detect arbitrary objects given human inputs such as category names or referring expressions, and performs remarkably well across all three settings.
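Grounding DINO can be run through the Hugging Face transformers integration; below is a minimal text-prompted inference sketch, assuming the IDEA-Research/grounding-dino-tiny checkpoint and a local image file (street.jpg is a placeholder path):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, GroundingDinoForObjectDetection

model_id = "IDEA-Research/grounding-dino-tiny"
processor = AutoProcessor.from_pretrained(model_id)
model = GroundingDinoForObjectDetection.from_pretrained(model_id)

image = Image.open("street.jpg")  # placeholder path
# Category names are passed as a lower-cased, period-separated prompt.
text = "a person. a bicycle. a traffic light."

inputs = processor(images=image, text=text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

results = processor.post_process_grounded_object_detection(
    outputs,
    inputs.input_ids,
    box_threshold=0.4,
    text_threshold=0.3,
    target_sizes=[image.size[::-1]],  # (height, width)
)
print(results[0]["boxes"], results[0]["labels"])
```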
This work builds ELEVATER (Evaluation of Language-augmented Visual Task-level Transfer), the first benchmark and toolkit for evaluating (pre-trained) language-augmented visual models.
A classification-free Object Localization Network (OLN) is proposed, which estimates the objectness of each region purely by how well the location and shape of the region overlap with any ground-truth object.
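The objectness targets in this scheme come from localization quality rather than class labels. A minimal sketch of that idea, using plain IoU against the best-matching ground-truth box (OLN itself uses centerness and IoU heads; this is a simplified illustration):

```python
import torch

def pairwise_iou(boxes1: torch.Tensor, boxes2: torch.Tensor) -> torch.Tensor:
    """IoU between two sets of boxes in (x1, y1, x2, y2) format."""
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    lt = torch.max(boxes1[:, None, :2], boxes2[None, :, :2])  # intersection top-left
    rb = torch.min(boxes1[:, None, 2:], boxes2[None, :, 2:])  # intersection bottom-right
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area1[:, None] + area2[None, :] - inter)

def objectness_targets(proposals: torch.Tensor, gt_boxes: torch.Tensor) -> torch.Tensor:
    # Classification-free objectness: each proposal is scored by how well
    # it overlaps *any* ground-truth object, regardless of its class.
    return pairwise_iou(proposals, gt_boxes).max(dim=1).values
```

The predicted objectness is then regressed against these targets (e.g., with an L1 loss) instead of being trained as a foreground/background classifier, which is what lets the network generalize to novel objects.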
This work distills the knowledge from a pretrained open-vocabulary image classification model (teacher) into a two-stage detector (student): the teacher model encodes category texts and image regions of object proposals, and the student detector is trained so that the region embeddings of its detected boxes align with the text and image embeddings inferred by the teacher.
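A minimal sketch of this teacher-student alignment, with a text-classification term and a distillation term; tensor shapes, the L1 choice, and the temperature value are illustrative assumptions, not the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

def teacher_student_losses(region_embeds, teacher_image_embeds, text_embeds, labels, tau=0.01):
    """Illustrative distillation-style losses (simplified sketch).

    region_embeds:        (N, D) student embeddings of proposal boxes
    teacher_image_embeds: (N, D) teacher (e.g., CLIP image encoder) embeddings
                          of the same cropped proposals
    text_embeds:          (C, D) teacher text embeddings of category names
    labels:               (N,)   class indices for the proposals
    """
    region_embeds = F.normalize(region_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)

    # Text head: classify regions by similarity to category-text embeddings.
    logits = region_embeds @ text_embeds.t() / tau
    text_loss = F.cross_entropy(logits, labels)

    # Distillation head: pull student region embeddings toward the
    # teacher's image embeddings of the same regions.
    distill_loss = F.l1_loss(region_embeds, F.normalize(teacher_image_embeds, dim=-1))

    return text_loss, distill_loss
```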
A novel loss function called 'Polarity loss' is proposed that promotes correct visual-semantic alignment for improved zero-shot object detection and refines the noisy semantic embeddings via metric learning on a 'Semantic vocabulary' of related concepts, establishing a better synergy between the visual and semantic domains.
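A simplified illustration of the polarity idea: a focal-style term is re-weighted by a monotonic penalty on the gap between each class score and the positive-class score, so negatives that approach the positive prediction are up-weighted. This is a sketch of the concept only, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def polarity_style_loss(logits, targets, gamma=2.0, beta=5.0):
    """Illustrative polarity-style loss (simplified, not the paper's exact form).

    logits:  (N, C) per-class prediction logits
    targets: (N, C) binary (multi-hot) ground-truth labels
    """
    p = logits.sigmoid()
    p_t = p * targets + (1 - p) * (1 - targets)  # prob of the true outcome
    focal = F.binary_cross_entropy_with_logits(
        logits, targets, reduction="none"
    ) * (1 - p_t).pow(gamma)

    # Score of the ground-truth (positive) class for each sample.
    p_pos = (p * targets).max(dim=1, keepdim=True).values
    # Monotonic penalty sigmoid(beta * (p - p_pos)): negatives scoring close
    # to or above the positive class are up-weighted, easy negatives are
    # suppressed, widening the positive-negative margin.
    penalty = torch.sigmoid(beta * (p - p_pos))
    return (focal * penalty).mean()
```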
A grounded language-image pre-training (GLIP) model is presented for learning object-level, language-aware, and semantic-rich visual representations; it unifies object detection and phrase grounding for pre-training and can leverage massive image-text pairs by generating grounding boxes in a self-training fashion.
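In this formulation the classifier's fixed weight matrix is replaced by alignment scores between region features and prompt-token features. A minimal sketch (shapes, names, and the temperature are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def word_region_alignment(region_feats, token_feats, tau=0.07):
    """Illustrative word-region alignment scores (simplified grounding-style head).

    region_feats: (R, D) visual features of candidate regions
    token_feats:  (T, D) language features of the prompt tokens, e.g. for
                  the prompt "person. bicycle. traffic light."
    Returns an (R, T) score matrix used in place of fixed-class logits,
    so the label space is defined entirely by the text prompt.
    """
    region_feats = F.normalize(region_feats, dim=-1)
    token_feats = F.normalize(token_feats, dim=-1)
    return region_feats @ token_feats.t() / tau
```

During training these scores are supervised with ground-truth word-region correspondences, which is what lets detection data and grounding data share a single objective.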
The OWLv2 model and the OWL-ST self-training recipe are presented; they surpass previous state-of-the-art open-vocabulary detectors already at comparable training scales and unlock Web-scale training for open-world localization, similar to what has been seen for image classification and language modelling.
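Checkpoints trained with this recipe can be queried with free-form text at inference time; a minimal sketch, assuming the transformers Owlv2 integration and the google/owlv2-base-patch16-ensemble checkpoint (street.jpg is a placeholder path):

```python
import torch
from PIL import Image
from transformers import Owlv2Processor, Owlv2ForObjectDetection

model_id = "google/owlv2-base-patch16-ensemble"
processor = Owlv2Processor.from_pretrained(model_id)
model = Owlv2ForObjectDetection.from_pretrained(model_id)

image = Image.open("street.jpg")  # placeholder path
texts = [["a photo of a person", "a photo of a bicycle"]]  # one query list per image

inputs = processor(text=texts, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process_object_detection(
    outputs=outputs, threshold=0.2, target_sizes=target_sizes
)
print(results[0]["boxes"], results[0]["scores"], results[0]["labels"])
```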
YOLO-World is introduced, an innovative approach that enhances YOLO with open-vocabulary detection capabilities through vision-language modeling and pre-training on large-scale datasets. It proposes a new Re-parameterizable Vision-Language Path Aggregation Network (RepVL-PAN) and a region-text contrastive loss to facilitate the interaction between visual and linguistic information.
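A minimal sketch of prompt-driven detection with the ultralytics package, assuming the yolov8s-world.pt checkpoint (the image path is a placeholder):

```python
from ultralytics import YOLO

# Load a pretrained YOLO-World checkpoint.
model = YOLO("yolov8s-world.pt")

# Define the open vocabulary at run time; the prompt embeddings can later
# be re-parameterized into the network for prompt-free deployment.
model.set_classes(["person", "bus", "backpack"])

results = model.predict("street.jpg")  # placeholder path
results[0].show()
```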
The integration of domain-specific indices (NCGI) and prompt optimization techniques provides an effective solution for plant phenotyping, highlighting the potential of weakly supervised models in agricultural computer vision, where extensive manual annotation is impractical.
A novel approach to zero-shot object detection (ZSD), where no visual training data is available for some of the target object classes, is presented, in which a convex combination of embeddings is used in conjunction with a detection framework.
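A minimal sketch of the convex-combination idea: the detector's seen-class scores weight the seen-class word embeddings, and the resulting semantic vector is matched against unseen-class embeddings. Shapes and function names are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def semantic_projection(seen_probs: torch.Tensor, seen_word_vecs: torch.Tensor) -> torch.Tensor:
    """Project detections into semantic space as a convex combination of
    seen-class word embeddings, weighted by the detector's class scores.

    seen_probs:     (N, S) softmax scores over the seen classes
    seen_word_vecs: (S, D) word embeddings of the seen classes
    """
    return seen_probs @ seen_word_vecs  # (N, D)

def classify_unseen(pred_embeds: torch.Tensor, unseen_word_vecs: torch.Tensor) -> torch.Tensor:
    # Assign each detection to the nearest unseen-class embedding
    # by cosine similarity.
    sims = F.normalize(pred_embeds, dim=-1) @ F.normalize(unseen_word_vecs, dim=-1).t()
    return sims.argmax(dim=-1)
```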