3260 papers • 126 benchmarks • 313 datasets
Few-Shot Object Detection is a computer vision task in which a detector must localize and classify objects given only limited training data. The goal is to train a model on a few annotated examples of each novel object class and then use the model to detect instances of those classes in new images.
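The few-shot setting can be illustrated with a minimal nearest-prototype classifier over region embeddings. This is only a sketch: real FSOD systems extract features with a detector backbone, whereas the class names and feature vectors below are hand-made toy values.

```python
# Minimal sketch of the few-shot setting: build a prototype per class
# from a handful of support embeddings, then assign a query embedding
# to the nearest prototype. All vectors here are illustrative.

def prototype(support):
    """Mean embedding of the few support examples for one class."""
    dim = len(support[0])
    return [sum(v[i] for v in support) / len(support) for i in range(dim)]

def classify(query, prototypes):
    """Assign the query embedding to the nearest class prototype."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda cls: dist(query, prototypes[cls]))

# 2-shot support sets for two hypothetical novel classes
support = {
    "cat": [[1.0, 0.1], [0.9, 0.2]],
    "dog": [[0.1, 1.0], [0.2, 0.8]],
}
prototypes = {cls: prototype(vecs) for cls, vecs in support.items()}
print(classify([0.95, 0.15], prototypes))  # nearest prototype: "cat"
```

With only two labelled examples per class, the prototype is a crude class model; the papers summarized below replace this step with learned metric spaces, reweighting modules, or fine-tuned detector heads.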
These leaderboards are used to track progress in Few-Shot Object Detection.
Use these libraries to find Few-Shot Object Detection models and implementations.
This work builds ELEVATER (Evaluation of Language-augmented Visual Task-level Transfer), the first benchmark and toolkit for evaluating (pre-trained) language-augmented visual models.
This work finds that fine-tuning only the last layer of existing detectors on rare classes is crucial to the few-shot object detection task, and establishes a new state of the art on the revised benchmarks.
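The two-stage fine-tuning idea can be sketched in a few lines: the backbone stays frozen (simulated below by fixed, precomputed feature vectors) and only the last linear layer is updated on the few novel-class examples. The data, learning rate, and loop count are illustrative assumptions, not values from the paper.

```python
# Sketch of last-layer-only fine-tuning: the "frozen backbone" output is
# a fixed feature vector per box, and plain gradient descent on a log
# loss updates only the final linear classifier (w, b).

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Frozen-backbone features for a few labelled boxes (toy values)
features = [[1.0, 0.2], [0.9, 0.1], [0.1, 0.8], [0.2, 1.0]]
labels = [1, 1, 0, 0]  # 1 = novel class, 0 = background

# Only these last-layer parameters are trainable
w, b, lr = [0.0, 0.0], 0.0, 0.5

for _ in range(200):  # simple SGD over the tiny support set
    for x, y in zip(features, labels):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        g = p - y  # gradient of the log loss w.r.t. the logit
        w = [wi - lr * g * xi for wi, xi in zip(w, x)]
        b -= lr * g

# A novel-class-like feature should now score above 0.5
score = sigmoid(sum(wi * xi for wi, xi in zip(w, [0.95, 0.15])) + b)
print(score > 0.5)
```

Freezing the backbone prevents the handful of novel-class boxes from overwriting representations learned on abundant base-class data, which is the intuition behind fine-tuning only the last layer.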
This work develops a few-shot object detector that can learn to detect novel objects from only a few annotated examples, using a meta feature learner and a reweighting module within a one-stage detection architecture.
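The reweighting idea can be sketched as a channel-wise scaling of query features by coefficients derived from the support examples of a class. In the actual detector these coefficients are produced by a learned meta feature learner; the averaging and the toy feature values below are a simplifying assumption for illustration.

```python
# Sketch of feature reweighting: per-channel coefficients computed from
# the support set scale the query features, emphasising channels that
# respond to the target class. Shapes and values are toy stand-ins.

def reweighting_coefficients(support_feats):
    """One coefficient per channel: mean support activation per channel."""
    n, c = len(support_feats), len(support_feats[0])
    return [sum(f[ch] for f in support_feats) / n for ch in range(c)]

def reweight(query_feat, coeffs):
    """Channel-wise product of query features and class coefficients."""
    return [q * w for q, w in zip(query_feat, coeffs)]

support = [[0.9, 0.1, 0.5], [1.1, 0.1, 0.3]]  # 2-shot support features
query = [1.0, 1.0, 1.0]
coeffs = reweighting_coefficients(support)    # roughly [1.0, 0.1, 0.4]
print(reweight(query, coeffs))
```

The reweighted features are then passed to the prediction layers, so the same one-stage detector can be re-targeted to a new class simply by swapping in that class's coefficients.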
This work proposes a Multi-scale Positive Sample Refinement (MPSR) approach to enrich object scales in FSOD, which generates multi-scale positive samples as object pyramids and refines the prediction at various scales.
A class margin equilibrium (CME) approach is proposed, with the aim of optimizing both feature-space partition and novel-class reconstruction in a systematic way, achieving state-of-the-art performance.
Mask R-CNN is extended with a Siamese backbone that encodes both a reference image and the scene, allowing detection and segmentation to be targeted at the reference category.
A novel few-shot object detection network detects objects of unseen categories from only a few annotated examples, exploiting the similarity between the few-shot support set and the query set to detect novel objects while suppressing false detections in the background.
A grounded language-image pretraining model for learning object-level, language-aware, and semantic-rich visual representations that unifies object detection and phrase grounding for pre-training and can leverage massive image-text pairs by generating grounding boxes in a self-training fashion.
An overview of the 1st NTIRE 2025 CD-FSOD Challenge is presented, highlighting the proposed solutions and summarizing the results submitted by the participants.
This study proposes to integrally migrate pre-trained transformer encoder-decoders (imTED) to a detector, constructing a feature extraction path which is "fully pre-trained" so that detectors’ generalization capacity is maximized.