3260 papers • 126 benchmarks • 313 datasets
(Image credit: Siamese Mask R-CNN)
(Image credit: Papersgraph)
These leaderboards are used to track progress in one-shot object detection.
Use these libraries to find one-shot object detection models and implementations.
No subtasks available.
Siamese Mask R-CNN extends Mask R-CNN with a Siamese backbone that encodes both the reference image and the scene, allowing detection and segmentation to be targeted at the reference category.
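The core idea above can be sketched as computing a similarity map between a reference embedding and every spatial location of the scene feature map. This is a minimal illustration, not the actual Siamese Mask R-CNN architecture; the function name and pooling choice are assumptions.

```python
import numpy as np

def match_reference(scene_feat, ref_feat):
    """Hypothetical Siamese-matching sketch: pool the reference crop's
    feature map (C, H, W) to one vector, then compute cosine similarity
    against every spatial location of the scene feature map, yielding a
    similarity map that can steer detection toward the reference category."""
    ref = ref_feat.mean(axis=(1, 2))                    # global average pool
    ref = ref / (np.linalg.norm(ref) + 1e-8)            # L2-normalize
    c, h, w = scene_feat.shape
    flat = scene_feat.reshape(c, -1)
    flat = flat / (np.linalg.norm(flat, axis=0, keepdims=True) + 1e-8)
    return (ref @ flat).reshape(h, w)                   # cosine similarity map
```

In a real detector this map would condition the region proposal and head stages rather than being used directly.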
Quasi-Dense Similarity Learning is presented, which densely samples hundreds of region proposals on a pair of images for contrastive learning and which outperforms all existing methods on MOT, BDD100K, Waymo, and TAO tracking benchmarks.
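The contrastive learning over sampled region proposals described above can be illustrated with a standard InfoNCE-style loss on proposal embeddings. This is a generic sketch of the loss family, not the exact Quasi-Dense Similarity Learning formulation; the temperature value and function signature are assumptions.

```python
import numpy as np

def contrastive_loss(query, pos, negs, temperature=0.1):
    """InfoNCE-style loss for one query proposal embedding against one
    positive and several negative proposal embeddings.
    All embeddings are L2-normalized before computing similarities."""
    q = query / np.linalg.norm(query)
    p = pos / np.linalg.norm(pos)
    n = negs / np.linalg.norm(negs, axis=1, keepdims=True)
    logits = np.concatenate([[q @ p], n @ q]) / temperature
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # positive sits at index 0
```

Densely sampling hundreds of proposals per image pair simply means evaluating many such query/positive/negative triplets per batch.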
The paper presents a holistic approach for designing such systems: the data collection and training stages, the CNN architecture, and the optimizations needed to map the CNN efficiently onto a lightweight embedded processing platform suitable for deployment on UAVs.
This paper proposes a strong recipe for transferring image-text models to open-vocabulary object detection using a standard Vision Transformer architecture with minimal modifications, contrastive image-text pre-training, and end-to-end detection fine-tuning.
A novel CoAE framework develops a squeeze-and-co-excitation scheme that adaptively emphasizes correlated feature channels to help uncover relevant proposals and, eventually, the target objects; it also designs a margin-based ranking loss that implicitly learns a metric to predict the similarity of a region proposal to the underlying query.
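The squeeze-and-co-excitation idea can be sketched as squeezing the query feature map to a channel descriptor and using it to gate the channels of both feature maps. This is a minimal illustration under assumed shapes; the weight matrix `w` stands in for the learned excitation layers of the actual CoAE model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def co_excitation(query_feat, target_feat, w):
    """Hypothetical squeeze-and-co-excitation sketch: squeeze the query
    feature map (C, H, W) to a per-channel descriptor, pass it through a
    learned matrix `w` (C, C), and re-weight the channels of BOTH feature
    maps so channels correlated with the query are emphasized."""
    squeezed = query_feat.mean(axis=(1, 2))      # global average pool -> (C,)
    excite = sigmoid(w @ squeezed)               # per-channel gates in (0, 1)
    gate = excite[:, None, None]
    return query_feat * gate, target_feat * gate
```

Applying the same gates to both maps is what makes the excitation "co-": the query decides which channels matter in the target image as well.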
Experimental evaluation shows that the one-stage system that performs localization and recognition jointly can detect unseen classes and outperforms several baselines by a significant margin.
A two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module is introduced, the combination of which integrates metric learning with an anchor-free Faster R-CNN-style detection pipeline, eventually eliminating the need to fine-tune on the support images.
Instance-level feature matching is critical to the success of modern one-shot object detectors. Recently, methods based on the metric-learning paradigm have made impressive progress. Most of these works only measure the relations between query and target objects on a single level, resulting in suboptimal performance overall. In this paper, we introduce balanced and hierarchical learning for our detector. The contributions are two-fold: firstly, a novel Instance-level Hierarchical Relation (IHR) module is proposed to encode the contrastive-level, salient-level, and attention-level relations simultaneously to enhance the query-relevant similarity representation. Secondly, we notice that the batch training of the IHR module is substantially hindered by the positive-negative sample imbalance in the one-shot scenario. We then introduce a simple but effective Ratio-Preserving Loss (RPL) to protect the learning of rare positive samples and suppress the effects of negative samples. Our loss can adjust the weight for each sample adaptively, ensuring the desired positive-negative ratio consistency and boosting query-related IHR learning. Extensive experiments show that our method outperforms the state-of-the-art method by 1.6% and 1.3% on PASCAL VOC and MS COCO datasets for unseen classes, respectively. The code will be available at https://github.com/hero-y/BHRL.
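The ratio-preserving weighting described above can be sketched as a weighted binary cross-entropy in which per-sample weights normalize the positive and negative contributions to a fixed ratio, whatever the batch imbalance. This is an illustrative reconstruction of the idea, not the paper's exact RPL formulation; the function name and the `pos_neg_ratio` parameter are assumptions.

```python
import numpy as np

def ratio_preserving_bce(scores, labels, pos_neg_ratio=1.0, eps=1e-7):
    """Sketch of a ratio-preserving loss: scale per-sample BCE terms so
    positives and negatives contribute to the total at a fixed ratio,
    protecting rare positives in heavily imbalanced one-shot batches."""
    scores = np.clip(scores, eps, 1.0 - eps)
    bce = -(labels * np.log(scores) + (1.0 - labels) * np.log(1.0 - scores))
    n_pos = max(labels.sum(), 1.0)
    n_neg = max((1.0 - labels).sum(), 1.0)
    weights = np.where(labels == 1, pos_neg_ratio / n_pos, 1.0 / n_neg)
    return (weights * bce).sum()
```

With `pos_neg_ratio=1.0`, a single positive sample carries as much total weight as all negatives combined, so it is never drowned out.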
This work proposes a novel and efficient decision-based attack against black-box models, dubbed FastDrop, which only requires a few queries, works well under strong defenses, and generates adversarial examples by dropping information in the frequency domain.
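Dropping information in the frequency domain can be illustrated by keeping only a low-frequency block of the image's 2-D DFT and reconstructing. This is a generic sketch in the spirit of the description, not the FastDrop algorithm itself; the square-mask scheme and `keep` parameter are assumptions.

```python
import numpy as np

def drop_frequencies(img, keep=8):
    """Sketch of frequency-domain information dropping: keep only the
    central `keep` x `keep` block of 2-D DFT coefficients (after shifting
    the zero frequency to the center) and reconstruct the image."""
    f = np.fft.fftshift(np.fft.fft2(img))
    mask = np.zeros_like(f)
    c = img.shape[0] // 2
    half = keep // 2
    mask[c - half:c + half, c - half:c + half] = 1   # retain low frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))
```

A query-efficient attack in this style would progressively drop frequency components until the model's decision flips, using only the model's top-1 output.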
This work proposes a new method for DML that simultaneously learns the backbone network parameters, the embedding space, and the multi-modal distribution of each of the training categories in that space, in a single end-to-end training process.