Object Discovery is the task of identifying previously unseen objects. Source: Unsupervised Object Discovery and Segmentation of RGBD-images
An architectural component is presented that interfaces with perceptual representations, such as the output of a convolutional neural network, and produces a set of task-dependent abstract representations. These representations are exchangeable and can bind to any object in the input by specializing through a competitive procedure over multiple rounds of attention.
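The competitive binding described above can be sketched in a few lines of NumPy. This is an illustrative toy (the function name, dimensions, and initialisation are assumptions, not the paper's implementation): randomly initialised, exchangeable slots repeatedly attend over input features, with a softmax over slots so that slots compete for each feature.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def competitive_attention(inputs, num_slots=4, iters=3, dim=16, seed=0):
    """Toy sketch: slots compete for input features over several rounds.

    inputs: (n, dim) array of perceptual features (e.g. CNN output locations).
    Returns a (num_slots, dim) array of slot representations.
    """
    rng = np.random.default_rng(seed)
    slots = rng.normal(size=(num_slots, dim))      # exchangeable initial slots
    for _ in range(iters):
        # Attention logits between every slot and every input feature.
        logits = slots @ inputs.T / np.sqrt(dim)   # (num_slots, n)
        # Softmax over *slots*: each feature is divided among slots,
        # which is what makes the procedure competitive.
        attn = softmax(logits, axis=0)
        # Normalise per slot, then update each slot as a weighted mean.
        weights = attn / (attn.sum(axis=1, keepdims=True) + 1e-8)
        slots = weights @ inputs                   # (num_slots, dim)
    return slots
```

Because the slots are initialised randomly and updated identically, no slot is tied to a particular object class; which slot binds to which object is decided only by the competition.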
The Multi-Object Network (MONet) is developed, which is capable of learning to decompose and represent challenging 3D scenes into semantically meaningful components, such as objects and background elements.
An end-to-end-trainable attention module is presented that can bootstrap standard convolutional neural network (CNN) architectures for image classification, demonstrating superior generalisation on 6 unseen benchmark datasets.
A classification-free Object Localization Network (OLN) is proposed which estimates the objectness of each region purely by how well the location and shape of a region overlap with any ground-truth object.
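The objectness target described above reduces to a localisation-quality measure: how well a candidate region overlaps any ground-truth object. A minimal sketch using box IoU as the overlap measure (the helper names are illustrative; OLN's actual training targets are defined in the paper):

```python
def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def objectness(region, gt_boxes):
    """Classification-free objectness: best overlap with any ground-truth box."""
    return max((box_iou(region, g) for g in gt_boxes), default=0.0)
```

Since no class label enters the score, a region that tightly covers an object of a category never seen in training can still receive high objectness.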
This paper describes an efficient policy gradient method using positive memory retention, which significantly improves sample efficiency and shows state-of-the-art performance in a real-world visual object discovery game.
This paper identifies and characterises artifacts in the feature maps of both supervised and self-supervised ViT networks, and proposes a simple yet effective solution: adding extra tokens to the Vision Transformer's input sequence to take over the role those artifact tokens were serving.
Generative latent-variable models are emerging as promising tools in robotics and reinforcement learning. Yet, even though tasks in these domains typically involve distinct objects, most state-of-the-art generative models do not explicitly capture the compositional nature of visual scenes. Two recent exceptions, MONet and IODINE, decompose scenes into objects in an unsupervised fashion. Their underlying generative processes, however, do not account for component interactions. Hence, neither of them allows for principled sampling of novel scenes. Here we present GENESIS, the first object-centric generative model of 3D visual scenes capable of both decomposing and generating scenes by capturing relationships between scene components. GENESIS parameterises a spatial GMM over images which is decoded from a set of object-centric latent variables that are either inferred sequentially in an amortised fashion or sampled from an autoregressive prior. We train GENESIS on several publicly available datasets and evaluate its performance on scene generation, decomposition, and semi-supervised learning.
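The "spatial GMM over images" in GENESIS (and in mixture-based models like MONet) amounts to a per-pixel mixture: each component decodes a reconstruction and a mask, and a softmax over components gives pixel-wise mixing weights. A minimal single-channel sketch of that likelihood, assuming fixed component noise `sigma` (an illustrative simplification, not the authors' code):

```python
import numpy as np

def spatial_mixture_loglik(x, means, mask_logits, sigma=0.1):
    """Log-likelihood of image x under a per-pixel mixture of Gaussians.

    x:           (H, W) observed image (one channel for simplicity).
    means:       (K, H, W) component reconstructions (objects, background).
    mask_logits: (K, H, W) unnormalised mask logits; a softmax over the K
                 components gives the per-pixel mixing weights.
    """
    # Log-softmax over components -> log mixing weights at each pixel.
    log_pi = mask_logits - np.logaddexp.reduce(mask_logits, axis=0)
    # Per-component Gaussian log-density at each pixel.
    log_comp = (-0.5 * ((x[None] - means) / sigma) ** 2
                - np.log(sigma * np.sqrt(2.0 * np.pi)))
    # Log-sum-exp over components, then sum over pixels.
    return float(np.logaddexp.reduce(log_pi + log_comp, axis=0).sum())
```

Sampling the component latents from an autoregressive prior and decoding them through such a mixture is what lets the model generate coherent novel scenes rather than only decompose observed ones.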
This work proposes a simple approach, LOST, that leverages the activation features of a vision transformer pre-trained in a self-supervised manner, and outperforms state-of-the-art object discovery methods by up to 8 CorLoc points on PASCAL VOC 2012.
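The core seed-selection idea behind LOST can be sketched on raw patch features: background patches tend to correlate positively with many other patches, so the patch with the fewest positive correlations is taken as an object seed, and positively correlated patches are grouped with it. A toy sketch under those assumptions (not the released implementation):

```python
import numpy as np

def lost_style_seed_mask(feats):
    """Toy sketch of LOST-style seed selection on ViT patch features.

    feats: (n_patches, d) self-supervised patch features.
    Returns (seed_index, boolean mask of patches linked to the seed).
    """
    sims = feats @ feats.T
    np.fill_diagonal(sims, 0.0)          # ignore self-similarity
    pos = sims > 0
    # Seed: the patch with the fewest positive correlations, assumed to
    # lie on a (small) foreground object rather than the background.
    seed = int(pos.sum(axis=1).argmin())
    mask = pos[seed].copy()
    mask[seed] = True
    return seed, mask
```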
The Phase-Correlation Decomposition Network (PCDNet), a novel model that decomposes a scene into its object components, which are represented as transformed versions of a set of learned object prototypes, is proposed.
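Phase correlation itself is a classical image-registration technique: the normalised cross-power spectrum of two images has an inverse FFT that peaks at their relative translation, which is what lets a learned prototype be localised anywhere in the scene. A minimal sketch of the classical estimator (illustrative, not PCDNet's implementation):

```python
import numpy as np

def phase_correlation_shift(a, b):
    """Estimate the (row, col) translation taking a to b via the
    normalised cross-power spectrum (classic phase correlation)."""
    fa, fb = np.fft.fft2(a), np.fft.fft2(b)
    r = np.conj(fa) * fb
    r /= np.abs(r) + 1e-12          # keep only the phase
    corr = np.fft.ifft2(r).real     # peaks at the relative shift
    idx = np.unravel_index(corr.argmax(), corr.shape)
    # Map FFT indices back to signed shifts.
    return tuple(int(i) if i <= s // 2 else int(i) - s
                 for i, s in zip(idx, corr.shape))
```

Because only the spectrum's phase is kept, the estimate is insensitive to global intensity changes, which makes the match depend on structure rather than brightness.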