3260 papers • 126 benchmarks • 313 datasets
Image credit: ClevrTex: A Texture-Rich Benchmark for Unsupervised Multi-Object Segmentation
These leaderboards are used to track progress in Unsupervised Object Segmentation.
Use these libraries to find Unsupervised Object Segmentation models and implementations.
This work argues for the importance of learning to segment and represent objects jointly, and demonstrates that, starting from the simple assumption that a scene is composed of multiple entities, it is possible to learn to segment images into interpretable objects with disentangled representations.
The Multi-Object Network (MONet) is developed, which is capable of learning to decompose and represent challenging 3D scenes into semantically meaningful components, such as objects and background elements.
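MONet's decomposition can be sketched via its recursive attention scheme: a running "scope" tracks the part of the image not yet explained, and each attention step carves a mask out of that scope, so the masks sum to one by construction. A minimal numpy sketch, assuming precomputed attention maps (`attentions` is a hypothetical stand-in for the outputs of MONet's attention network):

```python
import numpy as np

def monet_style_masks(attentions):
    """Recursive decomposition sketch: each step's attention map alpha_k
    claims a fraction of the remaining scope, and the final slot absorbs
    whatever is left, so the masks form a partition of unity."""
    scope = np.ones_like(attentions[0])
    masks = []
    for a in attentions:
        masks.append(scope * a)        # mask_k = scope_k * alpha_k
        scope = scope * (1.0 - a)      # scope_{k+1} = scope_k * (1 - alpha_k)
    masks.append(scope)                # last slot takes the remainder
    return masks

# Toy check on a 2x2 "image" with two attention steps
atts = [np.full((2, 2), 0.7), np.full((2, 2), 0.5)]
masks = monet_style_masks(atts)
print(np.allclose(sum(masks), 1.0))  # masks sum to one at every pixel
```

In the real model the attention maps come from a learned network and each mask gates a VAE that reconstructs its component; only the masking recursion is shown here.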
This work proposes an embedding-based approach in which pixel embeddings are clustered in a differentiable fashion using a stochastic stick-breaking process, yielding GENESIS-v2, a new model that can infer a variable number of object representations without using RNNs or iterative refinement.
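The stick-breaking step can be illustrated in isolation. Below is a minimal sketch, assuming Gaussian-kernel similarities between pixel embeddings and a set of seed embeddings; `seeds` and `sigma` are illustrative choices, not the paper's exact formulation:

```python
import numpy as np

def stick_breaking_masks(embeddings, seeds, sigma=1.0):
    """Group pixel embeddings around seed embeddings: each seed's kernel
    claims a fraction of the remaining 'stick' (unexplained mass), so the
    resulting soft masks sum to one without fixing the number of objects
    up front."""
    scope = np.ones(embeddings.shape[0])
    masks = []
    for s in seeds:
        # soft assignment of each pixel to the current seed, in [0, 1]
        pi = np.exp(-np.sum((embeddings - s) ** 2, axis=1) / (2 * sigma ** 2))
        masks.append(scope * pi)
        scope = scope * (1.0 - pi)
    masks.append(scope)  # leftover stick covers unexplained pixels
    return np.stack(masks)

# Three pixel embeddings, two of them near seed 0 and one near seed 1
emb = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]])
m = stick_breaking_masks(emb, seeds=[emb[0], emb[2]])
print(np.allclose(m.sum(axis=0), 1.0))  # masks partition each pixel
```

Because the kernel and products are differentiable, gradients flow through the clustering, which is the property the paper exploits.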
Generative latent-variable models are emerging as promising tools in robotics and reinforcement learning. Yet, even though tasks in these domains typically involve distinct objects, most state-of-the-art generative models do not explicitly capture the compositional nature of visual scenes. Two recent exceptions, MONet and IODINE, decompose scenes into objects in an unsupervised fashion. Their underlying generative processes, however, do not account for component interactions. Hence, neither of them allows for principled sampling of novel scenes. Here we present GENESIS, the first object-centric generative model of 3D visual scenes capable of both decomposing and generating scenes by capturing relationships between scene components. GENESIS parameterises a spatial GMM over images which is decoded from a set of object-centric latent variables that are either inferred sequentially in an amortised fashion or sampled from an autoregressive prior. We train GENESIS on several publicly available datasets and evaluate its performance on scene generation, decomposition, and semi-supervised learning.
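The spatial GMM at the heart of GENESIS can be sketched directly: each object slot decodes per-pixel means and mixing logits, and pixels are scored under the resulting per-pixel mixture. A minimal numpy sketch, with an assumed fixed component scale `sigma` (the shapes and constant scale are simplifying assumptions):

```python
import numpy as np

def spatial_gmm_loglik(x, means, mask_logits, sigma=0.1):
    """Log-likelihood of an image under a spatial Gaussian mixture.
    Shapes: x (P, C) pixels, means (K, P, C) per-slot decoded means,
    mask_logits (K, P) per-slot mixing logits."""
    # softmax over slots -> per-pixel mixing weights
    w = np.exp(mask_logits - mask_logits.max(axis=0, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)
    # per-component Gaussian log-densities with independent channels
    sq = ((x[None] - means) ** 2).sum(axis=-1)                      # (K, P)
    log_comp = -sq / (2 * sigma ** 2) \
        - x.shape[1] * np.log(sigma * np.sqrt(2 * np.pi))
    # log-sum-exp over slots, then sum over pixels
    m = log_comp.max(axis=0)
    return (m + np.log((w * np.exp(log_comp - m)).sum(axis=0))).sum()

# Toy usage: 4 pixels, 2 channels, K=2 slots; slot 0 matches the image
x = np.zeros((4, 2))
means = np.stack([np.zeros((4, 2)), np.ones((4, 2))])
logits = np.stack([np.full(4, 5.0), np.full(4, -5.0)])  # favour slot 0
good = spatial_gmm_loglik(x, means, logits)
bad = spatial_gmm_loglik(x, means, -logits)             # favour wrong slot
```

Weighting the correct component highly yields a larger log-likelihood, which is the training signal; in GENESIS the means and logits are decoded from latents inferred sequentially or sampled from the autoregressive prior.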
The Phase-Correlation Decomposition Network (PCDNet), a novel model that decomposes a scene into its object components, which are represented as transformed versions of a set of learned object prototypes, is proposed.
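The phase-correlation trick underlying PCDNet is classical and easy to demonstrate: the normalized cross-power spectrum of an image and an object prototype peaks at the prototype's translation. A minimal sketch (integer shifts only; PCDNet's differentiable decomposition pipeline is considerably more involved):

```python
import numpy as np

def phase_correlation_shift(img, proto):
    """Estimate the cyclic shift of `proto` inside `img` by phase
    correlation: keep only the phase of the cross-power spectrum, then
    locate the peak of its inverse FFT."""
    F1, F2 = np.fft.fft2(img), np.fft.fft2(proto)
    cross = F1 * np.conj(F2)
    cross = cross / (np.abs(cross) + 1e-8)   # keep phase, drop magnitude
    corr = np.fft.ifft2(cross).real
    return tuple(int(i) for i in np.unravel_index(np.argmax(corr), corr.shape))

# Toy prototype shifted by (2, 3); phase correlation recovers the shift
proto = np.zeros((8, 8))
proto[1:3, 1:3] = 1.0
img = np.roll(proto, shift=(2, 3), axis=(0, 1))
print(phase_correlation_shift(img, proto))  # -> (2, 3)
```

PCDNet applies this idea with a set of learned prototypes, so each scene component is explained as a transformed prototype.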
A novel copy-pasting GAN framework is proposed, where the generator learns to discover an object in one image by compositing it into another image such that the discriminator cannot tell that the resulting image is fake.
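The copy-paste operation itself is simple alpha compositing; the learning signal comes from the discriminator judging the result. A minimal sketch of the compositing step (array shapes are illustrative assumptions):

```python
import numpy as np

def composite(foreground, background, mask):
    """Alpha-blend a foreground into a background with a soft mask.
    foreground/background: (H, W, C); mask: (H, W) in [0, 1]."""
    m = mask[..., None]                      # broadcast mask over channels
    return m * foreground + (1.0 - m) * background

# Toy usage: paste a patch of the 'foreground' into a blank background
fg = np.ones((4, 4, 3))
bg = np.zeros((4, 4, 3))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                         # the region the generator "copies"
out = composite(fg, bg, mask)
```

In the GAN, the generator produces `mask`; if the composite looks fake, the discriminator penalizes it, pushing the mask toward genuine object boundaries.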
ReDO, a new model able to extract objects from images in an unsupervised way without any annotation, is presented; it is based on the idea that it should be possible to change the textures or colors of the objects without changing the overall distribution of the dataset.
A novel framework to build a model that can learn how to segment objects from a collection of images without any human annotation is introduced, based on the observation that the location of object segments can be perturbed locally relative to a given background without affecting the realism of a scene.
This work demonstrates that large-scale unsupervised models can also perform a more challenging object segmentation task, requiring neither pixel-level nor image-level labeling, and achieves new state-of-the-art results.
This work presents the first truly end-to-end zero-shot object segmentation from videos; it not only develops generic objectness for segmentation and tracking, but also outperforms prevalent image-based contrastive learning methods without augmentation engineering.