3260 papers • 126 benchmarks • 313 datasets
Compositional Zero-Shot Learning (CZSL) is a computer vision task in which the goal is to recognize unseen compositions formed from states and objects that were seen during training. The key challenge in CZSL is the inherent entanglement between the state and the object within the context of an image. Example benchmarks for this task are MIT-States, UT-Zappos, and C-GQA. Models are usually evaluated with the accuracy on both seen and unseen compositions, as well as their Harmonic Mean (HM). ( Image credit: Heosuab )
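The evaluation protocol above can be sketched in a few lines. This is an illustrative implementation of the standard metrics (seen accuracy, unseen accuracy, and their harmonic mean), not taken from any specific benchmark toolkit; the function names are assumptions.

```python
def accuracy(predictions, labels):
    """Fraction of correctly predicted (state, object) pairs."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

def harmonic_mean(seen_acc: float, unseen_acc: float) -> float:
    """HM = 2 * S * U / (S + U); defined as 0.0 when both are zero.

    The harmonic mean penalizes models that trade unseen-composition
    accuracy for seen-composition accuracy (or vice versa).
    """
    if seen_acc + unseen_acc == 0:
        return 0.0
    return 2 * seen_acc * unseen_acc / (seen_acc + unseen_acc)
```

For instance, a model with 60% seen and 30% unseen accuracy gets an HM of 40%, lower than the arithmetic mean of 45%, reflecting the imbalance.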
These leaderboards are used to track progress in Compositional Zero-Shot Learning.
Use these libraries to find Compositional Zero-Shot Learning models and implementations.
While the simple CZSL model achieves state-of-the-art performance in the closed-world scenario, the feasibility scores boost the performance of the approach in the open-world setting, clearly outperforming the previous state of the art.
This paper proposes to dive deep into the architecture and insert adapters, a parameter-efficient technique proven effective in large language models, into each CLIP encoder layer, equipping the adapters with concept awareness so that concept-specific features of "object", "attribute", and "composition" can be extracted.
This work proposes a new approach, Compositional Cosine Graph Embedding (Co-CGE), which achieves state-of-the-art performance in standard CZSL while outperforming previous methods in the open-world scenario.
This paper proposes a previously ignored principle of attribute-object transformation, symmetry, and builds a transformation framework inspired by group theory, SymNet, which can be applied to the Compositional Zero-Shot Learning task and outperforms the state of the art on widely used benchmarks.
A causal-inspired embedding model that learns disentangled representations of elementary components of visual objects from correlated (confounded) training data is presented, and improvements compared to strong baselines are shown.
A novel graph formulation called Compositional Graph Embedding (CGE) that learns image features, compositional classifiers, and latent representations of visual primitives in an end-to-end manner, and significantly outperforms the state of the art on MIT-States and UT-Zappos in the challenging generalized compositional zero-shot setting.
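Embedding-based methods such as CGE and Co-CGE share a common recipe: map images and state-object compositions into a shared space, then classify by similarity. A minimal sketch of that recipe follows; the additive composition and cosine scoring here are toy assumptions for illustration, not either paper's actual architecture (CGE, for example, learns the composition function with a graph network).

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
states = ["wet", "dry"]
objects = ["dog", "car"]

# Toy "learned" primitive embeddings: one vector per state and per object.
state_emb = {s: rng.normal(size=dim) for s in states}
object_emb = {o: rng.normal(size=dim) for o in objects}

def compose(state, obj):
    """Compose primitive embeddings into a composition embedding
    (a simple sum here; real methods learn this, e.g. via a GNN)."""
    return state_emb[state] + object_emb[obj]

def classify(image_emb, compositions):
    """Return the composition whose embedding is most cosine-similar
    to the image embedding; unseen pairs are scored the same way."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(compositions, key=lambda c: cos(image_emb, compose(*c)))

# Candidate label space: all pairs, including ones unseen in training.
all_pairs = [(s, o) for s in states for o in objects]
pred = classify(compose("wet", "dog"), all_pairs)  # image near "wet dog"
```

Because unseen compositions are built from the same primitive embeddings as seen ones, the classifier can score them without ever having observed a training image for that pair.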
ProtoProp, a novel prototype propagation graph method, is proposed; it outperforms state-of-the-art results in the generalized compositional zero-shot setting, and ablations show the importance of each part of the method and its contribution to the final results.
A novel model for recognizing images with composite attribute-object concepts, notably composite concepts unseen during model training, is proposed, along with a blocking mechanism that equalizes the information available to the model for both seen and unseen concepts.
A Relative Moving Distance (RMD) based method that uses the attribute change, rather than the attribute pattern itself, to classify attributes; it is suitable for complex compositions of multiple attributes and objects when incorporating attribute correlations.