Multi-label image recognition with partial labels (MLR-PL), in which some labels are known while others are unknown for each multi-label image, aims to train MLR models with partial labels to reduce annotation cost. Since existing MLR datasets are completely labeled, current works create partially annotated datasets by randomly dropping a certain proportion of positive and negative labels, and report results for known-label proportions ranging from 10% to 90%.
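The label-dropping protocol described above can be sketched as follows. This is a minimal illustration, not any specific paper's code; the `drop_labels` helper, the +1/−1/0 encoding, and the seed are assumptions.

```python
import numpy as np

def drop_labels(labels, known_prop, seed=0):
    """Randomly keep `known_prop` of the labels per image; the rest become
    unknown, encoded as 0. Known labels stay +1 (positive) / -1 (negative).
    Hypothetical helper for illustration only."""
    rng = np.random.default_rng(seed)
    mask = rng.random(labels.shape) < known_prop  # True = label kept
    return np.where(mask, labels, 0)              # 0 marks unknown

# Toy example: 4 images, 5 categories, keep roughly 50% of the labels
full = np.sign(np.random.default_rng(1).standard_normal((4, 5))).astype(int)
partial = drop_labels(full, known_prop=0.5)
```

Sweeping `known_prop` from 0.1 to 0.9 reproduces the 10%–90% known-label settings used in the benchmarks.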
These leaderboards are used to track progress in multi-label image recognition with partial labels.
Use these libraries to find models and implementations for multi-label image recognition with partial labels.
A Semantic-Specific Graph Representation Learning (SSGRL) framework consisting of two modules: a semantic decoupling module that incorporates category semantics to guide the learning of semantic-specific representations, and a semantic interaction module that correlates these representations with a graph built on statistical label co-occurrence and explores their interactions via a graph propagation mechanism.
The task of multi-label image recognition is to predict the set of object labels present in an image. As objects normally co-occur in an image, it is desirable to model label dependencies to improve recognition performance. To capture and exploit such information, we propose graph convolutional network (GCN) based models for multi-label image recognition, where directed graphs are constructed over classes and information is propagated between classes to learn inter-dependent class-level representations. Following this idea, we design two models that approach multi-label classification from different views. In the first model, prior knowledge about class dependencies is integrated into classifier learning: we propose Classifier Learning GCN (C-GCN) to map class-level semantic representations (e.g., word embeddings) into classifiers that maintain the inter-class topology. In the second model, we decompose the visual representation of an image into a set of label-aware features and propose Prediction Learning GCN (P-GCN) to encode such features into inter-dependent image-level prediction scores. Furthermore, we present an effective correlation matrix construction approach to capture inter-class relationships and thereby guide information propagation among classes. Empirical results on generic multi-label image recognition demonstrate that both proposed models clearly outperform existing state-of-the-art methods. Moreover, the proposed methods also show advantages in other multi-label classification applications.
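The correlation-matrix construction and graph propagation described above can be sketched as follows. This is a simplified NumPy illustration under stated assumptions: the threshold `tau`, the row normalization, and the single ReLU layer are assumed choices, not the papers' exact implementation.

```python
import numpy as np

def cooccurrence_correlation(labels, tau=0.4):
    """Build a binarized correlation matrix A from label co-occurrence.
    `labels` is an (N, C) binary matrix (1 = label present in image).
    `tau` is an assumed threshold used to suppress noisy, rare co-occurrences."""
    counts = labels.T @ labels                  # (C, C) co-occurrence counts
    occur = np.diag(counts).astype(float)       # per-class occurrence counts
    P = counts / np.maximum(occur[:, None], 1)  # conditional prob P(j | i)
    A = (P >= tau).astype(float)                # binarize to filter weak edges
    np.fill_diagonal(A, 1.0)                    # keep self-loops
    return A / A.sum(axis=1, keepdims=True)     # row-normalize for propagation

def gcn_layer(H, A, W):
    """One graph-convolution step: propagate class representations H over
    adjacency A, then apply a linear transform W and a ReLU."""
    return np.maximum(A @ H @ W, 0.0)

# Toy example: 3 classes, 8-d semantic embeddings mapped to 4-d classifiers
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=(100, 3))
A = cooccurrence_correlation(labels)
H = rng.standard_normal((3, 8))   # class-level semantic embeddings
W = rng.standard_normal((8, 4))
classifiers = gcn_layer(H, A, W)  # (3, 4) inter-dependent class classifiers
```

In the C-GCN view, the propagated output acts as a set of classifiers applied to image features, so classes that frequently co-occur end up with correlated decision boundaries.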
A structured semantic transfer (SST) framework is proposed that enables training multi-label recognition models with partial labels, i.e., where only some labels are known for each image while the rest are missing (also called unknown labels).
A unified semantic-aware representation blending framework that exploits instance-level and prototype-level semantic representations, via two complementary modules, to complement unknown labels, achieving superior performance over current leading competitors across all known-label proportion settings.
This work proposes a novel heterogeneous semantic transfer (HST) framework that consists of two complementary transfer modules that explore both within-image and cross-image semantic correlations to transfer the knowledge possessed by known labels to generate pseudo labels for the unknown labels.
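The core idea of transferring knowledge from known labels to generate pseudo labels for the unknown ones can be sketched generically as confidence-based filling. The `pseudo_label` helper and its thresholds are illustrative assumptions, not the HST paper's actual transfer modules.

```python
import numpy as np

def pseudo_label(partial, scores, pos_thr=0.9, neg_thr=0.1):
    """Fill unknown entries (0) of a partial label matrix with pseudo labels
    when the model's predicted probability is confident enough.
    `partial`: (N, C) in {+1, -1, 0}; `scores`: (N, C) probabilities.
    Thresholds are assumed values for illustration."""
    unknown = partial == 0
    pseudo = partial.copy()
    pseudo[unknown & (scores >= pos_thr)] = 1    # confident positives
    pseudo[unknown & (scores <= neg_thr)] = -1   # confident negatives
    return pseudo                                 # low-confidence entries stay 0

# Toy example: known labels are never overwritten
partial = np.array([[1, 0, 0], [0, -1, 0]])
scores = np.array([[0.5, 0.95, 0.5], [0.05, 0.5, 0.99]])
out = pseudo_label(partial, scores)
```

The methods above differ mainly in how the scores are produced, e.g., from within-image or cross-image semantic correlations rather than raw classifier outputs.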
A dual-perspective semantic-aware representation blending (DSRB) framework that blends multi-granularity category-specific semantic representations across different images, from the instance and prototype perspectives respectively, to transfer information from known labels to complement unknown labels.
The strong alignment of textual and visual features pretrained on millions of auxiliary image-text pairs is utilized, and Dual Context Optimization (DualCoOp) is proposed as a unified framework for partial-label MLR and zero-shot MLR.
This work applies text-as-image (TaI) prompting to multi-label image recognition, where sentences in the wild serve as alternatives to images for prompt tuning, and proposes double-grained prompt tuning (TaI-DPT), which outperforms zero-shot CLIP by a large margin on multiple benchmarks.