Generalized Referring Expression Segmentation (GRES), introduced by Liu et al. at CVPR 2023, allows expressions that refer to any number of target objects. GRES takes an image and a referring expression as input and requires a mask prediction for the target object(s).
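As a rough illustration of the expected output, the sketch below builds a single binary target mask from per-object masks. It assumes that "any number of target objects" includes zero, in which case the expected mask is empty. The helper name `merge_target_masks` and the toy 2x3 masks are hypothetical and are not taken from any GRES codebase.

```python
from typing import List

Mask = List[List[int]]  # binary mask stored as rows of 0/1 values


def merge_target_masks(instance_masks: List[Mask], height: int, width: int) -> Mask:
    """Union the per-object masks into one binary target mask."""
    merged = [[0] * width for _ in range(height)]
    for mask in instance_masks:
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    merged[y][x] = 1
    return merged


# Hypothetical 2x3 image containing two objects.
left = [[1, 0, 0], [1, 0, 0]]
right = [[0, 0, 1], [0, 0, 1]]

# An expression that refers to both objects yields the union of their masks.
both = merge_target_masks([left, right], 2, 3)

# An expression that matches nothing (assumed case) yields an all-zero mask.
none = merge_target_masks([], 2, 3)
```

Under this reading, a classic single-target expression is simply the special case where `instance_masks` holds exactly one mask.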
A region-based GRES baseline, ReLA, is proposed that adaptively divides the image into regions with sub-instance clues and explicitly models the region-region and region-language dependencies. It achieves new state-of-the-art performance on both the newly proposed GRES task and the classic RES task.
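One generic way to model dependencies between regions and words is scaled dot-product attention. The sketch below is not the ReLA implementation. It only assumes that each region and each word is represented by a feature vector, and the names `cross_attention`, `regions`, and `words` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """For each query, return the attention-weighted sum of the values."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ])
    return outputs


# Toy features: three image regions and two words, all 2-dimensional.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
words = [[1.0, 0.0], [0.0, 1.0]]

# Region-language dependency: each region attends over the words.
region_language = cross_attention(regions, words, words)

# Region-region dependency: each region attends over all regions.
region_region = cross_attention(regions, regions, regions)
```

In this toy setup the first region is closest to the first word, so its region-language output leans toward that word, while the third region is equally similar to both words and weights them evenly.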
This work proposes to decompose expressions into three modular components related to subject appearance, location, and relationship to other objects, which allows the model to adapt flexibly to expressions containing different types of information in an end-to-end framework.
Transformer and multi-head attention are introduced and a Query Generation Module is proposed, which produces multiple sets of queries with different attention weights that represent the diversified comprehensions of the language expression from different aspects.
This work shows that significantly better cross-modal alignments can be achieved through the early fusion of linguistic and visual features in intermediate layers of a vision Transformer encoder network.
Through extensive experiments, PSALM demonstrates its potential to transform the domain of image segmentation, leveraging the robust visual understanding capabilities of LMMs as seen in natural language processing.
This paper designs a vision-language decoder that propagates fine-grained semantic information from textual representations to each pixel-level activation, promoting consistency between the two modalities, and presents text-to-pixel contrastive learning.
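To make the idea of text-to-pixel contrastive learning concrete, the sketch below uses one plausible formulation, assumed here rather than taken from the paper: the sigmoid of the dot product between the text feature and each pixel feature is pushed toward 1 for target pixels and toward 0 for all other pixels. The function name `text_to_pixel_loss` and the toy features are hypothetical.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def text_to_pixel_loss(text_feature: List[float],
                       pixel_features: List[List[float]],
                       target_mask: List[int]) -> float:
    """Mean binary cross-entropy between text-pixel similarity and the mask."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for feature, is_target in zip(pixel_features, target_mask):
        score = sigmoid(sum(t * p for t, p in zip(text_feature, feature)))
        if is_target:
            total += -math.log(score + eps)        # pull target pixels toward the text
        else:
            total += -math.log(1.0 - score + eps)  # push other pixels away
    return total / len(pixel_features)


# Toy example: one text feature and three pixel features.
text = [1.0, 0.0]
pixels = [[4.0, 0.0], [-4.0, 0.0], [4.0, 1.0]]

# Mask that agrees with the similarities (pixels 0 and 2 are targets).
aligned = text_to_pixel_loss(text, pixels, [1, 0, 1])

# Mask that contradicts the similarities.
misaligned = text_to_pixel_loss(text, pixels, [0, 1, 0])
```

The loss is small when pixels that resemble the text are the ones marked as targets, and large when the mask and the similarities disagree.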