Generalized Referring Expression Segmentation

3260 papers • 126 benchmarks • 313 datasets

Generalized Referring Expression Segmentation (GRES), introduced by Liu et al. at CVPR 2023, extends classic referring expression segmentation (RES) to expressions that refer to any number of target objects, including multiple targets or none. Given an image and a referring expression, a GRES model must predict a segmentation mask covering the target object(s).

(Image credit: Papersgraph)
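
The task definition above can be made concrete with a small sketch. It is only an illustration: masks are represented as sets of (row, col) pixel coordinates, and the helper names `merge_targets` and `mask_iou` are hypothetical, not taken from any GRES codebase. The convention that an empty prediction for an empty target scores 1.0 is an assumption made here so that "no target" expressions can be scored at all.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col) coordinate of a foreground pixel


def merge_targets(instance_masks: List[Set[Pixel]]) -> Set[Pixel]:
    """Merge the masks of all referred objects into one target mask.

    An empty list (an expression that refers to nothing) gives an empty mask.
    """
    merged: Set[Pixel] = set()
    for mask in instance_masks:
        merged |= mask
    return merged


def mask_iou(pred: Set[Pixel], target: Set[Pixel]) -> float:
    """Intersection over union of two masks.

    Assumption: if both masks are empty, the prediction is counted as
    fully correct (1.0); if only one of them is empty, the score is 0.0.
    """
    union = pred | target
    if not union:
        return 1.0
    return len(pred & target) / len(union)


# Multi-target expression: two objects are referred to at once.
cat = {(0, 0), (0, 1)}
dog = {(2, 2)}
target = merge_targets([cat, dog])
pred = {(0, 0), (0, 1), (1, 1)}
print(mask_iou(pred, target))  # 2 shared pixels / 4 in the union = 0.5

# No-target expression: the correct output is an empty mask.
print(mask_iou(set(), merge_targets([])))   # 1.0
print(mask_iou({(0, 0)}, merge_targets([])))  # 0.0
```

The point of the sketch is that a single output mask has to cover the one-target, multi-target, and no-target cases, which is what distinguishes GRES from classic RES.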

Benchmarks

These leaderboards track progress in Generalized Referring Expression Segmentation.

Dataset: gRefCOCO

Datasets

gRefCOCO

Subtasks

No subtasks available.

Most implemented papers

GRES: Generalized Referring Expression Segmentation

Henghui Ding, Chang Liu, Xudong Jiang • Wed May 31 2023

This work proposes ReLA, a region-based GRES baseline that adaptively divides the image into regions with sub-instance clues and explicitly models region-region and region-language dependencies. It achieves new state-of-the-art performance on both the newly proposed GRES task and the classic RES task.

252

MAttNet: Modular Attention Network for Referring Expression Comprehension

Xiaohui Shen, Xin Lu, Zhe L. Lin, Tamara L. Berg, Jimei Yang, Mohit Bansal, Licheng Yu • Tue Jan 23 2018

This work proposes to decompose expressions into three modular components related to subject appearance, location, and relationship to other objects, which allows the model to adapt flexibly to expressions containing different types of information in an end-to-end framework.

915 0

Vision-Language Transformer and Query Generation for Referring Segmentation

Xudong Jiang, Henghui Ding, Chang Liu, Suchen Wang • Wed Aug 11 2021

This work introduces a Transformer with multi-head attention for referring segmentation and proposes a Query Generation Module that produces multiple sets of queries with different attention weights, representing diversified comprehensions of the language expression from different aspects.

335 0

LAVT: Language-Aware Vision Transformer for Referring Image Segmentation

Philip H. S. Torr, Yansong Tang, Kai Chen, Hengshuang Zhao, Zhao Yang • Fri Dec 03 2021

This work shows that significantly better cross-modal alignments can be achieved through the early fusion of linguistic and visual features in intermediate layers of a vision Transformer encoder network.

431 0

PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model

Zheng Zhang, Yeyao Ma, Enming Zhang, Xiang Bai • Wed Mar 20 2024

Through extensive experiments, PSALM demonstrates its potential to transform the domain of image segmentation by leveraging the robust visual understanding capabilities of large multi-modal models (LMMs), as seen in natural language processing.

83 0

CRIS: CLIP-Driven Referring Image Segmentation

Tongliang Liu, Xunqiang Tao, Zhaoqing Wang, Yu Lu, Qiang Li, Yan Guo, Ming Gong • Mon Nov 29 2021

This paper designs a vision-language decoder that propagates fine-grained semantic information from textual representations to each pixel-level activation, promoting consistency between the two modalities, and introduces text-to-pixel contrastive learning.

460 0
