[1] ReMaX: Relaxing for Better Training on Efficient Panoptic Segmentation
[2] DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model
[3] CLUSTSEG: Clustering for Universal Segmentation
[4] FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation
[5] CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation
[6] Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
[7] Side Adapter Network for Open-Vocabulary Semantic Segmentation
[8] Generalized Decoding for Pixel, Image, and Language
[9] ZegCLIP: Towards Adapting CLIP for Zero-shot Semantic Segmentation
[10] OneFormer: One Transformer to Rule Universal Image Segmentation
[11] Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models
[12] LAION-5B: An open large-scale dataset for training next generation image-text models
[13] Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
[14] F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models
[15] Open-Vocabulary Universal Image Segmentation with MaskCLIP
[16] Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation
[17] CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation
[18] Vision Transformer Adapter for Dense Predictions
[19] CoCa: Contrastive Captioners are Image-Text Foundation Models
[20] Flamingo: a Visual Language Model for Few-Shot Learning
[21] GroupViT: Semantic Segmentation Emerges from Text Supervision
[22] A ConvNet for the 2020s
[23] Language-driven Semantic Segmentation
[24] Scaling Open-Vocabulary Image Segmentation with Image-Level Labels
[25] High-Resolution Image Synthesis with Latent Diffusion Models
[26] Decoupling Zero-Shot Semantic Segmentation
[27] Masked-attention Mask Transformer for Universal Image Segmentation
[28] Extract Free Dense Labels from CLIP
[29] Florence: A New Foundation Model for Computer Vision
[30] Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation
[31] Per-Pixel Classification is Not All You Need for Semantic Segmentation
[32] VinVL: Revisiting Visual Representations in Vision-Language Models
[33] SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
[34] Segmenter: Transformer for Semantic Segmentation
[35] Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
[36] Learning Transferable Visual Models From Natural Language Supervision
[37] Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
[38] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
[39] ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation
[40] MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers
[41] Scaling Wide Residual Networks for Panoptic Segmentation
[42] Open-Vocabulary Object Detection Using Captions
[43] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
[44] Deformable DETR: Deformable Transformers for End-to-End Object Detection
[45] Contrastive Learning for Weakly Supervised Phrase Grounding
[46] DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution
[47] Language Models are Few-Shot Learners
[48] End-to-End Object Detection with Transformers
[49] SOLOv2: Dynamic and Fast Instance Segmentation
[50] Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation
[51] Conditional Convolutions for Instance Segmentation
[52] Unifying Training and Inference for Panoptic Segmentation
[53] Connecting Vision and Language with Localized Narratives
[54] Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
[55] Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation
[56] UNITER: UNiversal Image-TExt Representation Learning
[57] Object-Contextual Representations for Semantic Segmentation
[58] LXMERT: Learning Cross-Modality Encoder Representations from Transformers
[59] RoBERTa: A Robustly Optimized BERT Pretraining Approach
[60] Zero-Shot Semantic Segmentation
[61] Semantic Projection Network for Zero- and Few-Label Semantic Segmentation
[62] YOLACT: Real-Time Instance Segmentation
[63] An End-To-End Network for Panoptic Segmentation
[64] Hybrid Task Cascade for Instance Segmentation
[65] UPSNet: A Unified Panoptic Segmentation Network
[66] Panoptic Feature Pyramid Networks
[67] Dual Attention Network for Scene Segmentation
[68] Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning
[69] Path Aggregation Network for Instance Segmentation
[70] Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
[72] MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features
[73] Cascade R-CNN: Delving Into High Quality Object Detection
[74] Decoupled Weight Decay Regularization
[75] The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes
[76] Scene Parsing through ADE20K Dataset
[77] Rethinking Atrous Convolution for Semantic Image Segmentation
[78] Attention is All you Need
[80] COCO-Stuff: Thing and Stuff Classes in Context
[81] InstanceCut: From Edges to Instances with MultiCut
[82] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
[83] The Cityscapes Dataset for Semantic Urban Scene Understanding
[84] Deep Residual Learning for Image Recognition
[85] U-Net: Convolutional Networks for Biomedical Image Segmentation
[86] Microsoft COCO Captions: Data Collection and Evaluation Server
[87] Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs
[88] Adam: A Method for Stochastic Optimization
[89] Fully Convolutional Networks for Semantic Segmentation
[90] ImageNet Large Scale Visual Recognition Challenge
[91] Simultaneous Detection and Segmentation
[92] The Role of Context for Object Detection and Semantic Segmentation in the Wild
[93] Microsoft COCO: Common Objects in Context
[94] Multiscale Conditional Random Fields for Image Labeling
[95] Least Squares Quantization in PCM
[96] The Hungarian Method for the Assignment Problem
[97] k-means Mask Transformer
[98] A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model
[100] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
[101] YFCC100M: The New Data in Multimedia Research
[102] Gradient-Based Learning Applied to Document Recognition
[103] The PASCAL Visual Object Classes (VOC) Challenge
[104] The PASCAL Visual Object Classes (VOC) Challenge