3260 papers • 126 benchmarks • 313 datasets
Scene parsing is the task of segmenting and parsing an image into different image regions associated with semantic categories, such as sky, road, person, and bed. (Description source: MIT)
(Image credit: Papersgraph)
These leaderboards are used to track progress in Scene Parsing.
Use these libraries to find Scene Parsing models and implementations.
This paper exploits global context information through region-based context aggregation with a pyramid pooling module in the proposed pyramid scene parsing network (PSPNet), producing good-quality results on the scene parsing task.
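As a rough illustration of the idea, the sketch below (PyTorch) pools features to several grid sizes, reduces each with a 1x1 convolution, upsamples, and concatenates the result with the input; the bin sizes and channel widths here are illustrative assumptions, not PSPNet's exact configuration.

    # Minimal pyramid pooling sketch; bin sizes and widths are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PyramidPooling(nn.Module):
        def __init__(self, in_channels, bins=(1, 2, 3, 6)):
            super().__init__()
            reduction = in_channels // len(bins)  # channels per pyramid branch
            self.stages = nn.ModuleList([
                nn.Sequential(
                    nn.AdaptiveAvgPool2d(bin_size),            # pool to bin_size x bin_size grid
                    nn.Conv2d(in_channels, reduction, 1, bias=False),
                    nn.BatchNorm2d(reduction),
                    nn.ReLU(inplace=True),
                )
                for bin_size in bins
            ])

        def forward(self, x):
            h, w = x.shape[2:]
            # Upsample each pooled branch back to the input resolution and concatenate.
            pooled = [F.interpolate(stage(x), size=(h, w), mode="bilinear",
                                    align_corners=False) for stage in self.stages]
            return torch.cat([x] + pooled, dim=1)  # local features + multi-scale context

    # Example: 2048-channel backbone features at 1/8 resolution.
    ppm = PyramidPooling(2048).eval()          # eval mode so BatchNorm accepts batch size 1
    feats = torch.randn(1, 2048, 60, 60)
    with torch.no_grad():
        out = ppm(feats)                       # -> (1, 2048 + 4*512, 60, 60)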
This work presents ADE20K, a densely annotated dataset spanning diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts, and shows that networks trained on this dataset can segment a wide variety of scenes and objects.
A novel panoptic quality (PQ) metric is proposed that captures performance for all classes (stuff and things) in an interpretable and unified manner, and a rigorous study of both human and machine performance for panoptic segmentation on three existing datasets reveals interesting insights about the task.
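For reference, PQ over one class is the sum of IoUs of matched (true positive) segment pairs divided by |TP| + 0.5|FP| + 0.5|FN|, equivalently segmentation quality times recognition quality. The minimal sketch below computes it from precomputed matches; the input format (a list of matched-pair IoUs plus FP/FN counts) is an assumption for illustration, not the paper's evaluation code.

    def panoptic_quality(matched_ious, num_fp, num_fn):
        """Compute PQ for one class from precomputed segment matches.

        matched_ious: IoU of each matched (predicted, ground-truth) segment pair;
                      a match requires IoU > 0.5, so each counts as a TP.
        num_fp: number of unmatched predicted segments.
        num_fn: number of unmatched ground-truth segments.
        """
        tp = len(matched_ious)
        if tp + num_fp + num_fn == 0:
            return 0.0
        sq = sum(matched_ious) / tp if tp else 0.0        # segmentation quality
        rq = tp / (tp + 0.5 * num_fp + 0.5 * num_fn)      # recognition quality
        return sq * rq                                     # PQ = SQ * RQ

    # Example: two matched segments with IoUs 0.8 and 0.6, one FP, one FN.
    print(panoptic_quality([0.8, 0.6], num_fp=1, num_fn=1))  # 0.7 * (2/3) ≈ 0.467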
This paper addresses the semantic segmentation task with a new context aggregation scheme named object context, which focuses on enhancing the role of object information by using a dense relation matrix to serve as a surrogate for the binary relation matrix.
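Loosely, a dense relation matrix of this kind can be built from softmax-normalized pairwise similarities between pixel features and then used to aggregate context; the single-head dot-product form and projection sizes below are simplifying assumptions, not the paper's exact formulation.

    # Simplified dense pixel-to-pixel relation sketch (PyTorch); single-head
    # dot-product similarity is assumed purely for illustration.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DenseRelationContext(nn.Module):
        def __init__(self, channels, key_channels=64):
            super().__init__()
            self.query = nn.Conv2d(channels, key_channels, 1)
            self.key = nn.Conv2d(channels, key_channels, 1)
            self.value = nn.Conv2d(channels, channels, 1)

        def forward(self, x):
            b, c, h, w = x.shape
            q = self.query(x).flatten(2).transpose(1, 2)   # (b, hw, key)
            k = self.key(x).flatten(2)                     # (b, key, hw)
            v = self.value(x).flatten(2).transpose(1, 2)   # (b, hw, c)
            # Dense relation matrix: soft affinity of every pixel to every pixel.
            relation = F.softmax(q @ k / (q.shape[-1] ** 0.5), dim=-1)  # (b, hw, hw)
            context = (relation @ v).transpose(1, 2).reshape(b, c, h, w)
            return x + context  # fuse the aggregated context into the input features

    # Example usage on a small feature map.
    x = torch.randn(1, 256, 32, 32)
    y = DenseRelationContext(256)(x)   # same shape as x, with global context mixed in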
Novel deep dual-resolution networks (DDRNets) are proposed for real-time semantic segmentation of road scenes, and a new contextual information extractor named the Deep Aggregation Pyramid Pooling Module (DAPPM) is designed to enlarge effective receptive fields and fuse multi-scale context.
A novel deep learning architecture, ResUNet-a, is presented that combines ideas from various state-of-the-art modules used in computer vision for semantic segmentation; it has better convergence properties and behaves well even in the presence of highly imbalanced classes.
This paper proposes a Flow Alignment Module (FAM) that learns semantic flow between feature maps of adjacent levels and broadcasts high-level features to high-resolution features effectively and efficiently, exhibiting superior performance over other real-time methods even with lightweight backbone networks.
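A minimal sketch of the idea, under simplifying assumptions: predict a 2-channel flow field from the concatenated fine and upsampled coarse features, then warp the coarse features with that flow before fusing. The layer widths and the offset normalization are illustrative, not the paper's precise module design.

    # Flow-based feature alignment sketch (PyTorch); widths and offset scaling
    # are assumptions for illustration.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FlowAlign(nn.Module):
        def __init__(self, channels):
            super().__init__()
            # Predict a 2-channel flow field from concatenated fine + coarse features.
            self.flow = nn.Conv2d(channels * 2, 2, kernel_size=3, padding=1)

        def forward(self, fine, coarse):
            b, c, h, w = fine.shape
            coarse_up = F.interpolate(coarse, size=(h, w), mode="bilinear",
                                      align_corners=False)
            flow = self.flow(torch.cat([fine, coarse_up], dim=1))  # (b, 2, h, w)

            # Identity sampling grid in normalized [-1, 1] coordinates.
            ys = torch.linspace(-1, 1, h, device=fine.device)
            xs = torch.linspace(-1, 1, w, device=fine.device)
            gy, gx = torch.meshgrid(ys, xs, indexing="ij")
            grid = torch.stack((gx, gy), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)

            # Offset the grid by the predicted flow (scaled to normalized coordinates)
            # and warp the upsampled coarse features accordingly.
            offset = flow.permute(0, 2, 3, 1) / torch.tensor([w, h], device=fine.device)
            warped = F.grid_sample(coarse_up, grid + offset, mode="bilinear",
                                   align_corners=False)
            return fine + warped  # fuse aligned high-level context into the fine features

    # Example: align 1/16-resolution features to 1/8-resolution features.
    fine = torch.randn(1, 128, 64, 64)
    coarse = torch.randn(1, 128, 32, 32)
    aligned = FlowAlign(128)(fine, coarse)   # -> (1, 128, 64, 64)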
The point-wise spatial attention network (PSANet) is proposed to relax the local neighborhood constraint and achieves top performance on various competitive scene parsing datasets, including ADE20K, PASCAL VOC 2012 and Cityscapes, demonstrating its effectiveness and generality.
This work proposes OneFormer, a universal image segmentation framework that unifies segmentation with a multi-task train-once design and outperforms specialized Mask2Former models across all three segmentation tasks on ADE20K, Cityscapes, and COCO.
Different PyConv-based architectures are presented for four main visual recognition tasks: image classification, video action classification/recognition, object detection, and semantic image segmentation/parsing, showing significant improvements over the baselines on all of these core tasks.