3260 papers • 126 benchmarks • 313 datasets
Panoptic segmentation is a computer vision task that combines semantic segmentation and instance segmentation to provide a comprehensive understanding of a scene. The goal is to segment the image into semantically meaningful regions while also detecting and distinguishing individual object instances within those regions. Every pixel in an image is assigned a semantic label, and pixels belonging to "thing" classes (countable objects with instances, such as cars and people) are additionally assigned a unique instance ID.
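As a rough illustration of this labeling scheme, the sketch below pairs a per-pixel semantic map with a per-pixel instance map. The class ids, the `to_panoptic` helper, and the convention of using instance id 0 for non-thing pixels are assumptions made for this example, not the format of any particular dataset or library.

```python
# Hypothetical class ids, chosen only for this illustration.
SKY, ROAD, CAR = 0, 1, 2
THING_CLASSES = {CAR}  # countable classes that carry instance ids


def to_panoptic(semantic, instance):
    """Pair each pixel's semantic label with an instance id.

    Pixels of thing classes keep their instance id; all other pixels
    get instance id 0 (an assumed convention for this sketch).
    """
    panoptic = []
    for sem_row, inst_row in zip(semantic, instance):
        row = []
        for sem, inst in zip(sem_row, inst_row):
            row.append((sem, inst if sem in THING_CLASSES else 0))
        panoptic.append(row)
    return panoptic


# A tiny 3x4 "image": sky on top, two cars on a road.
semantic = [
    [SKY, SKY, SKY, SKY],
    [CAR, CAR, ROAD, CAR],
    [ROAD, ROAD, ROAD, ROAD],
]
instance = [
    [0, 0, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 0, 0],
]

panoptic = to_panoptic(semantic, instance)
# Distinct (semantic label, instance id) pairs are the panoptic segments.
segments = sorted({label for row in panoptic for label in row})
print(segments)
```

In this toy example the two cars share a semantic label but remain separate segments because their instance ids differ, while sky and road each form a single segment.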
This work presents a conceptually simple, flexible, and general framework for object instance segmentation, which extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition.
This work presents a new method that views object detection as a direct set prediction problem, and demonstrates accuracy and run-time performance on par with the well-established and highly optimized Faster R-CNN baseline on the challenging COCO object detection dataset.
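To make the idea of set prediction more concrete, the sketch below shows one common ingredient of such formulations: a one-to-one matching between predicted and ground-truth boxes that minimizes a total cost. Everything here is an assumption for illustration. The cost is simply 1 - IoU, the matching is found by brute force over permutations rather than with the Hungarian algorithm, and `box_iou` and `match_sets` are hypothetical helpers, not the summarized method's actual loss or code.

```python
from itertools import permutations


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_sets(pred_boxes, gt_boxes):
    """Return, for each ground-truth box, the index of its matched prediction.

    Tries every one-to-one assignment and keeps the one with the lowest
    total (1 - IoU) cost. Brute force, so only suitable for tiny sets;
    assumes there are at least as many predictions as ground-truth boxes.
    """
    best_cost, best = float("inf"), ()
    for perm in permutations(range(len(pred_boxes)), len(gt_boxes)):
        cost = sum(
            1.0 - box_iou(pred_boxes[p], gt_boxes[g]) for g, p in enumerate(perm)
        )
        if cost < best_cost:
            best_cost, best = cost, perm
    return list(best)


preds = [(50, 50, 90, 90), (0, 0, 10, 10), (12, 12, 30, 30)]
gts = [(0, 0, 10, 12), (10, 10, 30, 30)]
assignment = match_sets(preds, gts)
print(assignment)
```

Predictions left without a match (index 0 here) would be treated as "no object" in a set-based formulation, which is why the order of the predictions does not matter.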
This work designs a new variant of the ResNet model, named ResNeSt, which outperforms EfficientNet in terms of the accuracy/latency trade-off and applies channel-wise attention across different network branches to leverage the complementary strengths of both feature-map attention and multi-path representation.
This paper proposes a novel linear attention named large kernel attention (LKA) to enable self-adaptive and long-range correlations in self-attention while avoiding its shortcomings, and presents a neural network based on LKA, namely Visual Attention Network (VAN).
This work improves the original Pyramid Vision Transformer (PVT v1) by adding three designs: a linear-complexity attention layer, an overlapping patch embedding, and a convolutional feed-forward network. Together they reduce the computational complexity of PVT v1 to linear and provide significant improvements on fundamental vision tasks.
State-of-the-art results in object detection (obtained as a byproduct of the predicted masks) and panoptic segmentation show the method's potential to serve as a strong new baseline for many instance-level recognition tasks beyond instance segmentation.
Mask DINO extends DINO (DETR with Improved Denoising Anchor Boxes) by adding a mask prediction branch which supports all image segmentation tasks (instance, panoptic, and semantic).
A novel panoptic quality (PQ) metric is proposed that captures performance for all classes (stuff and things) in an interpretable and unified manner. A rigorous study of both human and machine performance for panoptic segmentation (PS) is performed on three existing datasets, revealing interesting insights about the task.
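For a single class, PQ is usually written as the sum of IoUs over matched segment pairs divided by |TP| + ½|FP| + ½|FN|, where a predicted and a ground-truth segment match if their IoU exceeds 0.5. The sketch below computes this for toy data. Representing segments as sets of pixel coordinates and the helper names `mask_iou` and `panoptic_quality` are assumptions of this example, and it assumes the segments within each list do not overlap one another.

```python
def mask_iou(a, b):
    """IoU of two segments given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(pred_segments, gt_segments):
    """PQ for one class over lists of non-overlapping segments.

    With non-overlapping segments, the IoU > 0.5 rule lets each
    ground-truth segment match at most one prediction.
    """
    matched_pred = set()
    iou_sum, tp = 0.0, 0
    for gt in gt_segments:
        for i, pred in enumerate(pred_segments):
            if i in matched_pred:
                continue
            iou = mask_iou(pred, gt)
            if iou > 0.5:
                matched_pred.add(i)
                iou_sum += iou
                tp += 1
                break
    fp = len(pred_segments) - tp  # predictions with no match
    fn = len(gt_segments) - tp    # ground-truth segments with no match
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else 0.0


gt = [{(0, 0), (0, 1), (0, 2), (0, 3)}, {(1, 0), (1, 1)}]
pred = [{(0, 0), (0, 1), (0, 2)}, {(2, 0)}]
pq = panoptic_quality(pred, gt)
print(pq)
```

Here one pair matches with IoU 0.75, leaving one unmatched prediction and one unmatched ground-truth segment, so the score is 0.75 / (1 + 0.5 + 0.5). An overall score is typically obtained by averaging the per-class values.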
This work endows Mask R-CNN, a popular instance segmentation method, with a semantic segmentation branch using a shared Feature Pyramid Network (FPN) backbone, and shows that it is a robust and accurate baseline for both tasks.
For the first time, a bottom-up approach delivers state-of-the-art results on panoptic segmentation, performing on par with several top-down approaches on the challenging COCO dataset.