3260 papers • 126 benchmarks • 313 datasets
Scene segmentation is the task of splitting a scene into its various object components. Image adapted from Temporally coherent 4D reconstruction of complex dynamic scenes.
These leaderboards are used to track progress in Scene Segmentation
Use these libraries to find Scene Segmentation models and implementations
This paper designs a novel type of neural network that directly consumes point clouds, which respects the permutation invariance of points in the input and provides a unified architecture for applications ranging from object classification and part segmentation to scene semantic parsing.
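As a rough illustration of the permutation-invariance idea (a shared per-point MLP followed by a symmetric max-pooling aggregation), here is a minimal PyTorch sketch; the layer sizes and names are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class PointSetEncoder(nn.Module):
    """Toy PointNet-style encoder: a shared per-point MLP followed by a
    symmetric max-pooling aggregation, so the output is invariant to the
    ordering of the input points."""
    def __init__(self, in_dim=3, feat_dim=128):
        super().__init__()
        self.shared_mlp = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )

    def forward(self, points):                    # points: (batch, n_points, in_dim)
        per_point = self.shared_mlp(points)       # same MLP applied to every point
        global_feat, _ = per_point.max(dim=1)     # order-independent aggregation
        return global_feat                        # (batch, feat_dim)

# Permuting the input points leaves the global feature unchanged.
pts = torch.randn(2, 1024, 3)
enc = PointSetEncoder()
perm = torch.randperm(1024)
assert torch.allclose(enc(pts), enc(pts[:, perm, :]), atol=1e-6)
```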
Quantitative assessments show that SegNet provides good performance with competitive inference time and the most memory-efficient inference compared to other architectures, including FCN and DeconvNet.
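Part of SegNet's memory efficiency comes from upsampling in the decoder with the max-pooling indices saved by the encoder rather than with learned deconvolutions. A minimal sketch of that mechanism (not the full architecture):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 64, 32, 32)                        # encoder feature map
pooled, indices = F.max_pool2d(x, kernel_size=2, stride=2, return_indices=True)

# ... decoder convolutions on `pooled` would go here ...

# Upsample by placing values back at the remembered max locations,
# instead of learning a deconvolution filter.
unpooled = F.max_unpool2d(pooled, indices, kernel_size=2, stride=2)
print(pooled.shape, unpooled.shape)                   # (1, 64, 16, 16) (1, 64, 32, 32)
```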
The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
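The "fully convolutional" property can be seen in a toy model that contains only convolutions and resizing, so any input resolution yields a correspondingly sized per-pixel prediction. This is a hedged sketch, not the paper's VGG-based network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCN(nn.Module):
    """Toy fully convolutional net: no fully connected layers, so it accepts
    inputs of arbitrary size and returns a per-pixel score map of the same
    spatial size."""
    def __init__(self, num_classes=21):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Conv2d(64, num_classes, 1)   # 1x1 conv "head"

    def forward(self, x):
        scores = self.classifier(self.backbone(x))
        # Upsample the coarse scores back to the input resolution.
        return F.interpolate(scores, size=x.shape[-2:], mode="bilinear",
                             align_corners=False)

model = TinyFCN()
for h, w in [(128, 128), (200, 344)]:                    # arbitrary input sizes
    out = model(torch.randn(1, 3, h, w))
    print(out.shape)                                      # (1, 21, h, w)
```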
This work proposes the Point Transformer, which extracts local and global features and relates the two representations through a local-global attention mechanism; its SortNet component induces input permutation invariance by selecting points based on a learned score.
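The score-based selection can be mimicked with a learned scoring MLP followed by a top-k gather. A rough sketch under assumed shapes, not the authors' exact SortNet module:

```python
import torch
import torch.nn as nn

class ScoreTopK(nn.Module):
    """Toy version of score-based point selection: score every point with a
    small MLP, keep the top-k, and return them ordered by score so the
    result does not depend on the ordering of the input points."""
    def __init__(self, feat_dim=64, k=16):
        super().__init__()
        self.k = k
        self.score = nn.Sequential(nn.Linear(feat_dim, 32), nn.ReLU(),
                                   nn.Linear(32, 1))

    def forward(self, feats):                       # feats: (batch, n_points, feat_dim)
        s = self.score(feats).squeeze(-1)           # (batch, n_points)
        topk = s.topk(self.k, dim=1).indices        # indices of the k highest scores
        batch_idx = torch.arange(feats.size(0)).unsqueeze(1)
        return feats[batch_idx, topk]               # (batch, k, feat_dim)

sel = ScoreTopK()
out = sel(torch.randn(2, 100, 64))
print(out.shape)                                    # (2, 16, 64)
```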
A novel panoptic quality (PQ) metric is proposed that captures performance for all classes (stuff and things) in an interpretable and unified manner, and a rigorous study of both human and machine performance for PS on three existing datasets is performed, revealing interesting insights about the task.
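For reference, the per-class panoptic quality combines segmentation and recognition quality: PQ = (sum of IoUs over matched pairs) / (|TP| + ½|FP| + ½|FN|), where a predicted and a ground-truth segment count as a match when their IoU exceeds 0.5. A small Python sketch of the per-class computation under that definition (toy inputs, not an official evaluation script):

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """Per-class PQ given the IoU of each matched (prediction, ground truth)
    pair (all IoUs > 0.5 by the matching rule), plus counts of unmatched
    predictions (false positives) and unmatched ground truths (false negatives).
    PQ = (sum of TP IoUs) / (|TP| + 0.5 * |FP| + 0.5 * |FN|)."""
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    return sum(matched_ious) / denom if denom > 0 else 0.0

# Example: three matched segments, one false positive, two false negatives.
print(panoptic_quality([0.9, 0.75, 0.8], num_fp=1, num_fn=2))   # ≈ 0.544
```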
New state-of-the-art segmentation performance is achieved on three challenging scene segmentation datasets, i.e., Cityscapes, PASCAL Context and COCO Stuff, without using coarse data.
KPConv is a new design of point convolution, i.e. one that operates on point clouds without any intermediate representation, and it outperforms state-of-the-art classification and segmentation approaches on several datasets.
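The core of a kernel point convolution can be sketched as a set of kernel points carrying weight matrices, with each neighbor's features weighted by a linear correlation to the nearby kernel points. This is a simplified sketch with assumed shapes and names, not the released KPConv implementation:

```python
import torch
import torch.nn as nn

class SimpleKPConv(nn.Module):
    """Simplified kernel point convolution: a fixed set of kernel points,
    each with its own weight matrix; a neighbor's features are weighted by a
    linear correlation (1 - distance / sigma, clipped at 0) to each kernel
    point, transformed, and summed."""
    def __init__(self, in_dim, out_dim, num_kernel_pts=15, sigma=0.3):
        super().__init__()
        self.sigma = sigma
        # Kernel point positions (random here; the paper arranges them with a
        # repulsion-based optimisation).
        self.kernel_pts = nn.Parameter(torch.randn(num_kernel_pts, 3) * 0.2,
                                       requires_grad=False)
        self.weights = nn.Parameter(torch.randn(num_kernel_pts, in_dim, out_dim) * 0.1)

    def forward(self, rel_xyz, neigh_feats):
        # rel_xyz:     (n_points, n_neighbors, 3)  neighbor offsets from each center
        # neigh_feats: (n_points, n_neighbors, in_dim)
        dists = torch.cdist(rel_xyz, self.kernel_pts.expand(rel_xyz.size(0), -1, -1))
        corr = torch.clamp(1.0 - dists / self.sigma, min=0.0)       # (n, m, k)
        per_kpt = torch.einsum("nmk,nmi->nki", corr, neigh_feats)   # (n, k, in_dim)
        return torch.einsum("nki,kio->no", per_kpt, self.weights)   # (n, out_dim)

conv = SimpleKPConv(in_dim=32, out_dim=64)
out = conv(torch.randn(100, 16, 3), torch.randn(100, 16, 32))
print(out.shape)                                                     # (100, 64)
```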
This work introduces a novel, CNN-based architecture that can be trained end-to-end to deliver seamless scene segmentation results by means of a panoptic output format, going beyond the simple combination of independently trained segmentation and detection models.
This paper proposes PVCNN, which represents the 3D input data as points to reduce memory consumption, while performing the convolutions in voxels to largely reduce irregular data access and improve locality.
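The point–voxel split can be illustrated roughly: keep a fine-grained per-point MLP branch (memory-friendly) and a coarse voxel branch (regular memory access for convolutions), then fuse them per point. A crude sketch with assumed shapes, using nearest-voxel scatter/gather rather than the paper's trilinear devoxelization:

```python
import torch
import torch.nn as nn

class TinyPointVoxelBlock(nn.Module):
    """Crude point-voxel block: per-point features come from a shared MLP
    (fine-grained), coarse context comes from a 3D convolution over a
    voxelised copy of the points, and the two branches are summed per point."""
    def __init__(self, in_dim=16, out_dim=32, resolution=8):
        super().__init__()
        self.r = resolution
        self.point_mlp = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())
        self.voxel_conv = nn.Sequential(nn.Conv3d(in_dim, out_dim, 3, padding=1),
                                        nn.ReLU())

    def forward(self, xyz, feats):
        # xyz: (n_points, 3) normalised to [0, 1);  feats: (n_points, in_dim)
        n, c = feats.shape
        idx = (xyz * self.r).long().clamp(0, self.r - 1)             # voxel coords
        flat = idx[:, 0] * self.r * self.r + idx[:, 1] * self.r + idx[:, 2]

        # Voxelise: average the features of all points falling in each voxel.
        grid = feats.new_zeros(self.r ** 3, c)
        count = feats.new_zeros(self.r ** 3, 1)
        grid.index_add_(0, flat, feats)
        count.index_add_(0, flat, torch.ones(n, 1))
        grid = (grid / count.clamp(min=1)).t().reshape(1, c, self.r, self.r, self.r)

        voxel_out = self.voxel_conv(grid)                             # (1, out, r, r, r)
        voxel_out = voxel_out.reshape(voxel_out.size(1), -1).t()      # (r^3, out_dim)

        # Devoxelise (nearest voxel) and fuse with the fine-grained point branch.
        return self.point_mlp(feats) + voxel_out[flat]

block = TinyPointVoxelBlock()
out = block(torch.rand(500, 3), torch.randn(500, 16))
print(out.shape)                                                      # (500, 32)
```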
Adding a benchmark result helps the community track progress.