3260 papers • 126 benchmarks • 313 datasets
This task was introduced in "Semantic Scene Completion from a Single Depth Image" (https://arxiv.org/abs/1611.08974) at CVPR 2017. The goal is to infer the dense 3D voxelized semantic scene from an incomplete 3D input (e.g., a point cloud or a depth map), optionally accompanied by an RGB image. A recent summary can be found in the paper "3D Semantic Scene Completion: a Survey" (https://arxiv.org/abs/2103.07466), published in IJCV 2021.
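To make the task's inputs and outputs concrete, here is a hedged toy sketch in NumPy: the target is a dense voxel grid of per-voxel class ids (with 0 standing for empty space in this example), and quality is commonly scored with an occupancy IoU for completion and a mean IoU over semantic classes. The grid size, class count, and metric details below are illustrative assumptions, not tied to any specific benchmark.

```python
import numpy as np

# Toy semantic scene completion target: a dense voxel grid where each cell
# holds a class id. Class ids and grid size are illustrative assumptions.
NUM_CLASSES = 4  # 0 = empty, 1..3 = semantic classes

def completion_iou(pred, gt):
    """Binary IoU over occupancy (any non-empty class counts as occupied)."""
    p, g = pred > 0, gt > 0
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    return inter / union if union else 1.0

def semantic_miou(pred, gt, num_classes=NUM_CLASSES):
    """Mean IoU over the non-empty semantic classes present in pred or gt."""
    ious = []
    for c in range(1, num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0

gt = np.zeros((4, 4, 4), dtype=np.int64)
gt[0] = 1          # a "floor" slab of class 1
gt[1, :2, :] = 2   # part of a class-2 object
pred = gt.copy()
pred[1, 0, 0] = 0  # the model missed one occupied voxel
```

Scoring the toy prediction against the ground truth then reduces to two array comparisons, which is essentially how dense-grid benchmarks evaluate submissions.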
These leaderboards are used to track progress in 3D Semantic Scene Completion.
Use these libraries to find 3D Semantic Scene Completion models and implementations.
The semantic scene completion network (SSCNet) is introduced: an end-to-end 3D convolutional network that takes a single depth image as input and simultaneously outputs occupancy and semantic labels for all voxels in the camera view frustum.
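The core idea of jointly predicting occupancy and semantics can be sketched as a 3D convolutional network whose per-voxel class logits include an "empty" class. The PyTorch toy model below is a minimal illustration of that idea, not the published SSCNet architecture (which encodes the depth image as a flipped-TSDF volume and uses a deeper dilated network with a downsampled output grid); layer widths and class count are assumptions.

```python
import torch
import torch.nn as nn

# Minimal SSCNet-style sketch (illustrative, not the published architecture):
# 3D convolutions map a 1-channel volume to per-voxel class logits, where
# class 0 = empty, so occupancy and semantics are predicted jointly.
class TinySSC(nn.Module):
    def __init__(self, num_classes=12):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 16, 3, padding=2, dilation=2), nn.ReLU(),  # dilated conv widens the receptive field
            nn.Conv3d(16, num_classes, 1),  # 1x1x1 head -> per-voxel logits
        )

    def forward(self, vol):        # vol: (B, 1, D, H, W) input volume
        return self.net(vol)       # logits: (B, num_classes, D, H, W)

vol = torch.randn(1, 1, 16, 16, 16)   # toy input volume
logits = TinySSC()(vol)
labels = logits.argmax(dim=1)         # (1, 16, 16, 16): predicted class per voxel
```

Taking the argmax over the class dimension yields both the completion (voxels labeled non-empty) and the semantic map in one pass.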
The ablation studies demonstrate that the method is robust to lower-density inputs, enables very high-speed semantic completion at the coarsest level, and offers a strong performance/speed trade-off for mobile-robotics applications.
A tri-perspective view (TPV) representation that complements BEV with two additional perpendicular planes is proposed, and it is demonstrated for the first time that using only camera inputs can achieve performance comparable to LiDAR-based methods on the LiDAR segmentation task on nuScenes.
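The TPV idea of describing a 3D point by features gathered from three perpendicular planes can be illustrated with a small NumPy sketch. This is a hedged simplification: the plane resolution, channel count, nearest-neighbour lookup, and summation used here are assumptions standing in for the learned attention-based lifting in the actual method.

```python
import numpy as np

# Tri-perspective view (TPV) sketch: a 3D point gathers features from three
# perpendicular planes (top, front, side) and combines them. Nearest-neighbour
# lookup and summation are simplifications of the learned mechanism.
G = 8  # plane resolution (illustrative)
rng = np.random.default_rng(0)
tpv_xy = rng.standard_normal((G, G, 4))  # top (BEV-like) plane features
tpv_xz = rng.standard_normal((G, G, 4))  # front plane features
tpv_yz = rng.standard_normal((G, G, 4))  # side plane features

def tpv_feature(p):
    """Feature for a point p in [0, 1)^3: sum of its three plane features."""
    i, j, k = np.clip((np.asarray(p) * G).astype(int), 0, G - 1)
    return tpv_xy[i, j] + tpv_xz[i, k] + tpv_yz[j, k]

f = tpv_feature([0.5, 0.25, 0.75])  # (4,) feature vector for this point
```

Because each plane only grows quadratically with resolution, three planes stay far cheaper than a dense 3D feature volume while still letting every point receive a distinct feature.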
A new geometry-based strategy is proposed that embeds depth information in a low-resolution voxel representation, which still encodes sufficient geometric information (e.g., room layout, object sizes and shapes) to infer the invisible areas of the scene with well-preserved structural details.
This work proposes a framework that performs scene reconstruction and semantic scene completion jointly, incrementally and in real time, from an input sequence of depth maps. It relies on a novel neural architecture, designed to process occupancy maps, that leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model.
A novel sparse LiDAR point cloud semantic segmentation framework assisted by learned contextual shape priors is proposed, which inherently improves semantic segmentation optimization through fully end-to-end training.
The MonoScene framework introduces a 3D context relation prior to enforce spatio-semantic consistency; experiments show it outperforms the literature on all metrics and datasets while hallucinating plausible scenery even beyond the camera field of view.
Experimental results show that whether the input is a single depth map or RGB-D, the proposed disentangled framework generates high-quality semantic scene completion and outperforms state-of-the-art approaches on both synthetic and real datasets.
An efficient 3D sparse convolutional network is presented that harnesses a multiscale architecture and a coarse-to-fine prediction strategy, achieving state-of-the-art performance at fast inference speed.
Adding a benchmark result helps the community track progress.