3260 papers • 126 benchmarks • 313 datasets
This task relies on a single RGB image to infer the dense 3D voxelized semantic scene.
(Image credit: Papersgraph)
A new geometry-based strategy is proposed that embeds depth information in a low-resolution voxel representation. The representation still encodes enough geometric information (e.g., room layout and object sizes and shapes) to infer the invisible areas of the scene while preserving structural detail.
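To make the idea of embedding depth in a coarse voxel grid concrete, here is a minimal sketch that back-projects a depth map into a set of occupied voxel indices. It assumes a pinhole camera model; the function name `depth_to_voxels`, the intrinsics (`fx`, `fy`, `cx`, `cy`), the grid `origin`, and `voxel_size` are hypothetical names chosen for illustration, and this is not the method of the summarized paper.

```python
import math
from typing import List, Sequence, Set, Tuple

Voxel = Tuple[int, int, int]


def depth_to_voxels(
    depth: List[List[float]],
    fx: float, fy: float, cx: float, cy: float,
    voxel_size: float,
    origin: Sequence[float],
    grid_dims: Sequence[int],
) -> Set[Voxel]:
    """Back-project a depth map (metres) into occupied voxel indices.

    Assumes a pinhole camera; pixels with depth <= 0 are treated as invalid.
    """
    occupied: Set[Voxel] = set()
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip missing depth readings
            # Pinhole back-projection from pixel (u, v) to a camera-space point.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            # Quantize the 3D point to a voxel index relative to the grid origin.
            idx = tuple(
                int(math.floor((p - o) / voxel_size))
                for p, o in zip((x, y, z), origin)
            )
            # Keep only points that fall inside the grid bounds.
            if all(0 <= i < n for i, n in zip(idx, grid_dims)):
                occupied.add(idx)
    return occupied
```

A larger `voxel_size` gives a coarser grid: more pixels collapse into the same voxel, trading fine detail for a smaller volume to process.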
The ablation studies demonstrate that the method is robust to lower-density inputs, that it enables very high-speed semantic completion at the coarsest level, and that it provides a strong performance/speed trade-off for mobile-robotics applications.
A novel module called anisotropic convolution is proposed, which offers flexibility and modeling power that competing methods, such as standard 3D convolution and some of its variations, cannot match.
A novel sparse LiDAR point cloud semantic segmentation (SS) framework assisted by learned contextual shape priors is proposed, which inherently improves SS optimization through fully end-to-end training.
Experiments show that the MonoScene framework outperforms the literature on all metrics and datasets while hallucinating plausible scenery even beyond the camera field of view. The framework also introduces a 3D context relation prior to enforce spatio-semantic consistency.
VoxFormer is a Transformer-based semantic scene completion framework that outputs complete 3D volumetric semantics from only 2D images. It outperforms the state of the art with a relative improvement of 20.0% in geometry and 18.1% in semantics.
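Semantic scene completion benchmarks commonly report geometry as the IoU of occupied versus empty voxels and semantics as the mean IoU over classes. The sketch below shows how such scores and a relative improvement can be computed. It assumes flat lists of integer voxel labels with `0` meaning empty; the function names are hypothetical and the numbers in the usage note are invented for illustration, not results from any paper.

```python
from typing import Iterable, Sequence


def completion_iou(pred: Sequence[int], target: Sequence[int], empty: int = 0) -> float:
    """Geometry score: IoU of occupied voxels, ignoring the semantic class."""
    inter = sum(1 for p, t in zip(pred, target) if p != empty and t != empty)
    union = sum(1 for p, t in zip(pred, target) if p != empty or t != empty)
    return inter / union if union else 0.0


def semantic_miou(pred: Sequence[int], target: Sequence[int], classes: Iterable[int]) -> float:
    """Semantic score: per-class IoU averaged over classes that appear at all."""
    ious = []
    for c in classes:
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0
```

For example, with these definitions a hypothetical method scoring 12.0 against a baseline of 10.0 shows a relative improvement of about 20%, even though the absolute gain is only 2 points.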
OccFormer, a dual-path transformer network to effectively process the 3D volume for semantic occupancy prediction, achieves a long-range, dynamic, and efficient encoding of the camera-generated 3D voxel features.
A novel paradigm termed Symphonies (Scene-from-Insts) is proposed. It integrates instance queries to orchestrate 2D-to-3D reconstruction and 3D scene modeling, and fosters holistic scene comprehension by capturing context through the efficient fusion of instance queries.
A novel Normalized Device Coordinates scene completion network (NDC-Scene) is devised that extends the 2D feature map to a normalized device coordinates (NDC) space, rather than directly to world space, by progressively restoring the depth dimension with deconvolution operations.
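To illustrate how a normalized device coordinates space relates to metric space, the following sketch maps one NDC point to a camera-space point. It is only a coordinate-conversion example under stated assumptions, not the NDC-Scene network: image coordinates are assumed to be normalized to [-1, 1], depth to [0, 1] with a linear mapping between hypothetical `near` and `far` planes, and the camera is assumed to be a pinhole with intrinsics `fx`, `fy`, `cx`, `cy`.

```python
from typing import Tuple


def ndc_to_camera(
    u_ndc: float, v_ndc: float, d_ndc: float,
    width: int, height: int,
    fx: float, fy: float, cx: float, cy: float,
    near: float, far: float,
) -> Tuple[float, float, float]:
    """Map an NDC point to camera space (assumed conventions, see lead-in)."""
    # Undo the [-1, 1] normalization to recover pixel coordinates.
    u = (u_ndc + 1.0) * 0.5 * width
    v = (v_ndc + 1.0) * 0.5 * height
    # Assumed linear depth normalization between the near and far planes.
    z = near + d_ndc * (far - near)
    # Pinhole back-projection to a metric camera-space point.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)
```

Because `x` and `y` scale with `z`, a regular grid in NDC space fans out into a frustum in camera space, so NDC cells near the camera cover less metric volume than cells far away.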