1
Deep Height Decoupling for Precise Vision-Based 3D Occupancy Prediction
2
Instance-Aware Monocular 3D Semantic Scene Completion
3
Not All Voxels are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation
4
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
5
H2GFormer: Horizontal-to-Global Voxel Transformer for 3D Semantic Scene Completion
6
Tri-Perspective view Decomposition for Geometry-Aware Depth Completion
7
MonoOcc: Digging into Monocular Semantic Occupancy Prediction
8
Camera-Based 3D Semantic Scene Completion With Sparse Guidance Network
9
DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting
10
Aggregating Feature Point Cloud for Depth Completion
11
PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction
12
Symphonize 3D Semantic Scene Completion with Contextual Instance Queries
13
BEVStereo: Enhancing Depth Estimation in Multi-View 3D Object Detection with Temporal Stereo
14
SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving
15
Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving
16
DETRs Beat YOLOs on Real-time Object Detection
17
OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction
18
SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving
19
OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception
20
OccDepth: A Depth-Aware Method for 3D Semantic Scene Completion
21
LODE: Locally Conditioned Eikonal Implicit Scene Completion from Sparse LiDAR
22
VoxFormer: Sparse Voxel Transformer for Camera-Based 3D Semantic Scene Completion
23
Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction
24
Planning-oriented Autonomous Driving
25
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision
26
BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection
27
PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images
28
BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework
29
BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
30
Neighborhood Attention Transformer
31
M2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation
32
BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
33
PETR: Position Embedding Transformation for Multi-View 3D Object Detection
34
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
35
BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View
36
MonoScene: Monocular 3D Semantic Scene Completion
37
Masked Autoencoders Are Scalable Vision Learners
38
DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries
39
KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D
40
MobileStereoNet: Towards Lightweight Deep Networks for Stereo Matching
41
RigNet: Repetitive Image Guided Network for Depth Completion
42
FIERY: Future Instance Prediction in Bird's-Eye View from Surround Monocular Cameras
43
S3CNet: A Sparse Semantic Scene Completion Network for LiDAR Point Clouds
44
Sparse Single Sweep LiDAR Point Cloud Segmentation via Learning Contextual Shape Priors from Scene Completion
45
AdaBins: Depth Estimation Using Adaptive Bins
46
Deformable DETR: Deformable Transformers for End-to-End Object Detection
47
LMSCNet: Lightweight Multiscale 3D Semantic Completion
48
Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D
49
End-to-End Object Detection with Transformers
50
Anisotropic Convolutional Networks for 3D Semantic Scene Completion
51
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
52
SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences
53
Orthographic Feature Transform for Monocular 3D Object Detection
54
Efficient Semantic Scene Completion Network with Spatial Group Convolution
55
Decoupled Weight Decay Regularization
56
Feature Pyramid Networks for Object Detection
57
Semantic Scene Completion from a Single Depth Image
58
Deep Residual Learning for Image Recognition
59
Are We Ready for Autonomous Driving? The KITTI Vision Benchmark Suite
60
StereoScene: BEV-Assisted Stereo Matching Empowers 3D Semantic Scene Completion
61
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
62
DepthSSC: Depth-Spatial Alignment and Dynamic Voxel Resolution for Monocular 3D Semantic Scene Completion