© 2026 Papersgraph. All rights reserved.

computer-vision

3D Object Detection

3260 papers • 126 benchmarks • 313 datasets

3D object detection is a computer vision task in which the goal is to identify objects in a 3D environment and estimate their location, shape, and orientation. It involves detecting the presence of objects and determining where they are in 3D space, often in real time. The task is crucial for applications such as autonomous vehicles, robotics, and augmented reality.
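
As an illustration of what location, shape, and orientation can mean in practice, the sketch below assumes a common 7-parameter box (center, size, and a single yaw angle about the vertical axis) and computes its eight corners. The `Box3D` class and its field names are hypothetical, chosen for this example; individual datasets define their own axis and angle conventions.

```python
import math
from dataclasses import dataclass


@dataclass
class Box3D:
    """Hypothetical 7-parameter 3D box: center, size, and yaw (assumption)."""
    cx: float
    cy: float
    cz: float
    length: float  # extent along the box's own x axis
    width: float   # extent along the box's own y axis
    height: float  # extent along the vertical z axis
    yaw: float     # rotation about the z axis, in radians

    def corners(self):
        """Return the 8 corners as (x, y, z) tuples in world coordinates."""
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        out = []
        for dx in (-0.5, 0.5):
            for dy in (-0.5, 0.5):
                for dz in (-0.5, 0.5):
                    # Corner offset in the box's local frame.
                    lx = dx * self.length
                    ly = dy * self.width
                    lz = dz * self.height
                    # Rotate about z by yaw, then translate to the center.
                    out.append((self.cx + c * lx - s * ly,
                                self.cy + s * lx + c * ly,
                                self.cz + lz))
        return out


# Usage: a car-sized box at the origin, not rotated.
box = Box3D(0.0, 0.0, 0.0, 4.0, 2.0, 1.5, 0.0)
for corner in box.corners():
    print(corner)
```

A single yaw angle is enough for many driving scenes where objects sit on a roughly flat ground plane; a full 3D rotation would need more parameters.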


Benchmarks

These leaderboards are used to track progress in 3D object detection.

nuScenes

SUN-RGBD val

ScanNetV2

Libraries

Use these libraries to find 3D object detection models and implementations.

open-mmlab/mmdetection3d

14 papers 5,463

Datasets

KITTI

nuScenes

ScanNet

NYUv2

S3DIS

SUN RGB-D

Subtasks

Monocular 3D Object Detection

Robust 3D Object Detection

Multiview Detection

3D Object Detection From Stereo Images

Most implemented papers

YOLO9000: Better, Faster, Stronger

Ali Farhadi, Joseph Redmon•Sat Dec 24 2016

YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories, is introduced, along with a method to jointly train on object detection and classification. Its improvements to the YOLO detection method are both novel and drawn from prior work.

17159

KITTI Cars Easy

KITTI Cars Hard

nuScenes Camera Only

KITTI Cyclists Moderate

KITTI Pedestrians Moderate

KITTI Cyclists Easy

KITTI Cyclists Hard

KITTI Cars Easy val

KITTI Cars Moderate val

nuScenes Camera-Radar

KITTI Cars Hard val

View-of-Delft (val)

KITTI Pedestrians Easy

KITTI Pedestrians Hard

DAIR-V2X-I

Waymo vehicle

Rope3D

Waymo Open Dataset

SUN-RGBD

Waymo cyclist

Waymo pedestrian

S3DIS

nuScenes LiDAR only

V2XSet

OPV2V

V2X-SIM

KITTI Pedestrian Easy val

KITTI Pedestrian Moderate val

KITTI Pedestrian Hard val

KITTI Cyclist Easy val

KITTI Cyclist Moderate val

KITTI Cyclist Hard val

ARKitScenes

Aria Everyday Objects

Aria Synthetic Environments

aiMotive Dataset

3D Object Detection on Argoverse2 Camera Only

MultiScan

3RScan

ScanNet++

Waymo all_ns

Argoverse2

ONCE

NYU Depth v2

nuScenes-F

nuScenes-FB

KITTI Pedestrian Hard

KITTI Cyclists Moderate val

KITTI Pedestrians Moderate val

Dense Fog

KITTI Pedestrian Moderate

Heavy Snowfall

Light Snowfall

Clear Weather

KITTI Pedestrian Easy

KITTI Pedestrian

DAIR-V2X

Cityscapes 3D

TruckScenes

Argoverse

IRV2V

PaddlePaddle/Paddle3D

6 papers 585

open-mmlab/OpenPCDet

5 papers 4,797

DerrickXuNu/OpenCOOD

5 papers 688

5 papers 74

KangchengLiu/FAC_Foreground_Aware_C…

4 papers 41

KangchengLiu/RM3D

4 papers 29

isl-org/Open3D-ML

3 papers 1,923

Pointcept/Pointcept

3 papers 1,802

pjlab-adg/3dtrans

3 papers 546

PaddlePaddle/PaddleDetection

2 papers 13,018

Waymo Open Dataset

Argoverse

Argoverse 2

Hypersim

Robust BEV Detection

0

Frustum PointNets for 3D Object Detection from RGB-D Data

C. Qi, L. Guibas, Hao Su, W. Liu, Chenxia Wu•Tue Nov 21 2017

This work directly operates on raw point clouds by popping up RGBD scans and leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall for even small objects.

2466 0
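
The Frustum PointNets summary above describes using a 2D detector to narrow down where to look in the point cloud. A minimal sketch of that frustum idea, assuming a simple pinhole camera and points already expressed in the camera frame (x right, y down, z forward): keep only the points whose image projection falls inside a 2D detection box. The function name, the intrinsics `fx`, `fy`, `cx`, `cy`, and the box format are hypothetical; this is not the authors' implementation.

```python
def points_in_frustum(points, box2d, fx, fy, cx, cy):
    """Keep 3D points whose pinhole projection lies inside a 2D box.

    points: iterable of (x, y, z) in the camera frame (assumption).
    box2d:  (u_min, v_min, u_max, v_max) in pixels (assumption).
    """
    u_min, v_min, u_max, v_max = box2d
    kept = []
    for x, y, z in points:
        if z <= 0:
            # Points behind the camera cannot appear in the image.
            continue
        # Pinhole projection onto the image plane.
        u = fx * x / z + cx
        v = fy * y / z + cy
        if u_min <= u <= u_max and v_min <= v <= v_max:
            kept.append((x, y, z))
    return kept


# Usage with made-up intrinsics and a made-up 2D detection box.
cloud = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 0.0, -1.0)]
subset = points_in_frustum(cloud, (40, 40, 60, 60), 100.0, 100.0, 50.0, 50.0)
print(subset)
```

The retained subset is what a 3D network would then process to localize the object, which is much cheaper than searching the whole scene.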

PointPillars: Fast Encoders for Object Detection From Point Clouds

Holger Caesar, Alex H. Lang, Sourabh Vora, Oscar Beijbom, Lubing Zhou, Jiong Yang•Thu Dec 13 2018

Benchmarks suggest that PointPillars is an appropriate encoding for object detection in point clouds. The paper also proposes a lean downstream network.

4178 0
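
PointPillars organizes a point cloud into vertical columns (pillars) on a ground-plane grid before learning features from them. The sketch below shows only that grouping step, under stated assumptions: the function name, the `cell_size` default, and the grid origin are illustrative choices, and the learned encoder that follows in the paper is not reproduced here.

```python
import math
from collections import defaultdict


def group_into_pillars(points, cell_size=0.16, x_min=0.0, y_min=0.0):
    """Group (x, y, z) points by the x-y grid cell they fall into.

    Each grid cell is one pillar; z is kept but does not affect grouping.
    cell_size, x_min and y_min are illustrative assumptions.
    """
    pillars = defaultdict(list)
    for x, y, z in points:
        ix = math.floor((x - x_min) / cell_size)
        iy = math.floor((y - y_min) / cell_size)
        pillars[(ix, iy)].append((x, y, z))
    return dict(pillars)


# Usage: three points on a 1 m grid.
cloud = [(0.2, 0.3, 1.0), (0.7, 0.9, -0.5), (1.5, 0.1, 0.0)]
for cell, members in group_into_pillars(cloud, cell_size=1.0).items():
    print(cell, len(members))
```

Because the grouping ignores height, the result can be treated as a 2D grid, which is what allows an ordinary 2D convolutional backbone to be used downstream.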

nuScenes: A Multimodal Dataset for Autonomous Driving

Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yuxin Pan, G. Baldan, Oscar Beijbom•Mon Mar 25 2019

Robust detection and tracking of objects is crucial for the deployment of autonomous vehicle technology. Image based benchmark datasets have driven development in computer vision tasks such as object detection, tracking and segmentation of agents in the environment. Most autonomous vehicles, however, carry a combination of cameras and range sensors such as lidar and radar. As machine learning based methods for detection and tracking become more prevalent, there is a need to train and evaluate such methods on datasets containing range sensor data along with images. In this work we present nuTonomy scenes (nuScenes), the first dataset to carry the full autonomous vehicle sensor suite: 6 cameras, 5 radars and 1 lidar, all with full 360 degree field of view. nuScenes comprises 1000 scenes, each 20s long and fully annotated with 3D bounding boxes for 23 classes and 8 attributes. It has 7x as many annotations and 100x as many images as the pioneering KITTI dataset. We define novel 3D detection and tracking metrics. We also provide careful dataset analysis as well as baselines for lidar and image based detection and tracking. Data, development kit and more information are available online.

7012 0

PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud

Hongsheng Li, Xiaogang Wang, Shaoshuai Shi•Mon Dec 10 2018

Extensive experiments on the 3D detection benchmark of KITTI dataset show that the proposed architecture outperforms state-of-the-art methods with remarkable margins by using only point cloud as input.

2764 0

PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection

Xiaogang Wang, Hongsheng Li, Jianping Shi, Li Jiang, Shaoshuai Shi, Chaoxu Guo, Zhe Wang•Mon Dec 30 2019

The proposed PV-RCNN surpasses state-of-the-art 3D detection methods with remarkable margins and deeply integrates both 3D voxel Convolutional Neural Network and PointNet-based set abstraction to learn more discriminative point cloud features.

2129 0

Center-based 3D Object Detection and Tracking

Philipp Krähenbühl, Xingyi Zhou, Tianwei Yin•Thu Jun 18 2020

The framework, CenterPoint, first detects centers of objects using a keypoint detector and regresses to other attributes, including 3D size, 3D orientation, and velocity, and refines these estimates using additional point features on the object.

2043 0
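
The first stage in the CenterPoint summary above, detecting object centers with a keypoint detector, is commonly realized by finding local maxima in a predicted heatmap. A minimal sketch of that peak-picking step, assuming the heatmap is a plain 2D list of scores; the function name and the `threshold` default are hypothetical, and this is not the authors' code.

```python
def heatmap_peaks(heatmap, threshold=0.3):
    """Return (row, col, score) for cells that are local maxima.

    A cell counts as a peak if its score is at least `threshold` and no
    cell in its 3x3 neighbourhood has a strictly higher score. Ties
    between equal neighbours are all kept (a simplification).
    """
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(score >= n for n in neighbours):
                peaks.append((r, c, score))
    return peaks


# Usage: a toy 3x3 heatmap with one clear center.
heat = [
    [0.1, 0.2, 0.1],
    [0.2, 0.9, 0.2],
    [0.1, 0.2, 0.1],
]
print(heatmap_peaks(heat))
```

In a full detector, each peak location would then index into other prediction maps to read off the regressed attributes such as size, orientation, and velocity.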

Deep Hough Voting for 3D Object Detection in Point Clouds

C. Qi, L. Guibas, Kaiming He, O. Litany•Sat Apr 20 2019

This work proposes VoteNet, an end-to-end 3D object detection network based on a synergy of deep point set networks and Hough voting that achieves state-of-the-art 3D detection on two large datasets of real 3D scans, ScanNet and SUN RGB-D with a simple design, compact model size and high efficiency.

1437 0

Objects as Points

Philipp Krähenbühl, Dequan Wang, Xingyi Zhou•Mon Apr 15 2019

The center point based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding box based detectors and performs competitively with sophisticated multi-stage methods and runs in real-time.

3598 0

VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection

Yin Zhou, Oncel Tuzel•Thu Nov 16 2017

VoxelNet is proposed, a generic 3D detection network that unifies feature extraction and bounding box prediction into a single-stage, end-to-end trainable deep network and learns an effective discriminative representation of objects with various geometries, leading to encouraging results in 3D detection of pedestrians and cyclists.

4295 0
