3260 papers • 126 benchmarks • 313 datasets
This is the task of detecting 3D objects from a single monocular image (as opposed to LiDAR-based methods). It is most commonly studied in the context of autonomous driving. (Image credit: Orthographic Feature Transform for Monocular 3D Object Detection)
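Concretely, a monocular 3D detector must predict a full 3D box (center, dimensions, yaw) from a single image, and the predicted box can be related back to the image through the camera intrinsics. The sketch below, with illustrative KITTI-style conventions and example intrinsic values, shows how a 3D box is projected into pixel coordinates:

```python
import numpy as np

def box3d_corners(center, dims, yaw):
    """Return the 8 corners of a 3D box in camera coordinates.

    center: (x, y, z) box center; dims: (w, h, l); yaw: rotation about
    the camera's vertical (y) axis -- a KITTI-style convention.
    """
    w, h, l = dims
    # Corner offsets in the box's local frame.
    x = np.array([ l,  l, -l, -l,  l,  l, -l, -l]) / 2
    y = np.array([ h,  h,  h,  h, -h, -h, -h, -h]) / 2
    z = np.array([ w, -w, -w,  w,  w, -w, -w,  w]) / 2
    R = np.array([[ np.cos(yaw), 0, np.sin(yaw)],
                  [ 0,           1, 0          ],
                  [-np.sin(yaw), 0, np.cos(yaw)]])
    return (R @ np.vstack([x, y, z])).T + np.asarray(center)

def project(points, K):
    """Project Nx3 camera-frame points with a 3x3 intrinsic matrix K."""
    uvw = points @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

# Example intrinsics (illustrative, roughly KITTI-like focal lengths).
K = np.array([[721.5, 0.0, 609.6],
              [0.0, 721.5, 172.9],
              [0.0,   0.0,   1.0]])
corners = box3d_corners(center=(1.0, 1.5, 20.0), dims=(1.6, 1.5, 3.9), yaw=0.3)
uv = project(corners, K)   # 8x2 pixel coordinates of the box corners
```

Monocular methods differ mainly in how they recover the depth component of `center`, which is not directly observable from a single image.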
These leaderboards are used to track progress in 3D object detection from monocular images.
Use these libraries to find 3D-object-detection-from-monocular-images models and implementations.
This work proposes VoteNet, an end-to-end 3D object detection network based on a synergy of deep point set networks and Hough voting, which achieves state-of-the-art 3D detection on two large datasets of real 3D scans, ScanNet and SUN RGB-D, with a simple design, compact model size, and high efficiency.
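The voting idea can be illustrated outside of any network: each point predicts an offset toward its object's center, and the resulting votes are clustered into candidate detections. The following is a toy NumPy sketch of that grouping step (with synthetic points and known offsets, not the learned VoteNet pipeline):

```python
import numpy as np

def cluster_votes(votes, radius=0.5):
    """Greedy clustering of 3D center votes: each cluster of nearby votes
    becomes one candidate object center (toy stand-in for vote grouping)."""
    remaining = list(range(len(votes)))
    centers = []
    while remaining:
        seed = votes[remaining[0]]
        members = [i for i in remaining
                   if np.linalg.norm(votes[i] - seed) < radius]
        centers.append(votes[members].mean(axis=0))
        remaining = [i for i in remaining if i not in members]
    return np.array(centers)

# Synthetic surface points from two objects; each point "votes" by adding
# a (here: known, lightly noised) offset toward its object's center.
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0, 1.0], [4.0, 0.0, 1.0]])
points = np.concatenate([c + rng.normal(0, 0.3, (50, 3)) for c in true_centers])
offsets = np.concatenate([true_centers[i] - points[50 * i:50 * (i + 1)]
                          for i in range(2)])
votes = points + offsets + rng.normal(0, 0.05, points.shape)
centers = cluster_votes(votes)   # approximately the two true centers
```

In VoteNet itself the offsets are regressed by a deep point set network and the grouped votes are pooled into box proposals; the sketch only shows why voting concentrates evidence at object centers.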
This work quantifies the impact introduced by each sub-task, finds that localization error is the dominant factor restricting monocular 3D detection, and investigates the underlying causes of these localization errors.
The orthographic feature transform is introduced, which escapes the image domain by mapping image-based features into an orthographic 3D space, allowing holistic reasoning about the spatial configuration of the scene in a domain where scale is consistent and distances between objects are meaningful.
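The core of this transform can be sketched as follows: each cell of a ground-plane (bird's-eye) grid is projected into the image through the intrinsics, and the image feature at that pixel is pulled into the cell. This is a simplified nearest-neighbour sketch with made-up intrinsics and grid values; the paper itself pools over each voxel's projected footprint using integral images:

```python
import numpy as np

def orthographic_transform(feat, K, grid_x, grid_z, y=1.0):
    """Pull image features onto a ground-plane (bird's-eye) grid.

    feat: HxWxC image feature map; K: 3x3 intrinsics;
    grid_x / grid_z: 1D arrays of lateral / forward coordinates (metres);
    y: assumed height of the ground plane below the camera.
    """
    H, W, C = feat.shape
    bev = np.zeros((len(grid_z), len(grid_x), C))
    for i, z in enumerate(grid_z):
        for j, x in enumerate(grid_x):
            u, v, w = K @ np.array([x, y, z])
            u, v = int(round(u / w)), int(round(v / w))
            if 0 <= u < W and 0 <= v < H:
                bev[i, j] = feat[v, u]   # nearest-neighbour sampling
    return bev

# Illustrative intrinsics and a random feature map.
K = np.array([[100.0, 0.0, 64.0],
              [0.0, 100.0, 32.0],
              [0.0,   0.0,  1.0]])
feat = np.random.default_rng(0).normal(size=(64, 128, 8))
bev = orthographic_transform(feat, K,
                             grid_x=np.linspace(-10, 10, 40),
                             grid_z=np.linspace(5, 40, 70))
```

Once features live on this metric grid, downstream reasoning (e.g. box regression) operates in a space where object size no longer depends on distance from the camera.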
Experiments on challenging, real-world imagery from ScanNet show that ROCA significantly improves on the state of the art, from 9.5% to 17.6% in retrieval-aware CAD alignment accuracy.
This work proposes MonoDTR, a novel end-to-end depth-aware transformer network for monocular 3D object detection that outperforms previous state-of-the-art monocular-based methods and achieves real-time detection.
This paper introduces the first DETR framework for Monocular DEtection with a depth-guided TRansformer, named MonoDETR, and modifies the vanilla transformer to be depth-aware, guiding the whole detection process with contextual depth cues.
M3D-RPN is able to significantly improve the performance of both monocular 3D Object Detection and Bird's Eye View tasks within the KITTI urban autonomous driving dataset, while efficiently using a shared multi-class model.
This paper proposes Depth EquiVarIAnt NeTwork (DEVIANT), a neural network built with existing scale equivariant steerable blocks that achieves state-of-the-art monocular 3D detection results on KITTI and Waymo datasets in the image-only category and performs competitively to methods using extra information.
GUP Net is proposed to tackle the error amplification problem at both the inference and training stages; it infers more reliable object depth than existing methods and outperforms state-of-the-art image-based monocular 3D detectors.
GrooMeD-NMS addresses the mismatch between training and inference pipelines and, therefore, forces the network to select the best 3D box in a differentiable manner.
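A common way to make box selection differentiable is to replace hard suppression with a smooth score decay. The sketch below is a Gaussian soft-NMS on axis-aligned 2D boxes, which illustrates the general idea but is not GrooMeD-NMS's grouped matrix formulation:

```python
import numpy as np

def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def soft_nms(boxes, scores, sigma=0.5):
    """Gaussian soft-NMS: instead of deleting overlapping boxes, decay
    their scores smoothly, keeping the operation (sub)differentiable."""
    boxes = np.asarray(boxes, float)
    scores = np.asarray(scores, float)
    order = scores.argsort()[::-1]           # process high scores first
    boxes, scores = boxes[order], scores[order].copy()
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            o = iou(boxes[i], boxes[j])
            scores[j] *= np.exp(-o * o / sigma)   # smooth suppression
    return boxes, scores

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
_, new_scores = soft_nms(boxes, scores)
# The heavily overlapping second box is down-weighted, not discarded;
# the distant third box keeps its score.
```

Because scores are decayed rather than zeroed out, a loss applied after this step can propagate gradients through the selection, which is the training/inference mismatch GrooMeD-NMS targets.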