computer-vision-7

Vehicle Pose Estimation

3260 papers • 126 benchmarks • 313 datasets

Image Credit: GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and Scene-aware Supervision, ECCV'20

(Image credit: Papersgraph)

Benchmarks

These leaderboards are used to track progress in vehicle-pose-estimation-7

Trend

Dataset

Best Model

Actions

KITTI Cars Hard

KITTI

CarFusion

Libraries

i

Use these libraries to find vehicle-pose-estimation-7 models and implementations

Datasets

Subtasks

No subtasks available.

Most implemented papers

3D Bounding Box Estimation Using Deep Learning and Geometry

J. Kosecka, A. Mousavian, John Flynn, Dragomir Anguelov•Wed Nov 30 2016

Although conceptually simple, this method outperforms more complex and computationally expensive approaches that leverage semantic segmentation, instance level segmentation and flat ground priors and produces state of the art results for 3D viewpoint estimation on the Pascal 3D+ dataset.

1065

Content

ApolloCar3D

0

Paper Graph

M3D-RPN: Monocular 3D Region Proposal Network for Object Detection

Xiaoming Liu, Garrick Brazil•Fri Jul 12 2019

M3D-RPN is able to significantly improve the performance of both monocular 3D Object Detection and Bird's Eye View tasks within the KITTI urban autonomous driving dataset, while efficiently using a shared multi-class model.

521 0

Paper Graph

Learning Depth-Guided Convolutions for Monocular 3D Object Detection

P. Luo, Jianping Shi, Hongwei Yi, Mingyu Ding, Yuqi Huo, Zhiwu Lu, Zhe Wang•Mon Dec 09 2019

D4LCN overcomes the limitation of conventional 2D convolutions and narrows the gap between image representation and 3D point cloud representation, where the filters and their receptive fields can be automatically learned from image-based depth maps.

355 0

Paper Graph

RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving

Peixuan Li, Huaici Zhao, Pengfei Liu, Feidao Cao•Thu Jan 09 2020

This work proposes an efficient and accurate monocular 3D detection framework in single shot that achieves state-of-the-art performance on the KITTI benchmark and predicts the nine perspective keypoints of a 3D bounding box in image space, and utilizes the geometric relationship of 3D and 2D perspectives.

352 0

Paper Graph

Subcategory-Aware Convolutional Neural Networks for Object Proposals and Detection

S. Savarese, Yu Xiang, Yuanqing Lin, Wongun Choi•Fri Apr 15 2016

A novel region proposal network that uses subcategory information to guide the proposal generating process, and a new detection network for joint detection and subcategory classification that achieves state of-the-art performance on both detection and pose estimation on commonly used benchmarks are introduced.

289 0

Paper Graph

Kinematic 3D Object Detection in Monocular Video

Xiaoming Liu, B. Schiele, Gerard Pons-Moll, Garrick Brazil•Sat Jul 18 2020

This work proposes a novel method for monocular video-based 3D object detection which carefully leverages kinematic motion to improve precision of 3D localization and achieves state-of-the-art performance on monocular 3Dobject detection and the Bird's Eye View tasks within the KITTI self-driving dataset.

200 0

Paper Graph

BoxCars: Improving Fine-Grained Recognition of Vehicles Using 3-D Bounding Boxes in Traffic Surveillance

A. Herout, Jakub Sochor, Jakub Špaňhel•Wed Mar 01 2017

The proposed method significantly improves CNN classification accuracy and outperforms the state-of-the-art methods for fine-grained recognition.

135 0

Paper Graph

Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction

Steven L. Waslander, Jason Ku, Alex D. Pon•Mon Apr 01 2019

MonopolyPSR, a monocular 3D object detection method that leverages proposals and shape reconstruction, is presented and a novel projection alignment loss is devised to jointly optimize these tasks in the neural network to improve 3D localization accuracy.

266 0

Paper Graph

Occlusion-Net: 2D/3D Occluded Keypoint Localization Using Graph Networks

Minh Vo, S. Narasimhan, Dinesh Reddy Narapureddy•Fri May 31 2019

We present Occlusion-Net, a framework to predict 2D and 3D locations of occluded keypoints for objects, in a largely self-supervised manner. We use an off-the-shelf detector as input (like MaskRCNN) that is trained only on visible key point annotations. This is the only supervision used in this work. A graph encoder network then explicitly classifies invisible edges and a graph decoder network corrects the occluded keypoint locations from the initial detector. Central to this work is a trifocal tensor loss that provides indirect self-supervision for occluded keypoint locations that are visible in other views of the object. The 2D keypoints are then passed into a 3D graph network that estimates the 3D shape and camera pose using the self-supervised re-projection loss. At test time, our approach successfully localizes keypoints in a single view under a diverse set of severe occlusion settings. We demonstrate and evaluate our approach on synthetic CAD data as well as a large image set capturing vehicles at many busy city intersections. As an interesting aside, we compare the accuracy of human labels of invisible keypoints against those obtained from geometric trifocal-tensor loss.

59 0

Paper Graph

6D-VNet: End-To-End 6DoF Vehicle Pose Estimation From Monocular RGB Images

Xia Li, Di Wu, Zhaoyong Zhuang, Canqun Xiang, Wenbin Zou•Fri May 31 2019

We present a conceptually simple framework for 6DoF object pose estimation, especially for autonomous driving scenario. Our approach efficiently detects traffic participants in a monocular RGB image while simultaneously regressing their 3D translation and rotation vectors. The method, called 6D-VNet, extends Mask R-CNN by adding customised heads for predicting vehicle's finer class, rotation and translation. The proposed 6D-VNet is trained end-to-end compared to previous methods. Furthermore, we show that the inclusion of translational regression in the joint losses is crucial for the 6DoF pose estimation task, where object translation distance along longitudinal axis varies significantly, e.g., in autonomous driving scenarios. Additionally, we incorporate the mutual information between traffic participants via a modified non-local block. As opposed to the original non-local block implementation, the proposed weighting modification takes the spatial neighbouring information into consideration whilst counteracting the effect of extreme gradient values. Our 6D-VNet reaches the 1 st place in ApolloScape challenge 3D Car Instance task. Code has been made available at: https://github.com/stevenwudi/6DVNET .

47 0

Paper Graph

Adding a benchmark result helps the community track progress.