6D Pose Estimation using RGB is the task of determining the six degree-of-freedom (6D) pose of an object, i.e. its 3D position and orientation, from RGB images alone. It is a fundamental problem in computer vision and robotics: given an RGB image of an object in a scene, the goal is to estimate the object's 6D pose, which can then be used for robotic manipulation, augmented reality, and scene reconstruction. (Image credit: Segmentation-driven 6D Object Pose Estimation)
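Concretely, a 6D pose is a rotation R ∈ SO(3) plus a translation t ∈ R³. A minimal NumPy sketch of applying a pose to model points and projecting them through a pinhole camera (the intrinsics and all names are illustrative, not from any particular method):

```python
import numpy as np

def rot_z(a):
    """Rotation about the z-axis by angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def project(K, R, t, X):
    """Project 3D model points X (N,3) into the image under pose (R, t)."""
    Xc = X @ R.T + t                       # model frame -> camera frame
    uv = Xc[:, :2] / Xc[:, 2:3]            # perspective division
    return uv * np.array([K[0, 0], K[1, 1]]) + K[:2, 2]

# Illustrative intrinsics and pose
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
R, t = rot_z(0.3), np.array([0.0, 0.0, 1.0])
X = np.array([[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]])
uv = project(K, R, t, X)                   # 2D pixel locations, shape (2, 2)
```

Estimating (R, t) from a single RGB image is hard precisely because the depth lost in the perspective division must be recovered from appearance cues alone.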
These leaderboards are used to track progress in 6D Pose Estimation using RGB.
Use these libraries to find 6D Pose Estimation using RGB models and implementations.
No subtasks available.
This work introduces PoseCNN, a new convolutional neural network for 6D object pose estimation that is highly robust to occlusions, can handle symmetric objects, and provides accurate pose estimates using only color images as input.
The proposed method is able to robustly estimate the pose and size of unseen object instances in real environments while also achieving state-of-the-art performance on standard 6D pose estimation benchmarks.
The core of the approach is to designate a set of surface points on the target object model as keypoints, train a keypoint detector (KPD) to localize them, and then recover the 6D pose with a PnP algorithm from the resulting 2D-3D keypoint correspondences.
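The keypoints-to-pose step can be illustrated with a plain NumPy Direct Linear Transform standing in for a full PnP solver (in practice one would use something like `cv2.solvePnP`; this is a sketch on synthetic noiseless correspondences, and all names are illustrative):

```python
import itertools
import numpy as np

def dlt_pose(X, x):
    """Recover (R, t) from >=6 non-coplanar 3D keypoints X (N,3) and their
    normalized image projections x (N,2), i.e. pixels premultiplied by K^-1.
    A bare-bones DLT stand-in for a proper PnP solver."""
    rows = []
    for Xw, (u, v) in zip(X, x):
        Xh = np.append(Xw, 1.0)
        rows.append(np.concatenate([Xh, np.zeros(4), -u * Xh]))
        rows.append(np.concatenate([np.zeros(4), Xh, -v * Xh]))
    _, _, Vt = np.linalg.svd(np.array(rows))
    P = Vt[-1].reshape(3, 4)               # null vector ~ [R | t] up to scale
    P /= np.cbrt(np.linalg.det(P[:, :3]))  # fix scale and sign
    U, _, Vt2 = np.linalg.svd(P[:, :3])    # project rotation block onto SO(3)
    return U @ Vt2, P[:, 3]

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Synthetic check: 8 cube-corner "keypoints" under a known pose
R_true, t_true = rot_z(0.4), np.array([0.1, -0.2, 4.0])
X = np.array(list(itertools.product([-0.5, 0.5], repeat=3)))
Xc = X @ R_true.T + t_true
x = Xc[:, :2] / Xc[:, 2:3]                 # normalized image coordinates
R_est, t_est = dlt_pose(X, x)
```

With exact correspondences the recovered pose matches the ground truth; learned keypoint detectors instead feed noisy 2D locations into a robust PnP variant (e.g. with RANSAC).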
A single-shot approach for simultaneously detecting an object in an RGB image and predicting its 6D pose without requiring multiple stages or having to examine multiple hypotheses is proposed, which substantially outperforms other recent CNN-based approaches when they are all used without postprocessing.
This paper introduces a segmentation-driven 6D pose estimation framework where each visible part of the objects contributes a local pose prediction in the form of 2D keypoint locations and uses a predicted measure of confidence to combine these pose candidates into a robust set of 3D-to-2D correspondences.
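The confidence-weighted combination of per-part predictions can be sketched as a weighted average of keypoint candidates (a simplification of the paper's candidate-combination step; all names are illustrative):

```python
import numpy as np

def fuse_keypoints(candidates, confidences):
    """candidates: (P, K, 2) 2D keypoint predictions from P visible parts;
    confidences: (P, K) predicted reliability of each prediction.
    Returns (K, 2) fused 2D keypoints to feed the PnP stage."""
    w = confidences[..., None]
    return (candidates * w).sum(axis=0) / w.sum(axis=0)

# Two parts predict the same keypoint; the confident part dominates
cands = np.array([[[10.0, 10.0]], [[20.0, 20.0]]])
confs = np.array([[3.0], [1.0]])
fused = fuse_keypoints(cands, confs)   # -> [[12.5, 12.5]]
```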
A Pixel-wise Voting Network (PVNet) is introduced to regress pixel-wise vectors pointing to the keypoints and use these vectors to vote for keypoint locations, which creates a flexible representation for localizing occluded or truncated keypoints.
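The voting idea can be sketched in a few lines: each pixel casts a ray along its predicted vector, pairs of rays propose keypoint hypotheses, and the hypothesis that the most directions agree with wins. This is a simplified RANSAC illustration, not PVNet's actual implementation:

```python
import numpy as np

def vote_keypoint(pixels, dirs, iters=50, cos_thresh=0.999, rng=None):
    """pixels: (N,2) pixel coordinates; dirs: (N,2) unit vectors predicted
    to point from each pixel toward the keypoint. Returns best hypothesis."""
    if rng is None:
        rng = np.random.default_rng(0)
    best, best_score = None, -1
    for _ in range(iters):
        i, j = rng.choice(len(pixels), size=2, replace=False)
        # Intersect the two rays p_i + s*d_i and p_j + u*d_j
        A = np.column_stack([dirs[i], -dirs[j]])
        if abs(np.linalg.det(A)) < 1e-8:          # near-parallel pair
            continue
        s, _ = np.linalg.solve(A, pixels[j] - pixels[i])
        h = pixels[i] + s * dirs[i]
        # Score: count pixels whose vectors point (almost) exactly at h
        to_h = h - pixels
        to_h /= np.linalg.norm(to_h, axis=1, keepdims=True) + 1e-12
        score = np.sum(np.sum(to_h * dirs, axis=1) > cos_thresh)
        if score > best_score:
            best, best_score = h, score
    return best

# Synthetic vote: 100 pixels with exact directions toward (40, 60)
rng = np.random.default_rng(1)
pixels = rng.uniform(0, 128, size=(100, 2))
kp = np.array([40.0, 60.0])
dirs = kp - pixels
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
est = vote_keypoint(pixels, dirs)
```

The representation is robust to occlusion because pixels on any visible fragment of the object still vote, even when the keypoint itself is hidden or outside the image.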
Although the top-performing methods rely on RGB-D image channels, strong results were achieved using only RGB channels at both training and test time, and the photorealism of PBR images proved effective despite the applied augmentation.
A novel pose estimation method is proposed that predicts the 3D coordinates of each object pixel without requiring textured models, along with a novel loss function, the transformer loss, which handles symmetric objects by guiding predictions to the closest symmetric pose.
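The idea of guiding predictions toward the nearest symmetric pose can be sketched as a minimum over symmetry-equivalent ground-truth poses of a point distance, in the spirit of the transformer loss and of ADD-S metrics. This is a hedged illustration, not a faithful reimplementation, and all names are invented:

```python
import numpy as np

def sym_aware_loss(R_pred, t_pred, R_gt, t_gt, model_pts, sym_rots):
    """Mean point distance to the closest symmetry-equivalent ground-truth
    pose. sym_rots lists rotations S with S(model) == model."""
    pred = model_pts @ R_pred.T + t_pred
    losses = []
    for S in sym_rots:
        gt = model_pts @ (R_gt @ S).T + t_gt   # equivalent GT pose
        losses.append(np.mean(np.linalg.norm(pred - gt, axis=1)))
    return min(losses)

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# A square plate in the xy-plane has a 4-fold symmetry about z
model_pts = np.array([[0.5, 0.5, 0], [-0.5, 0.5, 0],
                      [-0.5, -0.5, 0], [0.5, -0.5, 0]], dtype=float)
syms = [rot_z(k * np.pi / 2) for k in range(4)]
R_gt, t_gt = np.eye(3), np.array([0.0, 0.0, 1.0])
# A prediction rotated by 90 degrees is "wrong" under a naive loss
# but correct up to symmetry, so the symmetry-aware loss is ~0
loss = sym_aware_loss(rot_z(np.pi / 2), t_gt, R_gt, t_gt, model_pts, syms)
```

Without the minimum over symmetries, the 90-degree-rotated prediction would be penalized even though it is visually indistinguishable from the ground truth.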
This work evaluates PointFusion on two distinctive datasets: the KITTI dataset that features driving scenes captured with a lidar-camera setup, and the SUN-RGBD dataset that captures indoor environments with RGB-D cameras.
We introduce HybridPose, a novel 6D object pose estimation approach. HybridPose utilizes a hybrid intermediate representation to express different geometric information in the input image, including keypoints, edge vectors, and symmetry correspondences. Compared to a unitary representation, our hybrid representation allows pose regression to exploit more and diverse features when one type of predicted representation is inaccurate (e.g., because of occlusion). The different intermediate representations used by HybridPose can all be predicted by the same simple neural network, and outliers in the predicted intermediate representations are filtered by a robust regression module. Compared to state-of-the-art pose estimation approaches, HybridPose is comparable in running time and is significantly more accurate. For example, on the Occlusion Linemod dataset, our method achieves a prediction speed of 30 fps with a mean ADD(-S) accuracy of 79.2%, representing a 67.4% improvement over the current state-of-the-art approach.