6D pose estimation of hand and object
These leaderboards are used to track progress in 6D pose estimation of hand and object.
We propose a method for annotating images of a hand manipulating an object with the 3D poses of both the hand and the object, together with a dataset created using this method. Our motivation is the current lack of annotated real images for this problem, as estimating the 3D poses is challenging, mostly because of the mutual occlusions between the hand and the object. To tackle this challenge, we capture sequences with one or several RGB-D cameras and jointly optimize the 3D hand and object poses over all frames simultaneously. This lets us automatically annotate each frame with accurate pose estimates despite large mutual occlusions. With this method, we created HO-3D, the first markerless dataset of color images with 3D annotations for both the hand and the object. The dataset currently comprises 77,558 frames, 68 sequences, 10 persons, and 10 objects. Using our dataset, we develop a method that predicts the hand pose from a single RGB image while interacting with objects under severe occlusions, and show that it generalizes to objects not seen in the dataset.
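The key idea of the annotation method above — optimizing the poses of all frames jointly, so occluded frames borrow evidence from their neighbors — can be sketched in miniature. The cost terms, the translation-only "pose", and the weights below are illustrative assumptions, not the HO-3D formulation:

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(poses_flat, observations, smooth_weight=1.0):
    # Toy translation-only "poses", one 3-vector per frame.
    poses = poses_flat.reshape(-1, 3)
    # Data term: each pose should match its frame's observation;
    # occluded frames (None) contribute no data residual.
    data = [poses[t] - obs for t, obs in enumerate(observations)
            if obs is not None]
    # Smoothness term: consecutive poses should change slowly,
    # which is what propagates information into occluded frames.
    smooth = smooth_weight * np.diff(poses, axis=0)
    return np.concatenate([np.concatenate(data), smooth.ravel()])

# Toy sequence: frame 2 is "occluded" (no observation), yet its pose
# is recovered from its neighbours by the joint optimization.
obs = [np.array([0., 0, 0]), np.array([1., 0, 0]),
       None, np.array([3., 0, 0])]
sol = least_squares(residuals, np.zeros(4 * 3), args=(obs,))
poses = sol.x.reshape(-1, 3)
```

Optimizing each frame independently would leave frame 2 undetermined; the temporal term is what makes the joint solve well-posed.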
This work presents an end-to-end learnable model that exploits a novel contact loss favoring physically plausible hand-object constellations, and improves grasp quality metrics over baselines, using RGB images as input.
This work proposes ArtiBoost, a lightweight online data-enhancement method that performs data exploration and synthesis within a learning pipeline; the synthetic data are blended with real-world source data for training.
A depth-based framework for robust pose estimation with short response times for adaptive hands, demonstrating the accuracy and computational efficiency of the framework when applied to challenging, highly occluded scenarios for different object types.
A lightweight model called HOPE-Net is proposed which jointly estimates hand and object pose in 2D and 3D in real time, and could be applied to other 3D landmark detection problems where it is possible to first predict the 2D keypoints and then transform them to 3D.
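The two-stage recipe described above (predict 2D keypoints, then lift to 3D) can be illustrated with the simplest possible lifting: pinhole back-projection given per-keypoint depths. Models like HOPE-Net learn this mapping instead (e.g. with graph convolutions); the intrinsics and values below are hypothetical:

```python
import numpy as np

def backproject(keypoints_2d, depths, fx, fy, cx, cy):
    """Lift 2D pixel keypoints to 3D camera coordinates (pinhole model)."""
    u, v = keypoints_2d[:, 0], keypoints_2d[:, 1]
    x = (u - cx) / fx * depths
    y = (v - cy) / fy * depths
    return np.stack([x, y, depths], axis=1)

# Toy intrinsics; two keypoints at 0.5 m depth.
kps = np.array([[320., 240.], [420., 240.]])
pts3d = backproject(kps, np.array([0.5, 0.5]),
                    fx=500., fy=500., cx=320., cy=240.)
# The first keypoint sits at the principal point, so it lies on the
# optical axis: (0, 0, 0.5).
```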
This work proposes a robust and accurate method for estimating the 3D poses of two hands in close interaction from a single color image by relying on a CNN to first localize joints as 2D keypoints, and on self-attention between the CNN features at these keypoints to associate them with the corresponding hand joint.
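The association step in the summary above — self-attention between CNN features sampled at detected keypoints — can be sketched as plain scaled dot-product attention. The single head, shapes, and random features are assumptions for illustration, not the paper's architecture:

```python
import numpy as np

def self_attention(feats):
    """Single-head scaled dot-product self-attention over keypoint features."""
    d = feats.shape[-1]
    scores = feats @ feats.T / np.sqrt(d)           # pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ feats                          # attended features

rng = np.random.default_rng(0)
feats = rng.standard_normal((21, 32))  # e.g. 21 hand keypoints, 32-d features
out = self_attention(feats)
```

Each output row mixes features from all detected keypoints, which is what lets the network decide which hand (and which joint) a 2D detection belongs to.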
This work proposes a learning-free fitting approach for hand-object reconstruction which can seamlessly handle two-hand object interactions and shows that it can be applied to datasets with varying levels of difficulty for which training data is unavailable.
This work aims to improve SDF models using priors provided by parametric representations and proposes a joint learning framework that disentangles the pose and the shape and shows that such aligned SDFs better focus on reconstructing shape details and improve reconstruction accuracy both for hands and objects.
TEG-Track is introduced, a tactile-enhanced 6D pose tracking system that can track previously unseen objects held in hand that consistently enhances state-of-the-art generalizable 6D pose trackers in synthetic and real-world scenarios.
A novel benchmark for object-group distribution shifts in hand and object pose regression for object grasping is proposed, testing the hypothesis that meta-learning a baseline pose-regression neural network can adapt to these shifts and generalize better to unknown objects.