3260 papers • 126 benchmarks • 313 datasets
Hand pose estimation is the task of finding the joints of the hand from an image or set of video frames. (Image credit: Pose-REN)
These leaderboards are used to track progress in Hand Pose Estimation.
Use these libraries to find Hand Pose Estimation models and implementations.
A deep network is proposed that learns a network-implicit 3D articulation prior, yielding good estimates of the 3D pose from regular RGB images; a large-scale 3D hand pose dataset based on synthetic hand models is also introduced.
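The lifting idea lends itself to a compact sketch. Below is a minimal, hypothetical PyTorch lifting network (not the paper's exact architecture) that maps detected 2D keypoints to canonical 3D joint coordinates, playing the role of a learned articulation prior; the layer sizes are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact architecture): an MLP that lifts
# detected 2D keypoints to canonical 3D joint coordinates, acting as a
# learned articulation prior. Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

NUM_JOINTS = 21  # standard 21-joint hand skeleton

class LiftingPrior(nn.Module):
    def __init__(self, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_JOINTS * 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, NUM_JOINTS * 3),  # canonical 3D coordinates
        )

    def forward(self, kp2d):                     # kp2d: (B, 21, 2)
        return self.net(kp2d.flatten(1)).view(-1, NUM_JOINTS, 3)

kp2d = torch.rand(8, NUM_JOINTS, 2)              # normalized 2D detections
pose3d = LiftingPrior()(kp2d)                    # (8, 21, 3)
```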
This work develops a method for simulated+unsupervised (S+U) learning that uses an adversarial network similar to Generative Adversarial Networks (GANs), but with synthetic images as inputs instead of random vectors, and makes several key modifications to the standard GAN algorithm to preserve annotations, avoid artifacts, and stabilize training.
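The core of the S+U objective is easy to sketch: the refiner must fool a discriminator while an L1 self-regularization term keeps refined images close to their synthetic inputs so their annotations stay valid. The toy networks and the weight `lambda_reg` below are assumptions, not the paper's configuration.

```python
# Hedged sketch of the S+U learning idea: a refiner makes synthetic images
# look real while an L1 self-regularization term keeps the refined image
# close to the input so its annotations remain valid.
import torch
import torch.nn as nn

refiner = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(64, 1, 3, padding=1))           # toy refiner
discriminator = nn.Sequential(nn.Conv2d(1, 64, 3, stride=2), nn.ReLU(),
                              nn.Flatten(), nn.LazyLinear(1))     # toy critic
bce = nn.BCEWithLogitsLoss()
lambda_reg = 0.1  # assumed weight for the self-regularization term

synthetic = torch.rand(4, 1, 64, 64)              # synthetic input images
refined = refiner(synthetic)

# Refiner objective: fool the discriminator + stay close to the synthetic input.
adv = bce(discriminator(refined), torch.ones(4, 1))
reg = (refined - synthetic).abs().mean()          # preserves annotations
refiner_loss = adv + lambda_reg * reg
```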
This model is designed as a 3D CNN that provides accurate estimates while running in real time; it outperforms previous methods on almost all publicly available 3D hand and human pose estimation datasets and placed first in the HANDS 2017 frame-based 3D hand pose estimation challenge.
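A hedged sketch of the voxel-based approach: a voxelized depth map goes into a small 3D CNN that emits one 3D heatmap per joint, and joint locations are read off as heatmap maxima. Channel counts and grid size are illustrative assumptions.

```python
# Minimal voxel-in, per-joint-3D-heatmap-out 3D CNN; sizes are assumptions.
import torch
import torch.nn as nn

NUM_JOINTS = 21

net = nn.Sequential(
    nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
    nn.Conv3d(32, NUM_JOINTS, 1),          # one 3D heatmap per joint
)

voxels = torch.rand(2, 1, 32, 32, 32)      # occupancy grid from a depth map
heatmaps = net(voxels)                      # (2, 21, 32, 32, 32)
flat = heatmaps.flatten(2).argmax(-1)       # index of each heatmap's maximum
zi = flat // (32 * 32)                      # recover voxel coordinates
yi = (flat // 32) % 32
xi = flat % 32
joints = torch.stack([zi, yi, xi], dim=-1)  # (2, 21, 3) voxel coordinates
```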
We propose a method for annotating images of a hand manipulating an object with the 3D poses of both the hand and the object, together with a dataset created using this method. Our motivation is the current lack of annotated real images for this problem, as estimating the 3D poses is challenging, mostly because of the mutual occlusions between the hand and the object. To tackle this challenge, we capture sequences with one or several RGB-D cameras and jointly optimize the 3D hand and object poses over all the frames simultaneously. This method allows us to automatically annotate each frame with accurate estimates of the poses, despite large mutual occlusions. With this method, we created HO-3D, the first markerless dataset of color images with 3D annotations for both the hand and object. This dataset is currently made of 77,558 frames, 68 sequences, 10 persons, and 10 objects. Using our dataset, we develop a single RGB image-based method to predict the hand pose when interacting with objects under severe occlusions and show it generalizes to objects not seen in the dataset.
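A conceptual sketch of the all-frames optimization, under heavy assumptions: hand and object pose parameters for every frame are free variables, optimized jointly against a per-frame data term plus temporal smoothness. `data_term` is a stand-in for the real RGB-D fitting cost, and the parameterization and weights are hypothetical.

```python
# Conceptual sketch: optimize hand and object poses over ALL frames at once.
import torch

T = 100                                              # number of frames
hand_pose = torch.zeros(T, 51, requires_grad=True)   # e.g. MANO-style params
obj_pose = torch.zeros(T, 6, requires_grad=True)     # object rotation+translation

def data_term(hand, obj):
    # Placeholder: in the real method this measures agreement with the
    # RGB-D observations (depth residuals, silhouettes, keypoints).
    return (hand ** 2).mean() + (obj ** 2).mean()

opt = torch.optim.Adam([hand_pose, obj_pose], lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    smooth = ((hand_pose[1:] - hand_pose[:-1]) ** 2).mean() \
           + ((obj_pose[1:] - obj_pose[:-1]) ** 2).mean()
    loss = data_term(hand_pose, obj_pose) + 10.0 * smooth  # assumed weight
    loss.backward()
    opt.step()
```

Optimizing all frames jointly is what lets temporal smoothness propagate pose evidence into heavily occluded frames.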
This work introduces a simple and effective network architecture for monocular 3D hand pose estimation consisting of an image encoder followed by a mesh convolutional decoder that is trained through a direct 3D hand mesh reconstruction loss.
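The training signal reduces to a direct per-vertex loss, sketched below. A plain MLP decoder stands in for the paper's mesh convolutional decoder; the sizes are assumptions, though 778 is the vertex count of the widely used MANO hand mesh.

```python
# Hedged sketch: encoder -> latent code -> vertex decoder, trained with a
# direct per-vertex distance to the ground-truth mesh.
import torch
import torch.nn as nn

NUM_VERTS = 778   # vertex count of the common MANO hand mesh

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU())
decoder = nn.Linear(256, NUM_VERTS * 3)   # stand-in for the mesh conv decoder

image = torch.rand(4, 3, 64, 64)
pred_verts = decoder(encoder(image)).view(4, NUM_VERTS, 3)
gt_verts = torch.rand(4, NUM_VERTS, 3)    # ground-truth mesh vertices

mesh_loss = (pred_verts - gt_verts).abs().mean()  # direct 3D mesh loss
```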
With simple improvements, namely adding ResNet layers, data augmentation, and better initial hand localization, DeepPrior achieves performance better than or similar to more sophisticated recent methods on the three main benchmarks (NYU, ICVL, MSRA) while keeping the simplicity of the original method.
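The "add ResNet layers" improvement can be illustrated with a small residual regressor over a depth crop; the depths, widths, and crop size below are assumptions, not the exact DeepPrior++ configuration.

```python
# Illustrative residual block inside a depth-crop-to-joints regressor.
import torch
import torch.nn as nn

NUM_JOINTS = 21

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(x + self.conv2(self.act(self.conv1(x))))

net = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    ResBlock(32), ResBlock(32),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, NUM_JOINTS * 3),           # direct 3D joint regression
)

depth_crop = torch.rand(4, 1, 96, 96)        # localized hand crop from depth
joints = net(depth_crop).view(4, NUM_JOINTS, 3)
```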
A method to learn representations that are highly specific to articulated poses without the need for labeled training data; it consistently surpasses the performance of its fully supervised counterpart while reducing the number of labeled samples needed by at least one order of magnitude.
Qualitative experiments show that the HAMR framework is capable of recovering appealing 3D hand meshes even in the presence of severe occlusions, and it outperforms state-of-the-art methods for both 2D and 3D hand pose estimation from a monocular RGB image on several benchmark datasets.
This work proposes a Graph Convolutional Neural Network (Graph CNN)-based method to reconstruct a full 3D mesh of the hand surface, which contains richer information about both 3D hand shape and pose, and proposes a weakly-supervised approach that leverages the depth map as weak supervision during training.
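A minimal sketch of the graph-convolution building block: vertex features are mixed along mesh edges through a symmetrically normalized adjacency matrix, as in a basic GCN layer. The 4-vertex toy graph and feature sizes are illustrative assumptions.

```python
# Basic GCN layer over mesh vertices; toy connectivity and sizes assumed.
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)
        a = adj + torch.eye(adj.size(0))              # add self-loops
        d = a.sum(1).rsqrt()                          # D^{-1/2}
        self.register_buffer("norm_adj", d[:, None] * a * d[None, :])

    def forward(self, x):                             # x: (B, V, in_dim)
        return torch.relu(self.lin(self.norm_adj @ x))

adj = torch.tensor([[0, 1, 0, 1],                     # toy 4-vertex mesh
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [1, 0, 1, 0]], dtype=torch.float)
hidden = GraphConv(16, 32, adj)                       # hidden graph-conv layer
to_coords = nn.Linear(32, 3)                          # per-vertex 3D output
verts = to_coords(hidden(torch.rand(2, 4, 16)))       # (2, 4, 3)
```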