3260 papers • 126 benchmarks • 313 datasets
Image credit: GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and Scene-aware Supervision, ECCV'20
(Image credit: Papersgraph)
This paper proposes a network that maintains high-resolution representations through the whole process of human pose estimation, and empirically demonstrates its effectiveness with superior pose estimation results on two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset.
This work takes an integrated approach that fuses probabilistic knowledge of 3D human pose with a multi-stage CNN architecture, using knowledge of plausible 3D landmark locations to refine the search for better 2D locations.
BlazePose is a lightweight convolutional neural network architecture for human pose estimation, tailored for real-time inference on mobile devices, that uses both heatmaps and regression to keypoint coordinates.
The results indicate that a large portion of the error of modern deep 3D pose estimation systems stems from their visual analysis, and suggest directions to further advance the state of the art in 3D human pose estimation.
This work proposes a new single-shot method for multi-person 3D pose estimation in general scenes from a monocular RGB camera. It uses novel occlusion-robust pose-maps (ORPM), which enable full-body pose inference even under strong partial occlusions by other people and objects in the scene.
The OC20 dataset is developed, consisting of 1,281,121 Density Functional Theory relaxations across a wide swath of materials, surfaces, and adsorbates, and three state-of-the-art graph neural network models are applied to each of these tasks as baseline demonstrations for the community to build on.
This paper addresses 3D pose estimation for multiple people from a few calibrated camera views: a multi-way matching algorithm clusters the detected 2D poses across all views, and geometric and appearance cues are combined for cross-view matching.
This work trains a convolutional network to predict both the shape and the pose from a single image by minimizing the reprojection error, and introduces an ensemble of pose predictors which are distilled into a single "student" model.
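The reprojection error mentioned in the summary above can be illustrated with a minimal sketch. It assumes a simple pinhole camera with a known focal length and principal point and 3D points already expressed in the camera frame; the function names `project` and `reprojection_error` and the choice of a mean Euclidean distance are illustrative assumptions, not the paper's actual camera model or loss.

```python
import math


def project(point_3d, focal, center):
    """Pinhole projection of a camera-frame 3D point (X, Y, Z) to pixel coordinates.

    Assumption: a single focal length and a principal point `center` = (cx, cy).
    """
    X, Y, Z = point_3d
    return (focal * X / Z + center[0], focal * Y / Z + center[1])


def reprojection_error(points_3d, keypoints_2d, focal, center):
    """Mean Euclidean distance between projected 3D points and observed 2D keypoints."""
    dists = []
    for p3, (u, v) in zip(points_3d, keypoints_2d):
        pu, pv = project(p3, focal, center)
        dists.append(math.hypot(pu - u, pv - v))
    return sum(dists) / len(dists)


if __name__ == "__main__":
    # Hypothetical predicted 3D points and annotated 2D keypoints.
    predicted = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
    observed = [(50.0, 50.0), (103.0, 54.0)]
    print(reprojection_error(predicted, observed, focal=100.0, center=(50.0, 50.0)))
```

A training loop would minimize a quantity of this kind with respect to the predicted shape and pose, so only 2D annotations are needed as supervision.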
This work defines a novel temporal network architecture with a self-attention mechanism and shows that adversarial training, at the sequence level, produces kinematically plausible motion sequences without in-the-wild ground-truth 3D labels.
This work shows that a simple integral operation relates and unifies the heat map representation and joint regression. The resulting approach avoids the issues of the heat map representation and is differentiable, efficient, and compatible with any heat-map-based method.
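The integral operation described above is commonly realized as a "soft-argmax": the joint location is taken as the expectation of pixel coordinates under a normalized heat map, rather than the location of the maximum. The following is a minimal sketch in plain Python, assuming a 2D heat map stored as `heatmap[y][x]` and softmax normalization; the function names and the data layout are illustrative assumptions, not the paper's implementation.

```python
import math


def soft_argmax_2d(heatmap):
    """Return the expected (x, y) location under a softmax-normalized heat map."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(max(row) for row in heatmap)
    weights = [[math.exp(v - peak) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)

    # The "integral": a probability-weighted sum over all pixel coordinates.
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y


def hard_argmax_2d(heatmap):
    """Conventional decoding: integer location of the maximum (not differentiable)."""
    best = max(
        ((v, col, r) for r, row in enumerate(heatmap) for col, v in enumerate(row)),
        key=lambda t: t[0],
    )
    return best[1], best[2]


if __name__ == "__main__":
    # Two equally strong responses in neighboring columns of the middle row.
    heatmap = [
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 5.0, 5.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    print(hard_argmax_2d(heatmap))  # integer pixel location
    print(soft_argmax_2d(heatmap))  # sub-pixel location between the two peaks
```

Because the soft version is a smooth function of the heat map values, gradients of a coordinate loss can flow back into the heat map, and the output is not restricted to integer pixel positions.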