3260 papers • 126 benchmarks • 313 datasets
These leaderboards are used to track progress in 3d-multi-person-mesh-recovery
ROMP is the first real-time implementation of monocular multi-person 3D mesh regression, and achieves superior performance on the challenging multi-person benchmarks, including 3DPW and CMU Panoptic.
ExPose (EXpressive POse and Shape rEgression) directly regresses the body, face, and hands, in SMPL-X format, from an RGB image, and estimates expressive 3D humans more accurately than existing optimization methods at a small fraction of the computational cost.
This work introduces AGORA, a synthetic dataset with high realism and highly accurate ground truth, and evaluates existing state-of-the-art methods for 3D human pose estimation on this dataset, finding that most methods perform poorly on images of children.
This work uses the new method, SMPLify-X, to fit SMPL-X to both controlled images and images in the wild, and evaluates 3D accuracy on a new curated dataset comprising 100 images with pseudo ground-truth.
This work devises a novel optimization scheme that learns the appropriate body scale and relative camera pose, by enforcing the feet of all people to remain on the ground floor, and is able to robustly estimate the body translation and shape of multiple people while retrieving their spatial arrangement.
This work proposes OSX, a one-stage pipeline for expressive whole-body mesh recovery that needs no separate networks for each part, built around a Component Aware Transformer (CAT) composed of a global body encoder and a local face/hand decoder.
A novel hypergraph relational reasoning network is proposed to model the complex, high-order relations among individuals and groups in a crowd, producing accurate absolute body poses and shapes in large-scale crowded scenes.
This work investigates scaling up expressive human pose and shape estimation (EHPS) towards the first generalist foundation model (dubbed SMPLer-X), with backbones up to ViT-Huge and training on up to 4.5M instances from diverse data sources.
This work presents Multi-HMR, a strong single-shot model for multi-person 3D human mesh recovery from a single RGB image, and introduces CUFFS, the Close-Up Frames of Full-Body Subjects dataset, containing humans close to the camera with diverse hand poses.
PIXIE produces animatable, whole-body 3D avatars with realistic facial detail from a single image, and is shown to estimate whole-body shape and detailed face shape more accurately than the state of the art.
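One of the summaries above describes resolving body scale and translation by enforcing that everyone's feet stay on the ground. The geometric idea behind such a constraint can be sketched as follows. This is only an illustration under stated assumptions, not the method of any listed paper: the ground is assumed to be a known plane normal·x + offset = 0 with a unit normal in camera coordinates, and the function names (`plane_distance`, `ground_penalty`, `depth_scale_to_ground`) are hypothetical.

```python
from typing import Sequence

Vec3 = Sequence[float]


def dot(a: Vec3, b: Vec3) -> float:
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def plane_distance(point: Vec3, normal: Vec3, offset: float) -> float:
    """Signed distance from a point to the plane normal . x + offset = 0.

    Assumes `normal` has unit length.
    """
    return dot(normal, point) + offset


def ground_penalty(feet: Sequence[Vec3], normal: Vec3, offset: float) -> float:
    """Mean squared feet-to-plane distance; zero when all feet touch the plane."""
    return sum(plane_distance(p, normal, offset) ** 2 for p in feet) / len(feet)


def depth_scale_to_ground(foot: Vec3, normal: Vec3, offset: float) -> float:
    """Scale s along the camera ray such that s * foot lies on the plane.

    A monocular estimate fixes the ray through the foot but not the depth;
    the ground plane removes that ambiguity (hypothetical helper).
    """
    denom = dot(normal, foot)
    if abs(denom) < 1e-9:
        raise ValueError("viewing ray is parallel to the ground plane")
    return -offset / denom


# Assumed setup: camera y-axis points down, ground lies 1.5 units below the camera.
normal, offset = (0.0, 1.0, 0.0), -1.5
foot = (0.2, 0.5, 2.0)  # foot position at an arbitrary initial depth
s = depth_scale_to_ground(foot, normal, offset)
foot_on_ground = tuple(s * c for c in foot)
```

In an optimization setting, a term like `ground_penalty` would typically be minimized jointly over per-person scale and the plane parameters rather than solved in closed form for a single foot.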