3260 papers • 126 benchmarks • 313 datasets
Human pose forecasting is the task of detecting and predicting future human poses. (Image credit: EgoPose)
(Image credit: Papersgraph)
It is shown that, surprisingly, state-of-the-art performance can be achieved by a simple baseline that does not attempt to model motion at all. A simple and scalable RNN architecture is also proposed that obtains state-of-the-art performance on human motion prediction.
This work proposes a simple feed-forward deep network for motion prediction that takes into account both temporal smoothness and spatial dependencies among human body joints, and designs a new graph convolutional network to learn graph connectivity automatically.
This work proposes a novel sequence-to-sequence model for probabilistic human motion prediction, trained with a modified version of improved Wasserstein generative adversarial networks (WGAN-GP), in which the model learns a probability density function of future human poses conditioned on previous poses.
It is shown that the minimum information constraint, a heuristic previously shown to mitigate this effect in VAEs, can also be applied to improve unsupervised clustering performance in a variant of the variational autoencoder that uses a Gaussian mixture as its prior distribution.
This paper develops a scalable method for casting an arbitrary spatio-temporal graph as a rich RNN mixture that is feedforward, fully differentiable, and jointly trainable, and shows improvement over the state of the art by a large margin.
This work presents a novel approach to human motion modeling based on convolutional neural networks (CNNs). The approach captures both invariant and dynamic information of human motion, which results in more accurate predictions.
This work addresses challenges in a Gaussian Latent Variable model for sequence prediction with a "Best of Many" sample objective that leads to more accurate and more diverse predictions that better capture the true variations in real-world sequence data.
This work exploits human pose detectors as a free source of supervision and breaks the video forecasting problem into two discrete steps, and uses the structured space of pose as an intermediate representation to sidestep the problems that GANs have in generating video pixels directly.
This paper proposes a novel strategy for sampling very diverse results from an imbalanced multimodal distribution learned by a deep generative model. The strategy incorporates a Gumbel-Softmax coefficient matrix sampling method and an aggressive diversity-promoting hinge loss function.
This work trains networks to learn residual motion between the current and future frames, which avoids learning motion-irrelevant details, and proposes a two-stage generation framework in which videos are generated from structures and then refined by temporal signals.
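The first summary in the list above mentions a simple baseline that does not attempt to model motion at all. One common reading of such a baseline, assumed here and not confirmed by the summary itself, is a "zero-velocity" forecast that repeats the last observed pose for every future frame. The sketch below illustrates that idea in plain Python. The names `zero_velocity_forecast` and `mean_abs_error`, the flattened-list pose representation, and the error metric are all hypothetical choices for illustration, not taken from any of the listed papers.

```python
from typing import List, Sequence

# Assumption: a pose is a flat list of numbers (joint coordinates or angles).
Pose = List[float]


def zero_velocity_forecast(observed: Sequence[Pose], horizon: int) -> List[Pose]:
    """Predict `horizon` future poses by repeating the last observed pose."""
    if not observed:
        raise ValueError("need at least one observed pose")
    if horizon < 0:
        raise ValueError("horizon must be non-negative")
    last = observed[-1]
    # Copy the pose for each frame so predicted frames do not share storage.
    return [list(last) for _ in range(horizon)]


def mean_abs_error(predicted: Sequence[Pose], target: Sequence[Pose]) -> float:
    """Mean absolute difference over all frames and pose dimensions."""
    if len(predicted) != len(target):
        raise ValueError("sequences must have the same number of frames")
    total = 0.0
    count = 0
    for pred_pose, true_pose in zip(predicted, target):
        if len(pred_pose) != len(true_pose):
            raise ValueError("poses must have the same dimensionality")
        for p, t in zip(pred_pose, true_pose):
            total += abs(p - t)
            count += 1
    if count == 0:
        raise ValueError("nothing to compare")
    return total / count


if __name__ == "__main__":
    # Toy data: three observed frames and two future frames of a 2-D "pose".
    observed = [[0.0, 1.0], [0.1, 1.1], [0.2, 1.2]]
    future = [[0.3, 1.3], [0.4, 1.4]]
    prediction = zero_velocity_forecast(observed, horizon=len(future))
    print(prediction)
    print(mean_abs_error(prediction, future))
```

A baseline of this kind is useful mainly as a sanity check: a learned forecaster that cannot beat it over short horizons is probably not capturing motion in a meaningful way.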