3260 papers • 126 benchmarks • 313 datasets
Generating textual description for human motion.
(Image credit: Papersgraph)
These leaderboards are used to track progress in Motion Captioning
Use these libraries to find Motion Captioning models and implementations
No subtasks available.
This work proposes MotionGPT, a unified, versatile, and user-friendly motion-language model that handles multiple motion-relevant tasks and achieves state-of-the-art performance on several of them, including text-driven motion generation, motion captioning, motion prediction, and motion in-between.
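As a rough illustration of the unified setup, the sketch below phrases each motion task as a text prompt over a shared vocabulary of text and discrete motion tokens; the `build_prompt` helper and its templates are hypothetical stand-ins for illustration, not MotionGPT's actual interface.

```python
# Minimal sketch of unified task prompting, assuming motion is encoded as
# discrete tokens that share a vocabulary with text (generic illustration;
# build_prompt and its templates are hypothetical, not MotionGPT's API).
def build_prompt(task: str, text: str = "", motion_tokens: list[int] | None = None) -> str:
    motion = " ".join(f"<motion_{t}>" for t in (motion_tokens or []))
    templates = {
        "text2motion": f"Generate a motion matching: {text}",
        "motion2text": f"Describe the motion: {motion}",
        "prediction": f"Continue the motion: {motion}",
        "inbetween": f"Fill in the motion between: {motion}",
    }
    return templates[task]

print(build_prompt("motion2text", motion_tokens=[17, 4, 256]))
# Describe the motion: <motion_17> <motion_4> <motion_256>
```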
This paper explores the generation of 3D human full-body motions from text and its reciprocal task, referred to as text2motion and motion2text, respectively, and proposes the motion token, a discrete and compact motion representation.
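To make the motion-token idea concrete, here is a minimal sketch of vector quantization, where per-frame motion features are mapped to the nearest entry of a learned codebook; this is a generic VQ-style illustration with made-up shapes and a hypothetical `quantize_motion` helper, not the paper's exact tokenizer.

```python
import torch

def quantize_motion(features: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Map per-frame motion features to discrete token ids by
    nearest-neighbor lookup in a learned codebook (generic VQ sketch).

    features: (T, D) motion features for T frames
    codebook: (K, D) learned code vectors
    returns:  (T,) integer token ids
    """
    dists = torch.cdist(features, codebook)  # pairwise distances, shape (T, K)
    return dists.argmin(dim=1)

# Hypothetical shapes: 60 frames of 256-dim features, 512-entry codebook.
tokens = quantize_motion(torch.randn(60, 256), torch.randn(512, 256))
print(tokens.shape)  # torch.Size([60])
```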
This paper introduces a novel architecture that improves text generation quality while emphasizing interpretability through spatio-temporal and adaptive attention mechanisms, and proposes methods for guiding attention during training that emphasize relevant skeleton areas over time and distinguish motion-related words.
It is found that the contributions of both the attention mechanism and the encoder architecture additively improve not only the quality of the generated text (BLEU and semantic equivalence) but also its synchronization with the motion.
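One common way to realize such attention guidance during training is an auxiliary loss that pulls the model's attention distribution toward an annotated relevance mask over skeleton joints. The sketch below shows this with a KL-divergence term; it is a generic assumption for illustration, not the paper's exact formulation, and the relevance mask is assumed to be provided.

```python
import torch
import torch.nn.functional as F

def attention_guidance_loss(attn: torch.Tensor, relevance: torch.Tensor) -> torch.Tensor:
    """Auxiliary loss nudging attention toward relevant skeleton joints
    (generic sketch; the relevance annotations are assumed to be given).

    attn:      (T, J) attention weights over J joints, rows sum to 1
    relevance: (T, J) binary mask of joints deemed relevant per frame
    """
    # Normalize the mask into a target distribution per frame.
    target = relevance / relevance.sum(dim=1, keepdim=True).clamp(min=1)
    # F.kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(attn.clamp(min=1e-8).log(), target, reduction="batchmean")

# Hypothetical shapes: 60 frames, 22 joints.
attn = torch.softmax(torch.randn(60, 22), dim=1)
relevance = (torch.rand(60, 22) > 0.7).float()
print(attention_guidance_loss(attn, relevance))
```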
Adding a benchmark result helps the community track progress.