Talking With Hands 16.2M Dataset

This is a 16.2-million frame (50-hour) multimodal dataset of two-person face-to-face spontaneous conversations. This dataset features synchronized body and finger motion as well as audio data. It represents the largest motion capture and audio dataset of natural conversations to date. The statistical analysis verifies strong intraperson and interperson covariance of arm, hand, and speech features, potentially enabling new directions on data-driven social behavior analysis, prediction, and synthesis.

Talking With Hands 16.2M

Talking With Hands 16.2M: A Large-Scale Dataset of Synchronized Body-Finger Motion and Audio for Conversational Motion Analysis and Synthesis

ShapeNet

BP4D

MSRC-12