Human-centric Spatio-Temporal Video Grounding
Introduced in Human-centric Spatio-Temporal Video Grounding With Visual Transformers (2020)
The newly proposed HC-STVG task aims to localize the target person spatio-temporally in an untrimmed video. For this task, we collect a new benchmark dataset, which has spatio-temporal annotations related to the target persons in complex multi-person scenes, together with full interaction and rich action information.
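To make "localizing a person spatio-temporally" concrete, the following is a minimal sketch of how one such result could be represented: a temporal segment plus one bounding box per frame inside it. The class name `PersonTube`, its field names, and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration only; they do not reflect the dataset's actual annotation format.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Assumed box layout: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class PersonTube:
    """Hypothetical container for one spatio-temporal localization.

    The temporal part is the inclusive frame range [start_frame, end_frame];
    the spatial part is one bounding box for each frame in that range.
    """
    start_frame: int
    end_frame: int
    boxes: Dict[int, Box]

    def num_frames(self) -> int:
        # Length of the temporal segment, counting both endpoints.
        return self.end_frame - self.start_frame + 1

    def is_complete(self) -> bool:
        # True when every frame in the segment has a box.
        return all(
            frame in self.boxes
            for frame in range(self.start_frame, self.end_frame + 1)
        )


# Example: a person localized in frames 10 to 12 of an untrimmed video.
tube = PersonTube(
    start_frame=10,
    end_frame=12,
    boxes={
        10: (50.0, 40.0, 120.0, 200.0),
        11: (52.0, 41.0, 122.0, 201.0),
        12: (55.0, 42.0, 125.0, 203.0),
    },
)
```

Keeping the frame range separate from the per-frame boxes lets the temporal and spatial parts of a result be checked independently.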