Introduced in TVQA+: Spatio-Temporal Grounding for Video Question Answering
TVQA+ contains 310.8K bounding boxes, linking depicted objects to visual concepts in questions and answers.
Source: TVQA+: Spatio-Temporal Grounding for Video Question Answering
Image Source: https://github.com/jayleicn/TVQAplus