3260 papers • 126 benchmarks • 313 datasets
These leaderboards are used to track progress in Action Understanding.
Use these libraries to find Action Understanding models and implementations.
No subtasks available.
The goal of the YouMakeup VQA Challenge 2020 is to provide a common benchmark for fine-grained action understanding in domain-specific videos (e.g., makeup instructional videos); two novel question-answering tasks are proposed to evaluate models' fine-grained action understanding abilities.
This paper presents a new video representation, called trajectory-pooled deep-convolutional descriptor (TDD), which shares the merits of both hand-crafted features and deep-learned features and achieves performance superior to the state of the art on standard action recognition datasets.
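To make the pooling idea concrete, here is a minimal NumPy sketch of trajectory pooling: per-frame convolutional feature maps are sampled at the points of a motion trajectory and aggregated into one descriptor. The function name, the mean aggregation, and the toy data are illustrative assumptions; the paper's full pipeline additionally normalizes features and combines several convolutional layers.

```python
import numpy as np

def trajectory_pooled_descriptor(feature_maps, trajectories):
    """Pool per-frame CNN features along point trajectories.

    feature_maps: (T, H, W, C) array of conv features, one map per frame.
    trajectories: list of (T, 2) integer arrays of (y, x) positions.
    Returns one C-dim descriptor per trajectory (mean pooling here,
    chosen for simplicity).
    """
    descriptors = []
    for traj in trajectories:
        # Gather the feature vector under the trajectory at each frame.
        feats = np.stack([feature_maps[t, y, x] for t, (y, x) in enumerate(traj)])
        descriptors.append(feats.mean(axis=0))  # pool over time
    return np.stack(descriptors)

# Toy usage: 8 frames of 14x14x64 features, two simple trajectories.
fm = np.random.rand(8, 14, 14, 64).astype(np.float32)
trajs = [np.stack([np.arange(8), np.arange(8)], axis=1),
         np.full((8, 2), 7, dtype=int)]
print(trajectory_pooled_descriptor(fm, trajs).shape)  # (2, 64)
```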
A detailed 2D-3D joint representation learning method for Human-Object Interaction (HOI) detection and a new benchmark named Ambiguous-HOI, consisting of hard ambiguous images, are proposed to better evaluate models' capacity to handle 2D ambiguity.
The LEMMA dataset is introduced to provide a single home for missing dimensions of daily human activity understanding, including goal-directed actions, concurrent multi-tasking, and collaboration among multiple agents, in meticulously designed settings.
It is sought to establish that online/causal representations can achieve performance similar to that of offline 3D convolutional neural networks (CNNs) on various tasks, including action recognition, temporal action segmentation, and early prediction.
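As a toy illustration of the online/causal constraint (not the paper's actual architecture), the sketch below implements a causal temporal convolution in NumPy: the output at frame t depends only on frames up to t, whereas an offline 3D CNN's centered kernels also read future frames.

```python
import numpy as np

def causal_conv1d(x, w):
    """Causal temporal convolution: the output at frame t sees frames <= t.

    x: (T, C) per-frame features; w: (K, C) kernel over the last K frames.
    Zero-padding on the left only keeps the computation online/streaming;
    a centered ("offline") kernel would also read future frames, which is
    what a standard 3D CNN does along the time axis.
    """
    K = w.shape[0]
    x_pad = np.concatenate([np.zeros((K - 1, x.shape[1])), x], axis=0)
    return np.array([(x_pad[t:t + K] * w).sum() for t in range(x.shape[0])])

# Toy usage: 10 frames of 4-dim features, a kernel spanning 3 past frames.
x = np.random.rand(10, 4)
w = np.random.rand(3, 4)
print(causal_conv1d(x, w).shape)  # (10,)
```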
This paper introduces an effective GCN module, the Dilated Temporal Graph Reasoning Module (DTGRM), designed to model temporal relations and dependencies between video frames at various time spans; it outperforms state-of-the-art action segmentation models on three challenging datasets.
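A hedged sketch of the dilated-temporal-graph idea follows, under simplified assumptions of my own (a row-normalized adjacency and a single ReLU graph-convolution layer): frame t is linked to frames t±dilation, and stacking layers with different dilations covers relations at multiple time spans. DTGRM itself is more involved; this shows only the core graph construction.

```python
import numpy as np

def dilated_temporal_gcn_layer(X, W, dilation):
    """One graph-convolution layer over frames linked at a fixed time span.

    X: (T, C_in) frame features; W: (C_in, C_out) learned weights.
    Frame t is connected to frames t-dilation and t+dilation, so layers
    with different dilations model relations at different time spans.
    """
    T = X.shape[0]
    A = np.eye(T)  # self-loops
    idx = np.arange(T - dilation)
    A[idx, idx + dilation] = 1.0
    A[idx + dilation, idx] = 1.0
    A = A / A.sum(axis=1, keepdims=True)  # simple row normalization
    return np.maximum(A @ X @ W, 0.0)     # ReLU(A X W)

# Toy usage: 16 frames, 32-dim features, relations one and four frames apart.
X = np.random.rand(16, 32)
H = dilated_temporal_gcn_layer(X, np.random.rand(32, 32), dilation=1)
H = dilated_temporal_gcn_layer(H, np.random.rand(32, 32), dilation=4)
print(H.shape)  # (16, 32)
```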
This work introduces a first-of-its-kind paired win-fail action understanding dataset with samples from the following domains: “General Stunts,” “Internet Wins-Fails,” “Trick Shots,” and “Party Games,” and systematically analyzes the characteristics of the task and dataset to determine its suitability as a video understanding benchmark.
HOMAGE is introduced: a multi-view action dataset with multiple modalities and viewpoints, supplemented with hierarchical activity and atomic action labels together with dense scene composition labels, along with Cooperative Compositional Action Understanding (CCAU), a cooperative learning framework for hierarchical action recognition that is aware of compositional action elements.
PIANO is presented, the first parametric bone model of human hands derived from MRI data; it is biologically correct, simple to animate, and differentiable, achieving anatomically more precise modeling of the inner hand's kinematic structure, in a data-driven manner, than traditional hand models based only on the outer surface.
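Since PIANO's key property is being parametric and differentiable, the following 2D forward-kinematics toy (entirely my own simplification, not the paper's MRI-derived 3D model) suggests why such a model is simple to animate: a small vector of bone lengths and joint angles maps to joint positions through differentiable rotations, so a pose can be fit by gradient descent.

```python
import numpy as np

def rot2d(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def forward_kinematics(bone_lengths, joint_angles):
    """Place each joint of a 2D bone chain given lengths and angles.

    Every operation here (rotation, translation) is differentiable in
    the pose parameters, which is the property that makes a parametric
    bone model easy to animate and to optimize.
    """
    pts = [np.zeros(2)]
    R = np.eye(2)
    for L, theta in zip(bone_lengths, joint_angles):
        R = R @ rot2d(theta)                       # accumulate rotation
        pts.append(pts[-1] + R @ np.array([L, 0.0]))  # extend the chain
    return np.stack(pts)

# Toy usage: three finger segments bending 20 degrees at each joint.
print(forward_kinematics([3.0, 2.0, 1.5], np.deg2rad([20, 20, 20])))
```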