Spatio-Temporal Action Localization

3260 papers • 126 benchmarks • 313 datasets

Benchmarks

These leaderboards are used to track progress in Spatio-Temporal Action Localization.

Dataset: AVA-Kinetics

Libraries

Use these libraries to find Spatio-Temporal Action Localization models and implementations.

Datasets

AVA

Subtasks

No subtasks available.

Most implemented papers

Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization

Hongsheng Li, Jing Shao, Junting Pan, Siyu Chen, Zheng Shou • Sat Jun 13 2020

The Actor-Context-Actor Relation Network (ACAR-Net) builds upon a novel High-order Relation Reasoning Operator and an Actor-Context Feature Bank to enable indirect relation reasoning for spatio-temporal action localization.

172

0

Action Tubelet Detector for Spatio-Temporal Action Localization

Philippe Weinzaepfel, V. Ferrari, C. Schmid, Vicky Kalogeiton • Wed May 03 2017

The proposed ACtion Tubelet detector (ACT-detector) takes a sequence of frames as input and outputs tubelets, i.e., sequences of bounding boxes with associated scores, based on anchor cuboids. It outperforms the state-of-the-art methods for frame-mAP and video-mAP on the J-HMDB and UCF-101 datasets, in particular at high overlap thresholds.
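As a rough illustration of what "tubelets" and "overlap thresholds" mean here, the sketch below represents a tubelet as per-frame boxes plus a score and computes one common form of spatio-temporal IoU (temporal IoU multiplied by the mean spatial IoU over shared frames). The `Tubelet` class and the function names are hypothetical, chosen for this example only, and the summary above does not confirm which exact overlap definition the paper's evaluation uses.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# A box is (x1, y1, x2, y2) in pixels; this layout is an assumption of the sketch.
Box = Tuple[float, float, float, float]


@dataclass
class Tubelet:
    """Hypothetical container: one box per frame index, plus a confidence score."""
    boxes: Dict[int, Box]
    score: float


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tube_iou(pred: Tubelet, gt: Tubelet) -> float:
    """Temporal IoU of the frame sets times mean spatial IoU on shared frames."""
    shared = pred.boxes.keys() & gt.boxes.keys()
    if not shared:
        return 0.0
    all_frames = pred.boxes.keys() | gt.boxes.keys()
    temporal_iou = len(shared) / len(all_frames)
    spatial_iou = sum(box_iou(pred.boxes[f], gt.boxes[f]) for f in shared) / len(shared)
    return temporal_iou * spatial_iou


if __name__ == "__main__":
    pred = Tubelet({f: (0.0, 0.0, 10.0, 10.0) for f in range(0, 4)}, score=0.9)
    gt = Tubelet({f: (0.0, 0.0, 10.0, 10.0) for f in range(2, 6)}, score=1.0)
    # A detection counts as correct only if tube_iou >= the chosen threshold.
    print(tube_iou(pred, gt) >= 0.5)
```

Under a definition like this, a high overlap threshold penalizes tubes that are well placed in each frame but cover the wrong temporal extent, which is why results at high thresholds are reported separately.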

340 0

1st place solution for AVA-Kinetics Crossover in AcitivityNet Challenge 2020

Hongsheng Li, Jing Shao, Hao Shao, Junting Pan, Guanglu Song, Siyu Chen, Ziyi Lin, Yu Liu, Manyuan Zhang • Mon Jun 15 2020

This technical report introduces the winning solution to the spatio-temporal action localization track, AVA-Kinetics Crossover, in ActivityNet Challenge 2020, based on Actor-Context-Actor Relation Network, which outperforms other entries by a large margin.

4 0

Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection

T. Brox, M. Zolfaghari, Gabriel L. Oliveira, N. Sedaghat • Sun Apr 02 2017

This paper proposes a network architecture that computes and integrates the most important visual cues for action recognition (pose, motion, and the raw images) and introduces a Markov chain model that adds these cues successively.

224 0

Actor-Centric Relation Network

K. Murphy, Chen Sun, Abhinav Shrivastava, R. Sukthankar, Carl Vondrick, C. Schmid • Fri Jul 27 2018

This work models spatio-temporal relations to capture the interactions between human actors, relevant objects, and scene elements that are essential to differentiating similar human actions, and shows that ACRN outperforms alternative approaches to capturing relation information.

232 0

ST-HOI: A Spatial-Temporal Baseline for Human-Object Interaction Detection in Videos

Jiashi Feng, Roger Zimmermann, Meng-Jiun Chiou, Chun-Yu Liao, Li-Wei Wang • Mon May 24 2021

This paper shows that a naive temporal-aware variant of a common action detection baseline does not work on video-based HOIs due to a feature-inconsistency issue, and proposes a simple yet effective architecture utilizing temporal information such as human and object trajectories, correctly-localized visual features, and spatial-temporal masking pose features.

31 0

KORSAL: Key-point Detection based Online Real-Time Spatio-Temporal Action Localization

R. Rodrigo, P. Jayasekara, Kalana Abeywardena, Shechem Sumanthiran, S. Jayasundara, Sachira Karunasena • Thu Nov 04 2021

This work proposes utilizing fast and efficient key-point based bounding box prediction to spatially localize actions and introduces a tube-linking algorithm that maintains the continuity of action tubes temporally in the presence of occlusions, eliminating the need for a two-stream architecture.
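To make the idea of tube linking concrete, here is a minimal sketch of a generic greedy, IoU-based online linker that tolerates a few missed frames (for example during an occlusion). It is not KORSAL's algorithm, whose details are not given in this summary; the `Tube` class, `link_detections`, and the `iou_thr` and `max_missed` parameters are all assumptions made for illustration.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2); this layout is an assumption of the sketch.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class Tube:
    """Hypothetical action tube: matched frame indices and their boxes."""

    def __init__(self, frame: int, box: Box) -> None:
        self.frames = [frame]
        self.boxes = [box]
        self.missed = 0  # consecutive frames without a matching detection


def link_detections(frames: List[List[Box]], iou_thr: float = 0.3,
                    max_missed: int = 2) -> List[Tube]:
    """Greedily extend each active tube with its best-overlapping detection."""
    active: List[Tube] = []
    finished: List[Tube] = []
    for t, detections in enumerate(frames):
        unused = list(detections)
        for tube in active:
            last = tube.boxes[-1]
            best = max(unused, key=lambda d: box_iou(last, d), default=None)
            if best is not None and box_iou(last, best) >= iou_thr:
                tube.frames.append(t)
                tube.boxes.append(best)
                tube.missed = 0
                unused.remove(best)
            else:
                tube.missed += 1  # keep the tube alive through short gaps
        finished += [tb for tb in active if tb.missed > max_missed]
        active = [tb for tb in active if tb.missed <= max_missed]
        active += [Tube(t, d) for d in unused]  # unmatched boxes start new tubes
    return finished + active


if __name__ == "__main__":
    video = [[(0, 0, 10, 10)], [(1, 0, 11, 10)], [], [(2, 0, 12, 10)], [(50, 50, 60, 60)]]
    for tube in link_detections(video):
        print(tube.frames)
```

In this toy input the empty third frame stands in for an occlusion: the first tube survives the gap because `max_missed` allows it, while the distant box in the last frame starts a new tube.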

1 0

Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision

Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Boqing Gong, Ting Liu, Liangzhe Yuan, Yin Cui • Wed Dec 08 2021

This paper designs a region-based pretext task which requires the model to transform instance representations from one view to another, guided by context features, and introduces a simple network design that successfully reconciles the simultaneous learning process of both holistic and local representations.

18 0
