3260 papers • 126 benchmarks • 313 datasets
Please note some benchmarks may be located in the Action Classification or Video Classification tasks, e.g. Kinetics-400.
(Image credit: Papersgraph)
These leaderboards are used to track progress in Action Recognition
Use these libraries to find Action Recognition models and implementations
This work investigates whether current video datasets have sufficient data for training very deep convolutional neural networks with spatio-temporal three-dimensional (3D) kernels, and argues that deep 3D CNNs trained on Kinetics will retrace the successful history of 2D CNNs and ImageNet, stimulating advances in computer vision for videos.
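As a minimal illustration of the spatio-temporal 3D kernels the paper studies, the PyTorch sketch below applies a single 3D convolution to a video clip tensor; the shapes and layer sizes are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

# A batch of video clips: (batch, channels, frames, height, width).
clip = torch.randn(2, 3, 16, 112, 112)

# A spatio-temporal 3D convolution: the 3x3x3 kernel slides over
# time as well as space, unlike a purely spatial 2D kernel.
conv3d = nn.Conv3d(in_channels=3, out_channels=64,
                   kernel_size=(3, 3, 3), stride=1, padding=1)

features = conv3d(clip)
print(features.shape)  # torch.Size([2, 64, 16, 112, 112])
```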
A new Two-Stream Inflated 3D ConvNet (I3D), based on 2D ConvNet inflation, is introduced; after pre-training on Kinetics, I3D models considerably improve upon the state of the art in action classification, reaching 80.2% on HMDB-51 and 97.9% on UCF-101.
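The core inflation trick can be sketched as follows: a pretrained 2D kernel is repeated along a new time axis and rescaled so that a temporally constant input produces the same response as the original 2D filter. This is a minimal sketch of that idea, not the full I3D model.

```python
import torch

def inflate_2d_kernel(w2d: torch.Tensor, time_dim: int) -> torch.Tensor:
    """Inflate a 2D conv kernel (out, in, kH, kW) into a 3D kernel
    (out, in, T, kH, kW) by repeating it T times along a new time
    axis and dividing by T, preserving the 2D response on a
    temporally constant ("boring") video."""
    w3d = w2d.unsqueeze(2).repeat(1, 1, time_dim, 1, 1)
    return w3d / time_dim

w2d = torch.randn(64, 3, 7, 7)    # e.g. an ImageNet-pretrained filter bank
w3d = inflate_2d_kernel(w2d, 7)   # -> shape (64, 3, 7, 7, 7)
```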
This paper presents non-local operations as a generic family of building blocks for capturing long-range dependencies in computer vision; the resulting non-local networks also improve object detection, segmentation, and pose estimation on the COCO suite of tasks.
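A minimal sketch of the embedded-Gaussian form of a non-local block is shown below, operating over a flattened space-time grid; the class name, layer widths, and simplifications (no batch norm, plain channel halving) are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalBlock(nn.Module):
    """Simplified embedded-Gaussian non-local block over flattened
    positions: every position attends to every other, so the output
    aggregates long-range dependencies in one step."""
    def __init__(self, channels: int):
        super().__init__()
        inter = channels // 2
        self.theta = nn.Conv1d(channels, inter, 1)
        self.phi = nn.Conv1d(channels, inter, 1)
        self.g = nn.Conv1d(channels, inter, 1)
        self.out = nn.Conv1d(inter, channels, 1)

    def forward(self, x):                      # x: (B, C, N) positions
        q = self.theta(x).transpose(1, 2)      # (B, N, C')
        k = self.phi(x)                        # (B, C', N)
        attn = F.softmax(q @ k, dim=-1)        # (B, N, N) pairwise weights
        v = self.g(x).transpose(1, 2)          # (B, N, C')
        y = (attn @ v).transpose(1, 2)         # (B, C', N)
        return x + self.out(y)                 # residual connection

x = torch.randn(2, 64, 16 * 7 * 7)             # flattened T*H*W positions
print(NonLocalBlock(64)(x).shape)              # torch.Size([2, 64, 784])
```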
A novel model of dynamic skeletons, the Spatial-Temporal Graph Convolutional Network (ST-GCN), is proposed; it moves beyond the limitations of previous methods by automatically learning both spatial and temporal patterns from data.
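The spatial half of the idea can be sketched as a graph convolution over skeleton joints with a fixed normalized adjacency; the class name, toy adjacency, and row normalization below are illustrative assumptions, and the real ST-GCN adds temporal convolutions and a learned partition strategy.

```python
import torch
import torch.nn as nn

class SkeletonGraphConv(nn.Module):
    """Minimal spatial graph convolution over skeleton joints with a
    fixed normalized adjacency a_hat of shape (V, V)."""
    def __init__(self, in_c: int, out_c: int, a_hat: torch.Tensor):
        super().__init__()
        self.register_buffer("a_hat", a_hat)
        self.proj = nn.Linear(in_c, out_c)

    def forward(self, x):                      # x: (B, T, V, C) joint features
        x = self.proj(x)                       # mix channels per joint
        # Aggregate each joint's neighbors according to the adjacency.
        return torch.einsum("uv,btvc->btuc", self.a_hat, x)

# Toy example: 3 joints in a chain, 2D coordinates over 16 frames.
adj = torch.tensor([[1., 1., 0.], [1., 1., 1.], [0., 1., 1.]])
a_hat = adj / adj.sum(dim=1, keepdim=True)     # simple row normalization
layer = SkeletonGraphConv(2, 8, a_hat)
print(layer(torch.randn(4, 16, 3, 2)).shape)   # torch.Size([4, 16, 3, 8])
```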
Deep convolutional networks have achieved great success for visual recognition in still images. However, for action recognition in videos, the advantage over traditional methods is not so evident. This paper aims to discover the principles for designing effective ConvNet architectures for action recognition in videos and to learn these models given limited training samples. Our first contribution is the temporal segment network (TSN), a novel framework for video-based action recognition, which is based on the idea of long-range temporal structure modeling. It combines a sparse temporal sampling strategy with video-level supervision to enable efficient and effective learning from the whole action video. Our other contribution is a study of good practices for learning ConvNets on video data with the help of the temporal segment network. Our approach obtains state-of-the-art performance on the datasets HMDB51 (69.4%) and UCF101 (94.2%). We also visualize the learned ConvNet models, which qualitatively demonstrates the effectiveness of the temporal segment network and the proposed good practices (models and code at https://github.com/yjxiong/temporal-segment-networks).
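The sparse sampling and segmental consensus can be sketched as follows; the helper name is hypothetical, center-frame sampling stands in for the random within-segment sampling used during training, and the class count is just UCF101's 101.

```python
import torch

def tsn_segment_indices(num_frames: int, num_segments: int) -> list:
    """TSN-style sparse sampling: split the video into equal segments
    and pick one frame (here the segment center) from each."""
    seg_len = num_frames / num_segments
    return [int(seg_len * i + seg_len / 2) for i in range(num_segments)]

# Segmental consensus: average the per-snippet class scores into one
# video-level prediction, which the video-level loss supervises.
snippet_scores = torch.randn(3, 101)        # 3 snippets x 101 classes
video_score = snippet_scores.mean(dim=0)    # (101,)

print(tsn_segment_indices(300, 3))          # [50, 150, 250]
```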
A novel paradigm for evaluating image descriptions based on human consensus is proposed, along with a new automated metric that is shown to capture human judgment of consensus better than existing metrics across sentences generated by various sources.
A new spatiotemporal convolutional block, "R(2+1)D", is designed; it produces CNNs that achieve results comparable or superior to the state of the art on Sports-1M, Kinetics, UCF101, and HMDB51.
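The factorization can be sketched as a 2D spatial convolution followed by a 1D temporal convolution with a nonlinearity in between; the intermediate width `mid` below is a free illustrative choice, whereas the paper picks it so the parameter count matches the full 3D convolution.

```python
import torch
import torch.nn as nn

class Conv2Plus1D(nn.Module):
    """A (2+1)D block in the spirit of R(2+1)D: a 3D convolution is
    factorized into a spatial (1,3,3) convolution followed by a
    temporal (3,1,1) convolution, with a ReLU in between."""
    def __init__(self, in_c: int, out_c: int, mid: int):
        super().__init__()
        self.spatial = nn.Conv3d(in_c, mid, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1))
        self.relu = nn.ReLU(inplace=True)
        self.temporal = nn.Conv3d(mid, out_c, kernel_size=(3, 1, 1),
                                  padding=(1, 0, 0))

    def forward(self, x):                  # x: (B, C, T, H, W)
        return self.temporal(self.relu(self.spatial(x)))

block = Conv2Plus1D(3, 64, mid=45)
print(block(torch.randn(2, 3, 16, 56, 56)).shape)  # [2, 64, 16, 56, 56]
```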
This work proposes an effective, efficient, end-to-end proposal generation method, the Boundary-Matching Network (BMN), which simultaneously generates proposals with precise temporal boundaries and reliable confidence scores, achieving state-of-the-art temporal action detection performance.
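A much-simplified sketch of the dense proposal enumeration behind this idea is shown below: every (start, end) pair is scored from per-frame boundary probabilities (random stand-ins here). BMN itself predicts this dense confidence map directly with its boundary-matching layer rather than multiplying boundary probabilities.

```python
import numpy as np

T = 100                      # number of temporal locations in the video
p_start = np.random.rand(T)  # stand-in for predicted start probabilities
p_end = np.random.rand(T)    # stand-in for predicted end probabilities

# Score every candidate (start, end) pair and rank them.
proposals = [(s, e, p_start[s] * p_end[e])
             for s in range(T) for e in range(s + 1, T)]
proposals.sort(key=lambda p: p[2], reverse=True)
print(proposals[:3])         # top-3 (start, end, score) candidates
```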