The Kinetics Human Action Video Dataset

Published in

arXiv.org (2017)

TL;DR

The paper describes the Kinetics dataset, its statistics, and how it was collected, and gives baseline performance figures for neural network architectures trained and tested for human action classification on it.

Abstract

We describe the DeepMind Kinetics human action video dataset. The dataset contains 400 human action classes, with at least 400 video clips for each action. Each clip lasts around 10s and is taken from a different YouTube video. The actions are human focussed and cover a broad range of classes including human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands. We describe the statistics of the dataset, how it was collected, and give some baseline performance figures for neural network architectures trained and tested for human action classification on this dataset. We also carry out a preliminary analysis of whether imbalance in the dataset leads to bias in the classifiers.

Authors

Andrew Zisserman

62 papers

K. Simonyan

25 papers

Tim Green

4 papers

References (28 items)

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

Semantics derived automatically from language corpora contain human-like biases

Convolutional Two-Stream Network Fusion for Video Action Recognition

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

Deep Residual Learning for Image Recognition

Mustafa Suleyman

5 papers

Fabio Viola

3 papers

Chloe Hillier

4 papers

João Carreira

10 papers

Sudheendra Vijayanarasimhan

3 papers

Brian Zhang

1 paper

Apostol Natsev

2 papers

Actions ~ Transformations

ActivityNet: A large-scale video benchmark for human activity understanding

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Learning Spatiotemporal Features with 3D Convolutional Networks

Long-term recurrent convolutional networks for visual recognition and description

ImageNet Large Scale Visual Recognition Challenge

The Pascal Visual Object Classes Challenge: A Retrospective

Large-Scale Video Classification with Convolutional Neural Networks

Two-Stream Convolutional Networks for Action Recognition in Videos

2D Human Pose Estimation: New Benchmark and State of the Art Analysis

UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

HMDB: A large video database for human motion recognition

Unbiased look at dataset bias

Convolutional Learning of Spatio-temporal Features

Learning realistic human actions from movies

Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words

Caltech-256 Object Category Dataset

Beyond Short Snippets: Deep Networks for Video Classification

Action Recognition with Improved Trajectories (International Conference on Computer Vision, 2013)

3D Convolutional Neural Networks for Human Action Recognition (IEEE Transactions on Pattern Analysis and Machine Intelligence)

Field of Study

Computer Science

Journal Information

Name

ArXiv

Volume

abs/1705.06950

Venue Information

Name

arXiv.org

Type

URL

https://arxiv.org

Alternate Names

ArXiv