Next action anticipation is defined as observing frames 1, ..., T and predicting the action that begins after a gap of T_a seconds. Note that the action to be predicted starts after the T_a-second gap and is therefore never seen in the observed frames. Here, T_a = 1 second.
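For concreteness, below is a minimal Python sketch of this observation protocol. The function and parameter names (`make_anticipation_sample`, `fps`, `gap_s`) are illustrative assumptions, not part of any benchmark's API.

```python
# Minimal sketch of the next-action-anticipation protocol described above.
# All names here are illustrative assumptions, not a benchmark's API.

def make_anticipation_sample(frames, action_start_idx, fps=30, gap_s=1.0):
    """Return the observable clip for anticipating the upcoming action.

    frames           : list of video frames (any per-frame representation)
    action_start_idx : index of the first frame of the action to predict
    fps              : frames per second of the video
    gap_s            : anticipation gap T_a in seconds (here 1 second)
    """
    gap_frames = int(round(gap_s * fps))
    # Observation ends T_a seconds before the action starts, so the
    # action itself is never visible in the observed frames.
    obs_end = action_start_idx - gap_frames
    assert obs_end > 0, "not enough video before the action to observe"
    return frames[:obs_end]

# Example: a 10 s video at 30 fps where the next action starts at frame 240.
video = list(range(300))
observed = make_anticipation_sample(video, action_start_idx=240)
print(len(observed))  # 210 frames: everything up to 1 s before the action
```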
(Image credit: Papersgraph)
These leaderboards are used to track progress in Action Anticipation
Use these libraries to find Action Anticipation models and implementations
No subtasks available.
This paper introduces EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen environments. Participants narrated their own videos after recording, so the annotations reflect true intention, and ground-truth labels were crowd-sourced from these narrations.
This work tackles the problem with an architecture that anticipates actions at multiple temporal scales, using two LSTMs to summarize the past and formulate predictions about the future, together with a novel Modality ATTention mechanism that learns to weight modalities adaptively.
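Below is a minimal sketch of adaptive modality weighting in the spirit of this idea; the two-modality setup, dimensions, and scoring network are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ModalityAttention(nn.Module):
    """Sketch of adaptive modality weighting (in the spirit of Modality
    ATTention). Each modality (e.g. RGB, optical flow) produces its own
    prediction; a small network scores the modalities from their summary
    vectors, and the final prediction is the softmax-weighted sum.
    Dimensions and the scoring MLP are illustrative assumptions.
    """

    def __init__(self, feat_dim, n_modalities):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(feat_dim * n_modalities, 128),
            nn.ReLU(),
            nn.Linear(128, n_modalities),
        )

    def forward(self, summaries, predictions):
        # summaries:   (batch, n_modalities, feat_dim) past summary per modality
        # predictions: (batch, n_modalities, n_classes) per-modality scores
        w = self.score(summaries.flatten(1)).softmax(dim=-1)  # (batch, n_mod)
        return (w.unsqueeze(-1) * predictions).sum(dim=1)     # (batch, n_classes)

fused = ModalityAttention(feat_dim=512, n_modalities=2)(
    torch.randn(4, 2, 512), torch.randn(4, 2, 100))
```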
Thorough experimental evaluation shows that the hallucination task indeed improves performance on action recognition, action quality assessment, and dynamic scene recognition, and can enable deployment in resource-constrained scenarios such as limited computing power or lower bandwidth.
Rolling-Unrolling LSTM is contributed, a learning architecture to anticipate actions from egocentric videos that achieves competitive performance on ActivityNet with respect to methods not based on unsupervised pre-training, and generalizes to the tasks of early action recognition and action recognition.
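The following is a minimal sketch of the rolling/unrolling idea under stated assumptions (a single modality, arbitrary layer sizes); it is not the authors' implementation.

```python
import torch
import torch.nn as nn

class RollingUnrolling(nn.Module):
    """Sketch of the rolling/unrolling idea: one LSTM 'rolls' over the
    observed features to summarize the past; a second LSTM 'unrolls'
    from that state, producing a prediction at each anticipation step.
    Layer sizes and the single-modality setup are assumptions.
    """

    def __init__(self, feat_dim, hidden, n_classes):
        super().__init__()
        self.rolling = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.unrolling = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, feats, n_unroll=4):
        # feats: (batch, T_obs, feat_dim) observed clip features
        _, state = self.rolling(feats)       # summarize the past
        last = feats[:, -1:, :]              # re-feed the latest observation
        outs = []
        for _ in range(n_unroll):            # one step per anticipation time
            out, state = self.unrolling(last, state)
            outs.append(self.classifier(out[:, -1]))
        return torch.stack(outs, dim=1)      # (batch, n_unroll, n_classes)

scores = RollingUnrolling(1024, 512, 100)(torch.randn(2, 14, 1024))
```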
This work addresses questions of temporal extent, scaling, and level of semantic abstraction with a flexible multi-granular temporal aggregation framework, and shows that state-of-the-art results in both next-action and dense anticipation can be achieved with simple techniques such as max-pooling and attention.
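A minimal sketch of such multi-granular aggregation is given below; the span lengths, dimensions, and single attention layer are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultiGranularAggregation(nn.Module):
    """Sketch of multi-granular temporal aggregation: max-pool the
    observed features over several temporal spans (short 'recent'
    context vs. longer 'spanning' context), then attend over the
    pooled summaries. Span lengths and sizes are assumptions.
    """

    def __init__(self, feat_dim, spans=(2, 5, 10), n_classes=100):
        super().__init__()
        self.spans = spans
        self.attn = nn.Linear(feat_dim, 1)
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, feats):
        # feats: (batch, T, feat_dim); max-pool the last `s` steps per span
        pooled = torch.stack(
            [feats[:, -s:, :].max(dim=1).values for s in self.spans], dim=1
        )                                      # (batch, n_spans, feat_dim)
        w = self.attn(pooled).softmax(dim=1)   # attention over granularities
        return self.classifier((w * pooled).sum(dim=1))

out = MultiGranularAggregation(512)(torch.randn(3, 12, 512))
```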
A new action anticipation method that achieves high prediction accuracy even when only a very small fraction of a video sequence has been observed; it develops a multi-stage LSTM architecture that leverages context-aware and action-aware features, and introduces a novel loss function that encourages the model to predict the correct class as early as possible.
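As a rough illustration of an "early correct prediction" objective, here is a time-weighted cross-entropy sketch; the linear weighting is an illustrative choice, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def early_anticipation_loss(stepwise_logits, target):
    """Sketch of a loss that encourages correct predictions as early as
    possible: cross-entropy is computed after every observation step, and
    a weight that grows with time makes late mistakes costlier than early
    ones, so a model that is right early and stays right pays the least.
    The linear weighting is an assumption, not the paper's loss.

    stepwise_logits : (T, batch, n_classes) predictions after each step
    target          : (batch,) ground-truth action labels
    """
    T = stepwise_logits.shape[0]
    weights = torch.linspace(1.0 / T, 1.0, T)  # later steps weigh more
    losses = torch.stack(
        [F.cross_entropy(stepwise_logits[t], target) for t in range(T)]
    )
    return (weights * losses).sum() / weights.sum()

loss = early_anticipation_loss(torch.randn(8, 4, 100),
                               torch.randint(0, 100, (4,)))
```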
A Reinforced Encoder-Decoder (RED) network that takes multiple history representations as input and learns to anticipate a sequence of future representations, with a reinforcement module designed to encourage the system to make correct predictions as early as possible.
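Below is a minimal sketch of the encoder-decoder part of this idea (the reinforcement reward is omitted); all sizes and names are illustrative assumptions, not the RED implementation.

```python
import torch
import torch.nn as nn

class EncoderDecoderAnticipator(nn.Module):
    """Sketch of the encoder-decoder idea behind RED: encode the history
    of visual representations, then decode a *sequence* of anticipated
    future representations, each of which can also be classified into an
    action. The reinforcement module is omitted; sizes are assumptions.
    """

    def __init__(self, feat_dim, hidden, n_classes):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.to_feat = nn.Linear(hidden, feat_dim)   # regress future features
        self.to_class = nn.Linear(feat_dim, n_classes)

    def forward(self, history, n_future=4):
        _, h = self.encoder(history)      # summarize the history
        step = history[:, -1:, :]         # seed with the last observation
        feats, scores = [], []
        for _ in range(n_future):
            out, h = self.decoder(step, h)
            step = self.to_feat(out)      # anticipated future representation
            feats.append(step)
            scores.append(self.to_class(step))
        return torch.cat(feats, 1), torch.cat(scores, 1)

f, s = EncoderDecoderAnticipator(1024, 512, 30)(torch.randn(2, 16, 1024))
```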
This work proposes a solution to the problem of pedestrian action anticipation at the point of crossing, using a novel stacked RNN architecture in which information collected from various sources, both scene dynamics and visual features, is gradually fused into the network at different levels of processing.
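A minimal sketch of such gradual fusion in a stacked RNN follows; the choice of three sources, their ordering, and all dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class StackedFusionRNN(nn.Module):
    """Sketch of gradual fusion in a stacked RNN: each level of the stack
    receives the previous level's output concatenated with one new
    information source, so sources are fused at different depths rather
    than all at once. Sources, ordering, and sizes are assumptions.
    """

    def __init__(self, dims, hidden, n_classes):
        super().__init__()
        # dims: feature size of each source; one source fused per level
        in_sizes = [dims[0]] + [hidden + d for d in dims[1:]]
        self.levels = nn.ModuleList(
            nn.GRU(s, hidden, batch_first=True) for s in in_sizes
        )
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, sources):
        # sources: list of (batch, T, dim_i) tensors, one per source
        out, _ = self.levels[0](sources[0])
        for rnn, src in zip(self.levels[1:], sources[1:]):
            out, _ = rnn(torch.cat([out, src], dim=-1))  # fuse next source
        return self.classifier(out[:, -1])  # e.g. crossing vs. not crossing

srcs = [torch.randn(2, 15, d) for d in (512, 64, 4)]
pred = StackedFusionRNN((512, 64, 4), 128, 2)(srcs)
```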