3260 papers • 126 benchmarks • 313 datasets
These leaderboards are used to track progress in Egocentric Activity Recognition
Use these libraries to find Egocentric Activity Recognition models and implementations
This paper proposes a long-term feature bank—supportive information extracted over the entire span of a video—to augment state-of-the-art video models that otherwise would only view short clips of 2-5 seconds.
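A minimal sketch of the idea, assuming a clip-level feature from a short-term video model and a bank of features precomputed over the whole video; the scaled dot-product attention, the function name, and all tensor shapes are assumptions used for illustration, not the paper's exact feature bank operator.

```python
import torch

def feature_bank_attention(clip_feat, bank, dim=512):
    """Augment a short-term clip feature by attending over a long-term
    feature bank (simplified sketch, not the paper's exact operator).

    clip_feat: (B, D)    feature of the current 2-5 s clip
    bank:      (B, T, D) features precomputed over the entire video
    """
    q = clip_feat.unsqueeze(1)                                           # (B, 1, D) query
    attn = torch.softmax(q @ bank.transpose(1, 2) / dim ** 0.5, dim=-1)  # (B, 1, T) weights
    context = (attn @ bank).squeeze(1)                                   # (B, D) long-term context
    return torch.cat([clip_feat, context], dim=-1)                       # fused short + long-term feature

# usage: clip_feat from a 3D CNN on the current clip, bank from the same
# backbone run offline over the whole video
fused = feature_bank_attention(torch.randn(2, 512), torch.randn(2, 100, 512))
```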
The primary empirical finding is that pre-training at very large scale (over 65 million videos), even though the videos come from noisy social media and are labeled only with hashtags, substantially improves the state of the art on three challenging public action recognition datasets.
This work tackles the problem of action anticipation by proposing an architecture that anticipates actions at multiple temporal scales: two LSTMs summarize the past and formulate predictions about the future, while a novel Modality ATTention mechanism learns to weigh modalities adaptively.
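A toy sketch of the adaptive modality weighting, assuming per-modality summaries of the past (e.g. produced by the two LSTMs) are already available; the ModalityAttention class, its sizes, and the softmax-weighted fusion of per-modality predictions are assumptions for illustration, not the paper's exact MATT module.

```python
import torch
import torch.nn as nn

class ModalityAttention(nn.Module):
    """Toy adaptive modality weighting: score each modality's summary of the
    past and fuse per-modality predictions with the resulting softmax weights
    (illustrative only; names and sizes are assumptions)."""
    def __init__(self, feat_dim, num_modalities, num_classes):
        super().__init__()
        self.score = nn.Linear(feat_dim * num_modalities, num_modalities)
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes) for _ in range(num_modalities)]
        )

    def forward(self, summaries):  # summaries: (B, M, D), one summary per modality
        b, m, d = summaries.shape
        weights = torch.softmax(self.score(summaries.reshape(b, m * d)), dim=-1)           # (B, M)
        preds = torch.stack([h(summaries[:, i]) for i, h in enumerate(self.heads)], dim=1)  # (B, M, C)
        return (weights.unsqueeze(-1) * preds).sum(dim=1)                                   # (B, C) fused scores

# usage with three modality summaries per video
matt = ModalityAttention(feat_dim=256, num_modalities=3, num_classes=10)
scores = matt(torch.randn(4, 3, 256))
```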
This work introduces an effective probabilistic approach to integrate human gaze into spatiotemporal attention for egocentric activity recognition by representing the locations of gaze fixation points as structured discrete latent variables to model their uncertainties.
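A simplified sketch of treating the gaze location as a discrete latent variable over a feature grid, drawn with a Gumbel-softmax sample so the location stays stochastic during training; the GazeLatentAttention class and the pooling step are assumptions for illustration, not the paper's structured probabilistic model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GazeLatentAttention(nn.Module):
    """Illustrative sketch: model the gaze fixation as a discrete latent
    variable over the feature grid, sample it with Gumbel-softmax to retain
    its uncertainty, and pool features at the sampled location."""
    def __init__(self, channels):
        super().__init__()
        self.gaze_logits = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feat, tau=1.0):  # feat: (B, C, H, W) frame feature map
        b, c, h, w = feat.shape
        logits = self.gaze_logits(feat).flatten(1)              # (B, H*W) categorical logits
        sample = F.gumbel_softmax(logits, tau=tau, hard=True)   # one-hot gaze location
        attn = sample.view(b, 1, h, w)
        return (attn * feat).sum(dim=(2, 3))                    # (B, C) gaze-pooled feature

pooled = GazeLatentAttention(channels=512)(torch.randn(2, 512, 7, 7))
```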
This work collects RGB-D video sequences comprising more than 100K frames of 45 daily hand action categories involving 26 different objects in several hand configurations, and observes clear benefits from using hand pose as a cue for action recognition compared to other data modalities.
The proposed method is appropriate for the representation of high-dimensional features such as those extracted from convolutional neural networks (CNNs) and results in highly discriminative features which can be linearly classified.
An end-to-end trainable deep neural network model for egocentric activity recognition is proposed that surpasses, by up to 6 percentage points in recognition accuracy, the currently best-performing method, which relies on strong supervision from hand segmentation and object locations during training.
This paper proposes LSTA, a mechanism that focuses on features from spatially relevant parts while attention is tracked smoothly across the video sequence, achieving state-of-the-art performance on four standard benchmarks.
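A heavily simplified sketch of spatial attention that is tracked across frames, assuming per-frame CNN feature maps as input; the RecurrentSpatialAttention class and the momentum-based smoothing are illustrative assumptions, not the actual LSTA recurrent cell.

```python
import torch
import torch.nn as nn

class RecurrentSpatialAttention(nn.Module):
    """Simplified recurrent spatial attention: per frame, an attention map over
    the feature grid is computed, smoothed with the previous frame's map, and
    used to pool features (illustrative sketch, not the LSTA cell)."""
    def __init__(self, channels, momentum=0.5):
        super().__init__()
        self.to_score = nn.Conv2d(channels, 1, kernel_size=1)
        self.momentum = momentum

    def forward(self, feats):  # feats: (B, T, C, H, W) per-frame feature maps
        b, t, c, h, w = feats.shape
        prev = feats.new_full((b, 1, h, w), 1.0 / (h * w))        # uniform initial attention
        pooled = []
        for i in range(t):
            score = self.to_score(feats[:, i])                    # (B, 1, H, W)
            attn = torch.softmax(score.flatten(2), dim=-1).view(b, 1, h, w)
            attn = self.momentum * prev + (1 - self.momentum) * attn  # smooth tracking over time
            prev = attn
            pooled.append((attn * feats[:, i]).sum(dim=(2, 3)))   # (B, C) attended feature
        return torch.stack(pooled, dim=1)                         # (B, T, C)

# usage on feature maps from a 2D CNN backbone applied per frame
out = RecurrentSpatialAttention(channels=512)(torch.randn(2, 8, 512, 7, 7))
```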
This work proposes a novel architecture for multi-modal temporal binding, i.e. the combination of modalities within a range of temporal offsets, and demonstrates the importance of audio in egocentric vision, on a per-class basis, for identifying both actions and the objects being interacted with.
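A rough sketch of binding modalities within a temporal window, assuming per-modality feature streams (e.g. RGB, flow, audio) are precomputed; the TemporalBindingFusion class, the random per-modality offsets, and the MLP fusion are assumptions for illustration rather than the paper's exact temporal binding network.

```python
import random
import torch
import torch.nn as nn

class TemporalBindingFusion(nn.Module):
    """Fuse modalities sampled within a temporal binding window: each modality
    contributes a feature from a (possibly different) offset inside the shared
    window, and the concatenation is fused with an MLP (illustrative sketch)."""
    def __init__(self, feat_dim, num_modalities, fused_dim):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(feat_dim * num_modalities, fused_dim), nn.ReLU()
        )

    def forward(self, streams, window):  # streams: (B, M, T, D) per-modality features over time
        b, m, t, d = streams.shape
        start = random.randint(0, t - window)
        # each modality picks its own offset inside the shared binding window
        offsets = [start + random.randint(0, window - 1) for _ in range(m)]
        picked = torch.cat([streams[:, i, offsets[i]] for i in range(m)], dim=-1)  # (B, M*D)
        return self.fuse(picked)                                                   # (B, fused_dim)

tbf = TemporalBindingFusion(feat_dim=256, num_modalities=3, fused_dim=512)
bound = tbf(torch.randn(2, 3, 16, 256), window=4)
```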
The "Ego-Exo" framework can be seamlessly integrated into standard video models; it outperforms all baselines when fine-tuned for egocentric activity recognition, achieving state-of-the-art results on Charades-Ego and EPIC-Kitchens-100.