3260 papers • 126 benchmarks • 313 datasets
Action Segmentation is a challenging problem in high-level video understanding. In its simplest form, it aims to temporally segment an untrimmed video and label each segment with one of a set of pre-defined action labels. The resulting segmentation can further serve as input to downstream applications such as video-to-text and action localization. Source: TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation
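Concretely, the output is usually a frame-wise label sequence, which is equivalent to a list of (start, end, action) segments. A minimal sketch of that conversion in plain Python (the labels and frame rate here are purely illustrative):

```python
from itertools import groupby

def frames_to_segments(frame_labels, fps=30):
    """Merge consecutive identical frame-wise labels into
    (start_sec, end_sec, label) segments."""
    segments, t = [], 0
    for label, run in groupby(frame_labels):
        n = len(list(run))
        segments.append((t / fps, (t + n) / fps, label))
        t += n
    return segments

# e.g. a 6-frame clip labelled frame by frame at 2 fps
print(frames_to_segments(["pour", "pour", "stir", "stir", "stir", "pour"], fps=2))
# [(0.0, 1.0, 'pour'), (1.0, 2.5, 'stir'), (2.5, 3.0, 'pour')]
```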
These leaderboards are used to track progress in Action Segmentation
Use these libraries to find Action Segmentation models and implementations
A class of temporal models that use a hierarchy of temporal convolutions to perform fine-grained action segmentation or detection, capturing action compositions, segment durations, and long-range dependencies, and training over an order of magnitude faster than competing LSTM-based recurrent networks.
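One common instantiation of this idea is a stack of dilated 1D convolutions over pre-computed frame features, so the receptive field grows exponentially with depth and covers long-range temporal context. A minimal PyTorch sketch (layer sizes, class count, and the residual wiring are illustrative assumptions, not the paper's configuration):

```python
import torch
import torch.nn as nn

class DilatedTCN(nn.Module):
    """Stack of dilated 1D convolutions with residual connections;
    the dilation doubles per layer, so the receptive field grows
    exponentially and can capture long-range dependencies."""
    def __init__(self, in_dim=2048, hidden=64, num_classes=19, num_layers=10):
        super().__init__()
        self.inp = nn.Conv1d(in_dim, hidden, kernel_size=1)
        self.layers = nn.ModuleList(
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=2 ** i, dilation=2 ** i)
            for i in range(num_layers)
        )
        self.out = nn.Conv1d(hidden, num_classes, kernel_size=1)

    def forward(self, x):                     # x: (batch, in_dim, num_frames)
        h = self.inp(x)
        for conv in self.layers:
            h = h + torch.relu(conv(h))       # residual dilated block
        return self.out(h)                    # frame-wise class logits

logits = DilatedTCN()(torch.randn(1, 2048, 600))   # shape (1, 19, 600)
```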
This work proposes a new learning approach, MIL-NCE, that addresses the misalignments inherent in narrated videos and outperforms all published self-supervised approaches on these tasks, as well as several fully supervised baselines.
This work designs a group-aware attention module, which can be easily plugged into existing AQA methods, to enrich the clip-wise representations based on contextual group information and achieves state-of-the-art on the LOGO dataset.
A multi-stage architecture for the temporal action segmentation task that achieves state-of-the-art results on three challenging datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset.
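The multi-stage idea is that the first stage predicts frame-wise class probabilities from the input features and each subsequent stage refines the previous stage's output. A simplified PyTorch sketch (the per-stage module and all sizes are placeholders, not the exact MS-TCN architecture):

```python
import torch
import torch.nn as nn

class MultiStage(nn.Module):
    """Chain several frame-wise models: the first stage sees frame features,
    each later stage refines the previous stage's class probabilities."""
    def __init__(self, make_stage, in_dim, num_classes, num_stages=4):
        super().__init__()
        self.first = make_stage(in_dim, num_classes)
        self.rest = nn.ModuleList(
            make_stage(num_classes, num_classes) for _ in range(num_stages - 1)
        )

    def forward(self, x):                      # x: (batch, in_dim, num_frames)
        outputs = [self.first(x)]
        for stage in self.rest:
            outputs.append(stage(torch.softmax(outputs[-1], dim=1)))
        return outputs                         # per-stage logits, all supervised

# `make_stage` could be any frame-wise model; here a tiny conv head for brevity
make_stage = lambda c_in, c_out: nn.Conv1d(c_in, c_out, kernel_size=3, padding=1)
model = MultiStage(make_stage, in_dim=2048, num_classes=19)
final_logits = model(torch.randn(1, 2048, 600))[-1]   # final-stage predictions
```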
The ASRF, a framework for temporal action segmentation, is proposed; it divides the task into frame-wise action classification and action boundary regression, then refines the frame-level action class hypotheses using the predicted action boundaries.
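The boundary-based refinement can be pictured as cutting the timeline at the predicted boundaries and assigning a single class to every frame within each resulting span. A simplified NumPy sketch of that idea (the threshold and the relabeling rule are assumptions for illustration, not the paper's exact procedure):

```python
import numpy as np

def refine_with_boundaries(frame_probs, boundary_prob, thresh=0.5):
    """frame_probs: (T, C) frame-wise class probabilities.
    boundary_prob: (T,) predicted probability that a frame starts a new segment.
    Returns frame labels smoothed within the boundary-delimited spans."""
    T = frame_probs.shape[0]
    cuts = [0] + [t for t in range(1, T) if boundary_prob[t] > thresh] + [T]
    labels = np.empty(T, dtype=int)
    for s, e in zip(cuts[:-1], cuts[1:]):
        labels[s:e] = frame_probs[s:e].sum(axis=0).argmax()  # one label per span
    return labels
```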
A global-to-local search scheme that uses a global search to find coarse receptive field combinations and an expectation-guided iterative local search to refine those combinations effectively.
VideoCLIP is presented, a contrastive approach to pre-train a unified model for zero-shot video and text understanding without using any labels on downstream tasks, achieving state-of-the-art performance that surpasses prior work and in some cases even outperforms supervised approaches.
This work proposes to find better receptive field combinations through a global-to-local search scheme, with an expectation-guided iterative local search to refine the combinations effectively.
This paper introduces a unified framework for video action segmentation via sequence-to-sequence (seq2seq) translation that covers both the fully supervised and the timestamp-supervised setups; the timestamp-supervised setting is handled with a proposed constrained k-medoids algorithm that generates pseudo-segmentations.
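With timestamp supervision, only one annotated frame per action instance is available, and full frame-wise pseudo-labels must be derived from it. As a deliberately naive stand-in for the paper's constrained k-medoids, the sketch below simply assigns each frame to its nearest annotated timestamp, just to illustrate what a pseudo-segmentation is (this is not the paper's algorithm):

```python
import numpy as np

def pseudo_segmentation(num_frames, timestamps, labels):
    """timestamps: sorted frame indices, one per action instance.
    labels: the action label annotated at each timestamp.
    Returns frame-wise pseudo-labels by nearest-timestamp assignment
    (a naive substitute for the paper's constrained k-medoids)."""
    ts = np.asarray(timestamps)
    frames = np.arange(num_frames)[:, None]            # (T, 1)
    nearest = np.abs(frames - ts[None, :]).argmin(1)   # (T,)
    return [labels[i] for i in nearest]

print(pseudo_segmentation(10, [1, 6], ["cut", "mix"]))
# ['cut', 'cut', 'cut', 'cut', 'mix', 'mix', 'mix', 'mix', 'mix', 'mix']
```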