Action segmentation from weak (transcript) supervision: each training video is accompanied only by a transcript, i.e., the ordered list of actions it contains, rather than frame-level labels.
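The core inference problem in this setting is aligning a video's frames to its transcript: each frame gets an action label, the labels must follow the transcript order, and every transcript entry must cover at least one frame. A minimal sketch of such an alignment via dynamic programming is shown below; the function name `align_to_transcript` and the score matrix are illustrative assumptions, not taken from any particular paper on this page.

```python
import numpy as np

def align_to_transcript(scores, transcript):
    """Align T frames to an ordered transcript (weak supervision).

    scores: (T, num_actions) array of frame-wise action scores.
    transcript: ordered list of action indices occurring in the video.
    Returns a length-T label array that follows the transcript order,
    with each transcript entry covering at least one frame.
    """
    T, K = scores.shape[0], len(transcript)
    assert K <= T, "need at least one frame per transcript entry"
    D = np.full((T, K), -np.inf)        # best cumulative score
    back = np.zeros((T, K), dtype=int)  # 0 = stay on entry k, 1 = advance from k-1
    D[0, 0] = scores[0, transcript[0]]
    for t in range(1, T):
        for k in range(min(t + 1, K)):  # entry k needs at least k prior frames
            stay = D[t - 1, k]
            adv = D[t - 1, k - 1] if k > 0 else -np.inf
            if adv > stay:
                D[t, k], back[t, k] = adv, 1
            else:
                D[t, k], back[t, k] = stay, 0
            D[t, k] += scores[t, transcript[k]]
    # Backtrack from the last frame at the last transcript position.
    labels = np.empty(T, dtype=int)
    k = K - 1
    for t in range(T - 1, -1, -1):
        labels[t] = transcript[k]
        if t > 0:
            k -= back[t, k]
    return labels
```

For example, with four frames whose scores favor action 2 then action 0, and transcript `[2, 0]`, the alignment yields labels `[2, 2, 0, 0]`. Methods on this page differ mainly in how the frame-wise scores are learned from such alignments.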
This paper proposes a novel end-to-end approach for weakly supervised action segmentation based on a two-branch neural network that achieves the accuracy of state-of-the-art approaches while being 14 times faster to train and 20 times faster during inference.
A new constrained discriminative forward loss (CDFL), used to train an HMM and GRU under weak supervision, which outperforms the state of the art on the Breakfast Action, Hollywood Extended, and 50Salads benchmark datasets.
We address the problem of learning to segment actions from weakly-annotated videos, i.e., videos accompanied by transcripts (ordered lists of actions). We propose a framework in which we model actions with a union of low-dimensional subspaces, learn the subspaces using transcripts, and refine video features so that they lend themselves to the action subspaces. To do so, we design an architecture consisting of a Union-of-Subspaces Network, an ensemble of autoencoders in which each autoencoder models a low-dimensional action subspace and can capture variations of an action within and across videos. For learning, at each iteration we generate positive and negative soft alignment matrices using the segmentations from the previous iteration, which we use for discriminative training of our model. To regularize the learning, we introduce a constraint loss that prevents imbalanced segmentations and enforces relatively similar durations of each action across videos. For real-time inference, we develop a hierarchical segmentation framework that uses subset selection to find representative transcripts and hierarchically aligns a test video with increasingly refined representative transcripts. Our experiments on three datasets show that our method improves over the state of the art in action segmentation and alignment while speeding up inference by a factor of 4 to 13.
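The union-of-subspaces idea above can be illustrated in a few lines: fit one low-dimensional subspace per action, then label each frame with the action whose subspace reconstructs it best. The sketch below uses linear subspaces fitted by truncated SVD as a simplified stand-in for the paper's trained autoencoder ensemble; all function names and the reconstruction-error assignment rule are illustrative assumptions.

```python
import numpy as np

def fit_subspace(features, dim):
    """Fit a low-dimensional linear subspace to one action's frame
    features (a linear stand-in for one autoencoder in the ensemble)."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:dim]  # the top `dim` right singular vectors span the subspace

def reconstruction_error(x, mean, basis):
    """Distance of a frame feature to an action subspace: the analogue
    of an autoencoder's reconstruction loss."""
    centered = x - mean
    recon = centered @ basis.T @ basis  # project onto the subspace
    return np.linalg.norm(centered - recon)

def segment(frames, subspaces):
    """Assign each frame to the action whose subspace reconstructs it
    best (a union-of-subspaces assignment, ignoring transcript order)."""
    return np.array([
        np.argmin([reconstruction_error(x, m, b) for m, b in subspaces])
        for x in frames
    ])
```

A frame lying near the first action's subspace gets label 0, and one near the second gets label 1. The full method additionally enforces transcript order and duration constraints during alignment, which this unconstrained per-frame sketch omits.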