Action segmentation from weak (transcript) supervision: each training video is accompanied only by a transcript, i.e., the ordered list of actions it contains, rather than frame-level labels.
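The core inference problem in this setting is aligning a video's frames to its transcript: each frame gets an action label, the labels must follow the transcript order, and every transcript entry must cover at least one frame. A minimal sketch of such an alignment via dynamic programming is shown below; the function name `align_to_transcript` and the score matrix are illustrative assumptions, not taken from any particular paper on this page.

```python
import numpy as np

def align_to_transcript(scores, transcript):
    """Align T frames to an ordered transcript (weak supervision).

    scores: (T, num_actions) array of frame-wise action scores.
    transcript: ordered list of action indices occurring in the video.
    Returns a length-T label array that follows the transcript order,
    with each transcript entry covering at least one frame.
    """
    T, K = scores.shape[0], len(transcript)
    assert K <= T, "need at least one frame per transcript entry"
    D = np.full((T, K), -np.inf)        # best cumulative score
    back = np.zeros((T, K), dtype=int)  # 0 = stay on entry k, 1 = advance from k-1
    D[0, 0] = scores[0, transcript[0]]
    for t in range(1, T):
        for k in range(min(t + 1, K)):  # entry k needs at least k prior frames
            stay = D[t - 1, k]
            adv = D[t - 1, k - 1] if k > 0 else -np.inf
            if adv > stay:
                D[t, k], back[t, k] = adv, 1
            else:
                D[t, k], back[t, k] = stay, 0
            D[t, k] += scores[t, transcript[k]]
    # Backtrack from the last frame at the last transcript position.
    labels = np.empty(T, dtype=int)
    k = K - 1
    for t in range(T - 1, -1, -1):
        labels[t] = transcript[k]
        if t > 0:
            k -= back[t, k]
    return labels
```

For example, with four frames whose scores favor action 2 then action 0, and transcript `[2, 0]`, the alignment yields labels `[2, 2, 0, 0]`. Methods on this page differ mainly in how the frame-wise scores are learned from such alignments.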
This paper proposes a novel end-to-end approach for weakly supervised action segmentation based on a two-branch neural network that achieves the accuracy of state-of-the-art approaches while being 14 times faster to train and 20 times faster during inference.
A new constrained discriminative forward loss (CDFL), used to train an HMM and GRU under weak supervision, which outperforms the state of the art on the Breakfast Action, Hollywood Extended, and 50Salads benchmark datasets.
We address the problem of learning to segment actions from weakly-annotated videos, i.e., videos accompanied by transcripts (ordered lists of actions). We propose a framework in which we model actions with a union of low-dimensional subspaces, learn the subspaces using transcripts, and refine video features so that they lend themselves to the action subspaces. To do so, we design an architecture consisting of a Union-of-Subspaces Network, an ensemble of autoencoders in which each autoencoder models a low-dimensional action subspace and can capture variations of an action within and across videos. For learning, at each iteration we generate positive and negative soft alignment matrices using the segmentations from the previous iteration, which we use for discriminative training of our model. To regularize the learning, we introduce a constraint loss that prevents imbalanced segmentations and enforces relatively similar durations of each action across videos. For real-time inference, we develop a hierarchical segmentation framework that uses subset selection to find representative transcripts and hierarchically aligns a test video with increasingly refined representative transcripts. Our experiments on three datasets show that our method improves over the state of the art in action segmentation and alignment while speeding up inference by a factor of 4 to 13.
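The union-of-subspaces idea above can be illustrated in a few lines: fit one low-dimensional subspace per action, then label each frame with the action whose subspace reconstructs it best. The sketch below uses linear subspaces fitted by truncated SVD as a simplified stand-in for the paper's trained autoencoder ensemble; all function names and the reconstruction-error assignment rule are illustrative assumptions.

```python
import numpy as np

def fit_subspace(features, dim):
    """Fit a low-dimensional linear subspace to one action's frame
    features (a linear stand-in for one autoencoder in the ensemble)."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:dim]  # the top `dim` right singular vectors span the subspace

def reconstruction_error(x, mean, basis):
    """Distance of a frame feature to an action subspace: the analogue
    of an autoencoder's reconstruction loss."""
    centered = x - mean
    recon = centered @ basis.T @ basis  # project onto the subspace
    return np.linalg.norm(centered - recon)

def segment(frames, subspaces):
    """Assign each frame to the action whose subspace reconstructs it
    best (a union-of-subspaces assignment, ignoring transcript order)."""
    return np.array([
        np.argmin([reconstruction_error(x, m, b) for m, b in subspaces])
        for x in frames
    ])
```

A frame lying near the first action's subspace gets label 0, and one near the second gets label 1. The full method additionally enforces transcript order and duration constraints during alignment, which this unconstrained per-frame sketch omits.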