computer-vision-9

Video Alignment

3260 papers • 126 benchmarks • 313 datasets

This task has no description! Would you like to contribute one?

(Image credit: Papersgraph)

Benchmarks

These leaderboards are used to track progress in video-alignment-9

Trend

Dataset

Best Model

Actions

UPenn Action

MSU Video Alignment and Retrieval Benchmark Suite

Libraries

i

Use these libraries to find video-alignment-9 models and implementations

Datasets

Penn Action

N-Digit MNIST

MSU Video Alignment and Retrieval Benchmark Suite

IAW Dataset

Subtasks

No subtasks available.

Most implemented papers

Time-Contrastive Networks: Self-Supervised Learning from Video

S. Levine, Eric Jang, P. Sermanet, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, S. Schaal•Sat Apr 22 2017

A self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints is proposed, and it is demonstrated that this representation can be used by a robot to directly mimic human poses without an explicit correspondence, and that it can be use as a reward function within a reinforcement learning algorithm.

Content

904

0

Paper Graph

Learning from Video and Text via Large-Scale Discriminative Clustering

Jean-Baptiste Alayrac, Antoine Miech, I. Laptev, Josef Sivic, Piotr Bojanowski•Wed Jul 26 2017

This work applies the proposed method to the problem of weakly supervised learning of actions and actors from movies together with corresponding movie scripts and proposes an online optimization algorithm based on the Block-Coordinate Frank-Wolfe algorithm.

44 0

Paper Graph

Temporal Cycle-Consistency Learning

Andrew Zisserman, P. Sermanet, Y. Aytar, Debidatta Dwibedi, Jonathan Tompson•Mon Apr 15 2019

It is shown that the learned embeddings enable few-shot classification of these action phases, significantly reducing the supervised training requirements; and TCC is complementary to other methods of self-supervised learning in videos, such as Shuffle and Learn and Time-Contrastive Networks.

299 0

Paper Graph

View-Invariant Probabilistic Embedding for Human Pose

Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Jennifer J. Sun, Ting Liu, Jiaping Zhao•Sun Dec 01 2019

An approach for learning a compact view-invariant embedding space from 2D joint keypoints alone, without explicitly predicting 3D poses, and uses probabilistic embeddings to model this input uncertainty.

88 0

Paper Graph

View-Invariant, Occlusion-Robust Probabilistic Embedding for Human Pose

Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Jennifer J. Sun, Ting Liu, Liangzhe Yuan, Long Zhao, Jiaping Zhao, Yuxiao Wang•Thu Oct 22 2020

This work proposes an approach to learning a compact view-invariant embedding space from 2D body joint keypoints, without explicitly predicting 3D poses, and investigates different keypoint occlusion augmentation strategies during training.

19 0

Paper Graph

LAMV: Learning to Align and Match Videos with Kernelized Temporal Layers

Matthijs Douze, H. Jégou, L. Baraldi, R. Cucchiara•Thu May 31 2018

This paper considers a learnable approach for comparing and aligning videos. Our architecture builds upon and revisits temporal match kernels within neural networks: we propose a new temporal layer that finds temporal alignments by maximizing the scores between two sequences of vectors, according to a time-sensitive similarity metric parametrized in the Fourier domain. We learn this layer with a temporal proposal strategy, in which we minimize a triplet loss that takes into account both the localization accuracy and the recognition rate. We evaluate our approach on video alignment, copy detection and event retrieval. Our approach outperforms the state on the art on temporal video alignment and video copy detection datasets in comparable setups. It also attains the best reported results for particular event search, while precisely aligning videos.

50 0

Paper Graph

Dynamic Temporal Alignment of Speech to Lips

Shmuel Peleg, Tavi Halperin, Ariel Ephrat•Sat Aug 18 2018

This work presents an audio-to-video method for automating speech to lips alignment, stretching and compressing the audio signal to match the lip movements, based on deep audio-visual features.

43 0

Paper Graph

Adversarial Skill Networks: Unsupervised Robot Skill Learning from Video

Wolfram Burgard, Oier Mees, Markus Merklinger, Gabriel Kalweit•Sun Oct 20 2019

This work proposes a novel approach to learn a task-agnostic skill embedding space from unlabeled multi-view videos by using an adversarial loss, and shows that the learned embedding enables training of continuous control policies to solve novel tasks that require the interpolation of previously seen skills.

33 0

Paper Graph

Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning

Minghao Chen, Fangyun Wei, Deng Cai•Sun Mar 27 2022

This paper introduces a novel contrastive action representation learning (CARL) framework to learn frame-wise action representations, especially for long videos, in a self-supervised manner and shows outstanding performance on video alignment and fine-grained frame retrieval tasks.

45 0

Paper Graph

Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space

M. Ryoo, Srijan Das, Jinghuan Shang•Wed Jun 22 2022

A 3D Token Representation Layer (3DTRL) is proposed that estimates the 3D positional information of the visual tokens and leverages it for learning viewpoint-agnostic representations and outperform their backbone Transformers in all the tasks with minimal added computation.

15 0

Paper Graph

Adding a benchmark result helps the community track progress.