3260 papers • 126 benchmarks • 313 datasets
Optical Flow Estimation is a computer vision task that involves computing the motion of objects between images or video frames. The goal is to determine the movement of pixels or features from one frame to the next, which can be used for applications such as object tracking, motion analysis, and video compression. Classical approaches include correlation-based, block-matching, feature-tracking, energy-based, and gradient-based methods; more recently, learning-based methods have come to dominate.
Further reading: Optical Flow Estimation; Performance of Optical Flow Techniques
Definition source: Devon: Deformable Volume Network for Learning Optical Flow
Image credit: Optical Flow Estimation
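To make the gradient-based family concrete, here is a minimal NumPy sketch of the classic Lucas-Kanade idea: assume brightness constancy, linearize it with spatial and temporal image gradients, and solve a least-squares system for a single flow vector over a window. This is an illustrative toy (one global vector on a synthetic pair), not any particular paper's implementation.

```python
import numpy as np

def lucas_kanade_window(prev, curr):
    """Estimate one (u, v) flow vector for a window by solving the
    least-squares brightness-constancy system [Ix Iy] [u v]^T = -It
    (the classic Lucas-Kanade formulation)."""
    Iy, Ix = np.gradient(prev)     # spatial gradients (rows = y, cols = x)
    It = curr - prev               # temporal gradient
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Synthetic pair: a smooth Gaussian blob translated by (+1, 0) pixels.
y, x = np.mgrid[0:64, 0:64]
blob = lambda cx, cy: np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / 50.0)
frame0 = blob(30, 32)
frame1 = blob(31, 32)              # moved 1 px to the right

u, v = lucas_kanade_window(frame0, frame1)
print(round(u, 2), round(v, 2))    # roughly (1.0, 0.0)
```

The recovered vector is approximate because the gradients are finite differences; real systems apply this per-window (or replace it entirely with a learned network, as in the papers below).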
These leaderboards are used to track progress in Optical Flow Estimation.
Use these libraries to find Optical Flow Estimation models and implementations.
PWC-Net is designed according to simple and well-established principles: pyramidal processing, warping, and the use of a cost volume. It outperforms all published optical flow methods on the MPI Sintel final pass and KITTI 2015 benchmarks.
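The cost-volume idea mentioned above can be sketched in a few lines: correlate one feature map with the other over a small window of candidate displacements. This NumPy toy uses plain arrays instead of learned CNN features and is only a sketch of the general construction, not PWC-Net's actual layer.

```python
import numpy as np

def cost_volume(feat1, feat2, max_disp=3):
    """Build a (2d+1) x (2d+1) x H x W cost volume: for each candidate
    displacement (dy, dx), the per-pixel correlation of feat1 with the
    correspondingly shifted feat2. feat1, feat2: (C, H, W) arrays."""
    C, H, W = feat1.shape
    d = max_disp
    padded = np.pad(feat2, ((0, 0), (d, d), (d, d)))   # zero-pad borders
    vol = np.zeros((2 * d + 1, 2 * d + 1, H, W))
    for dy in range(2 * d + 1):
        for dx in range(2 * d + 1):
            shifted = padded[:, dy:dy + H, dx:dx + W]
            vol[dy, dx] = (feat1 * shifted).mean(axis=0)  # channel-mean correlation
    return vol

# Toy check: correlating a feature map with itself.
rng = np.random.default_rng(0)
f = rng.standard_normal((8, 16, 16))
vol = cost_volume(f, f)
print(vol.shape)  # (7, 7, 16, 16)
```

On average the zero-displacement slice `vol[3, 3]` scores highest here, which is exactly the signal a flow decoder reads off a cost volume.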
This paper constructs CNNs capable of solving the optical flow estimation problem as a supervised learning task, and proposes and compares two architectures: a generic one and one with a layer that correlates feature vectors at different image locations.
A real-time intermediate flow estimation algorithm (RIFE) for video frame interpolation (VFI) is proposed; it can be trained end-to-end and achieves state-of-the-art results on several benchmarks.
The concept of end-to-end learning of optical flow is advanced and shown to work well, and faster variants that allow optical flow computation at up to 140 fps with accuracy matching the original FlowNet are presented.
Recurrent All-Pairs Field Transforms (RAFT) achieves state-of-the-art performance on the KITTI and Sintel datasets, with strong cross-dataset generalization as well as high efficiency in inference time, training speed, and parameter count.
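The "all-pairs" part of RAFT's name refers to correlating every pixel's feature in frame 1 with every pixel's feature in frame 2, rather than only a local displacement window. A minimal NumPy sketch of that 4D correlation volume (using random arrays in place of learned features, and not RAFT's actual code):

```python
import numpy as np

def all_pairs_correlation(feat1, feat2):
    """4D all-pairs correlation volume: the scaled dot product between
    every pixel feature in feat1 and every pixel feature in feat2.
    feat1, feat2: (C, H, W) -> result: (H, W, H, W)."""
    C, H, W = feat1.shape
    f1 = feat1.reshape(C, H * W)
    f2 = feat2.reshape(C, H * W)
    corr = f1.T @ f2 / np.sqrt(C)   # (H*W, H*W) similarity matrix
    return corr.reshape(H, W, H, W)

rng = np.random.default_rng(1)
f = rng.standard_normal((16, 8, 8))
corr = all_pairs_correlation(f, f)
print(corr.shape)  # (8, 8, 8, 8)
```

A recurrent update operator then looks up values in this volume around the current flow estimate; the single matrix multiply above is why the construction is cheap despite covering all pixel pairs.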
The Spatial Pyramid Network (SPyNet) is much simpler and 96% smaller than FlowNet in terms of model parameters, which makes it more efficient and appropriate for embedded applications.
This work proposes a two-stream ConvNet architecture which incorporates spatial and temporal networks and demonstrates that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data.
This paper develops a deep fully convolutional neural network that takes two input frames and estimates pairs of 1D kernels for all pixels simultaneously, which allows for the incorporation of perceptual loss to train the neural network to produce visually pleasing frames.
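The pairs of 1D kernels described above define, for each output pixel, a separable 2D filter applied to a local patch. A toy NumPy sketch of that synthesis step (the kernels here are hand-set identity kernels, not network outputs, and this is not the paper's implementation):

```python
import numpy as np

def apply_separable_kernels(frame, kh, kv):
    """Synthesize each output pixel as outer(kv[y, x], kh[y, x]) applied
    to the K x K patch around (y, x), i.e. a per-pixel separable filter.
    frame: (H, W); kh, kv: (H, W, K) horizontal / vertical 1D kernels."""
    H, W = frame.shape
    K = kh.shape[-1]
    r = K // 2
    padded = np.pad(frame, r, mode='edge')
    out = np.empty_like(frame)
    for y in range(H):
        for x in range(W):
            patch = padded[y:y + K, x:x + K]
            out[y, x] = kv[y, x] @ patch @ kh[y, x]  # separable 2D filtering
    return out

# With delta (identity) kernels, the input frame is reproduced exactly.
H, W, K = 6, 6, 3
frame = np.arange(36, dtype=float).reshape(H, W)
delta = np.zeros((H, W, K)); delta[..., K // 2] = 1.0
print(np.allclose(apply_separable_kernels(frame, delta, delta), frame))  # True
```

In the actual method, a network predicts `kh` and `kv` from two input frames, so shifting the kernel mass off-center interpolates motion between them.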
This work proposes Perceiver IO, a general-purpose architecture that handles data from arbitrary settings while scaling linearly with the size of inputs and outputs. It augments the Perceiver with a flexible querying mechanism that enables outputs of various sizes and semantics, doing away with the need for task-specific architecture engineering.
This paper proposes a Flow Alignment Module (FAM) that learns Semantic Flow between feature maps of adjacent levels and broadcasts high-level features to high-resolution features effectively and efficiently; it exhibits superior performance over other real-time methods even on lightweight backbone networks.