3260 papers • 126 benchmarks • 313 datasets
Motion Estimation is used to determine the block-wise or pixel-wise motion vectors between two frames. Source: MEMC-Net: Motion Estimation and Motion Compensation Driven Neural Network for Video Interpolation and Enhancement
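The block-wise case described above is classically solved by block matching: each block of the current frame is compared against shifted candidate blocks in the previous frame, and the shift with the lowest photometric error becomes that block's motion vector. A minimal NumPy sketch of exhaustive-search block matching (function name, block size, and search range are illustrative, not from any particular paper):

```python
import numpy as np

def block_matching(prev, curr, block=8, search=4):
    """Estimate block-wise motion vectors by exhaustive search.

    For each `block` x `block` patch of `curr`, find the displacement
    (dy, dx) within +/- `search` pixels that minimizes the sum of
    absolute differences (SAD) against `prev`.
    Returns an array of shape (H // block, W // block, 2).
    """
    h, w = curr.shape
    vecs = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            target = curr[by:by + block, bx:bx + block].astype(float)
            best, best_v = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    # Skip candidates that fall outside the frame.
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue
                    cand = prev[y:y + block, x:x + block].astype(float)
                    sad = np.abs(target - cand).sum()
                    if sad < best:
                        best, best_v = sad, (dy, dx)
            vecs[by // block, bx // block] = best_v
    return vecs
```

Real codecs and MEMC pipelines replace the exhaustive search with hierarchical or diamond search patterns, but the objective is the same.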
(Image credit: Papersgraph)
These leaderboards are used to track progress in Motion Estimation.

No benchmarks available.
Use these libraries to find Motion Estimation models and implementations.
No subtasks available.
Per-pixel ground-truth depth data is challenging to acquire at scale. To overcome this limitation, self-supervised learning has emerged as a promising alternative for training models to perform monocular depth estimation. In this paper, we propose a set of improvements, which together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods. Research on self-supervised monocular training usually explores increasingly complex architectures, loss functions, and image formation models, all of which have recently helped to close the gap with fully-supervised methods. We show that a surprisingly simple model, and associated design choices, lead to superior predictions. In particular, we propose (i) a minimum reprojection loss, designed to robustly handle occlusions, (ii) a full-resolution multi-scale sampling method that reduces visual artifacts, and (iii) an auto-masking loss to ignore training pixels that violate camera motion assumptions. We demonstrate the effectiveness of each component in isolation, and show high quality, state-of-the-art results on the KITTI benchmark.
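The minimum reprojection loss in (i) can be sketched compactly: rather than averaging the photometric error over all source frames warped into the target view, take the per-pixel minimum, so a pixel occluded in one source frame can still be explained by another. The sketch below uses a plain L1 photometric error for brevity (the paper combines L1 with SSIM), and the function name is illustrative:

```python
import numpy as np

def min_reprojection_loss(target, warped_sources):
    """Per-pixel minimum reprojection loss (cf. Monodepth2).

    target:         (H, W) or (H, W, C) target frame
    warped_sources: list of source frames warped into the target view

    Computes an L1 photometric error map per warped source, takes the
    per-pixel minimum across sources, and averages over the image.
    """
    def l1_error(warped):
        err = np.abs(target.astype(float) - warped.astype(float))
        # Average over channels if the frames are multi-channel.
        return err.mean(axis=-1) if target.ndim == 3 else err

    errors = [l1_error(w) for w in warped_sources]
    per_pixel_min = np.minimum.reduce(errors)  # (H, W)
    return per_pixel_min.mean()
```

With an average instead of a minimum, a pixel occluded in one source frame would contribute a large, irreducible error; the minimum lets the better-matching source frame explain it.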
This work addresses unsupervised learning of scene depth and robot ego-motion where supervision is provided by monocular videos, as cameras are the cheapest, least restrictive and most ubiquitous sensor for robotics.
It is shown that, surprisingly, state-of-the-art performance can be achieved by a simple baseline that does not attempt to model motion at all, and a simple and scalable RNN architecture is proposed that obtains state-of-the-art performance on human motion prediction.
This paper proposes the Double Sphere camera model, which well fits with large field-of-view lenses, is computationally inexpensive and has a closed-form inverse, and is evaluated using a calibration dataset with several different lenses.
This letter reconstructs a set of non-linear factors that optimally approximate the information on the trajectory accumulated by VIO, make the roll and pitch angles of the global map observable, and improve the robustness and accuracy of the mapping.
Extensive experiments on the KITTI VO dataset show competitive performance to state-of-the-art methods, verifying that the end-to-end Deep Learning technique can be a viable complement to the traditional VO systems.
Task-oriented flow (TOFlow), a motion representation learned in a self-supervised, task-specific manner, is proposed; it outperforms traditional optical flow on standard benchmarks as well as the Vimeo-90K dataset in three video processing tasks.
A state-of-the-art video denoising algorithm based on a convolutional neural network architecture is presented, exhibiting several desirable properties such as fast runtimes and the ability to handle a wide range of noise levels with a single network model.
This paper proposes a vision-based method for video sky replacement and harmonization, which can automatically generate realistic and dramatic sky backgrounds in videos with controllable styles and can be well applied to either online or offline processing scenarios.
This work introduces a new large-scale dataset for scene flow estimation derived from corresponding tracked 3D objects, which is 1,000 times larger than previous real-world datasets in terms of the number of annotated frames, and designs human-interpretable metrics that better capture real world aspects by accounting for ego-motion and providing breakdowns per object type.