3260 papers • 126 benchmarks • 313 datasets
Stereo matching is a core technology in computer vision that recovers the 3D structure of the real world from 2D images. It is widely used in areas such as autonomous driving, augmented reality, and robot navigation. Given a pair of rectified stereo images, the goal of stereo matching is to compute the disparity for each pixel in the reference image, where disparity is defined as the horizontal displacement between a pair of corresponding pixels in the left and right images. Source: Adaptive Unimodal Cost Volume Filtering for Deep Stereo Matching
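To make the definition of disparity concrete, the following is a minimal sketch of classical brute-force matching on a rectified pair, using a sum of absolute differences (SAD) over a small window. It is an illustration only and not the method of any paper listed here; the function name `sad_disparity`, the window `radius`, and the synthetic test images are assumptions made for the example.

```python
import numpy as np

def sad_disparity(left, right, max_disp, radius=1):
    """Per-pixel disparity for the left (reference) image via brute-force SAD matching."""
    h, w = left.shape
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    k = 2 * radius + 1
    # costs[d, y, x] is the matching cost of assigning disparity d to pixel (y, x).
    costs = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        # Pixel (y, x) in the left image is compared with (y, x - d) in the right image.
        diff = np.abs(left[:, d:] - right[:, :w - d])
        # Sum the absolute differences over a k x k window (zero padding at the borders).
        padded = np.pad(diff, radius)
        agg = np.zeros_like(diff)
        for dy in range(k):
            for dx in range(k):
                agg += padded[dy:dy + diff.shape[0], dx:dx + diff.shape[1]]
        # Columns x < d have no valid match at this disparity and keep an infinite cost.
        costs[d, :, d:] = agg
    # Winner-takes-all: pick the disparity with the lowest cost at each pixel.
    return np.argmin(costs, axis=0)

# Synthetic rectified pair: the left image is the right image shifted by 4 pixels.
rng = np.random.default_rng(0)
true_disp = 4
right_img = rng.random((8, 32))
left_img = rng.random((8, 32))
left_img[:, true_disp:] = right_img[:, :-true_disp]

disp = sad_disparity(left_img, right_img, max_disp=8)
```

On this synthetic pair, every pixel with a valid correspondence (columns from `true_disp` onward) should be assigned the true shift. The learned methods summarized below replace the hand-crafted SAD cost and the winner-takes-all step with trained components.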
HITNet is a novel neural network architecture for real-time stereo matching that not only reasons geometrically about disparities but also infers slanted plane hypotheses, which allows geometric warping and upsampling operations to be performed more accurately.
PSMNet is a pyramid stereo matching network consisting of two main modules: spatial pyramid pooling and a 3D CNN. The spatial pyramid pooling module exploits global context information by aggregating context at different scales and locations to form a cost volume.
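As a rough illustration of what forming a cost volume from feature maps can look like, the sketch below pairs left features with right features shifted by each candidate disparity and stacks them along the channel axis. This is a minimal NumPy sketch of the commonly used concatenation-style volume, not PSMNet's actual code; the function name `concat_cost_volume` and the array shapes are assumptions for the example.

```python
import numpy as np

def concat_cost_volume(left_feat, right_feat, max_disp):
    """Build a (2C, D, H, W) volume from two (C, H, W) feature maps."""
    c, h, w = left_feat.shape
    volume = np.zeros((2 * c, max_disp, h, w), dtype=left_feat.dtype)
    for d in range(max_disp):
        # Left features at column x are paired with right features at column x - d.
        volume[:c, d, :, d:] = left_feat[:, :, d:]
        volume[c:, d, :, d:] = right_feat[:, :, :w - d]
        # Columns x < d have no counterpart in the right image and stay zero.
    return volume

# Hypothetical feature maps: 4 channels on a 6 x 10 grid.
rng = np.random.default_rng(0)
left_feat = rng.random((4, 6, 10)).astype(np.float32)
right_feat = rng.random((4, 6, 10)).astype(np.float32)
volume = concat_cost_volume(left_feat, right_feat, max_disp=5)
```

A 3D CNN can then process such a volume jointly over the disparity, height, and width axes, which is where the memory and compute cost discussed in several of the summaries below comes from.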
This paper proposes a cost volume formulation that is both memory- and time-efficient and is complementary to existing multi-view stereo and stereo matching approaches based on 3D cost volumes. Applying the cascade cost volume to the representative MVS-Net yields a 35.6% improvement on the DTU benchmark.
LidarStereoNet is the first unsupervised Lidar-stereo fusion network. It can be trained end-to-end without ground-truth depth maps and incorporates a piecewise planar model into network learning to further constrain depths to conform to the underlying 3D geometry.
This paper proposes two novel neural network layers, aimed at capturing local and whole-image cost dependencies respectively. They can replace the widely used 3D convolutional layer, which is computationally costly and memory-consuming because of its cubic computational and memory complexity.
This paper proposes a novel solution that is able to bypass the requirement of building a 5D feature volume while still allowing the network to learn suitable matching costs from data, and achieves state-of-the-art accuracy on various datasets.
This paper proposes CFNet, a network based on cascade and fused cost volumes that improves the robustness of stereo matching, and employs variance-based uncertainty estimation to adaptively adjust the disparity search space of the next stage.
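The idea of a variance-based search range can be sketched as follows: convert the matching costs at each pixel into a probability distribution over disparity candidates, take its mean as the disparity estimate and its variance as the uncertainty, and widen or narrow the next search interval accordingly. This is a simplified sketch under stated assumptions, not CFNet's implementation; the function name `next_stage_range` is hypothetical, and the fixed `scale` factor stands in for whatever parameters a real network would use.

```python
import numpy as np

def next_stage_range(cost, candidates, scale=1.0):
    """cost: (D, H, W) matching costs (lower is better); candidates: (D,) disparities."""
    # Softmax over the disparity axis turns costs into per-pixel probabilities.
    logits = -cost
    logits = logits - logits.max(axis=0, keepdims=True)  # for numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)

    cand = candidates.reshape(-1, 1, 1)
    mean = (prob * cand).sum(axis=0)               # expected disparity
    var = (prob * (cand - mean) ** 2).sum(axis=0)  # uncertainty of the estimate

    # Assumption: the search interval is the mean plus or minus scale * std.
    half_width = scale * np.sqrt(var)
    return mean - half_width, mean + half_width, var

candidates = np.arange(5, dtype=np.float64)
cost = np.zeros((5, 1, 2))
cost[:, 0, 0] = 50.0
cost[2, 0, 0] = 0.0  # pixel (0, 0): sharp minimum at disparity 2
                     # pixel (0, 1): flat costs, ambiguous match
lo, hi, var = next_stage_range(cost, candidates)
```

The confident pixel ends up with a near-zero variance and a narrow interval around disparity 2, while the ambiguous pixel keeps a wide interval, so the next stage spends its disparity hypotheses where the match is least certain.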
Two lightweight models for stereo vision are proposed that reduce complexity without sacrificing accuracy: a 2D model and a 3D model, with encoder-decoders built from 2D and 3D convolutions, respectively.
This paper introduces a hierarchical network with recurrent refinement that updates disparities in a coarse-to-fine manner, a stacked cascaded architecture for inference, and a new synthetic dataset that pays special attention to difficult cases in order to generalize better to real-world scenes.