3260 papers • 126 benchmarks • 313 datasets
Image Matching, or wide multiple baseline stereo (WxBS), is the process of establishing a sufficient number of pixel or region correspondences between two or more images depicting the same scene, in order to estimate the geometric relationship between the cameras that produced those images. Source: The Role of Wide Baseline Stereo in the Deep Learning World ( Image credit: Kornia )
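As a minimal sketch of the correspondence step described above, the following function matches two sets of local feature descriptors with mutual nearest-neighbour search plus Lowe's ratio test. The function name and the 0.8 threshold are illustrative choices, not a specific library's API; real pipelines typically use OpenCV or Kornia matchers.

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.8):
    """Mutual nearest-neighbour matching with Lowe's ratio test.

    desc1: (N, D) and desc2: (M, D) descriptor arrays.
    Returns an array of (i, j) index pairs into desc1/desc2.
    """
    # Pairwise Euclidean distances between all descriptor pairs.
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    matches = []
    for i in range(d.shape[0]):
        order = np.argsort(d[i])
        best, second = order[0], order[1]
        # Ratio test: keep only clearly unambiguous matches.
        if d[i, best] < ratio * d[i, second]:
            # Mutual check: `best` must also prefer i among all of desc1.
            if np.argmin(d[:, best]) == i:
                matches.append((i, best))
    return np.array(matches)
```

The resulting correspondences are what a downstream robust estimator (e.g. RANSAC on the fundamental or essential matrix) consumes to recover the camera geometry.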
These leaderboards are used to track progress in Image Matching
Use these libraries to find Image Matching models and implementations
SuperGlue is introduced: a neural network that matches two sets of local features by jointly finding correspondences and rejecting non-matchable points. A flexible, attention-based context aggregation mechanism enables SuperGlue to reason jointly about the underlying 3D scene and the feature assignments.
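SuperGlue turns raw matching scores into a soft partial assignment with Sinkhorn iterations (alternating row and column normalisation). The sketch below is a simplified version, assuming a plain square score matrix and omitting SuperGlue's learned "dustbin" row/column that absorbs unmatched points.

```python
import numpy as np

def sinkhorn(scores, n_iters=100):
    """Sinkhorn normalisation of a score matrix into a soft assignment.

    Alternating row/column normalisation in log space drives the matrix
    toward doubly-stochastic, i.e. every keypoint distributes one unit of
    "match mass" -- a simplified sketch of SuperGlue's optimal-transport
    layer (no dustbin, no learned temperature).
    """
    log_p = scores.astype(float).copy()
    for _ in range(n_iters):
        # Normalise rows, then columns, in log space for stability.
        log_p -= np.log(np.exp(log_p).sum(axis=1, keepdims=True))
        log_p -= np.log(np.exp(log_p).sum(axis=0, keepdims=True))
    return np.exp(log_p)
```

After convergence, thresholding the assignment matrix (or taking mutual row/column argmaxes) yields the final correspondences.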
A unified four-stage scene text recognition (STR) framework is introduced into which most existing STR models fit; it allows for extensive evaluation of previously proposed STR modules and the discovery of previously unexplored module combinations.
The proposed method, LoFTR, uses self- and cross-attention layers in a Transformer to obtain feature descriptors that are conditioned on both images, enabling it to produce dense matches in low-texture areas, where feature detectors usually struggle to produce repeatable interest points.
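The core idea of conditioning one image's features on the other can be shown with a single, heavily simplified cross-attention step. This toy version is untrained and omits LoFTR's learned query/key/value projections, positional encodings, and linear-attention approximation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(feats_a, feats_b):
    """One projection-free cross-attention step.

    Each feature of image A is updated with an attention-weighted sum of
    image B's features, so every descriptor becomes conditioned on the
    other image -- the idea behind LoFTR's cross-attention layers, in a
    minimal, untrained form.
    """
    # Scaled dot-product attention weights from A over B.
    attn = softmax(feats_a @ feats_b.T / np.sqrt(feats_a.shape[1]))
    return feats_a + attn @ feats_b  # residual update, as in Transformers
```

In the real model, several such self/cross layers alternate before coarse matches are extracted from the resulting descriptors.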
Kornia is composed of a set of modules containing operators that can be inserted inside neural networks to train models to perform image transformations, camera calibration, epipolar geometry, and low-level image processing techniques, such as filtering and edge detection, operating directly on high-dimensional tensor representations.
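To make "low-level operators on tensor representations" concrete, here is a Sobel edge-magnitude filter. Kornia's actual operators are differentiable PyTorch modules acting on batched tensors; this is a plain-NumPy sketch of the same computation, not Kornia's API.

```python
import numpy as np

def sobel_edges(img):
    """Sobel edge magnitude on a 2-D image array.

    Illustrates the kind of low-level filtering operator Kornia exposes
    as a network module (plain-NumPy sketch, no padding, 'valid' output).
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T  # vertical-gradient kernel is the transpose
    H, W = img.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            patch = img[i:i + 3, j:j + 3]
            gx = (patch * kx).sum()  # horizontal gradient
            gy = (patch * ky).sum()  # vertical gradient
            out[i, j] = np.hypot(gx, gy)
    return out
```

Because such filters are just tensor arithmetic, gradients flow through them, which is what lets Kornia place them inside trainable networks.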
It is shown that maximizing geometric repeatability does not by itself yield local regions (a.k.a. features) that are reliably matched; this necessitates descriptor-based learning, and a novel hard negative-constant loss function is proposed for learning affine regions.
A novel approach to the keypoint detection task is introduced that combines handcrafted and learned CNN filters within a shallow multi-scale architecture, outperforming state-of-the-art detectors in terms of repeatability, matching performance, and complexity.
Local feature frameworks are difficult to learn in an end-to-end fashion, due to the discreteness inherent to the selection and matching of sparse keypoints. We introduce DISK (DIScrete Keypoints), a novel method that overcomes these obstacles by leveraging principles from Reinforcement Learning (RL), optimizing end-to-end for a high number of correct feature matches. Our simple yet expressive probabilistic model lets us keep the training and inference regimes close, while maintaining good enough convergence properties to reliably train from scratch. Our features can be extracted very densely while remaining discriminative, challenging commonly held assumptions about what constitutes a good keypoint, as showcased in Fig. 1, and deliver state-of-the-art results on three public benchmarks.
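The RL principle DISK leverages for the discrete keypoint-selection step can be illustrated with a score-function (REINFORCE) gradient estimate for a toy Bernoulli keep/drop policy. The function and reward below are illustrative only, not DISK's actual objective or training loop.

```python
import numpy as np

def reinforce_grad(logits, reward_fn, n_samples=4000, rng=None):
    """Score-function (REINFORCE) gradient estimate.

    `logits` parametrise independent Bernoulli keep/drop decisions over
    candidate keypoints. The gradient of the expected reward w.r.t. the
    logits is estimated by sampling selection masks -- a toy version of
    the probabilistic treatment that lets DISK train end-to-end through
    discrete keypoint selection.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    p = 1.0 / (1.0 + np.exp(-logits))       # keep probabilities
    grad = np.zeros_like(logits, dtype=float)
    for _ in range(n_samples):
        keep = rng.random(p.shape) < p      # sample a selection mask
        r = reward_fn(keep)
        # d log P(mask) / d logits for a Bernoulli policy is (keep - p).
        grad += r * (keep - p)
    return grad / n_samples
```

With a reward that counts kept points, every component of the estimated gradient is positive, pushing all keep-probabilities up, as expected.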
This work argues that repeatable regions are not necessarily discriminative and can therefore lead to the selection of suboptimal keypoints; it proposes to jointly learn keypoint detection and description together with a predictor of local descriptor discriminativeness.