3260 papers • 126 benchmarks • 313 datasets
Visual Localization is the problem of estimating the camera pose of a given image relative to a visual representation of a known scene. Source: Fine-Grained Segmentation Networks: Self-Supervised Segmentation for Improved Long-Term Visual Localization
This paper combines and modifies the existing PointNet and NetVLAD into a network that supports end-to-end training and inference for extracting a global descriptor from a given 3D point cloud, and proposes "lazy triplet and quadruplet" loss functions that yield more discriminative and generalizable global descriptors for the retrieval task.
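The "lazy" variant of the triplet loss can be sketched as follows: instead of summing the hinge term over all negatives, it keeps only the hardest (closest) negative in the batch. This is a minimal NumPy illustration; the function name and margin value are ours, not the paper's.

```python
import numpy as np

def lazy_triplet_loss(anchor, positive, negatives, margin=0.5):
    """Lazy triplet loss: take the max over negatives (hardest one)
    rather than summing over all of them.
    anchor, positive: (D,) descriptors; negatives: (K, D)."""
    d_pos = np.sum((anchor - positive) ** 2)            # squared distance to the positive
    d_negs = np.sum((anchor - negatives) ** 2, axis=1)  # squared distance to each negative
    # "Lazy": only the closest negative contributes to the hinge
    return max(0.0, margin + d_pos - d_negs.min())
```

Focusing on the hardest negative tends to produce stronger gradients than averaging over many easy negatives.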
The proposed method, LoFTR, uses self- and cross-attention layers in a Transformer to obtain feature descriptors conditioned on both images, which enables it to produce dense matches in low-texture areas where feature detectors usually struggle to find repeatable interest points.
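Detector-free matchers of this kind typically score all descriptor pairs and keep confident mutual matches; a common formulation is dual-softmax over the score matrix. A small NumPy sketch of that matching step (function name, temperature, and threshold are illustrative assumptions, not LoFTR's exact values):

```python
import numpy as np

def dual_softmax_matches(desc_a, desc_b, temperature=0.1, threshold=0.2):
    """Score every descriptor pair, apply softmax along both axes,
    and keep mutual-nearest matches above a confidence threshold.
    desc_a: (N, D), desc_b: (M, D) descriptors (rows = positions)."""
    scores = desc_a @ desc_b.T / temperature

    def softmax(x, axis):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    # product of row-wise and column-wise softmax = matching probability
    prob = softmax(scores, axis=1) * softmax(scores, axis=0)
    matches = []
    for i in range(prob.shape[0]):
        j = prob[i].argmax()
        # mutual nearest neighbour + confidence threshold
        if prob[:, j].argmax() == i and prob[i, j] > threshold:
            matches.append((i, j))
    return matches
```

Because every position gets a score, matches can appear even where no repeatable keypoint would have been detected.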
This work proposes a learning-based registration method, titled Deep Closest Point (DCP), inspired by recent techniques in computer vision and natural language processing; it provides state-of-the-art registration and evaluates how well the learned features transfer to unseen objects.
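Once correspondences between two point clouds are available, the rigid transform can be recovered in closed form via SVD (the Kabsch/Procrustes step used by DCP-style pipelines after soft matching). A self-contained NumPy sketch, with our own function name:

```python
import numpy as np

def rigid_from_correspondences(src, dst):
    """Least-squares rigid transform (R, t) with dst ≈ R @ src + t,
    solved in closed form via SVD of the cross-covariance.
    src, dst: (N, 3) arrays of matched points."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)                    # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])   # guard against reflections
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t
```

The determinant correction ensures the result is a proper rotation rather than a reflection.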
This work develops an end-to-end trainable convolutional neural network architecture that identifies sets of spatially consistent matches by analyzing neighbourhood consensus patterns in the 4D space of all possible correspondences between a pair of images, without requiring a global geometric model.
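The "4D space of all possible correspondences" is a correlation tensor: one similarity score per pair of positions across the two feature maps. Building it is a single tensor contraction; the consensus network then filters this volume with 4D convolutions. A minimal sketch (function name ours):

```python
import numpy as np

def correlation_4d(feat_a, feat_b):
    """4D correlation volume over all position pairs:
    c[i, j, k, l] = <feat_a[:, i, j], feat_b[:, k, l]>.
    feat_a: (C, H, W), feat_b: (C, H2, W2) feature maps."""
    return np.einsum('cij,ckl->ijkl', feat_a, feat_b)
```

For H x W feature maps the volume has (H*W)^2 entries, which is why it is built on coarse (downsampled) features.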
This work uses prior information to guide model-hypothesis search, increasing the chance of finding outlier-free minimal sets, and combines this neural guidance with differentiable RANSAC to build neural networks that focus on the most informative parts of the input data and optimize the quality of the output predictions.
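The core idea of guided sampling can be shown on a toy problem: draw RANSAC minimal sets with per-point weights (the role a neural network's predictions play) instead of uniformly. A hedged sketch for 2D line fitting; all names and thresholds here are illustrative:

```python
import numpy as np

def guided_ransac_line(points, weights, iters=200, thresh=0.1, rng=None):
    """RANSAC for a 2D line where minimal sets (2 points) are sampled
    with per-point weights, so likely inliers are tried first.
    points: (N, 2); weights: (N,) non-negative sampling scores."""
    rng = np.random.default_rng(rng)
    p = weights / weights.sum()
    best_inliers, best_model = 0, None
    for _ in range(iters):
        i, j = rng.choice(len(points), size=2, replace=False, p=p)
        a, d = points[i], points[j] - points[i]
        n = np.array([-d[1], d[0]])          # normal to the candidate line
        norm = np.linalg.norm(n)
        if norm == 0:
            continue
        n /= norm
        dist = np.abs((points - a) @ n)      # point-to-line distances
        inliers = (dist < thresh).sum()
        if inliers > best_inliers:
            best_inliers, best_model = inliers, (a, n)
    return best_model, best_inliers
```

With good guidance weights, far fewer iterations are needed to hit an outlier-free minimal set than with uniform sampling.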
It is shown that exploiting temporal continuity in the test sequence significantly improves visual localization, achieving better results than state-of-the-art approaches on visual localization under significant appearance change.
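One simple way to exploit temporal continuity is to re-rank localization candidates by their consistency with the previous frame's estimate; this is a hypothetical sketch of that idea, not the paper's actual method:

```python
import numpy as np

def rerank_with_motion_prior(candidates, scores, prev_pos, max_step=2.0):
    """Down-weight candidate poses that are implausibly far from the
    previous frame's estimated position.
    candidates: (K, 3) candidate camera positions; scores: (K,) raw scores."""
    dist = np.linalg.norm(candidates - prev_pos, axis=1)
    prior = np.exp(-(dist / max_step) ** 2)  # soft gate on motion magnitude
    return int(np.argmax(scores * prior))
```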
It is argued that drones could serve as a third platform for the geo-localization problem; the work proposes a strong CNN baseline on this challenging dataset, named University-1652, the first drone-based geo-localization dataset, which enables two new tasks: drone-view target localization and drone navigation.
HF-Net is proposed: a hierarchical localization approach based on a monolithic CNN that simultaneously predicts local features and global descriptors for accurate 6-DoF localization, setting a new state of the art on two challenging large-scale localization benchmarks.
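The coarse step of hierarchical localization shortlists database images by global-descriptor similarity; local-feature matching and pose estimation then run only on that shortlist. A minimal NumPy sketch of the retrieval step (function name and `k` are our assumptions):

```python
import numpy as np

def retrieve_topk(query_global, db_globals, k=5):
    """Rank database images by cosine similarity of global descriptors.
    query_global: (D,); db_globals: (N, D). Returns indices of top-k images."""
    q = query_global / np.linalg.norm(query_global)
    d = db_globals / np.linalg.norm(db_globals, axis=1, keepdims=True)
    sims = d @ q
    return np.argsort(-sims)[:k]
```

Restricting expensive local matching to a handful of retrieved images is what makes this approach scale to large scenes.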
Bundle-Adjusting Neural Radiance Fields (BARF) is proposed for training NeRF from imperfect (or even unknown) camera poses, jointly learning neural 3D representations and registering camera frames; the work shows that coarse-to-fine registration is also applicable to NeRF.
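The coarse-to-fine idea is implemented as a schedule on the positional-encoding frequency bands: each band is off early in training, ramps in smoothly as a progress parameter grows, then stays on. A BARF-style weighting sketch (function name ours):

```python
import numpy as np

def freq_weights(alpha, num_bands):
    """Coarse-to-fine weights for positional-encoding bands.
    Band k is 0 until alpha reaches k, ramps in over one unit of alpha
    via a cosine easing, then stays at 1. alpha grows from 0 to num_bands."""
    k = np.arange(num_bands)
    x = np.clip(alpha - k, 0.0, 1.0)
    return (1.0 - np.cos(np.pi * x)) / 2.0
```

Suppressing high-frequency bands early keeps the pose gradients smooth, so registration can converge before fine detail is fit.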