3260 papers • 126 benchmarks • 313 datasets
Point cloud based place recognition and retrieval
These leaderboards are used to track progress in 3D Place Recognition.
This paper proposes a combination and modification of the existing PointNet and NetVLAD that allows end-to-end training and inference for extracting a global descriptor from a given 3D point cloud, and introduces the "lazy triplet and quadruplet" loss functions, which yield more discriminative and generalizable global descriptors for the retrieval task.
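To make the "lazy" idea concrete, here is a minimal PyTorch sketch of a lazy-triplet-style loss, assuming the closest positive in the training tuple serves as the positive term; the shapes, margin value, and function name are illustrative, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def lazy_triplet_loss(anchor, positives, negatives, margin=0.5):
    # anchor:    (D,)   global descriptor of the query point cloud
    # positives: (P, D) descriptors of structurally similar clouds
    # negatives: (N, D) descriptors of dissimilar clouds
    d_pos = torch.norm(positives - anchor, dim=1).min()  # closest ("best") positive
    d_neg = torch.norm(negatives - anchor, dim=1)        # distances to all negatives
    # "lazy": only the hardest negative in the tuple contributes to the loss
    return F.relu(margin + d_pos - d_neg).max()
```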
Point cloud based place recognition is still an open issue due to the difficulty of extracting local features from the raw 3D point cloud and generating a global descriptor, and it is even harder in large-scale dynamic environments. In this paper, we develop a novel deep neural network, named LPD-Net (Large-scale Place Description Network), which can extract discriminative and generalizable global descriptors from the raw 3D point cloud. Two modules, an adaptive local feature extraction module and a graph-based neighborhood aggregation module, are proposed, which extract local structures and reveal the spatial distribution of local features in the large-scale point cloud in an end-to-end manner. We apply the proposed global descriptor to point cloud based retrieval tasks to achieve large-scale place recognition. Comparison results show that LPD-Net substantially outperforms PointNetVLAD and achieves state-of-the-art performance. We also compare LPD-Net with vision-based solutions to show the robustness of our approach to different weather and lighting conditions.
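As a rough illustration of graph-based neighborhood aggregation, the sketch below max-pools each point's features over its k nearest spatial neighbors; the function name, k value, and pooling choice are assumptions for illustration, and the paper's actual module is richer:

```python
import torch

def knn_aggregate(features, coords, k=16):
    # features: (N, D) per-point local features
    # coords:   (N, 3) point coordinates used to build the k-NN graph
    dist = torch.cdist(coords, coords)             # (N, N) pairwise distances
    idx = dist.topk(k, largest=False).indices      # (N, k) nearest neighbours (incl. self)
    neigh = features[idx]                          # (N, k, D) gathered neighbour features
    return neigh.max(dim=1).values                 # (N, D) max-pool over each neighbourhood
```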
This paper proposes a Point Contextual Attention Network (PCAN), which predicts the significance of each local point feature based on point context, making it possible to pay more attention to task-relevant features when aggregating local features.
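A simplified sketch of the attention-reweighting idea (the real PCAN applies its attention map inside a NetVLAD aggregation; the plain weighted average below, and all names, are illustrative only):

```python
import torch

def attention_weighted_pool(local_feats, attn_logits):
    # local_feats: (N, D) per-point local features
    # attn_logits: (N,)   per-point significance predicted from point context
    w = torch.sigmoid(attn_logits).unsqueeze(1)    # (N, 1) attention weights in (0, 1)
    return (local_feats * w).sum(dim=0) / w.sum()  # attention-weighted global descriptor
```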
A Siamese network that jointly learns 3D local feature detection and description directly from raw 3D points, integrating FlexConv and Squeeze-and-Excitation to ensure that the learned local descriptor captures multi-level geometric information and channel-wise relations.
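Squeeze-and-Excitation is a standard building block; the generic sketch below (not the paper's code) shows how channel-wise relations are captured by rescaling each feature channel:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Generic Squeeze-and-Excitation block over per-point features."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                   # x: (B, C, N) per-point features
        s = x.mean(dim=2)                   # "squeeze": global pool over points
        return x * self.fc(s).unsqueeze(2)  # "excite": channel-wise reweighting
```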
MinkLoc3D computes a discriminative 3D point cloud descriptor from a sparse voxelized point cloud representation using sparse 3D convolutions; evaluation on standard benchmarks shows that it outperforms current state-of-the-art methods.
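MinkLoc3D relies on a sparse convolution library for the 3D convolutions themselves; the minimal sketch below only illustrates the voxel quantization step that yields the sparse representation, with an assumed voxel size:

```python
import torch

def sparse_quantize(points, voxel_size=0.01):
    # points: (N, 3) raw point coordinates
    coords = torch.floor(points / voxel_size).int()  # quantize to an integer voxel grid
    return torch.unique(coords, dim=0)               # keep one entry per occupied voxel
```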
This work introduces a self-attention and orientation encoding network (SOE-Net) that fully explores the relationship between points and incorporates long-range context into point-wise local descriptors, and proposes a novel loss function, the HPHN quadruplet loss, which achieves better performance than commonly used metric learning losses.
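A hedged sketch of an HPHN-style (hardest-positive, hardest-negative) quadruplet loss with a single margin; the paper's exact formulation may differ, and the tuple layout, names, and margin here are assumptions:

```python
import torch
import torch.nn.functional as F

def hphn_quadruplet_loss(anchor, positives, negatives, other_neg, margin=0.5):
    # anchor:    (D,)   query descriptor
    # positives: (P, D) positive descriptors
    # negatives: (N, D) negative descriptors
    # other_neg: (D,)   extra negative sampled for the quadruplet term
    d_hp = torch.norm(positives - anchor, dim=1).max()     # hardest (farthest) positive
    d_an = torch.norm(negatives - anchor, dim=1).min()     # hardest negative vs. anchor
    d_nn = torch.norm(negatives - other_neg, dim=1).min()  # hardest negative vs. other_neg
    return F.relu(margin + d_hp - torch.min(d_an, d_nn))   # single-margin hinge
```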
A novel registration-aided 3D domain adaptation network for point cloud based place recognition that outperforms or matches state-of-the-art 3D place recognition baselines on the real-world Oxford RobotCar dataset, with visualizations of registration on the virtual dataset.
Recently, deep learning based point cloud descriptors have achieved impressive results in the place recognition task. Nonetheless, due to the sparsity of point clouds, how to extract discriminative local features of point clouds to efficiently form a global descriptor is still a challenging problem. In this paper, we propose a pyramid point cloud transformer network (PPT-Net) to learn discriminative global descriptors from point clouds for efficient retrieval. Specifically, we first develop a pyramid point transformer module that adaptively learns the spatial relationships of the different k-NN neighboring points of point clouds, where grouped self-attention is proposed to extract discriminative local features of the point clouds. The grouped self-attention not only enhances long-range dependencies of the point clouds, but also reduces the computational cost. In order to obtain discriminative global descriptors, we construct a pyramid VLAD module to aggregate the multi-scale feature maps of point clouds into global descriptors. By applying VLAD pooling on multi-scale feature maps, we utilize a context gating mechanism on the multiple global descriptors to adaptively weight multi-scale global context information into the final global descriptor. Experimental results on the Oxford dataset and three in-house datasets show that our method achieves state-of-the-art results on the point cloud based place recognition task. Code is available at https://github.com/fpthink/PPT-Net.
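The context gating step follows the general pattern y = x ⊙ σ(Wx + b); below is a minimal module sketch (dimensions and naming assumed, not taken from the released code):

```python
import torch
import torch.nn as nn

class ContextGating(nn.Module):
    """Reweights a global descriptor with a sigmoid gate computed from itself."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(dim, dim)

    def forward(self, x):                         # x: (B, dim) global descriptor
        return x * torch.sigmoid(self.gate(x))   # y = x * sigmoid(Wx + b)
```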
This work introduces a discriminative multimodal descriptor based on a pair of sensor readings: a point cloud from a LiDAR and an image from an RGB camera. It uses a late fusion approach, where each modality is processed separately and fused in the final part of the processing pipeline.
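A minimal sketch of late fusion, assuming simple concatenation of the two unimodal descriptors followed by L2 normalization (the paper's exact fusion operator may differ):

```python
import torch
import torch.nn.functional as F

def late_fusion(desc_lidar, desc_rgb):
    # desc_lidar: (B, D1) descriptor from the point cloud branch
    # desc_rgb:   (B, D2) descriptor from the image branch
    fused = torch.cat([desc_lidar, desc_rgb], dim=1)  # modalities combined only at the end
    return F.normalize(fused, dim=1)                  # unit-length multimodal descriptor
```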
A novel method named TransLoc3D is proposed, utilizing adaptive receptive fields with a point-wise reweighting scheme to handle objects of different sizes while suppressing noise, and an external transformer to capture long-range feature dependencies.
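For intuition, here is a generic self-attention layer of the kind used to capture long-range feature dependencies; the dimensions are hypothetical and this is not TransLoc3D's actual architecture:

```python
import torch
import torch.nn as nn

# every point attends to every other point, so each feature gains
# access to long-range context across the whole cloud
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
feats = torch.randn(2, 1024, 64)    # (batch, points, channels)
out, _ = attn(feats, feats, feats)  # self-attention: query = key = value
```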