Point cloud data represents 3D shapes as unordered sets of discrete points in 3D space. Such data primarily comes from 3D scanners, LiDAR systems, and similar sensing technologies. Point cloud processing has a wide range of applications, including robotics, autonomous vehicles, and augmented/virtual reality. Pre-training on point cloud data is similar in spirit to pre-training on images or text: by pre-training a model on a large, diverse dataset, it learns general features of the modality, which can then be fine-tuned on a smaller, task-specific dataset. This two-step process (pre-training followed by fine-tuning) often yields better performance, especially when the task-specific dataset is limited in size.
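To make the setup concrete, here is a minimal sketch of the two-step process in PyTorch. Everything in it is hypothetical: the PointNet-style `PointEncoder`, the random input clouds, and the 40-class head (sized to match ModelNet40) are illustrative stand-ins, not a reference implementation of any particular method.

```python
import torch
import torch.nn as nn

class PointEncoder(nn.Module):
    """Toy PointNet-style encoder: a shared per-point MLP followed by
    max-pooling into a single global feature per cloud."""
    def __init__(self, dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, pts):                      # pts: (B, N, 3) coordinates
        return self.mlp(pts).max(dim=1).values   # (B, dim) global feature

encoder = PointEncoder()

# Step 1 (pre-training): fit the encoder on a large unlabeled corpus with a
# self-supervised objective (masked reconstruction, contrastive learning, ...).
# Step 2 (fine-tuning): attach a small task head and train on labeled data.
head = nn.Linear(256, 40)                        # e.g. 40 classes (ModelNet40)
logits = head(encoder(torch.randn(8, 2048, 3)))  # (8, 40) class logits
```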
Point-M2AE is proposed, a strong multi-scale MAE pre-training framework for hierarchical self-supervised learning of 3D point clouds that modifies the encoder and decoder into pyramid architectures to progressively model spatial geometries and capture both fine-grained and high-level semantics of 3D shapes.
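The core objective behind such masked-autoencoder (MAE) frameworks can be sketched in a few lines. The mask ratio, the assumed upstream patch grouping, and the Chamfer loss below are generic masked-point-modeling ingredients, not Point-M2AE's actual multi-scale pyramid design.

```python
import torch

def mask_patches(patches, mask_ratio=0.6):
    """Randomly mask whole point patches, MAE-style.
    patches: (B, G, K, 3) -- B clouds, G groups of K points each
    (grouping by farthest-point sampling + kNN is assumed done upstream).
    Returns the visible patches and a boolean mask over groups."""
    B, G = patches.shape[:2]
    n_mask = int(G * mask_ratio)
    ids = torch.rand(B, G).argsort(dim=1)          # random group order
    mask = torch.zeros(B, G, dtype=torch.bool)
    mask.scatter_(1, ids[:, :n_mask], True)        # True = masked group
    visible = patches[~mask].reshape(B, G - n_mask, *patches.shape[2:])
    return visible, mask

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a: (B, N, 3) and
    b: (B, M, 3), the usual reconstruction loss for masked point patches."""
    d = torch.cdist(a, b)                          # (B, N, M) pairwise distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()
```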
A novel universal 3D pre-training framework is introduced, designed to facilitate the acquisition of efficient 3D representations through differentiable neural rendering, thereby establishing a pathway to 3D foundation models.
This work aims to facilitate research on 3D representation learning by selecting a suite of diverse datasets and tasks to measure the effect of unsupervised pre-training on a large source set of 3D scenes, achieving improvements over recent best results in segmentation and detection across six different benchmarks.
This paper shows that the method outperforms previous pre-training methods on object classification and on both part-based and semantic segmentation tasks, and that even when pre-trained on a single dataset (ModelNet40), it improves accuracy across different datasets and encoders.
A new self-supervised learning method, called Mixing and Disentangling (MD), is proposed for 3D point cloud representation learning; it improves empirical performance on both the ModelNet-40 and ShapeNet-Part datasets for point cloud classification and segmentation tasks.
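A rough sketch of the mixing step: pool a random half of the points from each of two clouds into one mixture, which a network is then trained to disentangle back into its two sources. The half-and-half pooling below is an illustrative stand-in, not the paper's implementation.

```python
import torch

def mix_clouds(a, b, keep=2048):
    """Mix two point clouds a, b: (N, 3) by keeping a random half of each.
    The self-supervised task is to recover the two originals from the mix."""
    ia = torch.randperm(a.shape[0])[: keep // 2]
    ib = torch.randperm(b.shape[0])[: keep // 2]
    return torch.cat([a[ia], b[ib]], dim=0)        # (keep, 3) mixed cloud
```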
The construction of 3D point cloud datasets requires a great deal of human effort, so constructing a large-scale 3D point cloud dataset is difficult. To remedy this issue, we propose a newly developed point cloud fractal database (PC-FractalDB), a novel family of formula-driven supervised learning inspired by the fractal geometry encountered in natural 3D structures. Our research is based on the hypothesis that, by learning fractal geometry, we can learn representations from more real-world 3D patterns than conventional 3D datasets provide. We show how PC-FractalDB facilitates solving several recent dataset-related problems in 3D scene understanding, such as 3D model collection and labor-intensive annotation. The experimental section shows performance rates of up to 61.9% and 59.0% on the ScanNetV2 and SUN RGB-D datasets, respectively, improving over the current highest scores obtained with PointContrast, Contrastive Scene Contexts (CSC), and RandomRooms. Moreover, the PC-FractalDB pre-trained model is especially effective when training with limited data. For example, with 10% of the training data on ScanNetV2, the PC-FractalDB pre-trained VoteNet performs at 38.3%, which is +14.8% higher accuracy than CSC. Of particular note, we found that the proposed method achieves the highest results for 3D object detection pre-training on limited point cloud data. Dataset release: https://ryosuke-yamada.github.io/PointCloud-FractalDataBase/
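Formula-driven point cloud data of this kind can be generated with an iterated function system (IFS) via the chaos game. The generator below is a toy version in the spirit of PC-FractalDB; the map count, contraction factor, and burn-in length are arbitrary illustrative choices, not the paper's recipe.

```python
import numpy as np

def ifs_point_cloud(n_points=2048, n_maps=4, seed=0):
    """Sample a 3D fractal point cloud from a random IFS (chaos game)."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n_maps, 3, 3))            # random affine maps x -> Ax + t
    norms = np.linalg.norm(A, ord=2, axis=(1, 2), keepdims=True)
    A *= 0.8 / norms                               # rescale to spectral norm 0.8,
                                                   # so every map is contractive
    t = rng.uniform(-1.0, 1.0, size=(n_maps, 3))
    x = rng.uniform(-1.0, 1.0, size=3)
    pts = np.empty((n_points, 3))
    for i in range(n_points + 100):                # 100 burn-in iterations
        k = rng.integers(n_maps)
        x = A[k] @ x + t[k]
        if i >= 100:
            pts[i - 100] = x
    return pts                                     # (n_points, 3) on the attractor
```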
POS-BERT is proposed, a one-stage BERT pre-training method for point clouds that achieves state-of-the-art classification accuracy and significantly improves many downstream tasks, such as fine-tuned classification, few-shot classification, and part segmentation.
ProposalContrast, a new unsupervised point cloud pre-training framework, is proposed; it learns robust 3D representations by contrasting region proposals and optimizes both inter-cluster and inter-proposal separation.
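The contrastive core of such methods is an InfoNCE loss over matched region-proposal embeddings from two augmented views of the same scene. The sketch below shows only that generic core and omits ProposalContrast's additional inter-cluster and inter-proposal terms.

```python
import torch
import torch.nn.functional as F

def proposal_info_nce(z1, z2, tau=0.07):
    """InfoNCE over P matched proposal embeddings from two augmented views.
    z1, z2: (P, D); z1[i] and z2[i] come from the same region proposal."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                     # (P, P) cosine similarities
    labels = torch.arange(z1.shape[0], device=z1.device)
    return F.cross_entropy(logits, labels)         # positives on the diagonal
```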
McP-BERT, a pre-training framework with multi-choice tokens, is proposed; it eases the previous single-choice constraint on patch token IDs in Point-BERT and provides multi-choice token IDs for each patch as supervision, improving the performance of Point-BERT on all downstream tasks without extra computational overhead.
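The multi-choice idea can be sketched as follows: instead of a single hard token ID per patch, keep the tokenizer's top-k probabilities as a soft target distribution. The function name and the top-k/renormalize scheme are assumptions for illustration, not McP-BERT's actual code.

```python
import torch

def multi_choice_targets(token_logits, k=5):
    """Turn a tokenizer's per-patch logits (B, G, V), over a vocabulary of V
    patch tokens, into soft multi-choice targets: keep the top-k token
    probabilities, zero the rest, and renormalize."""
    probs = token_logits.softmax(dim=-1)
    topv, topi = probs.topk(k, dim=-1)
    targets = torch.zeros_like(probs).scatter_(-1, topi, topv)
    return targets / targets.sum(dim=-1, keepdim=True)

# Pre-training then minimizes a soft cross-entropy between the student's
# predicted token distribution and these multi-choice targets.
```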
A bird's-eye-view (BEV) guided masking strategy is proposed to guide the 3D encoder toward learning feature representations in a BEV perspective and to avoid complex decoder designs during pre-training; the resulting BEV-MAE achieves new state-of-the-art results in LiDAR-based 3D object detection.
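A toy version of BEV-guided masking: bucket points into a bird's-eye-view grid over (x, y) and drop whole cells, so the mask is uniform in the BEV plane rather than in 3D. The cell size, range, and mask ratio below are illustrative choices, not BEV-MAE's exact strategy.

```python
import torch

def bev_mask(points, cell=0.5, bound=50.0, mask_ratio=0.7):
    """Mask LiDAR points (N, 3), coordinates in metres, by BEV grid cell."""
    n_cells = int(2 * bound / cell)
    ij = ((points[:, :2] + bound) / cell).long().clamp(0, n_cells - 1)
    cell_id = ij[:, 0] * n_cells + ij[:, 1]        # (N,) flat BEV cell index
    occupied = cell_id.unique()
    n_mask = int(len(occupied) * mask_ratio)
    masked = occupied[torch.randperm(len(occupied))[:n_mask]]
    keep = ~torch.isin(cell_id, masked)            # drop points in masked cells
    return points[keep], keep
```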