3D single object tracking is the task of tracking a single object in 3D, given an initial 3D bounding box provided to the tracker. It is commonly performed on point cloud data from LiDAR sensors, which provide the valuable depth information that is lost in camera images. However, the irregular structure of point clouds and their increasing sparsity with distance make LiDAR-based 3D single object tracking a nontrivial task.
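As a rough illustration of the task setup described above, the following is a minimal sketch in Python. The `Box3D`, `SingleObjectTracker`, and `run_tracker` names are hypothetical and not taken from any particular tracker's API.

```python
# Minimal sketch (hypothetical interface) of the 3D single object tracking loop:
# the tracker is initialized with a 3D bounding box in the first frame and must
# localize the same object in each subsequent LiDAR point cloud.
from dataclasses import dataclass
import numpy as np

@dataclass
class Box3D:
    center: np.ndarray   # (x, y, z) in the LiDAR frame
    size: np.ndarray     # (length, width, height); fixed for a rigid target
    yaw: float           # heading angle around the vertical axis

class SingleObjectTracker:
    """Hypothetical base class; concrete trackers implement `track`."""

    def init(self, points: np.ndarray, box: Box3D) -> None:
        # points: (N, 3) LiDAR point cloud of the first frame
        self.template_box = box

    def track(self, points: np.ndarray) -> Box3D:
        # Return the estimated 3D box of the target in the current frame.
        raise NotImplementedError

def run_tracker(tracker: SingleObjectTracker, frames, init_box: Box3D):
    tracker.init(frames[0], init_box)
    return [tracker.track(points) for points in frames[1:]]
```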
This work proposes BoxCloud, an informative and robust representation that depicts an object through point-to-box relations, and designs an efficient box-aware feature fusion module that leverages BoxCloud for reliable feature matching and embedding.
In this paper, we propose a novel voxel-based 3D single object tracking (3D SOT) method called Voxel Pseudo Image Tracking (VPIT). VPIT is the first method that uses voxel pseudo images for 3D SOT. The input point cloud is structured by pillar-based voxelization, and the resulting pseudo image is used as the input to a 2D-like Siamese SOT method. The pseudo image is created in Bird's-eye View (BEV) coordinates, so objects in it have constant size; only the object rotation, not its scale, can change in the new coordinate system. For this reason, we replace multi-scale search with multi-rotation search, where differently rotated search regions are compared against a single target representation to predict both the position and the rotation of the object. Experiments on the KITTI [1] tracking dataset show that VPIT is the fastest 3D SOT method while maintaining competitive Success and Precision values. Applying a SOT method in a real-world scenario runs into limitations such as the lower computational capabilities of embedded devices and a latency-unforgiving environment, where the method is forced to skip data frames if its inference speed is not high enough. We implement a real-time evaluation protocol and show that other methods lose most of their performance on embedded devices, while VPIT maintains its ability to track the object.
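To make the multi-rotation search idea concrete, here is a minimal, hypothetical sketch: it substitutes a simple binary BEV occupancy grid for VPIT's learned pillar features and plain 2D cross-correlation for the learned Siamese head, so it illustrates the search strategy only. Function names such as `bev_image` and `multi_rotation_search` are illustrative and not VPIT's actual API.

```python
import numpy as np
from scipy.signal import correlate2d

def bev_image(points, center, extent=4.0, resolution=0.1):
    """Project points near `center` into a binary Bird's-eye View grid."""
    local = points[:, :2] - center[:2]
    keep = np.all(np.abs(local) < extent, axis=1)
    idx = ((local[keep] + extent) / resolution).astype(int)
    size = int(2 * extent / resolution)
    grid = np.zeros((size, size), dtype=np.float32)
    grid[idx[:, 1].clip(0, size - 1), idx[:, 0].clip(0, size - 1)] = 1.0
    return grid

def rotate_xy(points, yaw):
    """Rotate points around the vertical axis by `yaw` radians."""
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s], [s, c]])
    out = points.copy()
    out[:, :2] = points[:, :2] @ rot.T
    return out

def multi_rotation_search(template, search_points, prev_center,
                          yaw_candidates=(-0.1, 0.0, 0.1),
                          resolution=0.1, search_extent=8.0):
    """Compare differently rotated search regions against a single template."""
    best_score, best_peak, best_yaw = -np.inf, None, 0.0
    for yaw in yaw_candidates:
        # Rotate the search region instead of rescaling it: in BEV the target
        # has constant size, so only position and rotation need to be searched.
        rotated = rotate_xy(search_points - prev_center, -yaw) + prev_center
        search = bev_image(rotated, prev_center, extent=search_extent,
                           resolution=resolution)
        score = correlate2d(search, template, mode='same')
        peak = np.unravel_index(np.argmax(score), score.shape)
        if score[peak] > best_score:
            best_score, best_peak, best_yaw = score[peak], peak, yaw
    row, col = best_peak
    # Convert the grid peak back to a metric (x, y) offset from prev_center.
    offset = np.array([col, row]) * resolution - search_extent
    return prev_center[:2] + offset, best_yaw
```

The design point mirrored here is that constant object size in BEV lets the tracker search over rotation candidates only, rather than over scales.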
This work proposes a novel global-local transformer voting scheme to provide more informative cues and guide the model to pay more attention to potential seed points, promoting the generation of high-quality 3D proposals.
A Variational Neural Network-based version of the Voxel Pseudo Image Tracking method for 3D single object tracking is proposed, and it is shown that both methods improve tracking performance, while penalizing uncertain features provides the best uncertainty quality.
A simple yet effective one-stage point-to-box network for point cloud-based 3D single object tracking is proposed; it synchronizes 3D proposal generation and center-ness score prediction via a parallel predictor without tedious hyperparameters.
M3SOT, a novel 3D SOT framework, is unveiled; it synergizes multiple input frames (template sets), multiple receptive fields (continuous contexts), and multiple solution spaces (distinct tasks) in one model, sidestepping the need for complex frameworks and auxiliary components while delivering strong results.
This paper proposes a transformer module called Point-Track-Transformer (PTT) for point cloud-based 3D single object tracking, which contains three blocks for feature embedding, position encoding, and self-attention feature computation.
This work introduces a motion-centric paradigm that handles 3D SOT from a new perspective and proposes a matching-free two-stage tracker, M2-Track, which significantly outperforms the previous state of the art on three large-scale datasets while running at 57 FPS.
DMT, a Detector-free Motion-prediction-based 3D Tracking network, is proposed; it completely removes the use of complicated 3D detectors and is lighter, faster, and more accurate than previous trackers.
A Siamese point Transformer network is developed to learn shape context information of the target, and an iterative coarse-to-fine correlation network is developed to learn robust cross-correlation between the template and the search area.