3260 papers • 126 benchmarks • 313 datasets
These leaderboards are used to track progress in Point Tracking.
Use these libraries to find Point Tracking models and implementations.
No subtasks available.
This work proposes DragGAN, which enables a powerful yet much less explored way of controlling GANs: "dragging" any points of the image to precisely reach target points in a user-interactive manner. DragGAN consists of a feature-based motion supervision that drives the handle point towards the target position, and a new point tracking approach that leverages the discriminative generator features to keep localizing the position of the handle points.
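The point-tracking step described above amounts to a nearest-neighbor search in the generator's feature map around the previous handle location. Below is a minimal sketch of that idea, assuming a feature map `feat` of shape (C, H, W) and the handle point's reference feature from the initial image; the function name and L1 distance are illustrative choices, not the official DragGAN implementation.

```python
import numpy as np

def track_handle_point(feat, ref_feat, prev_pt, radius=3):
    """Nearest-neighbor point tracking on generator features (illustrative).

    feat:     (C, H, W) feature map of the current edited image
    ref_feat: (C,) feature vector of the handle point in the initial image
    prev_pt:  (row, col) previous handle-point location
    Returns the location inside a small search window whose feature is
    closest (L1 distance) to the reference feature.
    """
    C, H, W = feat.shape
    r0, c0 = prev_pt
    best_pt, best_dist = prev_pt, np.inf
    for r in range(max(0, r0 - radius), min(H, r0 + radius + 1)):
        for c in range(max(0, c0 - radius), min(W, c0 + radius + 1)):
            dist = np.abs(feat[:, r, c] - ref_feat).sum()
            if dist < best_dist:
                best_dist, best_pt = dist, (r, c)
    return best_pt
```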
A novel semi-automatic crowdsourced annotation pipeline is presented that uses optical flow estimates to compensate for easier, short-term motion such as camera shake, allowing annotators to focus on harder sections of video. A simple end-to-end point tracking model, TAP-Net, is also proposed, which outperforms all prior methods on the authors' benchmark when trained on synthetic data.
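To illustrate the flow-compensation idea, the sketch below propagates a single annotated point through easy frames by chaining dense optical flow (here OpenCV's Farnebäck estimator). This is an illustration of the concept under assumed inputs, not the authors' pipeline.

```python
import cv2
import numpy as np

def propagate_point(frames, point_xy):
    """Chain dense optical flow to carry one annotated point forward.

    frames:   list of grayscale frames (H, W) as uint8
    point_xy: (x, y) annotation in frames[0]
    Returns the estimated (x, y) location in every frame.
    """
    x, y = map(float, point_xy)
    track = [(x, y)]
    for prev, nxt in zip(frames[:-1], frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        h, w = flow.shape[:2]
        xi = int(round(min(max(x, 0), w - 1)))
        yi = int(round(min(max(y, 0), h - 1)))
        dx, dy = flow[yi, xi]          # flow is indexed [row, col] and stores (dx, dy)
        x, y = x + float(dx), y + float(dy)
        track.append((x, y))
    return track
```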
Modifications to the PIPs point tracking method are introduced, greatly widening its temporal receptive field, which improves its performance on PointOdyssey as well as on two real-world benchmarks.
Discriminative Correlation Filters have demonstrated excellent performance for visual object tracking, and the key to their success is the ability to efficiently exploit available negative data.
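For context, a correlation filter is typically learned and applied in the Fourier domain. The sketch below is a minimal MOSSE-style formulation (a classic instance of the idea, not the specific method of the paper above), with placeholder inputs.

```python
import numpy as np

def train_filter(patches, gaussian_target, lam=1e-2):
    """Learn a correlation filter from training patches (MOSSE-style).

    patches:         list of (H, W) grayscale patches centred on the target
    gaussian_target: (H, W) desired response, a Gaussian peaked at the centre
    """
    G = np.fft.fft2(gaussian_target)
    A = np.zeros_like(G)
    B = np.zeros_like(G)
    for p in patches:
        F = np.fft.fft2(p)
        A += G * np.conj(F)            # numerator: correlation with desired output
        B += F * np.conj(F)            # denominator: energy spectrum of the patches
    return A / (B + lam)               # regularized filter in the Fourier domain

def apply_filter(H, patch):
    """Correlate the learned filter with a new patch; the response peak gives the target location."""
    response = np.real(np.fft.ifft2(H * np.fft.fft2(patch)))
    return np.unravel_index(np.argmax(response), response.shape)
```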
Despite significant research in the area, reconstruction of multiple dynamic rigid objects (e.g. vehicles) observed from wide-baseline, uncalibrated and unsynchronized cameras remains hard. On one hand, feature tracking works well within each view but is hard to correspond across multiple cameras with limited overlap in fields of view or due to occlusions. On the other hand, advances in deep learning have resulted in strong detectors that work across different viewpoints but are still not precise enough for triangulation-based reconstruction. In this work, we develop a framework to fuse both the single-view feature tracks and multi-view detected part locations to significantly improve the detection, localization and reconstruction of moving vehicles, even in the presence of strong occlusions. We demonstrate our framework at a busy traffic intersection by reconstructing over 62 vehicles passing within a 3-minute window. We evaluate the different components within our framework and compare to alternate approaches such as reconstruction using tracking-by-detection.
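The triangulation step the abstract refers to can be illustrated with the standard direct linear transform (DLT): a part observed in two calibrated views is lifted to one 3D point. The projection matrices and observations below are placeholders, and this is a generic sketch rather than the paper's full fusion framework.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Triangulate one 3D point from two views via the direct linear transform.

    P1, P2: (3, 4) camera projection matrices
    x1, x2: (x, y) image observations of the same part in each view
    """
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)        # least-squares solution is the last right singular vector
    X = Vt[-1]
    return X[:3] / X[3]                # homogeneous -> Euclidean coordinates
```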
A deep reinforcement learning method is proposed to estimate the muscle excitations in simulated biomechanical systems, together with a custom-made reward function that incentivizes faster point-to-point tracking of target motion.
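As a hypothetical illustration of such a reward, the sketch below combines a negative distance-to-target term with a small per-step penalty, so reaching the target faster accumulates more reward; the exact reward used in the paper may differ.

```python
import numpy as np

def tracking_reward(end_effector_pos, target_pos, step_penalty=0.01):
    """Illustrative point-to-point tracking reward: closer to the target is better,
    and every time step costs a little, so faster tracking earns more in total."""
    distance = np.linalg.norm(np.asarray(end_effector_pos) - np.asarray(target_pos))
    return -distance - step_penalty
```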
The results show that SD-DefSLAM outperforms DefSLAM in point tracking, reconstruction accuracy and scale drift thanks to the improvement in all the data association steps, being the first system able to robustly perform SLAM inside the human body.
This work proposes virtual point tracking for real-time, target-less dynamic displacement measurement, incorporating deep learning techniques and domain knowledge, and demonstrates the approach in a railway application, where the lateral displacement of the wheel on the rail is measured during operation.
Point cloud videos exhibit irregularities and lack of order along the spatial dimension where points emerge inconsistently across different frames. To capture the dynamics in point cloud videos, point tracking is usually employed. However, as points may flow in and out across frames, computing accurate point trajectories is extremely difficult. Moreover, tracking usually relies on point colors and thus may fail to handle colorless point clouds. In this paper, to avoid point tracking, we propose a novel Point 4D Transformer (P4Transformer) network to model raw point cloud videos. Specifically, P4Transformer consists of (i) a point 4D convolution to embed the spatio-temporal local structures presented in a point cloud video and (ii) a transformer to capture the appearance and motion information across the entire video by performing self-attention on the embedded local features. In this fashion, related or similar local areas are merged with attention weight rather than by explicit tracking. Extensive experiments, including 3D action recognition and 4D semantic segmentation, on four benchmarks demonstrate the effectiveness of our P4Transformer for point cloud video modeling.
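The second stage described above, self-attention over embedded spatio-temporal local features so that related local areas are merged by attention weight rather than by explicit point tracking, can be sketched in a few lines of PyTorch. Layer sizes and class names below are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LocalFeatureAttention(nn.Module):
    """Self-attention over embedded spatio-temporal local features (illustrative)."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens):
        # tokens: (batch, num_local_areas, dim), one token per spatio-temporal local area
        attended, _ = self.attn(tokens, tokens, tokens)
        return self.norm(tokens + attended)    # residual connection, then normalization

# usage with assumed sizes: 128 local areas per point cloud video, 256-dim embeddings
tokens = torch.randn(2, 128, 256)
out = LocalFeatureAttention()(tokens)          # (2, 128, 256)
```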
This work proposes Point Tracking TRansformer (PTTR), which efficiently predicts high-quality 3D tracking results in a coarse-to-fine manner with the help of transformer operations and creates a large-scale point cloud single object tracking benchmark based on the Waymo Open Dataset.