3260 papers • 126 benchmarks • 313 datasets
These leaderboards are used to track progress in Bird's-Eye View Semantic Segmentation
Use these libraries to find Bird's-Eye View Semantic Segmentation models and implementations
No subtasks available.
This work presents a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks and remarkably improves the accuracy of velocity estimation and recall of objects under low visibility conditions.
The architecture implicitly learns a mapping from individual camera views into a canonical map-view representation using a camera-aware cross-view attention mechanism, achieving state-of-the-art performance on the nuScenes dataset with 4x faster inference.
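As a rough illustration of the cross-view attention idea described above (a minimal sketch, not the paper's actual implementation), the snippet below shows learned map-view queries attending to flattened multi-camera features that carry a camera-aware positional embedding. The module name, tensor shapes, and the precomputed `cam_embed` input are assumptions for illustration only.

```python
# Illustrative sketch: simplified cross-view attention from BEV (map-view) queries
# to multi-camera image features. Names and shapes are assumptions, not the paper's code.
import torch
import torch.nn as nn

class MapViewCrossAttention(nn.Module):
    def __init__(self, dim=128, n_heads=4, bev_size=50):
        super().__init__()
        # One learned query per cell of the canonical map-view (BEV) grid.
        self.bev_queries = nn.Parameter(torch.randn(bev_size * bev_size, dim))
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, img_feats, cam_embed):
        # img_feats: (B, N_cams * H * W, dim) flattened multi-camera features
        # cam_embed: (B, N_cams * H * W, dim) camera-aware positional embedding
        #            derived from intrinsics/extrinsics (assumed precomputed).
        b = img_feats.shape[0]
        q = self.bev_queries.unsqueeze(0).expand(b, -1, -1)
        kv = img_feats + cam_embed          # make keys/values camera-aware
        bev, _ = self.attn(q, kv, kv)       # (B, bev_size * bev_size, dim)
        return bev
```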
The extensive experiments on the V2V perception dataset, OPV2V, demonstrate that CoBEVT achieves state-of-the-art performance for cooperative BEV semantic segmentation and is shown to be generalizable to other tasks, including 1) BEV segmentation with single-agent multi-camera and 2) 3D object detection with multi-agent LiDAR systems.
This paper proposes an efficient multi-camera to Bird’s-Eye-View (BEV) view transformation method for 3D perception, dubbed MatrixVT, which offers faster inference and a smaller memory footprint while remaining deployment-friendly.
The introduction of the LandCover.ai (Land Cover from Aerial Imagery) dataset for semantic segmentation shows that automatic mapping of land cover is possible with a relatively small, cost-efficient, RGB-only dataset.
In pursuit of the goal of learning dense representations for motion planning, it is shown that the representations inferred by the model enable interpretable end-to-end motion planning by "shooting" template trajectories into a bird's-eye-view cost map output by the network.
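To make the trajectory-"shooting" step concrete, here is a minimal sketch that scores a fixed set of template trajectories against a bird's-eye-view cost map and picks the cheapest one. The grid resolution, tensor shapes, and function name are assumptions for illustration, not the paper's actual planner.

```python
# Illustrative sketch: evaluate template trajectories on a BEV cost map.
import torch

def plan_by_shooting(cost_map, templates, meters_per_cell=0.5):
    # cost_map:  (H, W) BEV cost map with the ego vehicle at the grid center
    # templates: (K, T, 2) candidate trajectories as (x, y) waypoints in meters
    H, W = cost_map.shape
    cx, cy = W // 2, H // 2
    # Convert metric waypoints to integer grid indices, clamped to the map bounds.
    ix = (templates[..., 0] / meters_per_cell + cx).long().clamp(0, W - 1)
    iy = (templates[..., 1] / meters_per_cell + cy).long().clamp(0, H - 1)
    # Cost of each template = sum of cost-map values along its waypoints.
    traj_costs = cost_map[iy, ix].sum(dim=-1)   # (K,)
    best = traj_costs.argmin()
    return templates[best], traj_costs[best]
```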
PETRv2, a unified framework for 3D perception from multi-view images, is proposed; the 3D position embedding in PETR is extended for temporal modeling, achieving temporal alignment of object positions across frames.
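A minimal sketch of the temporal-alignment idea, assuming 3D frustum points from the previous frame and a known ego-motion transform: the previous frame's points are mapped into the current ego coordinate system before being turned into position embeddings. The MLP, shapes, and function name are illustrative and do not reproduce PETRv2's actual code.

```python
# Illustrative sketch: temporally aligned 3D position embeddings.
import torch
import torch.nn as nn

pos_mlp = nn.Sequential(nn.Linear(3, 256), nn.ReLU(), nn.Linear(256, 256))

def temporal_position_embedding(points_prev, ego_prev_to_cur):
    # points_prev:     (N, 3) 3D points from the previous frame's camera frustum,
    #                  expressed in the previous ego frame
    # ego_prev_to_cur: (4, 4) ego-motion transform from previous to current frame
    homo = torch.cat([points_prev, torch.ones_like(points_prev[:, :1])], dim=-1)
    points_cur = (ego_prev_to_cur @ homo.T).T[:, :3]   # points in the current ego frame
    return pos_mlp(points_cur)                          # (N, 256) position embedding
```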
It is found that batch size and input resolution greatly affect performance, while lifting strategies have a more modest effect, and it is demonstrated that radar data can provide a substantial boost to performance, helping to close the gap between camera-only and LiDAR-enabled systems.
LaRa, an efficient encoder-decoder, transformer-based model for vehicle semantic segmentation from multiple cameras, is presented; it outperforms the best previous transformer-based approaches on nuScenes.