Incorporating multiple camera views for detection in heavily occluded scenarios.
This work proposes MVDet, a novel anchor-free multiview detection system that aggregates multiview information by projecting feature maps onto the ground plane (bird's-eye view) and applies large-kernel convolutions on the ground-plane feature map to resolve remaining spatial ambiguity.
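The projection step above can be sketched as warping each view's feature map onto a ground-plane grid with a planar homography. This is a minimal NumPy sketch with hypothetical camera parameters and nearest-neighbour sampling, not the authors' implementation:

```python
import numpy as np

def ground_homography(K, R, t):
    """Homography mapping ground-plane coords (X, Y, 1) to image pixels,
    obtained by dropping the Z column of [R | t] (ground points have Z = 0)."""
    Rt = np.column_stack([R[:, 0], R[:, 1], t])  # 3x3
    return K @ Rt

def warp_to_ground(feat, H, grid_xy):
    """Sample a per-view feature map (C x Hf x Wf) at the image locations
    where each ground-plane grid cell projects."""
    C, Hf, Wf = feat.shape
    n = grid_xy.shape[0]
    out = np.zeros((C, n), dtype=feat.dtype)
    pts = H @ np.column_stack([grid_xy, np.ones(n)]).T  # 3 x n homogeneous
    uv = pts[:2] / pts[2]                               # pixel coordinates
    u, v = np.round(uv).astype(int)
    ok = (u >= 0) & (u < Wf) & (v >= 0) & (v < Hf)      # keep in-image cells
    out[:, ok] = feat[:, v[ok], u[ok]]
    return out
```

In a full pipeline, the per-view ground-plane maps would then be summed or concatenated across cameras before the large-kernel convolutions are applied.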
Results indicate that combining depth-based spatial information with learned representations substantially improves detection and tracking accuracy, most markedly in adverse situations with occlusions or objects not captured by the learned representations.
A new architecture is introduced that combines Convolutional Neural Networks and Conditional Random Fields to explicitly model ambiguities in people detection, and it is shown to outperform several state-of-the-art algorithms on challenging scenes.
This paper proposes MVDeTr, a novel multiview detector that adopts a newly introduced shadow transformer to aggregate multiview information, together with an effective training scheme that includes a new view-coherent data augmentation method, which applies random augmentations while maintaining multiview consistency.
A novel Generalized MVD (GMVD) dataset is proposed, assimilating diverse scenes with varying daytime, camera configurations, and numbers of cameras; the properties essential for bringing generalization to MVD are discussed, and a barebones model incorporating them is proposed.
VFA, voxelized 3D feature aggregation, is proposed for feature transformation and aggregation in multiview detection; it identifies and then aggregates 2D features along the same vertical line, largely alleviating projection distortions.
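Aggregating along a vertical line can be sketched by projecting a ground position at several heights into one view and pooling the sampled features. The heights, mean pooling, and pinhole model here are illustrative assumptions, not the paper's code:

```python
import numpy as np

def project_point(K, R, t, X):
    """Pinhole projection of a 3D world point to pixel coordinates."""
    p = K @ (R @ X + t)
    return p[:2] / p[2]

def sample_vertical_line(feat, K, R, t, x, y, heights):
    """Collect 2D features (C x Hf x Wf map) at the projections of
    (x, y, z) for several heights z, i.e. along one vertical line,
    then average them into a single ground-location feature."""
    C, Hf, Wf = feat.shape
    samples = []
    for z in heights:
        u, v = np.round(project_point(K, R, t, np.array([x, y, z]))).astype(int)
        if 0 <= u < Wf and 0 <= v < Hf:   # skip heights projecting off-image
            samples.append(feat[:, v, u])
    if not samples:
        return np.zeros(C, dtype=feat.dtype)
    return np.mean(samples, axis=0)
```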
This paper proposes MVM3Det, a novel multi-view 3D object detection method that simultaneously estimates the 3D position and orientation of objects from multi-view monocular information, and presents a first dataset for multi-view monocular object detection, on which it achieves very competitive results.
A data augmentation method is proposed that randomly generates 3D cylinder occlusions on the ground plane, sized to the average pedestrian and projected to multiple views, to mitigate overfitting during training.
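One way to realize such an augmentation is to project the rims of an upright cylinder into each view and black out the enclosing box. The radius, height, and black-fill choice below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def cylinder_bbox(K, R, t, cx, cy, radius=0.4, height=1.7, n=16):
    """Project boundary points of an upright cylinder standing at ground
    position (cx, cy) into one view; return the enclosing pixel box."""
    ang = np.linspace(0, 2 * np.pi, n, endpoint=False)
    ring = np.stack([cx + radius * np.cos(ang),
                     cy + radius * np.sin(ang)], axis=1)
    pts = []
    for z in (0.0, height):                 # bottom and top rims
        for x, y in ring:
            p = K @ (R @ np.array([x, y, z]) + t)
            pts.append(p[:2] / p[2])
    pts = np.array(pts)
    u0, v0 = pts.min(axis=0)
    u1, v1 = pts.max(axis=0)
    return u0, v0, u1, v1

def occlude(img, box):
    """Black out the (clipped) box region of an H x W x C image in place."""
    H, W = img.shape[:2]
    u0, v0, u1, v1 = [int(round(c)) for c in box]
    img[max(v0, 0):min(v1, H), max(u0, 0):min(u1, W)] = 0
    return img
```

Because the same 3D cylinder is projected with each camera's own extrinsics, the occlusion stays geometrically consistent across views.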
This paper proposes a new pedestrian representation scheme based on human point-cloud modeling, using ray tracing for holistic human depth estimation and modeling pedestrians as upright, thin cardboard point clouds on the ground.
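The "cardboard" representation can be sketched as a thin vertical rectangle of points standing at a ground location. The grid resolution, body size, and orientation parameter here are hypothetical, not the paper's modeling code:

```python
import numpy as np

def cardboard_points(cx, cy, facing=0.0, width=0.5, height=1.7, step=0.1):
    """Sample points on a zero-thickness vertical rectangle standing at
    (cx, cy) on the ground plane, rotated by `facing` (radians)."""
    s = np.arange(-width / 2, width / 2 + 1e-9, step)   # across the body
    z = np.arange(0.0, height + 1e-9, step)             # up from the ground
    S, Z = np.meshgrid(s, z)
    dx, dy = np.cos(facing), np.sin(facing)             # in-ground direction
    pts = np.stack([cx + S * dx, cy + S * dy, Z], axis=-1)
    return pts.reshape(-1, 3)                           # N x 3 point cloud
```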
This paper argues that an even more effective approach is to predict people's motion over time and infer their presence in individual frames from these predictions, which makes it possible to enforce consistency both over time and across the views of a single temporal frame.