Video salient object detection (VSOD) is essential for understanding the mechanisms underlying the human visual system (HVS) during free viewing, and it is instrumental to a wide range of real-world applications, e.g., video segmentation, video captioning, video compression, autonomous driving, robotic interaction, and weakly supervised attention. Beyond its academic value and practical significance, VSOD presents great difficulties due to the challenges inherent in video data (diverse motion patterns, occlusions, blur, large object deformations, etc.) and the complexity of human visual attention behavior (i.e., selective attention allocation and attention shift) in dynamic scenes. Online benchmark: http://dpfan.net/davsod. (Image credit: Shifting More Attention to Video Salient Object Detection, CVPR 2019 Best Paper Finalist)
These leaderboards are used to track progress in video salient object detection.
Use these libraries to find video salient object detection models and implementations.
No subtasks available.
A multi-task motion-guided video salient object detection network that learns to accomplish two sub-tasks using two sub-networks, one for salient object detection in still images and the other for motion saliency detection in optical flow images; it significantly outperforms existing state-of-the-art algorithms on a wide range of benchmarks.
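As a rough illustration of the two-branch idea, the minimal sketch below (with hypothetical module names, not the authors' code) scores saliency separately on an RGB frame and on its rendered optical-flow image, then learns a fusion on top so that motion can guide the appearance stream:

```python
# Minimal two-branch, multi-task sketch (hypothetical names, not the paper's code).
import torch
import torch.nn as nn

class SaliencyBranch(nn.Module):
    """Tiny encoder-decoder mapping a 3-channel input to a saliency logit map."""
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
        )
        self.decode = nn.Conv2d(16, 1, 1)  # per-pixel saliency logit

    def forward(self, x):
        return self.decode(self.encode(x))

class MotionGuidedVSOD(nn.Module):
    def __init__(self):
        super().__init__()
        self.appearance = SaliencyBranch()  # still-image saliency sub-task
        self.motion = SaliencyBranch()      # optical-flow saliency sub-task
        self.fuse = nn.Conv2d(2, 1, 1)      # learn how motion guides appearance

    def forward(self, frame, flow_image):
        sal_a = self.appearance(frame)
        sal_m = self.motion(flow_image)
        return self.fuse(torch.cat([sal_a, sal_m], dim=1)), sal_a, sal_m

# Usage: in multi-task training, each head gets its own supervision.
frame = torch.randn(1, 3, 64, 64)
flow = torch.randn(1, 3, 64, 64)  # optical flow rendered as a 3-channel image
fused, sal_a, sal_m = MotionGuidedVSOD()(frame, flow)
```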
A depth-cooperated trimodal network for VSOD, called DCTNet, is proposed; it is a pioneering effort to incorporate depth information to assist VSOD, and it introduces a refinement fusion module (RFM) that suppresses noise in each modality and dynamically selects useful information for further feature refinement.
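In the spirit of the RFM described above, the sketch below (a deliberate simplification with made-up names, not the published module) gates each modality's features with a predicted spatial weight before fusing them, so a noisy modality can be suppressed:

```python
# Simplified refinement-fusion sketch in the spirit of DCTNet's RFM
# (my own reduction, not the published module).
import torch
import torch.nn as nn

class RefinementFusion(nn.Module):
    def __init__(self, channels: int, num_modalities: int = 3):
        super().__init__()
        self.num_modalities = num_modalities
        # Predict one spatial gate per modality from the joint features.
        self.gate = nn.Sequential(
            nn.Conv2d(channels * num_modalities, num_modalities, 1),
            nn.Sigmoid(),
        )
        self.refine = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, feats):
        # feats: list of (B, C, H, W) tensors, one per modality.
        joint = torch.cat(feats, dim=1)
        gates = self.gate(joint)  # (B, num_modalities, H, W), in [0, 1]
        fused = sum(gates[:, i:i + 1] * feats[i] for i in range(self.num_modalities))
        return self.refine(fused)

# Usage with RGB, optical-flow, and depth features of the same shape.
rgb, flow, depth = (torch.randn(1, 32, 16, 16) for _ in range(3))
out = RefinementFusion(32)([rgb, flow, depth])
```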
A novel, efficient, and easy-to-calculate measure, known as S-measure (structural measure), for evaluating foreground maps; it simultaneously evaluates region-aware and object-aware structural similarity between a foreground map and a ground-truth map.
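The S-measure combines an object-aware and a region-aware term as S = α·S_object + (1 − α)·S_region with α = 0.5. The sketch below keeps that top-level form but uses deliberately reduced stand-ins for the two sub-scores (my simplifications, not the paper's full definitions):

```python
# Hedged, simplified sketch of the S-measure's top-level combination.
# The sub-scores are reduced stand-ins, not the paper's exact definitions.
import numpy as np

def s_object(pred: np.ndarray, gt: np.ndarray) -> float:
    """Object-aware stand-in: mean prediction inside the GT foreground,
    plus how non-salient the prediction is in the GT background."""
    fg = pred[gt > 0.5].mean() if (gt > 0.5).any() else 0.0
    bg = 1.0 - (pred[gt <= 0.5].mean() if (gt <= 0.5).any() else 0.0)
    mu = (gt > 0.5).mean()  # foreground ratio weights the two terms
    return mu * fg + (1.0 - mu) * bg

def s_region(pred: np.ndarray, gt: np.ndarray) -> float:
    """Region-aware stand-in: SSIM-style similarity over the whole map
    (the paper instead splits the map into four regions around the GT centroid)."""
    x, y = pred.astype(np.float64), gt.astype(np.float64)
    sxy = ((x - x.mean()) * (y - y.mean())).mean()
    return (2 * x.mean() * y.mean() + 1e-8) * (2 * sxy + 1e-8) / (
        (x.mean() ** 2 + y.mean() ** 2 + 1e-8) * (x.var() + y.var() + 1e-8))

def s_measure(pred, gt, alpha=0.5):
    return alpha * s_object(pred, gt) + (1 - alpha) * s_region(pred, gt)

print(s_measure(np.random.rand(64, 64), (np.random.rand(64, 64) > 0.7).astype(float)))
```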
This paper proposes a fast video salient object detection model based on a novel recurrent network architecture named Pyramid Dilated Bidirectional ConvLSTM (PDB-ConvLSTM), which achieves state-of-the-art results on two popular benchmarks, demonstrating its superior performance and broad applicability.
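The "pyramid dilated" part of this design can be illustrated with parallel 3×3 convolutions at increasing dilation rates whose outputs are concatenated; the bidirectional ConvLSTM is omitted here for brevity (a generic sketch, not the authors' implementation):

```python
# Generic pyramid-dilated-convolution sketch (not the authors' implementation).
import torch
import torch.nn as nn

class PyramidDilatedConv(nn.Module):
    def __init__(self, in_ch: int, branch_ch: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        # One branch per dilation rate; padding=dilation keeps the spatial size.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, 3, padding=d, dilation=d) for d in dilations
        ])

    def forward(self, x):
        # Concatenate multi-receptive-field features along the channel axis.
        return torch.cat([b(x) for b in self.branches], dim=1)

feat = torch.randn(1, 64, 32, 32)
out = PyramidDilatedConv(64, 16)(feat)  # 4 branches x 16 -> (1, 64, 32, 32)
```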
The last decade has witnessed a growing interest in video salient object detection (VSOD). However, the research community has long lacked a well-established VSOD dataset that is representative of real dynamic scenes and has high-quality annotations. To address this issue, we elaborately collected a visual-attention-consistent Densely Annotated VSOD (DAVSOD) dataset, which contains 226 videos with 23,938 frames covering diverse realistic scenes, objects, instances, and motions. With corresponding real human eye-fixation data, we obtain precise ground truths. This is the first work that explicitly emphasizes the challenge of saliency shift, i.e., that the video salient object(s) may dynamically change. To further provide the community with a complete benchmark, we systematically assess 17 representative VSOD algorithms over seven existing VSOD datasets and our DAVSOD, with ~84K frames in total (the largest scale to date). Using three popular metrics, we then present a comprehensive and insightful performance analysis. Furthermore, we propose a baseline model equipped with a saliency-shift-aware ConvLSTM, which can efficiently capture video saliency dynamics by learning human attention-shift behavior. Extensive experiments open up promising future directions for model development and comparison.
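The saliency-shift-aware module builds on ConvLSTM, whose gates are convolutions rather than matrix multiplies so the recurrent state stays spatial. Below is a generic ConvLSTM cell for orientation (a minimal sketch, not the paper's shift-aware variant):

```python
# Generic ConvLSTM cell (minimal sketch, not the paper's shift-aware variant).
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        # One conv produces all four gates: input, forget, output, candidate.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g          # spatial cell state update
        h = o * torch.tanh(c)      # spatial hidden state
        return h, c

# Unroll over a short clip; h can feed a 1x1 conv to predict per-frame saliency.
cell = ConvLSTMCell(32, 32)
h = torch.zeros(1, 32, 16, 16)
c = torch.zeros(1, 32, 16, 16)
for frame_feat in torch.randn(5, 1, 32, 16, 16):  # T x B x C x H x W
    h, c = cell(frame_feat, (h, c))
```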
This paper presents an effective video saliency detector consisting of a spatial refinement network and a spatiotemporal module, and it proposes a novel method for generating pixel-level pseudo-labels from sparsely annotated frames.
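One common way to obtain pixel-level pseudo-labels from sparse annotations is to warp an annotated mask to neighboring frames with optical flow and keep only confident pixels; the sketch below shows that generic strategy (a hedged stand-in, not necessarily the paper's exact method):

```python
# Generic flow-based pseudo-label sketch (a stand-in strategy, not
# necessarily the paper's exact method).
import torch
import torch.nn.functional as F

def warp_mask(mask: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """mask: (B, 1, H, W); flow: (B, 2, H, W) in pixels, giving for each
    target pixel the offset to its sampling location in the source frame."""
    b, _, h, w = mask.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack([xs, ys], dim=0).float().unsqueeze(0).expand(b, -1, -1, -1)
    coords = base + flow
    # Normalize to [-1, 1]; grid_sample expects a (B, H, W, 2) grid.
    coords[:, 0] = coords[:, 0] / (w - 1) * 2 - 1
    coords[:, 1] = coords[:, 1] / (h - 1) * 2 - 1
    grid = coords.permute(0, 2, 3, 1)
    return F.grid_sample(mask, grid, align_corners=True)

mask = (torch.rand(1, 1, 32, 32) > 0.5).float()  # annotated frame's mask
flow = torch.zeros(1, 2, 32, 32)                 # identity flow for the demo
pseudo = warp_mask(mask, flow)
pseudo_label = (pseudo > 0.8).float()            # keep only confident pixels
```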