3260 papers • 126 benchmarks • 313 datasets
RGB salient object detection is a task based on the visual attention mechanism, in which algorithms aim to detect objects or regions that attract more attention than the surrounding areas in a scene or RGB image. ( Image credit: Attentive Feedback Network for Boundary-Aware Salient Object Detection )
These leaderboards are used to track progress in RGB Salient Object Detection.
Use these libraries to find RGB Salient Object Detection models and implementations.
This paper proposes a novel building block for CNNs, namely Res2Net, which constructs hierarchical residual-like connections within a single residual block to represent multi-scale features at a granular level and increase the range of receptive fields for each network layer.
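The hierarchical connections inside a Res2Net block can be illustrated with a minimal PyTorch sketch. The split into `scales` groups, the per-group 3×3 convolutions, and the group-to-group additions follow the paper's description; the channel widths and the omission of the surrounding 1×1 bottleneck convolutions are simplifications:

```python
import torch
import torch.nn as nn

class Res2NetUnit(nn.Module):
    """Minimal sketch of the hierarchical connections inside a Res2Net block.

    The input channels are split into `scales` groups; each group (after the
    first) is processed by a 3x3 conv and receives the previous group's output
    first, so later groups see an increasingly large receptive field.
    """
    def __init__(self, channels: int, scales: int = 4):
        super().__init__()
        assert channels % scales == 0
        self.scales = scales
        width = channels // scales
        # One 3x3 conv per group except the first (which passes through).
        self.convs = nn.ModuleList([
            nn.Conv2d(width, width, kernel_size=3, padding=1)
            for _ in range(scales - 1)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        groups = torch.chunk(x, self.scales, dim=1)
        out = [groups[0]]                    # first split: identity
        prev = None
        for i, conv in enumerate(self.convs):
            y = groups[i + 1]
            if prev is not None:
                y = y + prev                 # hierarchical residual-like link
            prev = conv(y)
            out.append(prev)
        return torch.cat(out, dim=1)

x = torch.randn(1, 64, 32, 32)
print(Res2NetUnit(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```

Because each group receives the previous group's output before its own convolution, the effective receptive field grows group by group within a single block.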
U2-Net is a simple yet powerful deep network architecture for salient object detection (SOD): a two-level nested U-structure that makes it possible to train a deep network from scratch without using backbones from image classification tasks.
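As a rough sketch of the nested idea, the residual U-block below is a tiny U-Net whose output is added back to its input; U2-Net stacks blocks of this kind as the stages of a larger, outer U-shape. The depth here is cut to two levels and the layer widths are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRSU(nn.Module):
    """Sketch of a residual U-block: a small U-Net whose output is added
    back to its input feature map (depth reduced to two levels for brevity)."""
    def __init__(self, channels: int):
        super().__init__()
        self.inconv = nn.Conv2d(channels, channels, 3, padding=1)
        self.enc = nn.Conv2d(channels, channels, 3, padding=1)
        self.bottom = nn.Conv2d(channels, channels, 3, padding=1)
        self.dec = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, x):
        hx = self.inconv(x)
        e1 = F.relu(self.enc(hx))
        e2 = F.relu(self.bottom(F.max_pool2d(e1, 2)))      # down one level
        d1 = F.interpolate(e2, size=e1.shape[2:], mode="bilinear",
                           align_corners=False)            # back up
        d1 = F.relu(self.dec(torch.cat([d1, e1], dim=1)))  # skip connection
        return hx + d1                                      # residual link
```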
This paper provides a comprehensive survey of RGB-D based salient object detection models from various perspectives, reviews related benchmark datasets in detail, and investigates the ability of existing models to detect salient objects.
This paper presents an edge guidance network (EGNet) for salient object detection that, in three steps, simultaneously models two kinds of complementary information, salient edge information and salient object information, within a single network.
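A hedged sketch of the edge-guidance idea: a salient-edge feature is predicted under its own supervision and fused with the salient-object feature before the final mask prediction. The fusion-by-addition and the assumption that the two feature maps are already aligned in shape are illustrative choices, not EGNet's exact layers:

```python
import torch
import torch.nn as nn

class EdgeGuidedFusion(nn.Module):
    """Sketch of edge-guided fusion: a salient-edge feature map sharpens a
    salient-object feature map before the final prediction. Assumes both
    feature maps share the same channels and spatial size."""
    def __init__(self, channels: int):
        super().__init__()
        self.edge_head = nn.Conv2d(channels, 1, 1)  # supervised with edge GT
        self.fuse = nn.Conv2d(channels, channels, 3, padding=1)
        self.sal_head = nn.Conv2d(channels, 1, 1)   # supervised with mask GT

    def forward(self, edge_feat, obj_feat):
        edge_pred = self.edge_head(edge_feat)
        fused = self.fuse(obj_feat + edge_feat)     # complementary fusion
        return self.sal_head(fused), edge_pred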
This work tackles salient object detection by expanding the role of pooling in convolutional neural networks: it builds a global guidance module (GGM) and designs a feature aggregation module (FAM) so that coarse-level semantic information is well fused with fine-level features from the top-down pathway.
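The aggregation idea can be sketched as follows: the input feature map is average-pooled at several rates, convolved, upsampled back to full resolution, and summed, so coarse semantics blend into fine features. The pooling rates and the single 3×3 convolutions are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAggregation(nn.Module):
    """Sketch of a pooling-based aggregation module: pool the feature map at
    several rates, convolve, upsample back, and sum with the input."""
    def __init__(self, channels: int, rates=(2, 4, 8)):
        super().__init__()
        self.rates = rates
        self.convs = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1) for _ in rates
        ])

    def forward(self, x):
        out = x
        for rate, conv in zip(self.rates, self.convs):
            y = F.avg_pool2d(x, kernel_size=rate, stride=rate)
            y = conv(y)
            out = out + F.interpolate(y, size=x.shape[2:], mode="bilinear",
                                      align_corners=False)
        return out
```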
This paper proposes a new salient object detection method that introduces short connections to the skip-layer structures within the HED architecture. These connections take full advantage of the multi-level and multi-scale features extracted from FCNs, providing more advanced representations at each layer, a property that is critically needed to perform segment detection.
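A minimal sketch of short connections between side outputs, assuming a list of backbone features ordered shallow-to-deep and one 1-channel prediction head per feature (the function and argument names are hypothetical):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def short_connections(side_feats, heads):
    """Sketch of short connections: each shallow side prediction also
    receives the upsampled predictions of all deeper side outputs, so
    high-level localization guides low-level detail."""
    preds = [head(f) for head, f in zip(heads, side_feats)]
    fused = []
    for i, p in enumerate(preds):
        deeper = [
            F.interpolate(q, size=p.shape[2:], mode="bilinear",
                          align_corners=False)
            for q in preds[i + 1:]
        ]
        fused.append(p + sum(deeper) if deeper else p)  # short connections
    return fused

feats = [torch.randn(1, 16, s, s) for s in (64, 32, 16)]  # shallow -> deep
heads = [nn.Conv2d(16, 1, 1) for _ in feats]
maps = short_connections(feats, heads)  # one fused map per side output
```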
F3Net segments salient object regions accurately, provides clear local details, and outperforms state-of-the-art approaches on six evaluation metrics.
This work presents the first stochastic framework to employ uncertainty for RGB-D saliency detection by learning from the data labeling process; qualitative and quantitative results on six challenging RGB-D benchmark datasets show its superior performance in learning the distribution of saliency maps.
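One way such a stochastic framework can be realized, sketched here purely as an assumption in the style of a conditional variational head, is to sample a latent code and inject it into the decoder so that repeated forward passes produce different plausible saliency maps:

```python
import torch
import torch.nn as nn

class StochasticSaliencyHead(nn.Module):
    """Hedged sketch of a stochastic saliency head: a latent code is sampled
    and injected into the decoder, so repeated forward passes yield a
    distribution of saliency maps rather than a single one. The CVAE-style
    latent and its size are assumptions for illustration."""
    def __init__(self, channels: int, latent_dim: int = 8):
        super().__init__()
        self.to_stats = nn.Linear(channels, 2 * latent_dim)
        self.decode = nn.Conv2d(channels + latent_dim, 1, 3, padding=1)

    def forward(self, feat):                         # feat: (B, C, H, W)
        stats = self.to_stats(feat.mean(dim=(2, 3)))
        mu, logvar = stats.chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        z = z[:, :, None, None].expand(-1, -1, *feat.shape[2:])
        return torch.sigmoid(self.decode(torch.cat([feat, z], dim=1)))
```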
This paper proposes to adapt pyramid pooling to Multi-Head Self-Attention (MHSA) in the vision transformer, simultaneously reducing the sequence length and capturing powerful contextual features, yielding a universal vision transformer backbone dubbed Pyramid Pooling Transformer (P2T).
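The pooling-based attention can be sketched as follows: queries keep the full sequence length, while keys and values come from the feature map pooled at several rates, which shortens the attended sequence and adds multi-scale context. The pooling rates and the use of `nn.MultiheadAttention` are simplifying assumptions, and `dim` must be divisible by `heads`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooledAttention(nn.Module):
    """Sketch of pyramid pooling inside multi-head self-attention: keys and
    values are built from the feature map pooled at several rates, so the
    attended sequence is far shorter than the full H*W tokens."""
    def __init__(self, dim: int, heads: int = 4, rates=(2, 4, 8)):
        super().__init__()
        self.rates = rates
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                    # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = x.flatten(2).transpose(1, 2)     # full-length queries: (B, HW, C)
        pooled = [
            F.adaptive_avg_pool2d(x, (max(h // r, 1), max(w // r, 1)))
             .flatten(2).transpose(1, 2)
            for r in self.rates
        ]
        kv = torch.cat(pooled, dim=1)        # much shorter than HW
        out, _ = self.attn(q, kv, kv)
        return out.transpose(1, 2).reshape(b, c, h, w)

x = torch.randn(1, 64, 32, 32)
print(PyramidPooledAttention(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```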
An accurate yet compact deep network for efficient salient object detection that employs residual learning to learn side-output residual features for saliency refinement, which can be achieved with very limited convolutional parameters while maintaining accuracy.
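The side-output residual refinement can be sketched as follows: a coarse saliency map is repeatedly upsampled, and at each stage a lightweight head predicts only the residual detail to add back. The function and argument names are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def refine_with_residuals(coarse, side_feats, res_heads):
    """Sketch of side-output residual refinement: a coarse, low-resolution
    saliency map is repeatedly upsampled, and at each stage a small side
    branch predicts only the residual (missing detail) to add back.
    `side_feats` is ordered deep-to-shallow; `res_heads` are 1-channel convs."""
    pred = coarse
    for feat, head in zip(side_feats, res_heads):
        pred = F.interpolate(pred, size=feat.shape[2:], mode="bilinear",
                             align_corners=False)
        pred = pred + head(feat)             # learn the residual, few params
    return pred

coarse = torch.randn(1, 1, 8, 8)
feats = [torch.randn(1, 16, s, s) for s in (16, 32, 64)]  # deep -> shallow
heads = [nn.Conv2d(16, 1, 1) for _ in feats]
print(refine_with_residuals(coarse, feats, heads).shape)  # (1, 1, 64, 64)
```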
Adding a benchmark result helps the community track progress.