3260 papers • 126 benchmarks • 313 datasets
A saliency map encodes predicted eye fixations on a visual scene. Saliency prediction is informed by the human visual attention mechanism and estimates, for each position in the scene, the probability that the human eye will fixate there.
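For a concrete feel of what such a map is, here is a minimal sketch of a classical bottom-up baseline, the Spectral Residual method of Hou & Zhang (2007). It is not one of the learned models listed below, and the window and smoothing parameters are illustrative:

```python
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def spectral_residual_saliency(gray):
    """Spectral Residual saliency (Hou & Zhang, 2007): the part of the
    log-amplitude spectrum that deviates from its local average marks
    unexpected, and hence salient, image content."""
    f = np.fft.fft2(gray.astype(np.float64))
    log_amp = np.log(np.abs(f) + 1e-8)
    phase = np.angle(f)
    residual = log_amp - uniform_filter(log_amp, size=3)   # spectral residual
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    sal = gaussian_filter(sal, sigma=2.5)                  # smooth into a fixation map
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)

# Usage: saliency = spectral_residual_saliency(some_grayscale_array)
```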
These leaderboards are used to track progress in Saliency Prediction.
Use these libraries to find Saliency Prediction models and implementations.
This work proposes an approach based on a convolutional neural network pre-trained on a large-scale image classification task. It achieves competitive and consistent results across multiple evaluation metrics on two public saliency benchmarks, and its effectiveness is demonstrated on five datasets and on selected examples.
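The transfer-learning recipe this summary describes (reuse classification features, train a small saliency readout) can be sketched as follows. The frozen encoder, the 1x1 readout head, and the torchvision VGG-16 backbone are assumptions for illustration, not the paper's exact design:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class TransferSaliency(nn.Module):
    """Sketch of a transfer-learning saliency predictor: features from a
    network pre-trained for image classification, plus a small trainable
    readout. Whether the encoder is fine-tuned varies by paper; here it
    is frozen for simplicity."""
    def __init__(self):
        super().__init__()
        self.features = vgg16(weights="IMAGENET1K_V1").features  # pre-trained encoder
        for p in self.features.parameters():
            p.requires_grad = False                               # keep encoder frozen
        self.readout = nn.Conv2d(512, 1, kernel_size=1)           # 1x1 conv to saliency

    def forward(self, x):
        h = self.features(x)                                      # (B, 512, H/32, W/32)
        s = self.readout(h)
        s = nn.functional.interpolate(s, size=x.shape[-2:],
                                      mode="bilinear", align_corners=False)
        return torch.sigmoid(s)                                   # per-pixel fixation probability
```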
Deep SESR is presented: a residual-in-residual generative model that learns to restore perceptual image quality at 2x, 3x, or 4x higher spatial resolution. It is trained with a multi-modal objective function that addresses chrominance-specific underwater color degradation, lack of image sharpness, and loss of high-level feature representation.
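A multi-term objective of this kind can be sketched as below. The concrete term definitions (L1 reconstruction, channel-mean color proxy, gradient-matching sharpness) and the weights are assumptions, not Deep SESR's published formulation:

```python
import torch
import torch.nn.functional as F

def multi_term_restoration_loss(pred, gt, weights=(1.0, 0.5, 0.5)):
    """Sketch of a multi-term restoration objective: a reconstruction
    term, a color term, and a sharpness term, summed with weights."""
    l_rec = F.l1_loss(pred, gt)                            # overall reconstruction
    # Color term: compare per-channel means (a crude chrominance proxy).
    l_col = F.l1_loss(pred.mean(dim=(2, 3)), gt.mean(dim=(2, 3)))
    # Sharpness term: match vertical and horizontal image gradients.
    def grads(x):
        return x[..., 1:, :] - x[..., :-1, :], x[..., :, 1:] - x[..., :, :-1]
    (py, px), (gy, gx) = grads(pred), grads(gt)
    l_shp = F.l1_loss(py, gy) + F.l1_loss(px, gx)
    w_rec, w_col, w_shp = weights
    return w_rec * l_rec + w_col * l_col + w_shp * l_shp
```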
This work presents the first stochastic framework to employ uncertainty for RGB-D saliency detection by learning from the data labeling process. Qualitative and quantitative results on six challenging RGB-D benchmark datasets show superior performance in learning the distribution of saliency maps.
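Learning a distribution of saliency maps typically means conditioning the decoder on a sampled latent code, in the style of a conditional VAE. The sketch below shows only that mechanism; all layer shapes are assumed, and training would additionally need a KL term against a prior:

```python
import torch
import torch.nn as nn

class LatentSaliency(nn.Module):
    """Sketch of a stochastic saliency head: a latent code z carries
    annotation uncertainty, so resampling z yields different plausible
    saliency maps instead of one point estimate."""
    def __init__(self, feat_ch=64, z_dim=8):
        super().__init__()
        self.to_stats = nn.Linear(feat_ch, 2 * z_dim)       # posterior mean / log-variance
        self.decode = nn.Conv2d(feat_ch + z_dim, 1, 3, padding=1)

    def forward(self, feat):                                # feat: (B, C, H, W)
        pooled = feat.mean(dim=(2, 3))
        mu, logvar = self.to_stats(pooled).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        z_map = z[:, :, None, None].expand(-1, -1, *feat.shape[-2:])
        sal = torch.sigmoid(self.decode(torch.cat([feat, z_map], dim=1)))
        return sal, mu, logvar                              # mu/logvar feed the KL term
```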
This work introduces SalGAN, a deep convolutional neural network for visual saliency prediction trained with adversarial examples, and shows how adversarial training, combined with a widely used loss function such as BCE, reaches state-of-the-art performance across different metrics.
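The combination of a content loss and an adversarial loss can be written as a single generator objective. The sketch below assumes sigmoid outputs for both the saliency map and the discriminator; the weighting between the two terms is illustrative rather than the paper's tuned value:

```python
import torch
import torch.nn.functional as F

def generator_loss(pred_map, gt_map, disc_score, alpha=0.005):
    """Sketch of a SalGAN-style generator objective: pixel-wise BCE on
    the saliency map plus an adversarial term from the discriminator.
    pred_map and disc_score are assumed to be sigmoid outputs in (0, 1)."""
    content = F.binary_cross_entropy(pred_map, gt_map)
    # The generator wants the discriminator to label its output as real (1).
    adversarial = F.binary_cross_entropy(disc_score,
                                         torch.ones_like(disc_score))
    return alpha * content + adversarial
```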
An accurate yet compact deep network for efficient salient object detection is proposed. It employs residual learning to learn side-output residual features for saliency refinement, which can be achieved with very limited convolutional parameters while maintaining accuracy.
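The core trick, predicting a residual correction rather than a whole new map, keeps refinement cheap. A minimal sketch, with layer sizes assumed:

```python
import torch.nn as nn

class ResidualRefine(nn.Module):
    """Sketch of residual saliency refinement: a few small convolutions
    predict a correction that is added to the coarse map, so refinement
    costs very few parameters."""
    def __init__(self, ch=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 3, padding=1),
        )

    def forward(self, coarse):
        return coarse + self.body(coarse)   # learn the residual, not the map
```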
The proposed conditional generative adversarial network-based model is suitable for real-time preprocessing in the autonomy pipeline of visually-guided underwater robots, and it improves the performance of standard models for underwater object detection, human pose estimation, and saliency prediction.
Deep Convolutional Neural Networks have been adopted for salient object detection and achieved the state-of-the-art performance. Most of the previous works however focus on region accuracy but not on the boundary quality. In this paper, we propose a predict-refine architecture, BASNet, and a new hybrid loss for Boundary-Aware Salient object detection. Specifically, the architecture is composed of a densely supervised Encoder-Decoder network and a residual refinement module, which are respectively in charge of saliency prediction and saliency map refinement. The hybrid loss guides the network to learn the transformation between the input image and the ground truth in a three-level hierarchy -- pixel-, patch- and map-level -- by fusing Binary Cross Entropy (BCE), Structural SIMilarity (SSIM) and Intersection-over-Union (IoU) losses. Equipped with the hybrid loss, the proposed predict-refine architecture is able to effectively segment the salient object regions and accurately predict the fine structures with clear boundaries. Experimental results on six public datasets show that our method outperforms the state-of-the-art methods both in terms of regional and boundary evaluation measures. Our method runs at over 25 fps on a single GPU. The code is available at: https://github.com/NathanUA/BASNet.
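The three loss levels named in the abstract can be combined as below. This sketch uses a uniform SSIM window and equal term weights for brevity, whereas BASNet itself uses a Gaussian window and applies the hybrid loss to every side output:

```python
import torch
import torch.nn.functional as F

def hybrid_loss(pred, gt, win=11):
    """Sketch of a BASNet-style hybrid loss: BCE (pixel level) + SSIM
    (patch level) + IoU (map level). pred and gt have shape (B, 1, H, W)
    and pred is assumed to be a sigmoid output in (0, 1)."""
    bce = F.binary_cross_entropy(pred, gt)

    # Patch-level SSIM with a uniform averaging window.
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    kernel = torch.ones(1, 1, win, win, device=pred.device) / win ** 2
    mu_p = F.conv2d(pred, kernel, padding=win // 2)
    mu_g = F.conv2d(gt, kernel, padding=win // 2)
    var_p = F.conv2d(pred * pred, kernel, padding=win // 2) - mu_p ** 2
    var_g = F.conv2d(gt * gt, kernel, padding=win // 2) - mu_g ** 2
    cov = F.conv2d(pred * gt, kernel, padding=win // 2) - mu_p * mu_g
    ssim = ((2 * mu_p * mu_g + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_g ** 2 + c1) * (var_p + var_g + c2))
    ssim_loss = 1 - ssim.mean()

    # Map-level soft IoU.
    inter = (pred * gt).sum(dim=(2, 3))
    union = (pred + gt - pred * gt).sum(dim=(2, 3))
    iou_loss = (1 - inter / (union + 1e-8)).mean()

    return bce + ssim_loss + iou_loss
```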
The first large-scale dataset for semantic Segmentation of Underwater IMagery (SUIM) is presented, along with SUIM-Net, a fully-convolutional deep residual model that balances the trade-off between performance and computational efficiency.
This paper proposes an architecture that combines features extracted at different levels of a Convolutional Neural Network (CNN). It outperforms the state of the art under all evaluation metrics on the SALICON dataset and achieves competitive results on the MIT300 benchmark.
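Multi-level feature combination usually means projecting each depth to a common width, resizing to one resolution, and fusing. A minimal sketch, with channel sizes and the 1x1 readout assumed rather than taken from the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelFusion(nn.Module):
    """Sketch of fusing CNN features from several depths: project each
    level to a common width, upsample to the finest resolution, then
    concatenate and read out a saliency map."""
    def __init__(self, in_chs=(128, 256, 512), width=64):
        super().__init__()
        self.proj = nn.ModuleList([nn.Conv2d(c, width, 1) for c in in_chs])
        self.readout = nn.Conv2d(width * len(in_chs), 1, 1)

    def forward(self, feats):                      # list of (B, C_i, H_i, W_i)
        size = feats[0].shape[-2:]                 # finest spatial resolution
        fused = [F.interpolate(p(f), size=size, mode="bilinear",
                               align_corners=False)
                 for p, f in zip(self.proj, feats)]
        return torch.sigmoid(self.readout(torch.cat(fused, dim=1)))
```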
A novel framework, the specificity-preserving network (SPNet), is proposed. It improves SOD performance by exploring both the shared information and the modality-specific properties, and it outperforms cutting-edge approaches on six popular RGB-D SOD benchmarks and three camouflaged object detection benchmarks.
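The shared-versus-specific idea can be sketched schematically: one branch per modality preserves modality-specific features, a shared projection extracts what is common to both, and all three are fused. This is a schematic reading of the idea, not SPNet's actual architecture, and all layer shapes are assumptions:

```python
import torch
import torch.nn as nn

class SharedSpecificFusion(nn.Module):
    """Sketch of shared + modality-specific fusion for RGB-D saliency:
    specific branches keep per-modality cues, a shared branch (applied
    to both modalities) captures common information."""
    def __init__(self, ch=64):
        super().__init__()
        self.rgb_specific = nn.Conv2d(ch, ch, 3, padding=1)
        self.depth_specific = nn.Conv2d(ch, ch, 3, padding=1)
        self.shared = nn.Conv2d(ch, ch, 3, padding=1)   # same weights for both modalities
        self.fuse = nn.Conv2d(ch * 3, 1, 1)

    def forward(self, f_rgb, f_depth):
        shared = self.shared(f_rgb) + self.shared(f_depth)   # cross-modal common cue
        parts = [self.rgb_specific(f_rgb), self.depth_specific(f_depth), shared]
        return torch.sigmoid(self.fuse(torch.cat(parts, dim=1)))
```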