Video saliency detection aims to predict which regions of a video attract human visual attention, typically by producing per-frame saliency maps that match recorded eye fixations.
These leaderboards are used to track progress in video saliency detection.
Use these libraries to find video saliency detection models and implementations.
No subtasks available.
This work proposes an approach based on a convolutional neural network pre-trained on a large-scale image classification task. It achieves competitive and consistent results across multiple evaluation metrics on two public saliency benchmarks, and its effectiveness is further demonstrated on five datasets and selected examples.
This paper investigates modifying an existing neural network architecture for static saliency prediction using two types of recurrences that integrate information from the temporal domain using a ConvLSTM and a conceptually simple exponential moving average of an internal convolutional state.
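The second recurrence mentioned above, an exponential moving average of an internal convolutional state, can be sketched in a few lines. This is a minimal illustration of the EMA recurrence s_t = (1 - α)·s_{t-1} + α·x_t applied to per-frame feature maps; the function name and the smoothing factor `alpha` are assumptions, not values from the paper.

```python
import numpy as np

def ema_saliency_states(frame_features, alpha=0.2):
    """Exponential moving average over per-frame feature maps.

    Sketch of the EMA recurrence: s_t = (1 - alpha) * s_{t-1} + alpha * x_t.
    `frame_features` is assumed to be an iterable of (H, W, C) arrays
    produced by a static saliency CNN; `alpha` is a hypothetical
    smoothing factor chosen for illustration.
    """
    state = None
    smoothed = []
    for x in frame_features:
        # Initialize the state with the first frame, then blend in new frames.
        state = x.copy() if state is None else (1 - alpha) * state + alpha * x
        smoothed.append(state.copy())
    return smoothed
```

Because the state is a convex combination of past frames, the temporal output changes smoothly even when individual frame predictions are noisy.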
This work introduces a new benchmark for predicting human eye movements during dynamic scene free-viewing, and proposes a novel video saliency model that augments the CNN-LSTM network architecture with an attention mechanism to enable fast, end-to-end saliency learning.
UNISAL achieves state-of-the-art performance on all video saliency datasets and is on par with the state of the art on image saliency datasets, despite a faster runtime and a 5- to 20-fold smaller model size compared to all competing deep methods.
The proposed hierarchical method for action recognition based on temporal and spatial features segments salient objects using motion, edge, and colour features, and can be added as a preprocessing step to most current human action recognition (HAR) systems to improve performance.
This work proposes an end-to-end dilated inception network (DINet) for visual saliency prediction that captures multi-scale contextual features effectively with very limited extra parameters and improves the performance of the saliency model by using a set of linear normalization-based probability distribution distance metrics as loss functions.
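One of the probability-distribution distance metrics that such normalization-based losses build on is the KL divergence between saliency maps normalized to sum to one. The sketch below is a generic illustration of that idea; the exact loss set used by DINet, and the names `kl_saliency_loss` and `eps`, are assumptions.

```python
import numpy as np

def kl_saliency_loss(pred, target, eps=1e-8):
    """KL divergence between a predicted and a ground-truth saliency map,
    each linearly normalized into a probability distribution.

    A generic distribution-distance loss sketch; the paper's exact
    loss combination is not reproduced here.
    """
    p = target / (target.sum() + eps)  # ground-truth distribution
    q = pred / (pred.sum() + eps)      # predicted distribution
    # Small eps keeps the log finite where either map is zero.
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

Treating maps as distributions makes the loss invariant to the overall scale of the prediction, which is why such losses pair well with saliency metrics like KL-Div and CC.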
TASED-Net significantly outperforms previous state-of-the-art approaches on all three major large-scale video saliency detection datasets (DHF1K, Hollywood2, and UCF Sports) and is particularly effective at attending to salient moving objects.
An effective spatiotemporal feature alignment network tailored to video saliency prediction (VSP) is developed, comprising two key sub-networks: a multi-scale deformable convolutional alignment network (MDAN) and a bidirectional convolutional Long Short-Term Memory (Bi-ConvLSTM) network.
This paper proposes a novel plug-and-play scheme that weakly retrains a pretrained image saliency deep model for video data using newly sensed and coded temporal information, enabling the model to maintain temporal saliency awareness and achieve much improved detection performance.
A novel spatiotemporal network is advocated whose key innovation is the design of its temporal unit, which fully enables the computation of temporal saliency cues that interact with their spatial counterparts, boosting overall VSOD performance through mutual improvement of the two.
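The spatial-temporal interaction described above can be illustrated with a simple gating fusion, where temporal cues modulate the spatial saliency map. This is a generic sketch of such an interaction, not the paper's actual temporal unit; the function name and the sigmoid gating choice are assumptions.

```python
import numpy as np

def fuse_spatiotemporal(spatial, temporal):
    """Illustrative spatial-temporal interaction: temporal cues gate the
    spatial saliency map via a sigmoid.

    `spatial` and `temporal` are assumed to be same-shaped feature maps;
    this generic fusion stands in for the paper's dedicated temporal unit.
    """
    gate = 1.0 / (1.0 + np.exp(-temporal))  # sigmoid maps cues into (0, 1)
    return spatial * gate                   # temporal cues rescale spatial saliency
```

Because the gate lies in (0, 1), temporal evidence can suppress spatially salient but static regions while preserving moving ones, which is the intuition behind mutual spatial-temporal improvement.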