3260 papers • 126 benchmarks • 313 datasets
These leaderboards are used to track progress in Video Saliency Detection.
Use these libraries to find Video Saliency Detection models and implementations.
This work proposes an approach based on a convolutional neural network pre-trained on a large-scale image classification task. The model achieves competitive and consistent results across multiple evaluation metrics on two public saliency benchmarks, and its effectiveness is further demonstrated on five datasets and selected examples.
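As a rough illustration of this transfer-learning setup, the sketch below pairs an ImageNet-pretrained backbone with a small readout head that emits a single-channel saliency map. The backbone choice (`resnet18`), the class name, and the readout design are assumptions for illustration, not the paper's actual network.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

class TransferSaliencyNet(nn.Module):
    """Illustrative sketch: reuse ImageNet-pretrained classification
    features for saliency prediction via a small conv readout."""

    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
        # Keep the convolutional stages; drop global pooling and the
        # classification head.
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.readout = nn.Conv2d(512, 1, kernel_size=1)

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        fmap = self.features(frame)              # (B, 512, H/32, W/32)
        sal = self.readout(fmap)                 # (B, 1, H/32, W/32)
        # Upsample to input resolution and squash to [0, 1].
        sal = nn.functional.interpolate(
            sal, size=frame.shape[-2:], mode="bilinear",
            align_corners=False)
        return torch.sigmoid(sal)
```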
This paper modifies an existing neural network architecture for static saliency prediction with two types of recurrence that integrate information from the temporal domain: a ConvLSTM and a conceptually simple exponential moving average of an internal convolutional state.
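A minimal sketch of the exponential-moving-average recurrence, assuming a fixed decay factor; the class name `EMARecurrence` and the hyperparameter `alpha` are illustrative (the paper's variant may, for example, learn the decay).

```python
import torch
import torch.nn as nn

class EMARecurrence(nn.Module):
    """Illustrative sketch: blend each frame's feature map with a
    running state, giving the network a cheap temporal memory."""

    def __init__(self, alpha: float = 0.9):
        super().__init__()
        self.alpha = alpha
        self.state = None  # running average; reset per video clip

    def reset(self):
        self.state = None

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, channels, height, width) for one frame
        if self.state is None:
            self.state = features
        else:
            self.state = (self.alpha * self.state
                          + (1.0 - self.alpha) * features)
        return self.state
```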
UNISAL achieves state-of-the-art performance on all video saliency datasets and is on par with the state of the art on image saliency datasets, despite a faster runtime and a 5- to 20-fold smaller model size compared to all competing deep methods.
This work introduces a new benchmark for predicting human eye movements during dynamic scene free-viewing, and proposes a novel video saliency model that augments the CNN-LSTM network architecture with an attention mechanism to enable fast, end-to-end saliency learning.
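A minimal sketch of how an attention mechanism might be grafted onto a CNN-LSTM saliency pipeline: a soft spatial attention map reweights per-frame features before the recurrent step. Layer names and sizes are assumptions; in particular, a plain LSTM over pooled features stands in for the model's actual spatiotemporal recurrence.

```python
import torch
import torch.nn as nn

class AttentiveCNNLSTM(nn.Module):
    """Illustrative sketch: soft spatial attention reweights CNN
    features before a recurrent layer aggregates them over time."""

    def __init__(self, channels: int = 64, hidden: int = 64):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),  # per-pixel attention weights in [0, 1]
        )
        self.lstm = nn.LSTM(channels, hidden, batch_first=True)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, time, channels, height, width)
        b, t, c, h, w = features.shape
        x = features.reshape(b * t, c, h, w)
        x = x * self.attention(x)                # attention-weighted
        x = x.mean(dim=(2, 3)).reshape(b, t, c)  # pool to (B, T, C)
        out, _ = self.lstm(x)
        return out
```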
This work proposes an end-to-end dilated inception network (DINet) for visual saliency prediction that captures multi-scale contextual features effectively with very few extra parameters, and further improves performance by using a set of linear normalization-based probability distribution distance metrics as loss functions.
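One plausible instance of such a loss is the KL divergence between saliency maps that have been linearly normalized into probability distributions; the sketch below shows that general form. The function name and the `eps` smoothing term are illustrative, and the paper combines several such metrics.

```python
import torch

def kl_saliency_loss(pred: torch.Tensor, target: torch.Tensor,
                     eps: float = 1e-8) -> torch.Tensor:
    """Illustrative sketch: linearly normalize both maps to sum to 1
    over the frame, then compare them with KL divergence."""
    pred = pred.flatten(1)
    target = target.flatten(1)
    pred = pred / (pred.sum(dim=1, keepdim=True) + eps)
    target = target / (target.sum(dim=1, keepdim=True) + eps)
    # KL(target || pred), averaged over the batch.
    kl = (target * torch.log((target + eps) / (pred + eps))).sum(dim=1)
    return kl.mean()
```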
TASED-Net significantly outperforms previous state-of-the-art approaches on all three major large-scale video saliency datasets (DHF1K, Hollywood2, and UCF Sports) and is especially better at attending to salient moving objects.
An effective spatiotemporal feature alignment network tailored to video saliency prediction (VSP) is developed, built mainly around two key sub-networks: a multi-scale deformable convolutional alignment network (MDAN) and a bidirectional convolutional Long Short-Term Memory (Bi-ConvLSTM) network.
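A minimal sketch of the deformable-alignment idea, assuming offsets are predicted from a concatenated pair of feature maps so a neighboring frame can be warped toward the reference frame. It relies on `torchvision.ops.DeformConv2d`; the layer names and sizes are assumptions, and this is not the paper's MDAN.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableAlign(nn.Module):
    """Illustrative sketch: align a neighboring frame's features to a
    reference frame with a deformable convolution."""

    def __init__(self, channels: int = 64, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # Predict an (x, y) offset for each kernel sampling location.
        self.offset_conv = nn.Conv2d(
            2 * channels, 2 * kernel_size * kernel_size,
            kernel_size, padding=pad)
        self.deform_conv = DeformConv2d(
            channels, channels, kernel_size, padding=pad)

    def forward(self, neighbor: torch.Tensor,
                reference: torch.Tensor) -> torch.Tensor:
        offsets = self.offset_conv(torch.cat([neighbor, reference], dim=1))
        return self.deform_conv(neighbor, offsets)
```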
This paper proposes a novel plug-and-play scheme that weakly retrains a pretrained image saliency deep model on video data using newly sensed and coded temporal information, enabling the model to maintain temporal saliency awareness and achieve much improved detection performance.