Gaze prediction is the task of predicting where a person looks or will look, e.g., estimating saliency maps, fixation points, or scanpaths from images, egocentric video, or physiological signals such as EEG.
A new dataset and benchmark for evaluating gaze prediction from EEG measurements is presented, consisting of simultaneous electroencephalography (EEG) and eye-tracking recordings from 356 subjects collected across three experimental paradigms.
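A minimal sketch of what a gaze-regression baseline on such paired EEG/eye-tracking data could look like; the file names and array shapes are hypothetical placeholders, not the dataset's actual API.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Hypothetical paired recordings (placeholder file names):
# eeg:  (n_samples, n_channels, n_timesteps) EEG segments
# gaze: (n_samples, 2) on-screen (x, y) fixation coordinates
eeg = np.load("eeg_windows.npy")
gaze = np.load("gaze_targets.npy")

# Flatten each EEG window into a feature vector for a linear baseline.
X = eeg.reshape(len(eeg), -1)
X_train, X_test, y_train, y_test = train_test_split(
    X, gaze, test_size=0.2, random_state=0)

model = Ridge(alpha=1.0).fit(X_train, y_train)
pred = model.predict(X_test)

# Report mean Euclidean error in screen units (e.g., pixels or mm).
err = np.linalg.norm(pred - y_test, axis=1).mean()
print(f"mean gaze error: {err:.1f}")
```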
Through a combination of knowledge distillation and Fisher pruning, this paper obtains much more runtime-efficient architectures for saliency prediction, achieving a 10x speedup at the same AUC as a state-of-the-art network on the CAT2000 dataset.
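A minimal sketch of the two ingredients as they are commonly formulated; the function names and tensor shapes are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def saliency_distillation_loss(student_logits, teacher_logits):
    """KL divergence between teacher and student saliency distributions.

    Both inputs are (batch, 1, H, W) maps; each map is normalized into a
    spatial probability distribution before comparison.
    """
    b = student_logits.size(0)
    s = F.log_softmax(student_logits.view(b, -1), dim=1)
    t = F.softmax(teacher_logits.view(b, -1), dim=1)
    return F.kl_div(s, t, reduction="batchmean")

def fisher_channel_scores(activation, grad):
    """Fisher-style channel importance: accumulate the squared product of
    activations and their gradients per channel; channels with the smallest
    scores are candidates for pruning."""
    # activation, grad: (batch, C, H, W) from one forward/backward pass
    return (activation * grad).sum(dim=(2, 3)).pow(2).mean(dim=0)
```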
A hybrid model based on deep neural networks that integrates task-dependent attention transition with bottom-up saliency prediction is proposed; it significantly outperforms state-of-the-art gaze prediction methods and learns meaningful transitions of human attention.
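One plausible way to fuse such a bottom-up saliency stream with a transition stream, sketched below; the layer names, sizes, and gating scheme are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class HybridGazeModel(nn.Module):
    """Illustrative two-stream fusion: a bottom-up saliency stream plus a
    transition stream conditioned on the previous gaze map."""

    def __init__(self, feat_dim=256):
        super().__init__()
        self.saliency_head = nn.Conv2d(feat_dim, 1, 1)
        self.transition_head = nn.Conv2d(feat_dim + 1, 1, 3, padding=1)
        self.gate = nn.Conv2d(feat_dim, 1, 1)  # per-pixel fusion weight

    def forward(self, feats, prev_gaze):
        # feats: (B, feat_dim, H, W) frame features; prev_gaze: (B, 1, H, W)
        bottom_up = self.saliency_head(feats)
        transition = self.transition_head(torch.cat([feats, prev_gaze], dim=1))
        w = torch.sigmoid(self.gate(feats))
        return w * transition + (1 - w) * bottom_up  # fused gaze logits
```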
We introduce a new problem of gaze anticipation on egocentric videos. This substantially extends the conventional gaze prediction problem to future frames by no longer confining it to the current frame. To solve this problem, we propose a new generative adversarial network based model, Deep Future Gaze (DFG). DFG generates multiple future frames conditioned on a single current frame and anticipates the corresponding future gazes over the next few seconds. It consists of two networks: a generator and a discriminator. The generator uses a two-stream spatiotemporal convolution architecture (3D-CNN) that explicitly untangles the foreground from the background to generate future frames. A second 3D-CNN is then attached for gaze anticipation based on these synthetic frames. The discriminator plays against the generator by differentiating the generator's synthetic frames from real frames. Through this competition, the generator progressively improves the quality of the future frames and thus anticipates future gaze better. Experimental results on publicly available egocentric datasets show that DFG significantly outperforms all well-established baselines. Moreover, DFG achieves better gaze prediction on current frames than state-of-the-art methods, because frame generation forces it to learn motion-discriminative representations. We further contribute a new egocentric dataset (OST) for the object search task, on which DFG also achieves the best performance.
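A condensed sketch of the adversarial scheme described above. The modules G (generator), D (clip discriminator), and gaze_net (gaze head), and all shapes, are stand-ins for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def train_step(G, D, gaze_net, current, real_clip, real_gaze, opt_g, opt_d):
    """One adversarial update. G maps a single frame to a future clip,
    D scores clips as real/fake, gaze_net predicts gaze maps from clips."""
    fake_clip = G(current)  # generate future frames from one current frame

    # --- Discriminator: tell real future clips from generated ones. ---
    opt_d.zero_grad()
    real_score = D(real_clip)
    fake_score = D(fake_clip.detach())
    d_loss = (F.binary_cross_entropy_with_logits(real_score, torch.ones_like(real_score))
              + F.binary_cross_entropy_with_logits(fake_score, torch.zeros_like(fake_score)))
    d_loss.backward()
    opt_d.step()

    # --- Generator: fool D and predict gaze on its own synthetic frames. ---
    opt_g.zero_grad()
    fake_score = D(fake_clip)
    adv_loss = F.binary_cross_entropy_with_logits(fake_score, torch.ones_like(fake_score))
    gaze_loss = F.mse_loss(gaze_net(fake_clip), real_gaze)
    (adv_loss + gaze_loss).backward()
    opt_g.step()
    return d_loss.item(), (adv_loss + gaze_loss).item()
```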
A novel wUoC metric is proposed that can reveal the difference between boxes even when they share no overlapping area; experiments verify the superiority of the method in all three tracks, i.e., object detection, gaze estimation, and gaze object prediction.
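The paper's exact wUoC definition is not reproduced here; as an illustration of how enclosure-based metrics stay informative for disjoint boxes, below is the well-known generalized IoU (GIoU), which penalizes a pair of boxes by the empty area of their smallest enclosing box.

```python
def giou(box_a, box_b):
    """Generalized IoU for axis-aligned boxes (x1, y1, x2, y2).

    Unlike plain IoU, which is 0 for any pair of disjoint boxes, GIoU keeps
    decreasing as the boxes move apart, so it still ranks non-overlapping
    pairs by proximity.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Smallest box enclosing both ("closure").
    closure = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    return inter / union - (closure - union) / closure

# Two disjoint pairs: IoU is 0 for both, GIoU tells them apart.
print(giou((0, 0, 1, 1), (2, 0, 3, 1)))  # closer pair  -> approx -0.33
print(giou((0, 0, 1, 1), (5, 0, 6, 1)))  # farther pair -> approx -0.67
```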
The audio-visual landscape of social interactions is distilled into a number of multimodal patches that convey different social value, and the work operates within the general frame of foraging as a tradeoff between local patch exploitation and landscape exploration.
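As a toy illustration of that exploitation/exploration tradeoff (not the paper's model), the classic marginal-value rule says to leave a patch once its instantaneous gain rate drops below the long-run average rate across the landscape; the decay constants here are arbitrary.

```python
import math

def patch_gain(t, v0=1.0, decay=0.3):
    """Diminishing returns: instantaneous value of staying in a patch."""
    return v0 * math.exp(-decay * t)

def leave_time(avg_rate, v0=1.0, decay=0.3):
    """Exploit the current patch while it yields more than the landscape
    average; leave (i.e., start exploring) once it no longer does."""
    t = 0.0
    while patch_gain(t, v0, decay) > avg_rate:
        t += 0.01
    return t

print(f"leave after {leave_time(avg_rate=0.4):.2f} time units")
```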
Gazeformer is a novel model that outperforms existing target-detection models on standard gaze prediction for both target-present and target-absent search tasks, and is more than five times faster than the state-of-the-art target-present visual search model.
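A minimal sketch of a transformer that decodes a fixation sequence from image features conditioned on a target embedding (e.g., from a language model); the module names, shapes, and parallel-query decoding are assumptions for illustration, not Gazeformer's code.

```python
import torch
import torch.nn as nn

class ScanpathTransformer(nn.Module):
    """Illustrative target-conditioned scanpath decoder."""

    def __init__(self, d_model=256, n_fixations=8):
        super().__init__()
        # One learned query per predicted fixation.
        self.queries = nn.Parameter(torch.randn(n_fixations, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.to_xyt = nn.Linear(d_model, 3)  # (x, y, duration) per fixation

    def forward(self, img_tokens, target_emb):
        # img_tokens: (B, N, d_model) flattened image features
        # target_emb: (B, d_model) embedding of the target's name
        memory = img_tokens + target_emb.unsqueeze(1)  # condition on target
        q = self.queries.unsqueeze(0).expand(img_tokens.size(0), -1, -1)
        return self.to_xyt(self.decoder(q, memory))  # (B, n_fixations, 3)
```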