3260 papers • 126 benchmarks • 313 datasets
The ability to focus on points that are relevant for matching greatly improves performance: PREDATOR raises the rate of successful registrations by more than 20% in the low-overlap scenario, and also sets a new state of the art for the 3DMatch benchmark with 89% registration recall.
This work introduces a new local sparse attention layer that preserves two-dimensional geometry and locality in SAGAN and presents a novel way to invert Generative Adversarial Networks with attention.
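The local sparse attention idea above can be sketched as a masked self-attention in which each spatial position only attends within a small neighbourhood, so the layer respects two-dimensional geometry. This is a minimal illustrative sketch, not the paper's actual layer; the window size and shapes are assumptions.

```python
import numpy as np

def local_attention_2d(feats, window=3):
    """Toy local attention on an (H, W, C) feature map: each position
    attends only to positions inside a (window x window) neighbourhood,
    preserving 2D geometry and locality."""
    H, W, C = feats.shape
    flat = feats.reshape(H * W, C)
    scores = flat @ flat.T / np.sqrt(C)            # dot-product similarity
    # Mask out key positions outside each query's spatial window.
    ys, xs = np.divmod(np.arange(H * W), W)
    near = (np.abs(ys[:, None] - ys[None, :]) <= window // 2) & \
           (np.abs(xs[:, None] - xs[None, :]) <= window // 2)
    scores = np.where(near, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over the window
    return (weights @ flat).reshape(H, W, C)
```

In practice such a layer is built with gathered neighbourhood keys rather than a dense mask, but the masked form makes the locality constraint explicit.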
Tests of the proposed Deep Attention Recurrent Q-Network (DARQN) algorithm on multiple Atari 2600 games show a level of performance superior to that of DQN.
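The core of a DARQN-style model is soft attention over the convolutional feature map: each spatial location is scored against the recurrent hidden state, and the resulting context vector feeds the Q head. A minimal sketch, with shapes assumed for illustration:

```python
import numpy as np

def soft_attention_context(conv_feats, h):
    """Soft attention over L spatial locations of a conv feature map
    (rows of conv_feats, each of dimension D), scored against the
    recurrent hidden state h; returns the attention-weighted context."""
    scores = conv_feats @ h             # (L,) alignment scores
    w = np.exp(scores - scores.max())
    w /= w.sum()                        # softmax over locations
    return w @ conv_feats               # (D,) context fed to the Q head
```

Because the output is a convex combination of the location features, the agent effectively "looks at" a few screen regions per step instead of the whole frame.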
A fully differentiable, end-to-end trainable model that samples and processes only a fraction of the full-resolution input image; evaluated on three classification tasks, it reduces the computation and memory footprint by an order of magnitude while matching the accuracy of classical architectures.
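The sampling step can be pictured as drawing a few high-resolution patches from locations weighted by an attention map computed on a low-resolution view, so most of the full image is never touched. A hypothetical sketch (patch size, grid mapping, and sampling scheme are assumptions, not the paper's exact mechanics):

```python
import numpy as np

def sample_patches(image, attn_map, k=4, patch=8, rng=None):
    """Draw k full-resolution patches at locations sampled from a
    normalised attention map defined over a coarse (H, W) grid."""
    rng = rng if rng is not None else np.random.default_rng(0)
    H, W = attn_map.shape
    p = attn_map.ravel() / attn_map.sum()
    idx = rng.choice(H * W, size=k, replace=False, p=p)
    ys, xs = np.divmod(idx, W)
    # Map coarse grid cells onto full-resolution patch corners.
    sy, sx = image.shape[0] // H, image.shape[1] // W
    return [image[y * sy:y * sy + patch, x * sx:x * sx + patch]
            for y, x in zip(ys, xs)]
```

Only the k sampled patches are then run through the downstream feature network, which is where the order-of-magnitude savings come from.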
Large, fine-grained image segmentation datasets, annotated at pixel-level, are difficult to obtain, particularly in medical imaging, where annotations also require expert knowledge. Weakly-supervised learning can train models by relying on weaker forms of annotation, such as scribbles. Here, we learn to segment using scribble annotations in an adversarial game. With unpaired segmentation masks, we train a multi-scale GAN to generate realistic segmentation masks at multiple resolutions, while we use scribbles to learn their correct position in the image. Central to the model’s success is a novel attention gating mechanism, which we condition with adversarial signals to act as a shape prior, resulting in better object localization at multiple scales. Subject to adversarial conditioning, the segmentor learns attention maps that are semantic, suppress the noisy activations outside the objects, and reduce the vanishing gradient problem in the deeper layers of the segmentor. We evaluated our model on several medical (ACDC, LVSC, CHAOS) and non-medical (PPSS) datasets, and we report performance levels matching those achieved by models trained with fully annotated segmentation masks. We also demonstrate extensions in a variety of settings: semi-supervised learning; combining multiple scribble sources (a crowdsourcing scenario) and multi-task learning (combining scribble and mask supervision). We release expert-made scribble annotations for the ACDC dataset, and the code used for the experiments, at https://vios-s.github.io/multiscale-adversarial-attention-gates.
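An attention gate of the general kind described above modulates a feature map with a learned sigmoid mask computed jointly from the features and a gating signal, suppressing activations outside the object. A minimal sketch, assuming flattened feature maps and hypothetical projection matrices `W_f` and `W_g` (not the paper's exact parameterisation):

```python
import numpy as np

def attention_gate(feats, gating, W_f, W_g):
    """Gate an (L, C) feature map with an (L, G) gating signal via a
    learned sigmoid attention map alpha in (0, 1), one value per location."""
    s = feats @ W_f + gating @ W_g        # joint projection, shape (L, 1)
    alpha = 1.0 / (1.0 + np.exp(-s))      # sigmoid attention map
    return feats * alpha                  # suppress out-of-object activations
```

In the adversarial setting of the paper, the gating signal is what the discriminator's feedback conditions, pushing alpha to act as a shape prior.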
This paper proposes to incorporate attention learning as an additional objective in a person ReID network without changing the original structure, thus maintaining the same inference time and model size.
This paper compares four deep learning architectures for weather prediction on daily data gathered from 18 cities across Europe over a period of 15 years, and shows that a model using a multi-stream input representation, processing each lag individually, and combining a cascaded convolution with an LSTM forecasts better than the other compared models.
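The cascaded design above can be sketched as a per-lag convolutional feature extractor whose outputs are aggregated over time by an LSTM. All shapes and parameters below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step. W: (4H, D), U: (4H, H), b: (4H,)."""
    z = W @ x + U @ h + b
    i, f, g, o = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def forecast_features(lags, conv_kernel, W, U, b):
    """Convolve each lag (one day's readings) separately, then feed the
    per-lag features through an LSTM over time; returns the final state."""
    hidden = W.shape[0] // 4
    h, c = np.zeros(hidden), np.zeros(hidden)
    for lag in lags:                                     # one lag at a time
        feat = np.convolve(lag, conv_kernel, mode="valid")
        h, c = lstm_step(feat, h, c, W, U, b)
    return h                                             # fed to a regressor head
```

The point of the cascade is that convolution captures within-day spatial structure across cities while the LSTM captures the temporal dependency across lags.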
This work presents BRIDGE, a powerful sequential architecture for modeling dependencies between natural language questions and relational databases in cross-DB semantic parsing that effectively captures the desired cross-modal dependencies and has the potential to generalize to more text-DB related tasks.
The wavelet in transformer (WiT) network is proposed to address the image desnowing inverse problem; it combines the vision transformer with the discrete wavelet transform to achieve effective restoration of snow-degraded images.
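The wavelet side of this combination can be illustrated with a single-level 2D Haar transform, which splits an image into a low-frequency subband (LL) and three detail subbands (LH, HL, HH) that a transformer branch could process separately. This is the standard Haar DWT, not the WiT architecture itself:

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2D Haar DWT of an image with even side lengths:
    returns the low-frequency LL subband plus LH, HL, HH detail subbands."""
    a = (img[0::2] + img[1::2]) / 2      # row averages
    d = (img[0::2] - img[1::2]) / 2      # row differences
    LL = (a[:, 0::2] + a[:, 1::2]) / 2   # low-low: coarse approximation
    LH = (a[:, 0::2] - a[:, 1::2]) / 2   # horizontal detail
    HL = (d[:, 0::2] + d[:, 1::2]) / 2   # vertical detail
    HH = (d[:, 0::2] - d[:, 1::2]) / 2   # diagonal detail
    return LL, LH, HL, HH
```

Snow artifacts concentrate energy in the detail subbands, which is the intuition for treating them with a dedicated restoration branch.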
This paper presents the deep attention-based classification (DABC) network for robust single image depth prediction, in the context of the Robust Vision Challenge 2018 (ROB 2018), and employs a soft-weighted sum inference strategy for the final prediction.
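Soft-weighted-sum inference for a classification-based depth predictor is a standard trick: take the softmax over discretised depth bins and return the expected depth under that distribution, rather than the arg-max bin. A minimal sketch with assumed shapes (one row of logits per pixel):

```python
import numpy as np

def soft_weighted_depth(logits, bin_centers):
    """Convert per-pixel classification logits over K depth bins into a
    continuous depth estimate: expected bin center under the softmax."""
    p = np.exp(logits - logits.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)   # softmax over the K bins
    return p @ bin_centers               # expected depth per pixel
```

Compared with arg-max decoding, the expectation varies smoothly with the logits, which avoids quantisation steps in the predicted depth map.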