3260 papers • 126 benchmarks • 313 datasets
Classification with both source Image and Text
These leaderboards are used to track progress in classification with both source image and text.
Use these libraries to find models and implementations for classification with both source image and text.
Fine-grained image classification is a challenging task because of the hierarchical coarse-to-fine-grained distribution of the data. Object parts are generally used to discriminate among categories in fine-grained datasets; however, not all parts are beneficial or indispensable. In recent years, natural language descriptions have been used to obtain information about the discriminative parts of an object. This paper leverages natural language descriptions and proposes a strategy for learning a joint representation of descriptions and images, using a two-branch network with multiple layers, to improve fine-grained classification. Extensive experiments show that the approach yields significant accuracy gains on the fine-grained image classification task, and the method achieves new state-of-the-art results on the CUB-200-2011 dataset.
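The two-branch joint representation described above can be sketched minimally as follows. This is an illustrative outline, not the paper's implementation: the dimensions, the single-layer branches, and the additive fusion are all assumptions (the actual model uses multiple layers per branch).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes -- not taken from the paper.
IMG_DIM, TXT_DIM, JOINT_DIM = 512, 300, 128

# Each branch is sketched as one linear projection into a shared
# embedding space; the paper's branches have multiple layers.
W_img = rng.normal(scale=0.02, size=(IMG_DIM, JOINT_DIM))
W_txt = rng.normal(scale=0.02, size=(TXT_DIM, JOINT_DIM))

def joint_representation(img_feat, txt_feat):
    """Project both modalities and fuse them into one vector."""
    z_img = np.maximum(img_feat @ W_img, 0.0)  # image branch (ReLU)
    z_txt = np.maximum(txt_feat @ W_txt, 0.0)  # text branch (ReLU)
    return z_img + z_txt                       # simple additive fusion

img = rng.normal(size=IMG_DIM)  # e.g. CNN features of a bird image
txt = rng.normal(size=TXT_DIM)  # e.g. embedded natural language description
z = joint_representation(img, txt)
print(z.shape)  # (128,)
```

The fused vector would then feed a fine-grained classifier head; the fusion operator (sum, concatenation, bilinear pooling) is a design choice the sketch leaves open.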
This paper uses convolutional neural networks to define a multimodal deep learning architecture with a modality-agnostic shared representation of social media data, learning a joint representation across modalities.
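A modality-agnostic shared representation can be sketched as modality-specific encoders that map every input type into one common space, after which a single shared head is applied identically. The sizes, encoders, and classifier below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes; the paper's social-media model is larger.
TEXT_DIM, IMAGE_DIM, SHARED_DIM, N_CLASSES = 300, 512, 64, 4

# Modality-specific encoders project each input into the SAME space...
enc = {
    "text":  rng.normal(scale=0.02, size=(TEXT_DIM, SHARED_DIM)),
    "image": rng.normal(scale=0.02, size=(IMAGE_DIM, SHARED_DIM)),
}
# ...so one classifier head can be shared regardless of modality.
W_cls = rng.normal(scale=0.02, size=(SHARED_DIM, N_CLASSES))

def classify(modality, features):
    shared = np.tanh(features @ enc[modality])  # modality-agnostic code
    logits = shared @ W_cls                     # single shared head
    return int(np.argmax(logits))

text_pred = classify("text", rng.normal(size=TEXT_DIM))
image_pred = classify("image", rng.normal(size=IMAGE_DIM))
```

The point of the design is that everything after the encoders is modality-agnostic: adding a new modality only requires a new encoder into the shared space.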
This work utilizes VisualBERT -- intended to be the BERT of vision and language -- which was trained multimodally on images and captions, and applies ensemble learning to detect hate speech in multimodal memes.
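The ensemble step can be illustrated with soft voting: average the hatefulness probabilities predicted by several model variants and threshold the mean. The member probabilities below are made-up placeholders, not results from the paper.

```python
import numpy as np

# Hypothetical per-model probabilities that a meme is hateful, e.g. from
# several VisualBERT variants fine-tuned with different random seeds.
member_probs = np.array([0.62, 0.48, 0.71, 0.55])

# Soft voting: average the members' probabilities, then threshold.
ensemble_prob = member_probs.mean()
is_hateful = bool(ensemble_prob >= 0.5)

print(round(float(ensemble_prob), 3))  # 0.59
```

Averaging probabilities (rather than hard majority voting on labels) keeps each member's confidence in the decision, which typically smooths out individual models' errors.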
Harmonic-NAS is proposed, a framework for the joint optimization of unimodal backbones and multimodal fusion networks with hardware awareness on resource-constrained devices. Experiments demonstrate the superiority of Harmonic-NAS over state-of-the-art approaches, achieving up to a 10.9% accuracy improvement, a 1.91x latency reduction, and a 2.14x energy-efficiency gain.
Adding a benchmark result helps the community track progress.