3260 papers • 126 benchmarks • 313 datasets
Classification with both source Image and Text
These leaderboards are used to track progress in classification with both source image and text.
Use these libraries to find models and implementations for classification with both source image and text.
Fine-grained image classification is a challenging task because of the hierarchical coarse-to-fine-grained distribution of the data. Object parts are generally used to discriminate among categories in fine-grained datasets; however, not all parts are beneficial or indispensable. In recent years, natural language descriptions have been used to obtain information about the discriminative parts of an object. This paper leverages natural language descriptions and proposes a strategy for learning a joint representation of descriptions and images, using a two-branch network with multiple layers, to improve fine-grained classification. Extensive experiments show that the approach yields significant accuracy gains on the fine-grained image classification task, and the method achieves new state-of-the-art results on the CUB-200-2011 dataset.
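The two-branch joint representation described above can be sketched minimally as follows. This is an illustrative outline, not the paper's implementation: the dimensions, the single-layer branches, and the additive fusion are all assumptions (the actual model uses multiple layers per branch).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes -- not taken from the paper.
IMG_DIM, TXT_DIM, JOINT_DIM = 512, 300, 128

# Each branch is sketched as one linear projection into a shared
# embedding space; the paper's branches have multiple layers.
W_img = rng.normal(scale=0.02, size=(IMG_DIM, JOINT_DIM))
W_txt = rng.normal(scale=0.02, size=(TXT_DIM, JOINT_DIM))

def joint_representation(img_feat, txt_feat):
    """Project both modalities and fuse them into one vector."""
    z_img = np.maximum(img_feat @ W_img, 0.0)  # image branch (ReLU)
    z_txt = np.maximum(txt_feat @ W_txt, 0.0)  # text branch (ReLU)
    return z_img + z_txt                       # simple additive fusion

img = rng.normal(size=IMG_DIM)  # e.g. CNN features of a bird image
txt = rng.normal(size=TXT_DIM)  # e.g. embedded natural language description
z = joint_representation(img, txt)
print(z.shape)  # (128,)
```

The fused vector would then feed a fine-grained classifier head; the fusion operator (sum, concatenation, bilinear pooling) is a design choice the sketch leaves open.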
This paper uses convolutional neural networks to define a multimodal deep learning architecture with a modality-agnostic shared representation of social media data, learning a joint representation across modalities.
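A modality-agnostic shared representation can be sketched as modality-specific encoders that map every input type into one common space, after which a single shared head is applied identically. The sizes, encoders, and classifier below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes; the paper's social-media model is larger.
TEXT_DIM, IMAGE_DIM, SHARED_DIM, N_CLASSES = 300, 512, 64, 4

# Modality-specific encoders project each input into the SAME space...
enc = {
    "text":  rng.normal(scale=0.02, size=(TEXT_DIM, SHARED_DIM)),
    "image": rng.normal(scale=0.02, size=(IMAGE_DIM, SHARED_DIM)),
}
# ...so one classifier head can be shared regardless of modality.
W_cls = rng.normal(scale=0.02, size=(SHARED_DIM, N_CLASSES))

def classify(modality, features):
    shared = np.tanh(features @ enc[modality])  # modality-agnostic code
    logits = shared @ W_cls                     # single shared head
    return int(np.argmax(logits))

text_pred = classify("text", rng.normal(size=TEXT_DIM))
image_pred = classify("image", rng.normal(size=IMAGE_DIM))
```

The point of the design is that everything after the encoders is modality-agnostic: adding a new modality only requires a new encoder into the shared space.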
This work utilizes VisualBERT -- intended to be the BERT of vision and language -- which was trained multimodally on images and captions, and applies ensemble learning to detect hate speech in multimodal memes.
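The ensemble step can be illustrated with soft voting: average the hatefulness probabilities predicted by several model variants and threshold the mean. The member probabilities below are made-up placeholders, not results from the paper.

```python
import numpy as np

# Hypothetical per-model probabilities that a meme is hateful, e.g. from
# several VisualBERT variants fine-tuned with different random seeds.
member_probs = np.array([0.62, 0.48, 0.71, 0.55])

# Soft voting: average the members' probabilities, then threshold.
ensemble_prob = member_probs.mean()
is_hateful = bool(ensemble_prob >= 0.5)

print(round(float(ensemble_prob), 3))  # 0.59
```

Averaging probabilities (rather than hard majority voting on labels) keeps each member's confidence in the decision, which typically smooths out individual models' errors.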
Harmonic-NAS is proposed, a framework for the joint optimization of unimodal backbones and multimodal fusion networks with hardware awareness on resource-constrained devices. Experiments demonstrate the superiority of Harmonic-NAS over state-of-the-art approaches, achieving up to a 10.9% accuracy improvement, a 1.91x latency reduction, and a 2.14x energy-efficiency gain.
Adding a benchmark result helps the community track progress.