3260 papers • 126 benchmarks • 313 datasets
These leaderboards are used to track progress in Multi-Modal Classification.
Use these libraries to find Multi-Modal Classification models and implementations.
This paper identifies two main causes for the performance drop observed when jointly trained multi-modal networks underperform their best uni-modal counterparts: first, multi-modal networks are often prone to overfitting due to their increased capacity; and second, different modalities overfit and generalize at different rates, so training them jointly with a single optimization strategy is sub-optimal.
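A minimal sketch of the per-modality weighting idea that follows from this observation, assuming a two-stream (video/audio) classifier; the encoders, dimensions, and loss weights below are illustrative placeholders, not the paper's actual method.

```python
# Sketch only: separate per-modality heads let each stream's loss be weighted
# independently instead of relying on a single joint objective.
import torch
import torch.nn as nn

class TwoStreamClassifier(nn.Module):
    def __init__(self, video_dim=512, audio_dim=128, num_classes=10):
        super().__init__()
        self.video_head = nn.Linear(video_dim, num_classes)              # video-only head
        self.audio_head = nn.Linear(audio_dim, num_classes)              # audio-only head
        self.joint_head = nn.Linear(video_dim + audio_dim, num_classes)  # fused head

    def forward(self, v, a):
        return (self.video_head(v),
                self.audio_head(a),
                self.joint_head(torch.cat([v, a], dim=-1)))

def blended_loss(logits_v, logits_a, logits_j, target, w_v=0.3, w_a=0.3, w_j=0.4):
    """Weight each modality's loss separately so a fast-overfitting stream can
    be down-weighted; the weights here are arbitrary placeholders."""
    ce = nn.functional.cross_entropy
    return w_v * ce(logits_v, target) + w_a * ce(logits_a, target) + w_j * ce(logits_j, target)
```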
This paper presents a novel multi-modal approach that fuses images and text descriptions to improve multi-modal classification performance in real-world scenarios, and evaluates it against two well-known multi-modal fusion strategies, namely early fusion and late fusion.
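For reference, a minimal sketch of what these two baseline strategies typically look like; the feature dimensions and the equal-weight averaging in the late-fusion head are assumptions, not the paper's exact setup.

```python
# Early fusion: combine features before classification.
# Late fusion: classify each modality separately, then combine predictions.
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    """Concatenate image and text features, then classify the joint vector."""
    def __init__(self, img_dim=2048, txt_dim=768, num_classes=20):
        super().__init__()
        self.classifier = nn.Linear(img_dim + txt_dim, num_classes)

    def forward(self, img_feat, txt_feat):
        return self.classifier(torch.cat([img_feat, txt_feat], dim=-1))

class LateFusion(nn.Module):
    """Run a classifier per modality, then average the class scores."""
    def __init__(self, img_dim=2048, txt_dim=768, num_classes=20):
        super().__init__()
        self.img_clf = nn.Linear(img_dim, num_classes)
        self.txt_clf = nn.Linear(txt_dim, num_classes)

    def forward(self, img_feat, txt_feat):
        # A learned or validation-tuned weighting is also common here.
        return 0.5 * self.img_clf(img_feat) + 0.5 * self.txt_clf(txt_feat)
```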
Integration of heterogeneous and high-dimensional data (e.g., multiomics) is becoming increasingly important. Existing multimodal classification algorithms mainly focus on improving performance by exploiting the complementarity of different modalities. However, conventional approaches offer little support for trustworthy multimodal fusion, especially in safety-critical applications (e.g., medical diagnosis). To address this issue, we propose a novel trustworthy multimodal classification algorithm termed Multimodal Dynamics, which dynamically evaluates both the feature-level and modality-level informativeness for different samples and thus integrates multiple modalities in a trustworthy manner. Specifically, a sparse gating mechanism is introduced to capture the information variation of each within-modality feature, and the true class probability is employed to assess the classification confidence of each modality. A transparent fusion algorithm based on this dynamic informativeness estimation strategy is then derived. To the best of our knowledge, this is the first work to jointly model both feature and modality variation across samples to provide trustworthy fusion in multi-modal classification. Extensive experiments are conducted on multimodal medical classification datasets, in which the superior performance and trustworthiness of our algorithm are clearly validated against state-of-the-art methods.
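A rough sketch of the two components described above, under simplifying assumptions: a learned sparse gate over within-modality features and a per-modality confidence score (here the maximum softmax probability stands in for the paper's true class probability) used to weight the fusion. None of the module names or shapes below come from the paper.

```python
# Per-modality branch with a feature-level gate, plus a confidence-weighted
# fusion of the branch logits.
import torch
import torch.nn as nn

class GatedModality(nn.Module):
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(in_dim, in_dim), nn.Sigmoid())  # per-feature gate
        self.encoder = nn.Linear(in_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        g = self.gate(x)                                   # feature informativeness in [0, 1]
        logits = self.classifier(torch.relu(self.encoder(g * x)))
        conf = logits.softmax(dim=-1).max(dim=-1, keepdim=True).values  # proxy confidence
        return logits, conf, g

def confidence_weighted_fusion(branch_outputs):
    """Weight each modality's logits by its normalized confidence."""
    logits = torch.stack([o[0] for o in branch_outputs], dim=0)  # (M, B, C)
    confs = torch.stack([o[1] for o in branch_outputs], dim=0)   # (M, B, 1)
    weights = confs / confs.sum(dim=0, keepdim=True)
    return (weights * logits).sum(dim=0)                         # (B, C)
```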
A Hindi-English code-mixed dataset is developed for multi-modal sarcasm detection and humor classification in conversational dialog, and a novel attention-rich neural architecture for utterance classification is proposed.
A plug-and-play loss function method is proposed, in which the feature space for each label is adaptively learned from training-set statistics; it yields remarkable performance improvements over the baselines, demonstrating its superiority in reducing the modality bias problem.
UAVM achieves a new state-of-the-art audio-visual event classification accuracy of 65.8% on VGGSound, and it exhibits a few intriguing properties that its modality-specific counterparts do not have.
The Contrastive Audio-Visual Masked Auto-Encoder (CAV-MAE) is proposed by combining contrastive learning and masked data modeling, two major self-supervised learning frameworks, to learn a joint and coordinated audio-visual representation.
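A conceptual sketch of how a contrastive term and a masked-reconstruction term can be combined into one objective; the loss balance, tensor shapes, and function names below are assumptions, not CAV-MAE's actual implementation.

```python
# Combining an InfoNCE-style audio-visual contrastive loss with an MAE-style
# masked reconstruction loss into a single training objective.
import torch
import torch.nn.functional as F

def contrastive_loss(audio_emb, video_emb, temperature=0.07):
    """Pull paired audio/video clips together, push mismatched pairs apart."""
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(video_emb, dim=-1)
    logits = a @ v.t() / temperature                      # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)    # matched pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

def masked_reconstruction_loss(pred_patches, true_patches, mask):
    """Mean squared error computed only on masked patches (mask == 1)."""
    per_patch = ((pred_patches - true_patches) ** 2).mean(dim=-1)
    return (per_patch * mask).sum() / mask.sum().clamp(min=1)

def joint_objective(audio_emb, video_emb, pred, target, mask, lam=0.01):
    # lam balances the two terms; the value here is an arbitrary placeholder.
    return masked_reconstruction_loss(pred, target, mask) + lam * contrastive_loss(audio_emb, video_emb)
```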
This work proposes a novel FAshion-focused Multi-task Efficient learning method for Vision-and-Language tasks (FAME-ViL), which applies a single model to multiple heterogeneous fashion tasks and is therefore much more parameter-efficient.
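A hedged illustration of the parameter-efficiency idea, i.e. one shared backbone with small task-specific heads; FAME-ViL's actual adapter design is more elaborate than this sketch, and the class and argument names below are invented for illustration.

```python
# One shared encoder; only the lightweight per-task heads add parameters.
import torch.nn as nn

class SharedBackboneMultiTask(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, task_num_classes: dict):
        super().__init__()
        self.backbone = backbone                      # one shared vision-and-language encoder
        self.heads = nn.ModuleDict({                  # small task-specific heads
            task: nn.Linear(feat_dim, n) for task, n in task_num_classes.items()
        })

    def forward(self, inputs, task: str):
        features = self.backbone(inputs)              # shared representation
        return self.heads[task](features)             # task-specific prediction
```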