3260 papers • 126 benchmarks • 313 datasets
Reasoning over multimodal inputs.
(Image credit: Papersgraph)
These leaderboards are used to track progress in Multimodal Reasoning.
This paper presents a data collection effort to correct the class with the highest error rate in SNLI-VE, re-evaluates an existing model on the corrected corpus, called SNLI-VE-2.0, and introduces e-SNLI-VE-2.0, which appends human-written natural language explanations to SNLI-VE-2.0.
This work proposes Dual Attention Networks (DANs), which jointly leverage visual and textual attention mechanisms to capture the fine-grained interplay between vision and language, and introduces two variants of DANs for multimodal reasoning and matching, respectively.
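As a rough illustration of the joint attention idea above, the following is a minimal PyTorch sketch (not the paper's implementation) of a single dual-attention step: one attention over visual region features and one over textual token features, both conditioned on a shared memory vector. All dimensions, module names, and the fusion step are assumptions made for the example.

```python
# Illustrative sketch of a dual-attention step: visual and textual attention
# conditioned on a shared memory vector (dimensions are arbitrary choices).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualAttentionStep(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.vis_score = nn.Linear(dim, 1)   # scores visual regions
        self.txt_score = nn.Linear(dim, 1)   # scores textual tokens
        self.vis_proj = nn.Linear(dim, dim)
        self.txt_proj = nn.Linear(dim, dim)
        self.mem_proj = nn.Linear(dim, dim)

    def attend(self, feats, memory, proj, score):
        # feats: (batch, n, dim); memory: (batch, dim)
        h = torch.tanh(proj(feats) + self.mem_proj(memory).unsqueeze(1))
        alpha = F.softmax(score(h).squeeze(-1), dim=-1)        # (batch, n)
        return torch.bmm(alpha.unsqueeze(1), feats).squeeze(1)  # weighted context

    def forward(self, visual_feats, text_feats, memory):
        v_ctx = self.attend(visual_feats, memory, self.vis_proj, self.vis_score)
        t_ctx = self.attend(text_feats, memory, self.txt_proj, self.txt_score)
        return memory + v_ctx * t_ctx  # joint update of the shared memory

# toy usage
step = DualAttentionStep(dim=64)
v = torch.randn(2, 36, 64)   # e.g. 36 region features
t = torch.randn(2, 20, 64)   # e.g. 20 token features
m = torch.randn(2, 64)
m = step(v, t, m)
```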
This work introduces WEBQA, a challenging new benchmark that proves difficult for large-scale state-of-the-art models, which lack language-groundable visual representations for novel objects and the ability to reason, yet trivial for humans.
This work proposes MarT, a novel model-agnostic multimodal analogical reasoning framework with Transformer, motivated by structure-mapping theory, which achieves better performance than baseline approaches.
Graph-of-Thought reasoning is proposed, which models human thought processes not only as a chain but also as a graph; it achieves significant improvement over the strong CoT baseline on the AQUA-RAT test set, boosting accuracy from 85.19% to 87.59% with the T5-base model.
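To make the chain-versus-graph distinction concrete, here is a small, purely illustrative Python sketch (not the paper's pipeline) in which reasoning steps are nodes of a dependency graph and each step is resolved only after all of its parents; the step names are hypothetical.

```python
# Illustrative graph-of-thought structure: reasoning steps form a dependency
# graph rather than a single chain, and are processed in topological order.
from collections import deque

def topological_order(edges, nodes):
    """Return nodes in an order that respects the dependency edges (u -> v)."""
    indeg = {n: 0 for n in nodes}
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in adj[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return order

# Hypothetical thought nodes and their dependencies (a graph, not just a chain):
nodes = ["read question", "extract quantities", "recall formula", "compute", "answer"]
edges = [("read question", "extract quantities"),
         ("read question", "recall formula"),
         ("extract quantities", "compute"),
         ("recall formula", "compute"),
         ("compute", "answer")]

for step in topological_order(edges, nodes):
    print("resolve:", step)   # each step sees the results of all its parents
```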
This paper presents a new dataset, AlgoPuzzleVQA, designed to challenge and evaluate the capabilities of multimodal language models in solving algorithmic puzzles that necessitate visual understanding, language understanding, and complex algorithmic reasoning.
A systematic analysis finds that the main bottlenecks of GPT-4V are weaker visual perception and inductive reasoning abilities; the authors hope these findings shed light on the limitations of large multimodal models and how they can better emulate human cognitive processes in the future.
This work proposes DMRM, a novel and more powerful Dual-channel Multi-hop Reasoning Model for Visual Dialog, which synchronously captures information from the dialog history and the image to enrich the semantic representation of the question through dual-channel reasoning.
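The following is a minimal sketch of the dual-channel, multi-hop idea described above, not DMRM's actual architecture: one channel attends over dialog-history features and one over image features, and the question representation is refined over several hops. Layer sizes, names, and the fusion step are illustrative assumptions.

```python
# Illustrative dual-channel, multi-hop question enrichment (assumed dimensions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Channel(nn.Module):
    """Attends over a set of context features conditioned on the current query."""
    def __init__(self, dim):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.c_proj = nn.Linear(dim, dim)
        self.score = nn.Linear(dim, 1)

    def forward(self, query, context):
        # query: (B, D); context: (B, N, D)
        h = torch.tanh(self.c_proj(context) + self.q_proj(query).unsqueeze(1))
        alpha = F.softmax(self.score(h).squeeze(-1), dim=-1)
        return torch.bmm(alpha.unsqueeze(1), context).squeeze(1)

class DualChannelMultiHop(nn.Module):
    def __init__(self, dim, hops=2):
        super().__init__()
        self.hops = hops
        self.history_channel = Channel(dim)
        self.image_channel = Channel(dim)
        self.update = nn.Linear(3 * dim, dim)

    def forward(self, question, history_feats, image_feats):
        q = question
        for _ in range(self.hops):
            h_ctx = self.history_channel(q, history_feats)  # dialog-history channel
            v_ctx = self.image_channel(q, image_feats)      # image channel
            q = torch.tanh(self.update(torch.cat([q, h_ctx, v_ctx], dim=-1)))
        return q  # enriched question representation

# toy usage
model = DualChannelMultiHop(dim=64, hops=2)
q = torch.randn(2, 64)
hist = torch.randn(2, 10, 64)   # e.g. 10 dialog-history rounds
img = torch.randn(2, 36, 64)    # e.g. 36 image regions
enriched = model(q, hist, img)
```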
This work improves the performance of existing multimodal approaches beyond simple fine-tuning, showing the effectiveness of upsampling contrastive examples to encourage multimodality and of cross-validation-based ensemble learning to improve robustness.
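A minimal sketch of the two ideas in this entry, under assumed inputs: (1) upsample the contrastive examples so the model cannot rely on a single modality, and (2) ensemble models trained on cross-validation folds. The fused features, the contrastive-example flag, and the classifier choice are illustrative placeholders, not the paper's setup.

```python
# Illustrative upsampling of contrastive examples + cross-validation ensemble.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

def upsample_contrastive(X, y, is_contrastive, factor=3):
    """Repeat the contrastive examples `factor` times in total."""
    idx = np.where(is_contrastive)[0]
    rep = np.repeat(idx, factor - 1)
    return np.concatenate([X, X[rep]]), np.concatenate([y, y[rep]])

def cv_ensemble_predict(X, y, X_test, n_splits=5):
    """Average predicted probabilities of models trained on each CV fold."""
    probs = []
    for train_idx, _ in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(X):
        clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        probs.append(clf.predict_proba(X_test)[:, 1])
    return np.mean(probs, axis=0)

# toy usage with random stand-in features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))               # stand-in for fused image+text features
y = rng.integers(0, 2, size=200)
is_contrastive = rng.random(200) < 0.2       # stand-in for a contrastive-example flag
X_up, y_up = upsample_contrastive(X, y, is_contrastive)
scores = cv_ensemble_predict(X_up, y_up, rng.normal(size=(10, 16)))
```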
UniT, a Unified Transformer model to simultaneously learn the most prominent tasks across different domains, ranging from object detection to natural language understanding and multimodal reasoning, achieves strong performance on each task with significantly fewer parameters.
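As a rough illustration of the shared-backbone, multi-task idea (not UniT's actual architecture, which uses separate image and text encoders with a joint decoder), here is a minimal PyTorch sketch of one transformer encoder serving several tasks through small task-specific heads; the task names and dimensions are assumptions.

```python
# Illustrative shared transformer backbone with per-task heads (assumed setup).
import torch
import torch.nn as nn

class SharedMultiTaskModel(nn.Module):
    def __init__(self, dim=64, num_layers=2, task_output_sizes=None):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # one lightweight head per task on top of the shared representation
        self.heads = nn.ModuleDict({
            task: nn.Linear(dim, out_dim)
            for task, out_dim in (task_output_sizes or {}).items()
        })

    def forward(self, x, task):
        pooled = self.encoder(x).mean(dim=1)   # shared encoding, mean-pooled
        return self.heads[task](pooled)        # task-specific prediction

# toy usage: the same backbone serves two hypothetical tasks
model = SharedMultiTaskModel(task_output_sizes={"vqa": 10, "sentiment": 2})
tokens = torch.randn(2, 12, 64)                # stand-in for embedded multimodal tokens
vqa_logits = model(tokens, task="vqa")
sentiment_logits = model(tokens, task="sentiment")
```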