Question answering on chart images
These leaderboards are used to track progress in Chart Question Answering.
Use these libraries to find Chart Question Answering models and implementations.
No subtasks available.
Pix2Struct, a pretrained image-to-text model for purely visual language understanding, is presented; it can be finetuned on tasks containing visually situated language, and it introduces a variable-resolution input representation together with a more flexible integration of language and vision inputs.
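As a concrete starting point, here is a minimal sketch of querying a chart with Pix2Struct through the Hugging Face transformers API; the checkpoint name, image path, and question are illustrative assumptions, not details from the summary above.

```python
from PIL import Image
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

# Assumed checkpoint: a Pix2Struct model finetuned for chart QA.
processor = Pix2StructProcessor.from_pretrained("google/pix2struct-chartqa-base")
model = Pix2StructForConditionalGeneration.from_pretrained("google/pix2struct-chartqa-base")

image = Image.open("chart.png")  # hypothetical local chart image
# The processor renders the question into the variable-resolution visual input.
inputs = processor(images=image, text="What is the highest value?", return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=32)
print(processor.decode(outputs[0], skip_special_tokens=True))
```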
A novel framework leverages Structured Triplet Representations (STR) to achieve a unified and label-efficient approach to chart perception and reasoning; it applies to a range of downstream tasks beyond the question-answering setting studied in prior work.
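To make the triplet idea tangible, here is a hypothetical sketch of how chart content might be held as structured triplets and queried; the field names and schema are assumptions for illustration, not the STR paper's actual representation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChartTriplet:
    subject: str   # e.g. a bar's category label (hypothetical field)
    relation: str  # e.g. "has_value" (hypothetical field)
    value: str     # e.g. "42" (hypothetical field)

def answer(triplets: list[ChartTriplet], subject: str, relation: str) -> str | None:
    """Toy reasoning step: look up the value attached to a chart element."""
    for t in triplets:
        if t.subject == subject and t.relation == relation:
            return t.value
    return None

chart = [ChartTriplet("2021", "has_value", "42"),
         ChartTriplet("2022", "has_value", "57")]
print(answer(chart, "2022", "has_value"))  # -> 57
```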
PaLI-X, a multilingual vision and language model, advances the state of the art on most of the vision-and-language benchmarks considered and exhibits emergent capabilities, such as complex counting and multilingual object detection, tasks that are not explicitly in the training mix.
The Qwen-VL series, a set of large-scale vision-language models (LVLMs) designed to perceive and understand both text and images, sets new records for generalist models at similar model scales on a broad range of vision-centric benchmarks.
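A minimal sketch of asking Qwen-VL-Chat about a chart, following the published remote-code interface; since the chat() method is defined by the checkpoint's own code, its exact signature is an assumption here, as are the image path and question.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat", device_map="auto", trust_remote_code=True
).eval()

# from_list_format interleaves image references and text into one query.
query = tokenizer.from_list_format([
    {"image": "chart.png"},  # hypothetical local chart image
    {"text": "Which category has the largest share?"},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```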
This work introduces ScreenAI, a vision-language model specialized in UI and infographic understanding; it improves on the PaLI architecture with the flexible patching strategy of Pix2Struct and is trained on a unique mixture of datasets.
FigureQA is envisioned as a first step towards developing models that can intuitively recognize patterns from visual representations of data, and preliminary results indicate that the task poses a significant machine learning challenge.
DVQA is presented, a dataset that tests many aspects of bar-chart understanding in a question-answering framework, along with two strong baselines that perform considerably better than current VQA algorithms.
This work proposes a novel chart question answering (CQA) algorithm called Parallel Recurrent Fusion of Image and Language (PReFIL), which first learns bimodal embeddings by fusing question and image features and then intelligently aggregates these learned embeddings to answer the given question.
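To illustrate the fuse-then-aggregate idea, here is a simplified PyTorch sketch in the spirit of PReFIL: the question vector is tiled over two CNN feature maps, fused with 1x1 convolutions, and the fused features are aggregated by a recurrent network. All layer sizes, the two-branch setup, and the GRU aggregator are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Fuse a question vector with a CNN feature map via tiling + 1x1 convs
    (a simplified stand-in for PReFIL's fusion blocks)."""
    def __init__(self, img_ch: int, q_dim: int, out_ch: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(img_ch + q_dim, out_ch, kernel_size=1), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, kernel_size=1), nn.ReLU(),
        )

    def forward(self, img_feat, q):
        b, _, h, w = img_feat.shape
        q_tiled = q[:, :, None, None].expand(b, q.size(1), h, w)
        return self.conv(torch.cat([img_feat, q_tiled], dim=1))

class PReFILSketch(nn.Module):
    def __init__(self, q_dim=512, low_ch=128, high_ch=256, fused_ch=256,
                 hidden=512, n_answers=1000):
        super().__init__()
        self.fuse_low = FusionBlock(low_ch, q_dim, fused_ch)
        self.fuse_high = FusionBlock(high_ch, q_dim, fused_ch)
        # Recurrent aggregation: fused features are read as a sequence.
        self.agg = nn.GRU(fused_ch, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, n_answers)

    def forward(self, f_low, f_high, q):
        seq = torch.cat([
            self.fuse_low(f_low, q).flatten(2).transpose(1, 2),
            self.fuse_high(f_high, q).flatten(2).transpose(1, 2),
        ], dim=1)               # (B, total spatial positions, fused_ch)
        _, h_n = self.agg(seq)  # final hidden states of both directions
        pooled = torch.cat([h_n[0], h_n[1]], dim=-1)
        return self.classifier(pooled)
```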
This work proposes a new model that jointly learns classification and regression for chart question answering, using co-attention transformers to capture the complex real-world interactions between the question and the chart's textual elements.
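The joint classification/regression design can be sketched as two output heads over a co-attended representation. The single attention block, pooling, gating head, and all dimensions below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class JointHeadSketch(nn.Module):
    """Illustrative only: question tokens attend over chart text elements,
    then a classification head (fixed answer vocabulary) and a regression
    head (numeric answers) share the pooled representation."""
    def __init__(self, dim=256, n_classes=100):
        super().__init__()
        self.q_to_chart = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.cls_head = nn.Linear(dim, n_classes)  # categorical answers
        self.reg_head = nn.Linear(dim, 1)          # numeric answers
        self.gate = nn.Linear(dim, 1)              # which head to trust per question

    def forward(self, q_tokens, chart_tokens):
        attended, _ = self.q_to_chart(q_tokens, chart_tokens, chart_tokens)
        pooled = attended.mean(dim=1)
        return (self.cls_head(pooled),
                self.reg_head(pooled),
                torch.sigmoid(self.gate(pooled)))
```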
This work presents two transformer-based models that combine visual features and the chart's data table in a unified way to answer questions, achieving state-of-the-art results on the previous datasets as well as on the new benchmark.
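A minimal sketch of the unified-input idea: the chart's extracted data table is flattened to text and fed alongside the question to a sequence-to-sequence model. The plain T5 checkpoint and the serialization format are assumptions for illustration; an un-finetuned model will not answer correctly, and the paper's models additionally inject visual features, omitted here.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Hypothetical extracted data table, flattened row by row into plain text.
table = [("year", "sales"), ("2021", "42"), ("2022", "57")]
flat = " | ".join(" , ".join(row) for row in table)
prompt = f"question: Which year had higher sales? table: {flat}"

ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=8)
print(tok.decode(out[0], skip_special_tokens=True))
```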