A novel framework is proposed that leverages Structured Triplet Representations (STR) to achieve a unified, label-efficient approach to chart perception and reasoning, applicable to a range of downstream tasks beyond the question answering studied in peer works.
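To make the triplet idea concrete, here is a minimal sketch of a triplet-style chart representation; the `Triplet` fields and the `chart_to_triplets` helper are illustrative assumptions, not the paper's actual STR schema.

```python
from dataclasses import dataclass

@dataclass
class Triplet:
    """One (subject, relation, object) fact extracted from a chart.

    Hypothetical schema for illustration; the real STR format may differ.
    """
    subject: str
    relation: str
    obj: str

def chart_to_triplets(series_name, categories, values):
    """Flatten one data series of a bar chart into structured triplets."""
    return [Triplet(series_name, cat, str(val))
            for cat, val in zip(categories, values)]

# A bar chart "Revenue by quarter" becomes a list of structured facts
# that downstream tasks (QA, summarization, re-plotting) can all consume.
facts = chart_to_triplets("Revenue", ["Q1", "Q2", "Q3"], [1.2, 1.5, 1.8])
for t in facts:
    print(t)
```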
A large-scale MultiModal ChartInstruction (MMC-Instruction) dataset of 600k instances covering diverse tasks and chart types is introduced, together with an instruction-tuning methodology and a benchmark to advance multimodal chart understanding.
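For illustration, a single instruction-tuning instance might look like the following; the field names here are hypothetical and need not match MMC-Instruction's actual schema.

```python
import json

# Hypothetical shape of one instruction-tuning instance; the actual
# MMC-Instruction fields may differ.
instance = {
    "image": "charts/line_0001.png",   # rendered chart image
    "task": "chart_summarization",     # one of the supported task types
    "instruction": "Summarize the main trend shown in this chart.",
    "response": "Sales rise steadily from January to June, peaking in May.",
}
print(json.dumps(instance, indent=2))
```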
ChartGalaxy, a million-scale dataset designed to advance the understanding and generation of infographic charts, is introduced, providing a useful resource for enhancing multimodal reasoning and generation in LVLMs.
Understanding infographic charts with design-driven visual elements (e.g., pictograms, icons) requires both visual recognition and reasoning, posing challenges for multimodal large language models (MLLMs). However, existing visual question answering benchmarks fall short in evaluating these capabilities of MLLMs due to the lack of paired plain charts and visual-element-based questions. To bridge this gap, we introduce InfoChartQA, a benchmark for evaluating MLLMs on infographic chart understanding. It includes 5,642 pairs of infographic and plain charts, each sharing the same underlying data but differing in visual presentation. We further design visual-element-based questions to capture their unique visual designs and communicative intent. Evaluation of 20 MLLMs reveals a substantial performance decline on infographic charts, particularly for visual-element-based questions related to metaphors. The paired infographic and plain charts enable fine-grained error analysis and ablation studies, which highlight new opportunities for advancing MLLMs in infographic chart understanding. We release InfoChartQA at https://github.com/CoolDawnAnt/InfoChartQA.
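One way to exploit the paired design is to score the same model on both chart versions and compare; in this sketch, `model` and the `pairs` iterator are placeholders, not the benchmark's released API.

```python
def paired_accuracy(model, pairs):
    """Compare a model's accuracy on infographic vs. paired plain charts.

    `pairs` yields (infographic_image, plain_image, question, answer);
    `model` is any callable (image, question) -> answer string. Both are
    placeholders standing in for the benchmark's actual interface.
    """
    hits = {"infographic": 0, "plain": 0}
    total = 0
    for info_img, plain_img, question, answer in pairs:
        total += 1
        if model(info_img, question).strip() == answer:
            hits["infographic"] += 1
        if model(plain_img, question).strip() == answer:
            hits["plain"] += 1
    # Because each pair shares the same underlying data, the gap between
    # the two rates isolates the cost of infographic styling.
    return {k: v / total for k, v in hits.items()}
```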
DVQA, a dataset that tests many aspects of bar chart understanding in a question answering framework, is presented, and two strong baselines are proposed that perform considerably better than current VQA algorithms.
The proposed framework can significantly reduce the manual effort involved in chart analysis, providing a step towards a universal chart understanding model, and it offers plug-and-play integration with mainstream LLMs such as T5 and TaPas, extending their capability to chart comprehension tasks.
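As a rough sketch of such plug-and-play integration, a chart-to-table perception stage could hand its linearized table to an off-the-shelf T5 via Hugging Face transformers; the `answer_from_extracted_table` helper and the prompt format are assumptions, not the framework's actual interface.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

def answer_from_extracted_table(table_text: str, question: str) -> str:
    """Feed a linearized chart table plus a question to a vanilla T5.

    `table_text` is assumed to come from a chart-to-table extraction step
    (the perception stage); here it is just a string such as
    "Quarter | Revenue \n Q1 | 1.2 \n Q2 | 1.5".
    """
    tokenizer = T5Tokenizer.from_pretrained("t5-base")
    model = T5ForConditionalGeneration.from_pretrained("t5-base")
    prompt = f"question: {question} table: {table_text}"
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=32)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```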
It is found that pretraining the model on a large corpus with chart-specific low- and high-level tasks, followed by finetuning, results in state-of-the-art performance on three downstream tasks.
Compared to the popular BLIP-2, MiniGPT4, and LLaVA, Vary maintains its vanilla capabilities while offering finer-grained perception and understanding, and it is competent at new document parsing features such as OCR and markdown conversion.
A novel Patch-and-Text Prediction (PTP) objective is proposed, which masks and recovers both image patches of screenshots and the text within them, and can significantly reduce perplexity by utilizing the screenshot context.
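A minimal PyTorch sketch of a joint masked-patch and masked-text loss in the spirit of PTP follows; the tensor shapes, the `ptp_loss` signature, and the simple sum of the two terms are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def ptp_loss(patch_pred, patch_target, patch_mask,
             text_logits, text_target, text_mask):
    """Joint masked-prediction loss in the spirit of PTP (illustrative only).

    patch_pred / patch_target: (B, N, D) pixel features for N image patches.
    patch_mask: (B, N) bool, True where a patch was masked out.
    text_logits: (B, T, V) token logits; text_target: (B, T) token ids.
    text_mask: (B, T) bool, True where a text token was masked out.
    """
    # Reconstruct only the masked screenshot patches (regression term).
    patch_loss = F.mse_loss(patch_pred[patch_mask], patch_target[patch_mask])
    # Predict only the masked text tokens (classification term).
    text_loss = F.cross_entropy(text_logits[text_mask], text_target[text_mask])
    # Equal weighting is an assumption; the paper may balance terms differently.
    return patch_loss + text_loss
```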