3260 papers • 126 benchmarks • 313 datasets
This work introduces Mini-Gemini, a simple and effective framework for enhancing multi-modality Vision Language Models (VLMs); it proposes to utilize an additional visual encoder for high-resolution refinement of the visual tokens without increasing the visual token count.
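A minimal PyTorch sketch of this dual-encoder idea follows, assuming a cross-attention refinement step in which the low-resolution tokens act as queries over high-resolution patch features, so the output token count stays fixed; the module name, dimensions, and token counts are illustrative assumptions, not Mini-Gemini's actual implementation.

import torch
import torch.nn as nn

class HighResRefiner(nn.Module):
    # Hypothetical refinement block: the output token count is set by the
    # low-resolution stream, so the language model never sees more visual tokens.
    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, lowres_tokens, highres_feats):
        # lowres_tokens: (B, N_low, dim)  -- tokens actually passed to the LLM
        # highres_feats: (B, N_high, dim) -- dense features from the extra high-res encoder
        refined, _ = self.cross_attn(query=lowres_tokens, key=highres_feats, value=highres_feats)
        return self.norm(lowres_tokens + refined)  # residual keeps the low-res tokens as the backbone

refiner = HighResRefiner()
low = torch.randn(1, 576, 768)    # e.g. 24x24 tokens from the base visual encoder
high = torch.randn(1, 2304, 768)  # e.g. 48x48 patches from the high-resolution encoder
print(refiner(low, high).shape)   # torch.Size([1, 576, 768]) -- token count unchanged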
This paper proposes ArtGPT-4, a pioneering large vision-language model tailored to address the limitations of existing models in artistic comprehension, and shows that it can render images with an artistic understanding and convey the emotions they inspire, mirroring human interpretation.
This work introduces JourneyDB, a comprehensive dataset for generative images in the context of multi-modal visual understanding, together with an external subset containing outputs from another 22 text-to-image generative models, making JourneyDB a comprehensive benchmark for evaluating the comprehension of generated images.
The resulting model, named HIPIE, tackles HIerarchical, oPen-vocabulary, and unIvErsal segmentation tasks within a unified framework and achieves state-of-the-art results at various levels of image comprehension.
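Open-vocabulary segmentation of this kind is commonly realized by scoring region or mask embeddings against text embeddings of arbitrary class names; the sketch below shows that generic pattern with cosine similarity and is not HIPIE's actual code (the function and tensors are stand-ins).

import torch
import torch.nn.functional as F

def classify_masks(mask_embeds, text_embeds, temperature=0.07):
    # mask_embeds: (num_masks, dim)   -- one embedding per predicted region/mask
    # text_embeds: (num_classes, dim) -- embeddings of free-form class names
    mask_embeds = F.normalize(mask_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    logits = mask_embeds @ text_embeds.t() / temperature
    return logits.softmax(dim=-1)   # per-mask distribution over the supplied vocabulary

# Random embeddings stand in for real mask and text encoder outputs.
probs = classify_masks(torch.randn(10, 512), torch.randn(3, 512))
print(probs.shape)  # torch.Size([10, 3])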
Experimental results verify that freezing the Q-Former preserves the image comprehension capability of BLIP-2 while gaining comprehension of the newly introduced point cloud modality and regional objects.
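The freezing strategy described above can be sketched in a few lines of PyTorch; the modules below are placeholders standing in for the pretrained BLIP-2 Q-Former and the newly added point-cloud encoder, not the paper's code.

import torch
import torch.nn as nn

# Placeholder modules: a stand-in Q-Former and a new point-cloud encoder.
qformer = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True), num_layers=2
)
point_cloud_encoder = nn.Sequential(nn.Linear(3, 768), nn.ReLU(), nn.Linear(768, 768))

# Freeze the Q-Former so its image comprehension capability is preserved.
for param in qformer.parameters():
    param.requires_grad = False

# Only the new modality encoder receives gradient updates.
optimizer = torch.optim.AdamW(point_cloud_encoder.parameters(), lr=1e-4)

points = torch.randn(2, 1024, 3)       # (batch, num_points, xyz)
tokens = point_cloud_encoder(points)   # project points into the Q-Former feature space
fused = qformer(tokens)                # frozen weights still run in the forward pass
loss = fused.mean()                    # dummy objective, just to show gradient flow
loss.backward()
optimizer.step()

# Gradients reach the new encoder but not the frozen Q-Former.
print(any(p.grad is not None for p in point_cloud_encoder.parameters()))  # True
print(any(p.grad is not None for p in qformer.parameters()))              # False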
This work proposes InternLM-XComposer, a vision-language large model that enables advanced image-text comprehension and composition and achieves competitive text-image composition scores compared with public solutions, including GPT-4V and GPT-3.5.