3260 papers • 126 benchmarks • 313 datasets
Document image classification is the task of classifying documents based on images of their contents. (Image credit: Real-Time Document Image Classification using Deep CNN and Extreme Learning Machines)
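To make the task concrete, here is a minimal, purely illustrative sketch: synthetic 32x32 "pages" of two classes (letter-like pages with ink concentrated near the top, form-like pages with ink spread evenly), classified with a simple logistic model over crude layout features. The data, features, and model are all toy stand-ins, not the method of any paper listed below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for document images: 32x32 grayscale pages (1.0 = white).
# "Letters" (class 0) have ink concentrated in the upper half;
# "forms" (class 1) have ink spread evenly down the page.
def make_page(kind):
    page = np.ones((32, 32))
    if kind == 0:                       # letter: ink near the top
        page[4:16] -= rng.random((12, 32)) * 0.8
    else:                               # form: ink spread over the page
        page[::4] -= rng.random((8, 32)) * 0.8
    return page

X = np.stack([make_page(k) for k in (0, 1) * 50])
y = np.array([0, 1] * 50)

# Feature: mean ink per horizontal band (a crude layout descriptor).
feats = 1.0 - X.reshape(len(X), 8, 4, 32).mean(axis=(2, 3))

# One-layer logistic classifier trained by gradient descent.
w, b = np.zeros(8), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(feats @ w + b)))
    g = p - y
    w -= 0.5 * feats.T @ g / len(y)
    b -= 0.5 * g.mean()

acc = ((feats @ w + b > 0) == (y == 1)).mean()
print(f"training accuracy: {acc:.2f}")
```

Real systems on this page replace the hand-made band features with learned representations (CNNs, vision transformers, or multi-modal text+layout encoders), but the task framing — image in, document class out — is the same.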
These leaderboards are used to track progress in Document Image Classification.
Use these libraries to find Document Image Classification models and implementations.
Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
It is found that BERT was significantly undertrained and can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE, and SQuAD.
This work produces a competitive convolution-free transformer by training on ImageNet only, and introduces a teacher-student strategy specific to transformers that relies on a distillation token, ensuring that the student learns from the teacher through attention.
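The distillation-token idea above can be sketched in a few lines: a learnable token is appended alongside the class token; after encoding, its output is supervised by the teacher's soft prediction while the class token is supervised by the ground-truth label. This is a hedged, NumPy-only illustration — the encoder is replaced by an identity stand-in, and all names and shapes are hypothetical, not the DeiT implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
D, C, N = 16, 4, 10          # embed dim, num classes, num patch tokens

# Learnable class and distillation tokens, appended to the patch tokens.
cls_tok  = rng.normal(size=(1, D))
dist_tok = rng.normal(size=(1, D))
patches  = rng.normal(size=(N, D))
seq = np.concatenate([cls_tok, dist_tok, patches])   # (N + 2, D)

# Stand-in for the transformer encoder (identity, for the sketch only;
# in the real model the tokens interact through self-attention).
encoded = seq

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

W_cls  = rng.normal(size=(D, C))
W_dist = rng.normal(size=(D, C))
p_cls  = softmax(encoded[0] @ W_cls)    # class-token head: true label
p_dist = softmax(encoded[1] @ W_dist)   # distillation head: teacher output

label   = 2
teacher = softmax(rng.normal(size=C))   # stand-in for teacher's prediction

# Combined objective: hard-label cross-entropy + distillation cross-entropy.
loss = -np.log(p_cls[label]) - (teacher * np.log(p_dist)).sum()
print(f"combined loss: {loss:.3f}")
```

The key design point is that the two tokens receive different supervision signals yet share the same encoder, so the teacher's knowledge flows to the student through the attention layers rather than through a separate output head alone.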
The LayoutLM is proposed to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents.
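The joint text-and-layout modelling described above can be illustrated with a small sketch: each token's input vector is its word embedding plus embeddings of its bounding-box coordinates on the page. The tables, shapes, and function below are hypothetical stand-ins for illustration, not the LayoutLM code.

```python
import numpy as np

rng = np.random.default_rng(1)
V, D, MAX = 100, 16, 1000   # vocab size, hidden dim, max page coordinate

# Hypothetical embedding tables (randomly initialised for the sketch):
tok_emb = rng.normal(size=(V, D))
x_emb   = rng.normal(size=(MAX, D))   # shared for x0 / x1 coordinates
y_emb   = rng.normal(size=(MAX, D))   # shared for y0 / y1 coordinates

def embed(token_ids, boxes):
    """Combine token and 2-D layout embeddings, LayoutLM-style:
    each token's vector is its word embedding plus the embeddings
    of its bounding-box corners (x0, y0, x1, y1)."""
    t = tok_emb[token_ids]
    x0, y0, x1, y1 = boxes.T
    return t + x_emb[x0] + y_emb[y0] + x_emb[x1] + y_emb[y1]

ids = np.array([5, 17, 42])
boxes = np.array([[10, 20, 110, 40],   # (x0, y0, x1, y1) in page units
                  [120, 20, 200, 40],
                  [10, 60, 90, 80]])
out = embed(ids, boxes)
print(out.shape)   # one fused text+layout vector per token
```

Because layout enters as additive embeddings, the downstream transformer can attend over where words sit on the page as well as what they say, which is what makes the approach useful for tasks like information extraction from scanned documents.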
A self-supervised vision representation model BEiT, which stands for Bidirectional Encoder representation from Image Transformers, is introduced, and results on image classification and semantic segmentation show that the model achieves competitive results with previous pre-training methods.
The LayoutXLM model has significantly outperformed the existing SOTA cross-lingual pre-trained models on the XFUND dataset and aims to bridge the language barriers for visually-rich document understanding.
An exhaustive investigation of recent deep learning architectures, algorithms, and strategies for document image classification is presented, ultimately reducing the error by more than half.
The LayoutLMv2 architecture is pre-trained with new tasks to model the interaction among text, layout, and image in a single multi-modal framework, and achieves new state-of-the-art results on a wide variety of downstream visually-rich document understanding tasks.
A novel OCR-free VDU model named Donut, which stands for Document understanding transformer, achieves state-of-the-art performance on various VDU tasks in terms of both speed and accuracy, and offers a synthetic data generator that makes model pre-training flexible across languages and domains.
The proposed region-based Deep Convolutional Neural Network framework for document structure learning achieves state-of-the-art accuracy of 92.21% on the popular RVL-CDIP document image dataset, exceeding the benchmarks set by the existing algorithms.