3260 papers • 126 benchmarks • 313 datasets
Document understanding involves document classification, layout analysis, information extraction, and document question answering (DocQA).
LayoutXLM significantly outperforms existing SOTA cross-lingual pre-trained models on the XFUND dataset and aims to bridge language barriers for visually-rich document understanding.
The LayoutLMv2 architecture introduces new pre-training tasks that model the interaction among text, layout, and image in a single multi-modal framework, and it achieves new state-of-the-art results on a wide variety of downstream visually-rich document understanding tasks.
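As a minimal sketch of how such a multi-modal model is typically applied, the snippet below uses the HuggingFace transformers integration of LayoutLMv2 for document classification. The checkpoint name is the public base model, the label count and input file are placeholders, and the processor assumes pytesseract (for OCR) and detectron2 (for the visual backbone) are installed; the classification head here is untrained, so this only illustrates the input/output flow.

```python
# Sketch: document classification with LayoutLMv2 (HuggingFace transformers).
from PIL import Image
from transformers import LayoutLMv2Processor, LayoutLMv2ForSequenceClassification

processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")
model = LayoutLMv2ForSequenceClassification.from_pretrained(
    "microsoft/layoutlmv2-base-uncased", num_labels=16  # e.g. 16 RVL-CDIP classes (assumption)
)

image = Image.open("page.png").convert("RGB")  # hypothetical input scan
# The processor runs OCR and assembles text, layout (bounding boxes), and image inputs.
encoding = processor(image, return_tensors="pt")
outputs = model(**encoding)
predicted_class = outputs.logits.argmax(-1).item()
```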
Donut, a novel OCR-free visual document understanding (VDU) model whose name stands for Document understanding transformer, achieves state-of-the-art performance on various VDU tasks in terms of both speed and accuracy, and it offers a synthetic data generator that keeps model pre-training flexible across languages and domains.
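A minimal sketch of the OCR-free approach, assuming the HuggingFace transformers integration and the public DocVQA-fine-tuned Donut checkpoint: the model consumes only pixels plus a task prompt, with no OCR step. The image path and question are placeholders.

```python
# Sketch: OCR-free document question answering with Donut.
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa")

image = Image.open("invoice.png").convert("RGB")  # hypothetical input document
pixel_values = processor(image, return_tensors="pt").pixel_values

# Donut is conditioned on a task-specific prompt instead of extracted OCR text.
prompt = "<s_docvqa><s_question>What is the total amount?</s_question><s_answer>"
decoder_input_ids = processor.tokenizer(
    prompt, add_special_tokens=False, return_tensors="pt"
).input_ids

outputs = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=model.decoder.config.max_position_embeddings,
)
print(processor.batch_decode(outputs)[0])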
A novel type of text representation preserves the 2D layout of a document by encoding each document page as a two-dimensional grid of characters; this representation is shown to significantly outperform approaches based on sequential text or document images.
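A toy sketch of that character-grid idea: OCR character boxes are rasterized into a fixed-size grid so convolutional models can exploit page layout. The OCR box format, grid size, and ord-based encoding are simplifying assumptions for illustration, not the paper's exact scheme.

```python
# Sketch: rasterizing OCR output into a 2D character grid.
import numpy as np

def build_chargrid(ocr_boxes, page_w, page_h, grid_w=128, grid_h=128):
    """ocr_boxes: list of (char, x0, y0, x1, y1) in page pixel coordinates."""
    grid = np.zeros((grid_h, grid_w), dtype=np.int32)  # 0 = background
    for char, x0, y0, x1, y1 in ocr_boxes:
        # Map each character's box into grid cells (at least one cell per axis).
        c0 = int(x0 / page_w * grid_w)
        r0 = int(y0 / page_h * grid_h)
        c1 = max(int(x1 / page_w * grid_w), c0 + 1)
        r1 = max(int(y1 / page_h * grid_h), r0 + 1)
        grid[r0:r1, c0:c1] = ord(char)  # toy encoding; a small char vocabulary is more typical
    return grid

boxes = [("T", 10, 10, 20, 30), ("o", 22, 10, 32, 30)]  # hypothetical OCR output
print(build_chargrid(boxes, page_w=600, page_h=800).shape)
```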
The ICDAR 2021 Scientific Literature Parsing Competition (ICDAR2021-SLP) aims to drive advances in document understanding and leverages the PubLayNet and PubTabNet datasets, which provide hundreds of thousands of training and evaluation examples.
Experiments show that the proposed LayoutLLM significantly outperforms existing methods that adopt open-source 7B LLMs/MLLMs for document understanding, and brings a certain degree of interpretability, which could facilitate manual inspection and correction.
This paper represents documents as word co-occurrence networks and proposes the Message Passing Attention network for Document understanding (MPAD), an application of the message passing framework to NLP, along with several hierarchical variants of MPAD.
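A minimal sketch of the word co-occurrence network such models operate on: nodes are unique words and edges link words that co-occur within a sliding window, after which a graph neural network performs the message passing. The window size and weighting scheme are assumptions for illustration.

```python
# Sketch: building a word co-occurrence graph from a document.
import networkx as nx

def cooccurrence_graph(tokens, window=4):
    g = nx.Graph()
    g.add_nodes_from(set(tokens))
    for i, w in enumerate(tokens):
        for u in tokens[i + 1 : i + window]:
            if u == w:
                continue
            if g.has_edge(w, u):
                g[w][u]["weight"] += 1  # accumulate co-occurrence counts
            else:
                g.add_edge(w, u, weight=1)
    return g

doc = "message passing networks aggregate neighbor features over the graph".split()
g = cooccurrence_graph(doc)
print(g.number_of_nodes(), g.number_of_edges())
```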
Experiments on eight languages show that LiLT achieves competitive or even superior performance on diverse widely-used downstream benchmarks, enabling language-independent benefits from pre-training on document layout structure.
Dessurt is a relatively simple document understanding transformer that can be fine-tuned on a greater variety of document domains and tasks than prior methods.