3260 papers • 126 benchmarks • 313 datasets
Table recognition refers to the process of automatically identifying and extracting tabular structures from unstructured data sources such as text documents, images, or scanned documents. The goal of table recognition is to accurately detect the presence of tables within the data and extract their contents, including rows, columns, headers, and cell values.
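As a purely illustrative example of the kind of structured output table recognition produces, the short Python sketch below serializes a recognized header row and body rows into an HTML table. The helper function and the sample values are hypothetical, not tied to any particular system.

def table_to_html(headers, rows):
    # Serialize a recognized table (one header row plus body rows) to HTML.
    head = "".join(f"<th>{h}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{c}</td>" for c in row) + "</tr>" for row in rows
    )
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"

# Hypothetical recognition output for a small table found in a document image.
headers = ["Item", "Count"]
rows = [["tables detected", "3"], ["cells extracted", "42"]]
print(table_to_html(headers, rows))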
These leaderboards are used to track progress in Table Recognition
Use these libraries to find Table Recognition models and implementations
PubTabNet, the largest publicly available table recognition dataset, is developed, containing 568k table images with corresponding structured HTML representations, and a new Tree-Edit-Distance-based Similarity (TEDS) metric for table recognition is proposed, which captures multi-hop cell misalignment and OCR errors more appropriately than the pre-established metric.
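A minimal sketch of how a TEDS-style score is defined: the tree edit distance between the predicted and ground-truth HTML trees is normalized by the size of the larger tree. The tree representation (nested (tag, children) tuples) and the placeholder distance function are assumptions for illustration; a real implementation would plug in a proper tree-edit-distance algorithm such as APTED.

def tree_size(node):
    # A tree node is represented here as (tag, [children]).
    tag, children = node
    return 1 + sum(tree_size(child) for child in children)

def teds(pred_tree, true_tree, tree_edit_distance):
    # TEDS(Ta, Tb) = 1 - EditDist(Ta, Tb) / max(|Ta|, |Tb|)
    dist = tree_edit_distance(pred_tree, true_tree)
    return 1.0 - dist / max(tree_size(pred_tree), tree_size(true_tree))

# Example with a dummy distance of 1 between a 4-node and a 5-node tree:
pred = ("table", [("tr", [("td", []), ("td", [])])])
true = ("table", [("tr", [("td", []), ("td", []), ("td", [])])])
print(teds(pred, true, lambda a, b: 1))  # 1 - 1/5 = 0.8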
Document structure analysis, such as zone segmentation and table recognition, is a complex problem in document processing and is an active area of research. The recent success of deep learning in solving various computer vision and machine learning problems has not been reflected in document structure analysis since conventional neural networks are not well suited to the input structure of the problem. In this paper, we propose an architecture based on graph networks as a better alternative to standard neural networks for table recognition. We argue that graph networks are a more natural choice for these problems, and explore two gradient-based graph neural networks. Our proposed architecture combines the benefits of convolutional neural networks for visual feature extraction and graph networks for dealing with the problem structure. We empirically demonstrate that our method outperforms the baseline by a significant margin. In addition, we identify the lack of large scale datasets as a major hindrance for deep learning research for structure analysis and present a new large scale synthetic dataset for the problem of table recognition. Finally, we open-source our implementation of dataset generation and the training framework of our graph networks to promote reproducible research in this direction.
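The sketch below is a rough illustration of the general idea, not the paper's architecture: per-cell visual features (as would come from a CNN) are refined by one round of mean-aggregation message passing over a plain adjacency matrix of neighboring cells. All layer sizes and the adjacency construction are assumptions.

import torch
import torch.nn as nn

class CellGraphLayer(nn.Module):
    # One round of mean-aggregation message passing over cell vertices.
    def __init__(self, dim):
        super().__init__()
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, feats, adj):
        # feats: (num_cells, dim) visual features per detected cell
        # adj:   (num_cells, num_cells) binary adjacency (e.g. spatial neighbors)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neighbor_mean = adj @ feats / deg
        return torch.relu(self.update(torch.cat([feats, neighbor_mean], dim=-1)))

# Example: 4 cells with 64-d features and a chain adjacency.
feats = torch.randn(4, 64)
adj = torch.tensor([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=torch.float32)
out = CellGraphLayer(64)(feats, adj)  # (4, 64) updated cell representations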
This paper proposes CascadeTabNet: a Cascade Mask Region-based CNN High-Resolution Network (Cascade Mask R-CNN HRNet) based model that detects the regions of tables and recognizes the structural body cells from the detected tables at the same time.
The ICDAR 2021 Scientific Literature Parsing Competition (ICDAR2021-SLP) aims to drive the advances specifically in document understanding and leverages the PubLayNet and PubTabNet datasets, which provide hundreds of thousands of training and evaluation examples.
This work explains an architecture that combines various object detection and table structure recognition methods into an effective and efficient system, and outlines a set of development trends for keeping up with state-of-the-art algorithms and future research.
This paper presents our solution for the ICDAR 2021 competition on scientific literature parsing, task B: table recognition to HTML. In our method, we divide the table content recognition task into four sub-tasks: table structure recognition, text line detection, text line recognition, and box assignment. Our table structure recognition algorithm is customized based on MASTER [1], a robust image text recognition algorithm. PSENet [2] is used to detect each text line in the table image. For text line recognition, our model is also built on MASTER. Finally, in the box assignment phase, we associate the text boxes detected by PSENet with the structure items reconstructed by table structure prediction, and fill the recognized content of each text line into the corresponding item. Our proposed method achieves a 96.84% TEDS score on 9,115 validation samples in the development phase, and a 96.32% TEDS score on 9,064 samples in the final evaluation phase.
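As an illustration of a box-assignment step like the one described above, the following sketch assigns each detected text-line box to the structure cell it overlaps most by IoU. The function names and the IoU criterion are assumptions for illustration, not the authors' code.

def iou(box_a, box_b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def assign_text_to_cells(text_boxes, cell_boxes):
    # For each text-line box, return the index of the best-overlapping cell (or None).
    assignments = []
    for t in text_boxes:
        scores = [iou(t, c) for c in cell_boxes]
        best = max(range(len(cell_boxes)), key=lambda i: scores[i]) if cell_boxes else None
        assignments.append(best if best is not None and scores[best] > 0 else None)
    return assignments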
This work develops a new, more comprehensive dataset for table extraction, called PubTables-1M, and addresses a significant source of ground truth inconsistency observed in prior datasets called oversegmentation, using a novel canonicalization procedure.
It is discovered that a convolutional stem can match classic CNN backbone performance with a much simpler model, and strike an optimal balance between two crucial factors for high-performance table structure recognition (TSR): a higher receptive field (RF) ratio and a longer sequence length.
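A minimal sketch of what a convolutional stem for TSR can look like: a few strided convolutions turn the table image into a feature map, which is flattened into the token sequence consumed by a sequence decoder. The layer sizes and strides here are illustrative, not those of the paper.

import torch
import torch.nn as nn

conv_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1), nn.ReLU(),
)

img = torch.randn(1, 3, 480, 480)          # a table image
feat = conv_stem(img)                      # (1, 256, 60, 60) feature map
tokens = feat.flatten(2).transpose(1, 2)   # (1, 3600, 256) sequence for the decoder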
This work presents an approach for table structure recognition that combines cell detection and interaction modules to localize the cells and predict their row and column associations with other detected cells, and incorporates structural constraints as additional differential components to the loss function for cell detection.
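A rough sketch (not the paper's model) of the interaction idea: the pooled features of every pair of detected cells are concatenated and scored for same-row and same-column relations. Dimensions and layer choices are assumptions.

import torch
import torch.nn as nn

class PairwiseAssociation(nn.Module):
    # Scores same-row / same-column relations for every pair of cells.
    def __init__(self, dim):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 2)  # [same_row, same_col]
        )

    def forward(self, cell_feats):
        n = cell_feats.size(0)
        a = cell_feats.unsqueeze(1).expand(n, n, -1)
        b = cell_feats.unsqueeze(0).expand(n, n, -1)
        pair = torch.cat([a, b], dim=-1)              # (n, n, 2*dim)
        return torch.sigmoid(self.classifier(pair))   # (n, n, 2) relation probabilities

cells = torch.randn(5, 128)
probs = PairwiseAssociation(128)(cells)  # probs[i, j] = P(same row), P(same column)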
This paper proposes the framework of Local and Global Pyramid Mask Alignment, which adopts the soft pyramid mask learning mechanism in both the local and global feature maps and allows the predicted bounding-box boundaries to extend beyond the limits of the original proposals.