3260 papers • 126 benchmarks • 313 datasets
Text Spotting is the combination of Scene Text Detection and Scene Text Recognition in an end-to-end manner. It is the ability to read natural text in the wild.
These leaderboards are used to track progress in Text Spotting.
Use these libraries to find Text Spotting models and implementations.
No subtasks available.
For the first time, a novel BezierAlign layer is designed for extracting accurate convolutional features of text instances with arbitrary shapes, significantly improving precision compared with previous methods while introducing negligible computation overhead.
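To make the idea concrete, here is a minimal sketch of BezierAlign-style sampling, assuming the text instance's top and bottom boundaries are cubic Bezier curves with four control points each. This is not the official ABCNet implementation; the feature map, control points, and output size are illustrative.

```python
# Minimal sketch of BezierAlign-style feature sampling (illustrative, not
# the official ABCNet layer). A text instance is bounded by two cubic
# Bezier curves (top and bottom), each given by 4 control points.
import torch
import torch.nn.functional as F

def cubic_bezier(ctrl, t):
    """Evaluate a cubic Bezier curve via the Bernstein basis.

    ctrl: (4, 2) control points; t: (T,) parameters in [0, 1].
    Returns (T, 2) points on the curve.
    """
    t = t[:, None]                                   # (T, 1)
    b0 = (1 - t) ** 3
    b1 = 3 * (1 - t) ** 2 * t
    b2 = 3 * (1 - t) * t ** 2
    b3 = t ** 3
    return b0 * ctrl[0] + b1 * ctrl[1] + b2 * ctrl[2] + b3 * ctrl[3]

def bezier_align(feat, top_ctrl, bot_ctrl, out_h=8, out_w=32):
    """Sample a fixed-size feature patch from an arbitrarily shaped region.

    feat: (1, C, H, W) feature map; coordinates normalized to [-1, 1].
    top_ctrl, bot_ctrl: (4, 2) control points of the two boundary curves.
    """
    t = torch.linspace(0, 1, out_w)
    top = cubic_bezier(top_ctrl, t)                  # (W, 2)
    bot = cubic_bezier(bot_ctrl, t)                  # (W, 2)
    # Linearly interpolate between the boundaries to fill the height.
    alpha = torch.linspace(0, 1, out_h)[:, None, None]   # (H, 1, 1)
    grid = top[None] * (1 - alpha) + bot[None] * alpha   # (H, W, 2)
    return F.grid_sample(feat, grid[None], align_corners=True)

feat = torch.randn(1, 64, 32, 32)
top = torch.tensor([[-0.8, -0.4], [-0.3, -0.6], [0.3, -0.6], [0.8, -0.4]])
bot = torch.tensor([[-0.8, 0.1], [-0.3, -0.1], [0.3, -0.1], [0.8, 0.1]])
patch = bezier_align(feat, top, bot)                 # (1, 64, 8, 32)
```

Because the sampling grid follows the curved boundaries, the extracted patch is already rectified, so a downstream recognizer can consume it like a horizontal text line.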
This work proposes a unified end-to-end trainable Fast Oriented Text Spotting (FOTS) network for simultaneous detection and recognition that shares computation and visual information between the two complementary tasks, and introduces RoIRotate to share convolutional features between detection and recognition.
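A hedged sketch of the RoIRotate idea follows: a rotated text region, parameterized by center, size, and angle, is warped to an axis-aligned patch via an affine sampling grid so that detection and recognition can share one feature map. The real FOTS layer also handles variable widths and batching; coordinates and shapes here are assumptions.

```python
# Illustrative RoIRotate-style warp: map a rotated box to an axis-aligned
# feature patch with an affine grid, so the recognizer reuses detector
# features directly instead of re-cropping the image.
import math
import torch
import torch.nn.functional as F

def roi_rotate(feat, cx, cy, w, h, angle, out_h=8, out_w=64):
    """feat: (1, C, H, W); box given in normalized [-1, 1] coordinates.

    angle is the box rotation in radians; theta maps output patch
    coordinates back onto the rotated region for bilinear sampling.
    """
    cos, sin = math.cos(angle), math.sin(angle)
    # Rows map output coords to input coords: scale, rotate, translate.
    theta = torch.tensor([[w / 2 * cos, -h / 2 * sin, cx],
                          [w / 2 * sin,  h / 2 * cos, cy]]).unsqueeze(0)
    grid = F.affine_grid(theta, (1, feat.size(1), out_h, out_w),
                         align_corners=False)
    return F.grid_sample(feat, grid, align_corners=False)

feat = torch.randn(1, 32, 64, 64)
patch = roi_rotate(feat, cx=0.1, cy=-0.2, w=0.9, h=0.25, angle=math.pi / 12)
print(patch.shape)  # torch.Size([1, 32, 8, 64])
```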
This paper proposes a post-processing approach that improves scene text recognition accuracy by using occurrence probabilities of words (a unigram language model) and the semantic correlation between scene and text.
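As an illustration of unigram rescoring (not the paper's exact formulation), the snippet below interpolates a recognizer's log-probability with a word's unigram log-probability; the tiny lexicon and the LAMBDA weight are made-up placeholders.

```python
# Re-rank candidate transcriptions by mixing the vision score with a
# unigram word probability. Lexicon entries and weights are placeholders.
import math

UNIGRAM = {"sale": 3e-5, "sole": 8e-6, "5ale": 1e-9}  # toy word probabilities
LAMBDA = 0.3  # interpolation weight between vision and language scores

def rescore(candidates):
    """candidates: list of (word, recognition_log_prob) pairs."""
    def score(item):
        word, vis_logp = item
        lm_logp = math.log(UNIGRAM.get(word.lower(), 1e-12))
        return (1 - LAMBDA) * vis_logp + LAMBDA * lm_logp
    return max(candidates, key=score)

best = rescore([("5ale", -0.9), ("sale", -1.1), ("sole", -1.4)])
print(best)  # ('sale', -1.1): the language prior overrides the raw score
```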
It is shown how learning a word-to-word or word-to-sentence relatedness score can improve the performance of text spotting systems by up to 2.9 points, outperforming other measures on a benchmark dataset.
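A hypothetical sketch of relatedness-based re-ranking: a candidate transcription is boosted when it is semantically close to other words already read in the same scene. The 3-dimensional "embeddings", candidate words, and weight are toy stand-ins for learned word vectors and scores.

```python
# Semantic re-ranking with a relatedness score (toy example): favor the
# candidate whose embedding is closest to other words seen in the image.
import numpy as np

EMB = {
    "coffee": np.array([1.0, 0.9, 0.0]),
    "latte":  np.array([0.9, 1.0, 0.0]),
    "cotter": np.array([0.0, 0.1, 1.0]),
    "menu":   np.array([0.8, 0.7, 0.1]),
}

def relatedness(a, b):
    """Cosine similarity between two word embeddings."""
    va, vb = EMB[a], EMB[b]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

def rerank(candidates, scene_words, weight=2.0):
    """candidates: (word, vision_score) pairs; scene_words: other scene text."""
    def score(item):
        word, vis = item
        return vis + weight * max(relatedness(word, c) for c in scene_words)
    return max(candidates, key=score)

# "latte" wins despite a lower vision score because it relates to "coffee".
print(rerank([("cotter", -0.8), ("latte", -1.0)], ["coffee", "menu"]))
```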
This work introduces a large-scale, Bilingual, Open World Video text benchmark dataset (BOVText) and proposes an end-to-end video text spotting framework with a Transformer, termed TransVTSpotter, which solves multi-oriented text spotting in video with a simple but efficient attention-based query-key mechanism.
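The matching idea behind an attention-style query-key mechanism can be sketched as follows. TransVTSpotter is a full Transformer tracker; this fragment only shows how per-instance embeddings from the previous frame (queries) might be associated with current-frame detections (keys) through scaled dot-product similarity. Dimensions and the temperature are assumptions.

```python
# Toy query-key association across video frames: each tracked text
# instance re-finds itself among the current frame's detections.
import torch
import torch.nn.functional as F

def associate(prev_emb, curr_emb, temperature=0.1):
    """prev_emb: (M, D) track embeddings; curr_emb: (N, D) detections.

    Returns, per track, a softmax distribution over current detections.
    """
    sim = prev_emb @ curr_emb.T / (prev_emb.size(1) ** 0.5)
    return torch.softmax(sim / temperature, dim=1)

prev = F.normalize(torch.randn(3, 16), dim=1)   # 3 existing tracks
curr = F.normalize(torch.randn(5, 16), dim=1)   # 5 new detections
print(associate(prev, curr).argmax(dim=1))      # matched detection per track
```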
SPTS v2 outperforms previous state-of-the-art single-point text spotters with fewer parameters while achieving 19× faster inference speed; experiments also suggest a potential preference for the single-point representation in scene text spotting compared with other representations.
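A speculative sketch of a single-point, sequence-style encoding in the spirit of SPTS: each text instance is serialized as two quantized point coordinates followed by transcription tokens, so localization collapses to predicting one point per instance. The vocabulary layout, bin count, and helper names are assumptions, not the paper's actual tokenizer.

```python
# Serialize a text instance as [x_bin, y_bin, char tokens..., EOS] so a
# sequence model can emit location and transcription in one stream.
BINS = 1000                        # coordinate quantization bins
CHARS = "abcdefghijklmnopqrstuvwxyz"
CHAR_OFFSET = BINS                 # char tokens start after coordinate tokens
EOS = BINS + len(CHARS)            # end-of-sequence token

def encode(x, y, text, img_w, img_h):
    """Quantize one point and append transcription tokens."""
    xb = min(int(x / img_w * BINS), BINS - 1)
    yb = min(int(y / img_h * BINS), BINS - 1)
    return [xb, yb] + [CHAR_OFFSET + CHARS.index(c) for c in text.lower()] + [EOS]

print(encode(412, 95, "exit", img_w=1280, img_h=720))
# [321, 131, 1004, 1023, 1008, 1019, 1026]
```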
This paper introduces a new model named Explicit Synergy-based Text Spotting Transformer framework (ESTextSpotter), which achieves explicit synergy by modeling discriminative and interactive features for text detection and recognition within a single decoder.
This paper proposes a feasible framework for multi-lingual arbitrary-shaped STR, including instance segmentation based text detection and language model based attention mechanism for text recognition.
This work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection, and is the first to improve text detection by using a language model.
PGNet is a single-shot text spotter in which a pixel-level character classification map is learned with the proposed PG-CTC loss, avoiding the use of character-level annotations; a graph refinement module (GRM) is also proposed to optimize the coarse recognition and improve end-to-end performance.
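The PG-CTC idea can be sketched roughly as follows: pixel-level character logits are gathered along a text instance's center line to form a sequence, which is trained with standard CTC against the word-level transcription, so no character-level boxes are needed. The shapes, alphabet layout, and hard-coded center-line points are illustrative assumptions.

```python
# Rough sketch of point-gathering CTC: gather per-pixel character logits
# along a center line, then supervise with word-level labels via CTC.
import torch
import torch.nn.functional as F

C = 27                          # CTC blank (index 0) + 26 letters ('a' = 1)
char_map = torch.randn(1, C, 64, 64, requires_grad=True)  # pixel-level logits

# Hypothetical center-line points (y, x) for one text instance.
points = torch.tensor([[30, 8], [30, 16], [31, 24], [31, 32], [32, 40]])

# Gather a (T, C) logit sequence along the center line.
seq = char_map[0][:, points[:, 0], points[:, 1]].T       # (T=5, C)
log_probs = F.log_softmax(seq, dim=1).unsqueeze(1)       # (T, N=1, C)

target = torch.tensor([[3, 1, 20]])                      # "cat" as indices
loss = F.ctc_loss(log_probs, target,
                  input_lengths=torch.tensor([5]),
                  target_lengths=torch.tensor([3]), blank=0)
loss.backward()                  # gradients flow into the pixel-wise map
```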