3260 papers • 126 benchmarks • 313 datasets
Detect text in an image and localize it with a bounding box. The text can be of any shape and size; all instances of text in the image must be localized, with a bounding box for each word.
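Detections of this kind are usually matched to ground truth by intersection-over-union (IoU). A minimal sketch for axis-aligned boxes, assuming `(x1, y1, x2, y2)` coordinates (the function name and threshold are illustrative, not from any particular benchmark toolkit):

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# A detection is commonly counted as correct when IoU >= 0.5.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333...
```

Curved or oriented text is evaluated with polygon IoU instead, but the matching logic is the same.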
(Image credit: Papersgraph)
These leaderboards are used to track progress in Text Detection.
Use these libraries to find Text Detection models and implementations.
No subtasks available.
A novel Progressive Scale Expansion Network (PSENet) is proposed, which can precisely detect text instances of arbitrary shapes and effectively separates close text instances, making it easier for segmentation-based methods to detect arbitrary-shaped text.
A new scene text detection method that effectively detects text areas by exploring each character and the affinity between characters, exploiting both the given character-level annotations for synthetic images and the estimated character-level ground truths for real images acquired by a learned interim model.
This work proposes a simple yet powerful pipeline that yields fast and accurate text detection in natural scenes, and significantly outperforms state-of-the-art methods in terms of both accuracy and efficiency.
For the first time, a novel BezierAlign layer is designed for extracting accurate convolution features of a text instance with arbitrary shapes, significantly improving the precision compared with previous methods and introducing negligible computation overhead.
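ABCNet, the method behind BezierAlign, represents each side of a curved text boundary as a cubic Bezier curve; the layer then samples features at points along such curves. An illustrative sketch of cubic Bezier evaluation (not ABCNet's actual code; control points below are made up):

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameters t in [0, 1]."""
    t = np.asarray(t, dtype=float)[:, None]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

# Four 2-D control points describing one curved boundary (illustrative).
ctrl = [np.array([0.0, 0.0]), np.array([1.0, 2.0]),
        np.array([3.0, 2.0]), np.array([4.0, 0.0])]
pts = cubic_bezier(*ctrl, np.linspace(0, 1, 5))
print(pts[0], pts[-1])  # endpoints coincide with p0 and p3
```

Sampling a fixed number of such points per curve is what lets the layer warp arbitrarily shaped text regions into a rectangular feature grid.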
This paper proposes a module named Differentiable Binarization (DB), which can perform the binarization process in a segmentation network, and validate the performance improvements of DB on five benchmark datasets, which consistently achieves state-of-the-art results, in terms of both detection accuracy and speed.
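The core of DB is an approximate step function described in the paper: a steep sigmoid of the probability map minus a learned threshold map, which yields a near-binary output while remaining differentiable. A minimal per-pixel sketch (function name is illustrative; the paper uses amplification factor k = 50):

```python
import math

def db_binarize(p, t, k=50.0):
    """Differentiable binarization: steep sigmoid of (p - t).

    p: probability-map value in [0, 1]
    t: learned threshold-map value at the same pixel
    k: amplification factor making the sigmoid approximate a step
    """
    return 1.0 / (1.0 + math.exp(-k * (p - t)))

print(db_binarize(0.9, 0.3))  # well above threshold -> close to 1
print(db_binarize(0.1, 0.3))  # well below threshold -> close to 0
```

Because the operation has a usable gradient everywhere, binarization can be trained jointly with the segmentation network instead of being applied as a fixed post-processing step.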
Extensive experiments demonstrate that FCE is accurate and robust to fit contours of scene texts even with highly-curved shapes, and also validate the effectiveness and the good generalization of FCENet for arbitrary-shaped text detection.
This work proposes a unified end-to-end trainable Fast Oriented Text Spotting (FOTS) network for simultaneous detection and recognition, sharing computation and visual information between the two complementary tasks, and introduces RoIRotate to share convolutional features between detection and recognition.
SegLink, an oriented text detection method to decompose text into two locally detectable elements, namely segments and links, achieves an f-measure of 75.0% on the standard ICDAR 2015 Incidental (Challenge 4) benchmark, outperforming the previous best by a large margin.
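The f-measure reported on ICDAR 2015 is the harmonic mean of detection precision and recall. A minimal sketch (the inputs below are illustrative, not SegLink's actual precision and recall):

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (a.k.a. F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(0.75, 0.75), 3))  # 0.75
```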
This paper proposes an efficient and accurate arbitrary-shaped text detector, termed Pixel Aggregation Network (PAN), which is equipped with a low computational-cost segmentation head and a learnable post-processing.