U-Net Based Architectures for Document Text Detection and Binarization

Abstract

Despite the increasing popularity of document analysis and recognition systems, text detection (TD) and text binarization (TB) in document images remain challenging tasks. In this paper, we introduce a two-step architecture for the TD task. First, a U-net based model produces a text mask in terms of word-level bounding boxes. Second, we approximate the masked regions with rectangles using a classic computer vision method. The model achieves state-of-the-art results on document images and outperforms other popular approaches. Moreover, we introduce the Hybrid U-net architecture, which solves the TB and TD problems at the same time. The model performs well on both problems. The shared convolutional encoder reduces the number of parameters and the memory consumed compared to separate models, without reducing model performance.
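The second step of the TD pipeline, turning a predicted mask into rectangles, can be illustrated with a minimal sketch. The abstract does not name the classic computer vision method used, so the connected-component approach below is an assumption chosen for illustration, not the authors' implementation. The function name `mask_to_boxes`, the `min_area` parameter, and the toy `demo_mask` are all hypothetical.

```python
from collections import deque


def mask_to_boxes(mask, min_area=1):
    """Approximate a binary text mask with axis-aligned rectangles.

    Assumption: each 4-connected blob of truthy pixels is one word.
    Returns (top, left, bottom, right) tuples with inclusive coordinates,
    in row-major order of each blob's first pixel.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []

    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue

            # Breadth-first flood fill over one connected component,
            # tracking its extent and pixel count as we go.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))

            # Hypothetical noise filter: drop blobs smaller than min_area.
            if area >= min_area:
                boxes.append((top, left, bottom, right))

    return boxes


# Toy mask standing in for a thresholded U-net output (illustrative only).
demo_mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 0],
]
demo_boxes = mask_to_boxes(demo_mask, min_area=2)
```

In practice a library routine for connected components or contour extraction would likely replace the hand-written flood fill; the pure-Python version is used here only to keep the sketch free of dependencies.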

Cite

Nikitin, F., Dokholyan, V., Zharikov, I., & Strijov, V. (2019). U-Net Based Architectures for Document Text Detection and Binarization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11845 LNCS, pp. 79–88). Springer. https://doi.org/10.1007/978-3-030-33723-0_7
