U-Net Based Architectures for Document Text Detection and Binarization

Filipp Nikitin; Vladimir Dokholyan; Ilia Zharikov; Vadim Strijov

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2019) 11845 LNCS 79-88

DOI: 10.1007/978-3-030-33723-0_7

Abstract

With the increasing popularity of document analysis and recognition systems, text detection (TD) and text binarization (TB) in document images remain challenging tasks. In this paper, we introduce a two-step architecture for the TD task. First, a U-Net based model produces a text mask in terms of word-level bounding boxes. Second, we approximate the mask of the bounding boxes with rectangles using a classic computer vision method. The model achieves state-of-the-art results on document images and outperforms other popular approaches. Moreover, we introduce the Hybrid U-Net architecture, which solves the TB and TD problems simultaneously. The model performs well on both problems. The shared convolutional encoder reduces the number of parameters and the memory consumption compared to separate models, without reducing model performance.
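The abstract does not name the "classic computer vision method" used in the second step. The sketch below assumes one common choice: threshold the network's per-pixel text probability map (the 0.5 threshold is a hypothetical parameter) and replace each 4-connected foreground component with its axis-aligned bounding rectangle. The function names `binarize` and `mask_to_boxes` are illustrative and do not come from the paper.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixel coords


def binarize(prob: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn a per-pixel text probability map into a 0/1 mask (threshold is an assumption)."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob]


def mask_to_boxes(mask: List[List[int]], min_area: int = 1) -> List[Box]:
    """Approximate each 4-connected blob of a binary mask by its bounding rectangle."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over the component containing (y, x).
            queue = deque([(y, x)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            area = 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Drop tiny components, which are likely noise rather than words.
            if area >= min_area:
                boxes.append((x0, y0, x1, y1))
    return boxes


if __name__ == "__main__":
    prob = [
        [0.9, 0.8, 0.1, 0.0, 0.0],
        [0.7, 0.9, 0.2, 0.0, 0.6],
        [0.0, 0.1, 0.0, 0.0, 0.8],
    ]
    print(mask_to_boxes(binarize(prob)))  # [(0, 0, 1, 1), (4, 1, 4, 2)]
```

In practice, OpenCV's `cv2.findContours` followed by `cv2.boundingRect` (or `cv2.minAreaRect` for rotated rectangles) performs the same component-to-rectangle step much faster than this pure-Python flood fill.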
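To make the parameter-saving argument concrete, the following sketch counts convolution parameters for a hypothetical plain U-Net (two 3x3 convolutions per stage, 2x2 transposed-convolution upsampling, a 1x1 output head, no normalization layers) and compares two separate TD and TB networks with a hybrid in which one encoder is shared by two task-specific decoders. The layer layout, channel widths, and the two-decoder design are assumptions for illustration, not the configuration reported in the paper.

```python
from typing import List, Sequence, Tuple


def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights plus biases of a k x k convolution (same formula for transposed conv)."""
    return c_in * c_out * k * k + c_out


def encoder_params(in_ch: int, widths: Sequence[int]) -> int:
    """Hypothetical encoder: two 3x3 convs per stage; max pooling has no parameters."""
    total, c = 0, in_ch
    for w in widths:
        total += conv_params(c, w, 3) + conv_params(w, w, 3)
        c = w
    return total


def decoder_params(widths: Sequence[int], out_ch: int) -> int:
    """Hypothetical decoder mirroring the encoder, ending in a 1x1 output head."""
    total = 0
    rev: List[int] = list(widths)[::-1]
    for deep, shallow in zip(rev, rev[1:]):
        total += conv_params(deep, shallow, 2)         # 2x2 transposed conv (upsampling)
        total += conv_params(2 * shallow, shallow, 3)  # 3x3 conv after skip concatenation
        total += conv_params(shallow, shallow, 3)      # second 3x3 conv
    total += conv_params(widths[0], out_ch, 1)         # 1x1 per-pixel output head
    return total


def separate_vs_hybrid(in_ch: int, widths: Sequence[int]) -> Tuple[int, int]:
    """Return (two separate U-Nets, shared encoder + two decoders) parameter counts."""
    enc = encoder_params(in_ch, widths)
    dec = decoder_params(widths, 1)  # one output channel per task (assumption)
    separate = 2 * (enc + dec)       # independent TD and TB networks
    hybrid = enc + 2 * dec           # one shared encoder, two task-specific decoders
    return separate, hybrid


if __name__ == "__main__":
    sep, hyb = separate_vs_hybrid(1, [64, 128, 256, 512, 1024])
    print(f"separate: {sep:,}  hybrid: {hyb:,}  saved: {sep - hyb:,}")
```

Whatever the exact layer layout, under this design the saving equals the parameter count of one encoder, because the hybrid keeps both decoders but drops the duplicated encoder.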

Citation (APA)

Nikitin, F., Dokholyan, V., Zharikov, I., & Strijov, V. (2019). U-Net Based Architectures for Document Text Detection and Binarization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11845 LNCS, pp. 79–88). Springer. https://doi.org/10.1007/978-3-030-33723-0_7