Research on text line segmentation of historical Tibetan documents based on the connected component analysis

Yiqun Wang; Weilan Wang; Zhenjiang Li; Yuehui Han; Xiaojuan Wang

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 11258 LNCS 74-87

DOI: 10.1007/978-3-030-03338-5_7

8 Citations

2 Readers

Abstract

Text line segmentation is a critical step in handwritten document recognition, especially in the analysis and recognition of historical documents. Because of the low quality and complexity of these documents (background noise, scattered characters, touching components between consecutive lines), automatic text line segmentation remains an active research topic. In this paper we propose a new method for segmenting text lines in the Beijing edition of the historical Tibetan scripture “Kangjur”, printed on paper from woodcuts. The method first performs skew detection and correction on the document image and uses projection profiles to obtain the baseline of each text line; each connected component is then allocated to a text line according to its position relative to the baselines. For some connected components, location and shape are analyzed to assign them correctly. Because the method operates on connected components instead of pixels, it avoids the noise generated by splitting characters. Experiments show that the method copes effectively with touching text lines and is promising for text line segmentation of historical Tibetan documents.
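The pipeline outlined in the abstract (projection profile, baseline detection, allocation of connected components) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes an already deskewed binary image stored as a list of rows with ink pixels equal to 1, and it omits the skew correction and shape analysis steps. All function names, the `min_ratio` threshold, and the rule of assigning each component to the baseline nearest its top edge are hypothetical choices made for illustration.

```python
from collections import deque


def horizontal_profile(img):
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in img]


def find_baselines(profile, min_ratio=0.5):
    """Pick one baseline row per run of dense rows.

    A row is 'dense' if its ink count is at least min_ratio * max(profile).
    Within each run of consecutive dense rows, the row with the most ink
    is taken as the baseline. min_ratio is an assumed, untuned parameter.
    """
    threshold = min_ratio * max(profile)
    baselines, run = [], []
    for y, count in enumerate(profile):
        if count > 0 and count >= threshold:
            run.append(y)
        elif run:
            baselines.append(max(run, key=lambda r: profile[r]))
            run = []
    if run:
        baselines.append(max(run, key=lambda r: profile[r]))
    return baselines


def connected_components(img):
    """Return 8-connected ink regions as lists of (row, col) pixels."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps


def assign_to_lines(comps, baselines):
    """Assign each component to the baseline closest to its top edge.

    Assumed rule for illustration only; the paper's allocation and its
    handling of ambiguous components are not reproduced here.
    """
    lines = {b: [] for b in baselines}
    for pixels in comps:
        top = min(y for y, _ in pixels)
        nearest = min(baselines, key=lambda b: abs(b - top))
        lines[nearest].append(pixels)
    return lines
```

Because whole components are moved between lines, a character is never cut in two by a straight separator, which is the advantage over pixel-based splitting that the abstract points to. A component that touches two lines would still land in a single line under this simple rule, which is where additional location and shape analysis would be needed.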

Cite

Wang, Y., Wang, W., Li, Z., Han, Y., & Wang, X. (2018). Research on text line segmentation of historical Tibetan documents based on the connected component analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11258 LNCS, pp. 74–87). Springer Verlag. https://doi.org/10.1007/978-3-030-03338-5_7
