Segmentation of handwritten characters for digitalizing Korean historical documents

Min Soo Kim; Kyu Tae Cho; Hee Kue Kwag; Jin Hyung Kim

Journal Article, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2004) 3163 114-124

DOI: 10.1007/978-3-540-28640-0_11

Abstract

Historical documents are a valuable part of cultural heritage and a source for studying the history, society, and daily life of their time. Digitalizing them aims to give researchers and the public instant access to archives that have so far been largely closed to them for preservation reasons. However, most of these documents are handwritten in ancient Chinese characters and have complex page layouts. As a result, conventional OCR (optical character recognition) systems are difficult to apply to them, even though OCR has long received the most attention as the key module in digitalization. We have been developing an OCR-based digitalization system for historical documents for several years. In this paper, we propose dedicated segmentation and rejection methods for OCR of Korean historical documents. The proposed recognition-based segmentation method combines geometric features and context information using the Viterbi algorithm. The rejection method uses the Mahalanobis distance and posterior probability, chiefly to address the out-of-class problem. Some promising experimental results are reported. © Springer-Verlag 2004.
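The abstract names the techniques but gives no formulas, so the following is only an illustrative sketch of the two ideas, not the authors' implementation. It assumes candidate cut points along a text line, a hypothetical `segment_score(i, j)` that returns a combined log-score (recognition, geometry, context) for treating the span between cuts `i` and `j` as one character, and, for rejection, Gaussian class models with diagonal covariance. All function names, parameters, and thresholds here are assumptions made for illustration.

```python
import math

def best_segmentation(n_cuts, segment_score, max_span=3):
    """Viterbi-style best path over candidate cut points 0..n_cuts-1.

    segment_score(i, j) is a hypothetical callback returning a log-score
    for treating the region between cut i and cut j as one character.
    max_span limits how many candidate pieces one character may cover.
    """
    best = [-math.inf] * n_cuts   # best cumulative score ending at each cut
    back = [None] * n_cuts        # predecessor cut on the best path
    best[0] = 0.0
    for j in range(1, n_cuts):
        for i in range(max(0, j - max_span), j):
            score = best[i] + segment_score(i, j)
            if score > best[j]:
                best[j] = score
                back[j] = i
    # Backtrack from the last cut to recover the chosen cut points.
    path = [n_cuts - 1]
    while back[path[-1]] is not None:
        path.append(back[path[-1]])
    path.reverse()
    return path, best[-1]

def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance under an assumed diagonal covariance."""
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))

def classify_or_reject(x, classes, max_dist=3.0, min_posterior=0.6):
    """Return the best class label, or None if the sample is rejected.

    classes maps label -> (mean, var, prior). A sample is rejected when it
    lies too far from the winning class (possible out-of-class input) or
    when the winning posterior is too low (ambiguous input).
    """
    log_joint, dist = {}, {}
    for label, (mean, var, prior) in classes.items():
        d = mahalanobis_diag(x, mean, var)
        dist[label] = d
        log_det = sum(math.log(v) for v in var)
        # Gaussian log-likelihood plus log-prior; shared constants cancel.
        log_joint[label] = math.log(prior) - 0.5 * (d * d + log_det)
    top = max(log_joint.values())
    total = sum(math.exp(v - top) for v in log_joint.values())
    winner = max(log_joint, key=log_joint.get)
    posterior = math.exp(log_joint[winner] - top) / total
    if dist[winner] > max_dist or posterior < min_posterior:
        return None
    return winner

# Toy usage with made-up scores and class models.
toy_scores = {(0, 1): -1.0, (1, 2): -1.0, (2, 3): -1.0,
              (0, 2): -0.2, (1, 3): -0.5, (0, 3): -5.0}
cuts, total_score = best_segmentation(4, lambda i, j: toy_scores[(i, j)])
toy_classes = {"A": ([0.0, 0.0], [1.0, 1.0], 0.5),
               "B": ([4.0, 4.0], [1.0, 1.0], 0.5)}
near_a = classify_or_reject([0.1, -0.2], toy_classes)
far_away = classify_or_reject([20.0, 20.0], toy_classes)
ambiguous = classify_or_reject([2.0, 2.0], toy_classes)
```

In this sketch the distance test catches samples that resemble no known class, while the posterior test catches samples that fall between two classes; whether the paper combines the two criteria in this way is not stated in the abstract.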

Citation (APA)

Kim, M. S., Cho, K. T., Kwag, H. K., & Kim, J. H. (2004). Segmentation of handwritten characters for digitalizing Korean historical documents. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3163, 114–124. https://doi.org/10.1007/978-3-540-28640-0_11
