Abstract
Kuzushiji, a cursive writing style, was used in Japan for over a thousand years, starting in the eighth century. Over 3 million books on a diverse array of topics, such as literature, science, mathematics and even cooking, are preserved. However, following a change to the Japanese writing system in 1900, Kuzushiji has not been included in regular school curricula. Therefore, most Japanese natives nowadays cannot read books written or printed just 150 years ago. Museums and libraries have invested a great deal of effort into creating digital copies of these historical documents as a safeguard against fires, earthquakes and tsunamis. The result has been datasets with hundreds of millions of photographs of historical documents which can only be read by a small number of specially trained experts. Thus there has been a great deal of interest in using machine learning to automatically recognize these historical texts and transcribe them into modern Japanese characters. Our proposed model KuroNet (which builds on Clanuwat et al. in International conference on document analysis and recognition (ICDAR), 2019) outperforms other models for Kuzushiji recognition. In this paper, KuroNet achieves higher accuracy by adding more regularization, while still recognizing entire pages of text with a residual U-Net architecture. We also explore areas where our system is limited and suggest directions for future work.
Cite
Lamb, A., Clanuwat, T., & Kitamoto, A. (2020). KuroNet: Regularized Residual U-Nets for End-to-End Kuzushiji Character Recognition. SN Computer Science, 1(3). https://doi.org/10.1007/s42979-020-00186-z