This chapter deals with the automated recognition of Asian scripts and focuses primarily on text belonging to two script families: oriental scripts such as Chinese, Japanese, and Korean (CJK), and scripts from the Indian subcontinent (Indic) such as Devanagari, Bangla, and the scripts of South India. Since these scripts have attracted the greatest interest from the document analysis community and cover most of the issues potentially encountered in the recognition of Asian scripts, application to other Asian scripts should primarily be a matter of implementation. The specific challenges encountered in the OCR of CJK and Indic scripts stem from the large number of character classes and the resulting high probability that a machine reading system will confuse similar character shapes. This has led language models and other post-processing techniques to play a greater role in the development of successful OCR systems for Asian scripts.
Setlur, S., & Shi, Z. (2014). Asian character recognition. In Handbook of Document Image Processing and Recognition (pp. 459–486). Springer London. https://doi.org/10.1007/978-0-85729-859-1_14