A database for Arabic printed character recognition

Ashraf Abdelraouf; Colin A. Higgins; Mahmoud Khalil

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2008) 5112 LNCS 567-578

DOI: 10.1007/978-3-540-69812-8_56

17 Citations

34 Readers

Abstract

Electronic Document Management (EDM) technology is being widely adopted because it makes the routing and retrieval of documents efficient. Optical Character Recognition (OCR) is an important front end for such technology. Excellent OCR now exists for Latin-based languages, but there are few systems that read Arabic, which limits the penetration of EDM into Arabic-speaking countries. Developing an OCR system for Arabic requires a database of Arabic words. Such a database has many uses in addition to training and testing a recognition system. This paper provides a comprehensive study and analysis of Arabic words and explains how such a database was constructed. Unlike earlier studies, this paper describes a database developed using a large number of collected Arabic words (6 million). It also considers connected segments, or Pieces of Arabic Words (PAWs), as well as Naked Pieces of Arabic Words (NPAWs), that is, PAWs without diacritics. Background information concerning the Arabic language is also presented. © 2008 Springer-Verlag Berlin Heidelberg.
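
The idea of connected segments can be illustrated with a minimal Python sketch. It is not the authors' procedure. The function names `split_paws` and `strip_marks` and the two letter sets are assumptions made for illustration, covering basic Arabic letters only. The sketch also reads "without diacritics" as "without Unicode combining marks", which may be narrower than what the paper means by a naked PAW.

```python
import unicodedata

# Assumed set: letters that join to the previous letter but never to the next
# (alef forms, waw forms, teh marbuta, dal, thal, reh, zain).
RIGHT_JOINING = set("\u0622\u0623\u0624\u0625\u0627\u0629\u062f\u0630\u0631\u0632\u0648")
# Assumed set: standalone hamza joins on neither side.
NON_JOINING = set("\u0621")


def strip_marks(text):
    """Drop combining marks (category Mn, e.g. fatha or shadda) from text."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def split_paws(word):
    """Split a single word into connected segments (PAWs)."""
    paws = []
    current = ""
    break_next = False  # True when the last base letter cannot join forward
    for ch in word:
        is_mark = unicodedata.category(ch) == "Mn"
        # Start a new segment before a base letter that cannot attach
        # to what came before it.
        if not is_mark and current and (break_next or ch in NON_JOINING):
            paws.append(current)
            current = ""
        current += ch  # a mark stays with its base letter
        if not is_mark:
            break_next = ch in RIGHT_JOINING or ch in NON_JOINING
    if current:
        paws.append(current)
    return paws


def split_unmarked_paws(word):
    """PAWs of a word after its combining marks are removed."""
    return [strip_marks(paw) for paw in split_paws(word)]
```

For example, `split_paws("كتاب")` yields two segments, because the alef cannot join to the letter that follows it. Counting such segments over a large word list is one way a PAW inventory of the kind the abstract describes could be tallied.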

Cite

Abdelraouf, A., Higgins, C. A., & Khalil, M. (2008). A database for Arabic printed character recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5112 LNCS, pp. 567–578). https://doi.org/10.1007/978-3-540-69812-8_56