Corpus-based technique for Improving Arabic OCR system

Ahmed Hussain Aliwy; Basheer Al-Sadawi

Journal Article (Open Access)

Indonesian Journal of Electrical Engineering and Computer Science (2021) 21(1) 233-241

DOI: 10.11591/ijeecs.v21.i1.pp233-241

Abstract

Optical character recognition (OCR) is the process of converting images of text documents into editable and searchable text. OCR is particularly challenging for Arabic, where it produces a high percentage of errors. In this paper, we propose a method to improve the output of Arabic optical character recognition (AOCR) systems, based on a statistical language model built from large available corpora. The method detects and corrects both non-word and real-word errors according to the context of each word in the sentence. The results show that the corrected AOCR output reaches an accuracy of up to 98%.
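The abstract does not specify the model in detail, so the following is only a minimal sketch of context-based correction under stated assumptions: a word-bigram language model with add-one smoothing, candidate corrections generated by one character deletion, substitution, or insertion, and a simple noisy-channel weight `alpha` (a hypothetical value) that lets an in-vocabulary token compete with its neighbours so that real-word errors can also be caught. The function names and the toy corpus are illustrative, not taken from the paper, and Latin-script words are used only for readability; Arabic input would need an Arabic `ALPHABET` and Arabic corpora.

```python
from collections import Counter

# Hypothetical settings: Latin letters for readability (Arabic text would need the
# Arabic alphabet here), and a hypothetical prior that a known token is already correct.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
ALPHA = 0.8


def train_bigrams(sentences):
    """Count unigrams and bigrams, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def edits1(word, alphabet=ALPHABET):
    """All strings one deletion, substitution, or insertion away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    substitutions = [left + ch + right[1:]
                     for left, right in splits if right for ch in alphabet]
    insertions = [left + ch + right for left, right in splits for ch in alphabet]
    return set(deletes + substitutions + insertions)


def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one (Laplace) smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(unigrams))


def correct(sentence, unigrams, bigrams, alpha=ALPHA, alphabet=ALPHABET):
    """Replace each token by the candidate that best fits its left and right context."""
    vocab = set(unigrams) - {"<s>", "</s>"}
    tokens = sentence.split()
    output = []
    for i, word in enumerate(tokens):
        prev = output[-1] if output else "<s>"  # use the already-corrected left word
        nxt = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
        candidates = {c for c in edits1(word, alphabet) if c in vocab}
        if word in vocab:
            candidates.add(word)  # real-word check: the token itself competes
        if not candidates:
            output.append(word)  # no known word within one edit: leave unchanged
            continue
        alternatives = len(candidates - {word})

        def score(c):
            # Channel model: keep the token with prob alpha, spread the rest evenly.
            channel = alpha if c == word else (1 - alpha) / max(1, alternatives)
            return (channel * bigram_prob(prev, c, unigrams, bigrams)
                    * bigram_prob(c, nxt, unigrams, bigrams))

        output.append(max(sorted(candidates), key=score))  # sorted: stable tie-break
    return " ".join(output)


if __name__ == "__main__":
    corpus = ["the cat sat on the mat", "the cat ate the fish", "a dog sat on the mat"]
    uni, bi = train_bigrams(corpus)
    print(correct("the cat sat on the mst", uni, bi))  # non-word error -> the cat sat on the mat
    print(correct("the cat sat on the sat", uni, bi))  # real-word error -> the cat sat on the mat
```

Scoring every token, not only out-of-vocabulary ones, is what allows real-word errors to be detected; the `alpha` weight keeps correct in-vocabulary words from being replaced too eagerly and would need tuning on held-out OCR output.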

Citation (APA)

Aliwy, A. H., & Al-Sadawi, B. (2021). Corpus-based technique for Improving Arabic OCR system. Indonesian Journal of Electrical Engineering and Computer Science, 21(1), 233–241. https://doi.org/10.11591/ijeecs.v21.i1.pp233-241