Corpus-based technique for Improving Arabic OCR system

Citations: 6
Readers: 13 (Mendeley users who have this article in their library)

Abstract

Optical character recognition (OCR) is the process of converting images of text documents into editable, searchable text. OCR is particularly challenging for Arabic, where it produces a high percentage of errors. In this paper, a method for improving the output of Arabic optical character recognition (AOCR) systems is proposed, based on a statistical language model built from large available corpora. The method detects and corrects both non-word errors and real-word errors according to the context of each word in the sentence. The results show that the corrected AOCR output reaches an accuracy of up to 98%.
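
The abstract describes the approach only at a high level. The Python sketch below illustrates the general idea of corpus-based, context-sensitive correction under stated assumptions; it is not the authors' implementation. It assumes whitespace tokenisation, an add-one smoothed bigram language model, correction candidates within one edit (deletion, substitution, insertion) of the OCR token, and a hypothetical `real_word_margin` threshold for real-word errors. `BigramCorrector`, `TOY_CORPUS`, and every other name are illustrative, and the toy corpus stands in for the large Arabic corpora used in the paper.

```python
import math
from collections import Counter

# Hypothetical toy corpus for illustration only; the paper builds its model
# from large Arabic corpora. Any whitespace-tokenised text works here.
TOY_CORPUS = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "a dog sat on the rug",
    "the dog ate the bone",
]


class BigramCorrector:
    """Context-sensitive corrector backed by an add-one smoothed bigram model."""

    def __init__(self, sentences, real_word_margin=0.2):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab = {w for w in self.unigrams if w not in ("<s>", "</s>")}
        # Candidate letters are taken from the corpus itself, so the same code
        # would handle Arabic script if trained on Arabic text.
        self.alphabet = sorted({ch for w in self.vocab for ch in w})
        self.size = len(self.unigrams)
        # Hypothetical threshold: minimum log-probability gain required before
        # a valid dictionary word is replaced as a real-word error.
        self.margin = real_word_margin

    def logprob(self, prev, word):
        """Add-one smoothed log P(word | prev)."""
        count = self.bigrams[(prev, word)] + 1
        return math.log(count / (self.unigrams[prev] + self.size))

    def context_score(self, prev, word, nxt):
        """Log-probability of `word` given its left and right neighbours."""
        return self.logprob(prev, word) + self.logprob(word, nxt)

    def edits1(self, word):
        """All strings one deletion, substitution, or insertion away from `word`."""
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = {a + b[1:] for a, b in splits if b}
        substitutes = {a + c + b[1:] for a, b in splits if b for c in self.alphabet}
        inserts = {a + c + b for a, b in splits for c in self.alphabet}
        return deletes | substitutes | inserts

    def correct(self, sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        out = list(tokens)
        for i in range(1, len(tokens) - 1):
            word, prev, nxt = tokens[i], out[i - 1], tokens[i + 1]
            # Sorted so that ties are broken deterministically.
            candidates = sorted(
                w for w in self.edits1(word) if w in self.vocab and w != word
            )
            if not candidates:
                continue
            best = max(candidates, key=lambda w: self.context_score(prev, w, nxt))
            if word not in self.vocab:
                # Non-word error: take the in-vocabulary candidate that best fits.
                out[i] = best
            elif (self.context_score(prev, best, nxt)
                  - self.context_score(prev, word, nxt)) > self.margin:
                # Real-word error: a valid word is replaced only when a
                # neighbouring word fits the context clearly better.
                out[i] = best
        return " ".join(out[1:-1])


corrector = BigramCorrector(TOY_CORPUS)
print(corrector.correct("the cst sat on the mat"))  # → the cat sat on the mat
print(corrector.correct("the cat sat on the cat"))  # → the cat sat on the mat
```

The first call repairs a non-word ("cst" is not in the vocabulary); the second repairs a real-word error, where "cat" is a valid word but "mat" is far more probable after "the" at the end of a sentence. Separating the two cases with a margin reflects a common design choice: non-words must always be changed, whereas valid words should only be changed when the context gives strong evidence.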

Cite (APA style)

Aliwy, A. H., & Al-Sadawi, B. (2021). Corpus-based technique for Improving Arabic OCR system. Indonesian Journal of Electrical Engineering and Computer Science, 21(1), 233–241. https://doi.org/10.11591/ijeecs.v21.i1.pp233-241
