Low cost correction of OCR errors using learning in a multi-engine environment

Ahmad Abdulkader; Matthew R. Casey

Conference Proceedings

Proceedings of the International Conference on Document Analysis and Recognition, ICDAR (2009) 576-580

DOI: 10.1109/ICDAR.2009.242

29 Citations

46 Readers

Abstract

We propose a low cost method for the correction of the output of OCR engines through the use of human labor. The method employs an error estimator neural network that learns to assess the error probability of every word from ground-truth data. The error estimator uses features computed from the outputs of multiple OCR engines. The output probability error estimate is used to decide which words are inspected by humans. The error estimator is trained to optimize the area under the word error ROC leading to an improved efficiency of the human correction process. A significant reduction in cost is achieved by clustering similar words together during the correction process. We also show how active learning techniques are used to further improve the efficiency of the error estimator. © 2009 IEEE.
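
The selection and clustering steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper's error estimator is a neural network trained on ground-truth data, whereas this sketch substitutes a simple engine-disagreement score as a stand-in. The function names (`majority_vote`, `error_score`, `review_queue`), the threshold value, and the sample readings are all hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(readings):
    """Return the most common reading and the fraction of engines agreeing with it."""
    counts = Counter(readings)
    word, votes = counts.most_common(1)[0]
    return word, votes / len(readings)


def error_score(readings):
    """Stand-in error estimate: 1 minus engine agreement.

    Assumption: this replaces the paper's trained neural network estimator.
    """
    _, agreement = majority_vote(readings)
    return 1.0 - agreement


def review_queue(tokens, threshold):
    """Cluster suspicious words by their majority reading.

    tokens: list of tuples, one reading per OCR engine for each word position.
    Returns a dict mapping majority reading -> list of token indices, so a
    human can inspect each cluster once instead of each word separately.
    """
    clusters = defaultdict(list)
    for index, readings in enumerate(tokens):
        if error_score(readings) >= threshold:
            word, _ = majority_vote(readings)
            clusters[word].append(index)
    return dict(clusters)


# Hypothetical output of three OCR engines for four word positions.
tokens = [
    ("the", "the", "the"),
    ("rnodern", "modern", "rnodern"),
    ("OCR", "OCR", "0CR"),
    ("rnodern", "rnodern", "modem"),
]
queue = review_queue(tokens, threshold=0.3)
print(queue)
```

In this sketch the word on which all engines agree is never shown to a human, and the two occurrences of the same suspicious reading fall into one cluster, which is the source of the cost reduction the abstract attributes to clustering.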

Citation (APA)

Abdulkader, A., & Casey, M. R. (2009). Low cost correction of OCR errors using learning in a multi-engine environment. In Proceedings of the International Conference on Document Analysis and Recognition, ICDAR (pp. 576–580). https://doi.org/10.1109/ICDAR.2009.242
