Efficient OCR post-processing combining language, hypothesis and error models

Abstract

In this paper, an OCR post-processing method that combines a language model, OCR hypothesis information and an error model is proposed. The approach can be seen as a flexible and efficient way to perform Stochastic Error-Correcting Language Modeling. We use Weighted Finite-State Transducers (WFSTs) to represent the language model, the complete set of OCR hypotheses interpreted as a sequence of vectors of a posteriori class probabilities, and an error model with symbol substitutions, insertions and deletions. This approach combines the practical advantages of a decoupled (OCR + post-processor) model with the error-recovery power of an integrated model. © 2010 Springer-Verlag Berlin Heidelberg.
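As a rough illustration of the combined scoring idea only (not the authors' WFST implementation), the Python sketch below rescores candidate words from a toy weighted lexicon (standing in for the language model) against a sequence of OCR class-posterior vectors, with assumed insertion and deletion penalties standing in for the error model. The lexicon, penalty values and function names are illustrative assumptions.

# Minimal sketch of combining a language model, per-position OCR class
# posteriors and a substitution/insertion/deletion error model to pick a
# correction. All names, the toy lexicon and the penalties are assumptions.
import math

EPS = 1e-6                 # floor for classes absent from a posterior vector
LOG_INS = math.log(0.01)   # assumed cost of a spurious OCR symbol (insertion)
LOG_DEL = math.log(0.01)   # assumed cost of a missing OCR symbol (deletion)

# Toy "language model": a weighted lexicon (log prior per word).
LEXICON = {"form": math.log(0.6), "farm": math.log(0.3), "foam": math.log(0.1)}

def channel_logprob(word, posteriors):
    """Log-score of the OCR posterior sequence given a candidate word,
    allowing substitutions (via the posteriors), insertions and deletions."""
    n, m = len(posteriors), len(word)
    NEG = float("-inf")
    dp = [[NEG] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if dp[i][j] == NEG:
                continue
            if i < n and j < m:  # emit word[j] at OCR position i (match/substitution)
                p = posteriors[i].get(word[j], EPS)
                dp[i + 1][j + 1] = max(dp[i + 1][j + 1], dp[i][j] + math.log(p))
            if i < n:            # insertion: extra OCR position not in the word
                dp[i + 1][j] = max(dp[i + 1][j], dp[i][j] + LOG_INS)
            if j < m:            # deletion: word character missed by the OCR
                dp[i][j + 1] = max(dp[i][j + 1], dp[i][j] + LOG_DEL)
    return dp[n][m]

def correct(posteriors):
    """Return the lexicon word maximising LM prior + channel score."""
    return max(LEXICON, key=lambda w: LEXICON[w] + channel_logprob(w, posteriors))

# OCR hypotheses as a sequence of class-posterior vectors ("forn" misread).
obs = [{"f": 0.9, "t": 0.1}, {"o": 0.8, "a": 0.2},
       {"r": 0.7, "n": 0.3}, {"n": 0.6, "m": 0.4}]
print(correct(obs))  # -> "form"

In the paper the three components are kept as separate transducers and composed, so the best correction is found by searching the composed machine over the full language model; the sketch above collapses that composition into one dynamic-programming pass per word of an explicit candidate list.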

Citation (APA)

Llobet, R., Navarro-Cerdan, J. R., Perez-Cortes, J. C., & Arlandis, J. (2010). Efficient OCR post-processing combining language, hypothesis and error models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6218 LNCS, pp. 728–737). https://doi.org/10.1007/978-3-642-14980-1_72
