In this paper, an OCR post-processing method that combines a language model, OCR hypothesis information, and an error model is proposed. The approach can be seen as a flexible and efficient way to perform Stochastic Error-Correcting Language Modeling. We use Weighted Finite-State Transducers (WFSTs) to represent the language model, the complete set of OCR hypotheses interpreted as a sequence of vectors of a posteriori class probabilities, and an error model with symbol substitutions, insertions and deletions. This approach combines the practical advantages of a decoupled (OCR + post-processor) model with the error-recovery power of an integrated model. © 2010 Springer-Verlag Berlin Heidelberg.
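The combination described in the abstract can be illustrated with a minimal sketch. Instead of a full WFST composition, the toy code below scores each word of a lexicon (standing in for the language model) against a sequence of per-position OCR class-probability vectors (the hypotheses), using a weighted edit distance whose substitution cost is the negative log posterior and whose fixed insertion/deletion penalties stand in for the paper's error model. All names, costs, and the floor probability are illustrative assumptions, not the authors' implementation.

```python
import math

def correction_cost(posteriors, word, ins_cost=4.0, del_cost=4.0):
    """Weighted edit distance between OCR hypothesis vectors and a lexicon word.

    posteriors: list of dicts {char: probability}, one per OCR symbol position.
    Matching a position to a character costs -log P(char); characters absent
    from a posterior vector get a small floor probability. Fixed penalties for
    insertions/deletions are a stand-in for a learned error model.
    (Toy sketch; the paper composes WFSTs instead of running this DP per word.)
    """
    floor = 1e-4
    n, m = len(posteriors), len(word)
    # dp[i][j] = min cost of aligning first i OCR positions to first j chars
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if dp[i][j] == math.inf:
                continue
            if i < n and j < m:  # substitution / match scored by the posterior
                p = posteriors[i].get(word[j], floor)
                dp[i + 1][j + 1] = min(dp[i + 1][j + 1], dp[i][j] - math.log(p))
            if i < n:            # spurious OCR symbol (deletion in the word)
                dp[i + 1][j] = min(dp[i + 1][j], dp[i][j] + del_cost)
            if j < m:            # symbol missed by the OCR (insertion)
                dp[i][j + 1] = min(dp[i][j + 1], dp[i][j] + ins_cost)
    return dp[n][m]

def correct(posteriors, lexicon):
    """Return the lexicon word with the lowest combined cost."""
    return min(lexicon, key=lambda w: correction_cost(posteriors, w))

# Example: the top-1 OCR reading is "c1ock", but 'l' also has probability
# mass at position 2, so the lexicon word "clock" wins overall.
posteriors = [{'c': 0.9}, {'1': 0.6, 'l': 0.35},
              {'o': 0.9}, {'c': 0.9}, {'k': 0.9}]
print(correct(posteriors, {'clock', 'click', 'block'}))  # → clock
```

Keeping the full posterior vector per position, rather than only the top OCR choice, is what lets the post-processor recover errors as an integrated model would, while the OCR engine itself stays unmodified.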
CITATION STYLE
Llobet, R., Navarro-Cerdan, J. R., Perez-Cortes, J. C., & Arlandis, J. (2010). Efficient OCR post-processing combining language, hypothesis and error models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6218 LNCS, pp. 728–737). https://doi.org/10.1007/978-3-642-14980-1_72