In this paper, an OCR post-processing method that combines a language model, OCR hypothesis information, and an error model is proposed. The approach can be seen as a flexible and efficient way to perform Stochastic Error-Correcting Language Modeling. We use Weighted Finite-State Transducers (WFSTs) to represent the language model, the complete set of OCR hypotheses interpreted as a sequence of vectors of a posteriori class probabilities, and an error model with symbol substitutions, insertions and deletions. This approach combines the practical advantages of a decoupled (OCR + post-processor) model with the error-recovery power of an integrated model. © 2010 Springer-Verlag Berlin Heidelberg.
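The combination described in the abstract can be illustrated with a minimal sketch. Instead of a full WFST composition, the toy code below scores each word of a lexicon (standing in for the language model) against a sequence of per-position OCR class-probability vectors (the hypotheses), using a weighted edit distance whose substitution cost is the negative log posterior and whose fixed insertion/deletion penalties stand in for the paper's error model. All names, costs, and the floor probability are illustrative assumptions, not the authors' implementation.

```python
import math

def correction_cost(posteriors, word, ins_cost=4.0, del_cost=4.0):
    """Weighted edit distance between OCR hypothesis vectors and a lexicon word.

    posteriors: list of dicts {char: probability}, one per OCR symbol position.
    Matching a position to a character costs -log P(char); characters absent
    from a posterior vector get a small floor probability. Fixed penalties for
    insertions/deletions are a stand-in for a learned error model.
    (Toy sketch; the paper composes WFSTs instead of running this DP per word.)
    """
    floor = 1e-4
    n, m = len(posteriors), len(word)
    # dp[i][j] = min cost of aligning first i OCR positions to first j chars
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if dp[i][j] == math.inf:
                continue
            if i < n and j < m:  # substitution / match scored by the posterior
                p = posteriors[i].get(word[j], floor)
                dp[i + 1][j + 1] = min(dp[i + 1][j + 1], dp[i][j] - math.log(p))
            if i < n:            # spurious OCR symbol (deletion in the word)
                dp[i + 1][j] = min(dp[i + 1][j], dp[i][j] + del_cost)
            if j < m:            # symbol missed by the OCR (insertion)
                dp[i][j + 1] = min(dp[i][j + 1], dp[i][j] + ins_cost)
    return dp[n][m]

def correct(posteriors, lexicon):
    """Return the lexicon word with the lowest combined cost."""
    return min(lexicon, key=lambda w: correction_cost(posteriors, w))

# Example: the top-1 OCR reading is "c1ock", but 'l' also has probability
# mass at position 2, so the lexicon word "clock" wins overall.
posteriors = [{'c': 0.9}, {'1': 0.6, 'l': 0.35},
              {'o': 0.9}, {'c': 0.9}, {'k': 0.9}]
print(correct(posteriors, {'clock', 'click', 'block'}))  # → clock
```

Keeping the full posterior vector per position, rather than only the top OCR choice, is what lets the post-processor recover errors as an integrated model would, while the OCR engine itself stays unmodified.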
CITATION STYLE
Llobet, R., Navarro-Cerdan, J. R., Perez-Cortes, J. C., & Arlandis, J. (2010). Efficient OCR post-processing combining language, hypothesis and error models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6218 LNCS, pp. 728–737). https://doi.org/10.1007/978-3-642-14980-1_72