Post-processing is a crucial step in improving the performance of the OCR process. In this paper, we present a novel approach that explores a modified method of candidate generation and candidate scoring at the character level as well as the word level. These features are combined with some important features suggested by related work to rank candidates in a regression model. The experimental results show that our approach achieves results comparable to those of the top-performing approaches in the ICDAR 2017 Post-OCR text correction competition.
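As a rough illustration of the general idea of generating correction candidates with an edit distance and then ranking them by a combined score, the following is a minimal sketch and not the method of the paper. The confusion costs, the `lexicon` counts, the `max_distance` threshold, and the fixed linear weights (standing in for a trained regression model) are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical substitution costs: character pairs that OCR engines commonly
# confuse are made cheaper than an ordinary substitution (cost 1.0).
# These values are illustrative assumptions, not taken from the paper.
CONFUSION_COST: Dict[Tuple[str, str], float] = {
    ("l", "1"): 0.2, ("1", "l"): 0.2,
    ("o", "0"): 0.2, ("0", "o"): 0.2,
    ("e", "c"): 0.4, ("c", "e"): 0.4,
}


def weighted_edit_distance(source: str, target: str) -> float:
    """Levenshtein distance with per-pair substitution costs."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = float(i)  # delete every character of source
    for j in range(1, cols):
        dist[0][j] = float(j)  # insert every character of target
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = source[i - 1], target[j - 1]
            sub = 0.0 if a == b else CONFUSION_COST.get((a, b), 1.0)
            dist[i][j] = min(
                dist[i - 1][j] + 1.0,      # deletion
                dist[i][j - 1] + 1.0,      # insertion
                dist[i - 1][j - 1] + sub,  # match or substitution
            )
    return dist[-1][-1]


def rank_candidates(
    token: str, lexicon: Dict[str, int], max_distance: float = 2.0
) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs, best first.

    The score is a fixed linear combination of a character-level feature
    (weighted edit distance) and a word-level feature (relative frequency).
    A trained regression model would replace these hand-picked weights.
    """
    total = sum(lexicon.values())
    scored = []
    for word, count in lexicon.items():
        distance = weighted_edit_distance(token, word)
        if distance <= max_distance:
            score = -1.0 * distance + 0.5 * (count / total)
            scored.append((word, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    toy_lexicon = {"hello": 50, "help": 30, "hero": 20}  # hypothetical counts
    for candidate, score in rank_candidates("he1lo", toy_lexicon):
        print(candidate, round(score, 3))
```

In this sketch the candidate set is filtered by distance first and only then scored, so the ranking step sees a small list of candidates per token.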
CITATION STYLE
Nguyen, T. T. H., Coustaty, M., Doucet, A., Jatowt, A., & Nguyen, N. V. (2018). Adaptive edit-distance and regression approach for Post-OCR text correction. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11279 LNCS, pp. 278–289). Springer Verlag. https://doi.org/10.1007/978-3-030-04257-8_29