In this paper, we present our work in the RISOT track of FIRE 2011. We describe an error modeling technique for OCR errors in an Indic script. Based on the error model, we apply a two-fold error correction method to the OCRed corpus. First, we correct the corpus itself using two approaches: correction with full confidence and correction without full confidence. Second, we use query expansion for error correction. Our retrieval results are significantly better than the baseline, and the difference between our best result and the original text run is not significant.
CITATION STYLE
Ghosh, K., & Parui, S. K. (2013). Retrieval from OCR text: RISOT track. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7536 LNCS, pp. 214–226). https://doi.org/10.1007/978-3-642-40087-2_21