Maryland at FIRE 2011: Retrieval of OCR'd Bengali

Utpal Garain; David S. Doermann; Douglas W. Oard

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2013), Vol. 7536 LNCS, pp. 205-213

DOI: 10.1007/978-3-642-40087-2_20

Abstract

In this year's Forum for Information Retrieval Evaluation (FIRE), the University of Maryland participated in the Retrieval of Indic Script OCRed Text (RISOT) task to experiment with the retrieval of OCR'd Bengali-script documents. The experiments focused on evaluating a retrieval strategy that is motivated by recent work on Cross-Language Information Retrieval (CLIR) but that uses OCR error modeling rather than parallel text alignment. For each query term, the approach obtains a probability distribution over substitutions that may correspond to terms in the document representation. The results reported indicate that this is a promising way of using OCR error modeling to improve CLIR.
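
As a rough illustration of the idea in the abstract, the sketch below scores an OCR'd document by summing, for each query term, the document counts of its possible OCR substitutions weighted by their probabilities. This is not the authors' implementation: the function names, the `substitutions` table, its probabilities, and the transliterated toy tokens are all hypothetical, and using the distribution to compute an expected term frequency is an assumption about how such a distribution might be applied.

```python
from collections import Counter


def expected_term_frequency(query_term, doc_tokens, substitutions):
    """Expected count of a clean query term in a noisy OCR'd document.

    `substitutions` (hypothetical) maps a query term to a dict of
    {ocr_variant: probability}. A term with no entry is matched exactly.
    """
    counts = Counter(doc_tokens)
    variants = substitutions.get(query_term, {query_term: 1.0})
    # Weight each variant's observed count by its substitution probability.
    return sum(prob * counts[variant] for variant, prob in variants.items())


def score_document(query_terms, doc_tokens, substitutions):
    """Sum the expected term frequencies over all query terms."""
    return sum(
        expected_term_frequency(term, doc_tokens, substitutions)
        for term in query_terms
    )


# Toy data: made-up transliterated tokens and made-up probabilities.
substitutions = {"nadi": {"nadi": 0.7, "nodi": 0.2, "nadl": 0.1}}
ocr_document = ["nodi", "nadl", "nadl", "tir"]

toy_score = score_document(["nadi", "tir"], ocr_document, substitutions)
```

In this toy case the clean form "nadi" never appears in the OCR output, so exact matching would miss it, while the weighted substitutions still contribute to the score. A real system would estimate the distribution from an OCR error model and feed the expected counts into a standard ranking function.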

Citation (APA)

Garain, U., Doermann, D. S., & Oard, D. W. (2013). Maryland at FIRE 2011: Retrieval of OCR’d Bengali. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7536 LNCS, pp. 205–213). https://doi.org/10.1007/978-3-642-40087-2_20
