Traditional retrieval models based on term matching are not effective on collections of degraded documents (for instance, the output of OCR or ASR systems). This paper presents an n-gram based distributed model for retrieval on large collections of degraded text. Evaluation was carried out on both the TREC Confusion Track and Legal Track collections, showing that the presented approach outperforms, in terms of effectiveness, the classical term-centred approach and most of the participant systems in the TREC Confusion Track. © Springer-Verlag Berlin Heidelberg 2009.
CITATION STYLE
Parapar, J., Freire, A., & Barreiro, Á. (2009). Revisiting N-gram based models for retrieval in degraded large collections. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5478 LNCS, pp. 680–684). https://doi.org/10.1007/978-3-642-00958-7_66