Revisiting N-gram based models for retrieval in degraded large collections

Javier Parapar; Ana Freire; Álvaro Barreiro

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2009) 5478 LNCS 680-684

DOI: 10.1007/978-3-642-00958-7_66


Abstract

Traditional retrieval models based on term matching are not effective on collections of degraded documents (for instance, the output of OCR or ASR systems). This paper presents an n-gram based distributed model for retrieval on large collections of degraded text. Evaluation was carried out on both the TREC Confusion Track and Legal Track collections, showing that the presented approach outperforms, in terms of effectiveness, the classical term-centred approach and most of the participating systems in the TREC Confusion Track. © Springer-Verlag Berlin Heidelberg 2009.
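To illustrate why term matching breaks down on degraded text while n-grams remain useful, here is a minimal sketch that contrasts exact word matching with character n-gram overlap on a single OCR-style misreading. It is not the authors' distributed model, whose details are not given in this record. The padding scheme, the choice n = 4, and the functions char_ngrams, overlap and term_match are illustrative assumptions.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 4) -> Counter:
    """Character n-grams of each word, padded with '_' at both ends (assumed scheme)."""
    grams = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def overlap(query: str, doc: str, n: int = 4) -> int:
    """Count of query n-grams also found in the document (multiset intersection)."""
    q, d = char_ngrams(query, n), char_ngrams(doc, n)
    return sum((q & d).values())


def term_match(query: str, doc: str) -> int:
    """Classical exact term matching: number of distinct shared words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


query = "retrieval"
# Hypothetical OCR output in which the final 'l' was misread as 'i'.
ocr_doc = "retrievai of degraded text"

print(term_match(query, ocr_doc))  # 0: the misspelled word never matches exactly
print(overlap(query, ocr_doc))     # 6: six of the query's eight 4-grams survive
```

A single character error destroys the exact term match, but only affects the few n-grams that span the damaged position, so a ranking function built on n-gram overlap can still score the document as relevant.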

Citation (APA)

Parapar, J., Freire, A., & Barreiro, Á. (2009). Revisiting N-gram based models for retrieval in degraded large collections. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5478 LNCS, pp. 680–684). https://doi.org/10.1007/978-3-642-00958-7_66
