Vocabulary reduction and text enrichment at WebCLEF

Franco Rojas; Héctor Jiménez-Salazar; David Pinto

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2007) 4730 LNCS 838-843

DOI: 10.1007/978-3-540-74999-8_106

Abstract

Cross-lingual Information Retrieval (IR) is currently one of the most challenging problems in the field, and one of the most important issues in IR is the reduction of the corpus vocabulary. In practice, some IR methods, such as the well-known vector space model, require the term space to be reduced. In this work, we consider a vocabulary reduction process based on the selection of mid-frequency terms. This approach enhances precision; to obtain better recall, we apply an enrichment process based on the addition of co-occurring terms. With this approach, we obtained an improvement of 40% using the BiEnEs topics of the WebCLEF 2005 task. The results obtained in the mixed monolingual task of WebCLEF 2006 show that text enrichment must be done before vocabulary reduction in order to get the best performance. © Springer-Verlag Berlin Heidelberg 2007.
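
The abstract does not give the selection thresholds or say how co-occurring terms are identified, so the following Python sketch is only an illustration of the general idea, not the authors' implementation. It assumes documents are already tokenized into lists of terms, that "mid-frequency" means a corpus frequency between two hypothetical bounds `low` and `high`, and that co-occurrence means appearing within a hypothetical `window` of positions. All function names are invented for this example.

```python
from collections import Counter, defaultdict


def mid_frequency_terms(docs, low, high):
    """Return the terms whose corpus frequency lies in [low, high].

    `low` and `high` are illustrative thresholds (an assumption).
    """
    freq = Counter(term for doc in docs for term in doc)
    return {term for term, count in freq.items() if low <= count <= high}


def cooccurrence_map(docs, window=1):
    """Map each term to the terms seen within `window` positions of it."""
    neighbours = defaultdict(set)
    for doc in docs:
        for i, term in enumerate(doc):
            start = max(0, i - window)
            context = doc[start:i] + doc[i + 1:i + 1 + window]
            for other in context:
                if other != term:
                    neighbours[term].add(other)
    return neighbours


def enrich(doc, neighbours):
    """Append co-occurring terms that the document does not already contain."""
    present = set(doc)
    extra = set()
    for term in doc:
        extra |= neighbours.get(term, set()) - present
    # Sorted only to make the output deterministic.
    return list(doc) + sorted(extra)


def enrich_then_reduce(docs, low, high, window=1):
    """Enrich every document first, then keep only mid-frequency terms."""
    neighbours = cooccurrence_map(docs, window)
    enriched = [enrich(doc, neighbours) for doc in docs]
    keep = mid_frequency_terms(enriched, low, high)
    return [[term for term in doc if term in keep] for doc in enriched]
```

The ordering in `enrich_then_reduce` mirrors the conclusion reported in the abstract, that enrichment should precede reduction: frequencies are counted on the enriched text, so added terms can influence which terms fall inside the retained band.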

Cite

Rojas, F., Jiménez-Salazar, H., & Pinto, D. (2007). Vocabulary reduction and text enrichment at WebCLEF. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4730 LNCS, pp. 838–843). Springer Verlag. https://doi.org/10.1007/978-3-540-74999-8_106
