Application of variable length N-gram vectors to monolingual and bilingual information retrieval

Abstract

Our group in the Department of Informatics at the University of Oviedo has participated for the first time in two CLEF tasks: monolingual (Russian) and bilingual (Spanish-to-English) information retrieval. Our main goal was to test the application to IR of a modified version of the n-gram vector space model, codenamed blindLight. This approach has already been applied successfully to other NLP tasks, such as language identification and text summarization, and the results achieved at CLEF 2004, although not exceptional, are encouraging. The blindLight approach differs from classical techniques in two major ways: (1) vector weights are n-gram significances rather than relative frequencies, and (2) cosine distance is abandoned in favor of a new metric that is inspired by sequence alignment techniques but is not as computationally expensive. To perform cross-language IR, we have developed a naive n-gram pseudo-translator similar to those described by McNamee and Mayfield and by Pirkola et al. © Springer-Verlag Berlin Heidelberg 2005.
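For reference, the classical n-gram vector space model that blindLight modifies can be sketched as follows. Each text becomes a vector of variable-length character n-grams weighted by relative frequency, and documents are ranked by their cosine similarity to the query. This is a minimal sketch of that baseline only, assuming character n-grams of length 1 to 4; the function names `ngram_vector`, `cosine` and `rank` and the `min_n`/`max_n` range are hypothetical. The abstract does not define blindLight's n-gram significances or its alignment-inspired metric, so neither is reproduced here.

```python
from collections import Counter
from math import sqrt


# Hypothetical helpers illustrating the CLASSICAL baseline, not blindLight itself.
def ngram_vector(text: str, min_n: int = 1, max_n: int = 4) -> dict[str, float]:
    """Variable-length character n-gram vector weighted by relative frequency."""
    text = " ".join(text.lower().split())  # normalise case and collapse whitespace
    counts = Counter(
        text[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(text) - n + 1)
    )
    total = sum(counts.values())
    # Relative frequencies: the weights blindLight replaces with n-gram significances.
    return {gram: c / total for gram, c in counts.items()}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity: the measure blindLight replaces with its own metric."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty text shares nothing with anything
    return dot / (norm_u * norm_v)


def rank(query: str, docs: dict[str, str], min_n: int = 1, max_n: int = 4) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs sorted from most to least similar to the query."""
    q = ngram_vector(query, min_n, max_n)
    scored = [
        (doc_id, cosine(q, ngram_vector(text, min_n, max_n)))
        for doc_id, text in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": "information retrieval with n-grams",
        "d2": "language identification of short texts",
    }
    for doc_id, score in rank("n-gram retrieval", docs):
        print(f"{doc_id}\t{score:.3f}")
```

In this sketch, the two blindLight modifications listed above would correspond to replacing the weights computed in `ngram_vector` and replacing the scoring function `cosine`, respectively.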

Cite

Gayo-Avello, D., Álvarez-Gutiérrez, D., & Gayo-Avello, J. (2005). Application of variable length N-gram vectors to monolingual and bilingual information retrieval. In Lecture Notes in Computer Science (Vol. 3491, pp. 73–82). Springer Verlag. https://doi.org/10.1007/11519645_7
