Exploring new languages with HAIRCUT at CLEF 2005

Paul McNamee

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2006) 4022 LNCS 155-164

DOI: 10.1007/11878773_17

Abstract

JHU/APL has long espoused the use of language-neutral methods for cross-language information retrieval. This year we participated in the ad hoc cross-language track and submitted both monolingual and bilingual runs. We undertook our first investigations in the Bulgarian and Hungarian languages. In our bilingual experiments we used several non-traditional CLEF query languages such as Greek, Hungarian, and Indonesian, in addition to several western European languages. We found that character n-grams remain an attractive option for representing documents and queries in these new languages. In our monolingual tests n-grams were more effective than unnormalized words for retrieval in Bulgarian (+30%) and Hungarian (+63%). Our bilingual runs used subword translation (statistical translation of character n-grams using aligned corpora) when parallel data were available, and web-based machine translation when no suitable data could be found. © Springer-Verlag Berlin Heidelberg 2006.
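The abstract's central idea, representing documents and queries as character n-grams rather than words, can be illustrated with a minimal sketch. This is not the HAIRCUT implementation: the function names `char_ngrams` and `ngram_profile`, the choice of n = 4, the lowercasing, and the space padding are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text, n=4):
    """Return overlapping character n-grams of `text`, spanning word boundaries.

    Hypothetical helper for illustration; n=4 is an assumed default,
    not a setting taken from the abstract.
    """
    # Lowercase and collapse runs of whitespace into single spaces.
    normalized = " ".join(text.lower().split())
    # Pad with one space on each side so word-initial and word-final
    # n-grams are distinguishable from word-internal ones.
    padded = f" {normalized} "
    # Slide a window of width n across the padded string.
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def ngram_profile(text, n=4):
    """Count n-gram occurrences, giving a bag-of-n-grams representation."""
    return Counter(char_ngrams(text, n))


if __name__ == "__main__":
    # No stemmer, stopword list, or tokenizer specific to the language is needed.
    print(char_ngrams("Cat", n=3))  # → [' ca', 'cat', 'at ']
    print(ngram_profile("new languages"))
```

Because the same function applies unchanged to Bulgarian, Hungarian, or any other text, a sketch like this needs no language-specific resources, which is the sense in which such methods are language-neutral.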

McNamee, P. (2006). Exploring new languages with HAIRCUT at CLEF 2005. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4022 LNCS, pp. 155–164). Springer Verlag. https://doi.org/10.1007/11878773_17
