Cross-lingual information retrieval system for Indian languages

Jagadeesh Jagarlamudi; A. Kumaran

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2008), vol. 5152 LNCS, pp. 80-87

DOI: 10.1007/978-3-540-85760-0_10

Abstract

This paper describes our attempt to build a Cross-Lingual Information Retrieval (CLIR) system as part of the Indian language sub-task of the main Adhoc monolingual and bilingual track in the CLEF competition. In this track, the task required retrieving relevant documents from an English corpus in response to a query expressed in one of several Indian languages, including Hindi, Tamil, Telugu, Bengali and Marathi. Groups participating in this track were required to submit an English-to-English monolingual run and a Hindi-to-English bilingual run, with optional runs in the remaining languages. Our submission consisted of a monolingual English run and a Hindi-to-English cross-lingual run. We used a word alignment table, learnt by a Statistical Machine Translation (SMT) system trained on aligned parallel sentences, to map a query in the source language into an equivalent query in the language of the document collection. The relevant documents are then retrieved using a Language Modeling based retrieval algorithm. On the CLEF 2007 data set, our official cross-lingual performance was 54.4% of the monolingual performance, and in post-submission experiments we found that it can be improved significantly, to as much as 76.3%. © 2008 Springer-Verlag Berlin Heidelberg.
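The two-stage pipeline described in the abstract (query translation through a word alignment table, followed by language-model retrieval) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `ALIGNMENT` entries, the `top_k` cutoff, the Dirichlet smoothing with `mu=10.0`, and the function names `translate_query`, `score` and `rank` are hypothetical and are not taken from the paper, which does not specify these details in the abstract.

```python
import math
from collections import Counter

# Hypothetical word-alignment table: source word -> {English word: P(english | source)}.
# Toy values for illustration only; a real table would come from an SMT aligner.
ALIGNMENT = {
    "bharat": {"india": 0.8, "indian": 0.2},
    "chunav": {"election": 0.7, "poll": 0.3},
}

# Toy English collection (tokenised documents), also purely illustrative.
DOCS = {
    "d1": "india election results announced after poll".split(),
    "d2": "cricket match in india ends in draw".split(),
    "d3": "stock markets fall in europe".split(),
}


def translate_query(source_terms, table, top_k=2):
    """Map each source term to its top_k English candidates, weighted by probability."""
    weighted = Counter()
    for term in source_terms:
        candidates = sorted(table.get(term, {}).items(),
                            key=lambda kv: kv[1], reverse=True)
        for english, prob in candidates[:top_k]:
            weighted[english] += prob
    return dict(weighted)


def score(query_weights, doc_tokens, coll_counts, coll_len, mu=10.0):
    """Weighted query log-likelihood under a Dirichlet-smoothed unigram model.

    Dirichlet smoothing is an assumed choice here, not the paper's stated method.
    """
    tf = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    total = 0.0
    for term, weight in query_weights.items():
        p_coll = coll_counts.get(term, 0) / coll_len
        if p_coll == 0.0:
            continue  # term never occurs in the collection, so it cannot discriminate
        p_doc = (tf[term] + mu * p_coll) / (doc_len + mu)
        total += weight * math.log(p_doc)
    return total


def rank(query_weights, docs):
    """Return (doc_id, score) pairs sorted from most to least relevant."""
    coll_counts = Counter(tok for toks in docs.values() for tok in toks)
    coll_len = sum(coll_counts.values())
    scored = [(doc_id, score(query_weights, toks, coll_counts, coll_len))
              for doc_id, toks in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    english_query = translate_query(["bharat", "chunav"], ALIGNMENT)
    for doc_id, value in rank(english_query, DOCS):
        print(doc_id, round(value, 3))
```

Keeping the translation probabilities as term weights, rather than picking a single best translation, lets ambiguous source words contribute several English candidates to the query; whether the original system weighted terms this way is not stated in the abstract.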

Citation (APA)

Jagarlamudi, J., & Kumaran, A. (2008). Cross-lingual information retrieval system for Indian languages. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5152 LNCS, pp. 80–87). Springer Verlag. https://doi.org/10.1007/978-3-540-85760-0_10
