Frequent case generation in ad hoc retrieval of three Indian languages - Bengali, Gujarati and Marathi

Abstract

This paper presents results of a generative method for managing the morphological variation of query keywords in Bengali, Gujarati and Marathi. The method is called Frequent Case Generation (FCG). It is based on the skewed distributions of word forms in natural languages and is suitable for languages that have either a fair amount of morphological variation or are morphologically very rich. We participated in the ad hoc task at FIRE 2011 and applied the FCG method to monolingual Bengali, Gujarati and Marathi test collections. Our evaluation was carried out using the title and description fields of the test topics and the Lemur search engine. We used a plain, unprocessed word index as the baseline, and n-gramming and stemming as competing methods. The evaluation results show relative improvements in mean average precision of 30%, 16% and 70% for Bengali, Gujarati and Marathi respectively when the FCG method is compared to plain words. The method shows competitive performance in comparison to n-gramming and stemming.
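
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the coverage threshold, the suffix counts and the MAP values are all hypothetical and chosen only for illustration, and the suffixes are placeholders rather than real case endings of any of the three languages. The sketch assumes that, because word-form distributions are skewed, a few frequent endings cover most occurrences of a keyword, so only those forms need to be generated for the query. It also shows how a relative improvement in mean average precision is computed.

```python
from collections import Counter


def frequent_suffixes(suffix_counts, coverage=0.8):
    """Pick the most frequent suffixes until their cumulative share
    of all occurrences reaches `coverage` (a hypothetical threshold)."""
    total = sum(suffix_counts.values())
    if total == 0:
        return []
    chosen, covered = [], 0
    for suffix, count in Counter(suffix_counts).most_common():
        if covered / total >= coverage:
            break
        chosen.append(suffix)
        covered += count
    return chosen


def generate_forms(stem, suffixes):
    """Generate the query variants of one keyword from the chosen suffixes."""
    return [stem + suffix for suffix in suffixes]


def relative_improvement(baseline_map, method_map):
    """Relative MAP improvement of a method over a baseline."""
    return (method_map - baseline_map) / baseline_map


# Placeholder corpus statistics: ending -> number of occurrences (not real data).
counts = {"": 50, "ra": 30, "ke": 15, "te": 5}
suffixes = frequent_suffixes(counts, coverage=0.8)
forms = generate_forms("stem", suffixes)
```

With these placeholder counts the two most frequent endings already cover 80% of occurrences, so only two forms are generated and matched against the plain word index. A relative improvement of 30% means the method's MAP is 1.3 times the baseline MAP, for example an illustrative 0.26 against 0.20.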

Cite

Paik, J. H., Kettunen, K., Pal, D., & Järvelin, K. (2013). Frequent case generation in ad hoc retrieval of three Indian languages - Bengali, Gujarati and Marathi. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7536 LNCS, pp. 38–50). https://doi.org/10.1007/978-3-642-40087-2_4
