Information retrieval with Hindi, Bengali, and Marathi languages: Evaluation and analysis

Abstract

Our first objective in participating in the FIRE evaluation campaigns is to analyze the retrieval effectiveness of various indexing and search strategies for corpora written in Hindi, Bengali, and Marathi. As a second goal, we developed new and more aggressive stemming strategies for Marathi and Hindi during this second campaign, and compared their retrieval effectiveness with both a light stemming strategy and a language-independent n-gram approach. As another language-independent indexing strategy, we evaluated the trunc-n method, in which the indexing term is formed from only the first n letters of each word. To evaluate these solutions, we used various IR models, including models derived from Divergence from Randomness (DFR), a Language Model (LM), Okapi, and the classical tf-idf vector-processing approach. For the three languages studied, our experiments tend to show that IR models derived from the DFR paradigm produce the best overall results. Our experiments also demonstrate that either an aggressive stemming procedure or the trunc-n indexing approach yields better retrieval effectiveness than other word-based or language-independent n-gram approaches. Applying the Z-score as a data fusion operator after blind query expansion also tends to improve the MAP of the merged run over the best single IR system.
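
The two language-independent indexing strategies and the Z-score fusion step named in the abstract can be illustrated with a minimal Python sketch. The function names (`trunc_n`, `char_ngrams`, `zscore_fuse`) are hypothetical and are not taken from the paper. The sketch assumes that a "letter" is a single Unicode code point, which is a simplification for Devanagari and Bengali scripts, and it uses a plain Z-score normalization (score minus run mean, divided by run standard deviation) summed across runs. The abstract does not specify the exact variant the authors applied, so this should be read as one plausible form rather than their implementation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def trunc_n(word, n):
    """trunc-n: keep only the first n characters of the word as the indexing term."""
    return word[:n]


def char_ngrams(word, n):
    """Overlapping character n-grams; words no longer than n are kept whole."""
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def zscore_fuse(runs):
    """Merge several result lists by summing per-run Z-score normalized scores.

    `runs` is a list of dicts mapping document id -> retrieval score.
    A document missing from a run contributes nothing for that run
    (an assumption of this sketch).
    """
    fused = defaultdict(float)
    for run in runs:
        if not run:
            continue  # nothing to normalize in an empty run
        scores = list(run.values())
        mu = mean(scores)
        sigma = pstdev(scores)
        for doc, score in run.items():
            # Guard against a run in which all scores are identical.
            fused[doc] += (score - mu) / sigma if sigma > 0 else 0.0
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(trunc_n("retrieval", 4))
    print(char_ngrams("index", 3))
    run_a = {"d1": 3.0, "d2": 1.0}
    run_b = {"d1": 10.0, "d2": 30.0, "d3": 20.0}
    print([doc for doc, _ in zscore_fuse([run_a, run_b])])
```

Normalizing each run before summing keeps a system with a wide score range from dominating the merged ranking, which is the usual motivation for Z-score fusion over a raw score sum.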

Cite

Savoy, J., Dolamic, L., & Akasereh, M. (2013). Information retrieval with Hindi, Bengali, and Marathi languages: Evaluation and analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7536 LNCS, pp. 334–352). https://doi.org/10.1007/978-3-642-40087-2_30
