An improved stemming approach using HMM for a highly inflectional language

Navanath Saharia; Kishori M. Konwar; Utpal Sharma; Jugal K. Kalita

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2013) 7816 LNCS(PART 1) 164-173

DOI: 10.1007/978-3-642-37247-6_14

Abstract

Stemming is a common method for morphological normalization of natural language texts. Modern information retrieval systems rely on such normalization techniques for automatic document processing tasks. High-quality stemming is difficult in highly inflectional Indic languages, and little research has been performed on designing stemming algorithms for texts in these languages. In this study, we focus on the problem of stemming texts in Assamese, a low-resource Indic language spoken in the North-Eastern part of India by approximately 30 million people. Stemming is hard in Assamese because morphological inflections commonly appear as single-letter suffixes: more than 50% of the inflections in Assamese take this form. Such single-letter inflections cause ambiguity when predicting the underlying root word. Therefore, we propose a new method that combines a rule-based algorithm for predicting multiple-letter suffixes with a hidden Markov model (HMM) based algorithm for predicting single-letter suffixes. The combined approach can predict morphologically inflected words with 92% accuracy. © 2013 Springer-Verlag.
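The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the suffix list, the two-state HMM (`N` = final letter belongs to the root, `S` = final letter is a single-letter suffix), and every probability below are hypothetical placeholders, and Latin-alphabet toy words stand in for Assamese data. Rules strip multi-letter suffixes first; the words left over are decoded with Viterbi to decide whether their final letter should be removed.

```python
import math

# Hypothetical placeholders; not taken from the paper or from Assamese.
MULTI_SUFFIXES = ["ing", "ed"]   # multi-letter suffixes handled by rules
SINGLE_SUFFIX = "s"              # the one single-letter suffix in this toy

# Assumed two-state HMM with made-up probabilities.
STATES = ("N", "S")
START = {"N": 0.6, "S": 0.4}
TRANS = {"N": {"N": 0.7, "S": 0.3}, "S": {"N": 0.6, "S": 0.4}}
EMIT = {"N": {"s": 0.1, "other": 0.9}, "S": {"s": 0.9, "other": 0.1}}


def strip_multi(word):
    """Rule stage: strip the longest matching multi-letter suffix, if any."""
    for suf in sorted(MULTI_SUFFIXES, key=len, reverse=True):
        if word.endswith(suf) and len(word) > len(suf) + 1:
            return word[:-len(suf)], True
    return word, False


def observe(word):
    """Map a non-empty word to the symbol the toy HMM emits for it."""
    return "s" if word[-1] == SINGLE_SUFFIX else "other"


def viterbi(obs):
    """Return the most likely state sequence for obs, computed in log space."""
    if not obs:
        return []
    scores = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    back = []
    for o in obs[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (scores[prev] + math.log(TRANS[prev][s])
                             + math.log(EMIT[s][o]))
            pointers[s] = prev
        scores = new_scores
        back.append(pointers)
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def stem_sentence(words):
    """Apply the rule stage, then the HMM stage to the words the rules left alone."""
    stems = list(words)
    undecided = []
    for i, word in enumerate(words):
        stem, matched = strip_multi(word)
        if matched:
            stems[i] = stem
        else:
            undecided.append(i)
    path = viterbi([observe(words[i]) for i in undecided])
    for i, state in zip(undecided, path):
        if state == "S" and len(words[i]) > 1:
            stems[i] = words[i][:-1]
    return stems


print(stem_sentence(["walking", "cats", "run", "jumped"]))
# ['walk', 'cat', 'run', 'jump']
```

With these toy numbers a root that merely ends in the suffix letter would also be stripped, which is the kind of ambiguity the abstract attributes to single-letter suffixes; a trained model would need richer observations and estimated probabilities to resolve it.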

Cite (APA)

Saharia, N., Konwar, K. M., Sharma, U., & Kalita, J. K. (2013). An improved stemming approach using HMM for a highly inflectional language. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7816 LNCS, pp. 164–173). https://doi.org/10.1007/978-3-642-37247-6_14
