Optimal stem identification in presence of suffix list

N. Vasudevan; Pushpak Bhattacharyya

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2012) 7181 LNCS(PART 1) 92-103

DOI: 10.1007/978-3-642-28604-9_8


Abstract

Stemming is considered crucial in many NLP and IR applications. In the absence of any linguistic information, stemming is a challenging task; stemming that uses the suffixes of a language as linguistic information is, in comparison, an easier problem. In this work we considered stemming as the process of obtaining a lexicon with the minimum number of entries from an unannotated corpus by using a suffix set. We proved that the exact lexicon reduction problem is NP-hard and devised a polynomial-time approximation. We also proposed a probabilistic model for stemming that minimizes the stem distributional entropy. The performance of these models is analyzed using an unannotated corpus and a suffix set of Malayalam, a morphologically rich Dravidian language of India. © 2012 Springer-Verlag.
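The abstract frames stemming as choosing as few lexicon entries as possible so that every corpus word can be split into a stem plus a suffix from the list. Under that reading (an assumption; the paper's exact definition may differ), the task is an instance of set cover, and the standard greedy set-cover heuristic is a well-known polynomial-time approximation with a logarithmic guarantee. The Python sketch below illustrates that idea only; it is not the authors' algorithm, and the helper names (`candidate_stems`, `greedy_lexicon`, `stem_entropy`) are hypothetical. `stem_entropy` simply measures the Shannon entropy of the stem distribution a segmentation induces; it does not reproduce the paper's probabilistic model.

```python
import math
from collections import Counter


def candidate_stems(word, suffixes):
    """Stems t with word == t + s for some suffix s; t must be non-empty.

    The empty suffix is always allowed, so every word can serve as its own stem.
    """
    stems = set()
    for s in set(suffixes) | {""}:
        if word.endswith(s) and len(word) > len(s):
            stems.add(word[: len(word) - len(s)])
    return stems


def greedy_lexicon(tokens, suffixes):
    """Greedy set-cover heuristic for a small stem lexicon (hypothetical helper).

    Assumption: a word is explained once some (stem, suffix) split of it has the
    stem in the lexicon. Returns a dict mapping each word type to its chosen stem.
    """
    words = {w for w in tokens if w}
    covers = {}  # candidate stem -> set of word types it can explain
    for w in words:
        for t in candidate_stems(w, suffixes):
            covers.setdefault(t, set()).add(w)

    uncovered = set(words)
    assignment = {}
    while uncovered:
        # Pick the stem explaining the most unexplained words; iterating in sorted
        # order makes max() break ties toward the lexicographically smallest stem.
        best = max(sorted(covers), key=lambda t: len(covers[t] & uncovered))
        for w in covers[best] & uncovered:
            assignment[w] = best
        uncovered -= covers[best]
    return assignment


def stem_entropy(tokens, assignment):
    """Shannon entropy (bits) of the stem distribution over corpus tokens."""
    counts = Counter(assignment[w] for w in tokens if w)
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    tokens = ["walk", "walks", "walked", "walking", "talk", "talks"]
    suffixes = ["s", "ed", "ing"]
    assignment = greedy_lexicon(tokens, suffixes)
    print(sorted(set(assignment.values())))  # ['talk', 'walk']
    print(round(stem_entropy(tokens, assignment), 4))  # 0.9183
```

On these toy English tokens the greedy pass picks `walk` first, since it explains four word types, and then `talk`. Because every word can always be its own stem, each iteration covers at least one new word and the loop terminates.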

Citation (APA)

Vasudevan, N., & Bhattacharyya, P. (2012). Optimal stem identification in presence of suffix list. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7181 LNCS, pp. 92–103). https://doi.org/10.1007/978-3-642-28604-9_8
