Stemming is a common preprocessing step applied to text corpora. Errors in this process may be corrected either manually or on the basis of a corpus. We describe a novel corpus-based stemming technique that models each word as being generated from a multinomial distribution over the topics present in the corpus. A procedure akin to sequential hypothesis testing lets us group distributionally similar words together. The proposed method refines any given stemmer, and its strength can be controlled via two thresholds. A refinement based on the 20 Newsgroups data set shows that the proposed method splits equivalence classes appropriately. © Springer-Verlag Berlin Heidelberg 2005.
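The abstract does not give the exact test statistic or thresholds, so the following is only an illustrative sketch of the general idea: treat each word's per-topic counts as draws from a multinomial, compare two words with a log-likelihood-ratio (G) statistic, and keep them in the same equivalence class only when the statistic falls below a merge threshold. All names (`g_statistic`, `refine`, the single `merge_threshold`) and the single-representative grouping strategy are assumptions, not the authors' procedure.

```python
import math

def g_statistic(counts_a, counts_b):
    """Log-likelihood-ratio (G) statistic comparing two topic-count vectors
    under the null hypothesis that both words are drawn from the same
    multinomial distribution over topics. (Illustrative choice of test.)"""
    total_a, total_b = sum(counts_a), sum(counts_b)
    g = 0.0
    for ca, cb in zip(counts_a, counts_b):
        pooled = (ca + cb) / (total_a + total_b)  # pooled topic probability
        for c, n in ((ca, total_a), (cb, total_b)):
            if c > 0:  # 0 * log(0) is taken as 0
                g += 2.0 * c * math.log(c / (n * pooled))
    return g

def refine(equiv_class, topic_counts, merge_threshold):
    """Split one stemmer equivalence class: a word joins an existing group
    only if its topic distribution is statistically indistinguishable from
    the group's first member; otherwise it starts a new group."""
    groups = []
    for word in equiv_class:
        for grp in groups:
            if g_statistic(topic_counts[word], topic_counts[grp[0]]) <= merge_threshold:
                grp.append(word)
                break
        else:
            groups.append([word])
    return groups

# Hypothetical topic counts: "run" and "runs" share a distribution,
# "runner" is concentrated on a different topic.
topic_counts = {
    "run":    [40, 5, 5],
    "runs":   [38, 6, 6],
    "runner": [5, 40, 5],
}
print(refine(["run", "runs", "runner"], topic_counts, merge_threshold=6.0))
# → [['run', 'runs'], ['runner']]
```

In this toy example the refiner splits the stemmer's class {run, runs, runner} into {run, runs} and {runner}, mirroring the paper's observation that refinement splits equivalence classes whose members are not distributionally similar.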
CITATION STYLE
Narayan, B. L., & Pal, S. K. (2005). Distribution based stemmer refinement. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3776 LNCS, pp. 672–677). https://doi.org/10.1007/11590316_108