Leveraging inflection tables for stemming and lemmatization

Abstract

We present several methods for stemming and lemmatization based on discriminative string transduction. We exploit the paradigmatic regularity of semi-structured inflection tables to identify stems in an unsupervised manner with over 85% accuracy. Experiments on English, Dutch and German show that our stemmers substantially outperform Snowball and Morfessor, and approach the accuracy of a supervised model. Furthermore, the generated stems are more consistent than those annotated by experts. Our direct lemmatization model is more accurate than Morfette and Lemming on most datasets. Finally, we test our methods on the data from the shared task on morphological reinflection.
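To illustrate what "paradigmatic regularity" of an inflection table can offer, the following is a minimal sketch, not the authors' method: the paper uses discriminative string transduction, whereas this toy heuristic simply takes the longest substring shared by all forms in a table as a candidate stem. The function name `longest_common_substring` and the example table are assumptions made for illustration.

```python
def longest_common_substring(forms):
    """Return the longest substring shared by every form in the list.

    Toy stem heuristic (an assumption for illustration, not the paper's
    model). Ties are broken by the earliest position in the shortest form.
    """
    if not forms:
        return ""
    shortest = min(forms, key=len)
    # Try candidate lengths from longest to shortest.
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in form for form in forms):
                return candidate
    return ""


# Hypothetical inflection table for the regular German verb "machen".
machen_forms = ["machen", "mache", "machst", "macht", "machte", "gemacht"]
print(longest_common_substring(machen_forms))
```

A heuristic like this only works for regular paradigms; forms with stem-internal changes (for example English "sing", "sang", "sung") share little material, which is one reason a learned transduction model is more appropriate.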

Cite

Nicolai, G., & Kondrak, G. (2016). Leveraging inflection tables for stemming and lemmatization. In 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Long Papers (Vol. 2, pp. 1138–1147). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p16-1108
