We present several methods for stemming and lemmatization based on discriminative string transduction. We exploit the paradigmatic regularity of semi-structured inflection tables to identify stems in an unsupervised manner with over 85% accuracy. Experiments on English, Dutch and German show that our stemmers substantially outperform Snowball and Morfessor, and approach the accuracy of a supervised model. Furthermore, the generated stems are more consistent than those annotated by experts. Our direct lemmatization model is more accurate than Morfette and Lemming on most datasets. Finally, we test our methods on the data from the shared task on morphological reinflection.
Nicolai, G., & Kondrak, G. (2016). Leveraging inflection tables for stemming and lemmatization. In 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Long Papers (Vol. 2, pp. 1138–1147). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p16-1108