In this paper we tackle the problem of lemmatization for inflectional languages. We introduce a new algorithm that utilizes vector models of words. Current approaches in this area require either full grammar rules or a translation matrix between a word and its base form. However, this information is already encoded in natural text. Our solution uses text corpora to build vector models of words and a small amount of user input to infer lemmas. We have evaluated our approach on the Slovak language and present findings on its feasibility for real-world use.
CITATION STYLE
Gallay, L., & Šimko, M. (2016). Utilizing vector models for automatic text lemmatization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9587, pp. 532–543). Springer Verlag. https://doi.org/10.1007/978-3-662-49192-8_43