Utilizing vector models for automatic text lemmatization

Ladislav Gallay; Marián Šimko

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016) 9587 532-543

DOI: 10.1007/978-3-662-49192-8_43

Abstract

In this paper we tackle the problem of lemmatization of inflectional languages. We introduce a new algorithm which utilizes vector models of words. Current approaches in this area are limited to knowing either full grammar rules or the translation matrix between the word and its basic form. However, this information is encoded in natural text. Our solution uses text corpora to build vector models of words and a small amount of user input to infer lemmas. We have evaluated our approach on the Slovak language and present interesting findings on its feasibility for real-world utilization.
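The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading: a handful of user-supplied (inflected form, lemma) pairs defines an average offset in the vector space, and that offset is applied to a new word to find its nearest neighbour. The vector-offset technique, the function names `cosine` and `infer_lemma`, and the hand-made four-dimensional toy vectors are all assumptions for illustration; they are not taken from the paper, and real vectors would be trained on a text corpus.

```python
import math

# Hypothetical toy vectors: three "meaning" dimensions plus one "plural" dimension.
# Real word vectors would be learned from a corpus; these are hand-made for illustration.
TOY_VECTORS = {
    "mesto": [1.0, 0.0, 0.0, 0.0],  # city
    "mestá": [1.0, 0.0, 0.0, 1.0],  # cities
    "auto":  [0.0, 1.0, 0.0, 0.0],  # car
    "autá":  [0.0, 1.0, 0.0, 1.0],  # cars
    "slovo": [0.0, 0.0, 1.0, 0.0],  # word
    "slová": [0.0, 0.0, 1.0, 1.0],  # words
}

# The "small amount of user input": known (inflected form, lemma) pairs.
REFERENCE_PAIRS = [("mestá", "mesto"), ("autá", "auto")]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def infer_lemma(word, reference_pairs, vectors):
    """Guess the lemma of `word` by applying the average form-to-lemma offset."""
    dim = len(vectors[word])
    offset = [0.0] * dim
    for form, lemma in reference_pairs:
        for i in range(dim):
            offset[i] += (vectors[lemma][i] - vectors[form][i]) / len(reference_pairs)
    # Shift the query word by the averaged offset.
    target = [v + o for v, o in zip(vectors[word], offset)]
    # Pick the most similar known word, excluding the query itself.
    candidates = (w for w in vectors if w != word)
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(infer_lemma("slová", REFERENCE_PAIRS, TOY_VECTORS))
```

With these toy vectors the reference pairs differ only in the last dimension, so the averaged offset removes the "plural" component of the query word. On corpus-trained vectors the offsets would be noisier, and the choice of reference pairs would likely matter.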

Cite (APA)

Gallay, L., & Šimko, M. (2016). Utilizing vector models for automatic text lemmatization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9587, pp. 532–543). Springer Verlag. https://doi.org/10.1007/978-3-662-49192-8_43
