Modern NLP tasks such as sentiment analysis, semantic analysis, and text entity extraction depend on the quality of the underlying language model. Language structure affects this quality: a model that fits analytic languages well for a given NLP task may fit synthetic languages poorly for the same task. For example, the well-known Word2Vec model [27] shows good results for English, which is predominantly analytic rather than synthetic, but has problems with synthetic languages on some NLP tasks because of their high inflection. Since every morpheme in a synthetic language carries information, we propose a morpheme-level model for solving various NLP tasks; we consider the Russian language in our experiments. First, we describe how to build a morpheme extractor from prepared vocabularies; our extractor reaches 91% accuracy on vocabularies with known morpheme segmentations. Second, we show how it can be applied to NLP tasks, and then discuss our results, their pros and cons, and our future work.
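The core idea can be sketched as follows: segment a word into morphemes, then compose the word's embedding from morpheme vectors, so that inflected forms sharing a stem stay close in the embedding space. The snippet below is a minimal illustration, not the paper's method: the tiny morpheme vocabulary, the greedy longest-match segmenter (standing in for the paper's trained extractor), and the random vectors are all hypothetical.

```python
import numpy as np

# Hypothetical morpheme vocabulary with toy random vectors; in the paper,
# morphemes come from an extractor trained on vocabularies with known
# segmentations (reported at 91% accuracy).
rng = np.random.default_rng(0)
MORPHEME_VECS = {m: rng.normal(size=8) for m in ["при", "ход", "ит", "у", "бег"]}

def segment(word):
    """Toy greedy longest-match segmenter (NOT the paper's extractor)."""
    morphemes, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in MORPHEME_VECS:
                morphemes.append(word[i:j])
                i = j
                break
        else:
            i += 1  # skip characters covered by no known morpheme
    return morphemes

def word_embedding(word):
    """Morpheme-level embedding: average the vectors of the word's morphemes."""
    parts = segment(word)
    if not parts:
        return np.zeros(8)
    return np.mean([MORPHEME_VECS[m] for m in parts], axis=0)

# Two inflected forms sharing the stem "ход" share that embedding component:
v1 = word_embedding("приходит")  # segments as при + ход + ит
v2 = word_embedding("уходит")    # segments as у + ход + ит
```

Because both forms contain the morphemes "ход" and "ит", their embeddings overlap in two of three components, which is exactly the robustness to inflection that a word-level Word2Vec model lacks.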
Galinsky, R., Kovalenko, T., Yakovleva, J., & Filchenkov, A. (2018). Morpheme level word embedding. In Communications in Computer and Information Science (Vol. 789, pp. 143–155). Springer Verlag. https://doi.org/10.1007/978-3-319-71746-3_13