Modern NLP tasks such as sentiment analysis, semantic analysis, and text entity extraction depend on the quality of the underlying language model. Language structure affects this quality: a model that fits analytic languages well for a given NLP task may fit synthetic languages poorly for the same task. For example, the well-known Word2Vec model [27] shows good results for English, which is predominantly analytic rather than synthetic, but has problems with synthetic languages on some NLP tasks because of their high inflection. Since every morpheme in a synthetic language carries information, we propose a morpheme-level model for solving various NLP tasks; we consider the Russian language in our experiments. First, we describe how to build a morpheme extractor from prepared vocabularies; our extractor reaches 91% accuracy on vocabularies with known morpheme segmentations. Second, we show how it can be applied to NLP tasks, and then discuss our results, their pros and cons, and our future work.
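The core idea can be sketched as follows: segment a word into morphemes, then compose the word's embedding from morpheme vectors, so that inflected forms sharing a stem stay close in the embedding space. The snippet below is a minimal illustration, not the paper's method: the tiny morpheme vocabulary, the greedy longest-match segmenter (standing in for the paper's trained extractor), and the random vectors are all hypothetical.

```python
import numpy as np

# Hypothetical morpheme vocabulary with toy random vectors; in the paper,
# morphemes come from an extractor trained on vocabularies with known
# segmentations (reported at 91% accuracy).
rng = np.random.default_rng(0)
MORPHEME_VECS = {m: rng.normal(size=8) for m in ["при", "ход", "ит", "у", "бег"]}

def segment(word):
    """Toy greedy longest-match segmenter (NOT the paper's extractor)."""
    morphemes, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in MORPHEME_VECS:
                morphemes.append(word[i:j])
                i = j
                break
        else:
            i += 1  # skip characters covered by no known morpheme
    return morphemes

def word_embedding(word):
    """Morpheme-level embedding: average the vectors of the word's morphemes."""
    parts = segment(word)
    if not parts:
        return np.zeros(8)
    return np.mean([MORPHEME_VECS[m] for m in parts], axis=0)

# Two inflected forms sharing the stem "ход" share that embedding component:
v1 = word_embedding("приходит")  # segments as при + ход + ит
v2 = word_embedding("уходит")    # segments as у + ход + ит
```

Because both forms contain the morphemes "ход" and "ит", their embeddings overlap in two of three components, which is exactly the robustness to inflection that a word-level Word2Vec model lacks.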
Galinsky, R., Kovalenko, T., Yakovleva, J., & Filchenkov, A. (2018). Morpheme level word embedding. In Communications in Computer and Information Science (Vol. 789, pp. 143–155). Springer Verlag. https://doi.org/10.1007/978-3-319-71746-3_13