Intrinsic evaluation of Lithuanian word embeddings using WordNet


Abstract

Neural network-based word embeddings, which outperform traditional approaches in various Natural Language Processing tasks, have gained a lot of interest recently. Despite this, Lithuanian word embeddings have not previously been obtained and evaluated. Here we used a Lithuanian corpus of ∼234 thousand running words and produced several word embedding models, varying the architecture (continuous bag-of-words and skip-gram), the training algorithm (softmax and negative sampling), and the number of dimensions (100, 300, 500, and 1,000). The word embeddings were evaluated using the Lithuanian WordNet as the resource for synonym search. We found the continuous bag-of-words architecture to be superior to skip-gram, while the training algorithm and dimensionality showed no significant impact on the results. The best results were achieved with continuous bag-of-words, negative sampling, and 1,000 dimensions.
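As a rough illustration of the setup the abstract describes, the sketch below enumerates the grid of model configurations (two architectures, two training algorithms, four dimensionalities) and scores a set of embeddings by synonym search. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the top-k hit-rate metric, the toy vectors, and the toy synonym pairs are all hypothetical, and a real run would use trained embeddings and synonym sets taken from the Lithuanian WordNet.

```python
import math
from itertools import product

# Configuration grid named in the abstract: 2 x 2 x 4 = 16 candidate models.
ARCHITECTURES = ("cbow", "skip-gram")
TRAINING = ("softmax", "negative-sampling")
DIMENSIONS = (100, 300, 500, 1000)
CONFIGS = list(product(ARCHITECTURES, TRAINING, DIMENSIONS))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest(word, embeddings, k=1):
    """Return the k vocabulary words closest to `word` by cosine similarity."""
    target = embeddings[word]
    scored = [
        (cosine(target, vec), other)
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


def synonym_hit_rate(embeddings, synonyms, k=1):
    """Fraction of query words with a gold synonym among their top-k neighbours.

    Assumed metric for illustration; the paper may score synonym search differently.
    """
    queries = [w for w in synonyms if w in embeddings]
    if not queries:
        return 0.0
    hits = sum(1 for w in queries if set(nearest(w, embeddings, k)) & synonyms[w])
    return hits / len(queries)


# Toy 3-dimensional vectors and invented synonym sets (assumptions, not real data).
TOY_EMBEDDINGS = {
    "namas": [0.9, 0.1, 0.0],
    "būstas": [0.8, 0.2, 0.1],
    "upė": [0.0, 0.9, 0.3],
    "srovė": [0.1, 0.8, 0.4],
    "katė": [0.1, 0.2, 0.9],
}
TOY_SYNONYMS = {
    "namas": {"būstas"},
    "upė": {"srovė"},
    "katė": {"kačiukas"},  # gold synonym missing from the vocabulary: counts as a miss
}

if __name__ == "__main__":
    print(len(CONFIGS), "configurations")
    print("hit rate:", synonym_hit_rate(TOY_EMBEDDINGS, TOY_SYNONYMS))
```

In a full comparison, one embedding table would be trained per entry in `CONFIGS` and the same scoring function applied to each, so that architecture, training algorithm, and dimensionality can be compared on equal terms.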

Cite

Kapočiūtė-Dzikienė, J., & Damaševičius, R. (2019). Intrinsic evaluation of Lithuanian word embeddings using WordNet. In Advances in Intelligent Systems and Computing (Vol. 764, pp. 394–404). Springer Verlag. https://doi.org/10.1007/978-3-319-91189-2_39
