Abstract
Neural network-based word embeddings, which outperform traditional approaches in various Natural Language Processing tasks, have gained a lot of interest recently. Despite this, Lithuanian word embeddings have not previously been obtained or evaluated. Here we used a Lithuanian corpus of ∼234 thousand running words and produced several word embedding models, varying the architecture (continuous bag-of-words and skip-gram), the training algorithm (softmax and negative sampling), and the number of dimensions (100, 300, 500, and 1,000). The embeddings were evaluated on synonym search, using the Lithuanian WordNet as the resource. The continuous bag-of-words architecture proved superior to skip-gram, while the training algorithm and dimensionality showed no significant impact on the results. The best results were achieved with continuous bag-of-words, negative sampling, and 1,000 dimensions.
Kapočiūtė-Dzikienė, J., & Damaševičius, R. (2019). Intrinsic evaluation of Lithuanian word embeddings using WordNet. In Advances in Intelligent Systems and Computing (Vol. 764, pp. 394–404). Springer Verlag. https://doi.org/10.1007/978-3-319-91189-2_39