Intrinsic evaluation of Lithuanian word embeddings using WordNet

Jurgita Kapočiūtė-Dzikienė; Robertas Damaševičius

Conference Proceedings

Advances in Intelligent Systems and Computing, vol. 764, pp. 394–404 (2019)

DOI: 10.1007/978-3-319-91189-2_39

Abstract

Neural network-based word embeddings, which outperform traditional approaches in various Natural Language Processing tasks, have gained a lot of interest recently. Despite this, Lithuanian word embeddings have never been obtained and evaluated before. Here we used a Lithuanian corpus of ∼234 thousand running words and produced several word embedding models, varying the architecture (continuous bag-of-words and skip-gram), the training algorithm (softmax and negative sampling), and the number of dimensions (100, 300, 500, and 1,000). The word embeddings were evaluated using the Lithuanian WordNet as the resource for synonym search. We found the continuous bag-of-words architecture to be superior to skip-gram, while the training algorithm and dimensionality showed no significant impact on the results. The best results were achieved with continuous bag-of-words, negative sampling, and 1,000 dimensions.
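The abstract does not spell out the evaluation protocol, so the following is only a minimal sketch of what a WordNet-based synonym search could look like: for each word, rank all other words by cosine similarity and count a hit if a known synonym appears among the top k. The function names, the top-k hit criterion, the toy vectors, and the toy synonym sets are all assumptions made for illustration; none of them come from the paper or from the Lithuanian WordNet.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def synonym_hit_rate(embeddings, synsets, top_k=1):
    """Fraction of words with at least one synonym among their top_k neighbours.

    embeddings: dict mapping word -> vector (hypothetical toy format)
    synsets: iterable of sets of mutually synonymous words
    """
    # Map each word to the set of its synonyms across all synsets.
    synonyms = {}
    for synset in synsets:
        for word in synset:
            synonyms.setdefault(word, set()).update(w for w in synset if w != word)

    hits = total = 0
    for word, gold in synonyms.items():
        if word not in embeddings or not gold:
            continue  # skip words without a vector or without synonyms
        ranked = sorted(
            (other for other in embeddings if other != word),
            key=lambda other: cosine(embeddings[word], embeddings[other]),
            reverse=True,
        )
        total += 1
        if gold & set(ranked[:top_k]):
            hits += 1
    return hits / total if total else 0.0


# Toy data: invented vectors and synonym pairs, for illustration only.
toy_embeddings = {
    "namas": [0.9, 0.1, 0.0],
    "būstas": [0.8, 0.2, 0.1],
    "greitas": [0.1, 0.9, 0.2],
    "spartus": [0.0, 0.1, 0.9],
}
toy_synsets = [{"namas", "būstas"}, {"greitas", "spartus"}]

print(synonym_hit_rate(toy_embeddings, toy_synsets, top_k=1))  # 0.75 on this toy data
```

A score like this could be computed once per trained model (architecture, training algorithm, dimensionality) to compare configurations, although the measure actually reported in the paper may differ.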

Citation (APA)

Kapočiūtė-Dzikienė, J., & Damaševičius, R. (2019). Intrinsic evaluation of Lithuanian word embeddings using WordNet. In Advances in Intelligent Systems and Computing (Vol. 764, pp. 394–404). Springer Verlag. https://doi.org/10.1007/978-3-319-91189-2_39
