Predicting Word Embeddings Variability

Abstract

Neural word embedding models (such as those built with word2vec) are known to have stability problems: when retraining a model with the exact same hyperparameters, word neighborhoods may change. We propose a method to estimate such variation, based on the overlap of neighbors of a given word in two models trained with identical hyperparameters. We show that this inherent variation is not negligible, and that it does not affect every word in the same way. We examine the influence of several features that are intrinsic to a word, corpus or embedding model and provide a methodology that can predict the variability (and as such, reliability) of a word representation in a semantic vector space.
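The overlap-based measure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes two embedding matrices with aligned row indices (one row per word) and compares the top-k cosine-similarity neighbors of a given word across the two models.

```python
import numpy as np

def nearest_neighbors(embeddings, word_index, k):
    """Indices of the k nearest neighbors of a word by cosine similarity."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit[word_index]
    sims[word_index] = -np.inf  # exclude the word itself
    return set(np.argsort(sims)[-k:])

def neighbor_overlap(emb_a, emb_b, word_index, k=25):
    """Fraction of shared nearest neighbors between two models.

    1.0 means the word's neighborhood is identical across retrainings
    (stable); 0.0 means it is entirely different (unstable).
    """
    a = nearest_neighbors(emb_a, word_index, k)
    b = nearest_neighbors(emb_b, word_index, k)
    return len(a & b) / k
```

Averaging this score over many words gives an estimate of a model's overall variability; per-word scores are what the paper's methodology aims to predict from word, corpus, and model features.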

Citation (APA)

Pierrejean, B., & Tanguy, L. (2018). Predicting word embeddings variability. In Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics (*SEM 2018) (pp. 154–159). Association for Computational Linguistics. https://doi.org/10.18653/v1/s18-2019
