Continuous word embedding fusion via spectral decomposition

Abstract

Word embeddings have become a mainstream tool in statistical natural language processing. Practitioners often use pre-trained word vectors, which were trained on large generic text corpora and are readily available on the web. However, pre-trained word vectors often lack important words from specific domains. It is therefore often desirable to extend the vocabulary and embed new words into a set of pre-trained word vectors. In this paper, we present an efficient method for embedding new words from a specialized corpus into pre-trained generic word embeddings. We build on the established view of word embeddings as matrix factorizations to present a spectral algorithm for this task. Experiments on several domain-specific corpora with specialized vocabularies demonstrate that our method embeds the new words efficiently into the original embedding space. Compared to competing methods, our method is faster, parameter-free, and deterministic.
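
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea behind the matrix-factorization view, not the authors' spectral method. It assumes that a word-context association matrix (for example PMI values estimated from the specialized corpus) is approximately the product of word vectors and context vectors, and it fits vectors for the new words by ordinary least squares while keeping the pre-trained vectors fixed. The function name `embed_new_words`, the array names, and the synthetic data are all hypothetical and chosen for illustration.

```python
import numpy as np


def embed_new_words(context_old, pmi_new_old):
    """Fit vectors for new words against fixed pre-trained context vectors.

    Assumption (not taken from the paper): pmi_new_old[i, j] is roughly the
    dot product of new word i's vector with old context vector j.

    context_old: array of shape (n_old, d), pre-trained vectors (kept fixed).
    pmi_new_old: array of shape (n_new, n_old), association scores between
                 each new word and each old word.
    Returns an array of shape (n_new, d).
    """
    # Solve context_old @ X = pmi_new_old.T in the least-squares sense.
    solution, *_ = np.linalg.lstsq(context_old, pmi_new_old.T, rcond=None)
    return solution.T


# Synthetic check: build association scores from known vectors, then recover them.
rng = np.random.default_rng(0)
context_old = rng.normal(size=(50, 8))    # stand-in for pre-trained vectors
true_new = rng.normal(size=(3, 8))        # vectors we pretend are unknown
pmi_new_old = true_new @ context_old.T    # noise-free association scores
recovered = embed_new_words(context_old, pmi_new_old)
```

Like the method described in the abstract, this least-squares fit is deterministic and has no tuning parameters, but it is a simplification: real association matrices are noisy and sparse, and the paper's spectral algorithm may treat them quite differently.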

Cite

Fu, T., Zhang, C., & Mandt, S. (2018). Continuous word embedding fusion via spectral decomposition. In CoNLL 2018 - 22nd Conference on Computational Natural Language Learning, Proceedings (pp. 11–20). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/k18-1002