Continuous word embedding fusion via spectral decomposition

Tianfan Fu; Cheng Zhang; Stephan Mandt

Conference Proceedings (Open Access)

CoNLL 2018 - 22nd Conference on Computational Natural Language Learning, Proceedings (2018) 11-20

DOI: 10.18653/v1/k18-1002

Abstract

Word embeddings have become a mainstream tool in statistical natural language processing. Practitioners often use pre-trained word vectors, which were trained on large generic text corpora and are readily available on the web. However, pre-trained word vectors often lack important words from specific domains. It is therefore often desirable to extend the vocabulary and embed new words into a set of pre-trained word vectors. In this paper, we present an efficient method for incorporating new words from a specialized corpus into pre-trained generic word embeddings. We build on the established view of word embeddings as matrix factorizations to present a spectral algorithm for this task. Experiments on several domain-specific corpora with specialized vocabularies demonstrate that our method embeds the new words efficiently into the original embedding space. Compared to competing methods, our method is faster, parameter-free, and deterministic.

Cite (APA)

Fu, T., Zhang, C., & Mandt, S. (2018). Continuous word embedding fusion via spectral decomposition. In CoNLL 2018 - 22nd Conference on Computational Natural Language Learning, Proceedings (pp. 11–20). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/k18-1002
