Conventional word embeddings are trained with specific criteria (e.g., based on language modeling or co-occurrence) within a single information source, overlooking the opportunity to further calibrate them with external knowledge. This paper presents a unified framework that leverages pre-learned or external priors, in the form of a regularizer, to enhance conventional language model-based embedding learning. We consider two types of regularizers. The first type is derived from topic distributions obtained by running latent Dirichlet allocation on unlabeled data. The second type is based on dictionaries created through human annotation. To learn effectively with these regularizers, we propose a novel data structure, trajectory softmax. The resulting embeddings are evaluated on word similarity and sentiment classification tasks. Experimental results show that our learning framework with regularization from prior knowledge improves embedding quality across multiple datasets, compared with a diverse collection of baseline methods.
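To make the idea of regularization from prior knowledge concrete, below is a minimal sketch (not the paper's actual objective or code) of how an external resource can be folded into embedding learning as a regularizer: pairs of words deemed related by a prior (e.g., sharing a dominant LDA topic or linked in a dictionary) are pulled together by an L2 penalty whose gradient would be added to the gradient of the usual language-model loss. All names, values, and the pairwise form of the penalty are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["good", "great", "bad", "terrible", "movie"]
dim = 8
E = rng.normal(scale=0.1, size=(len(vocab), dim))  # word embedding matrix

# Prior knowledge expressed as index pairs of words assumed to be related,
# e.g., drawn from a dictionary or from shared LDA topics (hypothetical here).
related_pairs = [(0, 1), (2, 3)]  # (good, great), (bad, terrible)

def regularizer(E, pairs):
    """Sum of squared distances between embeddings of related word pairs."""
    return sum(np.sum((E[i] - E[j]) ** 2) for i, j in pairs)

def regularizer_grad(E, pairs):
    """Gradient of the pairwise L2 regularizer with respect to E."""
    g = np.zeros_like(E)
    for i, j in pairs:
        diff = E[i] - E[j]
        g[i] += 2 * diff
        g[j] -= 2 * diff
    return g

lam, lr = 0.1, 0.05
for step in range(200):
    # In a full system this term would be combined with the gradient of the
    # language-model (e.g., skip-gram) loss; only the regularizer is shown.
    E -= lr * lam * regularizer_grad(E, related_pairs)

print("regularizer value after training:", round(regularizer(E, related_pairs), 4))
```

The penalty shrinks toward zero as related words converge in the embedding space, which is the behavior a prior-knowledge regularizer is meant to encourage; the paper's actual framework and its trajectory softmax structure are described in the reference below.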
CITATION STYLE
Song, Y., Lee, C. J., & Xia, F. (2017). Learning word representations with regularization from prior knowledge. In CoNLL 2017 - 21st Conference on Computational Natural Language Learning, Proceedings (pp. 143–152). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/k17-1016