A semantic relation preserved word embedding reuse method

Abstract

When deep learning is applied to natural language processing, a word embedding layer can significantly improve task performance because word vectors encode semantic information. Word embeddings can be optimized end-to-end together with the rest of the model. However, given the number of parameters in a word embedding layer, models trained on a small corpus easily overfit the training set. To mitigate this, pretrained embeddings learned from a much larger corpus are commonly used to boost the performance of the current model. This paper summarizes several methods for reusing pretrained word embeddings. Moreover, as corpus topics change, new words appear in a given task, and their embeddings cannot be found among the pretrained vectors. We therefore propose a semantic relation preserved word embedding reuse method: it first learns word relations from the current corpus, and then uses the pretrained word embeddings to generate embeddings for the newly observed words. Experimental results verify the effectiveness of the proposed method.
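The abstract does not detail the algorithm, so the following is only a minimal Python sketch of the general idea: word relations are approximated here by windowed co-occurrence counts in the task corpus, and each newly observed word receives the count-weighted average of the pretrained vectors of its in-vocabulary neighbors. The function names (cooccurrence_relations, embed_new_words) and the co-occurrence relation model are illustrative assumptions, not the paper's actual formulation.

    from collections import Counter, defaultdict
    import numpy as np

    def cooccurrence_relations(corpus, window=2):
        # Count symmetric co-occurrences within a small window; these
        # counts stand in for the word relations learned from the corpus.
        rel = defaultdict(Counter)
        for sentence in corpus:
            for i, word in enumerate(sentence):
                lo = max(0, i - window)
                hi = min(len(sentence), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        rel[word][sentence[j]] += 1
        return rel

    def embed_new_words(corpus, pretrained, dim, window=2):
        # For every word absent from the pretrained vocabulary, build a
        # vector as the relation-weighted average of its neighbors'
        # pretrained vectors, so corpus-level relations are preserved.
        relations = cooccurrence_relations(corpus, window)
        embeddings = dict(pretrained)
        for word, neighbors in relations.items():
            if word in embeddings:
                continue
            known = [(n, c) for n, c in neighbors.items() if n in pretrained]
            if not known:
                # No neighbor has a pretrained vector: neutral fallback.
                embeddings[word] = np.zeros(dim)
                continue
            total = sum(c for _, c in known)
            embeddings[word] = sum((c / total) * pretrained[n] for n, c in known)
        return embeddings

    # Example: "bert" is missing from the pretrained vectors, so it gets a
    # vector composed from its observed neighbors "the", "model", "works".
    rng = np.random.default_rng(0)
    pretrained = {w: rng.standard_normal(50) for w in ("the", "model", "works")}
    corpus = [["the", "bert", "model", "works"]]
    vectors = embed_new_words(corpus, pretrained, dim=50)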

Citation (APA)

Li, X., & Zhan, D. (2020). A semantic relation preserved word embedding reuse method. Scientia Sinica Informationis, 50(6), 813–823. https://doi.org/10.1360/SSI-2019-0284
