Effective word-vector training yields semantically rich word vectors and improves performance on downstream tasks. To address the shortcomings of the traditional skip-gram model in encoding and modeling context words, this study proposes an improved word-vector training method based on the skip-gram algorithm. Building on an analysis of the existing skip-gram model, the distributional hypothesis is introduced: each word is represented by the distribution of the words in its context, embedded into the semantic space, and then modeled, which improves modeling through smoothing over words and their semantic space. During training, stochastic gradient descent is used to solve for the vector representation of each word and each Chinese character. In the experiments, the proposed method is compared with skip-gram, CWE+P, and SEING on a word-similarity task and a text-classification task. Experimental results show that the proposed method has a significant advantage on the Chinese word segmentation task, with a performance gain of about 30%. The method proposed in this study provides a reference for further in-depth study of word vectors and text mining.
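To make the baseline concrete, the sketch below trains classic skip-gram vectors with negative sampling and stochastic gradient descent, the optimizer named in the abstract. The corpus, vector dimension, learning rate, window size, and negative-sample count are all illustrative assumptions; the paper's improved method, which additionally represents each word by the distribution of its context words, is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus and vocabulary (illustrative, not from the paper).
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, dim, lr, window, negatives = len(vocab), 16, 0.05, 2, 3

W_in = rng.normal(0, 0.1, (V, dim))   # target-word ("input") vectors
W_out = rng.normal(0, 0.1, (V, dim))  # context-word ("output") vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_epoch():
    """One SGD pass over the corpus; returns the logistic loss."""
    loss = 0.0
    for pos, word in enumerate(corpus):
        t = idx[word]
        for off in range(-window, window + 1):
            c_pos = pos + off
            if off == 0 or c_pos < 0 or c_pos >= len(corpus):
                continue
            c = idx[corpus[c_pos]]
            # One positive pair plus a few uniformly drawn negatives.
            pairs = [(1, c)] + [(0, int(rng.integers(V))) for _ in range(negatives)]
            for label, o in pairs:
                score = sigmoid(W_in[t] @ W_out[o])
                grad = score - label  # dL/dscore for the logistic loss
                loss -= np.log(score + 1e-10) if label else np.log(1 - score + 1e-10)
                v_in = W_in[t].copy()           # snapshot before updating
                W_in[t] -= lr * grad * W_out[o]  # SGD update, target vector
                W_out[o] -= lr * grad * v_in     # SGD update, context vector
    return loss

first_loss = train_epoch()
last_loss = first_loss
for _ in range(30):
    last_loss = train_epoch()
```

After training, `W_in` holds the learned word vectors; the paper's variant would replace the one-hot target lookup with a context-distribution representation before this SGD step.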
Tang, Y. (2022). Research on Word Vector Training Method Based on Improved Skip-Gram Algorithm. Advances in Multimedia, 2022. https://doi.org/10.1155/2022/4414207