Text classification research based on improved word2vec and CNN

Mengyuan Gao; Tinghui Li; Peifang Huang

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2019) 11434 LNCS 126-135

DOI: 10.1007/978-3-030-17642-6_11


Abstract

Traditional classification algorithms often suffer from high feature dimensionality and data sparseness when classifying short texts. This paper proposes a text feature representation, a matrix model, that combines the neural network language model word2vec with the document topic model Latent Dirichlet Allocation (LDA). The matrix model not only represents the semantic features of words effectively but also conveys contextual features, which enhances the feature expression ability of the model. The feature matrix is fed into a convolutional neural network (CNN) for convolution and pooling, and text classification experiments are performed. The experimental results show that the proposed matrix model classifies better than traditional text classification methods based on word2vec and CNN: accuracy, recall, and F1 increased by 8.4%, 8.9%, and 8.6%, respectively.
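The abstract does not say how the word2vec and LDA features are combined, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes each word contributes one row formed by concatenating its embedding with a per-word topic vector, and that a single convolution filter followed by max pooling over positions stands in for the CNN stage. The names `build_feature_matrix` and `conv_max_pool`, and the random stand-in vectors, are hypothetical.

```python
import random


def build_feature_matrix(word_vecs, topic_vecs):
    """Build one row per word: embedding followed by topic vector.

    Concatenation is an assumed combination rule; the abstract does not
    specify how the two feature sources are merged.
    """
    if len(word_vecs) != len(topic_vecs):
        raise ValueError("need exactly one topic vector per word")
    return [list(w) + list(t) for w, t in zip(word_vecs, topic_vecs)]


def conv_max_pool(matrix, kernel, bias=0.0):
    """Slide one filter over windows of consecutive rows, apply ReLU,
    then keep the largest activation (max pooling over positions)."""
    height = len(kernel)
    if len(matrix) < height:
        raise ValueError("text is shorter than the filter height")
    activations = []
    for start in range(len(matrix) - height + 1):
        total = bias
        for row, kernel_row in zip(matrix[start:start + height], kernel):
            total += sum(x * k for x, k in zip(row, kernel_row))
        activations.append(max(0.0, total))  # ReLU
    return max(activations)


# Random stand-ins for trained word2vec embeddings and LDA topic vectors.
rng = random.Random(0)
n_words, embed_dim, n_topics = 6, 4, 3

word_vecs = [[rng.uniform(-1, 1) for _ in range(embed_dim)]
             for _ in range(n_words)]

topic_vecs = []
for _ in range(n_words):
    raw = [rng.random() for _ in range(n_topics)]
    norm = sum(raw)
    topic_vecs.append([r / norm for r in raw])  # each row sums to 1

features = build_feature_matrix(word_vecs, topic_vecs)

# One filter spanning 3 consecutive words and the full feature width.
kernel = [[rng.uniform(-1, 1) for _ in range(embed_dim + n_topics)]
          for _ in range(3)]
pooled = conv_max_pool(features, kernel)
```

In a real pipeline the stand-in vectors would come from trained word2vec and LDA models, and many filters of several heights would each yield one pooled value, which together form the input to a classifier layer.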


Citation (APA)

Gao, M., Li, T., & Huang, P. (2019). Text classification research based on improved word2vec and CNN. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11434 LNCS, pp. 126–135). Springer Verlag. https://doi.org/10.1007/978-3-030-17642-6_11
