Traditional classification algorithms often suffer from high feature dimensionality and data sparseness when applied to short texts. This paper proposes a text feature representation matrix model that combines the neural network language model word2vec with the document topic model Latent Dirichlet Allocation (LDA). The matrix model not only represents the semantic features of words effectively but also conveys context features, which enhances the feature expression ability of the model. The feature matrix is input into a convolutional neural network (CNN) for convolution and pooling, and text classification experiments are performed. The experimental results show that the proposed matrix model classifies better than the traditional text classification method based on word2vec and CNN: on the three evaluation indicators of accuracy, recall, and F1, it improves by 8.4%, 8.9%, and 8.6%, respectively.
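The idea of a combined feature matrix can be illustrated with a minimal sketch. The abstract does not say how the word2vec and LDA features are merged, so the per-word concatenation below is an assumption, not the authors' method. The vocabulary, the dimensions, and the helper names `feature_matrix` and `conv_max_pool` are hypothetical, and random vectors stand in for trained word2vec embeddings and LDA topic distributions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup; none of these values come from the paper.
vocab = ["cheap", "flight", "to", "paris"]
EMBED_DIM = 8    # assumed word2vec vector size
NUM_TOPICS = 3   # assumed number of LDA topics

# Stand-ins for trained models: random word vectors and random
# per-word topic distributions (each distribution sums to 1).
word_vectors = {w: rng.normal(size=EMBED_DIM) for w in vocab}
word_topics = {w: rng.dirichlet(np.ones(NUM_TOPICS)) for w in vocab}


def feature_matrix(tokens):
    """Build one row per token: [word vector | topic distribution].

    Concatenation is an assumed way of combining the two feature types.
    """
    rows = [np.concatenate([word_vectors[t], word_topics[t]]) for t in tokens]
    return np.stack(rows)


def conv_max_pool(matrix, kernel):
    """Slide one full-width kernel over the rows, apply ReLU, then max-pool."""
    height = kernel.shape[0]
    steps = matrix.shape[0] - height + 1
    responses = np.array(
        [np.sum(matrix[i:i + height] * kernel) for i in range(steps)]
    )
    return float(np.max(np.maximum(responses, 0.0)))


matrix = feature_matrix(vocab)                       # shape: (4, 11)
kernel = rng.normal(size=(2, EMBED_DIM + NUM_TOPICS))  # window of 2 words
pooled = conv_max_pool(matrix, kernel)               # one scalar feature
```

A real CNN classifier would apply many such kernels of several window heights and pass the pooled values to a fully connected layer; this sketch shows only a single filter to keep the data flow visible.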
CITATION STYLE
Gao, M., Li, T., & Huang, P. (2019). Text classification research based on improved word2vec and CNN. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11434 LNCS, pp. 126–135). Springer Verlag. https://doi.org/10.1007/978-3-030-17642-6_11