Chinese text classification based on character-level CNN and SVM

Abstract

With the rapid growth of the Internet, the volume of high-dimensional text data has increased rapidly. Building an efficient and extensible text classification algorithm has become a hot topic in data mining. To address the high feature dimensionality, data sparsity, and long computation time of the traditional SVM classification algorithm based on TF-IDF (Term Frequency-Inverse Document Frequency), we propose CSVM, a novel hybrid system for Chinese text classification that does not depend on hand-designed features or domain knowledge. First, words are encoded by constructing a text vocabulary of size m for the input language and quantizing each word with 1-of-m encoding. Second, we use a CNN (Convolutional Neural Network) to extract morphological features from the character vectors of each word, and then obtain the semantic features of each word vector by training on large-scale text material. Finally, text classification is carried out with a multi-class SVM classifier. Tested on a text dataset with 10 categories, the experimental results show that the CSVM algorithm is more effective than other traditional Chinese text classification algorithms.
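The first two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each Chinese character is treated as one vocabulary symbol, uses a tiny made-up corpus, and applies a single hand-set convolution filter with max-over-time pooling where a real CNN would learn many filters. The function names (`build_vocab`, `one_of_m`, `conv_max_pool`) are hypothetical, and the SVM stage is omitted.

```python
from typing import Dict, List


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Map each distinct character to an index, in first-seen order."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for ch in text:
            if ch not in vocab:
                vocab[ch] = len(vocab)
    return vocab


def one_of_m(text: str, vocab: Dict[str, int]) -> List[List[int]]:
    """Encode text as a sequence of 1-of-m vectors (m = vocabulary size).

    Characters missing from the vocabulary become all-zero vectors.
    """
    m = len(vocab)
    rows: List[List[int]] = []
    for ch in text:
        row = [0] * m
        idx = vocab.get(ch)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


def conv_max_pool(encoded: List[List[int]], kernel: List[List[float]]) -> float:
    """Slide a width-k kernel over the sequence and return the largest response."""
    k = len(kernel)
    responses = [
        sum(
            kernel[j][c] * encoded[start + j][c]
            for j in range(k)
            for c in range(len(kernel[j]))
        )
        for start in range(len(encoded) - k + 1)
    ]
    return max(responses) if responses else 0.0


# Toy corpus (an assumption for illustration only).
corpus = ["体育新闻", "财经新闻"]
vocab = build_vocab(corpus)          # m = 6 distinct characters
encoded = one_of_m("体育新闻", vocab)  # 4 vectors of length 6

# Hand-set filter of width 2 that responds to the bigram "新闻".
kernel = [[0.0] * len(vocab) for _ in range(2)]
kernel[0][vocab["新"]] = 1.0
kernel[1][vocab["闻"]] = 1.0

feature = conv_max_pool(encoded, kernel)
```

In a full pipeline of the kind the abstract outlines, the pooled responses of many learned filters would form the feature vector handed to the multi-class SVM in place of TF-IDF features.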

Cite

Wu, H., Li, D., & Cheng, M. (2019). Chinese text classification based on character-level CNN and SVM. In Communications in Computer and Information Science (Vol. 986, pp. 227–238). Springer Verlag. https://doi.org/10.1007/978-981-13-6473-0_20
