Chinese text classification based on character-level CNN and SVM

Huaiguang Wu; Daiyi Li; Ming Cheng

Conference Proceedings

Communications in Computer and Information Science (2019) 986 227-238

DOI: 10.1007/978-981-13-6473-0_20

Abstract

With the rapid development of the Internet, the volume of high-dimensional text data has grown rapidly. How to build an efficient and extensible text classification algorithm has become a hot topic in the field of data mining. To address the high feature dimension, sparse data, and long computation time of the traditional SVM classification algorithm based on TF-IDF (Term Frequency-Inverse Document Frequency), we propose a novel hybrid system for Chinese text classification, CSVM, which is independent of hand-designed features and domain knowledge. First, the text is encoded by constructing a vocabulary of size m for the input language and quantizing each word using 1-of-m encoding. Second, we use a CNN (Convolutional Neural Network) to extract morphological features from the character vectors of each word, and the semantic features of each word vector are then obtained by training on large-scale text material. Finally, text classification is carried out with a multi-class SVM classifier. Tested on a text dataset with 10 categories, the CSVM algorithm is shown experimentally to be more effective than other traditional Chinese text classification algorithms.
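The 1-of-m encoding step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes that individual characters are the vocabulary units (in line with the character-level framing of the title), and the function names `build_vocab` and `one_hot_encode`, the zero-row padding, and the two-string toy corpus are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Map each distinct character to an index, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for ch in text:
            if ch not in vocab:
                vocab[ch] = len(vocab)
    return vocab


def one_hot_encode(text: str, vocab: Dict[str, int], max_len: int) -> List[List[int]]:
    """Return a max_len x m matrix whose rows are 1-of-m vectors.

    Assumption of this sketch: characters missing from the vocabulary
    and padding positions are encoded as all-zero rows.
    """
    m = len(vocab)
    matrix: List[List[int]] = []
    for ch in text[:max_len]:  # truncate texts longer than max_len
        row = [0] * m
        idx = vocab.get(ch)
        if idx is not None:
            row[idx] = 1
        matrix.append(row)
    # Pad short texts with zero rows so every input has the same shape.
    while len(matrix) < max_len:
        matrix.append([0] * m)
    return matrix


# Toy corpus (hypothetical): "text classification", "classification algorithm".
corpus = ["文本分类", "分类算法"]
vocab = build_vocab(corpus)
encoded = one_hot_encode("文本分类", vocab, max_len=6)
```

A fixed-size matrix of this kind is the sort of input a convolutional layer could consume; the features it produces could then be passed to a multi-class SVM, as the abstract outlines.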

Cite

Wu, H., Li, D., & Cheng, M. (2019). Chinese text classification based on character-level CNN and SVM. In Communications in Computer and Information Science (Vol. 986, pp. 227–238). Springer Verlag. https://doi.org/10.1007/978-981-13-6473-0_20