Semantic and Morphological Information Guided Chinese Text Classification

Abstract

Recently proposed models such as BERT perform well on many text processing tasks. By using deep architectures and large amounts of text, they learn context-sensitive features that are useful for word sense disambiguation. For Chinese text classification, however, the majority of datasets are crawled from social networking sites, and these datasets are semantically complex and variable. How much data is needed to pre-train such models so that they grasp semantic features and understand context remains an open question. In this paper, we propose a novel shallow language model that uses sememe information to guide the model toward semantic information without a large amount of pre-training data. Furthermore, to make Chinese as easy to initialize as English, we apply convolutional neural networks over Chinese strokes to obtain a structure-based initialization of Chinese characters for our model. The model is pre-trained on part of the Chinese Wikipedia dataset, and we use the Chinese character representations it generates for text classification. Experiments on text classification datasets show that our model outperforms other state-of-the-art models by a large margin. Our model is also more interpretable, owing to the semantic and morphological information it incorporates.
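The stroke-level convolution mentioned in the abstract can be pictured with a small sketch. The abstract gives no architectural details, so everything below is an assumption for illustration only: the five-class stroke inventory, the helper names `init_params` and `char_vector`, the filter width, the dimensions, and the random (untrained) weights are all hypothetical and are not the authors' implementation. The sketch embeds each stroke, slides 1D filters over the stroke sequence, applies ReLU, and max-pools over positions to produce one fixed-size vector per character.

```python
import random

# Assumed five-class stroke inventory (horizontal, vertical, left-falling,
# right-falling, turning). The paper may use a different inventory.
STROKE_IDS = {"heng": 0, "shu": 1, "pie": 2, "na": 3, "zhe": 4}


def init_params(n_strokes, emb_dim, n_filters, width, seed=0):
    """Create random stroke embeddings and 1D convolution filters (untrained)."""
    rng = random.Random(seed)
    emb = [[rng.uniform(-0.1, 0.1) for _ in range(emb_dim)]
           for _ in range(n_strokes)]
    filters = [[[rng.uniform(-0.1, 0.1) for _ in range(emb_dim)]
                for _ in range(width)]
               for _ in range(n_filters)]
    return emb, filters


def char_vector(strokes, emb, filters):
    """Map a stroke sequence to a fixed-size vector: conv -> ReLU -> max-pool."""
    width = len(filters[0])
    emb_dim = len(emb[0])
    seq = [emb[STROKE_IDS[s]] for s in strokes]
    # Zero-pad short characters so at least one convolution window exists.
    while len(seq) < width:
        seq.append([0.0] * emb_dim)
    out = []
    for f in filters:
        best = 0.0  # ReLU output is never below zero
        for start in range(len(seq) - width + 1):
            act = sum(f[k][d] * seq[start + k][d]
                      for k in range(width) for d in range(emb_dim))
            best = max(best, act)
        out.append(best)
    return out


emb, filters = init_params(len(STROKE_IDS), emb_dim=8, n_filters=16, width=3)
da = char_vector(["heng", "pie", "na"], emb, filters)         # strokes of 大
mu = char_vector(["heng", "shu", "pie", "na"], emb, filters)  # strokes of 木
print(len(da), len(mu))
```

Because the pooling step collapses the position axis, characters with different stroke counts map to vectors of the same length, which is what lets such vectors serve as an initialization for a downstream model. In practice the embeddings and filters would be learned rather than sampled at random.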

Cite

Song, J., Xu, Q., Liu, W., Zu, Y., & Chen, M. (2020). Semantic and Morphological Information Guided Chinese Text Classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11962 LNCS, pp. 14–26). Springer. https://doi.org/10.1007/978-3-030-37734-2_2
