A Long-Text Classification Method of Chinese News Based on BERT and CNN

Xinying Chen; Peimin Cong; Shuo Lv

Journal Article (Open Access)

IEEE Access (2022), vol. 10, pp. 34046–34057

DOI: 10.1109/ACCESS.2022.3162614

126 Citations

89 Readers

Abstract

Text classification is an important research area in natural language processing (NLP) that has received considerable scholarly attention in recent years. However, real Chinese online news is characterized by long text, a large amount of information, and complex structure, all of which reduce the accuracy of Chinese long-text classification. To improve the accuracy of long-text classification of Chinese news, we propose a BERT-based local feature convolutional network (LFCN) model comprising four novel modules. First, to address the limit that Bidirectional Encoder Representations from Transformers (BERT) places on the maximum input sequence length, we propose a method named Dynamic LEAD-n (DLn), based on the traditional LEAD digest algorithm, to extract short texts from the long text. Second, in the Text-Text Encoder (TTE) module, we use the BERT pretrained language model to produce sentence-level feature vector representations of a news text and to capture global features, using the attention mechanism to identify correlated words in the text. Third, we propose a CNN-based local feature convolution (LFC) module to capture local features in the text, such as key phrases. Finally, the feature vectors generated by the different operations over several different periods are fused and used to predict the category of a news text. Experimental results show that the new method further improves the accuracy of long-text classification of Chinese news.
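As a rough illustration of the LEAD idea that the abstract builds on, the sketch below keeps the leading sentences of a document until a length budget is reached. It is not the authors' Dynamic LEAD-n method, whose details the abstract does not give. The function names `split_sentences` and `lead_extract`, the 510-character budget (BERT's 512-position limit minus two special tokens), and the use of character count as a stand-in for token count are all assumptions made for this example.

```python
import re
from typing import List

# Assumed budget: 512 positions minus [CLS] and [SEP]. Character count is
# only a rough proxy for the token count of Chinese text.
MAX_CHARS = 510


def split_sentences(text: str) -> List[str]:
    """Split after Chinese or Western sentence-ending marks, keeping the marks."""
    parts = re.split(r"(?<=[。！？!?])", text)
    return [p.strip() for p in parts if p.strip()]


def lead_extract(text: str, max_chars: int = MAX_CHARS) -> str:
    """Keep whole leading sentences while the total stays within max_chars."""
    kept: List[str] = []
    used = 0
    for sentence in split_sentences(text):
        if used + len(sentence) > max_chars:
            break
        kept.append(sentence)
        used += len(sentence)
    # Hypothetical fallback: hard-cut when even the first sentence is too long.
    return "".join(kept) if kept else text[:max_chars]
```

Sentences are joined without a separator, which suits Chinese text; text in a language that separates sentences with spaces would need the spaces restored.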

Cite

Chen, X., Cong, P., & Lv, S. (2022). A Long-Text Classification Method of Chinese News Based on BERT and CNN. IEEE Access, 10, 34046–34057. https://doi.org/10.1109/ACCESS.2022.3162614
