A Method of Sustainable Development for Three Chinese Short-Text Datasets Based on BERT-CAM

4Citations
Citations of this article
12Readers
Mendeley users who have this article in their library.

Abstract

Considering the low accuracy of current short text classification (TC) methods and the difficulties they have with effective emotion prediction, a sustainable short TC (S-TC) method using deep learning (DL) in big data environments is proposed. First, the text is vectorized by introducing a BERT pre-training model. When processing language tasks, the TC accuracy is improved by removing a word from the text and using the information from previous words and the next words to predict. Then, a convolutional attention mechanism (CAM) model is proposed using a convolutional neural network (CNN) to capture feature interactions in the time dimension and using multiple convolutional kernels to obtain more comprehensive feature information. CAM can improve TC accuracy. Finally, by optimizing and merging bidirectional encoder representation from the transformers (BERT) pre-training model and CAM model, a corresponding BERT-CAM classification model for S-TC is proposed. Through simulation experiments, the proposed S-TC method and the other three methods are compared and analyzed using three datasets. The results show that the accuracy, precision, recall, F1 value, Ma_F and Mi_F are the largest, reaching 94.28%, 86.36%, 84.95%, 85.96%, 86.34% and 86.56, respectively. The algorithm’s performance is better than that of the other three comparison algorithms.

Author supplied keywords

Cite

CITATION STYLE

APA

Pan, L., Lim, W. H., & Gan, Y. (2023). A Method of Sustainable Development for Three Chinese Short-Text Datasets Based on BERT-CAM. Electronics (Switzerland), 12(7). https://doi.org/10.3390/electronics12071531

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free