A comparative study on the classification performance of machine learning models for academic full texts

Abstract

[Objectives] This study compares the classification performance of traditional machine learning models and deep learning models on academic full texts. It addresses the problem of missing chapter structure category information in academic literature, with the aim of supporting retrieval of content from a specified chapter structure and the automatic extraction of text for customized services. [Methodology] A total of 31,888 academic articles from the journal “PLOS ONE” were selected. After data cleaning and segmentation, a text classification corpus with 313,952 chapter structure category annotations was constructed. Chapter structure division experiments were carried out with 17 machine learning models in total: the traditional models NB, SVM, and CRF, and the deep learning RNN, Bi-LSTM, IDCNN, and BERT model groups. [Results] The BERT-Bi-LSTM-CRF model achieved the best classification performance, with an average F value of 71.18%, which is 0.51% and 3.31% higher than the second-ranked CRF and the third-ranked Bi-LSTM-CRF, respectively. For the deep learning models, using BERT for text representation performed better than word2vec, and adding the Attention mechanism and replacing the Softmax layer with a CRF layer both improved classification results. In addition, an online Chapter Structure Recognition Presentation and Application Platform was developed. It visually displays the overall research and the model training process, and it offers machine learning and deep learning models such as NB, SVM, CRF, Bi-LSTM, and IDCNN for online chapter structure recognition.
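
As an illustration of the metric reported above, the following minimal sketch computes a per-class F value and its unweighted (macro) average for a multi-class labeling task. It rests on assumptions: the abstract does not state how the average F value is computed, so macro averaging is only one plausible reading, and the category names and toy labels below are hypothetical and not taken from the paper's corpus.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Return (per-class F1 dict, macro-averaged F1) for two equal-length label lists."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # predicted class p wrongly
            fn[g] += 1   # missed an instance of class g

    per_class = {}
    for label in sorted(set(gold) | set(pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        per_class[label] = 2 * precision * recall / denom if denom else 0.0

    # Macro average: every class counts equally, regardless of its frequency.
    macro = sum(per_class.values()) / len(per_class)
    return per_class, macro


# Hypothetical chapter structure categories and toy predictions (not from the paper).
gold = ["Introduction", "Methods", "Results", "Discussion", "Introduction", "Methods"]
pred = ["Introduction", "Methods", "Results", "Results", "Methods", "Methods"]

per_class, macro = macro_f1(gold, pred)
for label, score in per_class.items():
    print(f"{label}: {score:.4f}")
print(f"macro F1: {macro:.4f}")
```

Macro averaging gives rare categories the same weight as frequent ones. A frequency-weighted or micro average would be an equally valid choice and can yield a different number on imbalanced data.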

Cite

Hu, H., Deng, S., Lu, H., & Wang, D. (2020). A comparative study on the classification performance of machine learning models for academic full texts. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12051 LNCS, pp. 713–737). Springer. https://doi.org/10.1007/978-3-030-43687-2_61
