A comparative study on the classification performance of machine learning models for academic full texts

Haotian Hu; Sanhong Deng; Haoxiang Lu; Dongbo Wang

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2020) 12051 LNCS 713-737

DOI: 10.1007/978-3-030-43687-2_61

Abstract

[Objectives] This study compares the classification performance of various machine learning models, contrasting traditional machine learning models with deep learning models. The goal is to supply the chapter-structure category information that academic literature often lacks, so that content in a specified chapter can be retrieved and specific text can be automatically extracted to build customized text services.

[Methodology] A total of 31,888 academic articles from the journal "PLOS ONE" were selected. After data cleaning and segmentation, a text classification corpus of 313,952 items labeled with chapter-structure categories was constructed. Seventeen machine learning models were used in the chapter-structure classification experiments: the traditional models NB, SVM, and CRF, and the deep learning RNN, Bi-LSTM, IDCNN, and BERT model groups.

[Results] The BERT-Bi-LSTM-CRF model achieved the best classification performance, with an average F value of 71.18%, which is 0.51% and 3.31% higher than the second-ranked CRF and the third-ranked Bi-LSTM-CRF, respectively. For the deep learning models, text representations from BERT outperformed those from word2vec. Adding an attention mechanism and replacing the Softmax layer with a CRF layer also improved classification results. In addition, an online Chapter Structure Recognition Presentation and Application Platform was developed. It visually presents an overview of the research and the model training process, and implements machine learning and deep learning models such as NB, SVM, CRF, Bi-LSTM, and IDCNN, which can be applied online to recognize chapter structure.
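As a rough illustration of the chapter-structure classification task, the sketch below trains a multinomial Naive Bayes classifier (the simplest of the traditional models named in the abstract) on a few text segments and scores predictions with a macro-averaged F value. It is not the authors' implementation. The category labels, toy sentences, whitespace tokenizer, and Laplace smoothing are illustrative assumptions, as is reading "average F value" as a macro average over categories.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real pipeline would use a proper tokenizer.
    return text.lower().split()


class MultinomialNB:
    """Minimal multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.n_docs = len(labels)
        # Per-category token counts
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, text):
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods of the tokens
            score = math.log(self.class_counts[c] / self.n_docs)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # tokens never seen in training are ignored
                score += math.log((self.word_counts[c][tok] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_label, best_score = c, score
        return best_label

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def macro_f1(gold, pred):
    """Unweighted mean of per-category F1 scores (assumed meaning of 'average F value')."""
    labels = sorted(set(gold) | set(pred))
    f1s = []
    for c in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)


# Hypothetical toy training data: text segments labeled with chapter categories
TRAIN_TEXTS = [
    "we propose a new approach to this problem",
    "previous studies have not addressed this gap",
    "samples were collected and analyzed using regression",
    "participants were recruited and data were collected",
    "the results show a significant increase",
    "table 2 shows the accuracy of each model",
]
TRAIN_LABELS = ["introduction", "introduction", "methods", "methods", "results", "results"]

if __name__ == "__main__":
    model = MultinomialNB().fit(TRAIN_TEXTS, TRAIN_LABELS)
    test_texts = ["data were collected from participants", "table 3 shows the results"]
    test_gold = ["methods", "results"]
    predictions = model.predict(test_texts)
    print(predictions, macro_f1(test_gold, predictions))
```

The Naive Bayes model here stands in for the whole family of classifiers compared in the study; the deep learning variants (Bi-LSTM, IDCNN, BERT) replace the bag-of-words likelihood with learned representations, while the macro-F evaluation step stays the same.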

Cite (APA)

Hu, H., Deng, S., Lu, H., & Wang, D. (2020). A comparative study on the classification performance of machine learning models for academic full texts. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12051 LNCS, pp. 713–737). Springer. https://doi.org/10.1007/978-3-030-43687-2_61