The Classification of Scientific Abstracts Using Text Statistical Features

Timur Ishankulov; Gleb Danilov; Konstantin Kotik; Yuriy Orlov; Mikhail Shifrin; Alexander Potapov

Conference Proceedings, Open Access

Studies in Health Technology and Informatics (2022) 290 263-267

DOI: 10.3233/SHTI220075

Abstract

Automated classification of abstracts could significantly facilitate scientific literature screening. Short texts can be classified on the basis of their statistical properties. This research aimed to evaluate the quality of classifying short medical abstracts primarily from text statistical features. Twelve experiments with machine learning models over sets of text features were performed on a dataset of 671 article abstracts. Each experiment was repeated 300 times to estimate the classification quality, giving 3600 tests in total. We achieved the best result (F1 = 0.775) using a random forest model with keywords and three-dimensional Word2Vec embeddings. The classification of scientific abstracts might therefore be implemented using the straightforward and computationally inexpensive methods presented in this paper. The approach we describe is expected to facilitate literature selection by researchers.
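To make the idea of "text statistical features" and the reported F1 metric concrete, here is a minimal Python sketch using only the standard library. The abstract does not list which statistics the authors computed, so the features below (word count, sentence count, mean word length, type-token ratio, keyword hits) are illustrative assumptions, as are the function names `text_stats` and `f1_score`. It does not reproduce the random forest or the Word2Vec embeddings from the paper.

```python
import re
from statistics import mean


def text_stats(text, keywords):
    """Return a few simple statistical features for one abstract.

    The feature set is an assumption for illustration only; `keywords`
    is a hypothetical list of single-word terms.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Crude sentence split on ., ! and ?; decimals such as "0.775" would be split too.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    return {
        "n_words": n_words,
        "n_sentences": len(sentences),
        "mean_word_len": mean(len(w) for w in words) if words else 0.0,
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
        "keyword_hits": sum(words.count(k.lower()) for k in keywords),
    }


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In a pipeline like the one the abstract outlines, vectors of this kind would be fed to a classifier and the F1 score averaged over repeated runs; the choice of classifier and split strategy is left open here.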

Cite

Ishankulov, T., Danilov, G., Kotik, K., Orlov, Y., Shifrin, M., & Potapov, A. (2022). The Classification of Scientific Abstracts Using Text Statistical Features. In Studies in Health Technology and Informatics (Vol. 290, pp. 263–267). IOS Press BV. https://doi.org/10.3233/SHTI220075
