COVID-19, the disease caused by the coronavirus SARS-CoV-2, was first detected in Wuhan, China, in December 2019. Coronaviruses are a family of viruses that cause illnesses ranging from the common cold to severe acute respiratory syndrome (SARS). The symptoms of COVID-19 can resemble those of a cold or seasonal allergies. Like other respiratory viruses, it is transmitted mainly through airborne droplets released when coughing or sneezing. Because the symptoms are nonspecific, identifying COVID-19 requires careful laboratory analysis, and reducing the resources needed for identification is a major challenge. On 11 March 2020, the World Health Organization (WHO) declared COVID-19 a pandemic, as cases had increased exponentially worldwide and demand for intensive care beds and related facilities had far exceeded existing capacity. Regions of Italy were among the first examples of this.
Brazil registered its first case of SARS-CoV-2 on 26 February 2020. Transmission of the virus in the country shifted very quickly from imported cases to local and, finally, community transmission, and the Brazilian federal government announced nationwide community transmission on 20 March 2020. As of 23 March, the state of São Paulo, which has a population of about 12 million people and is where the Israelita Albert Einstein Hospital is located, had registered 477 cases of the disease and 30 related deaths; by 27 March, there were already 1,223 cases of COVID-19 with 68 related deaths. To slow the spread of the virus in the state of São Paulo, quarantines and social distancing measures were introduced.
One of the motivations for this challenge is that, in a strained healthcare system with possibly limited SARS-CoV-2 testing capacity, it is not practical to test every case, and testing may have to be restricted to a target subpopulation. The objective of the study is to build a machine learning model that can predict a positive SARS-CoV-2 test result from medical data.
To this end, several machine learning classification models are compared, and the one that best predicts coronavirus infection is identified. The comparison focuses on individuals in class 1, i.e., those with a positive test. The task is therefore to find the machine learning model with the best recall and F1 score for class 1. Materials and Methods. An open-source data set from the Israelita Albert Einstein Hospital in São Paulo, Brazil, was used. The following machine learning models were studied: Random Forest (RF), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), and AdaBoost (AB), evaluated with 10-fold cross-validation. Performance measures including accuracy, recall, and F1 score were computed. Results. Of the 5,644 people tested during the COVID-19 pandemic, 5,086 tested negative and 558 tested positive. The support vector machine showed the best results in detecting coronavirus, with a recall of 75% and an F1 score of 60%, compared with the RF, KNN, LR, AB, and DT models. Discussion and Conclusions. The AB algorithm achieved higher accuracy, but the LSVM algorithm was more stable. It can therefore be recommended as a useful tool for detecting COVID-19.
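The emphasis on recall and F1 score for class 1, rather than accuracy, can be illustrated with a minimal Python sketch. It is not the authors' code: the helper names `class1_metrics` and `implied_precision` and the all-negative baseline are assumptions introduced here for illustration. Only the class counts (5,086 negative, 558 positive) and the reported recall and F1 figures come from the abstract.

```python
def class1_metrics(y_true, y_pred):
    """Accuracy, plus precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def implied_precision(recall, f1):
    """Solve F1 = 2PR / (P + R) for the precision P."""
    return f1 * recall / (2 * recall - f1)


# Class counts reported in the abstract.
y_true = [0] * 5086 + [1] * 558

# Hypothetical baseline (an assumption, not a model from the study):
# label every person as negative.
all_negative = [0] * len(y_true)
baseline = class1_metrics(y_true, all_negative)
print(baseline)  # accuracy is about 0.90, yet recall and F1 for class 1 are 0

# Precision implied by the reported SVM recall (75%) and F1 score (60%).
# The inputs are rounded figures, so the result is only approximate.
svm_precision = implied_precision(recall=0.75, f1=0.60)
print(round(svm_precision, 2))
```

With roughly 90% of the tested people in class 0, a classifier that never predicts a positive case still scores about 90% accuracy while detecting no infections, which is why class 1 recall and F1 score are the more informative criteria for this data set.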
CITATION STYLE
Amos, B. K., Smirnov, I. V., & Hermann, M. M. (2022). Comparison of machine learning models for coronavirus prediction. Advanced Engineering Research, 22(1), 67–75. https://doi.org/10.23947/2687-1653-2022-22-1-67-75