Imbalanced data degrade model accuracy, especially precision and sensitivity, and make it difficult to extract information about minority classes. This problem appears in the Universitas Sriwijaya tracer study dataset, which contains 2934 records. The label attribute has five classes: not tight, somewhat tight, tight, very tight, and tightest. The tightest and very tight classes hold only 27% and 38.6%, respectively, of the number of records in the majority classes. In this study, SMOTE is combined with the elimination of missing values to handle the imbalanced data. The method was evaluated with the KNN, ANN, and C4.5 classifiers. The results show a significant increase in overall accuracy and a significant increase in the precision and sensitivity of the minority classes. Precision and sensitivity differ little between the majority and minority classes, even though the minority classes have far fewer records than the majority class, so information on minority classes can be obtained with fairly high precision and sensitivity. In conclusion, the proposed method improves accuracy moderately and strongly increases sensitivity and precision.
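The two preprocessing steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes numeric features, treats "eliminating missing values" as dropping any record with a missing entry, and uses a basic SMOTE variant that oversamples every smaller class up to the size of the largest one. The function names `drop_missing` and `smote` and the parameters `k` and `seed` are hypothetical.

```python
import math
import random
from collections import Counter


def drop_missing(rows, labels):
    """Keep only the records with no missing (None or NaN) feature value."""
    kept_rows, kept_labels = [], []
    for row, label in zip(rows, labels):
        complete = all(
            v is not None and not (isinstance(v, float) and math.isnan(v))
            for v in row
        )
        if complete:
            kept_rows.append(list(row))
            kept_labels.append(label)
    return kept_rows, kept_labels


def smote(rows, labels, k=5, seed=0):
    """Oversample each smaller class up to the size of the largest class.

    Every synthetic record lies on the segment between a minority record
    and one of its k nearest neighbours from the same class.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        members = [r for r, y in zip(rows, labels) if y == cls]
        if n == target or len(members) < 2:
            continue  # nothing to add, or no neighbour to interpolate with
        for _ in range(target - n):
            base = rng.choice(members)
            neighbours = sorted(
                (m for m in members if m is not base),
                key=lambda m: math.dist(base, m),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # interpolation factor in [0, 1)
            out_rows.append([b + gap * (o - b) for b, o in zip(base, other)])
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (made up for illustration, not taken from the tracer study).
X = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [1.1, None], [5.0, 5.0], [5.2, 4.8]]
y = ["tight", "tight", "tight", "tight", "tightest", "tightest"]
X_clean, y_clean = drop_missing(X, y)
X_bal, y_bal = smote(X_clean, y_clean, k=1)
print(Counter(y_clean), Counter(y_bal))
```

In practice, oversampling of this kind would be applied to the training split only, so that synthetic records do not leak into the data used to measure precision and sensitivity.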
CITATION STYLE
Desiani, A., Yahdin, S., Kartikasari, A., & Irmeilyana. (2021). Handling the imbalanced data with missing value elimination SMOTE in the classification of the relevance education background with graduates employment. IAES International Journal of Artificial Intelligence, 10(2), 346–354. https://doi.org/10.11591/ijai.v10.i2.pp346-354