Forecasting of Post-Graduate Students’ Late Dropout Based on the Optimal Probability Threshold Adjustment Technique for Imbalanced Data

Abstract

Moving beyond the analog models that predominate in face-to-face education takes on special significance in the e-learning field. This research helps narrow that gap by developing a predictive model of dropout from online graduate programs at two universities in the Ibero-American region, using machine learning tools to support decision making. Unlike in the face-to-face setting, the significant variables identified relate only to the academic setting in general, and to timeliness in particular, excluding the student’s socio-demographic characteristics. In line with the institution’s strategy, priority was given to sensitivity (recall) and to the seldom used but effective technique of optimal probability threshold adjustment, rather than to the traditional techniques for handling imbalanced data. The optimized classifiers (Logistic Regression, Random Forests, and Neural Networks), combined with different techniques, attributes, and resampling algorithms (SMOTE, SMOTE SVM, ADASYN, and Hyperparameters), yielded thresholds between 0.454 and 0.669. These were sufficient to reach a recall of 0.75 for the Neural Network classifier with SMOTE SVM, followed by Logistic Regression with SMOTE SVM (0.67) and Random Forests with Hyperparameters (0.6). Likewise, with an optimal threshold of 0.427, Random Forests proved robust to imbalanced classes, achieving metrics very similar to those obtained by the consensus of the three previous models (threshold = 0.463). Lastly, this paper aims to encourage wider use of this simple but powerful technique, which is highly underrated relative to data resampling techniques for imbalanced classes.
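
As a rough illustration of probability threshold adjustment, the sketch below scans candidate thresholds over a classifier's predicted probabilities and keeps the one with the best score. The abstract does not state which criterion the authors optimized, so the F-beta score with beta = 2 (which weights recall above precision) is an assumption here, as are the function names and the toy data.

```python
def recall_precision(y_true, y_prob, threshold):
    """Recall and precision of class 1 when predicting 1 iff prob >= threshold."""
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if y == 1 and p >= threshold)
    fn = sum(1 for y, p in pairs if y == 1 and p < threshold)
    fp = sum(1 for y, p in pairs if y == 0 and p >= threshold)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


def best_threshold(y_true, y_prob, beta=2.0):
    """Try each observed probability as a threshold; return (threshold, F-beta).

    Assumption: F-beta is the selection criterion. Any recall-oriented
    metric could be substituted without changing the scan itself.
    """
    best_t, best_score = 0.5, -1.0
    for t in sorted(set(y_prob)):
        recall, precision = recall_precision(y_true, y_prob, t)
        denom = beta ** 2 * precision + recall
        score = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
        if score > best_score:  # strict '>' keeps the lowest threshold on ties
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical toy data: 8 students who stayed (0) and 4 who dropped out (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.45, 0.60,
          0.40, 0.48, 0.55, 0.80]

threshold, score = best_threshold(y_true, y_prob)
default_recall, _ = recall_precision(y_true, y_prob, 0.5)
tuned_recall, _ = recall_precision(y_true, y_prob, threshold)
print(f"default recall={default_recall:.2f}, "
      f"tuned threshold={threshold:.2f}, tuned recall={tuned_recall:.2f}")
```

On this toy data the tuned threshold falls below the default of 0.5 and recall on the minority class rises, at the cost of more false positives. In practice the threshold would be chosen on a validation split rather than on the data used to report final metrics.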

Cite

Velasco, C. L. R., Villena, E. G., Ballester, J. B., Prados, F. Á. D., Alvarado, E. S., & Álvarez, J. C. (2023). Forecasting of Post-Graduate Students’ Late Dropout Based on the Optimal Probability Threshold Adjustment Technique for Imbalanced Data. International Journal of Emerging Technologies in Learning, 18(4), 120–155. https://doi.org/10.3991/ijet.v18i04.34825
