An empirical evaluation of clustering processes for early detection of university dropout

Abstract

The high dropout rates in academic institutions have prompted the use of Artificial Intelligence (AI) to tackle the problem. These efforts often rely mainly on administrative and academic data and lack personal information about students. In a previous study, we explored machine learning models to leverage these data and harness their knowledge-extraction capabilities. However, a critical factor, the availability of labeled data, was not addressed. Obtaining such data may be challenging because they are spread across different systems or take considerable time to collect, especially when new degrees are being introduced. The lack of labeled data is a major obstacle: institutions without them cannot take full advantage of AI for this purpose. Clustering algorithms have conventionally been employed to uncover latent patterns in unlabeled data. These unsupervised algorithms may reduce the need for data labeling; nonetheless, they require rigorous validation of the resulting clusters, particularly for datasets that combine numerical and categorical attributes. This paper compares various clustering algorithms to identify the most appropriate technique for uncovering the underlying factors that contribute to university student attrition using unlabeled data. The novelty lies not only in the algorithmic comparison but also in the integration of the algorithms with diverse data preprocessing methods, which streamlines the selection of the best combination, including advanced data transformations that harmonize numerical and categorical information. The approach is illustrated through a real-world case using academic data from a Spanish university, which provides empirical validation of the proposed methodology. We also conducted an exploratory analysis to identify the factors behind cluster formation. The insights gained can be extrapolated to analogous experiments where social or economic data are scarce and most of the available attributes are academic in nature.
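
The abstract does not say which transformations or validation measures the authors used. As a rough illustration only, the sketch below shows one common way to put numerical and categorical attributes on a comparable footing (Gower dissimilarity) and to score a candidate clustering without labels (mean silhouette over a precomputed distance matrix). The attribute names, the toy records, and the helper names `attribute_ranges`, `gower_distance`, and `silhouette` are all assumptions made for this example and do not come from the paper.

```python
# Illustrative sketch, not the paper's method: Gower dissimilarity for
# mixed-type records plus a silhouette score for label-free validation.

def attribute_ranges(records):
    """Return max - min for each numerical column, or None if categorical."""
    ranges = []
    for column in zip(*records):
        if all(isinstance(v, (int, float)) for v in column):
            ranges.append(float(max(column) - min(column)))
        else:
            ranges.append(None)
    return ranges

def gower_distance(a, b, ranges):
    """Average per-attribute dissimilarity, each term lying in [0, 1]."""
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            total += 0.0 if x == y else 1.0   # categorical: simple mismatch
        elif r > 0:
            total += abs(x - y) / r           # numerical: range-normalised gap
        # a constant numerical attribute (r == 0) contributes nothing
    return total / len(ranges)

def silhouette(dist, labels):
    """Mean silhouette coefficient; expects at least two clusters."""
    n = len(labels)
    clusters = set(labels)
    scores = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)                # singleton cluster convention
            continue
        a = sum(own) / len(own)               # mean distance to own cluster
        b = min(                              # mean distance to nearest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n

# Made-up toy records: (credits passed, average grade, access route).
records = [
    (30, 7.5, "exam"),
    (27, 6.8, "exam"),
    (6, 3.1, "vocational"),
    (12, 4.0, "vocational"),
]
ranges = attribute_ranges(records)
dist = [[gower_distance(a, b, ranges) for b in records] for a in records]

# Compare two candidate partitions of the same four records.
print(silhouette(dist, [0, 0, 1, 1]), silhouette(dist, [0, 1, 0, 1]))
```

A higher silhouette indicates a partition whose groups are tighter and better separated under the chosen dissimilarity, which is one way to compare combinations of preprocessing and clustering when no dropout labels are available.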

Cite

Melchor, F., Conejero, J. M., Fernández-García, A. J., Sánchez-Figueroa, F., & Rodríguez-Echeverría, R. (2026). An empirical evaluation of clustering processes for early detection of university dropout. International Journal of Data Science and Analytics, 22(1). https://doi.org/10.1007/s41060-025-00965-y
