An empirical evaluation of clustering processes for early detection of university dropout

Fran Melchor; José M. Conejero; Antonio Jesús Fernández-García; Fernando Sánchez-Figueroa; Roberto Rodríguez-Echeverría

Journal Article, Open Access

International Journal of Data Science and Analytics (2026) 22(1)

DOI: 10.1007/s41060-025-00965-y

Abstract

High dropout rates in academic institutions have prompted the use of Artificial Intelligence (AI) to tackle the problem. These efforts often rely mainly on administrative and academic data and lack personal information about students. In a previous study, we explored machine learning models to leverage these data and harness their knowledge-extraction capabilities. However, a critical factor, the availability of labeled data, was not addressed. Obtaining such data may be challenging because they are distributed across different systems or take considerable time to collect, especially when new degrees are being implemented. Institutions that lack labeled data are therefore unable to take full advantage of AI for this purpose. Clustering algorithms have conventionally been employed to uncover latent patterns in unlabeled data. These unsupervised algorithms may reduce the need for data labeling; nonetheless, they require rigorous validation of the resulting clusters, particularly for datasets that combine numerical and categorical attributes. This paper compares various clustering algorithms to identify the most appropriate technique for uncovering the underlying factors contributing to university student attrition using unlabeled data. The novelty lies not only in the algorithmic comparison but also in the integration of the algorithms with diverse data preprocessing methodologies, which streamlines the selection of the optimal combination, including advanced data transformations that harmonize numerical and categorical information. The approach is illustrated through a real-world case using academic data from a Spanish university, which provides empirical validation for the proposed methodology. We also conducted an exploratory analysis to identify the factors behind cluster formation. The insights gained can be extrapolated to analogous experiments where social or economic data are scarce and most of the available attributes are academic in nature.
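
The abstract does not name the specific transformations or validation indices used. As a minimal sketch of the general idea, the following assumes two common choices that are not confirmed by the source: Gower distance to put numerical and categorical attributes on a common scale, and the silhouette coefficient to validate a candidate clustering without labels. The attribute names and the toy records are hypothetical and invented purely for illustration.

```python
from statistics import mean

# Hypothetical toy records: (credits_passed, average_grade, access_route).
# These values are invented for illustration; they are not from the paper.
STUDENTS = [
    (60, 8.1, "exam"),
    (54, 7.4, "exam"),
    (57, 7.9, "exam"),
    (12, 4.2, "vocational"),
    (18, 5.0, "vocational"),
    (6, 3.8, "vocational"),
]
NUMERIC = (0, 1)      # column indices of numerical attributes
CATEGORICAL = (2,)    # column indices of categorical attributes


def gower_matrix(rows, numeric, categorical):
    """Pairwise Gower distances: range-scaled absolute difference for
    numerical columns, simple mismatch (0 or 1) for categorical columns."""
    ranges = {}
    for j in numeric:
        col = [r[j] for r in rows]
        ranges[j] = (max(col) - min(col)) or 1.0  # avoid division by zero
    n_attrs = len(numeric) + len(categorical)
    dist = [[0.0] * len(rows) for _ in rows]
    for a, ra in enumerate(rows):
        for b, rb in enumerate(rows):
            total = sum(abs(ra[j] - rb[j]) / ranges[j] for j in numeric)
            total += sum(ra[j] != rb[j] for j in categorical)
            dist[a][b] = total / n_attrs
    return dist


def silhouette(dist, labels):
    """Mean silhouette coefficient from a precomputed distance matrix.
    Requires at least two distinct cluster labels."""
    clusters = set(labels)
    scores = []
    for i, own in enumerate(labels):
        same = [dist[i][j] for j, l in enumerate(labels) if l == own and j != i]
        if not same:  # singleton cluster: score is defined as 0
            scores.append(0.0)
            continue
        a = mean(same)  # mean distance to the point's own cluster
        b = min(        # mean distance to the nearest other cluster
            mean(dist[i][j] for j, l in enumerate(labels) if l == other)
            for other in clusters if other != own
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


D = gower_matrix(STUDENTS, NUMERIC, CATEGORICAL)
good = silhouette(D, [0, 0, 0, 1, 1, 1])  # partition that follows the two groups
bad = silhouette(D, [0, 1, 0, 1, 0, 1])   # partition that mixes the two groups
print(good > bad)
```

In a comparison like the one the abstract describes, the two labelings would come from different clustering algorithms or preprocessing pipelines, and an internal index of this kind would be one way to rank them when no dropout labels are available.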

Citation (APA)

Melchor, F., Conejero, J. M., Fernández-García, A. J., Sánchez-Figueroa, F., & Rodríguez-Echeverría, R. (2026). An empirical evaluation of clustering processes for early detection of university dropout. International Journal of Data Science and Analytics, 22(1). https://doi.org/10.1007/s41060-025-00965-y
