A Machine Learning Approach to Predictive Modelling of Student Performance

  • Ng H
  • bin Mohd Azha A
  • Yap T
  • et al.

Abstract

Background - Many factors affect student performance, such as a student's background, habits, absenteeism and social activities. Knowing which of these factors matter makes it possible to determine corrective actions that improve performance. This study presents a data mining approach to identifying significant factors and predicting student performance, based on two datasets collected from two secondary schools in Portugal.

Methods - First, the two datasets are merged to increase the sample size. Following that, data pre-processing is performed and the features are normalized with linear scaling to avoid bias towards heavily weighted attributes. The selected features are then assigned to four groups: student background, lifestyle, history of grades and all features. Next, Boruta feature selection is performed to remove irrelevant features. Finally, classification models based on Support Vector Machine (SVM), Naïve Bayes (NB) and Multilayer Perceptron (MLP) are designed and their performance is evaluated.

Results - The models were trained and evaluated on an integrated dataset comprising 1044 student records with 33 features, after feature selection. Classification was performed with SVM, NB and MLP using 60-40 and 50-50 train-test splits and 10-fold cross-validation. GridSearchCV was applied for hyperparameter tuning. The performance metrics were accuracy, precision, recall and F1-score. For binary classification (pass or fail) with a 50-50 train-test split, SVM obtained the highest accuracy, scoring 77%, 80%, 91% and 90% on the background, lifestyle, history of grades and all-features groups respectively. SVM also obtained the highest accuracy for five-class classification (grades A, B, C, D and F), with 39%, 38%, 73% and 71% for the four groups respectively.
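Two steps named in the abstract, linear scaling of features and the four evaluation metrics, can be illustrated with a minimal sketch. It uses only the Python standard library and is not the authors' code. The function names `min_max_scale` and `binary_metrics`, the toy values, and the choice of "pass" as the positive class are all assumptions made for illustration. The abstract also does not say which linear scaling was used, so min-max scaling to [0, 1] is assumed here.

```python
from typing import Dict, List, Sequence


def min_max_scale(column: Sequence[float]) -> List[float]:
    """Linearly rescale one feature column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, treating label 1 (pass) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy values invented for illustration; they are not taken from the study's data.
absences = [0, 4, 10, 2, 20]
scaled = min_max_scale(absences)

y_true = [1, 1, 0, 1, 0, 1]  # hypothetical ground truth (1 = pass, 0 = fail)
y_pred = [1, 0, 0, 1, 1, 1]  # hypothetical classifier output
metrics = binary_metrics(y_true, y_pred)

print(scaled)
print(metrics)
```

Scaling every feature to a common range keeps attributes with large numeric ranges from dominating distance-based or gradient-based models such as SVM and MLP, which is the bias the abstract refers to.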

Cite

Ng, H., bin Mohd Azha, A. A., Yap, T. T. V., & Goh, V. T. (2021). A Machine Learning Approach to Predictive Modelling of Student Performance. F1000Research, 10, 1144. https://doi.org/10.12688/f1000research.73180.1