Exploring techniques to improve machine learning's identification of at-risk students in physics classes

Abstract

Machine learning models were constructed to predict student performance in an introductory mechanics class at a large land-grant university in the United States using data from 2061 students. Students were classified as either at risk of failing the course (earning a D or F) or not at risk (earning an A, B, or C). The models focused on variables available in the first few weeks of the class, which could allow for early interventions to help at-risk students. Multiple types of variables were used in the models: in-class variables (average homework and clicker quiz scores), institutional variables [college grade point average (GPA)], and noncognitive variables (self-efficacy). The substantial imbalance between the pass and fail rates of the course, with only about 10% of students failing, required modifications to the machine learning algorithms. Decision threshold tuning and upsampling were successful in improving performance for at-risk students. Logistic regression combined with a decision threshold tuned to maximize balanced accuracy yielded the strongest classifier, with a DF accuracy of 83% and an ABC accuracy of 81%. Measures of variable importance based on changes in balanced accuracy identified homework grades, clicker grades, college GPA, and the fraction of college classes successfully completed as the most important variables in predicting success in introductory physics. Noncognitive variables added little predictive power to the models. Classification models that performed nearly as well as the best models built on the full set of variables could be constructed from very few variables (homework average, clicker scores, and college GPA) using straightforward-to-implement algorithms, suggesting that these techniques may be fairly easy to apply in many physics classes.
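The decision threshold tuning mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes that some probabilistic classifier (for example, logistic regression) has already produced a risk score between 0 and 1 for each student, and it takes balanced accuracy to be the mean of the DF accuracy and the ABC accuracy. The function names, the candidate threshold grid, and the toy data are all hypothetical and chosen only for illustration.

```python
def class_accuracies(y_true, y_pred):
    """Return (DF accuracy, ABC accuracy): the fraction of each true class
    that is predicted correctly. Labels: 1 = at risk (D/F), 0 = not (A/B/C).
    Assumes both classes are present in y_true."""
    df_total = sum(1 for y in y_true if y == 1)
    abc_total = len(y_true) - df_total
    df_hits = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    abc_hits = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    return df_hits / df_total, abc_hits / abc_total


def balanced_accuracy(y_true, y_pred):
    """Mean of the two per-class accuracies."""
    df_acc, abc_acc = class_accuracies(y_true, y_pred)
    return (df_acc + abc_acc) / 2


def predict(risk_scores, threshold):
    """Flag a student as at risk when the score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in risk_scores]


def tune_threshold(y_true, risk_scores, candidates=None):
    """Scan candidate thresholds and keep the first one that maximizes
    balanced accuracy. The grid 0.01..0.99 is an illustrative assumption."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    best_t, best_score = None, -1.0
    for t in candidates:
        score = balanced_accuracy(y_true, predict(risk_scores, t))
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical toy data: 20 students, 2 of them at risk (about 10%).
y_true = [1, 1] + [0] * 18
risk_scores = [0.45, 0.30,
               0.35, 0.25, 0.20, 0.15, 0.10, 0.10, 0.05, 0.05, 0.12,
               0.08, 0.18, 0.22, 0.02, 0.03, 0.07, 0.09, 0.11, 0.13]

# With the conventional 0.5 cutoff, no student is flagged as at risk.
default_score = balanced_accuracy(y_true, predict(risk_scores, 0.5))

# Tuning the cutoff trades a little ABC accuracy for much higher DF accuracy.
best_t, best_score = tune_threshold(y_true, risk_scores)
df_acc, abc_acc = class_accuracies(y_true, predict(risk_scores, best_t))
print(f"default: {default_score:.3f}  tuned: {best_score:.3f} at t={best_t}")
```

In practice the threshold would be chosen on training or validation data and then evaluated on held-out students; the sketch tunes and scores on the same toy data only to keep the example short.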

Pace, J., Hansen, J., & Stewart, J. (2024). Exploring techniques to improve machine learning’s identification of at-risk students in physics classes. Physical Review Physics Education Research, 20(1). https://doi.org/10.1103/PhysRevPhysEducRes.20.010149
