Data mining has become an increasingly important tool for education researchers and practitioners. However, work in this field has focused on data from online educational systems. Here, we present techniques to enable data mining of handwritten coursework, which is an essential component of instruction in many disciplines. Our techniques include methods for classifying pen strokes as diagram, equation, or cross-out strokes; cross-out strokes are those used to strike out erroneous work. We have also created techniques for grouping equation strokes into equation groups and then into individual characters. Our results demonstrate that our classification and grouping techniques are more accurate than prior techniques for this task. We also demonstrate applications of our techniques for automated assessment of student competence. We present a novel approach for measuring the correctness of exam solutions from an analysis of lexical features of handwritten equations. This analysis demonstrates, for example, that the number of equation groups correlates positively with grade. We also use our techniques to extend graphical protocol analysis to free-form, handwritten problem solutions. While prior work in a laboratory setting suggests that long pauses are indicative of low competence, our work shows that the frequency of long pauses during exams correlates positively with competence.
Stahovich, T. F., & Lin, H. (2016). Enabling data mining of handwritten coursework. Computers and Graphics (Pergamon), 57, 31–45. https://doi.org/10.1016/j.cag.2016.01.002