This research report describes the process of evaluating the “deployability” of automated scoring (AS) systems for large-scale educational assessments in operational settings. It discusses a comprehensive psychometric evaluation whose analyses take into account the specific purpose of AS, the test design, the quality of human scores, the data collection design needed to train and evaluate the AS model, and the application of statistics and evaluation criteria. Finally, it notes that an effective evaluation of an AS system requires professional judgment coupled with statistical and psychometric knowledge and an understanding of risk assessment and business metrics.
CITATION
Rotou, O., & Rupp, A. A. (2020). Evaluations of Automated Scoring Systems in Practice. ETS Research Report Series, 2020(1), 1–18. https://doi.org/10.1002/ets2.12293