Automated scoring has the potential to dramatically reduce the time and costs associated with the assessment of complex skills such as writing, but its use must be validated against a variety of criteria for it to be accepted by test users and stakeholders. This study addresses two validity-related issues regarding the use of e-rater® with the independent writing task on the TOEFL iBT® (Internet-based test). First, relationships between automated scores of iBT tasks and nontest indicators of writing ability were examined. Second, prompt-related differences in automated scores of essays written by the same examinees were explored. Correlations of both human and e-rater scores with nontest indicators were moderate but consistent, with few differences between e-rater and human rater scores. E-rater was more consistent across prompts than individual human raters, although there were differences in scores across prompts for the individual features used to generate total e-rater scores.
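The study's analyses are not reproduced here, but the core comparison the abstract describes, correlating human and e-rater scores against a nontest indicator of writing ability, can be illustrated with a minimal sketch. All names and data below are hypothetical stand-ins, not the study's actual variables or results.

```python
# Hypothetical sketch of correlating human and e-rater scores with a
# nontest indicator. Synthetic data only; this is not the study's
# dataset or procedure.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200  # hypothetical number of examinees

# Simulate a latent writing ability, then derive noisy human scores,
# e-rater scores, and one nontest indicator (e.g., a writing sample
# rated outside the test context) from it.
ability = rng.normal(size=n)
human_score = ability + rng.normal(scale=0.8, size=n)
erater_score = ability + rng.normal(scale=0.8, size=n)
nontest_indicator = ability + rng.normal(scale=1.2, size=n)

# Pearson correlations of each score type with the nontest indicator;
# similar r values for the two score types would mirror the abstract's
# finding of few differences between e-rater and human rater scores.
for name, score in [("human", human_score), ("e-rater", erater_score)]:
    r, p = stats.pearsonr(score, nontest_indicator)
    print(f"{name} vs. nontest indicator: r = {r:.2f} (p = {p:.3g})")
```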
Weigle, S. C. (2011). Validation of automated scores of TOEFL iBT® tasks against nontest indicators of writing ability. ETS Research Report Series, 2011(2), i–63. https://doi.org/10.1002/j.2333-8504.2011.tb02260.x