Evaluating the Advisory Flags and Machine Scoring Difficulty in the e-rater® Automated Scoring Engine

  • Zhang M
  • Chen J
  • Ruan C

This article is free to access.

Abstract

Successful detection of unusual responses is critical when machine scoring is used in an assessment context. This study evaluated approaches to detecting unusual responses in automated essay scoring. Two research questions were pursued: one concerned the performance of various prescreening advisory flags, and the other concerned the degree of machine scoring difficulty and whether the size of the human–machine discrepancy could be predicted. The results suggested that some advisory flags were more consistent than others across measures and tasks in detecting responses that the machine was likely to score differently from human raters. Relatively little scoring difficulty was found for three of the four tasks examined, with a reasonably strong relationship between machine and human scores. Limitations and future studies are also discussed.

Report Number: ETS RR‐16–30
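To make the two evaluation questions concrete, the sketch below shows one plausible (not the study's actual) way to check, for a single task, (a) overall human–machine agreement and (b) whether a prescreening advisory flag tends to pick out responses with large human–machine score discrepancies. The flag name, discrepancy cutoff, and toy scores are hypothetical.

```python
# Minimal sketch, assuming integer essay scores and a binary advisory flag per response.
import numpy as np
from sklearn.metrics import cohen_kappa_score

def agreement_summary(human, machine):
    """Quadratic weighted kappa and Pearson r between human and machine scores."""
    human, machine = np.asarray(human), np.asarray(machine)
    qwk = cohen_kappa_score(human, machine, weights="quadratic")
    r = np.corrcoef(human, machine)[0, 1]
    return {"qwk": qwk, "pearson_r": r}

def flag_discrepancy_check(human, machine, flagged, threshold=1.0):
    """Does the flag identify responses whose |human - machine| gap is large?"""
    gap = np.abs(np.asarray(human) - np.asarray(machine))
    large = gap >= threshold                    # "discrepant" responses (hypothetical cutoff)
    flagged = np.asarray(flagged, dtype=bool)
    precision = large[flagged].mean() if flagged.any() else float("nan")
    recall = flagged[large].mean() if large.any() else float("nan")
    return {"flag_rate": flagged.mean(), "precision": precision, "recall": recall}

# Toy example on a 1-6 score scale (made-up data).
human   = [4, 3, 5, 2, 4, 6, 3, 1]
machine = [4, 3, 4, 4, 4, 6, 2, 3]
flagged = [0, 0, 0, 1, 0, 0, 0, 1]  # e.g., an "off-topic" prescreening flag (hypothetical)
print(agreement_summary(human, machine))
print(flag_discrepancy_check(human, machine, flagged))
```

Repeating such a check for each flag and task would surface the pattern the abstract describes: some flags catch likely human–machine disagreements consistently across tasks, while others do not.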

Cite

APA

Zhang, M., Chen, J., & Ruan, C. (2016). Evaluating the advisory flags and machine scoring difficulty in the e-rater® automated scoring engine. ETS Research Report Series, 2016(2), 1–14. https://doi.org/10.1002/ets2.12116
