Preventing critical scoring errors in short answer scoring with confidence estimation

Abstract

Many recent Short Answer Scoring (SAS) systems have used Quadratic Weighted Kappa (QWK) as their evaluation measure. However, we hypothesize that QWK is unsatisfactory for evaluating SAS systems when the goal is to measure their effectiveness in actual usage. We introduce a new task formulation of SAS that matches actual usage. In our formulation, an SAS system should extract as many scoring predictions as possible that are not critical scoring errors (CSEs). We conduct experiments under this new formulation and demonstrate that a typical SAS system can predict scores with zero CSEs for up to approximately 50% of the test data by filtering out low-reliability predictions on the basis of a confidence estimate. This result directly indicates the possibility of halving the scoring cost of human raters, which is a preferable way to evaluate SAS systems.
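
The filtering idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a CSE is a prediction whose absolute error exceeds a chosen threshold and that each prediction comes with a scalar confidence; the function name `max_zero_cse_coverage`, the parameter `cse_threshold`, and the toy data are all hypothetical. The sketch accepts predictions in descending order of confidence and stops at the first CSE, so the returned fraction is the largest share of answers that could be scored automatically without any CSE.

```python
from typing import Sequence


def max_zero_cse_coverage(
    gold: Sequence[float],
    pred: Sequence[float],
    conf: Sequence[float],
    cse_threshold: float,
) -> float:
    """Largest fraction of answers accepted with zero CSEs.

    Assumption: a prediction is a CSE when |gold - pred| > cse_threshold.
    Predictions are accepted from most to least confident, and acceptance
    stops at the first CSE encountered.
    """
    if not conf:
        return 0.0
    # Indices sorted from highest to lowest confidence.
    order = sorted(range(len(conf)), key=lambda i: conf[i], reverse=True)
    accepted = 0
    for i in order:
        if abs(gold[i] - pred[i]) > cse_threshold:
            break  # the next most confident prediction is a CSE
        accepted += 1
    return accepted / len(conf)


# Toy data (invented for illustration only).
gold = [3, 1, 2, 0, 3, 2]
pred = [3, 1, 2, 2, 3, 1]
conf = [0.95, 0.90, 0.80, 0.40, 0.85, 0.30]

coverage = max_zero_cse_coverage(gold, pred, conf, cse_threshold=1)
print(f"zero-CSE coverage: {coverage:.2f}")
```

In this toy example the four most confident predictions are correct and the fifth is off by two points, so four of the six answers can be accepted. The remaining answers would be passed to human raters.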

Cite

Funayama, H., Sasaki, S., Matsubayashi, Y., Mizumoto, T., Suzuki, J., Mita, M., & Inui, K. (2020). Preventing critical scoring errors in short answer scoring with confidence estimation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 237–243). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.acl-srw.32
