Ensuring Fairness of Human- and AI-Generated Test Items

Abstract

Large language models (LLMs) have been a catalyst for the increased use of AI for automatic item generation on high-stakes assessments. Standard human review processes applied to human-generated content are also important for AI-generated content, because AI-generated content can reflect human biases. However, human reviewers have implicit biases and gaps in cultural knowledge that may emerge when the test population is diverse. Quantitative analyses of item responses via differential item functioning (DIF) can help to identify these unknown biases. In this paper, we present DIF results based on item responses from a high-stakes English language assessment, the Duolingo English Test (DET). We find that human- and AI-generated content, both of which were reviewed for fairness and bias by humans, show similar amounts of DIF overall but varying amounts for certain test-taker groups. This finding suggests that humans are unable to identify all biases beforehand, regardless of how item content is generated. To mitigate this problem, we recommend that assessment developers employ human reviewers who represent the diversity of the test-taking population. This practice may lead to more equitable use of AI in high-stakes educational assessment.
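
The abstract refers to differential item functioning (DIF) analysis without specifying a method. As a rough illustration of one common approach (logistic-regression DIF with a likelihood-ratio test), the sketch below uses synthetic data; the variable names, the simulated group effect, and the model choice are assumptions for illustration only and are not taken from the paper or the DET analysis.

```python
# Minimal logistic-regression DIF sketch on one item (synthetic data).
# All names and effect sizes below are hypothetical, not from the DET study.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(0)
n = 2000
ability = rng.normal(size=n)            # proxy for ability (e.g., rest score)
group = rng.integers(0, 2, size=n)      # 0 = reference group, 1 = focal group

# Simulate uniform DIF: the focal group is disadvantaged at equal ability.
logit = 1.2 * ability - 0.5 * group
prob = 1.0 / (1.0 + np.exp(-logit))
response = rng.binomial(1, prob)

df = pd.DataFrame({"response": response, "ability": ability, "group": group})

# Nested models: ability only vs. ability + group + ability:group.
base = smf.logit("response ~ ability", data=df).fit(disp=False)
full = smf.logit("response ~ ability + group + ability:group", data=df).fit(disp=False)

# Likelihood-ratio test with 2 df (uniform + non-uniform DIF terms).
lr = 2 * (full.llf - base.llf)
p_value = stats.chi2.sf(lr, df=2)
print(f"LR statistic = {lr:.2f}, p = {p_value:.4f}")
```

In practice, a flagged item would be inspected alongside effect-size measures and content review rather than relying on the significance test alone.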

Citation (APA)

Belzak, W. C. M., Naismith, B., & Burstein, J. (2023). Ensuring Fairness of Human- and AI-Generated Test Items. In Communications in Computer and Information Science (Vol. 1831 CCIS, pp. 701–707). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-36336-8_108
