Abstract
This paper explores the use of large language models (LLMs) to score and explain short-answer assessments in K-12 science. While existing methods can score the more structured assessments common in math and computer science, they typically do not provide explanations for the scores they assign. Our study employs GPT-4 for automated assessment in middle school Earth Science, combining few-shot and active learning with chain-of-thought reasoning. Using a human-in-the-loop approach, we successfully score formative assessment responses and provide meaningful explanations for those scores. A systematic analysis of our method's strengths and weaknesses sheds light on the potential for human-in-the-loop techniques to enhance automated grading of open-ended science assessments.
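The few-shot chain-of-thought prompting the abstract describes might be assembled roughly as sketched below. This is a minimal illustration only: the question, rubric wording, worked examples, and the 0-2 scoring scale are hypothetical placeholders, not the authors' actual assessment materials, and the string returned would be sent to GPT-4 (that call is omitted here).

```python
# Sketch of a few-shot chain-of-thought (CoT) scoring prompt.
# Assumption: a simple 0-2 rubric and one worked example; the paper's
# real rubric, items, and examples are not reproduced here.

def build_cot_scoring_prompt(question, rubric, examples, student_response):
    """Assemble a few-shot prompt that asks the model to reason
    step by step before emitting a score and an explanation."""
    parts = [
        "You are grading a middle school Earth Science formative assessment.",
        f"Question: {question}",
        f"Rubric: {rubric}",
        "For each response, reason step by step, then give a score.",
    ]
    # Each worked example shows response -> reasoning -> score,
    # so the model imitates the reasoning-then-score pattern.
    for resp, reasoning, score in examples:
        parts.append(f"Response: {resp}")
        parts.append(f"Reasoning: {reasoning}")
        parts.append(f"Score: {score}")
    parts.append(f"Response: {student_response}")
    parts.append("Reasoning:")  # the model completes reasoning, then the score
    return "\n".join(parts)

prompt = build_cot_scoring_prompt(
    question="Why do we see phases of the Moon?",
    rubric="2 = correct mechanism; 1 = partially correct; 0 = incorrect",
    examples=[
        ("The Moon makes its own light.",
         "The response attributes the phases to the Moon emitting light, "
         "which is incorrect.",
         0),
    ],
    student_response=("We see different amounts of the sunlit half "
                      "as the Moon orbits Earth."),
)
```

In a human-in-the-loop setup such as the one the abstract describes, a teacher would review the model's returned reasoning and score rather than accept them automatically.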
Citation
Cohn, C., Hutchins, N., Le, T., & Biswas, G. (2024). A Chain-of-Thought Prompting Approach with LLMs for Evaluating Students’ Formative Assessment Responses in Science. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 38, pp. 23182–23190). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v38i21.30364