Automated Evaluation of Written Discourse Coherence Using GPT-4

Abstract

The popularization of large language models (LLMs) such as OpenAI's GPT-3 and GPT-4 has led to numerous innovations in the field of AI in education. With respect to automated writing evaluation (AWE), LLMs have reduced the challenges associated with assessing writing quality characteristics that are difficult to identify automatically, such as discourse coherence. In addition, LLMs can provide rationales for their evaluations (ratings), which increases score interpretability and transparency. This paper investigates one approach to producing ratings by training GPT-4 to assess discourse coherence in a manner consistent with expert human raters. The findings of the study suggest that GPT-4 has strong potential to produce discourse coherence ratings that are comparable to human ratings, accompanied by clear rationales. Furthermore, the GPT-4 ratings outperform traditional NLP coherence metrics with respect to agreement with human ratings. These results have implications for advancing AWE technology for learning and assessment.

Citation (APA)

Naismith, B., Mulcaire, P., & Burstein, J. (2023). Automated Evaluation of Written Discourse Coherence Using GPT-4. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 394–403). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.bea-1.32
