Automated Evaluation of Written Discourse Coherence Using GPT-4

Abstract

The popularization of large language models (LLMs) such as OpenAI's GPT-3 and GPT-4 has led to numerous innovations in the field of AI in education. With respect to automated writing evaluation (AWE), LLMs have reduced the challenges associated with assessing writing quality characteristics that are difficult to identify automatically, such as discourse coherence. In addition, LLMs can provide rationales for their evaluations (ratings), which increases score interpretability and transparency. This paper investigates one approach to producing ratings by training GPT-4 to assess discourse coherence in a manner consistent with expert human raters. The findings of the study suggest that GPT-4 has strong potential to produce discourse coherence ratings that are comparable to human ratings, accompanied by clear rationales. Furthermore, the GPT-4 ratings outperform traditional NLP coherence metrics with respect to agreement with human ratings. These results have implications for advancing AWE technology for learning and assessment.

Citation (APA)

Naismith, B., Mulcaire, P., & Burstein, J. (2023). Automated Evaluation of Written Discourse Coherence Using GPT-4. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 394–403). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.bea-1.32
