ErAConD: Error Annotated Conversational Dialog Dataset for Grammatical Error Correction


Abstract

Currently available grammatical error correction (GEC) datasets are compiled using essays or other long-form text written by language learners, limiting the applicability of these datasets to other domains such as informal writing and conversational dialog. In this paper, we present a novel GEC dataset consisting of parallel original and corrected utterances drawn from open-domain chatbot conversations; this dataset is, to our knowledge, the first GEC dataset targeted at a human-machine conversational setting. We also present a detailed annotation scheme that ranks errors by perceived impact on comprehension, making our dataset more representative of real-world language learning applications. To demonstrate the utility of the dataset, we use our annotated data to fine-tune a state-of-the-art GEC model. Experimental results show the effectiveness of our data in improving GEC model performance in a conversational scenario.

Cite

Yuan, X., Pham, D., Davidson, S., & Yu, Z. (2022). ErAConD: Error Annotated Conversational Dialog Dataset for Grammatical Error Correction. In NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp. 76–84). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.naacl-main.5
