Improving Grammatical Error Correction with Data Augmentation by Editing Latent Representation

Zhaohong Wan; Xiaojun Wan; Wenguang Wang

Conference Proceedings, Open Access

COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference (2020) 2202-2212

DOI: 10.18653/v1/2020.coling-main.200

38 Citations

88 Readers

Abstract

Incorporating data augmentation methods into the grammatical error correction (GEC) task has attracted much attention. However, existing data augmentation methods mainly apply noise to tokens, which limits the diversity of the generated errors. In view of this, we propose a new data augmentation method that applies noise to the latent representation of a sentence. By editing the latent representations of grammatical sentences, we can generate synthetic samples with various error types. Combined with some pre-defined rules, our method can greatly improve the performance and robustness of existing grammatical error correction models. We evaluate our method on public GEC benchmarks, and it achieves state-of-the-art performance on the CoNLL-2014 and FCE benchmarks.

Cite

Wan, Z., Wan, X., & Wang, W. (2020). Improving Grammatical Error Correction with Data Augmentation by Editing Latent Representation. In COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference (pp. 2202–2212). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.coling-main.200