We propose a novel data synthesis method that generates diverse error-corrected sentence pairs for improving grammatical error correction (GEC). The method is based on a pair of machine translation models (e.g., Chinese-English) of different qualities (i.e., poor and good). The poor translation model can resemble an ESL (English as a second language) learner and tends to produce translations of low quality in terms of fluency and grammaticality, while the good translation model generally produces fluent and grammatically correct translations. With this pair of translation models, we can generate an unlimited number of poor-good English sentence pairs from text in the translators' source language (e.g., Chinese). Our approach can generate a wide variety of error-correction patterns and nicely complements other data synthesis approaches for GEC. Experimental results demonstrate that the data generated by our approach can effectively help a GEC model improve its performance, approaching state-of-the-art single-model performance on the BEA-19 and CoNLL-14 benchmark datasets.
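The pair-generation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each translation model is exposed as a plain function from a source-language sentence to an English string. The names `synthesize_gec_pairs`, `poor_translator`, and `good_translator` are hypothetical. The lookup-table "translators" are toy stand-ins for real NMT systems. Dropping pairs whose two translations are identical is an added assumption that the abstract does not state.

```python
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical type: a "translator" maps one source-language sentence to English.
Translator = Callable[[str], str]


def synthesize_gec_pairs(
    source_sentences: Iterable[str],
    poor_translator: Translator,
    good_translator: Translator,
    drop_identical: bool = True,
) -> Iterator[Tuple[str, str]]:
    """Yield (erroneous, corrected) English pairs from source-language text.

    The poor model's output plays the role of learner-style input, and the
    good model's output plays the role of its correction.
    """
    for src in source_sentences:
        poor = poor_translator(src).strip()
        good = good_translator(src).strip()
        # Assumption (not from the abstract): a pair with no difference
        # carries no correction signal, so it is skipped by default.
        if drop_identical and poor == good:
            continue
        yield poor, good


# Toy stand-ins for a weak and a strong Chinese-English translation model.
poor_table = {
    "我昨天去了商店。": "I go to store yesterday.",
    "他很高兴。": "He is very happy.",
}
good_table = {
    "我昨天去了商店。": "I went to the store yesterday.",
    "他很高兴。": "He is very happy.",
}

# Iterating over the dict yields its keys (the source sentences) in order.
pairs = list(
    synthesize_gec_pairs(poor_table, poor_table.__getitem__, good_table.__getitem__)
)
print(pairs)
```

Because the source text is unlabeled, any monolingual source-language corpus can be fed through the two translators. This is why the number of synthesized pairs is bounded only by the amount of source text available.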
Zhou, W., Ge, T., Mu, C., Xu, K., Wei, F., & Zhou, M. (2020). Improving grammatical error correction with machine translation pairs. In Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 318–328). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.findings-emnlp.30