Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction

Abstract

In this paper, we investigate improvements to the GEC sequence tagging architecture, focusing on ensembles of recent cutting-edge Transformer-based encoders in their Large configurations. We encourage ensembling models by majority vote on span-level edits because this approach is tolerant of differences in model architecture and vocabulary size. Our best ensemble achieves a new SOTA result with an F0.5 score of 76.05 on BEA-2019 (test), even without pretraining on synthetic datasets. In addition, we perform knowledge distillation with a trained ensemble to generate new synthetic training datasets, "Troy-Blogs" and "Troy-1BW". Our best single sequence tagging model, pretrained on the generated Troy- datasets in combination with the publicly available synthetic PIE dataset, achieves a near-SOTA result with an F0.5 score of 73.21 on BEA-2019 (test). The code, datasets, and trained models are publicly available.
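
To make the idea of majority voting on span-level edits concrete, here is a minimal sketch in Python. It is not the authors' released code. The edit representation (start token index, end token index, replacement string), the function names `majority_vote` and `apply_edits`, and the `min_votes` threshold are all assumptions made for illustration. Because each model contributes only its proposed edits over the shared source tokens, nothing in the vote depends on a model's architecture or vocabulary.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical edit format: (start, end, replacement) over source token indices,
# with `end` exclusive. An insertion has start == end.
Edit = Tuple[int, int, str]


def majority_vote(edit_sets: Sequence[Sequence[Edit]], min_votes: int) -> List[Edit]:
    """Keep every edit proposed by at least `min_votes` models."""
    counts: Counter = Counter()
    for edits in edit_sets:
        # set() so a model cannot vote twice for the same edit
        counts.update(set(edits))
    return sorted(edit for edit, votes in counts.items() if votes >= min_votes)


def apply_edits(tokens: Sequence[str], edits: Sequence[Edit]) -> List[str]:
    """Apply edits right to left so earlier indices stay valid.

    Assumes the kept edits do not overlap; this sketch does not resolve conflicts.
    """
    out = list(tokens)
    for start, end, replacement in sorted(edits, reverse=True):
        out[start:end] = replacement.split()
    return out


tokens = "She go to school yesterday".split()
model_a = [(1, 2, "went")]
model_b = [(1, 2, "went"), (3, 3, "the")]
model_c = [(1, 2, "goes")]

kept = majority_vote([model_a, model_b, model_c], min_votes=2)
print(" ".join(apply_edits(tokens, kept)))  # She went to school yesterday
```

In this toy example only the edit `go -> went` reaches two votes, so the single-model suggestions are discarded. Raising `min_votes` would typically trade recall for precision, which matters for a precision-weighted metric such as F0.5.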

Cite

Tarnavskyi, M., Chernodub, A., & Omelianchuk, K. (2022). Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 3842–3852). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.266
