A Low-Resource Approach to the Grammatical Error Correction of Ukrainian

Abstract

We present our system that participated in the shared task on the grammatical error correction (GEC) of Ukrainian. We implemented two approaches that use large pre-trained language models and synthetic data, techniques that have previously been applied to error correction of English as well as low-resource languages. The first approach finetunes a large multilingual language model (mT5) in two stages: first on synthetic data, then on gold data. The second approach uses a smaller seq2seq Transformer model that is pre-trained on synthetic data and finetuned on gold data. Our mT5-based model scored first in the “GEC only” track and a very close second in the “GEC+Fluency” track. Our two key innovations are (1) finetuning in stages, first on synthetic and then on gold data; and (2) a high-quality corruption method based on round-trip machine translation to complement existing noisification approaches.
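
The two ideas named in the abstract, round-trip corruption and staged finetuning, can be illustrated with a minimal sketch. It is not the authors' implementation. The names `round_trip_pairs`, `staged_finetune`, `to_pivot`, `from_pivot`, and `train_step` are hypothetical, the translators and the training step are supplied by the caller, and dropping sentences that survive the round trip unchanged is an assumption made here, not something the abstract states.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Model = TypeVar("Model")
Translator = Callable[[str], str]
Pair = Tuple[str, str]  # (noisy source, clean target)


def round_trip_pairs(
    clean_sentences: Iterable[str],
    to_pivot: Translator,    # hypothetical: Ukrainian -> pivot language
    from_pivot: Translator,  # hypothetical: pivot language -> Ukrainian
) -> List[Pair]:
    """Create synthetic GEC pairs by translating each sentence out and back."""
    pairs: List[Pair] = []
    for clean in clean_sentences:
        noisy = from_pivot(to_pivot(clean))
        # Assumption: keep only sentences that the round trip actually altered.
        if noisy != clean:
            pairs.append((noisy, clean))
    return pairs


def staged_finetune(
    model: Model,
    train_step: Callable[[Model, str, str], Model],  # hypothetical update function
    synthetic_pairs: List[Pair],
    gold_pairs: List[Pair],
    synthetic_epochs: int = 1,
    gold_epochs: int = 1,
) -> Model:
    """Stage 1: train on synthetic pairs. Stage 2: continue on gold pairs."""
    stages = [(synthetic_pairs, synthetic_epochs), (gold_pairs, gold_epochs)]
    for pairs, epochs in stages:
        for _ in range(epochs):
            for source, target in pairs:
                model = train_step(model, source, target)
    return model
```

In practice `to_pivot` and `from_pivot` would wrap a machine translation system and `train_step` would wrap one optimizer update of a seq2seq model such as mT5; those pieces are left to the caller so the sketch stays independent of any particular library.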

Cite

Gomez, F. P., Rozovskaya, A., & Roth, D. (2023). A Low-Resource Approach to the Grammatical Error Correction of Ukrainian. In EACL 2023 - 2nd Ukrainian Natural Language Processing Workshop, UNLP 2023 - Proceedings of the Workshop (pp. 114–120). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.unlp-1.14
