Abstract
We present our system for the shared task on grammatical error correction (GEC) of Ukrainian. We implemented two approaches that use large pre-trained language models and synthetic data, a combination that has been used for error correction of English as well as of low-resource languages. The first approach finetunes a large multilingual language model (mT5) in two stages: first on synthetic data, and then on gold data. The second approach trains a (smaller) seq2seq Transformer model that is pre-trained on synthetic data and then finetuned on gold data. Our mT5-based model scored first in the “GEC only” track and a very close second in the “GEC+Fluency” track. Our two key innovations are (1) finetuning in stages, first on synthetic and then on gold data; and (2) a high-quality corruption method based on round-trip machine translation that complements existing noisification approaches.
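The two innovations named above can be illustrated with a minimal sketch. It assumes that round-trip corruption means translating a clean sentence into a pivot language and back, then pairing the (possibly degraded) output as the erroneous source with the original sentence as the correction target. It also assumes that staged finetuning means running the same finetuning step sequentially, synthetic data first and gold data second. The functions `round_trip_corrupt` and `train_in_stages`, the `drop_unchanged` flag, and the translator and finetune callables are all hypothetical. The abstract does not specify the MT system, pivot language, filtering, or training code, so this is not the authors' implementation.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Pair = Tuple[str, str]             # (erroneous source, corrected target)
Translator = Callable[[str], str]  # hypothetical MT wrapper: text in, text out
M = TypeVar("M")                   # stand-in for any model object


def round_trip_corrupt(
    clean: Iterable[str],
    to_pivot: Translator,
    from_pivot: Translator,
    drop_unchanged: bool = False,
) -> List[Pair]:
    """Build synthetic GEC pairs: the round-trip output becomes the noisy
    source and the original clean sentence becomes the target."""
    pairs: List[Pair] = []
    for sentence in clean:
        noisy = from_pivot(to_pivot(sentence))
        # Optionally skip sentences the round trip reproduced exactly,
        # since they contain no error to correct (an assumption, not from the paper).
        if drop_unchanged and noisy == sentence:
            continue
        pairs.append((noisy, sentence))
    return pairs


def train_in_stages(
    model: M,
    stages: List[Tuple[str, List[Pair]]],
    finetune: Callable[[M, List[Pair]], M],
) -> M:
    """Finetune sequentially: each stage starts from the previous stage's model."""
    for _name, data in stages:
        model = finetune(model, data)
    return model


if __name__ == "__main__":
    # Toy "translators": the pivot step loses commas, the return step is identity.
    # Real use would call an actual MT system in both directions.
    synthetic = round_trip_corrupt(
        ["She went home, then she ate.", "Hello."],
        to_pivot=lambda s: s.replace(",", ""),
        from_pivot=lambda s: s,
        drop_unchanged=True,
    )
    gold = [("He go home.", "He goes home.")]
    # Toy finetune step that only records how many pairs each stage saw.
    history = train_in_stages(
        [], [("synthetic", synthetic), ("gold", gold)],
        finetune=lambda log, data: log + [len(data)],
    )
    print(synthetic)  # [('She went home then she ate.', 'She went home, then she ate.')]
    print(history)    # [1, 1]
```

Passing the translators and the finetune step in as callables keeps the sketch independent of any particular MT system or training library; in practice the `finetune` callable would wrap a real training loop for mT5 or a Transformer seq2seq model.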
Cite
Gomez, F. P., Rozovskaya, A., & Roth, D. (2023). A Low-Resource Approach to the Grammatical Error Correction of Ukrainian. In EACL 2023 - 2nd Ukrainian Natural Language Processing Workshop, UNLP 2023 - Proceedings of the Workshop (pp. 114–120). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.unlp-1.14