Motivated by recent advancements in grammatical error correction in English and existing issues in the field, we describe a new resource, an annotated learner corpus of Russian, extracted from the Lang-8 language learning website. We benchmark two grammatical error correction models that use state-of-the-art neural architectures on this new dataset. Results are provided on the newly created corpus and are compared against performance on another, existing resource. We also evaluate the contribution of the Lang-8 training data to the grammatical error correction of Russian and perform a type-based analysis of the models. The expert annotations are available for research purposes.
Trinh, V. A., & Rozovskaya, A. (2021). New Dataset and Strong Baselines for the Grammatical Error Correction of Russian. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 4103–4111). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-acl.359