Grammatical error correction for low-resource languages: a review of challenges, strategies, computational and future directions

Abstract

Grammatical error correction (GEC) is crucial for improving the readability and comprehensibility of texts, and it is particularly valuable for raising text quality in low-resource languages. However, challenges such as data scarcity, linguistic diversity, and limited computational resources hinder progress in this domain. To address these challenges, researchers have developed strategies such as synthetic data generation, multilingual pre-trained models, and cross-lingual transfer learning. This review synthesizes findings from key studies to identify effective GEC methods for low-resource languages, emphasizing approaches for handling limited annotated corpora, typological complexity, and evaluation challenges. Synthetic data generation techniques, including noise injection, adversarial error generation, and translationese-based augmentation, have proven vital for overcoming data scarcity. Multilingual and transfer learning approaches are effective at adapting knowledge from high-resource languages to low-resource settings, especially when combined with fine-tuning on curated datasets. Linguistic diversity has been partially addressed through methods such as morphology-aware embeddings, byte-level tokenization, and contextual data preprocessing. However, little research exists on robust evaluation metrics tailored to diverse typologies, such as agglutinative and morphologically rich languages, and the creation of gold-standard datasets remains an open challenge. Recent advances in dataset construction and the use of large language models further enrich this field, offering scalable solutions for low-resource contexts. Despite notable progress, this review identifies gaps in evaluation methodologies and typology-specific solutions, and calls for future innovations in multilingual modeling, dataset creation, and computationally efficient GEC systems tailored to the needs of low-resource languages.
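The noise-injection strategy named in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization and four illustrative edit types (token deletion, adjacent-token swap, token duplication, and character transposition); the functions `inject_noise` and `make_pairs` are hypothetical and are not taken from the reviewed studies. Each clean sentence serves as the correction target, and its corrupted copy serves as the erroneous source, yielding synthetic (erroneous, correct) training pairs.

```python
import random
from typing import List, Optional, Tuple


def inject_noise(sentence: str, p: float = 0.1, seed: Optional[int] = None) -> str:
    """Corrupt a clean sentence with random token-level edits (illustrative only).

    Each token is edited with total probability p, split evenly across four
    hypothetical edit types. Whitespace tokenization is an assumption.
    """
    rng = random.Random(seed)
    tokens = sentence.split()
    noisy: List[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        r = rng.random()
        if r < p / 4:
            pass  # deletion: drop the token entirely
        elif r < p / 2 and i + 1 < len(tokens):
            noisy.extend([tokens[i + 1], tok])  # swap with the next token
            i += 1  # the next token has already been consumed
        elif r < 3 * p / 4:
            noisy.extend([tok, tok])  # duplication (also used when swap is impossible)
        elif r < p and len(tok) > 1:
            j = rng.randrange(len(tok) - 1)
            # transpose two adjacent characters, e.g. "the" -> "teh"
            noisy.append(tok[:j] + tok[j + 1] + tok[j] + tok[j + 2:])
        else:
            noisy.append(tok)  # leave the token unchanged
        i += 1
    return " ".join(noisy)


def make_pairs(clean: List[str], p: float = 0.1, seed: int = 0) -> List[Tuple[str, str]]:
    """Build (noisy source, clean target) pairs from a list of clean sentences."""
    rng = random.Random(seed)
    return [(inject_noise(s, p, rng.randrange(2**32)), s) for s in clean]


if __name__ == "__main__":
    for src, tgt in make_pairs(["she goes to school every day"], p=0.5, seed=1):
        print(f"{src!r} -> {tgt!r}")
```

In practice, error types and rates are often tuned to resemble the errors found in real learner text of the target language; uniform random noise like this is only a simple baseline.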

Citation (APA)

Marier, S. M., Chen, X., Zhu, L., & Kong, X. (2025). Grammatical error correction for low-resource languages: a review of challenges, strategies, computational and future directions. PeerJ Computer Science, 11. https://doi.org/10.7717/peerj-cs.3044
