This paper describes the submission of the GUIT-NLP team to the "Shared Task: Low Resource Indic Language Translation", focusing on three low-resource language pairs: English-Mizo, English-Khasi, and English-Assamese. The initial phase involves an in-depth exploration of Neural Machine Translation (NMT) techniques tailored to the available data. Within this investigation, various subword tokenization approaches and model configurations (exploring different hyper-parameters) of the general NMT pipeline are tested to identify the most effective setup. Subsequently, we address the challenge of low-resource languages by leveraging monolingual data through an innovative and systematic application of the back-translation technique for English-Mizo. During model training, the monolingual data is progressively integrated into the original bilingual dataset, with each iteration yielding higher-quality back translations. This iterative approach significantly enhances the model's performance, yielding a notable gain of +3.65 BLEU. A further improvement of +5.59 BLEU is achieved through fine-tuning on authentic parallel data.
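The iterative procedure described above can be outlined as follows. This is a minimal sketch, not the authors' exact pipeline: the helpers `train_model` and `translate`, the assumption that monolingual Mizo is the back-translated side, and the chunk sizes and iteration count are all illustrative placeholders.

```python
# Hedged sketch of iterative back-translation: each round trains a
# target->source model on the current data, back-translates the next slice
# of monolingual text, and folds the synthetic pairs into the training set.
from typing import List, Tuple

ParallelData = List[Tuple[str, str]]  # (English, Mizo) sentence pairs


def train_model(parallel: ParallelData):
    """Placeholder for training an NMT system on the given parallel data."""
    class Model:
        def translate(self, sentences: List[str]) -> List[str]:
            # A real model would return translations; this stub just echoes.
            return sentences
    return Model()


def iterative_back_translation(
    bitext: ParallelData,      # authentic English-Mizo pairs
    mono_mizo: List[str],      # monolingual Mizo sentences (assumed side)
    iterations: int = 3,       # illustrative value
    chunk_size: int = 10000,   # illustrative value
):
    train_set = list(bitext)
    for i in range(iterations):
        # Train a Mizo->English model on the current (growing) training set.
        mz2en = train_model([(tgt, src) for src, tgt in train_set])
        # Back-translate the next chunk of monolingual Mizo text.
        chunk = mono_mizo[i * chunk_size:(i + 1) * chunk_size]
        synthetic_en = mz2en.translate(chunk)
        # Add the synthetic (English, Mizo) pairs to the training data.
        train_set.extend(zip(synthetic_en, chunk))
    # Final forward (English->Mizo) model, later fine-tuned on the
    # authentic bitext only, as reported in the abstract.
    return train_model(train_set)
```

Because each round's model is trained on more data than the last, the synthetic pairs it produces are progressively cleaner, which is the intuition behind the reported BLEU gains.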