GUIT-NLP's submission to Shared Task: Low Resource Indic Language Translation

Abstract

This paper describes the GUIT-NLP team's submission to the "Shared Task: Low Resource Indic Language Translation", which focuses on three low-resource language pairs: English-Mizo, English-Khasi, and English-Assamese. The initial phase is an in-depth exploration of Neural Machine Translation (NMT) techniques suited to the available data. In this investigation, we test various subword tokenization approaches and model configurations of the general NMT pipeline (for example, different hyperparameter settings) to identify the most effective method. We then address the scarcity of parallel data by leveraging monolingual data through an innovative and systematic application of Back Translation for English-Mizo. During model training, the monolingual data is progressively integrated into the original bilingual dataset, and each iteration yields higher-quality back translations. This iterative approach significantly improves the model's performance, raising the BLEU score by +3.65. Fine-tuning on authentic parallel data yields a further improvement of +5.59 BLEU.
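The iterative back-translation loop summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the schedule, so the sketch assumes: the monolingual data is target-side (Mizo) text split into equal chunks, one chunk is back-translated per round, the backward (Mizo-to-English) model is retrained each round on the growing dataset, and the final forward model is fine-tuned on the authentic pairs only. The names `iterative_back_translation`, `chunk`, and `toy_train` are hypothetical. `toy_train` is a lookup-table stand-in for a real NMT trainer, included only so the loop runs end to end.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]           # (source sentence, target sentence)
Model = Callable[[str], str]     # a trained model maps a sentence to its translation
# Assumed trainer interface: train(pairs, init) -> model; `init` is an optional
# model to continue training from (used here for the final fine-tuning step).
Trainer = Callable[[Sequence[Pair], Optional[Model]], Model]


def chunk(items: List[str], n: int) -> List[List[str]]:
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    size, rest = divmod(len(items), n)
    out, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < rest else 0)
        out.append(items[start:end])
        start = end
    return out


def iterative_back_translation(
    parallel: Sequence[Pair],
    mono_tgt: Sequence[str],
    train: Trainer,
    rounds: int = 3,
) -> Tuple[Model, List[Pair]]:
    """Hypothetical sketch: grow the bilingual data one monolingual chunk per round."""
    data: List[Pair] = list(parallel)
    for part in chunk(list(mono_tgt), rounds):
        # Retrain the backward (target -> source) model on the current, larger dataset,
        # so later chunks are back-translated by a model trained on more data.
        backward = train([(t, s) for s, t in data], None)
        # Back-translate target-side text into synthetic source sentences.
        data.extend((backward(t), t) for t in part)
    forward = train(data, None)        # forward model on authentic + synthetic pairs
    forward = train(parallel, forward)  # fine-tune on authentic parallel data only
    return forward, data


def toy_train(pairs: Sequence[Pair], init: Optional[Model] = None) -> Model:
    """Hypothetical stand-in for an NMT trainer: memorizes pairs, else falls back."""
    table = dict(pairs)

    def model(sentence: str) -> str:
        if sentence in table:
            return table[sentence]
        # Unseen input: defer to the model being fine-tuned, or copy it through.
        return init(sentence) if init is not None else sentence

    return model
```

For example, `iterative_back_translation([("en_a", "mz_a"), ("en_b", "mz_b")], ["mz_c", "mz_d", "mz_e"], toy_train, rounds=2)` returns a model and a five-pair dataset whose last three target sides are the monolingual sentences. Splitting the monolingual data into chunks, rather than back-translating it all at once, is what lets each round benefit from the synthetic data added in earlier rounds.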

Citation (APA)

Ahmed, M. A., Talukdar, K., Boruah, P. A., Sarma, S. K., & Kashyap, K. (2023). GUIT-NLP’s submission to Shared Task: Low Resource Indic Language Translation. In Conference on Machine Translation - Proceedings (pp. 933–938). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.wmt-1.87