Abstract
We present the LCT-EHU submission to the AmericasNLP 2023 low-resource machine translation shared task. We focus on the Spanish-Quechua language pair and explore several approaches: (1) obtaining new parallel corpora from the literary and legal domains, (2) comparing a high-resource Spanish-English pre-trained MT model with a Spanish-Finnish pre-trained model (Finnish chosen as a target language for its morphological similarity to Quechua), and (3) exploring additional techniques such as a copied corpus and back-translation. Overall, we show that the Spanish-Finnish pre-trained model outperforms the other setups, while low-quality synthetic data degrades performance.
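As a brief illustration of the two data-augmentation techniques named above: back-translation translates monolingual Quechua sentences into Spanish with a reverse-direction model and pairs the synthetic Spanish with the original Quechua, while a copied corpus simply places the target-language sentence on both sides. The sketch below is a minimal, hedged outline of this idea; `translate_qu_to_es` is a stand-in for a trained Quechua-to-Spanish model, not part of the paper's actual pipeline.

```python
# Sketch of two augmentation strategies for Spanish->Quechua MT training data.
# `translate_qu_to_es` is a placeholder for a trained reverse-direction model.
from typing import Callable, List, Tuple

def back_translate(
    mono_qu: List[str],
    translate_qu_to_es: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each monolingual Quechua sentence with a synthetic Spanish source."""
    return [(translate_qu_to_es(qu), qu) for qu in mono_qu]

def copied_corpus(mono_qu: List[str]) -> List[Tuple[str, str]]:
    """Copied-corpus baseline: the target sentence appears on both sides."""
    return [(qu, qu) for qu in mono_qu]
```

The synthetic pairs from either function would then be concatenated with the genuine parallel data before training; the abstract's finding is that low-quality synthetic pairs of this kind can hurt rather than help.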
Ahmed, N., Manrique, N. F., & Petrović, A. (2023). Enhancing Spanish-Quechua Machine Translation with Pre-Trained Models and Diverse Data Sources: LCT-EHU at AmericasNLP Shared Task. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 156–162). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.americasnlp-1.16