Abstract
Neural Machine Translation (NMT) models have traditionally used Sinusoidal Positional Embeddings (PEs), which often struggle to capture long-range dependencies and are inefficient for extended-context or document-level translation tasks. This work addresses the challenge of transitioning pre-trained NMT models from absolute Sinusoidal PEs to relative PEs, such as Rotary Position Embeddings (RoPE) and Attention with Linear Biases (ALiBi), without compromising performance. We demonstrate that parameter-efficient fine-tuning, using only a small amount of high-quality data, can successfully facilitate this transition. Experimental results indicate that switching from Sinusoidal to relative PEs yields competitive translation quality on sentence-level evaluation benchmarks. Additionally, models trained with RoPE consistently outperform those using ALiBi and Sinusoidal PEs on document-level benchmarks, across both string-based metrics and qualitative evaluations. Moreover, we find that a small amount of long-context data in a few languages is sufficient for cross-lingual length generalization, thereby inducing long-context capabilities.
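For readers unfamiliar with the relative schemes named above, the following is a minimal illustrative sketch of RoPE in NumPy. It is not the authors' implementation; the base of 10000 and the pairing of adjacent feature dimensions are standard conventions assumed here for clarity.

```python
# Illustrative sketch of Rotary Position Embeddings (RoPE), one of the
# relative PE schemes discussed in the abstract. Not the paper's code.
import numpy as np

def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embeddings to x of shape (seq_len, dim).

    Each pair of features (2i, 2i+1) at position m is rotated by the angle
    m * theta_i, with theta_i = base^(-2i/dim), so query-key dot products
    depend only on the relative offset between positions.
    """
    seq_len, dim = x.shape
    positions = np.arange(seq_len)[:, None]           # (seq_len, 1)
    theta = base ** (-np.arange(0, dim, 2) / dim)      # (dim/2,)
    angles = positions * theta                         # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    rotated = np.empty_like(x)
    rotated[:, 0::2] = x_even * cos - x_odd * sin
    rotated[:, 1::2] = x_even * sin + x_odd * cos
    return rotated

# Usage: rotate queries and keys before attention; the resulting attention
# scores then encode relative rather than absolute positions.
q = rope(np.random.randn(16, 64))
k = rope(np.random.randn(16, 64))
scores = q @ k.T
```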
Citation
Gumma, V., Chitale, P. A., & Bali, K. (2025). Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models. In Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies: Long Papers, NAACL-HLT 2025 (Vol. 1, pp. 7158–7170). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2025.naacl-long.366