Beyond Shared Vocabulary: Increasing Representational Word Similarities across Languages for Multilingual Machine Translation


Abstract

Using a vocabulary that is shared across languages is common practice in Multilingual Neural Machine Translation (MNMT). Beyond its simple design, a shared vocabulary facilitates positive knowledge transfer, under the assumption that shared tokens carry similar meanings across languages. However, when word overlap is small, e.g., due to different writing systems, transfer is inhibited. In this paper, we define word-level information transfer pathways via word equivalence classes and rely on graph networks to fuse word embeddings across languages. Our experiments demonstrate the advantages of our approach: 1) embeddings of words with similar meanings are better aligned across languages, 2) our method achieves consistent BLEU improvements of up to 2.3 points in high- and low-resource MNMT, and 3) fewer than 1.0% additional trainable parameters are required, with a limited increase in computational cost, while inference time remains identical to the baseline. We release our codebase to the community.
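To make the high-level idea in the abstract concrete, below is a minimal sketch, not the authors' exact implementation, of fusing word embeddings over a cross-lingual word graph. It assumes PyTorch; the class name GraphFusedEmbedding, the single graph-convolution step, the residual connection, and the shared linear projection are all illustrative choices. The equivalence classes are assumed to come from an external resource such as a bilingual dictionary.

```python
# A minimal sketch (not the paper's exact architecture) of fusing word
# embeddings across languages with a graph network, assuming PyTorch.
import torch
import torch.nn as nn


class GraphFusedEmbedding(nn.Module):
    """Embedding table smoothed over a cross-lingual word graph.

    `edges` is a list of (i, j) vocabulary-index pairs (assumed given in
    both directions) placing words i and j in the same equivalence
    class, e.g., dictionary translations of each other.
    """

    def __init__(self, vocab_size: int, dim: int, edges):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # Row-normalized sparse adjacency with self-loops, so every
        # word retains part of its own representation.
        idx = torch.tensor(
            list(edges) + [(i, i) for i in range(vocab_size)]
        ).t()  # shape: (2, num_edges + vocab_size)
        deg = torch.zeros(vocab_size).index_add_(
            0, idx[0], torch.ones(idx.size(1))
        )
        self.register_buffer("adj_idx", idx)
        self.register_buffer("adj_val", 1.0 / deg[idx[0]])
        # One shared projection: the only extra trainable parameters,
        # a small fraction of a typical embedding table.
        self.proj = nn.Linear(dim, dim, bias=False)

    def fused_table(self) -> torch.Tensor:
        adj = torch.sparse_coo_tensor(
            self.adj_idx, self.adj_val,
            (self.embed.num_embeddings, self.embed.num_embeddings),
        )
        # One graph-convolution step: average each word with its
        # cross-lingual equivalents, project, and add residually.
        return self.embed.weight + self.proj(
            torch.sparse.mm(adj, self.embed.weight)
        )

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.fused_table()[token_ids]


# Usage: words 3 and 7 are hypothetical translations of each other.
emb = GraphFusedEmbedding(vocab_size=10, dim=8, edges=[(3, 7), (7, 3)])
print(emb(torch.tensor([[3, 7, 1]])).shape)  # torch.Size([1, 3, 8])
```

Note that the fused table can be computed once after training and then used as a plain lookup table, which is consistent with the abstract's claim that inference time remains identical to the baseline.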

Citation (APA)

Wu, D., & Monz, C. (2023). Beyond Shared Vocabulary: Increasing Representational Word Similarities across Languages for Multilingual Machine Translation. In EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 9749–9764). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.emnlp-main.605
