Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation

Abstract

We study the power of cross-attention in the Transformer architecture within the context of transfer learning for machine translation, and extend the findings of studies into cross-attention when training from scratch. We conduct a series of experiments through fine-tuning a translation model on data where either the source or target language has changed. These experiments reveal that fine-tuning only the cross-attention parameters is nearly as effective as fine-tuning all parameters (i.e., the entire translation model). We provide insights into why this is the case and observe that limiting fine-tuning in this manner yields cross-lingually aligned embeddings. The implications of this finding for researchers and practitioners include a mitigation of catastrophic forgetting, the potential for zero-shot translation, and the ability to extend machine translation models to several new language pairs with reduced parameter storage overhead.
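To make the fine-tuning setup described in the abstract concrete, the sketch below freezes all parameters of an encoder-decoder Transformer except the cross-attention (encoder-decoder attention) modules. This is a minimal illustration, not the paper's actual recipe: it assumes a torch.nn.Transformer model (where the cross-attention submodule of each decoder layer happens to be named "multihead_attn"), whereas the paper works with a pretrained translation model and also handles embeddings for the new language.

```python
import torch
from torch import nn

# Hypothetical stand-in for a pretrained translation model; in practice this
# would be a trained encoder-decoder checkpoint, not a freshly initialized one.
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)

# Freeze everything, then unfreeze only cross-attention parameters.
# In torch.nn.TransformerDecoderLayer, self-attention is "self_attn" and
# encoder-decoder (cross-) attention is "multihead_attn".
for name, param in model.named_parameters():
    param.requires_grad = "multihead_attn" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable}/{total} (cross-attention only)")
```

An optimizer built over `filter(lambda p: p.requires_grad, model.parameters())` then updates only this small subset during fine-tuning on the new language pair, which is what keeps the per-pair parameter storage overhead low.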

Citation (APA)

Gheini, M., Ren, X., & May, J. (2021). Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation. In EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 1754–1765). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.emnlp-main.132
