Modeling Concentrated Cross-Attention for Neural Machine Translation with Gaussian Mixture Model

Abstract

Cross-attention is an important component of neural machine translation (NMT), and in previous methods it is typically realized by dot-product attention. However, dot-product attention only considers the pair-wise correlation between words, which leads to dispersed attention over long sentences and neglects neighboring relationships on the source side. Motivated by linguistic insights, we attribute these issues to the absence of a type of cross-attention, called concentrated attention, which focuses on several central words and then spreads around them. In this work, we apply a Gaussian Mixture Model (GMM) to model concentrated attention within cross-attention. Experiments and analyses on three datasets show that the proposed method outperforms the baseline and yields significant improvements in alignment quality, N-gram accuracy, and long-sentence translation.
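The abstract does not spell out the exact parameterization, so the following is only a minimal sketch of one plausible way to combine standard dot-product cross-attention with a Gaussian-mixture distribution over source positions. The module name `GMMCrossAttention`, the parameter `num_components`, and the way the mixture is fused with the dot-product scores are illustrative assumptions, not the authors' published formulation.

```python
# Sketch: dot-product cross-attention biased by a query-dependent Gaussian
# mixture over source positions (assumed formulation, not the paper's exact one).
import torch
import torch.nn as nn
import torch.nn.functional as F


class GMMCrossAttention(nn.Module):
    def __init__(self, d_model: int, num_components: int = 2):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Predict (mean, log-variance, weight logit) per component from the
        # decoder query state.
        self.gmm_proj = nn.Linear(d_model, 3 * num_components)
        self.num_components = num_components
        self.scale = d_model ** -0.5

    def forward(self, query, memory):
        # query:  (batch, tgt_len, d_model)  decoder states
        # memory: (batch, src_len, d_model)  encoder states
        B, T, _ = query.shape
        S = memory.size(1)

        q = self.q_proj(query)
        k = self.k_proj(memory)
        v = self.v_proj(memory)

        # Standard scaled dot-product attention scores: (B, T, S).
        dot_scores = torch.einsum("btd,bsd->bts", q, k) * self.scale

        # Predict GMM parameters for each target step and mixture component.
        params = self.gmm_proj(query).view(B, T, self.num_components, 3)
        mean = torch.sigmoid(params[..., 0]) * (S - 1)   # central position in [0, S-1]
        var = F.softplus(params[..., 1]) + 1e-4          # strictly positive variance
        weight = F.softmax(params[..., 2], dim=-1)       # mixture weights sum to 1

        # Evaluate the (unnormalized) mixture density at every source position.
        positions = torch.arange(S, device=query.device).float()    # (S,)
        diff = positions.view(1, 1, 1, S) - mean.unsqueeze(-1)      # (B, T, K, S)
        gauss = torch.exp(-0.5 * diff ** 2 / var.unsqueeze(-1))
        gmm_scores = (weight.unsqueeze(-1) * gauss).sum(dim=2)      # (B, T, S)

        # Fuse: dot-product scores modulated by the concentrated (GMM) prior.
        attn = F.softmax(dot_scores + torch.log(gmm_scores + 1e-9), dim=-1)
        return torch.einsum("bts,bsd->btd", attn, v)
```

In this sketch the mixture acts as a soft positional prior: each component places a bump of attention around a predicted central source word, so nearby words share probability mass instead of being scored only by independent pair-wise dot products.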

Citation (APA)

Zhang, S., & Feng, Y. (2021). Modeling Concentrated Cross-Attention for Neural Machine Translation with Gaussian Mixture Model. In Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 (pp. 1401–1411). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-emnlp.121
