Syntax-Based Attention Masking for Neural Machine Translation

5 citations · 47 Mendeley readers

Abstract

We present a simple method for extending transformers to source-side trees. We define a number of masks that limit self-attention based on relationships among tree nodes, and we allow each attention head to learn which mask or masks to use. On translation from English to various low-resource languages, and translation in both directions between English and German, our method always improves over simple linearization of the source-side parse tree and almost always improves over a sequence-to-sequence baseline, by up to +2.1 BLEU.
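
To make the mechanism concrete, here is a minimal NumPy sketch of the kind of approach the abstract describes: boolean masks derived from tree relations are mixed per attention head and applied as a bias on the attention logits. This is not the authors' implementation; the particular relation set (self, parent, child), the soft mixture of masks, and all function names here are illustrative assumptions.

    import numpy as np

    def build_masks(parents):
        """parents[i] is the index of token i's head in the parse tree (-1 for the root)."""
        n = len(parents)
        self_mask = np.eye(n, dtype=bool)           # each node attends to itself
        parent_mask = np.zeros((n, n), dtype=bool)  # each node attends to its parent
        child_mask = np.zeros((n, n), dtype=bool)   # each node attends to its children
        for i, p in enumerate(parents):
            if p >= 0:
                parent_mask[i, p] = True
                child_mask[p, i] = True
        return np.stack([self_mask, parent_mask, child_mask])  # (num_masks, n, n)

    def apply_masks(logits, masks, head_weights):
        """Bias attention logits with a per-head mixture of relation masks.

        logits:       (num_heads, n, n) raw attention scores
        masks:        (num_masks, n, n) boolean relation masks
        head_weights: (num_heads, num_masks) mixture weights (rows sum to 1)
        """
        soft = np.einsum('hm,mij->hij', head_weights, masks.astype(float))
        with np.errstate(divide='ignore'):
            bias = np.log(soft)  # -inf wherever no chosen mask allows the pair
        return logits + bias

    # Usage: a 4-token tree with the root at index 0.
    parents = [-1, 0, 0, 1]
    masks = build_masks(parents)
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(2, 4, 4))               # 2 attention heads
    weights = np.full((2, masks.shape[0]), 1.0 / 3)   # uniform mixture for the demo
    print(apply_masks(logits, masks, weights).shape)  # (2, 4, 4)

In a trained model the mixture weights would presumably be learned parameters (e.g., a softmax over masks per head) rather than the fixed uniform weights used in this demo.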

Citation (APA)

McDonald, C., & Chiang, D. (2021). Syntax-Based Attention Masking for Neural Machine Translation. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop (pp. 47–52). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.naacl-srw.7
