Dependency-based self-attention for transformer NMT


Abstract

In this paper, we propose a new Transformer neural machine translation (NMT) model that incorporates dependency relations into self-attention on both the source and target sides, which we call dependency-based self-attention. Inspired by linguistically-informed self-attention (LISA), dependency-based self-attention is trained to attend to each token's modifiee under constraints derived from the dependency relations. While LISA was originally designed for the Transformer encoder and semantic role labeling, this paper extends LISA to Transformer NMT by masking future information in the decoder-side dependency-based self-attention. Additionally, our dependency-based self-attention operates on subword units created by byte pair encoding. Experiments demonstrate that our model achieves a 1.0-point BLEU gain over the baseline model on the WAT'18 Asian Scientific Paper Excerpt Corpus Japanese-to-English translation task.
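
The following is a minimal PyTorch sketch of the idea summarized above, assuming a LISA-style setup in which one attention head is supervised to attend to each token's modifiee and the decoder-side head additionally applies a causal mask. The class name DependencyAttentionHead, the head_index argument, and the loss formulation are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DependencyAttentionHead(nn.Module):
    """A single attention head whose attention weights are supervised to point
    at each token's modifiee (dependency head), in the spirit of LISA.
    Names, shapes, and the loss formulation are illustrative assumptions."""

    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_head)
        self.k_proj = nn.Linear(d_model, d_head)
        self.v_proj = nn.Linear(d_model, d_head)
        self.scale = d_head ** -0.5

    def forward(self, x, head_index=None, causal=False):
        # x:          (batch, seq_len, d_model) token representations
        # head_index: (batch, seq_len) gold position of each token's modifiee,
        #             used only for the auxiliary training loss
        # causal:     True on the decoder side, so future tokens are masked out
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = torch.matmul(q, k.transpose(-2, -1)) * self.scale

        if causal:
            seq_len = x.size(1)
            future = torch.triu(
                torch.ones(seq_len, seq_len, device=x.device), diagonal=1
            ).bool()
            scores = scores.masked_fill(future, float("-inf"))

        attn = F.softmax(scores, dim=-1)
        out = torch.matmul(attn, v)

        dep_loss = None
        if head_index is not None:
            targets = head_index.clone()
            if causal:
                # The paper's treatment of modifiees that lie in the future is
                # not reproduced here; this sketch simply drops those targets.
                positions = torch.arange(x.size(1), device=x.device).unsqueeze(0)
                targets = targets.masked_fill(targets > positions, -100)
            # Cross-entropy over key positions pushes this head's attention
            # toward the gold modifiee of every token.
            dep_loss = F.cross_entropy(
                scores.reshape(-1, scores.size(-1)),
                targets.reshape(-1),
                ignore_index=-100,
            )
        return out, dep_loss
```

In a full model, a head like this would presumably replace one head of a standard multi-head self-attention layer on each of the encoder and decoder sides, with dep_loss added to the translation loss during training; those integration details are assumptions based on the abstract rather than a description of the authors' code.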

Cite

APA

Deguchi, H., Tamura, A., & Ninomiya, T. (2019). Dependency-based self-attention for transformer NMT. In International Conference Recent Advances in Natural Language Processing, RANLP (Vol. 2019-September, pp. 239–246). Incoma Ltd. https://doi.org/10.26615/978-954-452-056-4_028
