Syntax-guided Localized Self-attention by Constituency Syntactic Distance

Abstract

Recent works have revealed that Transformers implicitly learn syntactic information in their lower layers from data, although this ability depends heavily on the quality and scale of the training data. However, learning syntactic information from data is unnecessary if we can leverage an external syntactic parser, which provides higher parsing quality with well-defined syntactic structures; this could improve the Transformer's performance and sample efficiency. In this work, we propose a syntax-guided localized self-attention for the Transformer that directly incorporates grammar structures from an external constituency parser. It prevents the attention mechanism from overweighting grammatically distant tokens relative to close ones. Experimental results show that our model consistently improves translation performance on a variety of machine translation datasets, ranging from small to large, and across different source languages.
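The abstract does not spell out the exact formulation, but the idea of biasing attention by constituency-tree distance can be illustrated with a minimal sketch. The snippet below is a hypothetical implementation in PyTorch: it assumes a precomputed pairwise syntactic-distance matrix (e.g., tree distance between tokens from an external constituency parser) and subtracts a scaled version of it from the attention logits, so that grammatically distant tokens receive lower attention weights. The function name, the penalty strength `alpha`, and the toy distance matrix are all illustrative assumptions, not the paper's definitive method.

```python
# Hypothetical sketch of syntax-guided localized self-attention.
# Assumption: syntactic locality is enforced by subtracting a scaled
# pairwise constituency-tree distance from the attention logits.

import torch
import torch.nn.functional as F


def syntax_guided_attention(q, k, v, syn_dist, alpha=1.0):
    """Scaled dot-product attention with a syntax-distance penalty.

    q, k, v:  (batch, seq_len, d) query/key/value tensors
    syn_dist: (batch, seq_len, seq_len) pairwise syntactic distances
    alpha:    strength of the locality penalty (assumed hyperparameter)
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5  # (B, L, L) attention logits
    scores = scores - alpha * syn_dist           # penalize syntactically distant pairs
    weights = F.softmax(scores, dim=-1)          # each row sums to 1
    return weights @ v                           # (B, L, d) contextualized values


if __name__ == "__main__":
    B, L, D = 2, 5, 16
    q = k = v = torch.randn(B, L, D)
    # Toy distance matrix: absolute position difference stands in for the
    # constituency-tree distance an external parser would provide.
    idx = torch.arange(L)
    syn_dist = (idx[None, :] - idx[:, None]).abs().float().expand(B, L, L)
    out = syntax_guided_attention(q, k, v, syn_dist)
    print(out.shape)  # torch.Size([2, 5, 16])
```

In practice, the distance matrix would be derived from the constituency parse of the source sentence rather than from token positions; the additive-bias form shown here is only one plausible way to realize the "localized" constraint described in the abstract.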

Cite


APA

Hou, S., Kai, J., Xue, H., Zhu, B., Yuan, B., Huang, L., … Lin, Z. (2022). Syntax-guided Localized Self-attention by Constituency Syntactic Distance. In Findings of the Association for Computational Linguistics: EMNLP 2022 (pp. 2334–2341). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.findings-emnlp.27
