Convolutions and self-attention: Re-interpreting relative positions in pre-trained language models


Abstract

In this paper, we detail the relationship between convolutions and self-attention in natural language tasks. We show that relative position embeddings in self-attention layers are equivalent to recently-proposed dynamic lightweight convolutions, and we consider multiple new ways of integrating convolutions into Transformer self-attention. Specifically, we propose composite attention, which unites previous relative position embedding methods under a convolutional framework. We conduct experiments by training BERT with composite attention in place of absolute position embeddings, finding that convolutions consistently improve performance on multiple downstream tasks. To inform future work, we present results comparing lightweight convolutions, dynamic convolutions, and depthwise-separable convolutions in language model pretraining, considering multiple injection points for convolutions in self-attention layers.
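
To make the stated equivalence concrete, the sketch below shows how a relative-position term added to self-attention scores has Toeplitz (i.e., convolutional) structure: each score receives a learned scalar that depends only on the clipped offset between query and key positions, so the added matrix acts like a 1-D convolution kernel shared across positions. This is a minimal illustration written for this summary, not the authors' implementation; the variable names (`seq_len`, `d_model`, `window`, `kernel`) and the single-head, NumPy-only setup are assumptions for readability.

```python
# Minimal sketch: relative-position bias in self-attention as a convolution.
# Not the paper's code; shapes and names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, window = 8, 16, 3          # window = max relative offset k

Q = rng.normal(size=(seq_len, d_model))      # queries
K = rng.normal(size=(seq_len, d_model))      # keys
V = rng.normal(size=(seq_len, d_model))      # values
kernel = rng.normal(size=(2 * window + 1,))  # one scalar per offset in [-k, k]

# B[i, j] depends only on the clipped offset j - i, so B is a banded
# Toeplitz matrix -- equivalent to a lightweight 1-D convolution over positions.
offsets = np.clip(np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None],
                  -window, window)
B = kernel[offsets + window]

scores = Q @ K.T / np.sqrt(d_model) + B      # content term + convolutional term
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
output = weights @ V
print(output.shape)                          # (8, 16)
```

In the dynamic-convolution view discussed in the paper, the kernel is additionally generated from the current token rather than fixed; the fixed-kernel version above is only meant to show where the convolutional term enters the attention computation.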

Citation (APA)

Chang, T. A., Xu, Y., Xu, W., & Tu, Z. (2021). Convolutions and self-attention: Re-interpreting relative positions in pre-trained language models. In ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference (pp. 4322–4333). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.acl-long.333
