Transformer-based models are inefficient at processing long sequences because the self-attention module has time and space complexity quadratic in the sequence length. To address this limitation, Linformer and Informer reduce the quadratic complexity to linear (up to logarithmic factors) via low-dimensional projection and row selection, respectively. These two models are intrinsically connected, and to understand this connection we introduce a theoretical framework of matrix sketching. Based on this analysis, we propose Skeinformer, which accelerates self-attention and further improves the accuracy of the matrix approximation to self-attention through column sampling, adaptive row normalization, and pilot sampling reutilization. Experiments on the Long Range Arena benchmark demonstrate that our methods outperform alternatives with a consistently smaller time/space footprint.
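To make the sketching idea concrete, the snippet below is a minimal NumPy sketch of column-sampled attention: it approximates softmax(QK^T/sqrt(d))V by sampling m keys/values (i.e., m columns of the attention matrix) with importance weights, following the standard approximate-matrix-multiplication sketch. The sampling distribution (squared key norms), the 1/(m*p) reweighting, and all function names are illustrative assumptions; this is not the authors' Skeinformer implementation, which additionally uses adaptive row normalization and pilot sampling reutilization.

```python
import numpy as np

def exact_attention(Q, K, V):
    """Reference O(n^2) softmax attention."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)
    A = np.exp(scores)
    return (A @ V) / A.sum(axis=1, keepdims=True)

def column_sampled_attention(Q, K, V, m, rng):
    """Approximate softmax(QK^T/sqrt(d)) V in O(n*m*d) time by sampling
    m columns of the attention matrix (i.e., m key/value rows).

    Sampling probabilities proportional to squared key norms and the
    1/(m*p) importance weights follow the standard approximate-matrix-
    multiplication sketch; they are illustrative, not the paper's scheme.
    """
    n, d = Q.shape
    p = np.linalg.norm(K, axis=1) ** 2
    p /= p.sum()
    idx = rng.choice(n, size=m, replace=True, p=p)
    w = 1.0 / (m * p[idx])                       # importance-sampling reweighting
    scores = Q @ K[idx].T / np.sqrt(d)           # n x m block of attention scores
    scores -= scores.max(axis=1, keepdims=True)  # cancels in the ratio below
    A_cols = np.exp(scores) * w
    # Both the numerator A V and the softmax normalizer (row sums of A) are
    # estimated from the same sampled columns; Skeinformer's adaptive row
    # normalization refines the normalizer estimate.
    return (A_cols @ V[idx]) / A_cols.sum(axis=1, keepdims=True)

# Quick check on random data: relative error of the sketched output.
rng = np.random.default_rng(0)
n, d = 2048, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
approx = column_sampled_attention(Q, K, V, m=256, rng=rng)
exact = exact_attention(Q, K, V)
print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```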
Citation
Chen, Y., Zeng, Q., Hakkani-Tur, D., Jin, D., Ji, H., & Yang, Y. (2022). Sketching as a Tool for Understanding and Accelerating Self-attention for Long Sequences. In NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp. 5187–5199). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.naacl-main.381