Dissecting Transformer Length Extrapolation via The Lens of Receptive Field Analysis

Ta Chung Chi; Ting Han Fan; Alexander I. Rudnicky; Peter J. Ramadge

Conference Proceedings, Open Access

Proceedings of the Annual Meeting of the Association for Computational Linguistics (2023), Vol. 1, pp. 13522–13537

DOI: 10.18653/v1/2023.acl-long.756

Abstract

Length extrapolation permits training a transformer language model on short sequences while preserving perplexity when the model is tested on substantially longer sequences. A relative positional embedding design, ALiBi, has had the widest usage to date. We dissect ALiBi via the lens of receptive field analysis, empowered by a novel cumulative normalized gradient tool. The concept of receptive field further allows us to modify the vanilla Sinusoidal positional embedding to create Sandwich, the first parameter-free relative positional embedding design that truly uses length information longer than the training sequence. Sandwich shares with KERPLE and T5, both of which use learnable relative positional embeddings, the same logarithmically decaying temporal bias pattern; these findings elucidate future extrapolatable positional embedding design.
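To make the two kinds of temporal bias named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the commonly described ALiBi slope schedule for a power-of-two number of heads, and it uses the fact that the inner product of two vanilla sinusoidal embeddings depends only on the distance between their positions, which is the kind of parameter-free relative bias the abstract attributes to Sandwich. All function names are hypothetical, and any scaling or per-head treatment used by Sandwich itself is left to the paper.

```python
import math


def alibi_slopes(num_heads):
    # Assumed schedule: geometric sequence 2^(-8h/H) for h = 1..H
    # (valid as written when num_heads is a power of two).
    return [2.0 ** (-8.0 * h / num_heads) for h in range(1, num_heads + 1)]


def alibi_bias(seq_len, slope):
    # Causal linear bias: query i sees key j <= i with penalty -slope * (i - j).
    # Future positions are masked with -inf.
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


def sinusoidal_embedding(pos, dim, base=10000.0):
    # Vanilla sinusoidal embedding: one (sin, cos) pair per frequency.
    emb = []
    for k in range(dim // 2):
        angle = pos / base ** (2.0 * k / dim)
        emb.extend([math.sin(angle), math.cos(angle)])
    return emb


def sinusoidal_bias(distance, dim, base=10000.0):
    # Closed form of the inner product of two sinusoidal embeddings whose
    # positions differ by `distance`: a sum of cosines, no learned parameters.
    return sum(
        math.cos(distance / base ** (2.0 * k / dim)) for k in range(dim // 2)
    )


if __name__ == "__main__":
    print(alibi_slopes(8)[:3])
    print(alibi_bias(4, 0.5)[3])
    print([round(sinusoidal_bias(d, 64), 3) for d in (0, 1, 10, 100, 1000)])
```

Running the script prints the bias values for a few distances, so the linear penalty and the cosine-sum bias can be compared directly. The closed form in `sinusoidal_bias` is only a function of relative distance, which is why it can be evaluated at distances beyond any training length without extra parameters.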

Cite

Chi, T. C., Fan, T. H., Rudnicky, A. I., & Ramadge, P. J. (2023). Dissecting Transformer Length Extrapolation via The Lens of Receptive Field Analysis. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 13522–13537). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.756
