Transformer Working Memory Enables Regular Language Reasoning And Natural Language Length Extrapolation


Abstract

Conventional wisdom has it that, unlike recurrent models, Transformers cannot perfectly model regular languages. Inspired by the notion of working memory, we propose a new Transformer variant named RegularGPT. With its novel combination of Weight-Sharing, Adaptive-Depth, and Sliding-Dilated-Attention, RegularGPT constructs working memory along the depth dimension, thereby enabling efficient and successful modeling of regular languages such as PARITY. We further test RegularGPT on the task of natural language length extrapolation and surprisingly find that it rediscovers the local windowed attention effect deemed necessary in prior work for length extrapolation.
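
The abstract describes the mechanism only at a high level. The snippet below is a minimal sketch, not the authors' implementation: it assumes a window of two and a dilation that doubles at each weight-shared layer, and shows how such a sliding-dilated attention pattern, composed over roughly log2(n) layers, lets every input position reach the final token. That depth-wise routing is the kind of "working memory along depth" that makes a PARITY-style reduction feasible. The function names, window size, and dilation schedule are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's code): connectivity of a sliding-dilated
# attention pattern reused across weight-shared layers. The point is that with
# a dilation doubling per layer, log2(n) layers suffice for the last position
# to "see" every input position.
import numpy as np


def dilated_mask(n: int, layer: int, window: int = 2) -> np.ndarray:
    """Boolean causal mask for one layer: position i attends to the `window`
    positions {i, i - d, i - 2d, ...} with dilation d = window ** layer."""
    d = window ** layer
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for k in range(window):
            j = i - k * d
            if j >= 0:
                mask[i, j] = True
    return mask


def reachable(n: int, num_layers: int, window: int = 2) -> np.ndarray:
    """Which input positions can influence each output position after stacking
    the masks for layers 0..num_layers-1 (composition = boolean matmul)."""
    reach = np.eye(n, dtype=bool)
    for layer in range(num_layers):
        m = dilated_mask(n, layer, window)
        # new_reach[i, k] = OR_j m[i, j] AND reach[j, k]
        reach = (m.astype(int) @ reach.astype(int)) > 0
    return reach


if __name__ == "__main__":
    n = 16
    layers = int(np.log2(n))      # log-depth suffices when the dilation doubles
    r = reachable(n, layers)
    # True: the last token can aggregate information from all n positions,
    # enough to compute a PARITY-style reduction over the whole input.
    print(bool(r[-1].all()))
```

With a fixed window and a fixed number of distinct dilation patterns, reusing the same layer weights and adapting the depth to the input length is what lets this routing scale to longer sequences than those seen during training.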

Citation (APA)
Chi, T. C., Fan, T. H., Rudnicky, A. I., & Ramadge, P. J. (2023). Transformer Working Memory Enables Regular Language Reasoning And Natural Language Length Extrapolation. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 5972–5984). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-emnlp.397
