Saturated Transformers are Constant-Depth Threshold Circuits

William Merrill; Ashish Sabharwal; Noah A. Smith

Journal Article, Open Access

Transactions of the Association for Computational Linguistics (2022) 10 843-856

DOI: 10.1162/tacl_a_00493

Abstract

Transformers have become a standard neural network architecture for many NLP problems, motivating theoretical analysis of their power in terms of formal languages. Recent work has shown that transformers with hard attention are quite limited in power (Hahn, 2020), as they can be simulated by constant-depth AND/OR circuits (Hao et al., 2022). However, hard attention is a strong assumption, which may complicate the relevance of these results in practice. In this work, we analyze the circuit complexity of transformers with saturated attention: a generalization of hard attention that more closely captures the attention patterns learnable in practical transformers. We first show that saturated transformers transcend the known limitations of hard-attention transformers. We then prove saturated transformers with floating-point values can be simulated by constant-depth threshold circuits, giving the class TC^0 as an upper bound on the formal languages they recognize.
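The contrast between hard and saturated attention can be illustrated with a small Python sketch. It is not code from the paper: the function names `hard_attention`, `saturated_attention`, and `majority` are hypothetical, and the sketch assumes a single head over a non-empty sequence of scalar values, with hard attention picking the leftmost maximal position and saturated attention averaging uniformly over all positions tied for the maximal score.

```python
from fractions import Fraction


def hard_attention(scores, values):
    """Return the value at the leftmost position with the maximal score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return values[best]


def saturated_attention(scores, values):
    """Average the values uniformly over all positions tied for the maximal score."""
    top = max(scores)
    tied = [v for s, v in zip(scores, values) if s == top]
    return sum(tied, Fraction(0)) / len(tied)


def majority(bits):
    """Check whether 1s strictly outnumber 0s in a non-empty bit list.

    Assumption for illustration: every position gets the same score,
    so the saturated head averages over the whole input.
    """
    scores = [0] * len(bits)
    return saturated_attention(scores, bits) > Fraction(1, 2)
```

With constant scores, the saturated head returns the fraction of 1s in the input, which is enough to decide majority; the hard-attention head sees only one position and so cannot count in this way.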

Citation (APA)

Merrill, W., Sabharwal, A., & Smith, N. A. (2022). Saturated Transformers are Constant-Depth Threshold Circuits. Transactions of the Association for Computational Linguistics, 10, 843–856. https://doi.org/10.1162/tacl_a_00493
