We investigate whether, given a simple symbol masking strategy, self-attention models are capable of learning nested structures and of generalising over their depth. We do so in the simplest setting possible, namely languages consisting of nested parentheses of several kinds. We use encoder-only models, which we train to predict randomly masked symbols in a BERT-like fashion. We find that accuracy is well above the random baseline, remaining consistently above 50% both as nesting depth increases and as the distance between training and testing conditions grows. However, we find that the predictions correspond to a simple parenthesis-counting strategy rather than to a push-down automaton. This suggests that self-attention models are not suitable for tasks which require generalisation to more complex instances of recursive structures than those found in the training set.
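To make the setup concrete, the following is a minimal, hypothetical sketch in Python of the kind of data the abstract describes: strings of nested brackets of several kinds, with a fraction of symbols replaced by a mask token for BERT-style masked prediction. The bracket inventory, nesting depth, and masking rate below are illustrative assumptions, not the paper's exact configuration.

```python
import random

# Illustrative only: bracket kinds, depths, and masking rate are assumptions,
# not the settings used by Bernardy, Ek & Maraev (2021).
BRACKETS = [("(", ")"), ("[", "]"), ("{", "}")]
MASK = "[MASK]"

def gen_nested(depth):
    """Generate one fully nested bracket string of the given depth."""
    if depth == 0:
        return []
    left, right = random.choice(BRACKETS)
    return [left] + gen_nested(depth - 1) + [right]

def mask_symbols(tokens, rate=0.15):
    """Randomly replace symbols with MASK; return (masked, targets)."""
    masked, targets = [], []
    for tok in tokens:
        if random.random() < rate:
            masked.append(MASK)
            targets.append(tok)   # the model must recover this symbol
        else:
            masked.append(tok)
            targets.append(None)  # no prediction required at this position
    return masked, targets

if __name__ == "__main__":
    seq = gen_nested(depth=4)
    masked, targets = mask_symbols(seq)
    print("original:", " ".join(seq))
    print("masked:  ", " ".join(masked))
```

Note that recovering a masked closing bracket in such strings can often be done by counting unmatched open brackets of each kind, which is the kind of shortcut the paper argues the models fall back on instead of implementing a genuine push-down automaton.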
Bernardy, J. P., Ek, A., & Maraev, V. (2021). Can the Transformer Learn Nested Recursion with Symbol Masking? In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 753–760). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-acl.67