Life after BERT: What do Other Muppets Understand about Language?

Citations: 3
Readers (Mendeley library saves): 56

Abstract

Existing analyses of pre-trained transformers usually focus on only one or two model families at a time, overlooking the variability in architectures and pre-training objectives. In our work, we utilize the oLMpics benchmark and psycholinguistic probing datasets for a diverse set of 29 models, including T5, BART, and ALBERT. Additionally, we adapt the oLMpics zero-shot setup for autoregressive models and evaluate GPT networks of different sizes. Our findings show that none of these models can resolve compositional questions in a zero-shot fashion, suggesting that this skill is not learnable using existing pre-training objectives. Furthermore, we find that global model decisions, such as architecture, directionality, dataset size, and pre-training objective, are not predictive of a model's linguistic capabilities. The code for this study is available on GitHub.
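To make the mention of adapting the oLMpics zero-shot setup for autoregressive models concrete, the sketch below shows one plausible way to score a multiple-choice probe with GPT-2: each candidate answer is substituted into the sentence, and the model's summed token log-probability decides the prediction. This is an illustrative assumption, not the authors' released implementation; the model name, the example probe, and the helper functions are hypothetical.

```python
# Hypothetical sketch: zero-shot multiple-choice scoring with an autoregressive LM,
# in the spirit of adapting oLMpics probes to GPT-style models.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model_name = "gpt2"  # any GPT-2 size could be substituted here
tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()

def sequence_log_prob(text: str) -> float:
    """Sum of token log-probabilities the model assigns to `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Shift so that logits at position i predict token i+1.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = ids[:, 1:]
    token_scores = log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
    return token_scores.sum().item()

def zero_shot_choice(question: str, candidates: list[str]) -> str:
    """Pick the candidate whose filled-in sentence the LM scores highest."""
    scores = [sequence_log_prob(question.replace("[MASK]", c)) for c in candidates]
    return candidates[int(torch.tensor(scores).argmax())]

# Illustrative probe in the style of an oLMpics age-comparison question:
print(zero_shot_choice(
    "A 41 year old person is [MASK] than a 24 year old person.",
    ["older", "younger"],
))
```

Evaluating larger GPT variants under this scheme only requires swapping model_name; the candidate-comparison logic stays the same, which is what makes the setup usable across model sizes.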

Citation (APA)

Lialin, V., Zhao, K., Shivagunde, N., & Rumshisky, A. (2022). Life after BERT: What do Other Muppets Understand about Language? In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 3180–3193). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.227
