Probing BERT's priors with serial reproduction chains

Takateru Yamakoshi; Thomas L. Griffiths; Robert D. Hawkins

Conference Proceedings (Open Access)

Proceedings of the Annual Meeting of the Association for Computational Linguistics (2022) 3977-3992

DOI: 10.18653/v1/2022.findings-acl.314

Abstract

Sampling is a promising bottom-up method for exposing what generative models have learned about language, but it remains unclear how to generate representative samples from popular masked language models (MLMs) like BERT. The MLM objective yields a dependency network with no guarantee of consistent conditional distributions, posing a problem for naive approaches. Drawing from theories of iterated learning in cognitive science, we explore the use of serial reproduction chains to sample from BERT's priors. In particular, we observe that a unique and consistent estimator of the ground-truth joint distribution is given by a Generative Stochastic Network (GSN) sampler, which randomly selects which token to mask and reconstruct on each step. We show that the lexical and syntactic statistics of sentences from GSN chains closely match the ground-truth corpus distribution and perform better than other methods in a large corpus of naturalness judgments. Our findings establish a firmer theoretical foundation for bottom-up probing and highlight richer deviations from human priors.
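The GSN sampling step described in the abstract (pick a random position, mask it, and resample it from the model's conditional distribution) can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_conditional` is a hypothetical stand-in for BERT's masked-token distribution, and the names `gsn_step` and `gsn_chain`, the three-word vocabulary, and the seed are all assumptions made for illustration.

```python
import random
from typing import Callable, Dict, List, Sequence

# A conditional model maps (tokens, masked position) to a distribution
# over candidate tokens for that position.
Conditional = Callable[[Sequence[str], int], Dict[str, float]]

VOCAB = ["a", "b", "c"]  # hypothetical toy vocabulary


def toy_conditional(tokens: Sequence[str], pos: int) -> Dict[str, float]:
    """Hypothetical stand-in for an MLM conditional (NOT BERT).

    The token at `pos` is treated as masked (it is never read); the
    distribution simply favours copying the left neighbour.
    """
    left = tokens[pos - 1] if pos > 0 else None
    weights = {w: (3.0 if w == left else 1.0) for w in VOCAB}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


def gsn_step(tokens: Sequence[str], conditional: Conditional,
             rng: random.Random) -> List[str]:
    """One step: choose a position uniformly at random and resample it."""
    pos = rng.randrange(len(tokens))
    dist = conditional(tokens, pos)
    words = list(dist)
    new_token = rng.choices(words, weights=[dist[w] for w in words], k=1)[0]
    out = list(tokens)
    out[pos] = new_token
    return out


def gsn_chain(init: Sequence[str], conditional: Conditional,
              n_steps: int, seed: int = 0) -> List[List[str]]:
    """Run a chain of n_steps resampling steps, returning every state."""
    rng = random.Random(seed)
    state = list(init)
    chain = [state]
    for _ in range(n_steps):
        state = gsn_step(state, conditional, rng)
        chain.append(state)
    return chain


if __name__ == "__main__":
    for state in gsn_chain(["a", "b", "c", "a"], toy_conditional, n_steps=5):
        print(" ".join(state))
```

With a real masked language model, `toy_conditional` would be replaced by a function that masks position `pos`, runs the model, and returns its predicted distribution at that position; the rest of the loop would stay the same under these assumptions.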

Citation (APA)

Yamakoshi, T., Griffiths, T. L., & Hawkins, R. D. (2022). Probing BERT’s priors with serial reproduction chains. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 3977–3992). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.findings-acl.314
