Truncation Sampling as Language Model Desmoothing

Abstract

Long samples of text from neural language models can be of poor quality. Truncation sampling algorithms, such as top-p or top-k, address this by setting some words' probabilities to zero at each step. This work provides a framing for the aim of truncation, and an improved algorithm for that aim. We propose thinking of a neural language model as a mixture of a true distribution and a smoothing distribution that avoids infinite perplexity. In this light, truncation algorithms aim to perform desmoothing, estimating a subset of the support of the true distribution. Finding a good subset is crucial: we show that top-p unnecessarily truncates high-probability words; for example, it truncates all words but "Trump" for a document that starts with "Donald". We introduce η-sampling, which truncates words below an entropy-dependent probability threshold. Compared to previous algorithms, η-sampling generates more plausible long English documents according to human evaluators, is better at breaking out of repetition, and behaves more reasonably on a battery of test distributions.
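The contrast between top-p and an entropy-dependent threshold can be illustrated with a small sketch. The abstract does not state the exact threshold, so the form used here, η = min(ε, √ε · exp(−H)) with H the entropy of the next-word distribution, is an assumption, as are the function names, the default values of `epsilon` and `top_p`, and the toy distribution. It is meant as an illustration of the idea, not as the authors' implementation.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def eta_truncate(probs, epsilon=0.01):
    """Zero out words below an entropy-dependent threshold, then renormalize.

    Assumed threshold: eta = min(epsilon, sqrt(epsilon) * exp(-entropy)).
    """
    eta = min(epsilon, math.sqrt(epsilon) * math.exp(-entropy(probs)))
    kept = [p if p >= eta else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


def top_p_truncate(probs, top_p=0.9):
    """Keep the smallest set of most probable words whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = [0.0] * len(probs)
    cumulative = 0.0
    for i in order:
        kept[i] = probs[i]
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(kept)
    return [p / total for p in kept]


# Hypothetical next-word distribution with one dominant word.
probs = [0.93, 0.04, 0.02, 0.005, 0.005]
print("top-p:", top_p_truncate(probs))
print("eta:  ", eta_truncate(probs))
```

On this toy distribution, top-p with a 0.9 cutoff keeps only the dominant word, while the entropy-dependent threshold also keeps the words with probability 0.04 and 0.02 and drops only the two least probable ones.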

Cite

Hewitt, J., Manning, C. D., & Liang, P. (2022). Truncation Sampling as Language Model Desmoothing. In Findings of the Association for Computational Linguistics: EMNLP 2022 (pp. 3414–3427). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.findings-emnlp.249
