Backpack Language Models

Abstract

We present Backpacks: a new neural architecture that marries strong modeling performance with an interface for interpretability and control. Backpacks learn multiple non-contextual sense vectors for each word in a vocabulary, and represent a word in a sequence as a context-dependent, non-negative linear combination of sense vectors in this sequence. We find that, after training, sense vectors specialize, each encoding a different aspect of a word. We can interpret a sense vector by inspecting its (non-contextual, linear) projection onto the output space, and intervene on these interpretable hooks to change the model's behavior in predictable ways. We train a 170M-parameter Backpack language model on OpenWebText, matching the loss of a GPT-2 small (124M-parameter) Transformer. On lexical similarity evaluations, we find that Backpack sense vectors outperform even a 6B-parameter Transformer LM's word embeddings. Finally, we present simple algorithms that intervene on sense vectors to perform controllable text generation and debiasing. For example, we can edit the sense vocabulary to tend more towards a topic, or localize a source of gender bias to a sense vector and globally suppress that sense.
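To make the representation concrete, below is a minimal PyTorch sketch of the weighted-sum mechanism the abstract describes. Everything here, including the class name, the toy attention-style weight network, and the dimensions, is an assumption for illustration: the published 170M-parameter model contextualizes with a Transformer over the prefix and is trained autoregressively, neither of which is reproduced in this sketch.

```python
import torch
import torch.nn as nn


class BackpackSketch(nn.Module):
    """Illustrative sketch of the Backpack word representation (not the
    authors' implementation; names and dimensions are assumed)."""

    def __init__(self, vocab_size: int, d_model: int, n_senses: int):
        super().__init__()
        self.d_model, self.n_senses = d_model, n_senses
        # k non-contextual sense vectors per vocabulary item.
        self.sense_emb = nn.Embedding(vocab_size, n_senses * d_model)
        # Toy scorers that turn the sequence into non-negative weights
        # over (position, sense) pairs; the paper uses a Transformer here.
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, n_senses * d_model)
        # Linear output head: projecting a single sense vector through this
        # map is the "interpretable hook" the abstract mentions.
        self.out_proj = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        b, n = token_ids.shape
        # senses: (batch, seq, n_senses, d_model)
        senses = self.sense_emb(token_ids).view(b, n, self.n_senses, self.d_model)
        # Crude per-position summary, used only to score senses in this sketch.
        summary = senses.mean(dim=2)                                  # (b, n, d)
        q = self.query(summary)
        k = self.key(summary).view(b, n, self.n_senses, self.d_model)
        # Score every (source position j, sense l) for each target position i.
        scores = torch.einsum("bid,bjld->bijl", q, k) / self.d_model ** 0.5
        # Softmax makes the combination weights non-negative.
        alpha = torch.softmax(scores.reshape(b, n, -1), dim=-1)
        alpha = alpha.view(b, n, n, self.n_senses)
        # Each word's representation is a context-dependent, non-negative
        # linear combination of the sense vectors of words in the sequence.
        return torch.einsum("bijl,bjld->bid", alpha, senses)          # (b, n, d)


# Interpreting a sense vector: project it (non-contextually, linearly)
# onto the output vocabulary and inspect the top-scoring words.
model = BackpackSketch(vocab_size=1000, d_model=64, n_senses=4)
word = torch.tensor([[42]])
senses = model.sense_emb(word).view(4, 64)
print(model.out_proj(senses[0]).topk(5).indices)  # words promoted by sense 0
```

Because the sense vectors are non-contextual and the output projection is linear, an intervention such as suppressing one sense vector applies globally, which is what makes the control and debiasing edits described in the abstract predictable.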

Citation (APA)

Hewitt, J., Thickstun, J., Manning, C. D., & Liang, P. (2023). Backpack Language Models. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 9103–9125). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.506
