How to Dissect a Muppet: The Structure of Transformer Embedding Spaces


Abstract

Pretrained embeddings based on the Transformer architecture have taken the NLP community by storm. We show that they can mathematically be reframed as a sum of vector factors and showcase how to use this reframing to study the impact of each component. We provide evidence that multi-head attentions and feed-forwards are not equally useful in all downstream applications, as well as a quantitative overview of the effects of fine-tuning on the overall embedding space. This approach allows us to draw connections to a wide range of previous studies, from vector space anisotropy to attention weights.
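The additive reframing follows from the residual connections in the Transformer: each sublayer adds its output to the running hidden state, so the final embedding can be read as the input embedding plus one vector factor per sublayer. Below is a minimal sketch of this idea with toy sublayers standing in for the real multi-head attention and feed-forward modules (which additionally involve layer normalization, biases, and multiple heads); it is an illustration of the decomposition principle, not the paper's exact derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_layers = 16, 2

def toy_sublayer(x, w):
    # Hypothetical stand-in for an attention or feed-forward sublayer.
    return np.tanh(x @ w)

attn_weights = [rng.normal(size=(d, d)) * 0.1 for _ in range(n_layers)]
ff_weights = [rng.normal(size=(d, d)) * 0.1 for _ in range(n_layers)]

x = rng.normal(size=d)          # input (token + positional) embedding
contributions = [x.copy()]      # vector factors: start with the input term

h = x
for w_attn, w_ff in zip(attn_weights, ff_weights):
    a = toy_sublayer(h, w_attn)   # attention sublayer output
    h = h + a                     # residual connection
    contributions.append(a)
    f = toy_sublayer(h, w_ff)     # feed-forward sublayer output
    h = h + f                     # residual connection
    contributions.append(f)

# The final embedding equals the sum of all collected vector factors,
# so each term's contribution can be studied in isolation.
assert np.allclose(h, np.sum(contributions, axis=0))
print("embedding reconstructed from", len(contributions), "additive terms")
```

Tracking the terms separately is what lets one ask, for a given downstream task, how much of the embedding's usefulness comes from the attention sublayers versus the feed-forward sublayers.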

Cite (APA)

Mickus, T., Paperno, D., & Constant, M. (2022). How to Dissect a Muppet: The Structure of Transformer Embedding Spaces. Transactions of the Association for Computational Linguistics, 10, 981–996. https://doi.org/10.1162/tacl_a_00501
