Abstract
Techniques for the causal analysis of language models illuminate how linguistic information is organized in LLMs. We use one such technique, AlterRep, a method of counterfactual probing, to explore the internal structure of multilingual models (mBERT and XLM-R). We train a linear classifier on a binary language identity task: classifying each token as Language X or Language Y. Applying a counterfactual probing procedure, we use the classifier weights to project the embeddings into the classifier's null space and then push the resulting embeddings in the direction of either Language X or Language Y. We then evaluate on a masked language modeling task. We find that, given a template in Language X, pushing towards Language Y systematically increases the probability of Language Y words, above and beyond a third-party control language, but it does not specifically push the model towards translation-equivalent words in Language Y. Pushing towards Language X (the same direction as the template) has a minimal effect, but somewhat degrades these models. Overall, we take these results as further evidence of the rich structure of massive multilingual language models, which include both a language-specific and a language-general component, and we show that counterfactual probing can be fruitfully applied to multilingual models.
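The project-then-push step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single probe direction (the full AlterRep procedure is understood to remove several directions iteratively), uses synthetic vectors in place of mBERT or XLM-R embeddings, and the names `fit_linear_probe`, `counterfactual`, and `alpha` are hypothetical.

```python
import numpy as np

def fit_linear_probe(X, y, lr=0.1, steps=500):
    """Fit a logistic-regression probe by gradient descent (y is 0 or 1)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(Language Y)
        g = p - y                                # gradient of the log loss
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def counterfactual(H, w, alpha, toward_y):
    """Remove the probe direction from H, then push along it by alpha."""
    u = w / np.linalg.norm(w)                    # unit probe direction
    H_null = H - np.outer(H @ u, u)              # component along u removed
    sign = 1.0 if toward_y else -1.0
    return H_null + sign * alpha * u

# Toy stand-ins for token embeddings (assumption: 8-dim Gaussian clusters).
rng = np.random.default_rng(0)
X_x = rng.normal(-1.0, 1.0, size=(50, 8))        # "Language X" tokens
X_y = rng.normal(1.0, 1.0, size=(50, 8))         # "Language Y" tokens
X = np.vstack([X_x, X_y])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b = fit_linear_probe(X, y)
H_to_y = counterfactual(X_x, w, alpha=4.0, toward_y=True)
H_to_x = counterfactual(X_x, w, alpha=4.0, toward_y=False)
```

In the setting the abstract describes, the modified embeddings would be passed back through the remaining layers of the model and scored on the masked language modeling task; that step is omitted here because it depends on the specific model and layer.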
Cite
Srinivasan, A., Govindarajan, V. S., & Mahowald, K. (2023). Counterfactually Probing Language Identity in Multilingual Models. In MRL 2023 - 3rd Workshop on Multi-Lingual Representation Learning, Proceedings of the Workshop (pp. 125–138). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.mrl-1.3