Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers


Abstract

Multilingual transformer-based models demonstrate remarkable zero- and few-shot transfer across languages by learning and reusing language-agnostic features. However, as a fixed-size model acquires more languages, its performance across all languages degrades. Those who attribute this interference phenomenon to limited model capacity address the problem by adding additional parameters, despite evidence that transformer-based models are overparameterized. In this work, we show that it is possible to reduce interference by instead identifying and pruning language-specific attention heads. First, we use Shapley Values, a credit-allocation metric from coalitional game theory, to identify attention heads that introduce interference. Then, we show that pruning such heads from a fixed model improves performance for a target language on both sentence classification and structured prediction. Finally, we provide insights on language-agnostic and language-specific attention heads using attention visualization.
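
As a rough illustration of the head-scoring step described above, the sketch below estimates per-head Shapley values by Monte Carlo sampling of random head orderings and accumulating each head's marginal contribution to a utility function. The names here (`evaluate`, `NUM_HEADS`, the toy linear scorer) are illustrative assumptions, not the paper's implementation; in practice the utility would be target-language task performance of the transformer with only the heads in the current coalition active, and heads with negative estimated values become pruning candidates.

```python
import numpy as np

# Minimal sketch: Monte Carlo estimation of Shapley values for attention heads.
# In a real setup, `evaluate(mask)` would score the multilingual transformer on a
# target-language validation set with only the heads selected by `mask` enabled.

rng = np.random.default_rng(0)
NUM_HEADS = 12 * 12  # e.g. 12 layers x 12 heads in an mBERT-sized model (assumed)

# Toy, fixed per-head "contributions" so the placeholder utility is deterministic.
_TOY_WEIGHTS = rng.standard_normal(NUM_HEADS)


def evaluate(mask: np.ndarray) -> float:
    """Placeholder utility: replace with validation performance under `mask`."""
    return float(_TOY_WEIGHTS @ mask)


def shapley_estimates(num_samples: int = 200) -> np.ndarray:
    """Estimate each head's Shapley value from random orderings of head additions."""
    values = np.zeros(NUM_HEADS)
    for _ in range(num_samples):
        order = rng.permutation(NUM_HEADS)
        mask = np.zeros(NUM_HEADS)
        prev_score = evaluate(mask)
        for head in order:
            mask[head] = 1.0                    # add this head to the coalition
            score = evaluate(mask)
            values[head] += score - prev_score  # marginal contribution of the head
            prev_score = score
    return values / num_samples


if __name__ == "__main__":
    phi = shapley_estimates()
    # Heads with the most negative estimated contribution are pruning candidates.
    print("Candidate heads to prune:", np.argsort(phi)[:5])
```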

Citation (APA)

Held, W., & Yang, D. (2023). Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers. In EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference (pp. 2408–2419). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.eacl-main.177
