Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers


Abstract

Multilingual transformer-based models demonstrate remarkable zero- and few-shot transfer across languages by learning and reusing language-agnostic features. However, as a fixed-size model acquires more languages, its performance across all languages degrades. Those who attribute this interference phenomenon to limited model capacity address the problem by adding additional parameters, despite evidence that transformer-based models are overparameterized. In this work, we show that it is possible to reduce interference by instead identifying and pruning language-specific attention heads. First, we use Shapley Values, a credit-allocation metric from coalitional game theory, to identify attention heads that introduce interference. Then, we show that pruning such heads from a fixed model improves performance for a target language on both sentence classification and structured prediction. Finally, we provide insights on language-agnostic and language-specific attention heads using attention visualization.
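
As a rough illustration of the head-scoring step described above, the sketch below estimates per-head Shapley values by Monte Carlo sampling of random head orderings and accumulating each head's marginal contribution to a utility function. The names here (`evaluate`, `NUM_HEADS`, the toy linear scorer) are illustrative assumptions, not the paper's implementation; in practice the utility would be target-language task performance of the transformer with only the heads in the current coalition active, and heads with negative estimated values become pruning candidates.

```python
import numpy as np

# Minimal sketch: Monte Carlo estimation of Shapley values for attention heads.
# In a real setup, `evaluate(mask)` would score the multilingual transformer on a
# target-language validation set with only the heads selected by `mask` enabled.

rng = np.random.default_rng(0)
NUM_HEADS = 12 * 12  # e.g. 12 layers x 12 heads in an mBERT-sized model (assumed)

# Toy, fixed per-head "contributions" so the placeholder utility is deterministic.
_TOY_WEIGHTS = rng.standard_normal(NUM_HEADS)


def evaluate(mask: np.ndarray) -> float:
    """Placeholder utility: replace with validation performance under `mask`."""
    return float(_TOY_WEIGHTS @ mask)


def shapley_estimates(num_samples: int = 200) -> np.ndarray:
    """Estimate each head's Shapley value from random orderings of head additions."""
    values = np.zeros(NUM_HEADS)
    for _ in range(num_samples):
        order = rng.permutation(NUM_HEADS)
        mask = np.zeros(NUM_HEADS)
        prev_score = evaluate(mask)
        for head in order:
            mask[head] = 1.0                    # add this head to the coalition
            score = evaluate(mask)
            values[head] += score - prev_score  # marginal contribution of the head
            prev_score = score
    return values / num_samples


if __name__ == "__main__":
    phi = shapley_estimates()
    # Heads with the most negative estimated contribution are pruning candidates.
    print("Candidate heads to prune:", np.argsort(phi)[:5])
```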

Citation (APA)

Held, W., & Yang, D. (2023). Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers. In EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference (pp. 2408–2419). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.eacl-main.177
