ATTEXPLAINER: Explain Transformer via Attention by Reinforcement Learning

4 citations · 12 Mendeley readers

Abstract

Transformer and its variants, built on attention mechanisms, have recently achieved remarkable performance on many NLP tasks. Most existing work on Transformer explanation reveals and uses the attention matrix qualitatively, guided by human intuition. However, the high dimensionality of the attention matrix makes it hard for these methods to analyze it quantitatively. In this paper, we therefore propose ATTEXPLAINER, a novel reinforcement learning (RL) based framework that explains Transformers via the attention matrix. The RL agent learns to perform step-by-step masking operations by observing changes in the attention matrices. We adapt the method to two scenarios: perturbation-based model explanation and text adversarial attack. Experiments on three widely used text classification benchmarks validate the effectiveness of the proposed method against state-of-the-art baselines. Additional studies show that our method is highly transferable and consistent with human intuition. The code for this paper is available at https://github.com/niuzaisheng/AttExplainer.
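To illustrate the kind of step-by-step masking the abstract describes, here is a minimal toy sketch, not the authors' implementation. It replaces the learned RL policy with a greedy stand-in: at each step, mask the token whose removal changes a (toy) attention matrix the most, using that change as a reward proxy. The `attention` surrogate, the score matrix `W`, and the greedy `step` function are all hypothetical simplifications for illustration.

```python
import math
import random

def softmax(row):
    m = max(row)
    e = [math.exp(x - m) for x in row]
    s = sum(e)
    return [x / s for x in e]

def attention(mask, W):
    # Toy stand-in for one Transformer attention head:
    # masked key positions get a very negative score,
    # i.e. effectively zero attention weight.
    n = len(W)
    return [softmax([W[q][k] if mask[k] else -1e9 for k in range(n)])
            for q in range(n)]

def step(mask, W):
    """One masking 'action': greedily remove the unmasked token whose
    masking changes the attention matrix the most (a reward proxy
    standing in for the learned RL policy)."""
    base = attention(mask, W)
    best, best_delta = None, -1.0
    for i, keep in enumerate(mask):
        if not keep:
            continue
        trial = list(mask)
        trial[i] = 0
        delta = sum(abs(a - b)
                    for ra, rb in zip(attention(trial, W), base)
                    for a, b in zip(ra, rb))
        if delta > best_delta:
            best, best_delta = i, delta
    new_mask = list(mask)
    new_mask[best] = 0
    return new_mask, best

random.seed(0)
n = 6
W = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
mask = [1] * n          # 1 = token visible, 0 = masked
order = []
for _ in range(3):      # three step-by-step masking operations
    mask, picked = step(mask, W)
    order.append(picked)
print("masking order:", order)
```

In the paper's actual framework, the greedy choice above is replaced by an agent trained with RL, and the attention matrices come from the Transformer being explained; the masking order then serves either as a feature-importance explanation or as an adversarial perturbation schedule.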

Cite (APA)

Niu, R., Wei, Z., Wang, Y., & Wang, Q. (2022). ATTEXPLAINER: Explain Transformer via Attention by Reinforcement Learning. In IJCAI International Joint Conference on Artificial Intelligence (pp. 724–731). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2022/102
