Learning to explain: Generating stable explanations fast

Abstract

The importance of explaining the outcome of a machine learning model, especially a black-box model, is widely acknowledged. Recent approaches explain an outcome by identifying the contributions of input features to this outcome. In environments involving large black-box models or complex inputs, this leads to computationally demanding algorithms. Further, these algorithms often suffer from low stability, with explanations varying significantly across similar examples. In this paper, we propose a Learning to Explain (L2E) approach that learns the behaviour of an underlying explanation algorithm simultaneously from all training examples. Once the explanation algorithm is distilled into an explainer network, it can be used to explain new instances. Our experiments on three classification tasks, which compare our approach to six explanation algorithms, show that L2E is between 5 and 7.5 × 10⁴ times faster than these algorithms, while generating more stable explanations, and having comparable faithfulness to the black-box model.
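
To make the distillation idea in the abstract concrete, the sketch below shows one way such a setup could look in PyTorch: a small explainer network is fitted to feature attributions precomputed by a slower base explanation algorithm on the training examples, and new instances are then explained with a single forward pass. The names (ExplainerNet, distill_explainer), the architecture, and the random placeholder data are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the "learning to explain" idea: train an explainer network
# to reproduce per-feature attributions produced offline by a slower base
# explanation algorithm, so new instances get explanations in one forward pass.
# All names and shapes here are illustrative assumptions.
import torch
import torch.nn as nn


class ExplainerNet(nn.Module):
    """Maps an input representation to one attribution score per feature."""

    def __init__(self, input_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),  # one score per input feature
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def distill_explainer(explainer, inputs, target_attributions, epochs=10, lr=1e-3):
    """Fit the explainer to attributions precomputed by the base algorithm."""
    opt = torch.optim.Adam(explainer.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(explainer(inputs), target_attributions)
        loss.backward()
        opt.step()
    return explainer


if __name__ == "__main__":
    # inputs: (N, d) feature representations of training examples
    # targets: (N, d) attributions from the base explanation algorithm
    # (random tensors stand in for real precomputed attributions here)
    N, d = 128, 50
    inputs = torch.randn(N, d)
    targets = torch.randn(N, d)
    explainer = distill_explainer(ExplainerNet(d), inputs, targets)
    new_instance = torch.randn(1, d)
    attribution = explainer(new_instance)  # fast explanation at inference time
```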

Citation (APA)

Situ, X., Zukerman, I., Paris, C., Maruf, S., & Haffari, G. (2021). Learning to explain: Generating stable explanations fast. In ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference (Vol. 1, pp. 5340–5355). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.acl-long.415
