SELFEXPLAIN: A Self-Explaining Architecture for Neural Text Classifiers


Abstract

We introduce SELFEXPLAIN, a novel self-explaining model that explains a text classifier's predictions using phrase-based concepts. SELFEXPLAIN augments existing neural classifiers by adding (1) a globally interpretable layer that identifies the most influential concepts in the training set for a given sample and (2) a locally interpretable layer that quantifies the contribution of each local input concept by computing a relevance score relative to the predicted label. Experiments across five text-classification datasets show that SELFEXPLAIN facilitates interpretability without sacrificing performance. Most importantly, explanations from SELFEXPLAIN show sufficiency for model predictions and are perceived as adequate, trustworthy, and understandable by human judges, compared to existing widely used baselines.
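
The abstract describes two added layers: a globally interpretable layer (GIL) that retrieves influential training-set concepts for a given sample, and a locally interpretable layer (LIL) that scores each input phrase's contribution to the predicted label. Below is a minimal PyTorch sketch of how such layers might look. The retrieval rule, the leave-one-phrase-out relevance score, and all names and shapes are illustrative assumptions, not the paper's exact equations.

```python
# Illustrative sketch only (not the paper's exact formulation) of the two
# interpretability layers described in the abstract. All module/function
# names, tensor shapes, and scoring rules here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def gil_top_concepts(sentence_repr: torch.Tensor,
                     concept_store: torch.Tensor, k: int = 5) -> torch.Tensor:
    """GIL sketched as nearest-neighbor retrieval: score every cached
    training-set concept against the input representation and return the
    indices of the k most influential ones.

    sentence_repr: (batch, hidden); concept_store: (num_concepts, hidden),
    a hypothetical cache of pooled phrase representations from training data.
    """
    scores = sentence_repr @ concept_store.t()     # (batch, num_concepts)
    return scores.topk(k, dim=-1).indices          # (batch, k)


class LocalInterpretabilityLayer(nn.Module):
    """LIL sketched as a leave-one-phrase-out score: a phrase's relevance is
    the drop in the predicted label's log-probability when that phrase's
    contribution is subtracted from the sentence representation."""

    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, num_labels)

    def forward(self, sentence_repr, phrase_reprs):
        # sentence_repr: (batch, hidden), e.g. a pooled [CLS] vector
        # phrase_reprs:  (batch, num_phrases, hidden), pooled phrase spans
        logits = self.classifier(sentence_repr)              # (batch, labels)
        pred = logits.argmax(dim=-1)                         # predicted label

        # Sentence representation with each phrase's contribution removed.
        without = sentence_repr.unsqueeze(1) - phrase_reprs  # (b, p, hidden)
        logits_wo = self.classifier(without)                 # (b, p, labels)

        full = F.log_softmax(logits, dim=-1).gather(
            -1, pred.unsqueeze(-1))                          # (batch, 1)
        num_phrases = phrase_reprs.size(1)
        reduced = F.log_softmax(logits_wo, dim=-1).gather(
            -1, pred.view(-1, 1, 1).expand(-1, num_phrases, 1)
        ).squeeze(-1)                                        # (b, p)
        relevance = full - reduced  # higher = phrase mattered more
        return logits, relevance
```

In the paper itself, the phrase concepts are derived from parses of the input rather than arbitrary spans, and both layers are trained jointly with the underlying classifier; the sketch above shows only the scoring step.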

Citation (APA)

Rajagopal, D., Balachandran, V., Hovy, E., & Tsvetkov, Y. (2021). SELFEXPLAIN: A Self-Explaining Architecture for Neural Text Classifiers. In EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 836–850). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.emnlp-main.64
