Although explainable artificial intelligence (XAI) has achieved remarkable progress in recent years, few efforts have been devoted to the following problems: i) how to develop an explanation method that explains a black-box model in a model-agnostic way, and ii) how to improve the performance and interpretability of the black-box using such explanations instead of pre-collected important attributions. To explore a potential solution, we propose a model-agnostic explanation method termed Sparse Contrastive Coding (SCC) and verify its effectiveness on text classification and natural language inference. In brief, SCC computes feature attributions that characterize the importance of words based on the hidden states of each layer of the model. With such word-level explainability, SCC adaptively divides the input sentences into foregrounds and backgrounds according to task relevance. By maximizing the similarity between the foregrounds and the input sentences while minimizing the similarity between the backgrounds and the input sentences, SCC employs a supervised contrastive learning loss to boost the interpretability and performance of the model. Extensive experiments show the superiority of our method over five state-of-the-art methods in terms of interpretability and classification measurements. The code is available at https://pengxi.me.
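To make the training objective concrete, below is a minimal PyTorch sketch of an InfoNCE-style supervised contrastive loss of the kind the abstract describes: the foreground representation acts as the positive for the full input and the background as the negative. The function name, the use of cosine similarity, the pooling of sentences into single vectors, and the temperature value are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def foreground_contrastive_loss(z_input, z_fg, z_bg, temperature=0.1):
    """Sketch of a contrastive objective: pull the foreground embedding
    toward the full-input embedding and push the background embedding away.

    z_input, z_fg, z_bg: (batch, dim) pooled sentence representations of
    the full sentence, its task-relevant words (foreground), and the
    remaining words (background). All names here are hypothetical.
    """
    # Temperature-scaled cosine similarities between the input and its two views.
    sim_fg = F.cosine_similarity(z_input, z_fg, dim=-1) / temperature
    sim_bg = F.cosine_similarity(z_input, z_bg, dim=-1) / temperature
    # InfoNCE-style cross-entropy: the foreground is the positive (index 0),
    # the background is the negative (index 1).
    logits = torch.stack([sim_fg, sim_bg], dim=-1)            # (batch, 2)
    labels = torch.zeros(z_input.size(0), dtype=torch.long,
                         device=z_input.device)               # positives at 0
    return F.cross_entropy(logits, labels)
```

In use, such a term would typically be added to the standard classification loss with a weighting coefficient, so the model is trained jointly for prediction accuracy and for keeping its decisions aligned with the task-relevant foreground words.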
CITATION STYLE
Liu, J., Lin, Y., Jiang, L., Liu, J., Wen, Z., & Peng, X. (2022). Improve Interpretability of Neural Networks via Sparse Contrastive Coding. In Findings of the Association for Computational Linguistics: EMNLP 2022 (pp. 460–470). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.findings-emnlp.32