A Fine-grained Interpretability Evaluation Benchmark for Neural NLP

Abstract

While there is increasing concern about the interpretability of neural models, the evaluation of interpretability remains an open problem, due to the lack of proper evaluation datasets and metrics. In this paper, we present a novel benchmark to evaluate the interpretability of both neural models and saliency methods. This benchmark covers three representative NLP tasks: sentiment analysis, textual similarity, and reading comprehension, each provided with both English and Chinese annotated data. To evaluate interpretability precisely, we provide token-level rationales that are carefully annotated to be sufficient, compact, and comprehensive. We also design a new metric, i.e., the consistency between the rationales before and after perturbations, to uniformly evaluate interpretability across different types of tasks. Based on this benchmark, we conduct experiments on three typical models with three saliency methods, and unveil their strengths and weaknesses in terms of interpretability. We will release this benchmark and hope it can facilitate research in building trustworthy systems.
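The consistency metric described above compares the rationales a saliency method produces on an input before and after a small perturbation. As a rough illustration of the idea only (not the paper's exact formulation, which the full text defines), the sketch below scores consistency as the overlap of the top-k salient tokens; the function names, the top-k overlap score, and the toy saliency values are assumptions introduced for illustration.

```python
# Hypothetical sketch of a rationale-consistency check. The names and the
# top-k overlap score below are illustrative assumptions, not the benchmark's
# exact metric or API.

from typing import Dict, List


def top_k_rationale(saliency: Dict[str, float], k: int) -> List[str]:
    """Return the k tokens with the highest saliency scores."""
    ranked = sorted(saliency.items(), key=lambda kv: kv[1], reverse=True)
    return [token for token, _ in ranked[:k]]


def rationale_consistency(before: Dict[str, float],
                          after: Dict[str, float],
                          k: int = 3) -> float:
    """Overlap of the top-k rationales extracted before vs. after a perturbation.

    A score of 1.0 means the saliency method highlights the same tokens on the
    original and the perturbed input; lower values indicate less stable rationales.
    """
    r_before = set(top_k_rationale(before, k))
    r_after = set(top_k_rationale(after, k))
    return len(r_before & r_after) / k


if __name__ == "__main__":
    # Toy saliency scores for a sentiment example and a lightly perturbed paraphrase.
    original = {"the": 0.05, "movie": 0.20, "was": 0.05, "wonderful": 0.90}
    perturbed = {"the": 0.04, "film": 0.25, "was": 0.06, "wonderful": 0.85}
    print(rationale_consistency(original, perturbed, k=2))  # 0.5: only "wonderful" is shared
```

In practice, such a score would be averaged over many perturbed examples, and a ranking-based variant (e.g., comparing full saliency rankings rather than fixed top-k sets) could be substituted for the set overlap used here.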

Cite (APA)

Wang, L., Shen, Y., Peng, S., Zhang, S., Xiao, X., Liu, H., … Wang, H. (2022). A Fine-grained Interpretability Evaluation Benchmark for Neural NLP. In CoNLL 2022 - 26th Conference on Computational Natural Language Learning, Proceedings of the Conference (pp. 70–84). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.conll-1.6
