LEXPLAIN: Improving Model Explanations via Lexicon Supervision

Abstract

Model explanations that shed light on a model's predictions are becoming a desired additional output of NLP models, alongside the predictions themselves. Challenges in creating these explanations include making them trustworthy and faithful to the model's predictions. In this work, we propose a novel framework for guiding model explanations by supervising them explicitly. To this end, our method, LEXPLAIN, uses task-related lexicons to directly supervise model explanations. This approach consistently improves the plausibility of the model's explanations without sacrificing task performance, as we demonstrate on sentiment analysis and toxicity detection. Our analyses show that our method also demotes spurious correlations (e.g., with African American English dialect) in toxicity detection, improving fairness.
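The abstract describes supervising token-level explanations with a task lexicon. As a minimal, hypothetical sketch (not the paper's exact formulation), one way to do this is to add an auxiliary loss that pushes per-token attribution scores toward lexicon membership: high on words in a task lexicon (e.g., a sentiment or toxicity word list), low elsewhere. The function name and the use of binary cross-entropy here are assumptions for illustration.

```python
import math

def lexicon_explanation_loss(attributions, lexicon_mask):
    """Mean binary cross-entropy between per-token attribution logits
    and lexicon membership (1.0 = token is in the task lexicon).

    attributions: list of lists of raw attribution logits, one per token.
    lexicon_mask: same shape; 1.0 for lexicon tokens, 0.0 otherwise.
    """
    total, n = 0.0, 0
    for logit_row, mask_row in zip(attributions, lexicon_mask):
        for logit, target in zip(logit_row, mask_row):
            p = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
            total += -(target * math.log(p) + (1 - target) * math.log(1 - p))
            n += 1
    return total / n

# Toy usage: two 4-token sentences; the mask marks hypothetical lexicon hits.
attr = [[2.0, -1.0, 0.5, -2.0],
        [0.0, 3.0, -1.5, 1.0]]
mask = [[1.0, 0.0, 1.0, 0.0],
        [0.0, 1.0, 0.0, 1.0]]
print(round(lexicon_explanation_loss(attr, mask), 4))
```

In training, a term like this would presumably be added to the usual task loss with a weighting coefficient, so explanation supervision does not override the prediction objective.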

Citation (APA)

Ahia, O., Gonen, H., Balachandran, V., Tsvetkov, Y., & Smith, N. A. (2023). LEXPLAIN: Improving Model Explanations via Lexicon Supervision. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 207–216). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.starsem-1.19
