Evaluating and characterizing human rationales

Abstract

Two main approaches for evaluating the quality of machine-generated rationales are: 1) using human rationales as a gold standard; and 2) automated metrics based on how rationales affect model behavior. An open question, however, is how human rationales fare with these automatic metrics. Analyzing a variety of datasets and models, we find that human rationales do not necessarily perform well on these metrics. To unpack this finding, we propose improved metrics to account for model-dependent baseline performance. We then propose two methods to further characterize rationale quality, one based on model retraining and one on using “fidelity curves” to reveal properties such as irrelevance and redundancy. Our work leads to actionable suggestions for evaluating and characterizing rationales.
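The abstract refers to automated metrics based on how rationales affect model behavior, adjusted for model-dependent baseline performance. The sketch below illustrates how such fidelity-style metrics (sufficiency and comprehensiveness, normalized against a null-input baseline) are commonly computed from model probabilities; the function names, signatures, and exact normalization are illustrative assumptions, not the paper's implementation.

def normalized_sufficiency(p_full: float, p_rationale_only: float, p_null: float) -> float:
    """How well the rationale alone preserves the model's confidence in the
    predicted class, rescaled so a rationale no better than an empty input scores 0."""
    suff = 1.0 - max(0.0, p_full - p_rationale_only)   # raw sufficiency
    null_suff = 1.0 - max(0.0, p_full - p_null)        # model-dependent baseline
    return max(0.0, (suff - null_suff) / (1.0 - null_suff + 1e-8))


def normalized_comprehensiveness(p_full: float, p_without_rationale: float, p_null: float) -> float:
    """How much confidence is lost when the rationale is removed,
    rescaled by the drop incurred when the entire input is removed."""
    comp = max(0.0, p_full - p_without_rationale)      # raw comprehensiveness
    max_drop = max(0.0, p_full - p_null) + 1e-8        # baseline: remove everything
    return min(1.0, comp / max_drop)


if __name__ == "__main__":
    # Toy probabilities: the rationale alone retains most of the confidence
    # (high sufficiency), and removing it hurts the model substantially
    # (high comprehensiveness).
    print(normalized_sufficiency(p_full=0.95, p_rationale_only=0.90, p_null=0.55))
    print(normalized_comprehensiveness(p_full=0.95, p_without_rationale=0.60, p_null=0.55))

The "fidelity curves" mentioned in the abstract can be read as sweeping such a metric over the number of rationale tokens kept or removed rather than evaluating it at a single point, which is what lets the curves expose properties such as irrelevance and redundancy.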

Citation (APA)

Carton, S., Rathore, A., & Tan, C. (2020). Evaluating and characterizing human rationales. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 9294–9307). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.747
