How Reliable are Model Diagnostics?

Abstract

In the pursuit of a deeper understanding of a model's behaviour, there is recent impetus for developing suites of probes aimed at diagnosing models beyond simple metrics like accuracy or BLEU. This paper takes a step back and asks an important and timely question: how reliable are these diagnostics in providing insight into models and training setups? We critically examine three recent diagnostic tests for pre-trained language models, and find that likelihood-based and representation-based model diagnostics are not yet as reliable as previously assumed. Based on our empirical findings, we also formulate recommendations for practitioners and researchers.

Cite

Aribandi, V., Tay, Y., & Metzler, D. (2021). How Reliable are Model Diagnostics? In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 1778–1785). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-acl.155
