Statistically Profiling Biases in Natural Language Reasoning Datasets and Models

Abstract

Recent studies have shown that many natural language understanding and reasoning datasets contain statistical cues that NLP models can exploit, leading to an overestimation of their capabilities. Existing methods, such as "hypothesis-only" tests and CheckList, are limited in their ability to identify these cues and evaluate model weaknesses. We introduce ICQ (I-See-Cue), a lightweight, general statistical profiling framework that automatically identifies potential biases in multiple-choice NLU datasets without requiring additional test cases. ICQ then assesses, through black-box testing, the extent to which models exploit these biases, addressing the limitations of current methods. We conduct a comprehensive evaluation of statistical biases in 10 popular NLU datasets and 4 models, confirming prior findings, revealing new insights, and offering an online demonstration system that lets users assess their own datasets and models. Furthermore, we present a case study investigating ChatGPT's biases and provide practical recommendations for its use.
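
The abstract does not spell out ICQ's internals, so as a rough illustration of what a "statistical cue" in a multiple-choice dataset looks like, the sketch below computes cue statistics in the spirit of Niven and Kao (2019): for each unigram appearing in some but not all answer options, it measures how often that token co-occurs with the correct answer. This is a minimal toy under assumed definitions, not ICQ's actual algorithm; the function name, data format, and toy examples are all hypothetical.

```python
from collections import Counter

def cue_statistics(examples):
    """Profile unigram cues in a multiple-choice dataset.

    `examples` is a list of dicts with keys "options" (list of answer
    strings) and "label" (index of the correct option). For each token
    that appears in some but not all options of an example, we compute:
      - applicability: number of examples where the token discriminates
        between options,
      - productivity: fraction of applicable examples where the token
        appears in the correct option,
      - coverage: applicability divided by the dataset size.
    """
    applicability = Counter()
    in_correct = Counter()
    for ex in examples:
        token_sets = [set(opt.lower().split()) for opt in ex["options"]]
        for tok in set().union(*token_sets):
            holders = [i for i, ts in enumerate(token_sets) if tok in ts]
            if 0 < len(holders) < len(token_sets):  # discriminative cue
                applicability[tok] += 1
                if ex["label"] in holders:
                    in_correct[tok] += 1
    n = len(examples)
    return {
        tok: {
            "applicability": app,
            "productivity": in_correct[tok] / app,
            "coverage": app / n,
        }
        for tok, app in applicability.items()
    }

# Toy usage: "not" appears only in the correct option of both examples,
# so its productivity is 1.0 -- a strong (spurious) cue a model could
# exploit without reading the question.
data = [
    {"options": ["he did not go", "he went home"], "label": 0},
    {"options": ["it was fine", "it was not fine"], "label": 1},
]
top = sorted(cue_statistics(data).items(),
             key=lambda kv: kv[1]["productivity"], reverse=True)
print(top[:3])
```

A cue with productivity far above chance (1 over the number of options) and non-trivial coverage suggests the dataset rewards shallow pattern matching, which is the kind of bias ICQ's black-box tests are designed to surface.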

Citation (APA)
Huang, S., & Zhu, K. Q. (2023). Statistically Profiling Biases in Natural Language Reasoning Datasets and Models. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 4521–4530). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-emnlp.299
