Recent studies have shown that many natural language understanding and reasoning datasets contain statistical cues that NLP models can exploit, leading to an overestimation of their capabilities. Existing methods, such as “hypothesis-only” tests and CheckList, are limited in their ability to identify these cues and to evaluate the model weaknesses they cause. We introduce ICQ (I-See-Cue), a lightweight, general statistical profiling framework that automatically identifies potential biases in multiple-choice NLU datasets without requiring additional test cases. ICQ then assesses, through black-box testing, the extent to which models exploit these biases, addressing the limitations of current methods. We conduct a comprehensive evaluation of statistical biases in 10 popular NLU datasets and 4 models, confirming prior findings, revealing new insights, and offering an online demonstration system that lets users assess their own datasets and models. Furthermore, we present a case study investigating ChatGPT's biases, providing recommendations for practical applications.
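The abstract does not include reference code, but the core idea of statistically profiling a multiple-choice dataset for cues can be sketched concretely. Below is a minimal, hypothetical Python sketch, not ICQ's actual implementation or API: it counts, for each token that appears in exactly one answer option, how often that option is the gold answer, and ranks tokens by how far their hit rate deviates from chance. The data format (dicts with `choices` and `label` keys), the function name, and the coverage threshold are all assumptions made for illustration.

```python
from collections import Counter

def profile_cues(examples, min_coverage=0.01):
    """Rank answer-option tokens by how strongly their presence predicts the gold label.

    `examples`: list of dicts with
        "choices": list[str]  -- candidate answers
        "label":   int        -- index of the correct choice
    A token is counted only when it appears in exactly one option, since
    only then can it discriminate between the candidates.
    (Hypothetical sketch; not the paper's ICQ implementation.)
    """
    applicability = Counter()  # examples where the token appears in exactly one option
    hits = Counter()           # ...and that option is the gold answer
    n = len(examples)

    for ex in examples:
        token_sets = [set(choice.lower().split()) for choice in ex["choices"]]
        for tok in set().union(*token_sets):
            holders = [i for i, ts in enumerate(token_sets) if tok in ts]
            if len(holders) == 1:  # token singles out one option
                applicability[tok] += 1
                if holders[0] == ex["label"]:
                    hits[tok] += 1

    chance = 1.0 / len(examples[0]["choices"])  # assumes a fixed number of options
    cues = []
    for tok, app in applicability.items():
        if app / n < min_coverage:  # skip rare tokens
            continue
        hit_rate = hits[tok] / app
        cues.append((tok, app, hit_rate, hit_rate - chance))
    # Strongest deviations from chance first: these are candidate spurious cues.
    return sorted(cues, key=lambda c: abs(c[3]), reverse=True)
```

A toy run illustrates the output; on real datasets such a profile typically surfaces cues like negation words whose hit rate is far from chance, which a model could exploit without reading the question at all:

```python
toy = [
    {"choices": ["the sky is blue", "the sky is not blue"], "label": 0},
    {"choices": ["cats cannot fly", "cats can fly"], "label": 1},
]
for tok, app, rate, dev in profile_cues(toy, min_coverage=0.0):
    print(f"{tok!r}: n={app}, hit_rate={rate:.2f}, deviation={dev:+.2f}")
```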
CITATION
Huang, S., & Zhu, K. Q. (2023). Statistically Profiling Biases in Natural Language Reasoning Datasets and Models. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 4521–4530). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-emnlp.299