How Does Data Corruption Affect Natural Language Understanding Models? A Study on GLUE Datasets

Citations: 0
Readers (Mendeley): 29

Abstract

A central question in natural language understanding (NLU) research is whether high performance demonstrates the models’ strong reasoning capabilities. We present an extensive series of controlled experiments where pre-trained language models are exposed to data that have undergone specific corruption transformations. These involve removing instances of specific word classes and often lead to nonsensical sentences. Our results show that performance remains high on most GLUE tasks when the models are fine-tuned or tested on corrupted data, suggesting that they leverage other cues for prediction even in nonsensical contexts. Our proposed data transformations can be used to assess the extent to which a specific dataset constitutes a proper testbed for evaluating models’ language understanding capabilities.
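To make the idea of a corruption transformation concrete, the sketch below removes all tokens of a chosen word class from a sentence. This is only an illustration of the kind of transformation the abstract describes, not the authors' implementation; the use of spaCy and the en_core_web_sm model here is an assumption for demonstration purposes.

import spacy

# Illustrative sketch only: drop every token of a given part of speech.
# Removing e.g. all verbs typically yields a nonsensical sentence, as in
# the corrupted data described in the abstract.
nlp = spacy.load("en_core_web_sm")

def remove_word_class(sentence, pos="VERB"):
    doc = nlp(sentence)
    return " ".join(tok.text for tok in doc if tok.pos_ != pos)

print(remove_word_class("The cat chased the mouse across the yard."))
# e.g. "The cat the mouse across the yard ." -- verbs removed, meaning lost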

Citation (APA)

Talman, A., Apidianaki, M., Chatzikyriakidis, S., & Tiedemann, J. (2022). How Does Data Corruption Affect Natural Language Understanding Models? A Study on GLUE Datasets. In *SEM 2022 - 11th Joint Conference on Lexical and Computational Semantics, Proceedings of the Conference (pp. 226–233). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.starsem-1.20
