Do state-of-the-art natural language understanding models care about word order? Not always! We found that 75% to 90% of the correct predictions of BERT-based classifiers, trained on many GLUE tasks, remain unchanged after the input words are randomly shuffled. Although BERT embeddings are famously contextual, the contribution of each individual word to classification is almost unchanged even after its surrounding words are shuffled. BERT-based models exploit superficial cues (e.g., the sentiment of keywords in sentiment analysis, or the word-wise similarity between sequence-pair inputs in natural language inference) to make correct decisions even when tokens are randomly shuffled. Encouraging models to capture word-order information improves performance on most GLUE tasks and on SQuAD 2.0. Our work suggests that many GLUE tasks do not actually challenge machines to understand the meaning of a sentence.
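To make the shuffling probe concrete, the following is a minimal sketch, not the authors' released code: it loads an off-the-shelf BERT-based classifier via the Hugging Face transformers library, randomly permutes the words of each input, and counts how often the prediction stays the same. The model name and example sentences are assumptions chosen for illustration; the paper applies this procedure to validation examples of GLUE tasks.

import random

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed off-the-shelf BERT-based sentiment classifier (SST-2);
# any BERT-based GLUE-task classifier could be substituted here.
MODEL_NAME = "textattack/bert-base-uncased-SST-2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def predict(sentence):
    # Return the predicted class index for one sentence.
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1))

def shuffle_words(sentence, seed=0):
    # Randomly permute the words of a sentence (simple whitespace split).
    rng = random.Random(seed)
    words = sentence.split()
    rng.shuffle(words)
    return " ".join(words)

# Hypothetical example inputs; in the paper, accuracy on original and
# shuffled versions of each task's validation set is compared.
sentences = [
    "the movie was surprisingly good and the acting was excellent",
    "a dull and boring film with no redeeming qualities",
]

unchanged = 0
for s in sentences:
    shuffled = shuffle_words(s)
    if predict(s) == predict(shuffled):
        unchanged += 1
    print(f"original: {s!r}\nshuffled: {shuffled!r}")

print(f"predictions unchanged after shuffling: {unchanged}/{len(sentences)}")

Whitespace shuffling is used here for simplicity; the same comparison can be run with token-level or n-gram-level shuffling to vary how much local word order is preserved.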
CITATION STYLE
Pham, T. M., Bui, T., Mai, L., & Nguyen, A. (2021). Out of Order: How Important Is The Sequential Order of Words in a Sentence in Natural Language Understanding Tasks? In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 1145–1160). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-acl.98