Are Language Models Worse than Humans at Following Prompts? It's Complicated

2Citations
Citations of this article
25Readers
Mendeley users who have this article in their library.

Abstract

Prompts have been the center of progress in advancing language models' zero-shot and few-shot performance. However, recent work finds that models can perform surprisingly well when given intentionally irrelevant or misleading prompts. Such results may be interpreted as evidence that model behavior is not “human like”. In this study, we challenge a central assumption in such work: that humans would perform badly when given pathological instructions. We find that humans are able to reliably ignore irrelevant instructions and thus, like models, perform well on the underlying task despite an apparent lack of signal regarding the task they are being asked to do. However, when given deliberately misleading instructions, humans follow the instructions faithfully, whereas models do not. Our findings caution that future research should not idealize human behaviors as a monolith and should not train or evaluate models to mimic assumptions about these behaviors without first validating humans' behaviors empirically.

Cite

CITATION STYLE

APA

Webson, A., Loo, A. M., Yu, Q., & Pavlick, E. (2023). Are Language Models Worse than Humans at Following Prompts? It’s Complicated. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 7662–7686). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-emnlp.514

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free