Do LLMs Exhibit Human-like Response Biases? A Case Study in Survey Design

Lindia Tjuatja; Valerie Chen; Tongshuang Wu; Ameet Talwalkar; Graham Neubig

Journal Article (Open Access)

Transactions of the Association for Computational Linguistics (2024), 12, 1011–1026

DOI: 10.1162/tacl_a_00685

Abstract

One widely cited barrier to the adoption of LLMs as proxies for humans in subjective tasks is their sensitivity to prompt wording, but interestingly, humans also display sensitivities to instruction changes in the form of response biases. We investigate the extent to which LLMs reflect human response biases, if at all. We look to survey design, where human response biases caused by changes in the wordings of "prompts" have been extensively explored in social psychology literature. Drawing from these works, we design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires. Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior, particularly in models that have undergone RLHF. Furthermore, even if a model shows a significant change in the same direction as humans, we find that it is sensitive to perturbations that do not elicit significant changes in humans. These results highlight the pitfalls of using LLMs as human proxies, and underscore the need for finer-grained characterizations of model behavior.

Cite

Tjuatja, L., Chen, V., Wu, T., Talwalkar, A., & Neubig, G. (2024). Do LLMs Exhibit Human-like Response Biases? A Case Study in Survey Design. Transactions of the Association for Computational Linguistics, 12, 1011–1026. https://doi.org/10.1162/tacl_a_00685
