A Normative Account of Confirmation Bias During Reinforcement Learning

Germain Lefebvre; Christopher Summerfield; Rafal Bogacz

Journal Article, Open Access

Neural Computation (2022) 34(2) 307-337

DOI: 10.1162/neco_a_01455

Abstract

Reinforcement learning involves updating estimates of the value of states and actions on the basis of experience. Previous work has shown that in humans, reinforcement learning exhibits a confirmatory bias: when the value of a chosen option is being updated, estimates are revised more radically following positive than negative reward prediction errors, but the converse is observed when updating the unchosen option value estimate. Here, we simulate performance on a multi-arm bandit task to examine the consequences of a confirmatory bias for reward harvesting. We report a paradoxical finding: that confirmatory biases allow the agent to maximize reward relative to an unbiased updating rule. This principle holds over a wide range of experimental settings and is most influential when decisions are corrupted by noise. We show that this occurs because on average, confirmatory biases lead to overestimating the value of more valuable bandits and underestimating the value of less valuable bandits, rendering decisions overall more robust in the face of noise. Our results show how apparently suboptimal learning rules can in fact be reward maximizing if decisions are made with finite computational precision.
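The confirmatory update described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the two-armed Bernoulli bandit, the softmax choice rule used as the source of decision noise, and all parameter values (`alpha_conf`, `alpha_disc`, `beta`, `p_reward`) are assumptions introduced here. It also assumes the agent observes the outcome of both options on every trial, since the abstract describes updates to the unchosen option as well.

```python
import math
import random


def update_values(q, chosen, rewards, alpha_conf, alpha_disc):
    """Apply one confirmatory update to every value estimate.

    alpha_conf is used when the prediction error confirms the choice,
    alpha_disc when it disconfirms it (both names are hypothetical).
    """
    new_q = list(q)
    for arm, reward in enumerate(rewards):
        delta = reward - q[arm]  # reward prediction error
        if arm == chosen:
            # Chosen option: a positive error confirms the choice.
            alpha = alpha_conf if delta > 0 else alpha_disc
        else:
            # Unchosen option: a negative error confirms the choice.
            alpha = alpha_disc if delta > 0 else alpha_conf
        new_q[arm] = q[arm] + alpha * delta
    return new_q


def softmax_choice(q, beta, rng):
    """Choose between two arms with a noisy softmax rule (assumed here)."""
    p_first = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
    return 0 if rng.random() < p_first else 1


def simulate(alpha_conf, alpha_disc, p_reward=(0.6, 0.4),
             beta=5.0, n_trials=200, seed=0):
    """Run one simulated session; return the mean reward and final values."""
    rng = random.Random(seed)
    q = [0.5, 0.5]
    total = 0
    for _ in range(n_trials):
        choice = softmax_choice(q, beta, rng)
        # Both outcomes are drawn each trial (full feedback, an assumption).
        rewards = [1 if rng.random() < p else 0 for p in p_reward]
        total += rewards[choice]
        q = update_values(q, choice, rewards, alpha_conf, alpha_disc)
    return total / n_trials, q
```

Setting `alpha_conf` equal to `alpha_disc` gives an unbiased learner, so calling `simulate(0.3, 0.1)` and `simulate(0.2, 0.2)` across many seeds is one way to compare the two rules. Any difference observed this way depends on the assumed parameters and should not be read as a reproduction of the paper's results.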

Cite

Lefebvre, G., Summerfield, C., & Bogacz, R. (2022). A Normative Account of Confirmation Bias During Reinforcement Learning. Neural Computation, 34(2), 307–337. https://doi.org/10.1162/neco_a_01455