Model based planners reflect on their modelfree propensities

Rani Moran; Mehdi Keramati; Raymond J. Dolan

Journal Article, Open Access

PLoS Computational Biology (2021) 17(1)

DOI: 10.1371/JOURNAL.PCBI.1008552

4 Citations

36 Readers

Abstract

Dual reinforcement learning theory proposes that behaviour is under the tutelage of a retrospective, value-caching, model-free (MF) system and a prospective-planning, model-based (MB) system. This architecture raises the question of how far, when devising a plan, an MB controller takes account of influences from its MF counterpart. We present evidence that a sophisticated, self-reflective MB planner anticipates the influences its own MF proclivities exert on the execution of its planned future actions. Using a novel bandit task, wherein subjects were periodically allowed to design their environment, we show that reward assignments were constructed in a manner consistent with an MB system taking account of its MF propensities. Specifically, participants assigned higher rewards to bandits that were momentarily associated with stronger MF tendencies. Our findings have implications for a range of decision-making domains, including drug abuse, pre-commitment, and the tension between short- and long-term decision horizons in economics.

Cite

Moran, R., Keramati, M., & Dolan, R. J. (2021). Model based planners reflect on their modelfree propensities. PLoS Computational Biology, 17(1). https://doi.org/10.1371/JOURNAL.PCBI.1008552
