Learning and coordinating repertoires of behaviors with common reward: Credit assignment and module activation

Abstract

Understanding extended natural behavior will require a theoretical account of the entire system as it engages in perception and action under multiple concurrent goals, such as foraging for different foods while avoiding different predators and looking for a mate. Reinforcement learning (RL) is a promising framework for this, as it considers in a very general way the problem of choosing actions so as to maximize a measure of cumulative benefit through some form of learning, and many connections between RL and animal learning have been established. Within this framework, we consider the problem faced by a single agent comprising multiple separate elemental task learners, which we call modules, that jointly learn to solve tasks arising as different combinations of concurrent individual tasks across episodes. While sometimes the goal may be to collect different types of food, at other times avoidance of several predators may be required. The individual modules have separate state representations, i.e. they receive different inputs, but must carry out actions jointly in the common action space of the agent. Only a single measure of success is observed: the sum of the reward contributions from all component tasks. We provide a computational solution for learning elemental task solutions as they contribute to composite goals, together with a solution for learning to schedule these modules across different composite tasks and episodes. The algorithm learns to choose the appropriate modules for a particular task and solves the problem of calculating each module's contribution to the total reward. The latter calculation combines current reward estimates with an error signal given by the difference between the global reward and the sum of the reward estimates of the other co-active modules. As the modules interact through their action value estimates, action selection is based on their composite contribution to individual task combinations. The algorithm learns good action value functions for component tasks and task combinations, which we demonstrate on small classical problems and on a more complex visuomotor navigation task.
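The credit-assignment idea in the abstract — each module updates its reward estimate toward the residual left over after subtracting the other co-active modules' estimates from the single global reward, and actions are chosen by summing action values across active modules — can be sketched as follows. This is a minimal illustration of that mechanism, not the authors' exact algorithm; the class and function names (`Module`, `assign_credit`, `select_action`) and the learning-rate parameters are assumptions introduced here for clarity.

```python
class Module:
    """One elemental task learner with its own state input and reward estimate."""
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.reward_estimate = 0.0   # running estimate of this module's reward share
        self.alpha = alpha           # action-value learning rate
        self.gamma = gamma           # discount factor


def assign_credit(modules, global_reward, beta=0.1):
    """Split the single observed global reward among the co-active modules.

    Each module's estimate moves toward the residual: the global reward
    minus the reward estimates of the *other* co-active modules. At the
    fixed point, the estimates sum to the global reward.
    """
    total = sum(m.reward_estimate for m in modules)
    shares = []
    for m in modules:
        residual = global_reward - (total - m.reward_estimate)
        m.reward_estimate += beta * (residual - m.reward_estimate)
        shares.append(m.reward_estimate)
    return shares


def select_action(modules, states, n_actions):
    """Greedy action over the summed action values of the active modules.

    `states` gives each module's own (separate) state observation.
    """
    scores = [sum(m.q[s][a] for m, s in zip(modules, states))
              for a in range(n_actions)]
    return max(range(n_actions), key=scores.__getitem__)
```

Each module can then run an ordinary per-module value update using its assigned reward share in place of the (unobserved) component reward, which is the sense in which the modules interact only through the shared reward signal and the joint action choice.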

Citation (APA)

Rothkopf, C. A., & Ballard, D. H. (2013). Learning and coordinating repertoires of behaviors with common reward: Credit assignment and module activation. In Computational and Robotic Models of the Hierarchical Organization of Behavior (Vol. 9783642398759, pp. 99–125). Springer-Verlag Berlin Heidelberg. https://doi.org/10.1007/978-3-642-39875-9_6
