Using a Time-Delay Actor-Critic Neural Architecture with Dopamine-Like Reinforcement Signal for Learning in Autonomous Robots

  • Pérez-Uribe A

Abstract

Neuroscientists have identified a neural substrate of prediction and reward in experiments with primates. The so-called dopamine neurons have been shown to code an error in the temporal prediction of rewards. Similarly, artificial systems can “learn to predict” by means of so-called temporal-difference (TD) methods. Based on the general resemblance between the effective reinforcement term of TD models and the response of dopamine neurons, neuroscientists have developed a TD-learning time-delay actor-critic neural model and compared its performance with the behavior of monkeys in the laboratory. We have used such a neural network model to learn to predict variable-delay rewards in a robot spatial choice task similar to the one used by neuroscientists with primates. Such an architecture implementing TD-learning appears to be a promising mechanism for robotic systems that learn from simple human teaching signals in the real world.
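
The TD-learning actor-critic scheme the abstract describes can be illustrated with a short sketch. The Python snippet below is a minimal, hypothetical illustration, not the paper's actual model (which uses a time-delay stimulus representation rather than a simple tabular state space): a critic learns state values, and the TD error delta, playing the role of the dopamine-like reinforcement signal, trains both the critic and the actor on a two-sided spatial choice task with a variable reward delay. All state names, actions, delays, and learning rates are assumptions made for the example.

```python
import math
import random

# Minimal sketch, assuming a tabular actor-critic; the paper's model instead
# uses a time-delay (tapped delay line) stimulus representation. State names,
# actions, delays, and learning rates below are illustrative assumptions.

GAMMA = 0.9    # discount factor (assumed)
ALPHA_V = 0.1  # critic learning rate (assumed)
ALPHA_P = 0.1  # actor learning rate (assumed)

STATES = ["start", "left", "right"]
ACTIONS = ["go_left", "go_right"]

V = {s: 0.0 for s in STATES}                            # critic: state values
prefs = {(s, a): 0.0 for s in STATES for a in ACTIONS}  # actor: preferences

def choose(state):
    """Softmax action selection from the actor's preferences."""
    weights = [math.exp(prefs[(state, a)]) for a in ACTIONS]
    return random.choices(ACTIONS, weights=weights)[0]

def run_episode(correct_side="left", max_delay=3):
    """One trial of the spatial choice task with a variable-delay reward."""
    state = "start"
    action = choose(state)
    next_state = "left" if action == "go_left" else "right"

    # TD error at the choice point: no reward yet, only a change in the
    # reward prediction. This delta is the dopamine-like signal.
    delta = GAMMA * V[next_state] - V[state]
    V[state] += ALPHA_V * delta
    prefs[(state, action)] += ALPHA_P * delta

    # Reward arrives only after a variable delay (zero-reward waiting steps).
    delay = random.randint(1, max_delay)
    for t in range(delay):
        terminal = (t == delay - 1)
        reward = 1.0 if terminal and next_state == correct_side else 0.0
        v_next = 0.0 if terminal else V[next_state]
        delta = reward + GAMMA * v_next - V[next_state]
        V[next_state] += ALPHA_V * delta

if __name__ == "__main__":
    random.seed(0)
    for _ in range(500):
        run_episode()
    print("state values:", {s: round(v, 2) for s, v in V.items()})
    print("preference for go_left at start:",
          round(prefs[("start", "go_left")], 2))
```

After training, the preference for the rewarded side dominates at the choice point because the TD error there is larger when the agent heads toward the side whose predicted value has grown, mirroring how the dopamine-like signal shifts from the reward to the predictive cue.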

Citation (APA)

Pérez-Uribe, A. (2001). Using a Time-Delay Actor-Critic Neural Architecture with Dopamine-Like Reinforcement Signal for Learning in Autonomous Robots (pp. 522–533). https://doi.org/10.1007/3-540-44597-8_37
