Functional role of opponent, dopamine modulated D1/D2 plasticity in reinforcement learning

  • Jitsev J
  • Abraham N
  • Morrison A
  • Tittgemeyer M
Abstract

The basal ganglia network is thought to be involved in the adaptation of an organism's behavior to its positive and negative consequences, that is, in reinforcement learning. It has been hypothesized that dopamine (DA) modulated plasticity of synapses projecting from different cortical areas to the input nucleus of the basal ganglia, the striatum, plays a central role in this form of learning, being responsible for updating future outcome expectations and action preferences. In this scheme, DA transmission is considered to convey a prediction error signal that is generated if internal expectations do not match the outcomes observed after action execution. So far, there has been no satisfying model of what the neural circuits computing this signal within the basal ganglia may look like, how this computation is performed, and what the mechanistic role of DA release is in adapting the system towards optimal behavior in a given task. Aiming towards a model of a canonical circuit for learning task-conform behavior from both reward and punishment, we extended a previously introduced spiking actor-critic network model of the basal ganglia [1] to contain the segregation of both the dorsal (actor) and ventral (critic) striatum into populations of D1 and D2 medium spiny neurons (MSNs). This segregation allows explicit, separate representation of both positive and negative expected outcomes by the distinct populations in the ventral striatum. The positive and negative components of the expected outcome were fed to DA neurons in the SNc/VTA region, which compute and signal the reward prediction error by DA release. Based on recent experimental work [2], the DA level was assumed to modulate the plasticity of D1 and D2 synapses in opposing ways, inducing LTP on D1 and LTD on D2 synapses when DA is high, and vice versa when it is low. Crucially, this form of opponent plasticity implements a temporal-difference (TD)-like update of both positive and negative outcome expectations separately and performs appropriate adaptation of action selection. We implemented the network in the NEST simulator [3] using leaky integrate-and-fire spiking neurons and designed a battery of experiments involving the application of reward and punishment in various grid world tasks. In each task, an agent had to explore the states and learn to maximize the total reward obtained. The number of states and the magnitudes and delays of reward and punishment were manipulated across the different tasks. We demonstrate that across the tasks the network can learn to approach delayed rewards while avoiding punishments, the latter posing severe difficulties for the previous model without D1/D2 segregation [1]. Thus, the spiking neural network model highlights the functional role of D1/D2 MSN segregation within the striatum in implementing appropriate TD-like learning from both reward and punishment, and explains the necessity of the opponent direction of DA-dependent plasticity found at synapses converging on the distinct striatal MSN types. This modeling approach can be extended in future work to study how abnormal D1/D2 plasticity may lead to a reorganization of the basal ganglia network towards pathological, dysfunctional states, such as those observed in Parkinson's disease under conditions of progressive dopamine depletion.
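To make the opponent update scheme concrete, the following is a minimal tabular sketch of a DA-modulated D1/D2 actor-critic, assuming discrete states, scalar table entries in place of synaptic weights, and the TD error standing in for phasic DA. It is not the authors' spiking NEST implementation; all names (e.g. `V_pos`, `V_neg`, `P_go`, `P_nogo`, `opponent_update`), the chain task, and the parameter values are illustrative assumptions.

```python
# Minimal tabular sketch of opponent, DA-modulated D1/D2 updates in an
# actor-critic setting. Illustrative reduction only: states are discrete,
# "synapses" are table entries, and the scalar TD error plays the role of
# the phasic DA signal.

import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 5, 2       # toy chain world (hypothetical)
GAMMA, ALPHA = 0.9, 0.1          # discount factor and learning rate (assumed)

# Critic (ventral striatum): separate positive (D1-like) and negative (D2-like)
# outcome expectations; the net value estimate is their difference.
V_pos = np.zeros(N_STATES)
V_neg = np.zeros(N_STATES)

# Actor (dorsal striatum): D1-like "Go" and D2-like "NoGo" action preferences.
P_go = np.zeros((N_STATES, N_ACTIONS))
P_nogo = np.zeros((N_STATES, N_ACTIONS))


def value(s):
    """Net expected outcome = positive minus negative channel."""
    return V_pos[s] - V_neg[s]


def select_action(s, beta=2.0):
    """Softmax over the net (Go - NoGo) preferences."""
    q = beta * (P_go[s] - P_nogo[s])
    p = np.exp(q - q.max())
    p /= p.sum()
    return rng.choice(N_ACTIONS, p=p)


def opponent_update(s, a, r, s_next, terminal):
    """TD error acts like phasic DA: a positive error (high DA) potentiates the
    D1-like channel and depresses the D2-like channel; a negative error (low DA)
    does the opposite. Entries are clipped at zero, mimicking non-negative rates."""
    target = r + (0.0 if terminal else GAMMA * value(s_next))
    delta = target - value(s)                       # DA-like prediction error
    V_pos[s] = max(0.0, V_pos[s] + ALPHA * delta)   # LTP-like on D1 if delta > 0
    V_neg[s] = max(0.0, V_neg[s] - ALPHA * delta)   # LTD-like on D2 if delta > 0
    P_go[s, a] = max(0.0, P_go[s, a] + ALPHA * delta)
    P_nogo[s, a] = max(0.0, P_nogo[s, a] - ALPHA * delta)
    return delta


# Toy episodes on a 1-D chain: action 1 moves right toward a reward at one end,
# action 0 moves left toward a punishment at the other end.
for episode in range(200):
    s = N_STATES // 2
    for _ in range(20):
        a = select_action(s)
        s_next = min(N_STATES - 1, s + 1) if a == 1 else max(0, s - 1)
        terminal = s_next in (0, N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else (-1.0 if s_next == 0 else 0.0)
        opponent_update(s, a, r, s_next, terminal)
        if terminal:
            break
        s = s_next

print("net state values:", np.round(V_pos - V_neg, 2))
```

In this reduced form, the separate non-negative positive and negative channels are what allow punishment-driven (negative-delta) learning to be represented explicitly rather than only as a decrement of a single value, which is the intuition behind the D1/D2 segregation described in the abstract.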

Citation (APA)

Jitsev, J., Abraham, N., Morrison, A., & Tittgemeyer, M. (2013). Functional role of opponent, dopamine modulated D1/D2 plasticity in reinforcement learning. BMC Neuroscience, 14(S1). https://doi.org/10.1186/1471-2202-14-s1-p199
