Reinforcement Learning: Chapter 15 Neuroscience

Reinforcement Learning
Brain’s Reward Systems
By Jason Tsai (蔡志順) on August 19th, 2017
@ Mozilla Community Space Taipei
For Reinforcement Learning Study Group

*Copyright Notice:
Some materials from this presentation are taken
from the book “Reinforcement Learning: An
Introduction” (2nd edition draft in progress)
authored by Richard S. Sutton and Andrew G. Barto.
The other quoted sources are mentioned in the
respective slides. This presentation itself is released
under a Creative Commons license.

Typical Neuron
*Picture taken from https://en.wikipedia.org/wiki/Neuron

Synapse
*Picture taken from http://www.nature.com/npp/journal/v35/n1/fig_tab/npp2009120f2.html

Neuron’s Spike: Action Potential
*Picture taken from http://hyperphysics.phy-astr.gsu.edu/hbase/Biology/actpot.html

Input from Cortical and Dopamine Neurons
to Striatal Neurons

Dopamine Neurons Form Huge Numbers of Synaptic
Contacts with Their Targets
*Picture taken from http://www.jneurosci.org/content/29/2/444

Key Reward-related Neural Circuits
*Picture taken from http://www.nature.com/nrn/journal/v16/n3/fig_tab/nrn3877_F2.html

Optogenetic Methods for Brain Control
*Picture taken from http://www.nytimes.com/2011/05/17/science/17optics.html

Hebb’s Learning Rule
• "When an axon of cell A is near enough to excite a cell B and repeatedly
or persistently takes part in firing it, some growth process or metabolic
change takes place in one or both cells such that A's efficiency, as one of
the cells firing B, is increased."
* Donald O. Hebb, The Organization of Behavior: A Neuropsychological Theory. 1949 & 2002. Page 62.
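Hebb stated his postulate in words only. A common way to formalize it, an assumption here rather than Hebb's own equation, is the rate-based rule Δw_i = η · x_i · y, where x_i is the presynaptic activity, y the postsynaptic activity, and η a learning rate. This minimal Python sketch applies it to a hypothetical linear neuron with two inputs; the function name `hebbian_update` and all numbers are illustrative.

```python
def hebbian_update(w, pre, post, lr=0.1):
    """Plain rate-based Hebbian rule (illustrative): dw_i = lr * pre_i * post.

    w, pre: lists of equal length (synaptic weights, presynaptic activities)
    post:   scalar postsynaptic activity
    """
    return [wi + lr * xi * post for wi, xi in zip(w, pre)]


# Hypothetical setup: input 0 is active, input 1 is silent.
pre = [1.0, 0.0]
w = [0.5, 0.5]
for _ in range(3):
    # Linear neuron: postsynaptic activity is the weighted sum of inputs.
    post = sum(wi * xi for wi, xi in zip(w, pre))
    w = hebbian_update(w, pre, post)
print(w)
```

The weight of the input that keeps taking part in firing the cell grows on every step, while the silent input's weight is unchanged. Nothing in this plain rule limits that growth, which is why practical models add normalization or decay terms (Oja's rule is one well-known example).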

Spike-Timing-Dependent Plasticity (STDP)
*Picture taken from https://grey.colorado.edu/CompCogNeuro/index.php/CCNBook/Learning/STDP
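STDP refines Hebb's rule by making the sign of the change depend on spike order: a presynaptic spike shortly before a postsynaptic spike tends to strengthen the synapse, and the reverse order tends to weaken it. A widely used pair-based model describes this with two exponential windows. The sketch below assumes that model; the function name `stdp_dw`, the amplitudes `a_plus` and `a_minus`, and the 20 ms time constants are illustrative values, not ones read off the figure.

```python
import math


def stdp_dw(dt_ms, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one pre/post spike pair (pair-based exponential STDP).

    dt_ms = t_post - t_pre in milliseconds (illustrative parameters).
    """
    if dt_ms > 0:
        # Pre fires before post: potentiation, decaying with the delay.
        return a_plus * math.exp(-dt_ms / tau_plus)
    if dt_ms < 0:
        # Post fires before pre: depression, decaying with the delay.
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0


for dt in (-40, -10, 10, 40):
    print(f"dt = {dt:+d} ms -> dw = {stdp_dw(dt):+.5f}")
```

Pairs with short intervals change the weight most; as the interval grows in either direction the effect fades toward zero.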

Temporal-Difference (TD) Backup
*Picture taken from http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MC-TD.pdf

Reinforcement Signal /
Reward Prediction Errors
• The function of a reinforcement signal is to direct
the changes a learning algorithm makes in an
agent’s policy, value estimates, or environment
models.
• For a TD method, the reinforcement signal at time t
is the TD error $\delta_{t-1} = R_t + \gamma V(S_t) - V(S_{t-1})$.
• Reward Prediction Errors (RPEs) specifically
measure discrepancies between the expected and the
received reward signal. TD errors are a special kind
of RPE that signals discrepancies between current and
earlier expectations of reward over the long term.
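The TD error can be computed and used in a few lines. This sketch assumes tabular TD(0) prediction on a hypothetical two-step chain A → B → end, with a reward of 1 on the final transition; `td0_update`, the step size `alpha`, and the discount `gamma` are illustrative names and values. In the code, `r` is the reward received on the transition from `s` to `s_next`, so `delta` plays the role of the reinforcement signal for that transition.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0, terminal=False):
    """One tabular TD(0) step (illustrative).

    Computes the TD error for the transition s -> s_next with reward r,
    moves V[s] a fraction alpha toward the TD target, and returns the error.
    """
    v_next = 0.0 if terminal else V[s_next]
    delta = r + gamma * v_next - V[s]   # TD error: the reinforcement signal
    V[s] += alpha * delta               # value estimate moves toward the target
    return delta


# Hypothetical chain A -> B -> end; only the last transition is rewarded.
V = {"A": 0.0, "B": 0.0}
for episode in range(200):
    delta_a = td0_update(V, "A", 0.0, "B")
    delta_b = td0_update(V, "B", 1.0, None, terminal=True)
print(V, delta_a, delta_b)
```

As the estimates approach the true values (both 1 here with γ = 1), the TD error on each transition shrinks toward zero: once the reward is fully predicted, there is nothing left to signal.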

The Reward Prediction Error Hypothesis (of
Dopamine Neuron Activity)
• It proposes that one of the functions of the phasic
activity of dopamine-producing neurons in
mammals is to deliver an error between an old and a
new estimate of expected future reward to target
areas throughout the brain.
• Experimental evidence suggests that the
neurotransmitter dopamine signals RPEs, and
further, that the phasic activity of dopamine-producing
neurons in fact conveys TD errors.

Time Course of the TD Model
*Picture modified from Yael Niv, Reinforcement learning in the brain. Journal of Mathematical Psychology
53 (3), 139-154 (2009)

The Behavior of the TD error δ
during TD Learning

Predictions of TD Learning Comply with
Dopaminergic Firing Patterns

*Pictures on this and the previous slide are taken from
Yael Niv, Reinforcement learning in the brain. 2009

TD Prediction Errors / Dopamine Neuron Activity
in a Classical Conditioning Task

Response Shift of Dopamine Neurons
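The response shift, and the dip in firing when an expected reward is omitted, both emerge from a small TD model of a classical conditioning trial. This sketch assumes a tapped-delay-line ("complete serial compound") stimulus representation, in which each time step between CS onset and the reward has its own feature, and uses δ_t = r_t + γV(t) − V(t−1). The trial timing (CS at t = 2, US at t = 6), step size, discount, and the name `run_trial` are illustrative assumptions, not parameters from the slides or from Niv (2009).

```python
def run_trial(w, cs=2, us=6, T=9, gamma=1.0, alpha=0.3, reward=1.0, learn=True):
    """Simulate one conditioning trial and return the TD error at each step.

    Tapped-delay-line representation (assumed): feature k is active at
    time cs + k for cs <= t < us. Before the CS and from the US onward the
    state vector is all zeros, so CS onset itself is never predicted.
    """
    n = us - cs

    def features(t):
        x = [0.0] * n
        if cs <= t < us:
            x[t - cs] = 1.0
        return x

    def value(x):
        return sum(wi * xi for wi, xi in zip(w, x))

    deltas = [0.0] * T
    x_prev = features(0)
    for t in range(1, T):
        x = features(t)
        r = reward if t == us else 0.0
        delta = r + gamma * value(x) - value(x_prev)   # TD error at time t
        deltas[t] = delta
        if learn:
            # Credit the feature that was active one step earlier.
            for k in range(n):
                w[k] += alpha * delta * x_prev[k]
        x_prev = x
    return deltas


w = [0.0] * 4                                     # us - cs = 4 features
first = run_trial(w)                              # naive: delta only at the US
for _ in range(500):
    run_trial(w)
trained = run_trial(w, learn=False)               # delta moves to CS onset
omitted = run_trial(w, reward=0.0, learn=False)   # dip at expected US time
print([round(d, 2) for d in first])
print([round(d, 2) for d in trained])
print([round(d, 2) for d in omitted])
```

Before training, δ is positive only at the reward; after training it appears at CS onset and is close to zero at the now-predicted reward; omitting the reward produces a negative δ at the time the reward was expected, paralleling the pause in dopamine firing.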

Actor-Critic Artificial Neural Networks /
Hypothetical Neural Mechanism

Algorithm of the Learning Rules
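Both halves of the actor-critic architecture learn from the same TD error δ: the critic uses it to improve its value estimate, and the actor uses it to make the action just taken more likely (δ > 0) or less likely (δ < 0). The sketch below is a simplified illustration under stated assumptions: a tabular one-step actor-critic with a softmax policy on a hypothetical single-state task with two actions, where only action 1 is rewarded. The learning rules discussed in the book also use eligibility traces, which are left out here; `actor_critic`, `softmax`, and all step sizes are illustrative.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def actor_critic(n_trials=2000, alpha_w=0.1, alpha_theta=0.1, seed=0):
    """One-step actor-critic on a hypothetical single-state, two-action task.

    Action 1 yields reward 1, action 0 yields reward 0, and each trial ends
    after one action, so the value of the next (terminal) state is 0.
    """
    rng = random.Random(seed)
    v = 0.0              # critic: value estimate of the single state
    h = [0.0, 0.0]       # actor: action preferences for a softmax policy
    for _ in range(n_trials):
        pi = softmax(h)
        a = 0 if rng.random() < pi[0] else 1
        r = 1.0 if a == 1 else 0.0
        delta = r - v                    # TD error (terminal next state)
        v += alpha_w * delta             # critic update
        for b in range(2):               # actor: delta * grad of log pi(a)
            grad = (1.0 if b == a else 0.0) - pi[b]
            h[b] += alpha_theta * delta * grad
    return v, softmax(h)


v, pi = actor_critic()
print(f"V = {v:.3f}, pi(action 1) = {pi[1]:.3f}")
```

A single δ both updates `v` and scales the actor's policy-gradient step, which mirrors the hypothesis that one dopamine signal trains both the critic (ventral striatum) and the actor (dorsal striatum).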

Advanced Topics
• Hedonistic Neuron Hypothesis (Reinforcement
Learning Agent)
• Collective Reinforcement Learning (Multi-Agent
Reinforcement Learning / Game Theory)
• Model-based Methods in the Brain (Model-based
Reinforcement Learning)
• Addiction (positive and negative reinforcement)

Perceptual Decision-Making
*Picture taken from Kyle Dunovan and Timothy Verstynen, Believer-Skeptic Meets Actor-Critic: Rethinking the Role
of Basal Ganglia Pathways during Decision-Making and Reinforcement Learning. Front. Neurosci. 10:106 (2016)

Alternative Models of Operant Learning
*Picture taken from Hanan Shteingart and Yonatan Loewenstein, Reinforcement learning and human
behavior. Current Opinion in Neurobiology 2014, 25:93–98

Key Neural Circuits of Addiction
*Picture taken from http://www.nature.com/nrn/journal/v2/n2/fig_tab/nrn0201_119a_F1.html
