Lecture for Reinforcement Learning study group held on August 19th, 2017.
Reference book: http://incompleteideas.net/book/the-book.html
Video: https://youtu.be/xv5ZsOSf6ZQ
Initiated by Taiwan AI Group (https://www.facebook.com/groups/Taiwan.AI.Group/permalink/1796526840669749/)
1. Reinforcement Learning
Brain’s Reward Systems
By Jason Tsai (蔡志順) on August 19th, 2017
@ Mozilla Community Space Taipei
For Reinforcement Learning Study Group
2. *Copyright Notice:
Some materials from this presentation are taken
from the book “Reinforcement Learning: An
Introduction” (2nd edition draft in progress)
authored by Richard S. Sutton and Andrew G. Barto.
The other quoted sources are mentioned in the
respective slides. This presentation itself adopts
a Creative Commons license.
8. Dopamine Neurons Form Huge Numbers of Synaptic
Contacts with Their Targets
*Picture taken from http://www.jneurosci.org/content/29/2/444
9. Key Reward-related Neural Circuits
*Picture taken from http://www.nature.com/nrn/journal/v16/n3/fig_tab/nrn3877_F2.html
10. Optogenetic Methods for Brain Control
*Picture taken from http://www.nytimes.com/2011/05/17/science/17optics.html
11. Hebb’s Learning Rule
"When an axon of cell A is near enough to excite a cell B and repeatedly
or persistently takes part in firing it, some growth process or metabolic
change takes place in one or both cells such that A's efficiency, as one of
the cells firing B, is increased."
* Donald O. Hebb, The Organization of Behavior: A Neuropsychological Theory. 1949 & 2002. Page 62.
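Hebb stated the rule only in words. A common simplified formalization (not Hebb's own notation) is Δw_ij = η · y_i · x_j: the synapse from presynaptic cell j onto postsynaptic cell i grows in proportion to their joint activity. The minimal sketch below assumes that form; the function name `hebbian_update` and the learning rate are illustrative choices, and cell B's firing is simply clamped to 1 to stand in for "A takes part in firing B".

```python
def hebbian_update(weights, pre, post, lr=0.1):
    # weights[i][j] is the efficacy of the synapse from presynaptic cell j onto postsynaptic cell i.
    # Plain Hebbian rule (assumed form): dw_ij = lr * post_i * pre_j,
    # i.e. a synapse is strengthened only when both of its cells are active together.
    return [
        [w_ij + lr * post_i * pre_j for w_ij, pre_j in zip(row, pre)]
        for row, post_i in zip(weights, post)
    ]


# Cell A (presynaptic index 0) repeatedly takes part in firing cell B (the single postsynaptic cell);
# presynaptic cell 1 stays silent. B's activity is clamped to 1 here purely for illustration.
weights = [[0.0, 0.0]]
for _ in range(3):
    weights = hebbian_update(weights, pre=[1.0, 0.0], post=[1.0])
print(weights)
```

Only the A-to-B weight grows (to about 0.3), while the silent cell's synapse stays at 0. This plain rule has no decay or normalization term, so under continued co-activity the weight grows without bound; variants such as Oja's rule add normalization to prevent that.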
15. Reinforcement Signal /
Reward Prediction Errors
The function of a reinforcement signal is to direct
the changes a learning algorithm makes in an
agent’s policy, value estimates, or environment
models.
For a TD method, the reinforcement signal at time t
is the TD error $\delta_{t-1} = R_t + \gamma V(S_t) - V(S_{t-1})$.
Reward Prediction Errors (RPEs) specifically
measure discrepancies between the expected and the
received reward signal. TD errors are special kinds
of RPEs that signal discrepancies between current and
earlier expectations of reward over the long term.
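To make the distinction concrete, here is a minimal sketch contrasting a plain RPE (received minus expected immediate reward) with a TD error, δ_{t-1} = R_t + γV(S_t) − V(S_{t-1}). The function names and the discount factor γ = 0.9 are illustrative assumptions.

```python
def td_error(reward, v_current, v_previous, gamma=0.9):
    """TD error: delta_{t-1} = R_t + gamma * V(S_t) - V(S_{t-1})."""
    return reward + gamma * v_current - v_previous


def immediate_rpe(reward, expected_reward):
    """A plain reward prediction error: received reward minus expected reward."""
    return reward - expected_reward


# Case 1: a reward of 1 arrives when 0.5 was predicted, and nothing follows (V of the end state = 0).
print(td_error(1.0, 0.0, 0.5), immediate_rpe(1.0, 0.5))  # prints 0.5 0.5
# Case 2: no reward yet, but the new state predicts future reward (V rises from 0 to 1).
print(td_error(0.0, 1.0, 0.0), immediate_rpe(0.0, 0.0))  # prints 0.9 0.0
```

In the first case both signals agree. In the second, the plain RPE is zero, while the TD error is positive because the long-term expectation of reward has just increased, which is exactly the "current versus earlier expectations" property described above.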
16. The Reward Prediction Error Hypothesis (of
Dopamine Neuron Activity)
It proposes that one of the functions of the phasic
activity of dopamine-producing neurons in
mammals is to deliver an error between an old and a
new estimate of expected future reward to target
areas throughout the brain.
Experimental evidence suggests that the
neurotransmitter dopamine signals RPEs, and
further, that the phasic activity of dopamine-
producing neurons in fact conveys TD errors.
17. Time Course of the TD Model
*Picture modified from Yael Niv, Reinforcement learning in the brain. Journal of Mathematical Psychology 53 (3), 139-154 (2009)
19. Predictions of TD Learning Comply with
Dopaminergic Firing Patterns
20. TD Prediction Errors / Dopamine Neuron
Activity in Classical Conditioning Task
*Pictures in this and the last slide are taken from
Yael Niv, Reinforcement learning in the brain. 2009
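The pattern in these slides (an error at the reward before learning, an error at the cue after learning, and a dip when an expected reward is omitted) can be reproduced with a small tabular TD(0) simulation. This is a minimal sketch under stated assumptions: one learned value per post-cue time step (a complete serial compound style of time representation), pre-cue time steps held at V = 0 because cue onset is taken to be unpredictable, and illustrative constants `GAMMA`, `ALPHA`, `CUE`, `T` and helper names `value`, `run_trial` that are hypothetical, not taken from the cited sources.

```python
GAMMA = 0.95  # discount factor (assumed)
ALPHA = 0.1   # step size (assumed)
CUE = 2       # time step at which the conditioned stimulus appears (assumed)
T = 8         # reward arrives at this final time step of the trial (assumed)


def value(values, t):
    # Before the cue, and at the end of the trial, nothing is predicted: V = 0.
    if t < CUE or t >= T:
        return 0.0
    return values[t - CUE]


def run_trial(values, rewarded=True, learn=True):
    """Return the TD error at each time step 0..T of one trial (index 0 is unused)."""
    deltas = [0.0] * (T + 1)
    for t in range(1, T + 1):
        reward = 1.0 if (rewarded and t == T) else 0.0
        # delta_{t-1} = R_t + gamma * V(S_t) - V(S_{t-1}), stored at time t
        delta = reward + GAMMA * value(values, t) - value(values, t - 1)
        deltas[t] = delta
        # Only post-cue values are learned; the pre-cue background stays at 0.
        if learn and t - 1 >= CUE:
            values[t - 1 - CUE] += ALPHA * delta
    return deltas


values = [0.0] * (T - CUE)  # one learned value per post-cue time step
first = run_trial(values)   # naive animal: error only at the reward
for _ in range(500):
    run_trial(values)
trained = run_trial(values, learn=False)                  # error has moved to the cue
omitted = run_trial(values, rewarded=False, learn=False)  # negative dip at the expected reward time

for name, d in [("first", first), ("trained", trained), ("omitted", omitted)]:
    print(name, [round(x, 2) for x in d])
```

After training, the error at the cue is roughly γ^(T−CUE) and the error at the (now fully predicted) reward is close to zero, while omitting the reward produces an error of roughly −1 at the time the reward was expected, qualitatively matching the dopamine recordings summarized by Niv (2009).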
25. Advanced Topics
Hedonistic Neuron Hypothesis (Reinforcement
Learning Agent)
Collective Reinforcement Learning (Multi-Agent
Reinforcement Learning / Game Theory)
Model-based Methods in the Brain (Model-based
Reinforcement Learning)
Addiction (positive and negative reinforcement)
26. Perceptual Decision-Making
*Picture taken from Kyle Dunovan and Timothy Verstynen, Believer-Skeptic Meets Actor-Critic: Rethinking the Role of Basal Ganglia Pathways during Decision-Making and Reinforcement Learning. Front. Neurosci. 10:106 (2016)
27. Alternative Models of Operant Learning
*Picture taken from Hanan Shteingart and Yonatan Loewenstein, Reinforcement learning and human behavior. Current Opinion in Neurobiology 2014, 25:93–98
28. Key Neural Circuits of Addiction
*Picture taken from http://www.nature.com/nrn/journal/v2/n2/fig_tab/nrn0201_119a_F1.html