Reinforcement Learning
Brain’s Reward Systems
By Jason Tsai (蔡志順) on August 19th, 2017
@ Mozilla Community Space Taipei
For Reinforcement Learning Study Group
*Copyright Notice:
Some materials from this presentation are taken
from the book “Reinforcement Learning: An
Introduction” (2nd edition draft in progress)
authored by Richard S. Sutton and Andrew G. Barto.
The other quoted sources are mentioned in the
respective slides. This presentation itself is released
under a Creative Commons license.
Neuroscience
Typical Neuron
*Picture taken from https://en.wikipedia.org/wiki/Neuron
Synapse
*Picture taken from http://www.nature.com/npp/journal/v35/n1/fig_tab/npp2009120f2.html
Neuron’s Spike: Action Potential
*Picture taken from http://hyperphysics.phy-astr.gsu.edu/hbase/Biology/actpot.html
Input from Cortical and Dopamine Neurons
to Striatal Neurons
Dopamine Neurons Form a Huge Number of Synaptic
Contacts with Their Targets
*Picture taken from http://www.jneurosci.org/content/29/2/444
Key Reward-related Neural Circuits
*Picture taken from http://www.nature.com/nrn/journal/v16/n3/fig_tab/nrn3877_F2.html
Optogenetic Methods for Brain Control
*Picture taken from http://www.nytimes.com/2011/05/17/science/17optics.html
Hebb’s Learning Rule
 "When an axon of cell A is near enough to excite a cell B and repeatedly
or persistently takes part in firing it, some growth process or metabolic
change takes place in one or both cells such that A's efficiency, as one of
the cells firing B, is increased."
* Donald O. Hebb, The Organization of Behavior: A Neuropsychological Theory. 1949 & 2002. Page 62.
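Hebb stated the rule qualitatively. A common rate-based formalization, which is an assumption here rather than Hebb's own equation, increases the weight from cell A to cell B in proportion to the product of their activities, Δw = η · x_A · x_B. The function name and the learning rate η below are illustrative.

```python
def hebbian_update(w: float, pre: float, post: float, eta: float = 0.1) -> float:
    """Return the weight after one Hebbian step: dw = eta * pre * post.

    This is the simplest rate-based reading of Hebb's postulate; eta is a
    hypothetical learning rate chosen for illustration.
    """
    return w + eta * pre * post


# Cell A (pre) repeatedly takes part in firing cell B (post): the weight grows.
w = 0.5
for _ in range(3):
    w = hebbian_update(w, pre=1.0, post=1.0)
print(round(w, 2))

# If A is silent while B fires, the plain Hebbian rule leaves w unchanged.
print(hebbian_update(0.5, pre=0.0, post=1.0))
```

Note that this plain rule can only increase weights when activities are positive; practical models add decay or normalization to keep weights bounded.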
Spike-Timing-Dependent Plasticity (STDP)
*Picture taken from https://grey.colorado.edu/CompCogNeuro/index.php/CCNBook/Learning/STDP
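STDP refines Hebb's rule by making the sign of the change depend on spike order. The sketch below uses the widely used pair-based exponential window; the amplitudes A+ and A−, the 20 ms time constants, and the choice of zero change at exact coincidence are hypothetical values for illustration, not values read from the figure.

```python
import math


def stdp_dw(dt_ms: float, a_plus: float = 0.01, a_minus: float = 0.012,
            tau_plus: float = 20.0, tau_minus: float = 20.0) -> float:
    """Weight change for one pre/post spike pair, with dt = t_post - t_pre (ms).

    Pre-before-post (dt > 0) potentiates; post-before-pre (dt < 0) depresses.
    Both effects decay exponentially with the size of the timing gap.
    Exact coincidence (dt == 0) is treated as no change by convention here.
    """
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_plus)
    if dt_ms < 0:
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0


# Print the learning window at a few spike-timing differences.
for dt in (-40, -10, 10, 40):
    print(dt, round(stdp_dw(dt), 5))
```

Making A− slightly larger than A+ is one common way to keep uncorrelated firing from driving weights up on average.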
Models & Data
Temporal-Difference (TD) Backup
*Picture taken from http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MC-TD.pdf
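The TD backup moves a state's value toward a target that bootstraps from the next state's current estimate: V(S_t) ← V(S_t) + α[R_{t+1} + γV(S_{t+1}) − V(S_t)]. A minimal tabular sketch, assuming a hypothetical two-step chain and illustrative values of α and γ:

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one tabular TD(0) backup to V in place and return the TD error."""
    # The target bootstraps from the current estimate of the next state,
    # except on transitions into a terminal state, whose value is 0.
    target = r if terminal else r + gamma * V[s_next]
    delta = target - V[s]
    V[s] += alpha * delta
    return delta


# Hypothetical chain A -> B -> end, with reward 1 on the final transition.
V = {"A": 0.0, "B": 0.0}
for _ in range(200):
    td0_update(V, "A", 0.0, "B")
    td0_update(V, "B", 1.0, None, terminal=True)
print({s: round(v, 3) for s, v in V.items()})
```

Because A's target uses B's estimate rather than waiting for the final reward, A's value lags B's at first and then settles near γ · 1 = 0.9, while B's approaches 1.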
Reinforcement Signal /
Reward Prediction Errors
 The function of a reinforcement signal is to direct
the changes a learning algorithm makes in an
agent’s policy, value estimates, or environment
models.
 For a TD method, the reinforcement signal at time t
is the TD error $\delta_{t-1} = R_t + \gamma V(S_t) - V(S_{t-1})$.
 Reward Prediction Errors (RPEs) specifically
measure discrepancies between the expected and the
received reward signal. TD errors are special kinds
of RPEs that signal discrepancies between current and
earlier expectations of reward over the long term.
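The difference between a plain RPE and a TD error can be made concrete in a few lines. This sketch uses Sutton and Barto's indexing, δ_{t−1} = R_t + γV(S_t) − V(S_{t−1}); the function names and γ = 0.9 are illustrative assumptions.

```python
def reward_prediction_error(expected_r: float, received_r: float) -> float:
    """Plain RPE: the received reward minus the expected reward."""
    return received_r - expected_r


def td_error(r_t: float, v_t: float, v_prev: float, gamma: float = 0.9) -> float:
    """TD error delta_{t-1} = R_t + gamma * V(S_t) - V(S_{t-1}).

    It compares the earlier long-term estimate V(S_{t-1}) with the newer one,
    R_t + gamma * V(S_t), so it can be nonzero when no reward arrives.
    """
    return r_t + gamma * v_t - v_prev


# A cue that predicts future reward appears unexpectedly; no reward yet.
print(reward_prediction_error(expected_r=0.0, received_r=0.0))
print(round(td_error(r_t=0.0, v_t=0.9, v_prev=0.0), 2))

# With gamma = 0 and no value for the next state, the TD error is a plain RPE.
print(td_error(r_t=1.0, v_t=0.0, v_prev=0.4, gamma=0.0))
```

The unexpected cue gives a positive TD error even though the immediate RPE is zero, because the estimate of future reward has just gone up.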
The Reward Prediction Error Hypothesis (of
Dopamine Neuron Activity)
 It proposes that one of the functions of the phasic
activity of dopamine-producing neurons in
mammals is to deliver an error between an old and a
new estimate of expected future reward to target
areas throughout the brain.
 Experimental evidence suggests that the
neurotransmitter dopamine signals RPEs, and
further, that the phasic activity of
dopamine-producing neurons in fact conveys TD errors.
Time Course of the TD Model
*Picture modified from Yael Niv, Reinforcement learning in the brain. Journal of Mathematical Psychology
53 (3), 139-154 (2009)
The Behavior of the TD error δ
during TD Learning
Predictions of TD Learning comply with
Dopaminergic Firing Patterns
*Pictures on this and the previous slide are taken from
Yael Niv, Reinforcement learning in the brain. 2009
TD Prediction Errors / Dopamine Neuron
Activity in a Classical Conditioning Task
Response Shift of Dopamine Neurons
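The response shift can be reproduced with a toy TD model of classical conditioning. This is a simplified sketch, not the exact model in Niv (2009): each time step after CS onset gets its own value V[k] (a tabular, per-time-step representation), the pre-CS background state is held at value 0 because the CS arrives unpredictably, and the trial length, CS time, reward time, α and γ are hypothetical choices. The function name `run_trial` is illustrative.

```python
def run_trial(V, cs_step, reward_step, n_steps, alpha=0.2, gamma=1.0,
              reward=1.0, learn=True):
    """Run one trial and return the TD error on the transition into each step.

    V[k] is the value of the state 'k steps after CS onset'. Before the CS
    the agent sits in a background state whose value is held at 0.
    """
    deltas = []
    prev_v, prev_k = 0.0, None  # start in the background state
    for t in range(n_steps):
        k = t - cs_step  # steps since CS onset; negative means before the CS
        in_cs = 0 <= k < len(V)
        v = V[k] if in_cs else 0.0
        r = reward if t == reward_step else 0.0
        delta = r + gamma * v - prev_v
        if learn and prev_k is not None:
            V[prev_k] += alpha * delta  # background value is never learned
        deltas.append(delta)
        prev_v, prev_k = v, (k if in_cs else None)
    return deltas


N_STEPS, CS_STEP, REWARD_STEP = 10, 2, 7
V = [0.0] * (N_STEPS - CS_STEP)

first = run_trial(V, CS_STEP, REWARD_STEP, N_STEPS)
for _ in range(300):
    run_trial(V, CS_STEP, REWARD_STEP, N_STEPS)
trained = run_trial(V, CS_STEP, REWARD_STEP, N_STEPS, learn=False)
omitted = run_trial(V, CS_STEP, REWARD_STEP, N_STEPS, reward=0.0, learn=False)

for name, ds in (("first trial", first), ("after learning", trained),
                 ("reward omitted", omitted)):
    print(f"{name:>15}:", [round(d, 2) for d in ds])
```

Before learning, δ is positive only when the reward arrives. After learning, the positive δ appears at CS onset and δ is near zero at the now-predicted reward. Omitting the reward produces a negative δ at the expected reward time, qualitatively matching the pause in dopamine firing.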
Actor-Critic Artificial Neural Networks /
Hypothetical Neural Mechanism
Algorithm of the Learning Rules
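The actor-critic idea can be sketched on the simplest possible task. The code below is a one-step actor-critic on a hypothetical two-armed bandit with reward probabilities 0.2 and 0.8. It drops the eligibility traces of the book's full algorithm and uses a single state, so each trial ends after one action and the TD error reduces to δ = R − V. The same δ, which the hypothesis maps onto the phasic dopamine signal, trains both the critic's value V and the actor's softmax preferences θ. All names and step sizes are illustrative.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_bandit(reward_probs, n_trials=2000, alpha_w=0.1,
                        alpha_theta=0.1, seed=0):
    """One-step actor-critic on a single-state Bernoulli bandit."""
    rng = random.Random(seed)
    theta = [0.0] * len(reward_probs)  # actor: action preferences
    V = 0.0                            # critic: value of the single state
    for _ in range(n_trials):
        pi = softmax(theta)
        a = rng.choices(range(len(theta)), weights=pi)[0]
        r = 1.0 if rng.random() < reward_probs[a] else 0.0
        delta = r - V                  # TD error; the trial ends after one action
        V += alpha_w * delta           # critic update
        for b in range(len(theta)):    # actor update: delta * grad log pi(a)
            grad = (1.0 if b == a else 0.0) - pi[b]
            theta[b] += alpha_theta * delta * grad
    return softmax(theta), V


probs, value = actor_critic_bandit([0.2, 0.8])
print("action probabilities:", [round(p, 3) for p in probs])
print("critic value:", round(value, 3))
```

The actor's probabilities should shift toward the better arm, and V should hover near that arm's average reward. Subtracting V acts as a baseline: it does not change the expected direction of the actor's update but reduces its variance.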
Go Beyond
Advanced Topics
 Hedonistic Neuron Hypothesis (Reinforcement
Learning Agent)
 Collective Reinforcement Learning (Multi-Agent
Reinforcement Learning / Game Theory)
 Model-based Methods in the Brain (Model-based
Reinforcement Learning)
 Addiction (positive and negative reinforcement)
Perceptual Decision-Making
*Picture taken from Kyle Dunovan and Timothy Verstynen, Believer-Skeptic Meets Actor-Critic: Rethinking the Role
of Basal Ganglia Pathways during Decision-Making and Reinforcement Learning. Front. Neurosci. 10:106 (2016)
Alternative Models of Operant Learning
*Picture taken from Hanan Shteingart and Yonatan Loewenstein, Reinforcement learning and human
behavior. Current Opinion in Neurobiology 2014, 25:93–98
Key Neural Circuits of Addiction
*Picture taken from http://www.nature.com/nrn/journal/v2/n2/fig_tab/nrn0201_119a_F1.html
Reinforcement Learning: Chapter 15 Neuroscience