Reinforcement Learning and Neuroscience
Michael Bosello
Università di Bologna – Department of Computer Science and Engineering, Cesena, Italy
Intelligent Robotic Systems – Exam
Outline
1 Introduction
2 Temporal Difference
Computational Temporal Difference
Temporal Difference in the Brain
Classical Conditioning
TD Model
The Reward Prediction Error Hypothesis
TD Error / Dopamine Correspondences
3 Agent Navigation Inspired by Neuroscience
Navigation using grid-like representations
The Network architecture
Navigation Experiments
4 Conclusions and Suggestions
Introduction
Machine Learning and Neuroscience
Intertwined history [Hassabis et al., 2017][Pillow and Sahani, 2019]
Since the dawn of artificial intelligence (AI), neuroscience and AI have influenced each other virtuously
Some of the most important tools available to machine learning (ML) originate from neuroscientific metaphors
Reinforcement Learning (RL)
Deep Learning (DL)
ML concepts, in turn, have guided the exploration of neural functions by providing functional metaphors useful for formulating hypotheses
Why it is important now
ML: leading researchers have particularly emphasized the opportunity of drawing inspiration from neuroscience for the next generation of AI [Hassabis et al., 2017][Lake et al., 2017][Ford, 2018][Ullman, 2019]
Neuroscience: AI offers ideas about possible mechanisms in the brain and means for formalizing concepts. ML is also used to analyze neuroimaging datasets to find patterns [Hassabis et al., 2017][Pillow and Sahani, 2019]
Reinforcement Learning and Neuroscience
This topic is very broad. We will focus on two points:
RL [Sutton and Barto, 2018]
The most remarkable virtuous circle between ML and neuroscience
Trial and error: inspired by research on animal learning (the Law of Effect)
Temporal Difference (TD) learning: inspired by classical conditioning
Evidence that the brain implements a form of TD learning
Core parallel: TD learning / dopamine
Why it is useful now
RL: findings in neuroscience can point the way toward more effective agents [Hassabis et al., 2017], e.g., agent navigation inspired by the entorhinal cortex [Banino et al., 2018]
Neuroscience: RL provides insights into psychiatric diseases [Pillow and Sahani, 2019][Sutton and Barto, 2018]
Suggestions for further reading are provided
Disclaimer
Background
We assume prior knowledge of RL (and DL)
We do not assume prior knowledge of neuroscience (or psychology)
Sources
The part on TD is based on [Sutton and Barto, 2018], unless otherwise stated
The part on navigation is based on [Banino et al., 2018], unless otherwise stated
All images come from these two sources (adapted for the context)
Temporal Difference
Temporal Difference in Agents
Central idea in RL
Learn from experience (sampling, like Monte Carlo methods)
→ Does not need a model
Estimation updates are based on other estimates (bootstrapping, like Dynamic Programming)
→ Does not need to wait for the end of an episode to update (online learning)
Estimation of $v_\pi$ – TD(0)
$$V(S_t) \leftarrow V(S_t) + \alpha \left[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \right]$$
TD error
$$\delta_t = R_{t+1} + \gamma V(S_{t+1}) - V(S_t)$$
Sarsa
$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \right]$$
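A minimal Python sketch of the two updates above may help fix ideas; the tabular storage and the parameter values are illustrative choices, not taken from any particular source.

```python
# Hedged sketch of tabular TD(0) and Sarsa updates; alpha and gamma are
# illustrative. States and actions can be any hashable values.
from collections import defaultdict

alpha, gamma = 0.1, 0.9
V = defaultdict(float)   # state-value estimates V(s)
Q = defaultdict(float)   # action-value estimates Q(s, a), keyed by (s, a)

def td0_update(s, r, s_next):
    """One TD(0) step: move V(s) toward the bootstrapped target r + gamma*V(s')."""
    delta = r + gamma * V[s_next] - V[s]   # the TD error delta_t
    V[s] += alpha * delta
    return delta

def sarsa_update(s, a, r, s_next, a_next):
    """One Sarsa step: the same update applied to action values."""
    delta = r + gamma * Q[(s_next, a_next)] - Q[(s, a)]
    Q[(s, a)] += alpha * delta
    return delta
```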
Classical Conditioning (Pavlovian Conditioning) I
The physiologist Ivan Pavlov discovered that animals' innate reflexes to certain stimuli can also come to be triggered by other, initially unrelated stimuli
In his experiment, a dog receives food after the sound of a metronome
Initially, the dog salivates only in response to the sight of food
After some trials, the dog starts salivating also in response to the sound stimulus
Unconditioned stimulus (US): the natural trigger (food)
Unconditioned response (UR): the inborn reflex (salivation)
Conditioned stimulus (CS): the new predictive stimulus (the sound of the metronome)
Conditioned response (CR): the acquired response (salivation)
Classical Conditioning (Pavlovian Conditioning) II
The animal learns the predictive relationship between the CS and the US
so that it can anticipate the US and prepare for it, or protect itself, with a CR (which can differ from the UR and be more effective)
We consider only the prediction part, i.e., policy evaluation
Classical Conditioning (Pavlovian Conditioning) III
Blocking
The learning of a CR to a potential CS is blocked by another, previously learned CS
If you use a tone as a CS, and only afterwards add a light as a second CS, there will be no response to the light alone
Does conditioning depend only on simple temporal contiguity?
Higher-order conditioning
A previously learned CS acts as a US in conditioning another CS
As in the previous experiment, a dog is conditioned with a metronome sound
In the following trials, a black box is placed in the dog's line of vision before the metronome sound, but no food is given
The dog starts responding with salivation also to the sight of the black box, even though the box was never paired with the original US (food)
The Rescorla-Wagner Model
It explains blocking: the animal learns only when its expectations are violated (i.e., when it is surprised)
Each CS has an associative strength that represents its predictive reliability
Assume a US $Y$, a CS $A$, a CS $X$, and the compound CS $AX$, with respective associative strengths $V_A$, $V_X$, $V_{AX}$
Aggregate associative strength: $V_{AX} = V_A + V_X$
The associative strengths change over successive trials according to:
$$\Delta V_A = \alpha_A \beta_Y (R_Y - V_{AX}) \qquad \Delta V_X = \alpha_X \beta_Y (R_Y - V_{AX})$$
where $\alpha_A \beta_Y$ and $\alpha_X \beta_Y$ are the step-size parameters and $R_Y$ is the associative strength supported by the US $Y$
The associative strengths increase until they reach the supported level $R_Y$
If an animal is first conditioned with a CS $A$, adding a CS $X$ has almost no effect, since the prediction error has already been reduced to a low value – there is no surprise
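The blocking effect falls straight out of the update. Below is a hedged simulation sketch; the step size and the US-supported level are illustrative values.

```python
# Rescorla-Wagner update reproducing blocking. alpha_beta plays the role of
# the combined step size alpha_CS * beta_US; R_Y is the US-supported level.
alpha_beta = 0.3
R_Y = 1.0
V = {"A": 0.0, "X": 0.0}

# Phase 1: condition A alone until V_A approaches R_Y.
for _ in range(50):
    error = R_Y - V["A"]              # only A is present
    V["A"] += alpha_beta * error

# Phase 2: present the compound AX. The aggregate prediction V_A + V_X
# already matches R_Y, so the error is ~0 and X gains almost no strength.
for _ in range(50):
    error = R_Y - (V["A"] + V["X"])
    V["A"] += alpha_beta * error
    V["X"] += alpha_beta * error

print(V)   # V_A ~ 1.0, V_X ~ 0.0: learning about X is blocked
```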
TD Model
From Rescorla-Wagner to TD
The TD model is based on the Rescorla-Wagner one
A state is described by a feature vector: $x(s) = (x_1(s), x_2(s), \dots, x_n(s))$
In Rescorla-Wagner it is the set of CSs, while in TD it is more abstract
$t$ is a time step, not a trial
The aggregate associative strength is $\hat{v}(s, w) = w^\top x(s)$
$w$ is the associative strength vector
Like a value estimate
The new update formula
Associative strength vector update: $w_{t+1} = w_t + \alpha \delta_t x(S_t)$
TD error: $\delta_t = R_{t+1} + \gamma \hat{v}(S_{t+1}, w_t) - \hat{v}(S_t, w_t)$
With $\gamma = 0$, the model reduces to Rescorla-Wagner
For the origins of the model, see [Sutton and Barto, 1981] [Sutton and Barto, 1987]
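Written as code, the update is a few lines of linear TD(0); this is a minimal sketch with the feature map left abstract, not an implementation from the cited works.

```python
# One step of the TD model: linear value estimate v(s, w) = w . x(s),
# TD error, and update of the associative strength vector w.
import numpy as np

alpha, gamma = 0.1, 0.95   # illustrative values

def td_model_step(w, x_t, r_next, x_next):
    """Return the updated weights and the TD error delta_t."""
    delta = r_next + gamma * (w @ x_next) - (w @ x_t)
    return w + alpha * delta * x_t, delta
```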
Neuroscience Basics
Neuroscience
The study of the nervous system, its functions, and its changes over time
Neurons
Neurons are cells that process and transfer electrical and chemical signals
A neuron is said to fire when it generates a spike (an electrical pulse)
A neuron can reach many other neurons
The background activity of a neuron is its firing rate when not driven by the stimuli of an experiment
The phasic activity of a neuron is the activity caused by synaptic input
Synapses are the structures that mediate communication between neurons
A synapse releases a chemical neurotransmitter when its neuron fires
A neurotransmitter can inhibit or excite the postsynaptic neuron
Neuromodulators are neurotransmitters with additional effects that can alter the operation of synapses
Two assumptions
Since the firing rate of a neuron cannot be negative, the modeled neuron activity is $\delta_{t-1} + b_t$, where $b_t$ is the background firing rate
→ A negative error corresponds to a firing rate below $b_t$
The representation of the states allows keeping track of the time elapsed between the cue and the reward
There is a different signal for each time step → the state is time-dependent
The complete serial compound (CSC) representation is used
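Concretely, the CSC representation amounts to one indicator feature per time step after stimulus onset; a minimal sketch follows (the trial length T is an illustrative choice).

```python
# Complete serial compound (CSC) features: each time step since CS onset
# gets its own one-hot indicator, making the state time-dependent.
import numpy as np

T = 10   # time steps per trial (illustrative)

def csc(t):
    """Feature vector marking the time elapsed since CS onset."""
    x = np.zeros(T)
    x[t] = 1.0
    return x

# csc(0) is the CS-onset state; csc(T - 1) the state at the reward time
```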
The Reward Prediction Error Hypothesis
It states that “one of the functions of the phasic activity of dopamine-producing neurons in mammals is to deliver an error between an old and a new estimate of expected future reward to target areas throughout the brain.” [Sutton and Barto, 2018]
[Montague et al., 1996] showed that the concept of TD error aligns with the behavior of dopamine neurons
Dopamine
is a neuromodulator
that broadcasts reward prediction errors (not rewards, as was previously thought)
Experimental Support for the Hypothesis
Schultz’s group conducted a series of experiments supporting the view that the responses of dopamine neurons correspond to a TD error, and not to a simpler error like the Rescorla-Wagner one.
Experiment
Task 1: monkeys are trained to depress a lever after a light (the trigger cue) is illuminated, in order to obtain juice
Task 2: there are two levers, each with a light (the instruction cue) indicating which lever will produce juice. The instruction precedes the trigger, which must be awaited
Initially, dopamine neurons respond to the reward
During training, the dopamine response shifts to the earlier stimulus
A hallmark of TD learning
When the task is learned, the dopamine response decreases
When moving to task 2, the dopamine response increases
Correspondence I
Dopamine neurons respond to unpredicted rewards
$$\delta_{t-1} = R_t + V_t - V_{t-1} = R_t + 0 - 0 = R_t$$
We consider tabular TD(0)
Without discounting
⇒ The return to be predicted from each state is simply the final reward $R$
Correspondence II
Dopamine neurons respond to the earliest predictor
From any predicting state to another: $\delta_{t-1} = R_t + V_t - V_{t-1} = 0 + R - R = 0$
From the last predicting state to the end: $\delta_{t-1} = R_t + V_t - V_{t-1} = R + 0 - R = 0$
From any state to the earliest predicting state: $\delta_{t-1} = R_t + V_t - V_{t-1} = 0 + R - 0 = R$
The reward prediction spreads backward until convergence
The states preceding the earliest reward-predicting state are not reliable predictors, so their values stay low
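The backward spread is easy to reproduce with tabular TD(0) on a chain of predicting states; the chain length and step size below are illustrative.

```python
# TD(0) on a chain with a reward at the end (gamma = 1, no discounting):
# the TD error spikes at the reward on the first episode and migrates
# backward over episodes until all within-chain errors are ~0.
alpha = 0.5
n = 4                    # predicting states 0..3; reward on leaving state 3
V = [0.0] * (n + 1)      # V[n] is a terminal dummy with value 0

for episode in range(60):
    deltas = []
    for s in range(n):
        r = 1.0 if s == n - 1 else 0.0
        delta = r + V[s + 1] - V[s]
        V[s] += alpha * delta
        deltas.append(round(delta, 2))
    if episode in (0, 1, 2, 59):
        print(f"episode {episode:2d}: deltas = {deltas}")
```

In the experiments, a positive response persists on entering the earliest predictor because the states preceding it are unreliable, so their values remain near zero.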
Correspondence III
Dopamine neuron firing rates decrease below baseline if a reward does not occur at its expected time
When monkeys pull the wrong lever, they receive no juice
They must internally keep track of time somehow
$$\delta_{t-1} = R_t + V_t - V_{t-1} = 0 + 0 - R = -R$$
Further readings
Actor-critic in the brain
Dopamine mainly targets two parts of the striatum
Its effects depend on the properties of the target
It is conjectured that
the dorsal striatum acts as the actor
the ventral striatum acts as the critic
Other parallels
More parallels and topics are introduced in [Sutton and Barto, 2018]
Eligibility traces and hedonistic neurons
...
Agent Navigation Inspired by Neuroscience
Navigation using grid-like representations I
Entorhinal cortex basics [Rowland et al., 2016] [Banino et al., 2018]
The entorhinal cortex creates a neural representation of space
through functionally dedicated cell types
whose firing rates depend on the animal's position
Grid cells respond to the animal's location in the environment
There are also border cells, speed cells, and head direction cells
Grid cells have hexagonally arranged firing fields that tile the surface of the environment
A population of grid cells provides a unique representation (code) of each location
Grid cells perform path integration by taking inputs from speed cells and head direction cells, with the result expressed in place cells
Path integration is the ability to self-localize based on self-motion
Grid cells are considered critical for vector-based navigation, as they provide a Euclidean spatial metric supporting the calculation of goal-directed vectors
Vector-based navigation is the process of following direct routes to a remembered goal
Navigation using grid-like representations II
[Banino et al., 2018] developed a deep RL agent with mammal-like navigation abilities
They trained a recurrent network to perform path integration, leading to the emergence of representations resembling entorhinal cells
The network is then incorporated into a deep RL architecture to perform navigation
Results
Grid-cell representations endow agents with the ability to perform proficient vector-based navigation
The emergent representation provides the Euclidean spatial metric needed to
calculate goal-directed vectors and the relative positions of two points, by comparing the current vector code with the code of a remembered goal
locate goals in challenging, unfamiliar, and changeable environments
The results provide strong empirical support for
theories that see grid cells as critical for vector-based navigation (support that was previously missing)
the role of grid cells in providing a location code updated by self-motion cues
The performance of the agent surpassed that of comparison agents and of an expert human
The agent exhibited shortcut behaviors like those performed by mammals
Neural Network Performing Path Integration
The network is a long short-term memory (LSTM)
As happens in the brain:
It must update its estimate of location and head direction (by predicting place-cell and head-direction-cell activations)
It takes as input translational and angular velocities (with perturbations) and a visual input
It is trained with simulated place-cell and head-direction-cell activations during trajectories modeled on those of foraging rodents
This form of supervision is also present in rodent pups, where place and head direction cells guide the development of grid cells
The visual input is processed by a CNN
This mimics the correction performed by place cells based on environmental cues
It generates place-cell and head-direction-cell activations
Its output is silenced 95% of the time (to mimic the imperfect observations available to behaving animals)
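For orientation, here is a hedged PyTorch-style sketch mirroring the pipeline above (an LSTM over velocity inputs, a linear "grid" layer with dropout, and place and head-direction readouts); every layer size and name is an assumption, not the paper's exact architecture.

```python
# Illustrative path-integration network: velocities -> LSTM -> linear layer
# with dropout (where grid-like units emerged in the paper) -> readouts
# trained against simulated place-cell and head-direction-cell activations.
import torch
import torch.nn as nn

class PathIntegrator(nn.Module):
    def __init__(self, n_place=256, n_hd=12, n_grid=512):
        super().__init__()
        self.lstm = nn.LSTM(input_size=3, hidden_size=128, batch_first=True)
        self.grid = nn.Linear(128, n_grid)   # the "grid code" layer
        self.drop = nn.Dropout(p=0.5)        # dropout was essential for grid cells
        self.place_head = nn.Linear(n_grid, n_place)
        self.hd_head = nn.Linear(n_grid, n_hd)

    def forward(self, vel):
        # vel: (batch, time, 3) = translational speed, sin/cos of angular velocity
        h, _ = self.lstm(vel)
        g = self.drop(self.grid(h))          # the grid code passed to the RL agent
        return self.place_head(g), self.hd_head(g), g

place_logits, hd_logits, grid_code = PathIntegrator()(torch.randn(8, 100, 3))
```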
Grid-Like Representations
The last layer is a linear layer with dropout
Individual units in the linear layer developed stable spatial activity profiles similar to those of neurons within the entorhinal cortex
Grid-like representations did not emerge without dropout regularization
Dropout is also present in the brain – it is an inductive bias [Hassabis et al., 2017]
Neural Network Performing Vector-Based Navigation
Another LSTM controls the agent, taking as input:
The current grid code (a grid code is the activity of the linear layer of the previous network)
The goal grid code, once the goal has been reached for the first time (one-shot learning)
A preprocessed visual cue, the last action, and the current reward
The first output is a discrete action (the actor)
The second output is the value estimate (the critic)
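A hedged sketch of this controller follows: an LSTM over the concatenated inputs, with actor and critic heads. All dimensions and the action count are illustrative assumptions.

```python
# Illustrative navigation controller with actor (policy logits) and critic
# (value) heads, as in the actor-critic setup described above.
import torch
import torch.nn as nn

class NavigationAgent(nn.Module):
    def __init__(self, grid_dim=512, visual_dim=64, n_actions=6):
        super().__init__()
        in_dim = 2 * grid_dim + visual_dim + n_actions + 1   # +1 for reward
        self.lstm = nn.LSTM(input_size=in_dim, hidden_size=256, batch_first=True)
        self.actor = nn.Linear(256, n_actions)   # logits over discrete actions
        self.critic = nn.Linear(256, 1)          # state-value estimate

    def forward(self, grid_code, goal_code, visual, prev_action, reward, state=None):
        x = torch.cat([grid_code, goal_code, visual, prev_action, reward], dim=-1)
        h, state = self.lstm(x, state)
        return self.actor(h), self.critic(h), state
```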
Navigation Experiments
A goal grid code provides sufficient information to navigate to an arbitrary location
The authors substituted the goal grid code with a 'fake' one sampled randomly
The agent followed a direct path to the newly specified location, circling around the absent goal
Like rodents in probe trials of the Morris water maze
Grid cells are crucial: silencing most of the grid-like units (simulating a targeted lesion), rather than other units, has a dramatic effect on performance
Only the grid-cell agent was able to exploit shortcuts
At the beginning of an episode, the agent explores to find an unmarked goal.
When the agent reaches the goal, it is teleported to a new random location. It then exploits the goal code until the episode ends (after a fixed number of steps)
The maze's layout, textures, landmarks, and goal change at each episode
The state of the doors changes randomly during an episode, at every new run
Conclusions and Suggestions
Deepening I
Takeaways
Classical conditioning was essential to formulating the core RL update rule
Vice versa, RL was crucial to uncovering the functioning of the brain's reward system
Neuroscience continues to inspire novel and more powerful algorithms, like the navigation one
More on RL and Neuroscience
Meta-RL [Hassabis et al., 2017]
RL optimizes an RNN from which a second RL algorithm emerges, faster than the original
It may be implemented by the recurrent activity of the prefrontal cortex
Study of addiction [Sutton and Barto, 2018]
RL theory could be used to understand the neural basis of drug abuse
Deepening II
Where to start (Neuroscience for AI)
Building machines that learn and think like people [Lake et al., 2017]
Recalls the history of neural inspiration
Proposes cognitive challenges for agents
Defines the essential ingredients for building human-like intelligence
Neuroscience-inspired artificial intelligence [Hassabis et al., 2017]
Highlights the importance of the human brain as an inspiration for AI
Analyzes past and current influences of neuroscience on ML techniques
Underlines key areas for bridging the gap between machine and human-level intelligence
Using neuroscience to develop artificial intelligence [Ullman, 2019]
Ullman calls into question current highly reductionist approaches
We should use knowledge about biological neurons – their structure, types, and connectivity – to guide the building of brain-like network models
Intelligence likely lies in both experience and preexisting structures (inductive biases)
Where to start (AI for Neuroscience)
A deep learning framework for neuroscience [Richards et al., 2019]
Neuroscientists need an approach for dealing with large experimental datasets
The three components of an ANN – (i) objective functions, (ii) learning rules, and (iii) architectures – could be used to produce compact (tractable) brain models
Deepening III
Practical Insight
Prior knowledge and inbuilt capacities (inductive biases) are crucial for fast learning and for making inferences [Ullman, 2019][Richards et al., 2019]
Which neuron features – type, connectivity, structure – could be used to improve ANNs?
A recent report claims grid cells could be critical also for abstract reasoning and concept representation [Constantinescu et al., 2016]
Could ANNs featuring grid-like regularities be used to process abstract concepts?
References
Banino, A., Barry, C., Uria, B., Blundell, C., Lillicrap, T., Mirowski, P., Pritzel, A.,
Chadwick, M. J., Degris, T., Modayil, J., Wayne, G., Soyer, H., Viola, F., Zhang, B.,
Goroshin, R., Rabinowitz, N., Pascanu, R., Beattie, C., Petersen, S., Sadik, A.,
Gaffney, S., King, H., Kavukcuoglu, K., Hassabis, D., Hadsell, R., and Kumaran, D.
(2018).
Vector-based navigation using grid-like representations in artificial agents.
Nature, 557(7705):429–433.
Constantinescu, A. O., O’Reilly, J. X., and Behrens, T. E. J. (2016).
Organizing conceptual knowledge in humans with a gridlike code.
Science, 352(6292):1464–1468.
Ford, M. (2018).
Architects of Intelligence: The truth about AI from the people building it.
Packt Publishing Ltd.
Hassabis, D., Kumaran, D., Summerfield, C., and Botvinick, M. (2017).
Neuroscience-inspired artificial intelligence.
Neuron, 95(2):245–258.
Lake, B. M., Ullman, T. D., Tenenbaum, J. B., and Gershman, S. J. (2017).
Building machines that learn and think like people.
Behavioral and Brain Sciences, 40:e253.
Montague, P., Dayan, P., and Sejnowski, T. (1996).
A framework for mesencephalic dopamine systems based on predictive Hebbian learning.
The Journal of Neuroscience, 16:1936–1947.
Pillow, J. and Sahani, M. (2019).
Editorial overview: Machine learning, big data, and neuroscience.
Current Opinion in Neurobiology, 55:iii–iv.
Richards, B. A., Lillicrap, T. P., Beaudoin, P., Bengio, Y., Bogacz, R., Christensen,
A., Clopath, C., Costa, R. P., de Berker, A., Ganguli, S., Gillon, C. J., Hafner, D.,
Kepecs, A., Kriegeskorte, N., Latham, P., Lindsay, G. W., Miller, K. D., Naud, R.,
Pack, C. C., Poirazi, P., Roelfsema, P., Sacramento, J., Saxe, A., Scellier, B.,
Schapiro, A. C., Senn, W., Wayne, G., Yamins, D., Zenke, F., Zylberberg, J.,
Therien, D., and Kording, K. P. (2019).
A deep learning framework for neuroscience.
Nature Neuroscience, 22(11):1761–1770.
Rowland, D. C., Roudi, Y., Moser, M.-B., and Moser, E. I. (2016).
Ten years of grid cells.
Annual Review of Neuroscience, 39(1):19–40.
Sutton, R. and Barto, A. (1981).
Toward a modern theory of adaptive networks: Expectation and prediction.
Psychological Review, 88:135–170.
Sutton, R. S. and Barto, A. G. (1987).
A temporal-difference model of classical conditioning.
In Proceedings of the Ninth Annual Conference of the Cognitive Science Society, pages 355–378, Seattle, WA.
Sutton, R. S. and Barto, A. G. (2018).
Reinforcement Learning: An Introduction.
The MIT Press.
Ullman, S. (2019).
Using neuroscience to develop artificial intelligence.
Science, 363(6428):692–693.
Michael Bosello Reinforcement Learning and Neuroscience Intelligent Robotic Systems – Exam 36 / 37
Reinforcement Learning and Neuroscience
Michael Bosello
Universit`a di Bologna – Department of Computer Science and Engineering, Cesena, Italy
Intelligent Robotic Systems – Exam
Michael Bosello Reinforcement Learning and Neuroscience Intelligent Robotic Systems – Exam 37 / 37

Contenu connexe

Tendances

abstrakty přijatých příspěvků.doc
abstrakty přijatých příspěvků.docabstrakty přijatých příspěvků.doc
abstrakty přijatých příspěvků.docbutest
 
Non-parametric regressions & Neural Networks
Non-parametric regressions & Neural NetworksNon-parametric regressions & Neural Networks
Non-parametric regressions & Neural NetworksGiuseppe Broccolo
 
Harnessing Deep Neural Networks with Logic Rules
Harnessing Deep Neural Networks with Logic RulesHarnessing Deep Neural Networks with Logic Rules
Harnessing Deep Neural Networks with Logic RulesSho Takase
 
MLconf NYC Animashree Anandkumar
MLconf NYC Animashree AnandkumarMLconf NYC Animashree Anandkumar
MLconf NYC Animashree AnandkumarMLconf
 
Bayesian Nonparametric Topic Modeling Hierarchical Dirichlet Processes
Bayesian Nonparametric Topic Modeling Hierarchical Dirichlet ProcessesBayesian Nonparametric Topic Modeling Hierarchical Dirichlet Processes
Bayesian Nonparametric Topic Modeling Hierarchical Dirichlet ProcessesJinYeong Bak
 
Rethinking Perturbations in Encoder-Decoders for Fast Training
Rethinking Perturbations in Encoder-Decoders for Fast TrainingRethinking Perturbations in Encoder-Decoders for Fast Training
Rethinking Perturbations in Encoder-Decoders for Fast TrainingSho Takase
 
Visualising Quantum Physics using Mathematica
Visualising Quantum Physics using MathematicaVisualising Quantum Physics using Mathematica
Visualising Quantum Physics using MathematicaAndreas Dewanto
 
FUZZY ROUGH INFORMATION MEASURES AND THEIR APPLICATIONS
FUZZY ROUGH INFORMATION MEASURES AND THEIR APPLICATIONSFUZZY ROUGH INFORMATION MEASURES AND THEIR APPLICATIONS
FUZZY ROUGH INFORMATION MEASURES AND THEIR APPLICATIONSijcsity
 
Extension principle
Extension principleExtension principle
Extension principleSavo Delić
 
Jarrar: Introduction to logic and Logic Agents
Jarrar: Introduction to logic and Logic Agents Jarrar: Introduction to logic and Logic Agents
Jarrar: Introduction to logic and Logic Agents Mustafa Jarrar
 
Jarrar: Description Logic
Jarrar: Description LogicJarrar: Description Logic
Jarrar: Description LogicMustafa Jarrar
 

Tendances (18)

Artificial Intelligence - Reasoning in Uncertain Situations
Artificial Intelligence - Reasoning in Uncertain SituationsArtificial Intelligence - Reasoning in Uncertain Situations
Artificial Intelligence - Reasoning in Uncertain Situations
 
Chapter 9
Chapter 9Chapter 9
Chapter 9
 
abstrakty přijatých příspěvků.doc
abstrakty přijatých příspěvků.docabstrakty přijatých příspěvků.doc
abstrakty přijatých příspěvků.doc
 
Non-parametric regressions & Neural Networks
Non-parametric regressions & Neural NetworksNon-parametric regressions & Neural Networks
Non-parametric regressions & Neural Networks
 
Harnessing Deep Neural Networks with Logic Rules
Harnessing Deep Neural Networks with Logic RulesHarnessing Deep Neural Networks with Logic Rules
Harnessing Deep Neural Networks with Logic Rules
 
MLconf NYC Animashree Anandkumar
MLconf NYC Animashree AnandkumarMLconf NYC Animashree Anandkumar
MLconf NYC Animashree Anandkumar
 
Bayesian Nonparametric Topic Modeling Hierarchical Dirichlet Processes
Bayesian Nonparametric Topic Modeling Hierarchical Dirichlet ProcessesBayesian Nonparametric Topic Modeling Hierarchical Dirichlet Processes
Bayesian Nonparametric Topic Modeling Hierarchical Dirichlet Processes
 
Rethinking Perturbations in Encoder-Decoders for Fast Training
Rethinking Perturbations in Encoder-Decoders for Fast TrainingRethinking Perturbations in Encoder-Decoders for Fast Training
Rethinking Perturbations in Encoder-Decoders for Fast Training
 
Visualising Quantum Physics using Mathematica
Visualising Quantum Physics using MathematicaVisualising Quantum Physics using Mathematica
Visualising Quantum Physics using Mathematica
 
Lesson 28
Lesson 28Lesson 28
Lesson 28
 
FUZZY ROUGH INFORMATION MEASURES AND THEIR APPLICATIONS
FUZZY ROUGH INFORMATION MEASURES AND THEIR APPLICATIONSFUZZY ROUGH INFORMATION MEASURES AND THEIR APPLICATIONS
FUZZY ROUGH INFORMATION MEASURES AND THEIR APPLICATIONS
 
Fuzzy hypersoft sets and its weightage operator for decision making
Fuzzy hypersoft sets and its weightage operator for decision makingFuzzy hypersoft sets and its weightage operator for decision making
Fuzzy hypersoft sets and its weightage operator for decision making
 
AI applications in education, Pascal Zoleko, Flexudy
AI applications in education, Pascal Zoleko, FlexudyAI applications in education, Pascal Zoleko, Flexudy
AI applications in education, Pascal Zoleko, Flexudy
 
Bayesnetwork
BayesnetworkBayesnetwork
Bayesnetwork
 
Extension principle
Extension principleExtension principle
Extension principle
 
Chapter 5 (final)
Chapter 5 (final)Chapter 5 (final)
Chapter 5 (final)
 
Jarrar: Introduction to logic and Logic Agents
Jarrar: Introduction to logic and Logic Agents Jarrar: Introduction to logic and Logic Agents
Jarrar: Introduction to logic and Logic Agents
 
Jarrar: Description Logic
Jarrar: Description LogicJarrar: Description Logic
Jarrar: Description Logic
 

Similaire à Reinforcement Learning and Neuroscience

Building better models in cognitive neuroscience. Part 1: Theory
Building better models in cognitive neuroscience. Part 1: TheoryBuilding better models in cognitive neuroscience. Part 1: Theory
Building better models in cognitive neuroscience. Part 1: TheoryBrian Spiering
 
32_Nov07_MachineLear..
32_Nov07_MachineLear..32_Nov07_MachineLear..
32_Nov07_MachineLear..butest
 
Lecture 9 slides: Machine learning for Protein Structure ...
Lecture 9 slides: Machine learning for Protein Structure ...Lecture 9 slides: Machine learning for Protein Structure ...
Lecture 9 slides: Machine learning for Protein Structure ...butest
 
Neuroeconomic Class
Neuroeconomic ClassNeuroeconomic Class
Neuroeconomic Classtkvaran
 
Analogy, Causality, and Discovery in Science: The engines of human thought
Analogy, Causality, and Discovery in Science: The engines of human thoughtAnalogy, Causality, and Discovery in Science: The engines of human thought
Analogy, Causality, and Discovery in Science: The engines of human thoughtCITE
 
Cognitive Science Unit 4
Cognitive Science Unit 4Cognitive Science Unit 4
Cognitive Science Unit 4CSITSansar
 
Particle swarm optimization
Particle swarm optimizationParticle swarm optimization
Particle swarm optimizationMahesh Tibrewal
 
soft computing BTU MCA 3rd SEM unit 1 .pptx
soft computing BTU MCA 3rd SEM unit 1 .pptxsoft computing BTU MCA 3rd SEM unit 1 .pptx
soft computing BTU MCA 3rd SEM unit 1 .pptxnaveen356604
 
What Deep Learning Means for Artificial Intelligence
What Deep Learning Means for Artificial IntelligenceWhat Deep Learning Means for Artificial Intelligence
What Deep Learning Means for Artificial IntelligenceJonathan Mugan
 
What Deep Learning Means for Artificial Intelligence
What Deep Learning Means for Artificial IntelligenceWhat Deep Learning Means for Artificial Intelligence
What Deep Learning Means for Artificial IntelligenceJonathan Mugan
 
脳とAIの接点から何を学びうるのか@第5回WBAシンポジウム: 銅谷賢治
脳とAIの接点から何を学びうるのか@第5回WBAシンポジウム: 銅谷賢治脳とAIの接点から何を学びうるのか@第5回WBAシンポジウム: 銅谷賢治
脳とAIの接点から何を学びうるのか@第5回WBAシンポジウム: 銅谷賢治The Whole Brain Architecture Initiative
 
Continuous Unsupervised Training of Deep Architectures
Continuous Unsupervised Training of Deep ArchitecturesContinuous Unsupervised Training of Deep Architectures
Continuous Unsupervised Training of Deep ArchitecturesVincenzo Lomonaco
 
Math viva [www.onlinebcs.com]
Math viva [www.onlinebcs.com]Math viva [www.onlinebcs.com]
Math viva [www.onlinebcs.com]Itmona
 
Learning about the brain: Neuroimaging and Beyond
Learning about the brain: Neuroimaging and BeyondLearning about the brain: Neuroimaging and Beyond
Learning about the brain: Neuroimaging and BeyondIrina Rish
 
Cornell Pbsb 20090126 Nets
Cornell Pbsb 20090126 NetsCornell Pbsb 20090126 Nets
Cornell Pbsb 20090126 NetsMark Gerstein
 

Similaire à Reinforcement Learning and Neuroscience (20)

Building better models in cognitive neuroscience. Part 1: Theory
Building better models in cognitive neuroscience. Part 1: TheoryBuilding better models in cognitive neuroscience. Part 1: Theory
Building better models in cognitive neuroscience. Part 1: Theory
 
32_Nov07_MachineLear..
32_Nov07_MachineLear..32_Nov07_MachineLear..
32_Nov07_MachineLear..
 
Lecture 9 slides: Machine learning for Protein Structure ...
Lecture 9 slides: Machine learning for Protein Structure ...Lecture 9 slides: Machine learning for Protein Structure ...
Lecture 9 slides: Machine learning for Protein Structure ...
 
Neuroeconomic Class
Neuroeconomic ClassNeuroeconomic Class
Neuroeconomic Class
 
Analogy, Causality, and Discovery in Science: The engines of human thought
Analogy, Causality, and Discovery in Science: The engines of human thoughtAnalogy, Causality, and Discovery in Science: The engines of human thought
Analogy, Causality, and Discovery in Science: The engines of human thought
 
Lec4
Lec4Lec4
Lec4
 
6238578.ppt
6238578.ppt6238578.ppt
6238578.ppt
 
Cognitive Science Unit 4
Cognitive Science Unit 4Cognitive Science Unit 4
Cognitive Science Unit 4
 
Lab 1 intro
Lab 1 introLab 1 intro
Lab 1 intro
 
Particle swarm optimization
Particle swarm optimizationParticle swarm optimization
Particle swarm optimization
 
soft computing BTU MCA 3rd SEM unit 1 .pptx
soft computing BTU MCA 3rd SEM unit 1 .pptxsoft computing BTU MCA 3rd SEM unit 1 .pptx
soft computing BTU MCA 3rd SEM unit 1 .pptx
 
PPT
PPTPPT
PPT
 
What Deep Learning Means for Artificial Intelligence
What Deep Learning Means for Artificial IntelligenceWhat Deep Learning Means for Artificial Intelligence
What Deep Learning Means for Artificial Intelligence
 
What Deep Learning Means for Artificial Intelligence
What Deep Learning Means for Artificial IntelligenceWhat Deep Learning Means for Artificial Intelligence
What Deep Learning Means for Artificial Intelligence
 
Soft computing BY:- Dr. Rakesh Kumar Maurya
Soft computing BY:- Dr. Rakesh Kumar MauryaSoft computing BY:- Dr. Rakesh Kumar Maurya
Soft computing BY:- Dr. Rakesh Kumar Maurya
 
脳とAIの接点から何を学びうるのか@第5回WBAシンポジウム: 銅谷賢治
脳とAIの接点から何を学びうるのか@第5回WBAシンポジウム: 銅谷賢治脳とAIの接点から何を学びうるのか@第5回WBAシンポジウム: 銅谷賢治
脳とAIの接点から何を学びうるのか@第5回WBAシンポジウム: 銅谷賢治
 
Continuous Unsupervised Training of Deep Architectures
Continuous Unsupervised Training of Deep ArchitecturesContinuous Unsupervised Training of Deep Architectures
Continuous Unsupervised Training of Deep Architectures
 
Math viva [www.onlinebcs.com]
Math viva [www.onlinebcs.com]Math viva [www.onlinebcs.com]
Math viva [www.onlinebcs.com]
 
Learning about the brain: Neuroimaging and Beyond
Learning about the brain: Neuroimaging and BeyondLearning about the brain: Neuroimaging and Beyond
Learning about the brain: Neuroimaging and Beyond
 
Cornell Pbsb 20090126 Nets
Cornell Pbsb 20090126 NetsCornell Pbsb 20090126 Nets
Cornell Pbsb 20090126 Nets
 

Dernier

Cyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxCyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxSilpa
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Silpa
 
Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Silpa
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...Monika Rani
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.Silpa
 
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRLGwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRLkantirani197
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professormuralinath2
 
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body Areesha Ahmad
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Silpa
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.Silpa
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspectsmuralinath2
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSérgio Sacani
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Silpa
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxRenuJangid3
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learninglevieagacer
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxSuji236384
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxMohamedFarag457087
 

Dernier (20)

Cyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxCyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptx
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
 
Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.
 
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRLGwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 

Reinforcement Learning and Neuroscience

  • 1. Reinforcement Learning and Neuroscience Michael Bosello Universit`a di Bologna – Department of Computer Science and Engineering, Cesena, Italy Intelligent Robotic Systems – Exam Michael Bosello Reinforcement Learning and Neuroscience Intelligent Robotic Systems – Exam 1 / 37
  • 2. Outline 1 Introduction 2 Temporal Difference Computational Temporal Difference Temporal Difference in the Brain Classical Conditioning TD Model The Reward Prediction Error Hypothesis TD Error / Dopamine Correspondences 3 Agent navigation Inspired by Neuroscience Navigation using grid-like representation The Network architecture Navigation Experiments 4 Conclusions and Suggestions Michael Bosello Reinforcement Learning and Neuroscience Intelligent Robotic Systems – Exam 2 / 37
  • 3. Introduction Next in Line... 1 Introduction 2 Temporal Difference Computational Temporal Difference Temporal Difference in the Brain Classical Conditioning TD Model The Reward Prediction Error Hypothesis TD Error / Dopamine Correspondences 3 Agent navigation Inspired by Neuroscience Navigation using grid-like representation The Network architecture Navigation Experiments 4 Conclusions and Suggestions Michael Bosello Reinforcement Learning and Neuroscience Intelligent Robotic Systems – Exam 3 / 37
  • 4. Introduction Machine Learning and Neuroscience Intertwined history [Hassabis et al., 2017][Pillow and Sahani, 2019] Since the dawn of artificial intelligence (AI), neuroscience and AI have virtuously influenced each other Some of the most important tools that are available to machine learning (ML) originate from neuroscientific metaphors Reinforcement Learning (RL) Deep Learning (DL) ML metaphors have guided the exploration of neural functions by providing functional metaphors useful to formulate hypotheses Why it is important now ML Leading researchers have particularly emphasized the opportunity of drawing inspiration from neuroscience for the next generation of AI [Hassabis et al., 2017][Lake et al., 2017][Ford, 2018][Ullman, 2019] Neuroscience AI offers ideas about the possible mechanism in the brain and means for formalizing concepts. Also, the use of ML to analyze neuroimaging datasets to find patterns [Hassabis et al., 2017][Pillow and Sahani, 2019] Michael Bosello Reinforcement Learning and Neuroscience Intelligent Robotic Systems – Exam 4 / 37
  • 5. Introduction Reinforcement Learning and Neuroscience This topic is very wide. We will focus on two points: RL [Sutton and Barto, 2018] The most remarkable virtuous circle of ML and neuroscience Trial and error Inspired by research into animal learning (Law of effect) Temporal Difference (TD) learning inspired by classical conditioning Evidence that the brain implements a form of TD learning Core parallel: TD learning / dopamine Why it is useful now RL Findings in neuroscience can provide a direction to more effective agents [Hassabis et al., 2017] e.g. agent navigation inspired by the entorhinal cortex [Banino et al., 2018] Neuroscience RL provides insight for psychiatric diseases [Pillow and Sahani, 2019][Sutton and Barto, 2018] Suggestions for further reading are provided Michael Bosello Reinforcement Learning and Neuroscience Intelligent Robotic Systems – Exam 5 / 37
  • 6. Introduction Disclaimer Background We assume prior knowledge of RL (and DL) We don’t assume prior knowledge of neuroscience (and psychology) Sources The part on TD is based on [Sutton and Barto, 2018], unless otherwise stated The part on navigation is based on [Banino et al., 2018], unless otherwise stated All the images come from these two sources (but adapted for the context) Michael Bosello Reinforcement Learning and Neuroscience Intelligent Robotic Systems – Exam 6 / 37
  • 7. Temporal Difference Next in Line... 1 Introduction 2 Temporal Difference Computational Temporal Difference Temporal Difference in the Brain Classical Conditioning TD Model The Reward Prediction Error Hypothesis TD Error / Dopamine Correspondences 3 Agent navigation Inspired by Neuroscience Navigation using grid-like representation The Network architecture Navigation Experiments 4 Conclusions and Suggestions Michael Bosello Reinforcement Learning and Neuroscience Intelligent Robotic Systems – Exam 7 / 37
  • 8. Temporal Difference Computational Temporal Difference Temporal Difference in Agents Central idea in RL Learn from experience (sampling like Monte Carlo methods) → Doesn’t need a model Estimation updates are based on other estimations (bootstrapping like Dynamic Programming) → Doesn’t need to wait for the end of an episode to update (online learning) Estimation of vπ – TD(0) V (St) ← V (St) + α[Rt+1 + γV (St+1) − V (St)] TD error δt = Rt+1 + γV (St+1) − V (St) Sarsa Q(St, At) ← Q(St, At) + α[Rt+1 + γQ(St+1, At+1) − Q(St, At)] Michael Bosello Reinforcement Learning and Neuroscience Intelligent Robotic Systems – Exam 8 / 37
  • 9. Temporal Difference Temporal Difference in the Brain Classical Conditioning (Pavlovian Conditioning) I The physiologist Ivan Pavlov discovered that animals’ innate reflexes to certain stimuli can come to be triggered also by other unrelated stimuli In his experiment, a dog receives food after the sound of a metronome Initially, the dog produces more saliva only in response to the sight of food After some trials, the dog starts salivating also in response to the sound stimulus Unconditioned stimulus (US) The natural trigger (food) Unconditioned response (UR) The unborn reflex (salivation) Conditioned stimulus (CS) The new predictive stimulus (sound of the metronome) Conditioned response (CR) The acquired response (salivation) Michael Bosello Reinforcement Learning and Neuroscience Intelligent Robotic Systems – Exam 9 / 37
  • 10. Temporal Difference Temporal Difference in the Brain Classical Conditioning (Pavlovian Conditioning) II The animal learns the predictive relationship between the CS and the US So that the animal can anticipate the US and prepare or protect himself with a CR (which can differ from the UR and be more effective) We are considering only the prediction part i.e. policy evaluation Michael Bosello Reinforcement Learning and Neuroscience Intelligent Robotic Systems – Exam 10 / 37
  • 11. Temporal Difference Temporal Difference in the Brain Classical Conditioning (Pavlovian Conditioning) III Blocking When the learning of one CR to a potential CS is blocked by another CS If you use a tone as a CS, and after that, you add light as a CS, there will be no response to the light alone Do conditioning depends only on simple temporal contiguity? Higher-order conditioning When a learned CS acts as a US in conditioning another CS Like in the previous experiment, a dog is conditioned with a metronome sound In the following trials, a black box is placed in the dog’s line of vision before the metronome sound, but no food is given The dog starts responding with salivation also to the sight of the black box, even though it has never anticipated the original US (food) Michael Bosello Reinforcement Learning and Neuroscience Intelligent Robotic Systems – Exam 11 / 37
  • 12. Temporal Difference Temporal Difference in the Brain The Rescorla-Wagner Model It explains blocking: the animal learns only when his expectations are violated (surprising) Each CS has an associative strength that represent its reliability Let’s assume we have a US Y , a CS A, a CS X and the compound CS AX with the respectively associative strengths VA, VX , VAX Aggregate associative strength Vax = VA + VX The associative strengths change over successive trials according to: ∆VA = αAβY (RY − VAX ) ∆VX = αX βY (RY − VAX ) Where αAβY αX βY are the step-size parameters, and RY is the associative strength supported by the US Y The associative strengths increase until they reach the supported level RY If an animal is conditioned with a CS A, adding a CS X will have almost no effect since the prediction error is already reduced to a low value – there is no surprise – Michael Bosello Reinforcement Learning and Neuroscience Intelligent Robotic Systems – Exam 12 / 37
  • 13. Temporal Difference Temporal Difference in the Brain TD Model From Rescorla-Wagner to TD The TD model is based on the Rescorla-Wagner one A state is defined as a vector: x(s) = (x1(s), x2(s), . . . , xn(s)) In Rescorla-Wagner it is the set of CS meanwhile in TD it is more abstract t is a time step, not a trial The aggregate associative strength is: ˆv(s, w) = wT x(s) w is the associative strength vector Like a value estimate The new update formula Associative strength vector update wt+1 = wt + αδtx(St) TD error δt = Rt+1 + γˆv(St+1, wt) − ˆv(St, wt) With γ = 0 you return to Rescorla-Wagner To check the origins of the model: [Sutton and Barto, 1981] [Sutton and Barto, 1987] Michael Bosello Reinforcement Learning and Neuroscience Intelligent Robotic Systems – Exam 13 / 37
  • 14. Temporal Difference Temporal Difference in the Brain Neuroscience Basics Neuroscience Is the study of the nervous system, its functions, and its changes over time Neurons Neurons are cells that process and transfer electrical and chemical signals A neuron is said to fire when it generates a spike (electrical pulses) A neuron can reach many other neurons The background activity of a neuron is its firing rate not related to the stimuli of an experiment The phasic activity of a neuron is caused by synaptic input Synapses are structures that mediate neurons communication A synapse can produce a chemical neurotransmitter when the neuron fires A neurotransmitter can inhibit or excite the postsynaptic neuron Neuromodulators are neurotransmitters having additional effects that can alter the operation of synapses Michael Bosello Reinforcement Learning and Neuroscience Intelligent Robotic Systems – Exam 14 / 37
  • 15. Temporal Difference Temporal Difference in the Brain Two assumptions Since the firing rate of a neuron can’t be negative, the neuron activity is δt−1 + bt, where bt is the background firing rate. → A negative error corresponds to a firing rate below b The representation of the states allow keeping track of time passed between the cue and the reward There is a different signal for each time step → the state is time-dependent It is used the complete serial compound representation Michael Bosello Reinforcement Learning and Neuroscience Intelligent Robotic Systems – Exam 15 / 37
  • 16. Temporal Difference Temporal Difference in the Brain The Reward Prediction Error Hypothesis States that “one of the functions of the phasic activity of dopamine-producing neurons in mammals is to deliver an error between an old and a new estimate of expected future reward to target areas throughout the brain.” [Sutton and Barto, 2018] [Montague et al., 1996] showed that the concept of TD error aligns with the feature of dopamine neurons Dopamine is a neuromodulator that broadcasts reward prediction errors (not rewards as was previously thought) Michael Bosello Reinforcement Learning and Neuroscience Intelligent Robotic Systems – Exam 16 / 37
• 17. Temporal Difference / Temporal Difference in the Brain – Experimental Support for the Hypothesis
- Schultz's group conducted a series of experiments supporting the idea that the responses of dopamine neurons actually correspond to a TD error, and not to a simpler error like the Rescorla-Wagner one
- Experiment
  - Task 1: monkeys are trained to depress a lever after a light (the trigger cue) is illuminated, in order to obtain juice
  - Task 2: there are two levers, each with a light (the instruction cue) indicating which lever will produce juice; the instruction precedes the trigger, which must be awaited
- Results
  - Initially, dopamine neurons respond to the reward
  - During training, the dopamine response shifts to the earlier stimulus – a landmark of TD learning
  - When the task is learned, the dopamine response decreases
  - When moving to task 2, the dopamine response increases again
• 18. Temporal Difference / Temporal Difference in the Brain – Correspondence I
- Dopamine neurons respond to unpredicted rewards:
    δ_{t−1} = R_t + V_t − V_{t−1} = R_t + 0 − 0 = R_t
- We consider tabular TD(0) without discounting ⇒ the return to be predicted is simply the final reward R for each state
- (a numeric sketch covering all three correspondences follows Correspondence III)
• 19. Temporal Difference / Temporal Difference in the Brain – Correspondence II
- Dopamine neurons respond to the earliest predictor
  - From any predicting state to another one: δ_{t−1} = R_t + V_t − V_{t−1} = 0 + R − R = 0
  - From the last predicting state to the end: δ_{t−1} = R_t + V_t − V_{t−1} = R + 0 − R = 0
  - From any state to the earliest predicting state: δ_{t−1} = R_t + V_t − V_{t−1} = 0 + R − 0 = R
- The reward prediction spreads backward until convergence
- The states preceding the earliest reward-predicting state are not reliable predictors
• 20. Temporal Difference / Temporal Difference in the Brain – Correspondence III
- The firing rate of dopamine neurons decreases below baseline if a reward does not occur at its expected time
  - When monkeys pull the wrong lever, they receive no juice
  - They somehow keep track of time internally
    δ_{t−1} = R_t + V_t − V_{t−1} = 0 + 0 − R = −R
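The three correspondences can be reproduced with a few lines of tabular TD(0) without discounting. The chain length, step size, and trial counts below are illustrative choices; the unpredictive pre-cue baseline (value fixed at 0) stands for the unreliable states preceding the earliest predictor:

```python
n_states = 5                  # state 0 = earliest predictor; reward follows state 4
alpha, R = 0.1, 1.0
V = [0.0] * n_states

def run_trial(reward=R):
    """One trial; returns the TD errors delta_{t-1} = R_t + V_t - V_{t-1}."""
    # cue onset: transition from the unpredictive baseline (value 0) into state 0
    deltas = [0.0 + V[0] - 0.0]
    for t in range(n_states):
        v_next = V[t + 1] if t + 1 < n_states else 0.0   # value is 0 after the trial
        r = reward if t == n_states - 1 else 0.0         # reward on the final transition
        delta = r + v_next - V[t]
        V[t] += alpha * delta
        deltas.append(delta)
    return deltas

print(run_trial())             # early training: error (burst) only at the reward (I)
for _ in range(5000):
    run_trial()
print(run_trial())             # after learning: the burst has moved to cue onset (II)
print(run_trial(reward=0.0))   # omitted reward: negative error, a dip below baseline (III)
```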
• 21. Temporal Difference / Temporal Difference in the Brain – Further Readings
- Actor-critic in the brain
  - Dopamine mainly targets two parts of the striatum
  - Its effects depend on the properties of the target
  - It is hypothesized that the dorsal striatum acts as an actor while the ventral striatum acts as a critic (a toy sketch follows)
- Other parallels
  - More parallels and topics are introduced in [Sutton and Barto, 2018], e.g., eligibility traces and hedonistic neurons
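To make the actor/critic split concrete, here is a toy sketch in which a single TD error (the dopamine-like signal) trains both a critic and an actor. The two-armed bandit task, step sizes, and payoff probabilities are illustrative assumptions; the striatum mapping is only the metaphor from the slide, not a biological model:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_v, alpha_pi = 0.1, 0.1
V = 0.0                      # critic: value estimate ("ventral striatum")
prefs = np.zeros(2)          # actor: action preferences ("dorsal striatum")

def softmax(h):
    e = np.exp(h - h.max())
    return e / e.sum()

for _ in range(2000):
    probs = softmax(prefs)
    a = rng.choice(2, p=probs)
    r = float(rng.random() < (0.8 if a == 0 else 0.2))  # arm 0 pays off more often
    delta = r - V                     # TD error (episode ends after one step), shared signal
    V += alpha_v * delta              # critic update
    grad = -probs                     # gradient of log pi(a) w.r.t. preferences...
    grad[a] += 1.0                    # ...for the chosen action
    prefs += alpha_pi * delta * grad  # actor update, scaled by the same TD error

print(softmax(prefs))  # the actor comes to prefer the better arm
```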
• 22. Agent navigation Inspired by Neuroscience – Next in Line...
1 Introduction
2 Temporal Difference
    Computational Temporal Difference
    Temporal Difference in the Brain
    Classical Conditioning
    TD Model
    The Reward Prediction Error Hypothesis
    TD Error / Dopamine Correspondences
3 Agent navigation Inspired by Neuroscience
    Navigation using grid-like representation
    The Network architecture
    Navigation Experiments
4 Conclusions and Suggestions
• 23. Agent navigation Inspired by Neuroscience / Navigation using grid-like representation – Navigation Using Grid-Like Representations I
- Entorhinal cortex basics [Rowland et al., 2016] [Banino et al., 2018]
  - The entorhinal cortex creates a neural representation of space through functionally dedicated cell types whose firing rates depend on the animal's position
  - Grid cells respond to the animal's location in the environment; there are also border cells, speed cells, and head-direction cells
  - Grid cells have hexagonally arranged firing fields that tile the surface of the environment
  - A population of grid cells provides a unique representation (code) of locations
- Grid cells perform path integration, taking inputs from speed cells and head-direction cells, with place cells as the result (a sketch of path integration follows)
  - Path integration is the ability to self-localize based on self-motion
- Grid cells are critical for vector-based navigation, as they provide a Euclidean spatial metric supporting the calculation of goal-directed vectors
  - Vector-based navigation is the process of following direct routes to a remembered goal
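A minimal sketch of path integration as dead reckoning: the position estimate is updated from self-motion signals (speed and head direction) alone, with no visual input. The step count and motion statistics are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
pos, heading = np.zeros(2), 0.0

for _ in range(100):
    heading += rng.normal(0.0, 0.2)        # angular velocity (head-direction-cell input)
    speed = abs(rng.normal(0.1, 0.05))     # linear speed (speed-cell input)
    pos += speed * np.array([np.cos(heading), np.sin(heading)])

print(pos)  # self-localized position; errors accumulate without landmark corrections
```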
• 24. Agent navigation Inspired by Neuroscience / Navigation using grid-like representation – Navigation Using Grid-Like Representations II
- [Banino et al., 2018] developed a deep RL agent with mammal-like navigation abilities
  - They trained a recurrent network to perform path integration, leading to the emergence of representations resembling entorhinal cells
  - The network was then incorporated in a deep RL architecture to perform navigation
- Results
  - Grid-cell representations endow agents with the ability to perform proficient vector-based navigation
  - The emergent representation provides the Euclidean spatial metric needed to calculate goal-directed vectors and the relative position of two points, by examining the difference between the current vector code and the code of a remembered goal
  - It allows agents to locate goals in challenging, unfamiliar, and changeable environments
- The results provide strong empirical support to
  - theories that see grid cells as critical in vector-based navigation (support that was previously missing)
  - the grid cells' role in providing a location code updated by self-motion cues
- The performance of the agent surpassed comparison agents and an expert human
- The agents exhibited shortcut behaviors like those performed by mammals
• 25. Agent navigation Inspired by Neuroscience / The Network architecture – Neural Network Performing Path Integration
- The network is a long short-term memory (LSTM). As happens in the brain:
  - It must update its estimate of location and head direction (by predicting place-cell and head-direction-cell activations)
  - It takes as input translational and angular velocities (with perturbation) and a visual input
- It is trained with simulated place-cell and head-direction-cell activations along trajectories modeled on those of foraging rodents
  - This form of supervision is also present in rodent pups, where place and head-direction cells guide the development of grid cells
- The visual input is processed by a CNN
  - This mimics the correction performed by place cells based on environmental cues
  - It generates place-cell and head-direction-cell activations
  - Its output is silenced 95% of the time (to mimic the imperfect observations of behaving animals)
• 26. Agent navigation Inspired by Neuroscience / The Network architecture – Grid-Like Representations
- The last layer is a linear layer with dropout
- Individual units in the linear layer developed stable spatial activity profiles similar to those of neurons within the entorhinal cortex
- Grid-like representations did not emerge without dropout regularization
  - Dropout is also present in the brain – it is an inductive bias [Hassabis et al., 2017]
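A rough PyTorch sketch of this path-integration network (velocity-input LSTM, linear bottleneck with dropout where grid-like units emerge, and place/head-direction readouts) may help. This is a hypothetical reconstruction under stated assumptions, not the authors' code: the layer sizes, the 3-dimensional velocity input, and the class name GridNetwork are my choices:

```python
import torch
import torch.nn as nn

class GridNetwork(nn.Module):
    """Sketch of a supervised path-integration network; all sizes are assumptions."""
    def __init__(self, n_place=256, n_hd=12, hidden=128, bottleneck=512, dropout=0.5):
        super().__init__()
        # input: (speed, sin and cos of angular velocity) at each time step
        self.lstm = nn.LSTM(input_size=3, hidden_size=hidden, batch_first=True)
        self.bottleneck = nn.Sequential(       # linear layer with dropout:
            nn.Linear(hidden, bottleneck),     # grid-like units would emerge here
            nn.Dropout(dropout),
        )
        self.place_head = nn.Linear(bottleneck, n_place)  # predicted place-cell logits
        self.hd_head = nn.Linear(bottleneck, n_hd)        # predicted head-direction logits

    def forward(self, velocities):
        h, _ = self.lstm(velocities)           # velocities: (batch, time, 3)
        g = self.bottleneck(h)                 # the "grid code"
        return self.place_head(g), self.hd_head(g), g

# usage sketch: the heads would be trained against simulated place / head-direction
# cell activations (e.g., with cross-entropy) along rodent-like trajectories
net = GridNetwork()
place_logits, hd_logits, grid_code = net(torch.randn(8, 100, 3))
```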
• 27. Agent navigation Inspired by Neuroscience / The Network architecture – Neural Network Performing Vector-Based Navigation
- Another LSTM controls the agent; it takes as input:
  - the current grid code (a grid code is the activity of the linear layer of the previous network)
  - the goal grid code, after the goal is reached the first time (one-shot learning)
  - a preprocessed visual cue, the last action, and the current reward
- The first output is a discrete action (the actor)
- The second output is the value estimate (the critic)
- A rough sketch of this policy network follows
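As with the previous sketch, this is a hedged PyTorch reconstruction of the actor-critic policy network described above, not the paper's implementation; the hidden size, input dimensions, number of discrete actions, and the class name NavigationAgent are assumptions:

```python
import torch
import torch.nn as nn

class NavigationAgent(nn.Module):
    """Sketch of an actor-critic policy LSTM; all sizes are assumptions."""
    def __init__(self, grid_dim=512, vision_dim=64, n_actions=6, hidden=256):
        super().__init__()
        # inputs: current grid code, goal grid code, visual features, last action, reward
        in_dim = grid_dim + grid_dim + vision_dim + n_actions + 1
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.actor = nn.Linear(hidden, n_actions)   # logits over discrete actions
        self.critic = nn.Linear(hidden, 1)          # state-value estimate

    def forward(self, grid_code, goal_code, vision, last_action, reward, state=None):
        x = torch.cat([grid_code, goal_code, vision, last_action, reward], dim=-1)
        h, state = self.lstm(x, state)
        return self.actor(h), self.critic(h), state

# usage sketch (batch of 1, single time step, zeroed inputs)
agent = NavigationAgent()
t = lambda *shape: torch.zeros(1, 1, *shape)
logits, value, rnn_state = agent(t(512), t(512), t(64), t(6), t(1))
```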
• 28. Agent navigation Inspired by Neuroscience / Navigation Experiments – Navigation Experiments
- A goal grid code provides sufficient information to navigate to an arbitrary location
  - The authors substituted the goal grid code with a 'fake' one sampled randomly
  - The agent followed a direct path to the newly specified location and circled around the absent goal – like rodents in probe trials of the Morris water maze
- Grid cells are crucial: silencing most of the grid-like units (simulating a targeted lesion), rather than other units, has a dramatic effect on performance
- Only the grid-cell agent was able to exploit shortcuts
- Experimental setting
  - At the beginning of an episode, the agent explores to find an unmarked goal; when the agent reaches the goal, it is teleported to a new random location, and it then exploits the goal code until the episode ends (after a fixed number of steps)
  - The mazes' layout, textures, landmarks, and goal change at each episode
  - The state of a door changes randomly during an episode, at every new run
• 29. Conclusions and Suggestions – Next in Line...
1 Introduction
2 Temporal Difference
    Computational Temporal Difference
    Temporal Difference in the Brain
    Classical Conditioning
    TD Model
    The Reward Prediction Error Hypothesis
    TD Error / Dopamine Correspondences
3 Agent navigation Inspired by Neuroscience
    Navigation using grid-like representation
    The Network architecture
    Navigation Experiments
4 Conclusions and Suggestions
• 30. Conclusions and Suggestions – Deepening I
- Takeaways
  - Classical conditioning was essential to formulate the core RL rule
  - Vice versa, RL was crucial to determine the functioning of the brain's reward system
  - Neuroscience continues to inspire novel and more powerful algorithms, like the navigation one
- More on RL and Neuroscience
  - Meta-RL [Hassabis et al., 2017]: RL optimizes an RNN, which gives rise to a second RL algorithm, faster than the original; it could be inspired by the recurrent activity of the prefrontal cortex
  - Study of addiction [Sutton and Barto, 2018]: RL theory could be used to understand the neural basis of drug abuse
• 31. Conclusions and Suggestions – Deepening II
- Where to start (Neuroscience for AI)
  - Building machines that learn and think like people [Lake et al., 2017]
    - Historical recall of neural inspiration
    - Proposes cognitive challenges for agents
    - Defines the essential ingredients for building human-like intelligence
  - Neuroscience-inspired artificial intelligence [Hassabis et al., 2017]
    - Highlights the importance of the human brain as an inspiration to AI
    - Analyzes past and current influences of neuroscience on ML techniques
    - Underlines key areas to bridge the gap between machine and human-level intelligence
  - Using neuroscience to develop artificial intelligence [Ullman, 2019]
    - Ullman calls into question current, highly reductionist approaches
    - We should use knowledge about biological neurons – their structure, type, and connectivity – to guide the building of brain-like network models
    - Intelligence likely lies in both experience and preexisting structures (inductive biases)
- Where to start (AI for Neuroscience)
  - A deep learning framework for neuroscience [Richards et al., 2019]
    - Neuroscientists need an approach to deal with large experimental datasets
    - The three components of an ANN – (i) objective functions, (ii) learning rules, and (iii) architectures – could be used to produce compact (tractable) brain models
• 32. Conclusions and Suggestions – Deepening III
- Practical insights
  - Prior knowledge and inbuilt capacities (inductive biases) are crucial for fast learning and making inferences [Ullman, 2019][Richards et al., 2019]
    - Which neuron features – type, connectivity, structure – could be used to improve ANNs?
  - A recent report claims that grid cells could be critical also for abstract reasoning and concept representation [Constantinescu et al., 2016]
    - Could ANNs featuring grid-like regularities be used to process abstract concepts?
• 33. Conclusions and Suggestions – References I
Banino, A., Barry, C., Uria, B., Blundell, C., Lillicrap, T., Mirowski, P., Pritzel, A., Chadwick, M. J., Degris, T., Modayil, J., Wayne, G., Soyer, H., Viola, F., Zhang, B., Goroshin, R., Rabinowitz, N., Pascanu, R., Beattie, C., Petersen, S., Sadik, A., Gaffney, S., King, H., Kavukcuoglu, K., Hassabis, D., Hadsell, R., and Kumaran, D. (2018). Vector-based navigation using grid-like representations in artificial agents. Nature, 557(7705):429–433.
Constantinescu, A. O., O'Reilly, J. X., and Behrens, T. E. J. (2016). Organizing conceptual knowledge in humans with a gridlike code. Science, 352(6292):1464–1468.
Ford, M. (2018). Architects of Intelligence: The Truth About AI from the People Building It. Packt Publishing Ltd.
Hassabis, D., Kumaran, D., Summerfield, C., and Botvinick, M. (2017). Neuroscience-inspired artificial intelligence. Neuron, 95(2):245–258.
• 34. Conclusions and Suggestions – References II
Lake, B. M., Ullman, T. D., Tenenbaum, J. B., and Gershman, S. J. (2017). Building machines that learn and think like people. Behavioral and Brain Sciences, 40:e253.
Montague, P., Dayan, P., and Sejnowski, T. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. The Journal of Neuroscience, 16:1936–1947.
Pillow, J. and Sahani, M. (2019). Editorial overview: Machine learning, big data, and neuroscience. Current Opinion in Neurobiology, 55:iii–iv.
• 35. Conclusions and Suggestions – References III
Richards, B. A., Lillicrap, T. P., Beaudoin, P., Bengio, Y., Bogacz, R., Christensen, A., Clopath, C., Costa, R. P., de Berker, A., Ganguli, S., Gillon, C. J., Hafner, D., Kepecs, A., Kriegeskorte, N., Latham, P., Lindsay, G. W., Miller, K. D., Naud, R., Pack, C. C., Poirazi, P., Roelfsema, P., Sacramento, J., Saxe, A., Scellier, B., Schapiro, A. C., Senn, W., Wayne, G., Yamins, D., Zenke, F., Zylberberg, J., Therien, D., and Kording, K. P. (2019). A deep learning framework for neuroscience. Nature Neuroscience, 22(11):1761–1770.
Rowland, D. C., Roudi, Y., Moser, M.-B., and Moser, E. I. (2016). Ten years of grid cells. Annual Review of Neuroscience, 39(1):19–40.
Sutton, R. and Barto, A. (1981). Toward a modern theory of adaptive networks: Expectation and prediction. Psychological Review, 88:135–170.
• 36. Conclusions and Suggestions – References IV
Sutton, R. S. and Barto, A. G. (1987). A temporal-difference model of classical conditioning. In Proceedings of the Ninth Annual Conference of the Cognitive Science Society, pages 355–378, Seattle, WA.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement Learning: An Introduction. The MIT Press.
Ullman, S. (2019). Using neuroscience to develop artificial intelligence. Science, 363(6428):692–693.
• 37. Reinforcement Learning and Neuroscience
Michael Bosello – Università di Bologna, Department of Computer Science and Engineering, Cesena, Italy