1. Sequence learning in a model of the basal ganglia
Thesis submitted for the MSc in computer science
NTNU, Trondheim 2006-09-08
Stian Soiland
http://soiland.no/master
2. This presentation
- Theory
  - Control theory & Actor-Critic
  - Basal ganglia and pathways
  - CTRNN
  - Previous work by Berns & Sejnowski
- What was done?
- Results
  - Globus pallidus, weight learning
  - Error function
- Experiments & code
  - Noise, integrator update rule
  - CTRNN library
- Discussion
  - Equation or code?
5. Basal ganglia pathways
[Figure: basal ganglia pathway diagram. Nodes: cerebral cortex, frontal cortex, striatum, thalamus, globus pallidus external, globus pallidus internal, subthalamic nucleus, substantia nigra pars compacta, substantia nigra pars reticulata. Legend: excitatory, inhibitory and dopaminergic connections; direct and indirect pathways.]
6. Inhibitory connections
[Figure (Purves et al. 2003): cortex inputs (sensory and other inputs) drive two cases. Active: STR excited → GP inhibited → thalamus disinhibited → motor cortex excited. At rest: STR at rest → GP tonically active → thalamus inhibited → motor cortex not excited. Legend: excitatory vs. inhibitory connections; firing vs. at rest.]
7. Basal ganglia in action
[Figure and text clipped from Schultz et al. 1997: "Do dopamine neurons report an error in the prediction of reward?". Dopamine neurons were recorded in alert monkeys while they performed behavioral acts and received rewards. Their outputs appear to code for a difference between the actual reward and learned predictions of its time and magnitude: a positive signal (increased spike production) if an appetitive event is better than predicted, no signal if it occurs as predicted, and a negative signal (decreased spike production) if it is worse than predicted. Three panels of neuronal activity, aligned to the conditioned stimulus (CS) and reward (R):
- (Top) No prediction, reward occurs: a positive error in the prediction of reward; the neuron is activated by the unpredicted occurrence of juice.
- (Middle) After learning, the CS predicts the reward and the reward occurs according to the prediction: no error; the neuron is activated by the reward-predicting CS but fails to be activated by the predicted reward.
- (Bottom) The CS predicts a reward, but the reward fails to occur because of a mistake in the monkey's behavioral response: the neuron's activity is depressed exactly when the reward would have occurred, revealing an internal representation of the predicted reward time.
The TD algorithm is well suited to understanding the role played by the dopamine signal. Schultz et al. 1997]
8. Continuous time recurrent neural network (CTRNN)
[Figure: plot of potential and input voltage over time, and a two-neuron network diagram: y_1(t) with τ = 4 and y_2(t) with τ = 1.0, biases θ_1 and θ_2, mutual weights w_1,2 and w_2,1, and external input I_i(t).]
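For concreteness, a minimal sketch of the two-neuron CTRNN from the figure, integrated with forward Euler. Only the time constants come from the figure; the weights, biases, input and step size are illustrative assumptions, not values from the thesis.

import numpy as np

# Forward-Euler integration of the two-neuron CTRNN in the figure:
#   τ_i · dy_i/dt = −y_i + Σ_j w_j,i·σ(y_j) + θ_i + I_i(t)
tau = np.array([4.0, 1.0])       # τ_1 = 4, τ_2 = 1.0 (from the figure)
theta = np.array([0.0, 0.0])     # biases θ_1, θ_2 (assumed)
W = np.array([[0.0, 1.0],        # W[0, 1] = w_1,2 (neuron 1 → 2, assumed)
              [-1.0, 0.0]])      # W[1, 0] = w_2,1 (neuron 2 → 1, assumed)
y = np.zeros(2)                  # membrane potentials
dt = 0.1                         # step size (assumed)

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

for step in range(200):
    I = np.array([1.0, 0.0])     # external input I_i(t), constant here
    y += dt / tau * (-y + sigma(y) @ W + theta + I)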
9. Berns & Sejnowski's model
[Figure: Berns & Sejnowski's model. Cortex and thalamus connect to the basal ganglia, with direct and indirect pathways through the striatum and subthalamic nucleus; the substantia nigra provides the error signal that modulates weights onto the globus pallidus. Striatum, subthalamic nucleus and globus pallidus each show input, output, fast and slow parts. Legend: excitatory, inhibitory and weight-modulating connections; fixed and dynamic links.]
10. What have I done?
- Early attempts: C++ implementations of CTRNN & B&S
- Direct reproduction of Berns & Sejnowski, equation to code
- Experiments and tweaks on the direct reproduction
- Implementation of a CTRNN library
- Reproducing B&S using the CTRNN library (failed)
- Reproducing Prescott (2006) using the CTRNN library (worked)
11. Reproducing response of globus pallidus
[Plots: activity of GP units 1-5 against timestep; left: original figure from Berns & Sejnowski 1998; right: Soiland's reproduction.]
12. Weight learning
Figure 5 (Berns & Sejnowski 1998): Changes in connection strengths, w_ij, from learning the sequence 1, 2, 3, 4, 2, 5. The five weights from the five STN units with short time constants to GP unit 2 are shown. The three weights that increased to saturation levels were from STN units 2, 3, and 5 (i.e., those STN units that were not active prior to GP unit 2 being active). Conversely, the weights from STN units 1 and 4 did not increase.
[Plot: Soiland's reproduction; weights from STN units 1-5 against timestep 0-180.]
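The saturating curves come from a bounded, error-modulated Hebbian update. A rough sketch of such a rule follows; the learning rate, factor ordering and clipping bounds are assumptions, not the literal Berns & Sejnowski equation.

import numpy as np

# Error-modulated Hebbian update with saturation (sketch only;
# NOT the literal B&S (1998) equation).
def update_weights(w, stn_prev, gp, error, lr=0.1):
    # w[j, i]: weight from STN unit j to GP unit i.
    # Correlate presynaptic STN activity from the previous step
    # with postsynaptic GP activity, scaled by the error signal.
    w = w + lr * error * np.outer(stn_prev, gp)
    # Clipping reproduces the saturation seen in Figure 5.
    return np.clip(w, 0.0, 1.0)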
13. Error function
The GP output is compared to the weighted STR input:

e(s) = Σ_i (G_i(s) − v_i(s)·S_i(s))

where v_i(s) is the weight representing the connection between STR unit i and GP unit i. The error e(s) is used for updating the weights of the connections between STN and GP units: each STN unit has connections to all GP units, and the connections are trained using a Hebbian learning rule (Sutton and Barto, 1981).
def calc_error(self):
    # e(s) = Σ_i (G_i(s) − v_i(s)·S_i(s))
    error = 0.0
    for i in range(self.inputs):
        error += self.GP[i]
        error -= self.v[i] * self.STR[i]
    return error
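With GP, v and STR as NumPy arrays, the same sum can be written in vectorised form (a sketch, assuming equal-length arrays):

import numpy as np

def calc_error(GP, v, STR):
    # e(s) = Σ_i (G_i(s) − v_i(s)·S_i(s)), vectorised
    return np.sum(GP - v * STR)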
[Plot: error against timestep 0-200, comparing Berns & Sejnowski 1998 with Soiland's reproduction.]
15. Experiment: Sigmoidal update rule
As shown in the discrete update rule in equation 3.5 on page 24, the sigmoid is applied several times, as simplified in:

B(s) = σ(αB(s−1) + β)
B(s+1) = σ(ασ(αB(s−1) + β) + β)

To investigate the effect of using an iterative sigmoidal update rule, both rules were applied in an Octave (Murphy, 1997) simulation. The function u(t) represents the normal leaky integrator potential, while the function v(t) represents the sigmoidal update rule as in Berns and Sejnowski:

u(t) = u(t−1) + (1/τ)·(−u(t−1) + I(t))
v(t) = σ(v(t−1) + (1/τ)·(−v(t−1) + I(t)))
[Plot: u(t), sigmoid(u(t)) and v(t) against time 0-30, together with the input.]
v(t) is a modified version of equation 2.3 on page 16, including the sigmoid σ, which is an extended version of equation 2.2, with a gain γ and a bias β:

σ(x) = 1 / (1 + e^(−γ(x−β)))

Berns and Sejnowski (1998) introduce a term λ representing a decay determined by the size of the time step:

λ = τ / (τ + Δt)

Substituting equation 3.4 into equation 3.2 yields the update rule:
def calc_STN(self, i):
    # lambd = τ / (τ + Δt) blends the previous STN potential with
    # the inhibitory GP input; i // 2: two STN units (fast and slow)
    # share one GP unit
    return self.sigmoid(lambd * self.STN[i] -
                        (1 - lambd) * self.effect * self.GP[i // 2])
(..)
STN[i] = calc_STN(i)

def sigmoid(self, x):
    # σ(x) = 1 / (1 + e^(−gain·(x − bias)))
    return 1.0 / (1.0 +
                  math.exp(-self.gain * (x - self.bias)))
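A small Python port of the Octave comparison between the two rules; τ, the gain, the bias and the step input are illustrative assumptions, not the thesis's values.

import math

def sigmoid(x, gain=1.0, bias=0.0):
    # σ(x) = 1 / (1 + e^(−gain·(x − bias)))
    return 1.0 / (1.0 + math.exp(-gain * (x - bias)))

tau = 5.0   # assumed time constant
u = 0.0     # plain leaky integrator potential
v = 0.0     # potential under the sigmoidal update rule
for t in range(31):
    I = 1.0 if t >= 5 else 0.0               # illustrative step input
    u = u + (1.0 / tau) * (-u + I)           # sigmoid only on output
    v = sigmoid(v + (1.0 / tau) * (-v + I))  # sigmoid inside the recurrence
    print(t, round(u, 3), round(sigmoid(u), 3), round(v, 3))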
17. Equations or code?
inputs = numpy.matrixmultiply(output, weight)  # u(t) × W
change = (timestep / timeconst *
          (-potential + inputs + bias))        # Δt/τ · (−y + u·W + θ)
potential += change
output = map(transfer, potential)              # u(t) = σ(y(t))
More readable, but also more verbose
Here output, potential and bias are n-sized vectors, while weight is an n × n sized matrix. The code is easily compared to the discrete version of equation 2.3 on page 16:

y(t + Δt) = y(t) + (Δt/τ)·(−y(t) + u(t) × W + θ)   (3.10)
u(t) = σ(y(t))   (3.11)
In addition to making code clearer, the matrix operations of NumPy are implemented in compiled languages such as C and FORTRAN and exploit CPU vector features such as the AltiVec engine (Diefendorff et al., 2000), which in informal tests on the basal ganglia model gave a considerable speed-up compared to pure Python code, in some cases a factor of 30. In this work, experimenting with network shapes and layouts was essential, and so a high-level view of the calculations seemed like a reasonable approach, justifying the use of NumPy.
Concise, but can be difficult to understand
Mathematics can be general, but code is reproducible.
Maybe the best is a combination?
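One possible combination: keep the equation in a docstring right next to the code implementing it. A sketch with illustrative names, using numpy.dot in place of the older matrixmultiply:

import numpy as np

def ctrnn_step(potential, weight, bias, timestep, timeconst):
    """One Euler step of equations 3.10 and 3.11:

        y(t + Δt) = y(t) + (Δt/τ)·(−y(t) + u(t) × W + θ)
        u(t)      = σ(y(t))
    """
    output = 1.0 / (1.0 + np.exp(-potential))  # u(t) = σ(y(t))
    inputs = np.dot(output, weight)            # u(t) × W
    return potential + (timestep / timeconst) * (
        -potential + inputs + bias)

The docstring carries the general mathematics, while the body stays runnable and testable.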