1. Sequence learning in a model of the basal ganglia
Thesis submitted for the MSc in computer science
NTNU, Trondheim 2006-09-08
Stian Soiland
http://soiland.no/master
2. This presentation
- Theory
  - Control theory & Actor-Critic
  - Basal ganglia and pathways
  - CTRNN
  - Previous work by Berns & Sejnowski
- What was done?
- Results
  - Globus pallidus, weight learning
  - Error function
- Experiments & code
  - Noise, integrator update rule
  - CTRNN library
- Discussion
  - Equation or code?
5. Basal ganglia pathways
[Figure: basal ganglia pathway diagram. Nodes: cerebral cortex, frontal cortex, striatum, thalamus, globus pallidus external, globus pallidus internal, subthalamic nucleus, substantia nigra pars compacta, substantia nigra pars reticulata. Legend: excitatory, inhibitory and dopaminergic connections; direct and indirect pathways.]
6. Inhibitory connections
[Figure (Purves et al. 2003): cortex inputs (sensory and other inputs) drive two cases. Active: STR excited → GP inhibited → thalamus disinhibited → motor cortex excited. At rest: STR at rest → GP tonically active → thalamus inhibited → motor cortex not excited. Legend: excitatory vs. inhibitory connections; firing vs. at rest.]
7. Basal ganglia in action
[Figure and text clipped from Schultz et al. 1997: "Do dopamine neurons report an error in the prediction of reward?". Dopamine neurons were recorded in alert monkeys while they performed behavioral acts and received rewards. Their outputs appear to code for a difference between the actual reward and learned predictions of its time and magnitude: a positive signal (increased spike production) if an appetitive event is better than predicted, no signal if it occurs as predicted, and a negative signal (decreased spike production) if it is worse than predicted. Three panels of neuronal activity, aligned to the conditioned stimulus (CS) and reward (R):
- (Top) No prediction, reward occurs: a positive error in the prediction of reward; the neuron is activated by the unpredicted occurrence of juice.
- (Middle) After learning, the CS predicts the reward and the reward occurs according to the prediction: no error; the neuron is activated by the reward-predicting CS but fails to be activated by the predicted reward.
- (Bottom) The CS predicts a reward, but the reward fails to occur because of a mistake in the monkey's behavioral response: the neuron's activity is depressed exactly when the reward would have occurred, revealing an internal representation of the predicted reward time.
The TD algorithm is well suited to understanding the role played by the dopamine signal. Schultz et al. 1997]
8. Continuous time recurrent neural network (CTRNN)
[Figure: plot of potential and input voltage over time, and a two-neuron network diagram: y_1(t) with τ = 4 and y_2(t) with τ = 1.0, biases θ_1 and θ_2, mutual weights w_1,2 and w_2,1, and external input I_i(t).]
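For concreteness, a minimal sketch of the two-neuron CTRNN from the figure, integrated with forward Euler. Only the time constants come from the figure; the weights, biases, input and step size are illustrative assumptions, not values from the thesis.

import numpy as np

# Forward-Euler integration of the two-neuron CTRNN in the figure:
#   τ_i · dy_i/dt = −y_i + Σ_j w_j,i·σ(y_j) + θ_i + I_i(t)
tau = np.array([4.0, 1.0])       # τ_1 = 4, τ_2 = 1.0 (from the figure)
theta = np.array([0.0, 0.0])     # biases θ_1, θ_2 (assumed)
W = np.array([[0.0, 1.0],        # W[0, 1] = w_1,2 (neuron 1 → 2, assumed)
              [-1.0, 0.0]])      # W[1, 0] = w_2,1 (neuron 2 → 1, assumed)
y = np.zeros(2)                  # membrane potentials
dt = 0.1                         # step size (assumed)

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

for step in range(200):
    I = np.array([1.0, 0.0])     # external input I_i(t), constant here
    y += dt / tau * (-y + sigma(y) @ W + theta + I)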
9. Berns & Sejnowski's model
[Figure: Berns & Sejnowski's model. Cortex and thalamus connect to the basal ganglia, with direct and indirect pathways through the striatum and subthalamic nucleus; the substantia nigra provides the error signal that modulates weights onto the globus pallidus. Striatum, subthalamic nucleus and globus pallidus each show input, output, fast and slow parts. Legend: excitatory, inhibitory and weight-modulating connections; fixed and dynamic links.]
10. What have I done?
- Early attempts: C++ implementations of CTRNN & B&S
- Direct reproduction of Berns & Sejnowski, equation to code
- Experiments and tweaks on the direct reproduction
- Implementation of a CTRNN library
- Reproducing B&S using the CTRNN library (failed)
- Reproducing Prescott (2006) using the CTRNN library (worked)
11. Reproducing response of globus pallidus
[Plots: activity of GP units 1-5 against timestep; left: original figure from Berns & Sejnowski 1998; right: Soiland's reproduction.]
12. Weight learning
Figure 5 (Berns & Sejnowski 1998): Changes in connection strengths, w_ij, from learning the sequence 1, 2, 3, 4, 2, 5. The five weights from the five STN units with short time constants to GP unit 2 are shown. The three weights that increased to saturation levels were from STN units 2, 3, and 5 (i.e., those STN units that were not active prior to GP unit 2 being active). Conversely, the weights from STN units 1 and 4 did not increase.
[Plot: Soiland's reproduction; weights from STN units 1-5 against timestep 0-180.]
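The saturating curves come from a bounded, error-modulated Hebbian update. A rough sketch of such a rule follows; the learning rate, factor ordering and clipping bounds are assumptions, not the literal Berns & Sejnowski equation.

import numpy as np

# Error-modulated Hebbian update with saturation (sketch only;
# NOT the literal B&S (1998) equation).
def update_weights(w, stn_prev, gp, error, lr=0.1):
    # w[j, i]: weight from STN unit j to GP unit i.
    # Correlate presynaptic STN activity from the previous step
    # with postsynaptic GP activity, scaled by the error signal.
    w = w + lr * error * np.outer(stn_prev, gp)
    # Clipping reproduces the saturation seen in Figure 5.
    return np.clip(w, 0.0, 1.0)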
13. Error function
The GP output is compared to the weighted STR input:

e(s) = Σ_i (G_i(s) − v_i(s)·S_i(s))

where v_i(s) is the weight representing the connection between STR unit i and GP unit i. The error e(s) is used for updating the weights of the connections between STN and GP units: each STN unit has connections to all GP units, and the connections are trained using a Hebbian learning rule (Sutton and Barto, 1981).
def calc_error(self):
    # e(s) = Σ_i (G_i(s) − v_i(s)·S_i(s))
    error = 0.0
    for i in range(self.inputs):
        error += self.GP[i]
        error -= self.v[i] * self.STR[i]
    return error
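With GP, v and STR as NumPy arrays, the same sum can be written in vectorised form (a sketch, assuming equal-length arrays):

import numpy as np

def calc_error(GP, v, STR):
    # e(s) = Σ_i (G_i(s) − v_i(s)·S_i(s)), vectorised
    return np.sum(GP - v * STR)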
[Plot: error against timestep 0-200, comparing Berns & Sejnowski 1998 with Soiland's reproduction.]
15. Experiment: Sigmoidal update rule
As shown in the discrete update rule in equation 3.5 on page 24, the sigmoid is applied several times, as simplified in:

B(s) = σ(αB(s−1) + β)
B(s+1) = σ(ασ(αB(s−1) + β) + β)

To investigate the effect of using an iterative sigmoidal update rule, both rules were applied in an Octave (Murphy, 1997) simulation. The function u(t) represents the normal leaky integrator potential, while the function v(t) represents the sigmoidal update rule as in Berns and Sejnowski:

u(t) = u(t−1) + (1/τ)·(−u(t−1) + I(t))
v(t) = σ(v(t−1) + (1/τ)·(−v(t−1) + I(t)))
[Plot: u(t), sigmoid(u(t)) and v(t) against time 0-30, together with the input.]
v(t) is a modified version of equation 2.3 on page 16, including the sigmoid σ, which is an extended version of equation 2.2, with a gain γ and a bias β:

σ(x) = 1 / (1 + e^(−γ(x−β)))

Berns and Sejnowski (1998) introduce a term λ representing a decay determined by the size of the time step:

λ = τ / (τ + Δt)

Substituting equation 3.4 into equation 3.2 yields the update rule:
def calc_STN(self, i):
    # lambd = τ / (τ + Δt) blends the previous STN potential with
    # the inhibitory GP input; i // 2: two STN units (fast and slow)
    # share one GP unit
    return self.sigmoid(lambd * self.STN[i] -
                        (1 - lambd) * self.effect * self.GP[i // 2])
(..)
STN[i] = calc_STN(i)

def sigmoid(self, x):
    # σ(x) = 1 / (1 + e^(−gain·(x − bias)))
    return 1.0 / (1.0 +
                  math.exp(-self.gain * (x - self.bias)))
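A small Python port of the Octave comparison between the two rules; τ, the gain, the bias and the step input are illustrative assumptions, not the thesis's values.

import math

def sigmoid(x, gain=1.0, bias=0.0):
    # σ(x) = 1 / (1 + e^(−gain·(x − bias)))
    return 1.0 / (1.0 + math.exp(-gain * (x - bias)))

tau = 5.0   # assumed time constant
u = 0.0     # plain leaky integrator potential
v = 0.0     # potential under the sigmoidal update rule
for t in range(31):
    I = 1.0 if t >= 5 else 0.0               # illustrative step input
    u = u + (1.0 / tau) * (-u + I)           # sigmoid only on output
    v = sigmoid(v + (1.0 / tau) * (-v + I))  # sigmoid inside the recurrence
    print(t, round(u, 3), round(sigmoid(u), 3), round(v, 3))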
17. Equations or code?
inputs = numpy.matrixmultiply(output, weight)  # u(t) × W
change = (timestep / timeconst *
          (-potential + inputs + bias))        # Δt/τ · (−y + u·W + θ)
potential += change
output = map(transfer, potential)              # u(t) = σ(y(t))
More readable, but also more verbose
Here output, potential and bias are n-sized vectors, while weight is an n × n sized matrix. The code is easily compared to the discrete version of equation 2.3 on page 16:

y(t + Δt) = y(t) + (Δt/τ)·(−y(t) + u(t) × W + θ)   (3.10)
u(t) = σ(y(t))   (3.11)
In addition to making code clearer, the matrix operations of NumPy are implemented in compiled languages such as C and FORTRAN and exploit CPU vector features such as the AltiVec engine (Diefendorff et al., 2000), which in informal tests on the basal ganglia model gave a considerable speed-up compared to pure Python code, in some cases a factor of 30. In this work, experimenting with network shapes and layouts was essential, and so a high-level view of the calculations seemed like a reasonable approach, justifying the use of NumPy.
Concise, but can be difficult to understand
Mathematics can be general, but code is reproducible.
Maybe the best is a combination?
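One possible combination: keep the equation in a docstring right next to the code implementing it. A sketch with illustrative names, using numpy.dot in place of the older matrixmultiply:

import numpy as np

def ctrnn_step(potential, weight, bias, timestep, timeconst):
    """One Euler step of equations 3.10 and 3.11:

        y(t + Δt) = y(t) + (Δt/τ)·(−y(t) + u(t) × W + θ)
        u(t)      = σ(y(t))
    """
    output = 1.0 / (1.0 + np.exp(-potential))  # u(t) = σ(y(t))
    inputs = np.dot(output, weight)            # u(t) × W
    return potential + (timestep / timeconst) * (
        -potential + inputs + bias)

The docstring carries the general mathematics, while the body stays runnable and testable.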