Moment closure inference for stochastic kinetic models


Colin Gillespie

School of Mathematics & Statistics

Talk outline
An introduction to moment closure
Case study: Aphids
Conclusion


Birth-death process

Birth-death model
X → 2X and X → ∅

with propensity functions λX and µX, respectively.

Deterministic representation
The deterministic model is

$$\frac{dX(t)}{dt} = (\lambda - \mu)X(t),$$

which can be solved to give $X(t) = X(0)\exp[(\lambda - \mu)t]$.


Stochastic representation
In the stochastic framework, each reaction has a probability of occurring.
The analogous version of the birth-death process is the differential-difference equation

$$\frac{dp_n}{dt} = \lambda(n-1)p_{n-1} + \mu(n+1)p_{n+1} - (\lambda + \mu)np_n$$

This is usually called the forward Kolmogorov equation or the chemical master equation (CME).
[Figure: population against time for the birth-death process]
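The master equation describes the same process that Gillespie's direct method simulates exactly. The sketch below is a minimal Python version for this birth-death model, assuming a birth adds one individual with propensity λX and a death removes one with propensity µX; the function name `gillespie_birth_death` is hypothetical. Averaging many runs should reproduce the mean $X(0)\exp[(\lambda - \mu)t]$.

```python
import math
import random


def gillespie_birth_death(n0, lam, mu, t_max, rng):
    """Gillespie direct-method simulation of a birth (rate lam*n, n -> n+1)
    and death (rate mu*n, n -> n-1) process. Returns the population at t_max."""
    if lam + mu == 0.0:
        return n0                              # no reaction can ever fire
    t, n = 0.0, n0
    while n > 0:
        total_rate = (lam + mu) * n            # sum of both propensities
        t += rng.expovariate(total_rate)       # exponential waiting time
        if t > t_max:                          # next event falls after t_max
            break
        if rng.random() * total_rate < lam * n:
            n += 1                             # birth
        else:
            n -= 1                             # death
    return n


if __name__ == "__main__":
    rng = random.Random(1)
    runs = [gillespie_birth_death(10, 1.0, 0.5, 1.0, rng) for _ in range(4000)]
    print("sample mean:", sum(runs) / len(runs))
    print("deterministic solution:", 10 * math.exp(0.5))
```

Each run is one exact realisation; only the average over runs is comparable with the deterministic curve.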


Moment equations
Multiply the CME by $e^{n\theta}$ and sum over n to obtain

$$\frac{\partial M}{\partial t} = [\lambda(e^{\theta} - 1) + \mu(e^{-\theta} - 1)]\frac{\partial M}{\partial \theta}$$

where

$$M(\theta; t) = \sum_{n=0}^{\infty} e^{n\theta} p_n(t)$$

If we differentiate this p.d.e. w.r.t. θ and set θ = 0, we get

$$\frac{dE[N(t)]}{dt} = (\lambda - \mu)E[N(t)]$$

where E[N(t)] is the mean


The mean equation

$$\frac{dE[N(t)]}{dt} = (\lambda - \mu)E[N(t)]$$

This ODE is solvable; the associated forward Kolmogorov equation is also solvable
The equation for the mean and the deterministic ODE are identical
When the rate laws are linear, the stochastic mean and deterministic
solution always correspond


The variance equation
If we differentiate the p.d.e. w.r.t. θ twice and set θ = 0, we get:

$$\frac{dE[N(t)^2]}{dt} = (\lambda + \mu)E[N(t)] + 2(\lambda - \mu)E[N(t)^2]$$

and hence the variance $\text{Var}[N(t)] = E[N(t)^2] - E[N(t)]^2$.
Differentiating three times gives the equation for $E[N(t)^3]$, and hence the skewness, and so on
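For this linear model the equations for the first two moments form a closed system: a birth changes $N^2$ by $2N+1$ and a death by $-2N+1$, which gives $dE[N^2]/dt = (\lambda+\mu)E[N] + 2(\lambda-\mu)E[N^2]$. A minimal sketch, assuming a fixed-step classical Runge-Kutta (RK4) integrator (our choice, not from the talk) and a known initial population n0, integrates both equations and compares the variance with the standard closed form $n_0\frac{\lambda+\mu}{\lambda-\mu}e^{(\lambda-\mu)t}(e^{(\lambda-\mu)t}-1)$, valid for λ ≠ µ. All function names are hypothetical.

```python
import math


def moment_rhs(m1, m2, lam, mu):
    """Right-hand sides of the ODEs for E[N] (m1) and E[N^2] (m2)."""
    dm1 = (lam - mu) * m1
    dm2 = (lam + mu) * m1 + 2.0 * (lam - mu) * m2
    return dm1, dm2


def solve_moments(n0, lam, mu, t_end, steps=1000):
    """Fixed-step RK4 from a point mass at n0 (so E[N(0)^2] = n0^2).
    Returns (mean, variance) at t_end."""
    h = t_end / steps
    m1, m2 = float(n0), float(n0) ** 2
    for _ in range(steps):
        a1, a2 = moment_rhs(m1, m2, lam, mu)
        b1, b2 = moment_rhs(m1 + 0.5 * h * a1, m2 + 0.5 * h * a2, lam, mu)
        c1, c2 = moment_rhs(m1 + 0.5 * h * b1, m2 + 0.5 * h * b2, lam, mu)
        d1, d2 = moment_rhs(m1 + h * c1, m2 + h * c2, lam, mu)
        m1 += h / 6.0 * (a1 + 2 * b1 + 2 * c1 + d1)
        m2 += h / 6.0 * (a2 + 2 * b2 + 2 * c2 + d2)
    return m1, m2 - m1 ** 2


def exact_variance(n0, lam, mu, t):
    """Closed-form Var[N(t)] for the linear birth-death process (lam != mu)."""
    r = lam - mu
    return n0 * (lam + mu) / r * math.exp(r * t) * (math.exp(r * t) - 1.0)
```

The mean from `solve_moments` coincides with the deterministic solution, while the variance grows roughly like the square of the mean when λ > µ.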


Simple dimerisation model

Dimerisation
$2X_1 \rightarrow X_2$ and $X_2 \rightarrow 2X_1$

with propensities $0.5k_1X_1(X_1 - 1)$ and $k_2X_2$.


Dimerisation moment equations
We formulate the dimer model in terms of moment equations

$$\frac{dE[X_1]}{dt} = 0.5k_1(E[X_1^2] - E[X_1]) - k_2E[X_1]$$

$$\frac{dE[X_1^2]}{dt} = k_1(E[X_1^3] - E[X_1^2]) + 0.5k_1(E[X_1^2] - E[X_1]) + k_2(E[X_1] - 2E[X_1^2])$$

where $E[X_1]$ is the mean of $X_1$ and $E[X_1^2] - E[X_1]^2$ is the variance

The ith moment equation depends on the (i + 1)th moment, so the system never closes on its own


Deterministic approximates stochastic
Rewriting

$$\frac{dE[X_1]}{dt} = 0.5k_1(E[X_1^2] - E[X_1]) - k_2E[X_1]$$

in terms of the variance, i.e. using $E[X_1^2] = \text{Var}[X_1] + E[X_1]^2$, we get

$$\frac{dE[X_1]}{dt} = 0.5k_1E[X_1](E[X_1] - 1) + 0.5k_1\text{Var}[X_1] - k_2E[X_1] \qquad (1)$$

Setting $\text{Var}[X_1] = 0$ in (1) recovers the deterministic equation
So we can consider the deterministic model as an approximation to the stochastic model
When the rate laws are polynomial, setting the variance (and, for higher-degree terms, the higher central moments) to zero results in the deterministic equation


Simple dimerisation model
To close the equations, we assume an underlying distribution
The easiest option is to assume an underlying Normal distribution, i.e.

$$E[X_1^3] = 3E[X_1^2]E[X_1] - 2E[X_1]^3$$

But we could also use the Poisson

$$E[X_1^3] = E[X_1] + 3E[X_1]^2 + E[X_1]^3$$

or the log-normal

$$E[X_1^3] = \left(\frac{E[X_1^2]}{E[X_1]}\right)^3$$
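Each closure expresses $E[X_1^3]$ through the first two moments, which turns the two dimerisation moment equations into a closed ODE system. A minimal sketch, assuming fixed-step RK4 integration and hypothetical function names, implements the three closures and the two moment equations in the form given for the dimer model, $dE[X_1]/dt = 0.5k_1(E[X_1^2]-E[X_1]) - k_2E[X_1]$ and $dE[X_1^2]/dt = k_1(E[X_1^3]-E[X_1^2]) + 0.5k_1(E[X_1^2]-E[X_1]) + k_2(E[X_1]-2E[X_1^2])$.

```python
def normal_closure(m1, m2):
    """E[X^3] under a Normal assumption (third central moment = 0)."""
    return 3.0 * m2 * m1 - 2.0 * m1 ** 3


def poisson_closure(m1, m2):
    """E[X^3] under a Poisson assumption with mean m1 (m2 is not used)."""
    return m1 + 3.0 * m1 ** 2 + m1 ** 3


def lognormal_closure(m1, m2):
    """E[X^3] under a log-normal assumption (requires m1 > 0)."""
    return (m2 / m1) ** 3


def dimer_rhs(m1, m2, k1, k2, closure):
    """Moment equations for E[X1] (m1) and E[X1^2] (m2), with E[X1^3] closed."""
    m3 = closure(m1, m2)
    dm1 = 0.5 * k1 * (m2 - m1) - k2 * m1
    dm2 = k1 * (m3 - m2) + 0.5 * k1 * (m2 - m1) + k2 * (m1 - 2.0 * m2)
    return dm1, dm2


def integrate_closed(m1, m2, k1, k2, closure, t_end, steps=2000):
    """Fixed-step RK4 integration of the closed system.
    Returns (E[X1], Var[X1]) at t_end."""
    h = t_end / steps
    for _ in range(steps):
        a1, a2 = dimer_rhs(m1, m2, k1, k2, closure)
        b1, b2 = dimer_rhs(m1 + 0.5 * h * a1, m2 + 0.5 * h * a2, k1, k2, closure)
        c1, c2 = dimer_rhs(m1 + 0.5 * h * b1, m2 + 0.5 * h * b2, k1, k2, closure)
        d1, d2 = dimer_rhs(m1 + h * c1, m2 + h * c2, k1, k2, closure)
        m1 += h / 6.0 * (a1 + 2 * b1 + 2 * c1 + d1)
        m2 += h / 6.0 * (a2 + 2 * b2 + 2 * c2 + d2)
    return m1, m2 - m1 ** 2
```

Swapping the `closure` argument is the only change needed to compare the three distributional assumptions. With k1 = 0 only the k2 term remains, the closure is never used, and the result reduces to exact exponential decay of the mean, which makes a convenient sanity check.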


Heat shock model
Proctor et al., 2005. Stochastic kinetic model of the heat shock system
twenty-three reactions
seventeen chemical species
A single stochastic simulation up to t = 2000 takes about 35 minutes.
If we convert the model to moment equations, we get 139 equations
[Figure: ADP and Native Protein populations against time, t = 0 to 2000 (Gillespie, CS, 2009)]

Density plots: heat shock model

[Figure: densities of the ADP population at times t = 200 and t = 2000]

P53-Mdm2 oscillation model

Proctor and Gray, 2008
16 chemical species
Around a dozen reactions
The model contains an event: at t = 1, set X = 0
If we convert the model to moment equations, we get 139 equations.
However, in this case the moment closure approximation doesn't do too well!
[Figure: population against time, t = 0 to 30]


What went wrong?
Moment closure tends to fail when there is a large difference between the deterministic and stochastic formulations
In this particular case, the species are strongly correlated
Typically, when the moment closure (MC) approximation fails, it gives a negative variance
The MC approximation does work well for other parameter values of the p53 model
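Because a negative variance is the usual symptom of failure, one simple diagnostic is to scan the solved moment trajectories for it. A minimal sketch, assuming the mean and the second raw moment have already been computed on a time grid; the function name and the `tol` argument (to ignore round-off) are hypothetical.

```python
def first_negative_variance(times, means, second_moments, tol=0.0):
    """Return the first time at which Var = E[X^2] - E[X]^2 drops below -tol,
    or None if the moment-closure solution stays non-negative throughout."""
    for t, m1, m2 in zip(times, means, second_moments):
        if m2 - m1 * m1 < -tol:
            return t
    return None
```

A returned time is a signal to distrust the approximation from that point on, rather than a precise location of where it first becomes poor.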


Part II

Cotton aphids


Cotton aphids

Aphid infestation (G & Golightly, 2010)
A cotton aphid infestation of a cotton plant can result in:
leaves that curl and pucker
seedling plants become stunted and may die
a late season infestation can result in stained cotton
cotton aphids have developed resistance to many chemical treatments and so can be difficult to treat
Basically it costs someone a lot of money


Cotton aphids
The data consists of
five observations at each plot
the sampling times are t=0, 1.14, 2.29, 3.57 and 4.57 weeks (i.e.
every 7 to 8 days)
three blocks, each being in a distinct area
three irrigation treatments (low, medium and high)
three nitrogen levels (blanket, variable and none)


The data
[Figure: no. of aphids against time (weeks) for each plot, in panels by nitrogen level (including Zero and Variable) and irrigation level (Low, Medium, High), grouped by block]

Some notation
Let
n(t) be the size of the aphid population at time t
c(t) be the cumulative aphid population at time t
1. We observe n (t ) at discrete time points
2. We don’t observe c (t )
3. c (t ) ≥ n (t )


The model
We assume, based on previous modelling (Matis et al., 2004):
An aphid birth rate of λn(t)
An aphid death rate of µn(t)c(t)
So extinction is certain: c(t) never decreases and grows with every birth, so eventually µc(t) > λ, i.e. the death rate µn(t)c(t) exceeds the birth rate λn(t)


The model

Deterministic representation
Previous modelling efforts have focused on deterministic models:

$$\frac{dN(t)}{dt} = \lambda N(t) - \mu C(t)N(t)$$

$$\frac{dC(t)}{dt} = \lambda N(t)$$
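The deterministic system is easy to integrate numerically. Dividing the two equations gives $dN/dC = 1 - \mu C/\lambda$, so $N - C + \mu C^2/(2\lambda)$ is conserved and N peaks when C = λ/µ; the sketch uses this as a check on the integrator. It assumes fixed-step RK4 and, as an example, n(0) = c(0) = 1, λ = 1.7 and µ = 0.001; the function names are hypothetical.

```python
def aphid_rhs(n, c, lam, mu):
    """Deterministic aphid model: dN/dt = lam*N - mu*C*N and dC/dt = lam*N."""
    return lam * n - mu * c * n, lam * n


def solve_aphid_ode(n0, c0, lam, mu, t_end, steps=10000):
    """Fixed-step classical RK4; returns a list of (t, N, C) tuples."""
    h = t_end / steps
    n, c = float(n0), float(c0)
    path = [(0.0, n, c)]
    for i in range(1, steps + 1):
        kn1, kc1 = aphid_rhs(n, c, lam, mu)
        kn2, kc2 = aphid_rhs(n + 0.5 * h * kn1, c + 0.5 * h * kc1, lam, mu)
        kn3, kc3 = aphid_rhs(n + 0.5 * h * kn2, c + 0.5 * h * kc2, lam, mu)
        kn4, kc4 = aphid_rhs(n + h * kn3, c + h * kc3, lam, mu)
        n += h / 6.0 * (kn1 + 2 * kn2 + 2 * kn3 + kn4)
        c += h / 6.0 * (kc1 + 2 * kc2 + 2 * kc3 + kc4)
        path.append((i * h, n, c))
    return path


def invariant(n, c, lam, mu):
    """N - C + mu*C^2/(2*lam); constant along exact solutions
    because dN/dC = 1 - mu*C/lam."""
    return n - c + mu * c * c / (2.0 * lam)


if __name__ == "__main__":
    path = solve_aphid_ode(1, 1, 1.7, 0.001, 10.0)
    t_peak, n_peak, c_peak = max(path, key=lambda p: p[1])
    print(f"peak N = {n_peak:.1f} at t = {t_peak:.2f}, where C = {c_peak:.0f}")
```

With these values the peak is about λ/(2µ) = 850 aphids, reached when C = λ/µ = 1700.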

Some problems
Initial and final aphid populations are quite small
No allowance for ‘natural’ random variation
Solution: use a stochastic model


The model

Stochastic representation
Let $p_{n,c}(t)$ denote the probability that at time t:
there are n aphids in the population
the cumulative population size is c
This gives the forward Kolmogorov equation

$$\frac{dp_{n,c}(t)}{dt} = \lambda(n-1)p_{n-1,c-1}(t) + \mu c(n+1)p_{n+1,c}(t) - n(\lambda + \mu c)p_{n,c}(t)$$

Even though this equation is fairly simple, it cannot be solved analytically.
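Although the equation cannot be solved, the process it describes can be simulated exactly with Gillespie's direct method. A minimal sketch for this model (function name and return format are hypothetical), run for example with n(0) = c(0) = 1, λ = 1.7 and µ = 0.001:

```python
import random


def simulate_aphids(n0, c0, lam, mu, t_max, rng):
    """Gillespie direct method for the aphid model:
    birth (n, c) -> (n + 1, c + 1) at rate lam*n,
    death (n, c) -> (n - 1, c)     at rate mu*n*c.
    Stops at extinction or t_max; returns lists of times, n and c."""
    t, n, c = 0.0, n0, c0
    times, ns, cs = [t], [n], [c]
    while n > 0:
        birth, death = lam * n, mu * n * c
        total = birth + death
        if total == 0.0:                 # no reaction can fire
            break
        t += rng.expovariate(total)      # waiting time to the next event
        if t > t_max:
            break
        if rng.random() * total < birth:
            n, c = n + 1, c + 1          # a birth also increases the cumulative count
        else:
            n -= 1
        times.append(t)
        ns.append(n)
        cs.append(c)
    return times, ns, cs


if __name__ == "__main__":
    times, ns, cs = simulate_aphids(1, 1, 1.7, 0.001, 20.0, random.Random(1))
    print("peak population:", max(ns), "final cumulative population:", cs[-1])
```

Because c never decreases, every trajectory eventually reaches extinction, which the loop uses as its natural stopping condition.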


Some simulations

[Figure: stochastic realisations of the aphid population against time (days)]

Parameters: n(0) = c(0) = 1, λ = 1.7 and µ = 0.001

Stochastic parameter estimation
Let $X(t_u) = (n(t_u), c(t_u))$ be the vector of the observed aphid count and the unobserved cumulative population size at time $t_u$.
To infer λ and µ, we need

$$\Pr[X(t_u) \mid X(t_{u-1}), \lambda, \mu]$$

i.e. the solution of the forward Kolmogorov equation.
We will use moment closure to approximate this distribution


Moment equations for the means

$$\frac{dE[n(t)]}{dt} = \lambda E[n(t)] - \mu\left(E[n(t)]E[c(t)] + \text{Cov}[n(t), c(t)]\right)$$

$$\frac{dE[c(t)]}{dt} = \lambda E[n(t)]$$

The equation for E[n(t)] depends on Cov[n(t), c(t)]
Setting Cov[n(t), c(t)] = 0 gives the deterministic model
We obtain similar equations for higher-order moments
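A minimal sketch of the second-order moment equations for this model, assuming a Normal closure in which the third central moments are set to zero (consistent with the Normal approximation N(ψ, Σ) used for inference). The equations for $E[n^2]$, $E[c^2]$ and $E[nc]$ are our own derivation from the two reactions, not copied from the talk, and the function names are hypothetical. Starting from a known state, the output is the mean vector ψ and covariance matrix Σ after one observation interval.

```python
import math


def aphid_moment_rhs(s, lam, mu):
    """Time derivatives of s = (E[n], E[c], E[n^2], E[c^2], E[nc])."""
    mn, mc, snn, scc, snc = s
    vn, vc, cov = snn - mn * mn, scc - mc * mc, snc - mn * mc
    # Normal closure: third central moments are zero, so E[n^2 c] and
    # E[n c^2] are written in terms of means, variances and the covariance
    e_nnc = mn * mn * mc + 2.0 * mn * cov + mc * vn
    e_ncc = mn * mc * mc + 2.0 * mc * cov + mn * vc
    return (
        lam * mn - mu * snc,                                   # dE[n]/dt
        lam * mn,                                              # dE[c]/dt
        2 * lam * snn + lam * mn - 2 * mu * e_nnc + mu * snc,  # dE[n^2]/dt
        2 * lam * snc + lam * mn,                              # dE[c^2]/dt
        lam * (snn + snc + mn) - mu * e_ncc,                   # dE[nc]/dt
    )


def transition_moments(n0, c0, lam, mu, dt, steps=2000):
    """Integrate from the point mass (n0, c0) over an interval of length dt
    with fixed-step RK4. Returns psi = (E[n], E[c]) and the 2x2 covariance."""
    s = (float(n0), float(c0), float(n0) ** 2, float(c0) ** 2, float(n0) * c0)
    h = dt / steps

    def shift(k, scale):
        # Euler-style trial point from the current state s
        return tuple(si + scale * ki for si, ki in zip(s, k))

    for _ in range(steps):
        k1 = aphid_moment_rhs(s, lam, mu)
        k2 = aphid_moment_rhs(shift(k1, 0.5 * h), lam, mu)
        k3 = aphid_moment_rhs(shift(k2, 0.5 * h), lam, mu)
        k4 = aphid_moment_rhs(shift(k3, h), lam, mu)
        s = tuple(si + h / 6.0 * (a + 2 * b + 2 * c + d)
                  for si, a, b, c, d in zip(s, k1, k2, k3, k4))
        if not all(math.isfinite(x) for x in s):
            raise ValueError("moment equations diverged")
    mn, mc, snn, scc, snc = s
    cov = snc - mn * mc
    return (mn, mc), ((snn - mn * mn, cov), (cov, scc - mc * mc))
```

Setting µ = 0 removes every closed term, leaving a pure-birth process whose moments are known exactly, which is a useful check of the derivation.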


Parameter inference
Given
the parameters {λ, µ}
the initial states $X(t_{u-1}) = (n(t_{u-1}), c(t_{u-1}))$
we have

$$X(t_u) \mid X(t_{u-1}), \lambda, \mu \sim N(\psi_{u-1}, \Sigma_{u-1})$$

where $\psi_{u-1}$ and $\Sigma_{u-1}$ are calculated using the moment closure approximation
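Given ψ and Σ from the moment closure, each observation interval contributes a bivariate Normal log-density to the log-likelihood. A minimal sketch using the explicit 2×2 inverse (function name hypothetical). Returning −∞ for a covariance that is not positive definite is our own choice: it is a convenient way to reject parameter values at which the closure has broken down, for example by producing a negative variance.

```python
import math


def bivariate_normal_logpdf(x, psi, sigma):
    """log N(x; psi, sigma) for 2-vectors x, psi and a 2x2 covariance sigma."""
    (a, b), (b2, d) = sigma
    det = a * d - b * b2
    if a <= 0.0 or det <= 0.0:
        return -math.inf        # not positive definite, e.g. a failed closure
    dx, dy = x[0] - psi[0], x[1] - psi[1]
    # (x - psi)^T sigma^{-1} (x - psi) via the closed-form 2x2 inverse
    quad = (d * dx * dx - (b + b2) * dx * dy + a * dy * dy) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad
```

The log-likelihood of one data set is then the sum of `bivariate_normal_logpdf(x_u, psi_prev, sigma_prev)` over the observation intervals u = 1, ..., 4.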


Parameter inference
We summarise our beliefs about {λ, µ} and the unobserved cumulative population $c(t_0)$ via priors $p(\lambda, \mu)$ and $p(c(t_0))$
The joint posterior for parameters and unobserved states (for a single data set) is

$$p(\lambda, \mu, c \mid n) \propto p(\lambda, \mu)\, p(c(t_0)) \prod_{u=1}^{4} p(x(t_u) \mid x(t_{u-1}), \lambda, \mu)$$

For the results shown, we used a simple random-walk Metropolis-Hastings (MH) step to explore the parameter and state spaces
For more complicated models, we can use a Durham & Gallant-style bridge (Milner, G & Wilkinson, 2012).
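A minimal random-walk Metropolis sketch, assuming Gaussian proposals with a fixed per-coordinate step size and a user-supplied log posterior. For the aphid model that log posterior would be the sum of the Normal transition log-densities plus the log-priors, returning −∞ outside the support of the uniform priors; a full implementation would also update the latent c values. The function name and the `thin` argument are hypothetical.

```python
import math
import random


def rw_metropolis(log_post, init, step, n_iter, thin, rng):
    """Random-walk Metropolis with independent Gaussian proposals.
    `init` must have a finite log posterior. Keeps every `thin`-th state.
    Returns (samples, acceptance_rate)."""
    x = list(init)
    lp = log_post(x)
    samples, accepted = [], 0
    for it in range(1, n_iter + 1):
        prop = [xi + si * rng.gauss(0.0, 1.0) for xi, si in zip(x, step)]
        lp_prop = log_post(prop)
        # Symmetric proposal: accept with probability min(1, exp(lp_prop - lp)).
        # 1 - random() lies in (0, 1], so its log is always defined.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
            accepted += 1
        if it % thin == 0:
            samples.append(list(x))
    return samples, accepted / n_iter


if __name__ == "__main__":
    # Toy target: independent Normals with means (1, -2) and sds (1, 0.5)
    def log_post(x):
        return -0.5 * ((x[0] - 1.0) ** 2 + ((x[1] + 2.0) / 0.5) ** 2)

    samples, acc = rw_metropolis(log_post, [0.0, 0.0], [1.5, 0.75], 20000, 10, random.Random(3))
    print(len(samples), "samples, acceptance rate", round(acc, 2))
```

Rejected proposals with a log posterior of −∞ (for example outside a uniform prior's range) are handled automatically, because the comparison with a finite number is then always false.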


Simulation study
Three treatments & two blocks
Baseline birth and death rates: {λ = 1.75, µ = 0.00095}
Treatment 2 increases µ by 0.0004
Treatment 3 increases λ by 0.35
The block effect (block 2 relative to block 1) reduces µ by 0.0003

Each cell gives {λ, µ}:
           Treatment 1       Treatment 2       Treatment 3
Block 1    {1.75, 0.00095}   {1.75, 0.00135}   {2.1, 0.00095}
Block 2    {1.75, 0.00065}   {1.75, 0.00105}   {2.1, 0.00065}


Simulated data

[Figure: simulated population against time for Treatments 1, 2 and 3, with points distinguished by block (1, 2)]

Parameter structure
Let i and k index the block and treatment levels, with i ∈ {1, 2} and k ∈ {1, 2, 3}
For each data set, we assume birth rates of the form:

$$\lambda_{ik} = \lambda + \alpha_i + \beta_k$$

where $\alpha_1 = \beta_1 = 0$
So for block 1, treatment 1 we have:

$$\lambda_{11} = \lambda$$

and for block 2, treatment 1 we have:

$$\lambda_{21} = \lambda + \alpha_2$$
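The additive structure can be written as a small helper that reproduces the simulation-study table. A minimal sketch, assuming the death rates follow the same additive form with their own block and treatment effects (the slide states the form only for birth rates); all names are hypothetical.

```python
def additive_rate(base, block_effects, treatment_effects, i, k):
    """Rate for block i and treatment k (1-based): base + alpha_i + beta_k.
    The first entry of each effect list is the reference level and must be 0."""
    if block_effects[0] != 0 or treatment_effects[0] != 0:
        raise ValueError("reference levels must have zero effect")
    return base + block_effects[i - 1] + treatment_effects[k - 1]


# Effects used in the simulation study
lam_block, lam_treat = [0.0, 0.0], [0.0, 0.0, 0.35]      # treatment 3: lambda + 0.35
mu_block, mu_treat = [0.0, -0.0003], [0.0, 0.0004, 0.0]  # block 2: mu - 0.0003; treatment 2: mu + 0.0004

table = {
    (i, k): (additive_rate(1.75, lam_block, lam_treat, i, k),
             additive_rate(0.00095, mu_block, mu_treat, i, k))
    for i in (1, 2)
    for k in (1, 2, 3)
}

for (i, k), (lam, mu) in sorted(table.items()):
    print(f"block {i}, treatment {k}: lambda = {lam:.2f}, mu = {mu:.5f}")
```

Fixing the first level of each factor at zero makes λ and µ the block 1, treatment 1 rates, so every other effect is read relative to that baseline.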

MCMC scheme
Using the MCMC scheme described previously, we generated 2 million iterations and thinned by a factor of 1000
This took a few hours and convergence was fairly quick
We used independent proper uniform priors for the parameters
For the initial unobserved cumulative population, we set $c(t_0)$ equal to $n(t_0)$ plus a random term that has a Gamma distribution with shape 1 and scale 10.
This setup mirrors the scheme that we used for the real data set


Marginal posterior distributions for λ and µ
[Figure: marginal posterior densities for the birth rate (λ) and the death rate (µ)]

Marginal posterior distributions for birth rates
[Figure: marginal posterior densities of the Block 2, Treatment 2 and Treatment 3 birth-rate parameters]

We obtained similar densities for the death rates.


Application to the cotton aphid data set
Recall that the data consists of
five observations on twenty randomly chosen leaves in each plot;
three blocks, each being in a distinct area;
three irrigation treatments (low, medium and high);
three nitrogen levels (blanket, variable and none);
the sampling times are t=0, 1.14, 2.29, 3.57 and 4.57 weeks (i.e. every 7 to 8 days).
As with the simulated data, we estimate 38 parameters (including interaction terms) together with the latent cumulative aphid population.


Cotton aphid data
Marginal posterior distributions
[Figure: marginal posterior densities for the birth rate and the death rate]

Does the model fit the data?
We simulate predictive distributions from the MCMC output, i.e. we randomly sample parameter values (λ, µ) and the unobserved state c, and simulate forward
We simulate forward using the Gillespie simulator, not the moment closure approximation
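A minimal sketch of this posterior predictive check, assuming the posterior draws are available as (λ, µ, c) tuples, where c is the latent cumulative population at the first observation time, and using the sampling times 0, 1.14, 2.29, 3.57 and 4.57 weeks. Restarting the clock at each observation time is valid because the waiting times are exponential, and therefore memoryless. All names, and the draws in the usage example, are hypothetical.

```python
import random


def simulate_to_times(n0, c0, lam, mu, obs_times, rng):
    """Exact (Gillespie) simulation of the aphid model from (n0, c0) at
    obs_times[0]; returns the simulated n at each later observation time."""
    n, c, t = n0, c0, obs_times[0]
    out = []
    for t_obs in obs_times[1:]:
        while True:
            total = lam * n + mu * n * c
            if total == 0.0:                 # extinct, or no reaction possible
                break
            t_next = t + rng.expovariate(total)
            if t_next > t_obs:               # no further event before t_obs
                break
            t = t_next
            if rng.random() * total < lam * n:
                n, c = n + 1, c + 1
            else:
                n -= 1
        t = t_obs                            # memoryless: restart the clock here
        out.append(n)
    return out


def posterior_predictive(draws, n0, obs_times, rng):
    """One simulated trajectory per posterior draw (lam, mu, c0)."""
    return [simulate_to_times(n0, c0, lam, mu, obs_times, rng)
            for lam, mu, c0 in draws]


def pointwise_quantile(trajectories, q):
    """Nearest-rank empirical q-quantile at each observation time."""
    result = []
    for column in zip(*trajectories):
        ordered = sorted(column)
        result.append(ordered[min(len(ordered) - 1, int(q * len(ordered)))])
    return result


if __name__ == "__main__":
    weeks = [0.0, 1.14, 2.29, 3.57, 4.57]
    # hypothetical posterior draws (lam, mu, c at t0), for illustration only
    draws = [(1.75, 0.00095, 10), (1.70, 0.00097, 12), (1.80, 0.00093, 9)]
    sims = posterior_predictive(draws, 5, weeks, random.Random(1))
    print(pointwise_quantile(sims, 0.5))
```

Pointwise quantiles such as the 2.5% and 97.5% values across many draws give predictive bands to compare against the observed counts.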


Predictive distributions for 6 of the 27 aphid data sets
[Figure: aphid population against time (weeks) for data sets D123, D121, D131, D112, D122 and D113, showing the observations and the predictive distributions]

Summarising the results
Consider the additional number of aphids per treatment combination
Set c(0) = n(0) = 1 and $t_{max}$ = 6
We now calculate the number of aphids we would see for each parameter combination in addition to the baseline
For example, the effect due to medium water:

$$\lambda_{211} = \lambda + \alpha_{\text{Water}(M)} \quad \text{and} \quad \mu_{211} = \mu + \alpha^{*}_{\text{Water}(M)}$$

So

$$\text{Additional aphids} = c^{i}_{\text{Water}(M)} - c^{i}_{\text{baseline}}$$


Aphids over baseline
Main effects
[Figure: posterior densities of the number of aphids over baseline for the main effects Nitrogen (V), Water (H), Water (M), Block 3, Block 2 and Nitrogen (Z)]

Aphids over baseline
Interactions
[Figure: posterior densities of the number of aphids over baseline for the interactions W(H) N(Z), W(M) N(Z), W(H) N(V), W(M) N(V), B3 W(H), B2 W(H), B3 W(M), B2 W(M), B3 N(Z), B2 N(Z), B3 N(V) and B2 N(V)]

Conclusions
The 95% credible intervals for the baseline birth and death rates are
(1.64, 1.86) and (0.00090, 0.00099).
The main effects have little impact by themselves
However, block 2 appears to have a very strong interaction with nitrogen
Moment closure parameter inference is a very useful technique for
estimating parameters in stochastic population models


Future work

Aphid model
Other data sets suggest that there is aphid immigration in the early
stages
Model selection for stochastic models
Incorporate measurement error

Moment closure
Better closure techniques
Assessing the fit


Acknowledgements
Andrew Golightly, Richard Boys, Peter Milner, Darren Wilkinson, Jim Matis (Texas A & M)

References
Gillespie, CS. Moment closure approximations for mass-action models. IET Systems Biology, 2009.

Gillespie, CS and Golightly, A. Bayesian inference for generalized stochastic population growth models with application to aphids. Journal of the Royal Statistical Society, Series C, 2010.

Milner, P, Gillespie, CS and Wilkinson, DJ. Moment closure approximations for stochastic kinetic models with rational rate laws. Mathematical Biosciences, 2011.

Milner, P, Gillespie, CS and Wilkinson, DJ. Moment closure based parameter inference of stochastic kinetic models. Statistics and Computing, 2012.
