ABC random forests for Bayesian testing and
parameter inference
Christian P. Robert
Université Paris-Dauphine, Paris & University of Warwick, Coventry
Joint work with A. Estoup, J.M. Marin, P. Pudlo, L. Raynal, & M. Ribatet
Outline
Approximate Bayesian computation
ABC for model choice
ABC model choice via random forests
ABC estimation via random forests
Approximate Bayesian computation
Approximate Bayesian computation
ABC basics
Exact ABC simulation of
approximate targets
Automated summary selection
ABC for model choice
ABC model choice via random forests
ABC estimation via random forests
Intractable likelihoods
Cases when the likelihood function f(y|θ) is unavailable and when the completion step
f(y|θ) = ∫_Z f(y, z|θ) dz
is impossible or too costly because of the dimension of z
© MCMC cannot be implemented
The ABC method
Bayesian setting: target is π(θ)f (x|θ)
When likelihood f (x|θ) not in closed form, likelihood-free rejection
technique:
ABC algorithm
For an observation y ∼ f(y|θ), under the prior π(θ), keep jointly simulating
θ′ ∼ π(θ) , z ∼ f(z|θ′) ,
until the auxiliary variable z is equal to the observed value, z = y.
[Tavaré et al., 1997]
A as A...pproximative
When y is a continuous random variable, equality z = y is replaced with a tolerance condition,
ρ(y, z) ≤ ε
where ρ is a distance
Output distributed from
π(θ) Pθ{ρ(y, z) < ε} ∝ π(θ | ρ(y, z) < ε)
[Pritchard et al., 1999]
ABC algorithm
Algorithm 1 Likelihood-free rejection sampler
for i = 1 to N do
repeat
generate θ′ from the prior distribution π(·)
generate z from the likelihood f(·|θ′)
until ρ{η(z), η(y)} ≤ ε
set θi = θ′
end for
where η(y) defines a (not necessarily sufficient) statistic
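A minimal sketch of this rejection sampler, in Python for illustration (the algorithm itself is language-agnostic); `sample_prior`, `simulate`, `eta` and `dist` are hypothetical user-supplied ingredients, not part of the original slides:

```python
import numpy as np

def abc_rejection(y_obs, sample_prior, simulate, eta, dist, eps, N):
    """Likelihood-free rejection sampler: keep theta' whose simulated
    summaries fall within eps of the observed summaries."""
    s_obs = eta(y_obs)
    accepted = []
    while len(accepted) < N:
        theta = sample_prior()             # theta' ~ pi(.)
        z = simulate(theta)                # z ~ f(.|theta')
        if dist(eta(z), s_obs) <= eps:     # rho{eta(z), eta(y)} <= eps
            accepted.append(theta)
    return np.array(accepted)
```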
Output
The likelihood-free algorithm samples from the marginal in z of:
πε(θ, z|y) = π(θ) f(z|θ) I_{A_{ε,y}}(z) / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ ,
where A_{ε,y} = {z ∈ D | ρ(η(z), η(y)) < ε}.
The idea behind ABC is that the summary statistics coupled with a small tolerance should provide a good approximation of the posterior distribution:
πε(θ|y) = ∫ πε(θ, z|y) dz ≈ π(θ|η(y)) .
MA example
MA(q) model
xt = εt + Σ_{i=1}^q ϑi εt−i
Simple prior: uniform over the inverse [real and complex] roots in
Q(u) = 1 − Σ_{i=1}^q ϑi u^i
under the identifiability conditions
MA example
MA(q) model
xt = εt + Σ_{i=1}^q ϑi εt−i
Simple prior: uniform prior over the identifiability zone, e.g. triangle for MA(2)
MA example (2)
ABC algorithm thus made of
1. picking a new value (ϑ1, ϑ2) in the triangle
2. generating an iid sequence (εt)−q<t≤T
3. producing a simulated series (x′t)1≤t≤T
Distance: basic distance between the series
ρ((x′t)1≤t≤T, (xt)1≤t≤T) = Σ_{t=1}^T (xt − x′t)²
or distance between summary statistics like the q autocorrelations
τj = Σ_{t=j+1}^T xt xt−j
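A possible rendering of these ingredients in Python (names and the box used to rejection-sample the triangle are illustrative; sign conventions for the MA(2) polynomial vary across references):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ma2_prior():
    # uniform over the identifiability triangle, obtained by rejection:
    # keep (t1, t2) whose MA(2) polynomial has all roots outside the unit circle
    while True:
        t1, t2 = rng.uniform(-2, 2), rng.uniform(-1, 1)
        if np.all(np.abs(np.roots([t2, t1, 1.0])) > 1.0):
            return t1, t2

def simulate_ma2(t1, t2, T=100):
    eps = rng.standard_normal(T + 2)                   # iid sequence (eps_t), -q < t <= T
    return eps[2:] + t1 * eps[1:-1] + t2 * eps[:-2]    # x_t = eps_t + t1 eps_{t-1} + t2 eps_{t-2}

def tau(x, q=2):
    # summary statistics: tau_j = sum_{t=j+1}^T x_t x_{t-j}
    return np.array([np.sum(x[j:] * x[:-j]) for j in range(1, q + 1)])

def dist_raw(x_sim, x_obs):        # basic distance between the series
    return np.sum((x_obs - x_sim) ** 2)

def dist_summary(x_sim, x_obs):    # distance between summary statistics
    return np.sum((tau(x_obs) - tau(x_sim)) ** 2)
```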
Comparison of distance impact
Evaluation of the tolerance on the ABC sample against both distances (ε = 100%, 10%, 1%, 0.1%) for an MA(2) model
[Figure: ABC output for θ1 and θ2 at the four tolerance levels, under both distances]
ABC advances
Simulating from the prior is often poor in efficiency
Either modify the proposal distribution on θ to increase the density of x's within the vicinity of y...
[Marjoram et al., 2003; Bortot et al., 2007; Beaumont et al., 2009]
...or view the problem as conditional density estimation and develop techniques to allow for larger ε
[Beaumont et al., 2002; Blum & François, 2009]
...or even include ε in the inferential framework [ABCµ]
[Ratmann et al., 2009]
ABC consistency
Recent studies on large sample properties of ABC posterior
distributions and ABC posterior means
[Li & Fearnhead, 2016; Frazier et al., 2016]
Under regularity conditions on summary statistics,
incl. convergence at speed dT , characterisation of rate of posterior
concentration as a function of tolerance convergence
less stringent condition on tolerance decrease than for
asymptotic normality of posterior;
asymptotic normality of posterior mean does not require
asymptotic normality of posterior itself
Cases for limiting ABC distributions
1. dT εT −→ +∞;
2. dT εT −→ c;
3. dT εT −→ 0
and limiting ABC mean convergent for εT² = o(1/dT)
[Frazier et al., 2016]
Wilkinson’s exact BC
ABC approximation error (i.e. non-zero tolerance) replaced with exact simulation from a controlled approximation to the target, convolution of true posterior with kernel function
πε(θ, z|y) = π(θ) f(z|θ) Kε(y − z) / ∫ π(θ) f(z|θ) Kε(y − z) dz dθ ,
with Kε kernel parameterised by bandwidth ε.
[Wilkinson, 2008]
Theorem
The ABC algorithm based on the assumption of a randomised observation y = ỹ + ξ, ξ ∼ Kε, and an acceptance probability of
Kε(y − z)/M
gives draws from the posterior distribution π(θ|y).
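A minimal sketch of the corresponding acceptance step, assuming a Gaussian kernel Kε so that the bound M is available in closed form (all names are illustrative):

```python
import numpy as np

def abc_kernel_accept(y_obs, sample_prior, simulate, eps, N, rng=np.random.default_rng()):
    """Accept (theta, z) with probability K_eps(y - z) / M; with an
    unnormalised Gaussian kernel the bound is M = 1."""
    y_obs = np.asarray(y_obs, dtype=float)
    out = []
    while len(out) < N:
        theta = sample_prior()
        z = np.asarray(simulate(theta), dtype=float)
        accept_prob = np.exp(-0.5 * np.sum((y_obs - z) ** 2) / eps ** 2)
        if rng.uniform() < accept_prob:
            out.append(theta)
    return np.array(out)
```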
How exact a BC?
“Using ε to represent measurement error is straightforward, whereas using ε to model the model discrepancy is harder to conceptualize and not as commonly used”
[Richard Wilkinson, 2008, 2013]
How exact a BC?
Pros
Pseudo-data from true model and observed data from noisy
model
Interesting perspective in that outcome is completely
controlled
Link with ABCµ and assuming y is observed with a measurement error with density Kε
Relates to the theory of model approximation
[Kennedy & O’Hagan, 2001]
Cons
Requires Kε to be bounded by M
True approximation error never assessed
Requires a modification of the standard ABC algorithm
Noisy ABC
Idea: Modify the data from the start
ỹ = y⁰ + εζ1
with the same scale as ABC
[ see Fearnhead-Prangle ]
run ABC on ỹ
Then ABC produces an exact simulation from πε(θ|ỹ) = π(θ|ỹ)
[Dean et al., 2011; Fearnhead and Prangle, 2012]
Consistent noisy ABC
Degrading the data improves the estimation performances:
Noisy ABC-MLE is asymptotically (in n) consistent
under further assumptions, the noisy ABC-MLE is
asymptotically normal
increase in variance of order −2
likely degradation in precision or computing time due to the
lack of summary statistic [curse of dimensionality]
Semi-automatic ABC
Fearnhead and Prangle (2010) study ABC and the selection of the
summary statistic in close proximity to Wilkinson’s proposal
ABC then considered from a purely inferential viewpoint and
calibrated for estimation purposes
Use of a randomised (or ‘noisy’) version of the summary statistics
η̃(y) = η(y) + τε
Derivation of a well-calibrated version of ABC, i.e. an algorithm
that gives proper predictions for the distribution associated with
this randomised summary statistic [calibration constraint: ABC
approximation with same posterior mean as the true randomised
posterior]
Optimality of the posterior expectation E[θ|y] of the parameter of
interest as summary statistics η(y)!
ABC for model choice
Approximate Bayesian computation
ABC for model choice
ABC model choice via random forests
ABC estimation via random forests
Bayesian model choice
Several models M1, M2, . . . are considered simultaneously for a
dataset y and the model index M is part of the inference.
Use of a prior distribution π(M = m), plus a prior distribution on
the parameter conditional on the value m of the model index,
πm(θm)
Goal is to derive the posterior distribution of M, challenging
computational target when models are complex.
Generic ABC for model choice
Algorithm 2 Likelihood-free model choice sampler (ABC-MC)
for t = 1 to T do
repeat
Generate m from the prior π(M = m)
Generate θm from the prior πm(θm)
Generate z from the model fm(z|θm)
until ρ{η(z), η(y)} < ε
Set m(t) = m and θ(t) = θm
end for
[Cornuet et al., DIYABC, 2009]
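A sketch of Algorithm 2 in Python (the implementation referred to on the slides is DIYABC; `models` is a hypothetical list of (prior sampler, simulator) pairs and `prior_m` the prior model weights):

```python
import numpy as np

def abc_model_choice(y_obs, models, prior_m, eta, dist, eps, T, rng=np.random.default_rng()):
    s_obs = eta(y_obs)
    m_out, theta_out = [], []
    for _ in range(T):
        while True:
            m = rng.choice(len(models), p=prior_m)      # m ~ pi(M = m)
            prior_sampler, simulator = models[m]
            theta = prior_sampler()                      # theta_m ~ pi_m(theta_m)
            z = simulator(theta)                         # z ~ f_m(z | theta_m)
            if dist(eta(z), s_obs) < eps:
                break
        m_out.append(m)
        theta_out.append(theta)
    return np.array(m_out), theta_out

# pi(M = m | y) is then approximated by np.mean(m_out == m)
```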
ABC estimates
Posterior probability π(M = m|y) approximated by the frequency
of acceptances from model m
(1/T) Σ_{t=1}^T I_{m(t) = m} .
Issues with implementation:
should tolerances be the same for all models?
should summary statistics vary across models (incl. their
dimension)?
should the distance measure ρ vary as well?
Back to sufficiency
‘Sufficient statistics for individual models are unlikely to
be very informative for the model probability.’
[Scott Sisson, Jan. 31, 2011, X.’Og]
If η1(x) sufficient statistic for model m = 1 and parameter θ1 and
η2(x) sufficient statistic for model m = 2 and parameter θ2,
(η1(x), η2(x)) is not always sufficient for (m, θm)
© Potential loss of information at the testing level
Limiting behaviour of B12 (T → ∞)
ABC approximation
B12(y) = Σ_{t=1}^T I_{mt=1} I_{ρ{η(zt),η(y)}≤ε} / Σ_{t=1}^T I_{mt=2} I_{ρ{η(zt),η(y)}≤ε} ,
where the (mt, zt)'s are simulated from the (joint) prior
As T goes to infinity, limit
B12(y) = ∫ I_{ρ{η(z),η(y)}≤ε} π1(θ1) f1(z|θ1) dz dθ1 / ∫ I_{ρ{η(z),η(y)}≤ε} π2(θ2) f2(z|θ2) dz dθ2
= ∫ I_{ρ{η,η(y)}≤ε} π1(θ1) f1^η(η|θ1) dη dθ1 / ∫ I_{ρ{η,η(y)}≤ε} π2(θ2) f2^η(η|θ2) dη dθ2 ,
where f1^η(η|θ1) and f2^η(η|θ2) are the distributions of η(z)
Limiting behaviour of B12 (ε → 0)
When ε goes to zero,
B12^η(y) = ∫ π1(θ1) f1^η(η(y)|θ1) dθ1 / ∫ π2(θ2) f2^η(η(y)|θ2) dθ2 ,
© Bayes factor based on the sole observation of η(y)
Limiting behaviour of B12 (under sufficiency)
If η(y) sufficient statistic for both models,
fi(y|θi) = gi(y) fi^η(η(y)|θi)
Thus
B12(y) = ∫_{Θ1} π(θ1) g1(y) f1^η(η(y)|θ1) dθ1 / ∫_{Θ2} π(θ2) g2(y) f2^η(η(y)|θ2) dθ2
= g1(y) ∫ π1(θ1) f1^η(η(y)|θ1) dθ1 / g2(y) ∫ π2(θ2) f2^η(η(y)|θ2) dθ2
= [g1(y)/g2(y)] B12^η(y) .
[Didelot, Everitt, Johansen & Lawson, 2011]
© No discrepancy only when cross-model sufficiency
Poisson/geometric example
Sample
x = (x1, . . . , xn)
from either a Poisson P(λ) or from a geometric G(p). Then
S = Σ_{i=1}^n xi = η(x)
sufficient statistic for either model but not simultaneously
Discrepancy ratio
g1(x)/g2(x) = [S! n^{−S} / ∏_i xi!] / [1 / C(n+S−1, S)]
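A quick numeric check of this ratio on a made-up sample, using the closed forms above (the conditional distribution of x given S is multinomial under the Poisson model and uniform over compositions under the geometric model):

```python
from math import comb, factorial, prod
import numpy as np

x = np.array([1, 0, 2, 1, 3])                    # illustrative sample
n, S = len(x), int(x.sum())
g1 = factorial(S) * n ** (-S) / prod(factorial(int(k)) for k in x)
g2 = 1 / comb(n + S - 1, S)
print(g1 / g2)       # factor separating B12(x) from the eta-based Bayes factor
```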
Poisson/geometric discrepancy
Range of B12(x) versus B12^η(x): The values produced have nothing in common.
Formal recovery
Creating an encompassing exponential family
f(x|θ1, θ2, α1, α2) ∝ exp{θ1^T η1(x) + θ2^T η2(x) + α1 t1(x) + α2 t2(x)}
leads to a sufficient statistic (η1(x), η2(x), t1(x), t2(x))
[Didelot, Everitt, Johansen & Lawson, 2011]
In the Poisson/geometric case, if ∏_i xi! is added to S, no discrepancy
Only applies in genuine sufficiency settings...
© Inability to evaluate loss brought by summary statistics
MA(q) divergence
Evolution [against ε] of ABC Bayes factor, in terms of frequencies of visits to models MA(1) (left) and MA(2) (right) when ε equal to 10, 1, .1, .01% quantiles on insufficient autocovariance distances. Sample of 50 points from an MA(2) with θ1 = 0.6, θ2 = 0.2. True Bayes factor equal to 17.71.
MA(q) divergence
Evolution [against ε] of ABC Bayes factor, in terms of frequencies of visits to models MA(1) (left) and MA(2) (right) when ε equal to 10, 1, .1, .01% quantiles on insufficient autocovariance distances. Sample of 50 points from an MA(1) model with θ1 = 0.6. True Bayes factor B21 equal to .004.
A stylised problem
Central question to the validation of ABC for model choice:
When is a Bayes factor based on an insufficient statistic T(y)
consistent?
Note/warning: conclusions drawn on T(y) through B12^T(y) necessarily differ from conclusions drawn on y through B12(y)
[Marin, Pillai, X, & Rousseau, JRSS B, 2013]
A benchmark, if toy, example
Comparison suggested by referee of PNAS paper [thanks!]:
[X, Cornuet, Marin, & Pillai, Aug. 2011]
Model M1: y ∼ N(θ1, 1) opposed to model M2: y ∼ L(θ2, 1/√2), Laplace distribution with mean θ2 and scale parameter 1/√2 (variance one).
Four possible statistics
1. sample mean y (sufficient for M1 if not M2);
2. sample median med(y) (insufficient);
3. sample variance var(y) (ancillary);
4. median absolute deviation mad(y) = med(|y − med(y)|);
[Figure: boxplots under the Gauss and Laplace models, n = 100]
Framework
Starting from sample
y = (y1, . . . , yn)
the observed sample, not necessarily iid, with true distribution
y ∼ P^n
Summary statistics
T(y) = T^n = (T1(y), T2(y), · · · , Td(y)) ∈ R^d
with true distribution T^n ∼ G^n.
Framework
© Comparison of
– under M1, y ∼ F1,n(·|θ1) where θ1 ∈ Θ1 ⊂ R^{p1}
– under M2, y ∼ F2,n(·|θ2) where θ2 ∈ Θ2 ⊂ R^{p2}
turned into
– under M1, T(y) ∼ G1,n(·|θ1), and θ1|T(y) ∼ π1(·|T^n)
– under M2, T(y) ∼ G2,n(·|θ2), and θ2|T(y) ∼ π2(·|T^n)
Assumptions
A collection of asymptotic “standard” assumptions:
[A1] is a standard central limit theorem under the true model with
asymptotic mean µ0
[A2] controls the large deviations of the estimator Tn
from the
model mean µ(θ)
[A3] is the standard prior mass condition found in Bayesian
asymptotics (di effective dimension of the parameter)
[A4] restricts the behaviour of the model density against the true
density
[Think CLT!]
Asymptotic marginals
Asymptotically, under [A1]–[A4]
mi(t) = ∫_{Θi} gi(t|θi) πi(θi) dθi
is such that
(i) if inf{|µi(θi) − µ0|; θi ∈ Θi} = 0,
Cl v_n^{d−di} ≤ mi(T^n) ≤ Cu v_n^{d−di}
and
(ii) if inf{|µi(θi) − µ0|; θi ∈ Θi} > 0
mi(T^n) = oPn[v_n^{d−τi} + v_n^{d−αi}].
Between-model consistency
Consequence of above is that asymptotic behaviour of the Bayes factor is driven by the asymptotic mean value µ(θ) of T^n under both models. And only by this mean value!
Indeed, if
inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} = inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0
then
Cl v_n^{−(d1−d2)} ≤ m1(T^n)/m2(T^n) ≤ Cu v_n^{−(d1−d2)} ,
where Cl, Cu = OPn(1), irrespective of the true model.
© Only depends on the difference d1 − d2: no consistency
Between-model consistency
Else, if
inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} > inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0
then
m1(T^n)/m2(T^n) ≥ Cu min( v_n^{−(d1−α2)}, v_n^{−(d1−τ2)} )
Checking for adequate statistics
Run a practical check of the relevance (or non-relevance) of T^n
null hypothesis that both models are compatible with the statistic T^n
H0 : inf{|µ2(θ2) − µ0|; θ2 ∈ Θ2} = 0
against
H1 : inf{|µ2(θ2) − µ0|; θ2 ∈ Θ2} > 0
testing procedure provides estimates of mean of T^n under each model and checks for equality
Checking in practice
Under each model Mi, generate ABC sample θi,l, l = 1, · · · , L
For each θi,l, generate yi,l ∼ Fi,n(·|θi,l), derive T^n(yi,l) and compute
µ̂i = (1/L) Σ_{l=1}^L T^n(yi,l), i = 1, 2 .
Conditionally on T^n(y),
√L { µ̂i − E^π[µi(θi)|T^n(y)] } ⇝ N(0, Vi),
Test for a common mean
H0 : µ̂1 ∼ N(µ0, V1) , µ̂2 ∼ N(µ0, V2)
against the alternative of different means
H1 : µ̂i ∼ N(µi, Vi), with µ1 ≠ µ2 .
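A sketch of this check in Python, under the simplifying assumption of a Wald-type comparison of the two estimated means; `abc_samples[i]` holds accepted parameters for model i and `simulators[i]` draws a pseudo-sample from Fi,n (both names are placeholders):

```python
import numpy as np
from scipy import stats

def common_mean_check(abc_samples, simulators, T_stat):
    """Compare the estimated means of T^n under the two models."""
    mu_hat, V_hat = [], []
    for thetas, simulate in zip(abc_samples, simulators):
        Ts = np.array([np.atleast_1d(T_stat(simulate(th))) for th in thetas])  # (L, d)
        mu_hat.append(Ts.mean(axis=0))                                # mu_hat_i
        V_hat.append(np.atleast_2d(np.cov(Ts, rowvar=False)) / len(Ts))
    diff = mu_hat[0] - mu_hat[1]
    W = float(diff @ np.linalg.inv(V_hat[0] + V_hat[1]) @ diff)       # Wald statistic
    return W, float(stats.chi2.sf(W, df=diff.size))                   # p-value under H0
```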
Toy example: Laplace versus Gauss
Normalised χ2 without and with mad
ABC model choice via random forests
Approximate Bayesian computation
ABC for model choice
ABC model choice via random forests
Random forests
ABC with random forests
Illustrations
ABC estimation via random forests
Leaning towards machine learning
Main notions:
ABC-MC seen as learning about which model is most
appropriate from a huge (reference) table
exploiting a large number of summary statistics not an issue
for machine learning methods intended to estimate efficient
combinations
abandoning (temporarily?) the idea of estimating posterior
probabilities of the models, poorly approximated by machine
learning methods, and replacing those by posterior predictive
expected loss
[Cornuet et al., 2016]
Random forests
Technique that stemmed from Leo Breiman’s bagging (or
bootstrap aggregating) machine learning algorithm for both
classification and regression
[Breiman, 1996]
Improved classification performances by averaging over
classification schemes of randomly generated training sets, creating
a “forest” of (CART) decision trees, inspired by Amit and Geman
(1997) ensemble learning
[Breiman, 2001]
Growing the forest
Breiman’s solution for inducing random features in the trees of the
forest:
boostrap resampling of the dataset and
random subset-ing [of size
√
t] of the covariates driving the
classification at every node of each tree
Covariate xτ that drives the node separation
xτ cτ
and the separation bound cτ chosen by minimising entropy or Gini
index
Breiman and Cutler’s algorithm
Algorithm 3 Random forests
for t = 1 to T do
//*T is the number of trees*//
Draw a bootstrap sample of size nboot
Grow an unpruned decision tree
for b = 1 to B do
//*B is the number of nodes*//
Select ntry of the predictors at random
Determine the best split from among those predictors
end for
end for
Predict new data by aggregating the predictions of the T trees
[© Tae-Kyun Kim & Björn Stenger, 2009]
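The same recipe rendered with scikit-learn, purely for illustration (the slides rely on the R package randomForest):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
rf = RandomForestClassifier(
    n_estimators=500,        # T trees, each grown on a bootstrap sample
    max_features="sqrt",     # ntry predictors selected at random at each node
    criterion="gini",        # best split chosen by minimising the Gini index
    bootstrap=True,
    random_state=0,
).fit(X, y)
print(rf.predict(X[:5]))     # aggregated (majority-vote) predictions
```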
Subsampling
Due to both large datasets [practical] and theoretical recommendation from Gérard Biau [private communication], from independence between trees to convergence issues, bootstrap sample of much smaller size than original data size
N = o(n)
Each CART tree stops when number of observations per node is 1: no culling of the branches
ABC with random forests
Idea: Starting with
possibly large collection of summary statistics (s1i , . . . , spi) (from scientific theory input to available statistical software, to machine-learning alternatives)
ABC reference table involving model index, parameter values and summary statistics for the associated simulated pseudo-data
run R randomForest to infer M from (s1i , . . . , spi)
at each step O(√p) indices sampled at random and most discriminating statistic selected, by minimising entropy or Gini loss
Average of the trees is resulting summary statistics, highly non-linear predictor of the model index
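A sketch of the resulting ABC-RF step on a reference table, with scikit-learn standing in for the R randomForest package and placeholder summaries replacing a real simulation study:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
N, p = 10_000, 30
m = rng.integers(0, 2, size=N)                          # model indices in the reference table
S = rng.standard_normal((N, p)) + 0.3 * m[:, None]      # placeholder summary statistics

forest = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                                random_state=1).fit(S, m)

s_obs = rng.standard_normal(p)                          # summaries of the observed data
print(forest.predict(s_obs[None, :])[0])                # predicted (MAP-like) model index
votes = np.mean([tree.predict(s_obs[None, :])[0] for tree in forest.estimators_])
print(votes)    # fraction of trees picking model 1 -- not a posterior probability
```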
Outcome of ABC-RF
Random forest predicts a (MAP) model index, from the observed
dataset: The predictor provided by the forest is “sufficient” to
select the most likely model but not to derive associated posterior
probability
exploit entire forest by computing how many trees lead to
picking each of the models under comparison but variability
too high to be trusted
frequency of trees associated with majority model is no proper
substitute to the true posterior probability
And usual ABC-MC approximation equally highly variable and
hard to assess
Posterior predictive expected losses
We suggest replacing unstable approximation of
P(M = m | x^o)
with x^o observed sample and m model index, by average of the selection errors across all models given the data x^o,
P( M̂(X) ≠ M | x^o )
where pair (M, X) generated from the predictive
∫ f(x|θ) π(θ, M | x^o) dθ
and M̂(x) denotes the random forest model (MAP) predictor
Posterior predictive expected losses
Arguments:
Bayesian estimate of the posterior error
integrates error over most likely part of the parameter space
gives an averaged error rather than the posterior probability of
the null hypothesis
easily computed: given the ABC subsample of parameters from the reference table, simulate pseudo-samples associated with those and derive the error frequency (as sketched below)
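A sketch of that computation, reusing the (hypothetical) `forest`, `simulators` and `eta` of the previous sketches:

```python
import numpy as np

def posterior_predictive_error(forest, accepted_m, accepted_theta, simulators, eta):
    """Estimate P(M_hat(X) != M | x_obs) from the ABC subsample closest to x_obs."""
    errors = []
    for m, theta in zip(accepted_m, accepted_theta):
        z = simulators[m](theta)                              # X ~ f(.|theta)
        m_hat = forest.predict(np.asarray(eta(z))[None, :])[0]
        errors.append(m_hat != m)
    return float(np.mean(errors))
```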
toy: MA(1) vs. MA(2)
Comparing an MA(1) and an MA(2) models:
xt = εt − ϑ1 εt−1 [−ϑ2 εt−2]
Earlier illustration using first two autocorrelations as S(x)
[Marin et al., Stat. & Comp., 2011]
Result #1: values of p(m|x) [obtained by numerical integration]
and p(m|S(x)) [obtained by mixing ABC outcome and density
estimation] highly differ!
toy: MA(1) vs. MA(2)
Difference between the posterior probability of MA(2) given either
x or S(x). Blue stands for data from MA(1), orange for data from
MA(2)
toy: MA(1) vs. MA(2)
Comparing an MA(1) and an MA(2) models:
xt = εt − ϑ1 εt−1 [−ϑ2 εt−2]
Earlier illustration using two autocorrelations as S(x)
[Marin et al., Stat. & Comp., 2011]
Result #2: Embedded models, with simulations from MA(1)
within those from MA(2), hence linear classification poor
toy: MA(1) vs. MA(2)
Simulations of S(x) under MA(1) (blue) and MA(2) (orange)
toy: MA(1) vs. MA(2)
Comparing an MA(1) and an MA(2) models:
xt = εt − ϑ1 εt−1 [−ϑ2 εt−2]
Earlier illustration using two autocorrelations as S(x)
[Marin et al., Stat. & Comp., 2011]
Result #3: On such a small dimension problem, random forests
should come second to k-nn or kernel discriminant analyses
toy: MA(1) vs. MA(2)
classification method          prior error rate (in %)
LDA                            27.43
Logist. reg.                   28.34
SVM (library e1071)            17.17
“naïve” Bayes (with G marg.)   19.52
“naïve” Bayes (with NP marg.)  18.25
ABC k-nn (k = 100)             17.23
ABC k-nn (k = 50)              16.97
Local log. reg. (k = 1000)     16.82
Random Forest                  17.04
Kernel disc. ana. (KDA)        16.95
True MAP                       12.36
Comments
unlimited aggregation of arbitrary summary statistics
recovery of discriminant statistics when available
automated implementation with reduced calibration
self-evaluation by posterior predictive error
soon to be included within DIYABC
ABC estimation via random forests
Approximate Bayesian computation
ABC for model choice
ABC model choice via random forests
ABC estimation via random forests
Two basic issues with ABC
ABC compares numerous simulated datasets to the observed one
Two major difficulties:
to decrease the approximation error (or tolerance ε) and hence ensure reliability of ABC, the total number of simulations must be very large;
calibration of ABC (tolerance, distance, summary statistics, post-processing, &tc) critical and hard to automatise
classification of summaries by random forests
Given a large collection of summary statistics, rather than selecting
a subset and excluding the others, estimate each parameter of
interest by a machine learning tool like random forests
RF can handle thousands of predictors
ignore useless components
fast estimation method with good local properties
automatised method with few calibration steps
substitute to Fearnhead and Prangle (2012) preliminary
estimation of ^θ(yobs)
includes a natural (classification) distance measure that avoids
choice of both distance and tolerance
[Marin et al., 2016]
random forests as non-parametric regression
CART means Classification and Regression Trees
For regression purposes, i.e., to predict y as f (x), similar binary
trees in random forests
1. at each tree node, split data into two daughter nodes
2. split variable and bound chosen to minimise heterogeneity
criterion
3. stop splitting when enough homogeneity in current branch
4. predicted values at terminal nodes (or leaves) are average
response variable y for all observations in final leaf
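The regression counterpart in scikit-learn, again for illustration only:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=(1000, 1))
y = np.sin(x[:, 0]) + 0.3 * rng.standard_normal(1000)   # noisy conditional expectation

reg = RandomForestRegressor(n_estimators=200, min_samples_leaf=5,
                            random_state=2).fit(x, y)
print(reg.predict([[2.5], [7.0]]))   # averages of the leaf means over the 200 trees
```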
Illustration
conditional expectation f (x) and well-specified dataset
Illustration
single regression tree
Illustration
ten regression trees obtained by bagging (Bootstrap AGGregatING)
Illustration
average of 100 regression trees
bagging reduces learning variance
When growing forest with many trees,
grow each tree on an independent bootstrap sample
at each node, select m variables at random out of all M
possible variables
Find the best dichotomous split on the selected m variables
predictor function estimated by averaging trees
Improve on CART with respect to accuracy and stability
prediction error
A given simulation (ysim, xsim) in the training table is not used in
about 1/3 of the trees (“out-of-bag” case)
Average predictions ^Foob(xsim) of these trees to give out-of-bag
predictor of ysim
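With scikit-learn, an out-of-bag predictor of this kind is available directly when the forest is fitted with oob_score=True (illustrative data only):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=(1000, 1))
y = np.sin(x[:, 0]) + 0.3 * rng.standard_normal(1000)

reg = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=3).fit(x, y)
print(reg.oob_prediction_[:5])   # F_oob(x_sim): average over trees not trained on x_sim
print(reg.oob_score_)            # companion out-of-bag R^2
```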
Related methods
adjusted local linear: Beaumont et al. (2002) Approximate Bayesian
computation in population genetics, Genetics
ridge regression: Blum et al. (2013) A Comparative Review of
Dimension Reduction Methods in Approximate Bayesian Computation,
Statistical Science
linear discriminant analysis: Estoup et al. (2012) Estimation of
demo-genetic model probabilities with Approximate Bayesian
Computation using linear discriminant analysis on summary statistics,
Molecular Ecology Resources
adjusted neural networks: Blum and François (2010) Non-linear
regression models for Approximate Bayesian Computation, Statistics and
Computing
ABC parameter estimation (ODOF)
One dimension = one forest (ODOF) methodology
parametric statistical model:
{f(y; θ): y ∈ Y, θ ∈ Θ}, Y ⊆ R^n, Θ ⊆ R^p
with intractable density f(·; θ)
plus prior distribution π(θ)
Inference on quantity of interest
ψ(θ) ∈ R
(posterior means, variances, quantiles or covariances)
common reference table
Given η: Y → R^k a collection of summary statistics
produce reference table (RT) used as learning dataset for multiple random forests
meaning, for 1 ≤ t ≤ N
1. simulate θ^(t) ∼ π(θ)
2. simulate ỹt = (ỹ1,t, . . . , ỹn,t) ∼ f(y; θ^(t))
3. compute η(ỹt) = {η1(ỹt), . . . , ηk(ỹt)}
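A minimal sketch of steps 1–3, with `sample_prior`, `simulate` and `summaries` standing for the model-specific ingredients:

```python
import numpy as np

def build_reference_table(sample_prior, simulate, summaries, N):
    thetas, etas = [], []
    for _ in range(N):
        theta = sample_prior()                 # 1. theta^(t) ~ pi(theta)
        y_sim = simulate(theta)                # 2. y~_t ~ f(y; theta^(t))
        etas.append(summaries(y_sim))          # 3. eta(y~_t) = (eta_1, ..., eta_k)
        thetas.append(np.atleast_1d(theta))
    return np.array(thetas), np.array(etas)    # (N, p) parameters, (N, k) summaries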
ABC posterior expectations
Recall that θ = (θ1, . . . , θd) ∈ R^d
For each θj, construct a separate RF regression with predictor variables equal to the summary statistics η(y) = {η1(y), . . . , ηk(y)}
If Lb(η(y∗)) denotes the leaf index of the b-th tree associated with η(y∗) —leaf reached through the path of binary choices in the tree—, with |Lb| response variables,
Ê(θj | η(y∗)) = (1/B) Σ_{b=1}^B [ 1/|Lb(η(y∗))| ] Σ_{t: η(yt) ∈ Lb(η(y∗))} θj^(t)
is our ABC estimate
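A sketch of this estimate with a fitted scikit-learn RandomForestRegressor `forest` (trained on the reference-table summaries `etas` with responses `theta_j`); forest.apply returns the leaf reached by each input in each tree:

```python
import numpy as np

def rf_posterior_mean(forest, etas, theta_j, eta_obs):
    train_leaves = forest.apply(etas)                 # (N, B) leaf index of each eta(y_t)
    obs_leaves = forest.apply(eta_obs[None, :])[0]    # (B,) leaf index L_b(eta(y*))
    means = []
    for b in range(train_leaves.shape[1]):
        in_leaf = train_leaves[:, b] == obs_leaves[b]
        means.append(theta_j[in_leaf].mean())         # average of theta_j^(t) in L_b(eta(y*))
    return float(np.mean(means))                      # average over the B trees
```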
ABC posterior quantile estimate
Random forests also available for quantile regression
[Meinshausen, 2006, JMLR]
Since
Ê(θj | η(y∗)) = Σ_{t=1}^N wt(η(y∗)) θj^(t)
with
wt(η(y∗)) = (1/B) Σ_{b=1}^B I_{Lb(η(y∗))}(η(yt)) / |Lb(η(y∗))|
natural estimate of the cdf of θj is
F̂(u | η(y∗)) = Σ_{t=1}^N wt(η(y∗)) I{θj^(t) ≤ u} .
ABC posterior quantiles + credible intervals given by F̂⁻¹
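A sketch of the weight-based cdf and quantile estimates in the same setting (the weights reproduce the expectation estimate through np.dot(w, theta_j)):

```python
import numpy as np

def rf_weights(forest, etas, eta_obs):
    """w_t(eta(y*)): average over trees of the normalised leaf-membership indicators."""
    train_leaves = forest.apply(etas)
    obs_leaves = forest.apply(eta_obs[None, :])[0]
    same = train_leaves == obs_leaves[None, :]                    # (N, B) indicators
    return (same / same.sum(axis=0, keepdims=True)).mean(axis=1)  # sums to one over t

def rf_posterior_quantiles(forest, etas, theta_j, eta_obs, probs=(0.025, 0.5, 0.975)):
    w = rf_weights(forest, etas, eta_obs)
    order = np.argsort(theta_j)
    cdf = np.cumsum(w[order])                                     # F_hat at the ordered theta_j
    idx = np.minimum(np.searchsorted(cdf, probs), len(cdf) - 1)
    return theta_j[order][idx]                                    # F_hat^{-1}(probs)
```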
ABC variances
Even though an approximation of Var(θj | η(y∗)) is available based on F̂, choice of an alternative and slightly more involved version
In a given tree b of a random forest, existence of out-of-bag entries, i.e., entries not sampled in the associated bootstrap subsample
Use of the out-of-bag simulations to produce an estimate θ̃j^(t) of E{θj | η(yt)}
Apply weights ωt(η(y∗)) to the out-of-bag residuals:
Var(θj | η(y∗)) = Σ_{t=1}^N ωt(η(y∗)) (θj^(t) − θ̃j^(t))²
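A sketch of the variance estimate (the forest must have been fitted with oob_score=True so that the out-of-bag predictions θ̃j^(t) are stored):

```python
import numpy as np

def rf_posterior_var(forest, etas, theta_j, eta_obs):
    # same weights w_t(eta(y*)) as in the quantile sketch above
    train_leaves = forest.apply(etas)
    obs_leaves = forest.apply(eta_obs[None, :])[0]
    same = train_leaves == obs_leaves[None, :]
    w = (same / same.sum(axis=0, keepdims=True)).mean(axis=1)
    oob = forest.oob_prediction_                     # out-of-bag estimates theta~_j^(t)
    return float(np.sum(w * (theta_j - oob) ** 2))   # weighted squared OOB residuals
```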
ABC covariances
For estimating Cov(θj, θℓ | η(y∗)), construction of a specific random forest
product of out-of-bag errors for θj and θℓ
(θj^(t) − θ̃j^(t)) (θℓ^(t) − θ̃ℓ^(t))
with again predictor variables the summary statistics η(y) = {η1(y), . . . , ηk(y)}
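A sketch of this covariance construction, given two forests fitted (with oob_score=True) for θj and θℓ on the same summaries:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rf_posterior_cov(etas, theta_j, theta_l, forest_j, forest_l, eta_obs):
    resid_prod = (theta_j - forest_j.oob_prediction_) * (theta_l - forest_l.oob_prediction_)
    cov_forest = RandomForestRegressor(n_estimators=500, random_state=4).fit(etas, resid_prod)
    return float(cov_forest.predict(eta_obs[None, :])[0])
```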
Gaussian toy example
Take
(y1, . . . , yn) | θ1, θ2 ∼iid N(θ1, θ2), n = 10
θ1 | θ2 ∼ N(0, θ2)
θ2 ∼ IG(4, 3)
θ1 | y ∼ T( n + 8, n ȳ/(n + 1), (s² + 6)/((n + 1)(n + 8)) )
θ2 | y ∼ IG( n/2 + 4, s²/2 + 3 )
Closed-form theoretical values like
ψ1(y) = E(θ1 | y), ψ2(y) = E(θ2 | y), ψ3(y) = Var(θ1 | y) and ψ4(y) = Var(θ2 | y)
Gaussian toy example
Reference table of N = 10, 000 Gaussian replicates
Independent Gaussian test set of size Npred = 100
k = 53 summary statistics: the sample mean, the sample
variance and the sample median absolute deviation, and 50
independent pure-noise variables (uniform [0,1])
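An end-to-end sketch of this toy setting (reference table, the 53 summaries, and one forest per quantity of interest, here the two posterior means); scikit-learn replaces the implementation actually used for the slides:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
N, n = 10_000, 10

theta2 = 1.0 / rng.gamma(shape=4.0, scale=1.0 / 3.0, size=N)     # theta2 ~ IG(4, 3)
theta1 = rng.normal(0.0, np.sqrt(theta2))                        # theta1 | theta2 ~ N(0, theta2)
y = rng.normal(theta1[:, None], np.sqrt(theta2)[:, None], size=(N, n))

summaries = np.column_stack([
    y.mean(axis=1),                                              # sample mean
    y.var(axis=1, ddof=1),                                       # sample variance
    np.median(np.abs(y - np.median(y, axis=1, keepdims=True)), axis=1),  # sample MAD
    rng.uniform(size=(N, 50)),                                   # 50 pure-noise variables
])

# ODOF: one forest per quantity of interest
forest_t1 = RandomForestRegressor(n_estimators=200, random_state=5).fit(summaries, theta1)
forest_t2 = RandomForestRegressor(n_estimators=200, random_state=5).fit(summaries, theta2)
# forest_t1.predict(eta_obs[None, :]) then approximates E(theta_1 | y_obs), similarly for theta_2
```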
Gaussian toy example
Scatterplot of the theoretical values with their corresponding
estimates
Gaussian toy example
Scatterplot of the theoretical values of 2.5% and 97.5% posterior
quantiles for θ1 and θ2 with their corresponding estimates
Gaussian toy example
                        ODOF   adj local linear   adj ridge   adj neural net
ψ1(y) = E(θ1 | y)       0.21        0.42             0.38          0.42
ψ2(y) = E(θ2 | y)       0.11        0.20             0.26          0.22
ψ3(y) = Var(θ1 | y)     0.47        0.66             0.75          0.48
ψ4(y) = Var(θ2 | y)     0.46        0.85             0.73          0.98
Q0.025(θ1|y)            0.69        0.55             0.78          0.53
Q0.025(θ2|y)            0.06        0.45             0.68          1.02
Q0.975(θ1|y)            0.48        0.55             0.79          0.50
Q0.975(θ2|y)            0.18        0.23             0.23          0.38
Comparison of normalized mean absolute errors
Gaussian toy example
Boxplot comparison of Var(θ1 | y), Var(θ2 | y) with the true
values, ODOF and usual ABC methods
Comments
ABC-RF methods mostly insensitive both to strong correlations between the summary statistics and to the presence of noisy variables.
implies fewer simulations and no calibration
Next steps: adaptive schemes, deep learning, inclusion in DIYABC
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxmaryFF1
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalMAESTRELLAMesa2
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024Jene van der Heide
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
Servosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by PetrovicServosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by PetrovicAditi Jain
 

Dernier (20)

LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
 
Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and Vertical
 
Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Servosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by PetrovicServosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by Petrovic
 

ABC random forests

  • 14. MA example (2) ABC algorithm thus made of 1. picking a new value (ϑ1, ϑ2) in the triangle 2. generating an iid sequence (εt)−q<t≤T 3. producing a simulated series (x′t)1≤t≤T Distance: basic distance between the series ρ((x′t)1≤t≤T , (xt)1≤t≤T ) = Σ_{t=1}^T (xt − x′t)² or distance between summary statistics like the q autocorrelations τj = Σ_{t=j+1}^T xt xt−j
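A minimal R sketch of this rejection scheme for the MA(2) case, with the two autocorrelation-type summaries above; the simulator, the uniform draw over the invertibility triangle and the quantile-based tolerance are illustrative assumptions, not the deck's own code:

simulate_ma2 <- function(theta, T_len = 100) {
  # x_t = eps_t + theta1 * eps_{t-1} + theta2 * eps_{t-2}
  eps <- rnorm(T_len + 2)
  eps[3:(T_len + 2)] + theta[1] * eps[2:(T_len + 1)] + theta[2] * eps[1:T_len]
}
summary_stats <- function(x) {
  n <- length(x)
  c(sum(x[-1] * x[-n]), sum(x[-(1:2)] * x[1:(n - 2)]))   # tau_1, tau_2
}
runif_triangle <- function() {
  # uniform draw over one common parameterisation of the MA(2) invertibility triangle
  repeat {
    t1 <- runif(1, -2, 2); t2 <- runif(1, -1, 1)
    if (t2 + t1 > -1 && t2 - t1 > -1) return(c(t1, t2))
  }
}
abc_ma2 <- function(x_obs, n_sim = 1e5, keep = 1e-3) {
  s_obs <- summary_stats(x_obs)
  theta <- t(replicate(n_sim, runif_triangle()))
  dist  <- apply(theta, 1, function(th)
    sqrt(sum((summary_stats(simulate_ma2(th, length(x_obs))) - s_obs)^2)))
  theta[dist <= quantile(dist, keep), , drop = FALSE]     # kept draws approximate the ABC posterior
}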
  • 16. Comparison of distance impact [Posterior histograms of θ1 and θ2 under both distances] Evaluation of the tolerance ε on the ABC sample against both distances (ε = 100%, 10%, 1%, 0.1%) for an MA(2) model
  • 19. ABC advances Simulating from the prior is often poor in efficiency. Either modify the proposal distribution on θ to increase the density of x's within the vicinity of y... [Marjoram et al, 2003; Bortot et al., 2007; Beaumont et al., 2009] ...or view the problem as conditional density estimation and develop techniques that allow for a larger ε [Beaumont et al., 2002; Blum & François, 2009] ...or even include ε in the inferential framework [ABCµ] [Ratmann et al., 2009]
  • 23. ABC consistency Recent studies on large sample properties of ABC posterior distributions and ABC posterior means [Li & Fearnhead, 2016; Frazier et al., 2016] Under regularity conditions on the summary statistics, incl. convergence at speed dT , characterisation of the rate of posterior concentration as a function of the tolerance; convergence requires a less stringent condition on the tolerance decrease than asymptotic normality of the posterior; asymptotic normality of the posterior mean does not require asymptotic normality of the posterior itself. Cases for limiting ABC distributions: 1. dT εT → +∞; 2. dT εT → c; 3. dT εT → 0, with the limiting ABC mean convergent for εT² = o(1/dT) [Frazier et al., 2016]
  • 25. Wilkinson's exact BC ABC approximation error (i.e. non-zero tolerance) replaced with exact simulation from a controlled approximation to the target, the convolution of the true posterior with a kernel function π_ε(θ, z|y) = π(θ) f(z|θ) K_ε(y − z) / ∫ π(θ) f(z|θ) K_ε(y − z) dz dθ, with K_ε a kernel parameterised by bandwidth ε. [Wilkinson, 2008] Theorem: the ABC algorithm based on the assumption of a randomised observation y = ỹ + ξ, ξ ∼ K_ε, and an acceptance probability of K_ε(y − z)/M gives draws from the posterior distribution π(θ|y).
  • 27. How exact a BC? “Using ε to represent measurement error is straightforward, whereas using ε to model the model discrepancy is harder to conceptualize and not as commonly used” [Richard Wilkinson, 2008, 2013]
  • 28. How exact a BC? Pros: pseudo-data from the true model and observed data from the noisy model; interesting perspective in that the outcome is completely controlled; link with ABCµ and with assuming y is observed with a measurement error of density K_ε; relates to the theory of model approximation [Kennedy & O'Hagan, 2001]. Cons: requires K_ε to be bounded by M; true approximation error never assessed; requires a modification of the standard ABC algorithm
  • 29. Noisy ABC Idea: modify the data from the start, ỹ = y0 + εζ1 (with the same scale ε as in ABC) [see Fearnhead-Prangle], and run ABC on ỹ. Then ABC produces an exact simulation from the posterior π(θ|ỹ) given the noisy data [Dean et al., 2011; Fearnhead and Prangle, 2012]
  • 31. Consistent noisy ABC Degrading the data improves the estimation performances: noisy ABC-MLE is asymptotically (in n) consistent; under further assumptions, the noisy ABC-MLE is asymptotically normal; increase in variance of order ε−2; likely degradation in precision or computing time due to the lack of summary statistic [curse of dimensionality]
  • 32. Semi-automatic ABC Fearnhead and Prangle (2010) study ABC and the selection of the summary statistic in close proximity to Wilkinson's proposal. ABC is then considered from a purely inferential viewpoint and calibrated for estimation purposes. Use of a randomised (or 'noisy') version of the summary statistics, η̃(y) = η(y) + τε. Derivation of a well-calibrated version of ABC, i.e. an algorithm that gives proper predictions for the distribution associated with this randomised summary statistic [calibration constraint: ABC approximation with the same posterior mean as the true randomised posterior]. Optimality of the posterior expectation E[θ|y] of the parameter of interest as summary statistic η(y)!
  • 34. ABC for model choice Approximate Bayesian computation ABC for model choice ABC model choice via random forests ABC estimation via random forests
  • 35. Bayesian model choice Several models M1, M2, . . . are considered simultaneously for a dataset y and the model index M is part of the inference. Use of a prior distribution π(M = m), plus a prior distribution on the parameter conditional on the value m of the model index, πm(θm). Goal is to derive the posterior distribution of M, a challenging computational target when models are complex.
  • 36. Generic ABC for model choice Algorithm 2 Likelihood-free model choice sampler (ABC-MC) for t = 1 to T do repeat Generate m from the prior π(M = m) Generate θm from the prior πm(θm) Generate z from the model fm(z|θm) until ρ{η(z), η(y)} < ε Set m(t) = m and θ(t) = θm end for [Cornuet et al., DIYABC, 2009]
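A compact R rendering of this sampler, instantiated on the Poisson versus geometric toy example used later in the deck; the priors (λ ∼ Gamma(2, 1), p ∼ U(0, 1)) and the exact matching on S = Σ xi are illustrative choices, not prescriptions from the slides:

abc_mc_pois_geom <- function(y_obs, T_sim = 1e5, eps = 0) {
  n <- length(y_obs); s_obs <- sum(y_obs)
  out <- replicate(T_sim, {
    m <- sample(1:2, 1)                                  # prior pi(M = m) uniform on {1, 2}
    z <- if (m == 1) rpois(n, rgamma(1, 2, 1))           # model 1: Poisson, lambda ~ Gamma(2, 1)
         else        rgeom(n, runif(1))                  # model 2: geometric, p ~ U(0, 1)
                                                         # (R's rgeom counts failures before the first success)
    c(m, abs(sum(z) - s_obs))                            # distance on the summary S
  })
  kept <- out[1, out[2, ] <= eps]
  table(factor(kept, levels = 1:2)) / length(kept)       # acceptance frequencies ~ pi(M = m | eta(y))
}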
  • 37. ABC estimates Posterior probability π(M = m|y) approximated by the frequency of acceptances from model m, (1/T) Σ_{t=1}^T I{m(t) = m}. Issues with implementation: should tolerances ε be the same for all models? should summary statistics vary across models (incl. their dimension)? should the distance measure ρ vary as well?
  • 38. Back to sufficiency ‘Sufficient statistics for individual models are unlikely to be very informative for the model probability.’ [Scott Sisson, Jan. 31, 2011, X.’Og] If η1(x) sufficient statistic for model m = 1 and parameter θ1 and η2(x) sufficient statistic for model m = 2 and parameter θ2, (η1(x), η2(x)) is not always sufficient for (m, θm) c Potential loss of information at the testing level
  • 41. Limiting behaviour of B12 (T → ∞) ABC approximation B12(y) = Σ_{t=1}^T I{mt = 1} I{ρ(η(zt), η(y)) ≤ ε} / Σ_{t=1}^T I{mt = 2} I{ρ(η(zt), η(y)) ≤ ε}, where the (mt, zt)'s are simulated from the (joint) prior. As T goes to infinity, the limit is B12(y) = ∫ I{ρ(η(z), η(y)) ≤ ε} π1(θ1) f1(z|θ1) dz dθ1 / ∫ I{ρ(η(z), η(y)) ≤ ε} π2(θ2) f2(z|θ2) dz dθ2 = ∫ I{ρ(η, η(y)) ≤ ε} π1(θ1) f1^η(η|θ1) dη dθ1 / ∫ I{ρ(η, η(y)) ≤ ε} π2(θ2) f2^η(η|θ2) dη dθ2, where f1^η(η|θ1) and f2^η(η|θ2) are the distributions of η(z)
  • 43. Limiting behaviour of B12 (ε → 0) When ε goes to zero, B12^η(y) = ∫ π1(θ1) f1^η(η(y)|θ1) dθ1 / ∫ π2(θ2) f2^η(η(y)|θ2) dθ2, c the Bayes factor based on the sole observation of η(y)
  • 45. Limiting behaviour of B12 (under sufficiency) If η(y) is a sufficient statistic for both models, fi(y|θi) = gi(y) fi^η(η(y)|θi). Thus B12(y) = ∫_{Θ1} π(θ1) g1(y) f1^η(η(y)|θ1) dθ1 / ∫_{Θ2} π(θ2) g2(y) f2^η(η(y)|θ2) dθ2 = [g1(y)/g2(y)] × B12^η(y). [Didelot, Everitt, Johansen & Lawson, 2011] c No discrepancy only under cross-model sufficiency
  • 47. Poisson/geometric example Sample x = (x1, . . . , xn) from either a Poisson P(λ) or from a geometric G(p). Then S = Σ_{i=1}^n xi = η(x) is a sufficient statistic for either model but not simultaneously. Discrepancy ratio g1(x)/g2(x) = (S! n^{−S} / Π_i xi!) / (1 / C(n+S−1, S)), with C(·,·) the binomial coefficient
  • 48. Poisson/geometric discrepancy Range of B12(x) versus Bη 12(x) B12(x): The values produced have nothing in common.
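A short numerical companion (not from the deck): closed-form B12(x) and Bη12(x) under illustrative priors λ ∼ Exp(1) and p ∼ U(0, 1), showing how far the two can drift apart:

bayes_factors <- function(x) {
  n <- length(x); S <- sum(x)
  # full-data marginal under Poisson: Gamma(S+1) / (prod(x_i!) * (n+1)^(S+1))
  m1_full <- exp(lgamma(S + 1) - sum(lfactorial(x)) - (S + 1) * log(n + 1))
  # full-data marginal under geometric: Beta(n+1, S+1)
  m2_full <- beta(n + 1, S + 1)
  # marginals of S alone: S|lambda ~ Poisson(n*lambda), S|p ~ NegBin(n, p)
  m1_S <- exp(S * log(n) - (S + 1) * log(n + 1))
  m2_S <- choose(n + S - 1, S) * beta(n + 1, S + 1)
  c(B12 = m1_full / m2_full, B12_eta = m1_S / m2_S)
}
# e.g. bayes_factors(rpois(20, 2))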
  • 49. Formal recovery Creating an encompassing exponential family f(x|θ1, θ2, α1, α2) ∝ exp{θ1ᵀ η1(x) + θ2ᵀ η2(x) + α1 t1(x) + α2 t2(x)} leads to a sufficient statistic (η1(x), η2(x), t1(x), t2(x)) [Didelot, Everitt, Johansen & Lawson, 2011] In the Poisson/geometric case, if Π_i xi! is added to S, no discrepancy. But this only applies in genuine sufficiency settings... c Inability to evaluate the loss brought by summary statistics
  • 52. MA(q) divergence [Barplots of model frequencies] Evolution [against ε] of the ABC Bayes factor, in terms of frequencies of visits to models MA(1) (left) and MA(2) (right), when ε equals the 10, 1, .1, .01% quantiles on insufficient autocovariance distances. Sample of 50 points from an MA(2) with θ1 = 0.6, θ2 = 0.2. True Bayes factor equal to 17.71.
  • 53. MA(q) divergence [Barplots of model frequencies] Same evolution for a sample of 50 points from an MA(1) model with θ1 = 0.6. True Bayes factor B21 equal to .004.
  • 54. A stylised problem Central question to the validation of ABC for model choice: When is a Bayes factor based on an insufficient statistic T(y) consistent? Note/warning: the inference drawn on T(y) through B12^T(y) necessarily differs from the one drawn on y through B12(y) [Marin, Pillai, X, & Rousseau, JRSS B, 2013]
  • 56. A benchmark if toy example Comparison suggested by a referee of the PNAS paper [thanks!]: [X, Cornuet, Marin, & Pillai, Aug. 2011] Model M1: y ∼ N(θ1, 1) opposed to model M2: y ∼ L(θ2, 1/√2), Laplace distribution with mean θ2 and scale parameter 1/√2 (variance one). Four possible statistics: 1. sample mean ȳ (sufficient for M1 if not M2); 2. sample median med(y) (insufficient); 3. sample variance var(y) (ancillary); 4. median absolute deviation mad(y) = med(|y − med(y)|)
  • 57. A benchmark if toy example [Boxplots under Gauss and Laplace data, n = 100]
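A sketch of this benchmark in R; the N(0, 2²) prior on the location parameters, the quantile-based tolerance and the function names are assumptions made here for illustration:

rlaplace <- function(n, mu, b) mu + b * (rexp(n) - rexp(n))   # Laplace as a difference of exponentials
four_stats <- function(y) c(mean = mean(y), median = median(y),
                            var = var(y), mad = mad(y, constant = 1))
abc_gauss_laplace <- function(y_obs, stat = "mad", T_sim = 1e5, keep = 1e-2) {
  n <- length(y_obs); s_obs <- four_stats(y_obs)[stat]
  m  <- sample(1:2, T_sim, replace = TRUE)                    # model index from its prior
  th <- rnorm(T_sim, 0, 2)                                    # assumed N(0, 4) prior on the location
  s  <- vapply(seq_len(T_sim), function(t) {
    z <- if (m[t] == 1) rnorm(n, th[t], 1) else rlaplace(n, th[t], 1 / sqrt(2))
    unname(four_stats(z)[stat])
  }, numeric(1))
  d <- abs(s - s_obs)
  mean(m[d <= quantile(d, keep)] == 1)                        # ABC approximation of P(M = Gauss | s_obs)
}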
  • 58. Framework Starting point: y = (y1, . . . , yn) the observed sample, not necessarily iid, with true distribution y ∼ P^n. Summary statistics T(y) = T^n = (T1(y), T2(y), · · · , Td(y)) ∈ R^d with true distribution T^n ∼ G^n.
  • 59. Framework c Comparison of – under M1, y ∼ F1,n(·|θ1) where θ1 ∈ Θ1 ⊂ R^p1 – under M2, y ∼ F2,n(·|θ2) where θ2 ∈ Θ2 ⊂ R^p2 turned into – under M1, T(y) ∼ G1,n(·|θ1), and θ1|T(y) ∼ π1(·|T^n) – under M2, T(y) ∼ G2,n(·|θ2), and θ2|T(y) ∼ π2(·|T^n)
  • 60. Assumptions A collection of asymptotic “standard” assumptions: [A1] is a standard central limit theorem under the true model with asymptotic mean µ0 [A2] controls the large deviations of the estimator Tn from the model mean µ(θ) [A3] is the standard prior mass condition found in Bayesian asymptotics (di effective dimension of the parameter) [A4] restricts the behaviour of the model density against the true density [Think CLT!]
  • 61. Asymptotic marginals Asymptotically, under [A1]–[A4], mi(t) = ∫_{Θi} gi(t|θi) πi(θi) dθi is such that (i) if inf{|µi(θi) − µ0|; θi ∈ Θi} = 0, then Cl v_n^{d−di} ≤ mi(T^n) ≤ Cu v_n^{d−di}, and (ii) if inf{|µi(θi) − µ0|; θi ∈ Θi} > 0, then mi(T^n) = o_{P^n}[v_n^{d−τi} + v_n^{d−αi}]
  • 62. Between-model consistency Consequence of the above is that the asymptotic behaviour of the Bayes factor is driven by the asymptotic mean value µ(θ) of T^n under both models. And only by this mean value! Indeed, if inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} = inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0, then Cl v_n^{−(d1−d2)} ≤ m1(T^n)/m2(T^n) ≤ Cu v_n^{−(d1−d2)}, where Cl, Cu = O_{P^n}(1), irrespective of the true model. c Only depends on the difference d1 − d2: no consistency. Else, if inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} > inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0, then m1(T^n)/m2(T^n) ≥ Cu min( v_n^{−(d1−α2)}, v_n^{−(d1−τ2)} )
  • 65. Checking for adequate statistics Run a practical check of the relevance (or non-relevance) of Tn null hypothesis that both models are compatible with the statistic Tn H0 : inf{|µ2(θ2) − µ0|; θ2 ∈ Θ2} = 0 against H1 : inf{|µ2(θ2) − µ0|; θ2 ∈ Θ2} > 0 testing procedure provides estimates of mean of Tn under each model and checks for equality
  • 66. Checking in practice Under each model Mi, generate an ABC sample θi,l, l = 1, · · · , L. For each θi,l, generate yi,l ∼ Fi,n(·|θi,l), derive T^n(yi,l) and compute µ̂i = (1/L) Σ_{l=1}^L T^n(yi,l), i = 1, 2. Conditionally on T^n(y), √L {µ̂i − E^π[µi(θi)|T^n(y)]} ⇝ N(0, Vi). Test for a common mean H0 : µ̂1 ∼ N(µ0, V1), µ̂2 ∼ N(µ0, V2) against the alternative of different means H1 : µ̂i ∼ N(µi, Vi), with µ1 ≠ µ2
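A concrete sketch of this check for the Gauss/Laplace toy example (reusing rlaplace from the sketch above; the ABC draws theta1 and theta2 under each model are assumed to be available, and the χ²(1) calibration is a rough plug-in):

check_common_mean <- function(theta1, theta2, n, Tn = function(y) mad(y, constant = 1)) {
  # simulate pseudo-samples under each model and summarise them
  T1 <- vapply(theta1, function(th) Tn(rnorm(n, th, 1)), numeric(1))
  T2 <- vapply(theta2, function(th) Tn(rlaplace(n, th, 1 / sqrt(2))), numeric(1))
  mu <- c(mean(T1), mean(T2))
  v  <- c(var(T1) / length(T1), var(T2) / length(T2))    # Monte Carlo variances of the two means
  mu0  <- sum(mu / v) / sum(1 / v)                        # pooled estimate under H0
  chi2 <- sum((mu - mu0)^2 / v)                           # approx. chi^2(1) under a common mean
  c(mu1 = mu[1], mu2 = mu[2], chi2 = chi2)
}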
  • 67. Toy example: Laplace versus Gauss [Boxplots of the normalised χ² statistics] Normalised χ² without and with mad
  • 68. ABC model choice via random forests Approximate Bayesian computation ABC for model choice ABC model choice via random forests Random forests ABC with random forests Illustrations ABC estimation via random forests
  • 69. Leaning towards machine learning Main notions: ABC-MC seen as learning about which model is most appropriate from a huge (reference) table exploiting a large number of summary statistics not an issue for machine learning methods intended to estimate efficient combinations abandoning (temporarily?) the idea of estimating posterior probabilities of the models, poorly approximated by machine learning methods, and replacing those by posterior predictive expected loss [Cornuet et al., 2016]
  • 70. Random forests Technique that stemmed from Leo Breiman’s bagging (or bootstrap aggregating) machine learning algorithm for both classification and regression [Breiman, 1996] Improved classification performances by averaging over classification schemes of randomly generated training sets, creating a “forest” of (CART) decision trees, inspired by Amit and Geman (1997) ensemble learning [Breiman, 2001]
  • 71. Growing the forest Breiman's solution for inducing random features in the trees of the forest: bootstrap resampling of the dataset and random subset-ing [of size √t] of the covariates driving the classification at every node of each tree. The covariate xτ that drives the node separation xτ ≤ cτ and the separation bound cτ are chosen by minimising an entropy or Gini index
  • 72. Breiman and Cutler’s algorithm Algorithm 3 Random forests for t = 1 to T do //*T is the number of trees*// Draw a bootstrap sample of size nboot Grow an unpruned decision tree for b = 1 to B do //*B is the number of nodes*// Select ntry of the predictors at random Determine the best split from among those predictors end for end for Predict new data by aggregating the predictions of the T trees [ c Tae-Kyun Kim & Bjorn Stenger, 2009]
  • 73. Subsampling Due to both large datasets [practical] and theoretical recommendations from Gérard Biau [private communication], from independence between trees to convergence issues, bootstrap samples of much smaller size than the original data size, N = o(n). Each CART tree stops when the number of observations per node is 1: no culling of the branches
  • 75. ABC with random forests Idea: Starting with a possibly large collection of summary statistics (s1i, . . . , spi) (from scientific theory input to available statistical software, to machine-learning alternatives), build an ABC reference table involving the model index, parameter values and summary statistics for the associated simulated pseudo-data, and run R randomForest to infer M from (s1i, . . . , spi). At each step O(√p) indices are sampled at random and the most discriminating statistic is selected, by minimising an entropy or Gini loss. The average of the trees is the resulting summary statistic, a highly non-linear predictor of the model index (see the sketch below).
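A sketch of this step with the R randomForest package; the reference-table layout (a factor column `model` plus summary columns) is an assumption of this sketch, and, as the next slides stress, the vote frequencies returned with type = "prob" are not posterior probabilities:

library(randomForest)
abc_rf_choice <- function(ref, obs, ntree = 500) {
  # ref: data.frame with a factor column `model` and summary statistic columns
  # obs: one-row data.frame holding the observed summaries (same column names)
  rf <- randomForest(model ~ ., data = ref, ntree = ntree)
  list(map         = predict(rf, newdata = obs),                  # RF (MAP) model index
       votes       = predict(rf, newdata = obs, type = "prob"),   # tree vote frequencies (not a posterior!)
       prior_error = rf$err.rate[ntree, "OOB"])                   # out-of-bag prior error rate
}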
  • 78. Outcome of ABC-RF Random forest predicts a (MAP) model index, from the observed dataset: The predictor provided by the forest is “sufficient” to select the most likely model but not to derive associated posterior probability exploit entire forest by computing how many trees lead to picking each of the models under comparison but variability too high to be trusted frequency of trees associated with majority model is no proper substitute to the true posterior probability And usual ABC-MC approximation equally highly variable and hard to assess
  • 80. Posterior predictive expected losses We suggest replacing the unstable approximation of P(M = m|xo), with xo the observed sample and m the model index, by the average of the selection errors across all models given the data xo, P(M̂(X) ≠ M|xo), where the pair (M, X) is generated from the predictive ∫ f(x|θ)π(θ, M|xo)dθ and M̂(x) denotes the random forest model (MAP) predictor
  • 81. Posterior predictive expected losses Arguments: Bayesian estimate of the posterior error integrates error over most likely part of the parameter space gives an averaged error rather than the posterior probability of the null hypothesis easily computed: Given ABC subsample of parameters from reference table, simulate pseudo-samples associated with those and derive error frequency
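A sketch of this posterior predictive error estimate; `simulate_summaries` (mapping a model index and a parameter draw to a one-row data.frame of summaries) and the layout of the ABC subsample are assumptions of this illustration, not part of the deck:

post_pred_error <- function(rf, abc_draws, simulate_summaries, B = 1000) {
  # abc_draws: ABC subsample from the reference table given the observed data,
  #            a data.frame with a factor column `model` and parameter columns
  idx  <- sample(nrow(abc_draws), B, replace = TRUE)
  errs <- vapply(idx, function(b) {
    s_new <- simulate_summaries(abc_draws$model[b],
                                abc_draws[b, setdiff(names(abc_draws), "model")])
    # does the forest misclassify the pseudo-sample?
    as.character(predict(rf, newdata = s_new)) != as.character(abc_draws$model[b])
  }, logical(1))
  mean(errs)                                       # estimate of P( M-hat(X) != M | x_obs )
}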
  • 82. toy: MA(1) vs. MA(2) Comparing an MA(1) and an MA(2) model: xt = εt − ϑ1 εt−1 [− ϑ2 εt−2] Earlier illustration using the first two autocorrelations as S(x) [Marin et al., Stat. & Comp., 2011] Result #1: values of p(m|x) [obtained by numerical integration] and p(m|S(x)) [obtained by mixing ABC outcome and density estimation] highly differ!
  • 83. toy: MA(1) vs. MA(2) Difference between the posterior probability of MA(2) given either x or S(x). Blue stands for data from MA(1), orange for data from MA(2)
  • 84. toy: MA(1) vs. MA(2) Comparing an MA(1) and an MA(2) model: xt = εt − ϑ1 εt−1 [− ϑ2 εt−2] Earlier illustration using two autocorrelations as S(x) [Marin et al., Stat. & Comp., 2011] Result #2: Embedded models, with simulations from MA(1) within those from MA(2), hence linear classification poor
  • 85. toy: MA(1) vs. MA(2) Simulations of S(x) under MA(1) (blue) and MA(2) (orange)
  • 86. toy: MA(1) vs. MA(2) Comparing an MA(1) and an MA(2) model: xt = εt − ϑ1 εt−1 [− ϑ2 εt−2] Earlier illustration using two autocorrelations as S(x) [Marin et al., Stat. & Comp., 2011] Result #3: On such a small dimension problem, random forests should come second to k-nn or kernel discriminant analyses
  • 87. toy: MA(1) vs. MA(2) Prior error rates of the classification methods:
    method                                 error rate (in %)
    LDA                                    27.43
    Logist. reg.                           28.34
    SVM (library e1071)                    17.17
    “naïve” Bayes (with G marg.)           19.52
    “naïve” Bayes (with NP marg.)          18.25
    ABC k-nn (k = 100)                     17.23
    ABC k-nn (k = 50)                      16.97
    Local log. reg. (k = 1000)             16.82
    Random Forest                          17.04
    Kernel disc. ana. (KDA)                16.95
    True MAP                               12.36
  • 88. Comments unlimited aggregation of arbitrary summary statistics recovery of discriminant statistics when available automated implementation with reduced calibration self-evaluation by posterior predictive error soon to be included within DIYABC
  • 89. ABC estimation via random forests Approximate Bayesian computation ABC for model choice ABC model choice via random forests ABC estimation via random forests
  • 90. Two basic issues with ABC ABC compares numerous simulated datasets to the observed one. Two major difficulties: to decrease the approximation error (or tolerance ε) and hence ensure the reliability of ABC, the total number of simulations must be very large; calibration of ABC (tolerance, distance, summary statistics, post-processing, &tc) is critical and hard to automatise
  • 91. classification of summaries by random forests Given a large collection of summary statistics, rather than selecting a subset and excluding the others, estimate each parameter of interest by a machine learning tool like random forests RF can handle thousands of predictors ignore useless components fast estimation method with good local properties automatised method with few calibration steps substitute to Fearnhead and Prangle (2012) preliminary estimation of ^θ(yobs) includes a natural (classification) distance measure that avoids choice of both distance and tolerance [Marin et al., 2016]
  • 92. random forests as non-parametric regression CART means Classification and Regression Trees For regression purposes, i.e., to predict y as f (x), similar binary trees in random forests 1. at each tree node, split data into two daughter nodes 2. split variable and bound chosen to minimise heterogeneity criterion 3. stop splitting when enough homogeneity in current branch 4. predicted values at terminal nodes (or leaves) are average response variable y for all observations in final leaf
  • 93. Illustration conditional expectation f (x) and well-specified dataset
  • 95. Illustration ten regression trees obtained by bagging (Bootstrap AGGregatING)
  • 96. Illustration average of 100 regression trees
  • 97. bagging reduces learning variance When growing forest with many trees, grow each tree on an independent bootstrap sample at each node, select m variables at random out of all M possible variables Find the best dichotomous split on the selected m variables predictor function estimated by averaging trees Improve on CART with respect to accuracy and stability
  • 99. prediction error A given simulation (ysim, xsim) in the training table is not used in about 1/3 of the trees (“out-of-bag” case). Average the predictions F̂oob(xsim) of these trees to get an out-of-bag predictor of ysim
  • 100. Related methods adjusted local linear: Beaumont et al. (2002) Approximate Bayesian computation in population genetics, Genetics ridge regression: Blum et al. (2013) A Comparative Review of Dimension Reduction Methods in Approximate Bayesian Computation, Statistical Science linear discriminant analysis: Estoup et al. (2012) Estimation of demo-genetic model probabilities with Approximate Bayesian Computation using linear discriminant analysis on summary statistics, Molecular Ecology Resources adjusted neural networks: Blum and François (2010) Non-linear regression models for Approximate Bayesian Computation, Statistics and Computing
  • 101. ABC parameter estimation (ODOF) One dimension = one forest (ODOF) methodology parametric statistical model: {f(y; θ): y ∈ Y, θ ∈ Θ}, Y ⊆ R^n, Θ ⊆ R^p, with intractable density f(·; θ), plus prior distribution π(θ). Inference on a quantity of interest ψ(θ) ∈ R (posterior means, variances, quantiles or covariances)
  • 103. common reference table Given η: Y → R^k a collection of summary statistics, produce a reference table (RT) used as learning dataset for multiple random forests, meaning, for 1 ≤ t ≤ N: 1. simulate θ(t) ∼ π(θ) 2. simulate ỹt = (ỹ1,t, . . . , ỹn,t) ∼ f(y; θ(t)) 3. compute η(ỹt) = {η1(ỹt), . . . , ηk(ỹt)} (see the sketch below)
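A generic R sketch of this construction; rprior (returning one parameter vector) and rmodel (returning one pseudo-sample) are user-supplied simulators assumed here, and the column names are illustrative:

build_reference_table <- function(N, rprior, rmodel, eta) {
  # theta(t) ~ pi(theta): one row per simulation, p columns
  theta <- do.call(rbind, replicate(N, rprior(), simplify = FALSE))
  # eta(y~t) for y~t ~ f(y; theta(t)): one row of k summaries per simulation
  summ  <- do.call(rbind, lapply(seq_len(N), function(i) eta(rmodel(theta[i, ]))))
  colnames(theta) <- paste0("theta", seq_len(ncol(theta)))
  colnames(summ)  <- paste0("eta",   seq_len(ncol(summ)))
  data.frame(theta, summ)
}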
  • 104. ABC posterior expectations Recall that θ = (θ1, . . . , θd) ∈ R^d. For each θj, construct a separate RF regression with predictor variables equal to the summary statistics η(y) = {η1(y), . . . , ηk(y)}. If Lb(η(y*)) denotes the leaf of the b-th tree associated with η(y*) —the leaf reached through the path of binary choices in the tree—, with |Lb| response variables, then Ê(θj | η(y*)) = (1/B) Σ_{b=1}^B (1/|Lb(η(y*))|) Σ_{t: η(yt) ∈ Lb(η(y*))} θj^{(t)} is our ABC estimate
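A sketch of one such per-parameter forest with the randomForest package (loaded in the model-choice sketch above); the column naming convention follows build_reference_table and the prediction at the observed summaries approximates the posterior mean:

abc_rf_expectation <- function(ref, obs, param = "theta1", ntree = 500) {
  # regression of the parameter of interest on all eta* columns
  f  <- reformulate(grep("^eta", names(ref), value = TRUE), response = param)
  rf <- randomForest(f, data = ref, ntree = ntree)
  list(rf = rf, post_mean = predict(rf, newdata = obs))   # prediction ~ E(theta_j | eta(y*))
}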
  • 106. ABC posterior quantile estimate Random forests are also available for quantile regression [Meinshausen, 2006, JMLR]. Since Ê(θj | η(y*)) = Σ_{t=1}^N wt(η(y*)) θj^{(t)} with wt(η(y*)) = (1/B) Σ_{b=1}^B I{η(yt) ∈ Lb(η(y*))} / |Lb(η(y*))|, a natural estimate of the cdf of θj is F̂(u | η(y*)) = Σ_{t=1}^N wt(η(y*)) I{θj^{(t)} ≤ u}. ABC posterior quantiles and credible intervals are then given by F̂^{−1}
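A sketch of these weights and of the induced quantile estimates, relying only on predict(..., nodes = TRUE) from randomForest; the in-bag information is ignored, so this is a simplified stand-in for Meinshausen's estimator (the quantregForest package could be used instead), and `fit$rf` refers to the forest returned by abc_rf_expectation above:

rf_weights <- function(rf, X_train, x_obs) {
  # w_t(eta(y*)) = (1/B) sum_b 1{eta(y_t) in L_b(eta(y*))} / |L_b(eta(y*))|
  nodes_train <- attr(predict(rf, newdata = X_train, nodes = TRUE), "nodes")
  nodes_obs   <- attr(predict(rf, newdata = x_obs,   nodes = TRUE), "nodes")
  same <- sweep(nodes_train, 2, as.vector(nodes_obs), FUN = "==")   # N x B co-occurrence indicators
  rowMeans(sweep(same, 2, colSums(same), FUN = "/"))
}
abc_rf_quantiles <- function(rf, ref, obs, param = "theta1", probs = c(0.025, 0.5, 0.975)) {
  X <- ref[, grep("^eta", names(ref)), drop = FALSE]
  w <- rf_weights(rf, X, obs[, names(X), drop = FALSE])
  o <- order(ref[[param]]); cdf <- cumsum(w[o])                     # weighted cdf F-hat(u | eta(y*))
  sapply(probs, function(p) ref[[param]][o][which(cdf >= p)[1]])
}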
  • 108. ABC variances Even though an approximation of Var(θj | η(y*)) is available based on F̂, we choose an alternative and slightly more involved version. In a given tree b of a random forest there exist out-of-bag entries, i.e., entries not sampled in the associated bootstrap subsample. Use of the out-of-bag simulations to produce an estimate θ̃j^{(t)} of E{θj | η(yt)}, and application of the weights ωt(η(y*)) to the out-of-bag residuals: Var(θj | η(y*)) ≈ Σ_{t=1}^N ωt(η(y*)) (θj^{(t)} − θ̃j^{(t)})²
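A matching sketch of the variance estimate, reusing rf_weights from the previous block and the fact that, for a regression randomForest, rf$predicted holds the out-of-bag predictions playing the role of θ̃j^{(t)}:

abc_rf_variance <- function(rf, ref, obs, param = "theta1") {
  X <- ref[, grep("^eta", names(ref)), drop = FALSE]
  w <- rf_weights(rf, X, obs[, names(X), drop = FALSE])
  oob_resid2 <- (ref[[param]] - rf$predicted)^2   # squared out-of-bag residuals
  sum(w * oob_resid2)                             # weighted average ~ Var(theta_j | eta(y*))
}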
  • 110. ABC covariances For estimating Cov(θj, θℓ | η(y*)), construction of a specific random forest on the product of out-of-bag errors for θj and θℓ, (θj^{(t)} − θ̃j^{(t)})(θℓ^{(t)} − θ̃ℓ^{(t)}), with again the summary statistics η(y) = {η1(y), . . . , ηk(y)} as predictor variables
  • 111. Gaussian toy example Take (y1, . . . , yn) | θ1, θ2 ∼iid N(θ1, θ2), n = 10, with θ1 | θ2 ∼ N(0, θ2) and θ2 ∼ IG(4, 3). Then θ1 | y ∼ T(n + 8, (nȳ)/(n + 1), (s² + 6)/((n + 1)(n + 8))) and θ2 | y ∼ IG(n/2 + 4, s²/2 + 3). Closed-form theoretical values like ψ1(y) = E(θ1 | y), ψ2(y) = E(θ2 | y), ψ3(y) = Var(θ1 | y) and ψ4(y) = Var(θ2 | y)
  • 112. Gaussian toy example Reference table of N = 10, 000 Gaussian replicates Independent Gaussian test set of size Npred = 100 k = 53 summary statistics: the sample mean, the sample variance and the sample median absolute deviation, and 50 independent pure-noise variables (uniform [0,1])
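A short R sketch reproducing this setup, reusing build_reference_table from the sketch above; the inverse-gamma draw via 1/Gamma and the 50 U(0, 1) noise columns follow the slide, while the function names are illustrative:

rprior_gauss <- function() {
  th2 <- 1 / rgamma(1, shape = 4, rate = 3)        # theta2 ~ IG(4, 3)
  c(rnorm(1, 0, sqrt(th2)), th2)                   # theta1 | theta2 ~ N(0, theta2)
}
rmodel_gauss <- function(th, n = 10) rnorm(n, th[1], sqrt(th[2]))
eta_gauss <- function(y) c(mean(y), var(y), mad(y, constant = 1), runif(50))   # k = 53 summaries
ref  <- build_reference_table(N = 1e4, rprior_gauss, rmodel_gauss, eta_gauss)
test <- build_reference_table(N = 100, rprior_gauss, rmodel_gauss, eta_gauss)  # Npred = 100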
  • 113. Gaussian toy example [Scatterplots of ψ1, ψ2, ψ3, ψ4 against their estimates] Scatterplot of the theoretical values with their corresponding estimates
  • 114. Gaussian toy example [Scatterplots of the posterior quantiles against their estimates] Scatterplot of the theoretical values of the 2.5% and 97.5% posterior quantiles for θ1 and θ2 with their corresponding estimates
  • 115. Gaussian toy example Comparison of normalized mean absolute errors:
    quantity                 ODOF    adj local linear    adj ridge    adj neural net
    ψ1(y) = E(θ1 | y)        0.21    0.42                0.38         0.42
    ψ2(y) = E(θ2 | y)        0.11    0.20                0.26         0.22
    ψ3(y) = Var(θ1 | y)      0.47    0.66                0.75         0.48
    ψ4(y) = Var(θ2 | y)      0.46    0.85                0.73         0.98
    Q0.025(θ1 | y)           0.69    0.55                0.78         0.53
    Q0.025(θ2 | y)           0.06    0.45                0.68         1.02
    Q0.975(θ1 | y)           0.48    0.55                0.79         0.50
    Q0.975(θ2 | y)           0.18    0.23                0.23         0.38
  • 116. Gaussian toy example [Boxplots of the Var(θ1|y) and Var(θ2|y) approximations] Boxplot comparison of Var(θ1 | y), Var(θ2 | y) with the true values, for ODOF and the usual ABC methods (local linear, ridge, neural net)
  • 117. Comments ABC RF methods are mostly insensitive both to strong correlations between the summary statistics and to the presence of noisy variables, which implies fewer simulations and no calibration. Next steps: adaptive schemes, deep learning, inclusion in DIYABC