Reliable Approximate Bayesian computation
(ABC) model choice via random forests
Christian P. Robert
Université Paris-Dauphine, Paris & University of Warwick, Coventry
NBBC15, Reykjavik, Ísland
bayesianstatistics@gmail.com
Joint with J.-M. Cornuet, A. Estoup, J.-M. Marin, & P. Pudlo
The next MCMSkv meeting:
Computational Bayes section of
ISBA major meeting:
MCMSki V in Lenzerheide,
Switzerland, Jan. 5-7, 2016
MCMC, pMCMC, SMC², HMC,
ABC, (ultra-) high-dimensional
computation, BNP, QMC, deep
learning, &tc
Plenary speakers: S. Scott, S.
Fienberg, D. Dunson, K.
Latuszynski, T. Lelièvre
Call for contributed sessions
and tutorials now open
“Switzerland in January, where
else...?!”
Outline
Intractable likelihoods
ABC methods
ABC for model choice
ABC model choice via random forests
Intractable likelihood
Case of a well-defined statistical model where the likelihood
function
ℓ(θ|y) = f(y1, . . . , yn|θ)
is (really!) not available in closed form
cannot (easily!) be either completed or demarginalised
cannot be (at all!) estimated by an unbiased estimator
examples of latent variable models of high dimension, including
combinatorial structures (trees, graphs), missing constant
f(x|θ) = g(y, θ)/Z(θ) (e.g. Markov random fields, exponential
graphs, . . . )
Prohibits direct implementation of a generic MCMC algorithm
like Metropolis–Hastings which gets stuck exploring missing
structures
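To make the missing-constant issue concrete, here is a minimal R sketch (a toy example, not taken from the slides) of an Ising-type Markov random field on a chain of n binary nodes: the unnormalised g(x, θ) = exp(θ Σ xi xi+1) is trivial to evaluate, but the constant Z(θ) sums over 2^n configurations.

## Toy Ising-type chain: f(x|θ) = exp(θ Σ x_i x_{i+1}) / Z(θ)
## Brute-force Z(θ) is only feasible for tiny n; for realistic n it is intractable,
## even though simulating x given θ (e.g. by Gibbs sampling) remains easy.
Z <- function(theta, n) {
  configs <- as.matrix(expand.grid(rep(list(c(-1, 1)), n)))   # all 2^n states
  pairs   <- configs[, -n, drop = FALSE] * configs[, -1, drop = FALSE]
  sum(exp(theta * rowSums(pairs)))                            # sum over 2^n terms
}
Z(0.5, n = 10)    # 1,024 terms: fine
## Z(0.5, n = 100) would require 2^100 terms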
Getting approximative
Case of a well-defined statistical model where the likelihood
function
ℓ(θ|y) = f(y1, . . . , yn|θ)
is out of reach
Empirical A to the original B problem
Degrading the data precision down to tolerance level ε
Replacing the likelihood with a non-parametric approximation
Summarising/replacing the data with insufficient statistics
Approximate Bayesian computation
Intractable likelihoods
ABC methods
Genesis of ABC
abc of ABC
Summary statistic
ABC for model choice
ABC model choice via random
forests
Genetic background of ABC
ABC is a recent computational technique that only requires being
able to sample from the likelihood f (·|θ)
This technique stemmed from population genetics models, about
15 years ago, and population geneticists still significantly contribute
to methodological developments of ABC.
[Griffiths & al., 1997; Tavaré & al., 1999]
Demo-genetic inference
Each model is characterized by a set of parameters θ that cover
historical (divergence time, admixture time, ...), demographic
(population sizes, admixture rates, migration rates, ...) and genetic
(mutation rate, ...) factors
The goal is to estimate these parameters from a dataset of
polymorphism (DNA sample) y observed at the present time
Problem:
most of the time, we cannot calculate the likelihood of the
polymorphism data f (y|θ)...
Neutral model at a given microsatellite locus, in a closed
panmictic population at equilibrium
Observations: leaves of the tree
^θ = ?
Kingman’s genealogy
When time axis is
normalized,
T(k) ∼ Exp(k(k − 1)/2)
Mutations according to
the Simple stepwise
Mutation Model
(SMM)
• date of the mutations ∼
Poisson process with
intensity θ/2 over the
branches
• MRCA = 100
• independent mutations:
±1 with pr. 1/2
Instance of ecological questions [message in a beetle]
How did the Asian Ladybird
beetle arrive in Europe?
Why do they swarm right
now?
What are the routes of
invasion?
How to get rid of them?
[Lombaert & al., 2010, PLoS ONE]
Worldwide invasion routes of Harmonia axyridis
[Estoup et al., 2012, Molecular Ecology Res.]
Intractable likelihood
Missing (too much missing!) data structure:
f(y|θ) = ∫_G f(y|G, θ) f(G|θ) dG
cannot be computed in a manageable way...
[Stephens & Donnelly, 2000]
The genealogies are considered as nuisance parameters
This modelling clearly differs from the phylogenetic perspective
where the tree is the parameter of interest.
A?B?C?
A stands for approximate
[wrong likelihood /
picture]
B stands for Bayesian
C stands for computation
[producing a parameter
sample]
ABC methodology
Bayesian setting: target is π(θ)f (x|θ)
When likelihood f (x|θ) not in closed form, likelihood-free rejection
technique:
Foundation
For an observation y ∼ f (y|θ), under the prior π(θ), if one keeps
jointly simulating
θ' ∼ π(θ) , z ∼ f(z|θ') ,
until the auxiliary variable z is equal to the observed value, z = y,
then the selected
θ' ∼ π(θ|y)
[Rubin, 1984; Diggle & Gratton, 1984; Tavaré et al., 1997]
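A minimal R sketch of this exact-match rejection scheme, under an assumed toy setting: a single Poisson observation y with a Gamma(2, 1) prior on θ, so the accepted draws can be checked against the known Gamma(2 + y, 2) posterior.

set.seed(1)
y     <- 4                                  # observed count
N     <- 1e5                                # prior draws
theta <- rgamma(N, shape = 2, rate = 1)     # θ' ~ π(θ)
z     <- rpois(N, lambda = theta)           # z ~ f(z|θ')
keep  <- theta[z == y]                      # accept only when z = y exactly
c(abc = mean(keep), exact = (2 + y) / (1 + 1))   # posterior mean check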
A as A...pproximative
When y is a continuous random variable, strict equality z = y is
replaced with a tolerance zone
ρ(y, z) ≤ ε
where ρ is a distance
Output distributed from
π(θ) Pθ{ρ(y, z) < ε}
which is, by definition, proportional to
π(θ | ρ(y, z) < ε)
[Pritchard et al., 1999]
ABC recap
Likelihood free rejection
sampling
Tavaré et al. (1997) Genetics
1) Set i = 1,
2) Generate θ' from the
prior distribution
π(·),
3) Generate z' from the
likelihood f(·|θ'),
4) If ρ(η(z'), η(y)) ≤ ε,
set (θi , zi ) = (θ', z') and
i = i + 1,
5) If i ≤ N, return to
2).
We keep the θ values such
that the distance between the
corresponding simulated
dataset and the observed
dataset is small enough.
Tuning parameters
ε > 0: tolerance level,
η(z): function that
summarizes datasets,
ρ(η, η'): distance
between vectors of
summary statistics
N: size of the output
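A minimal R sketch of the likelihood-free rejection sampler above, under an assumed toy setting: a N(θ, 1) sample of size 50, summary η(z) = mean(z), distance ρ = absolute difference, tolerance ε = 0.05, and a N(0, 10²) prior on θ; the mean being sufficient here, the output is close to the true posterior.

set.seed(2)
n   <- 50
y   <- rnorm(n, mean = 1.5)            # pseudo-observed data
eta <- function(x) mean(x)             # summary statistic η
eps <- 0.05                            # tolerance level ε
N   <- 1e3                             # required ABC sample size N

abc <- numeric(N); i <- 1
while (i <= N) {
  theta <- rnorm(1, 0, 10)             # 2) θ' ~ π(θ)
  z     <- rnorm(n, theta, 1)          # 3) z' ~ f(·|θ')
  if (abs(eta(z) - eta(y)) <= eps) {   # 4) ρ(η(z'), η(y)) ≤ ε
    abc[i] <- theta
    i <- i + 1
  }
}
c(mean(abc), var(abc))                 # ≈ posterior mean and variance of θ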
Output
The likelihood-free algorithm samples from the marginal in z of:
πε(θ, z|y) = π(θ) f(z|θ) I_{Aε,y}(z) / ∫_{Aε,y×Θ} π(θ) f(z|θ) dz dθ ,
where Aε,y = {z ∈ D | ρ(η(z), η(y)) < ε}.
The idea behind ABC is that the summary statistics coupled with a
small tolerance should provide a good approximation of the
posterior distribution:
πε(θ|y) = ∫ πε(θ, z|y) dz ≈ π(θ|y) .
Output
In practice, since only the summary statistics enter the acceptance rule,
the approximation is of the restricted posterior distribution:
πε(θ|y) = ∫ πε(θ, z|y) dz ≈ π(θ|η(y)) .
Not so good..!
Which summary?
Fundamental difficulty of the choice of the summary statistic when
there is no non-trivial sufficient statistic [except when done by the
experimenters in the field]
Loss of statistical information balanced against gain in data
roughening
Approximation error and information loss remain unknown
Choice of statistics induces choice of distance function towards
standardisation
may be imposed for external/practical reasons (e.g., DIYABC)
may gather several non-B point estimates [the more the
merrier]
can [machine-]learn about efficient combination
attempts at summaries
How to choose the set of summary statistics?
Joyce and Marjoram (2008, SAGMB)
Nunes and Balding (2010, SAGMB)
Fearnhead and Prangle (2012, JRSS B)
Ratmann et al. (2012, PLOS Comput. Biol)
Blum et al. (2013, Statistical Science)
EP-ABC of Barthelmé & Chopin (2013, JASA)
LDA selection of Estoup & al. (2012, Mol. Ecol. Res.)
LDA advantages
much faster computation of scenario probabilities via
polychotomous regression
a (much) lower number of explanatory variables improves the
accuracy of the ABC approximation, reduces the tolerance
and avoids extra costs in constructing the reference table
allows for a large collection of initial summaries
ability to evaluate Type I and Type II errors on more complex
models
LDA reduces correlation among explanatory variables
When available, using both simulated and real data sets, posterior
probabilities of scenarios computed from LDA-transformed and raw
summaries are strongly correlated
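A minimal R sketch of the LDA pre-processing step (assumed data layout, with a placeholder reference table): the summaries of the reference table are projected on the linear discriminant axes via MASS::lda, and these components then feed the ABC comparison, as in Estoup & al. (2012).

library(MASS)

## placeholder reference table: model index m and raw summaries S1, S2, S3
ref <- data.frame(m  = factor(sample(1:3, 1e4, replace = TRUE)),
                  S1 = rnorm(1e4), S2 = rnorm(1e4), S3 = rnorm(1e4))

fit <- lda(m ~ ., data = ref)        # linear discriminant analysis on the summaries
ld  <- predict(fit, ref)$x           # at most (number of models - 1) LDA components
## the observed summaries are projected the same way: predict(fit, newdata = obs)$x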
ABC for model choice
Intractable likelihoods
ABC methods
ABC for model choice
ABC model choice via random forests
ABC model choice
ABC model choice
A) Generate large set of
(m, θ, z) from the
Bayesian predictive,
π(m)πm(θ)fm(z|θ)
B) Keep particles (m, θ, z)
such that
ρ(η(y), η(z)) ≤ ε
C) For each m, return
pm = proportion of m
among remaining
particles
If ε is tuned to keep k resulting
particles, then pm is a k-nearest
neighbour estimate of
P{M = m | η(y)}
Approximating posterior
prob’s of models = regression
problem where
response is 1{M = m},
covariates are summary
statistics η(z),
loss is, e.g., L2
Method of choice in DIYABC
is local polytomous logistic
regression
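A minimal R sketch of steps B) and C) (assumed objects: a matrix sumstat of simulated η(z), a vector modind of model indices, and the observed η(y) in obs): ε is tuned by keeping the k closest particles, and the model frequencies among them give the k-nearest-neighbour estimates pm.

abc_model_choice <- function(sumstat, modind, obs, k = 500) {
  scales <- apply(sumstat, 2, mad)                          # robust standardisation
  dists  <- sqrt(colSums(((t(sumstat) - obs) / scales)^2))  # ρ(η(y), η(z)) for all z
  kept   <- order(dists)[1:k]                               # ε tuned towards k particles
  table(modind[kept]) / k                                   # pm for each model m
}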
Machine learning perspective [paradigm shift]
ABC model choice
A) Generate a large set
of (m, θ, z)’s from
Bayesian predictive,
π(m)πm(θ)fm(z|θ)
B) Use machine learning
tech. to infer on
π(m | η(y))
In this perspective:
(iid) “data set” = the reference
table simulated during
stage A)
observed y becomes a
new data point
Note that:
predicting m is a
classification problem
⇐⇒ select the best
model based on a
maximum a posteriori rule
computing π(m|η(y)) is
a regression problem
⇐⇒ confidence in each
model
classification is much simpler
than regression (e.g., dim. of
objects we try to learn)
Warning
the loss of information induced by using non-sufficient
summary statistics is a genuine problem
Fundamental discrepancy between the genuine Bayes
factors/posterior probabilities and the Bayes factors based on
summary statistics. See, e.g.,
Didelot et al. (2011, Bayesian analysis)
X et al. (2011, PNAS)
Marin et al. (2014, JRSS B)
. . .
Call instead for a machine learning approach able to handle a
large number of correlated summary statistics:
random forests well suited for that task
ABC model choice via random forests
Intractable likelihoods
ABC methods
ABC for model choice
ABC model choice via random forests
Random forests
ABC with random forests
Illustrations
Leaning towards machine learning
Main notions:
ABC-MC seen as learning about which model is most
appropriate from a huge (reference) table
exploiting a large number of summary statistics not an issue
for machine learning methods intended to estimate efficient
combinations
abandoning (temporarily?) the idea of estimating posterior
probabilities of the models, poorly approximated by machine
learning methods, and replacing those by posterior predictive
expected loss
Random forests
Technique that stemmed from Leo Breiman’s bagging (or bootstrap
aggregating) machine learning algorithm for both classification and
regression
[Breiman, 1996]
Improved classification performances by averaging over
classification schemes of randomly generated training sets, creating
a “forest" of (CART) decision trees, inspired by Amit and Geman
(1997) ensemble learning
[Breiman, 2001]
CART construction
Basic classification tree:
Algorithm 1 CART
start the tree with a single root
repeat
pick a non-homogeneous tip v such that Q(v) ≠ 0
attach to v two daughter nodes v1 and v2
for all covariates Xj do
find the threshold tj in the rule Xj < tj that minimizes N(v1)Q(v1) +
N(v2)Q(v2)
end for
find the rule Xj < tj that minimizes N(v1)Q(v1) + N(v2)Q(v2) in j and set
this best rule to node v
until all tips v are homogeneous (Q(v) = 0)
set the labels of all tips
where Q is Gini’s index
Q(vi) = Σ_{y=1}^{M} ^p(vi, y) {1 − ^p(vi, y)} .
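A minimal R sketch of the ingredients of Algorithm 1: Gini's index Q(v) for the labels falling in a node, and a single classification tree grown with the rpart package (reusing the placeholder reference table ref from the LDA sketch above) under the Gini splitting rule.

library(rpart)

gini <- function(labels) {               # Q(v) for the observations in node v
  p <- table(labels) / length(labels)    # class proportions ^p(v, y)
  sum(p * (1 - p))
}

tree <- rpart(m ~ ., data = ref, method = "class",
              parms = list(split = "gini"))   # CART with Gini splitting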
Growing the forest
Breiman’s solution for inducing random features in the trees of the
forest:
bootstrap resampling of the dataset and
random subsetting [of size √t] of the covariates driving the
classification at every node of each tree
Covariate xτ that drives the node separation
xτ ≤ cτ
and the separation bound cτ chosen by minimising entropy or Gini
index
Breiman and Cutler’s algorithm
Algorithm 2 Random forests
for t = 1 to T do
//*T is the number of trees*//
Draw a bootstrap sample of size nboot = n
Grow an unpruned decision tree
for b = 1 to B do
//*B is the number of nodes*//
Select ntry of the predictors at random
Determine the best split from among those predictors
end for
end for
Predict new data by aggregating the predictions of the T trees
Subsampling
Due to both large datasets [practical] and theoretical
recommendation of Scornet et al. (2014), from independence
between trees to convergence issues, bootstrap sample of much
smaller size than original data size
nboot = o(n)
Each CART tree stops when number of observations per node is 1:
no culling of the branches
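A minimal R sketch of Algorithm 2 with the subsampling recommendation above, via the randomForest package (reusing the placeholder table ref): T = 500 trees, about √p covariates tried at each node, a subsampled bootstrap much smaller than n, and trees grown until nodes are pure.

library(randomForest)

p  <- ncol(ref) - 1                                  # number of covariates
rf <- randomForest(m ~ ., data = ref,
                   ntree    = 500,                   # T trees
                   mtry     = floor(sqrt(p)),        # ntry ≈ √p predictors per node
                   sampsize = floor(0.1 * nrow(ref)),   # nboot = o(n) subsample
                   nodesize = 1)                     # fully grown trees, no pruning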
ABC with random forests
Idea: Starting with
possibly large collection of summary statistics (s1i , . . . , spi )
(from scientific theory input to available statistical software,
to machine-learning alternatives, to pure noise)
ABC reference table involving model index, parameter values
and summary statistics for the associated simulated
pseudo-data
run the R package randomForest to infer M from (s1i , . . . , spi )
At each split, O(√p) covariate indices are sampled at random and the
most discriminating statistic is selected, by minimising an entropy or Gini loss
The average of the trees is the resulting summary statistic, a highly
non-linear predictor of the model index
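A minimal R sketch of the resulting ABC-RF step (still on the placeholder table ref, with randomForest loaded as above, and s0 standing for the observed summaries η(y)): the forest returns the MAP model index, plus the per-tree vote frequencies, which, as stressed on the next slide, are not posterior probabilities.

rf <- randomForest(m ~ ., data = ref, ntree = 500)   # trained on the reference table

s0 <- ref[1, -1]                                     # placeholder for the observed η(y)
predict(rf, newdata = s0, type = "response")         # predicted (MAP) model index
predict(rf, newdata = s0, type = "vote")             # frequencies of trees per model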
Outcome of ABC-RF
Random forest predicts a (MAP) model index, from the observed
dataset: The predictor provided by the forest is “sufficient" to
select the most likely model but not to derive associated posterior
probability
exploit entire forest by computing how many trees lead to
picking each of the models under comparison but variability
too high to be trusted
frequency of trees associated with majority model is no proper
substitute to the true posterior probability
usual ABC-MC approximation equally highly variable and hard
to assess
random forests define a natural distance for ABC sample via
agreement frequency
Posterior predictive expected losses
We suggest replacing the unstable approximation of
P(M = m | xo)
with xo the observed sample and m the model index, by the average of the
selection errors across all models given the data xo,
P(^M(X) ≠ M | xo)
where the pair (M, X) is generated from the predictive
∫ f(x|θ) π(θ, M | xo) dθ
and ^M(x) denotes the random forest model (MAP) predictor
Posterior predictive expected losses
Arguments:
Bayesian estimate of the posterior error
integrates error over most likely part of the parameter space
gives an averaged error rather than the posterior probability of
the null hypothesis
easily computed: Given ABC subsample of parameters from
reference table, simulate pseudo-samples associated with those
and derive error frequency
Algorithmic implementation
Algorithm 3 Computation of the posterior error
(a) Use the trained RF to compute proximity between each
(m, θ, S(x)) of the reference table and s0 = S(x0)
(b) Select the k simulations with the highest proximity to s0
(c) For each (m, θ) in the latter set, compute Npp new simulations
S(y) from f (y|θ, m)
(d) Return the frequency of erroneous RF predictions over these
k × Npp simulations
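A minimal R sketch of Algorithm 3 under stated assumptions: rf, ref and s0 are as in the previous sketches, param holds the parameter values of the reference table, simulate_summaries(m, theta) is a hypothetical stand-in for the simulator returning S(y) (as a one-row data frame of summaries) from f(y|θ, m), and plain Euclidean distance replaces the random-forest proximity of step (a).

posterior_error <- function(rf, ref, param, s0, k = 100, Npp = 10) {
  d    <- sqrt(colSums((t(ref[, -1]) - as.numeric(s0))^2))  # crude proximity stand-in
  near <- order(d)[1:k]                                     # (b) k closest simulations
  errs <- 0
  for (i in near) {                                         # (c) Npp pseudo-datasets each
    for (j in 1:Npp) {
      Sy   <- simulate_summaries(ref$m[i], param[i, ])      # hypothetical simulator
      pred <- predict(rf, newdata = Sy, type = "response")
      errs <- errs + (pred != ref$m[i])                     # (d) erroneous RF prediction?
    }
  }
  errs / (k * Npp)                                          # posterior error estimate
}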
toy: MA(1) vs. MA(2)
Comparing MA(1) and MA(2) models:
xt = εt − ϑ1 εt−1 [−ϑ2 εt−2]
Earlier illustration using first two autocorrelations as S(x)
[Marin et al., Stat. & Comp., 2011]
Result #1: values of p(m|x) [obtained by numerical integration]
and p(m|S(x)) [obtained by mixing ABC outcome and density
estimation] highly differ!
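A minimal R sketch of the toy setting (assumed coefficient values; note that arima.sim uses the opposite sign convention for the MA coefficients): simulate series from MA(1) and MA(2) and compute the first two sample autocorrelations used as S(x).

set.seed(3)
n <- 100
S <- function(x) acf(x, lag.max = 2, plot = FALSE)$acf[2:3]   # first two autocorrelations

x1 <- arima.sim(model = list(ma = 0.6),         n = n)        # MA(1) series
x2 <- arima.sim(model = list(ma = c(0.6, 0.2)), n = n)        # MA(2) series
rbind(MA1 = S(x1), MA2 = S(x2))                               # summary statistics S(x)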
toy: MA(1) vs. MA(2)
Difference between the posterior probability of MA(2) given either
x or S(x). Blue stands for data from MA(1), orange for data from
MA(2)
toy: MA(1) vs. MA(2)
Result #2: Embedded models, with simulations from MA(1) lying within
those from MA(2), hence linear classification is poor
toy: MA(1) vs. MA(2)
Simulations of S(x) under MA(1) (blue) and MA(2) (orange)
toy: MA(1) vs. MA(2)
Result #3: On such a small-dimension problem, random forests
should come second to k-nn or kernel discriminant analyses
toy: MA(1) vs. MA(2)
classification method | prior error rate (in %)
LDA 27.43
Logist. reg. 28.34
SVM (library e1071) 17.17
“naïve” Bayes (with G marg.) 19.52
“naïve” Bayes (with NP marg.) 18.25
ABC k-nn (k = 100) 17.23
ABC k-nn (k = 50) 16.97
Local log. reg. (k = 1000) 16.82
Random Forest 17.04
Kernel disc. ana. (KDA) 16.95
True MAP 12.36
Evolution scenarios based on SNPs
Three scenarios for the evolution of three populations from their
most common ancestor
Evolution scenarios based on SNPs
DIYABC header (!)
7 parameters and 48 summary statistics
3 scenarios: 7 7 7
scenario 1 [0.33333] (6)
N1 N2 N3
0 sample 1
0 sample 2
0 sample 3
ta merge 1 3
ts merge 1 2
ts varne 1 N4
scenario 2 [0.33333] (6)
N1 N2 N3
..........
ts varne 1 N4
scenario 3 [0.33333] (7)
N1 N2 N3
........
historical parameters priors (7,1)
N1 N UN[100.0,30000.0,0.0,0.0]
N2 N UN[100.0,30000.0,0.0,0.0]
N3 N UN[100.0,30000.0,0.0,0.0]
ta T UN[10.0,30000.0,0.0,0.0]
ts T UN[10.0,30000.0,0.0,0.0]
N4 N UN[100.0,30000.0,0.0,0.0]
r A UN[0.05,0.95,0.0,0.0]
ts>ta
DRAW UNTIL
Evolution scenarios based on SNPs
Model 1 with 6 parameters:
four effective sample sizes: N1 for population 1, N2 for
population 2, N3 for population 3 and, finally, N4 for the
native population;
the time of divergence ta between populations 1 and 3;
the time of divergence ts between populations 1 and 2.
effective sample sizes with independent uniform priors on
[100, 30000]
vector of divergence times (ta, ts) with uniform prior on
{(a, s) ∈ [10, 30000] ⊗ [10, 30000]|a < s}
Evolution scenarios based on SNPs
Model 2 with same parameters as model 1 but the divergence time
ta corresponds to a divergence between populations 2 and 3; prior
distributions identical to those of model 1
Model 3 with extra seventh parameter, admixture rate r. For that
scenario, at time ta admixture between populations 1 and 2 from
which population 3 emerges. Prior distribution on r uniform on
[0.05, 0.95]. In that case models 1 and 2 are not embedded in
model 3. Prior distributions for other parameters the same as in
model 1
Evolution scenarios based on SNPs
Set of 48 summary statistics:
Single sample statistics
proportion of loci with null gene diversity (= proportion of monomorphic
loci)
mean gene diversity across polymorphic loci
[Nei, 1987]
variance of gene diversity across polymorphic loci
mean gene diversity across all loci
Evolution scenarios based on SNPs
Set of 48 summary statistics:
Two sample statistics
proportion of loci with null FST distance between both samples
[Weir and Cockerham, 1984]
mean across loci of non null FST distances between both samples
variance across loci of non null FST distances between both samples
mean across loci of FST distances between both samples
proportion of loci with null Nei’s distance between both samples
[Nei, 1972]
mean across loci of non null Nei’s distances between both samples
variance across loci of non null Nei’s distances between both samples
mean across loci of Nei’s distances between the two samples
Evolution scenarios based on SNPs
Set of 48 summary statistics:
Three sample statistics
proportion of loci with null admixture estimate
mean across loci of non null admixture estimate
variance across loci of non null admixture estimates
mean across all locus admixture estimates
Evolution scenarios based on SNPs
For a sample of 1000 SNPs measured on 25 biallelic individuals per
population, learning the ABC reference table with 20,000 simulations,
prior predictive error rates:
“naïve Bayes” classifier 33.3%
raw LDA classifier 23.27%
ABC k-nn [Euclidean dist. on summaries normalised by MAD] 25.93%
ABC k-nn [unnormalised Euclidean dist. on LDA components] 22.12%
local logistic classifier based on LDA components with
k = 500 / 1000 / 5000 neighbours 22.61% / 22.46% / 22.43%
random forest on summaries 21.03%
random forest on LDA components only 23.1%
random forest on summaries and LDA components 19.03%
(Error rates computed on a prior sample of size 10⁴)
Evolution scenarios based on SNPs
Posterior predictive error rates
favourable: 0.010 error – unfavourable: 0.104 error
Evolution scenarios based on microsatellites
Same setting as previously
Sample of 25 diploid individuals per population, at 20 loci
(roughly corresponds to 1/5th of previous information)
Evolution scenarios based on microsatellites
One sample statistics
mean number of alleles across loci
mean gene diversity across loci (Nei, 1987)
mean allele size variance across loci
mean M index across loci (Garza and Williamson, 2001;
Excoffier et al., 2005)
Evolution scenarios based on microsatellites
Two sample statistics
mean number of alleles across loci (two samples)
mean gene diversity across loci (two samples)
mean allele size variance across loci (two samples)
FST between two samples (Weir and Cockerham, 1984)
mean index of classification (two samples) (Rannala and
Mountain, 1997; Pascual et al., 2007)
shared allele distance between two samples (Chakraborty and
Jin, 1993)
(δµ)² distance between two samples (Goldstein et al., 1995)
Three sample statistics
Maximum likelihood coefficient of admixture (Choisy et al.,
2004)
Evolution scenarios based on microsatellites
classification method | prior error rate∗ (in %)
raw LDA 35.64
“naïve” Bayes (with G marginals) 40.02
k-nn (MAD normalised sum stat) 37.47
k-nn (unormalised LDA) 35.14
RF without LDA components 35.14
RF with LDA components 33.62
RF with only LDA components 37.25
∗ estimated on pseudo-samples of 10⁴ items drawn from the prior
Evolution scenarios based on microsatellites
Posterior predictive error rates
favourable: 0.183 error – unfavourable: 0.435 error
Back to Asian Ladybirds [message in a beetle]
Comparing 10 scenarios of Asian beetle invasion
classification method | prior error rate† (in %)
raw LDA 38.94
“naïve” Bayes (with G margins) 54.02
k-nn (MAD normalised sum stat) 58.47
RF without LDA components 38.84
RF with LDA components 35.32
† estimated on pseudo-samples of 10⁴ items drawn from the prior
Back to Asian Ladybirds [message in a beetle]
Comparing 10 scenarios of Asian beetle invasion
Random forest allocation frequencies
1 2 3 4 5 6 7 8 9 10
0.168 0.1 0.008 0.066 0.296 0.016 0.092 0.04 0.014 0.2
Posterior predictive error based on 20,000 prior simulations and
keeping 500 neighbours (or 100 neighbours and 10 pseudo-datasets
per parameter)
0.3682
Back to Asian Ladybirds [message in a beetle]
Comparing 10 scenarios of Asian beetle invasion
[Figure: prior error rate as a function of the number of neighbours k]
posterior predictive error 0.368
Back to Asian Ladybirds [message in a beetle]
Harlequin ladybird data: estimated prior error rates for various classification
methods and sizes of reference table.
Classification method | prior error rates (%) when trained on Nref = 10,000 / 20,000 / 50,000
linear discriminant analysis (LDA) 39.91 39.30 39.04
standard ABC (knn) on DIYABC summaries 57.46 53.76 51.03
standard ABC (knn) on LDA axes 39.18 38.46 37.91
local logistic regression on LDA axes 41.04 37.08 36.05
random forest (RF) on DIYABC summaries 40.18 38.94 37.63
RF on DIYABC summaries and LDA axes 36.86 35.62 34.44
Conclusion
Key ideas
π(m | η(y)) ≠ π(m | y)
Rather than approximating
π(m | η(y)), focus on
selecting the best model
(classification vs. regression)
Use a seasoned machine
learning technique to select
from the ABC simulations:
minimising the 0-1 loss
mimics the MAP rule
Assess confidence in the
selection with posterior
predictive expected losses
Consequences on
ABC-PopGen
Often, RF outperforms k-NN
(less sensitive to high
correlation in the summaries)
RF requires far fewer prior
simulations
RF selects the relevant
summaries automatically
Hence can handle much
more complex models
Þakka þér fyrir [Thank you]
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 

NBBC15, Reykjavik, June 08, 2015

  • 1. Reliable Approximate Bayesian computation (ABC) model choice via random forests Christian P. Robert Université Paris-Dauphine, Paris & University of Warwick, Coventry NBBC15, Reykjavik, Ísland bayesianstatistics@gmail.com Joint with J.-M. Cornuet, A. Estoup, J.-M. Marin, & P. Pudlo
  • 2. The next MCMSkv meeting: Computational Bayes section of ISBA major meeting: MCMSki V in Lenzerheide, Switzerland, Jan. 5-7, 2016 MCMC, pMCMC, SMC2 , HMC, ABC, (ultra-) high-dimensional computation, BNP, QMC, deep learning, &tc Plenary speakers: S. Scott, S. Fienberg, D. Dunson, K. Latuszynski, T. Lelièvre Call for contributed 9 sessions and tutorials opened “Switzerland in January, where else...?!"
  • 3. Outline Intractable likelihoods ABC methods ABC for model choice ABC model choice via random forests
  • 4. Intractable likelihood Case of a well-defined statistical model where the likelihood function (θ|y) = f (y1, . . . , yn|θ) is (really!) not available in closed form cannot (easily!) be either completed or demarginalised cannot be (at all!) estimated by an unbiased estimator examples of latent variable models of high dimension, including combinatorial structures (trees, graphs), missing constant f (x|θ) = g(y, θ) Z(θ) (eg. Markov random fields, exponential graphs,. . . ) c Prohibits direct implementation of a generic MCMC algorithm like Metropolis–Hastings which gets stuck exploring missing structures
  • 5. Intractable likelihood Case of a well-defined statistical model where the likelihood function (θ|y) = f (y1, . . . , yn|θ) is (really!) not available in closed form cannot (easily!) be either completed or demarginalised cannot be (at all!) estimated by an unbiased estimator c Prohibits direct implementation of a generic MCMC algorithm like Metropolis–Hastings which gets stuck exploring missing structures
  • 6. Getting approximative Case of a well-defined statistical model where the likelihood function (θ|y) = f (y1, . . . , yn|θ) is out of reach Empirical A to the original B problem Degrading the data precision down to tolerance level ε Replacing the likelihood with a non-parametric approximation Summarising/replacing the data with insufficient statistics
  • 7. Getting approximative Case of a well-defined statistical model where the likelihood function (θ|y) = f (y1, . . . , yn|θ) is out of reach Empirical A to the original B problem Degrading the data precision down to tolerance level ε Replacing the likelihood with a non-parametric approximation Summarising/replacing the data with insufficient statistics
  • 8. Getting approximative Case of a well-defined statistical model where the likelihood function (θ|y) = f (y1, . . . , yn|θ) is out of reach Empirical A to the original B problem Degrading the data precision down to tolerance level ε Replacing the likelihood with a non-parametric approximation Summarising/replacing the data with insufficient statistics
  • 9. Approximate Bayesian computation Intractable likelihoods ABC methods Genesis of ABC abc of ABC Summary statistic ABC for model choice ABC model choice via random forests
  • 10. Genetic background of ABC skip genetics ABC is a recent computational technique that only requires being able to sample from the likelihood f (·|θ) This technique stemmed from population genetics models, about 15 years ago, and population geneticists still significantly contribute to methodological developments of ABC. [Griffith & al., 1997; Tavaré & al., 1999]
  • 11. Demo-genetic inference Each model is characterized by a set of parameters θ that cover historical (time divergence, admixture time ...), demographics (population sizes, admixture rates, migration rates, ...) and genetic (mutation rate, ...) factors The goal is to estimate these parameters from a dataset of polymorphism (DNA sample) y observed at the present time Problem: most of the time, we cannot calculate the likelihood of the polymorphism data f (y|θ)...
  • 12. Demo-genetic inference Each model is characterized by a set of parameters θ that cover historical (time divergence, admixture time ...), demographics (population sizes, admixture rates, migration rates, ...) and genetic (mutation rate, ...) factors The goal is to estimate these parameters from a dataset of polymorphism (DNA sample) y observed at the present time Problem: most of the time, we cannot calculate the likelihood of the polymorphism data f (y|θ)...
  • 13. Neutral model at a given microsatellite locus, in a closed panmictic population at equilibrium Kingman’s genealogy When time axis is normalized, T(k) ∼ Exp(k(k − 1)/2) Mutations according to the Simple stepwise Mutation Model (SMM) • date of the mutations ∼ Poisson process with intensity θ/2 over the branches • MRCA = 100 • independent mutations: ±1 with pr. 1/2
  • 14. Neutral model at a given microsatellite locus, in a closed panmictic population at equilibrium Kingman’s genealogy When time axis is normalized, T(k) ∼ Exp(k(k − 1)/2) Mutations according to the Simple stepwise Mutation Model (SMM) • date of the mutations ∼ Poisson process with intensity θ/2 over the branches • MRCA = 100 • independent mutations: ±1 with pr. 1/2
  • 15. Neutral model at a given microsatellite locus, in a closed panmictic population at equilibrium Observations: leafs of the tree ^θ = ? Kingman’s genealogy When time axis is normalized, T(k) ∼ Exp(k(k − 1)/2) Mutations according to the Simple stepwise Mutation Model (SMM) • date of the mutations ∼ Poisson process with intensity θ/2 over the branches • MRCA = 100 • independent mutations: ±1 with pr. 1/2
  • 16. Instance of ecological questions [message in a beetle] How did the Asian Ladybird beetle arrive in Europe? Why do they swarm right now? What are the routes of invasion? How to get rid of them? [Lombaert & al., 2010, PLoS ONE] beetles in forests
  • 17. Worldwide invasion routes of Harmonia Axyridis [Estoup et al., 2012, Molecular Ecology Res.]
  • 18. c Intractable likelihood Missing (too much missing!) data structure: f (y|θ) = ∫_G f (y|G, θ) f (G|θ) dG cannot be computed in a manageable way... [Stephens & Donnelly, 2000] The genealogies are considered as nuisance parameters This modelling clearly differs from the phylogenetic perspective where the tree is the parameter of interest.
  • 19. c Intractable likelihood Missing (too much missing!) data structure: f (y|θ) = ∫_G f (y|G, θ) f (G|θ) dG cannot be computed in a manageable way... [Stephens & Donnelly, 2000] The genealogies are considered as nuisance parameters This modelling clearly differs from the phylogenetic perspective where the tree is the parameter of interest.
  • 20. A?B?C? A stands for approximate [wrong likelihood / picture] B stands for Bayesian C stands for computation [producing a parameter sample]
  • 21. ABC methodology Bayesian setting: target is π(θ)f (x|θ) When likelihood f (x|θ) not in closed form, likelihood-free rejection technique: Foundation For an observation y ∼ f (y|θ), under the prior π(θ), if one keeps jointly simulating θ' ∼ π(θ), z ∼ f (z|θ'), until the auxiliary variable z is equal to the observed value, z = y, then the selected θ' ∼ π(θ|y) [Rubin, 1984; Diggle & Gratton, 1984; Tavaré et al., 1997]
  • 22. ABC methodology Bayesian setting: target is π(θ)f (x|θ) When likelihood f (x|θ) not in closed form, likelihood-free rejection technique: Foundation For an observation y ∼ f (y|θ), under the prior π(θ), if one keeps jointly simulating θ' ∼ π(θ), z ∼ f (z|θ'), until the auxiliary variable z is equal to the observed value, z = y, then the selected θ' ∼ π(θ|y) [Rubin, 1984; Diggle & Gratton, 1984; Tavaré et al., 1997]
  • 23. A as A...pproximative When y is a continuous random variable, strict equality z = y is replaced with a tolerance zone ρ(y, z) ≤ ε where ρ is a distance Output distributed from π(θ) Pθ{ρ(y, z) < ε} def ∝ π(θ|ρ(y, z) < ε) [Pritchard et al., 1999]
  • 24. A as A...pproximative When y is a continuous random variable, strict equality z = y is replaced with a tolerance zone ρ(y, z) ≤ ε where ρ is a distance Output distributed from π(θ) Pθ{ρ(y, z) < ε} def ∝ π(θ|ρ(y, z) < ε) [Pritchard et al., 1999]
  • 25. ABC recap Likelihood free rejection sampling Tavaré et al. (1997) Genetics 1) Set i = 1, 2) Generate θ' from the prior distribution π(·), 3) Generate z' from the likelihood f (·|θ'), 4) If ρ(η(z'), η(y)) ≤ ε, set (θi , zi ) = (θ', z') and i = i + 1, 5) If i ≤ N, return to 2). We keep the θ's values such that the distance between the corresponding simulated dataset and the observed dataset is small enough. Tuning parameters ε > 0: tolerance level, η(z): function that summarizes datasets, ρ(η, η'): distance between vectors of summary statistics N: size of the output
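As an aside, a minimal R sketch of this rejection sampler, assuming user-supplied functions rprior(), rmodel(theta) and summary_stats(z) (hypothetical names standing in for the problem-specific prior, simulator and summaries):

  abc_reject <- function(y, N, eps, rprior, rmodel, summary_stats,
                         rho = function(a, b) sqrt(sum((a - b)^2))) {
    s_obs <- summary_stats(y)
    out <- vector("list", N)
    i <- 1
    while (i <= N) {
      theta <- rprior()                              # step 2: draw theta' from the prior
      z <- rmodel(theta)                             # step 3: simulate pseudo-data z'
      if (rho(summary_stats(z), s_obs) <= eps) {     # step 4: accept if within tolerance eps
        out[[i]] <- theta
        i <- i + 1
      }
    }
    do.call(rbind, out)                              # N accepted parameter values
  }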
  • 26. Output The likelihood-free algorithm samples from the marginal in z of: π_ε(θ, z|y) = π(θ)f (z|θ) I_{A_ε,y}(z) / ∫_{A_ε,y×Θ} π(θ)f (z|θ) dz dθ , where A_ε,y = {z ∈ D | ρ(η(z), η(y)) < ε}. The idea behind ABC is that the summary statistics coupled with a small tolerance should provide a good approximation of the posterior distribution: π_ε(θ|y) = ∫ π_ε(θ, z|y) dz ≈ π(θ|y) .
  • 27. Output The likelihood-free algorithm samples from the marginal in z of: π_ε(θ, z|y) = π(θ)f (z|θ) I_{A_ε,y}(z) / ∫_{A_ε,y×Θ} π(θ)f (z|θ) dz dθ , where A_ε,y = {z ∈ D | ρ(η(z), η(y)) < ε}. The idea behind ABC is that the summary statistics coupled with a small tolerance should provide a good approximation of the posterior distribution: π_ε(θ|y) = ∫ π_ε(θ, z|y) dz ≈ π(θ|y) .
  • 28. Output The likelihood-free algorithm samples from the marginal in z of: π_ε(θ, z|y) = π(θ)f (z|θ) I_{A_ε,y}(z) / ∫_{A_ε,y×Θ} π(θ)f (z|θ) dz dθ , where A_ε,y = {z ∈ D | ρ(η(z), η(y)) < ε}. The idea behind ABC is that the summary statistics coupled with a small tolerance should provide a good approximation of the restricted posterior distribution: π_ε(θ|y) = ∫ π_ε(θ, z|y) dz ≈ π(θ|η(y)) . Not so good..!
  • 29. Which summary? Fundamental difficulty of the choice of the summary statistic when there is no non-trivial sufficient statistics [except when done by the experimenters in the field] Loss of statistical information balanced against gain in data roughening Approximation error and information loss remain unknown Choice of statistics induces choice of distance function towards standardisation may be imposed for external/practical reasons (e.g., DIYABC) may gather several non-B point estimates [the more the merrier] can [machine-]learn about efficient combination
  • 30. Which summary? Fundamental difficulty of the choice of the summary statistic when there is no non-trivial sufficient statistics [except when done by the experimenters in the field] Loss of statistical information balanced against gain in data roughening Approximation error and information loss remain unknown Choice of statistics induces choice of distance function towards standardisation may be imposed for external/practical reasons (e.g., DIYABC) may gather several non-B point estimates [the more the merrier] can [machine-]learn about efficient combination
  • 31. Which summary? Fundamental difficulty of the choice of the summary statistic when there is no non-trivial sufficient statistics [except when done by the experimenters in the field] Loss of statistical information balanced against gain in data roughening Approximation error and information loss remain unknown Choice of statistics induces choice of distance function towards standardisation may be imposed for external/practical reasons (e.g., DIYABC) may gather several non-B point estimates [the more the merrier] can [machine-]learn about efficient combination
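As a side note on the standardisation point above: one simple option, echoed later when k-nn distances are computed on summaries normalised by MAD, is to rescale each summary by its median absolute deviation across the reference table; a minimal R sketch (the matrix name S is an assumption):

  mad_scale <- function(S) {
    # S: matrix of simulated summaries (rows = simulations, columns = statistics)
    sweep(S, 2, apply(S, 2, mad), "/")
  }
  # the same column-wise scaling should be applied to the observed summaries before computing distances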
  • 32. attempts at summaries How to choose the set of summary statistics? Joyce and Marjoram (2008, SAGMB) Nunes and Balding (2010, SAGMB) Fearnhead and Prangle (2012, JRSS B) Ratmann et al. (2012, PLOS Comput. Biol) Blum et al. (2013, Statistical science) EP-ABC of Barthelmé & Chopin (2013, JASA) LDA selection of Estoup & al. (2012, Mol. Ecol. Res.)
  • 33. LDA advantages much faster computation of scenario probabilities via polychotomous regression a (much) lower number of explanatory variables improves the accuracy of the ABC approximation, reduces the tolerance and avoids extra costs in constructing the reference table allows for a large collection of initial summaries ability to evaluate Type I and Type II errors on more complex models LDA reduces correlation among explanatory variables When available, using both simulated and real data sets, posterior probabilities of scenarios computed from LDA-transformed and raw summaries are strongly correlated
  • 34. LDA advantages much faster computation of scenario probabilities via polychotomous regression a (much) lower number of explanatory variables improves the accuracy of the ABC approximation, reduces the tolerance and avoids extra costs in constructing the reference table allows for a large collection of initial summaries ability to evaluate Type I and Type II errors on more complex models LDA reduces correlation among explanatory variables When available, using both simulated and real data sets, posterior probabilities of scenarios computed from LDA-transformed and raw summaries are strongly correlated
  • 35. ABC for model choice Intractable likelihoods ABC methods ABC for model choice ABC model choice via random forests
  • 36. ABC model choice ABC model choice A) Generate large set of (m, θ, z) from the Bayesian predictive, π(m)πm(θ)fm(z|θ) B) Keep particles (m, θ, z) such that ρ(η(y), η(z)) ≤ ε C) For each m, return pm = proportion of m among remaining particles If ε tuned towards k resulting particles, then pm is a k-nearest neighbor estimate of P{M = m | η(y)} Approximating posterior prob’s of models = regression problem where response is 1{M = m}, covariates are summary statistics η(z), loss is, e.g., L2 Method of choice in DIYABC is local polytomous logistic regression
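Steps B)-C) can be written as a k-nearest-neighbour estimate in a few lines of R, assuming a reference table ref with a factor column model and one column per summary statistic, and an observed summary vector s_obs (all names are assumptions):

  abc_knn_modelprob <- function(ref, s_obs, k = 500) {
    S <- as.matrix(ref[, setdiff(names(ref), "model")])
    d <- sqrt(rowSums(sweep(S, 2, s_obs)^2))   # distance of each simulated summary to the data
    keep <- order(d)[seq_len(k)]               # k closest particles, i.e. epsilon tuned towards k
    prop.table(table(ref$model[keep]))         # p_m = proportion of each model index among them
  }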
  • 37. Machine learning perspective [paradigm shift] ABC model choice A) Generate a large set of (m, θ, z)’s from Bayesian predictive, π(m)πm(θ)fm(z|θ) B) Use machine learning tech. to infer on π(m | η(y)) In this perspective: (iid) “data set” = reference table simulated during stage A) observed y becomes a new data point Note that: predicting m is a classification problem ⇐⇒ select the best model based on a maximal a posteriori rule computing π(m|η(y)) is a regression problem ⇐⇒ confidence in each model classification is much simpler than regression (e.g., dim. of objects we try to learn)
  • 38. Warning the loss of information induced by using non-sufficient summary statistics is a genuine problem Fundamental discrepancy between the genuine Bayes factors/posterior probabilities and the Bayes factors based on summary statistics. See, e.g., Didelot et al. (2011, Bayesian analysis) X et al. (2011, PNAS) Marin et al. (2014, JRSS B) . . . Call instead for a machine learning approach able to handle a large number of correlated summary statistics: random forests are well suited for that task
  • 39. ABC model choice via random forests Intractable likelihoods ABC methods ABC for model choice ABC model choice via random forests Random forests ABC with random forests Illustrations
  • 40. Leaning towards machine learning Main notions: ABC-MC seen as learning about which model is most appropriate from a huge (reference) table exploiting a large number of summary statistics not an issue for machine learning methods intended to estimate efficient combinations abandoning (temporarily?) the idea of estimating posterior probabilities of the models, poorly approximated by machine learning methods, and replacing those by posterior predictive expected loss
  • 41. Random forests Technique that stemmed from Leo Breiman’s bagging (or bootstrap aggregating) machine learning algorithm for both classification and regression [Breiman, 1996] Improved classification performances by averaging over classification schemes of randomly generated training sets, creating a “forest" of (CART) decision trees, inspired by Amit and Geman (1997) ensemble learning [Breiman, 2001]
  • 42. CART construction Basic classification tree: Algorithm 1 CART start the tree with a single root repeat pick a non-homogeneous tip v (i.e., Q(v) ≠ 0) attach to v two daughter nodes v1 and v2 for all covariates Xj do find the threshold tj in the rule Xj < tj that minimizes N(v1)Q(v1) + N(v2)Q(v2) end for find the rule Xj < tj that minimizes N(v1)Q(v1) + N(v2)Q(v2) in j and set this best rule to node v until all tips v are homogeneous (Q(v) = 0) set the labels of all tips where Q is Gini’s index Q(vi ) = Σ_{y=1}^{M} ^p(vi , y) {1 − ^p(vi , y)} .
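A bare-bones R illustration of the split criterion in Algorithm 1, with gini() playing the role of Q and best_split() scanning the thresholds tj for a single covariate; this is a sketch for intuition, not the CART implementation used in the talk:

  gini <- function(labels) {
    p <- prop.table(table(labels))
    sum(p * (1 - p))                       # Q(v) = sum_y p(v, y) {1 - p(v, y)}
  }
  best_split <- function(x, labels) {
    ts <- sort(unique(x))                  # candidate thresholds for the rule x < t
    cost <- sapply(ts, function(t) {
      left <- labels[x < t]; right <- labels[x >= t]
      length(left) * gini(left) + length(right) * gini(right)   # N(v1)Q(v1) + N(v2)Q(v2)
    })
    ts[which.min(cost)]
  }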
  • 43. Growing the forest Breiman’s solution for inducing random features in the trees of the forest: bootstrap resampling of the dataset and random subsetting [of size √ t] of the covariates driving the classification at every node of each tree Covariate xτ that drives the node separation xτ < cτ and the separation bound cτ chosen by minimising entropy or Gini index
  • 44. Breiman and Cutler’s algorithm Algorithm 2 Random forests for t = 1 to T do //*T is the number of trees*// Draw a bootstrap sample of size nboot = n Grow an unpruned decision tree for b = 1 to B do //*B is the number of nodes*// Select ntry of the predictors at random Determine the best split from among those predictors end for end for Predict new data by aggregating the predictions of the T trees
  • 45. Subsampling Due to both large datasets [practical] and theoretical recommendation of Scornet et al. (2014), from independence between trees to convergence issues, bootstrap sample of much smaller size than original data size nboot = o(n) Each CART tree stops when number of observations per node is 1: no culling of the branches
  • 46. Subsampling Due to both large datasets [practical] and theoretical recommendation of Scornet et al. (2014), from independence between trees to convergence issues, bootstrap sample of much smaller size than original data size nboot = o(n) Each CART tree stops when number of observations per node is 1: no culling of the branches
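With the randomForest R package this subsampling choice maps onto the sampsize and nodesize arguments; the rate n^0.8 below is only an illustration of nboot = o(n), not a recommendation from the talk, and ref again denotes an assumed reference table with a factor column model:

  library(randomForest)
  n <- nrow(ref)
  rf_sub <- randomForest(model ~ ., data = ref,
                         ntree    = 500,
                         replace  = TRUE,             # bootstrap-type resampling
                         sampsize = floor(n^0.8),     # nboot = o(n), much smaller than n
                         nodesize = 1)                # grow each tree down to one observation per node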
  • 47. ABC with random forests Idea: Starting with possibly large collection of summary statistics (s1i , . . . , spi ) (from scientific theory input to available statistical software, to machine-learning alternatives, to pure noise) ABC reference table involving model index, parameter values and summary statistics for the associated simulated pseudo-data run R randomForest to infer M from (s1i , . . . , spi )
  • 48. ABC with random forests Idea: Starting with possibly large collection of summary statistics (s1i , . . . , spi ) (from scientific theory input to available statistical software, to machine-learning alternatives, to pure noise) ABC reference table involving model index, parameter values and summary statistics for the associated simulated pseudo-data run R randomForest to infer M from (s1i , . . . , spi ) at each step O( √ p) indices sampled at random and most discriminating statistic selected, by minimising entropy or Gini loss
  • 49. ABC with random forests Idea: Starting with possibly large collection of summary statistics (s1i , . . . , spi ) (from scientific theory input to available statistical software, to machine-learning alternatives, to pure noise) ABC reference table involving model index, parameter values and summary statistics for the associated simulated pseudo-data run R randomForest to infer M from (s1i , . . . , spi ) The average of the trees is the resulting summary statistic, a highly non-linear predictor of the model index
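In code, this step might look as follows, assuming the same reference table ref (first column model = model index M, remaining columns the summaries s1, ..., sp) and a one-row data frame obs holding the observed summaries; both names are assumptions:

  library(randomForest)
  p  <- ncol(ref) - 1
  rf <- randomForest(model ~ ., data = ref,
                     ntree = 500,
                     mtry  = floor(sqrt(p)))      # O(sqrt(p)) statistics tried at each split
  m_hat <- predict(rf, newdata = obs)             # model index selected for the observed data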
  • 50. Outcome of ABC-RF Random forest predicts a (MAP) model index, from the observed dataset: The predictor provided by the forest is “sufficient" to select the most likely model but not to derive associated posterior probability exploit entire forest by computing how many trees lead to picking each of the models under comparison but variability too high to be trusted frequency of trees associated with majority model is no proper substitute to the true posterior probability usual ABC-MC approximation equally highly variable and hard to assess random forests define a natural distance for ABC sample via agreement frequency
  • 51. Outcome of ABC-RF Random forest predicts a (MAP) model index, from the observed dataset: The predictor provided by the forest is “sufficient" to select the most likely model but not to derive associated posterior probability exploit entire forest by computing how many trees lead to picking each of the models under comparison but variability too high to be trusted frequency of trees associated with majority model is no proper substitute to the true posterior probability usual ABC-MC approximation equally highly variable and hard to assess random forests define a natural distance for ABC sample via agreement frequency
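The tree-by-tree allocation frequencies discussed above can be read off a fitted forest directly; a sketch reusing the rf and obs objects assumed in the previous code sketch:

  votes <- predict(rf, newdata = obs, type = "vote")   # fraction of trees voting for each model
  votes                                                # majority frequency: not a posterior probability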
  • 52. Posterior predictive expected losses We suggest replacing unstable approximation of P(M = m|xo) with xo observed sample and m model index, by average of the selection errors across all models given the data xo, P( ^M(X) ≠ M|xo) where pair (M, X) generated from the predictive ∫ f (x|θ)π(θ, M|xo) dθ and ^M(x) denotes the random forest model (MAP) predictor
  • 53. Posterior predictive expected losses Arguments: Bayesian estimate of the posterior error integrates error over most likely part of the parameter space gives an averaged error rather than the posterior probability of the null hypothesis easily computed: Given ABC subsample of parameters from reference table, simulate pseudo-samples associated with those and derive error frequency
  • 54. Algorithmic implementation Algorithm 3 Computation of the posterior error (a) Use the trained RF to compute proximity between each (m, θ, S(x)) of the reference table and s0 = S(x0) (b) Select the k simulations with the highest proximity to s0 (c) For each (m, θ) in the latter set, compute Npp new simulations S(y) from f (y|θ, m) (d) Return the frequency of erroneous RF predictions over these k × Npp simulations
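A rough R transcription of Algorithm 3, under the same assumed objects (rf, ref, obs) plus a matrix params of the parameter draws matching the reference table and user-supplied rmodel(m, theta) and summary_stats() simulators; note that plain Euclidean distance is used below as a stand-in for the random-forest proximity of step (a):

  k <- 100; Npp <- 10                               # e.g. 100 neighbours, 10 pseudo-datasets each
  S  <- as.matrix(ref[, -1])                        # assumes the model index is the first column
  s0 <- as.numeric(obs)
  keep <- order(sqrt(rowSums(sweep(S, 2, s0)^2)))[seq_len(k)]   # (a)-(b): closest simulations
  err <- 0
  for (i in keep) {                                 # (c): resimulate for each kept (m, theta)
    m <- ref$model[i]; theta <- params[i, ]
    for (j in seq_len(Npp)) {
      s_new <- setNames(summary_stats(rmodel(m, theta)), colnames(S))
      pred  <- predict(rf, newdata = as.data.frame(t(s_new)))
      err   <- err + as.numeric(pred != m)
    }
  }
  err / (k * Npp)                                   # (d): frequency of erroneous RF predictions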
  • 55. toy: MA(1) vs. MA(2) Comparing an MA(1) and an MA(2) model: xt = εt − ϑ1εt−1 [−ϑ2εt−2] Earlier illustration using first two autocorrelations as S(x) [Marin et al., Stat. & Comp., 2011] Result #1: values of p(m|x) [obtained by numerical integration] and p(m|S(x)) [obtained by mixing ABC outcome and density estimation] highly differ!
  • 56. toy: MA(1) vs. MA(2) Difference between the posterior probability of MA(2) given either x or S(x). Blue stands for data from MA(1), orange for data from MA(2)
  • 57. toy: MA(1) vs. MA(2) Comparing an MA(1) and an MA(2) model: xt = εt − ϑ1εt−1 [−ϑ2εt−2] Earlier illustration using two autocorrelations as S(x) [Marin et al., Stat. & Comp., 2011] Result #2: Embedded models, with simulations from MA(1) within those from MA(2), hence linear classification poor
  • 58. toy: MA(1) vs. MA(2) Simulations of S(x) under MA(1) (blue) and MA(2) (orange)
  • 59. toy: MA(1) vs. MA(2) Comparing an MA(1) and an MA(2) model: xt = εt − ϑ1εt−1 [−ϑ2εt−2] Earlier illustration using two autocorrelations as S(x) [Marin et al., Stat. & Comp., 2011] Result #3: On such a small dimension problem, random forests should come second to k-nn or kernel discriminant analyses
  • 60. toy: MA(1) vs. MA(2) Classification method and prior error rate (in %):
  LDA 27.43
  Logist. reg. 28.34
  SVM (library e1071) 17.17
  “naïve” Bayes (with G marg.) 19.52
  “naïve” Bayes (with NP marg.) 18.25
  ABC k-nn (k = 100) 17.23
  ABC k-nn (k = 50) 16.97
  Local log. reg. (k = 1000) 16.82
  Random Forest 17.04
  Kernel disc. ana. (KDA) 16.95
  True MAP 12.36
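For completeness, a small sketch of how a reference table for this toy example could be simulated in base R, with the first two autocorrelations as S(x); the uniform priors on the MA coefficients are illustrative assumptions, not those of the cited paper:

  sim_ma <- function(q, n = 100) {
    theta <- runif(q, -1, 1)                          # assumed flat prior on the MA coefficients
    x <- arima.sim(model = list(ma = -theta), n = n)  # x_t = e_t - theta_1 e_{t-1} [- theta_2 e_{t-2}]
    acf(x, lag.max = 2, plot = FALSE)$acf[2:3]        # S(x): first two autocorrelations
  }
  nsim <- 1000
  ref  <- data.frame(model = factor(rep(c("MA(1)", "MA(2)"), each = nsim)),
                     rbind(t(replicate(nsim, sim_ma(1))),
                           t(replicate(nsim, sim_ma(2)))))
  names(ref)[2:3] <- c("rho1", "rho2")                # reference table: model index + summaries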
  • 61. Evolution scenarios based on SNPs Three scenarios for the evolution of three populations from their most common ancestor
  • 62. Evolution scenarios based on SNPs DIYABC header (!) 7 parameters and 48 summary statistics 3 scenarios: 7 7 7 scenario 1 [0.33333] (6) N1 N2 N3 0 sample 1 0 sample 2 0 sample 3 ta merge 1 3 ts merge 1 2 ts varne 1 N4 scenario 2 [0.33333] (6) N1 N2 N3 .......... ts varne 1 N4 scenario 3 [0.33333] (7) N1 N2 N3 ........ historical parameters priors (7,1) N1 N UN[100.0,30000.0,0.0,0.0] N2 N UN[100.0,30000.0,0.0,0.0] N3 N UN[100.0,30000.0,0.0,0.0] ta T UN[10.0,30000.0,0.0,0.0] ts T UN[10.0,30000.0,0.0,0.0] N4 N UN[100.0,30000.0,0.0,0.0] r A UN[0.05,0.95,0.0,0.0] ts>ta DRAW UNTIL
  • 63. Evolution scenarios based on SNPs Model 1 with 6 parameters: four effective sample sizes: N1 for population 1, N2 for population 2, N3 for population 3 and, finally, N4 for the native population; the time of divergence ta between populations 1 and 3; the time of divergence ts between populations 1 and 2. effective sample sizes with independent uniform priors on [100, 30000] vector of divergence times (ta, ts) with uniform prior on {(a, s) ∈ [10, 30000] ⊗ [10, 30000]|a < s}
  • 64. Evolution scenarios based on SNPs Model 2 with same parameters as model 1 but the divergence time ta corresponds to a divergence between populations 2 and 3; prior distributions identical to those of model 1 Model 3 with extra seventh parameter, admixture rate r. For that scenario, at time ta admixture between populations 1 and 2 from which population 3 emerges. Prior distribution on r uniform on [0.05, 0.95]. In that case models 1 and 2 are not embedded in model 3. Prior distributions for other parameters the same as in model 1
  • 65. Evolution scenarios based on SNPs Set of 48 summary statistics: Single sample statistics proportion of loci with null gene diversity (= proportion of monomorphic loci) mean gene diversity across polymorphic loci [Nei, 1987] variance of gene diversity across polymorphic loci mean gene diversity across all loci
  • 66. Evolution scenarios based on SNPs Set of 48 summary statistics: Two sample statistics proportion of loci with null FST distance between both samples [Weir and Cockerham, 1984] mean across loci of non null FST distances between both samples variance across loci of non null FST distances between both samples mean across loci of FST distances between both samples proportion of loci with null Nei’s distance between both samples [Nei, 1972] mean across loci of non null Nei’s distances between both samples variance across loci of non null Nei’s distances between both samples mean across loci of Nei’s distances between the two samples
  • 67. Evolution scenarios based on SNPs Set of 48 summary statistics: Three sample statistics proportion of loci with null admixture estimate mean across loci of non null admixture estimates variance across loci of non null admixture estimates mean across all locus admixture estimates
  • 68. Evolution scenarios based on SNPs For a sample of 1000 SNPs measured on 25 biallelic individuals per population, learning ABC reference table with 20,000 simulations, prior predictive error rates: “naïve Bayes” classifier 33.3% raw LDA classifier 23.27% ABC k-nn [Euclidean dist. on summaries normalised by MAD] 25.93% ABC k-nn [unnormalised Euclidean dist. on LDA components] 22.12% local logistic classifier based on LDA components with k = 500 neighbours 22.61% random forest on summaries 21.03% (Error rates computed on a prior sample of size 10^4)
  • 69. Evolution scenarios based on SNPs For a sample of 1000 SNPs measured on 25 biallelic individuals per population, learning ABC reference table with 20,000 simulations, prior predictive error rates: “naïve Bayes” classifier 33.3% raw LDA classifier 23.27% ABC k-nn [Euclidean dist. on summaries normalised by MAD] 25.93% ABC k-nn [unnormalised Euclidean dist. on LDA components] 22.12% local logistic classifier based on LDA components with k = 1000 neighbours 22.46% random forest on summaries 21.03% (Error rates computed on a prior sample of size 10^4)
  • 70. Evolution scenarios based on SNPs For a sample of 1000 SNPs measured on 25 biallelic individuals per population, learning ABC reference table with 20,000 simulations, prior predictive error rates: “naïve Bayes” classifier 33.3% raw LDA classifier 23.27% ABC k-nn [Euclidean dist. on summaries normalised by MAD] 25.93% ABC k-nn [unnormalised Euclidean dist. on LDA components] 22.12% local logistic classifier based on LDA components with k = 5000 neighbours 22.43% random forest on summaries 21.03% (Error rates computed on a prior sample of size 10^4)
  • 71. Evolution scenarios based on SNPs For a sample of 1000 SNPs measured on 25 biallelic individuals per population, learning ABC reference table with 20,000 simulations, prior predictive error rates: “naïve Bayes” classifier 33.3% raw LDA classifier 23.27% ABC k-nn [Euclidean dist. on summaries normalised by MAD] 25.93% ABC k-nn [unnormalised Euclidean dist. on LDA components] 22.12% local logistic classifier based on LDA components with k = 5000 neighbours 22.43% random forest on LDA components only 23.1% (Error rates computed on a prior sample of size 10^4)
  • 72. Evolution scenarios based on SNPs For a sample of 1000 SNPs measured on 25 biallelic individuals per population, learning ABC reference table with 20,000 simulations, prior predictive error rates: “naïve Bayes” classifier 33.3% raw LDA classifier 23.27% ABC k-nn [Euclidean dist. on summaries normalised by MAD] 25.93% ABC k-nn [unnormalised Euclidean dist. on LDA components] 22.12% local logistic classifier based on LDA components with k = 5000 neighbours 22.43% random forest on summaries and LDA components 19.03% (Error rates computed on a prior sample of size 10^4)
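The best-performing combination above, summaries plus LDA components, can be reproduced in spirit with MASS::lda feeding extra covariates to the forest; a sketch under the usual assumption of a reference table ref with factor column model and the 48 summary columns:

  library(MASS)                                     # lda()
  library(randomForest)
  lda_fit  <- lda(model ~ ., data = ref)
  lda_axes <- predict(lda_fit)$x                    # LDA components of the simulated summaries
  ref_aug  <- data.frame(ref, lda_axes)             # summaries + LDA axes as covariates
  rf_aug   <- randomForest(model ~ ., data = ref_aug, ntree = 500)
  rf_aug$confusion                                  # out-of-bag confusion, whence a prior error rate
  # for the observed data: predict(rf_aug, newdata = data.frame(obs, predict(lda_fit, newdata = obs)$x))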
  • 73. Evolution scenarios based on SNPs Posterior predictive error rates
  • 74. Evolution scenarios based on SNPs Posterior predictive error rates favourable: 0.010 error – unfavourable: 0.104 error
  • 75. Evolution scenarios based on microsatellites Same setting as previously Sample of 25 diploid individuals per population, on 20 loci (roughly corresponds to 1/5th of previous information)
  • 76. Evolution scenarios based on microsatellites One sample statistics mean number of alleles across loci mean gene diversity across loci (Nei, 1987) mean allele size variance across loci mean M index across loci (Garza and Williamson, 2001; Excoffier et al., 2005)
  • 77. Evolution scenarios based on microsatellites Two sample statistics mean number of alleles across loci (two samples) mean gene diversity across loci (two samples) mean allele size variance across loci (two samples) FST between two samples (Weir and Cockerham, 1984) mean index of classification (two samples) (Rannala and Mountain, 1997; Pascual et al., 2007) shared allele distance between two samples (Chakraborty and Jin, 1993) (δµ)² distance between two samples (Goldstein et al., 1995) Three sample statistics Maximum likelihood coefficient of admixture (Choisy et al., 2004)
  • 78. Evolution scenarios based on microsatellites Classification method and prior error∗ rate (in %):
  raw LDA 35.64
  “naïve” Bayes (with G marginals) 40.02
  k-nn (MAD normalised sum stat) 37.47
  k-nn (unnormalised LDA) 35.14
  RF without LDA components 35.14
  RF with LDA components 33.62
  RF with only LDA components 37.25
  ∗ estimated on pseudo-samples of 10^4 items drawn from the prior
  • 79. Evolution scenarios based on microsatellites Posterior predictive error rates
  • 80. Evolution scenarios based on microsatellites Posterior predictive error rates favourable: 0.183 error – unfavourable: 0.435 error
  • 81. Back to Asian Ladybirds [message in a beetle] Comparing 10 scenarios of Asian beetle invasion beetle moves
  • 82. Back to Asian Ladybirds [message in a beetle] Comparing 10 scenarios of Asian beetle invasion beetle moves Classification method and prior error† rate (in %):
  raw LDA 38.94
  “naïve” Bayes (with G margins) 54.02
  k-nn (MAD normalised sum stat) 58.47
  RF without LDA components 38.84
  RF with LDA components 35.32
  † estimated on pseudo-samples of 10^4 items drawn from the prior
  • 83. Back to Asian Ladybirds [message in a beetle] Comparing 10 scenarios of Asian beetle invasion beetle moves Random forest allocation frequencies for scenarios 1 to 10: 1: 0.168, 2: 0.1, 3: 0.008, 4: 0.066, 5: 0.296, 6: 0.016, 7: 0.092, 8: 0.04, 9: 0.014, 10: 0.2 Posterior predictive error based on 20,000 prior simulations and keeping 500 neighbours (or 100 neighbours and 10 pseudo-datasets per parameter): 0.3682
  • 84. Back to Asian Ladybirds [message in a beetle] Comparing 10 scenarios of Asian beetle invasion
  • 85. Back to Asian Ladybirds [message in a beetle] Comparing 10 scenarios of Asian beetle invasion [Figure: prior error rate as a function of the number of neighbours k (0 to 2000), with the posterior predictive error 0.368 marked as a reference line]
  • 86. Back to Asian Ladybirds [message in a beetle] Harlequin ladybird data: estimated prior error rates for various classification methods and sizes of reference table. Classification method, with prior error rates (%) when trained on Nref = 10,000 / 20,000 / 50,000:
  linear discriminant analysis (LDA): 39.91 / 39.30 / 39.04
  standard ABC (knn) on DIYABC summaries: 57.46 / 53.76 / 51.03
  standard ABC (knn) on LDA axes: 39.18 / 38.46 / 37.91
  local logistic regression on LDA axes: 41.04 / 37.08 / 36.05
  random forest (RF) on DIYABC summaries: 40.18 / 38.94 / 37.63
  RF on DIYABC summaries and LDA axes: 36.86 / 35.62 / 34.44
  • 87. Conclusion Key ideas π(m | η(y)) ≠ π(m | y) Rather than approximating π(m | η(y)), focus on selecting the best model (classif. vs regression) Assess confidence in the selection with posterior predictive expected losses Consequences on ABC-PopGen Often, RF outperforms k-NN (less sensitive to high correlation in summaries) RF requires many fewer prior simulations RF automatically selects the relevant summaries Hence can handle much more complex models
  • 88. Conclusion Key ideas π(m | η(y)) ≠ π(m | y) Use a seasoned machine learning technique selecting from ABC simulations: minimising 0-1 loss mimics MAP Assess confidence in the selection with posterior predictive expected losses Consequences on ABC-PopGen Often, RF outperforms k-NN (less sensitive to high correlation in summaries) RF requires many fewer prior simulations RF automatically selects the relevant summaries Hence can handle much more complex models