joaks-evolution-2014

An Improved Approximate-Bayesian Method for
Estimating Shared Evolutionary History
Jamie R. Oaks1,2
1Department of Ecology and Evolutionary Biology, University of Kansas
2Department of Biology, University of Washington
June 21, 2014
Estimating shared history J. Oaks, University of Washington 1/24

Processes of diversification
Large-scale geological and climatic processes are important in
biodiversification and community assembly

Accounting for such processes will better our understanding of
biodiversity
We need methods for inferring evolutionary patterns predicted
by historical events from contemporary populations

Community scale processes
[Figure; x-axis: Time (kya), 0 to 500]

Divergence model choice
T = (T1, T2, T3)
model = 111
τ = {τ1}
[Figure: divergence times T1, T2, and T3, all assigned to a single divergence event τ1; x-axis: Time (kya), 0 to 500]

Examples (times in kya):
T                  model   τ
(260, 260, 260)    111     {260}
(397, 260, 260)    211     {260, 397}
(260, 397, 260)    121     {260, 397}
(260, 260, 397)    112     {260, 397}
(260, 95, 397)     123     {260, 95, 397}
[Figure: for each example, divergence times T1 to T3 and divergence events τ1 to τ3 on a time axis; x-axis: Time (kya), 0 to 500]
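
The model labels in these examples can be read as an index into τ: digit i of the label is the position in τ of the event that T_i belongs to. A minimal Python sketch of that reading follows. The function names `model_code` and `canonical_code` are hypothetical and not part of msBayes or dpp-msbayes, and the single-digit labels assume at most nine events. The second function relabels events by order of first appearance, which illustrates that the event numbers are arbitrary: 211 and 122 describe the same grouping.

```python
def model_code(T, tau):
    """Label each divergence time with the 1-based position of its event in tau."""
    return "".join(str(tau.index(t) + 1) for t in T)


def canonical_code(T):
    """Relabel events by order of first appearance (labels are arbitrary)."""
    first_seen = {}
    for t in T:
        # Assign the next unused label the first time an event time appears.
        first_seen.setdefault(t, len(first_seen) + 1)
    return "".join(str(first_seen[t]) for t in T)


if __name__ == "__main__":
    print(model_code((397, 260, 260), (260, 397)))
    print(canonical_code((397, 260, 260)))
```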

T = (T1, . . . , TY)
model = mi
τ = {τ1, . . . , τ|τ|}
We want to infer m and T
given DNA sequence
alignments X
[Figure: divergence times T1, T2, and T3 and divergence event τ1; x-axis: Time (kya), 0 to 500]

X: Sequence alignments
T: Divergence times
m: Divergence model
G: Gene trees
φ: Substitution parameters
Θ: Demographic parameters

Bayesian model choice
Full model:
$$p(T, G, \phi, \Theta \mid X, m_i) = \frac{p(X \mid T, G, \phi, \Theta, m_i)\, p(T, G, \phi, \Theta \mid m_i)}{p(X \mid m_i)}$$
W. Huang et al. (2011). BMC Bioinformatics 12: 1. J. R. Oaks et al. (2013). Evolution 67: 991–1010.
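
The full-model posterior above conditions on a single model m_i. For model choice, the quantity of interest is the posterior probability of each divergence model, which follows from the same denominator p(X | m_i). The statement below is the standard form; the model prior p(m_i) is a symbol assumed here and does not appear on the slide, and the integral over G is shorthand that includes a sum over discrete gene-tree topologies.

```latex
p(m_i \mid X) = \frac{p(X \mid m_i)\, p(m_i)}{\sum_{j} p(X \mid m_j)\, p(m_j)},
\qquad
p(X \mid m_i) = \int p(X \mid T, G, \phi, \Theta, m_i)\,
                     p(T, G, \phi, \Theta \mid m_i)\; dT\, dG\, d\phi\, d\Theta
```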

The msBayes model
msBayes will often infer clustered divergences even when divergence
times are random over millions of generations.
J. R. Oaks et al. (2013). Evolution 67: 991–1010. J. R. Oaks et al. (2014). arXiv:1402.6397 [q-bio.PE].

Objective:
Use principles of probability to extend the msBayes framework for
improved estimation of shared evolutionary history

An improved method
Potential improvements:
1. Alternative priors on parameters that increase marginal
likelihoods of rich models
2. Alternative approach to modeling the temporal distribution of
divergences

$$p(X) = \int_{\theta} p(X \mid \theta)\, p(\theta)\, d\theta$$
[Figure: the likelihood p(X | θ) and the prior p(θ) plotted as densities over θ from 0 to 1; y-axis: Density, 0 to 30]
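
Because the marginal likelihood is the likelihood averaged over the prior, a prior that spreads its mass over regions where the likelihood is near zero lowers p(X). The sketch below illustrates this with a toy binomial likelihood, which is an assumption chosen for simplicity and not the msBayes model. It compares a uniform prior on θ with a beta prior concentrated near the likelihood peak, using midpoint-rule integration.

```python
import math


def binomial_likelihood(theta, successes, trials):
    """Toy likelihood p(X | theta) for a binomial outcome."""
    return (math.comb(trials, successes)
            * theta ** successes * (1.0 - theta) ** (trials - successes))


def beta_pdf(theta, a, b):
    """Density of a Beta(a, b) prior at theta, for 0 < theta < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(theta)
                    + (b - 1.0) * math.log(1.0 - theta))


def marginal_likelihood(prior_pdf, successes, trials, n_grid=10_000):
    """Midpoint-rule approximation of the integral of p(X | theta) p(theta)."""
    width = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        theta = (i + 0.5) * width
        total += binomial_likelihood(theta, successes, trials) * prior_pdf(theta)
    return total * width


def uniform_prior(theta):
    return 1.0


def focused_prior(theta):
    # Beta(10, 40) has mean 0.2, near the likelihood peak for 20 of 100.
    return beta_pdf(theta, 10.0, 40.0)


ml_uniform = marginal_likelihood(uniform_prior, successes=20, trials=100)
ml_focused = marginal_likelihood(focused_prior, successes=20, trials=100)

if __name__ == "__main__":
    print(ml_uniform, ml_focused)
```

In this toy case the uniform prior gives a marginal likelihood of 1/(trials + 1), and the focused prior gives a larger value because more of its mass sits where the likelihood is high.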

Prior on divergence models
msBayes uses a discrete uniform prior on the number of
divergence events
[Figure: (A) number of divergence models (y-axis, 0 to 120) and (B) prior probability of each model, p(M_{|τ|,i}) (y-axis, 0.00 to 0.04), against the number of divergence events |τ| (x-axis, 1 to 21)]
Potential solution:
Place a flexible prior directly on the sample space of divergence
models
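
A uniform prior on the number of divergence events is not uniform over divergence models, because the number of models differs greatly among values of |τ|. The sketch below makes this concrete under one assumption that the slide does not state: models are counted as unordered integer partitions of the number of taxon pairs, which is consistent with the y-axis range of panel A. The function names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def partitions_into_k_parts(n, k):
    """Number of ways to write n as a sum of exactly k positive integers."""
    if k == 0:
        return 1 if n == 0 else 0
    if n < k:
        return 0
    # Either one part equals 1 (remove it), or every part is at least 2
    # (subtract 1 from each of the k parts).
    return partitions_into_k_parts(n - 1, k - 1) + partitions_into_k_parts(n - k, k)


def prior_per_model(n_pairs):
    """Prior probability of one model with k events, for a uniform prior on k."""
    return {
        k: 1.0 / (n_pairs * partitions_into_k_parts(n_pairs, k))
        for k in range(1, n_pairs + 1)
    }


if __name__ == "__main__":
    for k, prob in prior_per_model(22).items():
        print(k, partitions_into_k_parts(22, k), prob)
```

Under this counting, 22 taxon pairs give 1002 models in total. The single one-event model receives prior probability 1/22, while each of the 136 six-event models receives 1/(22 × 136), which produces the U shape in panel B.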

New method: dpp-msbayes
Replaced uniform priors on continuous parameters with
gamma and beta distributions
Dirichlet process prior (DPP) over all possible divergence
models
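
A Dirichlet process prior over groupings is often described through the Chinese restaurant process, which assigns each taxon pair either to an existing divergence event or to a new one. The sketch below is a generic illustration of that process, not the dpp-msbayes implementation; the concentration parameter `alpha` and both function names are assumptions made for the example.

```python
import random


def sample_crp_partition(n_pairs, alpha, rng):
    """Draw event labels for n_pairs taxon pairs from a Chinese restaurant process."""
    labels = []
    counts = []  # counts[j] = number of pairs already assigned to event j + 1
    for _ in range(n_pairs):
        # Join existing event j with weight counts[j]; open a new event
        # with weight alpha.
        weights = counts + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(counts):
            counts.append(1)
        else:
            counts[choice] += 1
        labels.append(choice + 1)
    return labels


def expected_event_count(n_pairs, alpha):
    """Prior mean number of divergence events under the process."""
    return sum(alpha / (alpha + i) for i in range(n_pairs))


if __name__ == "__main__":
    rng = random.Random(2014)
    print(sample_crp_partition(22, 1.5, rng))
    print(expected_event_count(22, 1.5))
```

Small values of `alpha` favor few shared events and large values favor many independent events, so the prior on the concentration parameter controls how much weight clustered models receive.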

dpp-msbayes: Simulation-based assessment
Simulate 50,000 datasets under three models
M_msBayes: U-shaped prior on divergence models; uniform priors on continuous parameters
M_Ushaped: U-shaped prior on divergence models; gamma priors on continuous parameters
M_DPP: DPP prior on divergence models; gamma priors on continuous parameters
Analyze all datasets under each of the models

dpp-msbayes: Simulation results
[Figure: true probability of one divergence (y-axis, 0 to 1) against posterior probability of one divergence (x-axis, 0 to 1); panels arranged by data model (M_msBayes, M_DPP) and analysis model (M_msBayes, M_DPP)]
J. R. Oaks (2014). arXiv:1402.6303 [q-bio.PE].
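
A plot of this kind can be built by binning simulation replicates on their estimated posterior probability of one divergence event and recording how often the one-event model was actually true in each bin. The sketch below shows that binning step under assumed inputs; the function name and the choice of ten equal-width bins are hypothetical and not taken from the talk.

```python
def calibration_table(posterior_probs, is_one_event, n_bins=10):
    """Return (bin midpoint, true frequency, replicate count) for each non-empty bin."""
    bins = [[0, 0] for _ in range(n_bins)]  # [replicates, replicates with one event]
    for prob, truth in zip(posterior_probs, is_one_event):
        # A probability of exactly 1.0 is placed in the last bin.
        index = min(int(prob * n_bins), n_bins - 1)
        bins[index][0] += 1
        bins[index][1] += int(truth)
    table = []
    for index, (total, hits) in enumerate(bins):
        if total > 0:
            midpoint = (index + 0.5) / n_bins
            table.append((midpoint, hits / total, total))
    return table


if __name__ == "__main__":
    probs = [0.05, 0.05, 0.95, 0.95, 1.0]
    truth = [False, False, True, True, False]
    for row in calibration_table(probs, truth):
        print(row)
```

For a well-calibrated analysis the true frequency in each bin is close to the bin midpoint, so the points fall near the identity line.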

[Figure: true probability of one divergence (y-axis, 0 to 1) against posterior probability of one divergence (x-axis, 0 to 1); panels arranged by data model (M_msBayes, M_DPP, M_Uniform, M_Ushaped) and analysis model (M_msBayes, M_DPP)]

dpp-msbayes: Simulation-based power analyses
Simulate datasets in which all 22 divergence times are random
τ ∼ U(0, 0.5 MGA)
τ ∼ U(0, 1.5 MGA)
τ ∼ U(0, 2.5 MGA)
τ ∼ U(0, 5.0 MGA)
MGA = Millions of Generations Ago
Simulate 1000 datasets for each τ distribution
Analyze all 4000 datasets under models M_msBayes, M_Ushaped,
and M_DPP
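
In these simulations every divergence time is drawn independently from a continuous distribution, so the true number of divergence events is 22 in every dataset. The sketch below draws one such set of times per τ distribution and checks this; it covers only the divergence times, not the sequence data, and the function name and seed are assumptions made for the example.

```python
import random
import statistics


def simulate_divergence_times(n_pairs, max_time_mga, rng):
    """Draw one independent divergence time per taxon pair from U(0, max_time_mga)."""
    return [rng.uniform(0.0, max_time_mga) for _ in range(n_pairs)]


if __name__ == "__main__":
    rng = random.Random(2014)
    for max_time in (0.5, 1.5, 2.5, 5.0):
        times = simulate_divergence_times(22, max_time, rng)
        # Number of distinct times and the sample variance of the times.
        print(max_time, len(set(times)), statistics.variance(times))
```

A method with good power should therefore estimate many divergence events and a non-negligible variance in divergence times, increasingly so as the upper bound grows.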

dpp-msbayes: Power results
[Figure: density (y-axis, 0 to 1) of the estimated number of divergence events (mode; x-axis, 1 to 21) under analysis model M_msBayes; one panel per data distribution: τ ∼ U(0, 0.5 MGA), τ ∼ U(0, 1.5 MGA), τ ∼ U(0, 2.5 MGA), τ ∼ U(0, 5.0 MGA)]

[Figure: density (y-axis, 0 to 1) of the estimated number of divergence events (x-axis, 1 to 21); columns: τ ∼ U(0, 0.5 MGA), τ ∼ U(0, 1.5 MGA), τ ∼ U(0, 2.5 MGA), τ ∼ U(0, 5.0 MGA); rows: analysis models M_msBayes and M_DPP]

[Figure: density histograms over an x-axis from 0 to 1 (axis label not recoverable); columns: τ ∼ U(0, 0.5 MGA), τ ∼ U(0, 1.5 MGA), τ ∼ U(0, 2.5 MGA), τ ∼ U(0, 5.0 MGA); rows: analysis models M_msBayes and M_DPP]

[Figure: density histograms over an x-axis from 0 to 1 (axis label not recoverable); columns: τ ∼ U(0, 0.5 MGA), τ ∼ U(0, 1.5 MGA), τ ∼ U(0, 2.5 MGA), τ ∼ U(0, 5.0 MGA); rows: analysis models M_msBayes, M_Ushaped, and M_DPP]

Empirical application
Did fragmentation of the Philippine
Islands during inter-glacial rises in
sea level promote diversification?

Empirical results: Philippine diversification
[Figure: posterior probability (y-axis, 0 to 0.5) of the number of divergence events (x-axis, 1 to 21); panels: msBayes and dpp-msbayes]

Conclusions
New method for estimating shared evolutionary history shows
improved
1. Estimation of posterior uncertainty
2. Model-choice accuracy
3. Power to detect temporal variation across divergences
4. Robustness to model violations
Caveats:
Estimating a very rich (600+ parameters for 22 taxa) model
using limited information from the data
Likely sensitive to prior assumptions
Be skeptical of strongly supported results

Recommendations
For Bayesian model choice, choose priors carefully
ABC model choice estimates should be accompanied by:
1. Simulation-based power analyses
2. Assessment of prior sensitivity

Future directions
Full-likelihood Bayesian approach [1]
Full-phylogenetic framework
[Figure: divergence times T1, T2, and T3 and divergence event τ1; x-axis: Time (kya), 0 to 500]
[1] J. Sukumaran (2012). PhD thesis. Lawrence, Kansas, USA: University of Kansas

Everything is on GitHub...
Software:
dpp-msbayes: https://github.com/joaks1/dpp-msbayes
PyMsBayes: https://github.com/joaks1/PyMsBayes
ABACUS: Approximate BAyesian C UtilitieS.
https://github.com/joaks1/abacus
Open-Science Notebook:
msbayes-experiments:
https://github.com/joaks1/msbayes-experiments

Acknowledgments
Ideas and feedback:
Holder Lab
KU Herpetology
Melissa Callahan
Computation:
KU ITTC
KU Computing Center
iPlant
Funding:
NSF
KU Grad Studies, EEB & BI
SSB
Sigma Xi
Photo credits:
Rafe Brown, Cam Siler, &
Jake Esselstyn
FMNH Philippine Mammal
Website:
D.S. Balete, M.R.M. Duya,
& J. Holden
PhyloPic!

Questions?
joaks1@gmail.com

Causes of bias: Insufficient sampling
Models with more parameter space are less densely sampled
Could explain bias toward small models in extreme cases
Predicts large variance in posterior estimates
We explored empirical and simulation-based analyses with 2, 5,
and 10 million prior samples, and estimates were very similar
[Figure: 95% HPD of D_T (y-axis) against the number of prior samples (x-axis, 0 to 1.0 × 10^8); panels: (A) unadjusted, (B) GLM-adjusted]

[Figure: density (y-axis, 0 to 1) of the estimated number of divergence events (x-axis, 1 to 21); columns: τ ∼ U(0, 0.5 MGA), τ ∼ U(0, 1.5 MGA), τ ∼ U(0, 2.5 MGA), τ ∼ U(0, 5.0 MGA); rows: analysis models M_msBayes, M_Ushaped, and M_DPP]

[Figure: density of the estimated variance in divergence times (median); columns: τ ∼ U(0, 0.5 MGA), τ ∼ U(0, 1.5 MGA), τ ∼ U(0, 2.5 MGA), τ ∼ U(0, 5.0 MGA); rows: analysis models M_msBayes, M_Ushaped, and M_DPP]
p(D̂_T < 0.01) reported in each panel:
Analysis model   U(0, 0.5 MGA)   U(0, 1.5 MGA)   U(0, 2.5 MGA)   U(0, 5.0 MGA)
M_msBayes        1.0             0.999           0.996           0.637
M_Ushaped        0.914           0.626           0.235           0.004
M_DPP            0.002           0.0             0.0             0.0

Empirical results: Philippine diversification
[Figure: posterior and prior probability (y-axis, 0 to 0.5) of the number of divergence events (x-axis, 1 to 21); columns: msBayes and dpp-msbayes; rows: posterior and prior]
