Talk on Bayesian model choice and an improved method for estimating shared evolutionary history. Presented at Evolution 2014 in Raleigh, North Carolina, USA.
1. An Improved Approximate-Bayesian Method for
Estimating Shared Evolutionary History
Jamie R. Oaks (1, 2)
(1) Department of Ecology and Evolutionary Biology, University of Kansas
(2) Department of Biology, University of Washington
June 21, 2014
Estimating shared history J. Oaks, University of Washington 1/24
4. Processes of diversification
Large-scale geological and climatic processes are important in
biodiversification and community assembly
Accounting for such processes will better our understanding of
biodiversity
We need methods for inferring evolutionary patterns predicted
by historical events from contemporary populations
9. Divergence model choice
T = (T_1, T_2, T_3)
model = 111
τ = {τ_1}
Examples (times in kya):
T                  model   τ
(260, 260, 260)    111     {260}
(397, 260, 260)    211     {260, 397}
(260, 397, 260)    121     {260, 397}
(260, 260, 397)    112     {260, 397}
(260, 95, 397)     123     {260, 95, 397}
[Figure: divergence times T_1, T_2, T_3 of three pairs on a time axis from 0 to 500 kya, with the divergence events τ_1, τ_2, τ_3 marked]
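One way to read the notation in these examples is sketched here in Python. It assumes, as the five examples suggest but the slides do not state, that the k-th digit of the model code is the 1-based index into τ of the event shared by the k-th pair. The function name `expand_divergence_times` is hypothetical.

```python
def expand_divergence_times(tau, model):
    """Return T, the per-pair divergence times implied by tau and a model code.

    Assumption (inferred from the slide examples, not stated on them):
    the k-th digit of the model code is the 1-based index into tau of the
    divergence event shared by the k-th pair. Single digits limit this
    sketch to at most 9 events.
    """
    return tuple(tau[int(digit) - 1] for digit in model)


# The five worked examples from the slides (times in kya).
examples = [
    ((260,), "111"),
    ((260, 397), "211"),
    ((260, 397), "121"),
    ((260, 397), "112"),
    ((260, 95, 397), "123"),
]
for tau, model in examples:
    print(model, tau, expand_divergence_times(tau, model))
```

Under this reading, τ holds only the distinct event times, and the model code records which pairs share them.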
17. Divergence model choice
T = (T_1, ..., T_Y)
model = m_i
τ = {τ_1, ..., τ_|τ|}
We want to infer m and T given DNA sequence alignments X
[Figure: divergence times T_1, T_2, T_3 on a time axis from 0 to 500 kya, sharing divergence event τ_1]
18. Divergence model choice
X: Sequence alignments
T: Divergence times
m: Divergence model
G: Gene trees
φ: Substitution parameters
Θ: Demographic parameters
We want to infer m and T given DNA sequence alignments X
[Figure: divergence times T_1, T_2, T_3 on a time axis from 0 to 500 kya, sharing divergence event τ_1]
22. Bayesian model choice
Full model:
$$p(T, G, \phi, \Theta \mid X, m_i) = \frac{p(X \mid T, G, \phi, \Theta, m_i)\, p(T, G, \phi, \Theta \mid m_i)}{p(X \mid m_i)}$$
$$p(X \mid m_i) = \int_{\theta_i} p(X \mid \theta_i, m_i)\, p(\theta_i \mid m_i)\, d\theta_i$$
$$p(m_i \mid X) = \frac{p(X \mid m_i)\, p(m_i)}{\sum_i p(X \mid m_i)\, p(m_i)}$$
msBayes: Approximate Bayesian computation (ABC)
W. Huang et al. (2011). BMC Bioinformatics 12: 1. J. R. Oaks et al. (2013). Evolution 67: 991–1010.
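A minimal sketch of how ABC rejection sampling can approximate p(m_i | X) without evaluating the marginal likelihood. This is a toy illustration, not the msBayes algorithm. The two models, the binomial data, the tolerance, and all function names are hypothetical choices made for the example.

```python
import random


def simulate_successes(theta, n_trials, rng):
    """Simulate the number of successes in n_trials Bernoulli(theta) draws."""
    return sum(rng.random() < theta for _ in range(n_trials))


def abc_model_posterior(observed, n_trials, n_draws, tolerance, rng):
    """Approximate p(m | X) for two toy models by ABC rejection sampling.

    Hypothetical setup: model 1 fixes theta = 0.5; model 2 draws
    theta ~ Uniform(0, 1). Each model has prior probability 0.5.
    """
    accepted = {1: 0, 2: 0}
    for _ in range(n_draws):
        model = rng.choice((1, 2))                    # draw m from p(m)
        theta = 0.5 if model == 1 else rng.random()   # draw theta from p(theta | m)
        simulated = simulate_successes(theta, n_trials, rng)
        if abs(simulated - observed) <= tolerance:    # keep draws close to the data
            accepted[model] += 1
    total = sum(accepted.values())
    if total == 0:
        raise ValueError("no draws accepted; increase n_draws or tolerance")
    # The share of accepted draws from each model estimates p(m | X).
    return {m: count / total for m, count in accepted.items()}


rng = random.Random(1)
posterior = abc_model_posterior(observed=10, n_trials=20, n_draws=20000,
                                tolerance=0, rng=rng)
print(posterior)
```

In this toy case the exact answer is available for comparison: p(X | m_1) = C(20, 10) / 2^20 and p(X | m_2) = 1/21, which gives p(m_1 | X) of about 0.79. The simpler model is favored because the richer model spreads its prior over values of θ that fit the data poorly.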
24. The msBayes model
msBayes will often infer clustered divergences even when divergence
times are randomly distributed over millions of generations.
Objective:
Use principles of probability to extend msBayes framework for
improved estimation of shared evolutionary history
J. R. Oaks et al. (2013). Evolution 67: 991–1010. J. R. Oaks et al. (2014). arXiv:1402.6397 [q-bio.PE].
25. An improved method
Potential improvements:
1. Alternative priors on parameters that increase marginal
likelihoods of rich models
2. Alternative approach to modeling the temporal distribution of
divergences
J. R. Oaks et al. (2013). Evolution 67: 991–1010. J. R. Oaks et al. (2014). arXiv:1402.6397 [q-bio.PE].
29. $p(X) = \int_{\theta} p(X \mid \theta)\, p(\theta)\, d\theta$
[Figure: densities of the likelihood p(X | θ) and the prior p(θ) plotted against θ from 0 to 1; y-axis: Density, 0 to 30]
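A toy illustration of why the prior matters for the marginal likelihood. It assumes binomial data and Beta priors, for which the integral has a closed form. The data (70 successes in 100 trials) and both priors are hypothetical and do not come from the talk.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def marginal_likelihood(successes, trials, a, b):
    """p(X) for binomial data under a Beta(a, b) prior on theta (closed form)."""
    log_choose = (lgamma(trials + 1) - lgamma(successes + 1)
                  - lgamma(trials - successes + 1))
    return exp(log_choose
               + log_beta(successes + a, trials - successes + b)
               - log_beta(a, b))


# Same data, same likelihood, two different priors on theta.
uniform = marginal_likelihood(70, 100, 1, 1)    # Beta(1, 1) is Uniform(0, 1)
informed = marginal_likelihood(70, 100, 7, 3)   # prior mass concentrated near 0.7
print(uniform, informed)
```

Under the uniform prior the marginal likelihood is exactly 1/(trials + 1). The more concentrated prior gives a larger value because less prior mass sits where the likelihood is close to zero.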
32. Prior on divergence models
msBayes uses a discrete uniform prior on the number of
divergence events
[Figure, panel A: number of divergence models (y-axis, 0 to 120) against the number of divergence events, |τ| (x-axis, 1 to 21). Panel B: prior probability of each model, p(M_|τ|,i) (y-axis, 0.00 to 0.04), against |τ|]
Potential solution:
Place a flexible prior directly on the sample space of divergence
models
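A sketch of how a uniform prior on |τ| can translate into a non-uniform prior on individual models. It rests on two assumptions not spelled out on the slide: divergence models are the unordered (integer) partitions of the Y = 22 pairs, and the prior mass for each |τ| is split evenly among the models with that many events. The function name `count_partitions` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_partitions(n, k):
    """Number of ways to split n into exactly k positive, unordered parts."""
    if n == 0 and k == 0:
        return 1
    if n <= 0 or k <= 0 or k > n:
        return 0
    # Either some part equals 1 (drop it), or every part is >= 2
    # (subtract 1 from each of the k parts).
    return count_partitions(n - 1, k - 1) + count_partitions(n - k, k)


n_pairs = 22
# Number of divergence models for each number of divergence events |tau|.
n_models = {k: count_partitions(n_pairs, k) for k in range(1, n_pairs + 1)}
# Assumed prior: uniform over |tau|, then uniform over models sharing that |tau|.
prior_per_model = {k: 1 / (n_pairs * n_models[k]) for k in n_models}

for k in n_models:
    print(k, n_models[k], round(prior_per_model[k], 4))
```

Because only one model has |τ| = 1 and only one has |τ| = 22, those two models each receive 1/22 of the prior mass, while models with intermediate |τ| share theirs with many others. This produces the U shape referred to later in the talk.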
33. New method: dpp-msbayes
Replaced uniform priors on continuous parameters with
gamma and beta distributions
Dirichlet process prior (DPP) over all possible divergence
models
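A minimal sketch of drawing a divergence model from a Dirichlet process prior, using the Chinese restaurant process construction. This illustrates the general technique and is not the dpp-msbayes implementation. The function name and the concentration value 1.5 are hypothetical.

```python
import random


def sample_divergence_model(n_pairs, alpha, rng):
    """Draw one assignment of pairs to divergence events from a CRP.

    alpha is the concentration parameter; larger values favor more events.
    Returns a list whose i-th entry is the 1-based event index of pair i.
    """
    assignment = []
    event_sizes = []   # event_sizes[j] = number of pairs already in event j + 1
    for i in range(n_pairs):
        # Pair i joins existing event j with probability size_j / (i + alpha)
        # or starts a new event with probability alpha / (i + alpha).
        draw = rng.random() * (i + alpha)
        cumulative = 0.0
        for j, size in enumerate(event_sizes):
            cumulative += size
            if draw < cumulative:
                event_sizes[j] += 1
                assignment.append(j + 1)
                break
        else:
            # No existing event was chosen, so open a new one.
            event_sizes.append(1)
            assignment.append(len(event_sizes))
    return assignment


rng = random.Random(0)
model = sample_divergence_model(22, 1.5, rng)
print(model, "number of events:", len(set(model)))
```

The concentration parameter controls how much prior weight goes to models with few versus many divergence events, which is what makes this prior flexible over the space of divergence models.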
34. dpp-msbayes: Simulation-based assessment
Simulate 50,000 datasets under three models
M_msBayes: U-shaped prior on divergence models; uniform priors on continuous parameters
M_Ushaped: U-shaped prior on divergence models; gamma priors on continuous parameters
M_DPP: DPP prior on divergence models; gamma priors on continuous parameters
Analyze all datasets under each of the models
36. dpp-msbayes: Simulation results
[Figure: grid of panels plotting the true probability of one divergence (y-axis, 0 to 1) against the posterior probability of one divergence (x-axis, 0 to 1). Panels are arranged by "Data model" and "Analysis model"; labels across the top are M_msBayes, M_DPP, M_Uniform, and M_Ushaped, and labels down the side are M_msBayes and M_DPP]
J. R. Oaks (2014). arXiv:1402.6303 [q-bio.PE].
37. dpp-msbayes: Simulation-based power analyses
Simulate datasets in which all 22 divergence times are random
τ ∼ U(0, 0.5 MGA)
τ ∼ U(0, 1.5 MGA)
τ ∼ U(0, 2.5 MGA)
τ ∼ U(0, 5.0 MGA)
MGA = Millions of Generations Ago
Simulate 1000 datasets for each τ distribution
Analyze all 4000 datasets under models M_msBayes, M_Ushaped,
and M_DPP
38. dpp-msbayes: Power results
[Figure: histograms (y-axis: Density, 0 to 1) of the estimated number of divergence events (mode; x-axis, 1 to 21) under M_msBayes, one panel per simulation condition: τ ∼ U(0, 0.5 MGA), τ ∼ U(0, 1.5 MGA), τ ∼ U(0, 2.5 MGA), τ ∼ U(0, 5.0 MGA)]
J. R. Oaks (2014). arXiv:1402.6303 [q-bio.PE].
41. dpp-msbayes: Power results
[Figure: histograms (y-axis: Density) of the posterior probability of one divergence (x-axis, 0 to 1), with one row of panels each for M_msBayes, M_Ushaped, and M_DPP, and one column per simulation condition: τ ∼ U(0, 0.5 MGA), τ ∼ U(0, 1.5 MGA), τ ∼ U(0, 2.5 MGA), τ ∼ U(0, 5.0 MGA)]
J. R. Oaks (2014). arXiv:1402.6303 [q-bio.PE].
42. Empirical application
Did fragmentation of the Philippine
Islands during interglacial rises in
sea level promote diversification?
43. Empirical results: Philippine diversification
[Figure: posterior probability (y-axis, 0.0 to 0.5) of the number of divergence events (x-axis, 1 to 21), with one panel for msBayes and one for dpp-msbayes]
J. R. Oaks (2014). arXiv:1402.6303 [q-bio.PE].
45. Conclusions
New method for estimating shared evolutionary history shows
improved
1. Estimation of posterior uncertainty
2. Model-choice accuracy
3. Power to detect temporal variation across divergences
4. Robustness to model violations
Caveats:
Estimating a very rich (600+ parameters for 22 taxa) model
using limited information from the data
Likely sensitive to prior assumptions
Be skeptical of strongly supported results
46. Recommendations
For Bayesian model choice, choose priors carefully
ABC model choice estimates should be accompanied by:
1. Simulation-based power analyses
2. Assessment of prior sensitivity
47. Future directions
Full-likelihood Bayesian approach [1]
Full-phylogenetic framework
[Figure: divergence times T_1, T_2, T_3 on a time axis from 0 to 500 kya, sharing divergence event τ_1]
[1] J. Sukumaran (2012). PhD thesis. Lawrence, Kansas, USA: University of Kansas
48. Everything is on GitHub. . .
Software:
dpp-msbayes: https://github.com/joaks1/dpp-msbayes
PyMsBayes: https://github.com/joaks1/PyMsBayes
ABACUS: Approximate BAyesian C UtilitieS.
https://github.com/joaks1/abacus
Open-Science Notebook:
msbayes-experiments:
https://github.com/joaks1/msbayes-experiments
49. Acknowledgments
Ideas and feedback:
Holder Lab
KU Herpetology
Melissa Callahan
Computation:
KU ITTC
KU Computing Center
iPlant
Funding:
NSF
KU Grad Studies, EEB & BI
SSB
Sigma Xi
Photo credits:
Rafe Brown, Cam Siler, & Jake Esselstyn
FMNH Philippine Mammal Website: D.S. Balete, M.R.M. Duya, & J. Holden
PhyloPic!
51. Causes of bias: Insufficient sampling
Models with more parameter space are less densely sampled
Could explain bias toward small models in extreme cases
Predicts large variance in posterior estimates
We explored empirical and simulation-based analyses with 2, 5,
and 10 million prior samples, and estimates were very similar
[Figure: 95% HPD of D_T (y-axis) against the number of prior samples (x-axis, 0 to 1e8); panel A: unadjusted estimates, panel B: GLM-adjusted estimates]