A probabilistic parsimonious model for species tree reconstruction

Leonardo de Oliveira Martins (leomrtns@uvigo.es)
David Posada (dposada@uvigo.es)

with invaluable help from Klaus Schliep and Diego Mallo

What do we want
● To estimate species trees given arbitrary gene families ← they can contain paralogs, missing data, etc.
● To account for uncertainty in gene tree and species tree estimation ← some gene families may be more informative than others, or we may have no signal at all
● To allow for several sources of disagreement ← real data can seldom be explained by just one biological phenomenon
● Fast computation ← the improvement provided by slower, fully probabilistic methods may be elusive, and they can benefit from our output nonetheless

Outline
Model of gene family evolution
Parsimonious estimation of disagreement
* reconciliation
* distance between trees
Hierarchical Bayesian model
Examples
* comparing many trees
* simulation
* TreeFam data set

Model for the evolution of gene families
[Figure: graph with the species tree S and, for each gene family, the nodes G1, D1; G2, D2; ...; Gn, Dn.]

Model for the evolution of gene families
[Figure: species tree S, gene tree G1 and D1, with the link between S and G1 labeled P(G/S).]

Our assumption: we just need to consider the simplest explanation for the difference between the gene and species trees
● we may use several such simple explanations
● work with unrooted gene trees
● penalize gene trees very different from species tree
● distance between G and S

ML supertrees: Rodrigo and Steel. 2008. SystBiol 57: 243

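The slide does not spell out the formula for P(G/S), read here as the probability of gene tree G given species tree S. A minimal sketch, assuming the exponential penalty used in the ML supertree literature, where the probability is proportional to exp(-λ·d(G,S)), and assuming that several distances simply add up in the exponent. The function name, the distance labels and all numeric values are hypothetical.

```python
import math
from typing import Mapping

def unnormalized_log_prob(distances: Mapping[str, float],
                          penalties: Mapping[str, float]) -> float:
    """Return -sum_k penalty_k * d_k(G, S), the log of the unnormalized P(G/S)."""
    missing = set(distances) - set(penalties)
    if missing:
        raise KeyError(f"no penalty given for: {sorted(missing)}")
    return -sum(penalties[name] * d for name, d in distances.items())

# Hypothetical penalties and distances (not taken from the slides).
penalties = {"dup": 1.0, "loss": 0.5, "spr": 2.0}
close_tree = {"dup": 1, "loss": 3, "spr": 0}   # gene tree similar to S
far_tree = {"dup": 4, "loss": 9, "spr": 2}     # gene tree very different from S

log_close = unnormalized_log_prob(close_tree, penalties)
log_far = unnormalized_log_prob(far_tree, penalties)

# The gene tree closer to S is favored by a factor of exp(log_close - log_far).
print(f"ratio of unnormalized probabilities: {math.exp(log_close - log_far):.1f}")
```

Working on the log scale keeps the sum stable when distances are large. The normalization constant is left out on purpose, since it needs a separate treatment.
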
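Quantifying the disagreement
[Figure: a gene tree, a species tree, and their reconciliation under each assumption.]
● assuming deepcoal (deep coalescence): 1 deepcoal
● assuming duplosses (duplications and losses): 1 dup, 3 losses
● assuming HGT (horizontal gene transfer): 1 event
● Stochastic error/nonparametric
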
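As an illustration of how a reconciliation cost can be counted, here is a minimal sketch of the classical LCA mapping for duplications: each gene tree node is mapped to the smallest species tree clade containing its species, and a node counts as a duplication when it maps to the same clade as one of its children. It assumes rooted binary trees written as nested tuples whose leaves are species names, which is a simplification (the slides work with unrooted gene trees), and it does not count losses. All names are hypothetical.

```python
def clades(tree):
    """List the clade (set of species) of every node in a species tree."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = clades(tree[0]), clades(tree[1])
    # The last entry of each sublist is the clade of that subtree's root.
    return left + right + [left[-1] | right[-1]]

def lca_clade(species, species_clades):
    """Smallest species tree clade that contains all the given species."""
    return min((c for c in species_clades if species <= c), key=len)

def count_duplications(gene_tree, species_tree):
    """Count gene tree nodes mapped to the same clade as one of their children."""
    species_clades = clades(species_tree)
    duplications = 0

    def visit(node):
        nonlocal duplications
        if isinstance(node, str):
            leaf = frozenset([node])
            return leaf, leaf  # (species below the node, clade it maps to)
        (left_sp, left_map), (right_sp, right_map) = visit(node[0]), visit(node[1])
        species = left_sp | right_sp
        mapped = lca_clade(species, species_clades)
        if mapped in (left_map, right_map):
            duplications += 1
        return species, mapped

    visit(gene_tree)
    return duplications

species_tree = (("A", "B"), "C")
congruent = (("A", "B"), "C")
incongruent = (("A", "C"), "B")
print(count_duplications(congruent, species_tree))
print(count_duplications(incongruent, species_tree))
```

The lookup in `lca_clade` scans all clades, which is fine for a toy example but not how an efficient implementation would compute the mapping.
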
Quantifying the disagreement – other measures
● mul-tree version: Chaudhary R, Burleigh JG, Fernández-Baca D (2013) Inferring Species Trees from Incongruent Multi-Copy Gene Trees Using the Robinson-Foulds Distance. arXiv:1210.2665
● de Oliveira Martins et al. (2008) Phylogenetic Detection of Recombination with a Bayesian Prior on the Distance between Trees. PLoS ONE 3(7): e2651.
● see also: Whidden et al. (2013) Supertrees based on the subtree prune-and-regraft distance. PeerJ PrePrints 1:e18v1
● Hdist similar to: Nye TMW, Liò P, Gilks WR (2006) A novel algorithm and web-based tool for comparing two alternative phylogenetic trees. Bioinformatics 22: 117-119

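Of the measures cited here, the Robinson-Foulds (RF) distance is the simplest to write down: it counts the bipartitions (splits) found in one tree but not in the other. A minimal sketch for two unrooted trees on the same set of uniquely labeled leaves, written as nested tuples. This is the plain single-copy version, not the mul-tree variant cited above, and the function names are hypothetical.

```python
def splits(tree):
    """Return the non-trivial bipartitions of a tree given as nested tuples."""
    found = set()

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        found.add(below)
        return below

    all_leaves = leaves_below(tree)
    reference = min(all_leaves)
    normalized = set()
    for side in found:
        # Represent each split by the side that excludes the reference leaf,
        # so the position of the root does not matter.
        if reference in side:
            side = all_leaves - side
        if 1 < len(side) < len(all_leaves) - 1:
            normalized.add(side)
    return normalized

def rf_distance(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))

tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))
tree_1_rerooted = ("A", ("B", ("C", ("D", "E"))))

print(rf_distance(tree_1, tree_2))
print(rf_distance(tree_1, tree_1_rerooted))
```

Normalizing every split against a reference leaf is what makes the comparison insensitive to where the nested tuples happen to be rooted.
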
Now we have estimates for these
● assuming deepcoal: 1 deepcoal → Gene tree parsimony
● assuming duplosses: 1 dup, 3 losses → Gene tree parsimony
● assuming HGT: 1 event → (approximate) dSPR
● Stochastic error/nonparametric → RF, Hdist

Considering several measures of disagreement:
Thus we can incorporate e.g. duplications and losses while accounting for HGT and random errors
Easy to include other distances in the future

Problem: the normalization constant
Ref.: Bryant D, Steel M (2009) Computing the Distribution of a Tree Metric. TCBB: 420 – 426

Solution: importance sampling estimate of Z(.)
E.g.: Rodrigue N, Kleinman CL, Philippe H, Lartillot N (2009) Computational Methods for Evaluating Phylogenetic Models of Coding Sequence Evolution with Dependence between Codons. Mol Biol Evol 26: 1663-1676.

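To make the importance sampling idea concrete, here is a minimal sketch under two assumptions: that the normalization constant has the form Z(λ) = Σ_G exp(-λ·d(G,S)) over all gene trees G, and that trees are drawn from a proposal distribution whose probabilities are known. Each draw is weighted by exp(-λ·d)/q and the weights are averaged. The toy state space, the distances and the uniform proposal are hypothetical and are not the authors' scheme.

```python
import math
import random

def estimate_z(distances, proposal_probs, penalty):
    """Importance sampling estimate of Z = sum over states of exp(-penalty * d).

    distances[i] is the distance of the i-th sampled state and
    proposal_probs[i] is the probability with which the proposal drew it.
    """
    weights = [math.exp(-penalty * d) / q
               for d, q in zip(distances, proposal_probs)]
    return sum(weights) / len(weights)

# Toy state space of 15 "trees" with made-up distances to the species tree.
toy_distances = [0, 2, 2, 2, 2] + [4] * 10
penalty = 0.7
exact_z = sum(math.exp(-penalty * d) for d in toy_distances)

# Draw states uniformly at random, so every draw has proposal probability 1/15.
rng = random.Random(1)
draws = [rng.randrange(len(toy_distances)) for _ in range(20000)]
z_hat = estimate_z([toy_distances[i] for i in draws],
                   [1 / len(toy_distances)] * len(draws),
                   penalty)
print(f"exact Z = {exact_z:.4f}, importance sampling estimate = {z_hat:.4f}")
```

In the toy case Z can be summed exactly, which is what makes the check possible. For real tree spaces the sum is intractable and only the estimate is available, with a variance that depends on how well the proposal matches exp(-λ·d).
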
Distribution of gene trees: probabilistic model
[Figure: graphical model built up over several slides. Nodes: the species tree S; for each gene family 1, ..., n, the gene tree G1 ... Gn with D1 ... Dn and Q1 ... Qn, and the parameters λdup1 ... λdupn, λloss1 ... λlossn and λspr1 ... λsprn; shared nodes λdupprior, λlossprior and λsprprior. In the later builds the D and Q nodes no longer appear, a box labeled "Importance Sampling" is shown, and parts of the graph are labeled "Input" and "Output".]

Importance Sampling: so we can use complex, state-of-the-art software for phylogenetic inference
We should not rely on single estimates of gene phylogenies

E.g.: Boussau B, Szollosi GJ, Duret L, Gouy M, Tannier E, Daubin V. (2012) Genome-scale coestimation of species and gene trees. Genome research 23: 323-330.

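The hierarchical structure, with one set of λ parameters per gene family tied together by shared priors, can be sketched as a log joint density. The slides do not give the distributions, so everything specific here is an assumption: a single distance per family instead of separate dup, loss and SPR terms, an exponential prior on each family's penalty, and a toy normalization constant. All names are hypothetical.

```python
import math
from typing import Callable, Sequence

def log_joint(distances: Sequence[float],
              penalties: Sequence[float],
              prior_rate: float,
              log_z: Callable[[float], float]) -> float:
    """Log joint density of per-family penalties and gene tree distances.

    Assumed model (not taken from the slides):
      penalty_i ~ Exponential(rate=prior_rate)
      P(G_i | S, penalty_i) = exp(-penalty_i * d_i) / Z(penalty_i)
    """
    total = 0.0
    for d, lam in zip(distances, penalties):
        total += math.log(prior_rate) - prior_rate * lam  # prior on penalty_i
        total += -lam * d - log_z(lam)                     # gene tree term
    return total

def toy_log_z(lam: float) -> float:
    """Log normalization constant for a toy space with 1, 4 and 10 trees
    at distances 0, 2 and 4 (hypothetical counts, for illustration only)."""
    return math.log(1 + 4 * math.exp(-2 * lam) + 10 * math.exp(-4 * lam))

# Two gene families, compared against two candidate species trees.
near = log_joint([0, 2], [1.0, 1.0], prior_rate=1.0, log_z=toy_log_z)
far = log_joint([4, 4], [1.0, 1.0], prior_rate=1.0, log_z=toy_log_z)
print(f"log joint, close species tree: {near:.3f}; distant species tree: {far:.3f}")
```

Because each family has its own penalty, a family whose trees are far from S under every candidate can end up with a small penalty and little influence, while the shared prior keeps the penalties comparable across families.
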
Example: distances between gene families
● 567 single-copy gene trees for 23 species
  Data from: Salichos L, Rokas A (2013) Inferring ancient divergences requires genes with strong phylogenetic signals. Nature 497: 327–331
● Analysis under a model where only RF, Hdist and dSPR are considered
● Not interested in data set per se (unreliable)
● Use it just as a didactic tool to show how the model works
[Figures: panels for RF, Hdist and SPR, shown first on their own and then with the posterior samples and the best estimate marked.]

Analysis of simulated data sets
● Fully probabilistic simulation of gene trees by Diego Mallo and David Posada
● Birth and death of new loci, conditioned on a multispecies coalescent, followed by sequence evolution
We use gene trees only, and simulate tree inference error
Idea from: Rasmussen MD, Kellis M (2012) Unified modeling of gene duplication, loss, and coalescence using a locus tree. Genome Res. 22: 755-765

Analysis of simulated data sets – results
[Figures not recovered: three results slides.]

Single copy genes from Drosophila (TreeFam)
● 4591 informative, single-copy gene families
● (TreeFam database has 14250 informative gene families)
Estimated species tree: [figure not recovered]
● Root location uncertain
● Only one unrooted topology

Large gene families from Drosophila (TreeFam)
● 43 gene families with 102 to 295 tips
best species tree: [figure not recovered] ~100%

To recap, our model can
● Estimate species trees given arbitrary gene families ← they can contain paralogs, missing data, etc.
  The larger, the better, especially for rooting the species tree
● Account for uncertainty in gene tree and species tree estimation ← some gene families may be more informative than others, or we may have no signal at all
  Do not assume gene trees are known: embrace ignorance!
● Allow for several sources of disagreement ← real data can seldom be explained by just one biological phenomenon
  Different gene families may be the product of distinct processes
● Be fast ← the improvement provided by slower, fully probabilistic methods may be elusive, and they can benefit from our output nonetheless
  It's parallelized, and all distances can be calculated very fast.

Check out http://darwin.uvigo.es for announcements, code, slides...

Thank you!

Contenu connexe

Similaire à A probabilistic parsimonious model for species tree reconstruction

Data enriched linear regression
Data enriched linear regressionData enriched linear regression
Data enriched linear regressionSunny Kr
 
Holder and Koch ievobio-2013 ascertainment biases
Holder and Koch ievobio-2013 ascertainment biasesHolder and Koch ievobio-2013 ascertainment biases
Holder and Koch ievobio-2013 ascertainment biasesMark Holder
 
Tree net and_randomforests_2009
Tree net and_randomforests_2009Tree net and_randomforests_2009
Tree net and_randomforests_2009Matthew Magistrado
 
Es credit scoring_2020
Es credit scoring_2020Es credit scoring_2020
Es credit scoring_2020Eero Siljander
 
Back to Basics: Using GWAS to Drive Discovery for Complex Diseases
Back to Basics: Using GWAS to Drive Discovery for Complex DiseasesBack to Basics: Using GWAS to Drive Discovery for Complex Diseases
Back to Basics: Using GWAS to Drive Discovery for Complex DiseasesGolden Helix Inc
 
32_Nov07_MachineLear..
32_Nov07_MachineLear..32_Nov07_MachineLear..
32_Nov07_MachineLear..butest
 
Engineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network AnalysisEngineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network AnalysisDavid Gleich
 
CS109a_Lecture16_Bagging_RF_Boosting.pptx
CS109a_Lecture16_Bagging_RF_Boosting.pptxCS109a_Lecture16_Bagging_RF_Boosting.pptx
CS109a_Lecture16_Bagging_RF_Boosting.pptxAbhishekSingh43430
 
Data Science Meetup: DGLARS and Homotopy LASSO for Regression Models
Data Science Meetup: DGLARS and Homotopy LASSO for Regression ModelsData Science Meetup: DGLARS and Homotopy LASSO for Regression Models
Data Science Meetup: DGLARS and Homotopy LASSO for Regression ModelsColleen Farrelly
 
Estimators for structural equation models of Likert scale data
Estimators for structural equation models of Likert scale dataEstimators for structural equation models of Likert scale data
Estimators for structural equation models of Likert scale dataNick Stauner
 
Basic review on topic modeling
Basic review on  topic modelingBasic review on  topic modeling
Basic review on topic modelingHiroyuki Kuromiya
 
Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)Rich Heimann
 
Data Science and Analytics Brown Bag
Data Science and Analytics Brown BagData Science and Analytics Brown Bag
Data Science and Analytics Brown BagDataTactics
 
Scientific applications of machine learning
Scientific applications of machine learningScientific applications of machine learning
Scientific applications of machine learningbutest
 
Introduction to RandomForests 2004
Introduction to RandomForests 2004Introduction to RandomForests 2004
Introduction to RandomForests 2004Salford Systems
 
art%3A10.1007%2Fs00122-016-2798-8
art%3A10.1007%2Fs00122-016-2798-8art%3A10.1007%2Fs00122-016-2798-8
art%3A10.1007%2Fs00122-016-2798-8Peter Vos
 

Similaire à A probabilistic parsimonious model for species tree reconstruction (20)

Tools in phylogeny
Tools in phylogeny Tools in phylogeny
Tools in phylogeny
 
Data enriched linear regression
Data enriched linear regressionData enriched linear regression
Data enriched linear regression
 
Data in science
Data in science Data in science
Data in science
 
Holder and Koch ievobio-2013 ascertainment biases
Holder and Koch ievobio-2013 ascertainment biasesHolder and Koch ievobio-2013 ascertainment biases
Holder and Koch ievobio-2013 ascertainment biases
 
Tree net and_randomforests_2009
Tree net and_randomforests_2009Tree net and_randomforests_2009
Tree net and_randomforests_2009
 
Es credit scoring_2020
Es credit scoring_2020Es credit scoring_2020
Es credit scoring_2020
 
Back to Basics: Using GWAS to Drive Discovery for Complex Diseases
Back to Basics: Using GWAS to Drive Discovery for Complex DiseasesBack to Basics: Using GWAS to Drive Discovery for Complex Diseases
Back to Basics: Using GWAS to Drive Discovery for Complex Diseases
 
32_Nov07_MachineLear..
32_Nov07_MachineLear..32_Nov07_MachineLear..
32_Nov07_MachineLear..
 
phy prAC.pptx
phy prAC.pptxphy prAC.pptx
phy prAC.pptx
 
Engineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network AnalysisEngineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network Analysis
 
CS109a_Lecture16_Bagging_RF_Boosting.pptx
CS109a_Lecture16_Bagging_RF_Boosting.pptxCS109a_Lecture16_Bagging_RF_Boosting.pptx
CS109a_Lecture16_Bagging_RF_Boosting.pptx
 
Data Science Meetup: DGLARS and Homotopy LASSO for Regression Models
Data Science Meetup: DGLARS and Homotopy LASSO for Regression ModelsData Science Meetup: DGLARS and Homotopy LASSO for Regression Models
Data Science Meetup: DGLARS and Homotopy LASSO for Regression Models
 
Estimators for structural equation models of Likert scale data
Estimators for structural equation models of Likert scale dataEstimators for structural equation models of Likert scale data
Estimators for structural equation models of Likert scale data
 
Basic review on topic modeling
Basic review on  topic modelingBasic review on  topic modeling
Basic review on topic modeling
 
Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)
 
Data Science and Analytics Brown Bag
Data Science and Analytics Brown BagData Science and Analytics Brown Bag
Data Science and Analytics Brown Bag
 
Scientific applications of machine learning
Scientific applications of machine learningScientific applications of machine learning
Scientific applications of machine learning
 
Introduction to RandomForests 2004
Introduction to RandomForests 2004Introduction to RandomForests 2004
Introduction to RandomForests 2004
 
Molecular phylogenetics
Molecular phylogeneticsMolecular phylogenetics
Molecular phylogenetics
 
art%3A10.1007%2Fs00122-016-2798-8
art%3A10.1007%2Fs00122-016-2798-8art%3A10.1007%2Fs00122-016-2798-8
art%3A10.1007%2Fs00122-016-2798-8
 

Dernier

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 

Dernier (20)

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

A probabilistic parsimonious model for species tree reconstruction

  • 1. A probabilistic parsimonious model for species tree reconstruction Leonardo de Oliveira Martins David Posada ● leomrtns@uvigo.es ● dposada@uvigo.es with invaluable help from Klaus Schliep and Diego Mallo
  • 2. What do we want ● To estimate species trees given arbitrary gene families ← can contain paralogous, missing data, etc. To account for uncertainty in gene tree and species tree estimation ← some gene families may be more informative, or ● maybe we don't have signal at all ● To allow for several sources of disagreement ← real data seldomly can be explained by just one biological phenomenon ● Fast computation ← improvement provided by slower, fully probabilistic methods may be elusive, and they can benefit from our output nonetheless
  • 3. Outline Model of gene family evolution Parsimonious estimation of disagreement * reconciliation * distance between trees Hierarchical Bayesian model Examples * comparing many trees * simulation * TreeFam data set
  • 4. Model for the evolution of gene families S G1 D1 G2 D2 Gn Dn . . .
  • 5. Model for the evolution of gene families S G1 D1 We just need to consider the simplest explanation for the P(G/S) Our assumption: difference between the gene and species trees we may use several such simple explanations ● distance between G and S
  • 6. Model for the evolution of gene families S G1 D1 We just need to consider the simplest explanation for the difference between the gene and species trees P(G/S) Our assumption: Rodrigo and Steel. 2008. SystBiol 57: 243 ML supertrees we may use several such simple explanations ● work with unrooted gene trees ● penalize gene trees very different from species tree ● distance between G and S
  • 7. Outline Model of gene family evolution Parsimonious estimation of disagreement * reconciliation * distance between trees Hierarchical Bayesian model Examples * comparing many trees * simulation * TreeFam data set
  • 8. Quantifying the disagreement assuming deepcoal: gene tree species tree reconciliation 1 deepcoal assuming duplosses: 1 dup 3 losses assuming HGT: 1 event
  • 9. Quantifying the disagreement assuming deepcoal: gene tree species tree reconciliation 1 deepcoal assuming duplosses: 1 dup 3 losses assuming HGT: 1 event Stochastic error/nonparametric
  • 10. Outline Model of gene family evolution Parsimonious estimation of disagreement * reconciliation * distance between trees Hierarchical Bayesian model Examples * comparing many trees * simulation * TreeFam data set
  • 11. Quantifying the disagreement – other measures mul-tree version: Chaudhary R, Burleigh JG, Fernández-Baca D (2013) Inferring Species Trees from Incongruent Multi-Copy Gene Trees Using the Robinson-Foulds Distance. arXiv:1210.2665
  • 12. Quantifying the disagreement – other measures de Oliveira Martins et al. (2008) Phylogenetic Detection of Recombination with a Bayesian Prior on the Distance between Trees. PLoS ONE 3(7): e2651.
  • 13. Quantifying the disagreement – other measures see also: Whidden et al. (2013) Supertrees based on the subtree prune-and-regraft distance. PeerJ PrePrints 1:e18v1
  • 14. Quantifying the disagreement – other measures Hdist similar to: Nye TMW, Liò P, Gilks WR (2006) A novel algorithm and web-based tool for comparing two alternative phylogenetic trees. Bioinformatics 22: 117-119
  • 15–19. Now we have estimates for these. Assuming deepcoal: gene tree parsimony (1 deepcoal). Assuming duplosses: gene tree parsimony (1 dup, 3 losses). Assuming HGT: (approximate) dSPR (1 event). Stochastic error/nonparametric: RF, Hdist.
  • 20–21. Considering several measures of disagreement: thus we can incorporate, e.g., duplications and losses while accounting for HGT and random errors. Easy to include other distances in the future. Problem: the normalization constant. Ref.: Bryant D, Steel M (2009) Computing the Distribution of a Tree Metric. TCBB: 420–426. Solution: importance sampling estimate of Z(.). E.g.: Rodrigue N, Kleinman CL, Philippe H, Lartillot N (2009) Computational Methods for Evaluating Phylogenetic Models of Coding Sequence Evolution with Dependence between Codons. Mol Biol Evol 26: 1663–1676.
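The importance-sampling idea can be sketched on a toy problem. The slides do not give the exact form of the model, so this example assumes an exponential penalty on the distance, P(G|S) = exp(-λ d(G,S)) / Z(λ), in the spirit of the Rodrigo and Steel ML supertree model cited earlier. To keep the exact sum tractable, the "tree space" is replaced by the 120 orderings of five labels and the distance by a position-mismatch count. Both stand-ins, the uniform proposal, and all names are hypothetical and only illustrate the estimator, not the authors' implementation.

```python
import itertools
import math
import random

# Toy stand-in for tree space: all orderings of five labels (assumption).
SPACE = list(itertools.permutations(range(5)))
REFERENCE = SPACE[0]  # plays the role of the species tree S


def toy_distance(g, s=REFERENCE):
    """Number of positions where g and s differ (hypothetical stand-in for d(G, S))."""
    return sum(a != b for a, b in zip(g, s))


def exact_z(lam):
    """Z(lam) by brute-force summation; only feasible because the toy space is tiny."""
    return sum(math.exp(-lam * toy_distance(g)) for g in SPACE)


def importance_z(lam, n_samples, rng):
    """Importance-sampling estimate of Z(lam) with a uniform proposal q(G) = 1/|space|."""
    q = 1.0 / len(SPACE)
    total = 0.0
    for _ in range(n_samples):
        g = rng.choice(SPACE)                           # draw G from the proposal
        total += math.exp(-lam * toy_distance(g)) / q   # unnormalized weight / q(G)
    return total / n_samples


rng = random.Random(1)
print(exact_z(0.5), importance_z(0.5, 20000, rng))
```

In a real tree space the exact sum is unavailable, which is why a sampled estimate is needed; a proposal concentrated near S would usually give lower variance than the uniform one used here.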
  • 23–29. Distribution of gene trees: probabilistic model. [Diagram: species tree S; gene families G1 … Gn, each with D1 … Dn and Q1 … Qn; per-family parameters λdup1 … λdupn, λloss1 … λlossn, λspr1 … λsprn, with priors λdupprior, λlossprior, λsprprior; successive builds label parts of the diagram as Input and Output.] Importance sampling: so we can use complex, state-of-the-art software for phylogenetic inference. We should not rely on single estimates of gene phylogenies. E.g.: Boussau B, Szollosi GJ, Duret L, Gouy M, Tannier E, Daubin V. (2012) Genome-scale coestimation of species and gene trees. Genome Research 23: 323–330.
  • 31–32. Example: distances between gene families ● 567 single-copy gene trees for 23 species. Data from: Salichos L, Rokas A (2013) Inferring ancient divergences requires genes with strong phylogenetic signals. Nature 497: 327–331 ● Analysis under a model where only RF, Hdist and dSPR are considered ● Not interested in the data set per se (unreliable) ● Used just as a didactic tool to show how the model works. [Figure panels: RF, Hdist, SPR.]
  • 33–35. Example: distances between gene families. [Figure panels: RF, Hdist, SPR; annotations: posterior samples, best estimate.]
  • 37. Analysis of simulated data sets ● Fully probabilistic simulation of gene trees by Diego Mallo and David Posada ● Birth and death of new loci, conditioned on a multispecies coalescent, followed by sequence evolution. We use gene trees only, and simulate tree inference error. Idea from: Rasmussen MD, Kellis M (2012) Unified modeling of gene duplication, loss, and coalescence using a locus tree. Genome Res. 22: 755–765
  • 38–40. Analysis of simulated data sets – results
  • 42–43. Single-copy genes from Drosophila (TreeFam) ● 4591 informative, single-copy gene families ● (The TreeFam database has 14250 informative gene families)
  • 44–45. Single-copy genes from Drosophila (TreeFam) ● 4591 informative, single-copy gene families. Estimated species tree: ● Root location uncertain ● Only one unrooted topology
  • 46–47. Large gene families from Drosophila (TreeFam) ● 43 gene families with 102 to 295 tips. Best species tree: ~100%
  • 48. To recap, our model can ● Estimate species trees given arbitrary gene families ← these can contain paralogs, missing data, etc. The larger, the better, especially for rooting the species tree ● Account for uncertainty in gene tree and species tree estimation ← some gene families may be more informative, or maybe we don't have signal at all. Do not assume gene trees are known: embrace ignorance! ● Allow for several sources of disagreement ← real data can seldom be explained by just one biological phenomenon. Different gene families may be the product of distinct processes ● Be fast ← the improvement provided by slower, fully probabilistic methods may be elusive, and they can benefit from our output nonetheless. It's parallelized, and all distances can be calculated very fast.
  • 49. Check out http://darwin.uvigo.es for announcements, code, slides... Thank you!