FastMulRFS: Fast and accurate
species tree estimation under generic
gene duplication and loss models
Erin Molloy & Tandy Warnow, University of Illinois at Urbana-Champaign
Systematics, Biogeography and Evolution (SBE) Meeting 2020
Symposium: Methods in Phylogenetic Inference
Funding: Ira & Debra Cohen Graduate Fellowship in CS to EKM, U.S. NSF Grant #1535977 to TW
Motivation
In most studies, species trees are estimated from single-copy genes, so multi-copy genes are excluded
[e.g., Wickett et al. (2014) estimated a species tree from ~400 single-copy genes, excluding ~9,000
multi-copy genes].
Goal
Leverage phylogenetic information from multi-copy genes for species tree estimation
Species Tree Estimation Pipeline
1. Estimate a phylogeny for each gene family; the result is a multi-labeled gene tree, or MUL-tree
2. Run a method that computes a species tree from the MUL-trees, for example:
• DupTree [Wehe et al., 2008]
• ASTRAL-multi [Rabiee et al., 2019; Legried et al., 2020]
• MulRF [Chaudhary et al., 2013]
MOTIVATION & BACKGROUND
MulRF: Robinson-Foulds (RF) Supertree Problem for MUL-trees
Find the species tree that minimizes the total RF distance between the extended species tree and the MUL-trees
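To make the objective concrete, here is a minimal sketch of the RF distance in the simpler singly-labeled case: the number of non-trivial bipartitions (internal branches) found in one tree but not the other. The nested-tuple tree encoding and the function names are assumptions made for illustration; they are not taken from MulRF or FastMulRFS, and the MUL-tree version of the distance compares the extended species tree with a multi-labeled tree instead.

```python
def leaves(tree):
    """Return the leaf labels under a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def bipartitions(tree):
    """Collect the non-trivial bipartitions of the tree, read as unrooted."""
    all_leaves = frozenset(leaves(tree))
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = frozenset(leaves(node))
        other = all_leaves - side
        # A branch is non-trivial when both sides hold at least two leaves.
        if len(side) > 1 and len(other) > 1:
            splits.add(frozenset((side, other)))
        for child in node:
            visit(child)

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """RF distance: size of the symmetric difference of the bipartition sets."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Two hypothetical trees on the same five species.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(t1, t2))
```

The two example trees share the branch separating D and E from the rest and differ in one branch each, so they are close but not identical under this measure.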
FastMulRFS
1. Preprocesses each MUL-tree, producing a tree that is singly-labeled and potentially unresolved
2. Applies FastRFS [Vachaspati & Warnow, 2017], which solves the RF supertree problem exactly within a constrained search space defined by the input, to the preprocessed trees
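The preprocessing step can be sketched as follows, under a stated assumption: a branch of the MUL-tree is kept only when no species has gene copies on both sides of it, and the surviving branches, relabeled by species, define the singly-labeled (possibly unresolved) tree. The poster does not spell out this rule, so treat it as one plausible reading rather than the tool's exact procedure. The nested-tuple encoding, the `species_of` naming convention (species letter followed by a copy number), and the example topology are all hypothetical.

```python
def leaves(tree):
    """Return the leaf labels under a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def species_of(copy_name):
    """Hypothetical naming convention: 'A1' and 'A2' are copies from species 'A'."""
    return copy_name.rstrip("0123456789")


def preprocess_mul_tree(mul_tree):
    """Return the species-level bipartitions kept from a MUL-tree.

    Assumes every gene copy has a unique name. A branch is kept only if the
    species on its two sides do not overlap and each side has >= 2 species.
    """
    all_copies = leaves(mul_tree)
    kept = set()

    def visit(node):
        if isinstance(node, str):
            return
        below = set(leaves(node))
        side = frozenset(species_of(c) for c in below)
        other = frozenset(species_of(c) for c in all_copies if c not in below)
        if not (side & other) and len(side) > 1 and len(other) > 1:
            kept.add(frozenset((side, other)))
        for child in node:
            visit(child)

    visit(mul_tree)
    return kept


# Hypothetical MUL-tree: A, B, C, and D each have two copies; E has one.
mul_tree = (
    ((("A1", "B1"), "C1"), (("C2", "B2"), "A2")),
    (("D1", "D2"), "E"),
)
preprocessed = preprocess_mul_tree(mul_tree)
print(preprocessed)
```

In this example only the branch separating species A, B, C from D, E survives, so the preprocessed tree is singly-labeled but unresolved within the A, B, C group.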
NEW METHOD — FASTMULRFS
[Figure: (a) species tree with leaves D, E, A, B, C; (b) extended species tree with leaves A1, A2, B1, B2, C1, C2, D1, D2, E; (c) MUL-tree with leaves A1, B1, C1, C2, B2, A2, D1, D2, E; (d) preprocessed MUL-tree with leaves D, E, A, B, C]
Running Time
• DupTree: 2.7 hrs
• ASTRAL-multi: >48 hrs
• MulRF: ran out of memory (>256 GB)
• FastMulRFS: 5.4 hrs
Species Tree Comparison
• FastMulRFS found 2 equally optimal species trees in the constrained search space, so their strict consensus has 76 internal branches
• 69 of these internal branches agree with the species tree estimated by running ASTRAL on ~400 single-copy genes [Wickett et al., 2014]
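The comparison above rests on two operations that are easy to illustrate: a strict consensus keeps only the branches present in every input tree, and agreement with a reference tree is the number of consensus branches that the reference also contains. The sketch below uses a hypothetical six-taxon example with trees written directly as sets of bipartitions; the helper names and the data are assumptions for illustration and do not reproduce the OneKP analysis.

```python
def split(side, taxa):
    """Build the bipartition side | (taxa - side) as an unordered pair."""
    side = frozenset(side)
    return frozenset((side, frozenset(taxa) - side))


def strict_consensus(trees):
    """Keep only the branches (bipartitions) present in every tree."""
    branch_sets = [set(tree) for tree in trees]
    return set.intersection(*branch_sets)


taxa = "ABCDEF"

# Two hypothetical equally optimal trees that differ in one internal branch.
optimal_1 = {split("AB", taxa), split("ABC", taxa), split("EF", taxa)}
optimal_2 = {split("AB", taxa), split("ABD", taxa), split("EF", taxa)}

# A hypothetical reference tree estimated from other data.
reference = {split("AB", taxa), split("ABC", taxa), split("DE", taxa)}

consensus = strict_consensus([optimal_1, optimal_2])
agreeing = consensus & reference
print(len(consensus), "consensus branches,", len(agreeing), "shared with the reference")
```

Because the two optimal trees disagree on one branch, the consensus has fewer internal branches than either input tree, which is why a strict consensus can be less resolved than the trees it summarizes.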
[Figure: species tree error (axis 0.0 to 0.3) and running time in minutes (axis 0 to 150) for DupTree, ASTRAL-multi, MulRF, and FastMulRFS. Model: High GDL + low/mod ILS. Conditions: No GTEE (N=10) and 52% Mean GTEE (N=10)]
NOTES: Boxes left to right = names top to bottom; GDL = Gene Duplication & Loss;
ILS = Incomplete Lineage Sorting; GTEE = Gene Tree Estimation Error
L: Simulated Datasets (100 taxa & 500 genes) | R: OneKP Dataset (83 taxa & 9,237 genes)
Get FastMulRFS:
https://github.com/ekmolloy/fastmulrfs
Learn more:
https://doi.org/10.1093/bioinformatics/btaa444
In our study, FastMulRFS
• was as accurate as DupTree and ASTRAL-multi when GTEE was low
• was more accurate than DupTree and ASTRAL-multi when GTEE was high
• was faster than MulRF (different heuristic for same optimization problem) and had similar accuracy
• enabled analysis of OneKP Plant dataset, which was not possible with MulRF
Future Work
• Examine 7 branches in FastMulRFS tree that disagree with OneKP analysis by Wickett et al. (2014)
• Evaluate FastMulRFS on more model conditions
• Compare FastMulRFS to other methods (PHYLDOG [Boussau et al., 2012], A-Pro [Zhang et al., 2020])
CONCLUSIONS & FUTURE WORK
