The iPlant Tree of Life Project and Toolkit: Building a Cyberinfrastructure for Plant Science Research Naim Matasci 520 303 8623 The iPlant Collaborative National Museum of Natural History Jul 14, 2011
What is iPlant?
Discovery Environment NEW RELEASE COMING SOON! http://www.iplantcollaborative.org/discovery-environment-preview-access
Physical Infrastructure
Computation
20K cores cluster
1 TB RAM
512 GPUs
Storage
20 PB archive
High speed parallel data transfer
Cloud Storage AVAILABLE NOW!
Multiple points of entry: web interface, mounted FS, API
Free and secure
http://www.iplantcollaborative.org/about/policies/data-set-hosting
Cloud Computing AVAILABLE NOW!
Virtual Machines
Up to 4 cores, 32 GB RAM, 100 GB dedicated disk
Run any x86-compatible OS (even Windows)
Persistent or on-demand
Log in via SSH or secure VNC
Use Cases
Internet-enabled servers
Database management appliances
Virtual desktops
…The sky is the limit!
http://www.iplantcollaborative.org/atmosphere-preview
Consumer Applications iPlant's CI
iPlant Tree of Life Grand Challenge
Large phylogenetic inference: Building a tree of life for up to 500,000 green plants
Tree Visualization: Scalable visualization for small to large trees
Data Assembly and Integration: Acquisition, organization and processing of the data
Taxonomic Intelligence: Sorting out different names for the same species
Tree Reconciliation: Resolving discordant gene and species trees
Trait Evolution: Using trees to understand how traits evolved
Big Trees To optimize existing methods to construct phylogenetic trees on the order of 500K taxa.
Big Trees
NINJA/WINDJAMMER (Travis Wheeler)
Neighbor-Joining implementation that can analyze > 200K species
Six-day run time reduced 32-fold to 4.5 hours for 220K species data set
Two- to three-day run time reduced 1,800-fold to 2 minutes for distance matrix calculation on 220K set
RAxML-Light (Alexandros Stamatakis)
Large-scale Maximum Likelihood implementation
55K tree published (Stephen A. Smith et al., “Understanding angiosperm diversification using small and large phylogenetic trees,” American Journal of Botany 98, no. 3 (2011): 404-414)
AVAILABLE NOW!
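For readers unfamiliar with neighbor-joining, the sketch below shows the textbook O(n³) form of the algorithm on a toy five-taxon distance matrix. It is not NINJA's optimized implementation; the function name `neighbor_joining`, the node labels, and the example matrix are assumptions made for illustration only.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Textbook neighbor-joining on a symmetric distance matrix.

    Returns (joins, last_edge): joins is a list of
    (node_a, branch_a, node_b, branch_b) tuples in the order the pairs
    were merged; last_edge is (node_a, node_b, length) for the final edge.
    """
    # Copy into a dict-of-dicts so the caller's matrix is left untouched.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums over the nodes that are still active.
        total = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Pick the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - total[p[0]] - total[p[1]],
        )
        # Branch lengths from the new internal node to a and to b.
        len_a = 0.5 * d[a][b] + (total[a] - total[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        new = f"({a},{b})"
        d[new] = {new: 0.0}
        # Distances from the new node to every remaining node.
        for c in nodes:
            if c not in (a, b):
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
        joins.append((a, len_a, b, len_b))
    a, b = nodes
    return joins, (a, b, d[a][b])


if __name__ == "__main__":
    # Toy matrix, not data from the talk.
    taxa = ["a", "b", "c", "d", "e"]
    matrix = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    joins, last_edge = neighbor_joining(taxa, matrix)
    print(joins[0])  # ('a', 2.0, 'b', 3.0)
```

Each iteration scans all pairs, so the cost grows cubically with the number of taxa, which is why runs at the 200K-species scale need a specialized implementation.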
Tree Visualization To develop an application for viewing, analyzing and exploring large phylogenetic trees.
Tree Visualization
> 500K Taxa
Fast
Web based, platform independent
Semantic zooming
Metadata driven display of information
iPlant Tree Viewer Prototype AVAILABLE NOW! http://portnoy.iplantcollaborative.org/
1KP Collaboration (1KP) – To support the data analysis of the Thousand Plant Transcriptomes Project
1KP [Figure: axes N(genes) and N(species); labels: dozens of species, completed genomes, unexplored territory, dozens of genes, PCR in 104 species]
Broad phylogenetic coverage: algae, non-flowering, flowering (angiosperm); on role of polyploidy in Darwin’s “abominable mystery”. Phylogenomics of 1000 species across plant taxa
Tree Reconciliation To reconcile the evolutionary history of genes and species.
Gene family data courtesy John Bowers Tree Reconciliation
Taxonomic Name Resolution Collaboration (BIEN) - To unify and resolve synonymous, erroneous, or other conflicting taxonomic names.
Taxonomic uncertainty
Non-existent names
Contamination
Annotations
Morphospecies
Digitization issues (frame shifts, character encoding)
Lexical variants (digitization conventions)
Synonymy
Taxonomic synonyms / concepts
Misidentifications, incomplete identifications
a) Centaurium curvistamineum (Wittr.) Abrams (1951)
b) Centaurium minimum (Howell) Piper (1915)
c) Centaurium muhlenbergii (Griseb.) Wight ex Piper (1906)
d) Centaurium muhlenbergii (Griseb.) Wight ex Piper forma albiflorum (Suksd.) St. John (1937)
e) Centaurium muhlenbergii (Griseb.) Wight ex Piper var. albiflorum Suksd. (1927)
f) Centaurodes muhlenbergii (Griseb.) Kuntze (1891)
g) Erythraea curvistaminea Wittr. (1886)
h) Erythraea minima Howell (1901)
i) Erythraea muhlenbergii Griseb. (1839)
Image: Gordon Leppig & Andrea J. Pickart
How to figure that out? …or ask around at My-Plant.org
Makemake at de.wikipedia
Non-existent names: Herbarium specimens *New World plant specimens, 34 herbaria, simple match against IPNI and TROPICOS, excluding authors
Hans Hillewaert
Taxonomic Name Resolution Service
Computer-assisted standardization of plant names
Corrects spelling errors and alternative spellings to a standard list of names
Converts out-of-date names to currently accepted names
Availability
Source code (3-clause BSD): http://github.com/iPlantCollaborativeOpenSource/TNRS
Web + API instructions: http://tnrs.iplantcollaborative.org
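The two steps a name resolution service performs, correcting misspellings against a standard list and then converting out-of-date names to accepted ones, can be illustrated with a minimal sketch. This is not the TNRS code or its API: Python's standard `difflib` stands in for the real matching engine, and the `ACCEPTED` list, the `SYNONYMS` table, the `resolve` function, and the 0.8 cutoff are all hypothetical, chosen only to make the example run.

```python
import difflib

# Hypothetical reference data for illustration; which name is treated as
# "accepted" here is an assumption, not a taxonomic statement.
ACCEPTED = [
    "Centaurium curvistamineum",
    "Centaurium minimum",
    "Centaurium muhlenbergii",
]
SYNONYMS = {
    "Erythraea minima": "Centaurium minimum",
    "Erythraea muhlenbergii": "Centaurium muhlenbergii",
}


def resolve(name, cutoff=0.8):
    """Return the accepted name for a possibly misspelled or outdated name.

    Returns None when no reference name is similar enough.
    """
    # Normalize whitespace and case: "Genus epithet".
    cleaned = " ".join(name.split()).capitalize()
    candidates = ACCEPTED + list(SYNONYMS)
    # Step 1: fuzzy-match against every known name (accepted or synonym).
    hits = difflib.get_close_matches(cleaned, candidates, n=1, cutoff=cutoff)
    if not hits:
        return None
    # Step 2: map a matched synonym to its accepted name.
    return SYNONYMS.get(hits[0], hits[0])


if __name__ == "__main__":
    for query in ["centaurium  minimun", "Erythrea muhlenbergi", "Quercus alba"]:
        print(query, "->", resolve(query))
```

Matching against synonyms as well as accepted names lets a misspelled outdated name resolve in a single pass. A production service would also parse authorities and infraspecific ranks, which this sketch ignores.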

The iPlant Tree of Life Project and Toolkit

  • 50. Trait Evolution To develop an infrastructure for downstream analysis of large trees.
  • 51. Trait Evolution Toolkit to study the evolution of traits of interest on very large phylogenies Diversification Biogeographic patterns Adaptation Co-evolution …
  • 52. Current analyses (proof of concept): Phylogenetically Independent Contrasts (Felsenstein 1985); Continuous Ancestral Character Estimation (Schluter et al. 1997, Paradis 2004); Discrete Ancestral Character Estimation (Pagel 1994, Paradis 2004)
  • 53. Community Integrated (2 ½ Days Workshop) EUtils Lopper RAxML Ninja Phyml Muscle PHYLIP VCF to GFF script LRmaqqtl FASTX quality stats FASTX quality boxplot FASTX nucleotide distribution Cuffcompare ERMINEJ progressiveMauve iPlantBorda (mlpy) iPlantCanberra (mlpy) vbay MECPM OUCH Picante Ontologize BOWTIE BWA TopHat SHRiMP Cuffdiff GNU Core Text utilities GeneMania SRA import PARS PL DTT BBC biclustering
  • 54. My-Plant.org To easily share information and research, collaborate, and stay on top of the latest news in the field.
  • 55. Collaborative Tool AVAILABLE NOW! NEW AND IMPROVED! http://my-plant.org/

Editor's Notes

  1. Bringing a culture of computing to the Plant Sciences.
  2. World class resources:
     Rocinante: 128 cores; 16 nodes; 64 GB node; 300 TB storage
     Corral: 1.7 PB storage + 20 PB archive
     Lonestar4: 22,656 Intel Westmere cores; 40 GB QDR-IB; 1 PB storage; 44.3 TB RAM. Plus 1 TB RAM, GPU, and Cloud upgrades.
     Longhorn: 2048 Intel Nehalem cores; 512 NVIDIA Quadro FX 5800 GPUs; 14.5 TB RAM; 1 PB storage.
     Ranger: 62976 AMD Opteron cores; 123 TB RAM; 32 GPUs; 1.7 PB storage.
  3. Large: >2 Gigs, where browsers fail
  4. Highest level of abstraction
  5. Distance matrix calculation compared to FastTree
  6. BIEN: Botanical Information and Ecology Network
  7. Parsing: GNI Parser by Dmitry Mozzherin. Matching: Taxamatch by Tony Rees.
  8. Provide the scientific community with a toolkit that will allow them to study the evolution of traits of interest:
     Adaptation in response to past climate change
     Co-evolution of pollinators and flowers or hosts and parasites
  9. Contrast: Test for correlation of continuous traits, taking into account phylogeny
     DACE: Estimating the state of a discrete trait (e.g. presence/absence of fruit, color) in the ancestors of a group of taxa
     CACE: Estimating the value of a continuous trait (e.g. yield, height) in the ancestors of a group of taxa
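
The contrast analysis in note 9 follows Felsenstein's (1985) pruning procedure, which can be sketched as below. This is a minimal illustration, not the iPlant implementation: the nested-tuple tree encoding, the function name `independent_contrasts`, and the toy tree and trait values are assumptions, and the code expects a fully bifurcating tree with positive branch lengths.

```python
import math


def independent_contrasts(node, traits, out):
    """Post-order pass computing standardized contrasts for one trait.

    A node is (payload, branch_length). The payload is a tip name (str)
    or a (left, right) pair of child nodes. Contrasts are appended to
    `out`; the function returns (estimated_value, adjusted_branch_length).
    """
    payload, length = node
    if isinstance(payload, str):
        return traits[payload], length
    left, right = payload
    x1, v1 = independent_contrasts(left, traits, out)
    x2, v2 = independent_contrasts(right, traits, out)
    # Standardized contrast: difference scaled by its expected std. deviation.
    out.append((x1 - x2) / math.sqrt(v1 + v2))
    # Ancestral estimate: average weighted by inverse branch length.
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    # Lengthen the branch below this node to reflect estimation error.
    return value, length + v1 * v2 / (v1 + v2)


if __name__ == "__main__":
    # Toy tree ((A:1,B:1):1,C:2) with made-up trait values.
    tree = (((("A", 1.0), ("B", 1.0)), 1.0), ("C", 2.0))
    root = (tree, 0.0)
    contrasts = []
    root_value, _ = independent_contrasts(root, {"A": 1.0, "B": 3.0, "C": 5.0}, contrasts)
    print(contrasts, root_value)
```

To test two continuous traits for correlation, one would compute contrasts for each trait on the same tree and regress one set on the other through the origin.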