The iPlant Tree of Life Project and Toolkit: Building a Cyberinfrastructure for Plant Science Research
Given at the National Museum of Natural History in 2011
An overview of iPlant and iPToL
1. The iPlant Tree of Life Project and Toolkit: Building a Cyberinfrastructure for Plant Science Research Naim Matasci 520 303 8623 The iPlant Collaborative National Museum of Natural History Jul 14, 2011
14. Cloud Computing AVAILABLE NOW! Virtual Machines Up to 4 cores, 32 GB RAM, 100 GB dedicated disk Run any x86-compatible OS (even Windows) Persistent or on-demand Log in via SSH or secure VNC Use Cases Internet-enabled Servers Database management appliances Virtual desktops …The sky is the limit! http://www.iplantcollaborative.org/atmosphere-preview
16. iPlant Tree of Life Grand Challenge Large phylogenetic inference Building a tree of life for up to 500,000 green plants Tree Visualization Scalable visualization for small to large trees Data Assembly and Integration Acquiring, organizing and processing the data Taxonomic Intelligence Sorting out different names for the same species Tree Reconciliation Resolving discordant gene and species trees Trait Evolution Using trees to understand how traits evolved
17. Big Trees To optimize existing methods to construct phylogenetic trees on the order of 500K taxa.
18. Big Trees NINJA/WINDJAMMER (Travis Wheeler) Neighbor-Joining implementation that can analyze > 200K species Six-day run time reduced 32-fold to 4.5 hours for 220K-species data set Two/three-day run time reduced 1,800-fold to 2 minutes for distance matrix calculation on 220K set RAxML-Light (Alexandros Stamatakis) Large-scale Maximum Likelihood implementation 55K tree published (Stephen A. Smith et al., “Understanding angiosperm diversification using small and large phylogenetic trees,” American Journal of Botany 98, no. 3 (2011): 404-414) AVAILABLE NOW!
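Neighbor-joining builds a tree by repeatedly joining the pair of taxa that minimizes the Q criterion. The sketch below shows one such step on a small, made-up distance matrix. The function names `nj_pick_pair` and `nj_branch_lengths` are illustrative assumptions, and the code is the textbook full scan over all pairs, not the optimized approach used by NINJA.

```python
def nj_pick_pair(d):
    """Return (i, j, q) for the pair of taxa with the smallest Q value.

    d is a symmetric distance matrix given as a list of lists.
    Q(i, j) = (n - 2) * d[i][j] - sum_k d[i][k] - sum_k d[j][k]
    """
    n = len(d)
    row_sums = [sum(row) for row in d]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[2]:
                best = (i, j, q)
    return best


def nj_branch_lengths(d, i, j):
    """Branch lengths from taxa i and j to the new node that joins them."""
    n = len(d)
    r_i, r_j = sum(d[i]), sum(d[j])
    to_i = d[i][j] / 2 + (r_i - r_j) / (2 * (n - 2))
    return to_i, d[i][j] - to_i


# Toy five-taxon distance matrix (illustrative values only).
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

pair = nj_pick_pair(D)
lengths = nj_branch_lengths(D, pair[0], pair[1])
print(pair, lengths)
```

A full run would replace the joined pair with the new node, recompute distances to it, and repeat until the tree is resolved. The all-pairs scan at every step is what becomes expensive at hundreds of thousands of taxa.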
19. Tree Visualization To develop an application for viewing, analyzing and exploring large phylogenetic trees.
20. Tree Visualization > 500K Taxa Fast Web based, platform independent Semantic zooming Metadata driven display of information
21. iPlant Tree Viewer Prototype AVAILABLE NOW! http://portnoy.iplantcollaborative.org/
22. 1KP Collaboration (1KP) – To support the data analysis of the Thousand Plant Transcriptomes Project
23. 1KP [Figure: N(genes) versus N(species). Labels: completed genomes (dozens of species); PCR (dozens of genes in 104 species); unexplored territory.]
24. Broad phylogenetic coverage algae non-flowering flowering (angiosperm) on role of polyploidy in Darwin’s “abominable mystery” Phylogenomics of 1000 species across plant taxa
35. a) Centaurium curvistamineum (Wittr.) Abrams (1951) b) Centaurium minimum (Howell) Piper (1915) c) Centaurium muhlenbergii (Griseb.) Wight ex Piper (1906) d) Centaurium muhlenbergii (Griseb.) Wight ex Piper forma albiflorum (Suksd.) St. John (1937) e) Centaurium muhlenbergii (Griseb.) Wight ex Piper var. albiflorum Suksd. (1927) f) Centaurodes muhlenbergii (Griseb.) Kuntze (1891) g) Erythraea curvistaminea Wittr. (1886) h) Erythraea minima Howell (1901) i) Erythraea muhlenbergii Griseb. (1839) Image: Gordon Leppig & Andrea J. Pickart
36. How to figure that out? …or ask around at My-Plant.org
41. Taxonomic Name Resolution Service Computer-assisted standardization of plant names Corrects spelling errors and alternative spellings to a standard list of names Converts out-of-date names to currently accepted names
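The two steps on this slide, spelling correction against a standard list and conversion of out-of-date names, can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not the TNRS implementation: `resolve_name`, the `SYNONYMS` table and the 0.8 similarity cutoff are all hypothetical. The names come from the synonym list shown earlier in the talk, and treating one of them as "accepted" is an arbitrary choice made for the example, not a taxonomic claim.

```python
import difflib

# Hypothetical reference data: one name is treated as accepted purely
# for illustration; the others are mapped to it as synonyms.
ACCEPTED = ["Centaurium muhlenbergii"]
SYNONYMS = {
    "Centaurium curvistamineum": "Centaurium muhlenbergii",
    "Centaurium minimum": "Centaurium muhlenbergii",
    "Erythraea curvistaminea": "Centaurium muhlenbergii",
    "Erythraea minima": "Centaurium muhlenbergii",
    "Erythraea muhlenbergii": "Centaurium muhlenbergii",
}


def resolve_name(raw, cutoff=0.8):
    """Return the accepted name for a raw name string, or None if no match."""
    tokens = raw.split()
    if len(tokens) < 2:
        return None
    # Crude stand-in for a real name parser: keep genus and epithet only,
    # dropping authorship, year and infraspecific ranks.
    canonical = tokens[0].capitalize() + " " + tokens[1].lower()
    known = ACCEPTED + list(SYNONYMS)
    # Fuzzy matching absorbs small spelling errors.
    matches = difflib.get_close_matches(canonical, known, n=1, cutoff=cutoff)
    if not matches:
        return None
    # Convert a matched synonym to its accepted name.
    return SYNONYMS.get(matches[0], matches[0])


print(resolve_name("Erythraea minima Howell (1901)"))
print(resolve_name("erythraea muhlenbergi"))
print(resolve_name("Quercus alba"))
```

A production service would use a dedicated name parser and a taxonomy-aware matcher rather than generic string similarity, and would report match scores so that users can review uncertain matches.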
45. Availability Source code (3-clause BSD) http://github.com/iPlantCollaborativeOpenSource/TNRS Web + API instructions http://tnrs.iplantcollaborative.org
50. Trait Evolution To develop an infrastructure for downstream analysis of large trees.
51. Trait Evolution Toolkit to study the evolution of traits of interest on very large phylogenies Diversification Biogeographic patterns Adaptation Co-evolution …
52. Current analyses (Proof of concept) Phylogenetically Independent Contrasts (Felsenstein 1985) Continuous Ancestral Character Estimation (Schluter et al. 1997, Paradis 2004) Discrete Ancestral Character Estimation (Pagel 1994, Paradis 2004)
53. Community Integrated (2½-Day Workshop) EUtils Lopper RAxML Ninja Phyml Muscle PHYLIP VCF to GFF script LRmaqqtl FASTX quality stats FASTX quality boxplot FASTX nucleotide distribution Cuffcompare ERMINEJ progressiveMauve iPlantBorda (mlpy) iPlantCanberra (mlpy) vbay MECPM OUCH Picante Ontologize BOWTIE BWA TopHat SHRiMP Cuffdiff GNU Core Text utilities GeneMania SRA import PARS PL DTT BBC biclustering
54. My-Plant.org To easily share information and research, collaborate, and stay on top of the latest news in the field.
Parsing: GNI Parser by Dmitry Mozzherin. Matching: Taxamatch by Tony Rees.
Provide the scientific community with a toolkit that will allow them to study the evolution of traits of interest, for example adaptation in response to past climate change, or co-evolution of pollinators and flowers or of hosts and parasites.
Contrast: Tests for correlation of continuous traits, taking phylogeny into account. DACE: Estimates the state of a discrete trait (e.g. presence/absence of fruit, color) in the ancestors of a group of taxa. CACE: Estimates the value of a continuous trait (e.g. yield, height) in the ancestors of a group of taxa.