Phylogeny-Driven Approaches to
 Genomics and Metagenomics
              June 23, 2012
    Canadian Society for Microbiology

            Jonathan A. Eisen
      University of California, Davis
           @phylogenomics
Acknowledgements

• $$$
  •   DOE
  •   NSF
  •   GBMF
  •   Sloan
  •   DARPA
  •   DSMZ
  •   DHS
• People, places
  • DOE JGI: Eddy Rubin, Phil Hugenholtz, Nikos Kyrpides
  • UC Davis: Aaron Darling, Dongying Wu, Holly Bik, Russell
    Neches, Jenna Morgan-Lang
  • Other: Jessica Green, Katie Pollard, Martin Wu, Tom Slezak,
    Jack Gilbert, Steven Kembel, J. Craig Venter, Naomi Ward,
    Hans-Peter Klenk
Phylogeny: What is it?

• Phylogeny is a description of
  the evolutionary history of
  relationships among organisms
  (or their parts).
• This is frequently portrayed in
  a diagram called a phylogenetic
  tree.
• Phylogenies can be more
  complex than a bifurcating tree
  (e.g., lateral gene transfer,
  recombination, hybridization)
Whatever the History:
               Trying to Incorporate it is Critical




from Lake et al. doi: 10.1098/rstb.2009.0035
Phylogeny



            • Applies to
             • Species
             • Genes
             • Genomes
Phylogeny: What is it good for?




Uses of Phylogeny
in Genomics and Metagenomics

         Example 1:

        Phylotyping
rRNA Phylotyping

DNA extraction → PCR (makes lots of copies of the rRNA genes in sample) → Sequence rRNA genes → Sequence alignment = Data matrix → Phylogenetic tree

Sequenced rRNA genes:
rRNA1   5’...ACACACATAGGTGGAGCTAGCGATCGATCGA... 3’
rRNA2   5’..TACAGTATAGGTGGAGCTAGCGACGATCGA... 3’
rRNA3   5’...ACGGCAAAATAGGTGGATTCTAGCGATATAGA... 3’
rRNA4   5’...ACGGCCCGATAGGTGGATTCTAGCGCCATAGA... 3’

Sequence alignment = Data matrix:
rRNA1     A   C   A   C   A   C
rRNA2     T   A   C   A   G   T
rRNA3     C   A   C   T   G   T
rRNA4     C   A   C   A   G   T
E. coli   A   G   A   C   A   G
Humans    T   A   T   A   G   T
Yeast     T   A   C   A   G   T

[Figure: phylogenetic tree with tips rRNA1, rRNA2, rRNA3, rRNA4, E. coli, Humans, Yeast]
rRNA Phylotyping
          • Collect DNA from
            environment
          • PCR amplify rRNA
            genes using broad
            (so-called universal)
            primers
          • Sequence
          • Align to others
          • Infer evolutionary tree
          • Unknowns “identified”
            by placement on tree
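The last two steps ("Infer evolutionary tree" and "Unknowns identified by placement on tree") can be illustrated with a much simpler stand-in. The sketch below does not build a tree. As a loudly labeled assumption, it assigns each unknown to the reference with the smallest p-distance (fraction of differing alignment columns), which only approximates what tree placement would give. The six-column sequences are the toy data matrix from the rRNA phylotyping slide; the function names are hypothetical.

```python
# Toy data matrix from the rRNA phylotyping slide (six aligned columns each).
REFERENCES = {"E. coli": "AGACAG", "Humans": "TATAGT", "Yeast": "TACAGT"}
UNKNOWNS = {
    "rRNA1": "ACACAC",
    "rRNA2": "TACAGT",
    "rRNA3": "CACTGT",
    "rRNA4": "CACAGT",
}


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned columns at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_reference(seq: str, references: dict) -> tuple:
    """Return (reference name, p-distance) for the nearest reference.

    ASSUMPTION: nearest-neighbor lookup is a crude stand-in for
    placing the unknown on an inferred phylogenetic tree.
    """
    scored = ((name, p_distance(seq, ref)) for name, ref in references.items())
    return min(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, seq in UNKNOWNS.items():
        ref, dist = closest_reference(seq, REFERENCES)
        print(f"{name}: closest reference {ref} (p-distance {dist:.2f})")
```

A real analysis would use a full-length alignment and a tree-inference or placement tool; distance to the nearest reference can mislead when evolutionary rates differ among lineages.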
Era IV: Genomes in Environment




                 shotgun sequence




Metagenomics
rRNA Phylotyping in Sargasso




Venter et al., Science 304: 66. 2004
RecA Phylotyping in Sargasso Data




Venter et al., Science 304: 66. 2004
Sargasso Phylotypes

[Figure: bar chart. Y-axis: Weighted % of Clones (0 to 0.500). X-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Series, one per marker gene: EFG, EFTu, HSP70, RecA, RpoB, rRNA.]

Venter et al., Science 304: 66-74. 2004
Side benefit: binning
Metagenomics
Binning challenge




Best binning method: reference genomes
Binning challenge




No reference genome? What do you do?

Composition, Assembly, others

Phylogeny
Sulcia makes amino acids




Baumannia makes vitamins and cofactors




                       Wu et al. 2006 PLoS Biology 4: e188.
CFB Phyla
Side benefit II: PG Ecology
rRNA survey

              • Sequence
                rRNAs
              • Cluster
              • Identify
                “OTUs”

OTU1, OTU2, OTU3, OTU4, OTU5, OTU6, OTU7, OTU8, OTU9, OTU10
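The "Cluster" and "Identify OTUs" steps can be sketched as a greedy, seed-based clustering of aligned sequences. This is a minimal illustration under stated assumptions, not the method used in the talk: the sequences are hypothetical and already aligned to equal length, the identity threshold is a free parameter (0.97 is a common convention and is an assumption here), and `cluster_otus` is a made-up name.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical columns between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_otus(seqs: dict, threshold: float = 0.97) -> list:
    """Greedy clustering: a sequence joins the first OTU whose seed it
    matches at >= threshold identity; otherwise it seeds a new OTU.

    ASSUMPTION: input order matters, and only seeds are compared,
    as in simple greedy schemes.
    """
    otus = []  # list of (seed sequence, [member names])
    for name, seq in seqs.items():
        for seed, members in otus:
            if identity(seq, seed) >= threshold:
                members.append(name)
                break
        else:
            otus.append((seq, [name]))
    return [members for _, members in otus]


if __name__ == "__main__":
    # Hypothetical ten-column reads, clustered at 90% identity.
    reads = {
        "r1": "ACGTACGTAC",
        "r2": "ACGTACGTAA",
        "r3": "TTGTACGGAC",
        "r4": "TTGTACGGAC",
    }
    for i, members in enumerate(cluster_otus(reads, threshold=0.9), start=1):
        print(f"OTU{i}: {', '.join(members)}")
```

OTUs defined this way carry no information about how the clusters relate to one another, which is what motivates placing them on a tree in the next slide.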
OTUs on Tree

[Figure: phylogenetic tree with tips, top to bottom: OTU1, OTU5, OTU4, OTU6, OTU2, OTU3, OTU7, OTU9, OTU8, OTU10]

                 • Clades
                 • Rates of
                   change
                 • LGT
                 • Convergence
                 • Character
                   history
UniFrac

[Screenshot of journal text describing weighted UniFrac]

... typically used as a qualitative measure because duplicate sequences are usually removed from the tree. However, the P test may be used in a semiquantitative manner if all clones, even those with identical or near-identical sequences, are included in the tree (13).

Here we describe a quantitative version of UniFrac that we call “weighted UniFrac.” We show that weighted UniFrac behaves similarly to the FST test in situations where both are ...

Weighted UniFrac. Weighted UniFrac is a new variant of the original unweighted UniFrac measure that weights the branches of a phylogenetic tree based on the abundance of information (Fig. 1B). Weighted UniFrac is thus a quantitative measure of β diversity that can detect changes in how many sequences from each lineage are present, as well as detect changes in which taxa are present. This ability is important because the relative abundance of different kinds of bacteria can be critical for describing community changes. In contrast, the original, unweighted UniFrac (Fig. 1A) is a qualitative β diversity measure because duplicate sequences contribute no additional branch length to the tree (by definition, the branch length that separates a pair of duplicate sequences is zero, because no substitutions separate them).

The first step in applying weighted UniFrac is to calculate the raw weighted UniFrac value (u), according to the first equation:

\[ u = \sum_{i}^{n} b_i \times \left| \frac{A_i}{A_T} - \frac{B_i}{B_T} \right| \]

Here, n is the total number of branches in the tree, b_i is the length of branch i, A_i and B_i are the numbers of sequences that descend from branch i in communities A and B, respectively, and A_T and B_T are the total numbers of sequences in communities A and B, respectively. In order to control for unequal sampling effort, A_i and B_i are divided by A_T and B_T.

If the phylogenetic tree is not ultrametric (i.e., if different sequences in the sample have evolved at different rates), clustering with weighted UniFrac will place more emphasis on communities that contain quickly evolving taxa. Since these taxa are assigned more branch length, a comparison of the communities that contain them will tend to produce higher values of u. In some situations, it may be desirable to normalize u so that it has a value of 0 for identical communities and 1 for nonoverlapping communities. This is accomplished by dividing u by a scaling factor (D), which is the average distance of each sequence from the root, as shown in the equation as follows:

\[ D = \sum_{j}^{n} d_j \times \left( \frac{A_j}{A_T} + \frac{B_j}{B_T} \right) \]

Here, d_j is the distance of sequence j from the root, A_j and B_j are the numbers of times the sequences were observed in communities A and B, respectively, and A_T and B_T are the total numbers of sequences from communities A and B, respectively.

Clustering with normalized u values treats each sample equally instead of ...

FIG. 1. Calculation of the unweighted and the weighted UniFrac measures. Squares and circles represent sequences from two different environments. (a) In unweighted UniFrac, the distance between the circle and square communities is calculated as the fraction of the branch length that has descendants from either the square or the circle environment (black) but not both (gray). (b) In weighted UniFrac, branch lengths are weighted by the relative abundance of sequences in the square and circle communities; square sequences are weighted twice as much as circle sequences because there are twice as many total circle sequences in the data set. The width of branches is proportional to the degree to which each branch is weighted in the calculations, and gray branches have no weight. Branches 1 and 2 have heavy weights since the descendants are biased toward the square and circles, respectively. Branch 3 contributes no value since it has an equal contribution from circle and square sequences after normalization.

[Screenshot of a second figure caption]

Figure 1. Estimates of Phylogenetic Diversity (PD) and PD Gain (G) for the grey community. The boxes represent taxa from the black, white, and grey communities. (A) PD is the sum of the branches leading to the grey taxa. (B) G is the sum of the branches leading only to the grey taxa. (C) PD rarefaction curves showing the increase in branch length with sampling effort for the intestinal and stool bacteria from three healthy individuals. Aligned 16S rRNA sequences from the three individuals were available with the Supplementary Materials in (Eckburg, et al., 2005). The Arb parsimony insertion tool was used to add the sequences to a tree containing over 9,000 sequences (Hugenholtz, 2002) that is available for download at the rRNA Database Project II website (Maidak, et al., 2001). The curves represent the average values for 50 replicate trials.

FEMS Microbiol Rev. Author manuscript; available in PMC 2009 July 1.
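The raw value u and the scaling factor D described in the excerpt can be computed directly once each branch is summarized by its length and descendant counts. The sketch below is a minimal illustration, not a replacement for an established implementation. The `Branch` class, the function names, and the two-tip example tree are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    length: float  # b_i: length of branch i
    count_a: int   # A_i: sequences from community A descending from branch i
    count_b: int   # B_i: sequences from community B descending from branch i


def raw_weighted_unifrac(branches, total_a: int, total_b: int) -> float:
    """u = sum_i b_i * |A_i / A_T - B_i / B_T|."""
    if total_a <= 0 or total_b <= 0:
        raise ValueError("both communities need at least one sequence")
    return sum(
        br.length * abs(br.count_a / total_a - br.count_b / total_b)
        for br in branches
    )


def scaling_factor(tips, total_a: int, total_b: int) -> float:
    """D = sum_j d_j * (A_j / A_T + B_j / B_T).

    `tips` holds (d_j, A_j, B_j): root-to-tip distance and the number of
    times sequence j was observed in communities A and B.
    """
    return sum(d * (a / total_a + b / total_b) for d, a, b in tips)


if __name__ == "__main__":
    # Hypothetical tree: a root with two tips, each on a branch of length 1.
    # Tip X was seen twice in A only; tip Y was seen four times in B only.
    branches = [Branch(1.0, 2, 0), Branch(1.0, 0, 4)]
    tips = [(1.0, 2, 0), (1.0, 0, 4)]
    u = raw_weighted_unifrac(branches, total_a=2, total_b=4)
    d = scaling_factor(tips, total_a=2, total_b=4)
    print(f"u = {u}, D = {d}, normalized = {u / d}")
```

In this example the two communities share no sequences, so u / D comes out at 1, matching the stated behavior of the normalized measure for nonoverlapping communities. Dividing counts by the community totals is what makes the unequal sample sizes (2 versus 4 sequences) comparable.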
Caveat: Not Everything in Groups
RecA, RpoB in GOS

[Figure: groups labeled GOS 1, GOS 2, GOS 3, GOS 4, GOS 5]

Wu et al PLoS One 2011
Uses of Phylogeny
in Genomics and Metagenomics

         Example 2:

   Functional Diversity and
    Functional Predictions
Predicting Function

• Key step in genome projects
• More accurate predictions help guide
  experimental and computational
  analyses
• Many diverse approaches
• All are improved by “phylogenomic”
  type analyses that integrate
  evolutionary reconstructions and an
  understanding of how new functions
  evolve
Predicting Function

• Identification of motifs
  – Short regions of sequence similarity that are indicative of
    general activity
  – e.g., ATP binding
• Homology/similarity based methods
  – Gene sequence is searched against a database of other
    sequences
  – If significantly similar genes are found, their functional
    information is used
• Problem
  – Genes frequently have similarity to hundreds of motifs and
    multiple genes, not all with the same function
From Eisen et al. 1997 Nature Medicine 3: 1076-1078.
Blast Search of H. pylori “MutS”




• Blast search pulls up Syn. sp MutS#2 with much higher p
  value than other MutS homologs
• Based on this, TIGR predicted this species had mismatch
  repair
• Assumes functional constancy
                   Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
MutL??




Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
Overlaying Functions onto Tree

[Figure: tree of MutS homologs with labeled subfamilies MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5, MSH6; tips are labeled with abbreviated species names (e.g., Ecoli, Bacsu, Helpy, Synsp, Yeast, Human, Mouse, Arath)]

Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
PHYLOGENETIC PREDICTION OF GENE FUNCTION

METHOD
• CHOOSE GENE(S) OF INTEREST
• IDENTIFY HOMOLOGS
• ALIGN SEQUENCES
• CALCULATE GENE TREE
• OVERLAY KNOWN FUNCTIONS ONTO TREE
• INFER LIKELY FUNCTION OF GENE(S) OF INTEREST

[Figure: the method applied to two examples, shown beside a final panel labeled ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN). EXAMPLE A: gene of interest 2A; homologs 1A, 2A, 3A, 1B, 2B, 3B from Species 1 (1A, 1B), Species 2 (2A, 2B), and Species 3 (3A, 3B); the gene tree is marked "Duplication?". EXAMPLE B: gene of interest 5; homologs 1 to 6; the inferred function is marked "Ambiguous".]

Based on Eisen, 1998 Genome Res 8: 163-167.
PHYLOGENETIC PREDICTION OF GENE FUNCTION

[Figure: flowchart of the method, flanked by two worked examples. Example A uses genes 1A, 2A, 3A, 1B, 2B, 3B from Species 1, 2 and 3, with a possible duplication. Example B uses genes 1 to 6, where the inferred function can be ambiguous. The bottom panel shows the actual evolution (assumed to be unknown).]

• Choose gene(s) of interest
• Identify homologs
• Align sequences
• Calculate gene tree
• Overlay known functions onto tree
• Infer likely function of gene(s) of interest

Based on Eisen, 1998, Genome Res 8: 163-167.
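The last two steps (overlay known functions, then infer the function of the gene of interest) can be sketched in a few lines. This is a minimal illustration, not the method from the cited paper: it assumes the gene tree is already computed and given as nested tuples of gene names, it uses a Fitch-style parsimony rule to summarize each clade, and the annotations ("DNA repair", "function X", and so on) are invented for the example. A result with more than one function corresponds to the "Ambiguous" case in the figure.

```python
def combine(sets):
    """Fitch rule: keep the shared functions if any, otherwise keep all candidates."""
    shared = set.intersection(*sets)
    return shared if shared else set.union(*sets)


def fitch_set(node, known):
    """Candidate functions for a subtree, or None if it has no annotated genes."""
    if isinstance(node, str):  # a leaf is a gene name
        return {known[node]} if node in known else None
    sets = [s for s in (fitch_set(child, known) for child in node) if s is not None]
    return combine(sets) if sets else None


def contains(node, leaf):
    """True if the gene `leaf` is somewhere inside `node`."""
    if isinstance(node, str):
        return node == leaf
    return any(contains(child, leaf) for child in node)


def predict_function(tree, known, query):
    """Walk from the root down to `query`, remembering the closest annotated sister group."""
    prediction = None
    node = tree
    while not isinstance(node, str):
        idx = next(i for i, child in enumerate(node) if contains(child, query))
        sister_sets = []
        for i, child in enumerate(node):
            if i != idx:
                s = fitch_set(child, known)
                if s is not None:
                    sister_sets.append(s)
        if sister_sets:  # deeper (closer) sister groups overwrite shallower ones
            prediction = combine(sister_sets)
        node = node[idx]
    return prediction


# Toy version of Example A: a duplication separates the A and B subfamilies.
tree_a = (("1A", ("2A", "3A")), ("1B", ("2B", "3B")))
known_a = {"1A": "DNA repair", "3A": "DNA repair",
           "1B": "recombination", "2B": "recombination"}  # hypothetical annotations

# Toy version of Example B: the closest relatives of gene 3 disagree.
tree_b = ((("1", "2"), "3"), (("4", "5"), "6"))
known_b = {"1": "function X", "2": "function Y", "4": "function X"}

print(predict_function(tree_a, known_a, "2A"))
print(predict_function(tree_b, known_b, "3"))
```

Gene 2A sits inside the A subfamily, so it picks up that subfamily's function even though the B genes are also homologs. Gene 3 has two closest relatives with different functions, so the sketch returns both.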
Diversity of Proteorhodopsins




                      Venter et al., 2004
Carboxydothermus sporulates




       Wu et al. 2005 PLoS Genetics 1: e65.
Uses of Phylogeny
in Genomics and Metagenomics

         Example 3:

Selecting Organisms for Study
As of 2002

• At least 40 phyla of bacteria

[Figure: tree of bacterial phyla, tips listed top to bottom]
             Proteobacteria
             TM6
             OS-K
             Acidobacteria
             Termite Group
             OP8
             Nitrospira
             Bacteroides
             Chlorobi
             Fibrobacteres
             Marine Group A
             WS3
             Gemmimonas
             Firmicutes
             Fusobacteria
             Actinobacteria
             OP9
             Cyanobacteria
             Synergistes
             Deferribacteres
             Chrysiogenetes
             NKB19
             Verrucomicrobia
             Chlamydia
             OP3
             Planctomycetes
             Spirochaetes
             Coprothermobacter
             OP10
             Thermomicrobia
             Chloroflexi
             TM7
             Deinococcus-Thermus
             Dictyoglomus
             Aquificae
             Thermodesulfobacteria
             Thermotogae
             OP1
             OP11

Based on Hugenholtz, 2002
As of 2002

[Figure: the same tree of bacterial phyla as on the previous slide]

• At least 40 phyla of bacteria
• Most genomes from three phyla

Based on Hugenholtz, 2002
As of 2002

[Figure: the same tree of bacterial phyla as on the previous slide]

• At least 40 phyla of bacteria
• Most genomes from three phyla
• Some studies in other phyla

Based on Hugenholtz, 2002
As of 2002

[Figure: the same tree of bacterial phyla as on the previous slide]

• At least 40 phyla of bacteria
• Most genomes from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Eukaryotes

Based on Hugenholtz, 2002
As of 2002

[Figure: the same tree of bacterial phyla as on the previous slide]

• At least 40 phyla of bacteria
• Most genomes from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Viruses

Based on Hugenholtz, 2002
GEBA




http://www.jgi.doe.gov/programs/GEBA/pilot.html
GEBA: Components
• Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan
  Eisen, Eddy Rubin, Jim Bristow)
• Project management (David Bruce, Eileen Dalin, Lynne
  Goodwin)
• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)
• Sequencing and closure (Eileen Dalin, Susan Lucas, Alla
  Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen,
  Jan-Fang Cheng)
• Annotation and data release (Nikos Kyrpides, Victor
  Markowitz, et al.)
• Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu,
  Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain,
  Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati,
  Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla)
• Adopt a microbe education project (Cheryl Kerfeld)
• Outreach (David Gilbert)
• $$$ (DOE, Eddy Rubin, Jim Bristow)
GEBA Now

• 300+ genomes
• Rich sampling of major groups of
  cultured organisms
GEBA Lesson 1:
             The rRNA Tree of Life is a Useful Tool




From Wu et al. 2009 Nature 462, 1056-1060
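One way a tree can guide the choice of organisms is to prefer candidates that add the most branch length not already covered by sequenced genomes. The sketch below illustrates that idea only; it is not a description of how GEBA targets were actually chosen. It assumes a rooted tree stored as a child-to-(parent, branch length) dictionary, a greedy selection rule, and a toy tree whose names and lengths are made up.

```python
def path_edges(parent, leaf):
    """Edges from `leaf` up to the root; each edge is named by its child node."""
    edges = set()
    node = leaf
    while node in parent:  # the root has no entry in `parent`
        edges.add(node)
        node = parent[node][0]
    return edges


def added_length(parent, leaf, covered):
    """Branch length that `leaf` would add beyond the edges already covered."""
    return sum(parent[edge][1] for edge in path_edges(parent, leaf) - covered)


def pick_targets(parent, candidates, sequenced, k):
    """Greedily choose up to k candidates that each add the most new branch length."""
    covered = set()
    for leaf in sequenced:
        covered |= path_edges(parent, leaf)
    remaining = sorted(candidates)  # sorted so that ties break alphabetically
    chosen = []
    for _ in range(k):
        if not remaining:
            break
        best = max(remaining, key=lambda leaf: added_length(parent, leaf, covered))
        chosen.append(best)
        remaining.remove(best)
        covered |= path_edges(parent, best)
    return chosen


# Hypothetical tree: clade A is well sampled, clade B has no genome yet.
toy_tree = {
    "A": ("root", 1.0), "B": ("root", 1.0),
    "a1": ("A", 0.1), "a2": ("A", 0.1), "a3": ("A", 0.2),
    "b1": ("B", 0.5),
}
targets = pick_targets(toy_tree, candidates=["a2", "a3", "b1"], sequenced=["a1"], k=2)
print(targets)
```

In the toy tree the first pick is the lone member of the unsampled clade, because it contributes its own branch plus the whole branch leading to clade B.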
GEBA Lesson 2:
The rRNA Tree of Life is not perfect ...

[Figure: two trees side by side, labeled "16S" and "WGT, 23S"]

Badger et al. 2005 Int J System Evol Microbiol 55: 1021-1026.
GEBA Lesson 3:
     Phylogeny improves genome annotation


• Took 56 GEBA genomes and compared results vs. 56
  randomly sampled new genomes
• Better definition of protein family sequence “patterns”
• Greatly improves “comparative” and “evolutionary”
  based predictions
• Conversion of hypotheticals into conserved hypotheticals
• Linking distantly related members of protein families
• Improved non-homology prediction
GEBA Lesson 4:
Metadata Is Important
GEBA Lesson 5:
Improves discovery of new genetic diversity
Phylogenetic Distribution Novelty:
Bacterial Actin Related Protein

[Figure: phylogenetic tree of actin and actin-related proteins, with branch support values and a scale bar of 0.5. Clade labels, top to bottom: ACTIN, ARP1, ARP2, ARP3, BARP, ARP4, ARP5, ARP6, ARP7, ARP10. The BARP branch is H. ochraceum gi227395998.]

Sequences in the tree, top to bottom:
• C. boidinii gi57157304
• S. cerevisiae gi14318479
• L. starkeyi gi166080363
• S. japonicus gi213407080
• A. cliftonii gi14269497
• U. pertusa gi50355609
• H. sapiens gi4501889
• M. cerebralis gi46326807
• C. cinerea gi169844021
• N. crassa gi85101929
• I. scapularis gi215507378
• H. sapiens gi5031569
• S. japonicus gi213404844
• S. cerevisiae gi6320175
• D. melanogaster gi24642545
• G. gallus gi45382569
• C. neoformans gi58266690
• S. cerevisiae gi6322525
• D. melanogaster gi17737543
• H. sapiens gi5031573
• H. ochraceum gi227395998
• S. cerevisiae gi1008244
• P. patens gi168051992
• A. thaliana gi18394608
• S. cerevisiae gi1301932
• S. japonicus gi213408393
• D. discoideum gi66802418
• D. melanogaster gi17737347
• S. cerevisiae gi6323114
• D. hansenii gi218511921
• O. sativa gi182657420
• A. thaliana gi18411737
• D. melanogaster gi19920358
• M. musculus gi226246593

Haliangium ochraceum DSM 14365
Patrik D’haeseleer, Adam Zemla, Victor Kunin

Wu et al. 2009 Nature 462, 1056-1060. See also Guljamow et al. 2007 Current Biology.
Protein Family Rarefaction

• Take data set of multiple complete
  genomes
• Identify all protein families using MCL
• Plot # of genomes vs. # of protein families
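The rarefaction steps above can be sketched as follows. This is a rough illustration under stated assumptions: the clustering step (MCL in the slide) is taken as already done, so each genome is just a set of protein family IDs; the curve is averaged over random genome orderings; and the genome names and family IDs are invented toy data.

```python
import random


def rarefaction_curve(families_by_genome, n_permutations=100, seed=0):
    """Mean number of distinct protein families seen after adding 1, 2, ..., N genomes."""
    rng = random.Random(seed)  # fixed seed so the curve is reproducible
    genomes = sorted(families_by_genome)
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        order = genomes[:]
        rng.shuffle(order)  # a new random order in which genomes are added
        seen = set()
        for i, genome in enumerate(order):
            seen |= families_by_genome[genome]
            totals[i] += len(seen)
    return [total / n_permutations for total in totals]


# Hypothetical input: genome name -> protein family IDs from a prior clustering step.
toy_families = {
    "genome1": {"fam1", "fam2"},
    "genome2": {"fam2", "fam3"},
    "genome3": {"fam1", "fam2", "fam3", "fam4"},
}
curve = rarefaction_curve(toy_families)
for n_genomes, n_families in enumerate(curve, start=1):
    print(n_genomes, n_families)
```

Plotting the number of genomes against the values in `curve` gives the rarefaction plot. A curve that keeps climbing means each new genome is still contributing families not seen before.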
Wu et al. 2009 Nature 462, 1056-1060
Synapomorphies exist




Wu et al. 2009 Nature 462, 1056-1060
GEBA Lesson 6:
Improves Analysis of Uncultured
Metagenomic Phylotyping

[Figure: bar chart "Sargasso Phylotypes". Y-axis: Weighted % of Clones (0 to 0.500). X-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Marker genes: EFG, EFTu, HSP70, RecA, RpoB, rRNA.]

GEBA Project improves metagenomic analysis

Venter et al., Science 304: 66-74. 2004
Metagenomic Phylotyping

[Figure: the same "Sargasso Phylotypes" bar chart (Weighted % of Clones by Major Phylogenetic Group) as on the previous slide]

But not a lot
                                                                                   re
                                                                                     na
                                                                                       rc
                                                                                          ha
                                                                                             eo
                                                                                                ta
                                                                                                                              EFG
                                                                                                                              EFTu



                                                                                                                              rRNA
                                                                                                                              RecA
                                                                                                                              RpoB
                                                                                                                              HSP70




Venter et al., Science 304: 66-74. 200
• AND THEN ALL OF THEM WERE
  DECEIVED
• For each of these areas - need to do a
  MUCH better job ...
Improving Phylotyping
Major Issues in Phylotyping
• Beyond Moore’s Law
• Metagenomics
• Short reads

WE NEED NEW METHODS
Method 1: Each is an island

• Each new sequence is an island

• Take reference data
• Build alignment, models, trees
• Add new sequence to reference alignment
  and build tree
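
The "each is an island" steps can be sketched as a loop in which every query is processed against the reference set alone, never together with other queries. This is a toy sketch, not the pipeline from the talk: it assumes sequences are already aligned to equal length, and it uses the nearest reference by p-distance as a simple stand-in for the "add to alignment and build tree" step. All names and sequences are hypothetical.

```python
def p_distance(a, b):
    """Fraction of mismatches over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # no shared columns: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def classify_each_alone(references, queries):
    """Analyze every query separately against the same fixed reference set."""
    result = {}
    for qname, qseq in queries.items():
        # Only the references and this single query are considered here.
        best = min(references, key=lambda r: p_distance(references[r], qseq))
        result[qname] = best
    return result


# Hypothetical aligned reference sequences and metagenomic reads.
references = {
    "ref_alpha": "ACGTACGTAC",
    "ref_gamma": "ACGTTTTTAC",
    "ref_cyano": "GGGTACGTGG",
}
queries = {
    "read1": "ACGTACGT--",
    "read2": "--GTTTTTAC",
}

assignments = classify_each_alone(references, queries)
print(assignments)
```

Because each query is handled on its own, the loop is trivially parallel, but nothing is learned about how the queries relate to one another.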
STAP
Each sequence analyzed separately

... STAP database, and the query sequence is aligned to them using the CLUSTALW profile alignment algorithm [40] as described above for domain assignment. By adapting the profile alignment ...

Figure 1. A flow chart of the STAP pipeline.
doi:10.1371/journal.pone.0002566.g001

Figure 2. Domain assignment. In Step 1, STAP assigns a domain to each query sequence based on its position in a maximum likelihood tree of representative ss-rRNA sequences. Because the tree illustrated here is not rooted, domain assignment would not be accurate and reliable (sequence similarity based methods cannot make an accurate assignment in this case either). However, the figure illustrates an important role of the tree-based domain assignment step, namely automatic identification of deep-branching environmental ss-rRNAs.
doi:10.1371/journal.pone.0002566.g002

Wu et al. 2008 PLoS One
AMPHORA
[Figure; visible label: Guide tree]
Wu and Eisen Genome Biology 2008 9:R151 doi:10.1186/gb-2008-9-10-r151
Phylotyping w/ Proteins




Wu and Eisen Genome Biology 2008 9:R151   doi:10.1186/gb-2008-9-10-r151
Whole Genome Tree




Wu and Eisen Genome Biology 2008 9:R151 doi:10.1186/gb-2008-9-10-r151
Method 2: Most in the Family
Phylogenetic Challenge

          xxxxxxxxxxxxxxxxxxxxxxx

        xxxxxx             xxxxxxxxxxxxx

                         xxxxxxxxxxxxxx




        xxxxxxxxxxxxxx




A single tree with everything?
Phylogenetic Challenge

               xxxxxxxxxxxxxxxxxxxxxxx

             xxxxxx             xxxxxxxxxxxxx

                              xxxxxxxxxxxxxx




             xxxxxxxxxxxxxx




    A single tree with everything
(as long as there is a lot of overlap)
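
The "lot of overlap" condition can be made concrete by counting, for each pair of fragments, the alignment columns where both have a residue. This is a minimal sketch under the assumption that all fragments are already placed in one alignment of equal width; the fragment names, sequences, and the threshold are hypothetical.

```python
from itertools import combinations


def shared_columns(a, b):
    """Count alignment columns where both fragments have a residue (no gap)."""
    return sum(1 for x, y in zip(a, b) if x != "-" and y != "-")


def overlap_table(fragments):
    """Return {(name1, name2): shared column count} for every pair."""
    return {
        (n1, n2): shared_columns(fragments[n1], fragments[n2])
        for n1, n2 in combinations(sorted(fragments), 2)
    }


def pairs_below(fragments, min_shared):
    """List pairs whose overlap is too small to compare them directly."""
    return [pair for pair, n in overlap_table(fragments).items() if n < min_shared]


# Hypothetical fragments covering different regions of one gene.
fragments = {
    "full":  "ACGTACGTACGTACGTACGT",
    "left":  "ACGTAC--------------",
    "right": "----------GTACGTACGT",
}

print(overlap_table(fragments))
print(pairs_below(fragments, min_shared=5))
```

Here "left" and "right" share no columns, so any distance between them can only be inferred through the full-length sequence.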
Phylogenetic Challenge




A single tree with everything?
rRNA in Sargasso Metagenome




Venter et al., Science 304: 66. 2004
STAP All
Combine all into one alignment
Figure 1. A flow chart of the STAP pipeline.
RecA in Sargasso




Venter et al., Science 304: 66. 2004
Protein vs. rRNA Sargasso Data
Sargasso Phylotypes
[Figure: bar chart of Weighted % of Clones (axis 0 to 0.500) by Major Phylogenetic Group: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota; legend: EFG, EFTu, HSP70, RecA, RpoB, rRNA]
Venter et al., Science 304: 66-74. 2004
Kembel Correction
Method 3: All in the family

• Combine new sequences into one tree

• Take reference data
• Build alignment, models, trees
• Add all sequences to reference alignment
  and build tree
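
Putting all new sequences into one tree can be sketched by computing distances over the combined reference-plus-query alignment and clustering everything together. This toy example uses UPGMA on p-distances as a simple stand-in for the tree-building step; it is not the method used in the talk, and the sequences and names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of mismatches over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(seqs):
    """Cluster all sequences into one tree, returned as nested tuples of names."""
    names = sorted(seqs)
    clusters = {i: (names[i], 1) for i in range(len(names))}  # id -> (subtree, size)
    dist = {(i, j): p_distance(seqs[names[i]], seqs[names[j]])
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        # Size-weighted average distance from the merged cluster to the others.
        for k in clusters:
            dik = dist.pop((min(i, k), max(i, k)))
            djk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        del dist[(i, j)]
        clusters[next_id] = ((ti, tj), ni + nj)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


def sister_pairs(tree):
    """Collect pairs of leaves that are each other's closest relative."""
    if isinstance(tree, str):
        return set()
    left, right = tree
    if isinstance(left, str) and isinstance(right, str):
        return {frozenset((left, right))}
    return sister_pairs(left) | sister_pairs(right)


# Hypothetical references and reads, all in one alignment.
all_seqs = {
    "ref_alpha": "ACGTACGTAC",
    "ref_gamma": "ACGTTTTTAC",
    "ref_cyano": "GGGTACGTGG",
    "read1": "ACGTACGT--",
    "read2": "--GTTTTTAC",
}

tree = upgma(all_seqs)
print(tree)
```

Unlike the one-query-at-a-time approach, the resulting tree also relates the reads to each other, which is what downstream steps such as OTU clustering need.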
Phylogenetic Challenge




A single tree with everything?
PhylOTU
Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in this general workflow of PhylOTU. See Results section for details.
doi:10.1371/journal.pcbi.1001061.g001
PhylOTU - Sharpton et al. PLoS Comp. Bio 2011
PhyloSift / pplacer
Method 4: All in the genome

• Combine new sequences from different
  gene families into one tree

• Take reference data
• Build alignment, models
• Concatenate
• Add all sequences to reference alignment
  and build tree
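
The "Concatenate" step can be illustrated by joining per-gene alignments into one matrix, padding with gaps wherever a genome or read has no sequence for a gene. This is a minimal sketch assuming each gene family is already aligned; the gene names are taken from markers mentioned elsewhere in the talk, while the taxa and sequences are hypothetical.

```python
def concatenate(alignments, gene_order=None):
    """Join per-gene alignments into one concatenated alignment.

    alignments: {gene: {taxon: aligned sequence}}
    Returns {taxon: concatenated sequence}, gap-padded for missing genes.
    """
    genes = gene_order or sorted(alignments)
    taxa = sorted({taxon for gene in genes for taxon in alignments[gene]})
    out = {}
    for taxon in taxa:
        parts = []
        for gene in genes:
            aln = alignments[gene]
            width = len(next(iter(aln.values())))  # alignment width for this gene
            parts.append(aln.get(taxon, "-" * width))
        out[taxon] = "".join(parts)
    return out


# Hypothetical alignments: two reference genomes plus one read per gene family.
alignments = {
    "recA": {"genomeA": "ATGC", "genomeB": "ATGA", "read1": "AT--"},
    "rpoB": {"genomeA": "GGCCA", "genomeB": "GGTCA", "read2": "--TCA"},
}

supermatrix = concatenate(alignments)
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```

Note that read1 and read2 share no columns in the concatenated alignment; they can only be related through the reference genomes, which carry both genes.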
Challenge

• Each gene poorly sampled in
  metagenomes
• Can we combine all into a single tree?
Kembel Combiner




Kembel et al. The phylogenetic diversity of metagenomes. PLoS One 2011
Kembel Combiner
[Figure: page excerpt (Vol. 73, 2007) on phylogenetic β diversity measures, cut off at the right margin. Recoverable points: qualitative measures consider only presence/absence of taxa, while quantitative measures additionally account for the number of times each taxon was observed. Unweighted UniFrac is a qualitative measure: it calculates the fraction of the branch length in a phylogenetic tree that leads to descendants in either, but not both, of two communities. The excerpt also discusses the fixation index (FST) and the phylogenetic test (P test), and introduces a quantitative version of UniFrac called "weighted UniFrac".]
FIG. 1. Calculation of the unweighted and the weighted UniFrac measures. Squares and circles represent sequences from two different environments.
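
The unweighted UniFrac definition in the excerpt (the fraction of branch length leading to descendants in either, but not both, communities) can be sketched directly. This is an illustrative sketch, not a reference implementation: the tree is represented, as an assumption, by a flat list of branches with the set of leaves below each branch, and the tree and communities are hypothetical.

```python
def unweighted_unifrac(branches, community_a, community_b):
    """Unweighted UniFrac distance between two communities.

    branches: list of (branch length, set of leaf names below that branch).
    Only branches with a descendant in at least one community are counted.
    """
    a, b = set(community_a), set(community_b)
    unique = total = 0.0
    for length, leaves in branches:
        in_a = bool(leaves & a)
        in_b = bool(leaves & b)
        if in_a or in_b:
            total += length
            if in_a != in_b:  # descendants in one community but not the other
                unique += length
    return unique / total if total else 0.0


# Hypothetical tree ((s1:1, s2:1):1, (s3:1, s4:1):1) as a branch list.
branches = [
    (1.0, {"s1"}),
    (1.0, {"s2"}),
    (1.0, {"s1", "s2"}),
    (1.0, {"s3"}),
    (1.0, {"s4"}),
    (1.0, {"s3", "s4"}),
]

interleaved = unweighted_unifrac(branches, {"s1", "s3"}, {"s2", "s4"})
separated = unweighted_unifrac(branches, {"s1", "s2"}, {"s3", "s4"})
print(interleaved, separated)
```

Communities that occupy separate clades share no branch length and score the maximum distance, while interleaved communities share the internal branches and score lower.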
Improving Phylotyping II

• We need to analyze more gene families
Families/PD not uniform
[Figure; visible labels: 31, 6]
More Markers
   Phylogenetic group      Genome Number   Gene Number   Marker Candidates
   Archaea                 62              145415        106
   Actinobacteria          63              267783        136
   Alphaproteobacteria     94              347287        121
   Betaproteobacteria      56              266362        311
   Gammaproteobacteria     126             483632        118
   Deltaproteobacteria     25              102115        206
   Epsilonproteobacteria   18              33416         455
   Bacteroidetes           25              71531         286
   Chlamydiae              13              13823         560
   Chloroflexi             10              33577         323
   Cyanobacteria           36              124080        590
   Firmicutes              106             312309        87
   Spirochaetes            18              38832         176
   Thermi                  5               14160         974
   Thermotogae             9               17037         684
Improving Functional Predictions

• We need to analyze even more gene
  families
Sifting Families
[Figure 1: workflow diagram with boxes: Representative Genomes; Extract Protein Annotation; All v. All BLAST; Homology Clustering (MCL); SFams; Align & Build HMMs; HMMs; New Genomes; Extract Protein Annotation; Screen for Homologs]
Sharpton et al. submitted
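
The "All v. All BLAST" and "Homology Clustering" boxes in the Sifting Families workflow can be illustrated with a toy grouping of proteins from pairwise hits. The slide names MCL for this step; the sketch below does not implement MCL and instead uses plain connected-component (single-linkage) clustering with a union-find structure as a much simpler stand-in. The hit tuples, protein names, and e-value cutoff are hypothetical.

```python
def cluster_families(hits, max_evalue=1e-10):
    """Group proteins into families by connected components of significant hits.

    hits: iterable of (query, subject, evalue) tuples.
    Returns a sorted list of sorted member lists.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        root_q, root_s = find(query), find(subject)  # registers both proteins
        if evalue <= max_evalue and root_q != root_s:
            parent[root_s] = root_q  # merge the two families

    families = {}
    for protein in list(parent):
        families.setdefault(find(protein), set()).add(protein)
    return sorted((sorted(members) for members in families.values()),
                  key=lambda members: members[0])


# Hypothetical all-vs-all hits: (query, subject, e-value).
hits = [
    ("p1", "p2", 1e-50),
    ("p2", "p3", 1e-30),
    ("p4", "p5", 1e-20),
    ("p3", "p4", 1e-2),   # too weak to join the two families
    ("p6", "p6", 0.0),    # self hit only: singleton family
]

families = cluster_families(hits)
print(families)
```

Single linkage tends to chain unrelated families together through one spurious hit, which is one reason graph-clustering methods such as MCL are commonly preferred for this step.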
[Figure: panels A, B, C]
Sharpton et al. submitted
Phylogenetic Contrasts
"Phylogeny-driven studies in genomics and metagenomics" talk by Jonathan Eisen at #CSMUBC2012

Contenu connexe

Similaire à "Phylogeny-driven studies in genomics and metagenomics" talk by Jonathan Eisen at #CSMUBC2012

Phylogenetic approaches to metagenomic analysis #KSMicro talk by Jonathan Eisen
Phylogenetic approaches to metagenomic analysis #KSMicro talk by Jonathan EisenPhylogenetic approaches to metagenomic analysis #KSMicro talk by Jonathan Eisen
Phylogenetic approaches to metagenomic analysis #KSMicro talk by Jonathan EisenJonathan Eisen
 
DNA and the hidden world of microbes
DNA and the hidden world of microbesDNA and the hidden world of microbes
DNA and the hidden world of microbesJonathan Eisen
 
Microbes run the planet - Jonathan Eisen slides from #scifoo 2006
Microbes run the planet - Jonathan Eisen slides from #scifoo 2006Microbes run the planet - Jonathan Eisen slides from #scifoo 2006
Microbes run the planet - Jonathan Eisen slides from #scifoo 2006Jonathan Eisen
 
Jonathan Eisen: Phylogenetic approaches to the analysis of genomes and metage...
Jonathan Eisen: Phylogenetic approaches to the analysis of genomes and metage...Jonathan Eisen: Phylogenetic approaches to the analysis of genomes and metage...
Jonathan Eisen: Phylogenetic approaches to the analysis of genomes and metage...Jonathan Eisen
 
Comparative Genomics and Visualisation - Part 2
Comparative Genomics and Visualisation - Part 2Comparative Genomics and Visualisation - Part 2
Comparative Genomics and Visualisation - Part 2Leighton Pritchard
 
20110602labseminar pub
20110602labseminar pub20110602labseminar pub
20110602labseminar pub裕樹 奥田
 
The Era of the Microbiome - Talk by Jonathan Eisen
The Era of the Microbiome - Talk by Jonathan Eisen The Era of the Microbiome - Talk by Jonathan Eisen
The Era of the Microbiome - Talk by Jonathan Eisen Jonathan Eisen
 
454/Illumina Marker Gene Studies (rRNA)
454/Illumina Marker Gene Studies (rRNA)454/Illumina Marker Gene Studies (rRNA)
454/Illumina Marker Gene Studies (rRNA)Holly Bik
 
Talk on Phylogenomics for MBL Molecular Evolution Course 2004
Talk on Phylogenomics for MBL Molecular Evolution Course 2004Talk on Phylogenomics for MBL Molecular Evolution Course 2004
Talk on Phylogenomics for MBL Molecular Evolution Course 2004Jonathan Eisen
 
(第1章第2部分)Prok.regulation(trp operon).pdf
(第1章第2部分)Prok.regulation(trp operon).pdf(第1章第2部分)Prok.regulation(trp operon).pdf
(第1章第2部分)Prok.regulation(trp operon).pdfssuser13f50b1
 
Transcription
TranscriptionTranscription
Transcriptionjoyjulie
 
281 lec21 phage_repressor
281 lec21 phage_repressor281 lec21 phage_repressor
281 lec21 phage_repressorhhalhaddad
 
The role of cost in yeast gene expression
The role of cost in yeast gene expressionThe role of cost in yeast gene expression
The role of cost in yeast gene expressionMichael Barton
 

Similaire à "Phylogeny-driven studies in genomics and metagenomics" talk by Jonathan Eisen at #CSMUBC2012 (19)

Phylogenetic approaches to metagenomic analysis #KSMicro talk by Jonathan Eisen

"Phylogeny-driven studies in genomics and metagenomics" talk by Jonathan Eisen at #CSMUBC2012

  • 1. Phylogeny-Driven Approaches to Genomics and Metagenomics June 23, 2012 Canadian Society for Microbiology Jonathan A. Eisen University of California, Davis @phylogenomics
  • 2. Acknowledgements • $$$ • DOE • NSF • GBMF • Sloan • DARPA • DSMZ • DHS • People, places • DOE JGI: Eddy Rubin, Phil Hugenholtz, Nikos Kyrpides • UC Davis: Aaron Darling, Dongying Wu, Holly Bik, Russell Neches, Jenna Morgan-Lang • Other: Jessica Green, Katie Pollard, Martin Wu, Tom Slezak, Jack Gilbert, Steven Kembel, J. Craig Venter, Naomi Ward, Hans-Peter Klenk
  • 4. Phylogeny: What is it? • Phylogeny is a description of the evolutionary history of relationships among organisms (or their parts). • This is frequently portrayed in a diagram called a phylogenetic tree. • Phylogenies can be more complex than a bifurcating tree (e.g., lateral gene transfer, recombination, hybridization)
  • 5. Whatever the History: Trying to Incorporate it is Critical from Lake et al. doi: 10.1098/rstb.2009.0035
  • 6. Phylogeny • Applies to • Species • Genes • Genomes
  • 7. Phylogeny: What is it good for?
  • 8. Phylogeny: What is it good for? Uses of Phylogeny in Genomics and Metagenomics
  • 9. Uses of Phylogeny in Genomics and Metagenomics Example 1: Phylotyping
  • 10. rRNA Phylotyping [Figure: workflow from DNA extraction, to PCR (makes lots of copies of the rRNA genes in sample), to sequencing the rRNA genes, to a sequence alignment (= data matrix), to a phylogenetic tree.] Sequences: rRNA1 5’...ACACACATAGGTGGAGCTAGCGATCGATCGA... 3’; rRNA2 5’..TACAGTATAGGTGGAGCTAGCGACGATCGA... 3’; rRNA3 5’...ACGGCAAAATAGGTGGATTCTAGCGATATAGA... 3’; rRNA4 5’...ACGGCCCGATAGGTGGATTCTAGCGCCATAGA... 3’. Data matrix: rRNA1 ACACAC; rRNA2 TACAGT; rRNA3 CACTGT; rRNA4 CACAGT; E. coli AGACAG; Humans TATAGT; Yeast TACAGT. The tree places rRNA1 to rRNA4 alongside E. coli, Humans and Yeast.
  • 11. rRNA Phylotyping • Collect DNA from environment • PCR amplify rRNA genes using broad (so-called universal) primers • Sequence • Align to others • Infer evolutionary tree • Unknowns “identified” by placement on tree
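The last two steps on this slide (align, then "identify" unknowns by where they fall relative to known organisms) can be illustrated with the six-column toy data matrix from the rRNA Phylotyping figure. The sketch below is a deliberate simplification: it assigns each unknown to the reference with the fewest mismatches rather than inferring an evolutionary tree, and the function names `hamming` and `nearest_reference` are made up for this example.

```python
# Toy data matrix from the rRNA Phylotyping figure (already aligned, 6 columns).
references = {"E. coli": "AGACAG", "Humans": "TATAGT", "Yeast": "TACAGT"}
unknowns = {"rRNA1": "ACACAC", "rRNA2": "TACAGT", "rRNA3": "CACTGT", "rRNA4": "CACAGT"}


def hamming(a, b):
    """Number of alignment columns at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_reference(seq, refs):
    """Name of the reference with the fewest mismatches to seq.

    This is a stand-in for tree placement, not a phylogenetic method.
    """
    return min(refs, key=lambda name: hamming(seq, refs[name]))


nearest = {name: nearest_reference(seq, references) for name, seq in unknowns.items()}
for name, ref in nearest.items():
    print(name, "is closest to", ref)
```

A real analysis would build a tree from the full alignment; closest-match assignment can mislead when evolutionary rates differ between lineages, which is part of the argument for tree-based placement.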
  • 12. Era IV: Genomes in Environment shotgun sequence Metagenomics
  • 13. rRNA Phylotyping in Sargasso Venter et al., Science 304: 66. 2004
  • 14. RecA Phylotyping in Sargasso Data Venter et al., Science 304: 66. 2004
  • 15. Sargasso Phylotypes [Figure: bar chart of Weighted % of Clones (axis 0 to 0.500) by Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota), with one series per marker gene: EFG, EFTu, HSP70, RecA, RpoB, rRNA.] Venter et al., Science 304: 66-74. 2004
  • 19. Binning challenge Best binning method: reference genomes
  • 20. Binning challenge Best binning method: reference genomes
  • 21. Binning challenge No reference genome? What do you do?
  • 22. Binning challenge No reference genome? What do you do? Composition, Assembly, others
  • 23. Binning challenge No reference genome? What do you do? Phylogeny
  • 24. Sulcia makes amino acids; Baumannia makes vitamins and cofactors. Wu et al. 2006 PLoS Biology 4: e188.
  • 26. Side benefit II: PG Ecology
  • 27. rRNA survey • Sequence rRNAs • Cluster
  • 28. rRNA survey • Sequence rRNAs • Cluster • Identify “OTUs” [Figure: OTU1 to OTU10]
  • 29. OTUs on Tree [Figure: tree with tips OTU1, OTU5, OTU4, OTU6, OTU2, OTU3, OTU7, OTU9, OTU8, OTU10]
  • 30. OTUs on Tree • Clades • Rates of change • LGT • Convergence • Character history [Figure: tree with tips OTU1, OTU5, OTU4, OTU6, OTU2, OTU3, OTU7, OTU9, OTU8, OTU10]
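The "Cluster" and "Identify OTUs" steps of the rRNA survey can be sketched as a greedy clustering over aligned sequences. The slides do not say which clustering method or identity cutoff was used, so the threshold, the function names `identity` and `greedy_otus`, and the three toy reads below are all assumptions for illustration.

```python
def identity(a, b):
    """Fraction of matching columns between two equal-length aligned sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold):
    """Assign each sequence to the first OTU whose seed it matches at or above threshold.

    A sequence that matches no existing seed starts a new OTU and becomes its seed.
    """
    seeds = []        # list of (otu_name, seed_sequence)
    assignment = {}   # read name -> otu_name
    for name, seq in seqs.items():
        for otu, seed in seeds:
            if identity(seq, seed) >= threshold:
                assignment[name] = otu
                break
        else:
            otu = f"OTU{len(seeds) + 1}"
            seeds.append((otu, seq))
            assignment[name] = otu
    return assignment


# Hypothetical 10-column reads; r2 differs from r1 at one column.
reads = {"r1": "ACGTACGTAC", "r2": "ACGTACGTAT", "r3": "TTTTACGGGG"}
otus = greedy_otus(reads, threshold=0.85)
print(otus)
```

Clustering of this kind yields OTUs as a flat list; placing them on a tree, as in the "OTUs on Tree" slides, is what adds clades, rates of change and character history.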
  • 31. Unifrac [Slide shows overlapping excerpts from two papers; the text is separated by source below.]
    Excerpt (introduction, truncated at both ends): "... typically used as a qualitative measure because duplicate sequences are usually removed from the tree. However, the P test may be used in a semiquantitative manner if all clones, even those with identical or near-identical sequences, are included in the tree (13). Here we describe a quantitative version of UniFrac that we call “weighted UniFrac.” We show that weighted UniFrac behaves similarly to the FST test in situations where both are ..."
    Excerpt (methods): "Weighted UniFrac. Weighted UniFrac is a new variant of the original unweighted UniFrac measure that weights the branches of a phylogenetic tree based on the abundance of information (Fig. 1B). Weighted UniFrac is thus a quantitative measure of β diversity that can detect changes in how many sequences from each lineage are present, as well as detect changes in which taxa are present. This ability is important because the relative abundance of different kinds of bacteria can be critical for describing community changes. In contrast, the original, unweighted UniFrac (Fig. 1A) is a qualitative β diversity measure because duplicate sequences contribute no additional branch length to the tree (by definition, the branch length that separates a pair of duplicate sequences is zero, because no substitutions separate them). The first step in applying weighted UniFrac is to calculate the raw weighted UniFrac value (u), according to the first equation:
    $$u = \sum_{i}^{n} b_i \times \left| \frac{A_i}{A_T} - \frac{B_i}{B_T} \right|$$
    Here, n is the total number of branches in the tree, b_i is the length of branch i, A_i and B_i are the numbers of sequences that descend from branch i in communities A and B, respectively, and A_T and B_T are the total numbers of sequences in communities A and B, respectively. In order to control for unequal sampling effort, A_i and B_i are divided by A_T and B_T. If the phylogenetic tree is not ultrametric (i.e., if different sequences in the sample have evolved at different rates), clustering with weighted UniFrac will place more emphasis on communities that contain quickly evolving taxa. Since these taxa are assigned more branch length, a comparison of the communities that contain them will tend to produce higher values of u. In some situations, it may be desirable to normalize u so that it has a value of 0 for identical communities and 1 for nonoverlapping communities. This is accomplished by dividing u by a scaling factor (D), which is the average distance of each sequence from the root, as shown in the equation as follows:
    $$D = \sum_{j}^{n} d_j \times \left( \frac{A_j}{A_T} + \frac{B_j}{B_T} \right)$$
    Here, d_j is the distance of sequence j from the root, A_j and B_j are the numbers of times the sequences were observed in communities A and B, respectively, and A_T and B_T are the total numbers of sequences from communities A and B, respectively. Clustering with normalized u values treats each sample equally instead of ..."
    Figure caption: "FIG. 1. Calculation of the unweighted and the weighted UniFrac measures. Squares and circles represent sequences from two different environments. (a) In unweighted UniFrac, the distance between the circle and square communities is calculated as the fraction of the branch length that has descendants from either the square or the circle environment (black) but not both (gray). (b) In weighted UniFrac, branch lengths are weighted by the relative abundance of sequences in the square and circle communities; square sequences are weighted twice as much as circle sequences because there are twice as many total circle sequences in the data set. The width of branches is proportional to the degree to which each branch is weighted in the calculations, and gray branches have no weight. Branches 1 and 2 have heavy weights since the descendants are biased toward the square and circles, respectively. Branch 3 contributes no value since it has an equal contribution from circle and square sequences after normalization."
    Figure caption (second paper): "Figure 1. Estimates of Phylogenetic Diversity (PD) and PD Gain (G) for the grey community. The boxes represent taxa from the black, white, and grey communities. (A) PD is the sum of the branches leading to the grey taxa. (B) G is the sum of the branches leading only to the grey taxa. (C) PD rarefaction curves showing the increase in branch length with sampling effort for the intestinal and stool bacteria from three healthy individuals. Aligned 16S rRNA sequences from the three individuals were available with the Supplementary Materials in (Eckburg, et al., 2005). The Arb parsimony insertion tool was used to add the sequences to a tree containing over 9,000 sequences (Hugenholtz, 2002) that is available for download at the rRNA Database Project II website (Maidak, et al., 2001). The curves represent the average values for 50 replicate trials." FEMS Microbiol Rev. Author manuscript; available in PMC 2009 July 1.
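The two quantities defined in the weighted UniFrac excerpt, the raw value u and the scaling factor D, can be computed directly from per-branch and per-sequence counts. The sketch below assumes the tree has already been reduced to two tables (one row per branch, one row per tip); that input layout, the function name `weighted_unifrac`, and the two-tip example tree are assumptions made for illustration, not part of the excerpt.

```python
def weighted_unifrac(branches, tips, total_a, total_b):
    """Return (u, D, u / D) as defined in the weighted UniFrac excerpt.

    branches: iterable of (b_i, A_i, B_i) = branch length and the number of
              sequences descending from that branch in communities A and B.
    tips:     iterable of (d_j, A_j, B_j) = root-to-tip distance of sequence j
              and how often it was observed in communities A and B.
    total_a, total_b: A_T and B_T, the total sequence counts per community.
    """
    u = sum(b * abs(a / total_a - bb / total_b) for b, a, bb in branches)
    d = sum(dist * (a / total_a + bb / total_b) for dist, a, bb in tips)
    return u, d, u / d


# Hypothetical tree: a root with two tips, each on a branch of length 1.0.
# Case 1: the communities share nothing (A only on tip 1, B only on tip 2).
disjoint = weighted_unifrac(
    branches=[(1.0, 2, 0), (1.0, 0, 2)],
    tips=[(1.0, 2, 0), (1.0, 0, 2)],
    total_a=2, total_b=2,
)
# Case 2: the communities are identical (one sequence each on both tips).
identical = weighted_unifrac(
    branches=[(1.0, 1, 1), (1.0, 1, 1)],
    tips=[(1.0, 1, 1), (1.0, 1, 1)],
    total_a=2, total_b=2,
)
print(disjoint, identical)
```

The two cases reproduce the endpoints the excerpt describes for the normalized value: 1 for nonoverlapping communities and 0 for identical ones.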
  • 33. RecA, RpoB in GOS [Figure labels: GOS 1, GOS 2, GOS 3, GOS 4, GOS 5] Wu et al. PLoS One 2011
  • 34. Uses of Phylogeny in Genomics and Metagenomics Example 2: Functional Diversity and Functional Predictions
  • 35. Predicting Function • Key step in genome projects • More accurate predictions help guide experimental and computational analyses • Many diverse approaches • All improved by “phylogenomic” type analyses that integrate both evolutionary reconstructions and an understanding of how new functions evolve
  • 36. Predicting Function • Identification of motifs – Short regions of sequence similarity that are indicative of general activity – e.g., ATP binding • Homology/similarity based methods – Gene sequence is searched against a database of other sequences – If significantly similar genes are found, their functional information is used • Problem – Genes frequently have similarity to hundreds of motifs and multiple genes, not all with the same function
  • 37. From Eisen et al. 1997 Nature Medicine 3: 1076-1078.
  • 38. Blast Search of H. pylori “MutS” • Blast search pulls up Syn. sp MutS#2 with a much more significant P value than other MutS homologs • Based on this, TIGR predicted this species had mismatch repair • Assumes functional constancy. Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
  • 39. MutL?? Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
  • 40. Overlaying Functions onto Tree [Figure: MutS family gene tree with subfamilies MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5 and MSH6 labeled; tips are species abbreviations (Aquae, Strpy, Bacsu, Synsp, Deira, Helpy, Borbu, Metth, mSaco, Trepa, Chltr, Theaq, Thema, Ecoli, Yeast, Spombe, Neucr, Arath, Celeg, Fly, Xenla, Rat, Mouse, Human).] Based on Eisen, Neigo 1998 Nucl Acids Res 26: 4291-4300.
  • 42. PHYLOGENETIC PREDICTION OF GENE FUNCTION [Figure: method flowchart with two worked examples, EXAMPLE A (genes 1A, 2A, 3A, 1B, 2B, 3B) and EXAMPLE B (genes 1 to 6). Steps: CHOOSE GENE(S) OF INTEREST; IDENTIFY HOMOLOGS; ALIGN SEQUENCES; CALCULATE GENE TREE; OVERLAY KNOWN FUNCTIONS ONTO TREE; INFER LIKELY FUNCTION OF GENE(S) OF INTEREST (some inferences are marked Ambiguous). Final panel: ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN), showing a Duplication across Species 1, Species 2 and Species 3.] Based on Eisen, 1998 Genome Res 8: 163-167.
  • 43. PHYLOGENETIC PREDICTION OF GENE FUNCTION [Figure: same flowchart as slide 42.] Based on Eisen, 1998 Genome Res 8:
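The "overlay known functions onto tree" and "infer likely function" steps can be sketched on a small rooted gene tree written as nested tuples. The rule used here (take the functions found in the smallest clade that contains the gene of interest plus at least one annotated relative, and call the result ambiguous if they disagree) is one simple reading of the flowchart, not necessarily the procedure in the cited paper. The tree, gene names and function labels are hypothetical.

```python
def leaves(tree):
    """All leaf names under a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def predict_function(tree, query, known):
    """Functions of annotated genes in the smallest clade around `query`.

    Walks from the root toward the query and remembers the annotations seen in
    the last clade that still contained an annotated relative. One function in
    the result is a prediction; several mean the inference is ambiguous.
    """
    if query not in leaves(tree):
        raise ValueError(f"{query} is not in the tree")
    best = set()
    node = tree
    while not isinstance(node, str):
        annotated = {known[l] for l in leaves(node) if l != query and l in known}
        if annotated:
            best = annotated
        # Step down into the child clade that contains the query.
        node = next(child for child in node if query in leaves(child))
    return best


# Hypothetical gene tree: an old duplication separates the A and B subfamilies.
gene_tree = (("sp1_A", "sp2_A"), (("sp1_B", "query"), "sp3_B"))
known_functions = {"sp1_A": "function X", "sp2_A": "function X", "sp3_B": "function Y"}

prediction = predict_function(gene_tree, "query", known_functions)
print(prediction)
```

Here the query's sister gene is unannotated, so the prediction comes from the rest of the B subfamily rather than from the more numerous A-subfamily genes, which is the kind of case where subfamily membership matters more than the number of similar hits.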
  • 44. Diversity of Proteorhodopsins Venter et al., 2004
  • 45. Carboxydothermus sporulates Wu et al. 2005 PLoS Genetics 1: e65.
  • 46. Wu et al. 2005 PLoS Genetics 1: e65.
  • 48. Uses of Phylogeny in Genomics and Metagenomics Example 3: Selecting Organisms for Study
  • 49. As of 2002 • At least 40 phyla of bacteria [Figure: tree of bacterial phyla: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine GroupA, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11.] Based on Hugenholtz, 2002
  • 50. As of 2002 • At least 40 phyla of bacteria • Most genomes from three phyla [Figure: same tree of bacterial phyla as slide 49.] Based on Hugenholtz, 2002
  • 51. As of 2002 • At least 40 phyla of bacteria • Most genomes from three phyla • Some studies in other phyla [Figure: same tree of bacterial phyla as slide 49.] Based on Hugenholtz, 2002
  • 52. As of 2002 • At least 40 phyla of bacteria • Most genomes from three phyla • Some other phyla are only sparsely sampled • Same trend in Eukaryotes [Figure: same tree of bacterial phyla as slide 49.] Based on Hugenholtz, 2002
  • 53. As of 2002 • At least 40 phyla of bacteria • Most genomes from three phyla • Some other phyla are only sparsely sampled • Same trend in Viruses [Figure: same tree of bacterial phyla as slide 49.] Based on Hugenholtz, 2002
  • 56. GEBA: Components • Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow) • Project management (David Bruce, Eileen Dalin, Lynne Goodwin) • Culture collection and DNA prep (DSMZ, Hans-Peter Klenk) • Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng) • Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al) • Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla) • Adopt a microbe education project (Cheryl Kerfeld) • Outreach (David Gilbert) • $$$ (DOE, Eddy Rubin, Jim Bristow)
  • 57. GEBA Now • 300+ genomes • Rich sampling of major groups of cultured organisms
  • 58. GEBA Lesson 1: The rRNA Tree of Life is a Useful Tool From Wu et al. 2009 Nature 462, 1056-1060
  • 59. GEBA Lesson 2: The rRNA Tree of Life is not perfect ... [Figure panels: 16S, WGT, 23S] Badger et al. 2005 Int J System Evol Microbiol 55: 1021-1026.
  • 60. GEBA Lesson 3: Phylogeny improves genome annotation • Took 56 GEBA genomes and compared results vs. 56 randomly sampled new genomes • Better definition of protein family sequence “patterns” • Greatly improves “comparative” and “evolutionary” based predictions • Conversion of hypothetical into conserved hypotheticals • Linking distantly related members of protein families • Improved non-homology prediction
  • 61. GEBA Lesson 4: Metadata Important
  • 62. GEBA Lesson 5: Improves discovery of new genetic diversity
  • 63. Phylogenetic Distribution Novelty: Bacterial Actin Related Protein [Figure: phylogenetic tree of actin and actin-related proteins with branch support values and a 0.5 scale bar. Labeled groups: ACTIN, ARP1, ARP2, ARP3, BARP, ARP4, ARP5, ARP6, ARP7, ARP10. BARP is the sequence from Haliangium ochraceum DSM 14365 (H. ochraceum gi227395998); the other sequences are eukaryotic, e.g. S. cerevisiae, H. sapiens, D. melanogaster, A. thaliana] Patrik D’haeseleer, Adam Zemla, Victor Kunin. Wu et al. 2009 Nature 462, 1056-1060. See also Guljamow et al. 2007 Current Biology.
  • 64. Protein Family Rarefaction • Take data set of multiple complete genomes • Identify all protein families using MCL • Plot # of genomes vs. # of protein families
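The three rarefaction steps can be sketched in a few lines of Python. This is a minimal illustration, not the analysis behind the Wu et al. figures: it assumes the protein families have already been assigned (for example by MCL) and are available as a set of family IDs per genome. The function name `rarefaction_curve`, the genome names, and the family IDs are all hypothetical. Averaging over random genome orderings is one common way to smooth such a curve; the slides do not say how the ordering was handled.

```python
import random

def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Mean number of distinct protein families seen after adding k genomes.

    genome_families maps a genome name to the set of family IDs found in it
    (assumed to come from a prior clustering step such as MCL).
    """
    rng = random.Random(seed)
    genomes = list(genome_families)
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        rng.shuffle(genomes)              # random order of genome addition
        seen = set()
        for k, genome in enumerate(genomes):
            seen |= genome_families[genome]
            totals[k] += len(seen)        # families seen after k + 1 genomes
    return [total / n_permutations for total in totals]

# Toy input: hypothetical genomes and family IDs.
toy = {
    "genome_A": {"fam1", "fam2", "fam3"},
    "genome_B": {"fam2", "fam3", "fam4"},
    "genome_C": {"fam1", "fam5"},
}
curve = rarefaction_curve(toy)
for n_genomes, n_families in enumerate(curve, start=1):
    print(n_genomes, round(n_families, 2))
```

Plotting `curve` against the number of genomes gives the rarefaction plot. A curve that keeps climbing suggests that additional genomes are still contributing new protein families.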
  • 65. Wu et al. 2009 Nature 462, 1056-1060
  • 66. Wu et al. 2009 Nature 462, 1056-1060
  • 67. Wu et al. 2009 Nature 462, 1056-1060
  • 68. Wu et al. 2009 Nature 462, 1056-1060
  • 69. Wu et al. 2009 Nature 462, 1056-1060
  • 70. Synapomorphies exist Wu et al. 2009 Nature 462, 1056-1060
  • 71. GEBA Lesson 6: Improves Analysis of Uncultured
  • 72. Metagenomic Phylotyping: Sargasso Phylotypes • GEBA Project improves metagenomic analysis [Figure: bar chart of weighted % of clones (axis 0 to 0.500) by major phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota) for the markers EFG, EFTu, rRNA, RecA, RpoB, HSP70] Venter et al., Science 304: 66-74. 2004
  • 73. Metagenomic Phylotyping: Sargasso Phylotypes • But not a lot [Figure: bar chart of weighted % of clones (axis 0 to 0.500) by major phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota) for the markers EFG, EFTu, rRNA, RecA, RpoB, HSP70] Venter et al., Science 304: 66-74. 2004
  • 74. • AND THEN ALL OF THEM WERE DECEIVED
  • 75. • For each of these areas - need to do a MUCH better job ...
  • 77. Major Issues in Phylotyping Beyond Moore’s Law Metagenomics Short reads
  • 78. Major Issues in Phylotyping Beyond Moore’s Law Metagenomics Short reads WE NEED NEW METHODS
  • 79. Method 1: Each is an island • Each new sequence is an island • Take reference data • Build alignment, models, trees • Add new sequence to reference alignment and build tree
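A rough sketch of the "each is an island" loop follows. It is not STAP or AMPHORA. It assumes the query sequences are already aligned to the reference alignment, and it swaps in a simple UPGMA tree on p-distances for the maximum likelihood trees those pipelines build. All names (`upgma`, `sister_of`, `each_is_an_island`) and the toy sequences are hypothetical. The point is the control flow: one tree per query, each containing the full reference set plus exactly one new sequence.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of differing sites between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def upgma(seqs):
    """Build a UPGMA tree (nested 2-tuples of names) from aligned sequences."""
    size = {name: 1 for name in seqs}     # cluster -> number of leaves
    dist = {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}
    while len(size) > 1:
        a, b = min(combinations(size, 2), key=lambda pair: dist[frozenset(pair)])
        merged = (a, b)
        for other in size:
            if other not in (a, b):
                # Size-weighted average distance to the merged cluster.
                dist[frozenset((merged, other))] = (
                    size[a] * dist[frozenset((a, other))]
                    + size[b] * dist[frozenset((b, other))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))

def sister_of(tree, query):
    """Return the subtree that is sister to `query`, or None if it is absent."""
    if not isinstance(tree, tuple):
        return None
    left, right = tree
    if left == query:
        return right
    if right == query:
        return left
    return sister_of(left, query) or sister_of(right, query)

def each_is_an_island(reference, queries):
    """Method 1: build one tree per query (reference set + that query only)."""
    placements = {}
    for name, seq in queries.items():
        tree = upgma({**reference, name: seq})
        placements[name] = sister_of(tree, name)
    return placements

# Toy aligned sequences (hypothetical).
reference = {
    "bac1": "ACGTACGTAC", "bac2": "ACGTACGAAA",
    "arc1": "TTGTCCCTGC", "arc2": "TTGTCCCAGA",
}
queries = {"q1": "ACGTACGTAG", "q2": "TTGTCCCAGT"}
placements = each_is_an_island(reference, queries)
print(placements)
```

Because every query gets its own tree, the cost grows linearly with the number of queries and the queries never inform one another's placement.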
  • 80. STAP: ss-rRNA Taxonomy Pipeline • Each sequence analyzed separately • Figure 1. A flow chart of the STAP pipeline. doi:10.1371/journal.pone.0002566.g001 • Page excerpt: "... STAP database, and the query sequence is aligned to them using the CLUSTALW profile alignment algorithm [40] as described above for domain assignment. By adapting the profile alignment ..." • Figure 2. Domain assignment. In Step 1, STAP assigns a domain to each query sequence based on its position in a maximum likelihood tree of representative ss-rRNA sequences. Because the tree illustrated here is not rooted, domain assignment would not be accurate and reliable (sequence similarity based methods cannot make an accurate assignment in this case either). However the figure illustrates an important role of the tree-based domain assignment step, namely automatic identification of deep-branching environmental ss-rRNAs. doi:10.1371/journal.pone.0002566.g002 • Wu et al. 2008 PLoS One
  • 81. AMPHORA [Figure label: Guide tree] Wu and Eisen Genome Biology 2008 9:R151 doi:10.1186/gb-2008-9-10-r151
  • 82. Phylotyping w/ Proteins Wu and Eisen Genome Biology 2008 9:R151 doi:10.1186/gb-2008-9-10-r151
  • 83. Whole Genome Tree Wu and Eisen Genome Biology 2008 9:R151 doi:10.1186/gb-2008-9-10-r151
  • 84. Method 2: Most in the Family
  • 85. Phylogenetic Challenge xxxxxxxxxxxxxxxxxxxxxxx xxxxxx xxxxxxxxxxxxx xxxxxxxxxxxxxx xxxxxxxxxxxxxx A single tree with everything?
  • 86. Phylogenetic Challenge xxxxxxxxxxxxxxxxxxxxxxx xxxxxx xxxxxxxxxxxxx xxxxxxxxxxxxxx xxxxxxxxxxxxxx A single tree with everything (as long as there is a lot of overlap)
  • 87. Phylogenetic Challenge xxxxxxxxxxxxxxxxxxxxxxx xxxxxx xxxxxxxxxxxxx xxxxxxxxxxxxxx xxxxxxxxxxxxxx A single tree with everything (as long as there is a lot of overlap)
  • 88. Phylogenetic Challenge A single tree with everything?
  • 89. rRNA in Sargasso Metagenome Venter et al., Science 304: 66. 2004
  • 90. STAP: ss-rRNA Taxonomy Pipeline • All: combine all into one alignment • Figure 1. A flow chart of the STAP pipeline.
  • 91. RecA in Sargasso Venter et al., Science 304: 66. 2004
  • 92. Protein vs. rRNA: Sargasso Data • Sargasso Phylotypes [Figure: bar chart of weighted % of clones (axis 0 to 0.500) by major phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota) for the markers EFG, EFTu, HSP70, RecA, RpoB, rRNA] Venter et al., Science 304: 66-74. 2004
  • 94. Method 3: All in the family • Combine new sequences into one tree • Take reference data • Build alignment, models, trees • Add all sequences to reference alignment and build tree
  • 95. Phylogenetic Challenge A single tree with everything?
  • 96. Phylogenetic Challenge A single tree with everything?
  • 97. PhylOTU: Finding Metagenomic OTUs • Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in this general workflow of PhylOTU. See Results section for details. doi:10.1371/journal.pcbi.1001061.g001 • PhylOTU - Sharpton et al. PLoS Comp. Bio 2011
  • 100. Method 4: All in the genome • Combine new sequences from different gene families into one tree • Take reference data • Build alignment, models • Concatenate • Add all sequences to reference alignment and build tree
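The "Concatenate" step can be sketched as follows. This is a minimal illustration under stated assumptions: each gene family has already been aligned separately, every sequence within a family has the same aligned length, and a taxon or metagenomic read that lacks a gene is padded with gap characters. The function name `concatenate_alignments`, the taxon labels, and the toy sequences are hypothetical; the gene names recA and rpoB are borrowed from the marker list shown earlier in the deck.

```python
def concatenate_alignments(gene_alignments, gap="-"):
    """Join per-gene alignments into one supermatrix.

    gene_alignments maps gene name -> {taxon: aligned sequence}.
    Taxa missing from a gene are padded with gaps for that gene's columns.
    """
    taxa = sorted({taxon for aln in gene_alignments.values() for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    for gene, aln in gene_alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        width = lengths.pop()
        for taxon in taxa:
            supermatrix[taxon] += aln.get(taxon, gap * width)
    return supermatrix

# Toy input: two reference genomes carry both genes; each read covers one gene.
genes = {
    "recA": {"ref1": "MAKD", "ref2": "MAKE", "read7": "M-KE"},
    "rpoB": {"ref1": "GLTV", "ref2": "GLSV", "read9": "GL-V"},
}
matrix = concatenate_alignments(genes)
for taxon, row in matrix.items():
    print(taxon, row)
```

In the toy output, `read7` and `read9` share no columns with each other. They can still sit in one tree because both overlap the reference genomes, which is the idea behind combining poorly sampled genes.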
  • 101. Challenge • Each gene poorly sampled in metagenomes • Can we combine all into a single tree?
  • 102. Kembel Combiner Kembel et al. The phylogenetic diversity of metagenomes. PLoS One 2011
  • 103. Kembel Combiner [Slide shows a cropped journal page (VOL. 73, 2007) whose right margin is cut off; truncated words are completed below only where unambiguous.] • TABLE 1. Measure types: only presence/absence of taxa considered (qualitative); additionally accounts for the number of times that each taxon was observed (quantitative). • Page excerpt: "... cally defined by a sequence similarity threshold) in the sample as equally related. Newer β diversity measures that incorporate phylogenetic information are more powerful because they account for the degree of divergence between sequences (13 ... 29, 30). Phylogenetic β diversity measures can also be either quantitative or qualitative depending on whether abundance is taken into account. The original, unweighted UniFrac measure (13) is a qualitative measure. Unweighted UniFrac measures the distance between two communities by calculating the fraction of the branch length in a phylogenetic tree that leads to descendants in either, but not both, of the two communities (Fig. 1A). The fixation index (FST), which measures the distance between two communities by comparing the genetic diversity within each community to the total genetic diversity of the communities combined (18), is a quantitative measure that accounts for different levels of divergence between sequences. The phylogenetic test (P test), which measures the significance of the association between environment and phylogeny (18 ...), is typically used as a qualitative measure because duplicate sequences are usually removed from the tree. However, the P test may be used in a semiquantitative manner if all clones, even those with identical or near-identical sequences, are included in the tree (13). Here we describe a quantitative version of UniFrac that we call "weighted UniFrac." We show that weighted UniFrac behaves similarly to the FST test in situations where both ..." • FIG. 1. Calculation of the unweighted and the weighted UniFrac measures. Squares and circles represent sequences from two different environments. (a) In unweighted UniFrac, the distance between ...
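The unweighted UniFrac definition quoted on this slide (the fraction of branch length leading to descendants in either, but not both, of two communities) can be sketched directly. This is an illustration only, not the implementation used by Kembel et al. or by the UniFrac authors. The tree encoding (a dict mapping each node to its parent and branch length, with the root absent as a key), the function name, and the toy tree are assumptions made for the example.

```python
def unweighted_unifrac(parent, community_a, community_b):
    """Unweighted UniFrac distance between two sets of leaves.

    parent maps node -> (parent_node, branch_length); the root has no entry.
    A branch is identified by the node at its lower (child) end.
    """
    def branches_above(leaves):
        covered = set()
        for leaf in leaves:
            node = leaf
            while node in parent:         # walk up until the root is reached
                covered.add(node)
                node = parent[node][0]
        return covered

    in_a = branches_above(community_a)
    in_b = branches_above(community_b)
    unique = sum(parent[node][1] for node in in_a ^ in_b)   # one community only
    total = sum(parent[node][1] for node in in_a | in_b)    # either community
    return unique / total

# Toy tree: root -> X, Y (length 1.0 each); X -> a, b; Y -> c, d (length 0.5 each).
toy_tree = {
    "X": ("root", 1.0), "Y": ("root", 1.0),
    "a": ("X", 0.5), "b": ("X", 0.5),
    "c": ("Y", 0.5), "d": ("Y", 0.5),
}
separate = unweighted_unifrac(toy_tree, {"a", "b"}, {"c", "d"})
interleaved = unweighted_unifrac(toy_tree, {"a", "c"}, {"b", "d"})
identical = unweighted_unifrac(toy_tree, {"a", "b"}, {"a", "b"})
print(separate, interleaved, identical)
```

Communities confined to different clades share no branch length and get the maximum distance. Interleaved communities share the internal branches and score lower. Only presence or absence of each leaf matters here, which is what makes the measure qualitative.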
  • 104. Improving Phylotyping II • We need to analyze more gene families
  • 106. More Markers
      Phylogenetic group       Genome number   Gene number   Marker candidates
      Archaea                  62              145415        106
      Actinobacteria           63              267783        136
      Alphaproteobacteria      94              347287        121
      Betaproteobacteria       56              266362        311
      Gammaproteobacteria      126             483632        118
      Deltaproteobacteria      25              102115        206
      Epsilonproteobacteria    18              33416         455
      Bacteroides              25              71531         286
      Chlamydiae               13              13823         560
      Chloroflexi              10              33577         323
      Cyanobacteria            36              124080        590
      Firmicutes               106             312309        87
      Spirochaetes             18              38832         176
      Thermi                   5               14160         974
      Thermotogae              9               17037         684
  • 108. Improving Functional Predictions • We need to analyze even more gene families
  • 109. Sifting Families [Figure 1: workflow diagram. Box labels: Representative Genomes; New Genomes; Extract Protein Annotation (for each input); All v. All BLAST; Homology Clustering (MCL); Screen for Homologs; SFams; Align; Build HMMs; HMMs] Sharpton et al. submitted
  • 110. [Figure with panels A, B, C] Sharpton et al. submitted