Phylogenomic Approaches to the
               Study of Microbial Diversity
                                  September 6, 2012
                            Bay Area Illumina User’s Meeting

                                   Jonathan A. Eisen
                             University of California, Davis
                                  @phylogenomics




Thursday, September 6, 12
Phylogenomic Approaches to
                            Studying Microbial Diversity

                                    Example 1:

                                    Phylotyping
                                       and
                               Phylogenetic Diversity

rRNA Phylotyping
      DNA extraction -> PCR (makes lots of copies of the rRNA genes in sample) -> Sequence rRNA genes

      rRNA1     5’...ACACACATAGGTGGAGCTAGCGATCGATCGA... 3’
      rRNA2     5’...TACAGTATAGGTGGAGCTAGCGACGATCGA... 3’
      rRNA3     5’...ACGGCAAAATAGGTGGATTCTAGCGATATAGA... 3’
      rRNA4     5’...ACGGCCCGATAGGTGGATTCTAGCGCCATAGA... 3’

      Sequence alignment = Data matrix

      rRNA1     A   C   A   C   A   C
      rRNA2     T   A   C   A   G   T
      rRNA3     C   A   C   T   G   T
      rRNA4     C   A   C   A   G   T
      E. coli   A   G   A   C   A   G
      Humans    T   A   T   A   G   T
      Yeast     T   A   C   A   G   T
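
As a rough illustration of how the alignment acts as a data matrix, the sketch below turns the seven aligned rows from the slide into pairwise distances. The choice of p-distance (fraction of differing sites) is an assumption made for illustration; the slide does not say which distance or tree-building method is used.

```python
from itertools import combinations

# Aligned columns copied from the data matrix on the slide.
ALIGNMENT = {
    "rRNA1": "ACACAC",
    "rRNA2": "TACAGT",
    "rRNA3": "CACTGT",
    "rRNA4": "CACAGT",
    "E. coli": "AGACAG",
    "Humans": "TATAGT",
    "Yeast": "TACAGT",
}


def p_distance(seq_a, seq_b):
    """Fraction of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def distance_matrix(alignment):
    """Pairwise p-distances keyed by (name_a, name_b), in row order."""
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    }


if __name__ == "__main__":
    for (a, b), d in distance_matrix(ALIGNMENT).items():
        print(f"{a:8s} {b:8s} {d:.2f}")
```

A matrix like this is the kind of input a distance-based tree method could take; with only six columns it is a toy, not a usable phylogeny.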

Phylotyping

[Figure: tree with E. coli, Humans, and Yeast; a second build adds OTU1, OTU2, OTU3, and OTU4 to the same tree]

Phylotyping

[Figure: panels labeled Cluster (A, B, C), OTUs (A, B, C, then OTU1 to OTU4), and Just Phylogeny (E. coli, Humans, Yeast); the final panel is a tree holding OTU1 to OTU4 alongside E. coli, Humans, and Yeast]

Phylotyping
        • OTUs
              • Taxonomic lists
              • Relative abundance of taxa
              • Ecological metrics (alpha and beta diversity)
        • Phylogenetic metrics
              • Binning
              • Identification of novel groups
              • Clades
              • Rates of change
              • LGT
              • Convergence
              • PD
              • Phylogenetic ecology (e.g., UniFrac)
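
To make the OTU-based ecological metrics concrete, here is a minimal sketch of one alpha-diversity measure (Shannon index) and one beta-diversity measure (Bray-Curtis dissimilarity) computed from OTU count vectors. The choice of these two metrics and the sample counts are assumptions for illustration; the slide only names the categories.

```python
import math


def shannon_index(counts):
    """Alpha diversity: Shannon index (natural log) of one sample."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(counts_a, counts_b):
    """Beta diversity: Bray-Curtis dissimilarity between two samples.

    Both vectors must list the same OTUs in the same order.
    """
    shared = sum(min(a, b) for a, b in zip(counts_a, counts_b))
    return 1 - 2 * shared / (sum(counts_a) + sum(counts_b))


if __name__ == "__main__":
    # Hypothetical counts for OTU1..OTU4 in two samples.
    sample_1 = [10, 10, 0, 0]
    sample_2 = [0, 0, 10, 10]
    print(shannon_index(sample_1))
    print(bray_curtis(sample_1, sample_2))
```

Neither metric uses the tree; the phylogenetic metrics in the second half of the list (PD, UniFrac) replace OTU identity with branch lengths.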
What’s New in Phylotyping




What’s New in Phylotyping I

        • More PCR products

        • Deeper sequencing
              • The rare biosphere
              • Relative abundance estimates

        • More samples (with barcoding)
              • Time series
              • Spatially diverse sampling
              • Fine scale sampling

Earth Microbiome Project




Things You Could Do
      • Mississippi River: 2,320 miles long
            • 1 site / mile
            • 3 samples / site
            • 6,960 samples
                  • rRNA PCR w/ barcodes
                  • metagenomics w/ barcodes
            • MiSeq run:
                  • 30 million sequence reads
                  • 4,310 sequences / sample
            • HiSeq 2000:
                  • 6 billion sequence reads
                  • 862,068 sequences / sample

Things You Could Do
      • Mississippi River: 12,249,600 feet long
            • 1 site / 500 feet
            • 3 samples / site
            • 73,497 samples
                  • rRNA PCR w/ barcodes
                  • metagenomics w/ barcodes
            • MiSeq run:
                  • 30 million sequence reads
                  • 408 sequences / sample
            • HiSeq 2000:
                  • 6 billion sequence reads
                  • 81,635 sequences / sample
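
The per-sample depths on the two Mississippi River slides can be reproduced with integer arithmetic. The sketch below assumes evenly spaced sites and rounds down at each step, which matches the figures quoted on the slides; the function names are made up for this example.

```python
FEET_PER_MILE = 5280
RIVER_LENGTH_FT = 2320 * FEET_PER_MILE  # 12,249,600 feet, as on the slide

# Reads per run as quoted on the slides.
MISEQ_READS = 30_000_000
HISEQ_2000_READS = 6_000_000_000


def sample_count(river_length_ft, site_spacing_ft, samples_per_site):
    """Number of samples for evenly spaced sites, rounded down."""
    return river_length_ft * samples_per_site // site_spacing_ft


def reads_per_sample(reads_per_run, samples):
    """Reads available per barcoded sample, rounded down."""
    return reads_per_run // samples


if __name__ == "__main__":
    for spacing_ft in (FEET_PER_MILE, 500):
        n = sample_count(RIVER_LENGTH_FT, spacing_ft, 3)
        print(
            spacing_ft,
            n,
            reads_per_sample(MISEQ_READS, n),
            reads_per_sample(HISEQ_2000_READS, n),
        )
```

This ignores reads lost to quality filtering and uneven barcode representation, so real per-sample depth would be lower.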

What’s New in Phylotyping II

        • Metagenomics avoids biases of rRNA PCR

[Figure: shotgun sequence]




Metagenomic Phylotyping

[Figure: panels labeled Cluster (A, B, C), OTUs (A, B, C, then OTU1 to OTU4), and Just Phylogeny (E. coli, Humans, Yeast); the final panel is a tree holding OTU1 to OTU4 alongside E. coli, Humans, and Yeast]

Phylogenetic Challenge




                                      ??


Phylogenetic Challenge




                                Multiple approaches


Method 1: Each is an island




          • Build alignment, models, trees for full length seqs
          • Analyze fragmented reads one at a time
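
The "one read at a time" idea can be sketched as follows: a fixed set of full-length references is prepared once, then every fragment is scored against it independently of all other fragments. This toy uses best sequence identity as a stand-in for placing a read on a reference tree, which is a simplification and not the method of any tool named in the talk. The reference strings are prefixes loosely borrowed from rRNA1 to rRNA3 on the earlier slide and are treated as if already aligned; the names and read offsets are hypothetical.

```python
# Hypothetical full-length references, treated as already aligned.
REFERENCES = {
    "refA": "ACACACATAGGTGGAG",
    "refB": "TACAGTATAGGTGGAG",
    "refC": "ACGGCAAAATAGGTGG",
}


def identity(fragment, reference, start):
    """Fraction of fragment positions matching the reference window."""
    window = reference[start:start + len(fragment)]
    matches = sum(f == r for f, r in zip(fragment, window))
    return matches / len(fragment)


def classify_read(fragment, start):
    """Assign one read to its best-matching reference, ignoring other reads."""
    return max(
        REFERENCES,
        key=lambda name: identity(fragment, REFERENCES[name], start),
    )


if __name__ == "__main__":
    # Each read is (sequence, start column in the reference alignment).
    reads = [("ACACAC", 0), ("AGTATA", 3), ("GCAAAA", 3)]
    for fragment, start in reads:
        print(fragment, classify_read(fragment, start))
```

Because no read depends on another, this step parallelizes trivially, but reads never inform each other's placement.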


STAP

[Figure 1. A flow chart of the STAP pipeline. doi:10.1371/journal.pone.0002566.g001]

Each sequence analyzed separately

Paper excerpt shown on the slide (truncated): "... STAP database, and the query sequence is aligned to them using the CLUSTALW profile alignment algorithm [40] as described above for domain assignment. By adapting the profile alignment ..."

[Figure 2. Domain assignment. In Step 1, STAP assigns a domain to each query sequence based on its position in a maximum likelihood tree of representative ss-rRNA sequences. Because the tree illustrated here is not rooted, domain assignment would not be accurate and reliable (sequence similarity based methods cannot make an accurate assignment in this case either). However the figure illustrates an important role of the tree-based domain assignment step, namely automatic identification of deep-branching environmental ss-rRNAs. doi:10.1371/journal.pone.0002566.g002]

Wu et al. 2008 PLoS One

AMPHORA




    Guide tree

    Wu and Eisen Genome Biology 2008 9:R151 doi:10.1186/gb-2008-9-10-r151

Phylotyping w/ Proteins




   Wu and Eisen Genome Biology 2008 9:R151   doi:10.1186/gb-2008-9-10-r151
Method 2: Most in the Family




Phylogenetic Challenge

                                    xxxxxxxxxxxxxxxxxxxxxxx

                                   xxxxxx           xxxxxxxxxxxxx

                                                 xxxxxxxxxxxxxx




                                   xxxxxxxxxxxxxx




                                            ??


Method 2: Most in the Family

                                       xxxxxxxxxxxxxxxxxxxxxxx

                                      xxxxxx           xxxxxxxxxxxxx

                                                   xxxxxxxxxxxxxx




                                      xxxxxxxxxxxxxx




                            One tree for those w/ overlap


rRNA in Sargasso Metagenome




     Venter et al., Science 304: 66. 2004

RecA Phylotyping in Sargasso Data




     Venter et al., Science 304: 66. 2004

Sargasso Phylotyping

[Figure: Sargasso Phylotypes. Bar chart of Weighted % of Clones (axis ticks 0, 0.125, 0.250, 0.375, 0.500) by Major Phylogenetic Group, one series per marker: EFG, EFTu, HSP70, RecA, RpoB, rRNA.
Groups: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota]

     Venter et al., Science 304: 66. 2004

STAP, QIIME, Mothur

Combine all into one alignment

[Figure 1. A flow chart of the STAP pipeline. doi:10.1371/journal.pone.0002566.g001]

Method 3: All in the family




Phylogenetic Challenge




                                      ??


Phylogenetic Challenge




                            A single tree with everything?


rRNA analysis

[Figure: panels labeled Cluster (A, B, C), OTUs (A, B, C, then OTU1 to OTU4), and Just Phylogeny (E. coli, Humans, Yeast); the final panel is a tree holding OTU1 to OTU4 alongside E. coli, Humans, and Yeast]

PhylOTU

[Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in [...] workflow of PhylOTU. See Results section for details. doi:10.1371/journal.pcbi.1001061.g001]

 Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O'Dwyer JP, Green JL, Eisen JA, Pollard KS. (2011)
 PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel
 Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061

RecA, RpoB in GOS

[Figure labels: GOS 1, GOS 2, GOS 3, GOS 4, GOS 5]

   Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, et al. (2011) Stalking
   the Fourth Domain in Metagenomic Data: Searching for, Discovering,
   and Interpreting Novel, Deep Branches in Marker Gene Phylogenetic
   Trees. PLoS ONE 6(3): e18011. doi:10.1371/journal.pone.0018011

PhyloSift / pplacer

     Aaron Darling, Guillaume Jospin, Holly Bik, Erick Matsen, Eric Lowe, and others

Method 4: All in the genome




Multiple Genes?




                            A single tree with everything?




Kembel Combiner




     Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of Metagenomes. PLoS
     ONE 6(8): e23214. doi:10.1371/journal.pone.0023214

Thursday, September 6, 12
typically used as a qualitative measure because duplicate s
                                                                                      quences are usually removed from the tree. However, the
                                                                                      test may be used in a semiquantitative manner if all clone




                                Kembel Combiner
                                                                                      even those with identical or near-identical sequences, are i
                                                                                      cluded in the tree (13).
                                                                                         Here we describe a quantitative version of UniFrac that w
                                                                                      call “weighted UniFrac.” We show that weighted UniFrac b
                                                                                      haves similarly to the FST test in situations where both a




                                                                                         FIG. 1. Calculation of the unweighted and the weighted UniFrac
                                                                                      measures. Squares and circles represent sequences from two different
                                                                                      environments. (a) In unweighted UniFrac, the distance between the
                                                                                      circle and square communities is calculated as the fraction of the
                                                                                      branch length that has descendants from either the square or the circle
                                                                                      environment (black) but not both (gray). (b) In weighted UniFrac,
                                                                                      branch lengths are weighted by the relative abundance of sequences in
                                                                                      the square and circle communities; square sequences are weighted
                                                                                      twice as much as circle sequences because there are twice as many total
                                                                                      circle sequences in the data set. The width of branches is proportional
                                                                                      to the degree to which each branch is weighted in the calculations, and
                                                                                      gray branches have no weight. Branches 1 and 2 have heavy weights
                                                                                      since the descendants are biased toward the square and circles, respectively.
                                                                                      Branch 3 contributes no value since it has an equal contribution
                                                                                      from circle and square sequences after normalization.
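
The two measures in the caption can be made concrete with a small sketch. The toy tree, the leaf names, and the function names below are hypothetical and are not taken from the slide. The weighted form shown is the raw (unnormalized) sum of branch length times the absolute difference in relative abundance, which is one common way to write weighted UniFrac.

```python
from collections import Counter

# Hypothetical toy tree: child -> (parent, branch length).
TREE = {
    "n1": ("root", 1.0), "n2": ("root", 1.0),
    "s1": ("n1", 1.0), "c1": ("n1", 1.0), "c2": ("n1", 1.0),
    "s2": ("n2", 1.0), "c3": ("n2", 1.0), "c4": ("n2", 1.0),
}
# Environment of each sequence (leaf): two squares, four circles.
LEAF_ENV = {"s1": "square", "s2": "square",
            "c1": "circle", "c2": "circle", "c3": "circle", "c4": "circle"}


def branch_counts(tree, leaf_env):
    """Count, for every branch, the descendant sequences from each environment."""
    counts = {node: Counter() for node in tree}
    for leaf, env in leaf_env.items():
        node = leaf
        while node in tree:          # walk from the leaf up to the root
            counts[node][env] += 1
            node = tree[node][0]
    return counts


def unweighted_unifrac(tree, leaf_env, env_a, env_b):
    """Fraction of branch length leading to only one of the two environments."""
    counts = branch_counts(tree, leaf_env)
    unique = shared = 0.0
    for node, (_, length) in tree.items():
        has_a = counts[node][env_a] > 0
        has_b = counts[node][env_b] > 0
        if has_a and has_b:
            shared += length
        elif has_a or has_b:
            unique += length
    total = unique + shared
    return unique / total if total else 0.0


def weighted_unifrac(tree, leaf_env, env_a, env_b):
    """Sum of branch length times the difference in relative abundance."""
    counts = branch_counts(tree, leaf_env)
    totals = Counter(leaf_env.values())
    return sum(
        length * abs(counts[node][env_a] / totals[env_a]
                     - counts[node][env_b] / totals[env_b])
        for node, (_, length) in tree.items()
    )


print(unweighted_unifrac(TREE, LEAF_ENV, "square", "circle"))  # 0.75
print(weighted_unifrac(TREE, LEAF_ENV, "square", "circle"))    # 2.0
```

In this toy tree the two internal branches each lead to one square and two circles. With two squares and four circles in total, their relative abundances are equal (1/2 and 2/4), so those branches contribute nothing to the weighted measure, as Branch 3 does in the figure.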




     Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of Metagenomes. PLoS
     ONE 6(8): e23214. doi:10.1371/journal.pone.0023214

Uses of Phylogeny
                    in Genomics and Metagenomics

                                  Example 2:

                            Functional Diversity and
                             Functional Predictions




PHYLOGENETIC PREDICTION OF GENE FUNCTION



     [Figure: flowchart of the method, illustrated with two examples. EXAMPLE A
     follows genes 1A, 2A, 3A, 1B, 2B, and 3B from Species 1, Species 2, and
     Species 3, with a possible gene duplication marked on the tree
     ("Duplication?"). EXAMPLE B follows genes 1 to 6, with one inference
     marked "Ambiguous".]

     METHOD
        • CHOOSE GENE(S) OF INTEREST
        • IDENTIFY HOMOLOGS
        • ALIGN SEQUENCES
        • CALCULATE GENE TREE
        • OVERLAY KNOWN FUNCTIONS ONTO TREE
        • INFER LIKELY FUNCTION OF GENE(S) OF INTEREST

     Final panel: ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN), showing the duplication.

     Based on Eisen, 1998 Genome Res 8: 163-167.
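
The last two method steps (overlay known functions onto the tree, then infer the likely function of the gene of interest) can be sketched as follows. This is a simplified illustration, not the procedure from the cited paper: it assumes the gene tree is already built and given as nested tuples, and it predicts from the smallest clade around the query that contains any annotated gene. The tree, the function labels, and the name `predict_function` are all hypothetical.

```python
def leaves(node):
    """Return the leaf names under a node of a nested-tuple tree."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def predict_function(tree, query, known):
    """Predict the query's function from the smallest annotated clade around it."""
    if query not in leaves(tree):
        raise ValueError(f"{query!r} is not in the tree")
    # Collect the clades that contain the query, from the root downward.
    path = []
    node = tree
    while not isinstance(node, str):
        path.append(node)
        node = next(child for child in node if query in leaves(child))
    # Check those clades from the smallest to the largest.
    for clade in reversed(path):
        functions = {known[leaf] for leaf in leaves(clade)
                     if leaf in known and leaf != query}
        if len(functions) == 1:
            return functions.pop()
        if len(functions) > 1:
            return "ambiguous"
    return "unknown"


# Hypothetical gene tree shaped like Example A: a duplication separates
# the A genes from the B genes, then each copy follows the species tree.
tree_a = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
known_a = {"1A": "function_A", "3A": "function_A",
           "1B": "function_B", "2B": "function_B"}

print(predict_function(tree_a, "2A", known_a))  # function_A
print(predict_function(tree_a, "3B", known_a))  # function_B

# A query whose closest annotated relatives disagree is left ambiguous.
tree_b = ("q", ("x", "y"))
known_b = {"x": "function_A", "y": "function_B"}
print(predict_function(tree_b, "q", known_b))   # ambiguous
```

The point of working on the tree rather than on a ranked similarity list is visible in `tree_a`: gene 2A takes its prediction from the A clade, even though the B genes are also homologs.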

Diversity of Proteorhodopsins




                                                    Venter et al., 2004.
                                                    Science 304: 66.
Improving Functional Predictions

        • Same methods discussed for phylotyping
          improve phylogenomic functional
          prediction for protein families
        • Increase in sequence diversity helps too




NMF in Metagenomes
Characterizing the niche-space distributions of components


 [Figure: three aligned panels, (a), (b), and (c), with one row per sampling
 site (for example Sargasso Sea_GS001c_Open Ocean and
 Indian Ocean_GS120_Open Ocean). Panel (a) columns: Component 1 to Component 5.
 Panel (c) columns: Chlorophyll, Salinity, Temperature, Water Depth,
 Sample Depth, Insolation. Legend "General": High, Medium, Low, NA.
 Legend "Water depth": >4000m, 2000-4000m, 900-2000m, 100-200m, 20-100m, 0-20m.]

 Figure 3: a) Niche-space distributions for our five components (H^T); b) the
 site-similarity matrix (Ĥ^T Ĥ); c) environmental variables for the sites. The
 matrices are aligned so that the same row corresponds to the same site in each
 matrix. Sites are ordered by applying spectral reordering to the similarity
 matrix (see Materials and Methods). Rows are aligned across the three matrices.

 Non-negative matrix factorization w/ Weitz, Dushoff, Langille, Neches, Levin, etc
 Jiang et al. In press PLoS One.
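
As a rough illustration of what the factorization produces, the sketch below factors a small non-negative matrix V (rows are features, columns are sites) into W and H using the standard multiplicative update rules, then forms a site-by-site similarity matrix from H. Everything here is an assumption made for illustration: the data are invented, the update scheme and the function names are not taken from the slide, and H is used without the normalization that the caption's Ĥ may imply.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


def nmf(v, k, iterations=500, seed=0, eps=1e-9):
    """Factor V (n x m) into non-negative W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h


# Invented abundance matrix: 4 features (rows) across 4 sites (columns).
V = [[5.0, 4.0, 0.0, 0.0],
     [4.0, 5.0, 0.0, 1.0],
     [0.0, 0.0, 3.0, 4.0],
     [1.0, 0.0, 4.0, 3.0]]

W, H = nmf(V, k=2)
site_weights = transpose(H)           # one row per site, one column per component
site_similarity = matmul(site_weights, H)   # H^T H, sites x sites
print(round(frobenius_error(V, W, H), 3))
```

Each row of `site_weights` plays the role of one row in panel (a), and `site_similarity` corresponds to the kind of matrix shown in panel (b).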
Uses of Phylogeny
                    in Genomics and Metagenomics

                                Example 3:

                       Selecting Organisms for Study




GEBA




                            http://www.jgi.doe.gov/programs/GEBA/pilot.html
GEBA: Components
         • Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan
           Eisen, Eddy Rubin, Jim Bristow)
         • Project management (David Bruce, Eileen Dalin, Lynne
           Goodwin)
         • Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)
         • Sequencing and closure (Eileen Dalin, Susan Lucas, Alla
           Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen,
           Jan-Fang Cheng)
         • Annotation and data release (Nikos Kyrpides, Victor
           Markowitz, et al.)
         • Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu,
           Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain,
           Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati,
           Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla)
         • Adopt a microbe education project (Cheryl Kerfeld)
         • Outreach (David Gilbert)
         • $$$ (DOE, Eddy Rubin, Jim Bristow)

GEBA Now

        • 300+ genomes
        • Rich sampling of major groups of
          cultured organisms




GEBA Lesson 1




Protein Family Rarefaction

        • Take data set of multiple complete
          genomes
        • Identify all protein families using MCL
        • Plot # of genomes vs. # of protein families
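
The rarefaction steps above can be sketched as follows, assuming the protein families have already been identified (the MCL clustering step is not shown). The genome names, family identifiers, and the function name `rarefaction_curve` are hypothetical. The curve is averaged over random genome orderings so that it does not depend on the order in which genomes happen to be listed.

```python
import random


def rarefaction_curve(genome_families, permutations=100, seed=0):
    """Mean number of distinct protein families after adding 1..N genomes."""
    rng = random.Random(seed)
    genomes = list(genome_families)
    totals = [0] * len(genomes)
    for _ in range(permutations):
        rng.shuffle(genomes)                  # random order of genome addition
        seen = set()
        for i, genome in enumerate(genomes):
            seen |= genome_families[genome]   # add this genome's families
            totals[i] += len(seen)
    return [total / permutations for total in totals]


# Invented input: each genome maps to the set of families it contains.
genome_families = {
    "genome_1": {"fam1", "fam2", "fam3"},
    "genome_2": {"fam1", "fam2", "fam4"},
    "genome_3": {"fam1", "fam5", "fam6"},
}

curve = rarefaction_curve(genome_families)
for n_genomes, n_families in enumerate(curve, start=1):
    print(n_genomes, n_families)
```

Plotting `curve` against the number of genomes gives the rarefaction plot. A curve that keeps climbing means that each added genome is still contributing new protein families.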




Wu et al. 2009 Nature 462, 1056-1060

Synapomorphies exist




Wu et al. 2009 Nature 462, 1056-1060

GEBA Lesson 2




[Figure: "Sargasso Phylotypes" / "Metagenomic Phylotyping" bar chart.
Y-axis: Weighted % of Clones (0 to 0.500). X-axis: Major Phylogenetic Group
(Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria,
Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes,
Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria,
Deinococcus-Thermus, Euryarchaeota, Crenarchaeota).
Legend: EFG, EFTu, rRNA, RecA, RpoB, HSP70.]

GEBA benefits phylotyping & functional prediction

                            Venter et al., Science 304: 66-74. 2004

GEBA improves genome annotation



            • Took 56 GEBA genomes and compared results vs. 56
              randomly sampled new genomes
            • Better definition of protein family sequence “patterns”
            • Greatly improves “comparative” and “evolutionary”
              based predictions
            • Conversion of hypotheticals into conserved hypotheticals
            • Linking distantly related members of protein families
            • Improved non-homology prediction




Metagenomic Phylotyping

[Figure: Sargasso Phylotypes. Bar chart of Weighted % of Clones (y-axis, 0 to 0.500) by Major Phylogenetic Group (x-axis: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Legend (marker genes): EFG, EFTu, rRNA, RecA, RpoB, HSP70. Annotation: "But not a lot".]

                            Venter et al., Science 304: 66-74. 2004
Improving Functional Predictions




Sifting Families

[Figure: SFams construction workflow, panels A to C. Boxes shown: Representative Genomes; Extract Protein Annotation; All v. All BLAST; Homology Clustering (MCL); SFams; Align & Build HMMs; HMMs; New Genomes; Extract Protein Annotation; Screen for Homologs.]

                 Sharpton et al. submitted, Figure 1
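
The clustering step named in this workflow can be illustrated with a minimal sketch. The slide specifies MCL for homology clustering; the code below swaps in plain single-linkage clustering (connected components via union-find) over all-vs-all hits, only because it fits in a few lines. The hit tuples, the `max_evalue` threshold, and the function name `cluster_hits` are assumptions made for illustration and are not taken from the SFams pipeline.

```python
from collections import defaultdict


def cluster_hits(hits, max_evalue=1e-10):
    """Group proteins into families by single-linkage over pairwise hits.

    hits: iterable of (query_id, subject_id, evalue) tuples, e.g. parsed
    from tabular all-vs-all BLAST output (hypothetical input format).
    This is a stand-in for MCL, not an implementation of it.
    """
    parent = {}

    def find(x):
        # Walk to the root representative, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        # Register both proteins so weak hits still yield singletons.
        root_q, root_s = find(query), find(subject)
        if evalue <= max_evalue and root_q != root_s:
            parent[root_q] = root_s

    families = defaultdict(set)
    for protein in list(parent):
        families[find(protein)].add(protein)
    # Sort for deterministic output: largest families first, then by name.
    return sorted((sorted(members) for members in families.values()),
                  key=lambda fam: (-len(fam), fam))


# Made-up hits for illustration only.
hits = [
    ("recA_1", "recA_2", 1e-80),
    ("recA_2", "recA_3", 1e-60),
    ("rpoB_1", "rpoB_2", 1e-120),
    ("recA_3", "rpoB_1", 0.5),  # weak hit, ignored by the threshold
]
families = cluster_hits(hits)
```

Single-linkage merges two families whenever any one pair of members passes the threshold, so it is more prone to over-merging than a graph-flow method such as MCL; treat the sketch as a way to see the data flow, not as a replacement.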


Improving Phylotyping




More Markers
                               Phylogenetic group      Genomes  Genes    Marker candidates
                               Archaea                 62       145415   106
                               Actinobacteria          63       267783   136
                               Alphaproteobacteria     94       347287   121
                               Betaproteobacteria      56       266362   311
                               Gammaproteobacteria     126      483632   118
                               Deltaproteobacteria     25       102115   206
                               Epsilonproteobacteria   18       33416    455
                               Bacteroidetes           25       71531    286
                               Chlamydiae              13       13823    560
                               Chloroflexi             10       33577    323
                               Cyanobacteria           36       124080   590
                               Firmicutes              106      312309   87
                               Spirochaetes            18       38832    176
                               Thermi                  5        14160    974
                               Thermotogae             9        17037    684




Better Reference Tree




    Morgan et al.
    submitted
GEBA Lesson 3



                            We have still only scratched the
                             surface of microbial diversity




PD: All




                                From Wu et al. 2009 Nature 462, 1056-1060
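
"PD" on this slide stands for phylogenetic diversity. As a minimal sketch, assuming the common rooted definition (Faith's PD: the total length of the branches connecting a set of taxa to the root, each branch counted once), the measure could be computed as below. The toy tree, its branch lengths, and the function name `phylogenetic_diversity` are made up for illustration and are not data or code from Wu et al. 2009.

```python
def phylogenetic_diversity(tree, taxa):
    """Sum branch lengths on the paths from each taxon to the root.

    tree: dict mapping node -> (parent, branch_length); the root maps to
    (None, 0.0). A branch shared by several taxa is counted only once.
    """
    counted = set()
    total = 0.0
    for taxon in taxa:
        node = taxon
        # Climb toward the root, stopping at branches already counted.
        while node is not None and node not in counted:
            parent, length = tree[node]
            counted.add(node)
            total += length
            node = parent
    return total


# Hypothetical toy tree: ((A:1, B:2)X:0.5, C:3)root
toy_tree = {
    "root": (None, 0.0),
    "X": ("root", 0.5),
    "A": ("X", 1.0),
    "B": ("X", 2.0),
    "C": ("root", 3.0),
}

pd_ab = phylogenetic_diversity(toy_tree, ["A", "B"])        # 1 + 0.5 + 2
pd_all = phylogenetic_diversity(toy_tree, ["A", "B", "C"])  # adds C's branch
```

In this toy tree, adding the distant taxon C contributes its full branch of 3.0, while adding B to a set that already contains A contributes only 2.0 because the branch above X is already counted. That asymmetry is why sampling new lineages increases PD faster than sampling close relatives.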
GEBA uncultured
      Number of SAGs from Candidate Phyla

                                               OD1   OP11   OP3   SAR406
      Site A: Hydrothermal vent                 4      1     -      -
      Site B: Gold Mine                         6     13     2      -
      Site C: Tropical gyres (Mesopelagic)      -      -     -      2
      Site D: Tropical gyres (Photic zone)      1      -     -      -




 Sample collections at 4 additional sites are underway.




                                                                              Phil Hugenholtz




GEBA Lesson IV



                            Need Experiments from Across
                                 the Tree of Life too




Conclusion




MICROBES




Acknowledgements

              • $$$
                    •   DOE
                    •   NSF
                    •   GBMF
                    •   Sloan
                    •   DARPA
                    •   DSMZ
                    •   DHS
              • People, places
                    • DOE JGI: Eddy Rubin, Phil Hugenholtz, Nikos Kyrpides
                    • UC Davis: Aaron Darling, Dongying Wu, Holly Bik, Russell
                      Neches, Jenna Morgan-Lang
                    • Other: Jessica Green, Katie Pollard, Martin Wu, Tom Slezak,
                      Jack Gilbert, Steven Kembel, J. Craig Venter, Naomi Ward,
                      Hans-Peter Klenk



Thursday, September 6, 12

Contenu connexe

Tendances (7)

Wang_Yang_201609_PhD
Wang_Yang_201609_PhDWang_Yang_201609_PhD
Wang_Yang_201609_PhD
 
Retos de la Bioinformatica
Retos de la BioinformaticaRetos de la Bioinformatica
Retos de la Bioinformatica
 
I Psc
I PscI Psc
I Psc
 
Identification and characterization of effector genes from wheat stripe rust
Identification and characterization of effector genes from wheat stripe rustIdentification and characterization of effector genes from wheat stripe rust
Identification and characterization of effector genes from wheat stripe rust
 
Mehlomakulu N N ethesis 20FEB15
Mehlomakulu N N ethesis 20FEB15Mehlomakulu N N ethesis 20FEB15
Mehlomakulu N N ethesis 20FEB15
 
Comparative Genomics for Marker Development in Cassava
Comparative Genomics for Marker Development in CassavaComparative Genomics for Marker Development in Cassava
Comparative Genomics for Marker Development in Cassava
 
Gene expression on rat1 fibroblast cells after transformation by evi1
Gene expression on rat1 fibroblast cells after transformation by evi1Gene expression on rat1 fibroblast cells after transformation by evi1
Gene expression on rat1 fibroblast cells after transformation by evi1
 

En vedette

Jonathan Eisen Talk for #UCDavis #HostMicrobe on Phylogeny & Microbiomes
Jonathan Eisen Talk for #UCDavis #HostMicrobe on Phylogeny & MicrobiomesJonathan Eisen Talk for #UCDavis #HostMicrobe on Phylogeny & Microbiomes
Jonathan Eisen Talk for #UCDavis #HostMicrobe on Phylogeny & MicrobiomesJonathan Eisen
 
Pszczolkowski et al. 2016 Effect of Craft Brewer's Yeast on Fermentation and ...
Pszczolkowski et al. 2016 Effect of Craft Brewer's Yeast on Fermentation and ...Pszczolkowski et al. 2016 Effect of Craft Brewer's Yeast on Fermentation and ...
Pszczolkowski et al. 2016 Effect of Craft Brewer's Yeast on Fermentation and ...Robert "Rusty" Bryant
 
Microbial Diversity: Tapping the Untapped
Microbial Diversity: Tapping the UntappedMicrobial Diversity: Tapping the Untapped
Microbial Diversity: Tapping the Untappedsachhatre
 
Choosing the Right Microbial Typing Method: A Quantitative Approach
Choosing the Right Microbial Typing Method: A Quantitative ApproachChoosing the Right Microbial Typing Method: A Quantitative Approach
Choosing the Right Microbial Typing Method: A Quantitative ApproachJoão André Carriço
 
Genetically modified organism
Genetically modified organismGenetically modified organism
Genetically modified organismerikatrinidad
 
Bacterial diversity presentation1
Bacterial diversity presentation1Bacterial diversity presentation1
Bacterial diversity presentation1Deepika Rana
 
Microbial diversity and ecology: Microbial evolution
Microbial diversity and ecology: Microbial evolutionMicrobial diversity and ecology: Microbial evolution
Microbial diversity and ecology: Microbial evolutionChun-Yao Chen
 
Ruminant Animals
Ruminant AnimalsRuminant Animals
Ruminant Animalsb.stev
 
B.Sc. Microbiology II Bacteriology Unit III Microbial Diversity
B.Sc. Microbiology II Bacteriology Unit III Microbial DiversityB.Sc. Microbiology II Bacteriology Unit III Microbial Diversity
B.Sc. Microbiology II Bacteriology Unit III Microbial DiversityRai University
 
BiS2C: Lecture 9: Microbial Diversity
BiS2C: Lecture 9: Microbial DiversityBiS2C: Lecture 9: Microbial Diversity
BiS2C: Lecture 9: Microbial DiversityJonathan Eisen
 
Ruminants ( Shakira sulehri)
Ruminants ( Shakira sulehri)Ruminants ( Shakira sulehri)
Ruminants ( Shakira sulehri)Shakira Sulehri
 
Bakers yeast production and characteristics
Bakers yeast production and characteristicsBakers yeast production and characteristics
Bakers yeast production and characteristicsHazem Hussein
 
Intro to ruminant digestion
Intro to ruminant digestionIntro to ruminant digestion
Intro to ruminant digestionDr Neo
 

En vedette (20)

Jonathan Eisen Talk for #UCDavis #HostMicrobe on Phylogeny & Microbiomes
Jonathan Eisen Talk for #UCDavis #HostMicrobe on Phylogeny & MicrobiomesJonathan Eisen Talk for #UCDavis #HostMicrobe on Phylogeny & Microbiomes
Jonathan Eisen Talk for #UCDavis #HostMicrobe on Phylogeny & Microbiomes
 
Pszczolkowski et al. 2016 Effect of Craft Brewer's Yeast on Fermentation and ...
Pszczolkowski et al. 2016 Effect of Craft Brewer's Yeast on Fermentation and ...Pszczolkowski et al. 2016 Effect of Craft Brewer's Yeast on Fermentation and ...
Pszczolkowski et al. 2016 Effect of Craft Brewer's Yeast on Fermentation and ...
 
Microbial Diversity: Tapping the Untapped
Microbial Diversity: Tapping the UntappedMicrobial Diversity: Tapping the Untapped
Microbial Diversity: Tapping the Untapped
 
Choosing the Right Microbial Typing Method: A Quantitative Approach
Choosing the Right Microbial Typing Method: A Quantitative ApproachChoosing the Right Microbial Typing Method: A Quantitative Approach
Choosing the Right Microbial Typing Method: A Quantitative Approach
 
Genetically modified organism
Genetically modified organismGenetically modified organism
Genetically modified organism
 
De Long - Natural Materials, Systems and Extremophiles - Spring Review 2012
De Long - Natural Materials, Systems and Extremophiles - Spring Review 2012De Long - Natural Materials, Systems and Extremophiles - Spring Review 2012
De Long - Natural Materials, Systems and Extremophiles - Spring Review 2012
 
Bacterial diversity presentation1
Bacterial diversity presentation1Bacterial diversity presentation1
Bacterial diversity presentation1
 
Extremophiles
ExtremophilesExtremophiles
Extremophiles
 
Microbial diversity and ecology: Microbial evolution
Microbial diversity and ecology: Microbial evolutionMicrobial diversity and ecology: Microbial evolution
Microbial diversity and ecology: Microbial evolution
 
Soil microbial diversity
Soil microbial diversitySoil microbial diversity
Soil microbial diversity
 
Ruminant Animals
Ruminant AnimalsRuminant Animals
Ruminant Animals
 
B.Sc. Microbiology II Bacteriology Unit III Microbial Diversity
B.Sc. Microbiology II Bacteriology Unit III Microbial DiversityB.Sc. Microbiology II Bacteriology Unit III Microbial Diversity
B.Sc. Microbiology II Bacteriology Unit III Microbial Diversity
 
Common Extremophiles
Common ExtremophilesCommon Extremophiles
Common Extremophiles
 
Agricultural microbiology
Agricultural microbiologyAgricultural microbiology
Agricultural microbiology
 
Extremophiles
ExtremophilesExtremophiles
Extremophiles
 
BiS2C: Lecture 9: Microbial Diversity
BiS2C: Lecture 9: Microbial DiversityBiS2C: Lecture 9: Microbial Diversity
BiS2C: Lecture 9: Microbial Diversity
 
Ruminants ( Shakira sulehri)
Ruminants ( Shakira sulehri)Ruminants ( Shakira sulehri)
Ruminants ( Shakira sulehri)
 
Bakers yeast production and characteristics
Bakers yeast production and characteristicsBakers yeast production and characteristics
Bakers yeast production and characteristics
 
Rumen manupilation
Rumen manupilationRumen manupilation
Rumen manupilation
 
Intro to ruminant digestion
Intro to ruminant digestionIntro to ruminant digestion
Intro to ruminant digestion
 

Plus de Jonathan Eisen

Eisen.CentralValley2024.pdf
Eisen.CentralValley2024.pdfEisen.CentralValley2024.pdf
Eisen.CentralValley2024.pdfJonathan Eisen
 
Phylogenomics and the Diversity and Diversification of Microbes
Phylogenomics and the Diversity and Diversification of MicrobesPhylogenomics and the Diversity and Diversification of Microbes
Phylogenomics and the Diversity and Diversification of MicrobesJonathan Eisen
 
Talk by Jonathan Eisen for LAMG2022 meeting
Talk by Jonathan Eisen for LAMG2022 meetingTalk by Jonathan Eisen for LAMG2022 meeting
Talk by Jonathan Eisen for LAMG2022 meetingJonathan Eisen
 
Thoughts on UC Davis' COVID Current Actions
Thoughts on UC Davis' COVID Current ActionsThoughts on UC Davis' COVID Current Actions
Thoughts on UC Davis' COVID Current ActionsJonathan Eisen
 
Phylogenetic and Phylogenomic Approaches to the Study of Microbes and Microbi...
Phylogenetic and Phylogenomic Approaches to the Study of Microbes and Microbi...Phylogenetic and Phylogenomic Approaches to the Study of Microbes and Microbi...
Phylogenetic and Phylogenomic Approaches to the Study of Microbes and Microbi...Jonathan Eisen
 
A Field Guide to Sars-CoV-2
A Field Guide to Sars-CoV-2A Field Guide to Sars-CoV-2
A Field Guide to Sars-CoV-2Jonathan Eisen
 
EVE198 Summer Session Class 4
EVE198 Summer Session Class 4EVE198 Summer Session Class 4
EVE198 Summer Session Class 4Jonathan Eisen
 
EVE198 Summer Session 2 Class 1
EVE198 Summer Session 2 Class 1 EVE198 Summer Session 2 Class 1
EVE198 Summer Session 2 Class 1 Jonathan Eisen
 
EVE198 Summer Session 2 Class 2 Vaccines
EVE198 Summer Session 2 Class 2 Vaccines EVE198 Summer Session 2 Class 2 Vaccines
EVE198 Summer Session 2 Class 2 Vaccines Jonathan Eisen
 
EVE198 Spring2021 Class1 Introduction
EVE198 Spring2021 Class1 IntroductionEVE198 Spring2021 Class1 Introduction
EVE198 Spring2021 Class1 IntroductionJonathan Eisen
 
EVE198 Spring2021 Class2
EVE198 Spring2021 Class2EVE198 Spring2021 Class2
EVE198 Spring2021 Class2Jonathan Eisen
 
EVE198 Spring2021 Class5 Vaccines
EVE198 Spring2021 Class5 VaccinesEVE198 Spring2021 Class5 Vaccines
EVE198 Spring2021 Class5 VaccinesJonathan Eisen
 
EVE198 Winter2020 Class 8 - COVID RNA Detection
EVE198 Winter2020 Class 8 - COVID RNA DetectionEVE198 Winter2020 Class 8 - COVID RNA Detection
EVE198 Winter2020 Class 8 - COVID RNA DetectionJonathan Eisen
 
EVE198 Winter2020 Class 1 Introduction
EVE198 Winter2020 Class 1 IntroductionEVE198 Winter2020 Class 1 Introduction
EVE198 Winter2020 Class 1 IntroductionJonathan Eisen
 
EVE198 Winter2020 Class 3 - COVID Testing
EVE198 Winter2020 Class 3 - COVID TestingEVE198 Winter2020 Class 3 - COVID Testing
EVE198 Winter2020 Class 3 - COVID TestingJonathan Eisen
 
EVE198 Winter2020 Class 5 - COVID Vaccines
EVE198 Winter2020 Class 5 - COVID VaccinesEVE198 Winter2020 Class 5 - COVID Vaccines
EVE198 Winter2020 Class 5 - COVID VaccinesJonathan Eisen
 
EVE198 Winter2020 Class 9 - COVID Transmission
EVE198 Winter2020 Class 9 - COVID TransmissionEVE198 Winter2020 Class 9 - COVID Transmission
EVE198 Winter2020 Class 9 - COVID TransmissionJonathan Eisen
 
EVE198 Fall2020 "Covid Mass Testing" Class 8 Vaccines
EVE198 Fall2020 "Covid Mass Testing" Class 8 VaccinesEVE198 Fall2020 "Covid Mass Testing" Class 8 Vaccines
EVE198 Fall2020 "Covid Mass Testing" Class 8 VaccinesJonathan Eisen
 
EVE198 Fall2020 "Covid Mass Testing" Class 2: Viruses, COIVD and Testing
EVE198 Fall2020 "Covid Mass Testing" Class 2: Viruses, COIVD and TestingEVE198 Fall2020 "Covid Mass Testing" Class 2: Viruses, COIVD and Testing
EVE198 Fall2020 "Covid Mass Testing" Class 2: Viruses, COIVD and TestingJonathan Eisen
 
EVE198 Fall2020 "Covid Mass Testing" Class 1 Introduction
EVE198 Fall2020 "Covid Mass Testing" Class 1 IntroductionEVE198 Fall2020 "Covid Mass Testing" Class 1 Introduction
EVE198 Fall2020 "Covid Mass Testing" Class 1 IntroductionJonathan Eisen
 

Plus de Jonathan Eisen (20)

Eisen.CentralValley2024.pdf
Eisen.CentralValley2024.pdfEisen.CentralValley2024.pdf
Eisen.CentralValley2024.pdf
 
Phylogenomics and the Diversity and Diversification of Microbes
"Phylogenomic approaches to microbial diversity" Talk by Jonathan Eisen at #IlluminaBayArea meeting

  • 1. Phylogenomic Approaches to the Study of Microbial Diversity September 6, 2012 Bay Area Illumina User’s Meeting Jonathan A. Eisen University of California, Davis @phylogenomics Thursday, September 6, 12
  • 2. Phylogenomic Approaches to Studying Microbial Diversity Example 1: Phylotyping and Phylogenetic Diversity
  • 3. rRNA Phylotyping. Workflow: DNA extraction; PCR (makes lots of copies of the rRNA genes in sample); sequence rRNA genes; sequence alignment = data matrix.
      Sequences:
      rRNA1  5’...ACACACATAGGTGGAGCTAGCGATCGATCGA... 3’
      rRNA2  5’..TACAGTATAGGTGGAGCTAGCGACGATCGA... 3’
      rRNA3  5’...ACGGCAAAATAGGTGGATTCTAGCGATATAGA... 3’
      rRNA4  5’...ACGGCCCGATAGGTGGATTCTAGCGCCATAGA... 3’
      Data matrix:
      rRNA1    A C A C A C
      rRNA2    T A C A G T
      rRNA3    C A C T G T
      rRNA4    C A C A G T
      E. coli  A G A C A G
      Humans   T A T A G T
      Yeast    T A C A G T
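
The "sequence alignment = data matrix" idea on this slide can be made concrete with a short sketch. It takes the six-column matrix shown on the slide and counts differing columns between each pair of rows. The function name `mismatches` and the use of a raw mismatch count (rather than a model-corrected evolutionary distance) are assumptions made for illustration; the slide does not specify a distance measure.

```python
from itertools import combinations

# Rows of the data matrix as shown on the slide (six aligned columns each).
alignment = {
    "rRNA1": "ACACAC",
    "rRNA2": "TACAGT",
    "rRNA3": "CACTGT",
    "rRNA4": "CACAGT",
    "E. coli": "AGACAG",
    "Humans": "TATAGT",
    "Yeast": "TACAGT",
}


def mismatches(a: str, b: str) -> int:
    """Count aligned columns at which two equal-length rows differ."""
    if len(a) != len(b):
        raise ValueError("rows must have the same number of columns")
    return sum(x != y for x, y in zip(a, b))


# Pairwise mismatch counts, keyed by a (name, name) tuple in insertion order.
distances = {
    (p, q): mismatches(alignment[p], alignment[q])
    for p, q in combinations(alignment, 2)
}

# Show the three closest pairs.
for (p, q), d in sorted(distances.items(), key=lambda kv: kv[1])[:3]:
    print(f"{p} vs {q}: {d} differing columns")
```

A matrix of such pairwise values is the usual starting point for distance-based clustering or tree building, which is what the following slides illustrate.
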
  • 5. Phylotyping (figure labels: E. coli, Humans, Yeast)
  • 6. Phylotyping (figure labels: E. coli, Humans, Yeast; OTU1, OTU2, OTU3, OTU4)
  • 7. Phylotyping: Cluster (figure labels: A, B, C)
  • 8. Phylotyping: Cluster; OTUs (figure labels: A, B, C)
  • 9. Phylotyping: Cluster; OTUs (figure labels: A, B, C; OTU1, OTU2, OTU3, OTU4)
  • 10. Phylotyping: Cluster; OTUs (figure labels: A, B, C; OTU1, OTU2, OTU3, OTU4; E. coli, Humans, Yeast)
  • 11. Phylotyping (figure labels: E. coli, Humans, Yeast)
  • 12. Phylotyping: Just Phylogeny (figure labels: E. coli, Humans, Yeast)
  • 13. Phylotyping: Cluster; OTUs; Just Phylogeny (figure labels: A, B, C; OTU1, OTU2, OTU3, OTU4; E. coli, Humans, Yeast)
  • 14. Phylotyping • OTUs • Taxonomic lists • Relative abundance of taxa • Ecological metrics (alpha and beta diversity) • Phylogenetic metrics • Binning • Identification of novel groups • Clades • Rates of change • LGT • Convergence • PD • Phylogenetic ecology (e.g., UniFrac)
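
As a rough illustration of the "ecological metrics (alpha and beta diversity)" item on this slide, the sketch below computes one common alpha-diversity measure (the Shannon index) and one common beta-diversity measure (Bray-Curtis dissimilarity) from OTU count vectors. The choice of these two measures and the sample counts are assumptions for illustration; the slide does not name specific metrics.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over OTUs with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no observations")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors over the same OTUs."""
    if len(a) != len(b):
        raise ValueError("count vectors must cover the same OTUs")
    denom = sum(a) + sum(b)
    if denom == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denom


# Hypothetical counts for OTU1..OTU4 in two samples.
sample_1 = [10, 5, 0, 1]
sample_2 = [2, 5, 8, 1]
print(f"alpha (Shannon), sample 1: {shannon(sample_1):.3f}")
print(f"alpha (Shannon), sample 2: {shannon(sample_2):.3f}")
print(f"beta (Bray-Curtis): {bray_curtis(sample_1, sample_2):.3f}")
```
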
  • 15. What’s New in Phylotyping
  • 16. What’s New in Phylotyping I • More PCR products • Deeper sequencing • The rare biosphere • Relative abundance estimates • More samples (with barcoding) • Time series • Spatially diverse sampling • Fine scale sampling
  • 19. Things You Could Do • Mississippi River: 2320 miles long
  • 20. Things You Could Do • Mississippi River: 2320 miles long • 1 site / mile • 3 samples / site • 6960 samples • rRNA PCR w/ barcodes • metagenomics w/ barcodes • MiSeq run: • 30 million sequence reads • 4310 sequences / sample • HiSeq 2000 • 6 billion sequence reads • 862,068 sequences / sample
  • 21. Things You Could Do • Mississippi River: 12,249,600 feet long • 1 site / 500 feet • 3 samples / site • 73497 samples • rRNA PCR w/ barcodes • metagenomics w/ barcodes • MiSeq run: • 30 million sequence reads • 408 sequences / sample • HiSeq 2000 • 6 billion sequence reads • 81,635 sequences / sample
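
The sampling arithmetic on slides 20 and 21 can be reproduced with a few lines of integer arithmetic. The function names are hypothetical, and rounding down at each step is an assumption that happens to match the figures on the slides.

```python
MISEQ_READS = 30_000_000        # reads per MiSeq run, as stated on the slides
HISEQ_READS = 6_000_000_000     # reads for the HiSeq 2000, as stated on the slides


def samples_along_river(length, spacing, samples_per_site=3):
    """Number of samples for one site every `spacing` units, rounded down."""
    return (length * samples_per_site) // spacing


def reads_per_sample(total_reads, n_samples):
    """Reads available per barcoded sample, rounded down."""
    return total_reads // n_samples


# Slide 20: one site per mile along 2320 miles.
per_mile = samples_along_river(2320, 1)
# Slide 21: one site per 500 feet along 12,249,600 feet (2320 miles * 5280).
per_500_feet = samples_along_river(2320 * 5280, 500)

for label, n in (("1 site / mile", per_mile), ("1 site / 500 feet", per_500_feet)):
    print(label, n, "samples;",
          reads_per_sample(MISEQ_READS, n), "MiSeq reads / sample;",
          reads_per_sample(HISEQ_READS, n), "HiSeq reads / sample")
```
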
  • 22. What’s New in Phylotyping II • Metagenomics avoids biases of rRNA PCR (figure label: shotgun sequence)
  • 23. Metagenomic Phylotyping: Cluster; OTUs; Just Phylogeny (figure labels: A, B, C; OTU1, OTU2, OTU3, OTU4; E. coli, Humans, Yeast)
  • 24. Phylogenetic Challenge ??
  • 25. Phylogenetic Challenge ??
  • 26. Phylogenetic Challenge Multiple approaches
  • 27. Method 1: Each is an island
  • 28. Method 1: Each is an island • Build alignment, models, trees for full length seqs • Analyze fragmented reads one at a time
  • 29. Method 1: Each is an island • Build alignment, models, trees for full length seqs • Analyze fragmented reads one at a time
  • 30. Method 1: Each is an island • Build alignment, models, trees for full length seqs • Analyze fragmented reads one at a time
  • 31. STAP: Each sequence analyzed separately. Wu et al. 2008 PLoS One. Figure 1. A flow chart of the STAP pipeline. doi:10.1371/journal.pone.0002566.g001. Cropped text excerpt visible on the slide: "... STAP database, and the query sequence is aligned to them using the CLUSTALW profile alignment algorithm [40] as described above for domain assignment." Figure 2. Domain assignment. In Step 1, STAP assigns a domain to each query sequence based on its position in a maximum likelihood tree of representative ss-rRNA sequences. Because the tree illustrated here is not rooted, domain assignment would not be accurate and reliable (sequence similarity based methods cannot make an accurate assignment in this case either). However the figure illustrates an important role of the tree-based domain assignment step, namely automatic identification of deep-branching environmental ss-rRNAs. doi:10.1371/journal.pone.0002566.g002
  • 32. AMPHORA (figure label: Guide tree). Wu and Eisen, Genome Biology 2008 9:R151, doi:10.1186/gb-2008-9-10-r151
  • 33. Phylotyping w/ Proteins. Wu and Eisen, Genome Biology 2008 9:R151, doi:10.1186/gb-2008-9-10-r151
  • 34. Method 2: Most in the Family
  • 35. Phylogenetic Challenge xxxxxxxxxxxxxxxxxxxxxxx xxxxxx xxxxxxxxxxxxx xxxxxxxxxxxxxx xxxxxxxxxxxxxx ??
  • 36. Method 2: Most in family xxxxxxxxxxxxxxxxxxxxxxx xxxxxx xxxxxxxxxxxxx xxxxxxxxxxxxxx xxxxxxxxxxxxxx One tree for those w/ overlap
  • 37. rRNA in Sargasso Metagenome Venter et al., Science 304: 66. 2004
  • 38. RecA Phylotyping in Sargasso Data Venter et al., Science 304: 66. 2004
  • 39. Sargasso Phylotyping (bar chart titled "Sargasso Phylotypes"). Y axis: Weighted % of Clones (0 to 0.500). X axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Series: EFG, EFTu, HSP70, RecA, RpoB, rRNA. Venter et al., Science 304: 66. 2004
  • 40. STAP, QIIME, Mothur: Combine all into one alignment. Figure 1. A flow chart of the STAP pipeline. doi:10.1371/journal.pone.0002566.g001
  • 41. Method 3: All in the family
  • 42. Phylogenetic Challenge ??
  • 43. Phylogenetic Challenge A single tree with everything?
  • 44. rRNA analysis: Cluster; OTUs; Just Phylogeny (figure labels: A, B, C; OTU1, OTU2, OTU3, OTU4; E. coli, Humans, Yeast)
  • 45. PhylOTU. Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in workflow of PhylOTU. See Results section for details. doi:10.1371/journal.pcbi.1001061.g001. Cropped text excerpts visible on the slide: "... used to build the profile, resulting in a multiple sequence alignment of full-length reference sequences and metagenomic reads. The final step of the alignment process is a ..." and "... PD versus PID clustering, 2) to explore overlap betw... clusters and recognized taxonomic designations, and ... the accuracy of PhylOTU clusters from shotgun re...". Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O'Dwyer JP, Green JL, Eisen JA, Pollard KS. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061
  • 46. RecA, RpoB in GOS (figure labels: GOS 1, GOS 2, GOS 3, GOS 4, GOS 5). Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, et al. (2011) Stalking the Fourth Domain in Metagenomic Data: Searching for, Discovering, and Interpreting Novel, Deep Branches in Marker Gene Phylogenetic Trees. PLoS ONE 6(3): e18011. doi:10.1371/journal.pone.0018011
  • 47. Phylosift / pplacer. Aaron Darling, Guillaume Jospin, Holly Bik, Erik Matsen, Eric Lowe, and others
  • 48. Method 4: All in the genome
  • 49. Multiple Genes? A single tree with everything?
  • 50. Kembel Combiner. Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214
  • 51. Kembel Combiner. Cropped text excerpt visible on the slide: "... typically used as a qualitative measure because duplicate sequences are usually removed from the tree. However, the test may be used in a semiquantitative manner if all clones, even those with identical or near-identical sequences, are included in the tree (13). Here we describe a quantitative version of UniFrac that we call “weighted UniFrac.” We show that weighted UniFrac behaves similarly to the FST test in situations where both a ..." FIG. 1. Calculation of the unweighted and the weighted UniFrac measures. Squares and circles represent sequences from two different environments. (a) In unweighted UniFrac, the distance between the circle and square communities is calculated as the fraction of the branch length that has descendants from either the square or the circle environment (black) but not both (gray). (b) In weighted UniFrac, branch lengths are weighted by the relative abundance of sequences in the square and circle communities; square sequences are weighted twice as much as circle sequences because there are twice as many total circle sequences in the data set. The width of branches is proportional to the degree to which each branch is weighted in the calculations, and gray branches have no weight. Branches 1 and 2 have heavy weights since the descendants are biased toward the square and circles, respectively. Branch 3 contributes no value since it has an equal contribution from circle and square sequences after normalization. Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214
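
The unweighted measure described in part (a) of the figure caption on this slide can be sketched as follows. The tree is represented as a flat list of branches, each holding its length and the set of leaves beneath it; that representation, the function name, and the toy tree are assumptions for illustration. The weighted variant from part (b) is not implemented here.

```python
def unweighted_unifrac(branches, community_a, community_b):
    """Fraction of observed branch length leading to only one community.

    `branches` is an iterable of (length, leaves) pairs, where `leaves` is the
    set of leaf names descending from that branch.
    """
    unique = 0.0
    observed = 0.0
    for length, leaves in branches:
        in_a = bool(leaves & community_a)
        in_b = bool(leaves & community_b)
        if in_a or in_b:
            observed += length       # branch has descendants in at least one community
            if in_a != in_b:
                unique += length     # descendants in exactly one community
    if observed == 0:
        raise ValueError("no branch leads to either community")
    return unique / observed


# Toy tree ((s1:1, s2:1):1, (s3:1, s4:1):1) written out branch by branch.
toy_branches = [
    (1.0, {"s1"}), (1.0, {"s2"}), (1.0, {"s1", "s2"}),
    (1.0, {"s3"}), (1.0, {"s4"}), (1.0, {"s3", "s4"}),
]

print(unweighted_unifrac(toy_branches, {"s1", "s2"}, {"s3", "s4"}))  # separate clades
print(unweighted_unifrac(toy_branches, {"s1", "s3"}, {"s2", "s4"}))  # interleaved
```

Communities confined to separate clades share no branch length and score at the maximum, while interleaved communities share the internal branches and score lower.
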
  • 52. Uses of Phylogeny in Genomics and Metagenomics Example 2: Functional Diversity and Functional Predictions
  • 53. Phylogenetic Prediction of Gene Function (flow chart with columns Example A, Method, Example B; genes labeled 1A, 2A, 3A, 1B, 2B, 3B in Example A and 1 to 6 in Example B, from Species 1, Species 2, Species 3). Method steps: choose gene(s) of interest; identify homologs; align sequences; calculate gene tree; overlay known functions onto tree; infer likely function of gene(s) of interest (marked "Ambiguous" in one example). Final panel: actual evolution (assumed to be unknown), with duplication events marked. Eisen, 1998 Genome Res 8: 163-167.
  • 54. Diversity of Proteorhodopsins Venter et al., 2004. Science 304: 66.
  • 55. Improving Functional Predictions • Same methods discussed for phylotyping improve phylogenomic functional prediction for protein families • Increase in sequence diversity helps too
  • 56. NMF in Metagenomes: Characterizing the niche-space distributions of components. Non-negative matrix factorization w/ Weitz, Dushoff, Langille, Neches, Levin, etc. Jiang et al. In press PLoS One. Figure 3: a) Niche-space distributions for our five components (H^T); b) the site-similarity matrix (Ĥ^T Ĥ); c) environmental variables for the sites. The matrices are aligned so that the same row corresponds to the same site in each matrix. Sites are ordered by applying spectral reordering to the similarity matrix (see Materials and Methods). Rows are aligned across the three matrices. (Figure labels: rows are GOS sampling sites, from Polynesia Archipelagos GS048a Coral Reef to North American East Coast GS005 Embayment; columns are Components 1 to 5 and the environmental variables Chlorophyll, Salinity, Temperature, Water Depth, Sample Depth, Insolation; legend levels General: High, Medium, Low, NA; water depth classes: >4000m, 2000-4000m, 900-2000m, 100-200m, 20-100m, 0-20m.)
  • 57. Uses of Phylogeny in Genomics and Metagenomics Example 3: Selecting Organisms for Study
  • 58. GEBA http://www.jgi.doe.gov/programs/GEBA/pilot.html
  • 59. GEBA: Components • Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow) • Project management (David Bruce, Eileen Dalin, Lynne Goodwin) • Culture collection and DNA prep (DSMZ, Hans-Peter Klenk) • Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng) • Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al) • Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla) • Adopt a microbe education project (Cheryl Kerfeld) • Outreach (David Gilbert) • $$$ (DOE, Eddy Rubin, Jim Bristow)
  • 60. GEBA Now • 300+ genomes • Rich sampling of major groups of cultured organisms
  • 61. GEBA Lesson 1
  • 62. Protein Family Rarefaction • Take data set of multiple complete genomes • Identify all protein families using MCL • Plot # of genomes vs. # of protein families
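
The rarefaction procedure on this slide can be sketched as an accumulation curve. The sketch assumes the protein families have already been identified (for example by MCL clustering, which is not performed here) and are supplied as one set of family identifiers per genome. The function name, the averaging over random genome orderings, and the toy data are assumptions for illustration.

```python
import random


def rarefaction_curve(genome_families, permutations=100, seed=0):
    """Mean number of distinct protein families seen after 1..N genomes.

    `genome_families` maps a genome name to the set of family identifiers
    found in that genome. Genome order is shuffled `permutations` times and
    the cumulative family counts are averaged.
    """
    rng = random.Random(seed)
    genomes = list(genome_families)
    totals = [0] * len(genomes)
    for _ in range(permutations):
        rng.shuffle(genomes)
        seen = set()
        for i, genome in enumerate(genomes):
            seen |= genome_families[genome]
            totals[i] += len(seen)
    return [t / permutations for t in totals]


# Hypothetical family assignments for three genomes.
toy = {
    "genome_1": {"fam1", "fam2", "fam3"},
    "genome_2": {"fam2", "fam3", "fam4"},
    "genome_3": {"fam1", "fam5", "fam6"},
}
curve = rarefaction_curve(toy)
for n, families in enumerate(curve, start=1):
    print(f"{n} genomes: {families:.2f} protein families on average")
```

Plotting the returned values against the number of genomes gives the "# of genomes vs. # of protein families" curve; a curve that keeps rising suggests that additional genomes are still adding new families.
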
  • 63. Wu et al. 2009 Nature 462, 1056-1060
  • 64. Wu et al. 2009 Nature 462, 1056-1060
  • 65. Wu et al. 2009 Nature 462, 1056-1060
  • 66. Wu et al. 2009 Nature 462, 1056-1060
  • 67. Wu et al. 2009 Nature 462, 1056-1060
  • 68. Synapomorphies exist Wu et al. 2009 Nature 462, 1056-1060
  • 69. GEBA Lesson 2
  • 70. Metagenomic Phylotyping. Overlay text: GEBA benefits phylotyping & functional prediction. (Bar chart titled "Sargasso Phylotypes". Y axis: Weighted % of Clones, 0 to 0.500. X axis: Major Phylogenetic Group, Alphaproteobacteria through Crenarchaeota. Series: EFG, EFTu, rRNA, RecA, RpoB, HSP70.) Venter et al., Science 304: 66-74. 2004
  • 71. GEBA improves genome annotation • Took 56 GEBA genomes and compared results vs. 56 randomly sampled new genomes • Better definition of protein family sequence “patterns” • Greatly improves “comparative” and “evolutionary” based predictions • Conversion of hypothetical into conserved hypotheticals • Linking distantly related members of protein families • Improved non-homology prediction
  • 72. Metagenomic Phylotyping. Overlay text: But not a lot. (Bar chart titled "Sargasso Phylotypes". Y axis: Weighted % of Clones, 0 to 0.500. X axis: Major Phylogenetic Group, Alphaproteobacteria through Crenarchaeota. Series: EFG, EFTu, rRNA, RecA, RpoB, HSP70.) Venter et al., Science 304: 66-74. 2004
  • 74. Sifting Families (flow chart with panels A, B, C; Sharpton et al. submitted, Figure 1). Flow chart labels: Representative Genomes; New Genomes; Extract Protein Annotation; All v. All BLAST; Homology Clustering (MCL); Screen for Homologs; SFams; Align & Build HMMs; HMMs
  • 76. More Markers
      Phylogenetic group      Genome Number   Gene Number   Marker Candidates
      Archaea                 62              145415        106
      Actinobacteria          63              267783        136
      Alphaproteobacteria     94              347287        121
      Betaproteobacteria      56              266362        311
      Gammaproteobacteria     126             483632        118
      Deltaproteobacteria     25              102115        206
      Epsilonproteobacteria   18              33416         455
      Bacteriodes             25              71531         286
      Chlamydiae              13              13823         560
      Chloroflexi             10              33577         323
      Cyanobacteria           36              124080        590
      Firmicutes              106             312309        87
      Spirochaetes            18              38832         176
      Thermi                  5               14160         974
      Thermotogae             9               17037         684
  • 77. Better Reference Tree Morgan et al. submitted
  • 78. GEBA Lesson 3 We have still only scratched the surface of microbial diversity
  • 79. PD: All From Wu et al. 2009 Nature 462, 1056-1060
  • 80. GEBA uncultured. Number of SAGs from Candidate Phyla (Phil Hugenholtz). Sample collections at 4 additional sites are underway.
      Site                                   OD1   OP1   OP3   SAR406
      Site A: Hydrothermal vent              4     1     -     -
      Site B: Gold Mine                      6     13    2     -
      Site C: Tropical gyres (Mesopelagic)   -     -     -     2
      Site D: Tropical gyres (Photic zone)   1     -     -     -
  • 81. GEBA Lesson IV Need Experiments from Across the Tree of Life too
  • 85. Acknowledgements • $$$ • DOE • NSF • GBMF • Sloan • DARPA • DSMZ • DHS • People, places • DOE JGI: Eddy Rubin, Phil Hugenholtz, Nikos Kyrpides • UC Davis: Aaron Darling, Dongying Wu, Holly Bik, Russell Neches, Jenna Morgan-Lang • Other: Jessica Green, Katie Pollard, Martin Wu, Tom Slezak, Jack Gilbert, Steven Kembel, J. Craig Venter, Naomi Ward, Hans-Peter Klenk