The document discusses sequencing genomes from diverse bacterial phyla to better understand microbial evolution and diversity. It describes a project that sequenced genomes from 8 understudied phyla to improve the bacterial phylogenetic tree. While this helped resolve deep evolutionary relationships, the tree remains biased towards a few well-studied phyla. More genome sequencing is needed to characterize the vast majority of bacterial diversity.
Phylogenomics and the Diversity and Diversification of Microbes
1. Phylogenomics and the Diversity
and Diversification of Microbes
Jonathan A. Eisen
UC Davis
Calacademy Talk
December 16, 2010
Monday, November 26, 12
2. Phylogenomics of Novelty
Origin of New Functions and Processes
• New genes
• Changes in old genes
• Changes in pathways
Genome Dynamics
• Evolvability
• Repair and recombination processes
• Intragenomic variation
Species Evolution
• Phylogenetic history
• Vertical vs. horizontal descent
• Needed to track gain/loss of processes, infer convergence
3. Phylogenomic Analysis
• Evolutionary reconstructions greatly
improve genome analyses
• Genome analysis greatly improves
evolutionary reconstructions
• There is a feedback loop such that these
should be integrated
4. Outline
• Introduction
• Phylogenomic Tales
– Selecting genomes for sequencing
– Species evolution
– Predicting functions of genes
– Uncultured microbes
5. Outline
• Introduction
• Phylogenomic Tales
– Selecting genomes for sequencing
– Species evolution
– Predicting functions of genes
– Uncultured microbes
• All of these will be told in the context of a
recent project, “A Genomic Encyclopedia of
Bacteria and Archaea” (aka GEBA)
24. Genome Sequences Have
Revolutionized Microbiology
• Predictions of metabolic processes
• Better vaccine and drug design
• New insights into mechanisms of evolution
• Genomes serve as template for functional
studies
• New enzymes and materials for engineering
and synthetic biology
26. rRNA Tree of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
27. As of 2002
• At least 40 phyla of bacteria
[Figure: phylum-level tree of bacteria, based on Hugenholtz, 2002. Labeled phyla: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]
28. As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
[Figure: phylum-level tree of bacteria, based on Hugenholtz, 2002]
29. As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
[Figure: phylum-level tree of bacteria, based on Hugenholtz, 2002]
30. 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
[Figure: phylum-level tree of bacteria, based on Hugenholtz, 2002]
31. Need for Tree Guidance Well Established
• Common approach within some eukaryotic
groups
• Many small projects funded to fill in some
bacterial or archaeal gaps
• Phylogenetic gaps in bacterial and archaeal
projects commonly lamented in literature
32. [Figure: phylum-level tree of bacteria, as on the preceding slides]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution I: sequence more phyla
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen & Ward, PIs
34. Bacterial aTOL Project AIMS
• Improve resolution of deep branches in the
bacterial tree
• Launch biological studies of these phyla
• Leverage data for interpreting
environmental surveys
36. [Figure: phylum-level tree of bacteria, as on the preceding slides]
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Still highly biased in terms of the tree
Eisen & Ward, PIs
38. [Figure: phylum-level tree of bacteria, as on the preceding slides]
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Archaea
Eisen & Ward, PIs
39. [Figure: phylum-level tree of bacteria, as on the preceding slides]
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Eukaryotes
Eisen & Ward, PIs
40. [Figure: phylum-level tree of bacteria, as on the preceding slides]
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Viruses
Eisen & Ward, PIs
41. [Figure: phylum-level tree of bacteria, as on the preceding slides]
• GEBA
• A genomic encyclopedia of bacteria and archaea
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution: Really Fill in the Tree
Eisen & Ward, PIs
43. GEBA Pilot Project Overview
• Identify major branches in rRNA tree for
which no genomes are available
• Identify those with a cultured representative in
DSMZ
• DSMZ grew > 200 of these and prepped DNA
• Sequence and finish 100+ (covering breadth of
bacterial/archaeal diversity)
• Annotate, analyze, release data
• Assess benefits of tree guided sequencing
• 1st paper Wu et al in Nature Dec 2009
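The selection idea in the steps above (favor branches of the tree that have no genome yet) can be sketched in code. This is a minimal illustration, not the GEBA pipeline: the toy tree, the taxon names, and the functions `path_to_root`, `phylogenetic_diversity`, and `greedy_pick` are all hypothetical, and the greedy criterion (pick the candidate that adds the most branch length to what is already sequenced) is one assumed way to operationalize "tree-guided" selection.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy tree: node -> (parent, length of the branch to the parent).
Tree = Dict[str, Tuple[Optional[str], float]]

TREE: Tree = {
    "root": (None, 0.0),
    "A": ("root", 1.0),   # internal node
    "B": ("root", 1.0),   # internal node
    "a1": ("A", 0.2),
    "a2": ("A", 0.3),
    "b1": ("B", 0.5),
    "b2": ("B", 0.1),
}

def path_to_root(tree: Tree, leaf: str) -> Set[str]:
    """Nodes whose parent-branches lie on the path from leaf to the root."""
    nodes: Set[str] = set()
    node = leaf
    while tree[node][0] is not None:
        nodes.add(node)
        node = tree[node][0]
    return nodes

def phylogenetic_diversity(tree: Tree, leaves: Set[str]) -> float:
    """Total branch length spanned by the given leaves (rooted PD)."""
    covered: Set[str] = set()
    for leaf in leaves:
        covered |= path_to_root(tree, leaf)
    return sum(tree[n][1] for n in covered)

def greedy_pick(tree: Tree, sequenced: Set[str], candidates: Set[str], k: int) -> List[str]:
    """Pick up to k candidates, each time taking the one that adds the most PD."""
    have = set(sequenced)
    pool = set(candidates) - have
    chosen: List[str] = []
    for _ in range(min(k, len(pool))):
        base = phylogenetic_diversity(tree, have)
        best = max(sorted(pool),
                   key=lambda c: phylogenetic_diversity(tree, have | {c}) - base)
        chosen.append(best)
        have.add(best)
        pool.remove(best)
    return chosen

picks = greedy_pick(TREE, sequenced={"a1"}, candidates={"a2", "b1", "b2"}, k=2)
print(picks)
```

In this toy case the first pick comes from the unsampled clade B, because it adds the long internal branch as well as its own tip branch; a second genome from the already-sampled clade A adds much less.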
44. GEBA Pilot Project: Components
• Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan
Eisen, Eddy Rubin, Jim Bristow)
• Project management (David Bruce, Eileen Dalin, Lynne Goodwin)
• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)
• Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus,
Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng)
• Annotation and data release (Nikos Kyrpides, Victor Markowitz, et
al)
• Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor
Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik
D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N.
Ivanova, Athanasios Lykidis, Adam Zemla)
• Adopt a microbe education project (Cheryl Kerfeld)
• Outreach (David Gilbert)
• $$$ (DOE, Eddy Rubin, Jim Bristow)
45. GEBA Phylogenomic Lesson 1
The rRNA Tree of Life is a Useful Tool
for Identifying Phylogenetically Novel
Genomes
46. rRNA Tree of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
47. Network of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
48. “Whole Genome” Concatenation Tree
w/ AMPHORA
See Wu and Eisen, Genome Biology 2008 9: R151
http://bobcat.genomecenter.ucdavis.edu/AMPHORA/
51. PD of rRNA, Genome Trees Similar
From Wu et al. 2009 Nature 462, 1056-1060
52. GEBA Phylogenomic Lesson 2
rRNA Tree is good but not perfect
and better genomic sampling improves
phylogenetic inference
53. 16S Says Hyphomonas is in Rhodobacterales
Badger et al.
2005
54. WGT and individual gene trees:
It's Related to Caulobacterales
Badger et al.
2005
55. GEBA Phylogenomic Lesson 3
Phylogeny-driven genome selection
helps discover new genetic diversity
56. Network of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
57. Protein Family Rarefaction Curves
• Take data set of multiple complete genomes
• Identify all protein families using MCL
• Plot # of genomes vs. # of protein families
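The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the clustering step (MCL in the slide) is not performed here, so the input is assumed to be an already-computed mapping from genome name to the set of protein family IDs found in it; the function name `rarefaction_curve`, the toy data, and the choice to average over random genome orderings are all hypothetical.

```python
import random
from typing import Dict, List, Set

def rarefaction_curve(genome_families: Dict[str, Set[str]],
                      n_permutations: int = 100,
                      seed: int = 0) -> List[float]:
    """Mean number of distinct protein families after adding 1..N genomes.

    genome_families maps a genome name to the family IDs it contains
    (assumed to come from an upstream clustering step such as MCL).
    The curve is averaged over random orderings of the genomes.
    """
    rng = random.Random(seed)
    genomes = sorted(genome_families)
    totals = [0.0] * len(genomes)
    for _ in range(n_permutations):
        order = genomes[:]
        rng.shuffle(order)
        seen: Set[str] = set()
        for i, genome in enumerate(order):
            seen |= genome_families[genome]
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]

# Hypothetical toy input: three genomes sharing some families.
toy = {
    "g1": {"f1", "f2"},
    "g2": {"f2", "f3"},
    "g3": {"f4"},
}
curve = rarefaction_curve(toy)
print(curve)  # x = number of genomes (1..3), y = mean number of families
```

A curve that keeps climbing as genomes are added indicates that new genomes are still contributing new protein families; a curve that flattens indicates diminishing returns.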
64. Structural Novelty
• Of the 17000 protein families in the GEBA56, 1800
are novel in sequence (Wu)
• Structural modeling suggests many are structurally
novel too (D'haeseleer)
• 372 being crystallized by the PSI (Kerfeld)
65. Phylogenetic Distribution Novelty:
Bacterial Actin Related Protein
[Figure: phylogenetic tree of actin and actin-related proteins; scale bar 0.5. Clade labels: ACTIN, ARP1, ARP2, ARP3, BARP, ARP4, ARP5, ARP6, ARP7, ARP10. Sequences, in the order shown: C. boidinii gi57157304; S. cerevisiae gi14318479; L. starkeyi gi166080363; S. japonicus gi213407080; A. cliftonii gi14269497; U. pertusa gi50355609; H. sapiens gi4501889; M. cerebralis gi46326807; C. cinerea gi169844021; N. crassa gi85101929; I. scapularis gi215507378; H. sapiens gi5031569; S. japonicus gi213404844; S. cerevisiae gi6320175; D. melanogaster gi24642545; G. gallus gi45382569; C. neoformans gi58266690; S. cerevisiae gi6322525; D. melanogaster gi17737543; H. sapiens gi5031573; H. ochraceum gi227395998 (BARP); S. cerevisiae gi1008244; P. patens gi168051992; A. thaliana gi18394608; S. cerevisiae gi1301932; S. japonicus gi213408393; D. discoideum gi66802418; D. melanogaster gi17737347; S. cerevisiae gi6323114; D. hansenii gi218511921; O. sativa gi182657420; A. thaliana gi18411737; D. melanogaster gi19920358; M. musculus gi226246593]
Haliangium ochraceum DSM 14365 Patrik D’haeseleer, Adam Zemla,
Victor Kunin
See also Guljamow et al. 2007 Current Biology.
67. Predicting Function
• Key step in genome projects
• More accurate predictions help guide
experimental and computational analyses
• Many diverse approaches
• All are improved by “phylogenomic” type
analyses that integrate evolutionary
reconstructions and an understanding of how
new functions evolve
68. Most/All Functional Prediction Improves
w/ Better Phylogenetic Sampling
• Took 56 GEBA genomes and compared results vs. 56
randomly sampled new genomes
• Better definition of protein family sequence “patterns”
• Greatly improves “comparative” and “evolutionary”
based predictions
• Conversion of hypothetical into conserved hypotheticals
• Linking distantly related members of protein families
• Improved non-homology prediction
Kostas Mavrommatis, Natalia Ivanova, Thanos Lykidis, Nikos Kyrpides, Iain Anderson
69. GEBA Phylogenomic Lesson 4
Metadata and individual genome
papers important
70. SIGS
http://standardsingenomics.org/
71. GEBA Phylogenomic Lesson 5
Improves analysis of genome data
from uncultured organisms
72. Great Plate Count Anomaly
Culturing Count, Microscope Count
73. Great Plate Count Anomaly
Culturing Count <<<< Microscope Count
74. Great Plate Count Anomaly
DNA
Culturing Count <<<< Microscope Count
76. rRNA Phylotyping
• Collect DNA from
environment
• PCR amplify rRNA
genes using broad (so-
called universal) primers
• Sequence
• Align to others
• Infer evolutionary tree
• Unknowns “identified”
by placement on tree
• Some use BLAST, but
not as good as phylogeny
77. Uses of rRNA sequences
• The Hidden Majority (Hugenholtz 2002)
• Richness estimates (Bohannan and Hughes 2003)
78. rRNA: A Phylogenetic Anchor to
Determine Who’s Out There
Eisen et al. 1992
81. rRNA: A Phylogenetic Anchor to
Determine Who’s Out There
Biology not similar enough
Eisen et al. 1992
82. Metagenomics
shotgun
clone
85. Shotgun Sequencing Allows Use of Other Markers
[Figure: bar chart of Sargasso phylotypes. x-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). y-axis: Weighted % of Clones (0 to 0.5). Marker genes: EFG, EFTu, rRNA, RecA, RpoB, HSP70]
Venter et al., 2004
86. Shotgun Sequencing Allows Use of Other Markers
• Cannot be done without good sampling of genomes
[Figure: bar chart of Sargasso phylotypes. x-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). y-axis: Weighted % of Clones (0 to 0.5). Marker genes: EFG, EFTu, rRNA, RecA, RpoB, HSP70]
Venter et al., 2004
99. Commonly Used Binning Methods
Did not Work Well
• Assembly
– Only Baumannia generated good contigs
• Depth of coverage
– Everything else 0-1X coverage
• Nucleotide composition
– No detectible peaks in any vector we looked at
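For readers unfamiliar with composition-based binning, the kind of per-sequence vector referred to above can be sketched as follows. This is an illustrative sketch only: the slide does not say which composition vectors were examined, so GC content and k-mer (here tetranucleotide by default) frequencies are assumptions, and the function names `gc_content` and `kmer_frequencies` are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict

def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt

def kmer_frequencies(seq: str, k: int = 4) -> Dict[str, float]:
    """Relative frequency of every A/C/G/T k-mer in seq (overlapping windows).

    Windows containing other characters (such as N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return {km: 0.0 for km in kmers}
    return {km: counts[km] / total for km in kmers}

# Hypothetical toy read; real input would be reads or contigs.
read = "ACGTACGTGGCC"
print(gc_content(read))
print(len(kmer_frequencies(read)))  # 256 tetranucleotides
```

Binning by composition then amounts to looking for clusters (peaks) among such vectors across reads or contigs; with short, low-coverage sequences the vectors are noisy, which is consistent with the lack of clear peaks reported in the slide.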
105. Shotgun Sequencing Allows Use of Other Markers
• Cannot be done without good sampling of genomes
[Figure: bar chart of Sargasso phylotypes. x-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). y-axis: Weighted % of Clones (0 to 0.5). Marker genes: EFG, EFTu, rRNA, RecA, RpoB, HSP70]
Venter et al., 2004