Talk by Jonathan Eisen given in December 2000 as a guest seminar at the University of Maryland. Title: "Phylogenomics: Combining Evolutionary Reconstructions and Genome Analysis into a Single Composite Approach"
1. TIGR
Phylogenomics: Combining Evolutionary Reconstructions and Genome Analysis into a Single Composite Approach
[Figure montage: paralog dot plot (query ORF position vs. subject ORF position); species tree spanning Archaea, Bacteria and Eukarya (scale bar: 0.05 changes); rRNA tree of bacterial and archaeal major groups, with groups that have completed genomes highlighted; duplication and gene-loss model comparing E. coli and V. cholerae; inversions around the origin and around the terminus relative to the common ancestor of genomes A and B (labeled Figure 4); tree of three photolyase homologs in V. cholerae; duplication in the UvrA family (UvrA1, UvrA2) within the ABC transporters; species-distribution histograms ("Papa Bear", "Mama Bear", "Baby Bear") with an E. coli panel.]
2. Topics of Discussion
• Introduction to phylogenomics
• Phylogenomics Examples
– Functional prediction
– Not making functional predictions
– Gene duplication
– Genetic exchange within genomes
– Gene loss
– Specialization
– Horizontal gene transfer
5. Uses of Evolutionary Analysis in Molecular Biology
• Identification of mutation patterns (e.g., ts/tv ratio)
• Amino-acid/nucleotide substitution patterns useful in
structural studies (e.g., rRNA)
• Substitution matrices for sequence searching (e.g., PAM, BLOSUM)
• Motif analysis (e.g., Blocks)
• Functional predictions
• Classifying multigene families
• Evolutionary history puts other information into
perspective (e.g., duplications, gene loss)
6. Evolutionary Studies Improve Most Aspects of Genome Analysis
• Phylogeny of species places comparative data in perspective
• Evolution of genes and gene families
– Functional predictions
– Identification of orthologs and paralogs
– Species-specific mutation patterns
• Evolution of pathways
– Convergence
– Prediction of function
• Evolution of gene order/genome rearrangements
• Phylogenetic distribution patterns
• Identification of novel features
7. Genome Information and Analysis Improve Studies of Evolution
• Complete genome information particularly useful
• Unbiased sampling
• More sequences of genes
• Presence/absence information needed to infer certain
events (e.g., gene loss, duplication)
• Genome wide mutation and substitution patterns (e.g.,
strand bias)
• Diversification and duplication
8. Phylogenomic Analysis
• There is a feedback loop between evolutionary and genome
analysis, such that for many studies genome and
evolutionary analyses are interdependent.
• Therefore, I have proposed that they actually be combined
into a single composite approach I refer to as
phylogenomics
• Phylogenomics involves combining evolutionary
reconstructions of genes, proteins, pathways, and species
with analysis of complete genome sequences.
12. Predicting Function
• Identification of motifs
• Homology/similarity based methods
– Highest hit
– Top hits
– Clusters of orthologous groups
– HMM models
– Structural threading and modeling
– Evolutionary reconstructions
13. Types of Molecular Homology
• Homologs: genes that are descended from a common
ancestor (e.g., all globins)
• Orthologs: homologs that have diverged after speciation
events (e.g., human and chimp β-globins)
• Paralogs: homologs that have diverged after gene
duplication events (e.g., α and β globin).
• Xenologs: homologs that have diverged after lateral
transfer events
• Positional homology: common ancestry of specific amino
acid or nucleotide positions in different genes
14. Phylogenomic Analysis of the MutS Family of Proteins
• Published analysis
– Eisen JA et al. 1997. Nature Medicine 3(10):1076-1078.
– Eisen JA. 1998. Nucleic Acids Research 26(18):4291-4300.
16. BLAST Search of H. pylori “MutS”
Sequences producing significant alignments:        Score (bits)   E value
sp|P73625|MUTS_SYNY3 DNA MISMATCH REPAIR PROTEIN   117            3e-25
sp|P74926|MUTS_THEMA DNA MISMATCH REPAIR PROTEIN   69             1e-10
sp|P44834|MUTS_HAEIN DNA MISMATCH REPAIR PROTEIN   64             3e-09
sp|P10339|MUTS_SALTY DNA MISMATCH REPAIR PROTEIN   62             2e-08
sp|O66652|MUTS_AQUAE DNA MISMATCH REPAIR PROTEIN   57             4e-07
sp|P23909|MUTS_ECOLI DNA MISMATCH REPAIR PROTEIN   57             4e-07
• BLAST search pulls up Syn. sp. MutS#2 with a much
better score (lower E value) than other MutS
homologs
17. H. pylori and MutS
• Prior to this genome, all species that
encoded a MutS homolog also encoded a
MutL homolog
• Experimental studies have shown MutS and
MutL always work together in mismatch
repair
• Problem: what do we conclude about H.
pylori mismatch repair?
18. Phylogenetic Tree of MutS Family
[Figure: phylogenetic tree of MutS family proteins from bacteria, archaea, and eukaryotes.]
21. Overlaying Functions onto Tree
[Figure: the MutS family tree with subfamilies labeled MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5, and MSH6.]
22. Functional Prediction Using Tree
[Figure: MutS family tree annotated with the known function of each subfamily.]
• MutS1: repair of loops and mismatches
• MutS2: unknown functions
• MSH1: repair in mitochondria
• MSH2: repair of loops and mismatches in nucleus
• MSH3: repair of loops in nucleus
• MSH4: meiotic crossing-over
• MSH5: meiotic crossing-over
• MSH6: repair of mismatches in nucleus
24. Why Was the MutS2 Family Missed?
BLAST Search of Syn. sp. MutS#2
Sequences producing significant alignments:            Score (bits)   E value
sp|Q56239|MUTS_THETH DNA MISMATCH REPAIR PROTEIN MUT   91             3e-17
sp|P26359|SWI4_SCHPO MATING-TYPE SWITCHING PROTEIN     87             4e-16
sp|P27345|MUTS_AZOVI DNA MISMATCH REPAIR PROTEIN MUTS  83             1e-14
sp|P74926|MUTS_THEMA DNA MISMATCH REPAIR PROTEIN MUTS  81             3e-14
sp|Q56215|MUTS_THEAQ DNA MISMATCH REPAIR PROTEIN MUTS  81             4e-14
sp|P10564|HEXA_STRPN DNA MISMATCH REPAIR PROTEIN HEXA  80             5e-14
• BLAST search pulls up standard MutS genes,
but only with moderate E values (about 10^-17 at best)
25. Problems with Similarity-Based Functional Prediction
• Prone to database error propagation.
• Cannot identify orthologous groups reliably.
• Perform poorly in cases of evolutionary rate
variation and non-hierarchical trees (similarity will
not reflect evolutionary relationships in these cases)
• May be misled by modular proteins or large
insertion/deletion events.
• Are not set up to deal with expanding data sets.
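The rate-variation problem can be illustrated with a minimal sketch. It assumes a hypothetical four-gene tree with made-up branch lengths and uses tree distance as a stand-in for similarity score (smaller distance means higher similarity); `path_length` and `top_hit` are illustrative names, not from the talk. Gene A's true sister is B, but because B evolves quickly, the "highest hit" for A is C.

```python
# Hypothetical rooted tree ((A,B),(C,D)) with made-up branch lengths.
# A and B are true sisters, but the lineage leading to B evolves much faster.
BRANCH = {"A": 1.0, "B": 8.0, "C": 1.0, "D": 1.0}  # terminal branch lengths
INTERNAL = 1.0  # branch separating the (A,B) cherry from the (C,D) cherry
CHERRIES = [{"A", "B"}, {"C", "D"}]


def path_length(x: str, y: str) -> float:
    """Distance between two leaves, summed along the hypothetical tree."""
    same_cherry = any({x, y} <= cherry for cherry in CHERRIES)
    return BRANCH[x] + BRANCH[y] + (0.0 if same_cherry else INTERNAL)


def top_hit(query: str, leaves: list) -> str:
    """'Highest hit' rule: the leaf closest to the query."""
    others = [leaf for leaf in leaves if leaf != query]
    return min(others, key=lambda leaf: path_length(query, leaf))


leaves = ["A", "B", "C", "D"]
print(top_hit("A", leaves))  # C, although A's sister is B
```

The top hit for A is at distance 3, while its true sister is at distance 9, so a similarity-only rule would transfer C's annotation to A.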
28. Evolutionary Method
Phylogenetic prediction of gene function:
1. Choose gene(s) of interest
2. Identify homologs
3. Align sequences
4. Calculate gene tree
5. Overlay known functions onto tree
6. Infer likely function of gene(s) of interest
[Figure: the actual evolution of the gene family in species 1 to 3 (assumed to be unknown) alongside the inferred gene tree; Example A and Example B show how gene duplications can make the prediction ambiguous.]
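A simplified sketch of the last two steps (overlaying known functions onto the tree and inferring a likely function), assuming the gene tree is already built and rooted and is written as nested Python tuples. The rule used here, taking the function shared by the query's nearest annotated clade and reporting ambiguity when that clade disagrees, is one illustrative choice rather than the exact procedure behind the slide; the tree, gene names, and annotations are hypothetical.

```python
from typing import Optional, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of child subtrees


def leaves(tree: Tree) -> list:
    """All leaf names under a (sub)tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def path_to(tree: Tree, target: str) -> Optional[list]:
    """Subtrees on the path from the root down to the target leaf, or None."""
    if isinstance(tree, str):
        return [tree] if tree == target else None
    for child in tree:
        sub = path_to(child, target)
        if sub is not None:
            return [tree] + sub
    return None


def predict_function(tree: Tree, query: str, known: dict) -> Optional[str]:
    """Predict the query's function from its closest annotated relatives.

    Walks outward from the query to successively larger clades. In the first
    clade that contains annotated genes, a single shared function is returned;
    conflicting functions give None (ambiguous, e.g. after a duplication).
    """
    path = path_to(tree, query)
    if path is None:
        raise ValueError(f"{query!r} is not a leaf of the tree")
    for clade in reversed(path[:-1]):  # skip the query leaf itself
        functions = {known[g] for g in leaves(clade) if g in known and g != query}
        if functions:
            return functions.pop() if len(functions) == 1 else None
    return None


# Hypothetical gene tree and annotations (names and functions are made up).
tree = ((("A1", "A2"), "Q1"), (("B1", "C1"), "Q2"))
known = {"A1": "repair", "A2": "repair", "B1": "repair", "C1": "meiosis"}
print(predict_function(tree, "Q1", known))  # repair
print(predict_function(tree, "Q2", known))  # None (ambiguous)
```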
37. Unusual Features of D. radiodurans DNA Repair Genes
Process                       Genes
Nucleotide excision repair    Two UvrAs
Base excision repair          Four MutY-Nths
Recombination                 RecD but not RecBC
Replication                   Four Pol genes
dNTP pools                    Many MutTs, two RRases
Other                         UVDE
38. Problem:
The list of DNA repair gene homologs
in the D. radiodurans genome is not
significantly different from those of other
bacterial genomes of similar size
40. Repair Studies in Different Species
(determined by Medline searches as of 1998)
Humans 7028
E. coli 3926
S. cerevisiae 988
Drosophila 387
B. subtilis 284
S. pombe 116
Xenopus 56
C. elegans 25
A. thaliana 20
Methanogens 16
Haloferax 5
Giardia 0
42. Why Duplications Are Useful to Identify
• Allows division into orthologs and paralogs
• Aids functional predictions
• Recent duplications may be indicative of
species-specific adaptations
• Helps identify mechanisms of duplication
• Can be used to study mutation processes in
different parts of genome
47. Levels of Paralogy Within a Genome
• All
– All members of a gene family are linked together
• Top matches
– Only top-matching pairs are linked together.
Therefore, in a large gene family, only the pair
from the most recent duplication event is included
• Recent
– Operational definition based on comparison to other
species. Only pairs that are more similar to each
other than to homologs in selected other species are included.
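The three levels can be sketched as set operations on pairwise scores. This assumes precomputed similarity scores (higher means more similar, e.g. bit scores) for homologous pairs within one genome and, for the "Recent" level, each gene's best score against the selected other species; all gene names and numbers are hypothetical.

```python
def pair(a: str, b: str) -> frozenset:
    """Unordered gene pair."""
    return frozenset((a, b))


def all_pairs(scores: dict) -> set:
    """'All': every homologous pair within the genome is linked."""
    return set(scores)


def top_pairs(scores: dict) -> set:
    """'Top matches': each gene is linked only to its best-scoring paralog."""
    best = {}  # gene -> (pair, score)
    for p, score in scores.items():
        for gene in p:
            if gene not in best or score > best[gene][1]:
                best[gene] = (p, score)
    return {p for p, _ in best.values()}


def recent_pairs(scores: dict, best_outside: dict) -> set:
    """'Recent': paralogs more similar to each other than either is to any
    homolog in the selected other species."""
    return {
        p for p, score in scores.items()
        if all(score > best_outside.get(g, float("-inf")) for g in p)
    }


# Hypothetical within-genome similarity scores (higher = more similar).
scores = {pair("g1", "g2"): 90, pair("g1", "g3"): 50, pair("g2", "g3"): 55}
# Hypothetical best score of each gene against the selected other species.
best_outside = {"g1": 70, "g2": 60, "g3": 80}
print(len(all_pairs(scores)), len(top_pairs(scores)), len(recent_pairs(scores, best_outside)))
```

Here `all_pairs` keeps all three pairs, `top_pairs` keeps the g1-g2 and g2-g3 pairs, and `recent_pairs` keeps only g1-g2.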
48. C. pneumoniae Paralogs – All
[Dot plot: query ORF position vs. subject ORF position, both axes 0 to 1,250,000.]
49. C. pneumoniae Paralogs – Top
[Dot plot: query ORF position vs. subject ORF position, both axes 0 to 1,250,000.]
50. C. pneumoniae Paralogs – Recent
[Dot plot: query ORF position vs. subject ORF position, both axes 0 to 1,250,000.]
55. Why Gene Loss Is Useful to Identify
• Indicates that gene is not absolutely required for
survival
• Helps distinguish likelihood of gene transfers
• Correlated loss of same gene in different species
may indicate selective advantage of loss of that
gene
• Correlated loss of genes in a pathway indicates a
conserved association among those genes
57. Ancient Duplication in MutS Family
[Figure: MutS1 and MutS2 subfamilies in bacterial species such as E. coli, H. pylori, B. subtilis, Syn. sp., and D. radiodurans (panels A and B), with the gene duplication marked.]
58. Loss of Mismatch Repair (MMR)
• Lost in many pathogen species
• Mechanism of loss
– gene deletion (e.g., M. tuberculosis, H. pylori)
– frameshifts (e.g., N. meningitidis, S.
pneumoniae)
– some species have evolved systems to turn
MMR on and off depending on conditions (e.g.,
E. coli)
59. Need for Phylogenomics Example: Gene Duplication and Loss
• Genome analysis required to determine number of
homologs in different species
• Evolutionary analysis required to divide into
orthology groups and identify gene duplications
• Genome analysis is then required to determine
presence and absence of orthologs
• Then loss of orthologs can be traced onto
evolutionary tree of species
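The last step, tracing ortholog loss onto the species tree, can be sketched under a single-origin (Dollo-style) assumption: the gene arose once in the smallest clade containing every species that has it, and each maximal subtree lacking it counts as one loss. Horizontal transfer is ignored, and the species tree and presence data are hypothetical.

```python
def leaves(tree) -> set:
    """Species names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def origin_clade(tree, present: set):
    """Smallest clade containing every species that has the gene."""
    if isinstance(tree, str):
        return tree
    for child in tree:
        if present <= leaves(child):
            return origin_clade(child, present)
    return tree


def losses_below(clade, present: set) -> list:
    """Maximal subtrees in which the gene is absent everywhere (one loss each)."""
    clade_leaves = leaves(clade)
    if not clade_leaves & present:
        return [tuple(sorted(clade_leaves))]
    if isinstance(clade, str):
        return []
    found = []
    for child in clade:
        found.extend(losses_below(child, present))
    return found


def trace_losses(species_tree, present: set) -> list:
    if not present:
        raise ValueError("the gene must be present in at least one species")
    return losses_below(origin_clade(species_tree, present), present)


# Hypothetical species tree and ortholog distribution.
species_tree = ((("sp1", "sp2"), "sp3"), ("sp4", "sp5"))
print(trace_losses(species_tree, {"sp1", "sp4", "sp5"}))  # [('sp2',), ('sp3',)]
```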
62. Species Distribution of Homologs of D. radiodurans Genes
[Histograms: frequency vs. number of species with high hits, with categories labeled "Papa Bear", "Mama Bear", and "Baby Bear", plus an E. coli panel.]
63. Specialized Genetic Elements (Chromosome II and Megaplasmid)
• Many two component systems
• Nitrogen metabolism
• LexA
• Ribonucleotide reductase
• UvrA2
• Many transcription factors (e.g., HepA)
• Iron metabolism
65. V. cholerae vs. E. coli All Hits
[Dot plot: V. cholerae coordinates (0 to 3,000,000) vs. E. coli coordinates (0 to 5,000,000).]
66. V. cholerae vs. E. coli Top Hits
[Dot plot: V. cholerae coordinates (0 to 3,000,000) vs. E. coli coordinates (0 to 5,000,000).]
67. V. cholerae vs. E. coli, Only If the E. coli ORF Is Closest in All Genomes
[Dot plot: V. cholerae coordinates (0 to 3,000,000) vs. E. coli coordinates (0 to 5,000,000).]
68. V. cholerae vs. E. coli Proteins: Top
[Plot: V. cholerae ORF coordinates, 0 to 4,000,000.]
71. Duplication and Gene Loss Model
[Figure: an ancestral gene set A to F is duplicated (A' to F'); different subsets of the genes are then lost in the lineages leading to E. coli and V. cholerae.]
72. V. cholerae vs. E. coli Proteins: Top
[Plot: V. cholerae ORF coordinates, 0 to 4,000,000.]
73. C. trachomatis vs. C. pneumoniae Dot Plot
[Dot plot: C. trachomatis MoPn vs. C. pneumoniae AR39, with the origin and terminus of replication marked.]
77. Examples of Horizontal Transfers
• Antibiotic resistance genes on plasmids
• Insertion sequences
• Pathogenicity islands
• Toxin resistance genes on plasmids
• Agrobacterium Ti plasmid
• Viruses and viroids
• Organelle to nucleus transfers
78. Why Gene Transfers Are Useful to Identify
• Laterally transferred genes frequently involved in
environmental adaptations and/or pathogenicity
• Helps identify transposons, integrons, and other
vectors of gene transfer
• Helps identify species associations in the
environment
80. How to Infer Gene Transfers
• Unusual distribution patterns
• Unusual nucleotide composition
• High sequence similarity to supposedly
distantly related species
• Unusual gene trees
• Observe transfer events
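"Unusual nucleotide composition" is often screened with a sliding window over GC content. A minimal sketch follows; the window size, step, and threshold are arbitrary assumptions, and a composition anomaly is only a flag for follow-up (for example with gene trees), not proof of transfer.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def unusual_windows(genome: str, window: int, step: int, threshold: float) -> list:
    """(start, GC) for windows whose GC content differs from the
    genome-wide value by more than `threshold`."""
    overall = gc_fraction(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_fraction(genome[start:start + window])
        if abs(gc - overall) > threshold:
            flagged.append((start, gc))
    return flagged


# Hypothetical toy "genome": AT-rich background with a GC-rich insert at position 100.
genome = "AT" * 50 + "GC" * 10 + "AT" * 50
print(unusual_windows(genome, window=20, step=20, threshold=0.3))  # [(100, 1.0)]
```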
81. E. coli and S. typhimurium Transfer
[Figure: "Old Model" and "New Model" panels, each relating E. coli and S. typhimurium.]
82. Archaeal Genes in Bacterial Genomes*
Bacterial species        Best hits to Archaea
Thermotoga maritima      451 (24%)
Aquifex aeolicus         246 (16%)
Synechocystis sp.        126 (4%)
Borrelia burgdorferi     45 (3.6%)
Escherichia coli         99 (2.3%)
* E-value cutoff of 10^-5; match over 60% of sequence
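A sketch of how counts like these might be tallied, assuming one best database hit per gene labeled with the domain of the hit organism, and reading the table's footnote as an E-value cutoff of 10^-5 plus a match covering more than 60% of the query. Treating the percentage as a fraction of all predicted genes is also an assumption; the `BestHit` record and the data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BestHit:
    query: str           # gene in the bacterial genome
    subject_domain: str  # domain of the best hit: "Archaea", "Bacteria" or "Eukarya"
    evalue: float
    coverage: float      # fraction of the query sequence covered by the match


def archaeal_best_hits(hits: list, total_genes: int,
                       max_evalue: float = 1e-5, min_coverage: float = 0.6):
    """Count genes whose best hit is archaeal and passes both cutoffs."""
    count = sum(
        1 for h in hits
        if h.subject_domain == "Archaea"
        and h.evalue <= max_evalue
        and h.coverage > min_coverage
    )
    return count, count / total_genes


# Hypothetical best hits for a 10-gene toy genome.
hits = [
    BestHit("orf1", "Archaea", 1e-30, 0.90),
    BestHit("orf2", "Archaea", 1e-3, 0.90),   # fails the E-value cutoff
    BestHit("orf3", "Bacteria", 1e-50, 0.95),
    BestHit("orf4", "Archaea", 1e-10, 0.40),  # fails the coverage cutoff
]
print(archaeal_best_hits(hits, total_genes=10))  # (1, 0.1)
```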
83. Evidence for Lateral Gene Transfer in Thermotoga
1. 81 archaeal-like genes are clustered in 15 regions that
range in size from ~ 4 to 20 kb; many share conserved gene
order with their archaeal counterparts.
2. Many of the archaeal-like genes correspond to regions with
a significantly different base composition than the rest of
the chromosome.
3. Some of these regions are associated with a 30 bp repeat
structure found only in thermophiles.
4. Initial phylogenetic analyses of some of these genes lend
support to lateral gene transfer.
86. A. thaliana T1E2.8 Is a Chloroplast-Derived HSP60
[Figure: HSP60 tree with Eukarya, Archaea, Bacteria, and cyanobacteria/chloroplast (Cyano/Cpst) groups; A. thaliana T1E2.8 is marked.]
95. Evolutionary Genome Scanning
• Distribution patterns/phylogenetic profiles
• Patterns of evolution (ds/dn, correlations, constraints)
• Lateral gene transfers (organellar genes, pathogenicity islands)
• Subdividing gene families
• Functional predictions (gene trees, phylogenetic profiles)
• Gene duplications
• Gene loss
• Specialization
• Comparing close relatives
• Species evolution
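Distribution patterns (phylogenetic profiles) can be compared directly: genes with identical or nearly identical presence/absence patterns across genomes are candidates for functional linkage. A minimal sketch using Hamming distance between binary profiles; the genomes, genes, and the zero-mismatch default are hypothetical choices.

```python
from itertools import combinations


def profile(species_with_gene: set, species: list) -> tuple:
    """Binary presence/absence vector across a fixed, ordered list of genomes."""
    return tuple(1 if sp in species_with_gene else 0 for sp in species)


def hamming(p: tuple, q: tuple) -> int:
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def linked_pairs(profiles: dict, max_distance: int = 0) -> list:
    """Gene pairs whose profiles differ in at most `max_distance` genomes."""
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if hamming(profiles[a], profiles[b]) <= max_distance
    ]


# Hypothetical genomes and gene distributions.
species = ["sp1", "sp2", "sp3", "sp4"]
profiles = {
    "geneA": profile({"sp1", "sp3"}, species),
    "geneB": profile({"sp1", "sp3"}, species),
    "geneC": profile({"sp2", "sp3", "sp4"}, species),
}
print(linked_pairs(profiles))  # [('geneA', 'geneB')]
```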
96. Evolutionary Diversity Still Poorly Represented in Complete Genomes
[Figure: A. rRNA tree of bacterial and archaeal major groups; B. the same tree with groups that have completed genomes highlighted.]
97. True Phylogenetic Methods Work Best
[Figure: two trees of the same MutS family sequences, one built with UPGMA and one with neighbor-joining.]
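Why the method matters can be seen in a small sketch comparing the first join made by UPGMA with the first join made by neighbor-joining, on distances from a hypothetical four-taxon tree with unequal rates. For four taxa the first pair joined already fixes the unrooted topology. UPGMA assumes a molecular clock and pairs the two slowly evolving taxa, whereas the neighbor-joining Q-criterion corrects for each taxon's overall divergence. The branch lengths are made up, not taken from the MutS data.

```python
from itertools import combinations

# Pairwise distances from a hypothetical tree ((A,B),(C,D)) with made-up branch
# lengths A=1, B=6, C=1, D=6 and an internal branch of 1. The true cherries are
# (A,B) and (C,D), but B and D evolve much faster than A and C.
DIST = {
    frozenset("AB"): 7, frozenset("AC"): 3, frozenset("AD"): 8,
    frozenset("BC"): 8, frozenset("BD"): 13, frozenset("CD"): 7,
}
TAXA = ["A", "B", "C", "D"]


def dist(a: str, b: str) -> float:
    return DIST[frozenset((a, b))]


def upgma_first_join(taxa: list) -> tuple:
    """UPGMA joins the closest pair first."""
    return min(combinations(taxa, 2), key=lambda p: dist(*p))


def nj_first_join(taxa: list) -> tuple:
    """Neighbor-joining joins the pair minimising
    Q(i, j) = (n - 2) * d(i, j) - r_i - r_j, where r_i sums i's distances."""
    n = len(taxa)
    r = {t: sum(dist(t, u) for u in taxa if u != t) for t in taxa}
    return min(combinations(taxa, 2),
               key=lambda p: (n - 2) * dist(*p) - r[p[0]] - r[p[1]])


print(upgma_first_join(TAXA))  # ('A', 'C'): wrong cherry
print(nj_first_join(TAXA))     # ('A', 'B'): true cherry
```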
98. Acknowledgements
• Genome duplications: S. Salzberg, J. Heidelberg, O. White,
A. Stoltzfus, J. Peterson
• Genome sequences and analysis: J. Heidelberg, T. Read, H.
Tettelin, K. Nelson, J. Peterson, R. Fleischmann, D. Bryant
• Horizontal transfers: K. Nelson, W. F. Doolittle
• TIGR: C. Fraser, J. Venter, M-I. Benito, S. Kaul, Seqcore
• $$$: DOE, NSF, NIH, ONR
101.
TIGR
Other people
Mom and Dad
S. Karlin, M. Feldman, A. M. Campbell, R. Fernald, R. Shafer, D. Ackerly, D. Goldstein, M. Eisen, J. Courcelle, R. Myers, C. M. Cavanaugh, P. Hanawalt
NSF
J. Heidelberg, T. Read, S. Kaul, M-I Benito, J. C. Venter, C. Fraser, S. Salzberg, O. White, K. Nelson
$$$
ONR, DOE, NIH
H. Tettelin
107. Phylogenomics I: Presence/Absence of Homologs
• Important to have complete genomes
• Similarity searches with high “homology
threshold” (to prevent false positives)
• Iterative searches (to prevent false negatives)
• Multiple sequence alignments to confirm
assignment of homology and to divide up
multi-domain proteins
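The presence/absence call itself can be sketched as a table built from search hits with a strict cutoff. This assumes hits have already been parsed into (query gene, genome, E value) tuples from searches against each complete genome; the 1e-10 cutoff, gene names, and genome names are hypothetical. Weak hits, like geneA in g2 below, are exactly the borderline cases that iterative searches and alignments should then resolve.

```python
def presence_absence(genes: list, genomes: list, hits: list,
                     max_evalue: float = 1e-10) -> dict:
    """Call a gene present in a genome only if a hit passes a strict E-value cutoff.

    hits: (query_gene, genome, evalue) tuples.
    """
    table = {g: {gn: False for gn in genomes} for g in genes}
    for gene, genome, evalue in hits:
        if evalue <= max_evalue and gene in table and genome in table[gene]:
            table[gene][genome] = True
    return table


# Hypothetical search results against three complete genomes.
genes = ["geneA", "geneB"]
genomes = ["g1", "g2", "g3"]
hits = [
    ("geneA", "g1", 1e-50),
    ("geneA", "g2", 1e-3),   # weak hit: called absent here
    ("geneB", "g1", 1e-40),
    ("geneB", "g3", 1e-20),
]
table = presence_absence(genes, genomes, hits)
print(table["geneA"])  # {'g1': True, 'g2': False, 'g3': False}
```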
108. Phylogenomics II: Phylogenetic Analysis of Homologs
• Multiple sequence alignment
• Mask alignment (exclude certain regions)
– ambiguous regions of alignment
– hypervariable regions and regions with large gaps
• Phylogenetic tree with method of choice
• Robustness checks
– bootstrapping
– compare trees with different alignments
– compare trees with different tree-building methods
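Masking and bootstrapping are both simple column operations on an alignment. A minimal sketch, assuming aligned sequences of equal length with "-" as the gap character; masking here uses only a gap-fraction rule (the 0.5 cutoff is an arbitrary assumption), whereas ambiguous and hypervariable regions usually need further criteria or manual inspection. Each bootstrap replicate would then be passed to the tree-building method of choice.

```python
import random


def columns(alignment: dict) -> list:
    """Alignment columns as tuples, in the order of the alignment's rows."""
    return list(zip(*alignment.values()))


def mask_gappy_columns(alignment: dict, max_gap_fraction: float = 0.5) -> dict:
    """Drop columns in which more than `max_gap_fraction` of sequences have a gap."""
    n = len(alignment)
    keep = [i for i, col in enumerate(columns(alignment))
            if col.count("-") / n <= max_gap_fraction]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


def bootstrap_replicate(alignment: dict, rng: random.Random) -> dict:
    """Resample alignment columns with replacement, keeping the same length."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


# Hypothetical toy alignment.
aln = {"s1": "AC-GT", "s2": "AC--T", "s3": "ATG-T"}
masked = mask_gappy_columns(aln)
print(masked)  # {'s1': 'ACT', 's2': 'ACT', 's3': 'ATT'}
replicate = bootstrap_replicate(masked, random.Random(0))
```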
109. Phylogenomics III: Inferring Evolutionary Events
• Infer evolutionary distribution patterns (overlay
presence/absence onto species tree)
• Compare gene tree vs. species tree
• Compare gene tree vs. evolutionary distribution
• Infer gene duplication and transfer events
• Combine gene transfer and duplication information with
evolutionary distribution analysis to infer gene loss, gene
origin, and timing of gene duplications and transfers
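Comparing a gene tree with a species tree to infer duplications can be sketched with LCA (least common ancestor) reconciliation: each gene-tree node is mapped to the smallest species clade containing all of its genes, and a node that maps to the same clade as one of its children is called a duplication. This assumes both trees are rooted and correct and ignores transfers; the trees and names are hypothetical.

```python
def clade_sets(tree):
    """Return (leaf set of this node, leaf sets of every clade in the subtree)."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    own, collected = frozenset(), []
    for child in tree:
        child_set, child_clades = clade_sets(child)
        own |= child_set
        collected.extend(child_clades)
    collected.append(own)
    return own, collected


def lca(species: frozenset, species_clades: list) -> frozenset:
    """Smallest species-tree clade containing all the given species."""
    return min((c for c in species_clades if species <= c), key=len)


def find_duplications(gene_tree, gene_species: dict, species_tree) -> list:
    """Gene-tree nodes inferred to be duplications by LCA reconciliation."""
    _, species_clades = clade_sets(species_tree)
    duplications = []

    def walk(node):
        if isinstance(node, str):
            return lca(frozenset([gene_species[node]]), species_clades)
        child_maps = [walk(child) for child in node]
        mapped = lca(frozenset().union(*child_maps), species_clades)
        # Mapping to the same species clade as a child cannot be explained
        # by speciation alone, so the node is called a duplication.
        if any(mapped == m for m in child_maps):
            duplications.append(node)
        return mapped

    walk(gene_tree)
    return duplications


# Hypothetical trees: genes a*, b*, c* come from species sp1, sp2, sp3.
species_tree = (("sp1", "sp2"), "sp3")
gene_species = {"a1": "sp1", "a2": "sp1", "b1": "sp2", "b2": "sp2", "c1": "sp3"}
gene_tree = ((("a1", "b1"), ("a2", "b2")), "c1")
print(find_duplications(gene_tree, gene_species, species_tree))
# [(('a1', 'b1'), ('a2', 'b2'))]
```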
110. Phylogenomics IV: Functional Predictions and Evolution
• Overlay experimentally determined functions
onto gene tree
• Infer changes in function
– many changes suggest that caution should be used in
making new predictions
• Predict functions based on position in tree
relative to genes with known functions and
based on orthology groups
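Counting changes in function can be sketched with Fitch parsimony, treating each experimentally determined function as a discrete character state on a rooted binary gene tree. This assumes every leaf is annotated; the tree, genes, and function labels are hypothetical. A higher minimum number of changes is the signal to be cautious.

```python
def fitch(tree, states: dict):
    """Fitch parsimony on a rooted binary tree of nested tuples.

    Returns (possible root states, minimum number of state changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree  # binary trees only
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical gene tree with two alternative sets of annotations.
tree = ((("g1", "g2"), "g3"), ("g4", "g5"))
stable = {"g1": "repair", "g2": "repair", "g3": "repair", "g4": "meiosis", "g5": "meiosis"}
labile = {"g1": "repair", "g2": "meiosis", "g3": "repair", "g4": "meiosis", "g5": "repair"}
print(fitch(tree, stable)[1])  # 1
print(fitch(tree, labile)[1])  # 2
```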
111. Phylogenomics V: Pathway Analysis
• Correlated presence/absence of all genes in pathway in different
species?
– If not, maybe non-orthologous gene displacement
– Alternatively, pathway may be different between species
• Correlated evolutionary events for genes in pathway
– loss of all genes at once
– correlated duplications?
• Compare evolution of function between pathways
– The number of times an activity has evolved helps in making
predictions of function/phenotype
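The first check, correlated presence/absence across a pathway, can be sketched as a per-species classification. It assumes the orthologs of each pathway gene have already been identified in each genome; species with only part of the pathway are the ones worth examining for non-orthologous gene displacement or a different pathway. All names are hypothetical.

```python
def classify_pathway(detected: dict, pathway: set) -> dict:
    """Label each species 'complete', 'absent' or 'partial' for one pathway.

    detected: species -> set of pathway genes with identified orthologs.
    """
    labels = {}
    for species, genes in detected.items():
        found = pathway & genes
        if found == pathway:
            labels[species] = "complete"
        elif not found:
            labels[species] = "absent"
        else:
            # Candidate for non-orthologous gene displacement or a variant pathway.
            labels[species] = "partial"
    return labels


# Hypothetical pathway and ortholog detections.
pathway = {"p1", "p2", "p3"}
detected = {"sp1": {"p1", "p2", "p3", "x"}, "sp2": {"x"}, "sp3": {"p1", "p3"}}
print(classify_pathway(detected, pathway))
# {'sp1': 'complete', 'sp2': 'absent', 'sp3': 'partial'}
```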
112. Steps in Phylogenomic Analysis
• Create database of genes of interest
• Presence/absence of homologs in complete genomes
• Phylogenetic trees of each gene family
• Infer evolutionary events (gene origin, duplication, loss and transfer)
• Refine presence/absence (orthologs, paralogs, subfamilies)
• Functional predictions and functional evolution
• Analysis of pathways
113. Evolution as a Screening Method
• Gene duplications
• Gene loss
• Lateral gene transfers
• Organellar genes
• Structurally constrained genes
• Correlated evolutionary changes