Phylogenetics Workshop 
Part I : Introduction 
De Landtsheer Sébastien, University of Luxembourg 
Ahead of the BeNeLux Bioinformatics Conference 2011
Outline of the Workshop 
Part I : 
• General introduction 
• Alignments 
• Distance-based methods 
Part II : 
• Maximum likelihood trees 
• Bayesian trees 
Part III : 
• Advanced Bayesian phylogenetics 
• Hypothesis testing
Outline of Part I 
• General introduction : what is 
phylogenetics ? 
• Basic DNA alignment algorithm 
• Distance matrices 
• Distance-based tree inference methods
Software featured in Part I 
• Seaview (http://pbil.univ-lyon1.fr/software/seaview.html) 
• BioEdit (http://www.mbio.ncsu.edu/bioedit/bioedit.html) 
• MEGA (http://www.megasoftware.net/) 
• FigTree (http://tree.bio.ed.ac.uk/software/figtree/)
What is Phylogenetics ? 
• Classification of living species into 
categories 
• Study of characters → states 
• Underlying assumption of evolution 
(cladogram / dendrogram)
What is Phylogenetics 
• Characters : 
– Morphological 
– Biochemical 
– Genetic 
• States : 
– Continuous 
– Discontinuous
Different types of Phylogenetic trees 
• Phylogenetic tree : graphical representation of 
our hypothesis about the evolution of a group of 
organisms 
• Can represent different quantities (time/genetic 
distance) and be displayed in different ways 
• There are several possible methods, and there 
is no single method that is best
Phylogenetic trees jargon 
Internal 
branches 
Root 
(if there is) 
Node 
Terminal 
branches 
Leaves or 
Tips or 
OTUs
Properties of Phylogenetic trees 
• Rooted vs Unrooted
Properties of Phylogenetic trees 
• The real face of unrooted trees 
=
Properties of Phylogenetic trees 
• The real face of unrooted trees : undirected 
= 
Multiple possibilities for rooting the tree
Properties of Phylogenetic trees 
• Where to place the root ? 
– Midpoint rooting : equally distant from the two most distantly 
related taxa on the tree. Makes sense but more often than not it 
is wrong 
– Outgroup : using one distantly related taxon (uncontroversial) 
• Marsupial for eutherian study 
• Treeshrew for primate study 
• SIV for HIV study
Properties of Phylogenetic trees 
• How to root unrooted trees ? 
1) Midpoint rooting 
= 
Assumes that the rates of evolution have stayed +/- constant
Properties of Phylogenetic trees 
• How to root unrooted trees ? 
2) Using an outgroup 
= 
Problem : difficult to find the proper outgroup 
(clearly outside the group studied, but still not too distant)
Properties of Phylogenetic trees 
• Rooted trees tell a story (directed) 
Most Recent Common Ancestor (MRCA)
Properties of Phylogenetic trees 
• Branch swapping : only horizontal distance matters 
=
Properties of Phylogenetic trees 
• Many topologies are always possible : 
Number of possible rooted trees for n sequences 
= $(2n-3)! / (2^{n-2} (n-2)!)$ 
2 sequences: 1 
3 sequences: 3 
4 sequences: 15 
5 sequences: 105 
6 sequences: 945 
7 sequences: 10395 
8 sequences: 135135 
9 sequences: 2027025 
10 sequences: 34459425 
51 sequences: about $2.7 \times 10^{78}$ (approaching $10^{80}$, a common estimate of the number of particles in the universe)
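The formula above is the double factorial (2n-3)!! = 1 × 3 × 5 × … × (2n-3). A minimal Python sketch that reproduces the table (the function name is only illustrative, it does not come from the slides) :

```python
def rooted_tree_count(n: int) -> int:
    """Number of rooted, bifurcating topologies for n labelled sequences."""
    if n < 2:
        raise ValueError("need at least 2 sequences")
    count = 1
    # (2n-3)!! is the product of the odd numbers 3, 5, ..., 2n-3
    for odd in range(3, 2 * n - 2, 2):
        count *= odd
    return count


for n in range(2, 11):
    print(n, "sequences:", rooted_tree_count(n))
```

Python integers have arbitrary precision, so the same function also works for 51 sequences without overflow.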
DNA alignments 
• Aligning two sequences: the Needleman–Wunsch 
algorithm 
– Construct a similarity matrix 
– Assign similarity scores based on an arbitrary scoring system 
– Finds the best GLOBAL alignment between two sequences = the one with 
the highest total score (with the simple scoring used below, the 
maximum number of residues from one sequence that can be 
aligned with identical residues of the other one)
DNA alignments 
A T G T A C C G T 
0 0 0 0 0 0 0 0 0 0 
T 0 
G 0 
A 0 
C 0 
T 0 
C 0 
G 0 
T 0
DNA alignments 
• The score in one cell is the maximum of different 
possibilities : 
– 0 (only in the local-alignment variant, Smith–Waterman ; it changes nothing with the non-negative scores used in the example) 
– The upper left cell plus the value of the similarity between the 
two residues 
– The upper cell plus the value of a gap (in the upper sequence) 
– The left cell plus the value of a gap (in the left sequence) 
$H_{i,j} = \max \{ H_{i-1,j-1} + s(a_i, b_j),\ H_{i,j-1} + P_g(k),\ H_{i-1,j} + P_g(k) \}$ 
There is a penalty for gap opening and for gap extension
DNA alignments 
• For the example we will use the following scoring matrix : 
– Identity : +1 
– Gap : 0 
• In real life ClustalW uses different scoring matrices 
depending on the type of sequence (amino acids or DNA) and can be set to use 
word matches (k-tuples). All parameters are editable
DNA alignments 
A T G T A C C G T 
0 0 0 0 0 0 0 0 0 0 
T 0 0 
G 0 
A 0 
C 0 
T 0 
C 0 
G 0 
T 0
DNA alignments 
A T G T A C C G T 
0 0 0 0 0 0 0 0 0 0 
T 0 0 1 1 1 1 1 1 1 1 
G 0 
A 0 
C 0 
T 0 
C 0 
G 0 
T 0
DNA alignments 
A T G T A C C G T 
0 0 0 0 0 0 0 0 0 0 
T 0 0 1 1 1 1 1 1 1 1 
G 0 0 1 
A 0 1 1 
C 0 1 1 
T 0 1 2 
C 0 1 2 
G 0 1 2 
T 0 1 2
DNA alignments 
A T G T A C C G T 
0 0 0 0 0 0 0 0 0 0 
T 0 0 1 1 1 1 1 1 1 1 
G 0 0 1 2 2 2 2 2 2 2 
A 0 1 1 2 2 3 3 3 3 3 
C 0 1 1 2 2 3 4 4 4 4 
T 0 1 2 2 3 3 4 4 4 5 
C 0 1 2 2 3 3 4 5 5 5 
G 0 1 2 3 3 3 4 5 6 6 
T 0 1 2 3 4 4 4 5 6 7
DNA alignments 
• Final sequence : 
A T G T A C - C G T 
- T G - A C T C G T
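The matrix filling and traceback can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the ClustalW implementation : the mismatch score of 0 is an assumption (the slides only give identity = +1 and gap = 0), the gap score is linear (no separate opening and extension penalties), and ties in the traceback are broken arbitrarily. Several alignments reach the optimal score of 7, so the one returned here may differ from the one shown above.

```python
def needleman_wunsch(top, left, match=1, mismatch=0, gap=0):
    """Fill the score matrix H and trace back one optimal global alignment."""
    rows, cols = len(left) + 1, len(top) + 1
    H = [[0] * cols for _ in range(rows)]
    # Borders: a prefix aligned against nothing costs one gap per residue.
    for i in range(1, rows):
        H[i][0] = H[i - 1][0] + gap
    for j in range(1, cols):
        H[0][j] = H[0][j - 1] + gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if left[i - 1] == top[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s,  # align the two residues
                          H[i - 1][j] + gap,    # gap in the top sequence
                          H[i][j - 1] + gap)    # gap in the left sequence
    # Traceback from the bottom-right corner to the top-left corner.
    out_top, out_left = [], []
    i, j = rows - 1, cols - 1
    while i > 0 or j > 0:
        diagonal = (i > 0 and j > 0 and H[i][j] == H[i - 1][j - 1]
                    + (match if left[i - 1] == top[j - 1] else mismatch))
        if diagonal:
            out_top.append(top[j - 1])
            out_left.append(left[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and H[i][j] == H[i - 1][j] + gap:
            out_top.append("-")
            out_left.append(left[i - 1])
            i -= 1
        else:
            out_top.append(top[j - 1])
            out_left.append("-")
            j -= 1
    return H, "".join(reversed(out_top)), "".join(reversed(out_left))


H, aligned_top, aligned_left = needleman_wunsch("ATGTACCGT", "TGACTCGT")
print("score:", H[-1][-1])
print(aligned_top)
print(aligned_left)
```

With this scoring the final score is simply the number of identical columns in the alignment.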
DNA alignments 
• More advanced alignment methods include : 
– T-COFFEE computes a multiple alignment that is as consistent as possible with 
pairwise alignments computed from a variety of sources. 
Computationally intensive (not good for big datasets) 
– MUSCLE is an iterative refinement algorithm. Very fast 
– MAFFT uses the fast Fourier transform to detect homologous 
regions. Very fast 
– Genetic algorithms (e.g. SAGA) generate a population of 
alignments that evolves through selection and crossover. 
Very slow, but custom scoring functions can be defined. They need to 
be run several times (stochastic) 
– Hidden Markov models (HMMs) used to be inaccurate. 
They are better now but still slow and difficult to use
DNA alignments 
• Good practice for alignments : 
– Use a variety of algorithms 
– Align at the nucleotide but also at the amino acid level 
(TranslatorX or manually) 
– Compare the different outputs 
– Check manually : 
• Consistency with the ORF (no frame-shifts) 
• Sequencing errors 
– The alignment can also be seen as a hypothesis, 
therefore it needs to make sense from the biological 
point of view : genes have to be HOMOLOGS (share 
ancestry)
Building trees with distance methods 
• The distance between 2 sequences can be calculated in 
different ways: 
– number of differences 
– according to a substitution model 
• The clustering can be achieved in different ways: 
– UPGMA 
– Neighbor-joining 
– (Parsimony)
Building trees with distance methods 
• Building a UPGMA tree with the number of differences : 
1. Calculate the pairwise distance matrix 
A B C D E F 
A 0 1 3 6 7 10 
B 1 0 3 6 7 10 
C 3 3 0 5 6 9 
D 6 6 5 0 1 7 
E 7 7 6 1 0 8 
F 10 10 9 7 8 0
Building trees with distance methods 
• Building a UPGMA tree with the number of differences : 
2. Group the 2 most closely related sequences 
A B C D E F 
A 0 1 3 6 7 10 
B 1 0 3 6 7 10 
C 3 3 0 5 6 9 
D 6 6 5 0 1 7 
E 7 7 6 1 0 8 
F 10 10 9 7 8 0 
[Tree : A and B are joined, with branches of 0.5 each]
Building trees with distance methods 
• Building a UPGMA tree with the number of differences : 
3. Recalculate the distance matrix and take the next smallest distance 
A/B C D E F 
A/B 0 3 6 7 10 
C 3 0 5 6 9 
D 6 5 0 1 7 
E 7 6 1 0 8 
F 10 9 7 8 0 
[Tree : A and B joined (branches of 0.5 each) ; D and E joined (branches of 0.5 each)]
Building trees with distance methods 
• Building a UPGMA tree with the number of differences : 
3. Recalculate the distance matrix and take the next smallest distance 
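A/B C D/E F 
A/B 0 3 6.5 10 
C 3 0 5.5 9 
D/E 6.5 5.5 0 7.5 
F 10 9 7.5 0 
[Tree : (A,B) and (D,E) as before ; C joins A/B at a height of 1.5, i.e. a branch of 1.5 to C and an internal branch of 1 to the A/B node]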
Building trees with distance methods 
• Building a UPGMA tree with the number of differences : 
3. Recalculate the distance matrix and take the next smallest distance 
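A/B/C D/E F 
A/B/C 0 6 9.5 
D/E 6 0 7.5 
F 9.5 7.5 0 
[Tree : D/E joins A/B/C at a height of 3, i.e. an internal branch of 1.5 to the A/B/C node and of 2.5 to the D/E node] 
Note : 6 and 9.5 are plain averages of the previous matrix entries. Strict UPGMA weights each cluster by its number of sequences, which gives 37/6 ≈ 6.17 and 29/3 ≈ 9.67 ; the topology is unchanged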
Building trees with distance methods 
• Building a UPGMA tree with the number of differences : 
3. Recalculate the distance matrix and take the next smallest distance 
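A/B/C/D/E F 
A/B/C/D/E 0 8.5 
F 8.5 0 
[Tree : F joins last, at a height of 4.25, i.e. a branch of 4.25 to F and an internal branch of 1.25 to the A/B/C/D/E node] 
Note : 8.5 is again a plain average (of 9.5 and 7.5) ; weighting by cluster size gives 44/5 = 8.8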
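The clustering steps above can be sketched in Python. This is a minimal sketch, not the MEGA implementation : the function name and the Newick output are illustrative choices, and distances to a merged cluster are weighted by cluster size (strict UPGMA). The slides' last steps use plain averages instead, so the deeper node heights differ slightly (root at 4.4 here against 4.25 on the slide), while the topology is the same.

```python
def upgma(labels, matrix):
    """Return (newick, root_height) for a symmetric distance matrix."""
    # cluster id -> (Newick text, number of tips, height of the cluster's node)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(labels)}
    dist = {(i, j): float(matrix[i][j])
            for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        # Join the two closest clusters; the new node sits at half their distance.
        (i, j), d_ij = min(dist.items(), key=lambda item: item[1])
        text_i, size_i, height_i = clusters.pop(i)
        text_j, size_j, height_j = clusters.pop(j)
        height = d_ij / 2
        merged = (f"({text_i}:{height - height_i:g},"
                  f"{text_j}:{height - height_j:g})")
        # Size-weighted average distance from the new cluster to the others.
        updated = {}
        for k in clusters:
            d_ik = dist[tuple(sorted((i, k)))]
            d_jk = dist[tuple(sorted((j, k)))]
            updated[(k, next_id)] = ((size_i * d_ik + size_j * d_jk)
                                     / (size_i + size_j))
        dist = {pair: d for pair, d in dist.items()
                if i not in pair and j not in pair}
        dist.update(updated)
        clusters[next_id] = (merged, size_i + size_j, height)
        next_id += 1
    text, _, root_height = next(iter(clusters.values()))
    return text + ";", root_height


labels = ["A", "B", "C", "D", "E", "F"]
matrix = [[0, 1, 3, 6, 7, 10],
          [1, 0, 3, 6, 7, 10],
          [3, 3, 0, 5, 6, 9],
          [6, 6, 5, 0, 1, 7],
          [7, 7, 6, 1, 0, 8],
          [10, 10, 9, 7, 8, 0]]
newick, root_height = upgma(labels, matrix)
print(newick)
print("root height:", root_height)
```

The resulting Newick string can be pasted into FigTree to display the tree.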
Building trees with distance methods 
• Assumption of the UPGMA method : constant rate of evolution 
across time and for all branches. This assumption is frequently 
violated in real-life datasets and therefore the UPGMA can find a 
wrong tree. 
• How can we relax this assumption ? We calculate the total 
divergence for each tip and compute a corrected distance matrix 
• Starting from a star-like tree, we create branches to minimize the 
length of the tree and agglomeratively join the closest neighbors 
=> Neighbor-joining
Building trees with distance methods 
• Building a Neighbor-Joining tree with the number of differences 
[Figure : the TRUE topology, with branch lengths (1 for A, 4 for B), where 
B has accumulated 4 times as many mutations as A since 
their divergence]
Building trees with distance methods 
• Building a Neighbor-Joining tree with the number of differences 
[Figure : the same true tree with its branch lengths] 
A B C D E F 
A 0 5 4 7 6 8 
B 5 0 7 10 9 11 
C 4 7 0 7 6 8 
D 7 10 7 0 5 9 
E 6 9 6 5 0 8 
F 8 11 8 9 8 0 
UPGMA would cluster A and C together because B is more 
distant
Building trees with distance methods 
• A global divergence is calculated by summing all distances, and a 
new distance matrix is computed 
A B C D E F 
A 0 5 4 7 6 8 
B 5 0 7 10 9 11 
C 4 7 0 7 6 8 
D 7 10 7 0 5 9 
E 6 9 6 5 0 8 
F 8 11 8 9 8 0 
Div 30 42 32 38 34 44 
A B C D E F 
A 0 -13 -11.5 -10 -10 -10.5 
B -13 0 -11.5 -10 -10 -10.5 
C -11.5 -11.5 0 -10.5 -10.5 -11 
D -10 -10 -10.5 0 -13 -11.5 
E -10 -10 -10.5 -13 0 -11.5 
F -10.5 -10.5 -11 -11.5 -11.5 0 
Div(A) = Σi dist(A,i) = 5+4+7+6+8 = 30 
Div(B) = Σi dist(B,i) = 5+7+10+9+11 = 42 
Div(C) = Σi dist(C,i) = 32 
Div(D) = Σi dist(D,i) = 38 
Div(E) = Σi dist(E,i) = 34 
Div(F) = Σi dist(F,i) = 44 
M(i,j) = dist(i,j) - (Div(i)+Div(j))/(N-2), with N = 6 sequences 
M(A,B) = 5-(30+42)/4 = -13 
M(A,C) = 4-(30+32)/4=-11.5 
etc…
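The divergences and the corrected matrix M can be recomputed with a few lines of Python. This is a sketch of the first Neighbor-Joining step only ; the branch-length formula, d(i,j)/2 + (Div(i) - Div(j)) / (2(N-2)), is the standard Neighbor-Joining one and is not written on the slide, and the function name is illustrative.

```python
def nj_first_step(dist):
    """Divergences, corrected matrix M, the pair to join and its two branch lengths."""
    n = len(dist)
    div = [sum(row) for row in dist]
    m = [[0.0 if i == j else dist[i][j] - (div[i] + div[j]) / (n - 2)
          for j in range(n)] for i in range(n)]
    # The pair with the smallest corrected distance is joined first
    # (ties are broken by taking the first pair found).
    pairs = [(a, b) for a in range(n) for b in range(a + 1, n)]
    i, j = min(pairs, key=lambda pair: m[pair[0]][pair[1]])
    # Branch lengths from the new node to i and to j.
    length_i = dist[i][j] / 2 + (div[i] - div[j]) / (2 * (n - 2))
    length_j = dist[i][j] - length_i
    return div, m, (i, j), (length_i, length_j)


labels = "ABCDEF"
D = [[0, 5, 4, 7, 6, 8],
     [5, 0, 7, 10, 9, 11],
     [4, 7, 0, 7, 6, 8],
     [7, 10, 7, 0, 5, 9],
     [6, 9, 6, 5, 0, 8],
     [8, 11, 8, 9, 8, 0]]
div, m, (i, j), lengths = nj_first_step(D)
print("Div:", div)
print("join", labels[i], "and", labels[j], "with branch lengths", lengths)
```

On this matrix A and B are joined with branch lengths of 1 and 4, the values of the true tree, whereas UPGMA would have started with A and C.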
Building trees with distance methods 
• Starting with a star-like tree, the nodes are created sequentially 
[Figure : the star-like tree of A to F, then the first node, which joins A and B with branches of 1 and 4] 
…
Advantages and disadvantages of 
the Neighbor-Joining method 
• Fast method that will always produce a reasonable tree. Always 
produces the same tree if the same alignment is used 
• Relaxes the most unrealistic assumption of UPGMA (a constant rate of evolution) 
• Long-branch attraction : two taxa with similar convergent 
properties (increased GC content or high evolutionary rates) will 
tend to group together
How to test the reliability of trees ? 
• One popular method : BOOTSTRAPPING 
– Randomly generate new alignments from the original one, by drawing 
positions (columns) with replacement 
– The new alignments have the same length as the original one, but a slightly different 
composition (i.e. some positions are represented 
more than once and some positions are omitted) 
– Tree reconstruction is applied to each of these new alignments 
– Each cluster of the original tree is checked to see how often it 
occurs in the bootstrapped trees. The more often a group appears, the 
higher the bootstrap value of the corresponding node
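The resampling step can be sketched in Python as follows. This is a minimal sketch under loud assumptions : the helper names are hypothetical, and "tree reconstruction" is replaced by a crude stand-in (which pair of taxa has the smallest p-distance) so that the example stays self-contained. A real analysis would rebuild a full tree for every replicate.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Draw alignment columns with replacement; the length is unchanged."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


def p_distance(seq1, seq2):
    """Proportion of positions at which the two sequences differ."""
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def closest_pair(alignment):
    """Stand-in for tree building: the pair of taxa with the smallest p-distance."""
    names = sorted(alignment)
    pairs = [(a, b) for k, a in enumerate(names) for b in names[k + 1:]]
    return min(pairs, key=lambda p: p_distance(alignment[p[0]], alignment[p[1]]))


def support(alignment, pair, replicates=1000, seed=0):
    """Percentage of replicates in which `pair` is the closest pair."""
    rng = random.Random(seed)
    hits = sum(closest_pair(bootstrap_alignment(alignment, rng)) == pair
               for _ in range(replicates))
    return 100 * hits / replicates


alignment = {"A": "ATGCGAGTTTAGCAG",
             "B": "ATGCGAGCTTAACTG",
             "C": "ATACTAGCTTAGCTG",
             "D": "ATGCTATCTTAGGTG"}
print("support for A+B:", support(alignment, ("A", "B")), "%")
```

Fixing the seed makes the run reproducible ; different seeds give slightly different support values, which is expected for a stochastic procedure.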
How to test the reliability of trees ? 
• Bootstrapping example : 1) The Data 
x y 
1 0.969977 
2 1.744463 
3 3.073277 
4 4.510589 
5 5.471489 
6 5.599175 
7 7.03988 
8 7.812655 
9 8.913299 
10 9.971481 
11 9.98552 
12 10.24078 
13 10.59902 
14 12.61131 
15 12.63132 
16 13.83974 
17 16.03453 
18 17.27271 
19 19.25622 
20 19.26901 
[Plot : Original Data, Y against X (both axes from 0 to 20), with the fitted line y = 0.9176x + 0.2072, R² = 0.9794]
How to test the reliability of trees ? 
• Bootstrapping example : 2) Resampling
How to test the reliability of trees ? 
• Bootstrapping example : 3) Analyse the Resamples
How to test the reliability of trees ? 
• Bootstrapping example : 4) Assess the reliability of the original 
estimates with the dispersion of the estimates of the resamples 
[Plot : Original Data + Bootstraps, Y against X (both axes from 0 to 20)]
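The four steps of this example can be reproduced in plain Python with the 20 data points listed in step 1. This is a sketch only : the function names are illustrative, and the number of replicates and the seed are arbitrary choices.

```python
import random

xs = list(range(1, 21))
ys = [0.969977, 1.744463, 3.073277, 4.510589, 5.471489, 5.599175, 7.03988,
      7.812655, 8.913299, 9.971481, 9.98552, 10.24078, 10.59902, 12.61131,
      12.63132, 13.83974, 16.03453, 17.27271, 19.25622, 19.26901]


def fit_line(xs, ys):
    """Least-squares slope and intercept of y against x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bootstrap_slopes(xs, ys, replicates=1000, seed=0):
    """Refit the line on data points drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(replicates):
        picks = [rng.randrange(n) for _ in range(n)]
        bx = [xs[k] for k in picks]
        by = [ys[k] for k in picks]
        if len(set(bx)) < 2:  # a line cannot be fitted through a single x value
            continue
        slopes.append(fit_line(bx, by)[0])
    return slopes


slope, intercept = fit_line(xs, ys)
slopes = bootstrap_slopes(xs, ys)
print(f"original fit: y = {slope:.4f}x + {intercept:.4f}")
print(f"bootstrap slopes range from {min(slopes):.3f} to {max(slopes):.3f}")
```

The spread of the bootstrap slopes around the original estimate is the "dispersion" referred to in step 4.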
How to test the reliability of trees ? 
• BOOTSTRAPPING : 
Taxon A : ATGCGAGTTTAGCAG 
Taxon B : ATGCGAGCTTAACTG 
Taxon C : ATACTAGCTTAGCTG 
Taxon D : ATGCTATCTTAGGTG 
[Figure : the original tree and the trees built from the resampled alignments s1 to s4] 
A+B : 4/4 = 100% 
C+D : 3/4 = 75% 
A+B+C+D : 4/4 = 100% 
[Figure : the original tree of A, B, C and D with the bootstrap values 100, 100 and 75 on its nodes]
Genetic distances 
• A multitude of forces act on sequences (mutation, selection, drift) 
and therefore two sequences coming from a common ancestor will 
diverge with time 
• The problem with counting the number of differences (p-distance) is 
that it does not take into account multiple substitutions at the same 
site 
• Therefore we need to model the substitution process 
=> time-homogeneous, stationary, continuous-time Markov process
Genetic distances 
Example : double substitution 
ATGTCTTTG ATGTCGTTG 
ATGTCATTG 
* * 
ATGTCATTG ATGTCATTG 
p-distance = 1 but 2 substitutions occurred !
Genetic distances 
Example : back-mutation 
ATGTCTTTG ATGTCATTG 
ATGTCATTG 
* * 
ATGTCATTG ATGTCATTG 
p-distance = 0 but 2 substitutions occurred !
Genetic distances 
Example : convergence 
ATGTCTTTG ATGTCTTTG 
ATGTCATTG 
* 
ATGTCATTG ATGTCTTTG 
p-distance = 0 but 2 substitutions occurred ! 
*
Genetic distances 
• How does the p-distance correlate with speciation time ? 
When we look at the divergence of proteins in distantly related 
organisms, we expect a linear relation (i.e. the more distant two 
organisms are, the fewer identities they share) 
=> correct in principle, but we always underestimate the genetic 
distance if we only count the number of differences
Genetic distances 
• How does the p-distance correlate with speciation time ? 
[Plot : observed p difference (y axis, 0 to 0.8) against an x axis running from 0 to 3] 
Non-linear relation because of multiple, parallel, and back-substitutions
Genetic distances 
• How to model sequence evolution ? (Jukes and Cantor, 1969) 
– All possible substitutions have the same probability 
– All 4 nucleotides have the same frequency = 25% 
– The chance for a particular substitution is a simple function of time 
– The chance for a nucleotide to not change is therefore a decreasing 
function of time 
– Two random sequences (diverged for an infinite time) will still have 25% 
identity (there are only 4 nucleotides)
Genetic distances 
• The JC69 matrix : 
from \ to A C G T 
A 1/4+3/4*X 1/4-1/4*X 1/4-1/4*X 1/4-1/4*X 
C 1/4-1/4*X 1/4+3/4*X 1/4-1/4*X 1/4-1/4*X 
G 1/4-1/4*X 1/4-1/4*X 1/4+3/4*X 1/4-1/4*X 
T 1/4-1/4*X 1/4-1/4*X 1/4-1/4*X 1/4+3/4*X 
$X = e^{-\mu t}$ 
Sums of columns = sums of rows : the rate of appearance of 
nucleotides 
is the same as the rate of disappearance (nucleotides are at equilibrium)
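A short Python sketch of this matrix makes its two limits visible : with X = 1 (no time elapsed) nothing changes, and with X close to 0 (very long time) every nucleotide is found with a probability of 25%. The function name and the argument `mu_t` (the product μ·t) are illustrative.

```python
import math


def jc69_probabilities(mu_t):
    """JC69 matrix as a nested dict: probabilities[from][to], with X = exp(-mu*t)."""
    x = math.exp(-mu_t)
    same = 0.25 + 0.75 * x       # the nucleotide is unchanged
    different = 0.25 - 0.25 * x  # one specific other nucleotide
    return {a: {b: (same if a == b else different) for b in "ACGT"}
            for a in "ACGT"}


for mu_t in (0.0, 0.5, 50.0):
    row = jc69_probabilities(mu_t)["A"]
    print(mu_t, {to: round(p, 3) for to, p in row.items()})
```

Each row sums to 1, whatever the value of μ·t.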
Genetic distances 
• How to model sequence evolution ? (Jukes and Cantor, 1969) 
– Example : we count 20 differences between two 100bp-long sequences 
• d = -3/4 * ln( 1 - 4/3 * p ) 
• p = 0.2 
• d = 0.233 
• => about 3 substitutions have occurred that we do not see (23.3 expected per 100 sites against 20 observed), because 
they occurred at a position where another substitution had already occurred 
– Does this now model the substitution process adequately ?
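The correction is a one-line formula ; a minimal Python sketch (function name illustrative) applied to the example of 20 differences in 100 bp :

```python
import math


def jc69_distance(p):
    """Jukes-Cantor distance for an observed proportion of differences p."""
    if not 0 <= p < 0.75:
        # At p >= 0.75 the sequences look random and the distance is undefined.
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


p = 20 / 100
d = jc69_distance(p)
print(f"p = {p}, d = {d:.4f}")
print(f"expected substitutions per 100 sites: {100 * d:.1f}")
```

The correction grows quickly as p approaches 0.75, the value expected between two unrelated sequences.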
Genetic distances 
• How to model sequence evolution ? Some facts 
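– Transitions are more likely than transversions 
purines : A, G 
pyrimidines : C, T 
(transition : purine ↔ purine or pyrimidine ↔ pyrimidine ; transversion : purine ↔ pyrimidine)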
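Counting the two kinds of substitution between two aligned sequences takes only a few lines of Python. This is a sketch with an illustrative function name ; positions with a gap or an ambiguous character are simply skipped.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def count_ts_tv(seq1, seq2):
    """Return (transitions, transversions) between two aligned sequences."""
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # identical, gap or ambiguous position
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # change within the same chemical class
        else:
            transversions += 1  # purine <-> pyrimidine
    return transitions, transversions


print(count_ts_tv("AACC", "GTTC"))
```

In the small example, A/G and C/T are transitions and A/T is a transversion.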
Genetic distances 
• How to model sequence evolution ? Some facts 
– Not all positions evolve at the same rate : 
the chance that a substitution changes the amino acid is 
different for the third codon position than for the 
other positions
Genetic distances 
• How to model sequence evolution ? Some facts 
– Not all positions evolve at the same rate : 
some codons are under strong purifying 
selection, while some others are under 
diversifying selection 
=> they do not evolve at the same rate
Genetic distances 
• How to model sequence evolution ? 
– Better models have been designed to give each type of substitution 
its own rate. 
– Rate heterogeneity models take into account the differences between 
positions. Some positions are allowed to evolve faster than others 
– Genomes have their own nucleotide compositions (GC-content)
Genetic distances 
• Some models of nucleotide substitution 
(a to f : the six substitution rates, b and e being the transitions ; A, C, G, T : the nucleotide frequencies) 
- JC69 : a=b=c=d=e=f ; A=C=G=T=1/4 
- K80 : b=e, a=c=d=f ; A=C=G=T=1/4 
- HKY85 : b=e, a=c=d=f ; A ≠ C ≠ G ≠ T 
- TN93 : b, e, a=c=d=f ; A ≠ C ≠ G ≠ T 
- GTR : a, b, c, d, e, f ; A ≠ C ≠ G ≠ T 
More models are possible (12-parameters, codons) but are generally not used
Genetic distances 
• Site heterogeneity models 
– Usually described as a Gamma distribution (discretized in 4 – 10 
categories) 
– An arbitrary proportion of invariant sites is sometimes added 
[Plot : Gamma densities for k=1, k=1.5, k=3 and k=5 (x axis from 0 to 5, y axis from 0 to 1)]
Genetic distances 
• Which model to choose ? 
– The simplest models make unrealistic assumptions 
– Why don’t we always choose the most complex models ? 
• Difficult to compute 
• Parameter values difficult to estimate from the data 
• Danger of overfitting
HOLY DATA 
MODEL 
HYPOTHESIS
Genetic distances 
• Overfitting 
[Plot : Measurement of some phenomenon, Output (0 to 90) against Input (0 to 10)]
Genetic distances 
• Overfitting 
R² = 1 !?
Genetic distances 
• How to choose the appropriate model ? 
– Likelihood ratio tests for nested models (more on that later) 
– Many information criteria have been designed (more on that later also) 
– Trial and error, depending on the dataset 
• more data -> more complex models 
• better data -> more complex models 
• Literature search
File formats (flat files) 
• FASTA (.fas, .fst, .fasta) 
Most common sequence format, no header 
>seq1 
ATCGTGCATACGAGCT 
>seq2 
ATCGTGCATACGACGT 
>seq3 
ATCGTGCATACGAAGT
File formats (flat files) 
• NEXUS (.nex) 
Contains blocks with sequence and tree information 
#NEXUS 
Begin Data; 
Dimensions ntax=3 nchar=16; 
Format datatype=Nucleotide gap=-; 
[insert comment here] 
Matrix 
seq1 ATCGTGCATACGAGCT 
seq2 ATCGTGCATACGACGT 
seq3 ATCGTGCATACGAAGT 
; 
End;
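Converting between the two formats is a common chore. A minimal Python sketch, with hypothetical helper names, that reads FASTA text and writes a NEXUS data block like the one above (it handles only the simple case of aligned nucleotide sequences and short names without spaces) :

```python
def read_fasta(text):
    """Parse FASTA text into a dict {name: sequence}, keeping the input order."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the description line as the name.
            name = (line[1:].split() or ["unnamed"])[0]
            records[name] = ""
        elif name is not None:
            records[name] += line  # sequences may span several lines
    return records


def to_nexus(records):
    """Write aligned nucleotide sequences as a NEXUS data block."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned (same length)")
    lines = ["#NEXUS",
             "Begin Data;",
             f"Dimensions ntax={len(records)} nchar={lengths.pop()};",
             "Format datatype=Nucleotide gap=-;",
             "Matrix"]
    lines += [f"{name} {seq}" for name, seq in records.items()]
    lines += [";", "End;"]
    return "\n".join(lines)


fasta = """>seq1
ATCGTGCATACGAGCT
>seq2
ATCGTGCATACGACGT
>seq3
ATCGTGCATACGAAGT
"""
print(to_nexus(read_fasta(fasta)))
```

Seaview and MEGA can also convert between these formats ; the sketch is only meant to show what each file contains.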
Practicals : Phylogenetics Part I 
1. Download the file « PrimatesNuc_1.txt». Open it and identify its format. Rename it 
with the correct extension. 
2. Load the file in BioEdit and run a multiple alignment (select all sequences then click 
« Accessory application -> ClustalW multiple alignment »). Save the resulting file 
3. Load the original file in Seaview and check alignment options (Align -> Alignment 
options). Select ClustalW2 and run a multiple alignment (Align all). Save the resulting 
file. Then, reload the original data, change the option to Muscle and run the alignment 
again. Save this file too 
4. To generate a consistency-based alignment with T-COFFEE, access the web page 
http://www.tcoffee.org/, submit the original data and save the resulting alignment 
5. To generate an alignment with MAFFT, access the web page 
http://mafft.cbrc.jp/alignment/server/index.html, submit the original data with the 
default options, and save the resulting alignment 
6. Now we can compare the alignments obtained by the different methods. Access the 
web page http://bibiserv.techfak.uni-bielefeld.de/altavist/, select option 2 for 
comparing two alignments and compare the different alignments you produced. Which 
alignment is the most different ? Which are the most similar ? Can you guess why ? 
Open the alignments in BioEdit and spot the differences.
Practicals : Phylogenetics Part I 
7. Open MEGA. Import the MAFFT alignment. Open it as « analyse », consider it 
« nucleotides », « coding sequence », with the standard genetic code. Press F4 to 
open the alignment explorer. Try the different options in the « Statistics » menu. 
8. In the « Models » menu, select « Find Best DNA/Protein Model (ML) ». Leave the 
default options and run. Which model has the best likelihood ? Which model is the 
most appropriate ? 
9. Go to the « Distances » menu and select « Compute pairwise distances ». Now 
select the proper options for this analysis (substitution model and site heterogeneity 
model). Are chimps closer to humans or to gorillas ? (you might need to export the 
data to Excel) 
10. Go to the « Phylogeny » menu and select « Construct/Test UPGMA Tree ». Leave 
the default options and compute. Does the human/chimp/gorilla clustering fit with 
your knowledge ? Redo the analysis with appropriate options (substitution model 
and site heterogeneity model). Does it get any better ? 
11. Go to the « Phylogeny » menu and select « Construct/Test Neighbor-Joining Tree ». 
Select the appropriate options and compute the tree. Do chimps cluster with 
humans or gorillas ? Be ready to explain your answer 
12. Try the same with the ClustalW alignment. Draw some conclusions for yourself
Practicals : Phylogenetics Part I 
13. Which of these 4 unrooted trees does not have the same topology as the 3 other 
ones ?

 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 

Dernier (20)

Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 

Phylogenetics1

  • 1. Phylogenetics Workshop Part I : Introduction De Landtsheer Sébastien, University of Luxemburg Ahead of the BeNeLux Bioinformatics Conference 2011
  • 2. Outline of the Workshop Part I : • General introduction • Alignments • Distance-based methods Part II : • Maximum likelihood trees • Bayesian trees Part III : • Advanced Bayesian phylogenetics • Hypothesis testing
  • 3. Outline of Part I • General introduction : what is phylogenetics ? • Basic DNA alignment algorithm • Distance matrices • Distance-based tree inference methods
  • 4. Software featured in Part I • Seaview (http://pbil.univ-lyon1.fr/software/seaview.html) • BioEdit (http://www.mbio.ncsu.edu/bioedit/bioedit.html) • MEGA (http://www.megasoftware.net/) • FigTree (http://tree.bio.ed.ac.uk/software/figtree/)
  • 5. What is Phylogenetics ? • Classification of living species into categories • Study of characters → states • Underlying assumption of evolution (cladogram / dendrogram)
  • 6. What is Phylogenetics • Characters : – Morphological – Biochemical – Genetic • States : – Continuous – Discontinuous
  • 7. Different types of Phylogenetic trees • Phylogenetic tree : graphical representation of our hypothesis about the evolution of a group of organisms • Can represent different quantities (time/genetic distance) and be displayed in different ways • There are several possible methods, and there is no single method that is best
  • 8. Phylogenetic trees jargon • Root (if there is one) • Node • Internal branches • Terminal branches • Leaves, also called tips or OTUs (operational taxonomic units)
  • 9. Properties of Phylogenetic trees • Rooted vs Unrooted
  • 10. Properties of Phylogenetic trees • The real face of unrooted trees =
  • 11. Properties of Phylogenetic trees • The real face of unrooted trees : undirected = Multiple possibilities for rooting the tree
  • 12. Properties of Phylogenetic trees • Where to place the root ? – Midpoint rooting : equally distant from the two most distantly related taxa on the tree. Makes sense but more often than not it is wrong – Outgroup : using one distantly related taxon (uncontroversial) • Marsupial for eutherian study • Treeshrew for primate study • SIV for HIV study
  • 13. Properties of Phylogenetic trees • How to root unrooted trees ? 1) Midpoint rooting : assumes that the rates of evolution have stayed roughly constant
  • 14. Properties of Phylogenetic trees • How to root unrooted trees ? 2) Using an outgroup. Problem : it is difficult to find a proper outgroup (an unambiguous choice that is still not too distant)
  • 15. Properties of Phylogenetic trees • Rooted trees tell a story (directed) Most Recent Common Ancestor (MRCA)
  • 16. Properties of Phylogenetic trees • Branch swapping : only horizontal distance matters =
  • 17. Properties of Phylogenetic trees • Many topologies are always possible : number of possible rooted trees for n sequences = $(2n-3)! \,/\, \left(2^{n-2}\,(n-2)!\right)$
     2 sequences: 1
     3 sequences: 3
     4 sequences: 15
     5 sequences: 105
     6 sequences: 945
     7 sequences: 10395
     8 sequences: 135135
     9 sequences: 2027025
     10 sequences: 34459425
     52 sequences: $> 10^{80}$ (roughly the number of particles in the universe)
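The growth of this number can be checked with a few lines of Python. This is only a sketch: the function name `rooted_tree_count` is made up for illustration, and it relies on the fact that the formula above reduces to the product of the odd numbers 3 × 5 × … × (2n-3).

```python
def rooted_tree_count(n: int) -> int:
    """Number of rooted, bifurcating, labelled trees for n >= 2 sequences.

    Equivalent to (2n-3)! / (2^(n-2) * (n-2)!), written as a product of
    odd numbers so that only integer arithmetic is needed.
    """
    if n < 2:
        raise ValueError("need at least two sequences")
    count = 1
    for k in range(3, 2 * n - 2, 2):  # odd factors 3, 5, ..., 2n-3
        count *= k
    return count


for n in (2, 3, 4, 5, 6, 7, 8, 9, 10):
    print(n, "sequences:", rooted_tree_count(n))
```

Because Python integers have arbitrary precision, the same function can be called for 50 or more sequences without overflow.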
  • 18. DNA alignments • Aligning two sequences: the Needleman–Wunsch algorithm – Construct a similarity matrix – Assign similarity scores based on an arbitrary scoring system – Find the best GLOBAL alignment between the two sequences = the maximum number of residues from one sequence that can be aligned with the other one
  • 19. DNA alignments
              A  T  G  T  A  C  C  G  T
           0  0  0  0  0  0  0  0  0  0
        T  0
        G  0
        A  0
        C  0
        T  0
        C  0
        G  0
        T  0
  • 20. DNA alignments • The score in one cell is the maximum of different possibilities : – 0 (a floor used for local alignment; it never applies with the non-negative scores of this example) – The upper left cell plus the value of the similarity between the two residues – The upper cell plus the value of a gap (in the upper sequence) – The left cell plus the value of a gap (in the left sequence)
     $H_{i,j} = \max\{\, H_{i-1,j-1} + s(a_i, b_j),\; H_{i,j-1} + P_g(k),\; H_{i-1,j} + P_g(k) \,\}$
     There is a penalty for gap opening and for gap extension
  • 21. DNA alignments • For the example we will use the following scoring scheme : – Identity : +1 – Gap : 0 • In real life ClustalW uses different scoring matrices depending on the sequence type (amino acids or DNA) and can be set to use word matches (k-tuples). All parameters are editable
  • 22. DNA alignments
              A  T  G  T  A  C  C  G  T
           0  0  0  0  0  0  0  0  0  0
        T  0  0
        G  0
        A  0
        C  0
        T  0
        C  0
        G  0
        T  0
  • 23. DNA alignments
              A  T  G  T  A  C  C  G  T
           0  0  0  0  0  0  0  0  0  0
        T  0  0  1  1  1  1  1  1  1  1
        G  0
        A  0
        C  0
        T  0
        C  0
        G  0
        T  0
  • 24. DNA alignments
              A  T  G  T  A  C  C  G  T
           0  0  0  0  0  0  0  0  0  0
        T  0  0  1  1  1  1  1  1  1  1
        G  0  0  1
        A  0  1  1
        C  0  1  1
        T  0  1  2
        C  0  1  2
        G  0  1  2
        T  0  1  2
  • 25. DNA alignments
              A  T  G  T  A  C  C  G  T
           0  0  0  0  0  0  0  0  0  0
        T  0  0  1  1  1  1  1  1  1  1
        G  0  0  1  2  2  2  2  2  2  2
        A  0  1  1  2  2  3  3  3  3  3
        C  0  1  1  2  2  3  4  4  4  4
        T  0  1  2  2  3  3  4  4  4  5
        C  0  1  2  2  3  3  4  5  5  5
        G  0  1  2  3  3  3  4  5  6  6
        T  0  1  2  3  4  4  4  5  6  7
  • 26. DNA alignments
              A  T  G  T  A  C  C  G  T
           0  0  0  0  0  0  0  0  0  0
        T  0  0  1  1  1  1  1  1  1  1
        G  0  0  1  2  2  2  2  2  2  2
        A  0  1  1  2  2  3  3  3  3  3
        C  0  1  1  2  2  3  4  4  4  4
        T  0  1  2  2  3  3  4  4  4  5
        C  0  1  2  2  3  3  4  5  5  5
        G  0  1  2  3  3  3  4  5  6  6
        T  0  1  2  3  4  4  4  5  6  7
  • 27. DNA alignments • Final alignment :
        A T G T A C - C G T
        - T G - A C T C G T
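The matrix fill and traceback walked through on the preceding slides can be sketched in Python as follows. The sketch makes several assumptions that are not in the slides: the function name `needleman_wunsch` is illustrative, a mismatch is given a score of 0, and a single linear gap value is used instead of separate gap opening and extension penalties. Several alignments can share the best score, so the traceback may return a different, equally good alignment than the one shown above.

```python
def needleman_wunsch(a, b, match=1, mismatch=0, gap=0):
    """Global alignment of strings a and b.

    Returns (score, aligned_a, aligned_b). Scoring defaults follow the
    worked example: identity +1, gap 0 (mismatch 0 is an assumption).
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):          # first column: only gaps
        H[i][0] = i * gap
    for j in range(1, cols):          # first row: only gaps
        H[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s,   # align the two residues
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a

    # Traceback from the bottom-right cell to the top-left cell.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if H[i][j] == H[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return H[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("ATGTACCGT", "TGACTCGT")
print(score)
print(top)
print(bottom)
```

With this scoring scheme the score is simply the number of identical columns in the alignment, which is why the bottom-right cell of the matrix can be read as "residues aligned".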
  • 28. DNA alignments • More sophisticated alignment methods include : – T-COFFEE computes a multiple alignment that is consistent with the pairwise alignment scores computed from a variety of sources. Computationally intensive (not good for big datasets) – MUSCLE is an iterative refinement algorithm. Very fast – MAFFT uses the fast Fourier transform to detect homologous regions. Very fast – Genetic algorithms (e.g. SAGA) generate a population of alignments that evolves according to selection and crossing. Very slow, but custom scoring functions can be defined. Need to be run several times (stochastic) – Hidden Markov models (HMMs) used to be inaccurate methods. They are better now but still slow and difficult to use
  • 29. DNA alignments • Good practice for alignments : – Use a variety of algorithms – Align at the nucleotide level but also at the amino acid level (TranslatorX or manually) – Compare the different outputs – Check manually : • Consistency with the ORF (frame-shifts) • Sequencing errors – The alignment can also be seen as a hypothesis, therefore it needs to make sense from the biological point of view : genes have to be HOMOLOGS (share ancestry)
  • 30. Building trees with distance methods • The distance between 2 sequences can be calculated in different ways: – number of differences – according to a substitution model • The clustering can be achieved in different ways: – UPGMA – Neighbor-joining – (Parsimony)
  • 31. Building trees with distance methods • Building a UPGMA tree with the number of differences : 1. Calculate the pairwise distance matrix
              A    B    C    D    E    F
        A     0    1    3    6    7   10
        B     1    0    3    6    7   10
        C     3    3    0    5    6    9
        D     6    6    5    0    1    7
        E     7    7    6    1    0    8
        F    10   10    9    7    8    0
  • 32. Building trees with distance methods • Building a UPGMA tree with the number of differences : 2. Group the 2 most closely related sequences
              A    B    C    D    E    F
        A     0    1    3    6    7   10
        B     1    0    3    6    7   10
        C     3    3    0    5    6    9
        D     6    6    5    0    1    7
        E     7    7    6    1    0    8
        F    10   10    9    7    8    0
     Tree so far : A and B are joined, with branch lengths 0.5 and 0.5
  • 33. Building trees with distance methods • Building a UPGMA tree with the number of differences : 3. Recalculate the distance matrix and take the next smallest distance
              A/B    C    D    E    F
        A/B     0    3    6    7   10
        C       3    0    5    6    9
        D       6    5    0    1    7
        E       7    6    1    0    8
        F      10    9    7    8    0
     Tree so far : A and B joined (0.5, 0.5); D and E joined (0.5, 0.5)
  • 34. Building trees with distance methods • Building a UPGMA tree with the number of differences : 3. Recalculate the distance matrix and take the next smallest distance
              A/B     C    D/E     F
        A/B     0     3    6.5    10
        C       3     0    5.5     9
        D/E   6.5   5.5      0   7.5
        F      10     9    7.5     0
     Tree so far : A and B joined (0.5, 0.5); D and E joined (0.5, 0.5); C joins A/B with a branch of 1.5, the A/B node lying on a branch of 1
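  • 35. Building trees with distance methods • Building a UPGMA tree with the number of differences : 3. Recalculate the distance matrix and take the next smallest distance (here each new distance is the plain average of the two merged rows)
                A/B/C   D/E     F
        A/B/C      0      6   9.5
        D/E        6      0   7.5
        F        9.5    7.5     0
     Tree so far : A and B joined (0.5, 0.5); D and E joined (0.5, 0.5); C joined to A/B (1.5 and 1); A/B/C and D/E are joined by branches of 1.5 and 2.5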
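  • 36. Building trees with distance methods • Building a UPGMA tree with the number of differences : 3. Recalculate the distance matrix and take the next smallest distance
                    A/B/C/D/E     F
        A/B/C/D/E       0       8.5
        F             8.5         0
     Final tree : A and B joined (0.5, 0.5); D and E joined (0.5, 0.5); C joined to A/B (1.5 and 1); A/B/C and D/E joined (1.5 and 2.5); F joins last with a branch of 4.25, the A/B/C/D/E node lying on a branch of 1.25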
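The clustering steps above can be sketched in Python. The function name `average_linkage` and its `size_weighted` switch are illustrative assumptions. With the default setting, the distance from a new cluster to another cluster is the plain average of the two merged rows, which reproduces the numbers on the slides (strictly speaking this is the WPGMA variant). With `size_weighted=True`, each merged row is weighted by its number of tips, which is the textbook UPGMA average and gives slightly different values once clusters of unequal size are merged.

```python
from itertools import combinations


def average_linkage(names, matrix, size_weighted=False):
    """Agglomerative clustering on a symmetric distance matrix.

    Returns a list of (cluster_label, node_height) in the order of merging.
    The height of a node is half the distance between the merged clusters.
    """
    sizes = {name: 1 for name in names}            # cluster label -> number of tips
    d = {frozenset((a, b)): matrix[i][j]
         for (i, a), (j, b) in combinations(enumerate(names), 2)}
    merges = []
    while len(sizes) > 1:
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2
        new = "/".join(sorted(a.split("/") + b.split("/")))
        wa, wb = (sizes[a], sizes[b]) if size_weighted else (1, 1)
        for c in sizes:
            if c not in (a, b):
                d[frozenset((new, c))] = (
                    wa * d[frozenset((a, c))] + wb * d[frozenset((b, c))]
                ) / (wa + wb)
        sizes[new] = sizes.pop(a) + sizes.pop(b)
        merges.append((new, height))
    return merges


names = ["A", "B", "C", "D", "E", "F"]
matrix = [
    [0, 1, 3, 6, 7, 10],
    [1, 0, 3, 6, 7, 10],
    [3, 3, 0, 5, 6, 9],
    [6, 6, 5, 0, 1, 7],
    [7, 7, 6, 1, 0, 8],
    [10, 10, 9, 7, 8, 0],
]
slide_merges = average_linkage(names, matrix)
upgma_merges = average_linkage(names, matrix, size_weighted=True)
for label, height in slide_merges:
    print(label, height)
```

Branch lengths follow from the node heights: the branch above a cluster is the height of its parent node minus its own height, for example 3 - 1.5 = 1.5 above A/B/C and 3 - 0.5 = 2.5 above D/E.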
  • 37. Building trees with distance methods • Assumption of the UPGMA method : constant rate of evolution across time and for all branches. This assumption is frequently violated in real-life datasets and therefore the UPGMA can find a wrong tree. • How can we relax this assumption ? We calculate the total divergence for each tip and compute a corrected distance matrix • Starting from a star-like tree, we create branches to minimize the length of the tree and agglomeratively join the closest neighbors => Neighbor-joining
  • 38. Building trees with distance methods • Building a Neighbor-Joining tree with the number of differences. TRUE topology, where B has accumulated 4 times as many mutations as A since their divergence (figure : tree with terminal branch lengths A 1, B 4, C 2, D 3, E 2, F 4 and internal branches of length 1)
  • 39. Building trees with distance methods • Building a Neighbor-Joining tree with the number of differences (same true tree as on the previous slide)
              A    B    C    D    E    F
        A     0    5    4    7    6    8
        B     5    0    7   10    9   11
        C     4    7    0    7    6    8
        D     7   10    7    0    5    9
        E     6    9    6    5    0    8
        F     8   11    8    9    8    0
     UPGMA would cluster A and C together because B is more distant
  • 40. Building trees with distance methods • A global divergence is calculated for each tip by summing all its distances, and a new distance matrix is computed
              A    B    C    D    E    F
        A     0    5    4    7    6    8
        B     5    0    7   10    9   11
        C     4    7    0    7    6    8
        D     7   10    7    0    5    9
        E     6    9    6    5    0    8
        F     8   11    8    9    8    0
        Div  30   42   32   38   34   44
     Div(A) = Σi dist(A,i) = 5+4+7+6+8 = 30
     Div(B) = Σi dist(B,i) = 5+7+10+9+11 = 42
     Div(C) = Σi dist(C,i) = 32
     Div(D) = Σi dist(D,i) = 38
     Div(E) = Σi dist(E,i) = 34
     Div(F) = Σi dist(F,i) = 44
     $M(i,j) = \mathrm{dist}(i,j) - \dfrac{\mathrm{Div}(i) + \mathrm{Div}(j)}{N-2}$, with N = 6 tips
     M(A,B) = 5-(30+42)/4 = -13
     M(A,C) = 4-(30+32)/4 = -11.5 etc…
                A       B       C       D       E       F
        A       0     -13   -11.5     -10     -10   -10.5
        B     -13       0   -11.5     -10     -10   -10.5
        C   -11.5   -11.5       0   -10.5   -10.5     -11
        D     -10     -10   -10.5       0     -13   -11.5
        E     -10     -10   -10.5     -13       0   -11.5
        F   -10.5   -10.5     -11   -11.5   -11.5       0
  • 41. Building trees with distance methods • Starting with a star-like tree, the nodes are created sequentially (figure : star-like tree with tips A to F; in the first step A and B are joined, with branch lengths 1 and 4, and so on)
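The first Neighbor-Joining step can be sketched in Python from the distance matrix of the example. The function name `nj_first_step` is illustrative. The branch-length formula used at the end is the standard Neighbor-Joining one and is an addition here, since the slides only show the resulting values 1 and 4. When two pairs tie for the smallest corrected distance (A/B and D/E both have -13), the sketch simply takes the first pair it meets.

```python
def nj_first_step(names, dist):
    """Compute divergences, the corrected matrix M and the first pair to join."""
    n = len(names)
    div = [sum(row) for row in dist]                      # Div(i) = sum of distances
    M = [[0.0 if i == j else dist[i][j] - (div[i] + div[j]) / (n - 2)
          for j in range(n)] for i in range(n)]
    # Pair with the smallest corrected distance (first one found in case of ties).
    i, j = min(((p, q) for p in range(n) for q in range(p + 1, n)),
               key=lambda pq: M[pq[0]][pq[1]])
    # Standard NJ branch lengths from the two tips to their new common node.
    len_i = dist[i][j] / 2 + (div[i] - div[j]) / (2 * (n - 2))
    len_j = dist[i][j] - len_i
    return div, M, (names[i], names[j]), (len_i, len_j)


names = ["A", "B", "C", "D", "E", "F"]
dist = [
    [0, 5, 4, 7, 6, 8],
    [5, 0, 7, 10, 9, 11],
    [4, 7, 0, 7, 6, 8],
    [7, 10, 7, 0, 5, 9],
    [6, 9, 6, 5, 0, 8],
    [8, 11, 8, 9, 8, 0],
]
div, M, pair, lengths = nj_first_step(names, dist)
print("Div:", div)
print("first pair:", pair, "branch lengths:", lengths)
```

A full implementation would replace the joined pair by their new node, recompute the distances to that node, and repeat until only two nodes remain.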
  • 42. Advantages and disadvantages of the Neighbor-Joining method • Fast method that will always produce a reasonable tree. Always produces the same tree if the same alignment is used • Relaxes the most unrealistic assumptions of UPGMA • Long-branch attraction : two taxa with similar converging properties (increased GC content or high evolutionary rates) will tend to group together
  • 43. How to test the reliability of trees ? • One popular method : BOOTSTRAPPING – Randomly generate new alignments from the original one, by drawing positions with replacement – The new alignments have the same length as the original one, but a slightly different composition (i.e. some positions will be represented more than once and some positions will be omitted) – Tree reconstruction is applied to each of these new alignments – The clusters of the original tree are then examined to see how often they occur in the bootstrapped trees. The more often a group appears, the higher the bootstrap value supporting that node
  • 44. How to test the reliability of trees ? • Bootstrapping example : 1) The Data
         x    y
         1    0.969977
         2    1.744463
         3    3.073277
         4    4.510589
         5    5.471489
         6    5.599175
         7    7.03988
         8    7.812655
         9    8.913299
        10    9.971481
        11    9.98552
        12   10.24078
        13   10.59902
        14   12.61131
        15   12.63132
        16   13.83974
        17   16.03453
        18   17.27271
        19   19.25622
        20   19.26901
     Figure : « Original Data », Y plotted against X with the fitted line y = 0.9176x + 0.2072, R² = 0.9794
  • 45. How to test the reliability of trees ? • Bootstrapping example : 2) Resampling
  • 46. How to test the reliability of trees ? • Bootstrapping example : 3) Analyse the Resamples
  • 47. How to test the reliability of trees ? • Bootstrapping example : 4) Assess the reliability of the original estimates from the dispersion of the estimates obtained on the resamples (figure : « Original Data + Bootstraps », Y plotted against X)
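The four steps of this regression example can be sketched in Python using the data of the example. The function names, the number of replicates and the random seed are arbitrary illustrative choices, and the percentile interval at the end is just one common way of summarising the dispersion of the resampled estimates.

```python
import random

x = list(range(1, 21))
y = [0.969977, 1.744463, 3.073277, 4.510589, 5.471489, 5.599175, 7.03988,
     7.812655, 8.913299, 9.971481, 9.98552, 10.24078, 10.59902, 12.61131,
     12.63132, 13.83974, 16.03453, 17.27271, 19.25622, 19.26901]


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bootstrap_slopes(xs, ys, n_rep=1000, seed=1):
    """Slopes fitted on n_rep resamples of the (x, y) pairs, sorted ascending."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_rep:
        idx = [rng.randrange(n) for _ in range(n)]   # draw points with replacement
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:                         # a line needs two distinct x values
            continue
        by = [ys[i] for i in idx]
        slopes.append(fit_line(bx, by)[0])
    return sorted(slopes)


slope, intercept = fit_line(x, y)
slopes = bootstrap_slopes(x, y)
low = slopes[int(0.025 * len(slopes))]
high = slopes[int(0.975 * len(slopes)) - 1]
print(f"original fit: y = {slope:.4f}x + {intercept:.4f}")
print(f"95% percentile interval for the slope: {low:.3f} to {high:.3f}")
```

In phylogenetics the same idea is applied to alignment columns instead of (x, y) points, and the "estimate" that is recomputed on each resample is the tree.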
  • 48. How to test the reliability of trees ? • BOOTSTRAPPING :
        Taxon A : ATGCGAGTTTAGCAG
        Taxon B : ATGCGAGCTTAACTG
        Taxon C : ATACTAGCTTAGCTG
        Taxon D : ATGCTATCTTAGGTG
     Four resampled alignments (s1 to s4) each give a tree, which is compared with the original tree :
        A+B : 4/4 = 100%
        C+D : 3/4 = 75%
        A+B+C+D : 4/4 = 100%
     The nodes of the original tree are then labelled with these bootstrap values (100, 100, 75)
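Resampling alignment columns can be sketched in Python with the four sequences above. Everything beyond the column resampling itself is a simplifying assumption made for illustration: instead of rebuilding a full tree for each replicate, the sketch counts differences between sequences and asks whether the grouping (A,B) versus (C,D) has the smallest summed distance among the three possible pairings of four taxa. The function names, the number of replicates and the seed are arbitrary.

```python
import random

alignment = {
    "A": "ATGCGAGTTTAGCAG",
    "B": "ATGCGAGCTTAACTG",
    "C": "ATACTAGCTTAGCTG",
    "D": "ATGCTATCTTAGGTG",
}


def resample_columns(aln, rng):
    """New alignment of the same length, columns drawn with replacement."""
    length = len(next(iter(aln.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in aln.items()}


def differences(s, t):
    """Number of positions at which two aligned sequences differ."""
    return sum(a != b for a, b in zip(s, t))


def supports_ab_cd(aln):
    """True if pairing (A,B)+(C,D) has a strictly smaller summed distance
    than the two alternative pairings (simplified stand-in for a tree)."""
    def d(p, q):
        return differences(aln[p], aln[q])
    ab_cd = d("A", "B") + d("C", "D")
    return ab_cd < d("A", "C") + d("B", "D") and ab_cd < d("A", "D") + d("B", "C")


rng = random.Random(42)
replicates = [resample_columns(alignment, rng) for _ in range(100)]
support = sum(supports_ab_cd(r) for r in replicates) / len(replicates)
print(f"support for (A,B) versus (C,D): {support:.0%}")
```

In real analyses each replicate is passed to the same tree-building method as the original alignment, and far more than four replicates are used.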
  • 49. Genetic distances • A multitude of forces act on sequences (mutation, selection, drift) and therefore two sequences coming from a common ancestor will diverge with time • The problem with counting the number of differences (p-distance) is that it does not take into account multiple substitutions at the same site • Therefore we need to model the substitution process => time-homogeneous, stationary, continuous-time Markov process
  • 50. Genetic distances Example : double substitution ATGTCTTTG ATGTCGTTG ATGTCATTG * * ATGTCATTG ATGTCATTG p-distance = 1 but 2 substitutions occurred !
  • 51. Genetic distances Example : back-mutation ATGTCTTTG ATGTCATTG ATGTCATTG * * ATGTCATTG ATGTCATTG p-distance = 0 but 2 substitutions occurred !
  • 52. Genetic distances Example : convergence ATGTCTTTG ATGTCTTTG ATGTCATTG * ATGTCATTG ATGTCTTTG p-distance = 0 but 2 substitutions occurred ! *
  • 53. Genetic distances • How does the p-distance correlate with speciation time ? When we look at the divergence of proteins in distantly related organisms, we expect a linear relation (i.e. the more distant the organisms, the fewer identities they share) => correct, but we always underestimate the genetic distance if we only count the number of differences
  • 54. Genetic distances • How does the p-distance correlate with speciation time ? Non-linear relation because of multiple, parallel, and back-substitutions (figure : observed p difference, on a 0 to 0.8 axis, plotted against an axis running from 0 to 3)
  • 55. Genetic distances • How to model sequence evolution ? (Jukes and Cantor, 1969) – All possible substitutions have the same probability – All 4 nucleotides have the same frequency = 25% – The chance for a particular substitution is a simple function of time – The chance for a nucleotide to not change is therefore a decreasing function of time – Two random sequences (diverged for an infinite time) will still have 25% identity (there are only 4 nucleotides)
  • 56. Genetic distances • The JC69 matrix :
        from \ to        A              C              G              T
        A          1/4 + 3/4·X    1/4 - 1/4·X    1/4 - 1/4·X    1/4 - 1/4·X
        C          1/4 - 1/4·X    1/4 + 3/4·X    1/4 - 1/4·X    1/4 - 1/4·X
        G          1/4 - 1/4·X    1/4 - 1/4·X    1/4 + 3/4·X    1/4 - 1/4·X
        T          1/4 - 1/4·X    1/4 - 1/4·X    1/4 - 1/4·X    1/4 + 3/4·X
     with $X = e^{-\mu t}$
     Sums of columns = sums of rows : the rate of appearance of nucleotides is the same as the rate of disappearance (nucleotides are at equilibrium)
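The properties of this matrix can be checked numerically with a short Python sketch. The function name `jc69_matrix` and the parameter values used in the loop are arbitrary illustrative choices; `mu` and `t` enter only through X = exp(-mu·t), as in the matrix above.

```python
import math

NUCLEOTIDES = "ACGT"


def jc69_matrix(mu, t):
    """Substitution probabilities under JC69, as a nested dict P[from][to]."""
    x = math.exp(-mu * t)
    same = 0.25 + 0.75 * x     # probability of observing the same nucleotide
    diff = 0.25 - 0.25 * x     # probability of each of the 3 other nucleotides
    return {a: {b: (same if a == b else diff) for b in NUCLEOTIDES}
            for a in NUCLEOTIDES}


for t in (0.0, 0.5, 2.0, 50.0):
    P = jc69_matrix(mu=1.0, t=t)
    row_sum = sum(P["A"].values())
    col_sum = sum(P[a]["A"] for a in NUCLEOTIDES)
    print(f"t={t:5.1f}  P(A->A)={P['A']['A']:.4f}  P(A->C)={P['A']['C']:.4f}  "
          f"row sum={row_sum:.4f}  column sum={col_sum:.4f}")
```

At t = 0 the matrix is the identity (no change), and for large t every entry tends to 1/4, which matches the 25% identity expected between two random sequences.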
  • 57. Genetic distances • How to model sequence evolution ? (Jukes and Cantor, 1969) – Example : we count 20 differences between two 100bp-long sequences • $d = -\tfrac{3}{4}\,\ln\!\left(1 - \tfrac{4}{3}\,p\right)$ • p = 0.2 • d ≈ 0.233 • => about 3 substitutions (roughly 23 expected against 20 observed) have occurred that we do not see, because they occurred at a position where another substitution had already occurred – Does this now efficiently model the substitution process ?
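The correction of this example is easy to reproduce in Python. The function names are illustrative; the guard against p ≥ 0.75 is added because the logarithm is undefined there (the sequences are then as different as random sequences).

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (same length)")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def jc69_distance(p):
    """Jukes-Cantor corrected distance d = -3/4 * ln(1 - 4/3 * p)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


p = 20 / 100                      # 20 differences over 100 bp
d = jc69_distance(p)
print(f"p = {p:.3f}, d = {d:.4f}")
print(f"expected substitutions over 100 bp: {100 * d:.1f} (20 observed)")
```

For small p the correction is negligible, while for p approaching 0.75 the corrected distance grows without bound, which is the saturation effect discussed above.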
  • 58. Genetic distances • How to model sequence evolution ? Some facts – Transitions are more likely than transversions (figure : purines A and G, pyrimidines C and T)
  • 59. Genetic distances • How to model sequence evolution ? Some facts – Not all positions evolve at the same rate : the chance for an amino acid change is different for the third position than for the other positions
  • 60. Genetic distances • How to model sequence evolution ? Some facts – Not all positions evolve at the same rate : some codons are under strong purifying selection, while others are under diversifying selection => they do not evolve at the same rate
  • 61. Genetic distances • How to model sequence evolution ? – Better models have been designed to take into account the individuality of each substitution rate. – Rate heterogeneity models take into account the differences between positions. Some positions are allowed to evolve faster than others – Genomes have their own nucleotide compositions (GC-content)
  • 62. Genetic distances • Some models of nucleotide substitution
        Model    Substitution rates     Base frequencies
        JC69     a=b=c=d=e=f            A=C=G=T=1/4
        K80      b=e, a=c=d=f           A=C=G=T=1/4
        HKY85    b=e, a=c=d=f           A ≠ C ≠ G ≠ T
        TN93     b, e, a=c=d=f          A ≠ C ≠ G ≠ T
        GTR      a, b, c, d, e, f       A ≠ C ≠ G ≠ T
     More models are possible (12-parameters, codons) but are generally not used
  • 63. Genetic distances • Site heterogeneity models – Usually described as a Gamma distribution (discretized in 4 – 10 categories) – An arbitrary proportion of invariant sites is sometimes added (figure : Gamma density curves for k=1, k=1.5, k=3 and k=5)
  • 64. Genetic distances • Which model to choose ? – The simplest models make unrealistic assumptions – Why don’t we always choose the most complex models ? • Difficult to compute • Parameter values difficult to get from the data • Danger of overfitting
  • 65. HOLY DATA MODEL HYPOTHESIS
  • 66. Genetic distances • Overfitting (figure : « Measurement of some phenomenon », Output plotted against Input)
  • 67. Genetic distances • Overfitting R2 = 1 !?
  • 68. Genetic distances • How to choose the appropriate model ? – Likelihood ratio tests for nested models (more on that later) – Many information criteria have been designed (more on that later also) – Trial and error, depending on the dataset • more data -> more complex model • better data -> more complex models • Literature search
  • 69. File formats (flat files) • FASTA (.fas, .fst, .fasta) Most common sequence format, no header
        >seq1
        ATCGTGCATACGAGCT
        >seq2
        ATCGTGCATACGACGT
        >seq3
        ATCGTGCATACGAAGT
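Reading this format takes only a few lines of Python. The sketch below is a minimal reader written for illustration (the function name `parse_fasta` is an assumption); it takes everything after ">" as the sequence name and joins sequence lines that are wrapped over several lines.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()       # header line: start a new record
            records[name] = []
        elif name is not None:
            records[name].append(line)    # sequence line (possibly wrapped)
    return {n: "".join(parts) for n, parts in records.items()}


example = """>seq1
ATCGTGCATACGAGCT
>seq2
ATCGTGCATACGACGT
>seq3
ATCGTGCATACGAAGT
"""
sequences = parse_fasta(example)
for seq_name, seq in sequences.items():
    print(seq_name, len(seq), seq)
```

For real work an established library is usually preferable, since it also handles less regular files.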
  • 70. File formats (flat files) • NEXUS (.nex) Contains blocks with sequence and tree information
        #NEXUS
        Begin Data;
          Dimensions ntax=3 nchar=16;
          Format datatype=Nucleotide gap=-;
          [insert comment here]
          Matrix
          seq1 ATCGTGCATACGAGCT
          seq2 ATCGTGCATACGACGT
          seq3 ATCGTGCATACGAAGT
          ;
        End;
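Writing such a data block from a set of aligned sequences can be sketched in Python. The function name `to_nexus` is an assumption made for illustration, and the layout (a `Matrix` command closed by a semicolon) follows the usual form of a NEXUS data block; programs differ in how strictly they read the format, so the output should be checked with the software actually used.

```python
def to_nexus(sequences):
    """Format a dict {name: aligned sequence} as a minimal NEXUS data block."""
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same (aligned) length")
    nchar = lengths.pop()
    lines = [
        "#NEXUS",
        "Begin Data;",
        f"  Dimensions ntax={len(sequences)} nchar={nchar};",
        "  Format datatype=Nucleotide gap=-;",
        "  Matrix",
    ]
    lines += [f"  {name} {seq}" for name, seq in sequences.items()]
    lines += ["  ;", "End;"]
    return "\n".join(lines)


nexus_text = to_nexus({
    "seq1": "ATCGTGCATACGAGCT",
    "seq2": "ATCGTGCATACGACGT",
    "seq3": "ATCGTGCATACGAAGT",
})
print(nexus_text)
```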
  • 71. Practicals : Phylogenetics Part I
     1. Download the file « PrimatesNuc_1.txt ». Open it and identify its format. Rename it with the correct extension.
     2. Load the file in BioEdit and run a multiple alignment (select all sequences then click « Accessory application -> ClustalW multiple alignment »). Save the resulting file.
     3. Load the original file in Seaview and check the alignment options (Align -> Alignment options). Select ClustalW2 and run a multiple alignment (Align all). Save the resulting file. Then reload the original data, change the option to Muscle and run the alignment again. Save this file too.
     4. To generate a consistency-based alignment with T-COFFEE, access the web page http://www.tcoffee.org/, submit the original data and save the resulting alignment.
     5. To generate an alignment with MAFFT, access the web page http://mafft.cbrc.jp/alignment/server/index.html, submit the original data with the default options, and save the resulting alignment.
     6. Now we can compare the alignments obtained by the different methods. Access the web page http://bibiserv.techfak.uni-bielefeld.de/altavist/, select option 2 for comparing two alignments and compare the different alignments you produced. Which alignment is the most different ? Which are the most similar ? Can you guess why ? Open the alignments in BioEdit and spot the differences.
  • 72. Practicals : Phylogenetics Part I
     7. Open MEGA. Import the MAFFT alignment. Open it as « analyse », consider it « nucleotides », « coding sequence », with the standard genetic code. Press F4 to open the alignment explorer. Try the different options in the « Statistics » menu.
     8. In the « Models » menu, select « Find Best DNA/Protein Model (ML) ». Leave the default options and run. Which model has the best likelihood ? Which model is the most appropriate ?
     9. Go to the « Distances » menu and select « Compute pairwise distances ». Now select the proper options for this analysis (substitution model and site heterogeneity model). Are chimps closer to humans or to gorillas ? (you might need to export the data to Excel)
     10. Go to the « Phylogeny » menu and select « Construct/Test UPGMA Tree ». Leave the default options and compute. Does the human/chimp/gorilla clustering fit with your knowledge ? Redo the analysis with appropriate options (substitution model and site heterogeneity model). Does it get any better ?
     11. Go to the « Phylogeny » menu and select « Construct/Test Neighbor-Joining Tree ». Select the appropriate options and compute the tree. Do chimps cluster with humans or gorillas ? Being able to explain why is important.
     12. Try the same with the ClustalW alignment. Draw some conclusions for yourself.
  • 73. Practicals : Phylogenetics Part I 13. Which of these 4 unrooted trees does not have the same topology as the 3 other ones ?