This document outlines and provides examples of different phylogenetic tree construction methods, including UPGMA and neighbor joining. UPGMA assumes a constant mutation rate and joins clusters based on average distances. Neighbor joining does not assume a constant rate and finds the tree that best satisfies the four-point criterion of additive distances. The examples demonstrate the step-by-step process of applying these methods to distance matrices to build phylogenetic trees through an iterative clustering approach.
3. Phylogeny
Understanding life through time,
over long periods of past time,
the connections between all groups of organisms
as understood by ancestor/descendant
relationships,
Tree of life.
6. Phylogeny
Rooted and Unrooted trees:
– Most phylogenetic methods produce unrooted trees,
because they detect differences between sequences,
but have no means to orient residue changes
relative to time.
7. Phylogeny
Rooted and Unrooted trees:
– Two ways to root an unrooted tree:
The outgroup method: include in the analysis a group of
sequences known a priori to lie outside the group under study;
the root is then necessarily on the branch joining the outgroup to the other
sequences.
The molecular clock hypothesis: all lineages are assumed
to have evolved at the same rate since diverging from their
common ancestor. Root the tree at the midpoint between the
two most distant taxa in the tree, as determined by branch lengths.
Under this hypothesis the root is equidistant from all tree leaves.
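The midpoint rule can be sketched in a few lines of Python. This is a minimal illustration, not a full rooting routine: it assumes the path between the two most distant taxa has already been found, and the function name `midpoint_on_path` and the node names in the example are hypothetical.

```python
def midpoint_on_path(path_edges):
    """Locate the midpoint of the path between the two most distant taxa.

    path_edges: list of (node_from, node_to, branch_length) tuples, in order
    along the path from one taxon to the other.
    Returns (node_from, node_to, offset): the root lies on that edge,
    `offset` away from node_from.
    """
    half = sum(length for _, _, length in path_edges) / 2
    walked = 0.0
    for node_from, node_to, length in path_edges:
        if walked + length >= half:
            return node_from, node_to, half - walked
        walked += length
    raise ValueError("path_edges must not be empty")


# Hypothetical path A - Y0 - Y1 - D with branch lengths 1, 3 and 5.
path = [("A", "Y0", 1.0), ("Y0", "Y1", 3.0), ("Y1", "D", 5.0)]
root_edge = midpoint_on_path(path)
print(root_edge)
```

The total path length here is 9, so the root falls 4.5 from A, which is on the last edge, 0.5 away from Y1.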
10. Phylogeny
Species Tree and Gene Tree:
[Figure: species tree showing the evolutionary relationship between seven eukaryotes]
[Figure: gene tree for Na+/K+ ion pump membrane protein family members]
12. Phylogeny
Additive Tree: A distance matrix corresponding
to a tree is called additive,
– THEOREM: D is additive if and only if:
For every four indices i, j, k, l, the maximum and median of the
three pairwise sums are identical; that is, the indices can be labeled so that:
$$D_{ij} + D_{kl} \le D_{ik} + D_{jl} = D_{il} + D_{jk}$$
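The condition in the theorem can be checked directly: for every set of four taxa, the two largest of the three pairwise sums must be equal. The sketch below assumes a symmetric matrix given as a list of lists; the function name `is_additive`, the tolerance, and the example tree are illustrative assumptions.

```python
from itertools import combinations


def is_additive(D, tol=1e-9):
    """Return True if every quartet of D satisfies the four-point condition."""
    for i, j, k, l in combinations(range(len(D)), 4):
        sums = sorted([D[i][j] + D[k][l], D[i][k] + D[j][l], D[i][l] + D[j][k]])
        # The maximum and the median of the three sums must coincide.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Distances read off a hypothetical tree ((A:1, B:2):3, (C:4, D:5)).
additive = [
    [0, 3, 8, 9],
    [3, 0, 9, 10],
    [8, 9, 0, 9],
    [9, 10, 9, 0],
]

# Same matrix with the A-D distance changed, which breaks additivity.
perturbed = [row[:] for row in additive]
perturbed[0][3] = perturbed[3][0] = 12

print(is_additive(additive), is_additive(perturbed))
```

For the first matrix the three sums are 12, 18 and 18, so the condition holds; in the perturbed matrix they are 12, 18 and 21.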
13. UPGMA
Building Phylogenetic Trees by UPGMA:
– Unweighted Pair Group Method using Arithmetic
averages,
– Assumes a constant mutation rate,
– The two sequences with the shortest
evolutionary distance between them are assumed to
have been the last two to diverge, and are represented by
the most recent internal node.
14. UPGMA
Building Phylogenetic Trees by UPGMA:
– The distance between two clusters:
Assume we have N sequences,
Cluster X has NX sequences, cluster Y has NY sequences,
dXY : the evolutionary distance between X and Y
$$d_{XY} = \frac{1}{N_X N_Y} \sum_{i \in X,\, j \in Y} d_{ij}$$
15. UPGMA
Building Phylogenetic Trees by UPGMA:
– When cluster X and Y are combined to make a new
cluster Z:
No need to use sequence – sequence distances,
Calculate the distance of each cluster (such as W) to the new
cluster Z
$$d_{ZW} = \frac{N_X d_{XW} + N_Y d_{YW}}{N_X + N_Y}$$
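The shortcut for the merged cluster Z gives the same result as averaging all sequence-to-sequence distances. The sketch below compares the two. The sequence names p, q, r, s and their distances are made up for illustration, and the function names are assumptions.

```python
def cluster_distance(d, X, Y):
    """Mean of all pairwise distances between members of X and members of Y."""
    total = sum(d[frozenset((i, j))] for i in X for j in Y)
    return total / (len(X) * len(Y))


def merged_distance(n_x, d_xw, n_y, d_yw):
    """Distance from W to Z = X + Y, using only cluster sizes and cluster distances."""
    return (n_x * d_xw + n_y * d_yw) / (n_x + n_y)


# Hypothetical sequences: X = {p, q}, Y = {r}, W = {s}.
d = {
    frozenset(("p", "s")): 7.0,
    frozenset(("q", "s")): 9.0,
    frozenset(("r", "s")): 4.0,
}
X, Y, W = ["p", "q"], ["r"], ["s"]

d_xw = cluster_distance(d, X, W)          # (7 + 9) / 2 = 8
d_yw = cluster_distance(d, Y, W)          # 4
via_update = merged_distance(len(X), d_xw, len(Y), d_yw)
via_definition = cluster_distance(d, X + Y, W)
print(via_update, via_definition)
```

Both routes give 20/3, which is why the sequence-to-sequence distances are not needed once the cluster distances are known.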
18. UPGMA
Building Phylogenetic Trees by UPGMA:
– Example:
A – D becomes a new cluster, let's say V,
We have to modify the distance matrix,
What are the distances between:
– V and B,
– V and C,
– V and E,
– V and F.
19. UPGMA
Building Phylogenetic Trees by UPGMA:
– Example:
A – D becomes a new cluster, let's say V,
We have to modify the distance matrix,
What are the distances between:
– V and B (Calculate):
$$d_{VB} = \frac{N_A d_{AB} + N_D d_{DB}}{N_A + N_D} = \frac{1 \cdot 6 + 1 \cdot 6}{1 + 1} = 6$$
20. UPGMA
Building Phylogenetic Trees by UPGMA:
– Example:
A – D becomes a new cluster, let's say V,
We have to modify the distance matrix,
What are the distances between:
– V and C (Calculate):
$$d_{VC} = \frac{N_A d_{AC} + N_D d_{DC}}{N_A + N_D} = \frac{1 \cdot 8 + 1 \cdot 8}{1 + 1} = 8$$
21. UPGMA
Building Phylogenetic Trees by UPGMA:
– Example:
A – D becomes a new cluster, let's say V,
We have to modify the distance matrix,
What are the distances between:
– V and E (Calculate):
$$d_{VE} = \frac{N_A d_{AE} + N_D d_{DE}}{N_A + N_D} = \frac{1 \cdot 2 + 1 \cdot 2}{1 + 1} = 2$$
22. UPGMA
Building Phylogenetic Trees by UPGMA:
– Example:
A – D becomes a new cluster, let's say V,
We have to modify the distance matrix,
What are the distances between:
– V and F (Calculate):
$$d_{VF} = \frac{N_A d_{AF} + N_D d_{DF}}{N_A + N_D} = \frac{1 \cdot 6 + 1 \cdot 6}{1 + 1} = 6$$
25. UPGMA
Building Phylogenetic Trees by UPGMA:
– Example:
V – E becomes a new cluster, let's say W,
We have to modify the distance matrix,
What are the distances between:
– W and B,
– W and C,
– W and F.
26. UPGMA
Building Phylogenetic Trees by UPGMA:
– Example:
V – E becomes a new cluster, let's say W,
We have to modify the distance matrix,
What are the distances between:
– W and B (Calculate):
$$d_{WB} = \frac{N_V d_{VB} + N_E d_{EB}}{N_V + N_E} = \frac{2 \cdot 6 + 1 \cdot 6}{2 + 1} = 6$$
27. UPGMA
Building Phylogenetic Trees by UPGMA:
– Example:
V – E becomes a new cluster, let's say W,
We have to modify the distance matrix,
What are the distances between:
– W and C (Calculate):
$$d_{WC} = \frac{N_V d_{VC} + N_E d_{EC}}{N_V + N_E} = \frac{2 \cdot 8 + 1 \cdot 8}{2 + 1} = 8$$
28. UPGMA
Building Phylogenetic Trees by UPGMA:
– Example:
V – E becomes a new cluster, let's say W,
We have to modify the distance matrix,
What are the distances between:
– W and F (Calculate):
$$d_{WF} = \frac{N_V d_{VF} + N_E d_{EF}}{N_V + N_E} = \frac{2 \cdot 6 + 1 \cdot 6}{2 + 1} = 6$$
31. UPGMA
Building Phylogenetic Trees by UPGMA:
– Example:
F – B becomes a new cluster, let's say X,
We have to modify the distance matrix,
What is the distance between:
– W and X.
32. UPGMA
Building Phylogenetic Trees by UPGMA:
– Example:
What is the distance between W and X (Calculate):
$$d_{WX} = \frac{1}{N_W N_X} \sum_{i \in W,\, j \in X} d_{ij} = \frac{1}{N_W N_X}(d_{AB} + d_{AF} + d_{DB} + d_{DF} + d_{EB} + d_{EF}) = \frac{1}{3 \cdot 2}(6 + 6 + 6 + 6 + 6 + 6) = 6$$
35. UPGMA
Building Phylogenetic Trees by UPGMA:
– Example:
X – W becomes a new cluster, let's say Y,
We have to modify the distance matrix,
What is the distance between:
– Y and C.
36. UPGMA
Building Phylogenetic Trees by UPGMA:
– Example:
What is the distance between Y and C (Calculate):
$$d_{YC} = \frac{1}{N_Y N_C} \sum_{i \in Y,\, j \in C} d_{ij} = \frac{1}{N_Y N_C}(d_{AC} + d_{DC} + d_{EC} + d_{BC} + d_{FC}) = \frac{1}{5 \cdot 1}(8 + 8 + 8 + 8 + 8) = 8$$
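The whole UPGMA example can be replayed with a short script. This is a sketch under stated assumptions, not a reference implementation: the full starting matrix is not shown above, so the values used for d_AD and d_BF are placeholders chosen only so that the merges happen in the same order as in the example. All other distances are the ones that appear in the calculations above. The function name `upgma` and the bracketed cluster names are illustrative.

```python
def upgma(labels, dist):
    """Run UPGMA and return the merges as (cluster_x, cluster_y, distance).

    labels: iterable of taxon names.
    dist:   dict mapping frozenset({a, b}) -> distance for every pair of taxa.
    """
    sizes = {name: 1 for name in labels}   # cluster name -> number of sequences
    d = dict(dist)
    merges = []
    while len(sizes) > 1:
        pair = min(d, key=d.get)           # closest pair of clusters
        x, y = sorted(pair)
        z = f"({x},{y})"                   # name of the merged cluster
        merges.append((x, y, d.pop(pair)))
        for w in sizes:
            if w in (x, y):
                continue
            d_xw = d.pop(frozenset((x, w)))
            d_yw = d.pop(frozenset((y, w)))
            # Size-weighted average, as in the cluster update rule.
            d[frozenset((z, w))] = (sizes[x] * d_xw + sizes[y] * d_yw) / (sizes[x] + sizes[y])
        sizes[z] = sizes.pop(x) + sizes.pop(y)
    return merges


# Distances that appear in the worked calculations.
known = {"AB": 6, "AC": 8, "AE": 2, "AF": 6, "BC": 8, "BD": 6, "BE": 6,
         "CD": 8, "CE": 8, "CF": 8, "DE": 2, "DF": 6, "EF": 6}
# ASSUMED placeholder values; the original entries are not shown in the text.
assumed = {"AD": 1, "BF": 4}

dist = {frozenset(k): v for k, v in {**known, **assumed}.items()}
merges = upgma("ABCDEF", dist)
for x, y, d_xy in merges:
    print(x, y, d_xy)
```

With these inputs the clusters merge in the order A–D, V–E, F–B, W–X, Y–C, and the last three merge distances (6 between W and X, 8 between Y and C) match the hand calculations. Under the constant-rate assumption, each new internal node would be placed at half the merge distance.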
39. Fitch-Margoliash Method:
Building Phylogenetic Trees by Fitch-Margoliash:
– Does not assume a constant mutation
rate,
– Assumes that the distances are additive.
41. Fitch-Margoliash Method:
Building Phylogenetic Trees by Fitch-Margoliash:
– The branch lengths:
$$b_1 = \frac{1}{2}(d_{AB} + d_{AC} - d_{BC})$$
$$b_2 = \frac{1}{2}(d_{AB} + d_{BC} - d_{AC})$$
$$b_3 = \frac{1}{2}(d_{AC} + d_{BC} - d_{AB})$$
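The three branch lengths can be computed and checked in a few lines. In this sketch b1, b2 and b3 are taken to be the branches leading to A, B and C from their shared internal node, which is what the formulas imply. The function name and the example distances are hypothetical.

```python
def three_point_branches(d_ab, d_ac, d_bc):
    """Branch lengths from the internal node to A, B and C."""
    b1 = (d_ab + d_ac - d_bc) / 2
    b2 = (d_ab + d_bc - d_ac) / 2
    b3 = (d_ac + d_bc - d_ab) / 2
    return b1, b2, b3


# Hypothetical distances between three sequences.
b1, b2, b3 = three_point_branches(d_ab=5, d_ac=7, d_bc=8)
print(b1, b2, b3)

# Additivity check: each pairwise distance is the sum of two branches.
print(b1 + b2, b1 + b3, b2 + b3)
```

With these inputs the branches come out as 2, 3 and 5, and the pairwise sums reproduce the input distances 5, 7 and 8.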
42. Fitch-Margoliash Method:
Building Phylogenetic Trees by Fitch-Margoliash:
– The distances between clusters are defined as
UPGMA:
$$d_{XY} = \frac{1}{N_X N_Y} \sum_{i \in X,\, j \in Y} d_{ij}$$
$$d_{ZW} = \frac{N_X d_{XW} + N_Y d_{YW}}{N_X + N_Y}$$
44. Fitch-Margoliash Method:
Building Phylogenetic Trees by Fitch-Margoliash:
– Another Example:
      A    B    C    D    E
A     -   22   39   39   41
B          -   41   41   43
C               -   18   20
D                    -   10
E                         -
45. Fitch-Margoliash Method:
Building Phylogenetic Trees by Fitch-Margoliash:
– Another Example:
D and E are the closest sequences
      A    B    C    D    E
A     -   22   39   39   41
B          -   41   41   43
C               -   18   20
D                    -   10
E                         -
47. Fitch-Margoliash Method:
Building Phylogenetic Trees by Fitch-Margoliash:
– Another Example:
Name {A, B, C} as W,
      A    B    C    D    E
A     -   22   39   39   41
B          -   41   41   43
C               -   18   20
D                    -   10
E                         -
48. Fitch-Margoliash Method:
Building Phylogenetic Trees by Fitch-Margoliash:
– Another Example:
Distance between W and D:
      A    B    C    D    E
A     -   22   39   39   41
B          -   41   41   43
C               -   18   20
D                    -   10
E                         -
$$d_{WD} = \frac{1}{N_W N_D} \sum_{i \in W,\, j \in D} d_{ij} = \frac{1}{N_W N_D}(d_{AD} + d_{BD} + d_{CD}) = \frac{1}{3 \cdot 1}(39 + 41 + 18) \approx 33$$
49. Fitch-Margoliash Method:
Building Phylogenetic Trees by Fitch-Margoliash:
– Another Example:
Distance between W and E:
      A    B    C    D    E
A     -   22   39   39   41
B          -   41   41   43
C               -   18   20
D                    -   10
E                         -
$$d_{WE} = \frac{1}{N_W N_E} \sum_{i \in W,\, j \in E} d_{ij} = \frac{1}{N_W N_E}(d_{AE} + d_{BE} + d_{CE}) = \frac{1}{3 \cdot 1}(41 + 43 + 20) \approx 35$$
50. Fitch-Margoliash Method:
Building Phylogenetic Trees by Fitch-Margoliash:
– Another Example:
Branches a, b and c:
$$a = \frac{1}{2}(d_{WD} + d_{DE} - d_{WE}) = \frac{1}{2}(33 + 10 - 35) = 4$$
$$b = 10 - 4 = 6$$
$$c = 33 - 4 = 29$$
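This first Fitch-Margoliash step can be reproduced from the distance matrix of the example. The sketch below assumes one-letter taxon names so that a cluster can be written as a string; the helper name `avg` is hypothetical. It works with unrounded averages, so the value of c differs slightly from the rounded figure above.

```python
# Pairwise distances from the example matrix (upper triangle).
D = {frozenset(k): v for k, v in {
    "AB": 22, "AC": 39, "AD": 39, "AE": 41,
    "BC": 41, "BD": 41, "BE": 43,
    "CD": 18, "CE": 20,
    "DE": 10,
}.items()}


def avg(X, Y):
    """Mean pairwise distance between clusters X and Y (strings of one-letter taxa)."""
    return sum(D[frozenset((i, j))] for i in X for j in Y) / (len(X) * len(Y))


d_wd = avg("ABC", "D")   # 98 / 3, shown rounded to 33
d_we = avg("ABC", "E")   # 104 / 3, shown rounded to 35
d_de = avg("D", "E")     # 10

a = (d_wd + d_de - d_we) / 2   # branch leading to D
b = d_de - a                   # branch leading to E
c = d_wd - a                   # branch leading towards W = {A, B, C}
print(round(a, 2), round(b, 2), round(c, 2))
```

Here a and b come out as 4 and 6, as in the hand calculation, while c is about 28.67 rather than 29 because the averages are not rounded before subtracting.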
52. Fitch-Margoliash Method:
Building Phylogenetic Trees by Fitch-Margoliash:
– Another Example:
Update the distance matrix:
      A    B    C    D    E
A     -   22   39   39   41
B          -   41   41   43
C               -   18   20
D                    -   10
E                         -

          A    B    C  {D,E}
A         -   22   39     40
B              -   41     42
C                   -     19
{D,E}                      -
53. Fitch-Margoliash Method:
Building Phylogenetic Trees by Fitch-Margoliash:
– Another Example:
{D,E} and C are the closest sequences
          A    B    C  {D,E}
A         -   22   39     40
B              -   41     42
C                   -     19
{D,E}                      -
54. Fitch-Margoliash Method:
Building Phylogenetic Trees by Fitch-Margoliash:
– Another Example:
Name {A, B} as W:
          A    B    C  {D,E}
A         -   22   39     40
B              -   41     42
C                   -     19
{D,E}                      -
55. Fitch-Margoliash Method:
Building Phylogenetic Trees by Fitch-Margoliash:
– Another Example:
Distance between W and C:
$$d_{WC} = \frac{1}{N_W N_C} \sum_{i \in W,\, j \in C} d_{ij} = \frac{1}{N_W N_C}(d_{AC} + d_{BC}) = \frac{1}{2 \cdot 1}(39 + 41) = 40$$
      A    B    C    D    E
A     -   22   39   39   41
B          -   41   41   43
C               -   18   20
D                    -   10
E                         -
56. Fitch-Margoliash Method:
Building Phylogenetic Trees by Fitch-Margoliash:
– Another Example:
Distance between W and {D,E} (name {D,E} as X):
$$d_{WX} = \frac{1}{N_W N_X} \sum_{i \in W,\, j \in X} d_{ij} = \frac{1}{N_W N_X}(d_{AD} + d_{AE} + d_{BD} + d_{BE}) = \frac{1}{2 \cdot 2}(39 + 41 + 41 + 43) = 41$$
      A    B    C    D    E
A     -   22   39   39   41
B          -   41   41   43
C               -   18   20
D                    -   10
E                         -
57. Fitch-Margoliash Method:
Building Phylogenetic Trees by Fitch-Margoliash:
– Another Example:
Distance between C and {D,E} (name {D,E} as X):
$$d_{CX} = \frac{1}{N_C N_X} \sum_{i \in C,\, j \in X} d_{ij} = \frac{1}{N_C N_X}(d_{CD} + d_{CE}) = \frac{1}{1 \cdot 2}(18 + 20) = 19$$
      A    B    C    D    E
A     -   22   39   39   41
B          -   41   41   43
C               -   18   20
D                    -   10
E                         -
58. Fitch-Margoliash Method:
Building Phylogenetic Trees by Fitch-Margoliash:
– Another Example:
Branches a, b and c:
$$a = \frac{1}{2}(d_{WC} + d_{CX} - d_{WX}) = \frac{1}{2}(40 + 19 - 41) = 9$$
$$b = 19 - 9 = 10$$
$$c = 40 - 9 = 31$$
60. Fitch-Margoliash Method:
Building Phylogenetic Trees by Fitch-Margoliash:
– Another Example:
Update the distance matrix:
          A    B    C  {D,E}
A         -   22   39     40
B              -   41     42
C                   -     19
{D,E}                      -

            A    B  {C,D,E}
A           -   22     39.5
B                -     41.5
{C,D,E}                   -
61. Fitch-Margoliash Method:
Building Phylogenetic Trees by Fitch-Margoliash:
– Another Example:
Now we are in the trivial case of 3 sequences (remember the
previous example):
            A    B  {C,D,E}
A           -   22     39.5
B                -     41.5
{C,D,E}                   -
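As a worked illustration of this last step, the three-sequence branch-length formulas can be applied to the matrix above, writing X for the cluster {C,D,E}. These numbers are derived here from the matrix and are not taken from the original example; b_A, b_B and b_X denote the branches leading to A, B and X.

$$b_A = \frac{1}{2}(d_{AB} + d_{AX} - d_{BX}) = \frac{1}{2}(22 + 39.5 - 41.5) = 10$$
$$b_B = \frac{1}{2}(d_{AB} + d_{BX} - d_{AX}) = \frac{1}{2}(22 + 41.5 - 39.5) = 12$$
$$b_X = \frac{1}{2}(d_{AX} + d_{BX} - d_{AB}) = \frac{1}{2}(39.5 + 41.5 - 22) = 29.5$$

As a check, b_A + b_B = 22, b_A + b_X = 39.5 and b_B + b_X = 41.5, which reproduces the three distances in the matrix.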
63. The Neighbor-Joining Method:
Building Phylogenetic Trees by Neighbor-Joining:
– The tree with the shortest total branch
length, S, is taken as the best estimate of the true tree,
– Neighbors: a pair of nodes that are separated by just
one other node,
64. The Neighbor-Joining Method:
Building Phylogenetic Trees by Neighbor-Joining:
– Algorithm (Given a distance matrix):
Iterate Until 2 Nodes are left:
– For each node find
– Choose pair (i, j) with smallest
– Merge two nodes i and j into a new internal node Y, and
find branch lengths by
– Update the distance matrix using
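The formulas for the individual steps are not reproduced in the list above. One common formulation of neighbor joining, given here as an assumption rather than as the exact version intended, uses the following quantities for n remaining nodes:

$$u_i = \frac{1}{n - 2} \sum_{k \ne i} D_{ik}$$
$$\text{choose } (i, j) \text{ minimizing } D_{ij} - u_i - u_j$$
$$d_{iY} = \frac{1}{2}(D_{ij} + u_i - u_j), \qquad d_{jY} = D_{ij} - d_{iY}$$
$$D_{Yk} = \frac{1}{2}(D_{ik} + D_{jk} - D_{ij})$$

A minimal Python sketch of the loop under that formulation follows. The function name, the internal node names Y0, Y1, ... and the four-taxon test matrix are all hypothetical.

```python
def neighbor_joining(labels, dist):
    """Return the tree as a list of (node, neighbor, branch_length) edges.

    dist maps frozenset({a, b}) -> distance for every pair of labels.
    Internal nodes are named Y0, Y1, ... (labels must not use these names).
    """
    nodes = list(labels)
    D = dict(dist)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all the others.
        u = {i: sum(D[frozenset((i, k))] for k in nodes if k != i) / (n - 2)
             for i in nodes}
        # Pair with the smallest rate-corrected distance.
        i, j = min(
            ((a, b) for idx, a in enumerate(nodes) for b in nodes[idx + 1:]),
            key=lambda p: D[frozenset(p)] - u[p[0]] - u[p[1]],
        )
        d_ij = D[frozenset((i, j))]
        y = f"Y{next_id}"
        next_id += 1
        d_iy = 0.5 * (d_ij + u[i] - u[j])
        edges.append((i, y, d_iy))
        edges.append((j, y, d_ij - d_iy))
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                D[frozenset((y, k))] = 0.5 * (
                    D[frozenset((i, k))] + D[frozenset((j, k))] - d_ij
                )
        nodes = [k for k in nodes if k not in (i, j)] + [y]
    a, b = nodes
    edges.append((a, b, D[frozenset((a, b))]))
    return edges


# Additive distances from a hypothetical tree ((A:1, B:2):3, (C:4, D:5)).
dist = {frozenset(k): v for k, v in
        {"AB": 3, "AC": 8, "AD": 9, "BC": 9, "BD": 10, "CD": 9}.items()}
edges = neighbor_joining("ABCD", dist)
for edge in edges:
    print(edge)
```

Because the test distances are additive, the recovered branch lengths should be exactly those of the tree they were read from: 1 and 2 for A and B, 4 and 5 for C and D, and 3 for the internal branch.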