Evolution
• Evolution is the process of change in the inherited traits of a population of
organisms from one generation to the next.
• Genes that are passed on to an organism's offspring produce the inherited traits
that are the basis of evolution.
• Mutations in genes can produce new or altered traits in individuals, resulting in
the appearance of heritable differences between organisms. New traits may also
arise from the transfer of genes between populations, as in migration; or between
species, as in horizontal gene transfer.
• In species that reproduce sexually, new combinations of genes are produced by
genetic recombination, which can increase the variation in traits between
organisms.
• Evolution occurs when these heritable differences become more common or rare
in a population.
• It is important to note that biological evolution is a physical process occurring in
the natural realm. The mechanisms that drive evolution also control it.
• Two major mechanisms that drive evolution are:
– Natural selection, a process causing heritable traits that are helpful for survival and
reproduction to become more common in a population, and harmful traits to become
more rare. This occurs because individuals with advantageous traits are more likely to
reproduce, so that more individuals in the next generation inherit these traits.
– Genetic drift, an independent process that produces random changes in the frequency
of traits in a population. In genetic drift, chance plays a vital role in whether a given
trait is passed on, because it affects which individuals happen to survive and reproduce.
• Though the changes produced in any one generation by drift and selection are
small, differences accumulate with each subsequent generation and can, over
time, cause substantial changes in the organisms.
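A minimal sketch of how selection and drift change a trait's frequency from one generation to the next. It assumes a simple Wright-Fisher-style model (fixed population size, non-overlapping generations), which is a choice made for this illustration and is not taken from the slides; the names `next_generation`, `simulate` and `fitness_advantage` are hypothetical.

```python
import random


def next_generation(freq, pop_size, fitness_advantage, rng):
    """Return the trait frequency after one generation.

    Selection shifts the expected frequency; drift is the random
    sampling of pop_size offspring around that expectation.
    """
    # Selection: carriers reproduce (1 + s) times as often as non-carriers.
    weighted = freq * (1.0 + fitness_advantage)
    expected = weighted / (weighted + (1.0 - freq))
    # Drift: each offspring inherits the trait with probability `expected`.
    carriers = sum(1 for _ in range(pop_size) if rng.random() < expected)
    return carriers / pop_size


def simulate(freq, pop_size, fitness_advantage, generations, seed=0):
    """Return the list of trait frequencies, one entry per generation."""
    rng = random.Random(seed)
    trajectory = [freq]
    for _ in range(generations):
        freq = next_generation(freq, pop_size, fitness_advantage, rng)
        trajectory.append(freq)
    return trajectory


# Drift only (no fitness advantage) versus drift plus weak selection.
drift_only = simulate(0.5, pop_size=50, fitness_advantage=0.0, generations=100)
with_selection = simulate(0.5, pop_size=50, fitness_advantage=0.1, generations=100)
print("drift only, final frequency:", drift_only[-1])
print("with selection, final frequency:", with_selection[-1])
```

Each single step changes the frequency only slightly, but the changes accumulate over the generations. In a small population, drift alone can carry a trait to frequency 0 or 1.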
Evolution
• Most of bioinformatics is comparative biology
• Comparative biology is based upon evolutionary
relationships between compared entities
• Evolutionary relationships are normally depicted in a
phylogenetic tree
Phylogenetics
• Phylogenetic trees illustrate the evolutionary
relationships among groups of organisms, or
among a family of related nucleic acid or protein
sequences
• E.g., how might this family have been derived
during evolution?
• The purpose of phylogeny is to reconstruct
the history of life and explain the present
diversity of living creatures. This can be
represented as a huge genealogical tree (the
tree of life).
• Biology is very much about classifying —
and the best means of classification we have
is phylogeny.
Where can phylogeny be used?
• For example, finding out about orthology versus paralogy
• Determining the closest relatives of the organism that you’re interested
in: For instance, if you’re studying a new bacterium, you can sequence its
ribosomal RNA and place it on a phylogenetic tree computed with all known
ribosomal RNAs. This can give you a fairly good idea of who this bacterium
really is
• Discovering the function of a gene: If you’re studying a gene, you can use
phylogenetic trees to be sure that the gene you’re interested in is orthologous
(more about that in a minute) to another well-characterized gene in another
species
• Retracing the origin of a gene: Most genes within a genome travel
together through evolutionary time. However, from time to time, individual
genes may jump from one species to another. Phylogenetic trees are a great way to
reveal such events, which are called horizontal (or lateral) transfers
• Multiple sequence alignment (e.g. ClustalW)
Reminder -- Orthology/Paralogy
Orthologous genes are homologous (corresponding)
genes in different species
Paralogous genes are homologous genes within the
same species (genome)
Phylogenetic tree
[Figure: tree with leaves A, B, C and D; labels: branches, internal nodes, external nodes (leaf, OTU: operational taxonomic unit)]
A tree is an acyclic connected graph that consists of a collection of
nodes (internal and external) and branches connecting them so that
every node can be reached by a unique path from every other node.
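The graph definition above can be checked mechanically. The sketch below tests whether a set of nodes and branches forms a tree. It relies on the standard fact that a connected graph on n nodes is acyclic exactly when it has n - 1 branches. The function name `is_tree` and the internal node names `x` and `y` are hypothetical.

```python
from collections import deque


def is_tree(nodes, branches):
    """Return True if the undirected graph is connected and acyclic."""
    nodes = list(nodes)
    if not nodes:
        return False
    # A connected graph on n nodes is acyclic exactly when it has n - 1 edges.
    if len(branches) != len(nodes) - 1:
        return False
    neighbours = {n: [] for n in nodes}
    for a, b in branches:
        neighbours[a].append(b)
        neighbours[b].append(a)
    # Breadth-first search from any node must reach every other node.
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(nodes)


# Four leaves (A-D) joined through two internal nodes (x, y).
example_nodes = ["A", "B", "C", "D", "x", "y"]
example_branches = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]
print(is_tree(example_nodes, example_branches))
```

Adding a branch between A and B would create a cycle, so there would no longer be a unique path between every pair of nodes and the check would fail.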
Terminology
Node: represents a taxonomic unit. This can be either an existing
species or an ancestor.
Branch: defines the relationship between the taxa in terms of descent
and ancestry.
Topology: the branching patterns of the tree.
Branch length: represents the number of changes that have occurred
in the branch.
Root: the common ancestor of all taxa.
Distance scale: scale that represents the number of differences
between organisms or sequences.
Clade: a group of two or more taxa or DNA sequences that includes
both their common ancestor and all of its descendants.
Leaf/Operational Taxonomic Unit (OTU): taxonomic level of
sampling selected by the user to be used in a study, such as
individuals, populations, species, genera, or bacterial strains.
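One way to see how these terms fit together is a small data structure. This is only a sketch: the class `Node` and its methods are hypothetical names, and the branch lengths in the example are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: Optional[str] = None      # leaf (OTU) label; None for an unnamed ancestor
    branch_length: float = 0.0      # changes on the branch leading to this node
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self):
        return not self.children

    def leaves(self):
        """Names of all OTUs in the clade rooted at this node."""
        if self.is_leaf():
            return [self.name]
        names = []
        for child in self.children:
            names.extend(child.leaves())
        return names

    def topology(self):
        """Branching pattern only, as a Newick-style string without lengths."""
        if self.is_leaf():
            return self.name
        return "(" + ",".join(c.topology() for c in self.children) + ")"


# Root -> (ancestor of A and B, C); all lengths are illustrative.
clade_ab = Node(branch_length=0.5, children=[Node("A", 1.0), Node("B", 2.0)])
root = Node(children=[clade_ab, Node("C", 3.0)])
print(root.topology())    # branching pattern of the whole tree
print(clade_ab.leaves())  # the OTUs belonging to one clade
```

Here the root is the node with no parent, each `branch_length` belongs to the branch above its node, and a clade is whatever `leaves()` returns for an internal node.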
What data are used to build a tree?
• Traditionally: morphological features like
numbers of legs, beak shape, etc
• Today: mostly molecular data, i.e. DNA and
protein sequences
Data for phylogeny
• Can be classified into two categories
− Numerical data
▪ Distances between objects
e.g., distance(man, mouse) = 500
distance(man, chimp) = 100
− Discrete characters
▪ Each character has finite number of states
e.g., number of legs = 1, 2, 3, 4
DNA = {A, C, T, G}
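The two categories are linked: discrete characters such as aligned DNA sequences can be turned into numerical distances. The sketch below uses the proportion of differing sites (often called the p-distance) as one simple choice. The sequences are invented, and gaps or multiple substitutions at one site are not handled.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def distance_matrix(alignment):
    """Return {(name_x, name_y): distance} for every unordered pair."""
    names = list(alignment)
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Made-up aligned sequences: each site is a character with states A, C, G, T.
alignment = {
    "seq1": "ACGTACGT",
    "seq2": "ACGTACGA",
    "seq3": "ACCTTCGA",
}
matrix = distance_matrix(alignment)
for pair, value in matrix.items():
    print(pair, value)
```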
Types of Phylogenetic tree
• Species tree (how are my species related?)
− contains only one representative from each species
− when did speciation take place?
− all nodes indicate speciation events
• Gene tree (how are my genes related?)
− normally contains a number of genes from a single species
− nodes relate either to speciation or gene duplication events
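As a rough illustration of how gene-tree nodes can be labelled, the sketch below applies a species-overlap heuristic. If the two subtrees under a node share a species, the node is called a duplication; otherwise it is called a speciation. This heuristic is not described in the slides and can mislabel nodes when genes have been lost. The leaf naming convention `<species>_<gene>` and the function names are assumptions made for the example.

```python
def species_of(leaf):
    # Assumed naming convention for this sketch: "<species>_<gene>".
    return leaf.split("_")[0]


def label_events(tree, events):
    """Walk a nested-tuple gene tree and record an event for each internal node.

    Returns (species under this node, leaves under this node) and appends
    (leaves, label) to `events` for every internal node visited.
    """
    if isinstance(tree, str):  # a leaf
        return {species_of(tree)}, [tree]
    left, right = tree
    left_species, left_leaves = label_events(left, events)
    right_species, right_leaves = label_events(right, events)
    # A species present on both sides means one genome holds both copies.
    kind = "duplication" if left_species & right_species else "speciation"
    leaves = left_leaves + right_leaves
    events.append((tuple(leaves), kind))
    return left_species | right_species, leaves


# Two gene copies (alpha, beta), each present in human and mouse.
gene_tree = (("human_alpha", "mouse_alpha"), ("human_beta", "mouse_beta"))
events = []
label_events(gene_tree, events)
for leaves, kind in events:
    print(kind, leaves)
```

In this example the two inner nodes separate human from mouse (speciation). The deepest node separates the alpha copies from the beta copies (duplication).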
Features of a phylogenetic tree
Phylogenetic trees are used as visual displays that represent hypothetical,
reconstructed evolutionary events. The tree in this case consists of:
• internal nodes which represent taxonomic units such as species or
genes; the external nodes, those at the ends of the branches,
represent living organisms.
• The lengths of the branches usually represent an elapsed time,
measured in years, or the length of the branches may represent
number of molecular changes (e.g. mutations) that have taken place
between the two nodes. This is calculated from the degree of
differences when sequences are compared (refer to “alignments” later)
• Sometimes, the lengths are irrelevant and the tree represents only the
order of evolution. [In a dendrogram, only the lengths of horizontal (or
vertical, as the case may be) branches count].
• Finally the tree may be rooted or unrooted.
Phylogenetic tree (unrooted)
[Figure: unrooted tree joining A, B, C and D]
In this case, the tree shows the relationship between organisms A, B, C & D
and does not tell us anything about the series of evolutionary events that
led to these genes. There is also no way to tell whether or not a given
internal node is a common ancestor of any 2 external nodes.
In an unrooted tree, an external node represents a contemporary organism.
Internal nodes represent common ancestors of some of the external nodes.
Phylogenetic tree (rooted)
[Figure: rooted tree with leaves A, B, C and D; labels: root, branch, internal node (ancestor), leaf (OTU: operational taxonomic unit), time axis]
In a rooted tree, one node is singled out as the root, the common ancestor
of all the external nodes. An outgroup (a taxon known to lie outside the
group being studied) enables the root of a tree to be located and the
correct evolutionary pathway to be identified.
How to root a tree
• Outgroup – place root between a
distant sequence and the rest of the group
• Midpoint – place root at midpoint
of longest path (sum of branches
between any two OTUs)
• Gene duplication – place root
between paralogous gene copies
[Figure: example trees for the three rooting methods, using taxa D, f, m and h and paralogous genes f-α, h-α, f-β and h-β, with numeric branch lengths]
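Midpoint rooting can be sketched as follows: find the two OTUs that are furthest apart, then walk along the path between them until half of that distance has been covered. The tree below reuses the labels D, f, m and h, but its branch lengths and the internal node names `x` and `y` are hypothetical; they are not read from the figure.

```python
def paths_from(tree, start):
    """Map every node to (distance, path) from start; paths in a tree are unique."""
    found = {start: (0.0, [start])}
    stack = [start]
    while stack:
        node = stack.pop()
        dist, path = found[node]
        for neighbour, length in tree[node].items():
            if neighbour not in found:
                found[neighbour] = (dist + length, path + [neighbour])
                stack.append(neighbour)
    return found


def midpoint_root(tree):
    """Return (node_a, node_b, offset): the root lies on branch a-b, offset from a."""
    leaves = [n for n, nbrs in tree.items() if len(nbrs) == 1]
    best_dist, best_path = -1.0, None
    for leaf in leaves:
        reach = paths_from(tree, leaf)
        for other in leaves:
            dist, path = reach[other]
            if dist > best_dist:
                best_dist, best_path = dist, path
    # Walk along the longest path until half of its length is covered.
    half = best_dist / 2.0
    walked = 0.0
    for a, b in zip(best_path, best_path[1:]):
        length = tree[a][b]
        if walked + length >= half:
            return a, b, half - walked
        walked += length


# Unrooted tree as an adjacency map: node -> {neighbour: branch length}.
example_tree = {
    "D": {"x": 5.0},
    "f": {"x": 1.0},
    "m": {"y": 2.0},
    "h": {"y": 3.0},
    "x": {"D": 5.0, "f": 1.0, "y": 1.0},
    "y": {"x": 1.0, "m": 2.0, "h": 3.0},
}
print(midpoint_root(example_tree))
```

With these made-up lengths the longest path runs from D to h (length 9), so the root falls on the branch leading to D, 4.5 units from D. Midpoint rooting gives a sensible root only when rates of change are roughly equal across the tree.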
Gene trees are not the same as species trees
[Figure: gene tree of Baboon, Orangutan, Gorilla, Human and Chimpanzee]
The above tree is a gene tree, i.e. a tree derived by comparing
orthologous sequences (those derived from the same ancestral
sequence). The assumption is that this gene tree is a more accurate
reflection of a species tree than the one that can be inferred from
morphological data. This assumption is generally correct but it does
not mean that the gene tree is the same as a species tree.
Cladistics and Phenetics
• Cladistic approach: Trees are drawn based on the
conserved characters
• Phenetic approach: Trees are based on some
measure of distance between the leaves
• Molecular phylogenies are inferred from molecular
(usually sequence) data
− either cladistic (e.g. gene order) or phenetic
Clade: A set of species which includes all of the species derived
from a single common ancestor
Tree distances
            human  mouse  fugu  Drosophila
human         x
mouse         6      x
fugu          7      3     x
Drosophila   14     10     9      x
[Figure: tree of human, mouse, fugu and Drosophila with branch-length labels 5, 1, 1, 2, 6 and 1]
Evolutionary (sequence) distance = sequence dissimilarity
Note that with evolutionary methods for generating trees you get
distances between objects by walking from one to the other.
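The "walking" idea can be sketched in a few lines. The tree below is an assumption: it assigns the six branch-length labels from the figure to branches in the one way found to reproduce every entry of the distance table, and the internal node names `x`, `y` and `root` are made up.

```python
# child -> (parent, branch length). The assignment of lengths to branches is
# an assumed reading of the figure, chosen because it matches the table.
PARENT = {
    "human": ("x", 5),
    "mouse": ("x", 1),
    "x": ("y", 1),
    "fugu": ("y", 1),
    "y": ("root", 2),
    "Drosophila": ("root", 6),
}


def climb(node, parent):
    """Map node and each of its ancestors to the branch length walked to reach it."""
    walked = {node: 0}
    total = 0
    while node in parent:
        node, length = parent[node]
        total += length
        walked[node] = total
    return walked


def tree_distance(a, b, parent=PARENT):
    """Sum of branch lengths on the path from leaf a to leaf b."""
    from_a = climb(a, parent)
    from_b = climb(b, parent)
    # The first ancestor of b that is also an ancestor of a is where the walks meet.
    meeting = next(n for n in from_b if n in from_a)
    return from_a[meeting] + from_b[meeting]


for pair in [("human", "mouse"), ("mouse", "fugu"), ("human", "Drosophila")]:
    print(pair, tree_distance(*pair))
```

Walking from human to Drosophila, for example, passes over branches of length 5, 1, 2 and 6, which sums to the table's value of 14.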
Phylogeny methods
1. Distance method – evolutionary distances are computed
for all pairs of OTUs, and a tree is built in which the
distances between OTUs “match” these distances
2. Maximum Parsimony (MP) – choose tree that
minimizes number of changes required to explain data
3. Maximum likelihood (ML) – under a model of sequence
evolution, find the tree which gives the highest
likelihood of the observed data
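As one concrete instance of a distance method, here is a compact sketch of neighbour joining applied to the human/mouse/fugu/Drosophila distances given earlier. The slides do not say which distance method they have in mind, so the choice of neighbour joining is an assumption. The function name, the `frozenset` keys and the generated node names (`node1`, `node2`) are illustrative.

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree from pairwise distances (needs at least 3 OTUs).

    dist maps frozenset({a, b}) -> distance.
    Returns a list of branches (node, neighbouring internal node, length).
    """
    d = dict(dist)
    active = list(names)
    branches = []
    counter = 0
    while len(active) > 3:
        n = len(active)
        # Total distance from each active node to all the others.
        r = {a: sum(d[frozenset((a, b))] for b in active if b != a) for a in active}
        best = None
        for i, a in enumerate(active):
            for b in active[i + 1:]:
                q = (n - 2) * d[frozenset((a, b))] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        counter += 1
        u = f"node{counter}"
        dab = d[frozenset((a, b))]
        len_a = dab / 2 + (r[a] - r[b]) / (2 * (n - 2))
        branches.append((a, u, len_a))
        branches.append((b, u, dab - len_a))
        # Distances from the new internal node to every remaining node.
        for c in active:
            if c not in (a, b):
                d[frozenset((u, c))] = (d[frozenset((a, c))] + d[frozenset((b, c))] - dab) / 2
        active = [c for c in active if c not in (a, b)] + [u]
    # Three nodes remain: connect them to one central node.
    a, b, c = active
    dab, dac, dbc = d[frozenset((a, b))], d[frozenset((a, c))], d[frozenset((b, c))]
    counter += 1
    centre = f"node{counter}"
    branches.append((a, centre, (dab + dac - dbc) / 2))
    branches.append((b, centre, (dab + dbc - dac) / 2))
    branches.append((c, centre, (dac + dbc - dab) / 2))
    return branches


names = ["human", "mouse", "fugu", "Drosophila"]
table = {
    ("human", "mouse"): 6, ("human", "fugu"): 7, ("mouse", "fugu"): 3,
    ("human", "Drosophila"): 14, ("mouse", "Drosophila"): 10, ("fugu", "Drosophila"): 9,
}
dist = {frozenset(pair): value for pair, value in table.items()}
nj_branches = neighbor_joining(names, dist)
for branch in nj_branches:
    print(branch)
```

For these distances the sketch joins human with mouse (branches of 5 and 1) and fugu with Drosophila (branches of 1 and 8), linked by an internal branch of length 1. Path lengths in this tree reproduce the input distances exactly. Maximum parsimony and maximum likelihood work on the aligned characters rather than on a distance table and are not shown here.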