  1. Evolution and Molecular Phylogeny Abida Shehzadi Centre of Excellence in Molecular Biology University of the Punjab, Lahore BIOINFORMATICS
  2. Evolution • Evolution is the process of change in the inherited traits of a population of organisms from one generation to the next. • Genes that are passed on to an organism's offspring produce the inherited traits that are the basis of evolution. • Mutations in genes can produce new or altered traits in individuals, resulting in the appearance of heritable differences between organisms. New traits may also arise from the transfer of genes between populations, as in migration; or between species, as in horizontal gene transfer. • In species that reproduce sexually, new combinations of genes are produced by genetic recombination, which can increase the variation in traits between organisms. • Evolution occurs when these heritable differences become more common or rare in a population.
  3. …continue • It is important to note that biological evolution is a physical process occurring in the natural realm. The mechanisms that drive evolution also control it. • Two major mechanisms that drive evolution are: – Natural selection, a process that causes heritable traits that are helpful for survival and reproduction to become more common in a population, and harmful traits to become rarer. This occurs because individuals with advantageous traits are more likely to reproduce, so that more individuals in the next generation inherit these traits. – Genetic drift, an independent process that produces random changes in the frequency of traits in a population. In genetic drift, chance plays a vital role in whether a given trait is passed on, because it affects which individuals happen to survive and reproduce. • Though the changes produced in any one generation by drift and selection are small, differences accumulate with each subsequent generation and can, over time, cause substantial changes in the organisms.
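As a rough illustration of how drift alone can shift a trait's frequency, the sketch below simulates a simplified Wright-Fisher style population with no selection. The function name `wright_fisher`, the population size and the starting frequency are illustrative assumptions, not values from the slides.

```python
import random

def wright_fisher(freq, pop_size, generations, rng):
    """Track one allele's frequency under pure drift (no selection)."""
    history = [freq]
    for _generation in range(generations):
        # Each gene copy in the next generation is drawn at random
        # from the current generation, so only chance changes freq.
        copies = sum(1 for _copy in range(pop_size) if rng.random() < freq)
        freq = copies / pop_size
        history.append(freq)
    return history

# Hypothetical parameters: 20 gene copies, starting frequency 0.5.
rng = random.Random(1)
trajectory = wright_fisher(0.5, pop_size=20, generations=50, rng=rng)
print(trajectory[:5])
```

Once the frequency reaches 0 or 1 it stays there, which is how drift can fix or eliminate a trait regardless of whether it is helpful.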
  4. Evolution • Most of bioinformatics is comparative biology • Comparative biology is based upon evolutionary relationships between compared entities • Evolutionary relationships are normally depicted in a phylogenetic tree
  5. Phylogenetics • Phylogenetic trees illustrate the evolutionary relationships among groups of organisms, or among a family of related nucleic acid or protein sequences • E.g., how might this family have been derived during evolution?
  6. • The purpose of phylogeny is to reconstruct the history of life and explain the present diversity of living creatures. This can be represented as a huge genealogical tree (the tree of life). • Biology is very much about classifying, and the best means of classification we have is phylogeny.
  7. Where can phylogeny be used? • For example, finding out about orthology versus paralogy • Determining the closest relatives of the organism that you’re interested in: For instance, if you’re studying a new bacterium, you can sequence its ribosomal RNA and place it on a phylogenetic tree computed with all known ribosomal RNAs. This can give you a fairly good idea of what this bacterium really is • Discovering the function of a gene: If you’re studying a gene, you can use phylogenetic trees to check that the gene you’re interested in is orthologous (more about that in a minute) to another well-characterized gene in another species • Retracing the origin of a gene: Most genes within a genome travel together through evolutionary time. However, from time to time, individual genes may jump from one species to another. Phylogenetic trees are a great way to reveal such events, which are called horizontal (or lateral) transfers • Multiple sequence alignment (e.g. ClustalW)
  8. Reminder -- Orthology/Paralogy • Orthologous genes are homologous (corresponding) genes in different species; they diverged through a speciation event. • Paralogous genes are homologous genes within the same species (genome); they arose through a gene duplication event.
  9. Phylogenetic tree • A tree is an acyclic connected graph that consists of a collection of nodes (internal and external) and branches connecting them, so that every node can be reached by a unique path from every other node. [Figure: tree with leaves A, B, C and D; labels: branches, internal nodes, external nodes (leaf, OTU – operational taxonomic unit)]
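The graph definition above can be checked mechanically. The following is a minimal sketch, assuming the tree is given as a list of node names and a list of undirected edges; the function name `is_tree` and the internal node labels `x` and `y` are hypothetical. It relies on the fact that a connected graph with exactly one fewer edge than nodes has no cycles.

```python
from collections import deque

def is_tree(nodes, edges):
    """Return True if the undirected graph is connected and acyclic."""
    # A tree on n nodes always has exactly n - 1 branches.
    if len(edges) != len(nodes) - 1:
        return False
    adjacency = {n: [] for n in nodes}
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    # Breadth-first search: every node must be reachable from the start.
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(nodes)

# Four leaves (A-D) joined through two hypothetical internal nodes.
nodes = ["A", "B", "C", "D", "x", "y"]
edges = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]
print(is_tree(nodes, edges))
print(is_tree(nodes, edges + [("A", "B")]))  # extra branch creates a cycle
```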
  10. Terminology • Node: represents a taxonomic unit. This can be either an existing species or an ancestor. • Branch: defines the relationship between the taxa in terms of descent and ancestry. • Topology: the branching pattern of the tree. • Branch length: represents the number of changes that have occurred in the branch. • Root: the common ancestor of all taxa. • Distance scale: scale that represents the number of differences between organisms or sequences. • Clade: a group of two or more taxa or DNA sequences that includes both their common ancestor and all of their descendants. • Leaf/Operational Taxonomic Unit (OTU): taxonomic level of sampling selected by the user to be used in a study, such as individuals, populations, species, genera, or bacterial strains.
  11. What data are used to build a tree? • Traditionally: morphological features like numbers of legs, beak shape, etc. • Today: mostly molecular data, i.e. DNA and protein sequences
  12. Data for phylogeny • Can be classified into two categories − Numerical data ▪ Distances between objects e.g., distance(man, mouse) = 500 distance(man, chimp) = 100 − Discrete characters ▪ Each character has finite number of states e.g., number of legs = 1, 2, 3, 4 DNA = {A, C, T, G}
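The two data categories are linked: discrete characters such as aligned DNA bases can be turned into a numerical distance. A minimal sketch, assuming two already-aligned sequences of equal length and using the proportion of differing sites (often called the p-distance); the sequences below are made up for illustration.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)

# Hypothetical aligned sequences; they differ at 2 of 8 sites.
print(p_distance("ACGTACGT", "ACGTTCGA"))  # 0.25
```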
  13. Types of Phylogenetic tree • Species tree (how are my species related?) − contains only one representative from each species − when did speciation take place? − all nodes indicate speciation events • Gene tree (how are my genes related?) − normally contains a number of genes from a single species − nodes relate either to speciation or gene duplication events
  14. Features of a phylogenetic tree Phylogenetic trees are used as visual displays that represent hypothetical, reconstructed evolutionary events. The tree in this case consists of: • internal nodes which represent taxonomic units such as species or genes; the external nodes, those at the ends of the branches, represent living organisms. • The lengths of the branches usually represent an elapsed time, measured in years, or the length of the branches may represent number of molecular changes (e.g. mutations) that have taken place between the two nodes. This is calculated from the degree of differences when sequences are compared (refer to “alignments” later) • Sometimes, the lengths are irrelevant and the tree represents only the order of evolution. [In a dendrogram, only the lengths of horizontal (or vertical, as the case may be) branches count]. • Finally the tree may be rooted or unrooted.
  15. Phylogenetic tree (unrooted) • In this case, the tree shows the relationship between organisms A, B, C & D and does not tell us anything about the series of evolutionary events that led to these genes. There is also no way to tell whether or not a given internal node is a common ancestor of any 2 external nodes. • In an unrooted tree, an external node represents a contemporary organism. Internal nodes represent common ancestors of some of the external nodes. [Figure: unrooted tree with leaves A, B, C and D; label: root]
  16. Phylogenetic tree (rooted) • In a rooted tree, one node is designated as the root and is the common ancestor of all the external nodes. An outgroup enables the root of a tree to be located and the correct evolutionary pathway to be identified. [Figure: rooted tree with leaves A, B, C and D along a time axis; labels: root, branch, internal node (ancestor), leaf (OTU – operational taxonomic unit)]
  17. How to root a tree • Outgroup – place the root between a distant sequence and the rest of the group • Midpoint – place the root at the midpoint of the longest path (sum of branches between any two OTUs) • Gene duplication – place the root between paralogous gene copies [Figure: example trees for taxa f, D, m and h, and for paralogous copies f-α, h-α, f-β and h-β, with numeric branch lengths]
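Midpoint rooting can be sketched in a few lines. The example below assumes an unrooted tree stored as a nested dictionary of branch lengths; the helper names, the internal nodes `u` and `v`, and all branch lengths are hypothetical and are not taken from the slide's figure. It finds the longest leaf-to-leaf path with two depth-first passes, then walks that path to its halfway point.

```python
def build_tree(edges):
    """Build a symmetric adjacency dict from (node, node, length) triples."""
    tree = {}
    for a, b, length in edges:
        tree.setdefault(a, {})[b] = length
        tree.setdefault(b, {})[a] = length
    return tree

def farthest(tree, start):
    """Return (node, distance, path) for the node farthest from start."""
    best = (start, 0.0, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        if dist > best[1]:
            best = (node, dist, path)
        for neighbour, length in tree[node].items():
            if neighbour != parent:
                stack.append((neighbour, node, dist + length, path + [neighbour]))
    return best

def midpoint(tree):
    """Locate the midpoint of the longest path between two OTUs."""
    end_a, _, _ = farthest(tree, next(iter(tree)))
    _, total, path = farthest(tree, end_a)
    half, walked = total / 2, 0.0
    for a, b in zip(path, path[1:]):
        if walked + tree[a][b] >= half:
            # Root lies on branch a-b, (half - walked) away from a.
            return (a, b, half - walked), total
        walked += tree[a][b]

# Hypothetical unrooted tree: leaves A-D, internal nodes u and v.
tree = build_tree([("A", "u", 3), ("B", "u", 1), ("u", "v", 2),
                   ("C", "v", 1), ("D", "v", 4)])
print(midpoint(tree))
```

Here the longest path runs between A and D (length 9), so the root falls on the internal branch between `v` and `u`, 0.5 units from `v`.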
  18. Gene trees are not the same as species trees • The tree shown on this slide is a gene tree, i.e. a tree derived by comparing orthologous sequences (those derived from the same ancestral sequence). The assumption is that this gene tree is a more accurate reflection of a species tree than the one that can be inferred from morphological data. This assumption is generally correct, but it does not mean that the gene tree is the same as a species tree. [Figure: gene tree of Baboon, Orangutan, Gorilla, Human and Chimpanzee]
  19. Cladistics and Phenetics • Cladistic approach: Trees are drawn based on the conserved characters • Phenetic approach: Trees are based on some measure of distance between the leaves • Molecular phylogenies are inferred from molecular (usually sequence) data − either cladistic (e.g. gene order) or phenetic Clade: A set of species which includes all of the species derived from a single common ancestor
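  20. Tree distances • Pairwise distance matrix: distance(human, mouse) = 6; distance(human, fugu) = 7; distance(mouse, fugu) = 3; distance(human, Drosophila) = 14; distance(mouse, Drosophila) = 10; distance(fugu, Drosophila) = 9 • Evolutionary distance (sequence distance) = sequence dissimilarity • Note that with evolutionary methods for generating trees you get distances between objects by walking from one to the other. [Figure: tree relating human, mouse, fugu and Drosophila, with branch lengths labeled]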
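Walking from one leaf to another can be sketched as a path sum over branch lengths. The branch lengths below are an assumption: they were chosen so that the path sums reproduce the distance matrix on this slide, and they are not read from the slide's figure. The internal node names `x` and `y` are also hypothetical.

```python
from itertools import combinations

def path_length(tree, start, goal):
    """Sum the branch lengths on the unique path from start to goal."""
    stack = [(start, None, 0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for neighbour, length in tree[node].items():
            if neighbour != parent:
                stack.append((neighbour, node, dist + length))
    raise ValueError("no path between the two nodes")

# Assumed branch lengths (not from the figure); x and y are internal nodes.
tree = {
    "human": {"x": 5},
    "mouse": {"x": 1},
    "fugu": {"y": 1},
    "Drosophila": {"y": 8},
    "x": {"human": 5, "mouse": 1, "y": 1},
    "y": {"fugu": 1, "Drosophila": 8, "x": 1},
}

leaves = ["human", "mouse", "fugu", "Drosophila"]
for a, b in combinations(leaves, 2):
    print(a, b, path_length(tree, a, b))
```

With these lengths every leaf-to-leaf walk equals the corresponding matrix entry, for example human to fugu is 5 + 1 + 1 = 7.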
  21. Phylogeny methods 1. Distance method – evolutionary distances are computed for all pairs of OTUs, and a tree is built in which the distances between OTUs “match” these distances 2. Maximum parsimony (MP) – choose the tree that minimizes the number of changes required to explain the data 3. Maximum likelihood (ML) – under a model of sequence evolution, find the tree that gives the highest likelihood of the observed data
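To make the parsimony criterion concrete, the sketch below scores candidate trees with Fitch's algorithm, one common way to count the minimum number of changes on a fixed tree. The slides do not name a specific algorithm, so this choice is an assumption, as are the toy alignment and the nested-tuple tree representation.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    tree is a leaf name (str) or a (left, right) tuple of subtrees.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Children disagree: one change is required on this part of the tree.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, alignment):
    """Total minimum number of changes over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )

# Hypothetical aligned sequences for four OTUs.
alignment = {"A": "AAG", "B": "AAG", "C": "ATC", "D": "ATC"}
candidates = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
for tree in candidates:
    print(tree, parsimony_score(tree, alignment))
best = min(candidates, key=lambda t: parsimony_score(t, alignment))
```

For this toy alignment the first topology needs 2 changes and the other two need 4 each, so maximum parsimony would prefer grouping A with B and C with D.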