2. Introduction
• Comparative genomics is a large-scale, holistic approach that
compares two or more genomes to discover the similarities and
differences between the genomes and to study the biology of the
individual genomes
• Comparative genomics bears on
– Evolutionary biology and phylogenetic reconstructions of the tree of life,
– Drug discovery programs,
– Function predictions of hypothetical proteins,
– Identification of genes, regulatory motifs and other non-coding DNA
motifs
– Genome flux and dynamics
3. Computational tools for genome-scale sequence alignment
• The first step in comparative genomics analysis is often the
alignment of two genome sequences
• It is a technically challenging problem
Algorithms/tools        URL
BLASTN and MEGABLAST    http://www.ncbi.nlm.nih.gov/BLAST/
GLASS                   http://crossspecies.lcs.mit.edu/
MUMmer                  http://www.tigr.org/software/mummer/
PatternHunter           http://www.bioinformaticssolutions.com/products/ph.php
PipMaker                http://bio.cse.psu.edu/pipmaker/
VISTA                   http://www-gsd.lbl.gov/vista
WABA                    http://www.cse.ucsc.edu/kent/xenoAli/
L. Wei et al. 2002
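To give a feel for why genome-scale alignment is hard, here is a minimal sketch of one common first step, finding short exact matches ("seeds") shared by two sequences. It is an illustration only and is not how any specific tool in the table works; the listed tools use more sophisticated indexing and scoring. The function name `shared_kmer_seeds` and the parameter `k` are assumptions made for this example.

```python
from collections import defaultdict


def shared_kmer_seeds(seq_a: str, seq_b: str, k: int = 11) -> list[tuple[int, int]]:
    """Return (position_in_a, position_in_b) for every exact k-mer match.

    Illustrative only: real genome aligners extend, chain and score seeds,
    and handle reverse-complement matches, which this sketch ignores.
    """
    # Index every k-mer of the first sequence by its start positions.
    index = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        index[seq_a[i:i + k]].append(i)

    # Scan the second sequence and look each k-mer up in the index.
    seeds = []
    for j in range(len(seq_b) - k + 1):
        for i in index.get(seq_b[j:j + k], ()):
            seeds.append((i, j))
    return seeds
```

Seeds that fall on the same diagonal (constant difference between the two positions) are candidates for extension into longer alignments. Memory and time grow with genome size, which is part of what makes the problem technically challenging.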
4. Related terms
• Homology
• Homologous
• Orthologous
• Paralogous
• Xenologous
• Analogous
• Horizontal gene transfer
5. • Homology is the relationship of any two characters (such as two proteins that
have similar sequences) that have descended, usually through divergence, from
a common ancestral character
• Homologues are thus components or characters (such as genes/proteins with
similar sequences) that can be attributed to a common ancestor of the two
organisms during evolution. Homologues can either be orthologues,
paralogues, or xenologues
• Orthologues are homologues that have evolved from a common ancestral gene
by speciation. They usually have similar functions
• Paralogues are homologues that are related or produced by duplication within a
genome. They often have evolved to perform different functions
• Xenologues are homologues that are related by an interspecies (horizontal)
transfer of the genetic material for one of the homologues
• Horizontal (lateral) gene transfer is the movement of genetic material
between species (or genera) other than by vertical descent
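In practice, candidate orthologues between two genomes are often proposed with a "reciprocal best hit" heuristic. This heuristic is not stated in the slides; it is offered here as a rough sketch, and it can be misled by paralogues, gene loss, or horizontal transfer. The function names and the score dictionaries (for example, bit scores from a similarity search) are assumptions made for illustration.

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """Map each query gene to its highest-scoring subject gene.

    `scores` maps (query, subject) pairs to a similarity score.
    """
    best: dict[str, tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a) -> list[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

With toy scores in which `a1` and `b1` pick each other but `a3` also picks `b1`, only `("a1", "b1")` is reported for that gene; `a3` is left unpaired and might be a paralogue or a more distant homologue.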
8. Methods for comparative genomics
• Comparative analysis of genome structure
• Comparative analysis of coding regions
• Comparative analysis of non-coding regions
9. Comparative analysis of genome structure
• Analysis of the global structure of genomes, such as nucleotide
composition, syntenic relationships, and gene ordering, offers insight
into the similarities and differences between genomes
• This provides information on the organization and evolution of the
genomes, and highlights the unique features of individual genomes
• The structure of different genomes can be compared at three levels:
– Overall nucleotide statistics,
– Genome structure at DNA level, and
– Genome structure at gene level.
10. Comparison of overall nucleotide statistics
• Overall nucleotide statistics include
– Genome size,
– Overall (G+C) content,
– Regions of different (G+C) content, and
– Genome signatures such as codon usage biases, amino acid usage
biases, and the ratio of the observed dinucleotide frequency to
the frequency expected given a random nucleotide distribution
• These all present a global view of the similarities and
differences of the genomes.
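Two of the statistics above can be sketched in a few lines. The example below computes overall (G+C) content and the observed/expected ratio for each dinucleotide, where the expected frequency is taken as the product of the two single-base frequencies. The function names are assumptions for illustration, and the sketch considers one strand only and skips ambiguous bases such as N.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases A, C, G, T."""
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    return (counts["G"] + counts["C"]) / acgt if acgt else 0.0


def dinucleotide_odds_ratios(seq: str) -> dict[str, float]:
    """Observed / expected frequency for each dinucleotide.

    Expected frequency assumes independent bases: f(X) * f(Y).
    Dinucleotides whose expected frequency is zero are omitted.
    """
    seq = seq.upper()
    mono = Counter(base for base in seq if base in "ACGT")
    total_mono = sum(mono.values())
    pairs = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in "ACGT" and seq[i + 1] in "ACGT"
    )
    total_pairs = sum(pairs.values())
    ratios = {}
    if not total_mono or not total_pairs:
        return ratios
    for x in "ACGT":
        for y in "ACGT":
            expected = (mono[x] / total_mono) * (mono[y] / total_mono)
            if expected > 0:
                ratios[x + y] = (pairs[x + y] / total_pairs) / expected
    return ratios
```

A ratio well below 1 marks an under-represented dinucleotide and a ratio well above 1 an over-represented one; comparing the full set of ratios between genomes gives one view of their "genome signature".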
11. Comparison of genome structure at DNA level
• Chromosomal breakage and exchange of chromosomal
fragments are a common mode of gene evolution. They can be
studied by comparing genome structures at the DNA level:
– Identification of conserved synteny and genome
rearrangement events
– Analysis of breakpoints
– Analysis of content and distribution of DNA repeats
12. Comparison of genome structure at gene level
• Chromosomal breakage and exchange of chromosomal
fragments cause disruption of gene order
• Therefore, conservation of gene order tends to decrease as the
evolutionary distance between genomes increases
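One simple way to quantify disruption of gene order is to count "breakpoints": pairs of genes that are neighbours in one genome but not in the other. The sketch below is a simplified, assumption-laden version: it treats each genome as a single linear chromosome, ignores gene orientation, assumes each gene appears once, and only considers genes present in both genomes. The function names are hypothetical.

```python
def adjacencies(order: list[str]) -> set[frozenset]:
    """Unordered pairs of genes that sit next to each other."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def breakpoint_count(order_a: list[str], order_b: list[str]) -> int:
    """Number of adjacencies in genome A that are not adjacencies in genome B.

    Genes missing from either genome are dropped before comparison.
    """
    shared = set(order_a) & set(order_b)
    a = [gene for gene in order_a if gene in shared]
    b = [gene for gene in order_b if gene in shared]
    adjacent_in_b = adjacencies(b)
    return sum(1 for pair in zip(a, a[1:]) if frozenset(pair) not in adjacent_in_b)
```

For example, inverting the middle block of `["g1", "g2", "g3", "g4", "g5"]` to give `["g1", "g4", "g3", "g2", "g5"]` yields two breakpoints, one at each end of the inverted segment.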
13. Comparative analysis of coding regions
• A number of algorithms have been used in comparative genomics to aid
function prediction of genes
• The analysis covers:
– Identification of gene-coding regions
– Comparison of gene content
– Comparison of protein content
– Comparative genomics-based function prediction
14. Identification of gene-coding regions
• The analysis and comparison of the coding regions start with the gene
identification algorithm that is used to infer which portions of the genomic
sequence actively code for genes
• There are four basic approaches for gene identification
L. Wei et al. 2002
15. Comparison of gene content
• After the predicted gene set is generated, it is very interesting
and important to compare the content of genes across genomes
• The first statistic to compare is the estimated total number of
genes in a genome. Other statistics that elucidate the similarities
and differences between the genomes include the percentage of
the genome that codes for genes, the distribution of coding
regions across the genome (a.k.a. gene density), average gene
length, and codon usage
• Gene content itself is often compared using a pairwise sequence
comparison tool such as BLASTN or TBLASTX
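The summary statistics named above can be computed directly from a list of predicted gene coordinates. The following is a minimal sketch under stated assumptions: genes are given as (start, end) intervals with 0-based, half-open coordinates on a single sequence, introns are ignored (the whole gene interval counts as coding), and overlapping genes are merged before the covered fraction is computed. The function name and the returned keys are hypothetical.

```python
def gene_content_stats(genes: list[tuple[int, int]], genome_length: int) -> dict:
    """Gene count, mean gene length, covered fraction and genes per kb."""
    lengths = [end - start for start, end in genes]

    # Merge overlapping intervals so shared bases are counted once.
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(genes):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start

    return {
        "gene_count": len(genes),
        "mean_gene_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "coding_fraction": covered / genome_length,
        "genes_per_kb": len(genes) / (genome_length / 1000),
    }
```

Running the same function on the predicted gene sets of two genomes gives directly comparable numbers; differences in gene prediction methods between the genomes would of course carry over into these statistics.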
16. Comparison of protein content
• A second level of analysis that can be performed is to compare
the sets of gene products (proteins) between the genomes, which
has been termed “comparative proteomics”
• It is important to compare the protein contents in critical
pathways and important functional categories across genomes
• Two widely used resources for pathways and functional
categories are the KEGG pathway database and the Gene
Ontology (GO) hierarchy
L. Wei et al. 2002
17. Comparison of protein content (contd.)
• Interesting statistics to compare include
– Level of sequence identity between orthologous pairs
across genomes or paralogous pairs within a genome
– Number of replicated copies in corresponding paralog
families
– Functions of the paralogs
– Locations of members of paralog families across the
genome
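Two of these statistics lend themselves to a short sketch: percent identity of an already-aligned pair, and the size of paralog families built from pairwise paralog calls. Both functions below are illustrative assumptions rather than a standard implementation. In particular, percent identity has several definitions in use; here it is matches divided by the number of columns where neither sequence has a gap. Families are formed by single linkage, so any chain of pairwise calls joins genes into one family.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of ungapped alignment columns where the residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def paralog_families(genes: list[str], paralog_pairs: list[tuple[str, str]]) -> list[list[str]]:
    """Group genes into families by single linkage (union-find).

    Every gene named in `paralog_pairs` is assumed to be listed in `genes`.
    Families are returned largest first.
    """
    parent = {gene: gene for gene in genes}

    def find(gene: str) -> str:
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path compression
            gene = parent[gene]
        return gene

    for a, b in paralog_pairs:
        parent[find(a)] = find(b)

    families: dict[str, list[str]] = {}
    for gene in genes:
        families.setdefault(find(gene), []).append(gene)
    return sorted((sorted(f) for f in families.values()), key=lambda f: (-len(f), f))
```

The length of each returned family is the "number of replicated copies" for that family, which can then be compared with the corresponding family in another genome.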
18. Comparative genomics-based function
prediction
• Functional assignment of genes in a non-similarity-based
manner
• This relies on the basic premise that genes that are functionally
related are closely associated across genomes in some form
• This includes three methods:
– Co-conservation across genomes
– Conservation of gene clusters and genomic context across species
– Physical fusion of functionally linked genes across species (Domain
fusion analysis)
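The first of the three methods, co-conservation across genomes, can be sketched as a comparison of presence/absence profiles: genes that are present in, and absent from, the same set of genomes are candidates for a functional link. The sketch below is a minimal illustration; the use of Jaccard similarity and the `threshold` value are assumptions made for this example, not something the slides prescribe.

```python
from itertools import combinations


def profile_similarity(p: set[str], q: set[str]) -> float:
    """Jaccard similarity between two sets of genomes containing a gene."""
    union = p | q
    return len(p & q) / len(union) if union else 0.0


def linked_gene_pairs(profiles: dict[str, set[str]], threshold: float = 0.8):
    """Gene pairs whose presence/absence profiles are at least `threshold` similar.

    `profiles` maps a gene (or orthologous group) to the set of genomes
    in which a homologue was found.
    """
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        similarity = profile_similarity(profiles[a], profiles[b])
        if similarity >= threshold:
            pairs.append((a, b, similarity))
    return pairs
```

Genes present in every genome examined carry little signal in this scheme, since their profiles match each other regardless of function; in practice such uninformative profiles would likely need to be filtered or down-weighted.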
19. Comparative analysis of noncoding regions
• Noncoding regions of the genome have gained a lot of attention
in recent years because of their predicted role in regulation of
transcription, DNA replication, and other biological functions
• Comparative analysis of these regions is based on the presumption
that selective pressure causes regulatory elements to evolve at a
slower rate than non-regulatory sequences in the noncoding
regions
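If regulatory elements evolve more slowly than their surroundings, they should show up as islands of high identity in an alignment of noncoding sequence from two species. The sketch below scans a pre-computed pairwise alignment with a sliding window and reports windows above an identity cutoff. The function name, the window size, the step, and the cutoff are all assumptions chosen for illustration; suitable values depend on the evolutionary distance between the species.

```python
def conserved_windows(
    aligned_a: str,
    aligned_b: str,
    window: int = 20,
    step: int = 5,
    min_identity: float = 0.8,
) -> list[tuple[int, int, float]]:
    """Return (start, end, identity) for alignment windows above the cutoff.

    Coordinates are alignment columns (0-based, half-open). A column counts
    as a match only if both residues are equal and neither is a gap, and
    identity is matches divided by the window size.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for start in range(0, len(aligned_a) - window + 1, step):
        columns = zip(aligned_a[start:start + window], aligned_b[start:start + window])
        matches = sum(1 for x, y in columns if x == y and x != "-")
        identity = matches / window
        if identity >= min_identity:
            hits.append((start, start + window, identity))
    return hits
```

Overlapping windows that pass the cutoff can be merged afterwards to give candidate conserved elements for follow-up.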
20. Insights into Genome Fluxes and the Processes of Evolution
• From an evolutionary biology perspective, whole genome
comparisons provide molecular insights into the processes of
evolution, including the molecular events responsible for the
variations and fluxes that occur throughout a genome. These
include processes such as inversions, translocations, deletions,
duplications and insertions.
21. The Impact of Comparative Genomics in Phylogenetic Analysis
Schematic depiction of the phylogenetic position of Microsporidia based on small subunit
ribosomal RNA (SSU rRNA), as an early-branching eukaryote that evolved prior to the
acquisition of mitochondria, and its subsequent placement closer to fungi based on a
composite gene phylogeny. The latter placement has been confirmed by the complete
sequence of the microsporidian Encephalitozoon cuniculi, in which, despite the absence of
mitochondria, several mitochondrial genes could be observed.
22. Comparative Genomics in Drug Discovery
• Comparative genomic studies shed light on the pathogenesis
of organisms, opening up opportunities for therapeutic
intervention and helping to understand and identify disease
genes
• One of the most important outcomes of comparative analyses at a
genome-wide scale is the ability to identify and develop novel
drug targets
• If one is looking for antibacterial, antifungal, or antiprotozoal
proteins to be used as targets, comparative genome analysis can
reveal virulence genes, uncharacterized essential genes,
species-specific genes, and organism-specific genes, while
ensuring that the chosen genes have no homologues in humans
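At its simplest, the target selection logic in the last bullet is a set operation: pool the pathogen genes of interest, then discard any with a human homologue. The sketch below shows only that filtering step. The function name and the input sets are hypothetical, and producing those sets (essentiality data, virulence annotation, homology searches against the human proteome) is the real work that this example takes as given.

```python
def candidate_drug_targets(
    essential_genes: set[str],
    virulence_genes: set[str],
    genes_with_human_homologs: set[str],
) -> list[str]:
    """Pathogen genes that are essential or virulence-related
    and have no detected homologue in humans."""
    candidates = set(essential_genes) | set(virulence_genes)
    return sorted(candidates - set(genes_with_human_homologs))
```

With made-up labels, `candidate_drug_targets({"gene1", "gene2", "gene3"}, {"gene4"}, {"gene3"})` keeps `gene1`, `gene2` and `gene4` and drops `gene3`, because a drug against `gene3` might also hit its human counterpart.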
23. Comparative genomics in drug discovery programs. A flow chart
diagram explaining how comparative genomics can facilitate drug
discovery programs for the discovery of new antimicrobials
24. Looking beyond…
• As comparative genomics moves from comparisons between
kingdoms to between genera to between species, the next step is
to carry out comparisons between individuals or strains that are
members of a particular species
• This would allow us to investigate variation at the individual
level and to determine the propensity of an individual to
respond to a drug or to come down with a disease or
infection