COMPARATIVE GENOMICS
Presentation by,
ATHIRA RG
BBM051603
M.Sc. Biochemistry & Molecular Biology
• Comparative genomics involves a comprehensive, systematic comparison of genome sequences.
• It begins with computer programs that identify homologous regions within the genomes under comparison.
• Sets of homologous sequences are then grouped and aligned at the base-pair level in an attempt to define whole-genome sequence alignments.
• Goal: discover what lies hidden in genomic sequences by comparing sequence information.
• By comparing the human genome with the genomes of different organisms, researchers can better understand the structure and function of human genes and thereby develop new strategies in the battle against human disease.
• In addition, comparative genomics provides a powerful new tool for studying evolutionary changes among organisms, helping to identify the genes that are conserved among species as well as the genes that give each organism its own unique characteristics.
SOME QUESTIONS THAT COMPARATIVE GENOMICS CAN ADDRESS
• How has the organism evolved?
• What differentiates species?
• Which noncoding regions are important?
• Which genes are required for organisms to survive in a certain environment?
PHYLOGENETIC DISTANCE
• The information that can be gained by comparing genomes depends largely on the phylogenetic distance between them.
• Phylogenetic distance is a measure of the degree of separation between two organisms or genomes on an evolutionary scale, usually expressed as the number of accumulated sequence changes, the number of years, or the number of generations since they diverged.
• The greater the distance, the lower the sequence similarity and the fewer the shared genomic features.
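To make "number of accumulated sequence changes" concrete, the sketch below computes the observed proportion of differing sites (the p-distance) between two already-aligned sequences and then applies the Jukes–Cantor (JC69) correction, d = -3/4 ln(1 - 4p/3), which estimates substitutions per site while allowing for multiple changes at the same position. JC69 is a swapped-in textbook model used only for illustration; the slides do not specify a distance model, the function names are hypothetical, and gapped columns are simply skipped.

```python
import math


def p_distance(aln1: str, aln2: str) -> float:
    """Proportion of differing sites between two aligned sequences (gapped columns skipped)."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    compared = differences = 0
    for x, y in zip(aln1.upper(), aln2.upper()):
        if x == "-" or y == "-":
            continue  # ignore columns containing an alignment gap
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def jukes_cantor(p: float) -> float:
    """JC69-corrected substitutions per site for an observed p-distance."""
    if p >= 0.75:
        raise ValueError("p-distance too large: the JC69 correction is undefined")
    return -0.75 * math.log(1 - 4 * p / 3)


# Usage: 1 difference in 10 ungapped sites
p = p_distance("ACGTACGTAC", "ACGTACGTAA")
print(p, round(jukes_cantor(p), 4))  # 0.1 0.1073
```

The corrected distance is always at least as large as the raw p-distance, and the gap between them widens as sequences diverge, which is why raw identity understates distance between distantly related genomes.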
Comparisons of Genomes at Different Phylogenetic Distances Are Appropriate to Address Different Questions
• Broad insights about types of genes can be gleaned from genomic comparisons at very long phylogenetic distances, e.g., more than 1 billion years since separation.
• For example, comparing the genomes of yeast, worms, and flies reveals that these eukaryotes encode many of the same proteins, and that the non-redundant protein sets of flies and worms are about the same size, only about twice that of yeast.
• The more complex developmental biology of flies and worms is reflected in the greater number of signaling pathways in these two species than in yeast.
• Over such very large distances, the order of genes and the sequences regulating their expression are generally not conserved.
• At moderate phylogenetic distances (roughly 70–100 million years of divergence), both functional and nonfunctional DNA are found within the conserved DNA.
• In these cases, the functional sequences show a signature of purifying (negative) selection: they have changed less than the nonfunctional, or neutral, DNA (Jukes and Kimura 1984).
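The purifying-selection signature can be framed as a simple statistical question: if neutral DNA between two species differs at a fraction q of sites, how surprising is it that a candidate element of n aligned sites shows only k differences? The sketch below computes a one-sided binomial probability of seeing k or fewer differences under the neutral rate. The binomial model is an illustrative assumption (it ignores rate variation among sites and alignment errors), the neutral rate would have to be estimated separately, and the numbers in the usage line are hypothetical.

```python
from math import comb


def prob_at_most_k_differences(k: int, n: int, q: float) -> float:
    """P(X <= k) for X ~ Binomial(n, q).

    A small value means an element this conserved would rarely arise if it
    evolved at the neutral rate q, which is consistent with purifying selection.
    """
    return sum(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(k + 1))


# Usage (hypothetical numbers): neutral divergence 30%, a 100-site element with only 5 differences
p_value = prob_at_most_k_differences(5, 100, 0.30)
print(f"{p_value:.2e}")
```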
COMMONLY USED TOOLS
• UCSC Genome Browser: contains the reference sequences and working draft assemblies for a large collection of genomes.
• Ensembl: the Ensembl project produces genome databases for vertebrates and other eukaryotic species and makes this information freely available online.
• Map Viewer (NCBI): provides a wide variety of genome mapping and sequencing data.
• VISTA: a comprehensive suite of programs and databases for comparative analysis of genomic sequences. It was built to visualize the results of comparative analysis based on DNA alignments, and its presentation of comparative data suits both small- and large-scale datasets.
• BlueJay Genome Browser: a stand-alone visualization tool for the multi-scale viewing of annotated genomes and other genomic elements.
HOW ARE GENOMES COMPARED?
• Chromosome level
– Number of genes
– Genome size
– Content (sequence)
– Location (map position)
– Gene order
– Gene clusters (genes that are part of a known metabolic pathway are found to exist as a group)
– Translocation: movement of a genomic segment from one position to another
Different ways of comparison
• Whole genome
– Genome alignments
– Synteny (gene order conservation)
– Anomalous regions
• Gene-centric
– Gene families and unique genes
– Gene clustering by function
• Gene sequence variations
– Codon usage
– SNPs
– Indels
– Pseudogenes
GENOME ALIGNMENT
• Alignment of DNA sequences is the core process in comparative genomics.
• An alignment is a mapping of the nucleotides in one sequence onto the nucleotides in the other sequence, with gaps introduced into one or the other sequence to increase the number of positions with matching nucleotides.
• Several powerful alignment algorithms have been developed to align two or more sequences. Popular alignment programs such as BLAST and FASTA, or the multiple alignment program Clustal W, are essentially optimized for the alignment
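The idea of introducing gaps to maximize matching positions is what classic dynamic-programming alignment formalizes. Below is a minimal global (Needleman–Wunsch) aligner for illustration. The scoring scheme (match +1, mismatch -1, gap -1) is an arbitrary assumption, and this is not how BLAST, FASTA, or Clustal W work internally: BLAST and FASTA rely on heuristic word-based seeding for speed, and Clustal W builds a multiple alignment progressively.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # match or mismatch
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

The table costs O(n·m) time and memory, which is exactly why whole-genome aligners cannot apply this procedure directly to chromosome-length sequences.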
Computational tools for genome-scale sequence alignment
GENOME ALIGNMENT
• Human PKLR gene region compared to the macaque, dog, mouse, chicken, and zebrafish genomes
Numbers on the vertical axis represent the proportion of identical nucleotides in a 100-bp window for a point on the plot. Numbers on the horizontal axis indicate the nucleotide position from the beginning of the 12-kilobase human genomic sequence. Peaks shaded in blue correspond to the PKLR coding regions. Peaks shaded in light blue correspond to PKLR mRNA untranslated regions. Peaks shaded in red correspond to conserved noncoding sequences (CNSs), defined as areas where the average identity is > 75%. The alignment was generated using the sequence comparison tool VISTA (http://pipeline.lbl.gov).
• Notice the high degree of sequence similarity between human and macaque (two primates) not only in the PKLR exons (blue) but also in conserved noncoding regions such as introns (red) and in the untranslated regions (light blue) of the gene.
• In contrast, the chicken and zebrafish alignments with human show similarity only in the coding exons; the rest of the sequence has diverged to the point where it can no longer be reliably aligned with the human DNA sequence.
• By using such computer-based analysis to zero in on the genomic features that have been preserved in multiple organisms over millions of years, researchers can locate the signals that mark the positions of genes, as well as sequences that may regulate gene expression.
• Indeed, many of the functional parts of the human genome have been discovered or verified by this type of sequence comparison (Lander et al. 2001), and it is now a standard component of the analysis of every new genome sequence.
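The conservation curves in a VISTA-style plot like the PKLR example can be approximated by sliding a fixed-width window along a pairwise alignment and recording the fraction of identical columns. This is a simplified sketch, not VISTA's implementation: it works in alignment columns rather than in the reference (human) coordinates that VISTA plots, it reuses the 100-bp window and > 75% identity cutoff from the figure caption as default parameters, and the function names are hypothetical.

```python
def window_identity(aln1: str, aln2: str, window: int = 100, step: int = 1):
    """Return (start column, fraction identical) for each window along a pairwise alignment."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    aln1, aln2 = aln1.upper(), aln2.upper()
    profile = []
    for start in range(0, len(aln1) - window + 1, step):
        pairs = zip(aln1[start:start + window], aln2[start:start + window])
        # a column counts as identical only if the bases match and are not gaps
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        profile.append((start, matches / window))
    return profile


def conserved_windows(profile, threshold: float = 0.75):
    """Start columns of windows whose identity exceeds the threshold (> 75% by default)."""
    return [start for start, identity in profile if identity > threshold]


# Usage with toy data: first 100 columns identical, next 100 all mismatched
human = "A" * 100 + "C" * 100
other = "A" * 100 + "G" * 100
profile = window_identity(human, other, window=100, step=100)
print(profile)                     # [(0, 1.0), (100, 0.0)]
print(conserved_windows(profile))  # [0]
```

Windows that pass the cutoff and do not overlap annotated exons would be the candidates for conserved noncoding sequences.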
Comparison of overall nucleotide statistics
• Overall nucleotide statistics include:
– genome size,
– overall (G+C) content,
– regions of differing (G+C) content,
– codon usage biases,
– amino acid usage biases, and
– the genome signature, i.e., the ratio of observed to expected dinucleotide frequencies
These all present a global view of the similarities and differences between genomes.
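Two of these statistics are easy to compute directly. The sketch below reports G+C content and the dinucleotide relative abundance rho(XY) = f(XY) / (f(X) f(Y)), where values well above or below 1 mark over- or under-represented dinucleotides. As simplifying assumptions, it counts a single strand only (published genome-signature measures usually also count the reverse-complement strand), ignores non-ACGT characters, and uses hypothetical function names.

```python
from collections import Counter
from itertools import product


def gc_content(seq: str) -> float:
    """Fraction of G or C among A/C/G/T characters."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def dinucleotide_relative_abundance(seq: str) -> dict:
    """rho(XY) = f(XY) / (f(X) * f(Y)) for all 16 dinucleotides, one strand only."""
    seq = seq.upper()
    mono = Counter(b for b in seq if b in "ACGT")
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1) if set(seq[i:i + 2]) <= set("ACGT"))
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_mono == 0 or n_di == 0:
        raise ValueError("sequence too short or contains no A/C/G/T")
    rho = {}
    for x, y in product("ACGT", repeat=2):
        fx, fy = mono[x] / n_mono, mono[y] / n_mono
        # undefined when one of the mononucleotides is absent
        rho[x + y] = (di[x + y] / n_di) / (fx * fy) if fx and fy else float("nan")
    return rho


print(gc_content("GGCCAATT"))  # 0.5
print(round(dinucleotide_relative_abundance("ACGTACGT")["CG"], 3))  # 4.571
```

Comparing the 16 rho values of two genomes, for example by their mean absolute difference, gives a compact alignment-free measure of how similar their compositions are.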
SYNTENY
• Synteny refers to regions of two genomes that show considerable similarity in terms of
– sequence and
– conservation of gene order,
and that are therefore likely to be related by common descent.
• By mapping syntenic regions in the corresponding genomes, genome rearrangement events such as fission, translocation, inversion, and transposition can be identified.
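Once orthologs have been matched between two genomes, conserved gene order can be detected by walking along one genome and checking whether the orthologs' positions in the other genome advance one step at a time. The sketch below groups genes into blocks whose genome-B positions step by +1 (same orientation) or -1 (inverted); a reversed block is the signature of an inversion, and each boundary between blocks is a candidate breakpoint. Assumptions: orthologs share an identifier in both genomes, positions are gene ranks on a single chromosome, gene strand is ignored, and any break in the ±1 pattern ends a block. Real synteny tools tolerate small gaps and local shuffling.

```python
def synteny_blocks(order_a, pos_b):
    """Group genome-A genes into runs whose genome-B positions step by +1 or -1.

    order_a: gene IDs in their order along genome A.
    pos_b:   gene ID -> rank of the orthologous gene along genome B.
    Returns a list of (gene_ids, orientation) with orientation '+' or '-'.
    Single-gene blocks default to '+' because their orientation is undetermined.
    """
    mapped = [(g, pos_b[g]) for g in order_a if g in pos_b]  # skip genes without an ortholog
    if not mapped:
        return []
    blocks = []
    current, direction = [mapped[0]], 0
    for prev, cur in zip(mapped, mapped[1:]):
        step = cur[1] - prev[1]
        if abs(step) == 1 and direction in (0, step):
            direction = step  # the first step fixes the block's orientation
            current.append(cur)
        else:  # gene order broken: close the block at this breakpoint
            blocks.append(([g for g, _ in current], "-" if direction < 0 else "+"))
            current, direction = [cur], 0
    blocks.append(([g for g, _ in current], "-" if direction < 0 else "+"))
    return blocks


# Usage: genes g4..g6 appear in reverse order in genome B (an inversion)
order_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
pos_b = {"g1": 0, "g2": 1, "g3": 2, "g6": 3, "g5": 4, "g4": 5}
print(synteny_blocks(order_a, pos_b))
# [(['g1', 'g2', 'g3'], '+'), (['g4', 'g5', 'g6'], '-')]
```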
SYNTENY
Analysis of Breakpoints
Once syntenic regions are detected, one can obtain the breakpoints (a.k.a. syntenic boundaries) between them.
Analysis of various genomic features at the breakpoints, such as G+C content, gene density, and the density of various DNA repeats, provides insight into the evolution of genomes.
For instance, Mural et al. observed a sharp discontinuity of features around some syntenic boundaries but not others.
They hypothesized that syntenic boundaries that do not show sharp transitions in these features may provide evidence for conservation of the ancestral pattern in the lineage.
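One way to look for a sharp discontinuity of features around a syntenic boundary is to compare a feature in the windows immediately to the left and right of the breakpoint. The sketch below does this for G+C content. The flank size and function names are hypothetical, and a real analysis would also examine gene and repeat density and test whether a difference is larger than expected by chance.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C among A/C/G/T characters (NaN if there are none)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else float("nan")


def gc_step_at_breakpoint(seq: str, breakpoint: int, flank: int = 10_000) -> float:
    """Absolute difference in G+C content between the windows left and right of a breakpoint."""
    left = seq[max(0, breakpoint - flank):breakpoint]
    right = seq[breakpoint:breakpoint + flank]
    return abs(gc_fraction(right) - gc_fraction(left))


# Usage with toy data: AT-rich sequence abruptly turns GC-rich at position 100
seq = "AT" * 50 + "GC" * 50
print(gc_step_at_breakpoint(seq, 100, flank=50))  # 1.0
```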
GENE CENTRIC COMPARISON
Homologs:
Genes that share a common ancestor; they often, though not always, retain similar functions
Orthologs:
Homologs in different species that arose through speciation
Paralogs:
Homologs that arose through gene duplication, typically found within the same genome
– Duplication before speciation (ancient duplication): out-paralogs; may not have the same function
– Duplication after speciation (recent duplication): in-paralogs; likely to have the same function
GENE CLUSTERS
• In prokaryotes, groups of functionally related genes tend to be located close to each other, often in a specific order, as exemplified by operons.
• Although conservation of gene order beyond the level of operons is much less prevalent, conservation of clusters and gene order can be an important indicator of function.
• Several approaches have been used to identify functionally related "clusters" of genes.
• Overbeek et al. use the constructs of a "pair of close bidirectional best hits" (PCBBH) and "pairs of close homologs" (PCHs) to represent pairs of genes that are closely conserved between two species and likely to be functionally related.
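The building block of the PCBBH idea is the bidirectional (reciprocal) best hit. The sketch below finds reciprocal best hits from two precomputed tables of similarity scores (for example, BLAST bit scores in each search direction) and then keeps pairs of such hits whose genes lie close together in both genomes. "Close" is approximated here as at most `max_gap` gene positions apart, a hypothetical parameter; Overbeek et al.'s published definition may measure proximity differently, and ties in score are broken arbitrarily.

```python
from itertools import combinations


def best_hits(scores):
    """scores: dict (query, subject) -> score. Returns query -> highest-scoring subject."""
    best = {}
    for (query, subject), sc in scores.items():
        if query not in best or sc > best[query][1]:
            best[query] = (subject, sc)
    return {q: s for q, (s, _) in best.items()}


def bidirectional_best_hits(scores_ab, scores_ba):
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    ab, ba = best_hits(scores_ab), best_hits(scores_ba)
    return [(a, b) for a, b in ab.items() if ba.get(b) == a]


def close_bbh_pairs(bbhs, pos_a, pos_b, max_gap=2):
    """Pairs of BBHs whose genes are within max_gap positions of each other in both genomes."""
    return [
        ((a1, b1), (a2, b2))
        for (a1, b1), (a2, b2) in combinations(bbhs, 2)
        if abs(pos_a[a1] - pos_a[a2]) <= max_gap and abs(pos_b[b1] - pos_b[b2]) <= max_gap
    ]


# Usage with hypothetical scores and gene positions
scores_ab = {("a1", "b1"): 100, ("a1", "b2"): 50, ("a2", "b2"): 90, ("a3", "b3"): 80}
scores_ba = {("b1", "a1"): 100, ("b2", "a2"): 90, ("b2", "a1"): 50, ("b3", "a3"): 80}
bbhs = bidirectional_best_hits(scores_ab, scores_ba)
print(close_bbh_pairs(bbhs, {"a1": 0, "a2": 1, "a3": 10}, {"b1": 5, "b2": 6, "b3": 40}))
# [(('a1', 'b1'), ('a2', 'b2'))]
```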
COGs
Clusters of Orthologous Genes
• A COG is a group of three or more orthologous genes, meaning they are direct evolutionary counterparts; they are considered to be part of an 'ancient conserved domain'.
• A COG is defined as three or more proteins from the genomes of distant species that are more similar to each other than to any other protein within the individual genomes.
• COGs can be used to predict the function of homologous proteins in poorly studied species and to track evolutionary divergence from a common ancestor, which makes them a powerful tool for the functional annotation of uncharacterized proteins.
• COGs are therefore important in comparative genomics studies.
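The "three or more proteins from distant species that are more similar to each other than to anything else" criterion can be illustrated with triangles: three proteins from three different genomes that are all pairwise reciprocal best hits. The sketch below finds such triangles from a list of precomputed reciprocal-best-hit pairs. It is a simplified illustration only; the published COG procedure starts from consistent triangles of best hits but involves further steps (such as merging triangles that share a side, and curation) that are omitted here. The input format and genome/protein names are hypothetical.

```python
from itertools import combinations


def rbh_triangles(rbh_pairs):
    """Find triples of proteins from three different genomes that are pairwise reciprocal best hits.

    rbh_pairs: iterable of ((genome, protein), (genome, protein)) tuples.
    """
    edges = {frozenset(pair) for pair in rbh_pairs}
    nodes = sorted({node for edge in edges for node in edge})
    triangles = []
    for x, y, z in combinations(nodes, 3):  # brute force: fine for a small illustration
        from_three_genomes = len({x[0], y[0], z[0]}) == 3
        all_linked = {frozenset((x, y)), frozenset((y, z)), frozenset((x, z))} <= edges
        if from_three_genomes and all_linked:
            triangles.append((x, y, z))
    return triangles


# Usage: e1/h1/m1 form a triangle; e2-h2 is an isolated pair
pairs = [
    (("E", "e1"), ("H", "h1")),
    (("H", "h1"), ("M", "m1")),
    (("E", "e1"), ("M", "m1")),
    (("E", "e2"), ("H", "h2")),
]
print(rbh_triangles(pairs))  # [(('E', 'e1'), ('H', 'h1'), ('M', 'm1'))]
```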
Application of COG
• The most straightforward application of COGs is the prediction of the functions of individual proteins or protein sets, including those from newly completed genomes.
COG database
NCBI provides a COG database consisting of 4,873 COGs that comprise over 138,000 proteins from the genomes of 50 bacteria, 13 archaea, and 3 unicellular eukaryotes. The database classifies proteins from completely sequenced genomes on the basis of orthology.
MBGD
• MBGD (Microbial Genome Database for Comparative Analysis) is a database for the comparative analysis of completely sequenced microbial genomes, the number of which is growing rapidly.
• The aim of MBGD is to facilitate comparative genomics from various points of view, such as ortholog identification, paralog clustering, motif analysis, and gene order comparison.
COMPARATIVE ANALYSIS OF CODING REGIONS
• Comparative analysis of coding regions typically involves the identification of gene-coding regions, comparison of gene content, and comparison of protein content.
• A number of algorithms have also been developed that use comparative genomics to aid the prediction of gene function.
The analysis and comparison of coding regions starts with, and depends heavily on, the gene identification algorithm used to infer which portions of the genomic sequence actually code for genes.
In large-scale analyses, multiple gene identification approaches are often combined to improve overall accuracy.
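The simplest form of gene identification is scanning for open reading frames (ORFs): an ATG start codon followed, in the same frame, by a stop codon after a minimum number of codons. The sketch below is a naive illustration, not how production gene finders work (tools such as GeneMark or Glimmer use statistical models of coding sequence); it scans only the forward strand, uses the standard stop codons, and its function name and `min_codons` default are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100):
    """Naive forward-strand ORF scan: first ATG, then the first in-frame stop.

    Returns (start, end, frame) tuples with 0-based, end-exclusive coordinates
    that include the stop codon; min_codons counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first ATG in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close it at the first in-frame stop
    return orfs


print(find_orfs("ATG" + "AAA" * 5 + "TAA", min_codons=3))  # [(0, 21, 0)]
```

Comparative approaches improve on this by asking whether an ORF's reading frame and amino acid sequence are conserved in related genomes.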
COMPARATIVE ANALYSIS OF NONCODING REGIONS
• Noncoding regions of the genome, which may make up as much as 97% of the genome length (as in the human genome), have gained a lot of attention in recent years because of their predicted roles in the regulation of transcription, DNA replication, and other biological functions.
• However, identifying regulatory elements within the noncoding portion of a genome remains a challenge.
• Comparative genomics has greatly aided the identification of regulatory segments by comparing noncoding DNA sequences from diverse species to identify conserved regions.
• This approach rests on the presumption that selective pressure causes regulatory elements to evolve more slowly than nonregulatory sequences in the noncoding regions.
ANALYSIS OF MUTATIONS
• Search and display of mutations within multiple alignments, distinguishing intergenic, synonymous, non-synonymous, and indel mutations
• Additional filtering based on SNP quality scores
• Display colors based on mutation type or quality; sorting by position, gene, nucleotide (NA) change, amino acid (AA) change, or quality
• Direct clustering based on mutations, or export of the mutation list for further analysis
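The distinction between synonymous and non-synonymous mutations comes down to whether a base change alters the encoded amino acid. The sketch below classifies a variant against a coding sequence using the standard genetic code (NCBI translation table 1). As simplifying assumptions, any position outside the CDS is labeled "intergenic" (introns and UTRs are not modeled), only single-base substitutions are translated, and the function name is hypothetical.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated with the first base varying slowest
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # '*' marks stop codons


def classify_variant(cds: str, pos: int, ref: str, alt: str) -> str:
    """Classify a variant at 0-based position pos of a coding sequence."""
    if len(ref) != len(alt):
        return "indel"
    if not 0 <= pos < len(cds):
        return "intergenic"  # simplification: anything outside the CDS
    if len(ref) != 1:
        raise ValueError("this sketch only handles single-base substitutions")
    cds = cds.upper()
    if cds[pos] != ref.upper():
        raise ValueError("reference base does not match the CDS")
    offset = pos % 3
    codon_start = pos - offset
    ref_codon = cds[codon_start:codon_start + 3]
    if len(ref_codon) < 3:
        raise ValueError("position falls in an incomplete final codon")
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "non-synonymous"


cds = "ATGGCTTAA"  # Met-Ala-stop
print(classify_variant(cds, 5, "T", "C"))  # synonymous (GCT -> GCC, both Ala)
print(classify_variant(cds, 3, "G", "A"))  # non-synonymous (GCT -> ACT, Ala -> Thr)
```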
PSEUDOGENES
• Nonfunctional protein-coding genes
• Mutations introduce “sequence problems” (frameshifts, in-frame stop codons, absence of a stop codon)
• “Normal” bacterial genomes have 1–5% pseudogenes [Liu et al.]
• Pseudogenes can give interesting clues to evolutionary pathways
• High fractions of pseudogenes suggest a “genome degradation” process, which may be a cause or an effect of niche restriction
• Examples
– Mycobacterium leprae: 36% (~1,100 genes)
– Leifsonia xyli subsp. xyli: 13% (~300 genes)
• Pseudogenes are usually not annotated as proteins, so they may not show up in BLAST searches against annotated protein databases
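The "sequence problems" listed above can be checked mechanically for a predicted coding sequence. The sketch below flags a length that is not a multiple of three (a possible frameshift), stop codons before the end (in-frame stops), and a missing terminal stop codon. It is a heuristic illustration with a hypothetical function name: real pseudogene detection typically aligns the sequence to a functional homolog to locate the disabling mutations.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def pseudogene_flags(cds: str) -> list:
    """Return heuristic flags suggesting that a coding sequence may be a pseudogene."""
    cds = cds.upper()
    flags = []
    if len(cds) % 3 != 0:
        flags.append("possible frameshift")  # length not a multiple of three
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if any(c in STOP_CODONS for c in codons[:-1]):
        flags.append("in-frame stop")  # a stop codon before the final codon
    if not codons or codons[-1] not in STOP_CODONS:
        flags.append("no stop")
    return flags


print(pseudogene_flags("ATGAAATAA"))     # []
print(pseudogene_flags("ATGTAAAAATAA"))  # ['in-frame stop']
print(pseudogene_flags("ATGAAAA"))       # ['possible frameshift', 'no stop']
```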
APPLICATIONS
Gene identification
• Comparative genomics can aid gene identification by recognizing real genes from their patterns of nucleotide conservation across evolutionary time. With genome-wide alignments of the genomes being compared, the different ways in which sequences change in known genes and in intergenic regions can be analyzed. Alignments of known genes reveal conservation of the reading frame of protein translation; for example, insertions and deletions within true coding regions tend to have lengths that are multiples of three, so the frame is preserved.
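The reading-frame argument can be turned into a simple check on a pairwise alignment: collect the lengths of all gap runs and ask what fraction are multiples of three. The sketch below does that. It is a simplified illustration under stated assumptions (a single pairwise alignment of a candidate region, with no weighting by gap length or position), and the function names are hypothetical.

```python
import re


def gap_run_lengths(aligned: str) -> list:
    """Lengths of consecutive '-' runs in an aligned sequence."""
    return [len(m.group()) for m in re.finditer(r"-+", aligned)]


def frame_preserving_fraction(aln1: str, aln2: str):
    """Fraction of gap runs (in either sequence) whose length is a multiple of 3.

    Returns None when the alignment has no gaps, since there is nothing to judge.
    """
    runs = gap_run_lengths(aln1) + gap_run_lengths(aln2)
    if not runs:
        return None
    return sum(1 for r in runs if r % 3 == 0) / len(runs)


print(frame_preserving_fraction("ATG---AAA", "ATGCCCAAA"))  # 1.0 (frame kept)
print(frame_preserving_fraction("ATG-AAA", "ATGCAAA"))      # 0.0 (frame shifted)
```

A candidate region whose indels consistently preserve the frame across several species is more likely to be a real protein-coding gene than one whose indels shift it.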
Regulatory motif discovery
• Regulatory motifs are short DNA sequences, about 6 to 15 bp long, that control the expression of genes, dictating the conditions under which a gene is turned on or off. Each motif is typically recognized by a specific DNA-binding protein called a transcription factor (TF). A transcription factor binds precise sites in the promoter region of target genes in a sequence-specific way, but this contact can tolerate some degree of sequence variation. Comparative genomics provides a powerful way to distinguish regulatory motifs from nonfunctional patterns based on their conservation.
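Conservation-based motif filtering can be sketched as follows: for a candidate motif, count its occurrences in aligned promoter regions of a reference species and measure how often the same motif appears at the same aligned position in a second species. A true regulatory motif would be expected to show a conservation rate clearly higher than that of comparable random k-mers. This is a simplified illustration: it matches the motif exactly (real TF sites tolerate variation, as noted above), scans one strand only, and its function name and input format are hypothetical.

```python
def motif_conservation_rate(aligned_pairs, motif: str):
    """Fraction of motif occurrences in the reference that are conserved at the same aligned columns.

    aligned_pairs: list of (reference_aligned, other_aligned) promoter alignments.
    Returns None if the motif never occurs in the reference sequences.
    """
    motif = motif.upper()
    k = len(motif)
    occurrences = conserved = 0
    for ref_aln, other_aln in aligned_pairs:
        if len(ref_aln) != len(other_aln):
            raise ValueError("aligned sequences must have equal length")
        ref_aln, other_aln = ref_aln.upper(), other_aln.upper()
        for i in range(len(ref_aln) - k + 1):
            if ref_aln[i:i + k] == motif:
                occurrences += 1
                if other_aln[i:i + k] == motif:  # same motif at the same columns
                    conserved += 1
    return conserved / occurrences if occurrences else None


# Usage with toy alignments: one conserved and one non-conserved occurrence
pairs = [("TTGACGTCAA", "TTGACGTCAA"), ("AGACGTCTTT", "AGACGTATTT")]
print(motif_conservation_rate(pairs, "GACGTC"))  # 0.5
```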
APPLICATIONS
• Comparative genomics has wide applications in molecular medicine and molecular evolution. One of its most significant applications in molecular medicine is the identification of drug targets for many infectious diseases. For example, comparative analyses of fungal genomes have led to the identification of many putative targets for novel antifungal drugs. Such discoveries can aid target-based drug design to treat fungal diseases in humans.
• Comparative genomics also helps in clustering regulatory sites, which can aid the recognition of unknown regulatory regions in other genomes. The regulation of metabolic pathways can also be inferred through comparative genomics.
• Agriculture also reaps the benefits of comparative genomics. Identifying the loci of advantageous genes is a key step in breeding crops that are optimized for greater yield, cost-efficiency, quality, and disease resistance.
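One common way comparative genomics is used to shortlist drug targets is "subtractive" filtering: keep pathogen genes that are predicted to be essential but have no close homolog in the human host, so that a drug acting on them is less likely to affect human proteins. The sketch below applies that filter to precomputed inputs. The slides do not specify this exact procedure; the data structures, the 30% identity cutoff, and the gene names are all hypothetical.

```python
def candidate_targets(essential_genes, best_human_identity, max_identity=0.30):
    """Essential pathogen genes whose best hit in the human proteome is weak or absent.

    best_human_identity maps pathogen gene -> fractional identity of its best human hit;
    genes missing from the mapping are treated as having no human homolog.
    """
    return sorted(
        g for g in essential_genes
        if best_human_identity.get(g, 0.0) <= max_identity
    )


# Usage with hypothetical genes: pg2 has no human hit, pg1 and pg3 are too similar to human proteins
essential = {"pg1", "pg2", "pg3"}
identity = {"pg1": 0.89, "pg3": 0.35}
print(candidate_targets(essential, identity))  # ['pg2']
```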
Thank You

Contenu connexe

Tendances

Tendances (20)

SAGE (Serial analysis of Gene Expression)
SAGE (Serial analysis of Gene Expression)SAGE (Serial analysis of Gene Expression)
SAGE (Serial analysis of Gene Expression)
 
Gen bank (genetic sequence databank)
Gen bank (genetic sequence databank)Gen bank (genetic sequence databank)
Gen bank (genetic sequence databank)
 
Cath
CathCath
Cath
 
Genome mapping
Genome mapping Genome mapping
Genome mapping
 
Structural genomics
Structural genomicsStructural genomics
Structural genomics
 
Genomics
GenomicsGenomics
Genomics
 
Est database
Est databaseEst database
Est database
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
 
Sts
StsSts
Sts
 
Gene knockout
Gene knockoutGene knockout
Gene knockout
 
Scop database
Scop databaseScop database
Scop database
 
YEAST TWO HYBRID SYSTEM
 YEAST TWO HYBRID SYSTEM YEAST TWO HYBRID SYSTEM
YEAST TWO HYBRID SYSTEM
 
Genome annotation 2013
Genome annotation 2013Genome annotation 2013
Genome annotation 2013
 
Gene prediction methods vijay
Gene prediction methods  vijayGene prediction methods  vijay
Gene prediction methods vijay
 
sequence of file formats in bioinformatics
sequence of file formats in bioinformaticssequence of file formats in bioinformatics
sequence of file formats in bioinformatics
 
MULTIPLE SEQUENCE ALIGNMENT
MULTIPLE  SEQUENCE  ALIGNMENTMULTIPLE  SEQUENCE  ALIGNMENT
MULTIPLE SEQUENCE ALIGNMENT
 
Orthologs,Paralogs & Xenologs
 Orthologs,Paralogs & Xenologs  Orthologs,Paralogs & Xenologs
Orthologs,Paralogs & Xenologs
 
Metagenomics
MetagenomicsMetagenomics
Metagenomics
 
Clustal W - Multiple Sequence alignment
Clustal W - Multiple Sequence alignment   Clustal W - Multiple Sequence alignment
Clustal W - Multiple Sequence alignment
 
Phylogenetic Tree, types and Applicantion
Phylogenetic Tree, types and Applicantion Phylogenetic Tree, types and Applicantion
Phylogenetic Tree, types and Applicantion
 

Similaire à Comparative genomics

Chapter 20 ppt
Chapter 20 pptChapter 20 ppt
Chapter 20 ppt
rehman2009
 
genemappingppt-170209023430.pptx
genemappingppt-170209023430.pptxgenemappingppt-170209023430.pptx
genemappingppt-170209023430.pptx
HINDUJA20
 
Human Genome 2009
Human Genome 2009Human Genome 2009
Human Genome 2009
lyonja
 

Similaire à Comparative genomics (20)

Chapter 20 ppt
Chapter 20 pptChapter 20 ppt
Chapter 20 ppt
 
Gene mapping
Gene mappingGene mapping
Gene mapping
 
Comparitive genomics
Comparitive genomicsComparitive genomics
Comparitive genomics
 
genomic comparison
genomic comparison genomic comparison
genomic comparison
 
Comparative genomics 2
Comparative genomics 2Comparative genomics 2
Comparative genomics 2
 
Functional Genomic l Genomes l proteomic l DNA l #genomics #proteomics #scien...
Functional Genomic l Genomes l proteomic l DNA l #genomics #proteomics #scien...Functional Genomic l Genomes l proteomic l DNA l #genomics #proteomics #scien...
Functional Genomic l Genomes l proteomic l DNA l #genomics #proteomics #scien...
 
Gene mapping ppt
Gene mapping pptGene mapping ppt
Gene mapping ppt
 
Comparative genomics.pdf
Comparative genomics.pdfComparative genomics.pdf
Comparative genomics.pdf
 
genomics and system biology
genomics and system biologygenomics and system biology
genomics and system biology
 
Genomic variation
Genomic variationGenomic variation
Genomic variation
 
Apollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research communityApollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research community
 
genemappingppt-170209023430.pptx
genemappingppt-170209023430.pptxgenemappingppt-170209023430.pptx
genemappingppt-170209023430.pptx
 
EiB Seminar from Antoni Miñarro, Ph.D
EiB Seminar from Antoni Miñarro, Ph.DEiB Seminar from Antoni Miñarro, Ph.D
EiB Seminar from Antoni Miñarro, Ph.D
 
Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees...
Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees...Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees...
Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees...
 
Molecular markers
Molecular markersMolecular markers
Molecular markers
 
Mapping the bacteriophage genome
Mapping the bacteriophage genomeMapping the bacteriophage genome
Mapping the bacteriophage genome
 
Nature Of Gene.pdf
Nature Of Gene.pdfNature Of Gene.pdf
Nature Of Gene.pdf
 
Nature Of Gene.pdf
Nature Of Gene.pdfNature Of Gene.pdf
Nature Of Gene.pdf
 
Human Genome 2009
Human Genome 2009Human Genome 2009
Human Genome 2009
 
EiB Seminar from Antoni Miñarro, Ph.D
EiB Seminar from Antoni Miñarro, Ph.DEiB Seminar from Antoni Miñarro, Ph.D
EiB Seminar from Antoni Miñarro, Ph.D
 

Dernier

Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Krashi Coaching
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
heathfieldcps1
 

Dernier (20)

Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 

Comparative genomics

  • 1. COMPARATIVE GENOMICS Presentation by, ATHIRA RG BBM051603 M.Sc. Biochemistry & Molecular Biology
  • 2.  Comparative genomics involves a comprehensive systematic comparison of genome sequences.  It begins with powerful computer programs that identify homologous regions within the genomes under comparison.  Sets of homologous sequences are then grouped with their sequences aligned at the base-pair level in an attempt to define whole genome sequence alignments. • Discover what lies hidden in genomic sequences by comparing sequence information.
  • 3. By comparing the human genome with the genomes of different organisms, researchers can better understand the structure and function of human genes and thereby develop new strategies in the battle against human disease.  In addition, comparative genomics provides a powerful new tool for studying evolutionary changes among organisms, helping to identify the genes that are conserved among species along with the genes that give each organism its own unique characteristics.
  • 4. SOME QUESTIONS THAT COMPARATIVE GENOMICS CAN ADDRESS? How has the organism evolved? What differentiates species? Which non-coding regions are important? Which genes are required for organisms to survive in a certain environment?
  • 5. PHYLOGENETIC DISTANCE  Information that can be gained by comparison of genomes largely dependent upon the phylogenetic distances between them.  Phylogenetic distance is a measure of the degree of separation b/w two organisms or genomes on an evolutionary scale , usually expressed as the number of accumulated sequence changes, number of years or number of generations  More distance, less sequence similarity or less shared genomic features.
  • 6. Comparisons of Genomes at Different Phylogenetic Distances Are Appropriate to Address Different Questions
  • 7.  Broad insights about types of genes can be gleaned by genomic comparisons at very long phylogenetic distances, e.g., greater than 1 billion years since their separation.  For example, comparing the genomes of yeast, worms, and flies reveals that these eukaryotes encode many of the same proteins, and the non-redundant protein sets of flies and worms are about the same size, being only twice that of yeast.  The more complex developmental biology of flies and worms is reflected in the greater number of signaling pathways in these two species than in yeast.  Over such very large distances, the order of genes and the sequences regulating their expression are generally not conserved.  At moderate phylogenetic distances (roughly 70–100 million years of divergence), both functional and nonfunctional DNA is found within the conserved DNA.  In these cases, the functional sequences will show a signature of purifying or negative selection, which is that the functional sequences will have changed less than the nonfunctional or neutral DNA (Jukes and Kimura 1984).
  • 8. COMMONLY USED TOOLS  UCSC Browser: This site contains the reference sequence and working draft assemblies for a large collection of genomes.  Ensembl: The Ensembl project produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online.  MapView: The Map Viewer provides a wide variety of genome mapping and sequencing data.  VISTA is a comprehensive suite of programs and databases for comparative analysis of genomic sequences. It was built to visualize the results of comparative analysis based on DNA alignments. The presentation of comparative data generated by VISTA can easily suit both small and large scale of data.  BlueJay Genome Browser: a stand-alone visualization tool for the multi-scale viewing of annotated genomes and other genomic elements.
  • 9.
  • 10.  Chromosome level Number of genes Genome size Content (sequence) Location (map position) Gene Order Gene Cluster (Genes that are part of a known metabolic pathway, are found to exist as a group) Translocation: movement of genomic part fromone position to another HOW ARE GENOMES COMPARED ?
  • 11.
  • 12. Different ways of comparison Whole genome Genome alignments Synteny (gene order conservation) Anomalous regions Gene-centric Gene families and unique genes Gene clustering by function Gene sequence variations Codon usage, SNPs, inDels, pseudogenes
  • 13. GENOME ALIGNMENT  Alignment of DNA sequences is the core process in comparative genomics.  An alignment is a mapping of the nucleotides in one sequence onto the nucleotides in the other sequence, with gaps introduced into one or the other sequence to increase the number of positions with matching nucleotides.  Several powerful alignment algorithms have been developed to align two or more sequences. Popular alignment programs such as BLAST and FASTA or the multiple alignment program Clustal W are essentiallyoptimizedfor the alignment
  • 14. Computational tools for genome-scale sequence alignment
  • 15.  Human PKLR gene region compared to the macaque, dog, mouse, chicken, and zebrafish genomes Numbers on the vertical axis represent the proportion of identical nucleotides in a 100- bp window for a point on the plot. Numbers on the horizontal axis indicate the nucleotide position from the beginning of the 12- kilobase human genomic sequence. Peaks shaded in blue correspond to the PKLR coding regions. Peaks shaded in light blue correspond to PKLR mRNA untranslated regions. Peaks shaded in red correspond to conserved non- coding regions (CNSs), defined as areas where the average identity is > 75%. Alignment was generated using the sequence comparison tool VISTA (http://pipeline.lbl.gov). GENOME ALIGNMENT
  • 16.  Notice the high degree of sequence similarity between human and macaque (two primates) in both PKLR exons (blue) as well as introns (red) and untranslated regions (light blue) of the gene.  In contrast, the chicken and zebrafish alignments with human only show similarity to sequences in the coding exons; the rest of the sequence has diverged to a point where it can no longer be reliably aligned with the human DNA sequence.  Using such computer-based analysis to zero in on the genomic features that have been preserved in multiple organisms over millions of years, researchers are able to locate the signals that represent the location of genes, as well as sequences that may regulate gene expression.  Indeed, much of the functional parts of the human genome have been discovered or verified by this type of sequence comparison (Lander et al. 2001) and it is now a standard component of the analysis of every new genome sequence.
  • 17. Comparison of overall nucleotide statistics • Overall nucleotide statistics, suchas – Genome size, – Overall (G+C) content, – Regions of different (G+C) content, – Genome signature such as codon usage biases, – Amino acid usage biases, and the ratio of observed dinucleotide frequency These all present a global view of the similarities and differences of the genomes
  • 18. SYNTENY  Refers to regions of two genomes that show considerable similarity in terms of  sequence and  conservation of gene order, and that are likely to be related by common descent. By mapping syntenic regions in corresponding genomes, genome rearrangement events such as fission, translocation, inversion, and transposition can be identified.
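A toy version of synteny detection, assuming one-to-one orthologs have already been identified and each genome is a single chromosome given as an ordered list of ortholog IDs: the sketch walks genome A and grows a block while consecutive genes stay adjacent in genome B in a consistent orientation. Block ends are candidate breakpoints, and a block that runs in reverse order in B suggests an inversion. The names are hypothetical; practical tools also tolerate gaps, local shuffles and multiple chromosomes.

```python
def synteny_blocks(order_a: list[str], order_b: list[str]) -> list[list[str]]:
    """Split genome A's ortholog order into runs that stay adjacent in genome B.

    order_a / order_b: gene (ortholog) identifiers in chromosomal order;
    only genes present in both genomes are considered.
    """
    shared = set(order_a) & set(order_b)
    pos_b = {g: i for i, g in enumerate(g for g in order_b if g in shared)}
    blocks: list[list[str]] = []
    current: list[str] = []
    direction = 0  # +1 = same orientation, -1 = inverted, 0 = not yet known
    for gene in (g for g in order_a if g in shared):
        if current:
            step = pos_b[gene] - pos_b[current[-1]]
            if step in (1, -1) and direction in (0, step):
                current.append(gene)
                direction = step
                continue
            blocks.append(current)  # adjacency broken: close the block
        current, direction = [gene], 0
    if current:
        blocks.append(current)
    return blocks


a = ["g1", "g2", "g3", "g4", "g5", "g6"]
b = ["g1", "g2", "g5", "g4", "g3", "g6"]   # g3..g5 inverted in genome B
blocks = synteny_blocks(a, b)
print(blocks)           # [['g1', 'g2'], ['g3', 'g4', 'g5'], ['g6']]
print(len(blocks) - 1)  # 2
```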
  • 20. Analysis of Breakpoints  Once syntenic regions are detected, one can obtain the breakpoints (a.k.a. syntenic boundaries) between syntenic regions. Analysis of various genomic features of the breakpoints, such as G+C content, gene density, and the density of various DNA repeats, provides an understanding of genome evolution. For instance, Mural et al. observed a sharp discontinuity of features around some syntenic boundaries but not others. They hypothesized that syntenic boundaries that do not show sharp transitions in these features may provide evidence for conservation of the ancestral pattern in the lineage.
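Breakpoint analysis can be sketched for a single feature, G+C content: compare the G+C fraction in windows just left and right of each breakpoint and call the transition sharp when the difference exceeds a threshold. This is a hypothetical illustration, not Mural et al.'s procedure; the flank size, the threshold and all names are assumptions, and gene density or repeat density could be compared the same way.

```python
def gc_fraction(seq: str) -> float:
    """G+C fraction of a window (0.0 for an empty window)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def flank_gc_shift(genome: str, breakpoint: int, flank: int = 1000) -> float:
    """Right-flank minus left-flank G+C fraction around a breakpoint coordinate."""
    left = genome[max(0, breakpoint - flank):breakpoint]
    right = genome[breakpoint:breakpoint + flank]
    return gc_fraction(right) - gc_fraction(left)


def classify_breakpoints(genome: str, breakpoints: list[int],
                         flank: int = 1000, threshold: float = 0.10) -> dict[int, str]:
    """Label each breakpoint 'sharp' or 'smooth' by the size of its G+C shift.

    The flank size and the 0.10 threshold are arbitrary illustrative choices.
    """
    return {bp: "sharp" if abs(flank_gc_shift(genome, bp, flank)) >= threshold else "smooth"
            for bp in breakpoints}


toy = "AT" * 500 + "GC" * 500   # AT-rich half followed by a GC-rich half
print(classify_breakpoints(toy, [500, 1000], flank=200))
# {500: 'smooth', 1000: 'sharp'}
```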
  • 21. GENE CENTRIC COMPARISON  Homologs: genes that share a common ancestor. Orthologs: homologs in different species that arose by speciation; in general they retain the same function. Paralogs: homologs that arose by gene duplication (typically within the same species).  Duplication before speciation (ancient duplication): out-paralogs; may not have the same function.  Duplication after speciation (recent duplication): in-paralogs; likely to have the same function.
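One common way to turn these definitions into a procedure is the species-overlap rule on a rooted gene tree: the last common ancestor of two genes is called a duplication if its two subtrees contain genes from a shared species, and a speciation otherwise. The Python sketch below applies that rule; the tree encoding (nested pairs with "species|gene" leaves) and the toy tree are assumptions made for illustration, and the rule is a simplification that ignores complications such as gene loss and tree-building errors.

```python
from typing import Union

# A gene tree is either a leaf "species|gene" or a (left, right) pair of subtrees.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> set[str]:
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)


def species(tree: Tree) -> set[str]:
    return {leaf.split("|", 1)[0] for leaf in leaves(tree)}


def relationship(tree: Tree, gene1: str, gene2: str) -> str:
    """Classify two leaves by the event at their last common ancestor (LCA).

    Species-overlap rule: if the two subtrees under the LCA share a species,
    the LCA is treated as a duplication (paralogs); otherwise as a speciation (orthologs).
    """
    pair = {gene1, gene2}
    if gene1 == gene2 or not pair <= leaves(tree):
        raise ValueError("need two distinct leaves of the tree")
    node = tree
    while True:
        left, right = node
        if pair <= leaves(left):
            node = left
        elif pair <= leaves(right):
            node = right
        else:
            break  # the genes are split between the two children: node is the LCA
    return "paralogs" if species(left) & species(right) else "orthologs"


# Hypothetical tree: a duplication (A/B) followed by a human-mouse speciation.
tree = (("human|A", "mouse|A"), ("human|B", "mouse|B"))
print(relationship(tree, "human|A", "mouse|A"))  # orthologs
print(relationship(tree, "human|A", "human|B"))  # paralogs
```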
  • 22. GENE CLUSTERS  In prokaryotes, groups of functionally related genes tend to be located in close proximity to each other, often in a specific order, as exemplified by operons.  Although gene order conservation beyond the level of operons is much less prevalent, conservation of clusters and gene order can be an important indicator of function.  Several approaches have been used to determine functionally related “clusters” of genes.  Overbeek et al. use the constructs of a “pair of close bidirectional best hits” (PCBBH) and “pairs of close homologs” (PCHs) to represent pairs of genes that are closely conserved between two species and likely to be functionally related.
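The idea behind a pair of close bidirectional best hits can be sketched in a few lines of Python. Assumptions: similarity scores (e.g., BLAST bit scores) are already available as dictionaries keyed by (query, subject), and "close" is measured in gene positions along a single chromosome. Overbeek et al. define closeness in their own way, so the max_gap parameter and all function names here are hypothetical.

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """Best-scoring subject for each query; ties keep the first hit seen."""
    best: dict[str, tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def bidirectional_best_hits(scores_ab, scores_ba) -> list[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    ab, ba = best_hits(scores_ab), best_hits(scores_ba)
    return [(a, b) for a, b in ab.items() if ba.get(b) == a]


def close_bbh_pairs(bbh, pos_a, pos_b, max_gap=2):
    """Pairs of BBHs whose genes lie within max_gap gene positions in BOTH genomes."""
    close = []
    for i, (a1, b1) in enumerate(bbh):
        for a2, b2 in bbh[i + 1:]:
            if abs(pos_a[a1] - pos_a[a2]) <= max_gap and abs(pos_b[b1] - pos_b[b2]) <= max_gap:
                close.append(((a1, b1), (a2, b2)))
    return close


scores_ab = {("a1", "b1"): 100, ("a1", "b2"): 50, ("a2", "b2"): 90, ("a3", "b3"): 80}
scores_ba = {("b1", "a1"): 100, ("b2", "a2"): 90, ("b2", "a1"): 40, ("b3", "a3"): 80}
bbh = bidirectional_best_hits(scores_ab, scores_ba)
print(bbh)  # [('a1', 'b1'), ('a2', 'b2'), ('a3', 'b3')]
pos_a = {"a1": 0, "a2": 1, "a3": 10}
pos_b = {"b1": 5, "b2": 6, "b3": 0}
print(close_bbh_pairs(bbh, pos_a, pos_b))  # [(('a1', 'b1'), ('a2', 'b2'))]
```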
  • 23. COGs Clusters of Orthologous Groups.  Groups of three or more orthologous genes,  meaning they are direct evolutionary counterparts and are considered to be part of an 'ancient conserved domain'.  A COG is defined as three or more proteins from the genomes of distant species that are more similar to each other than to any other protein within the individual genomes.  COGs can be used to predict the function of homologous proteins in poorly studied species and can also be used to track evolutionary divergence from a common ancestor,  hence providing a powerful tool for functional annotation of uncharacterized proteins.  Important in comparative genomics studies
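The "three or more proteins from distant genomes that are more similar to each other than to anything else" criterion can be illustrated with the triangle idea used to seed COGs: three proteins from three different genomes that are each other's genome-specific best hits. This Python sketch is a simplification under stated assumptions: it requires best hits to be reciprocal in every pair, takes precomputed best hits as input, uses a brute-force loop suitable only for toy data, and omits the later step in which the original COG procedure merges triangles that share a side. Names and data are hypothetical.

```python
from itertools import combinations


def is_mutual_best(best, genome_of, p, q):
    """True if q is p's best hit in q's genome and p is q's best hit in p's genome."""
    return best.get((p, genome_of[q])) == q and best.get((q, genome_of[p])) == p


def cog_triangles(best, genome_of):
    """Triplets of proteins from three different genomes that are pairwise best hits.

    best:      {(protein, target_genome): best-hitting protein in that genome}
    genome_of: {protein: genome it belongs to}
    """
    triangles = []
    for x, y, z in combinations(sorted(genome_of), 3):
        if len({genome_of[x], genome_of[y], genome_of[z]}) < 3:
            continue  # a COG seed must span three distinct genomes
        if (is_mutual_best(best, genome_of, x, y) and is_mutual_best(best, genome_of, y, z)
                and is_mutual_best(best, genome_of, x, z)):
            triangles.append((x, y, z))
    return triangles


genome_of = {"ec1": "E. coli", "ec2": "E. coli", "bs1": "B. subtilis", "mj1": "M. jannaschii"}
best = {
    ("ec1", "B. subtilis"): "bs1", ("ec1", "M. jannaschii"): "mj1",
    ("bs1", "E. coli"): "ec1", ("bs1", "M. jannaschii"): "mj1",
    ("mj1", "E. coli"): "ec1", ("mj1", "B. subtilis"): "bs1",
    ("ec2", "B. subtilis"): "bs1",  # ec2 points at bs1, but bs1 prefers ec1
}
print(cog_triangles(best, genome_of))  # [('bs1', 'ec1', 'mj1')]
```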
  • 24. Application of COG  The most straightforward application of the COGs is the prediction of functions of individual proteins or protein sets, including those from newly completed genomes. COG database: NCBI provides a COG database that consists of 4,873 COGs comprising 138,458 proteins from the genomes of 50 bacteria, 13 archaea and 3 unicellular eukaryotes. This database uses completely sequenced genomes to classify proteins using the orthology concept.
  • 25. MBGD  MBGD is a database for comparative analysis of completely sequenced microbial genomes, the number of which is now growing rapidly.  The aim of MBGD is to facilitate comparative genomics from various points of view, such as ortholog identification, paralog clustering, motif analysis and gene order comparisons.
  • 26. COMPARATIVE ANALYSIS OF CODING REGIONS  This typically involves the identification of gene-coding regions, comparison of gene content, and comparison of protein content.  Recently, a number of algorithms have also been developed that use comparative genomics to aid function prediction of genes. The analysis and comparison of the coding regions starts with, and is very dependent upon, the gene identification algorithm used to infer which portions of the genomic sequence are protein-coding.
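Gene identification algorithms range from simple open-reading-frame scans to statistical and homology-based predictors. As a deliberately minimal illustration of the first category only, this Python sketch reports ATG-to-stop ORFs in the three forward reading frames; it ignores the reverse strand, alternative start codons and introns, and the min_codons cutoff is an arbitrary assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """ATG-to-stop open reading frames on the forward strand.

    Returns (start, end, frame) with 0-based, end-exclusive coordinates;
    the stop codon is included and counted toward min_codons.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


print(find_orfs("ATG" + "GCT" * 5 + "TAA", min_codons=3))  # [(0, 21, 0)]
```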
  • 27. A combination of multiple gene identification approaches is often used in large-scale analysis to improve the overall accuracy.
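One simple way to combine several gene finders is majority voting. The Python sketch below, a hypothetical illustration rather than any published combiner, keeps a gene call only when at least min_votes predictors report exactly the same coordinates and strand; more elaborate combiners can also reconcile partially overlapping calls.

```python
from collections import Counter

# A gene call is (start, end, strand); each predictor returns a list of calls.
GeneCall = tuple[int, int, str]


def consensus_calls(predictions: list[list[GeneCall]], min_votes: int = 2) -> list[GeneCall]:
    """Keep gene calls made, with identical coordinates, by at least min_votes predictors."""
    votes = Counter()
    for calls in predictions:
        votes.update(set(calls))  # one vote per predictor per call
    return sorted(call for call, n in votes.items() if n >= min_votes)


p1 = [(0, 300, "+"), (500, 900, "+")]
p2 = [(0, 300, "+")]
p3 = [(500, 900, "+"), (1000, 1200, "-")]
print(consensus_calls([p1, p2, p3]))  # [(0, 300, '+'), (500, 900, '+')]
```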
  • 28. COMPARATIVE ANALYSIS OF NON-CODING REGIONS  Noncoding regions, which can make up as much as 97% of the genome length (as in the human genome), have gained a lot of attention in recent years because of their predicted roles in regulation of transcription, DNA replication, and other biological functions.  However, identification of regulatory elements from the noncoding portion of a genome remains a challenge.  Comparative genomics has greatly aided the identification of regulatory segments by comparing noncoding DNA sequences from diverse species to identify conserved regions.  This approach is based on the presumption that selective pressure causes regulatory elements to evolve at a slower rate than non-regulatory sequences in the noncoding regions.
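The conservation-based search for regulatory segments can be illustrated with a simple phylogenetic-footprinting sketch: score each column of a multiple alignment of orthologous noncoding sequences by the fraction of species that share the reference base, then report runs of highly conserved columns. Assumptions: the alignment already exists (first row is the reference), min_frac and min_len are arbitrary illustrative cutoffs, and the sketch ignores the phylogenetic distances between the species.

```python
def column_conservation(alignment: list[str]) -> list[float]:
    """Fraction of rows matching the first (reference) row at each column; gaps score 0."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment rows must have equal length")
    ref, n = alignment[0], len(alignment)
    scores = []
    for col, base in enumerate(ref):
        if base == "-":
            scores.append(0.0)
        else:
            scores.append(sum(row[col] == base for row in alignment) / n)
    return scores


def conserved_runs(scores: list[float], min_frac: float = 1.0, min_len: int = 5) -> list[tuple[int, int]]:
    """Half-open column ranges of at least min_len columns with conservation >= min_frac."""
    runs, start = [], None
    for i, s in enumerate(scores + [float("-inf")]):  # sentinel closes a trailing run
        if s >= min_frac and start is None:
            start = i
        elif s < min_frac and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


aln = ["ACGTACGTAA", "ACGTACGTTT", "ACGTACGCCC"]  # toy three-species alignment
print(conserved_runs(column_conservation(aln)))   # [(0, 7)]
```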
  • 29. ANALYSIS OF MUTATIONS  Search and display of mutations within multiple alignments, distinguishing intergenic, synonymous, non-synonymous and indel mutations.  Additional filtering based on SNP quality scores.  Display colors based on mutation type or quality; sorting based on position, gene, nucleic acid (NA) change, amino acid (AA) change, or quality.  Direct clustering based on mutations, or export of the mutation list for further analysis.
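The distinction between intergenic, synonymous, non-synonymous and indel changes can be made concrete with a small classifier. This Python sketch is hypothetical and not taken from any particular tool: it assumes forward-strand coding sequences given as codon-aligned intervals, uses the standard genetic code, and handles single-nucleotide substitutions plus a simple length-change test for indels.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_variant(genome, genes, pos, ref, alt):
    """Label a variant as intergenic, synonymous, non-synonymous or indel.

    genes: (start, end) half-open intervals of forward-strand CDSs whose
    first base is the first base of a codon. pos is 0-based.
    """
    if genome[pos:pos + len(ref)] != ref:
        raise ValueError("reference allele does not match the genome")
    if len(ref) != len(alt):
        return "indel"
    if len(ref) != 1:
        raise NotImplementedError("only single-nucleotide substitutions are handled")
    for start, end in genes:
        if start <= pos < end:
            codon_start = start + (pos - start) // 3 * 3
            codon = genome[codon_start:codon_start + 3]
            offset = pos - codon_start
            mutant = codon[:offset] + alt + codon[offset + 1:]
            same = CODON_TABLE[codon] == CODON_TABLE[mutant]
            return "synonymous" if same else "non-synonymous"
    return "intergenic"


genome = "CC" + "ATGGCTTAA" + "GG"   # one toy gene: ATG GCT TAA at positions 2..10
print(classify_variant(genome, [(2, 11)], 7, "T", "C"))  # synonymous (GCT -> GCC, Ala)
print(classify_variant(genome, [(2, 11)], 5, "G", "A"))  # non-synonymous (Ala -> Thr)
```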
  • 30. PSEUDOGENES?  Nonfunctional protein-coding genes  Mutations introduce “sequence problems” (frameshifts, in-frame stop codons, absence of a stop codon)  “Normal” bacterial genomes have 1-5% pseudogenes [Liu et al]  Pseudogenes can give interesting clues to evolutionary pathways  High fractions of pseudogenes suggest a “genome degradation” process  May be a cause or an effect of niche restriction  Examples  Mycobacterium leprae: 36% (~1,100 genes)  Leifsonia xyli subsp. xyli: 13% (~300 genes)  Pseudogenes are usually not annotated as proteins, so they do not show up in BLAST searches against protein databases
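The three "sequence problems" listed above can be screened for mechanically. The Python sketch below flags them in a candidate coding sequence (for example, a region matched to a known protein); it is a hypothetical illustration under simple assumptions (forward strand, standard stop codons, a length that is not a multiple of three taken as a sign of a frameshift), not a full pseudogene-annotation pipeline.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def sequence_problems(cds: str) -> list[str]:
    """Flag the 'sequence problems' that suggest a pseudogene in a candidate CDS.

    Checks: length not a multiple of 3 (possible frameshift), an in-frame stop
    codon before the last codon, and a missing terminal stop codon.
    """
    cds = cds.upper()
    problems = []
    if len(cds) % 3:
        problems.append("frameshift")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]  # complete codons only
    if any(c in STOP_CODONS for c in codons[:-1]):
        problems.append("internal stop")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no stop")
    return problems


def pseudogene_fraction(cds_list: list[str]) -> float:
    """Share of candidate CDSs with at least one sequence problem."""
    flagged = sum(1 for cds in cds_list if sequence_problems(cds))
    return flagged / len(cds_list) if cds_list else 0.0


print(sequence_problems("ATGTAAGCTTAA"))  # ['internal stop']
print(sequence_problems("ATGGCTGC"))      # ['frameshift', 'no stop']
```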
  • 31. APPLICATIONS Gene identification  Comparative genomics can aid gene identification: it can recognize real genes based on their patterns of nucleotide conservation across evolutionary time. With genome-wide alignments available across the genomes being compared, the different ways in which sequences change in known genes and in intergenic regions can be analyzed. The alignments of known genes reveal conservation of the reading frame of protein translation. Regulatory motif discovery  Regulatory motifs are short DNA sequences, about 6 to 15 bp long, that control the expression of genes, dictating the conditions under which a gene is turned on or off. Each motif is typically recognized by a specific DNA-binding protein called a transcription factor (TF). A transcription factor binds precise sites in the promoter region of target genes in a sequence-specific way, but this contact can tolerate some degree of sequence variation. Comparative genomics provides a powerful way to distinguish regulatory motifs from non-functional patterns based on their conservation.
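The conservation test for motifs can be sketched as follows: scan a reference promoter for approximate matches to a motif (allowing a few mismatches, since transcription factors tolerate some variation) and keep only the hits that also appear at the aligned position in orthologous promoters. Assumptions: the promoters are already aligned without gaps, a consensus string with a mismatch cutoff stands in for a proper position weight matrix, and the sequences below are toy data.

```python
def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def motif_hits(seq: str, motif: str, max_mismatches: int = 1) -> list[int]:
    """Start positions where seq matches motif with at most max_mismatches substitutions."""
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if mismatches(seq[i:i + k], motif) <= max_mismatches]


def conserved_hits(aligned: list[str], motif: str, max_mismatches: int = 1) -> list[int]:
    """Motif hits in the first (reference) sequence that also occur at the same
    position in every other sequence of a gapless alignment."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    k = len(motif)
    ref, others = aligned[0], aligned[1:]
    return [i for i in motif_hits(ref, motif, max_mismatches)
            if all(mismatches(o[i:i + k], motif) <= max_mismatches for o in others)]


promoters = ["AACACGTGAAGGTTCC", "TTCACGTGTTGGTACC", "GGCACGTGCCCCCCCC"]
print(motif_hits(promoters[0], "GGTTCC", 0))   # [10]
print(conserved_hits(promoters, "CACGTG", 0))  # [2]
print(conserved_hits(promoters, "GGTTCC", 0))  # []
```

The second motif occurs in the reference but not in the other species, so it is discarded as a likely non-functional match.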
  • 32. APPLICATIONS  Comparative genomics has wide applications in molecular medicine and molecular evolution. A major application of comparative genomics in molecular medicine is the identification of drug targets for many infectious diseases. For example, comparative analyses of fungal genomes have led to the identification of many putative targets for novel antifungal drugs. Such findings can aid target-based drug design for treating fungal diseases in humans.  Comparative genomics also helps in clustering regulatory sites, which can help in recognizing unknown regulatory regions in other genomes. Regulation of metabolic pathways can also be inferred through comparative genomics of a species.  Agriculture is a field that reaps the benefits of comparative genomics. Identifying the loci of advantageous genes is a key step in breeding crops that are optimized for greater yield, cost-efficiency, quality, and disease resistance.