2. INTRODUCTION
1977 - first complete genome to be sequenced was
bacteriophage φX174 - 5386 bp
1995 - first complete genome sequence from a free living
organism - Haemophilus influenzae (1.83 Mb) by whole
genome shotgun approach
Sanger & Coulson (1977) - used chain-terminating
dideoxynucleotide analogues
Maxam & Gilbert (1977) chemical degradation DNA
sequencing - terminally labeled DNA fragments were
chemically cleaved at specific bases and separated by gel
electrophoresis
4. ARCHON X PRIZE
X PRIZE Foundation in Santa Monica, CA, has
introduced the Archon X PRIZE for Genomics and will
award a sum of $10 million to the first team that can
design a system capable of sequencing 100 human
genomes in 10 days
5. SEQUENCING TECHNOLOGY
First generation
Sanger’s dideoxy chain terminating tech
Maxam & Gilbert chemical degradation tech
Next generation sequencing (NGS)
454/Roche - pyrosequencing
Illumina/Solexa - reversible dye terminators
SOLiD/ABI - sequential ligation of oligonucleotide probes
Second generation HT-NGS – sequencing after amplification
7. Heliscope
SMRT (Pacific biosciences)
Single molecule real time (RNAP) sequencer
Nanopore DNA sequencer
Ion Torrent sequencing technology (PostLight)
VisiGen biotechnologies – FRET
Advantages of 3rd generation HT-NGS over 2nd
higher throughput
faster turnaround time
longer read lengths
higher consensus accuracy
small amounts of starting material
low cost
Third generation HT-NGS - Single molecule sequencing
9. ADVANTAGES OF HT-NGS
Massive parallel sequencing of hundreds of thousands
or millions of templates
Preliminary and tedious cloning work is eliminated and
substituted by PCR amplification
In the most recent technologies even PCR is eliminated,
because single DNA molecules are sequenced directly
Economic
Reduced time
10. DISADVANTAGES OF HT-NGS
Most NGSTs produce short reads
Constructions of fragment libraries remain tricky and
involve several steps of fragmentation, adaptor ligation
and PCR amplification
Homopolymer stretches are read inaccurately with the 454 technology
Modified nucleotides cause mis-incorporation or block
further incorporation if the fluorescent moiety cannot be
completely removed
Assembly of short reads into longer sequences
18. GENOME ASSEMBLY
Assemblers can join sequences together based on
overlapping regions between the sequences
Composed of contigs and scaffolds
Contigs - contiguous consensus sequences that are
derived from collections of overlapping reads
Scaffolds - ordered and orientated sets of contigs that are
linked to one another by mate pairs of sequencing reads
N50 - basic statistic for describing the contiguity of a
genome assembly. The longer the N50 is, the better the
assembly
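As a minimal sketch of how N50 can be computed, the function below sorts contig lengths from longest to shortest and returns the length of the contig at which the running total first reaches half of the total assembly size. The function name and the example lengths are illustrative assumptions, not taken from any particular assembler.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all bases in the assembly.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs given")


# Hypothetical assembly of seven contigs (total 300 bases).
example_lengths = [80, 70, 50, 40, 30, 20, 10]
example_n50 = n50(example_lengths)
```

Here the two longest contigs (80 + 70) already cover half of the 300 bases, so the N50 of this toy assembly is 70.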
19. Alignment against a reference genome sequence
De novo assembly - construction of longer sequences, such
as contigs or genomes, from shorter sequences, such as
sequence reads, without prior knowledge of the order of
the reads or reference to a closely related sequence
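The idea of joining reads through their overlapping regions can be illustrated with a toy greedy merger. This is only a sketch under strong assumptions: reads are error free, all come from the same strand, and no read lies entirely inside another. Real assemblers use overlap or de Bruijn graphs and handle errors and reverse complements. The names `overlap` and `greedy_assemble` and the example reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Three hypothetical reads that tile a 13-base sequence.
example_reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
example_contigs = greedy_assemble(example_reads)
```

With these three reads the merger produces a single contig, ATGGCGTACCTGA. Reads that share no overlap of at least `min_len` bases would remain as separate contigs.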
20. GENE PREDICTION
Ab initio gene prediction - uses mathematical models
rather than external evidence (such as EST and
protein alignments) to identify genes and to
determine their intron–exon structures
Evidence-driven gene prediction - uses external
evidence such as ESTs, which can identify exon
boundaries unambiguously. Great potential to improve
the quality of gene prediction in newly sequenced
genomes. ESTs and proteins must first be aligned
to the genome
Commonly used tools for gene prediction in
prokaryotes: Glimmer, GeneMark
21. GENOME ANNOTATION
The extraction of biological knowledge from raw
nucleotide sequences
Seeks to identify every potential protein coding gene
(ORFs)
Predicted proteins are compared against available databases
using tools such as BlastP
‘Structural’ genome annotation is the process of identifying
genes and their intron–exon structures
‘Functional’ genome annotation is the process of attaching
meta-data such as gene ontology terms to structural
annotations
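Identifying potential protein coding genes (ORFs) can be sketched with a naive scan for an ATG start codon followed in the same reading frame by a stop codon. This is an illustration only, not how Glimmer or GeneMark work: it looks at the forward strand only, ignores alternative start codons, and the function name and `min_codons` threshold are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for ATG...stop ORFs.

    Coordinates are 0-based and end-exclusive; the length check
    counts the stop codon as one of the codons.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical 16-base sequence containing one short ORF.
example_seq = "CCATGAAATTTTAGGG"
example_orfs = find_orfs(example_seq)
```

For the example sequence the scan reports one ORF in frame 2, spanning ATGAAATTTTAG. A full annotation pipeline would also scan the reverse complement and compare the translated ORFs with protein databases.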
24. APPLICATIONS
A very large number of short reads helps to identify single nucleotide
polymorphisms (SNPs) when they are compared with a reference
genome
Identification of rearrangements, deletions, insertions,
inversions
Used to generate expressed sequence tags (EST) from RNA
sequencing
Also to detect small regulatory RNAs
Illumina technology - ChIP-Seq to study protein-DNA
interactions
Metagenomics
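The SNP application above can be sketched as a simple pileup: count the bases that aligned reads place on each reference position and report positions where a well-supported base differs from the reference. The sketch assumes reads are already aligned without gaps and are given as (start position, sequence) pairs. The function name and the `min_depth` and `min_fraction` thresholds are illustrative assumptions, not values from any real variant caller.

```python
from collections import Counter


def call_snps(reference, aligned_reads, min_depth=2, min_fraction=0.8):
    """Return (position, reference_base, observed_base) for candidate SNPs."""
    # One base counter per reference position.
    pileup = [Counter() for _ in reference]
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        base, count = counts.most_common(1)[0]
        if base != reference[pos] and count / depth >= min_fraction:
            snps.append((pos, reference[pos], base))
    return snps


# Hypothetical reference and three gap-free aligned reads.
example_reference = "ACGTACGTAC"
example_reads = [(0, "ACGAAC"), (2, "GAACGT"), (3, "AACGTAC")]
example_snps = call_snps(example_reference, example_reads)
```

All three reads carry an A where the reference has a T at position 3 (0-based), so that position is reported as a candidate SNP. Positions covered by only one read are skipped.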
25. LEADS TO DEVELOPMENT
Functional genomics
Comparative genomics
Environmental genomics (Metagenomics)
26. FUNCTIONAL GENOMICS
Reveals genome structure and its functional relation
Orthologs are homologs derived from a common ancestor
that diverged because of divergence of the organisms
(speciation); they tend to have similar functions
Paralogs are homologs produced by gene duplication and
represent genes derived from a common ancestral gene
that duplicated within an organism and then diverged, tend
to have different functions
Xenologs are homologs resulting from the horizontal
transfer of a gene between two organisms. The function of
xenologs can be variable, depending on how significant the
change in context was for the horizontally moving gene. In
general, though, the function tends to be similar
27. PHYLOGENETIC ANALYSIS
Phylogenetic trees are used to classify the
evolutionary relationships between homologous
genes represented in the genomes of divergent species
[Figure: phylogenetic tree with terminal nodes A to E, labelled with branches (lineages), internal nodes (divergence points) and the ancestral node (root of the tree)]
28. COMPARATIVE GENOMICS
Comparison of genome sequences reveals much
information about genome structure and evolution,
including importance of lateral gene transfer
Tool to discover how microbes have adapted to a particular
ecology and to aid the development of new therapeutic
agents
29. METAGENOMICS
Genomics-based study of genetic material
recovered directly from environmentally derived
samples without laboratory culture and compared
with all previously sequenced genes
Reveals how microbes adapt to extreme environments,
which helps to discover new metabolic pathways and
protective mechanisms
30. IMPACT OF GENOME SEQUENCING
Revealed genome reduction in intracellular bacteria
Genome plasticity (rearrangements, mobile elements)
Gene duplication and diversification of protein function
Lateral gene transfer & acquisition of new functions
Adaptation to environments, virulence
Industrial processes - fermentation technology
Bioremediation
Biotransformation
Development of vaccines
Bacterial diversity
Synthetic biology
Epigenetics
31. REVERSE VACCINOLOGY
Use of genomic sequence information to identify novel
and better suited protein candidates for vaccines
Serogroup B Neisseria meningitidis – based on
genomic data, all proteins predicted to be surface
exposed, and therefore accessible to antibodies, were identified
Suitable candidates selected after sequencing various
strains
Streptococcus agalactiae
Pan-genome is composed of the core genome, the genes
present in all sequenced strains, and the dispensable
genome, made of genes present in a subset of strains
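The pan-genome definition maps directly onto set operations: the pan-genome is the union of the gene sets of all strains, the core genome is their intersection, and the dispensable genome is the difference between the two. The sketch below assumes genes have already been grouped into shared families. The strain names and gene labels are made up for illustration.

```python
def pan_genome(strain_genes):
    """Split gene families into pan, core and dispensable sets.

    strain_genes maps a strain name to an iterable of gene family labels.
    """
    gene_sets = [set(genes) for genes in strain_genes.values()]
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets) if gene_sets else set()
    dispensable = pan - core
    return pan, core, dispensable


# Hypothetical gene content of three sequenced strains.
example_strains = {
    "strain_1": ["geneA", "geneB", "geneC"],
    "strain_2": ["geneA", "geneB", "geneD"],
    "strain_3": ["geneA", "geneB", "geneC", "geneE"],
}
example_pan, example_core, example_dispensable = pan_genome(example_strains)
```

In this example geneA and geneB form the core genome, while geneC, geneD and geneE belong to the dispensable genome. In practice the difficult step is deciding which genes in different strains count as the same family.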
32. Synthetic biology - uses the sequence of an entire genome to
synthesize genes de novo
Identification of the minimal genome, the smallest set of
genes that enables life - Mycoplasma genitalium
33. DATABASES AND TOOLS RELATED TO BACTERIAL
GENOMIC DATA
NCBI Entrez Genome Project database:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj
A searchable collection of complete and incomplete (in-progress)
large-scale sequencing, assembly, annotation, and mapping projects
for cellular organisms
NCBI, Bacteria Genome Database:
http://www.ncbi.nlm.nih.gov/genomes/static/eub.html
The Genome database provides views for a variety of genomes,
complete chromosomes, sequence maps with contigs, and integrated
genetic and physical maps
Bacterial Genomes at The Sanger Institute:
http://www.sanger.ac.uk/Projects/Microbes/
This website contains a list of funded, on-going, or completed projects of
pathogens sequenced at this institute
TIGR Comprehensive Microbial Resource (CMR):
http://cmr.tigr.org/tigr-scripts/CMR/CmrHomePage.cgi
A free website displaying information on all the publicly available,
complete prokaryotic genomes
34. GOLD: Genomes OnLine Database:
http://www.genomesonline.org/
A genome database containing information about which genomes have
been sequenced or are in progress
Microbial Genome Database for Comparative Analysis (MBGD):
http://mbgd.genome.ad.jp/
A database for comparative analysis of completely sequenced microbial
genomes
Virulence Factors of Bacterial Pathogens (VFDB):
http://zdsys.chgb.org.cn/VFs/main.htm
VFDB is an integrated and comprehensive database of virulence
factors for bacterial pathogens
Genome Information Broker:
http://gib.genes.nig.ac.jp/
A comprehensive data repository of complete microbial genomes in the
public domain. Many microbial genomes can be explored graphically
Islander, a Database of Genomic Islands:
http://www.indiana.edu/~islander
This database contains genomic islands discovered in completely
sequenced bacterial genomes
35. GenoList genome browser at Institut Pasteur:
http://genolist.pasteur.fr/
Contains access to diverse genome browsers of pathogenic
bacteria
IslandPath:
http://www.pathogenomics.sfu.ca/islandpath/update/IPindex.pl
An aid to the identification of genomic islands, including
pathogenicity islands, of potentially horizontally transferred genes
HGT-DB:
http://www.tinet.org/~debb/HGT/
A database containing the prediction of horizontally transferred
genes in several prokaryotic complete genomes
E. coli genome project:
http://www.genome.wisc.edu
A site devoted to the E. coli genome project with an updated
annotation of the genome