Bio305 Bacterial Genome Annotation and Analysis Professor Mark Pallen
Overview
General features of genomes
Bacterial genome organisation
Overview of a genome project
Whole-Genome Shotgun Sanger Sequencing
[Figure: bacterial chromosome -> random shearing -> size selection -> cloning into plasmid vector -> pick colonies to create shotgun library -> plasmid preps -> sequence each insert with two primers]
High-throughput Sequencing
High-Throughput Shotgun Sequencing
[Figure: bacterial chromosome -> random shearing -> size selection -> add adapters -> amplify -> sequence]
Illumina Sequencing
The Sequence Assembly Problem
The Repeat Problem
ATTTATGTGT GTGTGGTGTG GTGTGGTGTG CACTACTGCT ACTACTGCTGACTACT GTGTGGTGTG GTGTGGTGTG ATATCCCT
ATTTATGTGT GTGTGGTGTG GTGTGGTGTG CACTACTGCT ACTACTGCTGACTACT GTGTGGTGTG GTGTGGTGTG ATATCCCT
[Figure labels: Correct; Incorrect]
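The repeat problem can be illustrated with a minimal Python sketch. It assumes exact matching only and uses a made-up toy genome and reads (`genome`, `repeat_read`, `unique_read` and the helper `find_placements` are hypothetical names, not from the lecture). A read drawn from a repeated block fits at more than one position, so overlap alone cannot tell an assembler which placement is correct.

```python
def find_placements(genome, read):
    """Return every start index at which `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        # Step forward by one base so overlapping matches are also found.
        start = genome.find(read, start + 1)
    return positions


# Toy genome (hypothetical): the block GTGTGGTGTG occurs twice.
genome = "AAGTGTGGTGTGCCGTGTGGTGTGTT"
repeat_read = "GTGTGGTGTG"   # lies entirely inside the repeat
unique_read = "GTGCCG"       # spans the unique sequence between the copies

print(find_placements(genome, repeat_read))  # more than one placement
print(find_placements(genome, unique_read))  # a single placement
```

Reads that extend from a repeat into unique flanking sequence, or read pairs a known distance apart, are what resolve the ambiguity in practice.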
Paired-end Sequencing
[Figure: bacterial chromosome -> random shearing -> size selection for 3kb or 8kb etc -> add linkers -> circularise -> shear and select on size and presence of linkers -> add adapters -> obtain sequences from either side of linker, a known distance apart in genome]
Genome Assembly
[Figure labels: Scaffold; Contig 1; Contig 2; Contig 3; Sequence Gap; Physical Gap]
Re-sequencing
SNP calling
Genome annotation
How to go from this…?
… to this?
Or this?
Caveat
Sources of information for annotation
Approaches to functional annotation
Base composition aids genome analysis
GC skew, (G-C)/(G+C), identifies the origin of replication and the leading and lagging strands
[Figure labels: genes coded by location & function; %G+C; genes shared with E. coli; genes unique to S. typhi]
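The GC skew formula on this slide can be computed directly. The sketch below is a minimal Python illustration, assuming non-overlapping windows by default. The function names `gc_skew` and `windowed_gc_skew` and the window size are illustrative choices, not taken from the lecture.

```python
def gc_skew(seq):
    """Return (G - C) / (G + C) for a sequence; 0.0 if it has no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(seq, window, step=None):
    """Return the GC skew of each window along the sequence.

    `step` defaults to `window`, giving non-overlapping windows.
    """
    step = step or window
    return [
        gc_skew(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence (hypothetical): G-rich first half, C-rich second half.
print(windowed_gc_skew("GGGGCCCC", window=4))
```

On a real chromosome the sign of the windowed skew tends to switch near the origin and terminus of replication, which is why the plot helps locate them.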
Analysis of nucleotide sequence data
Gene Finding in bacteria
Identifying protein-coding sequences
The problem of conflicting ORFs
[Figure labels: non-coding ORFs; CDSs (note: an ORF can extend upstream of the start codon)]
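The distinction between an ORF and a CDS can be sketched in Python. This is a minimal illustration with several assumptions: only the three forward frames are considered, ATG is the only start codon, and a stretch with no closing stop codon is ignored. The function name `orfs_in_frame` and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orfs_in_frame(seq, frame):
    """Yield (orf_start, cds_start, end) for each stop-terminated stretch.

    orf_start: index of the first codon after the previous stop (or the
               frame offset), so the ORF may begin upstream of any start.
    cds_start: index of the first ATG in the stretch, or None if absent.
    end:       index just past the terminating stop codon.
    """
    seq = seq.upper()
    orf_start = frame
    cds_start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon == "ATG" and cds_start is None:
            cds_start = i
        if codon in STOP_CODONS:
            if i > orf_start:  # skip empty stretches between adjacent stops
                yield (orf_start, cds_start, i + 3)
            orf_start = i + 3
            cds_start = None


# Toy sequence (hypothetical): the ORF starts at 0, the candidate CDS at 3.
for frame in range(3):
    print(frame, list(orfs_in_frame("CCCATGAAATAGGG", frame)))
```

A stretch whose `cds_start` is None is an ORF with no candidate CDS under these assumptions; real gene finders also weigh alternative start codons, the reverse strand, and coding statistics.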
The Problem of Frameshift Errors

Actual sequence
ATGAGTACCGCTAAATTAGTTAAATCAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAAA
Frame 1: M  S  T  A  K  L  V  K  S  K  A  T  N  L  L  Y  T  R  N  D  V  S  D  S  E  K
Frame 2: •  V  P  L  N  •  L  N  Q  K  R  P  I  C  F  I  P  A  T  M  S  P  T  A  R  K
Frame 3: E  Y  R  •  I  S  •  I  K  S  D  Q  S  A  L  Y  P  Q  R  C  L  R  Q  R  E  K

Frameshifted sequence after single base error
ATGAGTACCGCTAAATTAGTTAAATCAAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAA
Frame 1: M  S  T  A  K  L  V  K  S  K  S  D  Q  S  A  L  Y  P  Q  R  C  L  R  Q  R  E
Frame 2: •  V  P  L  N  •  L  N  Q  K  A  T  N  L  L  Y  T  R  N  D  V  S  D  S  E  K
Frame 3: E  Y  R  •  I  S  •  I  K  K  R  P  I  C  F  I  P  A  T  M  S  P  T  A  R  K
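The effect shown on this slide can be reproduced with a short Python sketch. It assumes the standard genetic code, writes stop codons as `*` where the slide uses a bullet, and translates only complete codons. The names `CODON_TABLE` and `translate` are illustrative; the two sequences are copied from the slide.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, frame=0):
    """Translate complete codons of `seq`, starting at offset `frame` (0, 1 or 2)."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3)
    )


actual = ("ATGAGTACCGCTAAATTAGTTAAATCAAAAGCGACCAATCTGCTTTATACC"
          "CGCAACGATGTCTCCGACAGCGAGAAA")
# Same region with one extra A inserted in the run of A's (the single base error).
shifted = ("ATGAGTACCGCTAAATTAGTTAAATCAAAAAGCGACCAATCTGCTTTATACC"
           "CGCAACGATGTCTCCGACAGCGAGAA")

for name, seq in (("actual", actual), ("shifted", shifted)):
    for frame in range(3):
        print(name, frame + 1, translate(seq, frame))
```

After the inserted base, the true protein sequence continues in frame 2 of the erroneous sequence, while frame 1 switches to an unrelated peptide. That switch between frames is the signature of a frameshift error.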
CDS Prediction: Graphical Plots
GC content by reading frame
Amino-acid composition by reading frame, compared to average for globular proteins
CDS Prediction: Markov Models
Annotation of protein-coding genes
Homology
Homology
the cat   sat  on  the mat
die Katze sass auf der Matte

vge|GBant88-2    ITLITCVSVKDNSKRYVVAG
vge|GEfae9-178   LTLITCDQATKTTGRIIVIA
vge|GSpne1-403   MTLITCDPIPTFNKRLLVNF
sortase_staur    LTLITCDDYNEKTGVWEKRK
Types of Homology
Homology Searches
What is BLAST?
The several flavours of BLAST
Choosing the right flavour
Low complexity filtering
Understanding BLAST Results
Bit scores: high is good
E-values: low is good
http://www.ncbi.nlm.nih.gov/BLAST/tutorial/
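Bit scores and E-values move in opposite directions because of the standard relation E = m × n × 2^(-S'), where S' is the bit score and m and n are the query and database lengths. The Python sketch below is a simplified illustration: it assumes raw lengths, whereas BLAST itself uses corrected "effective" lengths, so real reported E-values will differ. The function name `expected_hits` and the example lengths are hypothetical.

```python
def expected_hits(bit_score, query_length, database_length):
    """Approximate E-value: chance hits expected at or above this bit score."""
    search_space = query_length * database_length
    return search_space * 2.0 ** (-bit_score)


# Each extra bit halves the expected number of chance hits;
# a larger database raises it in proportion.
for bits in (20, 40, 68):
    print(bits, expected_hits(bits, query_length=300, database_length=10**9))
```

Because the E-value scales with database size, the same alignment can look less significant when searched against a larger database, while its bit score stays the same.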
Typical Blast Output
Sequences producing High-scoring Segment Pairs:                  Reading  High   Sum Probability
                                                                 Frame    Score  P(N)       N
emb|X69337|ECDPS      E.coli dps gene for binding protein        +2       834    6.4e-109   1
gb|U04242|ECU04242    Escherichia coli core starvation p...      +3       828    2.7e-106   1
emb|X14180|ECGLNHPQ   Escherichia coli glutamine permeas...      +3       443    2.8e-53    1
gb|U18769|HDU18769    Haemophilus ducreyi fine tangled p...      +1       150    4.0e-18    2
dbj|D01016|ANALTI46   Anabaena variabilis lti46 gene. >e...      +2       129    4.8e-12    2
gb|M84990|P26BPO      Plasmid pOP2621 ORF1 gene, 5' end;...      -2       131    6.7e-09    1
gb|U16121|HPU16121    Helicobacter pylori neutrophil act...      +1       112    1.8e-06    1
gb|M32401|TRPTYF1     T.pallidum pallidum antigen TyF1 g...      +3       101    5.6e-06    2
emb|X71436|RPNTRB     R.phaseoli ntrB gene                       +1        67    0.76       2
gb|L35598|DRODGC1A    Drosophila melanogaster receptor g...      +1        48    0.97       3
Typical Blast Output
gb|U18769|HDU18769  Haemophilus ducreyi fine tangled pili major pilin subunit gene
Length = 780
Plus Strand HSPs:
Score = 150 (68.0 bits), Expect = 4.0e-18, Sum P(2) = 4.0e-18
Identities = 36/89 (40%), Positives = 46/89 (51%), Frame = +1

Query:  30 ELLNRQVIQFIDLSLITKQAHWNMRGANFIAVHEMLDGFRTALIDHLDTMAERAVQLGGV 89
           E L  ++    +L+LI K AHWN+ G  FIAVHEMLD     + D +D +AER   LG
Sbjct: 253 EALQMRLQGLNELALILKHAHWNVVGPQFIAVHEMLDSQVDEVRDFIDEIAERMATLGVA 432

Query:  90 ALGTTQVINSKTPLKSYPLDIHNVQDHLK 118
             G +  +        YPL     QDHLK
Sbjct: 433 PNGLSGNLVETRQSPEYPLGRATAQDHLK 519
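The "Identities" figure in this output is simply the number of aligned positions where the query and subject residues match. The Python sketch below is a minimal illustration, assuming two already-aligned strings of equal length with `-` for any gap. The function name `count_identities` is hypothetical, and the two strings are the second pair of Query and Sbjct lines from the slide.

```python
def count_identities(aligned_query, aligned_subject):
    """Return (identical positions, alignment length) for two aligned strings."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"  # a gap aligned to a gap is not an identity
    )
    return identical, len(aligned_query)


query = "ALGTTQVINSKTPLKSYPLDIHNVQDHLK"
subject = "PNGLSGNLVETRQSPEYPLGRATAQDHLK"
identical, length = count_identities(query, subject)
print(f"Identities = {identical}/{length} ({100 * identical // length}%)")
```

The "Positives" figure additionally counts mismatches that score above zero in the substitution matrix, shown as `+` in the middle line, so it cannot be derived from the two sequences without that matrix.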
Domain database searches
Pfam domains
Pfam search results
The Annotation Catastrophe
[Figure labels: Signal Peptide; protease; Coiled coil domain; Protein A, "a protease"; Protein B; Protein C]
Homology lies in one domain, but the functional assignment for the whole of protein A comes from another domain and is carried across in error, so proteins B and C get misannotated as proteases.
Annotation: rules to consider
Overview

 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 

Bio305 genome analysis and annotation 2012

  • 1. Bio305 Bacterial Genome Annotation and Analysis Professor Mark Pallen
  • 6. Whole-Genome Shotgun Sanger Sequencing: bacterial chromosome → random shearing → size selection → cloning into plasmid vector → pick colonies to create shotgun library → plasmid preps → sequence each insert with two primers
  • 8. High-Throughput Shotgun Sequencing: bacterial chromosome → random shearing → size selection → add adapters → amplify → sequence
  • 13. Genome Assembly. Figure labels: Scaffold; Contig 1, Contig 2, Contig 3; Physical Gap; Sequence Gap
  • 23. Base composition aids genome analysis. GC skew, (G-C)/(G+C), identifies the origin of replication and the leading and lagging strands. Figure labels: genes coded by location and function; %G+C; genes shared with E. coli; genes unique to S. typhi
  • 27. The problem of conflicting ORFs. Figure labels: non-coding ORFs; CDSs (note that an ORF can extend upstream of the start codon)
  • 28. The Problem of Frameshift Errors
      Actual sequence (three forward reading frames; • marks a stop codon):
                10        20        30        40        50        60        70
                 |         |         |         |         |         |         |
        ATGAGTACCGCTAAATTAGTTAAATCAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAAA
        frame 1: M S T A K L V K S K A T N L L Y T R N D V S D S E K
        frame 2: • V P L N • L N Q K R P I C F I P A T M S P T A R K
        frame 3: E Y R • I S • I K S D Q S A L Y P Q R C L R Q R E K
      Frameshifted sequence after single base error:
                10        20        30        40        50        60        70
                 |         |         |         |         |         |         |
        ATGAGTACCGCTAAATTAGTTAAATCAAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAA
        frame 1: M S T A K L V K S K S D Q S A L Y P Q R C L R Q R E
        frame 2: • V P L N • L N Q K A T N L L Y T R N D V S D S E K
        frame 3: E Y R • I S • I K K R P I C F I P A T M S P T A R K
  • 29. CDS Prediction: Graphical Plots. GC content by reading frame; amino-acid composition by reading frame, compared to the average for globular proteins
  • 41. Bit scores: high is good. E-values: low is good. See http://www.ncbi.nlm.nih.gov/BLAST/tutorial/
  • 42. Typical Blast Output
                                                                               Sum
                                                                Reading  High  Probability
      Sequences producing High-scoring Segment Pairs:             Frame Score  P(N)      N
      emb|X69337|ECDPS E.coli dps gene for binding protein           +2   834  6.4e-109  1
      gb|U04242|ECU04242 Escherichia coli core starvation p...       +3   828  2.7e-106  1
      emb|X14180|ECGLNHPQ Escherichia coli glutamine permeas...      +3   443  2.8e-53   1
      gb|U18769|HDU18769 Haemophilus ducreyi fine tangled p...       +1   150  4.0e-18   2
      dbj|D01016|ANALTI46 Anabaena variabilis lti46 gene. >e...      +2   129  4.8e-12   2
      gb|M84990|P26BPO Plasmid pOP2621 ORF1 gene, 5' end;...         -2   131  6.7e-09   1
      gb|U16121|HPU16121 Helicobacter pylori neutrophil act...       +1   112  1.8e-06   1
      gb|M32401|TRPTYF1 T.pallidum pallidum antigen TyF1 g...        +3   101  5.6e-06   2
      emb|X71436|RPNTRB R.phaseoli ntrB gene                         +1    67  0.76      2
      gb|L35598|DRODGC1A Drosophila melanogaster receptor g...       +1    48  0.97      3
  • 43. Typical Blast Output
      gb|U18769|HDU18769 Haemophilus ducreyi fine tangled pili major pilin subunit gene
      Length = 780
      Plus Strand HSPs:
      Score = 150 (68.0 bits), Expect = 4.0e-18, Sum P(2) = 4.0e-18
      Identities = 36/89 (40%), Positives = 46/89 (51%), Frame = +1

      Query:  30 ELLNRQVIQFIDLSLITKQAHWNMRGANFIAVHEMLDGFRTALIDHLDTMAERAVQLGGV 89
                 E L  ++    +L+LI K AHWN+ G  FIAVHEMLD     + D +D +AER   LG
      Sbjct: 253 EALQMRLQGLNELALILKHAHWNVVGPQFIAVHEMLDSQVDEVRDFIDEIAERMATLGVA 432

      Query:  90 ALGTTQVINSKTPLKSYPLDIHNVQDHLK 118
                   G +  +        YPL     QDHLK
      Sbjct: 433 PNGLSGNLVETRQSPEYPLGRATAQDHLK 519
  • 47. The Annotation Catastrophe. Figure labels: Signal Peptide; protease; Coiled coil domain; Protein A ("a protease"), Protein B, Protein C. Homology lies in one domain, but the functional assignment for the whole of protein A comes from another domain and is carried across in error, so proteins B and C get misannotated as proteases
  • 48.
  • 49.
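
Slide 23 defines GC skew as (G-C)/(G+C). A minimal Python sketch of how that quantity might be computed in windows along a sequence is given here. The function names, window size and toy sequence are illustrative assumptions, not taken from the slides. Cumulative skew is commonly plotted as well, since its minimum and maximum are often used as estimates of the replication origin and terminus.

```python
def gc_skew(seq: str, window: int = 1000, step: int = 1000) -> list[float]:
    """Return (G - C) / (G + C) for successive windows along seq."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # A window with no G or C has undefined skew; report 0.0 by convention here.
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


def cumulative(values: list[float]) -> list[float]:
    """Running sum of the windowed skew values."""
    total, out = 0.0, []
    for v in values:
        total += v
        out.append(total)
    return out


# Toy sequence (hypothetical): a G-rich half followed by a C-rich half.
toy = "GGGC" * 50 + "CCCG" * 50
skews = gc_skew(toy, window=100, step=100)
running = cumulative(skews)
print(skews)
print(running)
```

On a real genome the input would be the full chromosome sequence and the windows would typically be far larger than in this toy case.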
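
The frameshift example on slide 28 can be reproduced with a short Python sketch. It assumes the standard genetic code and that the single base error is an extra A inserted into the run of A's after TCA, which is what the two sequences on the slide suggest. The helper name `translate` and the insertion index are illustrative choices, and stop codons are printed as `*` rather than the bullet used on the slide.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, offset: int = 0) -> str:
    """Translate seq in the forward frame starting at offset (0, 1 or 2)."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


ACTUAL = ("ATGAGTACCGCTAAATTAGTTAAATCAAAAGCGACCAATCTGCTTTAT"
          "ACCCGCAACGATGTCTCCGACAGCGAGAAA")
# Simulate the sequencing error: one extra A inserted after position 27.
SHIFTED = ACTUAL[:27] + "A" + ACTUAL[27:]

print(translate(ACTUAL))      # frame 1 of the actual sequence
print(translate(SHIFTED))     # frame 1 after the error: downstream residues change
print(translate(SHIFTED, 1))  # the original downstream protein now sits in frame 2
```

The point of the comparison is that everything downstream of the inserted base moves into a different frame, so the true coding sequence appears split across two reading frames.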
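
Slide 41 notes that high bit scores and low E-values are good. The two are linked by the relation given in the NCBI BLAST statistics tutorial, E = m·n·2^(-S'), where S' is the bit score and m and n are the query and database lengths. The sketch below only illustrates that relation. The function name and the lengths used are hypothetical, and real BLAST programs apply effective lengths and other corrections, so reported values will differ.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """E-value implied by a bit score: E = m * n * 2 ** (-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical search: a 100-residue query against a 1,000-residue database.
for bits in (10, 20, 40):
    print(bits, expected_hits(bits, query_len=100, db_len=1000))
```

Each additional bit of score halves the expected number of chance hits, which is why a high bit score and a low E-value go together for a given search space.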
