From Sequence to Knowledge:
The Art & Science of Phage
Genome Annotation
Ramy K. Aziz – Cairo University
From Sequence to
Knowledge:
PhAnToMe, RAST, and the
Ultimate Kropinski Toolkit
A helping hand through
The Annotation Bottleneck
Compiled by: Andrew Kropinski and Ramy Aziz
Online material
• Data & links:
– http://egybio.net/tutorial
• Slides
– http://bit.ly/annotation2016
– http://bit.ly/phantome4
– Old tutorials (more detailed, but missing the latest):
• Evergreen 2011: http://slidesha.re/phantome1
• http://slidesha.re/phiRAST1 (Karin)
• Evergreen 2013: http://bit.ly/phantome2
• Evergreen 2015: http://bit.ly/phantome3
21 July 2016 Phage Genomics - VoM 2016
INTRODUCTION
“The analysis bottleneck”
• Observation:
– We generate more data than we can analyze.
– We generate sequence data faster than we can analyze them.
• Opinion:
– Bottlenecks are not created equal!
– It is important to define the question(s) before working on the answer(s)!
“The analysis bottleneck”
• The Lavigne paradox
Quick group activity
Defining the question(s):
• How many of you have annotated a
genome?
• How many phage genomes have you
sequenced (or are in the process of
sequencing)?
a) None b) 1-5 c) 5-50 d) > 50
• What is the single most pressing question
you want to answer from genome analysis?
DEFINING THE QUESTION(S)
“Begin with the end in mind” (Covey, The 7 Habits of Highly Effective People)
What You Want
The goal:
• complete
• accurate
[Figure labels: incomplete genome termini; faulty assembly; frameshift; chimeric fragments]
A process of reconstruction
[Figure: annotation from a genome compared with reconstruction from a metagenome. Goal: complete, accurate. Labeled problems: incomplete, frameshift, faulty assembly. Credits: Andrew Kropinski; Bas Dutilh]
A process of reconstruction
• Experimentally
• Computationally
DNA
TGATTGTGTGTTTGCGCAATGCG
ATGTGTATATATAGTGAGCTTGCCC
GTCTCTCTNNNTCTCTTG
TGATTGGTCTNNNTCTCTTGCGCAATGCG
From Sequence to Knowledge
From raw sequence data to genome submission/publication
• Assembly
• Gene finding/ORF calling
• tRNA calling
• Annotation (assigning functions)
• Orienting
• Validation (segmenter)
• Fixing frameshifts
• Introns and inteins
• Subsystem assignment
• Refinement/secondary annotation loop
• Special purpose: toxins, morons, integrases, lifestyle prediction
• Regulatory elements (promoters, terminators)
• Output: files and graphics
Countless tools
Authority figures
Andrew Kropinski, Rob Lavigne, Rob Edwards
General outline
• Part I: The “Kropinski toolkit”
– Tools approved and recommended by Andrew
Kropinski (http://molbiol-tools.ca): from seq to pub
• Part II: SEED-based tools:
– The RAST family
– The PhAnToMe database/portal
The Kropinski Toolkit
What we want, according to Andrew
Well-characterized genome, in which, ideally, we know:
• the location & function of all the genes
• the location of promoters & terminators
• the correct taxonomy
[Figure: 30.0 kb genome map showing genes 20 to 33 (including 26A) and two PstI sites]
Taxonomy: Viruses; dsDNA viruses, no RNA stage; Caudovirales; Siphoviridae; T1virus
Desired outcome: Create GenBank submission
• Complete, accurate description of the genome and its taxonomy
• Good title
Desired outcome (2)
Desired outcome (3)
Desired outcome (4)
• Protein products of concern, particularly for those interested in phage therapy:
– Integrases
– Toxins
Processes and Steps
I. Primary analysis
(QC/ pre-annotation proofreading: e.g., orient with BLASTN)
II. Genome annotation
– Gene finding (ORF calling)
– Automated annotation
– Massaging (editing, functional assignment)
III. Second dimension (regulatory elements)
IV. Comparative genomics
V. Metadata
VI. Visualization
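As a rough illustration of the "Gene finding (ORF calling)" step, here is a minimal Python sketch of a naive six-frame ORF scan. It is not how RAST or any dedicated gene caller works: real tools use trained statistical models. The start-codon set, the `min_codons` cutoff, and the function names are all assumptions made for this example.

```python
# Toy six-frame ORF scan (illustrative only; not a real gene caller).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons):
    """Yield (start, end) half-open coordinates of ORFs on one strand."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i  # first start codon after the previous stop
            elif start is not None and codon in STOP_CODONS:
                # Length is counted in codons, including the stop codon.
                if (i + 3 - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None


def find_orfs(seq, min_codons=30):
    """Return (strand, start, end) tuples in forward-strand coordinates."""
    seq = seq.upper()
    n = len(seq)
    orfs = [("+", s, e) for s, e in orfs_in_strand(seq, min_codons)]
    rc = reverse_complement(seq)
    # Map reverse-strand coordinates back onto the forward strand.
    orfs += [("-", n - e, n - s) for s, e in orfs_in_strand(rc, min_codons)]
    return sorted(orfs, key=lambda orf: orf[1])


if __name__ == "__main__":
    toy = "CC" + "ATG" + "AAA" * 4 + "TAA" + "GG"
    print(find_orfs(toy, min_codons=3))  # [('+', 2, 20)]
```

The output of such a scan is only a list of candidates. In keeping with the advice to use more than one tool, it would still need to be compared against a proper gene caller and inspected by eye.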
AUTOMATED ANNOTATION
II. Genome Annotation
RAST (subsystems-based tools)
• Will be the major focus of this short tutorial…
• The goal is:
1. Quick demo of how to use RAST
2. Quick preview of batch annotation in RAST
3. Optimize RAST for phage annotation
4. Demonstrate & discuss how to improve RAST output
• But, before getting there …
The Kropinski wisdom
1. Always use more than one tool
2. Never blindly trust any automated (or manual)
process
3. Use your eyes and hands: visual inspection/
manual proofreading, re-annotation
– Every apparently complicated file is still editable in your favorite text editor (e.g., Notepad)
4. If you don’t know a gene’s function (if you
can’t convince your grandma), better keep it
unnamed than contribute to error propagation
2 Aug 2015 Phage Genomics - Evergreen 2015
What do I call my gene product (i.e. protein)?
• “phage hypothetical protein” – redundant
• “gp87” (gp = gene product) → hypothetical protein
• gp200 describes radically different proteins in Listeria, Enterococcus, Mycobacterium, Rhodococcus, Sphingomonas, Pseudomonas, Bacillus and Synechococcus phage genomes
• Add /note="similar to gp43 of Escherichia coli phage T4"
What do I call my gene product (i.e. protein)?
• /product=“UboA”; “NrdA”; “hypothetical protein SA5_0153/152”; “ORF184” (as bad as gp184); “RNAP1”; “32 kDa protein”
• Bad because they don't mean anything to the casual (or informed) reader.
• Unless you are a bioinformatician or biostatistician, be conservative in recording “hits.” Could you convince your grandma? If not, list it as a “hypothetical protein”. But do take a stand: “putative DNA polymerase” is cowardly.
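The naming advice above could be turned into a simple pre-submission check. The Python sketch below flags product names that resemble the bad examples on these slides. The rule set, the messages, and the function name `check_product` are assumptions made for illustration, not an official GenBank or RAST validator. Gene-symbol names such as “UboA” or “RNAP1” are deliberately not covered, because a pattern for them would also catch legitimate names.

```python
import re

# Each rule pairs a pattern with the reason the name is uninformative.
# The patterns are derived from the slide's examples (illustrative assumption).
RULES = [
    (re.compile(r"^phage\s+hypothetical protein$", re.I),
     "redundant 'phage' prefix"),
    (re.compile(r"^(gp|orf)\s*\d+", re.I),
     "numbered placeholder (gp/ORF) says nothing about function"),
    (re.compile(r"^\d+(\.\d+)?\s*kDa protein$", re.I),
     "molecular weight is not a function"),
    (re.compile(r"^hypothetical protein\s+\S+", re.I),
     "locus tag appended to 'hypothetical protein'"),
    (re.compile(r"^putative\b", re.I),
     "hedged name: take a stand or use 'hypothetical protein'"),
]


def check_product(name):
    """Return a list of complaints about a /product name (empty if none)."""
    name = name.strip()
    return [reason for pattern, reason in RULES if pattern.search(name)]


if __name__ == "__main__":
    for product in ["gp87", "32 kDa protein", "putative DNA polymerase",
                    "hypothetical protein", "DNA polymerase"]:
        print(product, "->", check_product(product) or "ok")
```

A check like this only catches wording. Whether “DNA polymerase” is actually supported by evidence still has to be judged by a person.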
Nomenclature Sins
• hypothetical protein → DNA polymerase, with no or poor-quality evidence, is far worse than:
• DNA polymerase → hypothetical protein
• Be cautious about using BLASTP hits in naming gps: is there additional evidence to back the designation up?
Consistent Nomenclature
• All of these describe homologs of the product of the coliphage T4 rIIA gene!
rIIA protector from prophage-induced early lysis
protector from prophage-induced early lysis
protector from prophage-induced early lysis rIIA
membrane-associated affects host membrane ATPase
rIIA membrane-associated affects host membrane ATPase
phage rIIA lysis inhibitor
rIIA protector
rIIA
rIIA protein
membrane integrity protector
hypothetical protein
unnamed protein product !!!!!!
protein of unknown function
Bottom line:
Manual vs. Automated
• “Turtles know the road better than
rabbits… ” Khalil Gibran
• “… but they may never reach the end!”
• The best approach?
– Human expert-based annotation
IV. COMPARATIVE GENOMICS
Genomic pairwise comparisons
• EMBOSS Stretcher: http://emboss.bioinformatics.nl/cgi-bin/emboss/stretcher (N.B. genomes must be collinear)
• BLASTN (NCBI)
• ANI (Average Nucleotide Identity): http://enve-omics.ce.gatech.edu/ani/
• GGDC 2.0 (Genome to Genome Distance Calculator): http://ggdc.dsmz.de/distcalc2.php
• jSpeciesWS (ANI): http://jspecies.ribohost.com/jspeciesws/
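To make the idea behind these identity scores concrete, here is a minimal Python sketch that computes percent identity from two sequences that have already been aligned, with `-` marking gaps. It is a deliberate simplification: the ANI tools listed above derive their values from many local alignments of genome fragments, and the choice here to skip gap columns is an assumption, not the convention of any particular server.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap columns are skipped (an assumption of this sketch)
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Five ungapped columns, four of which match.
    print(percent_identity("ACGT-A", "ACCTTA"))  # 80.0
```

Because the alignment is supplied by the caller, the collinearity caveat noted for Stretcher applies here too: a rearranged genome aligned end to end would give a misleadingly low score.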
Proteomic pairwise comparisons
• CoreGenes: http://binf.gmu.edu:8080/CoreGenes3.0/
• TBLASTX
• Remember: protein sequence is more conserved than DNA sequence, so these comparisons are probably more useful for distant relationships
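The point that protein sequence is more conserved than DNA can be seen with a small worked example. The Python sketch below translates two made-up coding sequences that differ only at third codon positions. The sequences and helper names are invented for illustration, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA in frame 0; unknown codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def identity(a, b):
    """Percent of identical positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    gene_a = "ATGGCTAAAGGTCTGTAA"  # hypothetical sequence
    gene_b = "ATGGCCAAGGGACTTTAA"  # same protein, synonymous changes
    print(round(identity(gene_a, gene_b), 1))                 # 77.8
    print(identity(translate(gene_a), translate(gene_b)))     # 100.0
```

Four of the 18 nucleotides differ, yet both sequences encode the same peptide, which is why protein-level comparisons such as TBLASTX or CoreGenes can still detect relationships after the DNA signal has faded.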
VI. “POLISH” IT TO PUBLISH IT
Servers & software
• BLAST Ring Image Generator (http://brig.sourceforge.net)
• CGView (http://wishart.biology.ualberta.ca/cgview)
• CGView Comparison Tool (http://stothard.afns.ualberta.ca/downloads/CCT)
• Circos (http://circos.ca)
• DNAPlotter (http://www.sanger.ac.uk/science/tools/dnaplotter)
• Easyfig (http://easyfig.sourceforge.net)
• GenomeVx (http://wolfe.ucd.ie/GenomeVx)
• GView Server (https://server.gview.ca)
• progressiveMauve and ACT
[Figures: example output from Easyfig, CGView Comparison Tool, and BLAST Ring Image Generator]

Contenu connexe

Similaire à From Sequence to Knowledge: The Art and Science of Phage Genome Annotation

An introduction to Phage Genome Annotation (Viruses of Microbes 2018)
An introduction to Phage Genome Annotation (Viruses of Microbes 2018)An introduction to Phage Genome Annotation (Viruses of Microbes 2018)
An introduction to Phage Genome Annotation (Viruses of Microbes 2018)Ramy K. Aziz
 
From Sequence to Knowledge (Tools for Phage Genome Annotation)
From Sequence to Knowledge (Tools for Phage Genome Annotation)From Sequence to Knowledge (Tools for Phage Genome Annotation)
From Sequence to Knowledge (Tools for Phage Genome Annotation)Ramy K. Aziz
 
The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...
The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...
The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...Ramy K. Aziz
 
Prokka - rapid bacterial genome annotation - ABPHM 2013
Prokka - rapid bacterial genome annotation - ABPHM 2013Prokka - rapid bacterial genome annotation - ABPHM 2013
Prokka - rapid bacterial genome annotation - ABPHM 2013Torsten Seemann
 
GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GenomeInABottle
 
Genome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp LeidenGenome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp LeidenGenomeInABottle
 
Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128GenomeInABottle
 
Using RAST for phage annotation (2018 VoM meeting)
Using RAST for phage annotation (2018 VoM meeting)Using RAST for phage annotation (2018 VoM meeting)
Using RAST for phage annotation (2018 VoM meeting)Ramy K. Aziz
 
The uni prot knowledgebase
The uni prot knowledgebaseThe uni prot knowledgebase
The uni prot knowledgebaseKew Sama
 
Cassavabase general presentation PAG 2016
Cassavabase general presentation PAG 2016Cassavabase general presentation PAG 2016
Cassavabase general presentation PAG 2016solgenomics
 
Examining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencingExamining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencingStephen Turner
 
Mining the hidden proteome using hundreds of public proteomics datasets
Mining the hidden proteome using hundreds of public proteomics datasetsMining the hidden proteome using hundreds of public proteomics datasets
Mining the hidden proteome using hundreds of public proteomics datasetsJuan Antonio Vizcaino
 
Lab2_3_Lecture_DNA_PCR (3).pptx
Lab2_3_Lecture_DNA_PCR (3).pptxLab2_3_Lecture_DNA_PCR (3).pptx
Lab2_3_Lecture_DNA_PCR (3).pptxkarlos64
 
TheUniProtKBpptx__2022_03_30_13_07_41.pptx
TheUniProtKBpptx__2022_03_30_13_07_41.pptxTheUniProtKBpptx__2022_03_30_13_07_41.pptx
TheUniProtKBpptx__2022_03_30_13_07_41.pptxPRIYANKAZALA9
 
New technology and workflow for integrated collection, stabilization and puri...
New technology and workflow for integrated collection, stabilization and puri...New technology and workflow for integrated collection, stabilization and puri...
New technology and workflow for integrated collection, stabilization and puri...QIAGEN
 
WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...Chris Evelo
 
"The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree...
"The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree..."The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree...
"The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree...Ramy K. Aziz
 
Giab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptxGiab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptxGenomeInABottle
 

Similaire à From Sequence to Knowledge: The Art and Science of Phage Genome Annotation (20)

An introduction to Phage Genome Annotation (Viruses of Microbes 2018)
An introduction to Phage Genome Annotation (Viruses of Microbes 2018)An introduction to Phage Genome Annotation (Viruses of Microbes 2018)
An introduction to Phage Genome Annotation (Viruses of Microbes 2018)
 
From Sequence to Knowledge (Tools for Phage Genome Annotation)
From Sequence to Knowledge (Tools for Phage Genome Annotation)From Sequence to Knowledge (Tools for Phage Genome Annotation)
From Sequence to Knowledge (Tools for Phage Genome Annotation)
 
The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...
The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...
The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...
 
Prokka - rapid bacterial genome annotation - ABPHM 2013
Prokka - rapid bacterial genome annotation - ABPHM 2013Prokka - rapid bacterial genome annotation - ABPHM 2013
Prokka - rapid bacterial genome annotation - ABPHM 2013
 
GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005
 
Biohackathon2016
Biohackathon2016Biohackathon2016
Biohackathon2016
 
Genome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp LeidenGenome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp Leiden
 
Pride cluster presentation
Pride cluster presentation Pride cluster presentation
Pride cluster presentation
 
Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128
 
Using RAST for phage annotation (2018 VoM meeting)
Using RAST for phage annotation (2018 VoM meeting)Using RAST for phage annotation (2018 VoM meeting)
Using RAST for phage annotation (2018 VoM meeting)
 
The uni prot knowledgebase
The uni prot knowledgebaseThe uni prot knowledgebase
The uni prot knowledgebase
 
Cassavabase general presentation PAG 2016
Cassavabase general presentation PAG 2016Cassavabase general presentation PAG 2016
Cassavabase general presentation PAG 2016
 
Examining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencingExamining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencing
 
Mining the hidden proteome using hundreds of public proteomics datasets
Mining the hidden proteome using hundreds of public proteomics datasetsMining the hidden proteome using hundreds of public proteomics datasets
Mining the hidden proteome using hundreds of public proteomics datasets
 
Lab2_3_Lecture_DNA_PCR (3).pptx
Lab2_3_Lecture_DNA_PCR (3).pptxLab2_3_Lecture_DNA_PCR (3).pptx
Lab2_3_Lecture_DNA_PCR (3).pptx
 
TheUniProtKBpptx__2022_03_30_13_07_41.pptx
TheUniProtKBpptx__2022_03_30_13_07_41.pptxTheUniProtKBpptx__2022_03_30_13_07_41.pptx
TheUniProtKBpptx__2022_03_30_13_07_41.pptx
 
New technology and workflow for integrated collection, stabilization and puri...
New technology and workflow for integrated collection, stabilization and puri...New technology and workflow for integrated collection, stabilization and puri...
New technology and workflow for integrated collection, stabilization and puri...
 
WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...
 
"The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree...
"The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree..."The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree...
"The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree...
 
Giab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptxGiab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptx
 

Plus de Ramy K. Aziz

An introduction to PATRIC and its use in phage annotation
An introduction to PATRIC and its use in phage annotationAn introduction to PATRIC and its use in phage annotation
An introduction to PATRIC and its use in phage annotationRamy K. Aziz
 
The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...
The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...
The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...Ramy K. Aziz
 
From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...
From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...
From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...Ramy K. Aziz
 
Giving and Receiving Feedback
Giving and Receiving FeedbackGiving and Receiving Feedback
Giving and Receiving FeedbackRamy K. Aziz
 
phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011
phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011
phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011Ramy K. Aziz
 
Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011
Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011
Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011Ramy K. Aziz
 
If the dead bacteria could speak
If the dead bacteria could speakIf the dead bacteria could speak
If the dead bacteria could speakRamy K. Aziz
 

Plus de Ramy K. Aziz (9)

An introduction to PATRIC and its use in phage annotation
An introduction to PATRIC and its use in phage annotationAn introduction to PATRIC and its use in phage annotation
An introduction to PATRIC and its use in phage annotation
 
The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...
The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...
The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...
 
From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...
From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...
From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...
 
Giving and Receiving Feedback
Giving and Receiving FeedbackGiving and Receiving Feedback
Giving and Receiving Feedback
 
FootballOmics
FootballOmicsFootballOmics
FootballOmics
 
phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011
phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011
phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011
 
Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011
Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011
Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011
 
If the dead bacteria could speak
If the dead bacteria could speakIf the dead bacteria could speak
If the dead bacteria could speak
 
Rka nxt 2010_web
Rka nxt 2010_webRka nxt 2010_web
Rka nxt 2010_web
 

Dernier

Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learninglevieagacer
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Silpa
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Silpa
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Serviceshivanisharma5244
 
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry Areesha Ahmad
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxMohamedFarag457087
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsSérgio Sacani
 
Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Silpa
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspectsmuralinath2
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptxryanrooker
 
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body Areesha Ahmad
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusNazaninKarimi6
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryAlex Henderson
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxSilpa
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceAlex Henderson
 
Chemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfChemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfSumit Kumar yadav
 

Dernier (20)

Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
 
Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptx
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
Chemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfChemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdf
 

From Sequence to Knowledge: The Art and Science of Phage Genome Annotation

  • 1. From Sequence to Knowledge: The Art & Science of Phage Genome Annotation Ramy K. Aziz – Cairo University
  • 2. From Sequence to Knowledge: PhAnToMe, RAST, and the Ultimate Kropinski Toolkit A helping hand through The Annotation Bottleneck Compiled by: Andrew Kropinski and Ramy Aziz
  • 3. Online material • Data & links: – http://egybio.net/tutorial • Slides – http://bit.ly/annotation2016 – http://bit.ly/phantome4 – Old tutorials (more detailed, but missing latest ): • Evergreen 2011: http://slidesha.re/phantome1 • http://slidesha.re/phiRAST1 (Karin) • Evergreen 2013: http://bit.ly/phantome2 • Evergreen 2015: http://bit.ly/phantome3 21 July 2016 Phage Genomics - VoM 2016
  • 4. INTRODUCTION
  • 5. “The analysis bottleneck” • Observation: – We generate more data than we can analyze. – We generate sequence data faster than we can analyze them. • Opinion: – Bottlenecks are not created equal! – It is important to define the question(s) before working on the answer(s)!
  • 6. “The analysis bottleneck” • The Lavigne paradox
  • 8. Quick group activity Defining the question(s): • How many of you have annotated a genome? • How many phage genomes have you sequenced (or are in the process of sequencing)? a) None b) 1-5 c) 5-50 d) > 50 • What is the single most pressing question you want to answer from genome analysis?
  • 9. DEFINING THE QUESTION(S) “Begin with the end in mind” (Covey, The 7 Habits)
  • 10. What You Want • The goal: ✓ complete ✓ accurate • Incomplete: ✗ genome termini • Faulty assembly • Frameshift • ✗ chimeric fragments
  • 11. A process of reconstruction
  • 12. Annotation ≈ Reconstruction • from genome • from metagenome • Incomplete; faulty assembly; frameshift • complete; accurate • Credits: Andrew Kropinski, Bas Dutilh
  • 14. A process of reconstruction • Experimentally • DNA: TGATTGTGTGTTTGCGCAATGCG ATGTGTATATATAGTGAGCTTGCCC GTCTCTCTNNNTCTCTTG TGATTGGTCTNNNTCTCTTGCGCAATGCG
  • 15. A process of reconstruction • Experimentally • Computationally • DNA: TGATTGTGTGTTTGCGCAATGCG ATGTGTATATATAGTGAGCTTGCCC GTCTCTCTNNNTCTCTTG TGATTGGTCTNNNTCTCTTGCGCAATGCG
  • 16. From Sequence to Knowledge: from raw sequence data to genome submission/publication • Assembly • Gene finding/ORF calling • tRNA calling • Annotation (assigning functions) • Orienting • Validation (segmenter) • Fixing frameshifts • Introns and inteins • Subsystem assignment • Refinement/secondary annotation loop • Special purpose: toxins, morons, integrases, lifestyle prediction • Regulatory elements (promoters, terminators) • Output: files and graphics
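To make the "Gene finding/ORF calling" step concrete, here is a minimal Python sketch of a naive open-reading-frame scan. It is a toy under stated assumptions, not the method of any tool named in these slides: it searches the forward strand only, accepts only ATG as a start codon, and the names `find_orfs` and `min_codons` are hypothetical. Real gene callers use statistical models, alternative start codons, and both strands.

```python
# Stop codons shared by the standard and bacterial genetic codes.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    Assumption: only ATG counts as a start codon (a simplification).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and keep scanning
    return orfs


example = "CCATGAAATTTTAGGG"
for start, end, frame in find_orfs(example):
    print(frame, start, end, example[start:end])
```

A scan like this overcalls and undercalls genes on real phage genomes, which is one reason the slides recommend using more than one tool and proofreading by eye.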
  • 17. Countless tools
  • 18. Authority figures • Andrew Kropinski • Rob Lavigne • Rob Edwards
  • 19. General outline • Part I: The “Kropinski toolkit” – Tools approved and recommended by Andrew Kropinski (http://molbiol-tools.ca): from sequence to publication • Part II: SEED-based tools: – The RAST family – The PhAnToMe database/portal
  • 20. The Kropinski Toolkit
  • 21. What we want, according to Andrew • A well-characterized genome in which, ideally, we know: – the location & function of all the genes – the location of promoters & terminators – the correct taxonomy • [Figure: genome map with PstI sites, genes 20–33 (including 26A), and a 30.0 kb label] • Viruses; dsDNA viruses, no RNA stage; Caudovirales; Siphoviridae; T1virus
  • 22. Desired outcome: Create GenBank submission • Complete, accurate description of the genome and its taxonomy • Good title
  • 23. Desired outcome (2)
  • 24. Desired outcome (3)
  • 25. Desired outcome (4) • Protein products of concern, particularly for those interested in phage therapy: – Integrases – Toxins • [Figure: genome map with PstI sites, genes 20–33 (including 26A), and a 30.0 kb label]
  • 26. Processes and Steps I. Primary analysis (QC/pre-annotation proofreading: e.g., orient with BLASTN) II. Genome annotation – Gene finding (ORF calling) – Automated annotation – Massaging (editing, functional assignment) III. Second dimension (regulatory elements) IV. Comparative genomics V. Metadata VI. Visualization
  • 28. AUTOMATED ANNOTATION II. Genome Annotation
  • 29. RAST (subsystems-based tools) • Will be the major focus of this short tutorial… • The goals are: 1. A quick demo of how to use RAST 2. A quick preview of batch annotation in RAST 3. Optimizing RAST for phage annotation 4. Demonstrating and discussing how to improve RAST output
  • 30. RAST (subsystems-based tools) • But, before getting there …
  • 31. The Kropinski wisdom 1. Always use more than one tool 2. Never blindly trust any automated (or manual) process 3. Use your eyes and hands: visual inspection/manual proofreading, re-annotation – Every apparently complicated file is still editable in your favorite text editor (e.g., Notepad) 4. If you don’t know a gene’s function (if you can’t convince your grandma), it is better to keep it unnamed than to contribute to error propagation 2 Aug 2015 Phage Genomics - Evergreen 2015
  • 32. What do I call my gene product (i.e., protein)? ☒ “phage hypothetical protein” – redundant ☒ “gp87” (gp = gene product) ☑ hypothetical protein ✗ gp200 describes radically different proteins in Listeria, Enterococcus, Mycobacterium, Rhodococcus, Sphingomonas, Pseudomonas, Bacillus and Synechococcus phage genomes ✓ Add /note="similar to gp43 of Escherichia coli phage T4"
  • 33. What do I call my gene product (i.e., protein)? ☒ /product="UboA"; "NrdA"; "hypothetical protein SA5_0153/152"; "ORF184" (as bad as gp184); "RNAP1"; "32 kDa protein" ✗ Bad because they don’t mean anything to the casual (or informed) reader. ✓ Unless you are a bioinformatician or biostatistician, be conservative in recording “hits.” Could you convince your grandma? If not, list it as a “hypothetical protein”. But do take a stand: “putative DNA polymerase” is cowardly.
  • 34. Nomenclature Sins ✗ hypothetical protein → DNA polymerase with no or poor-quality evidence is far worse than: ✓ DNA polymerase → hypothetical protein ☑ Be cautious about using BLASTP hits in naming gps – is there additional evidence to back the designation up?
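Some of the naming advice on slides 32–34 can be expressed as a simple pre-submission check. The Python sketch below is illustrative only: the function name `check_product_name`, the regular expressions, and the warning texts are assumptions made for this example, not part of GenBank or of any tool in the toolkit. It also cannot catch opaque names such as "UboA" or "RNAP1", which still need a human reader.

```python
import re

# Each rule pairs a pattern with a warning. The patterns are illustrative
# assumptions derived from the bad examples listed on the slides.
RULES = [
    (re.compile(r"^phage hypothetical protein$", re.I),
     "redundant: use 'hypothetical protein'"),
    (re.compile(r"^(gp|orf)\s*\d+\w*$", re.I),
     "locus-style name: conveys no function"),
    (re.compile(r"^\d+(\.\d+)?\s*kDa protein$", re.I),
     "size-based name: conveys no function"),
    (re.compile(r"^putative\b", re.I),
     "hedged name: take a stand or use 'hypothetical protein'"),
]


def check_product_name(name):
    """Return a list of warnings for a proposed /product value (empty if none)."""
    cleaned = name.strip()
    return [message for pattern, message in RULES if pattern.search(cleaned)]


for candidate in ["gp87", "32 kDa protein", "putative DNA polymerase",
                  "hypothetical protein", "DNA polymerase"]:
    print(candidate, "->", check_product_name(candidate) or "no warnings")
```

A name that passes this check is not necessarily correct: the evidence behind "DNA polymerase" still has to be judged by the annotator.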
  • 35. Consistent Nomenclature ☑ All of these describe homologs of the product of the coliphage T4 rIIA gene! rIIA protector from prophage-induced early lysis protector from prophage-induced early lysis protector from prophage-induced early lysis rIIA membrane-associated affects host membrane ATPase rIIA membrane-associated affects host membrane ATPase phage rIIA lysis inhibitor rIIA protector rIIA rIIA protein membrane integrity protector hypothetical protein unnamed protein product !!!!!! protein of unknown function
  • 36. Bottom line: Manual vs. Automated • “Turtles know the road better than rabbits…” Khalil Gibran • “… but they may never reach the end!” • The best approach? – Human expert-based annotation
  • 39. Genomic pairwise comparisons • EMBOSS Stretcher: http://emboss.bioinformatics.nl/cgi-bin/emboss/stretcher (N.B. genomes must be collinear) • BLASTN (NCBI) • ANI (Average Nucleotide Identity): http://enve-omics.ce.gatech.edu/ani/ • GGDC 2.0 (Genome to Genome Distance Calculator): http://ggdc.dsmz.de/distcalc2.php • jSpeciesWS (ANI): http://jspecies.ribohost.com/jspeciesws/
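As a rough illustration of what a pairwise identity figure means, the Python sketch below computes percent identity from two sequences that have already been globally aligned (gaps written as "-"). This is a simplified assumption-laden example, not the algorithm of Stretcher or of the ANI servers listed above: those tools perform the alignment or fragment matching themselves, and they may define the denominator differently. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns where both sequences carry the same base.

    Assumption: identity = matching columns / total alignment columns.
    Other conventions (e.g. ignoring gapped columns) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


# Toy alignment: one gapped column out of six.
print(round(percent_identity("ATGC-A", "ATGCTA"), 2))
```

The collinearity warning on the slide matters here: a global alignment of two genomes with rearranged gene order would report low identity even when the genomes are closely related.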
  • 40. Proteomic pairwise comparisons • CoreGenes (http://binf.gmu.edu:8080/CoreGenes3.0/) • TBLASTX • Remember: protein sequence is more conserved than DNA sequence, so protein-level comparisons are probably more useful for distant relationships
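The point that protein sequence is more conserved than DNA sequence can be shown with a small Python sketch: two made-up coding sequences that differ only by synonymous substitutions have lower nucleotide identity but identical translations. The sequences `gene_a` and `gene_b` and the helper names are invented for illustration, and the codon assignments are those of the standard genetic code (the bacterial code shares these assignments, as far as codon-to-amino-acid mapping goes).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding sequence in frame 0; '*' marks a stop codon.

    Ambiguous bases such as N are not handled and raise KeyError.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def identity(seq_a, seq_b):
    """Percent identical positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(seq_a, seq_b)) / len(seq_a)


# Hypothetical genes differing only at synonymous positions.
gene_a = "ATGGCTAAAGAACTGTAA"
gene_b = "ATGGCAAAGGAGTTATAA"

print("DNA identity:    ", round(identity(gene_a, gene_b), 1))
print("Protein identity:", identity(translate(gene_a), translate(gene_b)))
```

This is why translated searches such as TBLASTX can still detect relationships after the nucleotide signal has faded.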
  • 41. VI. “POLISH” IT TO PUBLISH IT
  • 43. Servers & software • BLAST Ring Image Generator (http://brig.sourceforge.net) • CGView (http://wishart.biology.ualberta.ca/cgview) • CGView Comparison Tool (http://stothard.afns.ualberta.ca/downloads/CCT) • Circos (http://circos.ca) • DNAPlotter (http://www.sanger.ac.uk/science/tools/dnaplotter) • Easyfig (http://easyfig.sourceforge.net) • GenomeVx (http://wolfe.ucd.ie/GenomeVx) • GView Server (https://server.gview.ca) • progressiveMauve and ACT
  • 46. BLAST Ring Image Generator

Editor's notes

  1. Gp200 from Pseudomonas phage 201phi2-1 is related to phiKZ gp120 and EL gp78
  2. "Shifting the genomic gold standard for the prokaryotic species definition." Michael Richter and Ramon Rosselló-Móra. PNAS vol. 106 no. 45, pp. 19126–19131, doi: 10.1073/pnas.0906412106. JSpeciesWS is a quick, easy-to-use online service that measures the probability that two or more (draft) genomes belong to the same species by pairwise comparison of (1) their Average Nucleotide Identity (ANI) and/or (2) the correlation indexes of their tetranucleotide signatures.
  3. Star - online