GenoTHREAT
A biosecurity software to screen DNA
synthesis orders against pathogens
GBCB seminar
Laura Adam
10/07/2014
7/10/2014 GenoTHREAT 2
(2005) Science, 310(5745), 77. AAAS.
http://www.washingtonpost.com/wp-srv/nation/daily/graphics/wmdbio_123004.html
CURRENT REGULATIONS
The Gene Synthesis Industry
Industry Response to Dual Use
• 5 members (all based in Germany)
• Undersigned by:
► 6 German or German/American
► 2 Chinese
• “Code of Conduct for Best Practices in Gene Synthesis”

• 5 companies (American)
• 80% of worldwide synthesis capacity
• “Harmonized Screening Protocol”
Major Sections:
Customer screening
Sequence screening
Record retention
Government contact
Our Primary Objectives
1. Interpret the (draft) guidance as an algorithm
2. Implement it as software: GenoTHREAT
3. Characterize screening efficacy
Road Map
I. Current regulations
II. Sequence screening algorithm: interpreting the guidance
III. GenoTHREAT: implementation and characterization
IV. Conclusions
[Guidance] : Purpose
“[…] to minimize the risk that unauthorized individuals
or individuals with malicious intent will obtain “toxins
and agents of concern” through the use of nucleic
acid synthesis technologies, and to simultaneously
minimize any negative impacts on the conduct of
research and business operations.”
[Guidance] : Goals of sequence screening
• Agent of concern?
• Select Agents and Toxins
• Sequences of concern?
• “dsDNA sequences derived from or encoding Select Agents and Toxins”
• Sequence unique to select agent
• No house-keeping genes
• Both DNA strands and the six-frame translation
• Detect any “sequence of concern”
• Embedded: as small as 200 bp
→ Use the Best Match approach (at least)
[Guidance] : Major Points
1. Perform Six Frame Translation
2. Divide the query sequences into
subsequences of 200bp or 66aa
3. For each subsequence
i. BLAST
ii. Best Matches
iii. Flag if SAT
4. Automatic decision
Road Map
I. Current regulations
II. Sequence screening algorithm: interpreting the guidance
1. Perform Six Frame Translation
2. Divide the query sequences into subsequences of 200bp or 66aa
3. For each subsequence
i. BLAST
ii. Best Matches
iii. Flag if SAT
4. Automatic decision
III. GenoTHREAT: implementation and characterization
IV. Conclusions
[Algorithm] : Input a query DNA sequence to screen
[Algorithm] : Six Frame translation
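A minimal sketch of a six-frame translation, assuming the standard genetic code and plain A/C/G/T input. The function names and the frame labels (+1 to +3, -1 to -3) are illustrative choices, not taken from GenoTHREAT itself.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Translate three reading frames on each strand."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translation("GATTTGGACACTCATTTCACC")
```

Here `frames["+1"]` is the forward-strand reading frame starting at the first base; the other five entries cover the remaining offsets and the reverse strand.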
[Algorithm] : Division
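A possible sketch of the division step, assuming consecutive, non-overlapping windows of 200 bp (nucleotides) or 66 aa (each translated frame). How a shorter trailing piece is handled is not stated in the slides; this sketch simply keeps it, which is an assumption.

```python
NUCLEOTIDE_WINDOW = 200  # bp, from the guidance
PROTEIN_WINDOW = 66      # aa, from the guidance


def divide(sequence, window):
    """Split a sequence into consecutive windows, returning (start, subsequence) pairs.

    The last window may be shorter than `window` (assumption, see lead-in).
    """
    return [(start, sequence[start:start + window])
            for start in range(0, len(sequence), window)]


pieces = divide("A" * 450, NUCLEOTIDE_WINDOW)
```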
[Algorithm] : What should we do with subsequences?
[Algorithm] : BLAST subsequences against entire GenBank database
Basic Local Alignment Search Tool (BLAST)
• Developed at the U.S. National Center for
Biotechnology Information
• One of the most widely used bioinformatics tools
• Aligns query sequences against sequences in the
GenBank sequence database
• Algorithm emphasizes speed over sensitivity
BLAST
Query Sequence
Database of sequences
Local alignment
BLAST Output
Percent Identity
► The percentage of identical nucleotides (or amino acids) in the aligned sequence
Query Coverage
► The percentage of the query length covered by the alignment
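To make the two metrics concrete, here is a small sketch that computes them for a single alignment. It assumes a gapped pairwise alignment given as two equal-length strings and 1-based, inclusive query coordinates; the function names are illustrative and BLAST reports these values itself.

```python
def percent_identity(aligned_query, aligned_subject):
    """Identical positions as a percentage of the alignment length ('-' marks a gap)."""
    if not aligned_query or len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(q == s and q != "-" for q, s in zip(aligned_query, aligned_subject))
    return 100.0 * matches / len(aligned_query)


def query_coverage(query_start, query_end, query_length):
    """Percentage of the query covered by one alignment (1-based, inclusive coordinates)."""
    return 100.0 * (query_end - query_start + 1) / query_length
```

For example, an alignment covering positions 1 to 160 of a 200 bp subsequence has a query coverage of 80%.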
[Algorithm] : What should we do with all those results of BLAST?
[Guidance] : The Best match approach
• Use a local sequence alignment tool
• BLAST is suggested
• Best matches = greatest percent identity over the entire fragment
• 66 aa or 200 bp fragments
[Algorithm] : Identify Best Matches
[Example]
BLAST results    PI (%)    QC (%)
Mus musculus     100       100
Mus musculus     100       100
Danio rerio      97        100
Danio rerio      43        80
Best matches: Mus musculus, Mus musculus
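One way to express the Best Match selection in code, as a sketch: keep only hits that cover the entire fragment, then keep those tied for the greatest percent identity. The `Hit` type and the `min_coverage` parameter are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    description: str
    percent_identity: float
    query_coverage: float


def best_matches(hits, min_coverage=100.0):
    """Return the hits with the greatest percent identity among full-coverage hits."""
    full = [h for h in hits if h.query_coverage >= min_coverage]
    if not full:
        return []
    top = max(h.percent_identity for h in full)
    return [h for h in full if h.percent_identity == top]


hits = [
    Hit("Mus musculus", 100, 100),
    Hit("Mus musculus", 100, 100),
    Hit("Danio rerio", 97, 100),
    Hit("Danio rerio", 43, 80),
]
selected = best_matches(hits)
```

With the example values from the slide, `selected` holds the two Mus musculus hits.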
[Algorithm] : Determine nature of Best Matches
[Algorithm] : How can we know if a Best Match is to a Select Agent or Toxin?
Problem: no suggestion in guidance
Solution: keyword and anti-keyword list
[Example] : Is this subsequence a hit?
BLAST results                     PI (%)    QC (%)
Bacillus anthracis                100       100
Bacillus anthracis str. Sterne    100       100
Danio rerio                       97        100
Danio rerio                       43        80
Best matches: Bacillus anthracis, Bacillus anthracis str. Sterne
[Example] : Keyword vs. Anti-keyword
If a GenBank entry contains a keyword, then the sequence is flagged (SA)
If a GenBank entry contains both a keyword and an anti-keyword, the order is not flagged (NSA)
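The keyword rule can be sketched as a simple text check on the GenBank entry description. The two lists below are tiny illustrative stand-ins, not the lists used by GenoTHREAT; treating "sterne" as an anti-keyword is an assumption prompted by the Bacillus anthracis example.

```python
# Illustrative lists only (assumptions); real lists would be far longer.
KEYWORDS = ("bacillus anthracis",)
ANTI_KEYWORDS = ("sterne",)


def is_select_agent_entry(description, keywords=KEYWORDS, anti_keywords=ANTI_KEYWORDS):
    """True if the entry mentions a keyword and no anti-keyword (case-insensitive)."""
    text = description.lower()
    if not any(keyword in text for keyword in keywords):
        return False
    return not any(anti in text for anti in anti_keywords)
```

Under these example lists, "Bacillus anthracis" is classified SA, while "Bacillus anthracis str. Sterne" and "Danio rerio" are classified NSA.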
[Algorithm] : When to flag the subsequence?
[Example] : Is this subsequence a hit?
BLAST results    Score    QC (%)
Mus musculus     100      100
Mus musculus     100      100
Danio rerio      97       100
Danio rerio      43       80
Best matches: Mus musculus, Mus musculus
[Example] : Is this subsequence a hit?
BLAST results               Score    QC (%)
Lumpy skin disease virus    100      100
Sheeppox virus              100      100
Goatpox virus               98       100
Deerpox virus               44       80
Best matches: Lumpy skin disease virus, Sheeppox virus
[Example] : Is this subsequence a hit?
BLAST results              Score    QC (%)
Bacillus anthracis         100      100
Bacillus cereus            100      100
Plasmodium falciparum      63       100
Clostridium ljungdahlii    44       80
Best matches: Bacillus anthracis, Bacillus cereus
[Guidance] : « unique to Select Agent » !!!
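One reading of the "unique to Select Agent" requirement is that a subsequence is flagged only when every Best Match is a Select Agent or Toxin entry. The sketch below encodes that reading; the rule as coded and the toy classifier (a small illustrative set of names) are assumptions, not the documented GenoTHREAT logic.

```python
# Toy classifier for the example only (assumption): a small set of agent names.
EXAMPLE_SELECT_AGENTS = {"bacillus anthracis", "lumpy skin disease virus", "sheeppox virus"}


def is_select_agent(description):
    return description.lower() in EXAMPLE_SELECT_AGENTS


def flag_subsequence(best_match_descriptions, classifier=is_select_agent):
    """Flag only if there is at least one Best Match and all of them are Select Agents."""
    descriptions = list(best_match_descriptions)
    return bool(descriptions) and all(classifier(d) for d in descriptions)
```

Under this reading, Best Matches of Lumpy skin disease virus and Sheeppox virus produce a flag, whereas Bacillus anthracis tied with Bacillus cereus does not, because the sequence is not unique to a Select Agent.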
[Algorithm] : No Best Matches…
[Algorithm] : Points of the Guidance left to interpretation
How do you identify sequences of concern of 200bp or greater which partially span two adjacent subsequences?
Problem: no suggestion in guidance
Solution: extension method
[Algorithm] : Extension Method
Extend to meet possible alignments
[Figure: segments labeled 120bp and 80bp combine into a new subsequence]
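The slides do not spell out the extension procedure, so the following is only a guess at its core idea: when a partial alignment begins inside one window, a new full-length window is cut starting at the alignment start so that it reaches into the adjacent window. The function name and the clamping behavior at the end of the query are assumptions.

```python
def extend_subsequence(query, alignment_start, window=200):
    """Cut a new window beginning where a partial alignment starts (0-based index).

    The start is clamped so the window stays inside the query when possible.
    """
    start = max(0, min(alignment_start, len(query) - window))
    return start, query[start:start + window]


# A partial alignment starting at position 80 of the first 200 bp window yields
# a new window made of 120 bp from that window and 80 bp from the next one.
start, new_subsequence = extend_subsequence("A" * 200 + "C" * 200, 80)
```

The new subsequence can then be screened like any other 200 bp window.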
[Algorithm] : Recap
Road Map
I. Current regulations
II. Sequence screening algorithm: interpreting the guidance
III. GenoTHREAT: implementation and characterization
1. Software implementation
2. Software Characterization
IV. Conclusions
Using BLAST
Online BLAST
Performs BLAST via NCBI website interface
► Faster per BLAST
► Computationally less expensive
► Only sequential, due to NCBI restrictions
► Lack of privacy
Local BLAST
Performs BLAST in parallel on local machine
► User privacy
► Faster per sequence due to parallelization
► Computationally expensive (memory + CPU intensive)
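For the local option, a search could be driven from a script with the NCBI BLAST+ `blastn` program. This is a sketch under assumptions: the slides do not say how GenoTHREAT invokes BLAST, and the helper names, file paths, and chosen output columns are illustrative. The flags used (`-query`, `-db`, `-outfmt`, `-num_threads`) and the tabular fields `qseqid`, `pident`, `qcovs`, and `stitle` are standard BLAST+ options.

```python
def build_blastn_command(query_fasta, database, threads=4):
    """Build an argument list for a local blastn run with tab-separated output."""
    return [
        "blastn",
        "-query", query_fasta,
        "-db", database,
        "-outfmt", "6 qseqid pident qcovs stitle",
        "-num_threads", str(threads),
    ]


def parse_tabular(output):
    """Parse the four tab-separated columns requested above into dictionaries."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        qseqid, pident, qcovs, stitle = line.split("\t", 3)
        rows.append({
            "query": qseqid,
            "percent_identity": float(pident),
            "query_coverage": float(qcovs),
            "description": stitle,
        })
    return rows
```

The command list can be passed to `subprocess.run(..., capture_output=True, text=True)` on a machine with BLAST+ and a local copy of the database installed, and the captured standard output handed to `parse_tabular`.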
Screening time & hardware
[Chart categories: Online, Desktop, Business Class Server]
Sequence length (bp)    Screening time (min)*
2,000                   2
10,000                  12.5
*Screening performed using business class server
Road Map
I. Current regulations
II. Sequence screening algorithm: interpreting the guidance
III. GenoTHREAT: implementation and characterization
1. Software implementation
2. Software Characterization
i. Database of test sequences
ii. Keyword list variation
iii. Detection of Potentially dangerous sequences
iv. BLAST parameters
v. Real world gene orders simulation
IV. Conclusions
Database of Test Sequences
• Implementations must be compared to assess quality
• Standardized set of test sequences is needed
• Test Set contains 184 sequences:
• Select Agents
o Genes associated with toxins or pathogenicity
o Genes associated with normal function
• Model Organisms
Database of Test Sequences
Contribute to the development of a standard test set of sequences
Keyword and Anti-Keyword list
• Test with the unmodified sequences (184 sequences)
• Two lists of keywords: limited and extensive
• With or without the anti-keyword list
Keyword List Content Variation
[Bar chart: Correct SAT and Correct NSAT counts (axis 0 to 120) for Limited keywords and Extensive keywords; Anti-Keywords]
Keyword list method not mentioned in guidance
Limited keyword list: uniquely composed of words in SAT List
Extensive keyword list: extension of limited keyword list containing words uniquely related to SAT.
Modified Test Sequences
Modifications performed on the initial unmodified sequences
► Intervening sequences
► Degenerate sequences
► Mutated sequences (BLAST parameters)
Degenerate Sequences
Potential Danger: Codon optimized nucleotide sequences
Unmodified Nucleotide: GATTTGGACACTCATTTCACC
Amino Acid Sequence: DLDTHFT
Degenerate Nucleotide: GATACGTCAACCTTTTAA GC
Result: all codon optimized sequences detected due to screening of amino acid sequences
Intervening sequences
Potential Danger: SAT sequences hidden within larger, benign sequences
[Figure: test constructs built from 300 bp NSAT segments with embedded SAT segments of 200 bp and 250 bp]
Result: All hidden sequences were detected
Mutated sequences
Potential Danger: mutated, but still active, SAT sequences which do not align to GenBank entries
Nucleotide subsequences
Result: BLAST parameter settings affect screening capability
Amino acid subsequences
Result: BLAST parameters do not clearly change the efficiency of the screening
Nucleotide subsequences
Result: Direct relationship between screening time and ability to identify mutated sequences
Amino acid subsequences
Real world gene orders simulation
Gene synthesis companies need a low number of false hits
1. iGEM registry
• Registry completed by iGEM teams each year
• Contains 10,000 sequences
2. GenoCAD database
• 1,258 sequences longer than 200 bp
iGEM Registry
First step: screen registry sequences 1 → 1724
Hit rate: 6.5% → 2.9%
Major causes of hits:
• 100% query coverage for Best Match too restrictive → 95%
• Some results have 100% query coverage but very low Percent Identity → 60%
• Keyword list issues → solved
GenoCAD database
• 1,258 sequences
• 32 hits: 2.54%
• Manual review:
• YopH: protein from Y. pestis (gi|14488772)
Real world gene orders simulation
Hits left are due to:
• Very often: 1 subsequence of 1 protein frame leads to a correct hit
→ Is it worth flagging the entire sequence?
• Sometimes: many subsequences lead to correct hits
→ Probably worth flagging
GenoTHREAT
• “Best Match”
• Hardware and software parameters
• Keyword list
• BLAST parameters
• Certain types of sequence modifications
• High-resolution screen
Guidance conclusion
Government Guidance potentially usable by companies:
• Reasonable time
• Good detection of sequences of concern
• Number of false hits potentially low (manual review)
http://www.dagorret.net/2009/12/18/new-technology-developed-by-microsoft-for-photography-dna-image/
http://www.wadsworth.org/testing/biodefense/education.shtml
© iGEM and Justin Knight.
http://sourceforge.net/projects/genothreat/
Acknowledgements
Dr. Jean Peccoud
Mandy L. Wilson
The VT-ENSIMAG iGEM team (2010):
Michael Kozar
Gaelle Letort
Olivier Mirat
Arunima Srivastava
Tyler Stewart
My PhD committee:
Dr. Bevan
Dr. Garner
Dr. Peccoud
Dr. Ramakrishnan
Dr. Setubal
GenoThreat / GenoGUARD -- open source biosecurity solution for the gene synthesis industry and the synthetic biology community.

  • 1. GenoTHREAT: A biosecurity software to screen DNA synthesis orders against pathogens. GBCB seminar, Laura Adam, 10/07/2014
  • 3. 7/10/2014 GenoTHREAT 3 (2005) Science, 310(5745), 77. AAAS.
  • 7. The Gene Synthesis Industry
  • 8. Industry Response to Dual Use: (a) “Code of Conduct for Best Practices in Gene Synthesis”: 5 members (all based in Germany); undersigned by 6 German or German/American and 2 Chinese signatories. (b) “Harmonized Screening Protocol”: 5 companies (American); 80% of worldwide synthesis capacity
  • 9. Major Sections: customer screening; sequence screening; record retention; government contact
  • 10. Our Primary Objectives: 1. Interpret the (draft) guidance as an algorithm; 2. Implement it as software: GenoTHREAT; 3. Characterize screening efficacy
  • 11. Road Map: I. Current regulations; II. Sequence screening algorithm: interpreting the guidance; III. GenoTHREAT: implementation and characterization; IV. Conclusions
  • 12. [Guidance] : Purpose “[…] to minimize the risk that unauthorized individuals or individuals with malicious intent will obtain “toxins and agents of concern” through the use of nucleic acid synthesis technologies, and to simultaneously minimize any negative impacts on the conduct of research and business operations.”
  • 13. [Guidance] : Goals of sequence screening. Agent of concern? Select Agents and Toxins. Sequences of concern? “dsDNA sequences derived from or encoding Select Agents and Toxins”; sequences unique to a select agent; no housekeeping genes; both DNA strands and the six-frame translation. Detect any “sequence of concern”, including embedded ones as small as 200 bp → use the Best Match approach (at least)
  • 16. [Guidance] : Major Points: 1. Perform six-frame translation; 2. Divide the query sequences into subsequences of 200 bp or 66 aa; 3. For each subsequence: i. BLAST, ii. Best Matches, iii. Flag if SAT; 4. Automatic decision
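
Steps 1 and 2 of the Major Points can be sketched in a few lines of Python. This is a minimal illustration, not GenoTHREAT's actual code: the function names are hypothetical, the standard genetic code is assumed, and the subsequences are assumed to be consecutive and non-overlapping (the last one may be shorter than the window size).

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon; a trailing partial codon is dropped,
    and codons with non-ACGT letters become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Three reading frames on the forward strand, then three on the reverse."""
    dna = dna.upper()
    frames = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


def split_into_subsequences(sequence, size):
    """Consecutive, non-overlapping windows (assumption); last may be shorter."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


if __name__ == "__main__":
    query = "GATTTGGACACTCATTTCACC" * 30  # 630 bp toy query
    dna_windows = split_into_subsequences(query, 200)
    protein_windows = [
        split_into_subsequences(frame, 66) for frame in six_frame_translation(query)
    ]
    print(len(dna_windows), [len(w) for w in protein_windows])
```

Each nucleotide window and each protein window would then be submitted to BLAST separately, as in step 3.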
  • 17. Road Map: I. Current regulations; II. Sequence screening algorithm: interpreting the guidance (1. Perform six-frame translation; 2. Divide the query sequences into subsequences of 200 bp or 66 aa; 3. For each subsequence: i. BLAST, ii. Best Matches, iii. Flag if SAT; 4. Automatic decision); III. GenoTHREAT: implementation and characterization; IV. Conclusions
  • 18. [Algorithm] : Input a query DNA sequence to screen
  • 19. [Algorithm] : Six-frame translation
  • 23. [Algorithm] : What should we do with subsequences?
  • 25. [Algorithm] : BLAST subsequences against the entire GenBank database
  • 26. Basic Local Alignment Search Tool (BLAST): developed at the U.S. National Center for Biotechnology Information; one of the most widely used bioinformatics tools; aligns query sequences against sequences in the GenBank sequence database; the algorithm emphasizes speed over sensitivity
  • 27. BLAST (figure labels: query sequence; database of sequences; local alignment)
  • 28. BLAST Output. Percent Identity: the percentage of identical nucleotides (or amino acids) in the aligned sequence. Query Coverage: the portion of the query length that is aligned (reported as a percentage)
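
As an illustration of how these two values might be read programmatically, the sketch below parses BLAST+ tabular output. It assumes the search was run with `-outfmt "6 qseqid sseqid pident qcovs stitle"`, which is one possible column selection and not necessarily the one GenoTHREAT uses; the function name and the sample identifiers are hypothetical.

```python
import csv
import io

# Column order assumed to match: -outfmt "6 qseqid sseqid pident qcovs stitle"
FIELDS = ["qseqid", "sseqid", "pident", "qcovs", "stitle"]


def parse_blast_tabular(text):
    """Parse tab-separated BLAST output into a list of dicts.

    pident (percent identity) and qcovs (query coverage) become floats.
    """
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for record in reader:
        if not record or record[0].startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, record))
        row["pident"] = float(row["pident"])
        row["qcovs"] = float(row["qcovs"])
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Made-up sample lines; the subject identifiers are placeholders.
    sample = (
        "sub_1\tsubject_A\t100.00\t100\tMus musculus example entry\n"
        "sub_1\tsubject_B\t97.00\t100\tDanio rerio example entry\n"
    )
    for hit in parse_blast_tabular(sample):
        print(hit["stitle"], hit["pident"], hit["qcovs"])
```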
  • 29. [Algorithm] : What should we do with all those BLAST results?
  • 31. [Guidance] : The Best Match approach: use a local sequence alignment tool (BLAST is suggested); Best Matches = greatest percent identity over the entire fragment; 66 aa or 200 bp fragments
  • 32. [Algorithm] : Identify Best Matches
  • 33. BLAST [Example]
      BLAST results    PI (%)   QC (%)
      Mus musculus     100      100
      Mus musculus     100      100
      Danio rerio      97       100
      Danio rerio      43       80
      Best matches: Mus musculus; Mus musculus
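
The Best Match selection in this example can be sketched as follows. This is one reading of "greatest percent identity over the entire fragment": only hits covering the whole query (100% query coverage) are eligible, and all hits tied for the highest percent identity are kept. The `Hit` type and the `required_coverage` parameter are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    organism: str
    percent_identity: float
    query_coverage: float


def best_matches(hits, required_coverage=100.0):
    """Return all hits tied for the top percent identity among full-coverage hits."""
    full = [h for h in hits if h.query_coverage >= required_coverage]
    if not full:
        return []  # no hit spans the entire fragment
    top = max(h.percent_identity for h in full)
    return [h for h in full if h.percent_identity == top]


if __name__ == "__main__":
    # Values taken from the slide's example table.
    results = [
        Hit("Mus musculus", 100, 100),
        Hit("Mus musculus", 100, 100),
        Hit("Danio rerio", 97, 100),
        Hit("Danio rerio", 43, 80),
    ]
    print([h.organism for h in best_matches(results)])
```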
  • 35. [Algorithm] : Determine nature of Best Matches
  • 36. [Algorithm] : How can we know if a Best Match is to a Select Agent or Toxin? Problem: no suggestion in guidance. Solution: keyword and anti-keyword list
  • 37. BLAST [Example] : Is this subsequence a hit?
      BLAST results                    PI (%)   QC (%)
      Bacillus anthracis               100      100
      Bacillus anthracis str. Sterne   100      100
      Danio rerio                      97       100
      Danio rerio                      43       80
      Best matches: Bacillus anthracis; Bacillus anthracis str. Sterne
  • 38. [Example] : Keyword vs. Anti-keyword. If a GenBank entry contains a keyword, then the sequence is flagged (SA)
  • 39. [Example] : Keyword vs. Anti-keyword. If a GenBank entry contains both a keyword and an anti-keyword, the order is not flagged (NSA)
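
The keyword and anti-keyword rule on the two preceding slides can be sketched as a simple text check. The function name is hypothetical, and the keyword lists shown are illustrative placeholders only: the slides do not list the actual contents of GenoTHREAT's lists.

```python
def is_select_agent_entry(description, keywords, anti_keywords):
    """Return True when the entry text contains a keyword and no anti-keyword."""
    text = description.lower()
    has_keyword = any(k.lower() in text for k in keywords)
    has_anti_keyword = any(a.lower() in text for a in anti_keywords)
    return has_keyword and not has_anti_keyword


if __name__ == "__main__":
    # Illustrative lists (assumptions, not GenoTHREAT's real lists).
    keywords = ["bacillus anthracis"]
    anti_keywords = ["str. sterne"]
    for entry in ("Bacillus anthracis", "Bacillus anthracis str. Sterne", "Danio rerio"):
        label = "SA" if is_select_agent_entry(entry, keywords, anti_keywords) else "NSA"
        print(entry, "->", label)
```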
  • 40. [Algorithm] : When to flag the subsequence?
  • 41. BLAST [Example] : Is this subsequence a hit?
      BLAST results    Score   QC (%)
      Mus musculus     100     100
      Mus musculus     100     100
      Danio rerio      97      100
      Danio rerio      43      80
      Best matches: Mus musculus; Mus musculus
  • 42. BLAST [Example] : Is this subsequence a hit?
      BLAST results              Score   QC (%)
      Lumpy skin disease virus   100     100
      Sheeppox virus             100     100
      Goatpox virus              98      100
      Deerpox virus              44      80
      Best matches: Lumpy skin disease virus; Sheeppox virus
  • 43. BLAST [Example] : Is this subsequence a hit?
      BLAST results             Score   QC (%)
      Bacillus anthracis        100     100
      Bacillus cereus           100     100
      Plasmodium falciparum     63      100
      Clostridium ljungdahlii   44      80
      Best matches: Bacillus anthracis; Bacillus cereus
      [Guidance] : “unique to Select Agent”!
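
One way to read the three examples above is that a subsequence is flagged only when every Best Match is a Select Agent or Toxin, so a tie with a non-select organism means the sequence is not "unique to Select Agent". The sketch below encodes that reading; the function name is hypothetical, the select-agent name set is a stand-in for the keyword check, and returning `False` when there are no Best Matches is an assumption.

```python
def flag_subsequence(best_match_names, is_select_agent):
    """Flag only if there is at least one Best Match and all of them are SAT."""
    if not best_match_names:
        return False  # assumption: nothing to flag without a Best Match
    return all(is_select_agent(name) for name in best_match_names)


if __name__ == "__main__":
    # Stand-in classifier built from the slide examples (illustrative only).
    select_agents = {"Bacillus anthracis", "Lumpy skin disease virus", "Sheeppox virus"}
    examples = [
        ["Mus musculus", "Mus musculus"],
        ["Lumpy skin disease virus", "Sheeppox virus"],
        ["Bacillus anthracis", "Bacillus cereus"],
    ]
    for names in examples:
        print(names, "->", flag_subsequence(names, select_agents.__contains__))
```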
  • 44. [Algorithm] : No Best Matches…
  • 45. [Algorithm] : Points of the Guidance left to interpretation. How do you identify sequences of concern of 200 bp or greater that partially span two adjacent subsequences? Problem: no suggestion in guidance. Solution: extension method
  • 46-57. [Algorithm] : Extension Method (figure sequence; labels recoverable from slide 52: “Extend to meet possible alignments”; 120bp, 80bp, 120bp, 80bp; “New subsequence”)
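
The figures for the extension method did not survive extraction, so the following is only one possible reading of the labels on slide 52, not the author's confirmed procedure. It assumes that when a partial alignment touches the edge of a subsequence, a new 200 bp subsequence is built that keeps the aligned part and extends into the neighbouring subsequence (for example, 120 bp aligned plus 80 bp from the next window). All names and the coordinate convention (0-based, end-exclusive, in query coordinates) are hypothetical.

```python
def extension_window(window_start, align_start, align_end, query_length, size=200):
    """Propose a new window spanning a subsequence boundary.

    Returns (new_start, new_end), or None if the alignment does not touch
    exactly one edge of its window or no neighbouring window exists.
    """
    window_end = min(window_start + size, query_length)
    touches_right = align_end == window_end and align_start > window_start
    touches_left = align_start == window_start and align_end < window_end
    if touches_right and window_end < query_length:
        # Keep the aligned tail and extend into the next window.
        return align_start, min(align_start + size, query_length)
    if touches_left and window_start > 0:
        # Keep the aligned head and extend back into the previous window.
        return max(align_end - size, 0), align_end
    return None


if __name__ == "__main__":
    # 120 bp aligned at the end of window [0, 200): extend 80 bp to the right.
    print(extension_window(0, 80, 200, query_length=1000))
    # 80 bp aligned at the start of window [200, 400): extend 120 bp to the left.
    print(extension_window(200, 200, 280, query_length=1000))
```

The new subsequence would then go through the same BLAST, Best Match, and flagging steps as the original ones.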
  • 60. Road Map: I. Current regulations; II. Sequence screening algorithm: interpreting the guidance; III. GenoTHREAT: implementation and characterization (1. Software implementation; 2. Software characterization); IV. Conclusions
  • 61. Using BLAST. Online BLAST: performs BLAST via the NCBI website interface; faster per BLAST; computationally less expensive; only sequential, due to NCBI restrictions; lack of privacy. Local BLAST: performs BLAST in parallel on the local machine; user privacy; faster per sequence due to parallelization; computationally expensive (memory- and CPU-intensive)
  • 62. Screening time & hardware (figure labels: Online; Desktop; Business Class Server)
      Sequence length (bp)   Screening time (min)*
      2,000                  2
      10,000                 12.5
      *Screening performed using a business class server
  • 63. Road Map: I. Current regulations; II. Sequence screening algorithm: interpreting the guidance; III. GenoTHREAT: implementation and characterization (1. Software implementation; 2. Software characterization: i. Database of test sequences, ii. Keyword list variation, iii. Detection of potentially dangerous sequences, iv. BLAST parameters, v. Real world gene orders simulation); IV. Conclusions
  • 64. Database of Test Sequences: implementations must be compared to assess quality; a standardized set of test sequences is needed; the test set contains 184 sequences: Select Agents (genes associated with toxins or pathogenicity; genes associated with normal function) and model organisms
  • 65. Database of Test Sequences: contribute to the development of a standard test set of sequences
  • 67. Keyword and Anti-Keyword list: test with the unmodified sequences (184 sequences); two lists of keywords (limited, extensive), each with or without the anti-keyword list
  • 68. Keyword List Content Variation (bar chart: Correct SAT and Correct NSAT for the limited and extensive keyword lists; axis from 0 to 120). The keyword list method is not mentioned in the guidance. Limited keyword list: composed solely of words in the SAT list. Extensive keyword list: extension of the limited keyword list containing words uniquely related to SAT.
  • 71. Modified Test Sequences: modifications performed on the initial unmodified sequences: intervening sequences; degenerate sequences; mutated sequences (BLAST parameters)
  • 72. Degenerate Sequences. Potential danger: codon-optimized nucleotide sequences. Unmodified nucleotide: GATTTGGACACTCATTTCACC. Amino acid sequence: DLDTHFT. Degenerate nucleotide: GATACGTCAACCTTTTAA GC. Result: all codon-optimized sequences were detected due to screening of amino acid sequences
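
The reason amino acid screening catches codon-optimized orders can be shown with a short sketch: swapping each codon for a synonymous one changes the nucleotide sequence but leaves the translated protein unchanged. The code below assumes the standard genetic code, and `synonymous_recode` is a hypothetical helper written for this illustration, not a reconstruction of the degenerate sequence on the slide.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an upper-case ACGT string; a trailing partial codon is dropped."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def synonymous_recode(dna):
    """Swap each codon for another codon encoding the same amino acid, if any."""
    recoded = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        alternatives = sorted(
            c for c, aa in CODON_TABLE.items() if aa == amino_acid and c != codon
        )
        recoded.append(alternatives[0] if alternatives else codon)
    return "".join(recoded)


if __name__ == "__main__":
    original = "GATTTGGACACTCATTTCACC"  # unmodified sequence from the slide
    recoded = synonymous_recode(original)
    print(original, translate(original))
    print(recoded, translate(recoded))
```

A nucleotide-level search may miss the recoded sequence, while the protein-level search sees an identical query.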
  • 73. Intervening sequences. Potential danger: SAT sequences hidden within larger, benign sequences (figure labels: 300 bp NSAT, 200 bp SAT, 300 bp NSAT, 300 bp NSAT, 300 bp NSAT, 250 bp SAT). Result: all hidden sequences were detected
  • 75. Mutated sequences. Potential danger: mutated, but still active, SAT sequences that do not align to GenBank entries
  • 76. Nucleotide subsequences. Result: BLAST parameter settings affect screening capability
  • 77. Amino acid subsequences. Result: BLAST parameters do not clearly change the efficiency of the screening
  • 78. Nucleotide subsequences. Result: direct relationship between screening time and ability to identify mutated sequences
  • 81. Real world gene orders simulation. Gene synthesis company: a low number of false hits is needed. 1. iGEM registry: registry completed by iGEM teams each year; contains 10,000 sequences. 2. GenoCAD database: 1,258 sequences longer than 200 bp
  • 82. iGEM Registry. First step: screen registry sequences 1 to 1724. Hit rate: 6.5%. Major causes of hits: 100% query coverage for Best Match too restrictive; some results have 100% query coverage but very low percent identity; keyword list issues (slide annotations: 95%; 60% solved; 2.9%)
  • 84. GenoCAD database: 1,258 sequences; 32 hits (2.54%); manual review: YopH, a protein from Y. pestis (gi|14488772)
  • 85. Real world gene orders simulation. Hits left are due to: very often, 1 subsequence of 1 protein frame leads to a correct hit → is it worth flagging the entire sequence? Sometimes, many subsequences lead to correct hits → probably worth flagging
  • 87. GenoTHREAT: “Best Match”; hardware and software parameters; keyword list; BLAST parameters; certain types of sequence modifications; high-resolution screen
  • 88. Guidance conclusion. Government Guidance potentially usable by companies: reasonable time; good detection of sequences of concern; number of false hits potentially low (manual review)
  • 90. © iGEM and Justin Knight.
  • 94. http://sourceforge.net/projects/genothreat/
  • 95. Acknowledgements: Dr. Jean Peccoud; Mandy L. Wilson; the VT-ENSIMAG iGEM team (2010): Michael Kozar, Gaelle Letort, Olivier Mirat, Arunima Srivastava, Tyler Stewart; my PhD committee: Dr. Bevan, Dr. Garner, Dr. Peccoud, Dr. Ramakrishnan, Dr. Setubal