Developing Tools & Methodologies for the Next Generation of Genomics & Bioinformatics
Computational challenges to getting
NextGen sequence right: implications
for diagnostics and therapeutics
Progress in biomedical discovery has
been enabled by technological progress
• New sequencing technology: hundreds of genomes a day are now produced
• Advances in software
  • Standard DNA analysis codes have emerged
  • New versions are continuously released
  • Custom software is developed for unconventional analysis
  • Analysis pipelines are developed for automated analysis and compilation of results
• Advances in computational hardware
  • Codes standardized on Intel processor-based systems ease porting to new systems
  • Continuous advances in the Intel product line enable us to easily “keep up”

The bottom line: with process advances and new Intel MIC processors, we have seen
speedups from 1 genome every 2 weeks to 50 genomes a day. It is straightforward to expand
hardware in response to computational demand.
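The throughput figures in the bottom line can be checked with a little arithmetic. This is a minimal sketch, assuming "1 genome every 2 weeks" means one genome per 14 days; the variable names are illustrative only.

```python
from fractions import Fraction

# Throughput figures quoted on the slide, in genomes per day.
# Assumption: "1 genome every 2 weeks" is read as 1 genome per 14 days.
before_per_day = Fraction(1, 14)
after_per_day = Fraction(50)

# Speedup is the ratio of new throughput to old throughput.
speedup = after_per_day / before_per_day

print(f"Speedup: {speedup}x")
```

Using `Fraction` keeps the ratio exact, so this prints `Speedup: 700x`, i.e. roughly a 700-fold increase in throughput rather than a float that is slightly off.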
Computing is primarily done on a
machine we developed: SHADOWFAX
• A heterogeneous computing environment for data-intensive computations
• ~2,524 CPUs, > 12 TB RAM (spectrum of Intel)
• 8 Intel® Xeon® E5-2600/FPGA hybrid-core systems (in partnership with Convey)
• ~0.8 PB disk arrays (DDN)
• 100 PB Sun/Oracle tape storage system

• With local synchronized copies of major databases:
  • Medline, arXiv, PubMed Central, GenBank, SwissProt
  • 1000 Genomes Project
  • The Cancer Genome Atlas
  • Wikipedia
• To meet the needs of applications that demand HPC:
  • Deep-sequencing assembly and analysis, molecular modeling, simulations, proteomics analysis, text mining, Health IT
NextGen DNA sequence analysis is now
the rate-limiting step
• The cost of sequencing has dropped from $3B/genome to ~$1K/genome.
• New genomes are sequenced daily.
• It is estimated that there are 30,000 complete human genomes, with 15,000 of these in the public domain.
• Analysis has focused on Single Nucleotide Polymorphisms (“SNPs”), which are single-letter changes in the DNA code.
• For complex diseases like cancer, heart disease and mental disorders, extensive work still explains only 10-20% of the known genetic component.
• Recent research indicates that, due to experimental measurement noise, perhaps most of the measured variations are false positives.
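The first bullet implies a drop of roughly three million-fold in cost per genome. A quick check, using the rounded dollar figures from the slide (the names are illustrative):

```python
# Rounded cost-per-genome figures quoted on the slide, in US dollars.
cost_then = 3_000_000_000  # ~$3B
cost_now = 1_000           # ~$1K

# Integer division is exact here because both figures are round numbers.
fold_drop = cost_then // cost_now

print(f"Cost per genome fell about {fold_drop:,}-fold")
```

This prints `Cost per genome fell about 3,000,000-fold`; since both inputs are approximations, so is the result.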
Microsatellites, or repetitive DNA
sequences, are particularly challenging
• Microsatellites, also called Simple Sequence Repeats or Short Tandem Repeats, are an understudied portion of the genome because they are considered part of our “Junk DNA” or, more recently, “Dark Matter” DNA; research focus has been on Single Nucleotide Polymorphisms (“SNPs”).
• Microsatellites have known value: they have long been used for paternity and forensic testing and are linked to neurological diseases (e.g. Huntington’s disease and Fragile X syndrome).
• None of the major genomic research projects have focused on microsatellites: not the Human Genome Project, the 1000 Genomes Project, The Cancer Genome Atlas, ENCODE or the iCOGS study.
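To make the term concrete: a microsatellite is a short motif repeated back to back, such as CACACACA. The sketch below finds perfect tandem repeats with a regular expression. The thresholds (motif length 1 to 6 bases, at least 3 copies) and the function name `find_microsatellites` are assumptions for illustration. This is not Genomeon's method; real repeat callers must also tolerate interrupted repeats and sequencing errors, which this sketch does not.

```python
import re
from typing import List, Tuple


def find_microsatellites(seq: str, min_unit: int = 1, max_unit: int = 6,
                         min_copies: int = 3) -> List[Tuple[int, str, int]]:
    """Return (start, motif, copies) for each perfect tandem repeat in seq.

    Hypothetical thresholds: motif of min_unit..max_unit bases,
    repeated at least min_copies times in a row.
    """
    if min_copies < 2:
        raise ValueError("a tandem repeat needs at least 2 copies")
    seq = seq.upper()
    # The lazy {min,max}? makes the shortest motif win at each start position,
    # so "CACACA" is reported as motif "CA" rather than "CACA".
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


for start, motif, copies in find_microsatellites("CAGCAGCAGTTTTGATTACACACACAGGT"):
    print(start, motif, copies)
```

On the example sequence this reports a CAG triplet repeat at position 0, a T homopolymer at position 9 and an AC dinucleotide repeat at position 17. Matches are non-overlapping and scanned left to right, which is adequate for a sketch but not for resolving nested or compound repeats.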
Genomeon’s Research Methodology
• Download and rebuild thousands of “healthy” and “affected” genomes
• Create genotype distributions for “healthy” and “affected” populations
• Compute the Fisher’s exact test p-value for each of ~1 million loci and rank the results
• Identify “Patterns of Informative Microsatellites” (PIM) from loci that pass Bonferroni and Benjamini–Hochberg False Discovery Rate tests
• Manually review, do QC, compute sensitivity and specificity
• Annotate with ontologies, literature and input from experts
• Validate PIM by sequencing well-characterized samples
• Business analysis; product definition; IP
• Publish; translate; regulatory approval; reimbursement; team with an established clinical services company
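The statistical core of this workflow (a per-locus Fisher's exact test, ranking by p-value, then Bonferroni and Benjamini–Hochberg filtering) can be sketched in plain Python. This is a hedged illustration, not Genomeon's code: it assumes each locus has already been reduced to a 2x2 table of carriers vs. non-carriers of some allele in the "healthy" and "affected" groups (the slide does not say how genotype distributions were tabulated), and the locus names and counts below are made up. In practice one would more likely call `scipy.stats.fisher_exact` and statsmodels' `multipletests`.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    lo, hi = max(0, col1 - row2), min(row1, col1)

    # Unnormalised weight of the table whose top-left cell is x.
    # Comparing exact integers avoids floating-point tie problems.
    def weight(x: int) -> int:
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    tail = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return tail / comb(n, col1)


def bonferroni(pvals: List[float], alpha: float = 0.05) -> Set[int]:
    """Indices whose p-value passes the Bonferroni threshold alpha / m."""
    m = len(pvals)
    return {i for i, p in enumerate(pvals) if p <= alpha / m}


def benjamini_hochberg(pvals: List[float], alpha: float = 0.05) -> Set[int]:
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0  # largest rank k with p_(k) <= k * alpha / m
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank
    return set(order[:k_max])


# Hypothetical 2x2 counts per locus:
# (healthy carriers, healthy non-carriers, affected carriers, affected non-carriers)
tables: Dict[str, Tuple[int, int, int, int]] = {
    "locus_A": (40, 60, 10, 90),
    "locus_B": (50, 50, 48, 52),
    "locus_C": (30, 70, 18, 82),
}

names = list(tables)
pvals = [fisher_exact_two_sided(*tables[name]) for name in names]

# Rank loci by p-value, smallest first.
for name, p in sorted(zip(names, pvals), key=lambda item: item[1]):
    print(f"{name}: p = {p:.3g}")

passing_bonf = {names[i] for i in bonferroni(pvals)}
passing_bh = {names[i] for i in benjamini_hochberg(pvals)}
print("Bonferroni:", sorted(passing_bonf))
print("Benjamini-Hochberg:", sorted(passing_bh))
```

Note that any p-value at or below α/m is also at or below kα/m for its own rank k, so every locus that passes Bonferroni also passes Benjamini–Hochberg at the same α. Requiring both is therefore effectively the Bonferroni filter; the slide does not say how the two results were combined.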
Genomeon has created a unique library of
over 7,700 genomes from the 1000 Genomes
Project and The Cancer Genome Atlas, with
corrected microsatellites
• “Healthy Population” representing many ethnicities
• Ovarian cancer
• Breast cancer
• Brain cancer: Glioma; Glioblastoma; Medulloblastoma
• Lung adenocarcinoma
• Prostate cancer
• Melanoma
• Autism
Breast Cancer
Pattern of 55 informative microsatellites
differentiates breast cancer germlines from
healthy germlines
Sensitivity = 84%
Specificity = 87%
[Figure not reproduced; annotation: BRCA-positive samples]
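Sensitivity and specificity here follow the standard definitions: the fraction of affected (breast cancer) germlines that the pattern flags, and the fraction of healthy germlines it correctly leaves unflagged. A minimal sketch; the class name `ConfusionCounts` and the counts are hypothetical, chosen for easy arithmetic, and are not the study's data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from classifying samples with a microsatellite pattern."""
    tp: int  # affected samples called affected
    fn: int  # affected samples called healthy
    tn: int  # healthy samples called healthy
    fp: int  # healthy samples called affected

    def sensitivity(self) -> float:
        # True positive rate: TP / (TP + FN)
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # True negative rate: TN / (TN + FP)
        return self.tn / (self.tn + self.fp)


# Hypothetical counts, for illustration only.
counts = ConfusionCounts(tp=40, fn=10, tn=45, fp=5)
print(f"Sensitivity = {counts.sensitivity():.0%}")  # Sensitivity = 80%
print(f"Specificity = {counts.specificity():.0%}")  # Specificity = 90%
```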
Applications of these microsatellite loci variations
• Cancer Risk Diagnostics - Microsatellite profiling for increased risk of cancer, and the tissues at highest risk
• Companion/Treatment Diagnostics - Many informative microsatellites are functional elements implicated in therapeutic response
• Clinical Trial Support - Use of microsatellite profiles to differentiate sub-populations in clinical trials
• Drug Targets - Identification of a large number of genes previously unassociated with cancer, many with functions associated with cancer processes
• Toxicology - Quantification of stress-induced exposures via a microsatellite mutation screen
• Prognosis - Comparison of microsatellite variations between germlines and tumors
• Non-cancer Diseases - PTSD, autism, MS, cardiac diseases, aging
Thank you. Any Questions?
