Real-time Analysis of
Next Generation Sequencing Data

                    World Health Summit
                            Oct 24, 2012
               Prof. Dr. Christoph Meinel
                   Matthieu Schapranow
                 Hasso Plattner Institute
Genome Sequencing:
    Do you have enough time?
2




         Image taken from http://portal.ccg.uni-koeln.de/ccg/assets/images/3730.jpg


    Real-time Analysis of NGS Data, Prof. Dr. Meinel, Schapranow, World Health Summit, Oct 24, 2012
The Archon Genomics X Prize
3


            “$10 million will be awarded to the first team
            to rapidly, accurately and economically
            sequence 100 whole human genomes to an
            unprecedented level of accuracy.”

                                      (Archon Genomics X Prize, 2012)




Agenda
4


      ■  Conventional Medicine
      ■  Personalized Medicine
      ■  Challenges of Genome Data Analysis
      ■  High-Performance In-Memory Genome Project




Conventional Medicine
5

       [Chart: share of women and men who will develop vs. will never develop cancer, 0% to 100% (American Cancer Society, Surveillance Research, 2012)]

       [Chart: chemotherapies that fail vs. work]



Personalized Medicine
6


           “Personalized medicine aims at treating patients
       specifically based on their individual dispositions, e.g.
                   genetic or environmental factors”
               (K. Jain, Textbook of Personalized Medicine. Springer, 2009)


       Enhanced by:      World-wide medical research activities
       Limiting factor:  Research results stored in heterogeneous formats
                         in distributed databases




Personalized Medicine
7
       Conventional:  Patient suffering from cancer  →  Conventional Therapy  →  Treatment Decision
       Personalized:  Patient suffering from cancer  →  DNA Sequencing  →  Analysis of Genomic Data  →  Treatment Decision

       DNA Sequencing
           •  Quantity: 3.2 billion base pairs
           •  Data size: 1-20 GB

       Analysis of Genomic Data
           •  Quantity:
                •  Known mutations: 80M
                •  Distinct genes: 20k-25k
                •  Proteins: 50k-300k
           •  Data sizes:
                •  Alignment: 5-10 GB
                •  Variants: 10-100 GB

       [Chart: duration of personalized medicine in days (0 to 40), "As of Today" vs. "Supported by HPI"]
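The 3.2 billion base pairs quoted above set a floor on storage before any reads, quality scores, or alignments are added. The following is a minimal Python sketch, assuming the common 2-bit encoding of the four bases A, C, G, T (ambiguous bases such as N are not handled) and decimal gigabytes; the function names are hypothetical and this is not part of the HPI pipeline.

```python
BASE_PAIRS = 3_200_000_000  # human genome size quoted on the slide
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2 bits per base; N and other IUPAC codes unsupported
BASES = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte (lowest bits first)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack_2bit(data: bytes, length: int) -> str:
    """Inverse of pack_2bit; length is needed because the last byte may be partly unused."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length))


def size_gb(bases: int, bits_per_base: int) -> float:
    """Storage in decimal GB (10**9 bytes) for a sequence at a fixed number of bits per base."""
    return bases * bits_per_base / 8 / 1e9


packed = pack_2bit("GATTACA")
print(len(packed), unpack_2bit(packed, 7))  # 2 GATTACA
print(size_gb(BASE_PAIRS, 2))  # 0.8  (2-bit packed)
print(size_gb(BASE_PAIRS, 8))  # 3.2  (one ASCII byte per base)
```

Even in the most compact form a single haploid sequence needs about 0.8 GB, which is why per-patient data quickly reaches the gigabyte range listed above.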

Challenges of Genome Data Analysis
8
                                         Analysis of Genomic Data

                              Alignment and              Analysis of Annotations
                              Variant Calling            in World-wide DBs
        Bound To              CPU Performance            Memory Capacity
        Duration              Hours                      Weeks
        HPI                   Minutes                    Real-time
        In-Memory Technology  Multi-Core                 Partitioning & Compression
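The "Partitioning & Compression" entry can be illustrated with two generic techniques that column stores commonly use. The following is a minimal Python sketch, assuming dictionary encoding as the compression scheme and the chromosome as the partitioning key; the rows, field names, and gene names are hypothetical, and this is not SAP HANA's internal implementation.

```python
from collections import defaultdict


def dictionary_encode(values):
    """Dictionary-encode a column: store each distinct value once (sorted) and
    replace every entry with its integer position in that dictionary."""
    dictionary = sorted(set(values))
    index = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [index[value] for value in values]


def partition_by(rows, key):
    """Split rows into partitions by one attribute so each partition can be
    scanned independently (e.g. by a separate core or node)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


# Hypothetical annotation rows; field names and values are illustrative only.
rows = [
    {"chrom": "chr1", "pos": 100, "gene": "GENE_A"},
    {"chrom": "chr2", "pos": 200, "gene": "GENE_B"},
    {"chrom": "chr1", "pos": 300, "gene": "GENE_C"},
    {"chrom": "chr1", "pos": 400, "gene": "GENE_A"},
]

for chrom, part in sorted(partition_by(rows, "chrom").items()):
    dictionary, codes = dictionary_encode([row["gene"] for row in part])
    print(chrom, dictionary, codes)
# chr1 ['GENE_A', 'GENE_C'] [0, 1, 0]
# chr2 ['GENE_B'] [0]
```

When a column has few distinct values, each entry shrinks to a small integer, and partitions can be processed in parallel.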

High-Performance In-Memory Genome Project
     Real-time Analysis of Genome Data
10






High-Performance In-Memory Genome Project
     It is your time
11
       ■  ~10 GB of FASTQ files, i.e. ~45M reads, from the 1000 Genomes Project
       ■  ~400k-700k variants detected using BWA, Bowtie, Bowtie2, and TMAP
       ■  ~45 min for alignment and variant calling
       ■  Analysis of results: interactive exploration in real time
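These figures imply an average end-to-end throughput of roughly 16,700 reads per second (45M reads in ~45 minutes). The following is a minimal Python sketch of reading FASTQ records and of that arithmetic, assuming the common four-line FASTQ layout without line wrapping; it does not reproduce the aligners listed above or the HPI pipeline, and `parse_fastq` and `reads_per_second` are hypothetical helpers.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Read(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield reads from a FASTQ stream, assuming four lines per record
    (header, sequence, '+' separator, quality) and no line wrapping."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record starting at {header!r}")
        yield Read(header[1:].rstrip("\n"), sequence, quality)


def reads_per_second(n_reads: int, minutes: float) -> float:
    """Average throughput needed to process n_reads within the given time."""
    return n_reads / (minutes * 60)


sample = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCA\n+\nHHHH\n")
print([read.name for read in parse_fastq(sample)])  # ['r1', 'r2']
print(round(reads_per_second(45_000_000, 45)))  # 16667
```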




High-Performance In-Memory Genome Project
     Architecture
12




High-Performance In-Memory Genome Project
     In-Memory Technology
13

       [Figure: Read Event Repositories (up to 8,000 read event notifications per second), Verification Services (up to 2,000 requests per second), Discovery Service, and SAP HANA]

       ■  Combined column and row store
       ■  Insert only for time travel
       ■  Active/passive data store
       ■  Dynamic multi-threading within nodes
       ■  No aggregate tables
       ■  On-the-fly extensibility
       ■  MapReduce
       ■  Minimal projections
       ■  Bulk load
       ■  Partitioning
       ■  Analytics on historical data
       ■  Single and multi-tenancy
       ■  Object to relational mapping
       ■  Group key
       ■  Any attribute as index
       ■  Multi-core/parallelization
       ■  Lightweight compression
       ■  SQL interface on columns & rows
       ■  Reduction of layers
       ■  Text retrieval and extraction
       ■  No disk
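"Insert only for time travel" refers to the general idea of never overwriting data: an update appends a new version, so any earlier state can still be queried. The following is a minimal Python sketch of that idea. The class `InsertOnlyTable` and the key `sample-42` are hypothetical, the linear scan is chosen for clarity over speed, and the sketch does not describe how SAP HANA implements versioning.

```python
from typing import Any, List, Optional, Tuple


class InsertOnlyTable:
    """Append-only key/value store: an update adds a new version instead of
    overwriting the old one, so earlier states remain queryable ("time travel")."""

    def __init__(self) -> None:
        self._versions: List[Tuple[int, str, Any]] = []  # (timestamp, key, value)

    def write(self, timestamp: int, key: str, value: Any) -> None:
        self._versions.append((timestamp, key, value))

    def read(self, key: str, as_of: Optional[int] = None) -> Any:
        """Return the newest value of key written at or before as_of
        (or the newest overall if as_of is None); None if no such version exists."""
        best_ts, result = None, None
        for ts, k, value in self._versions:
            if k != key or (as_of is not None and ts > as_of):
                continue
            if best_ts is None or ts >= best_ts:
                best_ts, result = ts, value
        return result


table = InsertOnlyTable()
table.write(1, "sample-42", "sequenced")
table.write(5, "sample-42", "variants called")
print(table.read("sample-42", as_of=3))  # sequenced
print(table.read("sample-42"))  # variants called
```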



High-Performance In-Memory Genome Project
     Hardware Characteristics at FSOC-Lab
14


       ■  1,000-core cluster at the
          Hasso Plattner Institute with
          25 TB of main memory
       ■  Consists of 25 nodes, each with:
             □  40 cores
             □  1 TB main memory
             □  Intel® Xeon® E7-4870
             □  2.40 GHz
             □  30 MB cache
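The cluster figures can be combined with the per-genome data sizes from the Personalized Medicine slide (variants: up to 100 GB per genome) to get a rough capacity estimate. The following is a back-of-the-envelope Python sketch. It assumes decimal units (1 TB = 1,000 GB) and ignores both compression, which would raise the count, and the memory used by the database itself. The `usable_fraction` parameter is hypothetical.

```python
NODES = 25
CORES_PER_NODE = 40
MEMORY_PER_NODE_GB = 1_000  # assumption: 1 TB counted as 1,000 GB


def total_cores(nodes: int = NODES, cores_per_node: int = CORES_PER_NODE) -> int:
    return nodes * cores_per_node


def datasets_in_memory(dataset_gb: float, usable_fraction: float = 1.0,
                       nodes: int = NODES, memory_per_node_gb: float = MEMORY_PER_NODE_GB) -> int:
    """How many datasets of a given size fit into the cluster's combined main memory,
    using only usable_fraction of it (the rest is left for the OS and query processing)."""
    usable_gb = nodes * memory_per_node_gb * usable_fraction
    return int(usable_gb // dataset_gb)


print(total_cores())  # 1000
print(datasets_in_memory(100))  # 250
print(datasets_in_memory(100, usable_fraction=0.5))  # 125
```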




What to take home?
15

     Sequencing machines are becoming faster, smaller, and
     cheaper, and they generate immense data sets in
     heterogeneous formats

       ■  IT is the key to exploring and
          analyzing these big data sets
       ■  Parallelization reduces the time needed to process genome data
       ■  In-memory technology enables real-time analysis and interactive
          exploration of genome data
       ■  We integrate research results from international research databases
          into a single knowledge base
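Integrating results from several databases means mapping each source's format onto one schema and joining on a shared key. The following is a minimal Python sketch, assuming variants are keyed by chromosome, position, reference allele, and alternate allele. The source names (`db_a`, `db_b`) and all field names are hypothetical, and the sketch does not describe the HPI knowledge base. In practice, chromosome naming (e.g. "chr1" vs. "1") and coordinate conventions would also need normalizing.

```python
from typing import Dict, Iterable, List, Tuple

KEY_FIELDS = ("chrom", "pos", "ref", "alt")  # shared variant key in the target schema


def normalize(record: dict, mapping: Dict[str, str]) -> dict:
    """Rename a source's field names to the shared schema, dropping unmapped fields."""
    return {target: record[source] for source, target in mapping.items() if source in record}


def build_knowledge_base(sources: Iterable[Tuple[str, List[dict], Dict[str, str]]]) -> Dict[tuple, dict]:
    """Merge records from several sources into one entry per variant.
    On conflicting attributes, the source listed later wins (a simplifying assumption)."""
    kb: Dict[tuple, dict] = {}
    for source_name, records, mapping in sources:
        for record in records:
            row = normalize(record, mapping)
            key = tuple(row[field] for field in KEY_FIELDS)
            entry = kb.setdefault(key, {"sources": []})
            entry["sources"].append(source_name)
            entry.update({k: v for k, v in row.items() if k not in KEY_FIELDS})
    return kb


db_a = [{"chromosome": "1", "position": 100, "ref": "A", "alt": "G", "significance": "pathogenic"}]
db_b = [{"chr": "1", "start": 100, "reference": "A", "alternate": "G", "gene": "GENE_A"}]
kb = build_knowledge_base([
    ("db_a", db_a, {"chromosome": "chrom", "position": "pos", "ref": "ref", "alt": "alt",
                    "significance": "clinical_significance"}),
    ("db_b", db_b, {"chr": "chrom", "start": "pos", "reference": "ref", "alternate": "alt",
                    "gene": "gene"}),
])
print(kb[("1", 100, "A", "G")])
# {'sources': ['db_a', 'db_b'], 'clinical_significance': 'pathogenic', 'gene': 'GENE_A'}
```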

        “Let’s identify genomic roots and optimal treatments before
                  the patient wakes up from anaesthesia”
Thank you for your interest!
     Keep in touch with us.
16




      Prof. Dr. Christoph Meinel                                          Matthieu-P. Schapranow, M.Sc.
      office-meinel@hpi.uni-potsdam.de                                  schapranow@hpi.uni-potsdam.de
      http://www.hpi.uni-potsdam.de/meinel/team/christoph_meinel.html                 http://j.mp/schapranow




                                                                           Hasso Plattner Institute
                                                       Enterprise Platform & Integration Concepts
                                                                           Matthieu-P. Schapranow
                                                                             August-Bebel-Str. 88
                                                                         14482 Potsdam, Germany


Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 

Dernier (20)

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 

Real-time Analysis of Next Generation Sequencing Data

  • 1. Real-time Analysis of Next Generation Sequencing Data. World Health Summit, Oct 24, 2012. Prof. Dr. Christoph Meinel, Matthieu Schapranow, Hasso Plattner Institute
  • 2. Genome Sequencing: Do you have enough time? (Image taken from http://portal.ccg.uni-koeln.de/ccg/assets/images/3730.jpg)
  • 3. The Archon Genomics X Prize: “$10 million will be awarded to the first team to rapidly, accurately and economically sequence 100 whole human genomes to an unprecedented level of accuracy.” (Archon Genomics X Prize, 2012)
  • 4. Agenda ■ Conventional Medicine ■ Personalized Medicine ■ Challenges of Genome Data Analysis ■ High-Performance In-Memory Genome Project
  • 5. Conventional Medicine [Chart: share of women and men who will develop cancer vs. will never develop cancer, on a 0-100% scale (American Cancer Society, Surveillance Research, 2012). Chart: share of chemotherapies that work vs. fail.]
  • 6. Personalized Medicine: “Personalized medicine aims at treating patients specifically based on their individual dispositions, e.g. genetic or environmental factors” (K. Jain, Textbook of Personalized Medicine. Springer, 2009) ■ Enhanced by: world-wide medical research activities ■ Limiting factor: research results in heterogeneously formatted, distributed databases
  • 7. Personalized Medicine [Diagram: patient suffering from cancer → DNA sequencing → analysis of genomic data → treatment decision, alongside the conventional therapy path] ■ DNA sequencing: quantity 3.2 billion base pairs; data size 1-20 GB ■ Analysis of genomic data: known mutations 80M; distinct genes 20k-25k; proteins 50k-300k; data sizes: alignment 5-10 GB, variants 10-100 GB [Chart: duration in days (0-40) of personalized medicine as of today vs. supported by HPI]
  • 8. Challenges of Genome Data Analysis
    Analysis of genomic data
                  Alignment and variant calling   Analysis of annotations in world-wide DBs
    Bound to      CPU performance                 Memory capacity
    Duration      Hours                           Weeks
    HPI           Minutes                         Real-time
    Approach      Multi-core                      Partitioning & compression
    Both rely on in-memory technology.
  • 10. High-Performance In-Memory Genome Project: Real-time Analysis of Genome Data
  • 11. High-Performance In-Memory Genome Project: It is your time ■ ~10 GB of FASTQ files, i.e. ~45M reads from the 1000 Genomes Project ■ ~400k-700k variants detected ■ Aligners: BWA, Bowtie, Bowtie2, TMAP ■ ~45 min for alignment and variant calling ■ Analysis of results: interactive exploration in real time
  • 12. High-Performance In-Memory Genome Project: Architecture
  • 13. High-Performance In-Memory Genome Project: In-Memory Technology [Diagram: read event repositories handling up to 8,000 read event notifications per second and verification services handling up to 2,000 requests per second, together with discovery services, running on SAP HANA] ■ In-memory technology features: combined column and row store; minimal projections; any attribute as index; insert only for time travel; multi-core/parallelization; bulk load; active/passive data store; lightweight compression; partitioning; dynamic multi-threading within nodes; SQL interface on columns & rows; analytics on historical data; no aggregate tables; single and multi-tenancy; reduction of layers; object to relational mapping; on-the-fly extensibility; text retrieval and extraction; map reduce; group key; no disk
  • 14. High-Performance In-Memory Genome Project: Hardware Characteristics at FSOC-Lab ■ 1,000-core cluster at Hasso Plattner Institute with 25 TB main memory ■ Consists of 25 nodes, each with: □ 40 cores □ 1 TB main memory □ Intel® Xeon® E7-4870 □ 2.40 GHz □ 30 MB cache
  • 15. What to take home? ■ Sequencing machines are becoming faster, smaller, and cheaper, and they generate immense data sets in heterogeneous formats ■ IT is the key to exploring and analyzing these big data sets ■ Parallelization reduces the processing time for genome data ■ In-memory technology enables real-time analysis and interactive exploration of genome data ■ We integrate research results from international research databases into a single knowledge base ■ “Let’s identify genomic roots and optimal treatments before the patient wakes up from anaesthesia”
  • 16. Thank you for your interest! Keep in contact with us. Prof. Dr. Christoph Meinel (office-meinel@hpi.uni-potsdam.de, http://www.hpi.uni-potsdam.de/meinel/team/christoph_meinel.html); Matthieu-P. Schapranow, M.Sc. (schapranow@hpi.uni-potsdam.de, http://j.mp/schapranow). Hasso Plattner Institute, Enterprise Platform & Integration Concepts, August-Bebel-Str. 88, 14482 Potsdam, Germany
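Slide 7 sizes a human genome at 3.2 billion base pairs. As a back-of-the-envelope illustration, not the project's storage method: with only four nucleotides, each base fits in 2 bits, so the bare sequence packs into about 0.8 GB. One reason raw sequencing output is larger is that reads also carry per-base quality scores and cover each position many times. The encoding table and the function names below are hypothetical.

```python
# Minimal sketch: 2-bit packing of a DNA string. The A/C/G/T -> 0..3 table
# is an illustrative assumption, not the project's storage format.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"  # BASES[code] inverts CODE


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte.

    Bases other than A, C, G, T (e.g. N) raise KeyError; real formats
    need a separate mechanism for them.
    """
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        # Base i lands in byte i // 4 at bit offset 2 * (i % 4).
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Inverse of pack(); length is needed because the last byte may be padded."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length))


def packed_size_gb(num_bases: int) -> float:
    """Storage for num_bases at 2 bits per base, in decimal GB (10**9 bytes)."""
    return ((num_bases + 3) // 4) / 1e9


if __name__ == "__main__":
    print(packed_size_gb(3_200_000_000))  # 0.8 GB for 3.2 billion base pairs
    print(unpack(pack("GATTACA"), 7))     # GATTACA
```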
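Slide 8 attributes the speed-up of alignment and variant calling to multi-core processing, and slide 13 lists map reduce among the in-memory features. The sketch below shows the general map-reduce pattern on partitioned reads, with k-mer counting as a hypothetical stand-in workload; it is not the project's alignment pipeline. Because each map call touches only its own partition, the calls could be dispatched to separate cores; here they run sequentially to keep the example self-contained.

```python
# Minimal map-reduce sketch: count k-mers across partitioned reads.
# k-mer counting is a hypothetical stand-in for the CPU-bound alignment step.
from collections import Counter
from functools import reduce


def partition(reads, num_parts):
    """Split reads round-robin into num_parts independent partitions."""
    return [reads[i::num_parts] for i in range(num_parts)]


def map_kmers(reads, k):
    """Map step: count every k-mer occurring in one partition."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def reduce_counts(a, b):
    """Reduce step: merge two partial counts."""
    return a + b


def count_kmers(reads, k, num_parts=4):
    # Each map call reads only its own partition, so the calls are independent
    # and could run on separate cores; they run sequentially here.
    partials = [map_kmers(p, k) for p in partition(reads, num_parts)]
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    print(count_kmers(["ACGT", "CGTA", "GTAC"], k=2, num_parts=2))
```

Replacing the list comprehension with a process pool's `map` (for example `multiprocessing.Pool.map` with `functools.partial(map_kmers, k=k)`) would run the partitions in parallel without changing the reduce step.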
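Slide 11 starts from about 10 GB of FASTQ files. A FASTQ record consists of four lines: an `@` header with the read name, the base sequence, a `+` separator, and one quality character per base. The following is a minimal reader sketch assuming unwrapped four-line records and Phred+33 quality encoding; the names `Read` and `parse_fastq` are hypothetical and nothing here reflects the project's own code.

```python
# Minimal FASTQ reader sketch: assumes 4-line, unwrapped records and
# Phred+33 quality encoding (the common Sanger/Illumina 1.8+ convention).
import io
from typing import Iterator, List, NamedTuple, TextIO


class Read(NamedTuple):
    name: str
    seq: str
    quals: List[int]  # Phred scores, one per base


def parse_fastq(handle: TextIO) -> Iterator[Read]:
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input (or a trailing blank line)
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Phred+33: quality character c encodes score ord(c) - 33.
        yield Read(header[1:], seq, [ord(c) - 33 for c in qual])


if __name__ == "__main__":
    data = "@r1\nGATTACA\n+\nIIIII##\n@r2\nACGT\n+\n!!!!\n"
    for read in parse_fastq(io.StringIO(data)):
        print(read.name, read.seq, read.quals)
```

Reading lazily record by record keeps memory use independent of file size, which matters at the ~45M-read scale mentioned on the slide.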
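Slide 13 lists lightweight compression among the in-memory features. Dictionary encoding is a widely used form of lightweight compression in column stores: each distinct value is stored once, and every row holds a small integer code. The slide does not say which compression schemes the project used; the sketch below assumes a sorted dictionary and a hypothetical chromosome column.

```python
# Minimal sketch of dictionary encoding for one column. The column
# contents and function names are hypothetical.
from bisect import bisect_left
from typing import List, Sequence, Tuple


def dict_encode(column: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Return (sorted dictionary of distinct values, one integer code per row)."""
    dictionary = sorted(set(column))
    # A sorted dictionary lets bisect map a value to its code in O(log n).
    codes = [bisect_left(dictionary, v) for v in column]
    return dictionary, codes


def scan_equals(dictionary: List[str], codes: List[int], value: str) -> List[int]:
    """Row ids where column == value, comparing integer codes instead of strings."""
    i = bisect_left(dictionary, value)
    if i == len(dictionary) or dictionary[i] != value:
        return []  # value absent from the dictionary: no row can match
    return [row for row, code in enumerate(codes) if code == i]


if __name__ == "__main__":
    d, c = dict_encode(["chr2", "chr1", "chr2", "chrX", "chr1"])
    print(d, c)
    print(scan_equals(d, c, "chr2"))
```

Keeping the dictionary sorted preserves value order in the codes, so range predicates can also be evaluated on codes; an unsorted dictionary would make appending new values cheaper at the cost of that property.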
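The totals on slide 14 follow directly from the per-node figures. The short check below multiplies them out and derives the main memory available per core, using decimal units (1 TB = 1,000 GB) as a simplifying assumption.

```python
# Back-of-the-envelope check of the FSOC-Lab cluster figures on slide 14.
NODES = 25
CORES_PER_NODE = 40
MEMORY_PER_NODE_TB = 1

total_cores = NODES * CORES_PER_NODE
total_memory_tb = NODES * MEMORY_PER_NODE_TB
# Decimal units (1 TB = 1,000 GB) for simplicity.
memory_per_core_gb = MEMORY_PER_NODE_TB * 1000 / CORES_PER_NODE

print(total_cores, total_memory_tb, memory_per_core_gb)  # 1000 25 25.0
```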