Computing for the Analysis
           of Genomic Data at CRS4


                        Chris Jones
                         24th March 2010


Thursday 25 March 2010
Who is Chris Jones?

        • 10 years of particle physics research at Oxford
          and CERN in Geneva
        • Strong interest in the use of computers to do
          things, especially science, BETTER
        • The ’70s brought digital detectors and
          massive waves of new data to particle physics,
          causing exciting major changes in the use of, and
          attitude towards, computers
        • 20 years of innovating, building, developing and
          running services in the CERN Computer Centre
          Facility
Wellcome Trust Genome Campus


        • Escaped on sabbatical to European
          Bioinformatics Institute – EBI
        • Strong links to Sanger Institute
        • And to Roche – Roche Genetics IT Plan
        • Founded the PRISM Forum




Why Sequence Genomes?

        • I hope Francesco has explained that very well
        • Genomic sequence is the most fundamental
          information, the starting point, when you look at
          how living objects work…
        • And studies of “genotype” versus “phenotype” can
          bring us an understanding of the origins of
          disease which has been completely out of reach
          until now
        • The technology is just becoming available…

DNA sequence and genes look like…
         cacaattacttccacaaatgcagtt
         gaagcttctactcttcttgcatagg
         taacctgagtcggagcagttttcct
         cgtggcttcatctttggtgctggat
         cttcagcataccaatttgaaggtgc
         agtaaacgaaggcggtagaggacca
         agtatttgggataccttcacccata
         aatatccagaaaaaataagggatgg
         aagcaatgcagacatcacggttgc
The Human Genome

        • The nucleotide bases are:
          a- adenine, c- cytosine, g- guanine, t- thymine
        • It took 15 years for the first human genome sequence
        • Which was released between 2003 - 2005
        • There are 3 × 10^9 bases, or 3 Gigabases, in the human genome
        • Pine trees have ~10 times more bases ! Why?
        • Do not confuse Gb (gigabits), GB (gigabytes) and Gbases (also written Gb)!
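
To make the unit distinction concrete, here is a minimal sketch. It assumes a plain 2-bits-per-base encoding (four possible bases, no quality scores), which is an illustrative assumption and not a storage format named in the talk; the function name `bases_to_units` is hypothetical.

```python
# Illustrative only: 2 bits per base is an assumed encoding (4 symbols: a, c, g, t).
GENOME_BASES = 3 * 10**9   # 3 Gbases, as stated on the slide
BITS_PER_BASE = 2          # assumption: log2(4) bits, no quality information


def bases_to_units(n_bases: int, bits_per_base: int = BITS_PER_BASE) -> dict:
    """Express a number of bases in gigabases, gigabits and gigabytes."""
    bits = n_bases * bits_per_base
    return {
        "Gbases": n_bases / 1e9,   # count of bases
        "Gbits": bits / 1e9,       # storage in bits
        "GBytes": bits / 8 / 1e9,  # storage in bytes
    }


units = bases_to_units(GENOME_BASES)
print(units)
```

Under that assumption the same genome is 3 Gbases, 6 Gbits and 0.75 GBytes: three different numbers for one sequence, which is why the abbreviations must not be mixed up.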




Genome Analyzer IIx


                                      In Edificio 3
                                      Two GAIIx machines
                                      Each of which:
                                      40 Gbases / run
                                      Paired end reads
                                      4 Gbases / day
                                      but which are complex
                                       and forefront
                                       technology...
Genome Analyzer IIx
               Preparation Workflow


       Sample Prep




                                                 Pipeline Analysis



Genome Analyzer IIx
                        FlowCell




         8 Lanes
         120 Tiles (2 cols 60 tiles)
         4 pictures per tile (A, T, G, C fluorescence channels)
         On each tile ~220k clusters
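
The flowcell geometry above multiplies out as follows. This is a back-of-envelope sketch using only the slide's figures; it assumes the 120 tiles are per lane (as the later data-volume slide implies) and the constant names are hypothetical.

```python
# Flowcell geometry from the slide; "per lane" for the tile count is an assumption.
LANES = 8
TILES_PER_LANE = 120         # 2 columns x 60 tiles
IMAGES_PER_TILE = 4          # one picture per base channel
CLUSTERS_PER_TILE = 220_000  # approximate figure from the slide

tiles = LANES * TILES_PER_LANE              # tiles on one flowcell
clusters = tiles * CLUSTERS_PER_TILE        # clusters imaged per flowcell
images_per_cycle = tiles * IMAGES_PER_TILE  # pictures taken in each cycle

print(f"{tiles} tiles, ~{clusters:,} clusters, {images_per_cycle} images per cycle")
```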


How much data per run?

       • 7.3 MBytes image data per tile * 120 tiles * 8
         lanes ≈ 7 000 MBytes = 7 GigaBytes
       • * 4 images per cycle (one per base) * read length
         (say 100 cycles) = 2 800 GBytes or 2.8 TeraBytes (TB)
       • * 2 for the paired end = 5.6 TBytes
       • A run of ~1 week on both machines results
         in 11.2 TeraBytes of image data
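
The chain of multiplications above can be checked with a short sketch. It uses the slide's figures and assumes decimal units (1 GB = 1000 MB, 1 TB = 1000 GB); the function and key names are hypothetical.

```python
# Figures from the slide; decimal units are an assumption.
MB_PER_TILE = 7.3      # image data per tile
TILES_PER_LANE = 120
LANES = 8
IMAGES_PER_CYCLE = 4   # one image per base channel
READ_LENGTH = 100      # cycles ("say 100")
PAIRED_END = 2         # both ends of each fragment are read
MACHINES = 2


def run_image_volume(read_length: int = READ_LENGTH) -> dict:
    """Return the image-data volume at each step of the slide's estimate."""
    flowcell_gb = MB_PER_TILE * TILES_PER_LANE * LANES / 1000
    single_read_tb = flowcell_gb * IMAGES_PER_CYCLE * read_length / 1000
    paired_end_tb = single_read_tb * PAIRED_END
    return {
        "flowcell_gb": flowcell_gb,
        "single_read_tb": single_read_tb,
        "paired_end_tb": paired_end_tb,
        "both_machines_tb": paired_end_tb * MACHINES,
    }


volumes = run_image_volume()
for name, value in volumes.items():
    print(f"{name}: {value:.2f}")
```

The unrounded product is 7 008 MBytes per image set, so the results come out marginally above the slide's rounded 2.8, 5.6 and 11.2 TBytes.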


Keeping the raw data?

        • If we run for ~40 weeks a year we have
          nearly 0.5 PetaBytes (1 PB = 10^15 Bytes or
          1 000 000 000 000 000 Bytes)
        • But if we throw the images away there is no
          chance to recover more Sequence Data
          from the images when a better (promised)
          algorithm comes along…
        • So biology now faces the problem the
          physicists faced 35 years ago
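
A quick check of the yearly figure, as a sketch: it takes the 11.2 TBytes per weekly run on both machines from the previous slide and the ~40 running weeks stated above, with decimal units assumed. The constant names are hypothetical.

```python
# Inputs from the slides; decimal units (1 TB = 10**12 bytes) are an assumption.
TB_PER_WEEK = 11.2       # image data from both machines per ~1 week run
WEEKS_PER_YEAR = 40      # running weeks per year
BYTES_PER_TB = 10**12
BYTES_PER_PB = 10**15

yearly_bytes = TB_PER_WEEK * WEEKS_PER_YEAR * BYTES_PER_TB
yearly_pb = yearly_bytes / BYTES_PER_PB

print(f"~{yearly_pb:.2f} PB of raw images per year")
```

That is about 0.45 PB, which the slide rounds to "nearly 0.5 PetaBytes".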
Genome Analyzer IIx
                 Cluster generation

      Attach single molecules to surface
      Amplify to form clusters




                           10^3 molecules / µm



                         2.2·10^5 molecules/tile

Genome Analyzer IIx
                        Base Calling




                                   •   The identity of each base of each cluster is read off from
                                       sequential images (cycle by cycle)
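
A toy sketch of the cycle-by-cycle idea follows. It assumes each cluster yields four intensity values per cycle (one per fluorescence channel) and simply picks the brightest one. This is not Illumina's base-calling algorithm, which also has to deal with channel cross-talk, phasing and noise; the function name, the channel order, the threshold and the intensities are all hypothetical.

```python
from typing import Sequence

CHANNELS = "ACGT"  # assumed channel order, for illustration only


def call_bases(intensities: Sequence[Sequence[float]], min_signal: float = 0.0) -> str:
    """Return one base per cycle by picking the brightest channel.

    intensities[cycle] holds four values ordered as CHANNELS. A cycle whose
    best signal does not exceed min_signal is reported as 'N' (a blank call).
    """
    read = []
    for cycle in intensities:
        if len(cycle) != len(CHANNELS):
            raise ValueError("expected four channel intensities per cycle")
        best = max(range(len(CHANNELS)), key=lambda i: cycle[i])
        read.append(CHANNELS[best] if cycle[best] > min_signal else "N")
    return "".join(read)


# Made-up intensities for one cluster over four cycles.
cluster = [
    [0.9, 0.1, 0.0, 0.2],  # brightest channel: A
    [0.1, 0.8, 0.1, 0.0],  # brightest channel: C
    [0.0, 0.0, 0.0, 0.0],  # no signal: blank call
    [0.1, 0.2, 0.1, 0.7],  # brightest channel: T
]
print(call_bases(cluster))
```

Repeating this for every cluster on every tile turns the stack of images into the short reads shown on the next slide.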


Illumina Pipeline




                                            ACTGCTATCTT
                                            TCGATTCGTAC
                                            TGCTAGGCACC
                                            ATCGCATTTCA
                                            GGACGTCCTGC
                                            TAGGCACCATC
                                            GCATCTCCATC



Experiment Timeline


          GA IIx Start                 Day 1
          Illumina Pipeline            Day 10
          BWA and Yun LI workflow      Day 13
          Quality-Check Tools          Day 15

          Timing for a 115-cycle experiment on GA IIx


How much computing?

           A software pipeline has been implemented at CRS4 to perform such
            operations automatically after a sequencing run ends
           40 Gbases per run
           370,000,000 sequences
           4 samples per flowcell
           7,000,000 megabytes of raw data produced per run
           5 days for processing sequence-data on the cluster



           A huge load for the computer centre


Quality Control
        • We realised we needed an audit by external experts
          of how well we were doing (or how badly)
        • We asked experts from the Sanger Institute and from
          Cancer Research, Cambridge, UK
        • We developed a Quality check process:
                        −   Qualitative and quantitative evaluation of Illumina
                            summary file parameters
                        −   Evaluation of sequence quality (avg. number of
                            “blank” base calls)
                        −   Evaluation of coverage / holes
                        −   Evaluation of the ratio of known SNPs to all SNPs found
        • This has been very successful
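
Three of the checks listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the CRS4 or Sanger QC code: blank calls are assumed to appear as `N` in the reads, coverage is assumed to be available as a per-position depth list, and SNPs are assumed to be comparable by identifier. All function names and sample data are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def avg_blank_calls(reads: Sequence[str]) -> float:
    """Average number of blank ('N') base calls per read."""
    if not reads:
        return 0.0
    return sum(read.upper().count("N") for read in reads) / len(reads)


def coverage_and_holes(depth: Sequence[int]) -> Tuple[float, List[Tuple[int, int]]]:
    """Return the mean depth and the zero-depth intervals as half-open (start, end)."""
    holes: List[Tuple[int, int]] = []
    start = None
    for pos, d in enumerate(depth):
        if d == 0 and start is None:
            start = pos                 # a hole opens here
        elif d != 0 and start is not None:
            holes.append((start, pos))  # the hole closed just before pos
            start = None
    if start is not None:
        holes.append((start, len(depth)))
    mean = sum(depth) / len(depth) if depth else 0.0
    return mean, holes


def known_snp_ratio(found: Iterable[str], known: Iterable[str]) -> float:
    """Fraction of the SNPs found that are already in the known set."""
    found_set = set(found)
    if not found_set:
        return 0.0
    return len(found_set & set(known)) / len(found_set)


# Made-up inputs, for illustration only.
print(avg_blank_calls(["ACGTN", "NNACG", "ACGTA"]))
print(coverage_and_holes([2, 3, 0, 0, 1, 0]))
print(known_snp_ratio(["snp1", "snp2", "snp3", "novel1"], ["snp1", "snp2", "snp3", "snp9"]))
```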
Quality Check: Weekly Team Meeting

           Qualitative and quantitative evaluation of
            Illumina summary file parameters:
                        −   Based on Sanger QC protocol
                        −   Quantitative examination of run results
                        −   Qualitative inspection of plots



Summary of results

           In October 2008 we foresaw 6 Gbases per run per machine
           We started at the end of February 2009
           We started a Quality Control initiative in Sept. 2009
           We have continuously improved number of bases per run:
                           Upgrades of machines
                           Preparation of samples (reagents, PCR)
                           Increasing number of cycles
                           New algorithms for image processing and base-calling –
                            better alignment software
                           Quality control




Activity summary - statistics


           67 samples sequenced and aligned
           6 samples currently running on the GAs
           Average coverage of samples 2.98X
           ~800 Gbases of raw data
           ~590 Gbases of aligned data
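
As a rough cross-check of these statistics, the sketch below divides the aligned bases by the number of samples and by the 3 Gbase genome size quoted earlier. It assumes coverage is defined as aligned bases divided by genome length and that all 67 samples share the aligned total, neither of which the slide states; the variable names are hypothetical.

```python
# Rounded figures from the slides; the coverage definition is an assumption.
SAMPLES = 67
RAW_GBASES = 800       # "~800 Gbases of raw data"
ALIGNED_GBASES = 590   # "~590 Gbases of aligned data"
GENOME_GBASES = 3      # human genome size from the earlier slide

aligned_fraction = ALIGNED_GBASES / RAW_GBASES
mean_coverage = ALIGNED_GBASES / (SAMPLES * GENOME_GBASES)

print(f"aligned fraction: {aligned_fraction:.2f}")
print(f"estimated mean coverage: {mean_coverage:.2f}X")
```

With these rounded inputs the estimate is about 2.9X, close to the 2.98X reported; the difference is within what the "~" on the totals allows.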



Imputation

        • Program from Gonçalo Abecasis and Serena Sanna
        • Very powerful tool in the analysis of population genetics
        • Extrapolate measured data to infer more genomic
          variations that you have not measured
        • Excellent e-Science: using the computer to do better
          science
        • This certainly merits a seminar to itself




Plans and Visions

        • Illumina has announced its latest sequencers, which will
          measure 200 Gbases in a run of 8 days
        • 5 times our current performance in 20% less time
        • Easy to predict 400 or 600 Gbases: 10 to 15 times as
          much data per run
        • For the plans to sequence 2000 Sardinians together with
          NIH and with University at Ann Arbor, and also for other
          requests from the Park and from Sardinia, we would like
          to acquire some of these new machines
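
The comparison with the current machines can be worked through as a sketch. The current run length is not stated directly; it is derived here from the earlier GAIIx figures (40 Gbases per run at 4 Gbases per day), which is an assumption. The variable names are hypothetical.

```python
# Current GAIIx figures from the earlier slide.
CURRENT_GBASES_PER_RUN = 40
CURRENT_GBASES_PER_DAY = 4
# Announced sequencer figures from this slide.
NEW_GBASES_PER_RUN = 200
NEW_DAYS_PER_RUN = 8

# Assumption: run length = bases per run / bases per day.
current_days_per_run = CURRENT_GBASES_PER_RUN / CURRENT_GBASES_PER_DAY

run_ratio = NEW_GBASES_PER_RUN / CURRENT_GBASES_PER_RUN
time_saving = 1 - NEW_DAYS_PER_RUN / current_days_per_run
per_day_ratio = (NEW_GBASES_PER_RUN / NEW_DAYS_PER_RUN) / CURRENT_GBASES_PER_DAY

print(f"{run_ratio:.0f}x the data per run, {time_saving:.0%} less time, "
      f"{per_day_ratio:.2f}x the daily throughput")
```

Under that assumption a current run lasts 10 days, so 200 Gbases in 8 days is 5 times the data in 20% less time, or 6.25 times the daily throughput.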




My personal view

     •   This is an opportunity for Sardinia to play frontier science on a world stage
     •   It exploits the Sardinian genomic heritage and its increased “signal to
         noise” to find the origins and mechanisms of diseases that affect people
         around the world,
     •   and which ultimately cost Sardinia (and the rest of humanity) a lot of
         money
     •   It is driven by a predominantly Sardinian team doing excellent work
     •   It necessarily binds together the strong computer centre of CRS4 and
         modern digital sequencing technology to build a forefront Sequencing
         Facility
     •   If we don’t do this now we will lose a golden opportunity for ever
     •   Where else would you set up such a Facility?



Thank you for your attention!




                                                        34
giovedì 25 marzo 2010

Contenu connexe

Plus de CRS4 Research Center in Sardinia

Sequenziamento Esomico. Maria Valentini (CRS4), Cagliari, 18 Novembre 2015
Sequenziamento Esomico. Maria Valentini (CRS4), Cagliari, 18 Novembre 2015Sequenziamento Esomico. Maria Valentini (CRS4), Cagliari, 18 Novembre 2015
Sequenziamento Esomico. Maria Valentini (CRS4), Cagliari, 18 Novembre 2015CRS4 Research Center in Sardinia
 
Near Surface Geoscience Conference 2015, Turin - A Spatial Velocity Analysis ...
Near Surface Geoscience Conference 2015, Turin - A Spatial Velocity Analysis ...Near Surface Geoscience Conference 2015, Turin - A Spatial Velocity Analysis ...
Near Surface Geoscience Conference 2015, Turin - A Spatial Velocity Analysis ...CRS4 Research Center in Sardinia
 
GIS partecipativo. Laura Muscas e Valentina Spanu (CRS4), Cagliari, 21 Ottobr...
GIS partecipativo. Laura Muscas e Valentina Spanu (CRS4), Cagliari, 21 Ottobr...GIS partecipativo. Laura Muscas e Valentina Spanu (CRS4), Cagliari, 21 Ottobr...
GIS partecipativo. Laura Muscas e Valentina Spanu (CRS4), Cagliari, 21 Ottobr...CRS4 Research Center in Sardinia
 
Alfonso Damiano (Università di Cagliari) ICT per Smart Grid
Alfonso Damiano (Università di Cagliari) ICT per Smart Grid Alfonso Damiano (Università di Cagliari) ICT per Smart Grid
Alfonso Damiano (Università di Cagliari) ICT per Smart Grid CRS4 Research Center in Sardinia
 
Dinamica Molecolare e Modellistica dell'interazione di lipidi col recettore P...
Dinamica Molecolare e Modellistica dell'interazione di lipidi col recettore P...Dinamica Molecolare e Modellistica dell'interazione di lipidi col recettore P...
Dinamica Molecolare e Modellistica dell'interazione di lipidi col recettore P...CRS4 Research Center in Sardinia
 
Innovazione e infrastrutture cloud per lo sviluppo di applicativi web e mobil...
Innovazione e infrastrutture cloud per lo sviluppo di applicativi web e mobil...Innovazione e infrastrutture cloud per lo sviluppo di applicativi web e mobil...
Innovazione e infrastrutture cloud per lo sviluppo di applicativi web e mobil...CRS4 Research Center in Sardinia
 
ORDBMS e NoSQL nel trattamento dei dati geografici parte seconda. 30 Sett. 2015
ORDBMS e NoSQL nel trattamento dei dati geografici parte seconda. 30 Sett. 2015ORDBMS e NoSQL nel trattamento dei dati geografici parte seconda. 30 Sett. 2015
ORDBMS e NoSQL nel trattamento dei dati geografici parte seconda. 30 Sett. 2015CRS4 Research Center in Sardinia
 
Sistemi No-Sql e Object-Relational nella gestione dei dati geografici 30 Sett...
Sistemi No-Sql e Object-Relational nella gestione dei dati geografici 30 Sett...Sistemi No-Sql e Object-Relational nella gestione dei dati geografici 30 Sett...
Sistemi No-Sql e Object-Relational nella gestione dei dati geografici 30 Sett...CRS4 Research Center in Sardinia
 
Elementi di sismica a riflessione e Georadar (Gian Piero Deidda, UNICA)
Elementi di sismica a riflessione e Georadar (Gian Piero Deidda, UNICA)Elementi di sismica a riflessione e Georadar (Gian Piero Deidda, UNICA)
Elementi di sismica a riflessione e Georadar (Gian Piero Deidda, UNICA)CRS4 Research Center in Sardinia
 
Near Surface Geoscience Conference 2014, Athens - Real-­time or full­‐precisi...
Near Surface Geoscience Conference 2014, Athens - Real-­time or full­‐precisi...Near Surface Geoscience Conference 2014, Athens - Real-­time or full­‐precisi...
Near Surface Geoscience Conference 2014, Athens - Real-­time or full­‐precisi...CRS4 Research Center in Sardinia
 
Luigi Atzori Metabolomica: Introduzione e review di alcune applicazioni in am...
Luigi Atzori Metabolomica: Introduzione e review di alcune applicazioni in am...Luigi Atzori Metabolomica: Introduzione e review di alcune applicazioni in am...
Luigi Atzori Metabolomica: Introduzione e review di alcune applicazioni in am...CRS4 Research Center in Sardinia
 

Plus de CRS4 Research Center in Sardinia (20)

The future is close
The future is closeThe future is close
The future is close
 
The future is close
The future is closeThe future is close
The future is close
 
Presentazione Linea B2 progetto Tutti a Iscol@ 2017
Presentazione Linea B2 progetto Tutti a Iscol@ 2017Presentazione Linea B2 progetto Tutti a Iscol@ 2017
Presentazione Linea B2 progetto Tutti a Iscol@ 2017
 
Iscola linea B 2016
Iscola linea B 2016Iscola linea B 2016
Iscola linea B 2016
 
Sequenziamento Esomico. Maria Valentini (CRS4), Cagliari, 18 Novembre 2015
Sequenziamento Esomico. Maria Valentini (CRS4), Cagliari, 18 Novembre 2015Sequenziamento Esomico. Maria Valentini (CRS4), Cagliari, 18 Novembre 2015
Sequenziamento Esomico. Maria Valentini (CRS4), Cagliari, 18 Novembre 2015
 
Near Surface Geoscience Conference 2015, Turin - A Spatial Velocity Analysis ...
Near Surface Geoscience Conference 2015, Turin - A Spatial Velocity Analysis ...Near Surface Geoscience Conference 2015, Turin - A Spatial Velocity Analysis ...
Near Surface Geoscience Conference 2015, Turin - A Spatial Velocity Analysis ...
 
GIS partecipativo. Laura Muscas e Valentina Spanu (CRS4), Cagliari, 21 Ottobr...
GIS partecipativo. Laura Muscas e Valentina Spanu (CRS4), Cagliari, 21 Ottobr...GIS partecipativo. Laura Muscas e Valentina Spanu (CRS4), Cagliari, 21 Ottobr...
GIS partecipativo. Laura Muscas e Valentina Spanu (CRS4), Cagliari, 21 Ottobr...
 
Alfonso Damiano (Università di Cagliari) ICT per Smart Grid
Alfonso Damiano (Università di Cagliari) ICT per Smart Grid Alfonso Damiano (Università di Cagliari) ICT per Smart Grid
Alfonso Damiano (Università di Cagliari) ICT per Smart Grid
 
Big Data Infrastructures - Hadoop ecosystem, M. E. Piras
Big Data Infrastructures - Hadoop ecosystem, M. E. PirasBig Data Infrastructures - Hadoop ecosystem, M. E. Piras
Big Data Infrastructures - Hadoop ecosystem, M. E. Piras
 
Big Data Analytics, Giovanni Delussu e Marco Enrico Piras
 Big Data Analytics, Giovanni Delussu e Marco Enrico Piras  Big Data Analytics, Giovanni Delussu e Marco Enrico Piras
Big Data Analytics, Giovanni Delussu e Marco Enrico Piras
 
Dinamica Molecolare e Modellistica dell'interazione di lipidi col recettore P...
Dinamica Molecolare e Modellistica dell'interazione di lipidi col recettore P...Dinamica Molecolare e Modellistica dell'interazione di lipidi col recettore P...
Dinamica Molecolare e Modellistica dell'interazione di lipidi col recettore P...
 
Innovazione e infrastrutture cloud per lo sviluppo di applicativi web e mobil...
Innovazione e infrastrutture cloud per lo sviluppo di applicativi web e mobil...Innovazione e infrastrutture cloud per lo sviluppo di applicativi web e mobil...
Innovazione e infrastrutture cloud per lo sviluppo di applicativi web e mobil...
 
ORDBMS e NoSQL nel trattamento dei dati geografici parte seconda. 30 Sett. 2015
ORDBMS e NoSQL nel trattamento dei dati geografici parte seconda. 30 Sett. 2015ORDBMS e NoSQL nel trattamento dei dati geografici parte seconda. 30 Sett. 2015
ORDBMS e NoSQL nel trattamento dei dati geografici parte seconda. 30 Sett. 2015
 
Sistemi No-Sql e Object-Relational nella gestione dei dati geografici 30 Sett...
Sistemi No-Sql e Object-Relational nella gestione dei dati geografici 30 Sett...Sistemi No-Sql e Object-Relational nella gestione dei dati geografici 30 Sett...
Sistemi No-Sql e Object-Relational nella gestione dei dati geografici 30 Sett...
 
Elementi di sismica a riflessione e Georadar (Gian Piero Deidda, UNICA)
Elementi di sismica a riflessione e Georadar (Gian Piero Deidda, UNICA)Elementi di sismica a riflessione e Georadar (Gian Piero Deidda, UNICA)
Elementi di sismica a riflessione e Georadar (Gian Piero Deidda, UNICA)
 
Near Surface Geoscience Conference 2014, Athens - Real-­time or full­‐precisi...
Near Surface Geoscience Conference 2014, Athens - Real-­time or full­‐precisi...Near Surface Geoscience Conference 2014, Athens - Real-­time or full­‐precisi...
Near Surface Geoscience Conference 2014, Athens - Real-­time or full­‐precisi...
 
SmartGeo/Eiagrid portal (Guido Satta, CRS4)
SmartGeo/Eiagrid portal (Guido Satta, CRS4)SmartGeo/Eiagrid portal (Guido Satta, CRS4)
SmartGeo/Eiagrid portal (Guido Satta, CRS4)
 
Luigi Atzori Metabolomica: Introduzione e review di alcune applicazioni in am...
Luigi Atzori Metabolomica: Introduzione e review di alcune applicazioni in am...Luigi Atzori Metabolomica: Introduzione e review di alcune applicazioni in am...
Luigi Atzori Metabolomica: Introduzione e review di alcune applicazioni in am...
 
Mobile Graphics (part2)
Mobile Graphics (part2)Mobile Graphics (part2)
Mobile Graphics (part2)
 
Mobile Graphics (part1)
Mobile Graphics (part1)Mobile Graphics (part1)
Mobile Graphics (part1)
 


Chris Jones - CRS4 Staff Meeting - Pula (Italy) 24-03-2010

  • 1. Computing for the Analysis of Genomic Data at CRS4 Chris Jones 24th March 2010 1 giovedì 25 marzo 2010
  • 8. Who is Chris Jones? • 10 years of particle physics research at Oxford and at CERN in Geneva • Strong interest in using computers to do things, especially science, BETTER • The ’70s brought digital detectors and massive waves of new data to particle physics, causing exciting major changes in the use of, and attitudes towards, computers • 20 years of innovating, building, developing and running services in the CERN Computer Centre Facility
  • 14. Wellcome Trust Genome Campus • Escaped on sabbatical to the European Bioinformatics Institute (EBI) • Strong links to the Sanger Institute • And to Roche: the Roche Genetics IT Plan • Founded the PRISM Forum
  • 15. Why Sequence Genomes? • I hope Francesco has explained that very well • Genomic sequence is the most fundamental information, the starting point, when you look at how living things work… • And studies of “genotype” versus “phenotype” can bring us an understanding of the origins of disease which has been completely out of reach until now • The technology is just becoming available…
  • 16. DNA sequence and genes look like… cacaattacttccacaaatgcagtt gaagcttctactcttcttgcatagg taacctgagtcggagcagttttcct cgtggcttcatctttggtgctggat cttcagcataccaatttgaaggtgc agtaaacgaaggcggtagaggacca agtatttgggataccttcacccata aatatccagaaaaaataagggatgg aagcaatgcagacatcacggttgc
  • 23. The Human Genome • The nucleotide bases are: a - adenine, c - cytosine, g - guanine, t - thymine • It took 15 years to produce the first human genome sequence • Which was released between 2003 and 2005 • There are 3 × 10^9 bases, or 3 Gigabases, in the human genome • Pine trees have ~10 times more bases! Why? • Do not confuse Gb (gigabits), GB (gigabytes) and Gbases (also written Gb)!
  • 25. Genome Analyzer IIx • In Edificio 3 • Two GAIIx machines • Each of which: 40 Gbases / run, paired-end reads, 4 Gbases / day • but which are complex, forefront technology...
  • 26. Genome Analyzer IIx Preparation Workflow Sample Prep Pipeline Analysis
  • 27. Genome Analyzer IIx FlowCell • 8 lanes • 120 tiles per lane (2 columns of 60 tiles) • 4 pictures per tile (one per A, T, G, C fluorescent label) • ~220k clusters on each tile
  • 32. How much data per run? • 7.3 MBytes of image data per tile * 120 tiles * 8 lanes = ~7 000 MBytes = 7 GigaBytes • * 4 images per tile (one per base) * read length (say 100 cycles) = 2 800 GBytes or 2.8 TeraBytes (TB) • * 2 for the paired end = 5.6 TBytes • A run of ~1 week on both machines results in 11.2 TeraBytes of image data
  • 33. Keeping the raw data? • If we run for ~40 weeks a year we have nearly 0.5 PetaBytes (1 PB = 10^15 Bytes or 1 000 000 000 000 000 Bytes) • But if we throw the images away there is no chance to recover more sequence data from them when a better (promised) algorithm comes along… • So biology now faces the problem the physicists faced 35 years ago
  • 34. Genome Analyzer IIx Cluster generation • Attach single molecules to surface • Amplify to form clusters • 10^3 molecules / µm • 2.2·10^5 molecules/tile
  • 35. Genome Analyzer IIx Base Calling • The identity of each base of each cluster is read off from sequential images (cycle by cycle)
  • 36. Illumina Pipeline ACTGCTATCTT TCGATTCGTAC TGCTAGGCACC ATCGCATTTCA GGACGTCCTGC TAGGCACCATC GCATCTCCATC
  • 37. Experiment Timeline (timing for a 115-cycle experiment on the GA IIx) • Day 1: GA IIx start • Day 10: Illumina Pipeline • Day 13: BWA and Yun Li workflow • Day 15: Quality-Check tools
  • 38. How much computing? • A software pipeline has been implemented at CRS4 to perform these operations automatically after a sequencing run ends • 40 Gbases per run • 370,000,000 sequences • 4 samples per flowcell • 7,000,000 megabytes of raw data produced per run • 5 days to process the sequence data on the cluster • A huge load for the computer centre
  • 39. How much computing?
  • 44. Quality Control • We realised we needed an audit by external experts of how well (or how badly) we were doing • We asked experts from the Sanger Institute and from Cancer Research, Cambridge, UK • We developed a quality check process: − Qualitative and quantitative evaluation of Illumina summary file parameters − Evaluation of sequence quality (avg. number of “blank” base calls) − Evaluation of coverage / holes − Evaluation of the ratio of known SNPs to all SNPs found • This has been very successful
  • 45. Quality Check: Weekly Team Meeting • Qualitative and quantitative evaluation of Illumina summary file parameters: − Based on the Sanger QC protocol − Quantitative examination of run results − Qualitative inspection of plots
  • 46. Summary of results • In October 2008 we foresaw 6 Gbases per run per machine • We started at the end of February 2009 • We started a Quality Control initiative in Sept. 2009 • We have continuously improved the number of bases per run through: − Upgrades of machines − Preparation of samples (reagents, PCR) − Increasing the number of cycles − New algorithms for image processing and base-calling, and better alignment software − Quality control
  • 48. Activity summary - statistics • 67 samples sequenced and aligned • 6 samples currently running on the GAs • Average coverage of samples 2.98X • ~800 Gbases of raw data • ~590 Gbases of aligned data
  • 49. Imputation • Program from Gonçalo Abecasis and Serena Sanna • Very powerful tool for the analysis of population genetics • Extrapolates from measured data to infer further genomic variations that you have not measured • Excellent e-Science: use the computer to do better science • This certainly merits a seminar to itself
  • 50. Plans and Visions • Illumina has announced its latest sequencers, which will measure 200 Gbases in a run of 8 days • 5 times our current performance in 20% less time • Easy to predict 400 or 600 Gbases, that is 10 to 15 times as much data per run • For the plans to sequence 2000 Sardinians together with NIH and with the University at Ann Arbor, and also for other requests from the Park and from Sardinia, we would like to acquire some of these new machines
  • 58. My personal view • This is an opportunity for Sardinia to do frontier science on a world stage • It exploits the Sardinian genomic heritage and its increased “signal to noise” to find the origins and mechanisms of diseases that affect people around the world, • and which ultimately cost Sardinia (and the rest of humanity) a lot of money • It is driven by a predominantly Sardinian team doing excellent work • It necessarily binds together the strong computer centre of CRS4 and modern digital sequencing technology to build a forefront Sequencing Facility • If we don’t do this now we will lose a golden opportunity for ever • Where else would you set up such a Facility?
  • 59. Thank you for your attention!
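
The image-data estimate in the “How much data per run?” and “Keeping the raw data?” slides can be reproduced with a short calculation. The following is a minimal Python sketch and is not part of the original talk: the constant and function names are illustrative, and it assumes decimal units (1 TB = 10^6 MB) and one run per week on each machine for 40 weeks a year. It uses the unrounded 7 008 MB per image set rather than the 7 000 MB quoted on the slide, so its results differ slightly from the rounded slide figures.

```python
# Figures quoted on the slides; all names here are illustrative assumptions.
MB_PER_TILE_IMAGE = 7.3   # MBytes of image data per tile
TILES_PER_LANE = 120
LANES_PER_FLOWCELL = 8
IMAGES_PER_CYCLE = 4      # one picture per base (A, C, G, T)
READ_LENGTH = 100         # cycles per read ("say 100")
PAIRED_END_FACTOR = 2     # each fragment is read from both ends
MACHINES = 2              # two GAIIx machines
RUNS_PER_YEAR = 40        # assumption: one ~1-week run per week, ~40 weeks a year


def image_data_per_run_tb(read_length=READ_LENGTH):
    """Image data for one paired-end run on one machine, in TB (10^6 MB)."""
    mb_per_image_set = MB_PER_TILE_IMAGE * TILES_PER_LANE * LANES_PER_FLOWCELL
    mb_per_run = (mb_per_image_set * IMAGES_PER_CYCLE
                  * read_length * PAIRED_END_FACTOR)
    return mb_per_run / 1e6


def yearly_image_data_pb():
    """Image data per year for all machines, in PB (10^3 TB)."""
    return image_data_per_run_tb() * MACHINES * RUNS_PER_YEAR / 1e3


if __name__ == "__main__":
    print(f"per run, one machine: {image_data_per_run_tb():.2f} TB")
    print(f"per run, both machines: {image_data_per_run_tb() * MACHINES:.2f} TB")
    print(f"per year: {yearly_image_data_pb():.3f} PB")
```

Under these assumptions the yearly total comes out a little under 0.45 PB, in line with the “nearly 0.5 PetaBytes” quoted in the talk.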