Health Sciences Driving
UCSD Research Cyberinfrastructure

                   Invited Talk
       UCSD Health Sciences Faculty Council
                  UC San Diego
                   April 3, 2012


                      Dr. Larry Smarr
Director, California Institute for Telecommunications and
                 Information Technology
                Harry E. Gruber Professor,
      Dept. of Computer Science and Engineering
          Jacobs School of Engineering, UCSD
           Follow me at http://lsmarr.calit2.net
UCSD Researcher
          Research Cyberinfrastructure Needs
• UCSD Researchers Surveyed in 2008 to
  Determine Their Unmet CI Needs
• Answer: DATA – Help!
  – Data Infrastructure
    (Storage, Transmission, Curation)
  – Data Expertise
    (Management, Analysis, Visualization, Curation)

[Figure: Diverse Sources of Data]

                     Source: Mike Norman, SDSC
“Blueprint for
a Digital University”




    Report 2009
    http://rci.ucsd.edu
UCSD RCI
                 Provider Organizations




RCI element   SDSC      UCSD Libraries   ACT    Calit2
-----------   -------   --------------   ----   -------
Co-Location   Lead
Storage       Lead      Partner                 Partner
Curation      Partner   Lead
Computing     Lead
Networking    Partner                    Lead   Partner

                       Source: Mike Norman, SDSC
From One to a Billion Data Points Defining Me:
The Exponential Rise in Body Data in Just One Decade


[Figure: data points rising over the decade, labeled in increasing order: Weight, Blood Variables, SNPs, Full Genome]
First Stage of Metagenomic Sequencing of
My Gut Microbiome at J. Craig Venter Institute




I Received a Disk Drive Today With 30-50 GigaBytes

[Figure: Gel Image of Extract from Smarr Sample. Next is Library Construction]
      Manny Torralba, Project Lead - Human Genomic Medicine
                        J Craig Venter Institute
                           January 25, 2012
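
As a rough back-of-envelope sketch of what a 30-50 GigaByte data set means for networking, the Python below estimates ideal transfer times. The function name and the 1 Gb/s comparison link are assumptions for illustration; the calculation uses decimal units and ideal line rate with no protocol or disk overhead, so real transfers would take longer.

```python
def transfer_seconds(size_gigabytes: float, link_gigabits_per_s: float) -> float:
    """Ideal time to move size_gigabytes over a link, ignoring all overhead."""
    if link_gigabits_per_s <= 0:
        raise ValueError("link rate must be positive")
    size_gigabits = size_gigabytes * 8  # 1 byte = 8 bits (decimal gigabytes)
    return size_gigabits / link_gigabits_per_s


if __name__ == "__main__":
    # 30 and 50 GB bracket the range quoted on the slide.
    for size_gb in (30, 50):
        for link_gbps in (1, 10):
            secs = transfer_seconds(size_gb, link_gbps)
            print(f"{size_gb} GB over {link_gbps} Gb/s: {secs:.0f} s")
```

Under these idealized assumptions, the whole drive would cross a 10 Gb/s link in well under a minute.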
The Coming Digital Transformation
           of Health




  www.technologyreview.com/biomedicine/39636
Integrative Personal Omics Profiling
Reveals Details of Clinical Onset of Viruses and Diabetes
Cell 148, 1293–1307, March 16, 2012

• Michael Snyder, Chair of Genomics, Stanford Univ.
• Genome 140x Coverage
• Blood Tests 20 Times in 14 Months
  – Tracked nearly 20,000 distinct transcripts coding for 12,000 genes
  – Measured the relative levels of more than 6,000 proteins and
    1,000 metabolites in Snyder's blood
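
To give a sense of scale for the profiling figures above, here is a minimal Python sketch that multiplies them out. It assumes every transcript, protein, and metabolite was measured at each of the 20 blood tests, which the slide does not state, so the result is only an order-of-magnitude estimate. The constant and function names are hypothetical.

```python
from typing import Mapping

# Figures quoted on the slide; all are approximate ("nearly", "more than").
BLOOD_TESTS = 20
ANALYTES_PER_TEST = {
    "transcripts": 20_000,
    "proteins": 6_000,
    "metabolites": 1_000,
}


def total_measurements(tests: int, analytes: Mapping[str, int]) -> int:
    """Estimate assuming every analyte is measured at every test (an assumption)."""
    return tests * sum(analytes.values())


if __name__ == "__main__":
    print(total_measurements(BLOOD_TESTS, ANALYTES_PER_TEST))
```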
iDASH

      Outcome of NIH Botstein-Smarr Report (1999)
http://acd.od.nih.gov/agendas/060399_Biomed_Computing_WG_RPT.htm
                   Source: Lucila Ohno-Machado, UCSD SOM
integrating Data for Analysis,
Anonymization, and SHaring (iDASH)




                                  Private Cloud at SD Supercomputer Center
                                         Medical Center Data Hosting
                                             HIPAA certified facility




    Source: Lucila Ohno-Machado, UCSD SOM
           funded by NIH U54HL108460
Data + Ontologies + Tools


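Sites: UCSF, UC Davis, UC Irvine, UCLA, UCSD

Example question: Complications associated with a new drug or device?

[Figure: data from each site passes through Extraction, Transformation, Load
(even with same vendor, the EMRs are configured differently), followed by
Semantic Integration; a Query goes in and Information comes out]
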

                          Source: Lucila Ohno-Machado, UCSD SOM
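
The Extraction, Transformation, Load and Semantic Integration steps on this slide can be illustrated with a small Python sketch. Everything specific here is hypothetical: the local codes, the shared concept names, the device names, and the function names are invented for illustration and are not taken from iDASH or any real EMR. The point is only that each site needs its own mapping onto a shared vocabulary before one query can span all sites.

```python
from collections import Counter

# Hypothetical per-site mappings from local EMR codes to shared concept names.
SITE_CODE_MAPS = {
    "UCSD": {"BLEED-01": "hemorrhage", "INF-7": "infection"},
    "UCLA": {"hem_major": "hemorrhage", "infect": "infection"},
}


def extract_transform(site, raw_records):
    """Map one site's raw records onto the shared concept names."""
    code_map = SITE_CODE_MAPS[site]
    for rec in raw_records:
        concept = code_map.get(rec["code"])
        if concept is None:
            continue  # unmapped local code; a real pipeline would flag it for review
        yield {"site": site, "device": rec["device"], "complication": concept}


def load(all_sites):
    """Combine the transformed records from every site into one list."""
    integrated = []
    for site, records in all_sites.items():
        integrated.extend(extract_transform(site, records))
    return integrated


def complications_for(integrated, device):
    """Query: count complications recorded for one device across all sites."""
    return Counter(r["complication"] for r in integrated if r["device"] == device)


if __name__ == "__main__":
    raw = {
        "UCSD": [{"code": "BLEED-01", "device": "stent-X"},
                 {"code": "INF-7", "device": "stent-X"}],
        "UCLA": [{"code": "hem_major", "device": "stent-X"},
                 {"code": "infect", "device": "valve-Y"}],
    }
    print(complications_for(load(raw), "stent-X"))
```

Keeping the mapping tables separate from the query logic reflects the slide's remark that even sites with the same vendor configure their EMRs differently.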
Personalized Care and Population Health


• Genomics
  – SNP-based therapy (cancer)
• ‘Phenomics’
  – Electronic Health Records
  – Personal monitoring
     – Blood pressure, glucose
  – Behavior
     – Adherence to medication, exercise
• Public Health and Environment
  – Air quality, food
  – Surveillance

                                                           Source: DOE




                   Source: Lucila Ohno-Machado, UCSD SOM
NCMIR’s Integrated Infrastructure
                   of Shared Resources



[Figure: Shared Infrastructure linking Scientific Instruments, Local SOM Infrastructure, and End User Workstations]
                       Source: Steve Peltier, NCMIR
Ideker Lab Workflow

[Figure: workflow diagram with nodes Leichtag/Sequencer, Storage, Skaggs/Users, Calit2/Storage, and SDSC/Triton]
                       Source: Chris Misleh, Calit2/SOM
Next Generation Genome Sequencers
      Produce Large Data Sets




         Source: Chris Misleh, SOM
Moving to Shared Enterprise Data Storage & Analysis
Resources: SDSC Triton Resource & Calit2 GreenLight
   http://tritonresource.sdsc.edu
   Source: Philip Papadopoulos, SDSC, UCSD

SDSC Large Memory Nodes (x28)
 • 256/512 GB/sys
 • 8TB Total
 • 128 GB/sec
 • ~ 9 TF

SDSC Shared Resource Cluster (x256)
 • 24 GB/Node
 • 6TB Total
 • 256 GB/sec
 • ~ 20 TF

SDSC Data Oasis Large Scale Storage
 • 2 PB
 • 50 GB/sec
 • 3000 – 6000 disks
 • Phase 0: 1/3 PB, 8GB/s

[Figure: these resources, UCSD Research Labs, and Calit2 GreenLight connected over the Campus Research Network at N x 10Gb/s]
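
The capacity and bandwidth figures on this slide can be sanity-checked with a short Python sketch. The function names are hypothetical, memory totals assume 1 TB = 1024 GB, and storage figures assume decimal units (1 PB = 1,000,000 GB) and ideal sustained bandwidth. Since the slide does not say how the 28 large-memory nodes split between 256 GB and 512 GB, only the lower bound is computed for them.

```python
def aggregate_tb(nodes: int, gb_per_node: float) -> float:
    """Total memory in TB across identical nodes (1 TB = 1024 GB)."""
    return nodes * gb_per_node / 1024


def full_scan_hours(capacity_pb: float, bandwidth_gb_per_s: float) -> float:
    """Ideal time to stream an entire store once (1 PB = 1,000,000 GB)."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return capacity_pb * 1_000_000 / bandwidth_gb_per_s / 3600


if __name__ == "__main__":
    # Shared Resource Cluster: 256 nodes at 24 GB each.
    print("shared cluster TB:", aggregate_tb(256, 24))
    # Large memory nodes: lower bound if all 28 nodes had 256 GB.
    print("large-memory lower bound TB:", aggregate_tb(28, 256))
    # Data Oasis: full 2 PB at 50 GB/sec, and Phase 0 at 1/3 PB and 8 GB/s.
    print("full scan hours:", round(full_scan_hours(2, 50), 1))
    print("phase 0 scan hours:", round(full_scan_hours(1 / 3, 8), 1))
```

The 256 x 24 GB product matches the 6TB total on the slide, and the 8TB total for the large-memory nodes is consistent with a mix of 256 GB and 512 GB systems.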
SOM Use of
                   SDSC Triton Resource
• 10 SOM PIs Received Substantial Allocations
   – 100K CPU-hours or more


• 8 SOM PIs / Labs Currently Using Triton with Time Purchased
  from Grant Funds

• 30+ Active Trial Accounts

• Supporting ~6 Next Generation Sequencing Projects with PIs
  from SOM, SIO, and 2 Outside Research Institutes (TSRI, LIAI)
Community Cyberinfrastructure for Advanced
  Microbial Ecology Research and Analysis


        http://camera.calit2.net/
Calit2 Microbial Metagenomics Cluster-
 Next Generation Optically Linked Science Data Server
                                Source: Phil Papadopoulos, SDSC, Calit2




• 512 Processors, ~5 Teraflops
• ~200 Terabytes Storage (Sun X4500 Storage, 10GbE)
• 1GbE and 10GbE Switched/Routed Core
• 4000 Users From 90 Countries
Creating CAMERA 2.0 -
Advanced Cyberinfrastructure Service Oriented Architecture




               Source: CAMERA CTO Mark Ellisman
Access to Computing Resources Tailored by
   User’s Requirements and Resources




[Figure: three tiers of computing resources: CAMERA Core HPC Resource, Advanced HPC Platforms, and NSF/DOE TeraScale Resources]




               Source: Jeff Grethe, CAMERA
NSF Funds a Data-Intensive Track 2 Supercomputer:
        SDSC’s Gordon-Coming Summer 2011
• Data-Intensive Supercomputer Based on
  SSD Flash Memory and Virtual Shared Memory SW
  – Emphasizes MEM and IOPS over FLOPS
  – Supernode has Virtual Shared Memory:
     – 2 TB RAM Aggregate
     – 8 TB SSD Aggregate
     – Total Machine = 32 Supernodes
     – 4 PB Disk Parallel File System >100 GB/s I/O
• System Designed to Accelerate Access
  to Massive Data Bases being Generated in
  Many Fields of Science, Engineering, Medicine,
  and Social Science

                Source: Mike Norman, Allan Snavely SDSC
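
As a quick worked check of the Gordon figures above, this Python sketch scales the per-supernode numbers to the whole machine. It assumes the 2 TB RAM and 8 TB SSD aggregates are per supernode, as the bullet nesting suggests; the constant and function names are hypothetical.

```python
SUPERNODES = 32
RAM_TB_PER_SUPERNODE = 2
SSD_TB_PER_SUPERNODE = 8


def machine_totals(supernodes: int, ram_tb: float, ssd_tb: float) -> dict:
    """Whole-machine RAM and SSD, assuming identical supernodes."""
    return {"ram_tb": supernodes * ram_tb, "ssd_tb": supernodes * ssd_tb}


if __name__ == "__main__":
    totals = machine_totals(SUPERNODES, RAM_TB_PER_SUPERNODE, SSD_TB_PER_SUPERNODE)
    print(f"RAM: {totals['ram_tb']} TB, SSD: {totals['ssd_tb']} TB")
```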
Rapid Evolution of 10GbE Port Prices
   Makes Campus-Scale 10Gbps CI Affordable
    • Port Pricing is Falling
    • Density is Rising – Dramatically
    • Cost of 10GbE Approaching Cluster HPC Interconnects
Year   Vendor          Price per port   Ports
----   -------------   --------------   ----------
2005   Chiaro          $80K             60 max
2007   Force 10        $5K              40 max
2009   Arista          $500             48 ports
2010   Arista          $400             48 ports
2010   (not labeled)   ~$1000           300+ max




             Source: Philip Papadopoulos, SDSC/Calit2
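
The price trend on this slide can be summarized as a constant yearly multiplier. The Python sketch below computes it from the 2005 Chiaro ($80K/port) and 2010 Arista ($400/port) figures. The function name is hypothetical, and treating the decline as a smooth geometric trend across different vendors and switch classes is a simplifying assumption.

```python
def annual_price_factor(start_price: float, end_price: float, years: float) -> float:
    """Constant yearly multiplier that takes start_price to end_price."""
    if start_price <= 0 or end_price <= 0 or years <= 0:
        raise ValueError("prices and years must be positive")
    return (end_price / start_price) ** (1 / years)


if __name__ == "__main__":
    start, end, years = 80_000, 400, 2010 - 2005
    factor = annual_price_factor(start, end, years)
    print(f"overall drop: {start / end:.0f}x")
    print(f"average yearly price multiplier: {factor:.2f}")
```

A multiplier near one third per year means the per-port price fell by roughly two thirds annually over this period.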
10G Switched Data Analysis Resource:
          SDSC’s Data Oasis – Scaled Performance
Radical Change Enabled by Arista 7508 10G Switch, 384 10G Capable

[Figure: 10Gbps links from the switch to OptIPuter, UCSD RCI, Co-Lo, CENIC/NLR, Triton, Trestles (100 TF), Dash, Gordon, and Existing Commodity Storage (1/3 PB)]

Oasis Procurement (RFP): 2000 TB, > 50 GB/s
• Phase 0: > 8GB/s Sustained Today
• Phase I: > 50 GB/sec for Lustre (May 2011)
• Phase II: >100 GB/s (Feb 2012)

                            Source: Philip Papadopoulos, SDSC/Calit2
2012 RCI Initiatives


• RCI is Preparing an Attractive Storage Offering
  for All UCSD Researchers to Encourage Adoption
  – “Wide and Deep”
  – On-Ramp to Digital Curation Efforts
• SOM Possesses Many of the Most Data-Intensive
  Instruments on Campus (NGS, MassSpec, MRI)
  – Effort to Connect Them to RCI Resources This Year
• SDSC Working with DBMI to Define a HIPAA-compliant
  Cloud Computing Resource that Would Leverage or
  Extend RCI Resources
• RCI Implementation Team Needs Your Input and
  Collaboration (email Richard Moore @ SDSC)

                    Source: Mike Norman, SDSC
Potential UCSD Optical Networked
               Biomedical Researchers and Instruments
• Connects at 10 Gbps:
  – Microarrays
  – Genome Sequencers
  – Mass Spectrometry
  – Light and Electron Microscopes
  – Whole Body Imagers
  – Computing
  – Storage

[Figure: campus map showing CryoElectron Microscopy Facility, San Diego Supercomputer Center, Cellular & Molecular Medicine East, Calit2@UCSD, Bioengineering, Radiology Imaging Lab, National Center for Microscopy & Imaging, Center for Molecular Genetics, Pharmaceutical Sciences Building, Cellular & Molecular Medicine West, and Biomedical Research]

Developing Detailed Plan
Health Sciences Driving UCSD Research Cyberinfrastructure

  • 1. Health Sciences Driving UCSD Research Cyberinfrastructure Invited Talk UCSD Health Sciences Faculty Council UC San Diego April 3, 2012 Dr. Larry Smarr Director, California Institute for Telecommunications and Information Technology Harry E. Gruber Professor, Dept. of Computer Science and Engineering Jacobs School of Engineering, UCSD Follow me at http://lsmarr.calit2.net
  • 2. UCSD Researcher Research Cyberinfrastructure Needs • UCSD Researchers Diverse Sources of Data Surveyed in 2008 to Determine Their Unmet CI Needs • Answer: DATA – Help! – Data Infrastructure (Storage, Transmission, Curation) – Data Expertise (Management, Analysis, Visualization, Curation) Source: Mike Norman, SDSC
  • 3. “Blueprint for a Digital University” Report 2009 http://rci.ucsd.edu
  • 4. UCSD RCI Provider Organizations RCI element SDSC UCSD ACT Calit2 Libraries Co-Location Lead Storage Lead Partner Partner Curation Partner Lead Computing Lead Networking Partner Lead Partner 4 Source: Mike Norman, SDSC
  • 5. From One to a Billion Data Points Defining Me: The Exponential Rise in Body Data in Just One Decade Full Genome SNPs Blood Variables Weight
  • 6. First Stage of Metagenomic Sequencing of My Gut Microbiome at J. Craig Venter Institute I Received a Disk Drive Today With 30-50 GigaBytes Gel Image of Extract from Smarr Sample-Next is Library Construction Manny Torralba, Project Lead - Human Genomic Medicine J Craig Venter Institute January 25, 2012
  • 7. The Coming Digital Transformation of Health www.technologyreview.com/biomedicine/39636
  • 8. Integrative Personal Omics Profiling Reveals Details of Clinical Onset of Viruses and Diabetes Cell 148, 1293–1307, March 16, 2012 • Michael Snyder, Chair of Genomics Stanford Univ. • Genome 140x Coverage • Blood Tests 20 Times in 14 Months – tracked nearly 20,000 distinct transcripts coding for 12,000 genes – measured the relative levels of more than 6,000 proteins and 1,000 metabolites in Snyder's blood
  • 9. iDASH Outcome of NIH Botstein-Smarr Report (1999) http://acd.od.nih.gov/agendas/060399_Biomed_Computing_WG_RPT.htm Source: Lucila Ohno-Machado, UCSD SOM
  • 10. integrating Data for Analysis, Anonymization, and SHaring (iDASH) Private Cloud at SD Supercomputer Center Medical Center Data Hosting HIPAA certified facility Funded by NIH U54HL108460 Source: Lucila Ohno-Machado, UCSD SOM
  • 11. Data + Ontologies + Tools UCSF UC Davis UC Irvine UCLA UCSD Complications associated with a new drug or device? Extraction Transformation Load (even with same vendor, the EMRs are configured differently) Semantic Integration Query Information Source: Lucila Ohno-Machado, UCSD SOM
  • 12. Personalized Care and Population Health • Genomics – SNP-based therapy (cancer) • ‘Phenomics’ – Electronic Health Records – Personal monitoring – Blood pressure, glucose – Behavior – Adherence to medication, exercise • Public Health and Environment – Air quality, food – Surveillance Source: DOE Source: Lucila Ohno-Machado, UCSD SOM
  • 13. NCMIR’s Integrated Infrastructure of Shared Resources Shared Infrastructure Scientific Local SOM Instruments Infrastructure End User Workstations Source: Steve Peltier, NCMIR
  • 14. Ideker Lab Workflow Leichtag/Sequencer Storage Skaggs/Users Calit2/Storage SDSC/Triton Source: Chris Misleh, Calit2/SOM
  • 15. Next Generation Genome Sequencers Produce Large Data Sets Source: Chris Misleh, SOM
  • 16. Moving to Shared Enterprise Data Storage & Analysis Resources: SDSC Triton Resource & Calit2 GreenLight http://tritonresource.sdsc.edu • SDSC Large Memory Nodes (x28): 256/512 GB/sys, 8 TB Total, 128 GB/sec, ~9 TF • SDSC Shared Resource Cluster (x256): 24 GB/Node, 6 TB Total, 256 GB/sec, ~20 TF • SDSC Data Oasis Large Scale Storage: 2 PB, 50 GB/sec, 3000 – 6000 disks, Phase 0: 1/3 PB, 8 GB/s • N x 10Gb/s Campus Research Network • UCSD Research Labs • Calit2 GreenLight Source: Philip Papadopoulos, SDSC, UCSD
  • 17. SOM Use of SDSC Triton Resource • 10 SOM PIs Received Substantial Allocations – 100K CPU-hours or more • 8 SOM PIs / Labs Currently Using Triton with Time Purchased from Grant Funds • 30+ Active Trial Accounts • Supporting ~6 Next Generation Sequencing Projects with PIs from SOM, SIO, and 2 Outside Research Institutes (TSRI, LIAI)
  • 18. Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis http://camera.calit2.net/
  • 19. Calit2 Microbial Metagenomics Cluster: Next Generation Optically Linked Science Data Server • 512 Processors, ~5 Teraflops • ~200 Terabytes Storage (~200TB Sun X4500 Storage, 10GbE) • 1GbE and 10GbE Switched/Routed Core • 4000 Users From 90 Countries Source: Phil Papadopoulos, SDSC, Calit2
  • 20. Creating CAMERA 2.0 - Advanced Cyberinfrastructure Service Oriented Architecture Source: CAMERA CTO Mark Ellisman
  • 21. Access to Computing Resources Tailored by User’s Requirements and Resources Advanced HPC Platforms CAMERA Core HPC Resource NSF/DOE TeraScale Resources Source: Jeff Grethe, CAMERA
  • 22. NSF Funds a Data-Intensive Track 2 Supercomputer: SDSC’s Gordon, Coming Summer 2011 • Data-Intensive Supercomputer Based on SSD Flash Memory and Virtual Shared Memory SW – Emphasizes MEM and IOPS over FLOPS – Supernode has Virtual Shared Memory: 2 TB RAM Aggregate, 8 TB SSD Aggregate – Total Machine = 32 Supernodes – 4 PB Disk Parallel File System, >100 GB/s I/O • System Designed to Accelerate Access to Massive Databases being Generated in Many Fields of Science, Engineering, Medicine, and Social Science Source: Mike Norman, Allan Snavely, SDSC
  • 23. Rapid Evolution of 10GbE Port Prices Makes Campus-Scale 10Gbps CI Affordable • Port Pricing is Falling • Density is Rising – Dramatically • Cost of 10GbE Approaching Cluster HPC Interconnects [Chart, 2005 to 2010, price per 10GbE port in the order shown: $80K/port Chiaro (60 max); $5K Force 10 (40 max); ~$1000 (300+ max); $500 Arista 48 ports; $400 Arista 48 ports] Source: Philip Papadopoulos, SDSC/Calit2
  • 24. 10G Switched Data Analysis Resource: SDSC’s Data Oasis – Scaled Performance • Radical Change Enabled by Arista 7508 10G Switch (384 10G Capable) [Diagram: 10Gbps links from the switch to OptIPuter, UCSD RCI, Co-Lo, CENIC/NLR, Triton, Trestles (100 TF), Dash, Gordon, Existing Commodity Storage (1/3 PB), and Data Oasis (2000 TB, > 50 GB/s)] Oasis Procurement (RFP) • Phase 0: > 8 GB/s Sustained Today • Phase I: > 50 GB/sec for Lustre (May 2011) • Phase II: > 100 GB/s (Feb 2012) Source: Philip Papadopoulos, SDSC/Calit2
  • 25. 2012 RCI Initiatives • RCI is Preparing an Attractive Storage Offering for All UCSD Researchers to Encourage Adoption – “Wide and Deep” – On-Ramp to Digital Curation Efforts • SOM Possesses Many of the Most Data-Intensive Instruments on Campus (NGS, MassSpec, MRI) – Effort to Connect Them to RCI Resources This Year • SDSC Working with DBMI to Define a HIPAA-compliant Cloud Computing Resource that Would Leverage or Extend RCI Resources • RCI Implementation Team Needs Your Input and Collaboration (email Richard Moore @ SDSC) Source: Mike Norman, SDSC
  • 26. Potential UCSD Optical Networked Biomedical Researchers and Instruments • Connects at 10 Gbps: – Microarrays – Genome Sequencers – Mass Spectrometry – Light and Electron Microscopes – Whole Body Imagers – Computing – Storage • Developing Detailed Plan [Map labels: CryoElectron Microscopy Facility, San Diego Supercomputer Center, Cellular & Molecular Medicine East, Calit2@UCSD, Bioengineering, Radiology Imaging Lab, National Center for Microscopy & Imaging, Center for Molecular Genetics, Pharmaceutical Sciences Building, Cellular & Molecular Medicine West, Biomedical Research]
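
Several slides quote data sizes and link or I/O rates (30-50 GB sequencing deliveries on slide 6, the 2 PB / 50 GB/sec Data Oasis target on slide 16, Gordon's per-supernode memory on slide 22, 10 Gbps campus links on slide 26). The following is a rough back-of-the-envelope sketch of what those numbers imply, not a calculation from the talk. It assumes decimal units (1 GB = 1e9 bytes) and an ideal, fully utilized link with no protocol overhead; the function and variable names are illustrative only.

```python
# Back-of-the-envelope figures derived from numbers quoted on the slides.
# Assumptions (not from the talk): decimal units (1 GB = 1e9 bytes) and an
# ideal link with no protocol overhead, so real transfers will be slower.

GB = 1e9          # bytes
PB = 1e15         # bytes
TEN_GBPS = 10e9   # bits per second, one 10 Gbps campus link


def transfer_seconds(size_bytes: float, rate_bits_per_s: float) -> float:
    """Ideal time in seconds to move size_bytes at rate_bits_per_s."""
    return size_bytes * 8 / rate_bits_per_s


# Slide 6: a 30-50 GB sequencing delivery sent over one 10 Gbps link.
disk_low_s = transfer_seconds(30 * GB, TEN_GBPS)
disk_high_s = transfer_seconds(50 * GB, TEN_GBPS)

# Slide 16: reading all 2 PB of Data Oasis at its 50 GB/sec aggregate rate
# (50 GB/sec is converted to bits per second for the helper above).
oasis_full_read_s = transfer_seconds(2 * PB, 50 * GB * 8)

# Slide 22: whole-machine Gordon totals from the per-supernode figures.
SUPERNODES = 32
gordon_ram_tb = SUPERNODES * 2   # 2 TB RAM per supernode
gordon_ssd_tb = SUPERNODES * 8   # 8 TB SSD per supernode

if __name__ == "__main__":
    print(f"30-50 GB over 10 Gbps: {disk_low_s:.0f}-{disk_high_s:.0f} s")
    print(f"2 PB at 50 GB/sec: {oasis_full_read_s / 3600:.1f} hours")
    print(f"Gordon totals: {gordon_ram_tb} TB RAM, {gordon_ssd_tb} TB SSD")
```

Under these idealized assumptions, a data set that arrived on a shipped disk drive would take well under a minute on a 10 Gbps path, which is the practical argument for connecting instruments directly to the campus research network.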

Editor's notes

  1. I will quickly hint at the problem of data harmonization without getting into details, and speak about how difficult it is to find A1ATD patients despite ICD-9 codes.
  2. This is a production cluster with its own Force10 E1200 switch. It is connected to Quartzite and is labeled as the “CAMERA Force10 E1200”. We built CAMERA this way because the technology had been deployed successfully in Quartzite.