Open Science Data Cloud
Robert Grossman, Open Cloud Consortium
Today is a good day to get involved with the Open Science Data Cloud.
Part 1: Basic Facts About the OSDC
Astronomical data; biological data (Bionimbus); networking data; image processing for disaster relief.
Who are we?
501(c)(3) not-for-profit corporation. Supports the development of standards, interoperability frameworks, and reference implementations. Manages testbeds: Open Cloud Testbed and Intercloud Testbed. Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud. Develops benchmarks. www.opencloudconsortium.org
OCC Members
Companies: Aerospace, Booz Allen Hamilton, Cisco, InfoBlox, Open Data Group, Raytheon, Yahoo
Universities: CalIT2, Johns Hopkins, MIT Lincoln Lab, Northwestern Univ., University of Illinois at Chicago, University of Chicago
Government agencies: NASA
Open Source Projects: Sector Project
Operates Clouds: 500 nodes; 3000 cores; 1.5+ PB; four data centers; 10 Gbps. Target to refresh 1/3 each year.
Open Science Data Cloud
Intercloud Testbed
Cloud-based Disaster Relief Services
What Are the Projects?
Project 1: Bionimbus www.cistrack.org
Project 2: Bulk Download of the SDSS
Sector LLPR varies between 0.61 and 0.98. The recent Sloan Digital Sky Survey (SDSS) data release is 14 TB in size.
Project 3: Image Processing in the Cloud
Step 1: Input to Mapper. Mapper Input Key: Bounding Box (minx = -135.0, miny = 45.0, maxx = -112.5, maxy = 67.5). Mapper Input Value: [image].
Step 2: Processing in Mapper. The mapper resizes and/or cuts up the original image into pieces to output bounding boxes.
Step 3: Mapper Output. For each piece, Mapper Output Key: Bounding Box; Mapper Output Value: [image piece] + Timestamp.
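The three mapper steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the names `BBox`, `split_bbox`, and `tile_mapper` are hypothetical, the image is represented by a plain string reference rather than pixel data, and the even n x n grid is an assumed way of "cutting up" the input.

```python
from typing import Iterator, NamedTuple, Tuple


class BBox(NamedTuple):
    """Geographic bounding box, used as the map key (hypothetical type)."""
    minx: float
    miny: float
    maxx: float
    maxy: float


def split_bbox(box: BBox, n: int) -> Iterator[BBox]:
    """Cut a bounding box into an n x n grid of equal sub-boxes."""
    dx = (box.maxx - box.minx) / n
    dy = (box.maxy - box.miny) / n
    for row in range(n):
        for col in range(n):
            yield BBox(
                box.minx + col * dx,
                box.miny + row * dy,
                box.minx + (col + 1) * dx,
                box.miny + (row + 1) * dy,
            )


def tile_mapper(
    key: BBox, value: Tuple[str, str], n: int = 2
) -> Iterator[Tuple[BBox, Tuple[str, str]]]:
    """Assumed mapper shape: input is (bounding box, (image reference, timestamp)).

    Emits one (sub-box, (image reference, timestamp)) pair per piece.
    Real code would also crop or resize the pixels for each sub-box.
    """
    image_ref, timestamp = value
    for sub in split_bbox(key, n):
        yield sub, (image_ref, timestamp)


# The bounding box from the slide, split into four pieces.
slide_box = BBox(-135.0, 45.0, -112.5, 67.5)
pieces = list(tile_mapper(slide_box, ("scene-001", "2010-06-21T00:00:00Z")))
```

Keying the output by bounding box means that pieces covering the same area, possibly from different timestamps, arrive at the same reducer.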
Project 4: Anomalies in Network Data
What is the OSDC?
Hosted, managed, distributed facility to: manage and archive your medium and large datasets; provide computational resources to analyze them; provide networking to share them with your colleagues and the public.
Long-Term Goal: Build a (small) data center for science.
And preserve your data the same way that libraries preserve books & museums preserve art.
Why do it?
Work on something that matters to you more than money [and, presumably, papers]. Create more value than you capture. Take the long view. ("Work on Stuff That Matters", Tim O’Reilly, Jan 11, 2009)
What is similar?
Internet Archive
Wayback Machine
Part 2: Why Another Cloud Project?
[Figure: variety of analysis (low, medium, wide) versus data size (small, medium to large, very large). Scientist with laptop: no infrastructure. Open Science Data Cloud: general infrastructure. High energy physics, astronomy: dedicated infrastructure.]
[Figure: persistent data (small, medium, large) versus cycles. Labels: large data clouds, databases, HPC, large and specialized clusters, small to medium clusters, single workstations.]
Who do you most trust to manage your data for 100 years? Companies may not be here tomorrow. Government agencies have a role, but are not always easy to use. Think of a not-for-profit with that mission.
Part 3: Technical Approach
Condominium Clouds: In a condominium cloud, you buy your own rack or bunch of racks. The racks are managed and operated by the condominium association, in this case the OCC. If your rack is 120 TB, you get the rights to c. 40 TB of storage in the cloud. The rest is a shared resource. The Open Cloud Testbed is a condo cloud managed by the OCC.
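The split in the example above is simple arithmetic. A minimal sketch follows, assuming the owner's share is a fixed one third of raw capacity; that fraction is only inferred from the 120 TB to c. 40 TB example, and the function name `condo_allocation` is hypothetical.

```python
from typing import Tuple


def condo_allocation(raw_tb: float, owner_fraction: float = 1 / 3) -> Tuple[float, float]:
    """Return (owner's storage rights, shared pool) in TB for one rack.

    owner_fraction = 1/3 is an assumption inferred from the slide's example,
    not a stated OCC policy.
    """
    owned = raw_tb * owner_fraction
    return owned, raw_tb - owned


# The slide's example rack: roughly 40 TB for the owner, 80 TB shared.
owned_tb, shared_tb = condo_allocation(120)
```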
Condo Clouds. Open source software stack: Hadoop, Sector, Eucalyptus, Nova, NoSQL DBs. Raywulf rack.
Data Migration. Challenge: data migration. Solution: use Hadoop-style replication.
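One way to read "Hadoop-style replication" as a migration tool: if every block is kept on several nodes, retiring a node only requires re-copying the blocks it held from the surviving replicas. The sketch below is a toy model of that idea, not Hadoop's implementation; `retire_node`, the placement dictionary, and the replication factor of 3 are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def retire_node(
    placement: Dict[str, Set[str]],
    retiring: str,
    live_nodes: List[str],
    replicas: int = 3,
) -> List[Tuple[str, str, str]]:
    """Drop `retiring` from every block's holder set and plan the copies
    needed to restore the replication factor.

    Returns a list of (block, source_node, target_node) copy operations and
    updates `placement` in place.
    """
    copies = []
    for block, holders in placement.items():
        holders.discard(retiring)
        # Nodes that could take a new replica of this block.
        candidates = [n for n in live_nodes if n != retiring and n not in holders]
        # A copy needs a surviving replica to read from, hence `holders`.
        while holders and len(holders) < replicas and candidates:
            target = candidates.pop(0)
            source = sorted(holders)[0]
            copies.append((block, source, target))
            holders.add(target)
    return copies


blocks = {"b1": {"old", "n1", "n2"}, "b2": {"n1", "n2", "n3"}}
plan = retire_node(blocks, "old", ["n1", "n2", "n3", "n4"])
```

Here only block `b1` lived on the retired node, so the plan contains a single copy and `b2` is untouched.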
Operating Model: The operating model requires a constant capex investment each year, for example 10 racks or $1M. (Cap in PB)
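A constant yearly purchase combined with retirement of old racks leads to a flat installed base. The sketch below shows that arithmetic under assumptions that the slides do not state directly: a three-year rack lifetime (read from the "refresh 1/3 each year" target) and 120 TB of raw storage per rack (borrowed from the condominium example). The function names are hypothetical.

```python
from typing import List


def installed_racks(racks_per_year: int, lifetime_years: int, years: int) -> List[int]:
    """Racks in service at the end of each year when the same number of racks
    is bought every year and each rack is retired after `lifetime_years`."""
    return [racks_per_year * min(year, lifetime_years) for year in range(1, years + 1)]


def raw_capacity_pb(racks: int, rack_tb: float = 120.0) -> float:
    """Raw capacity in PB, assuming `rack_tb` TB per rack (an assumption)."""
    return racks * rack_tb / 1000.0


# 10 racks per year with an assumed 3-year lifetime, over 5 years.
fleet = installed_racks(10, 3, 5)
steady_state_pb = raw_capacity_pb(fleet[-1])
```

Under these assumptions the fleet levels off at 30 racks from the third year on, which is why the funding requirement stays constant rather than shrinking.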
Retiring Equipment
Solution: Support virtual networks, virtual data centers, etc.
But No Vendor-Neutral VN Standard That: scales to 100,000+ VMs; is supported by multiple vendors; spans multiple physical switches; supports VN mobility; provides strong isolation of VNs; is easy for VMs to join and leave VNs; includes management interfaces; …. The OCC has a working group working on VN standards.
Bridging the Gaps… A Small Step

Open Science Data Cloud (June 21, 2010)

  • 44. Physical Resources; Open Cloud Computing Interface (OCCI); Open Virtualization Format (OVF)
  • 45. One Day We Hope to Peer Open Science Data Cloud
  • 46. More Challenges: Finding a Business Model That Works Long Term. Challenge: raising a constant amount of funding each year. To date: talking to foundations.
  • 47. Thank You. For more information: www.opencloudconsortium.org and rgrossman.com (for research papers, etc.)