2. Part 1. Overview of the Open Cloud Consortium (OCC) www.opencloudconsortium.org
3. 501(c)(3) not-for-profit corporation. Supports the development of standards, interoperability frameworks, and reference implementations. Manages testbeds: Open Cloud Testbed and Intercloud Testbed. Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud. Develops benchmarks.
4. OCC Members. Companies: Aerospace, Booz Allen Hamilton, Cisco, InfoBlox, Open Data Group, Raytheon, Yahoo. Universities: CalIT2, Johns Hopkins, MIT Lincoln Lab, Northwestern Univ., University of Illinois at Chicago, University of Chicago. Government agencies: NASA. Open source projects: Sector Project.
5. OCC Working Groups: Large Data Cloud Working Group; Open Cloud Testbed Working Group; Intercloud Testbed Working Group; Open Science Data Cloud Working Group.
32. Part 3. Large Data Cloud Working Group. Standards for integrating and interoperating large data cloud services, such as those provided by Hadoop and similar systems.
33. Focus of Working Group. [Diagram of service layers, each with applications (App) attached: table-based data services and relational-like data services; cloud compute services (MapReduce, UDF, and other programming frameworks); cloud storage services.] Developing APIs for this framework.
34. Benchmarks for Large Data Clouds. Until recently, the only benchmark in use was Terasort (sorting 10 billion 100-byte records, i.e. 1 TB). It has been replaced by Gray Sort and Minute Sort. Gray Sort tries to maximize the TB/min sorted on 100 TB or more of data. Hadoop holds the current Gray Sort and Minute Sort records. Problem: sort is just one type of workload for analytic applications.
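The arithmetic behind these sort benchmarks can be checked with a short sketch. The function names are illustrative, decimal terabytes (1 TB = 10^12 bytes) are assumed, and the elapsed time in the usage line is a hypothetical input, not a measured result.

```python
def dataset_size_tb(num_records: int, record_bytes: int) -> float:
    """Return the data set size in decimal terabytes (assumes 1 TB = 10**12 bytes)."""
    return num_records * record_bytes / 10**12


def sort_rate_tb_per_min(size_tb: float, elapsed_seconds: float) -> float:
    """Return a Gray Sort style rate: terabytes sorted per minute."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return size_tb / (elapsed_seconds / 60.0)


# Terasort input: 10 billion records of 100 bytes each.
terasort_tb = dataset_size_tb(10_000_000_000, 100)

# Hypothetical run: 100 TB sorted in 6000 seconds (an illustrative number only).
example_rate = sort_rate_tb_per_min(100.0, 6000.0)
```

A rate like this says nothing about non-sort workloads, which is the gap the slide points out.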
35. MalStone. MalGen: generates synthetic data with realistic distributions. MalStone A & B: “stylized” computations that can be used as benchmarks for architectures, software, and systems for large data clouds. Open source and available at malgen.googlecode.com.
37. Condominium Clouds. In a condominium cloud, you buy your own rack or set of racks. The racks are managed and operated by the condominium association, in this case the OCC. If your rack is 120 TB, you get the rights to c. 40 TB of storage in the cloud; the rest is a shared resource. The Open Cloud Testbed is a condo cloud managed by the OCC.
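The 120 TB / c. 40 TB example works out to roughly a one-third owner share. A minimal sketch of that split follows, assuming the same fraction applies to any rack size; the slide gives only this one data point, so the default fraction and the function name are assumptions.

```python
def condo_split(rack_tb: float, owner_fraction: float = 1 / 3) -> tuple[float, float]:
    """Split a contributed rack into (owner_tb, shared_tb).

    owner_fraction defaults to 1/3, inferred from the single 120 TB -> c. 40 TB
    example; it is an assumption, not a stated OCC policy.
    """
    if rack_tb < 0:
        raise ValueError("rack_tb must be non-negative")
    if not 0 <= owner_fraction <= 1:
        raise ValueError("owner_fraction must be between 0 and 1")
    owner_tb = rack_tb * owner_fraction
    return owner_tb, rack_tb - owner_tb


# The example from the slide: a 120 TB rack.
owner_tb, shared_tb = condo_split(120.0)
```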
43. Part 5. Open Science Data Cloud Working Group
44. Open Science Data Cloud. Provide a long-term home for selected scientific data sets and support elastic cloud-based analysis and integration of the data. Data types: biological data (Bionimbus), astronomical data, networking data.
45. Part 6. Image Processing for Disaster Relief Using Elastic Clouds
46. The Challenge. When a disaster strikes, there is usually an immediate and critical need for computing power to process images. For example, there was a delay in getting current images of Haiti to non-governmental organizations (NGOs) after the earthquake on January 12, 2010.
47. The Idea … The OCC Elastic Cloud for Disaster Relief. Set up a permanent elastic cloud that is available to assist with disaster relief. Establish connections to sources of images that can be enabled at times of need. Set up a network of volunteers, with accounts on the cloud and knowledge of the tools, who can swarm when needed. Use it as a test of large data cloud standards and interoperability.
48. Image Processing on Large Data Clouds. Data-parallel applications: parallelism is often required at the file or directory level; data locality is important; parallel disk I/O is also critical. Requirements: the input data size can be 10+ TB per day; want to integrate with open source libraries such as OSSIM.
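File-level data parallelism can be sketched as one task per whole file. This is a single-machine illustration using the Python standard library, not Sector or Hadoop code; `process_image` is a hypothetical stand-in that checksums the file where real code would call an image library such as OSSIM.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import zlib


def process_image(path: Path) -> tuple[str, int]:
    """Hypothetical stand-in for image processing: read the whole file and checksum it."""
    data = path.read_bytes()
    return path.name, zlib.crc32(data)


def process_directory(directory: Path, max_workers: int = 4) -> dict[str, int]:
    """Process every file in a directory in parallel, one task per whole file."""
    files = sorted(p for p in directory.iterdir() if p.is_file())
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each worker handles a complete file; no file is split across tasks.
        return dict(pool.map(process_image, files))
```

Threads are used here because the stand-in work is dominated by disk reads; a CPU-bound image pipeline would more likely use separate processes or separate nodes.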
49. Distributed File Systems & Image Processing. Sector is broadly similar to the Hadoop Distributed File System. Main differences: Hadoop directly implements a distributed block-based file system, whereas Sector is a layer over a native file system; Sector does not split files. Because a single image is not split, the application processing it does not need to read data from other nodes over the network. Optionally, a directory can also be kept together on a single node.
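The whole-file placement idea can be illustrated with a small sketch. This is not Sector's actual placement algorithm or API; the hash-based assignment, the function names, and the file and node names in the usage lines are all assumptions made for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def _bucket(key: str, num_nodes: int) -> int:
    """Map a string key to a node index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def place_files(paths, nodes, keep_directories_together=False):
    """Assign each whole file to exactly one node (files are never split).

    With keep_directories_together=True, the parent directory is the placement
    key, so all files in a directory land on the same node.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    placement = {}
    for path in paths:
        key = str(PurePosixPath(path).parent) if keep_directories_together else path
        placement[path] = nodes[_bucket(key, len(nodes))]
    return placement


# Hypothetical file and node names.
images = ["haiti/tile_001.tif", "haiti/tile_002.tif", "chile/tile_001.tif"]
placement = place_files(images, ["node-a", "node-b", "node-c"], keep_directories_together=True)
```

Because every file maps to a single node, a task scheduled on that node can read its image entirely from local disk, which is the locality property the slide describes.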
55. … To Add Another Public Cloud to A Private/Public Cloud?
56. We Have Several Ways of Defining Virtual Networks…. VN-Link, Open vSwitch, VPNs, CloudSwitch, vSwitch, VLAN, BGP, MPLS, OpenFlow.
57. But No Vendor Neutral VN Standard That: scales to 100,000+ VMs; is supported by multiple vendors; spans multiple physical switches; supports VN mobility; provides strong isolation of VNs; is easy for VMs to join and leave VNs; includes management interfaces; ….