1. www.opencloudconsortium.org Project Matsu: Large Scale On-Demand Image Processing for Disaster Relief. Collin Bennett, Robert Grossman, Yunhong Gu, and Andrew Levine, Open Cloud Consortium. June 21, 2010
2. Project Matsu Goals. Provide persistent data resources and elastic computing to assist in disasters: make imagery available for disaster relief workers; provide elastic computing for large-scale image processing; perform change detection on temporally different but geospatially identical image sets; and provide a resource for standards testing and interoperability studies of large data clouds.
4. A 501(c)(3) not-for-profit corporation. Supports the development of standards, interoperability frameworks, and reference implementations. Manages testbeds: the Open Cloud Testbed and the Intercloud Testbed. Manages cloud computing infrastructure to support scientific research: the Open Science Data Cloud. Develops benchmarks.
5. OCC Members. Companies: Aerospace, Booz Allen Hamilton, Cisco, InfoBlox, Open Data Group, Raytheon, Yahoo. Universities: Calit2, Johns Hopkins, Northwestern Univ., University of Illinois at Chicago, University of Chicago. Government agencies: NASA. Open source projects: Sector Project.
10. Focus of OCC Large Data Cloud Working Group. [Stack diagram: applications run on top of Table-based Data Services and Relational-like Data Services, which sit on Cloud Compute Services (MapReduce, UDF, and other programming frameworks), which in turn sit on Cloud Storage Services.] The working group is developing APIs for this framework.
11. Tools and Standards. Apache Hadoop/MapReduce; Sector/Sphere large data cloud; Open Geospatial Consortium Web Map Service (WMS). OCC tools are open source (matsu-project): http://code.google.com/p/matsu-project/
12. Part 2: Technical Approach. Hadoop – lead: Andrew Levine. Hadoop with Python Streaming – lead: Collin Bennett. Sector/Sphere – lead: Yunhong Gu.
15. Image Processing in the Cloud - Reducer. Reducer key input: a bounding box (minx = -45.0, miny = -2.8125, maxx = -43.59375, maxy = -2.109375). Reducer value input: the images that fall in that bounding box. Step 1 (input to reducer): assemble the images into sets based on their timestamps (Timestamp 1 set, Timestamp 2 set) and compare them. Step 2 (process the difference in the reducer): the result is a delta of the two images. Step 3 (reducer output): the delta set; all images go to different map layers as a set of images for display in WMS.
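The per-key reducer logic above (group the images for one bounding box by timestamp, then emit the per-pixel difference) can be sketched as follows. This is a minimal illustration, not the actual Matsu code: the function name, the `(timestamp, pixels)` record shape, and the use of absolute per-pixel difference as the "delta" are all assumptions.

```python
# Sketch of the change-detection step a reducer performs for one
# bounding-box key: collect the pixel arrays for each timestamp,
# then compute the per-pixel delta between the two timestamped images.
def change_delta(records):
    """records: iterable of (timestamp, pixels) pairs for one bounding box.

    Assembles the images by timestamp and returns the per-pixel
    absolute difference of the two sets (the "delta set").
    """
    by_time = {}
    for timestamp, pixels in records:
        by_time.setdefault(timestamp, []).extend(pixels)
    if len(by_time) != 2:
        raise ValueError("expected images from exactly two timestamps")
    (_, img1), (_, img2) = sorted(by_time.items())
    return [abs(a - b) for a, b in zip(img1, img2)]
```

In a Hadoop streaming job, this function would run once per bounding-box key, with the framework guaranteeing that all records sharing that key reach the same reducer.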
18. Each line contains the image's byte array transformed to pixels (raw bytes do not work well with the one-line-at-a-time Hadoop streaming paradigm). Record format: geolocation timestamp | tuple size ; image width ; image height ; comma-separated list of pixels. The tuple size, width, and height fields (shown in red on the slide) are metadata needed to process the image in the reducer.
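A parser for that line layout might look like the sketch below. The delimiters (`|` between key and payload, `;` between metadata fields, `,` between pixels) follow the slide; the field names and the returned dictionary shape are illustrative assumptions.

```python
# Parse one streaming record of the form:
#   "geolocation timestamp|tuple_size;width;height;p1,p2,p3,..."
def parse_record(line):
    key, payload = line.rstrip("\n").split("|", 1)
    tuple_size, width, height, pixel_csv = payload.split(";", 3)
    return {
        "key": key.strip(),            # geolocation + timestamp
        "tuple_size": int(tuple_size),  # metadata for the reducer
        "width": int(width),
        "height": int(height),
        "pixels": [int(p) for p in pixel_csv.split(",")],
    }
```

Keeping the metadata inline with each line is what lets a streaming reducer reconstruct the image without any side lookups.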
20. All of the work for mapping was done in the pre-processing step.
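Because the mapping work happens during pre-processing, the streaming mapper itself can be a simple pass-through that re-emits each pre-built key/value line so Hadoop can shuffle records to the right reducer. This is a sketch of that pattern, assuming tab-separated records on stdin; it is not the actual Matsu mapper.

```python
# Identity mapper for Hadoop streaming: the pre-processing step has
# already produced "key<TAB>value" lines, so the mapper just forwards
# them unchanged and lets the shuffle group records by key.
import sys

def identity_mapper(stdin, stdout):
    for line in stdin:
        stdout.write(line)

if __name__ == "__main__":
    identity_mapper(sys.stdin, sys.stdout)
```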
23. Sector Distributed File System. Sector aggregates hard disk storage across commodity computers, with a single namespace, file-system-level reliability (using replication), and high availability. Sector does not split files: a single image is never split, so the application processing it does not need to read data from other nodes over the network. As an option, a whole directory can be kept together on a single node as well.
24. Sphere UDF. Sphere allows a user-defined function (UDF) to be applied to each file (whether it holds a single image or multiple images). Existing applications can be wrapped in a Sphere UDF. In many situations, the Sphere streaming utility accepts a data directory and an application binary as inputs: ./stream -i haiti -c ossim_foo -o results