Sector Sphere 2009

Distributed Data Storage and Parallel Processing Engine
Sector & Sphere
Yunhong Gu
Univ. of Illinois at Chicago

What is Sector/Sphere?

Overview

Motivation
Super-computer model: expensive, data I/O bottleneck.
Sector/Sphere model: inexpensive, parallel data I/O, data locality.

Motivation
Parallel/distributed programming with MPI, etc.: flexible and powerful, but too complicated.
Sector/Sphere model (cloud model): the cluster appears as a single unit to the developer, with a simplified programming interface. Limited to certain data-parallel applications.

Motivation
Systems for single data centers: require additional effort to locate and move data.
Sector/Sphere model: supports wide-area data collection and distribution.

Sector Distributed File System (architecture diagram)
Security server: user accounts, data protection, system security.
Masters: metadata, scheduling, service provider.
Clients: system access tools, application programming interfaces.
Slaves: storage and processing.
Links: SSL (shown twice in the diagram); data transferred over UDT, encryption optional.

Sector Distributed File System

Sector: Performance

UDT: UDP-based Data Transfer

Sector: Fault Tolerance

Sector: Security

Sector: Tools and API

Sphere: Simplified Data Processing

Sphere: Simplified Data Processing
    for each file F in (SDSS datasets)
        for each image I in F
            findBrownDwarf(I, …);

    SphereStream sdss;
    sdss.init("sdss files");
    SphereProcess myproc;
    myproc->run(sdss, "findBrownDwarf", …);
    myproc->read(result);

    findBrownDwarf(char* image, int isize, char* result, int rsize);
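The slide's idea, one user-defined function applied to every record of every file, can be illustrated with a small single-process sketch. This is not the Sector/Sphere API: `Udf`, `countBytes`, and `runOnAll` are hypothetical names, the loop runs sequentially where Sphere would run on many slaves in parallel, and the convention that the function returns the number of result bytes is an assumption of this sketch. Only the four-argument shape of the function is taken from the slide's `findBrownDwarf` signature.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Shape modeled on the slide's findBrownDwarf(char*, int, char*, int).
// Returning the number of bytes written to `result` is an assumption.
using Udf = int (*)(const char* record, int isize, char* result, int rsize);

// Hypothetical stand-in for findBrownDwarf: writes the record's byte count
// as decimal text into `result`, or returns -1 if it does not fit.
int countBytes(const char* record, int isize, char* result, int rsize) {
    (void)record;  // the content is not inspected in this toy function
    const std::string text = std::to_string(isize);
    if (static_cast<int>(text.size()) > rsize) return -1;
    std::memcpy(result, text.data(), text.size());
    return static_cast<int>(text.size());
}

// Local, sequential stand-in for the nested loop on the slide:
// for each file, for each record in the file, call the function.
std::vector<std::string> runOnAll(
    const std::vector<std::vector<std::string>>& files, Udf udf) {
    assert(udf != nullptr);
    std::vector<std::string> results;
    char buf[64];
    for (const auto& file : files) {
        for (const auto& record : file) {
            const int n = udf(record.data(), static_cast<int>(record.size()),
                              buf, static_cast<int>(sizeof(buf)));
            if (n >= 0) results.emplace_back(buf, static_cast<std::size_t>(n));
        }
    }
    return results;
}
```

Calling `runOnAll(files, countBytes)` on two in-memory "files" yields one result string per record, in file order. The caller never writes the loop over storage locations itself, which is the simplification the slide is pointing at.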

Sphere: Data Movement

Sphere/UDF vs. MapReduce

Sphere/UDF vs. MapReduce

Why Doesn't Sector Split Files?

Load Balance

Fault Tolerance

Open Cloud Testbed

The TeraSort Benchmark

TeraSort
Record: 100 bytes (10-byte key, 90-byte value).
Stage 1: Hash based on the first 10 bits of the key (values 0-1023) into Bucket-0, Bucket-1, ..., Bucket-1023.
Stage 2: Sort each bucket on the local node.
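The two stages above can be sketched in a single process as follows. This is an illustrative sketch, not Sphere code: `bucketOf` and `twoStageSort` are hypothetical names, records are plain strings held in memory, and the buckets live in one vector where the real system would place them on different nodes. Because the bucket number is the leading 10 bits of the key, concatenating the sorted buckets in bucket order gives a globally sorted result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

constexpr int kBuckets = 1024;  // 2^10 buckets: Bucket-0 .. Bucket-1023

// Stage 1 helper: bucket index = first 10 bits of the key, i.e. all 8 bits
// of byte 0 followed by the top 2 bits of byte 1.
int bucketOf(const std::string& record) {
    assert(record.size() >= 2);
    const auto b0 = static_cast<std::uint8_t>(record[0]);
    const auto b1 = static_cast<std::uint8_t>(record[1]);
    return (b0 << 2) | (b1 >> 6);
}

// Stage 1: scatter records into buckets. Stage 2: sort each bucket by its
// 10-byte key, then concatenate the buckets in index order.
std::vector<std::string> twoStageSort(const std::vector<std::string>& records) {
    std::vector<std::vector<std::string>> buckets(kBuckets);
    for (const auto& r : records) buckets[bucketOf(r)].push_back(r);

    // std::string::compare orders bytes as unsigned values, which matches
    // the unsigned bucket numbering used above.
    const auto byKey = [](const std::string& a, const std::string& b) {
        return a.compare(0, 10, b, 0, 10) < 0;
    };

    std::vector<std::string> sorted;
    sorted.reserve(records.size());
    for (auto& bucket : buckets) {
        std::sort(bucket.begin(), bucket.end(), byKey);
        sorted.insert(sorted.end(), bucket.begin(), bucket.end());
    }
    return sorted;
}
```

In this sketch stage 2 has no dependency between buckets, so each bucket could be sorted independently on whichever node holds it.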

Performance Results: TeraSort
Run time in seconds, Sector v1.16 vs Hadoop 0.17

Sites                             Data Size  Sphere  Hadoop (3 replicas)  Hadoop (1 replica)
UIC                               300GB      1265    2889                 2252
UIC + StarLight                   600GB      1361    2896                 2617
UIC + StarLight + Calit2          900GB      1430    4341                 3069
UIC + StarLight + Calit2 + JHU    1.2TB      1526    6675                 3702

Performance Results: TeraSort

The MalStone Benchmark
http://code.google.com/p/malgen/

MalStone
Text record: Event ID | Timestamp | Site ID | Compromise Flag | Entity ID
Example: 00000000005000000043852268954353585368|2008-11-08 17:56:52.422640|3857268954353628599|1|000000497829
Transform: key = Site ID; value = Time, Flag.
Stage 1: Process each record and hash into buckets according to site ID (3-byte bucket number 000-999: site-000X, site-001X, ..., site-999X).
Stage 2: Compute infection rate for each merchant.
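A minimal single-process sketch of the two MalStone stages follows. Several things here are assumptions rather than facts from the slide: the names `Event`, `Rate`, `parseEvent`, `bucketOf`, and `infectionRates` are hypothetical; the bucket number is taken to be the first three digits of the site ID; and the "infection rate" is simplified to the fraction of a site's records whose compromise flag is 1, which may differ from the benchmark's exact definition.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Key/value form after the transform step: site ID, time, flag.
struct Event {
    std::string siteId;
    std::string timestamp;
    bool compromised;
};

// Parse "Event ID|Timestamp|Site ID|Compromise Flag|Entity ID".
Event parseEvent(const std::string& line) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, '|')) fields.push_back(field);
    assert(fields.size() == 5);
    return Event{fields[2], fields[1], fields[3] == "1"};
}

// Stage 1 (assumed rule): bucket = first three digits of the site ID.
int bucketOf(const Event& e) { return std::stoi(e.siteId.substr(0, 3)); }

struct Rate {
    long total = 0;
    long flagged = 0;
    double value() const {
        return total == 0 ? 0.0 : static_cast<double>(flagged) / total;
    }
};

// Stage 2 (simplified): per-site fraction of flagged records, computed
// bucket by bucket so each bucket could be handled on its own node.
std::map<std::string, Rate> infectionRates(const std::vector<std::string>& lines) {
    std::vector<std::vector<Event>> buckets(1000);
    for (const auto& line : lines) {
        Event e = parseEvent(line);
        buckets[bucketOf(e)].push_back(e);
    }
    std::map<std::string, Rate> rates;
    for (const auto& bucket : buckets) {
        for (const auto& e : bucket) {
            Rate& r = rates[e.siteId];
            ++r.total;
            if (e.compromised) ++r.flagged;
        }
    }
    return rates;
}
```

All records for one site land in the same bucket, so the per-site rate needs no communication between buckets in stage 2.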

Performance Results: MalStone
Processing 10 billion records on 20 OCT nodes (local).

                           MalStone-A  MalStone-B
Hadoop                     454m 13s    840m 50s
Hadoop Streaming/Python    87m 29s     142m 32s
Sector/Sphere              33m 40s     43m 44s

* Courtesy of Collin Bennet and Jonathan Seidman of Open Data Group.

System Monitoring (Sector/Sphere)

For More Information
