SlideShare une entreprise Scribd logo
1  sur  25
Sector: An Open Source Cloud for Data Intensive Computing Robert Grossman University of Illinois at ChicagoOpen Data Group October 20, 2009
Part 1.  Sector 2 http://sector.sourceforge.net
Sector Overview Sector is fastest open source large data cloud As measured by MalStone & Terasort Sector is easy to program Supports UDFs, MapReduce & Python over streams Sector is secure A HIPAA compliant Sector cloud is being set up Sector is reliable Sector v1.24 has a backup master node server 3
About Sector  YunhongGu from the Laboratory for Advanced Computing at the University of Illinois at Chicago is the Lead Developer of Sector. Sector is open source (BSD License) and available from sector.sourceforge.net The current version is 1.24a 4
Target Configurations Sector is designed to run on racks of commodity computers Typical rack configuration today (Oct, 2009) Rack of 32 quad-core 1U computers Each computer has 4 x 1TB disks Each computer has 1 Gbps connection to a top of a rack switch Sometimes these are called Raywulf clusters 5
Google’s Large Data Cloud Compute Services Data Services Storage Services 6 Applications Google’s MapReduce Google’s BigTable Google File System (GFS) Google’s Stack
Hadoop’s Large Data Cloud Compute Services Storage Services 7 Applications Hadoop’sMapReduce Data Services Hadoop Distributed File System (HDFS) Hadoop’s Stack
Sector’s Large Data Cloud 8 Applications Compute Services Sphere’s UDFs Data Services Sector’s Distributed File System (SDFS) Storage Services UDP-based Data Transport Protocol (UDT) Routing & Transport Services Sector’s Stack
Comparing Sector and Hadoop 9
Terasort - Sector vsHadoop Performance Sector/Sphere 1.24a, Hadoop 0.20.1 with no replication on Phase 2 of Open Cloud Testbed with co-located racks.
MalStone (OCC-Developed Benchmark) Sector/Sphere 1.20, Hadoop 0.18.3 with no replication on Phase 1 of Open Cloud Testbed in a single rack.  Data consisted of 20 nodes with 500 million 100-byte records / node.
How Do You Program A Data Center? 12
Idea 1 – Support UDF’s Over Data Center Think of MapReduce as Map acting on (text) records With fixed Shuffle and Sort Followed by Reducing acting on (text) records We generalize this framework as follows: Support a sequence of User Defined Functions (UDF) acting on segments (=chunks) of files. MapReduce is one special case consisting of a user defined Map, a system-defined shuffle and sort, and a user defined reduce In both cases, framework takes care of assigning nodes to process data, restarting failed processes, etc. 13
Applying UDF using Sector/Sphere 14 1. Split data Application Sphere Client Input  stream SPE SPE SPE 2. Locate & schedule Sphere Processing Engine (SPE) 3. Collect results Output stream
Sector Programming Model Sector dataset consists of one or more physical files Sphere applies User Defined Functions over streams of data consisting of data segments Data segments can be data records, collections of data records, or files Example of UDFs: Map function, Reduce function, Split function for CART, etc. Outputs of UDFs can be returned to originating node, written to local node,  or shuffled to another node. 15
How Do Move Data in a Cloud & Between Clouds? 16 Option 1: Use TCP and close your eyes. Option 2:    ?????
Idea 2: Sector is Built on Top of UDT 17 ,[object Object]
UDT can take advantage of wide area high performance 10 Gbps network
Sector is a wide area distributed file system built over UDT.
Sector is layered over the native file system (vs being a block-based file system).,[object Object]
(x) UDT Scalable TCP HighSpeed TCP AIMD (TCP NewReno) x Alternatives to TCP – Decreasing Increases AIMD Protocols increase of packet sending rate x decrease factor
UDT Makes Wide Area Clouds Possible Using UDT, Sector can take advantage of wide area high performance networks (10+ Gbps) 20 10 Gbps per application
What About Security? 21
Idea 3: Add Security From the Start Security Server Security server maintains information about users and slaves. User access control: password and client IP address. File level access control. Messages are encrypted over SSL. Certificate is used for authentication. Sector is HIPAA capable. Master Client SSL SSL AAA data Slaves
For More Information About Sector YunhongGu and Robert L Grossman, Sector and Sphere: Towards Simplified Storage and Processing of Large Scale Distributed Data, Philosophical Transactions of the Royal Society A, Volume 367, Number 1897, pages 2429--2445, 2009 http://arxiv.org/abs/0809.1181 http://rsta.royalsocietypublishing.org/content/367/1897/2429 23

Contenu connexe

Tendances (20)

DBMS Unit IV and V Material
DBMS Unit IV and V MaterialDBMS Unit IV and V Material
DBMS Unit IV and V Material
 
assignment3
assignment3assignment3
assignment3
 
Hadoop Cluster Analysis and Assessment
Hadoop Cluster Analysis and AssessmentHadoop Cluster Analysis and Assessment
Hadoop Cluster Analysis and Assessment
 
Upper layer protocol
Upper layer protocolUpper layer protocol
Upper layer protocol
 
Ds1 int (1)
Ds1 int (1)Ds1 int (1)
Ds1 int (1)
 
Memory allocation (4)
Memory allocation (4)Memory allocation (4)
Memory allocation (4)
 
Ch20
Ch20Ch20
Ch20
 
Map Reduce basics
Map Reduce basicsMap Reduce basics
Map Reduce basics
 
2.introduction to hdfs
2.introduction to hdfs2.introduction to hdfs
2.introduction to hdfs
 
Allocation and free space management
Allocation and free space managementAllocation and free space management
Allocation and free space management
 
Network topology for ha
Network topology for haNetwork topology for ha
Network topology for ha
 
RAID Levels
RAID LevelsRAID Levels
RAID Levels
 
Google BigTable
Google BigTableGoogle BigTable
Google BigTable
 
Hadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesHadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologies
 
Tape Access Optimization With TReqS
Tape Access Optimization With TReqSTape Access Optimization With TReqS
Tape Access Optimization With TReqS
 
Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)
 
The spatiotemporal RDF store Strabon
The spatiotemporal RDF store StrabonThe spatiotemporal RDF store Strabon
The spatiotemporal RDF store Strabon
 
Webinar: 3 Steps to Controlling the Secondary Storage Deluge
Webinar: 3 Steps to Controlling the Secondary Storage DelugeWebinar: 3 Steps to Controlling the Secondary Storage Deluge
Webinar: 3 Steps to Controlling the Secondary Storage Deluge
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file system
 
報告
報告報告
報告
 

Similaire à Sector - Presentation at Cloud Computing & Its Applications 2009

My Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataMy Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataRobert Grossman
 
sector-sphere
sector-spheresector-sphere
sector-spherexlight
 
My Other Computer is a Data Center (2010 v21)
My Other Computer is a Data Center (2010 v21)My Other Computer is a Data Center (2010 v21)
My Other Computer is a Data Center (2010 v21)Robert Grossman
 
Open Cloud Consortium: An Update (04-23-10, v9)
Open Cloud Consortium: An Update (04-23-10, v9)Open Cloud Consortium: An Update (04-23-10, v9)
Open Cloud Consortium: An Update (04-23-10, v9)Robert Grossman
 
TeraGrid Communication and Computation
TeraGrid Communication and ComputationTeraGrid Communication and Computation
TeraGrid Communication and ComputationTal Lavian Ph.D.
 
Large Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster ReliefLarge Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster ReliefRobert Grossman
 
BWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 PresentationBWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 Presentationlilyco
 
ZCloud Consensus on Hardware for Distributed Systems
ZCloud Consensus on Hardware for Distributed SystemsZCloud Consensus on Hardware for Distributed Systems
ZCloud Consensus on Hardware for Distributed SystemsGokhan Boranalp
 
Cloud Camp Milan 2K9 Telecom Italia: Where P2P?
Cloud Camp Milan 2K9 Telecom Italia: Where P2P?Cloud Camp Milan 2K9 Telecom Italia: Where P2P?
Cloud Camp Milan 2K9 Telecom Italia: Where P2P?Gabriele Bozzi
 
CloudCamp Milan 2009: Telecom Italia
CloudCamp Milan 2009: Telecom ItaliaCloudCamp Milan 2009: Telecom Italia
CloudCamp Milan 2009: Telecom ItaliaGabriele Bozzi
 
SECURING DATA TRANSFER IN THE CLOUD THROUGH INTRODUCING IDENTIFICATION PACKET...
SECURING DATA TRANSFER IN THE CLOUD THROUGH INTRODUCING IDENTIFICATION PACKET...SECURING DATA TRANSFER IN THE CLOUD THROUGH INTRODUCING IDENTIFICATION PACKET...
SECURING DATA TRANSFER IN THE CLOUD THROUGH INTRODUCING IDENTIFICATION PACKET...IJNSA Journal
 
Cluster Computing
Cluster ComputingCluster Computing
Cluster ComputingNIKHIL NAIR
 
Sector Cloudcom Tutorial
Sector Cloudcom TutorialSector Cloudcom Tutorial
Sector Cloudcom Tutoriallilyco
 
IRJET- Collaborative Network Security in Data Center for Cloud Computing
IRJET-  	  Collaborative Network Security in Data Center for Cloud ComputingIRJET-  	  Collaborative Network Security in Data Center for Cloud Computing
IRJET- Collaborative Network Security in Data Center for Cloud ComputingIRJET Journal
 
grid mining
grid mininggrid mining
grid miningARNOLD
 
seed block algorithm
seed block algorithmseed block algorithm
seed block algorithmDipak Badhe
 
Parallel_and_Cluster_Computing.ppt
Parallel_and_Cluster_Computing.pptParallel_and_Cluster_Computing.ppt
Parallel_and_Cluster_Computing.pptMohmdUmer
 

Similaire à Sector - Presentation at Cloud Computing & Its Applications 2009 (20)

My Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataMy Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big Data
 
sector-sphere
sector-spheresector-sphere
sector-sphere
 
My Other Computer is a Data Center (2010 v21)
My Other Computer is a Data Center (2010 v21)My Other Computer is a Data Center (2010 v21)
My Other Computer is a Data Center (2010 v21)
 
Grid computing & its applications
Grid computing & its applicationsGrid computing & its applications
Grid computing & its applications
 
Open Cloud Consortium: An Update (04-23-10, v9)
Open Cloud Consortium: An Update (04-23-10, v9)Open Cloud Consortium: An Update (04-23-10, v9)
Open Cloud Consortium: An Update (04-23-10, v9)
 
TeraGrid Communication and Computation
TeraGrid Communication and ComputationTeraGrid Communication and Computation
TeraGrid Communication and Computation
 
Large Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster ReliefLarge Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster Relief
 
BWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 PresentationBWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 Presentation
 
ZCloud Consensus on Hardware for Distributed Systems
ZCloud Consensus on Hardware for Distributed SystemsZCloud Consensus on Hardware for Distributed Systems
ZCloud Consensus on Hardware for Distributed Systems
 
Cloud Camp Milan 2K9 Telecom Italia: Where P2P?
Cloud Camp Milan 2K9 Telecom Italia: Where P2P?Cloud Camp Milan 2K9 Telecom Italia: Where P2P?
Cloud Camp Milan 2K9 Telecom Italia: Where P2P?
 
CloudCamp Milan 2009: Telecom Italia
CloudCamp Milan 2009: Telecom ItaliaCloudCamp Milan 2009: Telecom Italia
CloudCamp Milan 2009: Telecom Italia
 
SECURING DATA TRANSFER IN THE CLOUD THROUGH INTRODUCING IDENTIFICATION PACKET...
SECURING DATA TRANSFER IN THE CLOUD THROUGH INTRODUCING IDENTIFICATION PACKET...SECURING DATA TRANSFER IN THE CLOUD THROUGH INTRODUCING IDENTIFICATION PACKET...
SECURING DATA TRANSFER IN THE CLOUD THROUGH INTRODUCING IDENTIFICATION PACKET...
 
Cluster Computing
Cluster ComputingCluster Computing
Cluster Computing
 
Sector Cloudcom Tutorial
Sector Cloudcom TutorialSector Cloudcom Tutorial
Sector Cloudcom Tutorial
 
IRJET- Collaborative Network Security in Data Center for Cloud Computing
IRJET-  	  Collaborative Network Security in Data Center for Cloud ComputingIRJET-  	  Collaborative Network Security in Data Center for Cloud Computing
IRJET- Collaborative Network Security in Data Center for Cloud Computing
 
Grid Computing
Grid ComputingGrid Computing
Grid Computing
 
sdnppt.pdf
sdnppt.pdfsdnppt.pdf
sdnppt.pdf
 
grid mining
grid mininggrid mining
grid mining
 
seed block algorithm
seed block algorithmseed block algorithm
seed block algorithm
 
Parallel_and_Cluster_Computing.ppt
Parallel_and_Cluster_Computing.pptParallel_and_Cluster_Computing.ppt
Parallel_and_Cluster_Computing.ppt
 

Plus de Robert Grossman

Some Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your CompanySome Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your CompanyRobert Grossman
 
Some Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data PlatformsSome Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data PlatformsRobert Grossman
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataRobert Grossman
 
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedCrossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedRobert Grossman
 
A Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical ResearchA Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical ResearchRobert Grossman
 
What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?Robert Grossman
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...Robert Grossman
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...Robert Grossman
 
AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016Robert Grossman
 
Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data Robert Grossman
 
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...Robert Grossman
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...Robert Grossman
 
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)Robert Grossman
 
Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)Robert Grossman
 
Practical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsPractical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsRobert Grossman
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? Robert Grossman
 
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Robert Grossman
 
Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Robert Grossman
 
What Are Science Clouds?
What Are Science Clouds?What Are Science Clouds?
What Are Science Clouds?Robert Grossman
 
Adversarial Analytics - 2013 Strata & Hadoop World Talk
Adversarial Analytics - 2013 Strata & Hadoop World TalkAdversarial Analytics - 2013 Strata & Hadoop World Talk
Adversarial Analytics - 2013 Strata & Hadoop World TalkRobert Grossman
 

Plus de Robert Grossman (20)

Some Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your CompanySome Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your Company
 
Some Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data PlatformsSome Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data Platforms
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
 
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedCrossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
 
A Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical ResearchA Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical Research
 
What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
 
AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016
 
Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data
 
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
 
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
 
Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)
 
Practical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsPractical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large Datasets
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care?
 
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
 
Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)
 
What Are Science Clouds?
What Are Science Clouds?What Are Science Clouds?
What Are Science Clouds?
 
Adversarial Analytics - 2013 Strata & Hadoop World Talk
Adversarial Analytics - 2013 Strata & Hadoop World TalkAdversarial Analytics - 2013 Strata & Hadoop World Talk
Adversarial Analytics - 2013 Strata & Hadoop World Talk
 

Dernier

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 

Dernier (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 

Sector - Presentation at Cloud Computing & Its Applications 2009

  • 1. Sector: An Open Source Cloud for Data Intensive Computing Robert Grossman University of Illinois at ChicagoOpen Data Group October 20, 2009
  • 2. Part 1. Sector 2 http://sector.sourceforge.net
  • 3. Sector Overview Sector is fastest open source large data cloud As measured by MalStone & Terasort Sector is easy to program Supports UDFs, MapReduce & Python over streams Sector is secure A HIPAA compliant Sector cloud is being set up Sector is reliable Sector v1.24 has a backup master node server 3
  • 4. About Sector YunhongGu from the Laboratory for Advanced Computing at the University of Illinois at Chicago is the Lead Developer of Sector. Sector is open source (BSD License) and available from sector.sourceforge.net The current version is 1.24a 4
  • 5. Target Configurations Sector is designed to run on racks of commodity computers Typical rack configuration today (Oct, 2009) Rack of 32 quad-core 1U computers Each computer has 4 x 1TB disks Each computer has 1 Gbps connection to a top of a rack switch Sometimes these are called Raywulf clusters 5
  • 6. Google’s Large Data Cloud Compute Services Data Services Storage Services 6 Applications Google’s MapReduce Google’s BigTable Google File System (GFS) Google’s Stack
  • 7. Hadoop’s Large Data Cloud Compute Services Storage Services 7 Applications Hadoop’sMapReduce Data Services Hadoop Distributed File System (HDFS) Hadoop’s Stack
  • 8. Sector’s Large Data Cloud 8 Applications Compute Services Sphere’s UDFs Data Services Sector’s Distributed File System (SDFS) Storage Services UDP-based Data Transport Protocol (UDT) Routing & Transport Services Sector’s Stack
  • 10. Terasort - Sector vsHadoop Performance Sector/Sphere 1.24a, Hadoop 0.20.1 with no replication on Phase 2 of Open Cloud Testbed with co-located racks.
  • 11. MalStone (OCC-Developed Benchmark) Sector/Sphere 1.20, Hadoop 0.18.3 with no replication on Phase 1 of Open Cloud Testbed in a single rack. Data consisted of 20 nodes with 500 million 100-byte records / node.
  • 12. How Do You Program A Data Center? 12
  • 13. Idea 1 – Support UDF’s Over Data Center Think of MapReduce as Map acting on (text) records With fixed Shuffle and Sort Followed by Reducing acting on (text) records We generalize this framework as follows: Support a sequence of User Defined Functions (UDF) acting on segments (=chunks) of files. MapReduce is one special case consisting of a user defined Map, a system-defined shuffle and sort, and a user defined reduce In both cases, framework takes care of assigning nodes to process data, restarting failed processes, etc. 13
  • 14. Applying UDF using Sector/Sphere 14 1. Split data Application Sphere Client Input stream SPE SPE SPE 2. Locate & schedule Sphere Processing Engine (SPE) 3. Collect results Output stream
  • 15. Sector Programming Model Sector dataset consists of one or more physical files Sphere applies User Defined Functions over streams of data consisting of data segments Data segments can be data records, collections of data records, or files Example of UDFs: Map function, Reduce function, Split function for CART, etc. Outputs of UDFs can be returned to originating node, written to local node, or shuffled to another node. 15
  • 16. How Do Move Data in a Cloud & Between Clouds? 16 Option 1: Use TCP and close your eyes. Option 2: ?????
  • 17.
  • 18. UDT can take advantage of wide area high performance 10 Gbps network
  • 19. Sector is a wide area distributed file system built over UDT.
  • 20.
  • 21. (x) UDT Scalable TCP HighSpeed TCP AIMD (TCP NewReno) x Alternatives to TCP – Decreasing Increases AIMD Protocols increase of packet sending rate x decrease factor
  • 22. UDT Makes Wide Area Clouds Possible Using UDT, Sector can take advantage of wide area high performance networks (10+ Gbps) 20 10 Gbps per application
  • 24. Idea 3: Add Security From the Start Security Server Security server maintains information about users and slaves. User access control: password and client IP address. File level access control. Messages are encrypted over SSL. Certificate is used for authentication. Sector is HIPAA capable. Master Client SSL SSL AAA data Slaves
  • 25. For More Information About Sector YunhongGu and Robert L Grossman, Sector and Sphere: Towards Simplified Storage and Processing of Large Scale Distributed Data, Philosophical Transactions of the Royal Society A, Volume 367, Number 1897, pages 2429--2445, 2009 http://arxiv.org/abs/0809.1181 http://rsta.royalsocietypublishing.org/content/367/1897/2429 23
  • 26. For Related Information Related information can be found at: blog.rgrossman.com www.rgrossman.com 24