This document discusses Cloudman, cluster management software from Qubole that launches and manages big data clusters on major public clouds. Cloudman automates cluster provisioning, configuration of the big data software stack, and cluster lifecycle management, with the aim of lowering compute costs. The document outlines some of Cloudman's key features and challenges, such as autoscaling clusters based on workload and handling differences between cloud providers.
3. 2015
Big Data in the Cloud
Cloud offers salvation...
▪ Stretches with the workload
▪ Pay-as-you-go
...but brings its own challenges
▪ Moving data to the cloud
▪ Security/Privacy
4. 2015
Qubole as Big Data Service
▪Enables Big Data on the cloud
▪Enterprise ready deployments
▪On major public clouds
▪Simple and Fast
9. 2015
Challenges
▪ Autoscale based on workload
▪ Abstractions to address differences in behavior between cloud providers
Examples
− Image creation and registration
− Configuring clusters
10. 2015
Autoscaling Clusters
▪ Launches automatically when needed
▪ Expands automatically if the load is high
▪ Terminates the cluster when no jobs are running
▪ Removes nodes at a billing boundary
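Removing nodes at a billing boundary means an idle node is kept until the period already paid for is about to end. A minimal sketch of that check follows, assuming an hourly billing period and a five-minute release window; both values and both function names are illustrative assumptions, not Cloudman settings.

```python
def seconds_to_billing_boundary(launched_at: float, now: float,
                                billing_period: float = 3600.0) -> float:
    """Seconds left in the node's current billing period (assumed hourly)."""
    elapsed = now - launched_at
    return billing_period - (elapsed % billing_period)


def should_remove(idle: bool, launched_at: float, now: float,
                  billing_period: float = 3600.0, window: float = 300.0) -> bool:
    """Release an idle node only when its paid-for period is about to end.

    `window` is a hypothetical safety margin: the node is removed if the
    boundary is at most `window` seconds away.
    """
    remaining = seconds_to_billing_boundary(launched_at, now, billing_period)
    return idle and remaining <= window
```

For example, a node launched at t=0 that is idle at t=3400 has 200 seconds left in its hour and would be removed, while the same node at t=1800 would be kept because the rest of the hour is already paid for.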
11. 2015
Cloudman: AutoScaling
[Diagram. Example query: insert overwrite table dest select … from ads join campaigns on … group by …;
Labels: Map Tasks, Reduce Tasks, Demand, Supply, Progress, Master, Slaves, Job Tracker, Cloudman]
13. 2015
Image creation and registration
▪ Image creation
▪ Public images in AWS
▪ Public images not well supported in Azure
▪ Images copied to the user’s account in Azure
14. 2015
Cluster Configuration
▪ Configure credentials
− Storage and compute keys
▪ Configure the big data stack
− Start the appropriate software, for example JobTracker and NameNode on the master, and TaskTracker and DataNode on the slaves
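The two configuration steps can be sketched as a lookup from node role to the daemons named on the slide, plus the credentials each node receives. The function name, dictionary layout, and key names below are hypothetical; only the daemon names and their placement come from the slide.

```python
# Daemon placement as listed on the slide.
DAEMONS_BY_ROLE = {
    "master": ["JobTracker", "NameNode"],
    "slave": ["TaskTracker", "DataNode"],
}


def build_node_config(role: str, storage_key: str, compute_key: str) -> dict:
    """Return an illustrative per-node configuration for the given role."""
    if role not in DAEMONS_BY_ROLE:
        raise ValueError(f"unknown role: {role!r}")
    return {
        "role": role,
        # Credentials the node needs to reach cloud storage and compute APIs.
        "credentials": {"storage_key": storage_key, "compute_key": compute_key},
        # Copy the list so callers cannot mutate the shared mapping.
        "daemons": list(DAEMONS_BY_ROLE[role]),
    }
```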
15. 2015
Optimizing cost of compute in Cloud
▪ Utilize ephemeral compute instances to lower cost
− AWS Spot Instances
− GCE Preemptible VMs
▪ Challenges
− Data loss
− Big data job failures
21. 2015
Architecture
▪ QDS has a user interface, Python and Java SDKs, and APIs that allow users to talk to QDS and analyze data sets without knowing about cluster management.
▪ A QDS user can submit primitive commands to logical clusters.
▪ The middleware layer communicates with the cloud orchestration layer, called Cloudman.
▪ Cloudman is responsible for spinning up clusters in the relevant cloud.
22. 2015
Image creation and registration
▪ One such example is image creation and registration
▪ Procedure
▪ Pre-create a machine image with all the software to be deployed baked into it
▪ Start the cluster machines using this as the underlying image
▪ This saves the time of deploying the software on the nodes after they are up
▪ This process is very different on each cloud provider
23. 2015
Cluster Configuration
▪ Another operation that had to be implemented differently for each cloud
▪ Startup scripts are used to programmatically customize virtual machine instances
▪ AWS and Google Cloud had support for this
▪ Azure did not support automatic execution of this script at VM boot time in the CentOS VMs
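A difference like this can be modeled as a capability flag that the orchestration layer checks before launching a node. The slides do not say how the Azure gap was closed, so the post-boot fallback below is purely an assumption for illustration, as are the class, field, and function names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderCapabilities:
    """Hypothetical description of what a cloud/OS combination supports."""
    name: str
    runs_startup_script_at_boot: bool


def plan_bootstrap(caps: ProviderCapabilities, script: str) -> dict:
    """Decide where the startup script is handed over.

    If the cloud runs startup scripts at boot, pass the script with the
    launch request. Otherwise (assumed fallback) schedule it as an explicit
    step to run once the VM is reachable.
    """
    if caps.runs_startup_script_at_boot:
        return {"launch_args": {"startup_script": script}, "post_boot_steps": []}
    return {"launch_args": {}, "post_boot_steps": [("run_script", script)]}
```

With this shape, the per-cloud difference is confined to one flag, and the code that launches nodes only has to execute whatever `post_boot_steps` contains.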
24. 2015
Autoscaling Clusters
▪ Hadoop clusters in QDS come up automatically when applications that require them are launched
▪ If the load on the cluster is high, then the cluster automatically expands.
▪ Cloudman automatically launches additional nodes which eventually join the running cluster and are able to pick up part of the workload
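The expansion step can be sketched as a comparison between demand (tasks that need a slot) and supply (slots the current nodes provide). This is a simplified illustration rather than Cloudman's actual policy; the function name, its parameters, and the idea of a fixed slot count per node and a node cap are assumptions.

```python
import math


def nodes_to_add(demand_tasks: int, slots_per_node: int,
                 current_nodes: int, max_nodes: int) -> int:
    """Return how many nodes to launch so that supply covers demand.

    demand_tasks: tasks that need a slot (running plus pending).
    slots_per_node: task slots each node provides (assumed uniform).
    max_nodes: hypothetical upper bound on cluster size.
    """
    if slots_per_node <= 0:
        raise ValueError("slots_per_node must be positive")
    needed = math.ceil(demand_tasks / slots_per_node)
    target = min(needed, max_nodes)
    # Never return a negative number: scale-down is handled separately.
    return max(0, target - current_nodes)
```

For example, with 100 tasks, 8 slots per node, 5 current nodes, and a cap of 20 nodes, the cluster needs 13 nodes in total, so 8 more would be launched.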