2. Who am I?
• Joined Citrix OSS team in July 2012
• Associate professor at Clemson University before that
• High Performance Computing, Grid computing
• At CERN in summer 2009/2010, built their first cloud on OpenNebula
• http://sebgoa.blogspot.com
@sebgoa
3. What do I do?
• Apache CloudStack and libcloud committer + PMC member
• Looking at techs and how they work together
• Half dev, half community manager, + half event planner
16. Goals
• Utility computing
• Elasticity of the infrastructure
• On-demand
• Pay as you go
• Multi-tenant
• Programmable access
17. So what…
Let’s assume this is solved.
What is not solved:
- Application deployment
- Application scalability
- Application portability
- Application composability
27. CoreOS
• Linux distribution
• Rolling upgrades
• Minimal OS
• Docker support
• etcd and fleet tools to manage distributed applications based on containers
• Cloud-init support
• Systemd units
36. CoreOS clustering
etcd: HA key-value store
• Raft consensus algorithm
• A write succeeds once a majority of the cluster has committed the update
• e.g. 5 nodes tolerate the failure of 2 nodes
fleet: distributed init system (schedules systemd units in a cluster)
• Submits systemd units cluster-wide
• Affinity, anti-affinity, global “scheduling”
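The majority rule behind the 5-node example can be checked with a few lines of arithmetic. This is a minimal sketch of the quorum calculation only, not of etcd or Raft themselves; the function names are made up for illustration.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Nodes that can fail while a majority is still reachable."""
    return cluster_size - quorum_size(cluster_size)


if __name__ == "__main__":
    for nodes in (1, 3, 4, 5, 7):
        print(nodes, "nodes: quorum", quorum_size(nodes),
              "tolerates", tolerated_failures(nodes), "failure(s)")
```

Note that a 4-node cluster tolerates no more failures than a 3-node one, which is why odd cluster sizes are the usual choice.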
53. Kubernetes on CloudStack
Find a CloudStack cloud that supports
CoreOS
Then use:
https://github.com/runseb/ansible-kubernetes
Based on the Ansible cloudstack module
54. OLD WAY
• Cloud API
• Libcloud startup scripts
• Etcd cluster, 5 nodes
• Discovery service to bootstrap
• Kubernetes cluster, 5 nodes
• Start Kube* services via fleet
• Run guestbook example
PR welcome: https://github.com/runseb/kubernetes-exoscale
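Bootstrapping the etcd cluster through a discovery service usually means passing the same user data to every CoreOS node at boot. The sketch below only renders such a user-data string. It is not taken from the kubernetes-exoscale scripts: the layout is assumed to follow the CoreOS cloud-config format of that era, so check it against your CoreOS release. The function name and the token in the usage example are hypothetical.

```python
import textwrap


def render_cloud_config(discovery_url: str) -> str:
    """Render CoreOS user data that starts etcd and fleet on boot.

    discovery_url is the per-cluster URL from the etcd discovery service.
    The key names below are an assumption based on the CoreOS
    cloud-config layout of that era.
    """
    if not discovery_url.startswith("https://"):
        raise ValueError("expected an https discovery URL")
    return textwrap.dedent(f"""\
        #cloud-config
        coreos:
          etcd:
            discovery: {discovery_url}
            addr: $private_ipv4:4001
            peer-addr: $private_ipv4:7001
          units:
            - name: etcd.service
              command: start
            - name: fleet.service
              command: start
        """)


if __name__ == "__main__":
    # EXAMPLE_TOKEN is a placeholder, not a real discovery token.
    print(render_cloud_config("https://discovery.etcd.io/EXAMPLE_TOKEN"))
```

The resulting string would then be handed to the cloud API as user data when each instance is created.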
55. Cloud (e.g. CloudStack based = exoscale)
[Diagram: three CoreOS nodes, each running Kubernetes (K*) and Docker containers; API calls go to the Kubernetes API]
66. New Distributed systems for:
Large scale datasets
• From scientific instruments
• From Web apps logs
Complex datasets
• Not necessarily large.
Object stores
• S3 clones
67. BigData and map-reduce
• BigData is often associated with HDFS, while Map-Reduce is the algorithm used to parallelize data processing.
• BigData ≠ Map-Reduce ≠ HDFS
• Map-reduce is a way to express embarrassingly parallel work easily.
• You can do Map-Reduce without HDFS.
• e.g. Basho map-reduce on Riak CS
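To make the "Map-Reduce without HDFS" point concrete, here is a minimal sketch of a word count expressed as a map phase and a reduce phase over in-memory chunks. It uses only the Python standard library and a thread pool as a stand-in for a cluster; it is not how Hadoop or Riak run map-reduce, and the function names and sample log lines are invented for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Iterable


def map_chunk(chunk: str) -> Counter:
    """Map phase: count words in one chunk, independently of the others."""
    return Counter(chunk.lower().split())


def merge(left: Counter, right: Counter) -> Counter:
    """Reduce phase: combine two partial counts."""
    return left + right


def word_count(chunks: Iterable[str]) -> Counter:
    # Each chunk is mapped on its own, which is what makes the work
    # embarrassingly parallel; the pool stands in for worker nodes.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge, partials, Counter())


if __name__ == "__main__":
    logs = ["GET /index GET /about", "POST /login GET /index"]
    print(word_count(logs).most_common(2))
```

The storage layer never appears in the code: the chunks could come from HDFS, an object store, or local files.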
80. Clouds and BigData
• Object store + compute IaaS to build EC2+S3
clone
• BigData solutions as storage backends for
image catalogue and large scale instance
storage.
• BigData solutions as workloads to CloudStack
based clouds.
81. EC2, S3 clone
• An open source IaaS with an EC2 wrapper, e.g. OpenNebula, CloudStack
• Deploy an S3-compatible object store separately, e.g. Riak CS
• Two independent distributed systems deployed
Cloud = EC2 + S3
82. Big Data as IaaS backend
“Big Data” solutions can be used as secondary storage in CloudStack.
83. Example
• Open source IaaS + EC2 wrapper, e.g. CloudStack
• Deploy an S3-compatible object store, e.g. Riak CS, Ceph or GlusterFS
• Use S3 as image store
• Your EC2 service is a customer of your S3 service
+ Logstash + Elasticsearch for logs/monitoring
85. A note on Scheduling
• Core problem of computer science
• Knapsack is NP-complete
• Central scheduling has been used for a long time in HPC
• Optimizing the cluster utilization requires multi-level scheduling (e.g. backfill, preemption, etc.)
• Google Omega paper 2013
• Mesos 2009/2011, ASF Dec 2011
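Because exact packing is hard, schedulers typically fall back on heuristics. The sketch below shows one classic heuristic, first-fit decreasing, placing jobs by core count onto identical nodes. It is an illustration of the packing problem only and does not represent what Omega, Mesos, or any HPC scheduler implements; the function name and job data are made up.

```python
from typing import Dict, List, Tuple


def first_fit_decreasing(
    jobs: Dict[str, int], node_capacity: int, node_count: int
) -> Tuple[Dict[str, int], List[str]]:
    """Place jobs (name -> cores) on nodes, largest job first.

    Returns the placement (job -> node index) and the jobs that did
    not fit anywhere. Greedy, so the result is not guaranteed optimal.
    """
    free = [node_capacity] * node_count
    placement: Dict[str, int] = {}
    unplaced: List[str] = []
    for name, cores in sorted(jobs.items(), key=lambda kv: kv[1], reverse=True):
        for node, available in enumerate(free):
            if cores <= available:
                free[node] -= cores
                placement[name] = node
                break
        else:
            # No node had enough free cores for this job.
            unplaced.append(name)
    return placement, unplaced


if __name__ == "__main__":
    jobs = {"a": 6, "b": 5, "c": 4, "d": 3, "e": 2}
    print(first_fit_decreasing(jobs, node_capacity=8, node_count=3))
    print(first_fit_decreasing(jobs, node_capacity=8, node_count=2))
```

With two nodes the 4-core job is left waiting even though 16 cores cover 16 of the 20 requested, which is the kind of gap that backfill and preemption try to close.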
87. Food for thought
Mesos Framework for managing VM ?
Workload sharing in your data-center:
• Big Data
• VM
• Services
Cloud and BigData
88. Conclusions
• Big Data is “catching up”
• Tackle the “big three” head on:
• BigData, Cloud and DevOps
• Add a big data backend to your cloud
from the start
• Provide Big Data services on your cloud
90. Get Involved with Apache CloudStack
Web: http://cloudstack.apache.org/
Mailing Lists: cloudstack.apache.org/mailing-lists.html
IRC: irc.freenode.net:6667 #cloudstack #cloudstack-dev
Twitter: @cloudstack
LinkedIn: www.linkedin.com/groups/CloudStack-Users-Group-3144859
If it didn’t happen on the mailing list, it didn’t happen.
91. The Velocity Conference
Santa Clara, May 27-29
• 2 days of keynotes & sessions
• 1 day of tutorials
• New full-day trainings
• Amazing presenters – Jez Humble,
Patrick Meenan, Mesosphere, Fastly &
more
Use discount code
CLOUDSTACK20 during
registration for 20% off
http://velocityconf.com/velocity2015