Openstack In Real Life
1. Openstack Clouds IRL
Paul Guth
Cloudscaling
July 26th, 2012
LA #openstack meetup
2. Me
• @pgutheb
• wordpress.com/constructolution
• slideshare.net/pgutheb
• paul | at | cloudscaling - dot - com
3. Stuff I will say today
• Context setting
• What do you do before you build your Openstack cloud
• What do you do while you’re building your Openstack
cloud
• What do you do after you build your Openstack cloud
4. What is Openstack?
• Open source cloud management system (IaaS)
• Five components:
• Nova - VMs
• Swift - Object Storage
• Glance - Images
• Horizon - Dashboard/UI
• Keystone - Authentication
• Great Openstack architecture overview from Ken Pepple
7. What is a private cloud?
• Blahblahblahblahblahblahblah
• “I am not a dictionary, I am a free man!” - Number 6
14. What Have We Done?
• Private cloud deployment for BigCo (not a Web co)
• ~10 distinct internal groups using it
• So it exercises multi-tenant features
• Being used for new cloud-y apps
• Multiple sites set up as independent regions (1 AZ/region)
• “Several dozen” compute nodes per site (VM capacity of
“a few thousand”)
• One site containing Swift
• Integrated into existing IT Ops processes
15. Plan Your Plan
• Which version of Openstack?
• What hardware do you use?
• What Operating System?
17. What’s in a version?
• Openstack versions:
• Bexar (Feb 2011) - Danger, Will Robinson!
• Cactus (Apr 2011) - Danger, Will Robinson!
• Diablo-release (Sept 2011)
• Diablo-stable (Oct 2011) (-final?)
• Essex (Apr 2012) (you are here)
• Folsom (Sept 2012)
18. What do you put it on?
• “There are more x86 server platforms in heaven and
earth than are dreamt of in your philosophy” -
Shakespeare
• At scale, you’ll have a huge number of compute nodes, a
reasonable number of storage nodes, and a small
number of other nodes - so compute node platform
generally dominates the decision
19. Server Hardware
• Compute nodes
• Lots of RAM (64G+)
• Enough CPU (cost/benefit)
• Enough disk (more spindles = better)
• Model the workloads you want to offer to find the right
hardware
• Storage nodes (object servers, not proxies)
• Lots of spindles
• Enough space
• Server memory/CPU not as critical
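One way to "model the workloads" is a back-of-the-envelope calculation of how many VMs of a given size fit on a candidate compute node. The sketch below is illustrative only: the `Resources` type, the `vms_per_node` helper, the overcommit ratio and the RAM reserved for the host are all assumptions, not figures from this deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical description of a node or a VM size (flavor)."""
    ram_gb: float
    vcpus: float
    disk_gb: float


def vms_per_node(node: Resources, flavor: Resources,
                 cpu_overcommit: float = 1.0,
                 reserved_ram_gb: float = 0.0) -> int:
    """Return how many VMs of `flavor` fit on `node`.

    The scarcest resource decides. `cpu_overcommit` and `reserved_ram_gb`
    are assumed tuning knobs, not values taken from the talk.
    """
    usable_ram = node.ram_gb - reserved_ram_gb
    limits = (
        usable_ram // flavor.ram_gb,                    # RAM-bound count
        (node.vcpus * cpu_overcommit) // flavor.vcpus,  # CPU-bound count
        node.disk_gb // flavor.disk_gb,                 # disk-bound count
    )
    return max(0, int(min(limits)))


# Example: a 64 GB, 16-core node with 2 TB of local disk,
# offering 4 GB / 2 vCPU / 80 GB VMs.
node = Resources(ram_gb=64, vcpus=16, disk_gb=2000)
flavor = Resources(ram_gb=4, vcpus=2, disk_gb=80)
count = vms_per_node(node, flavor, cpu_overcommit=4.0, reserved_ram_gb=4)
```

In this example RAM is the binding limit, which fits the "Lots of RAM" advice above. Running the same calculation for each VM size you plan to offer shows which resource you run out of first on each candidate platform.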
20. One Operating System to Rule Them All
• We use Ubuntu
• The vast majority of Openstack development is done
on Ubuntu
• Canonical are big-time bought into Openstack
• Other OSes probably work, YMMV
27. Bootstrapping
• Openstack does need a running OS to be installed on
• Fairy dust not on roadmap until 2014
• Options
• Bootstrap all your nodes off an ISO yourself
• Bootstrap all your nodes off a PXEboot server yourself
• Cobbler, Juju, MAAS (Canonical)
• Crowbar (Dell)
• Open Cloud OS (Cloudscaling)
• PentOS (Piston)
• (don’t forget to burn in your hardware!)
30. The Network
• Typical setups use 3-4 distinct networks
• Hardware management - IPMI, bare-metal mgmt
• Node management - PXE, installation, configuration
• External - Ingress/Egress traffic
• Internal - Admin, Compute->Storage, Compute->Compute
• Internal/External can be high bandwidth
• Hardware/node management are low-bandwidth
• Nova inserts a weird network traffic flow model into your pretty
network architecture diagrams (central node providing NAT and
DHCP for all VMs)
• Creates SPOF leading to ugly failover implementations (for
varying definitions of ugly)
• Creates aggregation point for traffic
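As a planning aid, the four networks can be carved out of one address block so that they never overlap. This is a minimal sketch using Python's standard `ipaddress` module. The `plan_networks` helper, the network names and the `10.20.0.0/16` block are made up for illustration, and a real design may well give the low-bandwidth management networks much smaller ranges.

```python
import ipaddress
from itertools import islice
from typing import Dict, Sequence


def plan_networks(supernet: str, names: Sequence[str],
                  new_prefix: int) -> Dict[str, ipaddress.IPv4Network]:
    """Assign one equally sized, non-overlapping subnet to each name."""
    block = ipaddress.ip_network(supernet)
    subnets = list(islice(block.subnets(new_prefix=new_prefix), len(names)))
    if len(subnets) < len(names):
        raise ValueError("supernet too small for the requested networks")
    return dict(zip(names, subnets))


# Hypothetical addressing plan for the four networks listed above.
plan = plan_networks(
    "10.20.0.0/16",
    ["hardware-mgmt", "node-mgmt", "external", "internal"],
    new_prefix=18,
)
for name, net in plan.items():
    print(f"{name:14} {net}  ({net.num_addresses} addresses)")
```

Writing the plan down as data keeps it reviewable and makes it easy to feed into whatever configuration management you use.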
31. Tomato, Tomato
• You have choices wrt configuration when you deploy
Openstack
• APIs
• S3 vs Swift
• EC2 vs Nova
• Queuing models (RabbitMQ vs 0MQ)
• Network models
• nova-volume (EBS) vs DAS
33. Have Cloud, Will Operate
• Now that you’ve got it, it’s got to run
36. You Bought It, You Break It
• Configuration Management
• Capacity Management
• User Management
• Remote Access
• High Availability
• Upgrading
• Troubleshooting
• Monitoring
• Auditing
• Yes, you need to do these, and no, Openstack doesn’t do them
37. What We Do
• Configuration Management, User Management = Chef
• Capacity Management = metrics via collectd
• Remote Access = dedicated/hardened SSH server, VPN
• Troubleshooting = rsyslog centralized logging, collectd
• Monitoring = Nagios, Zabbix
• Auditing = tcpspy, auditd
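To make the monitoring piece concrete, here is a sketch of the threshold logic behind a Nagios-style check. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the standard Nagios plugin convention. Everything else is hypothetical: the `check_threshold` helper, the metric name and the thresholds are not the checks used in this deployment.

```python
from typing import Optional, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_threshold(label: str, value: Optional[float],
                    warn: float, crit: float) -> Tuple[int, str]:
    """Map a metric to a Nagios status code and a one-line message.

    Higher values are worse; `value` is None when no sample is available.
    """
    if value is None:
        return UNKNOWN, f"UNKNOWN - no data for {label}"
    detail = f"{label}={value} (warn={warn}, crit={crit})"
    if value >= crit:
        return CRITICAL, f"CRITICAL - {detail}"
    if value >= warn:
        return WARNING, f"WARNING - {detail}"
    return OK, f"OK - {detail}"


# Example with a made-up metric: percentage of RAM committed to VMs on a node.
status, message = check_threshold("ram_committed_pct", 91.0, warn=80, crit=90)
print(message)
```

A real plugin would print the message and exit with `status`. The value could come from the same collectd metrics used for capacity management.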
38. High Availability
• For Nova, several services can be redundant just by
running more than one instance, like the scheduler
• nova-compute and swift-object-server redundancy at the
node level (any node can fail without it being an
emergency) - we use that for the services we build where
possible
• Replace RabbitMQ with 0MQ, and nova's single MySQL DB with a
MySQL cluster behind UCARP failover (similar for glance)
• Some services need TCP loadbalancing, e.g. the nova-api
and swift-proxy services
• We do this with Quagga (ECMP OSPF) + Pound (also SSL)
• nova-network is tricky - we are taking it out of the critical
path of NAT and DHCP, replacing it with our own plug-in
networking driver for Openstack
43. Patches/Upgrades
• just run apt-get upgrade
• JUST KIDDING
• Need to consider persistent data - MySQL DB and queues
• Can use live-ish migration to take load off of compute servers
and upgrade individually
• As long as you have 3+ zones, you can just take swift nodes
offline to upgrade individually
• Central services require careful planning and testing
• Hope for tolerance of multiple versions at once
• So far we’ve been able to do everything without impacting VMs
or objects
• ^^ That is not a guarantee!
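The Swift point can be sketched as a simple plan: take one node offline at a time, working through one zone before moving to the next, and refuse to start with fewer than three zones. The helpers below are hypothetical, and they assume each object has three replicas placed in distinct zones. This is a sketch of the idea, not a description of the upgrade tooling used here.

```python
from typing import Dict, List, Tuple


def rolling_upgrade_plan(nodes_by_zone: Dict[str, List[str]],
                         min_zones: int = 3) -> List[Tuple[str, str]]:
    """Return (zone, node) steps: one node offline at a time, zone by zone."""
    if len(nodes_by_zone) < min_zones:
        raise ValueError(f"need at least {min_zones} zones for a rolling upgrade")
    return [(zone, node)
            for zone in sorted(nodes_by_zone)
            for node in nodes_by_zone[zone]]


def replicas_online(replica_zones: List[str], offline_zone: str) -> int:
    """Count replicas of one object still reachable while a zone is down."""
    return sum(1 for zone in replica_zones if zone != offline_zone)


# Made-up zone and node names.
steps = rolling_upgrade_plan({"z2": ["s3"], "z1": ["s1", "s2"], "z3": ["s4"]})
worst_case = replicas_online(["z1", "z2", "z3"], offline_zone="z2")
```

Even if a whole zone were down, an object with one replica in each of three zones would still have two copies reachable. That is the property that makes node-by-node upgrades low-risk under this assumption.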
44. Who Ops your Ops?
• You probably have an existing IT Ops org and processes
that will interface with your cloud, if not run it directly
• Cloud is a different paradigm, requiring different Ops
perspectives and potentially skillsets
• Recommendation
• Build a dedicated Cloud Ops team
• Build bridges from that team to existing teams from
day one
49. Summary
• You can build and run a private cloud on Openstack
today
• It does require a fair bit of clue and experience
• To be reliable and scalable, you have to add lots of stuff
around the edges and configure it all the right way
Thank you!
@pgutheb / paul at cloudscaling dot com