Presentation at OpenStack Days Mountain West sharing lessons Rackspace has learned building and operating the world's largest OpenStack public cloud and some of the world's largest private clouds.
5. RACKSPACE PUBLIC CLOUD
• 6 geographic regions around the globe
• Tens of thousands of hypervisors
• Over 350,000 cores and over 1.2 petabytes of RAM
• Hundreds of thousands of virtual machines
• Several hundred OnMetal instances
• Hundreds of thousands of virtual switch ports
6. KEY COMMUNITY CONTRIBUTIONS TO OPENSTACK
• Nova Cells: the concept for scaling regions to 1,000s of nodes
• Tempest: the initial QA test framework for OpenStack
• OpenStack-Ansible: the deployment project
• Magnum: the container management system
• Swift: rewriting the object server in Go to meet hyper-scale demands
• Barbican: the key management service
7. RACKSPACE’S LEADERSHIP
• Freely share lessons learned
• Contribute code and ideas to the OpenStack project
• Open source tools based on what we use to operate our clouds
8. OPENSTACK INNOVATION CENTER (OSIC)
THREE PILLARS
1. Train the next generation of OpenStack contributors
2. Contribute to the removal of enterprise barriers to OpenStack adoption
3. Provide an avenue for operational scale testing to the OpenStack community
10. ORIGIN STORY
• Before OpenStack, there was Slicehost
• Scaling limits led to OpenStack
• Xen is Slicehost’s legacy in the Rackspace Public Cloud
• Tens of thousands of existing customers meant starting at scale
• Private Cloud started with a clean sheet of paper
11. RACKSPACE’S APPROACH
• Continuously upgrade our public cloud
– Deploy upstream OpenStack code
– Patch regularly
• Only use projects stable enough to run in production at scale
• Don’t reinvent the wheel
• Change code in production to meet scale requirements
– Some bugs surface only in production
– Contribute back upstream when appropriate
• Move ahead of community when necessary
– Create service with internal software
– Contribute code and lessons learned to project
– Switch to project code when ready
13. PARTITION YOUR CLOUD
• Why cells?
– Scaling: DB & RabbitMQ
– Reduced failure impact: broadcast domains / Nova
– Multiple compute flavors: SSD
– Multiple hardware types
• How we use cells (see the sketch below)
– ~100 hosts per cell: bounds scaling and failure impact
– Multiple cells per region: limits failure impact
– Group the same flavor types together
– Group servers from the same vendor: enables live migration
• Takeaways
– Use cells from day 1
– Plan for scale
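A minimal sketch of the cell-planning rules above, assuming a simple host inventory; `Host`, `plan_cells`, and the 100-host cap constant are illustrative names, not Rackspace tooling:

```python
from collections import defaultdict
from dataclasses import dataclass

MAX_HOSTS_PER_CELL = 100  # ~100 hosts per cell bounds scaling and failure impact


@dataclass
class Host:
    name: str
    vendor: str        # same-vendor grouping keeps live migration possible
    flavor_class: str  # e.g. "general" or "ssd"; one flavor class per cell


def plan_cells(hosts):
    """Partition hosts into cells: group by (vendor, flavor class),
    then split each group into cells of at most MAX_HOSTS_PER_CELL."""
    groups = defaultdict(list)
    for host in hosts:
        groups[(host.vendor, host.flavor_class)].append(host)

    cells = {}
    for (vendor, flavor), members in sorted(groups.items()):
        for start in range(0, len(members), MAX_HOSTS_PER_CELL):
            index = start // MAX_HOSTS_PER_CELL + 1
            cells[f"{flavor}-{vendor}-cell{index}"] = members[start:start + MAX_HOSTS_PER_CELL]
    return cells


# Example: 250 same-vendor SSD hosts land in cells of 100, 100, and 50.
inventory = [Host(f"c{i:03d}", "vendorA", "ssd") for i in range(250)]
for cell_name, members in plan_cells(inventory).items():
    print(cell_name, len(members))
```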
14. ABSTRACT YOUR CONTROL PLANE
• iNova: ancestor to TripleO
– Seed servers in each region
– Seed servers & cells run on VMs
– Easy to deploy, tear down, and redeploy services
– React to issues quickly (e.g., spikes)
• Virtualized compute nodes (see the sketch below)
– nova-compute runs as a VM on the compute node
– Limits the impact of a compute node failure
– Reboot the compute node without rebooting the hypervisor
– Security isolation
• Takeaways
– Explore TripleO (Red Hat OpenStack)
– Containerize your control plane (OSA)
– Protect your control plane: use HA
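A minimal sketch of the virtualized-compute-node idea, using the libvirt Python bindings to reboot only the service VM that runs nova-compute; the domain name and connection URI are assumptions for illustration, not Rackspace's actual layout:

```python
import libvirt  # pip install libvirt-python

# Hypothetical name of the service VM that runs nova-compute on each
# hypervisor; the real naming scheme is deployment-specific.
SERVICE_VM = "nova-compute-vm"


def restart_compute_service_vm(hypervisor_uri):
    """Reboot only the nova-compute service VM, leaving the hypervisor
    and every customer guest on it untouched."""
    conn = libvirt.open(hypervisor_uri)
    try:
        dom = conn.lookupByName(SERVICE_VM)
        dom.reboot()  # graceful reboot of the service VM only
        guests = [d.name() for d in conn.listAllDomains()
                  if d.name() != SERVICE_VM]
        print(f"rebooted {SERVICE_VM}; {len(guests)} guests untouched")
    finally:
        conn.close()


# Example: connect to a hypervisor over SSH (URI is illustrative).
restart_compute_service_vm("qemu+ssh://host42.example.com/system")
```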
15. AUTOMATE EVERYTHING
• Operator error is more common than software failure
• Automation = making time
• OpenStack-Ansible (see the sketch below)
– Encodes recommended practices
– Reference architecture (RA) for Rackspace Private Cloud
– Highly customizable
– Great community support
• Takeaways
– Automation starts on day 1
– Pick an appropriate tool and run with it
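As one concrete form of that automation, a minimal sketch that runs the three standard OpenStack-Ansible playbooks in their documented order and stops at the first failure; the playbook directory follows OSA's default layout and may differ in your deployment:

```python
import subprocess
import sys

# The three top-level OpenStack-Ansible playbooks, in the order the
# OSA documentation runs them.
PLAYBOOKS = ["setup-hosts.yml", "setup-infrastructure.yml", "setup-openstack.yml"]


def run_deploy(playbook_dir="/opt/openstack-ansible/playbooks"):
    for playbook in PLAYBOOKS:
        print(f"==> running {playbook}")
        # openstack-ansible is OSA's wrapper around ansible-playbook;
        # it picks up the deployment's inventory and variables.
        result = subprocess.run(["openstack-ansible", playbook], cwd=playbook_dir)
        if result.returncode != 0:
            sys.exit(f"{playbook} failed (rc={result.returncode}); fix and re-run")


if __name__ == "__main__":
    run_deploy()
```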
16. USE FLEET MANAGEMENT
• Failure is inevitable at scale
• We created tools to manage the fleet (see the sketch below)
– Auditor: monitors for rules compliance
– Resolver: automates tasks based on events
– Use cases
• Upgrades and patches: live-patching a Xen vulnerability
• Maintenance: live migration
• Takeaways
– Focus on service availability over component availability
– You can’t manage what you don’t know
– Leverage live migration
– Check out Project Craton
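A toy sketch of the Auditor/Resolver split described above; every rule, event kind, and handler here is invented for illustration (Rackspace's internal tools and Project Craton are far richer):

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str  # e.g. "kernel_outdated" or "host_degraded"
    host: str


def audit(inventory, rules):
    """Auditor: check every host against compliance rules and emit an
    event for each rule a host violates."""
    return [Event(kind, host)
            for host, facts in inventory.items()
            for kind, rule in rules.items()
            if not rule(facts)]


def resolve(events, handlers):
    """Resolver: dispatch each event to its remediation handler."""
    for event in events:
        handlers[event.kind](event)


# Invented rules: a rule returns True when the host is compliant.
# (A real check would parse versions properly, not compare strings.)
rules = {
    "kernel_outdated": lambda facts: facts["kernel"] >= "4.4",
    "host_degraded": lambda facts: facts["healthy"],
}

# Invented handlers standing in for live patching and live migration.
handlers = {
    "kernel_outdated": lambda ev: print(f"live-patch {ev.host}"),
    "host_degraded": lambda ev: print(f"live-migrate guests off {ev.host}"),
}

inventory = {
    "host01": {"kernel": "4.4", "healthy": True},
    "host02": {"kernel": "3.13", "healthy": False},
}
resolve(audit(inventory, rules), handlers)
```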
17. RESOURCES
• Rackspace Public Cloud
https://www.rackspace.com/cloud
• Rackspace Private Cloud
https://www.rackspace.com/cloud/private/openstacksolutions
• OpenStack Innovation Center
https://osic.org/
• Rackspace Blog
http://blog.rackspace.com/
• Rackspace Videos at OpenStack Summits
https://www.youtube.com/user/OpenStackFoundation/playlists
• Project Craton
https://github.com/openstack/craton