1. © MIRANTIS 2013 PAGE 1
Scaling Puppet
Deployments
Matthew Mosesohn
Senior Deployment Engineer
2. Configure by hand
● Insert media into system
● Install OS
● Install software
● Configure software
● Verify
● Done?
3. Automate
● PXE installation
  – Imaging
  – Cobbler
  – Foreman
  – Razor
● Configuration
  – Puppet
  – Chef
  – Salt
  – Ansible
4. Puppet
● Powerful tool written in Ruby
● Extensible
● Built-in syntax checking
● Large community
● Used by many major companies, including:
  – Google
  – Cisco
  – PayPal
  – VMware
5. Our purpose
● FUEL is a tool designed to deploy OpenStack
● FUEL consists of:
  – Astute: orchestration library built on MCollective
  – Library: Puppet manifests
  – Web: Python web app to deliver a rich user experience
  – Cobbler: provisioning of bare metal
  – Bootstrap: lightweight install environment for node discovery
6. Tiny example
● 1 master Cobbler and Puppet server
● 2-node OpenStack cluster
● OS deployment: 5 minutes
● Puppet configuration: 15 minutes each
● Total time: ~40 minutes
7. Typical example
● 1 master Cobbler and Puppet server
● 10-node OpenStack cluster
● OS deployment: 30 minutes total
● Puppet configuration: 15 minutes each
● Total time: ~2 hr 45 min
8. Stretching the limits
● 1 master Cobbler and Puppet server
● 100-node OpenStack cluster
● OS deployment: ?? minutes total
● Puppet configuration: 15 minutes each
● Total time: maybe 24 hours?
9. How to get to 1,000?
● Physical limitations of physical disks
● Physical limitations of network
● Puppet limitations
● Cobbler limitations
● Messaging/orchestration limitations
● Durability/patience of client applications
10. Approach: Scale the server!
● Pure speed. Don't care about anything else.
● Buy an expensive system with 2 SSDs in RAID-0, 12 cores, 256 GB of memory, and bonded NICs
● Peak I/O: ~800 MB/s
11. How crowded is your network segment?
● More than 500 nodes on one network is bad
● Broadcast traffic will hinder normal traffic
● One lost packet means TFTP must fail and start over
● Make a second network and set up a DHCP relay
● Update your PXE server's DHCP configuration
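For illustration, the relayed second segment might look like this in ISC dhcpd's dhcpd.conf (all subnets, addresses, and the boot filename are invented examples):

```
# Original PXE segment: the DHCP/PXE server lives here.
subnet 10.0.0.0 netmask 255.255.255.0 {
  range 10.0.0.100 10.0.0.200;
  next-server 10.0.0.1;        # TFTP/PXE server
  filename "pxelinux.0";
}

# Second segment: nodes reach the PXE server through the
# router's DHCP relay, so dhcpd only needs a matching subnet.
subnet 10.0.1.0 netmask 255.255.255.0 {
  range 10.0.1.100 10.0.1.200;
  option routers 10.0.1.1;
  next-server 10.0.0.1;
  filename "pxelinux.0";
}
```

Splitting the broadcast domain this way keeps DHCP/PXE chatter from one half of the cluster off the other half's wire.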
12. © MIRANTIS 2013 PAGE 12
err: Could not retrieve catalog
from remote server: Connection
refused connect(2)
13. Puppet load
● Catalog compile time: 12 s per node
● Serve files: 12 MB per host
● Receive and store a 500 KB report in YAML format
● Store in PuppetDB
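Multiplying those per-node figures out shows why a single master chokes at 1,000 nodes. A rough back-of-envelope sketch (illustrative arithmetic only; real runs overlap work across nodes):

```python
# Aggregate load on one Puppet master for a 1,000-node run,
# using the per-node figures from the slide above.
nodes = 1000
compile_s = 12      # catalog compile time per node, seconds
files_mb = 12       # files served per host, MB
report_kb = 500     # YAML report stored per host, KB

compile_hours = nodes * compile_s / 3600
total_files_gb = nodes * files_mb / 1024
total_reports_mb = nodes * report_kb / 1024

print(f"compile time: {compile_hours:.1f} h")     # ~3.3 h of pure compilation
print(f"files served: {total_files_gb:.1f} GB")   # ~11.7 GB over the wire
print(f"reports:      {total_reports_mb:.0f} MB") # ~488 MB of YAML to parse and store
```

Over three hours of CPU time on catalog compilation alone, before any file serving or report storage, is why the "Connection refused" errors on the previous slide appear.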
14. How to avoid failure
● IPMI control of all nodes (expensive)
● Orchestration that can reset a host if it gets “stuck” along the way
● Staggered approach to avoid overloading the master
15. How the pros do it
● Large US bank
● 2 Puppet CA servers
● 3 Puppet catalog masters
● DNS round robin for catalog servers
● 2,000 hosts
● Must stagger initial deployments
16. Conclusion
● Not fast enough
● Too much data
● Still a bottleneck
● Expensive hardware
17. Approach: Ditch the Puppetmaster!
● Still need to provision a base OS
● Still need a package repository
● Still need to be fast
● Still need some “brain” to identify servers
18. Speed up provisioning
● Install every nth server to serve as a provisioning mirror, all in RAM
● TFTP still must come from the master server, but 30 minutes of pain for bootstrap is okay
● HTTP for OS installation can be balanced via DNS round robin to each mirror
● Provision the mirror hosts last
19. Package repository
● The YUM repository should be located close to the cluster
● Mirror via Cobbler/Foreman
● Or host it somewhere in your organization with fast disks
20. External Node Classifiers
● An arbitrary script that tells nodes what resources to install
● ENC providers include:
  – Puppet Dashboard
  – Foreman
  – Hiera
  – LDAP
  – Amazon CloudFormation
  – YAML file carried by pigeon
21. External Node Classifiers
● What they can provide:
  – Puppet master hostname
  – Environment name (production, devel, stage)
  – Classes to use
  – Puppet facts needed for installation
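An ENC is just an executable: Puppet calls it with the node's certname as the only argument and reads YAML from stdout with `classes`, `environment`, and `parameters` keys. A minimal Python sketch (the role-to-class mapping and class names below are invented for illustration):

```python
#!/usr/bin/env python
# Minimal External Node Classifier sketch. Puppet invokes this
# with the node's certname and parses the YAML printed to stdout.
import json
import sys

# Hypothetical mapping from certname prefix to Puppet classes.
ROLES = {
    "controller": ["nova::api", "keystone"],
    "compute": ["nova::compute"],
}

def classify(certname):
    role = certname.split("-")[0]
    return {
        "environment": "production",
        "classes": ROLES.get(role, []),
        "parameters": {"role": role},
    }

if __name__ == "__main__":
    certname = sys.argv[1] if len(sys.argv) > 1 else "unknown"
    # JSON is a subset of YAML, so this output parses as YAML.
    print(json.dumps(classify(certname), indent=2))
```

Point the master at it with `node_terminus = exec` and `external_nodes = /path/to/enc.py` in puppet.conf.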
22. Getting Puppet manifests to nodes
● How do you place manifests on a node?
● Without relying on one host, pick the most robust system available
23. Getting Puppet manifests to nodes
● Plain Git
  – Version-controlled
  – Widely implemented
  – Simple to get started
  – Fits into Puppet's environment structure via branches
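The branch-per-environment pattern works because Puppet 3-era configuration lets `$environment` be interpolated into paths; a sketch of the puppet.conf fragment (the checkout location is an example):

```
# Each Git branch is checked out under
# /etc/puppet/environments/<branch-name> by a post-receive hook
# or cron job; $environment selects the matching tree.
[main]
modulepath = /etc/puppet/environments/$environment/modules
manifest   = /etc/puppet/environments/$environment/manifests/site.pp
```

A node requesting environment "devel" then gets the manifests from the devel branch's checkout, with no extra server-side logic.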
24. Getting Puppet manifests to nodes
● Puppet Librarian
  – Created by Tim “Rodjek” Sharpe from GitHub
  – Flexible manifest sources
  – Can specify a Puppet “forge”
  – Can retrieve from Git repositories
  – Dependency handling
  – Version specification optional
  – Creates a local Git repository to track changes
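librarian-puppet is driven by a Puppetfile; a hypothetical example (module names, versions, and the Git URL are invented):

```ruby
# Puppetfile: declares where modules come from; `librarian-puppet
# install` resolves dependencies and populates ./modules.
forge "https://forge.puppetlabs.com"

mod "puppetlabs/stdlib", "4.1.0"   # pinned forge module
mod "puppetlabs/apache"            # version left unpinned
mod "nova",
  :git => "https://github.com/example/puppet-nova.git",
  :ref => "stable"                 # branch, tag, or SHA
```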
25. Getting Puppet manifests to nodes
● RPM format
  – Technique used by Sam Bashton
  – Versioned as well
  – As easy to deploy as any other package
  – Requires a clever build process
26. Getting Puppet manifests to nodes
● RPM format magic
  – Jenkins job takes Git code with manifests
  – Runs puppet-lint on all Puppet code
  – Creates a tarball of Puppet manifests and Hiera data
  – Wraps it inside a package with a new version number
  – Pushes the ready package to a software repository
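The packaging step boils down to a small RPM .spec; a rough sketch with invented names and paths (the version would be stamped by the Jenkins job):

```
Name:      site-puppet-manifests
Version:   1.0.42
Release:   1%{?dist}
Summary:   Versioned Puppet manifests and Hiera data
License:   Proprietary
Source0:   manifests.tar.gz
BuildArch: noarch

%description
Puppet manifests and Hiera data, linted and built by Jenkins.

%prep
%setup -q -c

%install
mkdir -p %{buildroot}/etc/puppet
cp -r manifests modules hieradata %{buildroot}/etc/puppet/

%files
/etc/puppet/manifests
/etc/puppet/modules
/etc/puppet/hieradata
```

Nodes then get new manifests with a plain `yum update site-puppet-manifests`, with rollback for free via older package versions.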
27. Running local is better
● Deploying on great new hardware
● Faster catalog build
● No waiting for manifests or uploading reports
● No timeouts or “connection refused” errors
29. Rsyslog
● Scaling rsyslog requires lots of disks, but they don't have to be fast
● Rsyslog can throttle clients effectively
● Clients can hold logs until the server is ready to receive
● Everybody wins
30. Doing the math

Stage                 Before                          After
Bootstrap OS          10 min                          10 min (but that's okay)
Base OS provision     8 hrs (10 concurrent)           30 min to set up 20 mirrors;
                                                      25-40 min to install (200 concurrent);
                                                      30 min to install mirrors
Puppet provisioning   10 d 10 hr (15 min ×            45 min for all 3 controllers, one at a time;
                      1,000 hosts, one at a time)     20 min for compute nodes
Totals                12 days                         2-3 hours
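The table's arithmetic can be checked directly; a small sketch re-adding the figures above (all in minutes):

```python
# Sanity-check the before/after totals from the table.
before_min = 10 + 8 * 60 + 15 * 1000     # bootstrap + base OS + 1,000 serial Puppet runs
after_min = 10 + 30 + 40 + 30 + 45 + 20  # bootstrap + mirror setup + parallel install
                                         # + mirror install + controllers + computes

print(f"before: {before_min / 60 / 24:.1f} days")  # ~10.8 days; the slide rounds up to 12
print(f"after:  {after_min / 60:.1f} hours")       # ~2.9 hours, i.e. the "2-3 hours"
```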
31. References
● http://www.tomshardware.com/reviews/ssd-raid-benchmark,3485-3.html
● http://www.masterzen.fr/2012/01/08/benchmarking-puppet-stacks/
● http://theforeman.org/manuals/1.3/index.html#3.5.5FactsandtheENC
● https://github.com/rodjek/librarian-puppet
● http://www.slideshare.net/PuppetLabs/sam-bashton
32. Ref commands

puppet agent --{summarize,test,debug,evaltrace,noop} | perl -pe 's/^/localtime().": "/e'

Time:
   ....
   Nova paste api ini: 0.02
   Package: 0.03
   Notify: 0.03
   Nova config: 0.10
   File: 0.40
   Exec: 0.56
   Service: 1.39
   Augeas: 1.56
   Total: 11.85
   Last run: 1379522172
   Config retrieval: 7.73