1. © MIRANTIS 2013 PAGE 1
Scaling Puppet
Deployments
Matthew Mosesohn
Senior Deployment Engineer
2. Configure by hand
● Insert media into system
● Install OS
● Install software
● Configure software
● Verify
● Done?
3. Automate
● PXE installation
  – Imaging
  – Cobbler
  – Foreman
  – Razor
● Configuration
  – Puppet
  – Chef
  – Salt
  – Ansible
4. Puppet
● Powerful tool written in Ruby
● Extensible
● Built-in syntax checking
● Large community
● Used by many major companies, including:
  – Google
  – Cisco
  – PayPal
  – VMware
5. Our purpose
● FUEL is a tool designed to deploy OpenStack
● FUEL consists of:
  – Astute: orchestration library built on MCollective
  – Library: Puppet manifests
  – Web: Python web app to deliver a rich user experience
  – Cobbler: provisioning of bare metal
  – Bootstrap: lightweight install environment for node discovery
6. Tiny example
● 1 master Cobbler and Puppet server
● 2-node OpenStack cluster
● OS deployment: 5 minutes
● Puppet configuration: 15 minutes each
● Total time: ~40 minutes
7. Typical example
● 1 master Cobbler and Puppet server
● 10-node OpenStack cluster
● OS deployment: 30 minutes total
● Puppet configuration: 15 minutes each
● Total time: ~2 hr 45 min
8. Stretching the limits
● 1 master Cobbler and Puppet server
● 100-node OpenStack cluster
● OS deployment: ?? minutes total
● Puppet configuration: 15 minutes each
● Total time: maybe 24 hours?
9. How to get to 1,000?
● Physical limitations of physical disks
● Physical limitations of network
● Puppet limitations
● Cobbler limitations
● Messaging/orchestration limitations
● Durability/patience of client applications
10. Approach: Scale the server!
● Pure speed. Don't care about anything else.
● Buy an expensive system with 2 SSDs in RAID-0, 12 cores, 256 GB of memory, and bonded NICs
● Peak I/O: ~800 MB/s
11. How crowded is your network segment?
● More than 500 nodes on one network is bad
● Broadcast traffic will hinder normal traffic
● One lost packet means TFTP must fail and start over
● Make a second network and set up a DHCP relay
● Update your PXE server's DHCP configuration
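For illustration, the relayed second segment might look like this in ISC dhcpd's dhcpd.conf (all subnets, addresses, and the boot filename are invented examples):

```
# Original PXE segment: the DHCP/PXE server lives here.
subnet 10.0.0.0 netmask 255.255.255.0 {
  range 10.0.0.100 10.0.0.200;
  next-server 10.0.0.1;        # TFTP/PXE server
  filename "pxelinux.0";
}

# Second segment: nodes reach the PXE server through the
# router's DHCP relay, so dhcpd only needs a matching subnet.
subnet 10.0.1.0 netmask 255.255.255.0 {
  range 10.0.1.100 10.0.1.200;
  option routers 10.0.1.1;
  next-server 10.0.0.1;
  filename "pxelinux.0";
}
```

Splitting the broadcast domain this way keeps DHCP/PXE chatter from one half of the cluster off the other half's wire.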
12. © MIRANTIS 2013 PAGE 12
err: Could not retrieve catalog
from remote server: Connection
refused connect(2)
13. Puppet load
● Catalog compile time: 12 s per node
● Serve files: 12 MB per host
● Receive and store a 500 KB report in YAML format
● Store in PuppetDB
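Multiplying those per-node figures out shows why a single master chokes at 1,000 nodes. A rough back-of-envelope sketch (illustrative arithmetic only; real runs overlap work across nodes):

```python
# Aggregate load on one Puppet master for a 1,000-node run,
# using the per-node figures from the slide above.
nodes = 1000
compile_s = 12      # catalog compile time per node, seconds
files_mb = 12       # files served per host, MB
report_kb = 500     # YAML report stored per host, KB

compile_hours = nodes * compile_s / 3600
total_files_gb = nodes * files_mb / 1024
total_reports_mb = nodes * report_kb / 1024

print(f"compile time: {compile_hours:.1f} h")     # ~3.3 h of pure compilation
print(f"files served: {total_files_gb:.1f} GB")   # ~11.7 GB over the wire
print(f"reports:      {total_reports_mb:.0f} MB") # ~488 MB of YAML to parse and store
```

Over three hours of CPU time on catalog compilation alone, before any file serving or report storage, is why the "Connection refused" errors on the previous slide appear.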
14. How to avoid failure
● IPMI control of all nodes (expensive)
● Orchestration that can reset a host if it gets “stuck” along the way
● Staggered approach to avoid overloading the master
15. How the pros do it
● Large US bank
● 2 Puppet CA servers
● 3 Puppet catalog masters
● DNS round robin for catalog servers
● 2,000 hosts
● Must stagger initial deployments
16. Conclusion
● Not fast enough
● Too much data
● Still a bottleneck
● Expensive hardware
17. Approach: Ditch the Puppetmaster!
● Still need to provision a base OS
● Still need a package repository
● Still need to be fast
● Still need some “brain” to identify servers
18. Speed up provisioning
● Install every nth server to serve as a provisioning mirror, all in RAM
● TFTP still must come from the master server, but 30 minutes of pain for bootstrap is okay
● HTTP for OS installation can be balanced via DNS round robin to each mirror
● Provision the mirror hosts last
19. Package repository
● The YUM repository should be located close to the cluster
● Mirror via Cobbler/Foreman
● Or host it somewhere in your organization with fast disks
20. External Node Classifiers
● An arbitrary script that tells nodes what resources to install
● ENC providers include:
  – Puppet Dashboard
  – Foreman
  – Hiera
  – LDAP
  – Amazon CloudFormation
  – YAML file carried by pigeon
21. External Node Classifiers
● What they can provide:
  – Puppet master hostname
  – Environment name (production, devel, stage)
  – Classes to use
  – Puppet facts needed for installation
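An ENC is just an executable: Puppet calls it with the node's certname as the only argument and reads YAML from stdout with `classes`, `environment`, and `parameters` keys. A minimal Python sketch (the role-to-class mapping and class names below are invented for illustration):

```python
#!/usr/bin/env python
# Minimal External Node Classifier sketch. Puppet invokes this
# with the node's certname and parses the YAML printed to stdout.
import json
import sys

# Hypothetical mapping from certname prefix to Puppet classes.
ROLES = {
    "controller": ["nova::api", "keystone"],
    "compute": ["nova::compute"],
}

def classify(certname):
    role = certname.split("-")[0]
    return {
        "environment": "production",
        "classes": ROLES.get(role, []),
        "parameters": {"role": role},
    }

if __name__ == "__main__":
    certname = sys.argv[1] if len(sys.argv) > 1 else "unknown"
    # JSON is a subset of YAML, so this output parses as YAML.
    print(json.dumps(classify(certname), indent=2))
```

Point the master at it with `node_terminus = exec` and `external_nodes = /path/to/enc.py` in puppet.conf.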
22. Getting Puppet manifests to nodes
● How do you place manifests on a node?
● Without relying on one host, pick the most robust system available
23. Getting Puppet manifests to nodes
● Plain Git
  – Version-controlled
  – Widely implemented
  – Simple to get started
  – Fits into Puppet's environment structure via branches
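The branch-per-environment pattern works because Puppet 3-era configuration lets `$environment` be interpolated into paths; a sketch of the puppet.conf fragment (the checkout location is an example):

```
# Each Git branch is checked out under
# /etc/puppet/environments/<branch-name> by a post-receive hook
# or cron job; $environment selects the matching tree.
[main]
modulepath = /etc/puppet/environments/$environment/modules
manifest   = /etc/puppet/environments/$environment/manifests/site.pp
```

A node requesting environment "devel" then gets the manifests from the devel branch's checkout, with no extra server-side logic.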
24. Getting Puppet manifests to nodes
● Puppet Librarian
  – Created by Tim “Rodjek” Sharpe from GitHub
  – Flexible manifest sources
  – Can specify a Puppet “forge”
  – Can retrieve from Git repositories
  – Dependency handling
  – Version specification optional
  – Creates a local Git repository to track changes
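librarian-puppet is driven by a Puppetfile; a hypothetical example (module names, versions, and the Git URL are invented):

```ruby
# Puppetfile: declares where modules come from; `librarian-puppet
# install` resolves dependencies and populates ./modules.
forge "https://forge.puppetlabs.com"

mod "puppetlabs/stdlib", "4.1.0"   # pinned forge module
mod "puppetlabs/apache"            # version left unpinned
mod "nova",
  :git => "https://github.com/example/puppet-nova.git",
  :ref => "stable"                 # branch, tag, or SHA
```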
25. Getting Puppet manifests to nodes
● RPM format
  – Technique used by Sam Bashton
  – Versioned as well
  – As easy to deploy as any other package
  – Requires a clever build process
26. Getting Puppet manifests to nodes
● RPM format magic
  – Jenkins job takes Git code with manifests
  – Runs puppet-lint on all Puppet code
  – Creates a tarball of Puppet manifests and Hiera data
  – Wraps it inside a package with a new version number
  – Pushes the ready package to a software repository
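The packaging step boils down to a small RPM .spec; a rough sketch with invented names and paths (the version would be stamped by the Jenkins job):

```
Name:      site-puppet-manifests
Version:   1.0.42
Release:   1%{?dist}
Summary:   Versioned Puppet manifests and Hiera data
License:   Proprietary
Source0:   manifests.tar.gz
BuildArch: noarch

%description
Puppet manifests and Hiera data, linted and built by Jenkins.

%prep
%setup -q -c

%install
mkdir -p %{buildroot}/etc/puppet
cp -r manifests modules hieradata %{buildroot}/etc/puppet/

%files
/etc/puppet/manifests
/etc/puppet/modules
/etc/puppet/hieradata
```

Nodes then get new manifests with a plain `yum update site-puppet-manifests`, with rollback for free via older package versions.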
27. Running local is better
● Deploying on great new hardware
● Faster catalog build
● No waiting for manifests or uploading reports
● No timeouts or “connection refused” errors
29. Rsyslog
● Scaling rsyslog requires lots of disks, but they don't have to be fast
● Rsyslog can throttle clients effectively
● Clients can hold logs until the server is ready to receive
● Everybody wins
30. Doing the math

Stage                 Before                          After
Bootstrap OS          10 min                          10 min (but that's okay)
Base OS provision     8 hrs (10 concurrent)           30 min to set up 20 mirrors;
                                                      25-40 min to install (200 concurrent);
                                                      30 min to install mirrors
Puppet provisioning   10 d 10 hr (15 min ×            45 min for all 3 controllers, one at a time;
                      1,000 hosts, one at a time)     20 min for compute nodes
Totals                12 days                         2-3 hours
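The table's arithmetic can be checked directly; a small sketch re-adding the figures above (all in minutes):

```python
# Sanity-check the before/after totals from the table.
before_min = 10 + 8 * 60 + 15 * 1000     # bootstrap + base OS + 1,000 serial Puppet runs
after_min = 10 + 30 + 40 + 30 + 45 + 20  # bootstrap + mirror setup + parallel install
                                         # + mirror install + controllers + computes

print(f"before: {before_min / 60 / 24:.1f} days")  # ~10.8 days; the slide rounds up to 12
print(f"after:  {after_min / 60:.1f} hours")       # ~2.9 hours, i.e. the "2-3 hours"
```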
31. References
● http://www.tomshardware.com/reviews/ssd-raid-benchmark,3485-3.html
● http://www.masterzen.fr/2012/01/08/benchmarking-puppet-stacks/
● http://theforeman.org/manuals/1.3/index.html#3.5.5FactsandtheENC
● https://github.com/rodjek/librarian-puppet
● http://www.slideshare.net/PuppetLabs/sam-bashton
32. Ref commands

puppet agent --{summarize,test,debug,evaltrace,noop} | perl -pe 's/^/localtime().": "/e'

Time:
   ....
   Nova paste api ini: 0.02
   Package: 0.03
   Notify: 0.03
   Nova config: 0.10
   File: 0.40
   Exec: 0.56
   Service: 1.39
   Augeas: 1.56
   Total: 11.85
   Last run: 1379522172
   Config retrieval: 7.73