Puppet Camp New York 2015: Puppet Enterprise Scaling Lessons Learned (Intermediate)

SCALING PUPPET
ENTERPRISE TO 5,000
NODES IN 9 MONTHS
Lessons learned,
and how PE makes me think of goats

WHO AM I?
• DevOps and Cloud Admin* at TE
Connectivity
• ~9 years of assorted technical
operations experience
• ~1 year of PE usage/administration
• Puppet Featured Community
Member (for most verbose
complaints by a Test Pilot 2014)
• Puppet Certified Professional 2015
(sample scores: Puppet Language
94%, Console 40%)
• Can’t be bothered to take the internal
“Making compelling presentations”
training
<= LIAR =>

PE DEPLOYMENT STATS
• 5100 PE licenses
• Prod => 4157 Agents
• Dev => 72 Agents
• 871 Licenses purchased for systems of stubborn
people.
• 14 supported OS spanning 7 OS families
• Prod PE deployment consists of 11 servers.
• 1 CA / Filebucket Server
• 1 PuppetDB server (using embedded
PostgreSQL)
• 1 Puppet Console
• 4 Puppet Compile Masters
• 1 ActiveMQ Hub
• 3 ActiveMQ Brokers

THE CRUELEST LIES ARE OFTEN TOLD
WHEN TRYING TO GET MANAGERS TO
BUY THE RIGHT TOOLS
• Compliance reporting (without
remediation)
• Application code deployment
• Service discovery
• DNS?!
• Any phrase that includes “I’m
sure there is a way puppet
can…”

NO-OP (AKA MY ARCH
NEMESIS)
• No-Op is a tool, not a solution.
• No-Op != Operational Intelligence
• Pandora’s Box full of excuses not to embrace change
(see also: “brownfield”, “legacy”, “near-EoL”)
• Make sure you enforce enough code to control your
agent configuration…

THE FASTEST WAY TO CAUSE
4000 AGENT RUNS TO FAIL
• Custom Facter facts are
your friend, until they aren’t.
• #1 culprit for massive agent
failures is bad confines in
custom facts not tested
against enough canary
nodes.
• “It worked when I tested it,
the fact even returns the
right value”.
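
One way to get "enough canary nodes" is to make sure every OS family has a few canaries before a new custom fact goes everywhere. A minimal Ruby sketch, assuming a hypothetical inventory hash of hostname to OS family; the hostnames and the pick_canaries helper are illustrative and are not part of Puppet or Facter:

```ruby
# Hypothetical helper: choose up to `per_family` canary nodes from each OS
# family so a new custom fact is exercised on every platform before rollout.
def pick_canaries(inventory, per_family = 2)
  inventory
    .group_by { |_host, family| family } # family => [[host, family], ...]
    .transform_values { |pairs| pairs.map(&:first).sort.first(per_family) }
end

# Illustrative inventory; real data might come from PuppetDB or a CMDB.
inventory = {
  'web01' => 'RedHat',
  'web02' => 'RedHat',
  'web03' => 'RedHat',
  'db01'  => 'Debian',
  'app01' => 'windows',
}

canaries = pick_canaries(inventory)
canaries.each { |family, hosts| puts "#{family}: #{hosts.join(', ')}" }
```

A fact that returns the right value on one test box has only proven itself on that box's OS family; a spread like this is meant to catch a confine that is wrong everywhere else.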
Important

Evolution of puppet.conf
#puppet.conf.stub
[main]
server = puppet.example.net
archive_file = true
archive_file_server = puppet.example.net
ca_server = puppet.example.net
#puppetdb.conf.stub
[main]
#console.conf.stub
[main]

#puppet.conf.stub
[main]
archive_file_server = puppet.example.net
ca_server = puppet.example.net
#puppetdb.conf.stub
[main]
server = puppetdb.example.net
#console.conf.stub
[main]
server = puppetconsole.example.net

#puppet.conf.stub
[main]
server = puppet.example.net (Now an LB)
archive_file_server = puppetfb.example.net*
ca_server = puppetca.example.net*
#puppetdb.conf.stub
[main]
server = puppetdb.example.net
#console.conf.stub
[main]
server = puppetconsole.example.net

LOAD BALANCING PITFALLS
• Do Load Balance
• Port 8140 between compile masters
• If you use connection stickiness > 30 minutes agents will never
change masters.
• Port 61613 between ActiveMQ Brokers
• Don’t Load Balance
• Puppet CA, or any cert signing requests.
• File Bucket (archive_file_server)
• ActiveMQ hub, more split brain SSL
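
The stickiness warning can be reasoned about with a small check. This is a sketch under two assumptions: agents run at Puppet's default 30-minute runinterval, and the load balancer refreshes its stickiness timer each time the same client connects. The pinned_forever? helper is hypothetical.

```ruby
# Assumption: agents check in every `run_interval_s` seconds (Puppet's default
# runinterval is 30 minutes) and the load balancer refreshes its stickiness
# timer on every connection from the same client.
def pinned_forever?(stickiness_s, run_interval_s = 30 * 60)
  # If the sticky entry outlives the gap between runs, it never expires,
  # so the agent keeps landing on the same compile master.
  stickiness_s > run_interval_s
end

puts pinned_forever?(60 * 60) # 1-hour stickiness
puts pinned_forever?(5 * 60)  # 5-minute stickiness
```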

PERFORMANCE ISSUES
(You’re looking down.)

• Sizing Recommendations Revised
• PuppetDB needs way more RAM than is recommended when
you scale. (Req 30GB, Our present 50GB, and it should be
higher)
• PostgreSQL best practices claim 3xDB size of memory for
best performance. @4000 nodes, puppetdb ~ 50GB,
consoledb ~40GB @ 3days retention.
• ConsoleDB needs to be pruned aggressively.
(reports = nodes * 48 * days retention). That much
information is not useful in the console.
• Console uses less RAM than expected. (Req 30GB, Our present
10GB)
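
The sizing figures above can be worked through as plain arithmetic. The sketch below uses the slide's own formula and numbers; the helper names are made up for illustration, and the factor of 48 presumably corresponds to one agent run every 30 minutes.

```ruby
# Reports stored in the console DB, using the slide's formula:
#   reports = nodes * 48 * days of retention
# (48 runs per day is what a 30-minute run interval produces)
RUNS_PER_DAY = 48

def stored_reports(nodes, retention_days)
  nodes * RUNS_PER_DAY * retention_days
end

# PostgreSQL rule of thumb quoted on the slide: RAM of roughly 3x the DB size.
def suggested_ram_gb(db_size_gb)
  db_size_gb * 3
end

puts stored_reports(4000, 3) # 4000 nodes, 3 days retention => 576000
puts suggested_ram_gb(50)    # puppetdb at ~50GB  => 150
puts suggested_ram_gb(40)    # consoledb at ~40GB => 120
```

At 4000 nodes even three days of retention is over half a million reports, which is why aggressive pruning matters.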

[Chart: Puppet Scaling Experience (highly scientific data). Y-axis: Pain, 0% to 60,000%. X-axis: None, Agent Registered, Agent Runs, Agent Classified. Series: PuppetDB, Puppet Console.]

CONSOLE CONFIGURATIONS
• @4000 nodes we use 8 dashboard workers.
• When # of nodes grows, the default page of
the console can become very sluggish.
Edit /opt/puppet/share/puppet-dashboard/config/routes.rb to adjust
the route:
PuppetDashboard::Application.routes do
  # root :to => 'pages#home'
  root :to => 'reports#index'

JVM TUNING
• Problem: Service stops, logs show Out of Memory Exceptions.
• Heap Sizes:
• puppetserver - 4GB
• puppetdb - 1GB
• PE console - 2GB
• ActiveMQ Hub - 1.5GB
• ActiveMQ Broker - 1GB
• PuppetDB (server component) has been a JVM for a while, so
most GC actions can be tuned as Puppet Params
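
The heap sizes above translate into the standard JVM -Xms/-Xmx flags. A minimal sketch, assuming the initial and maximum heap are set to the same value (a common practice, not something the slide states); the service labels are illustrative, and where the flags are actually configured depends on the PE version.

```ruby
# Heap sizes from the slide, in GB. Service names are labels for this sketch,
# not necessarily the names of the actual PE services.
HEAP_GB = {
  'puppetserver'    => 4,
  'puppetdb'        => 1,
  'pe-console'      => 2,
  'activemq-hub'    => 1.5,
  'activemq-broker' => 1,
}

# Render a heap size as standard JVM flags. Sizes are expressed in MB so
# that fractional values such as 1.5GB stay whole numbers.
def heap_flags(gb)
  mb = (gb * 1024).to_i
  "-Xms#{mb}m -Xmx#{mb}m"
end

HEAP_GB.each { |service, gb| puts "#{service}: #{heap_flags(gb)}" }
```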

GREAT WISDOMS AND
PERSISTING PAINS

• Use R10K. Use Puppetfile. Use Roles and Profiles.
• Learn what nanliu/staging does. Then use it.
• exec { 'horrible_idea':
  command => 'dostuff.sh && touch /tmp/didstuff.proof',
  creates => '/tmp/didstuff.proof',
}
• PuppetLabs, myself, and most of our profession are absolutely terrible at naming things.
• Problem: 
(‘Environment’ && ‘Deployment’ && ‘Tier’ && ‘Branches’ && ‘Forks’) => [‘Production’,
‘Dev’, ‘QA’]
• Result: 
cats.all? { cats.content[:name] == ‘Selso’ } => true
• Proxy Servers are evil. Spaceship Operators have a cool name.
• Problem: universally_respected_proxy_variables.exists? => false
• Solution: Use site.pp + Resource Collection to set top level resource defaults.
The “read this later” slide

“IF I HAVE SEEN FURTHER IT IS BY STANDING ON
YE SHOULDERS OF GIANTS” ~ ISAAC NEWTON
Resources that have gotten me by:
• https://docs.puppetlabs.com/references/latest/type.html
• Puppet Types and Providers by
Dan Bode and Nan Liu
• Puppet Practitioner’s Training
• Gary Larizza’s Blog (aka nsfw
missing puppet documentation)
• PuppetLabs Support
• Puppet Professional Services
And most importantly
• A healthy mixture of ambition,
stubbornness and stupidity.

QUESTIONS?
@pwattstbd
github.com/Marsupermammal
pwatts217@gmail.com
