This document provides an overview of measuring and tuning performance for the Puppet Enterprise (PE) platform. It discusses gathering data from PE services like Puppet Server and PuppetDB through JVM logging, metrics, and configurations. Important metrics for Puppet Server include JRuby usage and catalog compilation times. Tuning options involve adjusting JRuby capacity and rebalancing agent checkins. The document also covers monitoring PuppetDB for storage usage and command processing, as well as optimizing PostgreSQL query performance.
PuppetConf 2016: An Introduction to Measuring and Tuning PE Performance – Charlie Sharpsteen, Puppet
1. Keeping an Eye on the PE Stack
An Introduction to Measuring and Tuning PE Performance
Charlie Sharpsteen, Puppet Inc.
2. Overview
Overview
• How do I measure PE performance? What sources of
data are available?
• What numbers are actually important?
• What settings can I adjust when important metrics
start showing unhealthy trends?
4. PE Server Components
• TrapperKeeper JVMs: Puppet Server, PuppetDB, Console Services, Orchestration Services
• Other JVM: ActiveMQ
• Other: PostgreSQL, NGINX
Mostly Java based with shared logging and metrics interfaces.
5. TrapperKeeper Logging
• Configuration for main logs can be found in:
/etc/puppetlabs/<service name>/logback.xml
• Controls output destinations, log levels and message formatting.
• Ship to a log aggregator to provide context for investigations.
• Default log pattern is:
Date Level [Java Namespace] message
• Puppet Server also includes thread ID:
Date Level [thread] [Java Namespace] message
• Thread ID is useful for grouping activity related to a single request.
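Because every line carries its thread ID, a plain grep can reassemble all activity for one request. A minimal sketch using fabricated sample lines (the thread name qtp100-42 is invented for illustration):

```shell
# Fabricated sample lines following the documented pattern:
# Date Level [thread] [Java Namespace] message
cat > /tmp/puppetserver-sample.log <<'EOF'
2016-10-20 13:30:01,123 INFO  [qtp100-42] [puppetserver] Compiling catalog for node1
2016-10-20 13:30:01,456 INFO  [qtp100-77] [puppetserver] Compiling catalog for node2
2016-10-20 13:30:02,789 INFO  [qtp100-42] [puppetserver] Finished catalog for node1
EOF

# Pull every line belonging to a single request's thread:
grep -F '[qtp100-42]' /tmp/puppetserver-sample.log
```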
6. TrapperKeeper Logging
• Configuration for main logs can be found in:
/etc/puppetlabs/<service name>/request-logging.xml
• Default format is Apache Combined Log + request duration
• Easily parsed by most log processors.
• Can add additional bits of information such as request headers.
7. TrapperKeeper Metrics
• Metrics are recorded using JMX MBeans.
• Metrics that measure activity over time are weighted to represent the last 5 minutes.
• Metrics can be retrieved via the JMX protocol.
• Full access to all available metrics and all available measurements.
• Can attach tools such as JConsole and JVisualVM.
• Requires opening additional ports; configuration can be complex. Java tools only.
• Metrics can be retrieved as JSON over HTTP:
• For a curated set of common metrics: status/v1?level=debug
• For access to all available metrics: metrics/v1/mbeans
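From the command line, both endpoints can be queried with curl. A sketch assuming the default port 8140 and a local master; exact paths vary by version (status/v1/services on recent Puppet Server), and depending on auth.conf rules a client certificate may be required:

```shell
# Curated set of common metrics, at debug level:
curl -k "https://localhost:8140/status/v1/services?level=debug"

# All available metrics as JSON (where the metrics web service is enabled):
curl -k "https://localhost:8140/metrics/v1/mbeans"
```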
8. TrapperKeeper Configuration
• Configuration files are stored under:
/etc/puppetlabs/<service name>/conf.d
• Most important settings are managed by puppet_enterprise::profile classes and are
tunable via the Console and Hiera.
• JVM settings are specified in /etc/sysconfig or /etc/default
• The JVM memory limit, -Xmx, is the primary tunable setting. Enable the G1 garbage
collector when using limits higher than 10 GB: -XX:+UseG1GC
• These flags are configurable via the java_args parameter on profile classes.
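In Hiera, that might look like the following sketch (heap sizes are illustrative, not recommendations):

```yaml
# Hypothetical Hiera data; size the heap to your hardware.
puppet_enterprise::profile::master::java_args:
  Xmx: '4096m'
  Xms: '4096m'
```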
10. Puppet Server Metrics Overview
Puppet Server Metrics Overview
● JVM resource usage: status-service
● JMX namespace: java.lang:*
● HTTP request times per endpoint: pe-master
● JMX namespace: puppetserver:name=puppetlabs.<fqdn>.http.*
● Catalog Compilation metrics: pe-puppet-profiler
● JMX namespace: puppetserver:name=puppetlabs.<fqdn>.compiler.*
puppetserver:name=puppetlabs.<fqdn>.functions.*
puppetserver:name=puppetlabs.<fqdn>.puppetdb.*
● JRuby Metrics: pe-jruby-metrics
● JMX namespace: puppetserver:name=puppetlabs.<fqdn>.jruby.*
11. New PE 2016.4.0 Features
New PE 2016.4.0 Features
● The metrics/v1/mbeans endpoint has been added to Puppet Server. Must be enabled via Hiera:
puppet_enterprise::master::puppetserver::metrics_webservice_enabled: true
● The Graphite metrics reporter has been optimized and extended:
● Only a subset of available metrics are reported by default.
● Reported metrics can be customized using the metrics_puppetserver_metrics_allowed
parameter of the puppet_enterprise::profile::master class.
12. JRuby Metrics
JRuby Metrics
● Almost all Puppet Server requests must be handled by a JRuby instance — this makes JRuby
availability the primary performance bottleneck.
● num-free-jrubies
● Measures spare capacity for incoming requests.
● average-wait-time
● Should never grow to a significant fraction of HTTP request times.
● Impacted by agent checkin distribution, resource availability, Puppet plugins and code.
13. Agent Checkin Activity
Agent Checkin Activity
● Agents check in one runinterval after the start of their previous run, which can lead to pile-ups or
“thundering herds”. Be careful of:
● Starting or re-starting a group of agents without the splay setting enabled.
● Triggering a group of agent runs via: mco puppet runonce
● Monitor average-requested-jrubies and Puppet Server access logs for spikes in agent activity.
● Use PostgreSQL to pull a histogram of Agent start times from report data:
sudo su - pe-postgres -s /bin/bash -c "psql -d pe-puppetdb"
SELECT date_part('minute', start_time), count(*)
FROM reports
WHERE start_time BETWEEN '2016-10-20 13:30:00' AND '2016-10-20 14:30:00'
GROUP BY date_part('minute', start_time)
ORDER BY date_part('minute', start_time) ASC;
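The same minute-by-minute histogram can be pulled straight from Puppet Server access logs with awk. A sketch assuming the default combined-log timestamp format, with fabricated sample lines for illustration:

```shell
# Fabricated access-log lines with combined-log timestamps:
cat > /tmp/access-sample.log <<'EOF'
10.0.0.1 - - [20/Oct/2016:13:30:01 +0000] "PUT /puppet/v3/report/node1 HTTP/1.1" 200 9 10
10.0.0.2 - - [20/Oct/2016:13:30:59 +0000] "PUT /puppet/v3/report/node2 HTTP/1.1" 200 9 12
10.0.0.3 - - [20/Oct/2016:13:31:05 +0000] "PUT /puppet/v3/report/node3 HTTP/1.1" 200 9 11
EOF

# Field 4 is "[day/month/year:HH:MM:SS"; bucket requests per HH:MM:
awk '{ split($4, t, ":"); print t[2] ":" t[3] }' /tmp/access-sample.log | sort | uniq -c
# uniq -c reports 2 requests at 13:30 and 1 at 13:31
```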
14. Re-balancing Agent Checkins
Re-balancing Agent Checkins
● Use MCollective to orchestrate a batched re-start:
su - peadmin -c "mco rpc service stop service=puppet"
su - peadmin -c "mco rpc service start service=puppet --batch 1
--batch-sleep <runinterval in seconds / #nodes>"
● Batching is not necessary if the agents have splay enabled.
● For a stable distribution that isn’t affected by re-starts, puppet agent -t can be run on a schedule
determined by the fqdn_rand() function instead of using the service.
● Load due to agent activity can be cut dramatically by shifting to the Direct Puppet workflow where
Orchestrator or MCollective are used to push catalog updates.
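A cron-based schedule driven by fqdn_rand() might look like this sketch in Puppet code (the twice-hourly interval is illustrative):

```puppet
# Hypothetical profile: run the agent twice an hour at a stable per-node offset.
cron { 'puppet-agent-run':
  command => '/opt/puppetlabs/bin/puppet agent -t > /dev/null 2>&1',
  user    => 'root',
  minute  => [fqdn_rand(30), fqdn_rand(30) + 30],
}
```

Because fqdn_rand() seeds from the node's FQDN, the offset survives agent and service restarts.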
15. Adding More JRuby Capacity
Adding More JRuby Capacity
● JRuby count is set via jruby_max_active_instances, constrained by available CPU and RAM:
● Compile masters tend to top out around NCPU - 1. Monolithic masters need to share with
PuppetDB and tend more towards (NCPU / 2 - 1).
● RAM requirements are 512 MB per JRuby, but may need to be increased if catalog compilation
uses large datasets or dozens of environments are in use.
● The environment_timeout setting can be used to reduce the CPU requirements of catalog
compilation. Set to 0 globally and unlimited for long-lived environments with lots of agents.
● Each environment using an unlimited timeout will add to the per-JRuby RAM requirements.
Monitor memory usage of pre-2016.4.0 installations closely when using unlimited timeouts.
● Code Manager should be enabled when an unlimited timeout is used so that caches are flushed
when new code is deployed.
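In configuration terms, the recommendation above might look like this sketch:

```ini
# puppet.conf on the master: default for all environments
[master]
environment_timeout = 0

# environment.conf in a long-lived, heavily used environment: cache indefinitely
environment_timeout = unlimited
```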
16. Investigating Compile Times
Investigating Compile Times
● PE Puppet Server tracks compilation time on several different levels: per-node, per-environment, per-resource, per-function, and more.
● Top 10 resources and functions are available via the status API and Puppet Server performance
dashboard:
https://<puppetmaster>:8140/puppet/experimental/dashboard.html
● Full access available through JMX and the metrics API.
● Detailed timing on catalog compilation can be obtained by setting the Puppet Server log level to
DEBUG and running puppet agent -t --profile on nodes of interest.
17. Investigating Agent Run Times
Investigating Agent Run Times
● Agent run summaries are stored at:
/opt/puppetlabs/puppet/cache/state/last_run_summary.yaml
● Summaries are also stored by PuppetDB and can be viewed from the PE Console, or queried:
reports[metrics] {
latest_report? = true and certname = '<node name>'
}
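The same query can be issued against the PuppetDB API with curl; a sketch assuming PuppetDB listening on its default plain-text port 8080 on the local host:

```shell
curl -G http://localhost:8080/pdb/query/v4 \
  --data-urlencode "query=reports[metrics] { latest_report? = true and certname = '<node name>' }"
```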
● The time section shows the time taken per resource type, along with config_retrieval, which
measures how long it took to receive the catalog.
● Per-resource timing can be logged by running: puppet agent -t --evaltrace
19. PuppetDB Storage Usage
PuppetDB Storage Usage
● Monitor disk space!
/opt/puppetlabs/server/data/postgresql/
/opt/puppetlabs/server/data/puppetdb/
● If disk space runs out, there are two options for returning space to the operating system:
● The existing volume can be enlarged so that a VACUUM FULL can be run.
● Alternately, a new volume can be attached for a database backup and restore.
● The primary source of disk usage is report storage; this can be tuned by setting: report-ttl
● For infrastructure with high node turnover, consider setting node-purge-ttl to remove data related
to decommissioned nodes.
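In PE these TTLs are best managed via the puppet_enterprise profile classes, but the underlying PuppetDB setting looks roughly like this sketch (values illustrative):

```ini
# /etc/puppetlabs/puppetdb/conf.d/database.ini
[database]
report-ttl = 14d
node-purge-ttl = 14d
```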
20. PuppetDB Command Processing
PuppetDB Command Processing
● Every PuppetDB operation, aside from queries, is executed by an asynchronous command
processing queue. This queue is managed by an internal ActiveMQ server:
org.apache.activemq:type=Broker,brokerName=localhost,
destinationType=Queue,destinationName=puppetlabs.puppetdb.commands
● Important metrics:
● Backlog of commands waiting for processing: QueueSize
● Largest command seen: MaxMessageSize
● Available memory for in-flight commands: MemoryPercentUsage
● Increase PuppetDB heap size along with the command-processing.memory-usage setting if the
percentage spikes close to 100%. This will prevent ActiveMQ from paging commands to disk.
21. PuppetDB Command Processing
PuppetDB Command Processing
● Command processing rates:
puppetlabs.puppetdb.mq:name=global.processing-time
puppetlabs.puppetdb.storage:name=replace-facts-time
puppetlabs.puppetdb.storage:name=replace-catalog-time
puppetlabs.puppetdb.storage:name=store-report-time
● Additional processing threads can be added using the command-processing.threads setting.
● On a monolithic install, PuppetDB processing threads must be balanced against Puppet Server
JRubies and the number of CPU cores available.
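Both knobs live in the [command-processing] section of PuppetDB's config; a sketch with illustrative values:

```ini
# /etc/puppetlabs/puppetdb/conf.d/config.ini
[command-processing]
# Megabytes of broker memory for in-flight commands; raise alongside the JVM heap.
memory-usage = 1024
# Worker threads; balance against JRubies and CPU cores on a monolithic master.
threads = 4
```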
22. PostgreSQL Query Performance
PostgreSQL Query Performance
● PostgreSQL configuration can be found in:
/opt/puppetlabs/server/data/postgresql/9.4/data/postgresql.conf
● Add settings to improve logging around slow queries:
log_min_duration_statement = 3000ms
log_temp_files = 0
● If a temp file shows up in the logs, that means Postgres had to perform an operation outside of
RAM, which is slow. Consider increasing the work_mem setting to be greater than the size of the
temp files used.
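For example, if the logs show temp files around 10 MB, a work_mem above that size avoids the spill (value illustrative; work_mem is allocated per sort or hash operation, so raise it conservatively):

```ini
# postgresql.conf
work_mem = 16MB
```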
● If query performance has been dropping over time, a database VACUUM may be needed:
su - pe-postgres -s /bin/bash -c "vacuumdb --analyze --verbose --all"