Keeping an Eye on the PE Stack
An Introduction to Measuring and Tuning PE Performance
Charlie Sharpsteen, Puppet Inc.
Overview
• How do I measure PE performance? What sources of
data are available?
• What numbers are actually important?
• What settings can I adjust when important metrics
start showing unhealthy trends?
Gathering Data
From PE Services
JVM Logging and Metrics
PE Server Components
• TrapperKeeper JVM: Puppet Server, PuppetDB, Console Services, Orchestration Services
• JVM: ActiveMQ
• Other: PostgreSQL, NGINX
Mostly Java based with shared logging and metrics interfaces.
TrapperKeeper Logging
• Configuration for main logs can be found in:

/etc/puppetlabs/<service name>/logback.xml
• Controls output destinations, log levels and message formatting.
• Ship to a log aggregator to provide context for investigations.
• Default log pattern is:

Date Level [Java Namespace] message
• Puppet Server also includes thread ID:

Date Level [thread] [Java Namespace] message
• Thread ID is useful for grouping activity related to a single request.
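
A minimal Python sketch of grouping Puppet Server log lines by thread ID. The exact spacing and timestamp layout in the regular expression are assumptions based on the pattern shown above, and the sample lines are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed layout: "<date> <time> <LEVEL> [<thread>] [<namespace>] <message>".
# Adjust the expression if logback.xml uses a different pattern.
LINE_RE = re.compile(
    r"^(?P<date>\S+ \S+) (?P<level>[A-Z]+)\s+"
    r"\[(?P<thread>[^\]]+)\] \[(?P<namespace>[^\]]+)\] (?P<message>.*)$"
)


def group_by_thread(lines):
    """Return {thread_id: [message, ...]} for lines matching the assumed layout."""
    groups = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # stack traces and other continuation lines are skipped
            groups[match.group("thread")].append(match.group("message"))
    return dict(groups)


# Hypothetical sample lines, not taken from a real server.
SAMPLE = [
    "2016-10-20 13:30:01,123 INFO  [qtp1-64] [puppet-server] Compiled catalog for a.example.com",
    "2016-10-20 13:30:01,456 INFO  [qtp1-65] [puppet-server] Compiled catalog for b.example.com",
    "2016-10-20 13:30:02,001 INFO  [qtp1-64] [puppet-server] Caching node for a.example.com",
]

if __name__ == "__main__":
    for thread, messages in group_by_thread(SAMPLE).items():
        print(thread, messages)
```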
TrapperKeeper Logging
• Configuration for access logs can be found in:

/etc/puppetlabs/<service name>/request-logging.xml
• Default format is Apache Combined Log + request duration
• Easily parsed by most log processors.
• Can add additional bits of information such as request headers.
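
A minimal Python sketch of pulling slow requests out of an access log. It assumes the request duration is appended as a trailing integer after the standard Apache Combined fields; the unit, the threshold, and the sample lines are assumptions for illustration.

```python
import re

# Apache Combined Log fields followed by one trailing integer (assumed duration).
ACCESS_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"[^"]*" "[^"]*" (?P<duration>\d+)$'
)


def slow_requests(lines, threshold):
    """Return (path, duration) pairs at or above threshold, slowest first."""
    slow = []
    for line in lines:
        match = ACCESS_RE.match(line.strip())
        if match and int(match.group("duration")) >= threshold:
            slow.append((match.group("path"), int(match.group("duration"))))
    return sorted(slow, key=lambda item: item[1], reverse=True)


# Hypothetical sample lines, not taken from a real server.
SAMPLE = [
    '10.0.0.5 - - [20/Oct/2016:13:30:01 +0000] "GET /puppet/v3/node/a.example.com HTTP/1.1" 200 1024 "-" "Puppet/4.7.0" 35',
    '10.0.0.6 - - [20/Oct/2016:13:30:02 +0000] "POST /puppet/v3/catalog/b.example.com HTTP/1.1" 200 204800 "-" "Puppet/4.7.0" 8123',
]

if __name__ == "__main__":
    for path, duration in slow_requests(SAMPLE, threshold=1000):
        print(duration, path)
```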



TrapperKeeper Metrics
• Metrics are recorded using JMX MBeans.
• Metrics that measure activity over time are weighted to represent the last 5 minutes.
• Metrics can be retrieved via the JMX protocol.
• Full access to all available metrics and all available measurements.
• Can attach tools such as JConsole and JVisualVM.
• Requires additional ports to be opened, and configuration can be complex. Works with Java tools only.
• Metrics can be retrieved as JSON over HTTP:
• For a curated set of common metrics: status/v1?level=debug
• For access to all available metrics: metrics/v1/mbeans
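
A minimal Python sketch of retrieving the JSON endpoints above with the standard library. The hostname and port in the usage lines are hypothetical, and TLS trust for the PE certificate authority is left to the caller through an optional `ssl.SSLContext`.

```python
import json
import urllib.request


def endpoint_url(host, port, endpoint):
    """Build an HTTPS URL for one of the endpoint paths listed above."""
    return "https://{}:{}/{}".format(host, port, endpoint.lstrip("/"))


def fetch_json(url, context=None):
    """GET a URL and decode the JSON body; pass an ssl.SSLContext as context if needed."""
    with urllib.request.urlopen(url, context=context) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical host and port; only the URLs are printed, no request is sent.
    print(endpoint_url("puppet.example.com", 8140, "status/v1?level=debug"))
    print(endpoint_url("puppet.example.com", 8140, "metrics/v1/mbeans"))
```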
TrapperKeeper Configuration
• Configuration files are stored under:

/etc/puppetlabs/<service name>/conf.d
• Most important settings are managed by puppet_enterprise::profile classes and are
tunable via the Console and Hiera.
• JVM settings are specified in /etc/sysconfig or /etc/default
• The JVM memory limit, -Xmx, is the primary tunable setting. Enable the G1 garbage
collector when using limits higher than 10 GB: -XX:+UseG1GC
• These flags are configurable via the java_args parameter on profile classes.
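
The heap-size rule above can be sketched as a small Python helper. The function name is hypothetical, and the output is a plain list of JVM flags rather than the exact structure that java_args expects.

```python
def jvm_flags(heap_gb, g1_threshold_gb=10):
    """Return JVM flags for a heap limit, adding G1 above the threshold."""
    flags = ["-Xmx{}g".format(heap_gb)]
    if heap_gb > g1_threshold_gb:  # "higher than 10 GB" per the guidance above
        flags.append("-XX:+UseG1GC")
    return flags


if __name__ == "__main__":
    print(jvm_flags(4))
    print(jvm_flags(12))
```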
Puppet Server
It’s all about the JRubies.
Puppet Server Metrics Overview
● JVM resource usage: status-service
● JMX namespace: java.lang:*
● HTTP request times per endpoint: pe-master
● JMX namespace: puppetserver:name=puppetlabs.<fqdn>.http.*
● Catalog Compilation metrics: pe-puppet-profiler
● JMX namespace: puppetserver:name=puppetlabs.<fqdn>.compiler.*

puppetserver:name=puppetlabs.<fqdn>.functions.*

puppetserver:name=puppetlabs.<fqdn>.puppetdb.*
● JRuby Metrics: pe-jruby-metrics
● JMX namespace: puppetserver:name=puppetlabs.<fqdn>.jruby.*
New PE 2016.4.0 Features
● The metrics/v1/mbeans endpoint has been added to Puppet Server. Must be enabled via Hiera:



puppet_enterprise::master::puppetserver::metrics_webservice_enabled: true
● The Graphite metrics reporter has been optimized and extended:
● Only a subset of available metrics is reported by default.
● Reported metrics can be customized using the metrics_puppetserver_metrics_allowed
parameter of the puppet_enterprise::profile::master class.
JRuby Metrics
● Almost all Puppet Server requests must be handled by a JRuby instance — this makes JRuby
availability the primary performance bottleneck.
● num-free-jrubies
● Measures spare capacity for incoming requests.
● average-wait-time
● Should never grow to a significant fraction of HTTP request times.
● Impacted by agent checkin distribution, resource availability, Puppet plugins and code.
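
A minimal Python sketch of turning the two JRuby metrics above into warnings. The 10% wait-time threshold is an arbitrary illustration, not a documented limit, and both times are assumed to be in the same unit.

```python
def jruby_pressure(num_free_jrubies, average_wait_time, mean_request_time,
                   max_wait_fraction=0.1):
    """Return warning strings when JRuby availability looks constrained."""
    warnings = []
    if num_free_jrubies == 0:
        warnings.append("no free JRuby instances")
    if mean_request_time > 0:
        fraction = average_wait_time / mean_request_time
        if fraction > max_wait_fraction:
            warnings.append("wait time is {:.0%} of request time".format(fraction))
    return warnings


if __name__ == "__main__":
    # Hypothetical readings.
    print(jruby_pressure(2, 5, 1000))
    print(jruby_pressure(0, 500, 1000))
```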
Agent Checkin Activity
● Agents will check in runinterval after starting their last run — this can lead to pile-ups or
“thundering herds”. Be careful of:
● Starting or re-starting a group of agents without the splay setting enabled.
● Triggering a group of agent runs via: mco puppet runonce
● Monitor average-requested-jrubies and Puppet Server access logs for spikes in agent activity.
● Use PostgreSQL to pull a histogram of Agent start times from report data:



sudo su - pe-postgres -s /bin/bash -c "psql -d pe-puppetdb"

SELECT date_part('minute', start_time), count(*)

FROM reports

WHERE start_time BETWEEN '2016-10-20 13:30:00' AND '2016-10-20 14:30:00'

GROUP BY date_part('minute', start_time)

ORDER BY date_part('minute', start_time) ASC;
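
The same histogram can be sketched in Python for start times that have already been exported, for example from a report query. The function name and the sample timestamps are hypothetical; the window is inclusive at both ends, like BETWEEN.

```python
from collections import Counter
from datetime import datetime


def minute_histogram(start_times, window_start, window_end):
    """Count runs per minute-of-hour inside an inclusive time window."""
    counts = Counter(
        t.minute for t in start_times if window_start <= t <= window_end
    )
    return sorted(counts.items())


if __name__ == "__main__":
    # Hypothetical agent start times.
    times = [
        datetime(2016, 10, 20, 13, 30, 0),
        datetime(2016, 10, 20, 13, 45, 10),
        datetime(2016, 10, 20, 13, 45, 50),
    ]
    window = (datetime(2016, 10, 20, 13, 30), datetime(2016, 10, 20, 14, 30))
    for minute, count in minute_histogram(times, *window):
        print("{:02d} {}".format(minute, "#" * count))
```

A tall bar at one minute with empty neighbours suggests a pile-up of agent checkins.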
Re-balancing Agent Checkins
● Use MCollective to orchestrate a batched re-start:



su - peadmin -c "mco rpc service stop service=puppet"

su - peadmin -c "mco rpc service start service=puppet --batch 1 --batch-sleep <runinterval in seconds / #nodes>"
● Batching is not necessary if the agents have splay enabled.
● For a stable distribution that isn’t affected by re-starts, puppet agent -t can be run on a schedule
determined by the fqdn_rand() function instead of using the service.
● Load due to agent activity can be cut dramatically by shifting to the Direct Puppet workflow where
Orchestrator or MCollective is used to push catalog updates.
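
A minimal Python sketch of the arithmetic behind the batched re-start, plus a stable per-node offset for scheduled runs. The hash-based offset only illustrates the idea behind fqdn_rand(); it is not the same algorithm and will not produce the same numbers. The interval of 30 minutes is an example value.

```python
import hashlib


def batch_sleep_seconds(runinterval_seconds, node_count):
    """Spread node_count re-starts evenly over one runinterval."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return runinterval_seconds / node_count


def stable_offset_minutes(fqdn, runinterval_minutes=30):
    """Map a node name to a fixed minute offset within the interval."""
    digest = hashlib.sha256(fqdn.encode("utf-8")).hexdigest()
    return int(digest, 16) % runinterval_minutes


if __name__ == "__main__":
    # 600 hypothetical nodes on a 1800 second runinterval.
    print(batch_sleep_seconds(1800, 600))
    print(stable_offset_minutes("a.example.com"))
```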
Adding More JRuby Capacity
● JRuby count is set via jruby_max_active_instances, constrained by available CPU and RAM:
● Compile masters tend to top out around NCPU - 1. Monolithic masters need to share with
PuppetDB and tend more towards (NCPU / 2 - 1).
● RAM requirements are 512 MB per JRuby, but may need to be increased if catalog compilation
uses large datasets or dozens of environments are in use.
● The environment_timeout setting can be used to reduce the CPU requirements of catalog
compilation. Set to 0 globally and unlimited for long-lived environments with lots of agents.
● Each environment using an unlimited timeout will add to the per-JRuby RAM requirements.

Monitor memory usage of pre-2016.4.0 installations closely when using unlimited timeouts.
● Code Manager should be enabled when an unlimited timeout is used so that caches are flushed
when new code is deployed.
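
The sizing rules of thumb above can be sketched in Python. The function name is hypothetical, and ram_mb is assumed to be the memory that can be given to JRuby instances, not the total on the host.

```python
def suggested_jrubies(ncpu, ram_mb, monolithic=False, mb_per_jruby=512):
    """Estimate a starting JRuby count from CPU and RAM limits."""
    # Compile masters: NCPU - 1. Monolithic masters: NCPU / 2 - 1.
    cpu_limit = ncpu // 2 - 1 if monolithic else ncpu - 1
    ram_limit = ram_mb // mb_per_jruby
    return max(1, min(cpu_limit, ram_limit))


if __name__ == "__main__":
    print(suggested_jrubies(8, 8192))
    print(suggested_jrubies(8, 8192, monolithic=True))
```

Raise mb_per_jruby when compilation uses large datasets or many environments, as noted above.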
Investigating Compile Times
● PE Puppet Server tracks compilation time on several different levels: per-node, per-environment,
per-resource, per-function, and more.
● Top 10 resources and functions are available via the status API and Puppet Server performance
dashboard:



https://<puppetmaster>:8140/puppet/experimental/dashboard.html
● Full access available through JMX and the metrics API.
● Detailed timing on catalog compilation can be obtained by setting the Puppet Server log level to
DEBUG and running puppet agent -t --profile on nodes of interest.
Investigating Agent Run Times
● Agent run summaries are stored at:



/opt/puppetlabs/puppet/cache/state/last_run_summary.yaml
● Summaries are also stored by PuppetDB and can be viewed from the PE Console, or queried:



reports[metrics] {

latest_report? = true and certname = '<node name>' 

}
● The time section shows the amount of time taken per resource type, along with config_retrieval,
which measures the amount of time it took to receive a catalog.
● Per-resource timing can be logged by running: puppet agent -t --evaltrace
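
A minimal Python sketch of ranking the entries in the time section of a run summary. It assumes the summary has already been loaded into a dict (for example with a YAML parser) and that bookkeeping keys such as total and last_run should be skipped; the sample values are made up.

```python
def slowest_timings(summary, top=5, skip=("total", "last_run")):
    """Return the largest (name, seconds) pairs from the time section."""
    times = summary.get("time", {})
    entries = [(name, secs) for name, secs in times.items() if name not in skip]
    return sorted(entries, key=lambda entry: entry[1], reverse=True)[:top]


# Hypothetical summary contents.
SAMPLE = {
    "time": {
        "config_retrieval": 4.2,
        "file": 1.5,
        "package": 9.8,
        "service": 0.3,
        "total": 15.8,
    }
}

if __name__ == "__main__":
    for name, seconds in slowest_timings(SAMPLE, top=3):
        print(name, seconds)
```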
PuppetDB
Processing Time and Storage Space
PuppetDB Storage Usage
● Monitor disk space!

/opt/puppetlabs/server/data/postgresql/

/opt/puppetlabs/server/data/puppetdb/
● If disk space runs out, there are two options for returning space to the operating system:
● The existing volume can be enlarged so that a VACUUM FULL can be run.
● Alternately, a new volume can be attached for a database backup and restore.
● The primary source of disk usage is report storage; this can be tuned with the report-ttl setting.
● For infrastructure with high node turnover, consider setting node-purge-ttl to remove data related
to decommissioned nodes.
PuppetDB Command Processing
● Every PuppetDB operation, aside from queries, is executed by an asynchronous command
processing queue. This queue is managed by an internal ActiveMQ server:



org.apache.activemq:type=Broker,brokerName=localhost,destinationType=Queue,destinationName=puppetlabs.puppetdb.commands
● Important metrics:
● Backlog of commands waiting for processing: QueueSize
● Largest command seen: MaxMessageSize
● Available memory for in-flight commands: MemoryPercentUsage
● Increase PuppetDB heap size along with the command-processing.memory-usage setting if the
percentage spikes close to 100%. This will prevent ActiveMQ from paging commands to disk.
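
A minimal Python sketch of checking the queue attributes above once they have been read from the MBean into a dict. Both thresholds are arbitrary illustrations, not documented limits.

```python
def queue_warnings(attrs, backlog_limit=1000, memory_limit_percent=90):
    """Return warning strings for a dict of command queue MBean attributes."""
    warnings = []
    queue_size = attrs.get("QueueSize", 0)
    memory_percent = attrs.get("MemoryPercentUsage", 0)
    if queue_size > backlog_limit:
        warnings.append("command backlog is {}".format(queue_size))
    if memory_percent >= memory_limit_percent:
        warnings.append("in-flight command memory at {}%".format(memory_percent))
    return warnings


if __name__ == "__main__":
    # Hypothetical readings.
    print(queue_warnings({"QueueSize": 5000, "MemoryPercentUsage": 97}))
```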
PuppetDB Command Processing
● Command processing rates:



puppetlabs.puppetdb.mq:name=global.processing-time



puppetlabs.puppetdb.storage:name=replace-facts-time

puppetlabs.puppetdb.storage:name=replace-catalog-time

puppetlabs.puppetdb.storage:name=store-report-time
● Additional processing threads can be added using the command-processing.threads setting.
● On a monolithic install, PuppetDB processing threads must be balanced against Puppet Server
JRubies and the number of CPU cores available.

PostgreSQL Query Performance
● PostgreSQL configuration can be found in:



/opt/puppetlabs/server/data/postgresql/9.4/data/postgresql.conf
● Add settings to improve logging around slow queries:



log_min_duration_statement = 3000ms

log_temp_files = 0
● If a temp file shows up in the logs, Postgres had to perform an operation outside of
RAM, which is slow. Consider increasing the work_mem setting to be greater than the size of the
temp files used.
● If query performance has been dropping over time, a database VACUUM may be needed:



su - pe-postgres -s /bin/bash -c "vacuumdb --analyze --verbose --all"
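
For the work_mem guidance above, a minimal Python sketch that scans PostgreSQL log lines for temp file entries and reports a value just above the largest one. The log message wording in the regular expression and the sample lines are assumptions; work_mem applies per operation, so treat the result as a starting point only.

```python
import re

# Assumed wording of the entries produced by log_temp_files.
TEMP_RE = re.compile(r'temporary file: path "(?P<path>[^"]+)", size (?P<size>\d+)')


def largest_temp_file_bytes(lines):
    """Return the largest temp file size seen, or 0 if none were logged."""
    matches = (TEMP_RE.search(line) for line in lines)
    sizes = [int(m.group("size")) for m in matches if m]
    return max(sizes) if sizes else 0


def suggested_work_mem_mb(lines):
    """Return a work_mem value in MB just above the largest temp file, or None."""
    largest = largest_temp_file_bytes(lines)
    if largest == 0:
        return None
    return largest // (1024 * 1024) + 1


# Hypothetical log lines.
SAMPLE = [
    'LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp4242.0", size 104857600',
    'LOG:  duration: 3500.123 ms  statement: SELECT 1',
    'LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp4242.1", size 5242880',
]

if __name__ == "__main__":
    print(suggested_work_mem_mb(SAMPLE))
```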
Resources
This Slide Deck: https://goo.gl/ytzCA5
Resources
Logging:
• Directing Output: http://logback.qos.ch/manual/appenders.html
• Formatting Main Logs: http://logback.qos.ch/manual/layouts.html
• Formatting Access Logs: http://logback.qos.ch/manual/layouts.html#logback-access
JMX:
• Configuration:

https://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html
• Metric Polling Tool: https://github.com/jmxtrans/jmxtrans
Resources
Puppet Server:
• Metrics Reference: https://docs.puppet.com/pe/2016.4/puppet_server_metrics.html
• Configuration Reference: https://docs.puppet.com/puppetserver/2.6/configuration.html
• Direct Puppet Workflow: https://docs.puppet.com/pe/2016.4/direct_puppet_workflow.html
PuppetDB:
• Metrics Reference: https://docs.puppet.com/puppetdb/4.2/api/metrics/v1/mbeans.html
• Configuration Reference: https://docs.puppet.com/puppetdb/4.2/configure.html
• Backup Procedures: https://docs.puppet.com/pe/2016.4/maintain_console-db.html
• PostgreSQL Maintenance: https://github.com/npwalker/pe_databases
PuppetConf 2016: An Introduction to Measuring and Tuning PE Performance – Charlie Sharpsteen, Puppet

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 

PuppetConf 2016: An Introduction to Measuring and Tuning PE Performance – Charlie Sharpsteen, Puppet

  • 1. Keeping an Eye on the PE Stack: An Introduction to Measuring and Tuning PE Performance. Charlie Sharpsteen, Puppet Inc.
  • 2. Overview
      • How do I measure PE performance? What sources of data are available?
      • What numbers are actually important?
      • What settings can I adjust when important metrics start showing unhealthy trends?
  • 3. Gathering Data From PE Services: JVM Logging and Metrics
  • 4. PE Server Components
      • TrapperKeeper JVM: Puppet Server, PuppetDB, Console Services, Orchestration Services
      • JVM: ActiveMQ
      • Other: PostgreSQL, NGINX
    Mostly Java-based, with shared logging and metrics interfaces.
  • 5. TrapperKeeper Logging
      • Configuration for main logs can be found in:
          /etc/puppetlabs/<service name>/logback.xml
      • Controls output destinations, log levels and message formatting.
      • Ship to a log aggregator to provide context for investigations.
      • Default log pattern is:
          Date Level [Java Namespace] message
      • Puppet Server also includes thread ID:
          Date Level [thread] [Java Namespace] message
      • Thread ID is useful for grouping activity related to a single request.
  • 6. TrapperKeeper Logging
      • Configuration for access logs can be found in:
          /etc/puppetlabs/<service name>/request-logging.xml
      • Default format is Apache Combined Log + request duration.
      • Easily parsed by most log processors.
      • Additional information, such as request headers, can be added.
  • 7. TrapperKeeper Metrics
      • Metrics are recorded using JMX MBeans.
      • Metrics that measure activity over time are weighted to represent the last 5 minutes.
      • Metrics can be retrieved via the JMX protocol:
          • Full access to all available metrics and all available measurements.
          • Can attach tools such as JConsole and JVisualVM.
          • Requires additional ports to be opened, and configuration can be complex. Java tools only.
      • Metrics can be retrieved as JSON over HTTP:
          • For a curated set of common metrics: status/v1?level=debug
          • For access to all available metrics: metrics/v1/mbeans
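As a rough illustration of the JSON-over-HTTP option, the sketch below builds URLs for the two endpoint paths named on the slide and shows one way the response might be fetched. The host name, the port (8140, borrowed from the dashboard URL later in the deck), the CA file argument and both helper names are assumptions. Whether a given endpoint also needs a client certificate or is reachable on that port depends on the installation.

```python
import json
import ssl
from urllib.parse import urlencode, urlunsplit
from urllib.request import urlopen


def metrics_url(host, port, path, **query):
    """Build an HTTPS URL for one of the endpoint paths named on the slide."""
    netloc = f"{host}:{port}"
    return urlunsplit(("https", netloc, "/" + path.lstrip("/"), urlencode(query), ""))


def fetch_json(url, ca_file):
    """Fetch a URL and decode the JSON body (hypothetical helper, not called here)."""
    context = ssl.create_default_context(cafile=ca_file)
    with urlopen(url, context=context, timeout=10) as response:
        return json.load(response)


# Host and port are placeholders for illustration only.
status_url = metrics_url("puppet.example.com", 8140, "status/v1", level="debug")
mbeans_url = metrics_url("puppet.example.com", 8140, "metrics/v1/mbeans")
print(status_url)
print(mbeans_url)
```

Only the URL construction runs here; `fetch_json` is left uncalled so the sketch does not depend on a live server.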
  • 8. TrapperKeeper Configuration
      • Configuration files are stored under:
          /etc/puppetlabs/<service name>/conf.d
      • Most important settings are managed by puppet_enterprise::profile classes and are tunable via the Console and Hiera.
      • JVM settings are specified in /etc/sysconfig or /etc/default.
      • The JVM memory limit, -Xmx, is the primary tunable setting. Enable the G1 garbage collector when using limits higher than 10 GB: -XX:+UseG1GC
      • These flags are configurable via the java_args parameter on profile classes.
  • 9. Puppet Server: It’s all about the JRubies.
  • 10. Puppet Server Metrics Overview
      ● JVM resource usage: status-service
          ● JMX namespace: java.lang:*
      ● HTTP request times per endpoint: pe-master
          ● JMX namespace: puppetserver:name=puppetlabs.<fqdn>.http.*
      ● Catalog compilation metrics: pe-puppet-profiler
          ● JMX namespaces:
              puppetserver:name=puppetlabs.<fqdn>.compiler.*
              puppetserver:name=puppetlabs.<fqdn>.functions.*
              puppetserver:name=puppetlabs.<fqdn>.puppetdb.*
      ● JRuby metrics: pe-jruby-metrics
          ● JMX namespace: puppetserver:name=puppetlabs.<fqdn>.jruby.*
  • 11. New PE 2016.4.0 Features
      ● The metrics/v1/mbeans endpoint has been added to Puppet Server. It must be enabled via Hiera:
          puppet_enterprise::master::puppetserver::metrics_webservice_enabled: true
      ● The Graphite metrics reporter has been optimized and extended:
          ● Only a subset of available metrics is reported by default.
          ● Reported metrics can be customized using the metrics_puppetserver_metrics_allowed parameter of the puppet_enterprise::profile::master class.
  • 12. JRuby Metrics
      ● Almost all Puppet Server requests must be handled by a JRuby instance, which makes JRuby availability the primary performance bottleneck.
      ● num-free-jrubies
          ● Measures spare capacity for incoming requests.
      ● average-wait-time
          ● Should never grow to a significant fraction of HTTP request times.
      ● Impacted by agent checkin distribution, resource availability, Puppet plugins and code.
  • 13. Agent Checkin Activity
      ● Agents check in one runinterval after starting their last run, which can lead to pile-ups or “thundering herds”. Be careful of:
          ● Starting or re-starting a group of agents without the splay setting enabled.
          ● Triggering a group of agent runs via: mco puppet runonce
      ● Monitor average-requested-jrubies and Puppet Server access logs for spikes in agent activity.
      ● Use PostgreSQL to pull a histogram of agent start times from report data:

          sudo su - pe-postgres -s /bin/bash -c "psql -d pe-puppetdb"

          SELECT date_part('minute', start_time), count(*)
          FROM reports
          WHERE start_time BETWEEN '2016-10-20 13:30:00' AND '2016-10-20 14:30:00'
          GROUP BY date_part('minute', start_time)
          ORDER BY date_part('minute', start_time) ASC;
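To make the shape of that histogram concrete, here is a minimal Python sketch of the same grouping: report start times are bucketed by minute of the hour and counted. The function name and the sample timestamps are made up for illustration. In practice the rows would come from the reports table rather than a hard-coded list.

```python
from collections import Counter
from datetime import datetime


def checkins_per_minute(start_times):
    """Count start times by minute of the hour (0-59), like date_part('minute', ...)."""
    counts = Counter(
        datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").minute for stamp in start_times
    )
    return sorted(counts.items())


# Invented sample data; real values would be read from PuppetDB reports.
sample = [
    "2016-10-20 13:30:05",
    "2016-10-20 13:30:41",
    "2016-10-20 13:31:12",
    "2016-10-20 14:30:00",
]

histogram = checkins_per_minute(sample)
for minute, count in histogram:
    print(f"{minute:02d} {'#' * count} ({count})")
```

Because the grouping key is the minute of the hour, 13:30 and 14:30 land in the same bucket. A tall bar therefore means many agents share the same offset within the hour, which is the pile-up pattern the slide warns about.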
  • 14. Re-balancing Agent Checkins
      ● Use MCollective to orchestrate a batched re-start:

          su - peadmin -c "mco rpc service stop service=puppet"
          su - peadmin -c "mco rpc service start service=puppet --batch 1 --batch-sleep <runinterval in seconds / #nodes>"

      ● Batching is not necessary if the agents have splay enabled.
      ● For a stable distribution that isn’t affected by re-starts, puppet agent -t can be run on a schedule determined by the fqdn_rand() function instead of using the service.
      ● Load due to agent activity can be cut dramatically by shifting to the Direct Puppet workflow, where Orchestrator or MCollective is used to push catalog updates.
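The batch-sleep value is simple arithmetic: spreading N single-node batches evenly across one runinterval means sleeping runinterval / N seconds between batches. The sketch below works that out. The function name and the 600-node fleet are illustrative assumptions; the 30-minute figure is Puppet's default runinterval.

```python
def batch_sleep_seconds(runinterval_seconds, node_count):
    """Seconds to sleep between single-node batches so restarts span one runinterval."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return runinterval_seconds / node_count


# Example: a 30-minute runinterval and a hypothetical fleet of 600 agents.
sleep = batch_sleep_seconds(30 * 60, 600)
print(f"--batch 1 --batch-sleep {sleep:g}")
```

With these numbers the agents start three seconds apart and the whole restart takes one full runinterval.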
  • 15. Adding More JRuby Capacity
      ● JRuby count is set via jruby_max_active_instances, constrained by available CPU and RAM:
          ● Compile masters tend to top out around NCPU - 1. Monolithic masters need to share with PuppetDB and tend more towards (NCPU / 2 - 1).
          ● RAM requirements are 512 MB per JRuby, but may need to be increased if catalog compilation uses large datasets or dozens of environments are in use.
      ● The environment_timeout setting can be used to reduce the CPU requirements of catalog compilation. Set to 0 globally and unlimited for long-lived environments with lots of agents.
          ● Each environment using an unlimited timeout will add to the per-JRuby RAM requirements. Monitor memory usage of pre-2016.4.0 installations closely when using unlimited timeouts.
          ● Code Manager should be enabled when an unlimited timeout is used so that caches are flushed when new code is deployed.
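The sizing rules of thumb above can be turned into a starting estimate. This is only a sketch. The function names, the use of integer division for NCPU / 2, and the floor of one JRuby are assumptions added here, and the 512 MB figure is the slide's baseline, which may need to be raised.

```python
def suggested_jrubies(ncpu, monolithic=False):
    """Starting JRuby count: NCPU - 1 on compile masters, NCPU / 2 - 1 on monolithic masters."""
    count = ncpu // 2 - 1 if monolithic else ncpu - 1
    return max(count, 1)  # assumption: never suggest fewer than one JRuby


def jruby_heap_mb(jrubies, mb_per_jruby=512):
    """Baseline heap needed for the JRuby pool, at 512 MB per instance by default."""
    return jrubies * mb_per_jruby


compile_master = suggested_jrubies(8)
monolithic_master = suggested_jrubies(8, monolithic=True)
print(compile_master, jruby_heap_mb(compile_master))
print(monolithic_master, jruby_heap_mb(monolithic_master))
```

Treat the results as a first guess for jruby_max_active_instances and the -Xmx setting, then adjust based on num-free-jrubies and observed memory usage.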
  • 16. Investigating Compile Times
      ● PE Puppet Server tracks compilation time on several different levels: per-node, per-environment, per-resource, per-function, and more.
      ● Top 10 resources and functions are available via the status API and the Puppet Server performance dashboard:
          https://<puppetmaster>:8140/puppet/experimental/dashboard.html
      ● Full access is available through JMX and the metrics API.
      ● Detailed timing on catalog compilation can be obtained by setting the Puppet Server log level to DEBUG and running puppet agent -t --profile on nodes of interest.
  • 17. Investigating Agent Run Times
      ● Agent run summaries are stored at:
          /opt/puppetlabs/puppet/cache/state/last_run_summary.yaml
      ● Summaries are also stored by PuppetDB and can be viewed from the PE Console, or queried:
          reports[metrics] {
            latest_report? = true and certname = '<node name>'
          }
      ● The time section shows the amount of time taken per resource type, along with config_retrieval, which measures the amount of time it took to receive a catalog.
      ● Per-resource timing can be logged by running: puppet agent -t --evaltrace
  • 18. PuppetDB: Processing Time and Storage Space
  • 19. PuppetDB Storage Usage
      ● Monitor disk space!
          /opt/puppetlabs/server/data/postgresql/
          /opt/puppetlabs/server/data/puppetdb/
      ● If disk space runs out, there are two options for returning space to the operating system:
          ● The existing volume can be enlarged so that a VACUUM FULL can be run.
          ● Alternatively, a new volume can be attached for a database backup and restore.
      ● The primary source of disk usage is report storage, which can be tuned by setting: report-ttl
      ● For infrastructure with high node turnover, consider setting node-purge-ttl to remove data related to decommissioned nodes.
  • 20. PuppetDB Command Processing
      ● Every PuppetDB operation, aside from queries, is executed by an asynchronous command processing queue. This queue is managed by an internal ActiveMQ server:
          org.apache.activemq:type=Broker,brokerName=localhost,destinationType=Queue,destinationName=puppetlabs.puppetdb.commands
      ● Important metrics:
          ● Backlog of commands waiting for processing: QueueSize
          ● Largest command seen: MaxMessageSize
          ● Available memory for in-flight commands: MemoryPercentUsage
      ● Increase PuppetDB heap size along with the command-processing.memory-usage setting if the percentage spikes close to 100%. This will prevent ActiveMQ from paging commands to disk.
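One way to act on these queue metrics is a small check that flags readings worth a closer look. The sketch below assumes the MBean attributes have already been retrieved into a dictionary. The function name, both thresholds (90% memory, 1000 queued commands) and the sample values are invented for illustration and should be tuned to the installation.

```python
def queue_warnings(mbean, memory_threshold=90, backlog_threshold=1000):
    """Return warnings for a dict of queue MBean attributes (thresholds are assumptions)."""
    warnings = []
    if mbean.get("MemoryPercentUsage", 0) >= memory_threshold:
        warnings.append("memory usage near limit: consider more heap and memory-usage")
    if mbean.get("QueueSize", 0) >= backlog_threshold:
        warnings.append("command backlog is growing")
    return warnings


# Invented sample reading, shaped like the attributes listed on the slide.
reading = {"QueueSize": 12, "MaxMessageSize": 2400000, "MemoryPercentUsage": 97}
alerts = queue_warnings(reading)
for alert in alerts:
    print(alert)
```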
  • 21. PuppetDB Command Processing
      ● Command processing rates:
          puppetlabs.puppetdb.mq:name=global.processing-time
          puppetlabs.puppetdb.storage:name=replace-facts-time
          puppetlabs.puppetdb.storage:name=replace-catalog-time
          puppetlabs.puppetdb.storage:name=store-report-time
      ● Additional processing threads can be added using the command-processing.threads setting.
      ● On a monolithic install, PuppetDB processing threads must be balanced against Puppet Server JRubies and the number of CPU cores available.
  • 22. PostgreSQL Query Performance
      ● PostgreSQL configuration can be found in:
          /opt/puppetlabs/server/data/postgresql/9.4/data/postgresql.conf
      ● Add settings to improve logging around slow queries:
          log_min_duration_statement = 3000ms
          log_temp_files = 0
      ● If a temp file shows up in the logs, Postgres had to perform an operation outside of RAM, which is slow. Consider increasing the work_mem setting to be greater than the size of the temp files used.
      ● If query performance has been dropping over time, a database VACUUM may be needed:
          su - pe-postgres -s /bin/bash -c "vacuumdb --analyze --verbose --all"
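To pick a work_mem value from the logs, the largest temp file reported is a useful reference point. The sketch below scans log lines for PostgreSQL's "temporary file" messages and returns the biggest size seen. The function name and the sample lines are illustrative, and the timestamp prefix depends on the log_line_prefix setting, so only the message text is matched.

```python
import re

# Matches the message PostgreSQL emits when log_temp_files is enabled.
TEMP_FILE = re.compile(r'temporary file: path "([^"]+)", size (\d+)')


def largest_temp_file_bytes(log_lines):
    """Return the largest temp file size (in bytes) found in the given log lines, or 0."""
    largest = 0
    for line in log_lines:
        match = TEMP_FILE.search(line)
        if match:
            largest = max(largest, int(match.group(2)))
    return largest


# Invented sample log lines for illustration.
sample_log = [
    '2016-10-20 13:31:02 UTC LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp4242.0", size 52428800',
    '2016-10-20 13:31:02 UTC LOG:  duration: 3512.114 ms  statement: SELECT 1',
    '2016-10-20 13:35:40 UTC LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp4242.1", size 1048576',
]

largest = largest_temp_file_bytes(sample_log)
print(f"largest temp file: {largest / (1024 * 1024):.0f} MB")
```

Keep in mind that work_mem applies per sort or hash operation, so raising it to cover the largest temp file multiplies across concurrent queries.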
  • 23. Resources
      This Slide Deck: https://goo.gl/ytzCA5
  • 24. Resources
      Logging:
          • Directing Output: http://logback.qos.ch/manual/appenders.html
          • Formatting Main Logs: http://logback.qos.ch/manual/layouts.html
          • Formatting Access Logs: http://logback.qos.ch/manual/layouts.html#logback-access
      JMX:
          • Configuration: https://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html
          • Metric Polling Tool: https://github.com/jmxtrans/jmxtrans
  • 25. Resources
      Puppet Server:
          • Metrics Reference: https://docs.puppet.com/pe/2016.4/puppet_server_metrics.html
          • Configuration Reference: https://docs.puppet.com/puppetserver/2.6/configuration.html
          • Direct Puppet Workflow: https://docs.puppet.com/pe/2016.4/direct_puppet_workflow.html
      PuppetDB:
          • Metrics Reference: https://docs.puppet.com/puppetdb/4.2/api/metrics/v1/mbeans.html
          • Configuration Reference: https://docs.puppet.com/puppetdb/4.2/configure.html
          • Backup Procedures: https://docs.puppet.com/pe/2016.4/maintain_console-db.html
          • PostgreSQL Maintenance: https://github.com/npwalker/pe_databases