Nagios XI
Best Practices
By Troy Lea
tlea@nagios.com
About Me
•Tech Support Contractor for Nagios
Enterprises
•Based in Australia
•Typically cover UTC+10 from 9am to 5pm
•Nagios & XI Dev (Box293)
•Nagios MVP3
What's Covered In This Talk
•Getting the most from Nagios XI
•Time saving information
•Configuration practices
•Object definitions
•Backend setup
•Performance enhancements
Nagios XI
Server Internals
Nagios XI License Entitlements
•XI license entitles 3 instances:
•Production
•Test & Dev (T&D)
•Disaster Recovery (DR)
•License activation is tied to IP Address
of each XI host
What's Monitoring Nagios XI?
•How would you know your XI server died?
•“Nagios XI Server” Monitoring Wizard
•DR instance monitors production instance
•Production instance is UP & HEALTHY
•Production Instance monitors DR instance
•DR instance is UP & HEALTHY
localhost services
•Do you know how your XI server is
performing?
•Basic local services are included in XI base
•You should ideally be monitoring:
•Service Status (check_init_service)
•crond, httpd, mysql, ndo2db, npcd, ntpd,
postgresql, snmptrapd, snmptt
localhost services
•File Counts (check_file_count)
•NPCD Perfdata spool directory
•xidpe spool directory
•Check results folder
•snmptt spool folder
•nagios user account has not expired
•(check_pass_expire.pl)
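The file-count checks above can be sketched as a small Nagios-style plugin. This is a minimal illustration, not the actual check_file_count plugin: the function names and the warning/critical thresholds are assumptions, and only the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) come from Nagios itself.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def count_files(directory):
    """Count regular files directly inside a directory (no recursion)."""
    return sum(1 for entry in os.scandir(directory) if entry.is_file())


def evaluate(count, warn, crit):
    """Map a file count to a Nagios state using hypothetical thresholds."""
    if count >= crit:
        return CRITICAL
    if count >= warn:
        return WARNING
    return OK


def check_spool(directory, warn=500, crit=2000):
    """Return (exit_code, status_line) for one spool directory."""
    try:
        count = count_files(directory)
    except OSError as exc:
        return UNKNOWN, f"UNKNOWN - cannot read {directory}: {exc}"
    state = evaluate(count, warn, crit)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[state]
    # Text after the pipe is performance data in Nagios plugin output format.
    return state, f"{label} - {count} files in {directory} | files={count};{warn};{crit}"
```

A spool directory that keeps growing usually means the process that should be emptying it has stalled, so the thresholds should sit well above the normal backlog for that directory.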
localhost services
•root mailbox size
•(box293_check_mbox)
•MySQL / MariaDB
•Database tables crashed?
•(box293_check_mysql_table_status)
•Date/Time correct?
•(box293_check_mysql_date)
localhost services
•Overall Load (check_load)
•Memory Free – Physical (check_memory)
•Swap Usage (check_swap)
•Disk Free (check_disk)
Date and Timezone!
•Configure Timezone
•Admin > Manage System Config
•Sync with trusted time source
•VM? Don’t sync with hypervisor!
•Can be the source of confusing
problems
CPU
•CPU Cores vs Speed!
•Not everything is multi-threaded
•3.4 GHz vs 2.2 GHz
•Number of cores is still important
•Refer to XI hardware requirements
Memory
•Enough memory to cope in a major outage
•Event handlers consume memory quickly (gigabytes
within minutes during a major outage)
•Have at least 50% more memory than needed
•Refer to XI hardware requirements
RAM Disk
•Lots of little files created/deleted/updated
•Using a RAM Disk:
•Reduces disk I/O & load
•Speeds up processing of performance data
•Speeds up processing of spooled check results
•Speeds up nagios restarts
•Refer to official procedure
Solid State Disk (SSD)
•Greatly improves overall performance
•Complements RAM Disk
•Helps read/writes with:
•Logs
•Database
•Performance Graphs
•Reports
SSD vs RAID ?
•SSD beats* a spinning disk RAID set
•*Depends on how much money you have
•Still need to RAID1 SSD for redundancy!
•SSD may not give you the required capacity
•3.8TB SAS SSD now available!!!
rrdcached
•With rrdcached enabled, spooled performance
data accumulates and is written to the backend
RRD files after a set amount of time
•Reduces Disk I/O
•Can be a delay in data appearing in graphs
•Refer to official procedure
Offloaded MySQL / MariaDB
•Data constantly written to databases
•Historical and Configuration
•Offload to separate server to reduce load
•Don't forget to monitor offloaded server!!!
•Disk/CPU/Memory/Tables/Service
•Refer to earlier slides
•Refer to official procedure
Mod-Gearman
•Used for offloading plugins to workers
•Plugins need to be installed on all workers
•Be aware of plugins that use /tmp files!
•XI 2014 onwards uses Core 4
•Core 4 has its own workers (only local
workers)
•nagios.cfg “check_workers” option
•Refer to official procedure
Disaster Recovery
•Failover and High Availability Solutions for
Nagios XI
•Andy Brist - NWC2014 – Failover & HA
•What is really important in disaster?
•Plan and test
Backups!!!
•Admin > System Backups
•Schedule backups of XI
•Location can be local, FTP, SSH
•Remote location recommended
•Manual Backups
•Local Backup Archives via Admin menu
•/usr/local/nagiosxi/scripts/backup_xi.sh
Restoring Backups
•Official Backup and Restore procedure
•Brings system back online with ease
•Great for migrating from old XI to new XI
•Also good for:
•DR
•Test & Dev
Configuration
Intervals - Host vs Services
•Host down HARD = service notifications
suppressed
•What happens when host and services
use the same check intervals?
•Unnecessary Notifications get sent :(
•Make host go down HARD quicker than
its services!
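The timing behind this advice can be sketched with simple arithmetic. The example assumes the usual Nagios behaviour in which the first failed check counts as attempt 1 and each further attempt follows after `retry_interval`; the attempt counts and intervals below are hypothetical, and the model ignores on-demand host checks and any offset between the first host and service failures.

```python
def minutes_to_hard(max_check_attempts, retry_interval):
    """Minutes from the first failed check until the HARD state.

    The first failed check is attempt 1; every remaining attempt is a
    retry scheduled retry_interval minutes after the previous one.
    """
    return (max_check_attempts - 1) * retry_interval


# Hypothetical settings, not taken from the talk.
host = minutes_to_hard(max_check_attempts=3, retry_interval=1)
service = minutes_to_hard(max_check_attempts=5, retry_interval=1)

print(f"host HARD after {host} min, service HARD after {service} min")
if host < service:
    print("host goes HARD first: service notifications can be suppressed")
else:
    print("service may go HARD first: expect extra notifications")
```

With identical settings on both objects the two values are equal, which is the case the slide warns about.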
Service Dependencies
•When a master service goes down:
•Prevents notifications from being sent
•Prevents service checks from executing
•Make master service go down HARD
quicker than dependent services!
•Otherwise dependencies are pointless
•Example master services: Ping or NRPE Version
Disable Service Checks ?
•host_down_disable_service_checks
•Nagios Core 4.1.x feature (XI 5)
•System wide setting
•Reduces load on XI host
•Think of it as automatic service dependencies
on their own hosts
•Service dependencies ignored if host is down
Check Intervals - Be Realistic
•Does it need to be checked every 5
minutes?
•Disk Free Space – every 60 minutes perhaps?
•Too long = no performance data
•Different intervals to spread the load
•3, 5, 7 minute intervals
•58, 60, 62 minute intervals
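One way to see why mixed intervals spread the load is to ask how often all checks fall due in the same minute. The sketch below uses an idealised model (checks start together and run on exact intervals), which is an assumption; real Nagios scheduling is less regular.

```python
from functools import reduce
from math import gcd


def alignment_period(intervals):
    """Least common multiple: minutes until all intervals coincide again."""
    return reduce(lambda a, b: a * b // gcd(a, b), intervals)


for intervals in ([5, 5, 5], [3, 5, 7], [60, 60, 60], [58, 60, 62]):
    print(intervals, "-> all due together every",
          alignment_period(intervals), "minutes")
```

In this model, three checks on a 5 minute interval collide every 5 minutes, while 3, 5 and 7 minute intervals only line up every 105 minutes.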
Notification & Check Intervals
•Nagios decides whether it is allowed to send a
notification each time a service check returns a HARD state
•e.g. 15 minute check and 60 minute notification
•Internal scheduling may leave only 14min 55sec
between checks, 4 x 14:55 = 59min 40sec … it’s <
60min!
•Notification not sent until 75min!
•Scheduling is geared +/- to reduce load!
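The arithmetic on this slide can be reproduced in a few lines. The sketch assumes the previous notification went out at t=0 on a check result and that a re-notification is only considered when the next check result arrives; the function name is illustrative.

```python
import math


def next_notification_seconds(check_period_s, notification_interval_s):
    """Seconds until the first check that lands on or after the interval."""
    checks_needed = math.ceil(notification_interval_s / check_period_s)
    return checks_needed * check_period_s


exact = next_notification_seconds(15 * 60, 60 * 60)         # checks 15:00 apart
drifted = next_notification_seconds(14 * 60 + 55, 60 * 60)  # checks 14:55 apart

print(f"exact:   {exact // 60} min {exact % 60} s")
print(f"drifted: {drifted // 60} min {drifted % 60} s")
```

With checks exactly 15 minutes apart the fourth check lands on the 60 minute mark; at 14:55 apart the fourth check is 20 seconds early, so the notification waits for the fifth check at roughly 75 minutes.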
Use Hostgroups!
•Assign ONE service to a hostgroup of
common servers
•Windows Servers
•Linux Servers
•Consistent monitoring, standards enforced!
•Directive changes - all hosts get updated
•Reduces management overhead
Use Contact Groups!
•Use contact groups in all definitions
•Makes it easy when staff join/leave
•Just add/remove the contact from groups
•Reduces administrative overhead
•Enforces your company policy
•Similar principle to host groups
Configuration Wizards
•Pros
•Great for getting up and running quickly
•No need to learn how a plugin works
•Cons
•Creates individual services
•More work later when enforcing “standards”
Templates
•Common settings applied to objects
•Helps enforce standards
•Reduces administrative overhead
•Layer multiple templates
•Can be additive or ignore inheritance
•XI Config Wizard objects use templates
•Example of common icmp check
User Macros – resource.cfg
•$USERx$ macros are good for common
items like a username or password
•Allows passwords with a ! exclamation
mark
•Values not visible in object definitions
•$USER1$
•/usr/local/nagios/libexec
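To illustrate how `$USERx$` macros keep values out of object definitions, here is a rough sketch that reads `$USERn$=value` lines and substitutes them into a command line. It is not how Nagios implements macro expansion; the parser, the sample password and the `check_example` plugin name are all hypothetical.

```python
import re

USER_LINE = re.compile(r"^\s*(\$USER\d+\$)\s*=\s*(.*)$")


def parse_user_macros(text):
    """Collect $USERn$=value pairs, skipping comments and blank lines."""
    macros = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = USER_LINE.match(line)
        if match:
            macros[match.group(1)] = match.group(2).strip()
    return macros


def expand(command_line, macros):
    """Replace each known $USERn$ macro in a command line with its value."""
    for name, value in macros.items():
        command_line = command_line.replace(name, value)
    return command_line


# Hypothetical resource file contents.
sample = """# user macros
$USER1$=/usr/local/nagios/libexec
$USER5$=s3cret!pass
"""
macros = parse_user_macros(sample)
print(expand("$USER1$/check_example -p $USER5$", macros))
```

Because `!` separates arguments in a `check_command`, a password containing `!` cannot be passed as a plain argument; referencing it through a user macro sidesteps that.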
Custom Object Variables
•Allows you to create your own variables
•Can be defined in host or service objects
•e.g. each host has its own check_nt
password
•Define _CHECK_NT_PASSWORD in host object
•In command definitions reference it as:
•$_HOSTCHECK_NT_PASSWORD$
•VERY POWERFUL!
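The naming rule used above (a custom variable `_NAME` on a host is referenced as `$_HOSTNAME$`) can be written out as a small helper. The function is purely illustrative and not part of Nagios; it covers host and service objects only.

```python
def custom_macro(object_type, variable_name):
    """Build the macro name for a custom object variable.

    Custom variables are defined with a leading underscore. The macro is
    _HOST or _SERVICE followed by the upper-cased name without that
    underscore, wrapped in dollar signs.
    """
    prefixes = {"host": "_HOST", "service": "_SERVICE"}
    if not variable_name.startswith("_"):
        raise ValueError("custom variable names must start with an underscore")
    return f"${prefixes[object_type]}{variable_name[1:].upper()}$"


print(custom_macro("host", "_CHECK_NT_PASSWORD"))
print(custom_macro("service", "_CHECK_NT_PASSWORD"))
```

The same variable defined on a service object is referenced with the `_SERVICE` prefix instead, so a command definition has to use the prefix matching the object that carries the value.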
Other
MRTG Clean Configs
•Your MRTG configs may be collecting
more than what you think
•/etc/mrtg/conf.d/*.cfg files
•Created by Network Switch / Router Wizard
•Comment out unused ports
•About 37 lines per port
•Comment out unused non-interfaces
(VLANs)
Plugins – Compiled vs Scripts
•Compiled runs quicker
•Official nagios-plugins are compiled
•“Custom modifications” require re-compiling
•Scripts run slower, consume more
resources
•Perl plugins are known to consume more CPU and RAM
•“nice” can reduce impact of plugins
•Check Profiler component by box293
Backend API - Read Only User
•API provides you with URLs for use in third
party products without needing user/pass
•Requires a user account to be created
•Account should be READ ONLY
Performance Data Tool
•Component developed by box293
•Allows you to manipulate RRD files
•Great for merging RRD data
•Can also delete old RRD files for old services
•View raw data in tables
•Find it in the Nagios Exchange
Thank you!
What Is Your Best Practice?
Any Questions?
end
done
fi esac
)
}
;
od
until
.

Contenu connexe

Tendances

Releasing Software Quickly and Reliably With AWS CodePipeline by Mark Mansour...
Releasing Software Quickly and Reliably With AWS CodePipeline by Mark Mansour...Releasing Software Quickly and Reliably With AWS CodePipeline by Mark Mansour...
Releasing Software Quickly and Reliably With AWS CodePipeline by Mark Mansour...Amazon Web Services
 
Intro to Telegraf
Intro to TelegrafIntro to Telegraf
Intro to TelegrafInfluxData
 
Nagios An Open Source Network Management System Powerpoint Presentation Slides
Nagios An Open Source Network Management System Powerpoint Presentation SlidesNagios An Open Source Network Management System Powerpoint Presentation Slides
Nagios An Open Source Network Management System Powerpoint Presentation SlidesSlideTeam
 
What is Nagios XI and how is it different from Nagios Core
What is Nagios XI and how is it different from Nagios CoreWhat is Nagios XI and how is it different from Nagios Core
What is Nagios XI and how is it different from Nagios CoreSanjay Willie
 
High Availability (HA) Explained
High Availability (HA) ExplainedHigh Availability (HA) Explained
High Availability (HA) ExplainedMaciej Lasyk
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per SecondAmazon Web Services
 
Log analysis using elk
Log analysis using elkLog analysis using elk
Log analysis using elkRushika Shah
 
Ansible: Infrastructure as Code for OpenShift
Ansible: Infrastructure as Code for OpenShiftAnsible: Infrastructure as Code for OpenShift
Ansible: Infrastructure as Code for OpenShiftIgnacio Sánchez Ginés
 
Zabbix - Company, Product and Services
Zabbix - Company, Product and ServicesZabbix - Company, Product and Services
Zabbix - Company, Product and ServicesZabbix
 
Zabbix introduction ( RadixCloud Radix Technologies SA)
Zabbix introduction ( RadixCloud Radix Technologies SA)Zabbix introduction ( RadixCloud Radix Technologies SA)
Zabbix introduction ( RadixCloud Radix Technologies SA)Martin Markovski
 
Logging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & KibanaLogging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & KibanaAmazee Labs
 
Open Source DataViz with Apache Superset
Open Source DataViz with Apache SupersetOpen Source DataViz with Apache Superset
Open Source DataViz with Apache SupersetCarl W. Handlin
 
[오픈소스컨설팅] 서비스 메쉬(Service mesh)
[오픈소스컨설팅] 서비스 메쉬(Service mesh)[오픈소스컨설팅] 서비스 메쉬(Service mesh)
[오픈소스컨설팅] 서비스 메쉬(Service mesh)Open Source Consulting
 
Zabbix - an important part of your IT infrastructure
Zabbix - an important part of your IT infrastructureZabbix - an important part of your IT infrastructure
Zabbix - an important part of your IT infrastructureArvids Godjuks
 
The AWS Shared Security Responsibility Model in Practice
The AWS Shared Security Responsibility Model in PracticeThe AWS Shared Security Responsibility Model in Practice
The AWS Shared Security Responsibility Model in PracticeAmazon Web Services
 
CI/CD Pipeline with Kubernetes
CI/CD Pipeline with KubernetesCI/CD Pipeline with Kubernetes
CI/CD Pipeline with KubernetesMukesh Singh
 

Tendances (20)

Releasing Software Quickly and Reliably With AWS CodePipeline by Mark Mansour...
Releasing Software Quickly and Reliably With AWS CodePipeline by Mark Mansour...Releasing Software Quickly and Reliably With AWS CodePipeline by Mark Mansour...
Releasing Software Quickly and Reliably With AWS CodePipeline by Mark Mansour...
 
Intro to Telegraf
Intro to TelegrafIntro to Telegraf
Intro to Telegraf
 
Nagios An Open Source Network Management System Powerpoint Presentation Slides
Nagios An Open Source Network Management System Powerpoint Presentation SlidesNagios An Open Source Network Management System Powerpoint Presentation Slides
Nagios An Open Source Network Management System Powerpoint Presentation Slides
 
What is Nagios XI and how is it different from Nagios Core
What is Nagios XI and how is it different from Nagios CoreWhat is Nagios XI and how is it different from Nagios Core
What is Nagios XI and how is it different from Nagios Core
 
StarlingX - A Platform for the Distributed Edge | Ildiko Vancsa
StarlingX - A Platform for the Distributed Edge | Ildiko VancsaStarlingX - A Platform for the Distributed Edge | Ildiko Vancsa
StarlingX - A Platform for the Distributed Edge | Ildiko Vancsa
 
Edge architecture ieee international conference on cloud engineering
Edge architecture   ieee international conference on cloud engineeringEdge architecture   ieee international conference on cloud engineering
Edge architecture ieee international conference on cloud engineering
 
High Availability (HA) Explained
High Availability (HA) ExplainedHigh Availability (HA) Explained
High Availability (HA) Explained
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 
Log analysis using elk
Log analysis using elkLog analysis using elk
Log analysis using elk
 
Ansible: Infrastructure as Code for OpenShift
Ansible: Infrastructure as Code for OpenShiftAnsible: Infrastructure as Code for OpenShift
Ansible: Infrastructure as Code for OpenShift
 
Zabbix - Company, Product and Services
Zabbix - Company, Product and ServicesZabbix - Company, Product and Services
Zabbix - Company, Product and Services
 
Demystifying Service Mesh
Demystifying Service MeshDemystifying Service Mesh
Demystifying Service Mesh
 
Zabbix introduction ( RadixCloud Radix Technologies SA)
Zabbix introduction ( RadixCloud Radix Technologies SA)Zabbix introduction ( RadixCloud Radix Technologies SA)
Zabbix introduction ( RadixCloud Radix Technologies SA)
 
Logging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & KibanaLogging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & Kibana
 
Grafana
GrafanaGrafana
Grafana
 
Open Source DataViz with Apache Superset
Open Source DataViz with Apache SupersetOpen Source DataViz with Apache Superset
Open Source DataViz with Apache Superset
 
[오픈소스컨설팅] 서비스 메쉬(Service mesh)
[오픈소스컨설팅] 서비스 메쉬(Service mesh)[오픈소스컨설팅] 서비스 메쉬(Service mesh)
[오픈소스컨설팅] 서비스 메쉬(Service mesh)
 
Zabbix - an important part of your IT infrastructure
Zabbix - an important part of your IT infrastructureZabbix - an important part of your IT infrastructure
Zabbix - an important part of your IT infrastructure
 
The AWS Shared Security Responsibility Model in Practice
The AWS Shared Security Responsibility Model in PracticeThe AWS Shared Security Responsibility Model in Practice
The AWS Shared Security Responsibility Model in Practice
 
CI/CD Pipeline with Kubernetes
CI/CD Pipeline with KubernetesCI/CD Pipeline with Kubernetes
CI/CD Pipeline with Kubernetes
 

Similaire à Nagios XI Best Practices

be the captain of your connections deployment
be the captain of your connections deploymentbe the captain of your connections deployment
be the captain of your connections deploymentSharon James
 
Got Problems? Let's Do a Health Check
Got Problems? Let's Do a Health CheckGot Problems? Let's Do a Health Check
Got Problems? Let's Do a Health CheckLuis Guirigay
 
Soccnx10: Best and worst practices deploying IBM Connections
Soccnx10: Best and worst practices deploying IBM ConnectionsSoccnx10: Best and worst practices deploying IBM Connections
Soccnx10: Best and worst practices deploying IBM Connectionspanagenda
 
Best And Worst Practices Deploying IBM Connections
Best And Worst Practices Deploying IBM ConnectionsBest And Worst Practices Deploying IBM Connections
Best And Worst Practices Deploying IBM ConnectionsLetsConnect
 
Fixing Domino Server Sickness
Fixing Domino Server SicknessFixing Domino Server Sickness
Fixing Domino Server SicknessGabriella Davis
 
Moving Windows Applications to the Cloud
Moving Windows Applications to the CloudMoving Windows Applications to the Cloud
Moving Windows Applications to the CloudRightScale
 
Pre and post tips to installing sql server correctly
Pre and post tips to installing sql server correctlyPre and post tips to installing sql server correctly
Pre and post tips to installing sql server correctlyAntonios Chatzipavlis
 
Domino Server Health - Monitoring and Managing
 Domino Server Health - Monitoring and Managing Domino Server Health - Monitoring and Managing
Domino Server Health - Monitoring and ManagingGabriella Davis
 
SharePoint 2013 Performance Analysis - Robi Vončina
SharePoint 2013 Performance Analysis - Robi VončinaSharePoint 2013 Performance Analysis - Robi Vončina
SharePoint 2013 Performance Analysis - Robi VončinaSPC Adriatics
 
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Fwdays
 
Pascal benois performance_troubleshooting-spsbe18
Pascal benois performance_troubleshooting-spsbe18Pascal benois performance_troubleshooting-spsbe18
Pascal benois performance_troubleshooting-spsbe18BIWUG
 
Adm07 The Health Check Extravaganza for IBM Social and Collaboration Environm...
Adm07 The Health Check Extravaganza for IBM Social and Collaboration Environm...Adm07 The Health Check Extravaganza for IBM Social and Collaboration Environm...
Adm07 The Health Check Extravaganza for IBM Social and Collaboration Environm...Kim Greene
 
Citrix Synergy 2014: Going the CloudPlatform Way
Citrix Synergy 2014: Going the CloudPlatform WayCitrix Synergy 2014: Going the CloudPlatform Way
Citrix Synergy 2014: Going the CloudPlatform WayIliyas Shirol
 
April, 2021 OpenNTF Webinar - Domino Administration Best Practices
April, 2021 OpenNTF Webinar - Domino Administration Best PracticesApril, 2021 OpenNTF Webinar - Domino Administration Best Practices
April, 2021 OpenNTF Webinar - Domino Administration Best PracticesHoward Greenberg
 
Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale Perforce
 
Apache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutApache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutSander Temme
 
Citrix Synergy 2014 - Syn233 Building and operating a Dev Ops cloud: best pra...
Citrix Synergy 2014 - Syn233 Building and operating a Dev Ops cloud: best pra...Citrix Synergy 2014 - Syn233 Building and operating a Dev Ops cloud: best pra...
Citrix Synergy 2014 - Syn233 Building and operating a Dev Ops cloud: best pra...Citrix
 
Nagios Conference 2014 - Mike Merideth - The Art and Zen of Managing Nagios w...
Nagios Conference 2014 - Mike Merideth - The Art and Zen of Managing Nagios w...Nagios Conference 2014 - Mike Merideth - The Art and Zen of Managing Nagios w...
Nagios Conference 2014 - Mike Merideth - The Art and Zen of Managing Nagios w...Nagios
 
SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1sqlserver.co.il
 
ECMDay2015 - Kent Agerlund – Configuration Manager 2012 – A Site Review
ECMDay2015 - Kent Agerlund – Configuration Manager 2012 – A Site ReviewECMDay2015 - Kent Agerlund – Configuration Manager 2012 – A Site Review
ECMDay2015 - Kent Agerlund – Configuration Manager 2012 – A Site ReviewKenny Buntinx
 

Similaire à Nagios XI Best Practices (20)

be the captain of your connections deployment
be the captain of your connections deploymentbe the captain of your connections deployment
be the captain of your connections deployment
 
Got Problems? Let's Do a Health Check
Got Problems? Let's Do a Health CheckGot Problems? Let's Do a Health Check
Got Problems? Let's Do a Health Check
 
Soccnx10: Best and worst practices deploying IBM Connections
Soccnx10: Best and worst practices deploying IBM ConnectionsSoccnx10: Best and worst practices deploying IBM Connections
Soccnx10: Best and worst practices deploying IBM Connections
 
Best And Worst Practices Deploying IBM Connections
Best And Worst Practices Deploying IBM ConnectionsBest And Worst Practices Deploying IBM Connections
Best And Worst Practices Deploying IBM Connections
 
Fixing Domino Server Sickness
Fixing Domino Server SicknessFixing Domino Server Sickness
Fixing Domino Server Sickness
 
Moving Windows Applications to the Cloud
Moving Windows Applications to the CloudMoving Windows Applications to the Cloud
Moving Windows Applications to the Cloud
 
Pre and post tips to installing sql server correctly
Pre and post tips to installing sql server correctlyPre and post tips to installing sql server correctly
Pre and post tips to installing sql server correctly
 
Domino Server Health - Monitoring and Managing
 Domino Server Health - Monitoring and Managing Domino Server Health - Monitoring and Managing
Domino Server Health - Monitoring and Managing
 
SharePoint 2013 Performance Analysis - Robi Vončina
SharePoint 2013 Performance Analysis - Robi VončinaSharePoint 2013 Performance Analysis - Robi Vončina
SharePoint 2013 Performance Analysis - Robi Vončina
 
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
 
Pascal benois performance_troubleshooting-spsbe18
Pascal benois performance_troubleshooting-spsbe18Pascal benois performance_troubleshooting-spsbe18
Pascal benois performance_troubleshooting-spsbe18
 
Adm07 The Health Check Extravaganza for IBM Social and Collaboration Environm...
Adm07 The Health Check Extravaganza for IBM Social and Collaboration Environm...Adm07 The Health Check Extravaganza for IBM Social and Collaboration Environm...
Adm07 The Health Check Extravaganza for IBM Social and Collaboration Environm...
 
Citrix Synergy 2014: Going the CloudPlatform Way
Citrix Synergy 2014: Going the CloudPlatform WayCitrix Synergy 2014: Going the CloudPlatform Way
Citrix Synergy 2014: Going the CloudPlatform Way
 
April, 2021 OpenNTF Webinar - Domino Administration Best Practices
April, 2021 OpenNTF Webinar - Domino Administration Best PracticesApril, 2021 OpenNTF Webinar - Domino Administration Best Practices
April, 2021 OpenNTF Webinar - Domino Administration Best Practices
 
Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale
 
Apache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutApache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling Out
 
Citrix Synergy 2014 - Syn233 Building and operating a Dev Ops cloud: best pra...
Citrix Synergy 2014 - Syn233 Building and operating a Dev Ops cloud: best pra...Citrix Synergy 2014 - Syn233 Building and operating a Dev Ops cloud: best pra...
Citrix Synergy 2014 - Syn233 Building and operating a Dev Ops cloud: best pra...
 
Nagios Conference 2014 - Mike Merideth - The Art and Zen of Managing Nagios w...
Nagios Conference 2014 - Mike Merideth - The Art and Zen of Managing Nagios w...Nagios Conference 2014 - Mike Merideth - The Art and Zen of Managing Nagios w...
Nagios Conference 2014 - Mike Merideth - The Art and Zen of Managing Nagios w...
 
SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1
 
ECMDay2015 - Kent Agerlund – Configuration Manager 2012 – A Site Review
ECMDay2015 - Kent Agerlund – Configuration Manager 2012 – A Site ReviewECMDay2015 - Kent Agerlund – Configuration Manager 2012 – A Site Review
ECMDay2015 - Kent Agerlund – Configuration Manager 2012 – A Site Review
 

Plus de Nagios

Jesse Olson - Nagios Log Server Architecture Overview
Jesse Olson - Nagios Log Server Architecture OverviewJesse Olson - Nagios Log Server Architecture Overview
Jesse Olson - Nagios Log Server Architecture OverviewNagios
 
Trevor McDonald - Nagios XI Under The Hood
Trevor McDonald  - Nagios XI Under The HoodTrevor McDonald  - Nagios XI Under The Hood
Trevor McDonald - Nagios XI Under The HoodNagios
 
Sean Falzon - Nagios - Resilient Notifications
Sean Falzon - Nagios - Resilient NotificationsSean Falzon - Nagios - Resilient Notifications
Sean Falzon - Nagios - Resilient NotificationsNagios
 
Marcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise Edition
Marcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise EditionMarcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise Edition
Marcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise EditionNagios
 
Janice Singh - Writing Custom Nagios Plugins
Janice Singh - Writing Custom Nagios PluginsJanice Singh - Writing Custom Nagios Plugins
Janice Singh - Writing Custom Nagios PluginsNagios
 
Dave Williams - Nagios Log Server - Practical Experience
Dave Williams - Nagios Log Server - Practical ExperienceDave Williams - Nagios Log Server - Practical Experience
Dave Williams - Nagios Log Server - Practical ExperienceNagios
 
Mike Weber - Nagios and Group Deployment of Service Checks
Mike Weber - Nagios and Group Deployment of Service ChecksMike Weber - Nagios and Group Deployment of Service Checks
Mike Weber - Nagios and Group Deployment of Service ChecksNagios
 
Mike Guthrie - Revamping Your 10 Year Old Nagios Installation
Mike Guthrie - Revamping Your 10 Year Old Nagios InstallationMike Guthrie - Revamping Your 10 Year Old Nagios Installation
Mike Guthrie - Revamping Your 10 Year Old Nagios InstallationNagios
 
Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...
Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...
Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...Nagios
 
Matt Bruzek - Monitoring Your Public Cloud With Nagios
Matt Bruzek - Monitoring Your Public Cloud With NagiosMatt Bruzek - Monitoring Your Public Cloud With Nagios
Matt Bruzek - Monitoring Your Public Cloud With NagiosNagios
 
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.Nagios
 
Eric Loyd - Fractal Nagios
Eric Loyd - Fractal NagiosEric Loyd - Fractal Nagios
Eric Loyd - Fractal NagiosNagios
 
Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...
Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...
Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...Nagios
 
Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...
Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...
Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...Nagios
 
Nagios World Conference 2015 - Scott Wilkerson Opening
Nagios World Conference 2015 - Scott Wilkerson OpeningNagios World Conference 2015 - Scott Wilkerson Opening
Nagios World Conference 2015 - Scott Wilkerson OpeningNagios
 
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios CoreNrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios CoreNagios
 
Nagios Log Server - Features
Nagios Log Server - FeaturesNagios Log Server - Features
Nagios Log Server - FeaturesNagios
 
Nagios Network Analyzer - Features
Nagios Network Analyzer - FeaturesNagios Network Analyzer - Features
Nagios Network Analyzer - FeaturesNagios
 
Nagios Conference 2014 - Dorance Martinez Cortes - Customizing Nagios
Nagios Conference 2014 - Dorance Martinez Cortes - Customizing NagiosNagios Conference 2014 - Dorance Martinez Cortes - Customizing Nagios
Nagios Conference 2014 - Dorance Martinez Cortes - Customizing NagiosNagios
 
Nagios Conference 2014 - Mike Weber - Nagios Rapid Deployment Options
Nagios Conference 2014 - Mike Weber - Nagios Rapid Deployment OptionsNagios Conference 2014 - Mike Weber - Nagios Rapid Deployment Options
Nagios Conference 2014 - Mike Weber - Nagios Rapid Deployment OptionsNagios
 

Plus de Nagios (20)

Jesse Olson - Nagios Log Server Architecture Overview
Jesse Olson - Nagios Log Server Architecture OverviewJesse Olson - Nagios Log Server Architecture Overview
Jesse Olson - Nagios Log Server Architecture Overview
 
Trevor McDonald - Nagios XI Under The Hood
Trevor McDonald  - Nagios XI Under The HoodTrevor McDonald  - Nagios XI Under The Hood
Trevor McDonald - Nagios XI Under The Hood
 
Sean Falzon - Nagios - Resilient Notifications
Sean Falzon - Nagios - Resilient NotificationsSean Falzon - Nagios - Resilient Notifications
Sean Falzon - Nagios - Resilient Notifications
 
Marcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise Edition
Marcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise EditionMarcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise Edition
Marcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise Edition
 
Janice Singh - Writing Custom Nagios Plugins
Janice Singh - Writing Custom Nagios PluginsJanice Singh - Writing Custom Nagios Plugins
Janice Singh - Writing Custom Nagios Plugins
 
Dave Williams - Nagios Log Server - Practical Experience
Dave Williams - Nagios Log Server - Practical ExperienceDave Williams - Nagios Log Server - Practical Experience
Dave Williams - Nagios Log Server - Practical Experience
 
Mike Weber - Nagios and Group Deployment of Service Checks



Nagios XI Best Practices

  • 1. Nagios XI Best Practices By Troy Lea tlea@nagios.com
  • 2. About Me •Tech Support Contractor for Nagios Enterprises •Based in Australia •Typically cover UTC+10 from 9am to 5pm •Nagios & XI Dev (Box293) •Nagios MVP3
  • 3. What's Covered In This Talk •Getting the most from Nagios XI •Time saving information •Configuration practices •Object definitions •Backend setup •Performance enhancements
  • 5. Nagios XI License Entitlements •XI license entitles 3 instances: •Production •Test & Dev (T&D) •Disaster Recovery (DR) •License activation is tied to IP Address of each XI host
  • 6. What's Monitoring Nagios XI? •How would you know your XI server died? •“Nagios XI Server” Monitoring Wizard •DR instance monitors production instance •Production instance is UP & HEALTHY •Production Instance monitors DR instance •DR instance is UP & HEALTHY
  • 7. localhost services •Do you know how your XI server is performing? •Basic local services are included in XI base •You should ideally be monitoring: •Service Status (check_init_service) •crond, httpd, mysql, ndo2db, npcd, ntpd, postgresql, snmptrapd, snmptt
  • 8. localhost services •File Counts (check_file_count) •NPCD Perfdata spool directory •xidpe spool directory •Check results folder •snmptt spool folder •nagios user account has not expired •(check_pass_expire.pl)
  • 9. localhost services •root mailbox size •(box293_check_mbox) •MySQL / MariaDB •Database tables crashed? •(box293_check_mysql_table_status) •Date/Time correct? •(box293_check_mysql_date)
  • 10. localhost services •Overall Load (check_load) •Memory Free – Physical (check_memory) •Swap Usage (check_swap) •Disk Free (check_disk)
  • 11. Date and Timezone! •Configure Timezone •Admin > Manage System Config •Sync with trusted time source •VM? Don’t sync with hypervisor! •Can be the source of confusing problems
  • 12. CPU •CPU Cores vs Speed! •Not everything is multi-threaded •3.4 GHz vs 2.2 GHz •Number of cores is still important •Refer to XI hardware requirements
  • 13. Memory •Enough memory to cope in a major outage •Event handlers consume memory quickly (+GB in a matter of minutes in a major outage) •Have at least 50% more memory than needed •Refer to XI hardware requirements
  • 14. RAM Disk •Lots of little files created/deleted/updated •Using a RAM Disk: •Reduces disk I/O & load •Speeds up processing of performance data •Speeds up processing of spooled check results •Speeds up nagios restarts •Refer to official procedure
  • 15. Solid State Disk (SSD) •Greatly improves overall performance •Complements RAM Disk •Helps read/writes with: •Logs •Database •Performance Graphs •Reports
  • 16. SSD vs RAID ? •SSD beats* a spinning disk RAID set •*Depends on how much money you have •Still need to RAID1 SSD for redundancy! •SSD may not give you the required capacity •3.8TB SAS SSD now available !!!
  • 17. rrdcached •Enabling rrdcached accumulates the spooled performance data, after x amount of time it is processed into backend RRD files •Reduces Disk I/O •Can be a delay in data appearing in graphs •Refer to official procedure
  • 18. Offloaded MySQL / MariaDB •Data constantly written to databases •Historical and Configuration •Offload to separate server to reduce load •Don't forget to monitor offloaded server!!! •Disk/CPU/Memory/Tables/Service •Refer to earlier slides •Refer to official procedure
  • 19. Mod-Gearman •Used for offloading plugins to workers •Plugins need to be installed on all workers •Be aware of plugins that use /tmp files! •XI 2014 onwards uses Core 4 •Core 4 has its own workers (only local workers) •nagios.cfg “check_workers” option •Refer to official procedure
  • 20. Disaster Recovery •Failover and High Availability Solutions for Nagios XI •Andy Brist - NWC2014 – Failover & HA •What is really important in a disaster? •Plan and test
  • 21. Backups!!! •Admin > System Backups •Schedule backups of XI •Location can be local, FTP, SSH •Remote location recommended •Manual Backups •Local Backup Archives via Admin menu •/usr/local/nagiosxi/scripts/backup_xi.sh
  • 22. Restoring Backups •Official Backup and Restore procedure •Brings system back online with ease •Great for migrating from old XI to new XI •Also good for: •DR •Test & Dev
  • 24. Intervals - Host vs Services •Host down HARD = service notifications suppressed •What happens when host and services use the same check intervals? •Unnecessary Notifications get sent :( •Make host go down HARD quicker than its services!
  • 25. Service Dependencies •When a master service goes down: •Prevents notifications from being sent •Prevents service checks from execution •Make master service go down HARD quicker than dependent services! •Otherwise dependencies are pointless •Master service e.g. - Ping or NRPE Version
  • 26. Disable Service Checks ? •host_down_disable_service_checks •Nagios Core 4.1.x feature (XI 5) •System wide setting •Reduces load on XI host •Think of it as automatic service dependencies on their own hosts •Service dependencies ignored if host is down
  • 27. Check Intervals - Be Realistic •Does it need to be checked every 5 minutes? •Disk Free Space – every 60 minutes perhaps? •Too long = no performance data •Different intervals to spread the load •3, 5, 7 minute intervals •58, 60, 62 minute intervals
  • 28. Notification & Check Intervals •Nagios determines if it is allowed to send a notification every service HARD state •e.g. 15 minute check and 60 minute notification •Internal scheduling may cause 14min 55sec to pass, 4 x 14:55 = 59min 40sec … it’s < 60min! •Notification not sent until 75min! •Scheduling is geared +/- to reduce load!
  • 29. Use Hostgroups! •Assign ONE service to a hostgroup of common servers •Windows Servers •Linux Servers •Consistent monitoring, standards enforced! •Directive changes - all hosts get updated •Reduces management overhead
  • 30. Use Contact Groups! •Use contact groups in all definitions •Makes it easy when staff join/leave •Just add/remove the contact from groups •Reduces administrative overhead •Enforces your company policy •Similar principle to host groups
  • 31. Configuration Wizards •Pros •Great for getting up and running quickly •No need to learn how a plugin works •Cons •Creates individual services •More work later when enforcing “standards”
  • 32. Templates •Common settings applied to objects •Helps enforce standards •Reduces administrative overhead •Layer multiple templates •Can be additive or ignore inheritance •XI Config Wizard objects use templates •Example of common icmp check
  • 33. User Macros – resource.cfg •$USERx$ macros are good for common items like a username or password •Allows passwords with a ! exclamation mark •Values not visible in object definitions •$USER1$ •/usr/local/nagios/libexec
  • 34. Custom Object Variables •Allows you to create your own variables •Can be defined in host or service objects •E.g. hosts have their own check_nt password •Define _CHECK_NT_PASSWORD in host object •In command definitions reference it as: •$_HOSTCHECK_NT_PASSWORD$ •VERY POWERFUL!
  • 35. Other
  • 36. MRTG Clean Configs •Your MRTG configs may be collecting more than what you think •/etc/mrtg/conf.d/*.cfg files •Created by Network Switch / Router Wizard •Comment out unused ports •About 37 lines per port •Comment out unused non-interfaces (VLANs)
  • 37. Plugins – Compiled vs Scripts •Compiled runs quicker •Official nagios-plugins are compiled •“Custom modifications” require re-compiling •Scripts run slower, consume more resources •Perl plugins known to consume +CPU +RAM •“nice” can reduce impact of plugins •Check Profiler component by box293
  • 38. Backend API - Read Only User •API provides you with URLs for use in third party products without needing user/pass •Requires a user account to be created •Account should be READ ONLY
  • 39. Performance Data Tool •Component developed by box293 •Allows you to manipulate RRD files •Great for merging RRD data •Can also delete old RRD files for old services •View raw data in tables •Find it in the Nagios Exchange
  • 40. Thank you! What Is Your Best Practice? Any Questions?

Editor's Notes

  1. Hello everyone and welcome to my talk on Nagios XI Best Practices. First a little about me. I’ve been an independent contractor for Nagios Enterprises since June 2014. I provide tech support for our customers through our support system, and in the forums. Generally speaking when the USA techs are finishing their day, I’m starting mine. I’ve been using Nagios since 2009 when Nagios XI was released. I develop various Nagios and Nagios XI related projects in my spare time, you will see them published under the box293 handle. Since then I’ve spoken at several conferences and have been lucky enough to have received the MVP award three times.
  2. This talk is going to be about best practices. I will cover a range of topics such as how to get the most out of XI, things you wish you knew, and configuration practices. Strictly speaking, a best practice is a flexible statement. Depending on your environment and your needs some topics won’t apply and other topics may be exactly what you are after. The information in this talk is a reflection of Nagios XI deployments in the wild. Additionally, monitoring is not all about metrics and thresholds, it can also be helpful to ensure standards are enforced. If a setting gets changed you can be notified about it instead of having to track it down during one of your troubleshooting adventures. This presentation will be available after the conference to download, so don’t feel like you have to write down everything on the screen … however I do admit to taking photos of slides in presentations as sometimes I just can’t wait that long. Hopefully you should be able to walk away from this talk learning something … if not you should come and work for Nagios Enterprises!
  3. We’ll start off with looking at the Nagios XI server and how it is configured in the back end.
  4. To get the ball rolling, what does a license of XI entail? Three running instances of Nagios XI are allowed. The caveat to this is that only your Production system can be used for actual monitoring. We are flexible and understand your needs to actively have a test system that runs alongside production. This allows you to implement new checks into production with the confidence that they will work. Disaster recovery is another important factor for customers, once again you can have this instance up and running as part of your license.
  5. Probably the most important part of a monitoring system is knowing that it’s actually working. Nagios XI comes with a “Nagios XI Server” monitoring wizard to ensure everything is A-OK. However there is no point in monitoring itself, I mean if it’s down you’re not going to hear about it. Utilize your DR instance to monitor the production instance. This way, if the production instance goes down, you’ll receive alerts about it. The same applies for monitoring the DR instance, make sure production monitors the DR instance to make sure it’s healthy. By using the “Nagios XI Server” wizard to monitor the other instance, you can have confidence knowing that when something goes wrong, you’ll really hear about it.
  6. Nagios XI comes bundled with a handful of localhost services, however there are some additional localhost services we think you should add. There are a lot of moving parts to an XI server, so we think these are the most important items to be monitoring. For services, this list of services should be monitored to ensure they are running. The snmptrapd and snmptt services are not installed on an XI server by default, but are when the official SNMP Trap procedure has been followed.
  7. Nagios XI generates a LOT of small files, constantly. These files are created/deleted/updated every millisecond. The more you are monitoring, the more disk I/O this generates. Sometimes a service can stop and once this happens certain folders start to spool these files and before you know it there could be 100,000 unprocessed files on your system. This can quickly exhaust the free inodes on your disk. These directories should be monitored to make sure the files don’t increase past a certain number. If they do, you’ll be on top of it before it becomes a major problem. In some customer installations, it’s possible that the nagios user account expires. This isn’t always that obvious to troubleshoot, so checking that it hasn’t expired is a good precautionary measure.
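A minimal sketch of the kind of spool-directory check described in this note. It is not the check_file_count plugin named on the slide; the function name, thresholds, and output wording here are assumptions for illustration. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the standard Nagios plugin convention.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_file_count(directory, warn, crit):
    """Return (exit_code, status_line) for the number of files in a directory.

    Hypothetical helper: warn and crit are file-count thresholds chosen by you.
    """
    try:
        count = sum(1 for entry in os.scandir(directory) if entry.is_file())
    except OSError as exc:
        # Directory missing or unreadable: report UNKNOWN rather than guess.
        return UNKNOWN, f"UNKNOWN - cannot read {directory}: {exc}"

    # Performance data in the usual label=value;warn;crit;min form.
    perfdata = f"files={count};{warn};{crit};0"
    if count >= crit:
        return CRITICAL, f"CRITICAL - {count} files in {directory} | {perfdata}"
    if count >= warn:
        return WARNING, f"WARNING - {count} files in {directory} | {perfdata}"
    return OK, f"OK - {count} files in {directory} | {perfdata}"
```

To use something like this as a plugin, a wrapper would print the status line and exit with the returned code. The directory to watch and sensible thresholds depend on your installation.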
  8. If you’re not a Linux person then you probably don’t know about the system mailbox. This is a local mail system on the Linux server where messages are sometimes sent. Certain components used in Nagios XI such as MRTG will send messages to this mailbox when they have a problem. An incorrect MRTG configuration can cause a message to be sent every five minutes as this is when MRTG runs. That’s about 288 messages a day. Over time the root mailbox can grow to gigabytes in size, causing issues. I wrote a plugin which can report on this and let you know when it gets too big. The MySQL or MariaDB databases are important to the system. The change in name has to do with a split in the open source community, and MariaDB is present in CentOS and Red Hat 7. If tables have crashed and this goes undetected, it can have a severe impact on the system: you may not be storing important data and it may cause strange problems. It just so happens that I wrote a plugin that will alert you if this happens. Also, another problem can occur if the database engine runs on a different timezone to the local system. Once again here’s a plugin you can use to monitor this circumstance, which in this case is more of an auditing check.
  9. Load is important to keep tabs on; some services like NPCD will stop running when the system load exceeds a defined threshold. Physical Memory free is important and is tied to swap usage. If the system runs out of physical memory and starts swapping to disk, the system performance will be greatly impacted. Disk free space is very important. If you have different volumes mounted then you should be monitoring each one of these.
  10. Make sure your timezone is configured correctly AND it is synced with a trusted time source. An incorrect time or timezone can cause issues with databases, log files, and performance data.
  11. Having the right amount of CPU cores is important but so too is the speed of those cores. Not all plugins and processes are multi-threaded, so a higher speed CPU is going to benefit. A 3.4GHz CPU will do a lot more than a 2.2GHz one.
  12. How much memory do you need on an XI system? When all the hosts and services in XI are healthy, the amount of memory used is far less compared to a major system outage. When XI fires off event handlers they consume memory, if there is a major outage and a lot of event handlers are being executed, a lot of memory is being consumed. It doesn’t take long for 6GB of memory to be used. Generally speaking you should have at least 50% more memory than needed.
  13. While on the topic of memory, configuring Nagios XI with a RAM Disk is highly recommended as the number of monitored objects increases. The more things you are monitoring the more disk I/O occurs. By directing this traffic to a RAM Disk, the time it takes for that I/O operation to complete is drastically reduced. We have an official procedure for you to follow to implement this.
  14. A solid state disk will also provide a dramatic performance improvement to your Nagios XI server. Keep in mind, a RAM Disk is still recommended as you want to minimize the amount of writes to the SSD.
  15. RAID allows for much larger disk capacities than SSD can provide, however it would be very hard for a spinning disk RAID set to beat the performance of SSD. Keep in mind if you implement SSD you should implement RAID1 sets for redundancy purposes.
  16. rrdcached is a way of accumulating the received performance data and then processing it in a batch job. It helps with larger installations and can reduce I/O, however it can also result in performance graphs lagging behind the realtime results. We have an official procedure for you to follow to implement this.
  17. On larger installations there can be a lot more data being written to the databases, which in turn can result in a lot of CPU usage directed away from actual monitoring. Offloading to a separate server will remove this CPU usage from your monitoring server. Of course, make sure you monitor the offloaded server! We have an official procedure for you to follow to implement this.
  18. Mod-Gearman is a way of offloading the plugin execution to separate workers instead of the monitoring engine doing it. A worker can be on the XI server itself OR on external hosts. On external hosts, it requires all the plugins to be installed that are going to be executed. Also, be aware of plugins that create temporary files, these don’t work well if the plugins are moving about the workers. You can also use host groups for directing checks to only be executed by specific workers, this is handy for multi-site setups. XI 2014 uses Core 4.x which now has its own workers. Using Mod-Gearman on an XI 2014 server just for the purpose of a local worker is not required, however if you need external workers then Mod-Gearman is the solution for you. We have an official procedure for you to follow to implement this.
  19. I won’t go into this topic in detail as Andy Brist did a great talk on it at last year’s conference. What you need to define is what is important to you in a disaster. Once you have clearly defined goals and outcomes you can plan appropriately and test.
  20. Have you scheduled your backups in Nagios XI? Storing them on storage that is not local to the XI file system is important, make sure you can get to your backups if your XI server dies.
  21. The backup and restore procedure is very straightforward and allows for a full recovery of your Nagios XI system. Another good use of it is to migrate XI from one server to another.
  22. Now let’s look at Nagios configuration practices.
  23. When a host goes down HARD, it will prevent service notifications from being sent, saving unnecessary alerts. A common mistake when setting up your monitoring intervals is to leave the host intervals the same as the service intervals. What this can lead to is the host’s services going into a HARD critical state before the HOST does. Making sure the HOST goes into a HARD down state before its services ensures the service notifications will be suppressed.
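The timing argument in this note can be sketched with a simplified model. The assumption is that, after the first failed check, an object is rechecked every retry_interval minutes and becomes HARD on attempt number max_check_attempts. The function names and example values are hypothetical, and the model ignores on-demand host checks and scheduling jitter.

```python
def minutes_to_hard(max_check_attempts, retry_interval):
    """Minutes from the first failed check until the HARD state (simplified).

    Attempt 1 is the first failure; each further attempt is assumed to
    happen retry_interval minutes after the previous one.
    """
    return (max_check_attempts - 1) * retry_interval


def host_goes_hard_first(host, service):
    """True if the host reaches HARD strictly before the service does."""
    return minutes_to_hard(**host) < minutes_to_hard(**service)


# Hypothetical settings: the host retries quickly, the service more slowly.
host = {"max_check_attempts": 3, "retry_interval": 1}     # HARD after 2 minutes
service = {"max_check_attempts": 4, "retry_interval": 2}  # HARD after 6 minutes

# Identical settings for both: the host is not ahead of its services.
same = {"max_check_attempts": 3, "retry_interval": 1}

print(host_goes_hard_first(host, service))  # host is ahead
print(host_goes_hard_first(same, same))     # host is not ahead
```

With identical settings the host and its services reach HARD at the same moment in this model, which is the situation the note warns about.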
  24. When a host goes down, the services still get executed and can result in services in an unknown or critical state. Nagios suppresses any notifications however they still appear as critical in the interface. Sometimes a host can be up but the monitoring agent can be down. An example of this is an NRPE agent. By using service dependencies, if the master service goes down, you prevent notifications from being sent OR prevent checks from being executed. Either option simply pushes the next check or notification to the next interval. However, very similar to the previous slide, make sure your master service goes down HARD before the dependent services, use different check intervals or retries.
  25. An upcoming feature of Nagios Core 4.1.0 is a configuration directive called host_down_disable_service_checks. It’s best described as automatic service dependencies on their own hosts. This applies across the board, it is not granular. It can reduce the load on the XI host, as plugins will not be executed if the host is down. Keep in mind that if the host is down then any defined service dependencies will be ignored.
  26. It can be very easy to set up your monitoring with the same intervals across the board. This can lead to peaks and troughs in load on the XI server as a lot of checks can occur in the same time windows. Have a think about what you are monitoring and how often you really need to check it. Something like disk usage rarely runs out quickly, you can monitor this every hour and be confident you’ll be notified about the free disk space running low in a reasonable time. However if you are going to make it every hour, why not every 58 minutes or 61 minutes? Try to spread the load out a bit.
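A rough illustration of why mixed intervals spread the load. This toy model assumes every check first runs at minute 0 and then repeats exactly on its interval. It ignores the spreading the monitoring engine's own scheduler applies, so the numbers show the effect, not real XI behaviour. The helper names and the 30-service example are assumptions.

```python
from collections import Counter


def due_per_minute(intervals, horizon):
    """Count how many checks fall due in each minute of [0, horizon)."""
    load = Counter()
    for interval in intervals:
        for minute in range(0, horizon, interval):
            load[minute] += 1
    return load


def full_pileups(intervals, horizon):
    """Number of minutes in which every single check is due at once."""
    load = due_per_minute(intervals, horizon)
    return sum(1 for due in load.values() if due == len(intervals))


DAY = 24 * 60  # minutes

uniform = [5] * 30                          # 30 services, all every 5 minutes
staggered = [3] * 10 + [5] * 10 + [7] * 10  # same 30 services, mixed intervals

print(full_pileups(uniform, DAY))    # 288: everything collides every 5 minutes
print(full_pileups(staggered, DAY))  # 14: only when 3, 5 and 7 line up (every 105 min)
```

In this model the uniform set peaks at all 30 checks every five minutes, while the staggered set only lines up completely every 105 minutes.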
  27. Sometimes larger check intervals can have an adverse effect on notification intervals. The monitoring engine determines if it should send a notification every time a check result is received. Due to how the internal scheduling works, you might fall short of the notification window by a small time period like 20 seconds. This means it might be another 15 minutes until the next check is run, that’s when the notification will be sent.
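The arithmetic from the slide (15 minute checks, 60 minute notification interval, checks actually arriving 14 min 55 sec apart) can be worked through in a few lines. This is a simplified model of the behaviour described in the note, not the engine's real scheduling code. The function names are hypothetical.

```python
def next_notification_time(check_gap, notification_interval, last_notification=0):
    """Seconds at which the next notification goes out (simplified model).

    A notification is only considered when a check result arrives, and is
    sent once at least notification_interval seconds have passed since the
    previous notification. check_gap must be a positive number of seconds.
    """
    t = last_notification
    while True:
        t += check_gap
        if t - last_notification >= notification_interval:
            return t


def as_min_sec(seconds):
    """Format a number of seconds as M:SS."""
    return f"{seconds // 60}:{seconds % 60:02d}"


drifted = next_notification_time(check_gap=14 * 60 + 55, notification_interval=3600)
exact = next_notification_time(check_gap=15 * 60, notification_interval=3600)

print(as_min_sec(drifted))  # 74:35 - the fourth result at 59:40 is 20 seconds short
print(as_min_sec(exact))    # 60:00 - with exact 15 minute gaps it goes out on time
```

The fourth check result arrives at 59:40, just under the 60 minute window, so the re-notification waits for the fifth result at roughly 75 minutes.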
  28. Using hostgroups in your service definitions is one of the most powerful features of Nagios. Common services generally have the same threshold for all hosts. Instead creating individual services for each host you monitor, a service can be assigned to multiple hosts using a hostgroup. What this means is that you only need to have the service defined once, and when you want to tweak the thresholds, you only need to change it in one location and all hosts will receive the updated thresholds. So if you have a host group called windows_servers, whenever you add a new windows server it’s just a matter of adding that server to the hostgroup and voila, that host gets that bunch of common checks. This is great for consistent monitoring and it ensures standards get applied.
  29. One of the most common support questions we get asked is how to add or remove a contact on a bunch of objects. If you don’t have the Enterprise Edition license then you don’t get access to the bulk modification tool that allows you to do this. However, assigning individual contacts to objects is a flawed approach. It’s very easy to make mistakes, and before you know it a notification was not sent to the correct people. Using a contactgroup is a much better method. It’s so much easier to go in and add or remove a contact from a contactgroup, and instantly all the objects that use this group will be updated. Even if there is only one member in a contactgroup it still makes administration so much easier. If you’ve not activated the trial of Enterprise Edition, this is a great way to make use of the bulk modification tool to implement contactgroups and remove the individual contacts. Once your standard is in place, administration will be so much easier.
  30. Configuration wizards are how I really first got involved in Nagios XI. I really liked how you could step through monitoring a particular device and at the end of it all the configurations were created for you. I really saw that as an ice breaker for people who are new to Nagios: you didn’t need to learn how a plugin worked or how to create a command definition followed by service definitions, you just pointed and clicked. The downside to wizards is that they create a lot of services. In a large-scale monitoring environment, you might use services that are applied to hostgroups, which reduces administrative overhead. Wizards don’t really suit these environments; however, they are a great primer for setting up initial services, and from there you can modify them in CCM.
  31. Templates are very powerful when used for the right purposes. A really good example is how the XI Configuration Wizards use templates for the host objects. The host object template has a standard ICMP up/down check. This means if you ever wanted to change the thresholds, you could change the template and then all hosts using that template will get the updated check. You can use multiple templates in a layering fashion. As Nagios Core reads the object definition, it looks at the first template and obtains the settings. It then looks at the next template and takes from it any settings that the earlier templates did not define. This continues and builds the final object. Object directives can be set to inherit a setting from a template, or ignore it. Other settings can be additive, like hosts, hostgroups, contacts and contactgroups. For example, you might have a master template that defines the base settings all services should use. However, you have a bunch of service checks that require a specific time period. Create a separate template that uses this time period and put that template at the top of the chain. The final service object that is created will use the specific time period. You can even create an empty template that uses a combination of other templates. This way you can use the master template across all your objects and easily add or remove other templates on the master template, in turn reducing your administrative overhead. Be careful not to add more administrative overhead though.
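The layering can be sketched in Python. This is only a model of the idea, assuming the precedence rule where the template listed first in the chain wins for any directive defined more than once, and the object’s own settings win over all templates. The template names and values are hypothetical, and additive inheritance is left out.

```python
def resolve(own_settings, template_chain):
    """Build the final object from its own settings and an ordered template chain.

    Assumed precedence: the object's own settings win, then the first
    template in the chain, then the second, and so on.
    """
    final = {}
    # Walk the chain from last to first so earlier templates overwrite later ones.
    for template in reversed(template_chain):
        final.update(template)
    final.update(own_settings)
    return final


# Hypothetical master template holding the base settings for all services.
master_service = {
    "check_interval": 5,
    "max_check_attempts": 3,
    "check_period": "24x7",
}

# Hypothetical template that only overrides the time period.
business_hours = {"check_period": "workhours"}

# The time period template sits at the top of the chain.
backup_job = resolve(
    {"service_description": "Backup Job"},
    [business_hours, master_service],
)

print(backup_job)
```

The resulting service keeps the master template’s check_interval and max_check_attempts but uses the workhours time period, because that template comes first in the chain.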
  32. User macros are a way of storing and referencing common items such as usernames and passwords. Because you are referencing the value through a macro, the actual value is not visible in the object definitions. It also allows special characters such as an exclamation mark to be used. Normally when an exclamation mark is used in a check_command directive, its purpose is to separate the different arguments, so storing the value in a user macro works around the problem.
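Here is a small Python sketch of why the exclamation mark causes trouble. It models argument handling in a simplified way, assuming the arguments are split on the exclamation mark first and user macros are substituted afterwards. The command name check_nt_uptime, the macro number and the password are made up for illustration.

```python
def split_check_command(check_command):
    """Split a check_command value into the command name and its arguments."""
    name, *args = check_command.split("!")
    return name, args


def expand_user_macros(args, resource_macros):
    """Replace user macros in each argument after the arguments were split."""
    expanded = []
    for arg in args:
        for macro, value in resource_macros.items():
            arg = arg.replace(macro, value)
        expanded.append(arg)
    return expanded


# Hypothetical user macro holding a password that contains an exclamation mark.
resource_macros = {"$USER5$": "Pa55!word"}

# Password typed directly into the directive: it is cut into two arguments.
literal_name, literal_args = split_check_command("check_nt_uptime!Pa55!word")

# Password referenced through the user macro: it stays in one piece.
macro_name, macro_args = split_check_command("check_nt_uptime!$USER5$")
macro_args = expand_user_macros(macro_args, resource_macros)

print(literal_args)
print(macro_args)
```

In this model the literal password ends up as two separate arguments, while the macro version arrives intact as a single argument.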
  33. Custom object variables are one of the lesser-known features of Nagios. They allow you to define your own variables to use in your object definitions, which makes Nagios very flexible. A good example is if each Windows host had its own custom check_nt password. What you can do is store that password in the host object and then reference the password from your service objects. It also means that you can still have just one command that can be used by many hosts, reducing administrative overhead.
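The following Python sketch models how a custom host variable can be referenced from a shared command. It assumes the Nagios Core naming convention in which a host variable defined as _NTPASSWORD is referenced as the macro $_HOSTNTPASSWORD$. The expansion function, the password and the command line are illustrative only and are not how the engine is implemented.

```python
import re


def expand_host_macros(command_line, host_vars):
    """Replace $_HOST<NAME>$ macros with the host's custom variable _<NAME>.

    Any other macro is left untouched in this sketch.
    """
    def lookup(match):
        name = match.group(1)
        if name.startswith("_HOST"):
            key = "_" + name[len("_HOST"):]
            return host_vars.get(key, "")
        return match.group(0)

    return re.sub(r"\$([A-Z0-9_]+)\$", lookup, command_line)


# Hypothetical custom variable stored on one Windows host object.
host_vars = {"_NTPASSWORD": "s3cret"}

# One shared command line used by every Windows host.
command_line = "$USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s $_HOSTNTPASSWORD$ -v UPTIME"

expanded = expand_host_macros(command_line, host_vars)
print(expanded)
```

Each host supplies its own password through the custom variable, so the single command definition never needs to change.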
  34. Finally here are some other items which you may be interested in.
  35. In Nagios XI, the Network Switch / Router wizard uses MRTG to collect the monitoring data from the network device. The configuration files for MRTG are created with the program cfgmaker. While you may have selected to monitor only a handful of ports on your network device, MRTG will collect data from all the interfaces. This creates extra network traffic and I/O. You can edit these MRTG config files and comment out the ports for which you do not need data collected. Each port consists of about 37 lines in total. You’ll also find non-interfaces like VLANs; these can also be commented out, unless of course you want to monitor them.
  36. Plugins can either be compiled or scripts, or perhaps a combination of both. The plugins that are installed as part of the nagios-plugins package are compiled. Basically this means that if you want to modify them you need to modify the source code and recompile them, which can be tedious. However, these plugins generally consume fewer resources. A lot of plugins you can download from the Nagios Exchange are script based, written in bash, Perl, Python and so on. These plugins can be modified on the fly, but can consume a lot more resources, which in turn can increase the load on your XI server. One method of reducing the impact of plugins is to prepend them with the nice command. Using nice will run a plugin at a lower priority; it will still consume the same amount of resources, but not at the expense of other more important processes. Using Mod-Gearman is another option to offload resource-intensive plugins. Try the Check Profiler component I created, which gives a quick report of how long plugins take to run and also the latency.
  37. The Nagios XI backend API allows you to generate URLs to use in third-party products to access Nagios without requiring a username or password. This requires a user account to be created to generate the URLs. It’s important to create this user account as a read-only account to prevent any unintended access to Nagios XI.
  38. The Performance Data Tool is a component I developed that allows you to manipulate and interrogate RRD files. It’s particularly useful for merging performance data from one service to another. Perhaps a service was renamed and the data is now stored in a different RRD file; this lets you merge the old data into the new file. Another great feature is that you can view the performance data in a table format, which is sometimes more useful than graphs. Finally, it can be handy for finding old RRD files which could be deleted from the system to reclaim disk space. You can download it from the Nagios Exchange.
  39. That’s about it. How about some questions or perhaps tell us what your best practice is.