Greg Parmer, Information Technology Specialist
Jonas Bowersock, Information Technology Specialist
Alabama Cooperative Extension System
Auburn University
Network Monitoring with Icinga
1. NETWORK MONITORING WITH ICINGA
2. Why Monitor?
Proactive Administration
• Better service
• Better coordination
• Better inventory of services
• To reduce finger pointing
• To save lives!
• More time for other things…like walks on the beach
3. Others vs Nagios vs Icinga
Big Brother
• Years of good experiences
• Recognized need for Aruba monitor in 2011
• Commercial “professional edition”?
• What next?
(http://en.wikipedia.org/wiki/Comparison_of_network_monitoring_systems)
4. Others vs Nagios vs Icinga
Nagios
• 1996 – started by Ethan Galstad
• 1999 – released as open source project “NetSaint”
• 2002 – trademark issues prompted rename to Nagios (“Nagios Ain't Gonna Insist On Sainthood”)
• 2007 – Ethan founded Nagios Enterprises LLC
• Most downloaded monitoring software
• Large, active plug-in community
5. Others vs Nagios vs Icinga
Icinga
• 2009 – Nagios fork
• Open source community project
• Many contributors from Nagios project
(https://bugzilla.redhat.com/show_bug.cgi?id=1054340)
• Backward compatible – configs, plug-ins, add-ons
• 2014 – Icinga v2 due
6. Nagios & Icinga
2011 – Installed both Nagios and Icinga
Ran both from same config files for months
2012 – Use Nagios to monitor Icinga
Constant addition of service monitors since
(Right: aNag screenshot on Android phone)
7. What To Monitor?
Connectivity (check_ping)
Websites (check_http)
Disk usage (check_disk)
CPU usage (check_load)
Memory usage (check_swap)
Uptime (check_uptime)
File size (check_file_size.sh)
File age (check_file_age)
File shares (check_file_size)
Log files (check_log)
Non-standard ports (check_port)
Mail (check_smtp, check_mailq, more)
DNS (check_dns)
Certificate expirations (check_http)
Backup software (check_proc)
AV software (check_proc)
Check on printer (check_snmp)
Search: “Nagios plugins”
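Every plugin in the list above follows the same convention: it prints one status line and reports its result through the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The sketch below shows that convention in Python. The plugin name EXAMPLE, the function names, and the 80/90 thresholds are hypothetical and chosen only for illustration; it is not taken from any plugin named on the slide.

```python
# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measurement to a plugin state; higher values are worse."""
    if value is None:          # measurement could not be taken
        return UNKNOWN
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def run_check(name, value, warn, crit):
    """Return (exit_code, status_line) for one measurement."""
    state = evaluate(value, warn, crit)
    shown = "n/a" if value is None else f"{value:g}"
    return state, f"{name} {LABELS[state]} - value={shown}"


# Example: a reading of 85 against hypothetical thresholds of 80 (warn) and 90 (crit).
code, line = run_check("EXAMPLE", 85, 80, 90)
print(line)
```

A real plugin would take the measurement itself, print the status line, and pass the code to `sys.exit()` so the monitoring server can read it.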
8. Example Definitions
define command{
command_name check-host-alive
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}
define host{
name generic-switch ; The name of this host template
use generic-host ; Inherit default values from the generic-host template
check_period 24x7 ; By default, monitored round the clock
check_interval 5 ; every 5 minutes
retry_interval 2 ; Schedule host check retries at 2 minute intervals
max_check_attempts 6 ; Check each switch 6 times (max)
check_command check-host-alive ; check if switches are "alive" (ping)
notification_period workhours_sans_au_holidays
notification_interval 160 ; Resend notifications
notification_options d,f,r,u ; d=dwn,u=unreach,r=recov,f=flap,s=sch dwntm,n=none
contact_groups helpdesk-plus
register 0 ; DON'T REGISTER THIS - IT'S JUST A TEMPLATE
}
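The retry settings in the generic-switch template determine how long a switch can fail before it is treated as really down. The arithmetic can be sketched as below, assuming the first failed check counts as attempt 1, each further attempt is scheduled one retry_interval later, and the time unit is the default 60 seconds. The function name is hypothetical, and check latency and execution time are ignored.

```python
def minutes_until_hard_state(max_check_attempts, retry_interval, interval_length=60):
    """Minutes from the first failed check until the last retry has run.

    Assumption: attempt 1 is the first failure, and each remaining attempt
    follows retry_interval time units later (interval_length seconds each).
    """
    retries = max(max_check_attempts - 1, 0)
    return retries * retry_interval * interval_length / 60


# Values from the generic-switch template: 6 attempts, 2-minute retries.
print(minutes_until_hard_state(6, 2))
```

Under these assumptions the template waits about 10 minutes after the first failed ping before the state is final, which keeps a brief outage from triggering a notification.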
9. Example Definition
define host{
use generic-switch ; Inherit default values from a template
host_name 4hcenter ; The name we're giving to this switch
hostgroups t1s ; Host groups this switch is associated with
}
define host{
use generic-switch ; Inherit default values from a template
host_name chiltonrec ; The name we're giving to this switch
hostgroups t1s ; Host groups this switch is associated with
}
Rinse and repeat…
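With over 100 locations, the repetition can be scripted. The sketch below generates host definitions like the two above from a list of names. It is one possible approach, not how our config files were built: the function names are hypothetical, and the generic-switch and t1s defaults are taken from the slide.

```python
def render_host(name, template="generic-switch", hostgroups=("t1s",)):
    """Render one host definition that inherits from a template."""
    if not name or any(ch.isspace() for ch in name):
        raise ValueError(f"invalid host name: {name!r}")
    fields = [
        ("use", template),
        ("host_name", name),
        ("hostgroups", ",".join(hostgroups)),
    ]
    body = "\n".join(f"    {key:<12}{value}" for key, value in fields)
    return f"define host{{\n{body}\n}}"


def render_hosts(names, **kwargs):
    """Render several host definitions separated by blank lines."""
    return "\n\n".join(render_host(name, **kwargs) for name in names)


# The two switches from the slide; extend the list for other sites.
print(render_hosts(["4hcenter", "chiltonrec"]))
```

Redirecting the output to a .cfg file that the main configuration includes would keep the generated hosts separate from hand-edited ones.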
15. Our To Do List
Monitoring all websites
Monitoring all WordPress installs
Monitoring web page load times
Monitoring SQL server response times
Monitoring bandwidth to remote sites (better ideas?)
Monitoring the service which produces the next unexpected phone call
We’ve got over 100 locations, each with VPN connections back to campus.
We’ve got a few dozen servers…mostly virtual, some physical, some SAN equipment, and lots of services.
We don’t like surprises.
When someone has a connectivity problem, we can usually determine the severity, and pretty much know when it is fixed.
Coordination of repairs. While something is down, acknowledgments and comments let your team know the current status…and why.
We have services that are so seldom used that it may be weeks before someone complains if it breaks.
Good monitors help isolate problems. When you can say, “every connection from this ISP went down at 1am on Sunday,” it helps everyone know where to look and reduces the ISP’s tendency to blame our equipment.
Know the MRI machine is down before a patient requires it.
Ultimately, be able to spend time away, knowing that services are maintained. Jonas wouldn’t share his vacation pics!
We used “Big Brother” for years. Like so many other packages, it seemed to be going commercial. We considered buying it, but…how would we expand it? We needed something that could monitor our new Aruba devices/connections which don’t answer a ping (for us as configured by central IT). Google didn’t find a BB plugin.
Started as a simple “ping.” Grew into a real monitor. Over 8 million downloads. Nagios is so widely used and so adaptable that someone had already written a plugin to check Aruba connections. Perfect, almost…
Open source communities tend to split when they feel taken advantage of by a commercial entity. A group of folks became very dissatisfied with the service/progress of the Nagios team and created their own project in May 2009.
Exact same config files.
I liked the Icinga UI. The only thing I don’t like is not being able to say “ee-tskee-na,” a Zulu word.
aNag is one of several mobile device apps. See also iNag, Mobile Admin, NagMonDroid, and others. Notice: “Nagios/Luluwatch” is a Nagios install while “Production” is an Icinga install. 100% compatibility thus far.
Just about anything! Typical plugin is check_xxxxx.
Check_ping is one of the default plugins.
We use Notes and Actions to tie in other databases. Each of those is unique.
Server notes include a bunch of configuration information. Disk configs, exports, etc.
This makes it easy to find things like the office phone number and the ISP.
We also keep a collection of notes, instructions, and photos for each site. Very useful when troubleshooting remotely via phone.
Based on your definitions, you can receive e-mail when something bad happens, or when someone starts working the problem. Here we’ve added Notes and Actions to the email template for ease of use.
…your connection is terrible. We fixed that in April!