Andreas Ericsson's presentation on using Nagios with Merlin.
The presentation was given during the Nagios World Conference North America held Sept 25-28th, 2012 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna
6. Need For Distributed Monitoring
Reliability
24x7 SLA for Mission-Critical Services
Availability and SLA Reporting
7. Need For Distributed Monitoring
Performance
High Number of Active System checks
Limitations of the Operating System
Growth
8. Need For Distributed Monitoring
Distributed Monitoring
Locations
Combinations of Network
IP Address Conflicts
Security
9. Why Merlin?
Other solutions for redundancy are clunky at best
Redundancy is important
Load-balancing and automatic fail-over
Network functionality (as its users see it) is hardly ever measurable from a single place
10. Key Features
Redundancy
– Ensure availability
Performance
– Handle larger networks
Load balancing
– Share the workload
Distributed monitoring
– Geographical coverage
By a scalable monitoring solution we mean a system that is:
Easy to use
Capable of constantly changing to fit the needs of your business
Stable and performant
12. op5 Merlin Open Source Project
Merlin - Module for Effortless Redundancy and Load balancing in Nagios
For setting up distributed Nagios installations
13. Brief project info
Started in 2006 as a prototype for a huge installation
First used as a redundancy engine in 2009
Used in production at 800+ installations
Largest production installation has 3 masters and 14 pollers
v2.0.0 (with Nagios 4 support) to be released officially next week
Current bleeding edge is v2.0.0-beta2-p10
14. Key design concepts
Peer load balancing is 100% transparent
Pollers take care of one or more hostgroups
Pollers can be (and often are) peered
Binary protocol for extreme performance
32-bit and 64-bit machines can't play together :-/
Object config of two peers must be identical
Pollers must never know about objects they're not responsible for
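Why a binary protocol can keep 32-bit and 64-bit machines apart can be illustrated with a small sketch. It assumes a hypothetical two-field event record (event type, timestamp) sent as raw machine-word-sized integers. This is not Merlin's actual wire format, and the function names are made up for the illustration.

```python
import struct

# Hypothetical record layouts, NOT Merlin's real wire format.
# "<" forces little-endian standard sizes so both layouts can be
# reproduced on any machine, whatever its native word size.
LAYOUT_32BIT = "<ll"  # two 4-byte integers, as a 32-bit build might send
LAYOUT_64BIT = "<qq"  # two 8-byte integers, as a 64-bit build might send


def send_from_64bit(event_type, timestamp):
    """Serialize the record the way the (assumed) 64-bit sender would."""
    return struct.pack(LAYOUT_64BIT, event_type, timestamp)


def receive_on_32bit(payload):
    """Decode the record the way the (assumed) 32-bit receiver would."""
    size = struct.calcsize(LAYOUT_32BIT)
    return struct.unpack(LAYOUT_32BIT, payload[:size])


payload = send_from_64bit(1, 1348531200)
decoded = receive_on_32bit(payload)

print(len(payload))  # 16 bytes were sent, the receiver expects 8
print(decoded)       # the timestamp field is misread
```

The receiver consumes only the first 8 bytes, so it reads the upper half of the sender's first field as its own second field and the timestamp is lost. Fixed-width, architecture-independent field types avoid this at the cost of a conversion step.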
17. Peered Setup
Scalability / High Availability
The backend allows a variety of high availability setups and allows almost infinite scalability by adding more "peers"
(Diagram labels: Peer, Peer, Peer; Config; Check results; Poll/check; Monitored objects)
18. Master/Poller Setup
Remote Modules
Remote modules allow the monitoring of individual services and devices using a dedicated, but centrally managed monitoring system
(Diagram labels: Master; Poller, Poller, Cloud Poller; Config; Check results; Poll/check; Monitored objects)
21. Configuration And Management
Merlin automatically distributes object config
Split-and-push config in master/poller configurations
Straight-up sync for peers when needed
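The split step in a master/poller setup can be sketched as follows. This is a minimal illustration, assuming the master knows each host's hostgroups and which hostgroups each poller is responsible for. The function name, data structures and host names are hypothetical and do not reflect Merlin's actual implementation.

```python
def split_config(hosts, pollers):
    """Return, per poller, only the hosts it is responsible for.

    hosts:   dict mapping host name -> set of hostgroup names
    pollers: dict mapping poller name -> set of hostgroup names it handles
    """
    per_poller = {}
    for poller, groups in pollers.items():
        # A host belongs to a poller if it shares at least one hostgroup.
        per_poller[poller] = sorted(
            host for host, hostgroups in hosts.items() if hostgroups & groups
        )
    return per_poller


# Hypothetical object config held by the master.
hosts = {
    "web1": {"stockholm"},
    "db1": {"stockholm", "databases"},
    "ship-wifi": {"vessels"},
}
pollers = {
    "poller-sthlm": {"stockholm"},
    "poller-ship": {"vessels"},
}

slices = split_config(hosts, pollers)
for poller, host_names in slices.items():
    print(poller, host_names)
```

Each poller receives only its own slice, which matches the design rule that pollers must never know about objects they are not responsible for. The master keeps the full config.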
22. Early Adopters
Mogul Services AB
Hosts critical services for operators, call centers, banking, online media and emergency broadcast channels
Very early implementation (beta-stage in POC-deal)
Quite complex setup (peered masters, multiple pollers)
Very high availability demands
24. Performance
Add more peers if needed to scale out performance monitoring rather than to scale up on hardware
Growth of the monitoring system with the requirements of the company
(Diagram labels: six peers)
25. Reliability
Through dynamic distribution of service checks the individual nodes are peered. This setup also provides redundancy
(Diagram labels: six peers)
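How dynamic distribution of checks among peers also yields redundancy can be sketched as below. This assumes a simple round-robin split over the peers that are currently alive. The function and the check names are hypothetical, and the real distribution logic in Merlin may differ.

```python
def assign_checks(checks, active_peers):
    """Spread checks evenly over the peers that are currently alive."""
    peers = sorted(active_peers)
    if not peers:
        raise ValueError("no active peers left to run checks")
    assignment = {peer: [] for peer in peers}
    # Sorting gives every peer the same view, so all peers would
    # compute the same split without coordinating.
    for index, check in enumerate(sorted(checks)):
        assignment[peers[index % len(peers)]].append(check)
    return assignment


checks = ["web1/HTTP", "web1/PING", "db1/MySQL", "db1/PING"]

both_up = assign_checks(checks, {"peer1", "peer2"})
peer2_down = assign_checks(checks, {"peer1"})

print(both_up)     # the workload is shared
print(peer2_down)  # the surviving peer takes over every check
```

Because the split is recomputed from the set of live peers, losing a peer simply moves its checks to the remaining ones. This only works if all peers hold an identical object config.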
26. Security
Safety zones in the network
Monitoring as a Service
DMZ
Branch offices with “one-way” availability
(Diagram labels: Master; Poller, Poller, Poller; Monitored objects in DMZ, Customer Network and Secure Network)
27. Cloud Monitoring
Monitoring of publicly available services
Monitor services outside your own network
(Diagram labels: Master; Cloud Poller; Poller; Monitored objects; DMZ)
29. Customer case study: Merlin Ahoy!
Company:
Since late 1959, Viking Line ships have sailed daily from Finland to Sweden. The shipping company Viking Line Abp is based in Mariehamn, the capital of the autonomous Åland Islands in Finland.
Challenge:
Unreliable network uplinks to the core system and the change between cable connections, wireless and satellite networks, depending on the location of the ship, make it difficult to monitor all on-board IP services such as IPTV, VoIP, WiFi hotspot and infotainment.
Solution:
On each ship an op5 Monitor instance was installed. It allows distributed monitoring of Viking Line, monitoring all services on all vessels, and provides centrally managed monitoring.
“There have been improvements in functionality when we communicate via satellite links. It is easier than before to have the server on board communicate with our main servers on shore”
Jonas Lindroos, IT department at Viking Line
7 consecutive years of growth
10+ new international business partners per year
700+ customers in 6 years and growing
98% renewal of subscriptions
100% reference accounts
This page has room for custom changes to fit your specific needs for the presentation. Explanation for each point:
Why scale - The common problems and issues with scaling monitoring.
Keeping up with changes in business and IT - The IT environment and the business change constantly, and the ability to respond rapidly to growth (scale) is a success factor. Keeping pace with ever-changing environments creates new possibilities for companies.
Scale out, not up - When scaling a system it is common to start by stuffing as much as possible into the existing server, at the cost of a slower system. Share the workload between servers to ensure none is under- or over-utilized, preventing failures.
Load balancing - See above.
Distributed monitoring over several locations - When monitoring several different locations, a distributed setup is the best option for handling the load. A distributed setup is ideal when there is a need for wide geographical coverage and a resilient design.
How we monitor cloud applications - Room for personal changes.
Several NOCs - Monitoring from the user's perspective: GUI, easy to understand, easy to use, etc.
op5: 3 key features for a scalable monitoring solution
1. Redundancy - Ensure availability of the IT network
2. Load balancing - Share the workload
3. Distributed monitoring - Geographical coverage
Context: op5 Monitor introduces a flexible, affordable system that can scale to the needs of your business and adapt to the ever-changing challenges of the IT environment, whether it consists of small business-critical IT or large enterprise monitoring needs with tens of thousands of services. By our standards, a scalable monitoring solution is an easy-to-use system that is capable of constantly changing to fit the needs of your business without sacrificing stability or performance.