Monitoring Clusters and Load Balancers
Ensuring High Availability: A White Paper





Introduction

Enterprises employ several mechanisms to optimize the performance of their networks
and ensure high availability. Clustering, whether for failover (Active/Passive mode) or
for load balancing (Active/Active mode), is a commonly adopted technique that supports
redundancy, session or database replication, and the distribution of requests across the
servers in the cluster. Some networks also place software or hardware load balancers
outside the cluster to increase horizontal scalability.

Most business is now conducted over the Internet, and enterprises adopt clusters or
invest in load balancers for their fault-tolerant architecture. Critical servers and
applications, such as database servers and Exchange servers, are hosted in clustered
environments for high availability. In addition, load balancers such as BIG-IP are plugged
into the network to distribute the load across servers and meet the primary need of
uninterrupted availability. Service interruptions, planned or unplanned, are costly
and unacceptable for businesses of any size. Administrators exhaust almost every
resource within their means to keep their networks healthy and available.


Service outage and its business impact

We are all heavy consumers of the services the Internet supports. Imagine that you are
accessing your bank account to transfer funds to a friend in an emergency, and the site
simply says, 'Oops! Sorry, the service is unavailable!' The message concludes by politely
asking you to try again after a little while. As end users, we would hate to be in this
situation. Being able to call customer service, or to look for an ATM or a branch office,
offers only some respite. The damage caused to the bank is beyond economic repair, and
the person behind the scenes, the administrator or network engineer, has to face the
music for the failure. All this despite having invested in a cluster for a reliable service!


Forewarned is forearmed

The job is only half done once clustering for failover or load balancing is in place.
Unless an administrator has clear visibility into the good, the bad, and the ugly
components of the network, including the nodes in that critical cluster and the important
resources on the expensive load balancers, it is impossible to have an alert-free holiday!




Example 1: A two-node SQL cluster in an enterprise:




An Active node can fail over to a standby because of a system failure, an application
failure, or both. An administrator must have clear visibility into system and application
performance, which is possible only with proactive monitoring. In the scenario above,
the clustering controller instance itself may have failed, causing the whole system to
fall apart! Or the Active node may fail over to the Passive node successfully, only for
the Passive node to fail too because of insufficient resources! Monitoring the basic
resources on these systems could have prevented this situation, or at least identified
it early enough to reduce the damage.
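
As an illustration of the point above, the following minimal Python sketch checks
whether a standby node looks fit to take over before a failover actually happens. The
NodeStats fields, the threshold values, and the node name are hypothetical and chosen
only for this example; a real monitoring tool would collect these figures from the
nodes and use limits suited to the environment.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeStats:
    """Latest readings for one cluster node (all fields are illustrative)."""
    name: str
    cpu_percent: float       # CPU utilization, 0-100
    memory_percent: float    # memory utilization, 0-100
    disk_percent: float      # disk utilization, 0-100
    cluster_service_up: bool


# Hypothetical alert thresholds; tune these for your own environment.
LIMITS: Dict[str, float] = {
    "cpu_percent": 80.0,
    "memory_percent": 85.0,
    "disk_percent": 90.0,
}


def failover_risks(passive: NodeStats, limits: Dict[str, float] = LIMITS) -> List[str]:
    """Return reasons why the passive node might not survive a failover."""
    risks = []
    if not passive.cluster_service_up:
        risks.append(f"{passive.name}: cluster service is down")
    for metric, limit in limits.items():
        value = getattr(passive, metric)
        if value >= limit:
            risks.append(
                f"{passive.name}: {metric} at {value:.0f}% (limit {limit:.0f}%)"
            )
    return risks


if __name__ == "__main__":
    standby = NodeStats("sql-node-2", cpu_percent=35.0, memory_percent=92.0,
                        disk_percent=60.0, cluster_service_up=True)
    for risk in failover_risks(standby):
        print(risk)
```

An empty result means the standby shows no obvious resource shortage; any entry is a
warning that a failover to that node could itself fail.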

Example 2: A load-balancer distributing requests across a few servers:




Despite redundant servers set up for load balancing, imagine a hardware resource failure
on the load balancer itself making the service unavailable! User requests never reach
the servers even though the service is up and running fine!
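
One way to tell this situation apart from a genuine server outage is to probe the
service twice: once through the load balancer's virtual address and once directly on
each server behind it. The Python sketch below shows the idea under stated assumptions:
tcp_probe is a plain TCP connect test, and the host names, result labels, and the
locate_fault helper are hypothetical names used only for illustration.

```python
import socket
from typing import Dict


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def locate_fault(vip_ok: bool, backends_ok: Dict[str, bool]) -> str:
    """Classify where the fault lies from the two sets of probe results."""
    down = sorted(name for name, ok in backends_ok.items() if not ok)
    if vip_ok:
        if not down:
            return "healthy"
        return "degraded: backend(s) down: " + ", ".join(down)
    if len(down) < len(backends_ok):
        # At least one server answers directly, so the servers are not the problem.
        return "load balancer fault: backends reachable directly but not through the VIP"
    return "service outage: no backend is reachable"


if __name__ == "__main__":
    # Results would normally come from tcp_probe(); fixed values keep the demo offline.
    print(locate_fault(False, {"web-1": True, "web-2": True}))
```

In practice the two arguments would be filled from probe results, for example
tcp_probe("vip.example.com", 443) for the virtual address and one tcp_probe call per
server.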



Reduce the damage

The purpose of clustering is lost if the resources are not constantly monitored. Even as
the administrator tries to ensure that end users do not 'feel' any service failure, he
must quickly identify why the active node failed over to the passive node, or why the
load on a particular server is high. So all the components and resources that the
cluster depends on need monitoring. This includes the Cluster service on the nodes, the
dependent services, the system resources on the load balancer, the response time of the
individual devices, and so on. Constant automated monitoring of key components helps
reduce the damage and realize the goal of ensuring high availability at all times.

The key resources include:

Availability of the nodes: A detailed availability report indicating if the node is
unavailable due to a dependent device failure or if the node is pulled down for
maintenance.




Response time of the nodes: The response time of each node at any given time, and its
average response time, which indicates the load on it.




Services availability and response time: Availability of the cluster service and its
related services on the nodes.




System Resource utilizations: A constant check on the performance of the hardware
resources because the last thing you want is insufficient resources rendering a critical
service unavailable!




Service Parameters: Critical parameters of a service that can lead to a potential
failure.




System Events pertaining to the cluster: Keeping tabs on system events, including
application events, so that there are no sudden surprises and every possible source of
faults is watched.




Availability and Performance of the load balancer: Ensuring the basic availability
and responsiveness of the load balancer.




System resources on the load balancer: Monitoring the critical resources on the
load balancers to identify any problem indicators well in advance.




Cluster Groups (Business Views): A holistic view of the nodes in a cluster with an
ability to drill down to the root cause. This provision to visualize a cluster helps to
understand the health of the cluster at a glance.
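
Two of the resources listed above, node availability and node response time, can be
sketched in a few lines of Python. This is only an illustration: the state labels, the
150 ms threshold, and the function names are assumptions made for the example, not the
way any particular monitoring product reports them.

```python
from statistics import mean
from typing import Dict, List


def classify_downtime(node_up: bool, dependency_up: bool, in_maintenance: bool) -> str:
    """Explain a node's availability state the way an availability report might."""
    if node_up:
        return "up"
    if in_maintenance:
        return "down (maintenance)"
    if not dependency_up:
        # The node is unreachable because a device it depends on has failed.
        return "down (dependent device failure)"
    return "down (node failure)"


def response_summary(samples_ms: List[float], threshold_ms: float) -> Dict[str, object]:
    """Summarize response-time samples; a high average hints at heavy load."""
    average = mean(samples_ms)
    return {
        "latest_ms": samples_ms[-1],
        "average_ms": average,
        "overloaded": average > threshold_ms,
    }


if __name__ == "__main__":
    print(classify_downtime(node_up=False, dependency_up=False, in_maintenance=False))
    print(response_summary([100, 200, 300], threshold_ms=150))
```

Separating "down for maintenance" and "down because a dependency failed" from a real
node failure keeps planned work and upstream faults from being counted against the
node itself.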





Summary

ManageEngine OpManager is network monitoring software that monitors all the
resources on your LAN and WAN. OpManager's performance and fault management
capabilities help identify performance bottlenecks quickly. Its ability to drill down to
the root cause of a fault, together with its extensive customization options, makes
OpManager a preferred solution among thousands of network administrators worldwide.
Useful plug-ins and add-ons such as the NCM Plug-in, NetFlow Plug-in, and VoIP add-on,
and the provision to integrate easily with other applications in the management suite
such as ServiceDesk Plus, make it a one-stop shop for all your network and IT
management needs.
Visit www.opmanager.com to test drive the 30-day free trial version of ManageEngine
OpManager.




                                                                                        5

Contenu connexe

Tendances

Server and application monitoring webinars [Applications Manager] - Part 3
Server and application monitoring webinars [Applications Manager] - Part 3Server and application monitoring webinars [Applications Manager] - Part 3
Server and application monitoring webinars [Applications Manager] - Part 3ManageEngine, Zoho Corporation
 
Application-aware Network Performance Management with OpManager
Application-aware Network Performance Management with OpManagerApplication-aware Network Performance Management with OpManager
Application-aware Network Performance Management with OpManagerManageEngine, Zoho Corporation
 
Server and application monitoring webinars [Applications Manager] - Part 2
Server and application monitoring webinars [Applications Manager] - Part 2Server and application monitoring webinars [Applications Manager] - Part 2
Server and application monitoring webinars [Applications Manager] - Part 2ManageEngine, Zoho Corporation
 
IT Solutions Provider in Kosovo uses Bandwidth monitoring, NetFlow Analyzer
IT Solutions Provider in Kosovo uses Bandwidth monitoring, NetFlow AnalyzerIT Solutions Provider in Kosovo uses Bandwidth monitoring, NetFlow Analyzer
IT Solutions Provider in Kosovo uses Bandwidth monitoring, NetFlow AnalyzerManageEngine, Zoho Corporation
 
The Changing Landscape in Network Performance Monitoring
The Changing Landscape in Network Performance Monitoring The Changing Landscape in Network Performance Monitoring
The Changing Landscape in Network Performance Monitoring Savvius, Inc
 
Role of OpManager in event and fault management
Role of OpManager in event and fault managementRole of OpManager in event and fault management
Role of OpManager in event and fault managementManageEngine
 
Grid Control
Grid ControlGrid Control
Grid Controlbcole23
 
Dot Net performance monitoring
 Dot Net performance monitoring Dot Net performance monitoring
Dot Net performance monitoringKranthi Paidi
 
IRJET- An Improved Weighted Least Connection Scheduling Algorithm for Loa...
IRJET-  	  An Improved Weighted Least Connection Scheduling Algorithm for Loa...IRJET-  	  An Improved Weighted Least Connection Scheduling Algorithm for Loa...
IRJET- An Improved Weighted Least Connection Scheduling Algorithm for Loa...IRJET Journal
 
Mafiree Services 2016 (1)
Mafiree Services 2016 (1)Mafiree Services 2016 (1)
Mafiree Services 2016 (1)linyashaalu
 
Monitoring your physical, virtual and cloud infrastructure with Applications ...
Monitoring your physical, virtual and cloud infrastructure with Applications ...Monitoring your physical, virtual and cloud infrastructure with Applications ...
Monitoring your physical, virtual and cloud infrastructure with Applications ...ManageEngine, Zoho Corporation
 
Network access protection ppt
Network access protection pptNetwork access protection ppt
Network access protection pptDasarathi Dash
 
Ten questions to ask before choosing SCADA software
Ten questions to ask before choosing SCADA softwareTen questions to ask before choosing SCADA software
Ten questions to ask before choosing SCADA softwareTrihedral
 
Network Access Protection
Network Access ProtectionNetwork Access Protection
Network Access ProtectionZernike College
 
Availability tactics
Availability tacticsAvailability tactics
Availability tacticsahsan riaz
 

Tendances (20)

Server and application monitoring webinars [Applications Manager] - Part 3
Server and application monitoring webinars [Applications Manager] - Part 3Server and application monitoring webinars [Applications Manager] - Part 3
Server and application monitoring webinars [Applications Manager] - Part 3
 
Application-aware Network Performance Management with OpManager
Application-aware Network Performance Management with OpManagerApplication-aware Network Performance Management with OpManager
Application-aware Network Performance Management with OpManager
 
Server and application monitoring webinars [Applications Manager] - Part 2
Server and application monitoring webinars [Applications Manager] - Part 2Server and application monitoring webinars [Applications Manager] - Part 2
Server and application monitoring webinars [Applications Manager] - Part 2
 
OpManager - Technical overview
OpManager - Technical overviewOpManager - Technical overview
OpManager - Technical overview
 
IT Solutions Provider in Kosovo uses Bandwidth monitoring, NetFlow Analyzer
IT Solutions Provider in Kosovo uses Bandwidth monitoring, NetFlow AnalyzerIT Solutions Provider in Kosovo uses Bandwidth monitoring, NetFlow Analyzer
IT Solutions Provider in Kosovo uses Bandwidth monitoring, NetFlow Analyzer
 
The Changing Landscape in Network Performance Monitoring
The Changing Landscape in Network Performance Monitoring The Changing Landscape in Network Performance Monitoring
The Changing Landscape in Network Performance Monitoring
 
Role of OpManager in event and fault management
Role of OpManager in event and fault managementRole of OpManager in event and fault management
Role of OpManager in event and fault management
 
5 reasons to use OpManager Plus
5 reasons to use OpManager Plus5 reasons to use OpManager Plus
5 reasons to use OpManager Plus
 
Grid Control
Grid ControlGrid Control
Grid Control
 
Dot Net performance monitoring
 Dot Net performance monitoring Dot Net performance monitoring
Dot Net performance monitoring
 
IRJET- An Improved Weighted Least Connection Scheduling Algorithm for Loa...
IRJET-  	  An Improved Weighted Least Connection Scheduling Algorithm for Loa...IRJET-  	  An Improved Weighted Least Connection Scheduling Algorithm for Loa...
IRJET- An Improved Weighted Least Connection Scheduling Algorithm for Loa...
 
Mafiree Services 2016 (1)
Mafiree Services 2016 (1)Mafiree Services 2016 (1)
Mafiree Services 2016 (1)
 
Monitoring your physical, virtual and cloud infrastructure with Applications ...
Monitoring your physical, virtual and cloud infrastructure with Applications ...Monitoring your physical, virtual and cloud infrastructure with Applications ...
Monitoring your physical, virtual and cloud infrastructure with Applications ...
 
5 reasons why you need a network monitoring tool
5 reasons why you need a network monitoring tool5 reasons why you need a network monitoring tool
5 reasons why you need a network monitoring tool
 
Improving User Experience with Applications Manager
Improving User Experience with Applications ManagerImproving User Experience with Applications Manager
Improving User Experience with Applications Manager
 
Network access protection ppt
Network access protection pptNetwork access protection ppt
Network access protection ppt
 
Ten questions to ask before choosing SCADA software
Ten questions to ask before choosing SCADA softwareTen questions to ask before choosing SCADA software
Ten questions to ask before choosing SCADA software
 
Leading Indian IT Services Company uses OpManager
Leading Indian IT Services Company uses OpManagerLeading Indian IT Services Company uses OpManager
Leading Indian IT Services Company uses OpManager
 
Network Access Protection
Network Access ProtectionNetwork Access Protection
Network Access Protection
 
Availability tactics
Availability tacticsAvailability tactics
Availability tactics
 

En vedette (8)

Storage School 1
Storage School 1Storage School 1
Storage School 1
 
Best Practices for Planning your Datacenter
Best Practices for Planning your DatacenterBest Practices for Planning your Datacenter
Best Practices for Planning your Datacenter
 
RAID CONCEPT
RAID CONCEPTRAID CONCEPT
RAID CONCEPT
 
Firewall presentation
Firewall presentationFirewall presentation
Firewall presentation
 
Cluster computing pptl (2)
Cluster computing pptl (2)Cluster computing pptl (2)
Cluster computing pptl (2)
 
Web Servers (ppt)
Web Servers (ppt)Web Servers (ppt)
Web Servers (ppt)
 
Firewall
Firewall Firewall
Firewall
 
Firewall presentation
Firewall presentationFirewall presentation
Firewall presentation
 

Similaire à Monitoring Clusters and Load Balancers

PriyaDharshini distributed operating system
PriyaDharshini distributed operating systemPriyaDharshini distributed operating system
PriyaDharshini distributed operating systemPriyadharshiniVS
 
Vanmathy distributed operating system
Vanmathy distributed operating system Vanmathy distributed operating system
Vanmathy distributed operating system PriyadharshiniVS
 
Contrasting High Availability, Fault Tolerance, and Disaster Recovery
Contrasting High Availability, Fault Tolerance, and Disaster RecoveryContrasting High Availability, Fault Tolerance, and Disaster Recovery
Contrasting High Availability, Fault Tolerance, and Disaster RecoveryMaryJWilliams2
 
Resiliency vs High Availability vs Fault Tolerance vs Reliability
Resiliency vs High Availability vs Fault Tolerance vs  ReliabilityResiliency vs High Availability vs Fault Tolerance vs  Reliability
Resiliency vs High Availability vs Fault Tolerance vs Reliabilityjeetendra mandal
 
Availability Considerations for SQL Server
Availability Considerations for SQL ServerAvailability Considerations for SQL Server
Availability Considerations for SQL ServerBob Roudebush
 
Design patterns and plan for developing high available azure applications
Design patterns and plan for developing high available azure applicationsDesign patterns and plan for developing high available azure applications
Design patterns and plan for developing high available azure applicationsHimanshu Sahu
 
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...IRJET Journal
 
IRJET- Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...
IRJET-  	  Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...IRJET-  	  Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...
IRJET- Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...IRJET Journal
 
Pre-Con Education: What Is CA Unified Infrastructure Management and what's ne...
Pre-Con Education: What Is CA Unified Infrastructure Management and what's ne...Pre-Con Education: What Is CA Unified Infrastructure Management and what's ne...
Pre-Con Education: What Is CA Unified Infrastructure Management and what's ne...CA Technologies
 
"up.time" New Release from uptime software - May, 2010
"up.time" New Release from uptime software - May, 2010"up.time" New Release from uptime software - May, 2010
"up.time" New Release from uptime software - May, 2010guesta93734
 
Microservices architecture
Microservices architectureMicroservices architecture
Microservices architectureFaren faren
 
An Investigation of Fault Tolerance Techniques in Cloud Computing
An Investigation of Fault Tolerance Techniques in Cloud ComputingAn Investigation of Fault Tolerance Techniques in Cloud Computing
An Investigation of Fault Tolerance Techniques in Cloud Computingijtsrd
 
Getting the Most Value from VM and Compliance Programs white paper
Getting the Most Value from VM and Compliance Programs white paperGetting the Most Value from VM and Compliance Programs white paper
Getting the Most Value from VM and Compliance Programs white paperTawnia Beckwith
 
NFV resiliency whitepaper - Ali Kafel, Stratus Technologies
NFV resiliency whitepaper - Ali Kafel, Stratus TechnologiesNFV resiliency whitepaper - Ali Kafel, Stratus Technologies
NFV resiliency whitepaper - Ali Kafel, Stratus TechnologiesAli Kafel
 
Building Cloud Ready Apps
Building Cloud Ready AppsBuilding Cloud Ready Apps
Building Cloud Ready AppsVMware Tanzu
 
Introdunction to Network Management Protocols - SNMP & TR-069
Introdunction to Network Management Protocols - SNMP & TR-069Introdunction to Network Management Protocols - SNMP & TR-069
Introdunction to Network Management Protocols - SNMP & TR-069William Lee
 
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...Prolifics
 
LOAD BALANCING IN CLOUD COMPUTING
LOAD BALANCING IN CLOUD COMPUTINGLOAD BALANCING IN CLOUD COMPUTING
LOAD BALANCING IN CLOUD COMPUTINGIRJET Journal
 

Similaire à Monitoring Clusters and Load Balancers (20)

PriyaDharshini distributed operating system
PriyaDharshini distributed operating systemPriyaDharshini distributed operating system
PriyaDharshini distributed operating system
 
Vanmathy distributed operating system
Vanmathy distributed operating system Vanmathy distributed operating system
Vanmathy distributed operating system
 
Contrasting High Availability, Fault Tolerance, and Disaster Recovery
Contrasting High Availability, Fault Tolerance, and Disaster RecoveryContrasting High Availability, Fault Tolerance, and Disaster Recovery
Contrasting High Availability, Fault Tolerance, and Disaster Recovery
 
Resiliency vs High Availability vs Fault Tolerance vs Reliability
Resiliency vs High Availability vs Fault Tolerance vs  ReliabilityResiliency vs High Availability vs Fault Tolerance vs  Reliability
Resiliency vs High Availability vs Fault Tolerance vs Reliability
 
Availability Considerations for SQL Server
Availability Considerations for SQL ServerAvailability Considerations for SQL Server
Availability Considerations for SQL Server
 
Design patterns and plan for developing high available azure applications
Design patterns and plan for developing high available azure applicationsDesign patterns and plan for developing high available azure applications
Design patterns and plan for developing high available azure applications
 
Maximize the efficiency of your server farm
Maximize the efficiency of your server farmMaximize the efficiency of your server farm
Maximize the efficiency of your server farm
 
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
 
IRJET- Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...
IRJET-  	  Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...IRJET-  	  Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...
IRJET- Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...
 
Pre-Con Education: What Is CA Unified Infrastructure Management and what's ne...
Pre-Con Education: What Is CA Unified Infrastructure Management and what's ne...Pre-Con Education: What Is CA Unified Infrastructure Management and what's ne...
Pre-Con Education: What Is CA Unified Infrastructure Management and what's ne...
 
"up.time" New Release from uptime software - May, 2010
"up.time" New Release from uptime software - May, 2010"up.time" New Release from uptime software - May, 2010
"up.time" New Release from uptime software - May, 2010
 
Microservices architecture
Microservices architectureMicroservices architecture
Microservices architecture
 
An Investigation of Fault Tolerance Techniques in Cloud Computing
An Investigation of Fault Tolerance Techniques in Cloud ComputingAn Investigation of Fault Tolerance Techniques in Cloud Computing
An Investigation of Fault Tolerance Techniques in Cloud Computing
 
Getting the Most Value from VM and Compliance Programs white paper
Getting the Most Value from VM and Compliance Programs white paperGetting the Most Value from VM and Compliance Programs white paper
Getting the Most Value from VM and Compliance Programs white paper
 
NFV resiliency whitepaper - Ali Kafel, Stratus Technologies
NFV resiliency whitepaper - Ali Kafel, Stratus TechnologiesNFV resiliency whitepaper - Ali Kafel, Stratus Technologies
NFV resiliency whitepaper - Ali Kafel, Stratus Technologies
 
Atifalhas
AtifalhasAtifalhas
Atifalhas
 
Building Cloud Ready Apps
Building Cloud Ready AppsBuilding Cloud Ready Apps
Building Cloud Ready Apps
 
Introdunction to Network Management Protocols - SNMP & TR-069
Introdunction to Network Management Protocols - SNMP & TR-069Introdunction to Network Management Protocols - SNMP & TR-069
Introdunction to Network Management Protocols - SNMP & TR-069
 
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
 
LOAD BALANCING IN CLOUD COMPUTING
LOAD BALANCING IN CLOUD COMPUTINGLOAD BALANCING IN CLOUD COMPUTING
LOAD BALANCING IN CLOUD COMPUTING
 

Dernier

UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarPrecisely
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 

Dernier (20)

UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team

Monitoring Clusters and Load Balancers

  • 2. Ensuring High Availability: A White Paper

Monitoring Clusters and Load Balancers

Introduction

Enterprises employ several mechanisms to optimize the performance of their networks in order to ensure high availability. Clustering for failover (Active/Passive mode) and for load balancing (Active/Active mode) is a commonly adopted technique that supports redundancy, session or database replication, and the distribution of requests across the servers in the cluster. Some networks also place software or hardware load balancers outside the cluster to increase horizontal scalability.

Most business now happens over the Internet, and enterprises prefer clusters or invest in load balancers because of their fault-tolerant architecture. Critical servers and applications, such as database servers and Exchange servers, are hosted in clustered environments for high availability. In addition, load balancers such as BIG-IP are plugged into the network to distribute the load across servers and meet the primary need for uninterrupted availability. Service interruptions, planned or unplanned, are costly and unacceptable for businesses of any size. Administrators exhaust almost every resource within their means to keep their networks healthy and available.

Service outage and its business impact

We are all heavy consumers of the services the Internet supports. Imagine that you are accessing your bank account to transfer funds to a friend in an emergency, and the site simply says, 'Oops! Sorry, the service is unavailable!' The message concludes by politely asking you to try again after a little while. As end users, we would hate to be in this situation. That you can call customer service to vent, or look for an ATM or a branch office, is some respite. The damage to the bank, however, is hard to repair, and the person behind the show, the administrator or network engineer, gets an earful for this huge goof-up. This despite having invested in a cluster for a reliable service!

Forewarned is forearmed

Employing clustering for failover or load balancing is only half the job. Unless an administrator has clear visibility into the good, the bad, and the ugly components of the network, including those in that critical cluster and the important resources on the expensive load balancers, it is impossible to have an alert-free holiday!
  • 3. Example 1: A two-node SQL cluster in an enterprise

An Active node may fail over to a standby because of a system failure, an application failure, or both. An administrator needs clear visibility into system and application performance, which is possible only with proactive monitoring. In the scenario discussed above, the clustering controller instance may have failed, bringing the whole system down. Or the Active node may have failed over to the Passive node successfully, only for the Passive node to fail as well because of insufficient resources. Monitoring the basic resources on these systems could have prevented this mix-up, or at least revealed it early enough to reduce the damage.

Example 2: A load balancer distributing requests across a few servers

Even with redundant servers set up for load balancing, a hardware resource failure on the load balancer itself can make the service unavailable. User requests never reach the servers, even though the service is up and running fine.
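The two examples above can be sketched in a few lines of Python. This is a minimal illustration, not how OpManager or any clustering product works: the names `NodeUsage`, `failover_headroom`, and `locate_fault`, the 90% ceiling, and the assumption that both nodes run on identical hardware (so utilization percentages simply add) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeUsage:
    """Hypothetical snapshot of a node's resource utilization (0-100)."""
    name: str
    cpu_pct: float
    memory_pct: float


def failover_headroom(active: NodeUsage, passive: NodeUsage,
                      ceiling_pct: float = 90.0) -> List[str]:
    """Example 1: list the resources that would exceed the ceiling if the
    passive node took over the active node's load.

    Assumes identical hardware on both nodes, so percentages simply add.
    An empty list means the passive node has room to absorb a failover.
    """
    projected = {
        "cpu": active.cpu_pct + passive.cpu_pct,
        "memory": active.memory_pct + passive.memory_pct,
    }
    return [name for name, pct in projected.items() if pct > ceiling_pct]


def locate_fault(lb_up: bool, servers_up: Dict[str, bool]) -> str:
    """Example 2: tell a load-balancer failure apart from a server failure."""
    any_server_up = any(servers_up.values())
    if lb_up and all(servers_up.values()):
        return "ok"
    if lb_up and any_server_up:
        return "degraded: some servers down"
    if lb_up:
        return "all servers down"
    if any_server_up:
        # The case from Example 2: the service runs, but nobody can reach it.
        return "load balancer down: healthy servers unreachable"
    return "load balancer and servers down"
```

With an active node at 55% CPU and 60% memory and a passive node at 10% CPU and 45% memory, `failover_headroom` flags memory (a projected 105%) as the resource that would sink the standby. Likewise, `locate_fault(False, {"web-1": True, "web-2": True})` reports that the load balancer, not the servers, is the point of failure.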
  • 4. Reduce the damage

The purpose of clustering is lost if the resources are not constantly monitored. Even as the administrator tries to ensure that end users do not 'feel' any service failure, he must quickly identify why the active node failed over to the passive node, or why the load on a particular server is high. So all the components and resources that the cluster depends on need monitoring. This includes the Cluster service on the nodes, the dependent services, the system resources on the load balancer, the response time of the individual devices, and so on. Constant automated monitoring of key components helps reduce the damage and helps realize the goal of ensuring high availability at all times. The key resources include:

Availability of the nodes: A detailed availability report indicating whether a node is unavailable because a dependent device failed or because the node was taken down for maintenance.

Response time of the nodes: The response time of each node at any given time, and its average response time as an indication of the load on it.

Services availability and response time: Availability of the cluster service and its related services on the nodes.

System resource utilization: A constant check on the performance of the hardware resources, because the last thing you want is insufficient resources rendering a critical service unavailable!
  • 5. Service parameters: Critical parameters of a service that can lead to a potential failure.

System events pertaining to the cluster: Keeping tabs on system events, including application events, so that there are no sudden surprises and all avenues of fault are watched.

Availability and performance of the load balancer: Ensuring the basic availability and responsiveness of the load balancer.

System resources on the load balancer: Monitoring the critical resources on the load balancers to identify any problem indicators well in advance.
  • 6. Cluster Groups (Business Views): A holistic view of the nodes in a cluster, with the ability to drill down to the root cause. Visualizing a cluster this way helps you understand its health at a glance.

Summary

ManageEngine OpManager is network monitoring software that monitors all the resources on your LAN and WAN. The performance and fault management capability of OpManager helps identify performance bottlenecks quickly. Its ability to drill down to the root cause of a fault, together with its extensive customization capability, makes OpManager a preferred solution among thousands of network administrators worldwide. A few useful plug-ins and add-ons, such as the NCM Plug-in, the NetFlow Plug-in, and the VoIP add-on, along with easy integration with other applications in the management suite such as ServiceDesk Plus, make it a one-stop shop for all your network and IT management needs.

Visit www.opmanager.com to test drive a free 30-day trial of ManageEngine OpManager.
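As a rough illustration of the first two key resources listed above (node availability and response time), the sketch below times a TCP connection to each node and averages the results. It is a minimal sketch under stated assumptions, not how OpManager measures these values: `ProbeResult`, `probe_node`, and `average_response_ms` are hypothetical names, and a successful TCP handshake is used as a stand-in for "the node is available".

```python
import socket
import time
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class ProbeResult:
    """Hypothetical record of one availability check."""
    host: str
    port: int
    available: bool
    response_ms: Optional[float]  # None when the node did not answer


def probe_node(host: str, port: int, timeout: float = 2.0) -> ProbeResult:
    """Open a TCP connection and time how long the handshake takes."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            return ProbeResult(host, port, True, elapsed_ms)
    except OSError:
        # Refused, timed out, or unreachable: treat the node as unavailable.
        return ProbeResult(host, port, False, None)


def average_response_ms(results: List[ProbeResult]) -> Optional[float]:
    """Average response time over successful probes only."""
    times = [r.response_ms for r in results if r.response_ms is not None]
    return mean(times) if times else None
```

Calling `probe_node` on a schedule for every cluster node and for the load balancer, and keeping the results, gives both an availability history and an average response time per device. The host names and ports to probe depend entirely on the environment and are left to the reader.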