Event Management and Monitoring Program Strategy
Prepared by: Jim Gingras, Event Management and Monitoring Manager
Proprietary and Confidential
Table of Contents
1. Event Management and Monitoring Strategy
1.1 Event Management and Monitoring Overview
1.2 Stakeholders
1.3 Event Management Program Processes
1.3.1 Event Management Process
1.3.2 Event Monitoring
1.3.3 Designing Manageable Applications
1.4 Event Management Metrics
1.5 Roadmap
1.5.1 Current State to Future State
Appendix A: A Business Value Proposition for Event Management
1. Event Management and Monitoring Strategy
The strategy for Event Management and Monitoring is to take advantage of the existing event and
monitoring processes and tools and build on them to propel IT at CORPORATE to the next level of IT
capabilities, demonstrating business value through management and monitoring of IT services.
Appendix A shows an example of how Event Management demonstrates business value.
The strategy involves the creation of an Event Management program and the associated projects to
be executed over the next two years.
The remainder of this document describes the Event Management Program and the supporting
activities required to ensure it is operating as designed. These include:
- Define Event Management and Monitoring
- Define the stakeholders for event management and monitoring
- Define the high-level processes and associated activities in the Event Management Program
- Define metrics for determining the status of the processes
- Define a roadmap of the actionable and measurable projects/initiatives required to establish
  the event management program
Event Management Program Definition: The Event Management Program is responsible for the
management of the Event Management projects and monitoring systems required to determine the
status of the services IT provides.
Event Management and Monitoring Definition: Event Management and Monitoring is the process of
managing IT system and user events to provide the appropriate control action while providing a near
real-time view of the status of the IT services.
1.1 Event Management and Monitoring Overview
Event Management’s value to the business is not direct in that it cannot generate income for the
business. The most relevant measurements to the business are:
- Decreased Mean Time To Repair: decreased downtime when incidents/problems occur, because
  personnel with the appropriate skill level are notified sooner and with the correct
  information to resolve issues (whenever possible, before they occur)
- Increased Mean Time Between Failures: analyzing trended event information to determine
  upcoming outages and remediate them before they occur (predictive monitoring)
- Service Level Agreements are met or exceeded: due to decreased downtime
- Decreased IT support cost: due to appropriate personnel being notified, better use of
  knowledge from events in the environment, and fewer personnel required to resolve
  incidents/problems
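As a rough illustration of how the first two measurements could be derived, the sketch below computes Mean Time To Repair and Mean Time Between Failures from a list of outage records. The record layout, the function names and the sample timestamps are assumptions made for this example; they are not taken from any CORPORATE tool.

```python
from datetime import datetime, timedelta

# Hypothetical outage records: (time the failure started, time service was restored).
outages = [
    (datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 9, 0)),
    (datetime(2024, 1, 11, 8, 0), datetime(2024, 1, 11, 8, 30)),
    (datetime(2024, 1, 21, 8, 0), datetime(2024, 1, 21, 8, 15)),
]


def mean_time_to_repair(records):
    """Average duration of an outage (restore time minus failure time)."""
    total = sum((end - start for start, end in records), timedelta())
    return total / len(records)


def mean_time_between_failures(records):
    """Average uptime between the end of one outage and the start of the next."""
    ordered = sorted(records)
    gaps = [
        next_start - prev_end
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    return sum(gaps, timedelta()) / len(gaps)


print("MTTR:", mean_time_to_repair(outages))
print("MTBF:", mean_time_between_failures(outages))
```

Trending these two values over successive reporting periods is one way to show whether earlier notification is shortening repairs and whether predictive monitoring is lengthening the time between failures.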
Event Management is the vital hub on which all process and tool integration is developed. Event
monitoring encompasses all the activities that are required to ensure a device or Configuration Item
(CI) [1] is working correctly regardless of whether it is generating events.
The foundational elements for event management are the system and user events that are created by
CIs or monitoring tools. In order to enable monitoring of IT services, these events are mapped to all the
related CIs of a specific IT Service. Going forward, a service view will be available to all management and
service/support personnel to show the status and configuration information in an easy-to-understand
format.
1.2 Stakeholders
Position                        Name  Description
Event Management                      Event Management Process Activities
Incident Management                   Automated Incident Management for events
Problem Management                    Troubleshooting and enhancements for Known Errors
Availability Management               Monitoring Requirements
Capacity Management                   Monitoring Requirements
Operations                            Monitoring Requirements
Steering Committee                    Program Management and Reporting
IT Instrumentation                    Monitoring Tools and Reporting
Infrastructure Hosting                Monitoring Requirements, Monitoring Tools
Software Solutions and Support        Systems Administration
Architecture                          Instrumentation of Internal Applications and RJSF design and Service Model
Security                              Monitoring Requirements for Security
Service Management                    Monitoring Requirements
Product Management                    Service Model and Monitoring Requirements
Release Management                    Service Model
Configuration Management              Service Model
Service Level Mgmt.                   Service Model and Service Level Requirements
Software Solutions and Support        Internal Software Development, RJSF
Table 1: IT Event Management Stakeholders
[1] Configuration Items include services, applications, or components as per the CORPORATE service model in the CMDB
The stakeholders for the Event Management Program are management and the process owners for the
ITIL processes of availability, capacity, incident, problem, and event management. Additional
stakeholders include the administrative groups who must manage the tools that are required to deliver
the event management services and develop manageable applications. All stakeholders are required to
agree on service views that provide accurate and relevant service status to service/support personnel in
support of the business.
1.3 Event Management Program Processes
The Event Management program is responsible for the Event Management process and for the direction
of the Event Monitoring environment. It also integrates with the ITIL Service Design processes of
Availability, Capacity and Security Management for monitoring requirements and capabilities, and with the
Incident and Problem Management processes as inputs and outputs for automated remediation or
notification activities based on significant events. Event Management plays a significant role in the
Continuous Service Improvement processes as a point of research, audit and verification.
Considerations for Event Management are also required as part of application development processes
(e.g. RJSF), starting with application design and development. Additionally, the creation and
management of service-based views enables the next generation of event monitoring for service status
events.
1.3.1 Event Management Process
The Event Management process is the process that monitors all events that occur through the IT
Infrastructure to allow for normal operation and also to detect and escalate exception conditions.
Figure 1: ITIL V3 Event Management Process
The figure shows that the event management process is responsible for detection, filtering, triggering,
alerting, automated response and reviewing actions. The triggers and automated responses will control
the scope of the work required by the event management process. In other words, the more triggers
and automated responses that are required, the more work must be accomplished to automate the
response and increase the business value.
One of the keys to a successful Event Management program is defining which actions trigger the event
management process and managing the number and priority of those events. Triggers include:
- Exceptions to any level of Configuration Item (CI) performance defined in design specifications,
  SLAs, OLAs and SOPs
- Exceptions to an automated procedure or process (e.g. monitoring an automated workflow)
- Exception within an IT process that is being monitored (e.g. server build)
- The completion of an automated task or job
- A status change in a device or database record, depending on the granularity of the monitoring
  requirements
- Access of an application or database by a user or automated procedure or job
- A situation where a device, database, application, or service has reached a pre-defined
  performance threshold
For the current state the most important aspect of the Event Management process is that all types of
alerts will result in an incident being opened in the Service Desk.
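To make the threshold trigger and the current-state rule more concrete, here is a minimal sketch in which a reading that reaches a pre-defined threshold becomes an event, and every resulting alert opens an incident. The `Event` class, the `THRESHOLDS` values and the function names are hypothetical and chosen only for illustration; a real implementation would call the Service Desk tool rather than return strings.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Event:
    ci: str            # Configuration Item that raised the event
    metric: str
    value: float
    significance: str  # "Exception" or "Warning"


# Hypothetical thresholds per metric: (warning level, exception level).
THRESHOLDS = {"cpu_percent": (80.0, 95.0), "disk_percent": (85.0, 95.0)}


def check_threshold(ci: str, metric: str, value: float) -> Optional[Event]:
    """Return an Event when a reading reaches a pre-defined threshold, else None."""
    warning, exception = THRESHOLDS[metric]
    if value >= exception:
        return Event(ci, metric, value, "Exception")
    if value >= warning:
        return Event(ci, metric, value, "Warning")
    return None


def open_incidents(readings: List[Tuple[str, str, float]]) -> List[str]:
    """Current-state rule: every alert results in an incident being opened."""
    incidents = []
    for ci, metric, value in readings:
        event = check_threshold(ci, metric, value)
        if event is not None:
            incidents.append(
                f"{event.significance}: {event.ci} {event.metric}={event.value}"
            )
    return incidents


readings = [
    ("srv01", "cpu_percent", 97.0),
    ("srv02", "cpu_percent", 50.0),
    ("srv03", "disk_percent", 90.0),
]
print(open_incidents(readings))
```

Keeping the thresholds in one table makes it easier to manage the number and priority of events, since tuning a threshold changes how many incidents are opened.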
1.3.2 Event Monitoring
Event Monitoring covers a broad spectrum of all the monitoring capabilities across the CORPORATE IT
enterprise. The Event Management architecture deployed today uses a Manager of Managers (MOM) to
gather events from all the IT Management Domains. The major IT Domains are Application, Database,
End User, Facilities, Network, Security, Server Platform (which includes virtual), Storage, Telephony, and
Workload.
Figure 2: Manager of Managers Architecture
Although all events are monitored, only significant events are managed because they are meaningful.
This is accomplished through filtering at the IT Domain level to identify events that are recognized as
affecting the status of Configuration Items (CI) (i.e. Service, Application, and Component), automation
processes or other significant occurrences. The Manager of Managers then correlates the events from
each of the IT Domains, determines the course of action, and executes an automated response. For all
significant events an incident will automatically be opened, assigned and prioritized in the Service Desk.
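The two-stage flow described above (filtering in each domain, then correlation in the Manager of Managers) can be sketched as follows. The dictionary keys, the `SIGNIFICANT` severity set and the rule of correlating on CI plus symptom are assumptions made for this example; the actual filters and correlation policies depend on the monitoring tools in use.

```python
# Assumed set of severities that domain managers forward to the Manager of Managers.
SIGNIFICANT = {"Critical", "Major", "Minor"}


def domain_filter(events):
    """Domain-level filter: forward only events recognized as significant."""
    return [e for e in events if e["severity"] in SIGNIFICANT]


def correlate(events):
    """MOM-level correlation: collapse events for the same CI and symptom into
    one entry, keeping a count and the set of domains that reported it."""
    merged = {}
    for e in events:
        key = (e["ci"], e["symptom"])
        entry = merged.setdefault(
            key,
            {"ci": e["ci"], "symptom": e["symptom"], "count": 0, "domains": set()},
        )
        entry["count"] += 1
        entry["domains"].add(e["domain"])
    return list(merged.values())


events = [
    {"domain": "Network", "ci": "router-7", "symptom": "link down", "severity": "Critical"},
    {"domain": "Application", "ci": "router-7", "symptom": "link down", "severity": "Major"},
    {"domain": "Network", "ci": "router-7", "symptom": "heartbeat", "severity": "Informational"},
]

correlated = correlate(domain_filter(events))
print(correlated)
```

In this sketch the informational heartbeat is dropped at the domain level, and the two reports of the same link failure become a single correlated event, which is the point at which one incident (rather than two) would be opened.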
A major portion of the Event Management Program includes creating interfaces that enable monitoring
at the services level. The best approach is to start with a few significant services to demonstrate the
business value of monitoring services.
1.3.3 Designing Manageable Applications
In order to optimize operational management of applications, the applications must be designed with
operations in mind. This requires that monitoring requirements are identified during the application
design phase of the application lifecycle and instrumented during the application development cycle.
One of the key deliverables that enables this type of monitoring is the “health model”, which relates the
status of individual components to the status of the overall application or service. For internally
developed applications CORPORATE has embraced the use of management packs as a means of ensuring
the supportability of applications. This initiative is in line with Microsoft’s Design for Operations
methodology.
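A health model can be illustrated with a very small rollup function. The sketch below assumes a three-level status scale and a "worst status wins" rule, in which the application is reported as being as unhealthy as its least healthy component. Both the scale and the rule are assumptions for this example; an actual health model may weight components differently.

```python
# Status values ordered from healthy to unhealthy (assumed three-level scale).
SEVERITY_ORDER = ["OK", "Warning", "Critical"]


def rollup(component_status):
    """Worst-status-wins rollup of component statuses to one application status."""
    if not component_status:
        return "OK"
    return max(component_status.values(), key=SEVERITY_ORDER.index)


# Hypothetical components of one internally developed application.
order_entry = {"web": "OK", "database": "Warning", "queue": "OK"}
print(rollup(order_entry))
```

The same function could be applied again one level up, rolling application statuses into a service status for a service view.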
1.4 Event Management Metrics
Once Event Management is in place a baseline must be established as to the current performance levels
and value to the organization in terms of optimizing operations activities and Mean Time To Repair. The
following metrics are recommended by ITIL v3:
1. Number of events per category: IT Domain, Service, Application
2. Number of events by significance: Exception (Critical or Major), Warning (Minor), or
   Informational (non-exception/warning application messages)
3. Number and percentage of events that required human intervention and whether this was
   performed (incidents are not opened)
4. Number and percentage of events that resulted in incidents or changes
5. Number and percentage of events caused by existing problems or Known Errors
6. Number and percentage of replicated or duplicated events
7. Number and percentage of events indicating performance issues
8. Number and percentage of events indicating potential availability issues
9. Number and percentage of each type of event per platform or application
10. Number and ratio of events compared with the number of incidents
Further research must be done to determine how to derive these metrics and associated reports with
the existing monitoring tools. Service Desk and the Manager of Managers are good places to begin this
work. These metrics will enable the “tuning” of the event management system through the adjustment
of the filters and correlation engine in the domain managers and Manager of Managers, respectively.
1.5 Roadmap
Figure 3: Event Management Program Roadmap
The high-level roadmap for the IT Event Management Program has eight projects:
1. Define the Event Management strategy and program, including deliverables:
   a. Event Management Strategy
   b. Event Management Process
   c. Event Handling Policies and Standards
      i. Notification/Escalation Policies and Standards
   d. Event Management projects/initiatives
   e. Event Management program roadmap
2. Establish the Event Management Program through:
   a. Ratification of the event management and monitoring processes and activities
      i. Ratification of event handling policies and standards
   b. Communicate and gather support for event management program activities in
      collaboration with stakeholders to agree on deliverables
   c. Establish a timeline for completing the work activities and deliverables
3. Integration with other ITIL management processes, including:
   a. Incident Management: for automation of incident management process activities where
      applicable
      i. Automatically manage incidents from user events (transactions)
      ii. Automatically manage incidents from system events
      iii. Automatically manage incidents from service events
   b. Availability Management: for availability monitoring requirements of service
      components
   c. Capacity Management: for the capacity monitoring requirements of the service
      components
   d. Problem Management: for event information in the Known Error Database and for
      verification and audit of the root cause of problems
4. Integration with application design and development: for internally developed applications
   through IT Architecture and Software Engineering
   a. Adoption of management packs for monitoring applications
      i. Microsoft Management Packs for internally developed applications on the
         Windows platform
   b. Propagation of configuration and status information to service views based on the
      service and health models
5. Integration with third-party applications
   a. Adoption of management pack methodology for management of events from third-
      party applications
      i. Create deliverables that are platform dependent
      ii. Coordinate with instrumentation and system support for instrumentation
          lifecycle (design, develop, test, deploy)
6. Consolidation/Correlation of Domain-level events
   a. Completion of integration of critical, major and minor events across all IT Domains to
      the Manager of Managers
   b. Implementation of correlation policies/rules to forward significant events for incidents
      and alerts
7. Integration with the IT Service/Support groups through the creation and management of service
   views and related configuration items
   a. Role-based service dashboards for user groups
8. Continuous process improvement
   a. Audit and verify quality and efficiency of existing event management and monitoring
      systems and adjust filters and correlation engines to streamline automation
1.5.1 Current State to Future State
Event Management and Monitoring has been in place for years at CORPORATE. It has matured to a level
where events are triggering workload and other automation/remediation, as well as automated
notification/escalation. In terms of maturity level, CORPORATE is between reactive and proactive. There
are specific cases where we are at the predictive level (monitoring batch), but this is the exception.
There are many management/monitoring tools in place across all the IT Domains. The two major tasks
that must be accomplished in order for the Event Management program to be successful are:
- Mature IT monitoring from a reactive/proactive level to a proactive/predictive maturity level
  through automation of event responses for all significant events.
- Consolidate and correlate all the events into meaningful status information for CIs like
  applications, systems and IT services.
The biggest enabler going forward is the use of ITIL v3 as the framework for managing IT. This provides
a common vernacular and helps establish accepted governance processes for event management and
monitoring. Use of a framework combined with the use of the services construct to represent IT value
to the business provides a new level of event management and monitoring for CORPORATE.
Appendix A: A Business Value Proposition for Event Management [2]
In simple terms, event management enables real-time monitoring of the infrastructure (i.e. listening for
things that are wrong), and uses event correlation to filter, de-duplicate and combine events to detect
more serious issues. Event Management is important because it will:
- Improve time to resolve through cause identification
- Improve real-time visibility
- Enable proactive management of impact to the business (IT calls the business)
- Improve Security Management
Studies show that fault detection and root-cause analysis are the most important systems management
capabilities. Studies also show that the most time-consuming systems management tasks are diagnosis
and troubleshooting. Event Management enables proactive responses to events and enables automatic
tracking and resolution for most system events. The scenarios below show the difference when event
management is implemented and when it is not. [3]
[2] Taken from Data Network Event Management and ITIL, CISCO, Keith Sinclair
[3] The scenarios below use a network device issue as the example. CORPORATE is monitoring all infrastructure
domains at some level, as described in the Event Monitoring section of this document.
Figure 4: Scenario Situation normal (w/o Event Management)
Figure 5: Scenario - Situation with Event Management
The bottom line is that Event Management allows IT to resolve issues before the users are affected.
Armed with reports that show the effectiveness of Event Management, IT can show the business how
effective it is and demonstrate real business value.