The document discusses modernizing IT operations management for DevOps success. It describes how organizations are adopting a "bi-modal" approach with both traditional and agile systems. This requires bridging between traditional IT management and cloud-native approaches. It provides examples of how roles, processes, tools, and capabilities need to evolve and integrate to support both traditional and cloud-native environments. Specifically, it outlines how event management, topology mapping, machine learning, collaboration, and automation can help enable this transformation.
Modernize and Simplify IT Operations Management for DevOps Success
1. Modernize and Simplify IT Operations Management for DevOps Success
Robert Hodges
Solution Architect – IBM Cloud Expert Labs
hodgesr@us.ibm.com
Kristian Stewart
STSM - Cloud Event Management and Analytics
kristian@uk.ibm.com
2. Agenda
2018 IBM Corporation
• Proven Patterns in Operations Management
• Market Drivers for Transformation
• Enablers and Challenges for Operations throughout the transformation
• Questions and Answers
3. “Bi-Modal” IT requires a new management approach
Organizations are moving to a more flexible way of transforming their business, shifting focus to the rapid DevOps evolution of Systems of Engagement that touch clients, supported by less dynamic but still crucial Systems of Record.
Systems of Record: Operational Excellence (Traditional Management)
Systems of Engagement: Transformation & Differentiation (Agile Management)
Attribute          Traditional Model       Agile Model
IT projects        Some, big               Many, small
Time to go live    2-3 years               2-3 months
Change rate        Lower                   Higher
Governance         Centralized             Decentralized
Tools              Cloud-ready, on-prem    Cloud-Native
Processes          ITIL, CMMI              DevOps, Lean
Hybrid Ops
Hybrid Apps
Source: The agile CIO: Mastering digital disruption. http://blog.kpmg.ch/the-agile-cio-mastering-digital-disruption/
4. Bridging from traditional IT Management to Cloud and DevOps
1. Re-invent ITIL for cloud native
2. Modernize ITIL for cloud enabled
3. Integrate across to enable a seamless experience that results in confidence
Traditional OPS
Process Driven (e.g., ITIL)
DevOps
Tool Chain
Service Transition
• Change
• Release
Service Operations
• Monitoring / Event
• Incident
• Problem
• Service Request
• Security and Compliance
For example: build monitoring scripts, deploy scripts, monitor.
Still need to accomplish the same goals – but the method is different.
Embedded and automated
Shared Responsibility
What used to be done purely in Operations is now accomplished across the DevOps lifecycle.
Information needs to be integrated across both old and new
Service Strategy
Service Design
Service Improvement
Think, Code, Deliver, Run, Manage, Learn
5. Traditional Roles Change
Along with workloads, management organizations and processes are impacted by a transition to cloud, and tools need to integrate across both environments.
Enterprise
Traditional IT
Large Enterprise
Moving to Cloud
Small
Startup (Skunkworks)
Dev – App development, deploy, test…
Note: QA team omitted to keep simple
Ops – Services & middleware: Request driven provisioning, configuration, patch, etc.
Ops – infrastructure: Procurement, provisioning, OS patch, etc.
Enterprise DevOps teams – delivering Microservices as FULL stack deployments. Design for resiliency, testing, QA.
Full Stack DevOps Team
Site Reliability Engineers – incorporate scalability, reliability, and performance right into the software code. Centralized policy, governance, compliance, audit. Interface with cloud provider. Define acceptance criteria for new Microservices, automated tests, audits.
First Responders – rapid restoration of service
First & Third party cloud provider (ex: IBM) co-exist
Third party cloud provider (ex: IBM)
Application
Services / Middleware
Infrastructure
Notes:
DevOps personas need a view of their project
SRE / Environment Ops personas need a view across multiple DevOps teams
Traditional IT Ops needs a view across both traditional and cloud environments
6. Traditional Processes Change
Change & Release Management <-> Automated Continuous Delivery
Traditional IT, Cloud-Enabled IT, Cloud-Native IT
Manual Change & Release
Some automation
Automated build / deploy of VMs & Containers
Stage Gates, Co-ordinated Releases
Continuous Integration
Continuous delivery to production
Cloud-native runtimes (node.js)
Pipeline per micro-service
CAB Assessment & Approval
Change Record
Audit reports
CMDB(s) / CMS
Topology / Relationship Graph to identify the “blast radius” of a configuration change.
Updates pushed automatically into the graph
Cluster / Container-Manager push task info automatically into the graph
Discovery “Scanners”
CAB
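The “blast radius” idea above can be sketched as a graph traversal. This is a minimal illustration, not the product's implementation: the function name `blast_radius`, the `TOPOLOGY` dictionary, and all resource names are hypothetical, and the relationship graph is assumed to be available as a simple mapping from a resource to the resources that depend on it directly.

```python
from collections import deque


def blast_radius(dependents, changed, max_hops=None):
    """Return {resource: hops} for everything that transitively depends on `changed`.

    `dependents` maps a resource to the resources that depend on it directly.
    `max_hops` optionally limits how far the traversal goes.
    """
    hops = {changed: 0}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        if max_hops is not None and hops[node] >= max_hops:
            continue  # do not expand beyond the requested distance
        for dependent in dependents.get(node, ()):
            if dependent not in hops:  # breadth-first, so first visit is the shortest path
                hops[dependent] = hops[node] + 1
                queue.append(dependent)
    del hops[changed]  # the changed item itself is not part of its own blast radius
    return hops


# Hypothetical topology: key -> resources that depend on it.
TOPOLOGY = {
    "db-config": ["orders-db"],
    "orders-db": ["orders-service", "billing-service"],
    "orders-service": ["web-frontend"],
    "billing-service": ["web-frontend"],
}

if __name__ == "__main__":
    for resource, distance in blast_radius(TOPOLOGY, "db-config").items():
        print(f"{resource}: {distance} hop(s) from the change")
```

A change record or CAB assessment could attach such a list of affected resources automatically, provided the graph is kept current by the pushed updates described above.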
7. Traditional Tools Change
Example: ChatOps – More than just a chat tool; a cultural shift towards collaboration, between Humans & Tools
Instant Collaboration between SMEs …
• Various Operations roles
• Developers
• Vendor / Provider
… and between Humans and Applications
(ITSM, DevOps, etc.) through Bots
Persistent audit of communication
Traditional Help Desk tools
L1, L2, L3, SMEs
Modern ChatOps tools
Plus
• Email
• Phone
• Bridge Calls
• Instant Messaging (Skype, WhatsApp)
• …
Traditional IT Cloud-Native
8. One Example: Incident Management
Incident Management is optimized to restore the normal service operations as quickly as possible, thus ensuring the best levels of service quality and availability are maintained.
Learn more at: ibm.com/devops/method/content/architecture/serviceManagementArchitecture#0_1
Monitor, Analyze, Plan, Execute
Monitoring: New Relic, Prometheus, CAM, APM
Logging: ElasticStack, LogMet
Event Management: NOI, ASM
Notification: PagerDuty, ANS
Collaboration: Slack, Hipchat
Runbooks: RBA, Rundeck
Ticket & Trending: ICD, ServiceNow, Jira
Dashboards & Reporting: IBM Cloud Console, DASH, Grafana
10. Foundational capabilities for successful transformation
Line of Business & DevOps Teams Central IT Ops & Infrastructure/Domain Ops
Operations Management
Event Consolidation: Policy, Correlation, Infrastructure Mgmt, Integration
Intelligent Operations and Machine Learning: Metrics, Events, Dashboards, Logs
Dynamic Topology
Collaboration & Automation: ChatOps, Notification, Run Books
Environments: On Premises, Cloud, Hybrid, IoT, SDN & NFV
11. Core Event Management
• Event Collection from virtually any source
• Event Enrichment from virtually any source
• Event Correlation and Reduction
• Automatic Response
• Integration and Navigation
• Interactive Event Dashboards
• Scale to accommodate any environment
Event Consolidation: Policy, Correlation, Infrastructure Mgmt, Integration
Environments: On Premises, Cloud, Hybrid, IoT, SDN & NFV
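Collection, enrichment, and reduction as listed above can be illustrated with a small sketch. It assumes a deliberately simple rule: events for the same resource with the same summary that arrive within a time window are folded into one alert. The `Event` and `Alert` classes, the `consolidate` function, the 300-second window, and the enrichment lookup are all hypothetical names chosen for illustration, not the behavior of any particular event manager.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    source: str       # monitoring tool that emitted the event
    resource: str     # affected resource
    summary: str
    severity: int     # higher is worse
    timestamp: float  # seconds


@dataclass
class Alert:
    key: tuple
    severity: int
    first_seen: float
    last_seen: float
    count: int = 1
    sources: set = field(default_factory=set)
    details: dict = field(default_factory=dict)


def consolidate(events, enrichment=None, window=300.0):
    """Fold repeated events into alerts (reduction) and attach lookup data (enrichment)."""
    enrichment = enrichment or {}
    alerts = []
    latest = {}  # key -> most recent alert for that key
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.resource, ev.summary)
        alert = latest.get(key)
        if alert is not None and ev.timestamp - alert.last_seen <= window:
            # Repeat of an open alert: update it instead of raising a new one.
            alert.count += 1
            alert.last_seen = ev.timestamp
            alert.severity = max(alert.severity, ev.severity)
            alert.sources.add(ev.source)
        else:
            alert = Alert(key, ev.severity, ev.timestamp, ev.timestamp,
                          sources={ev.source},
                          details=dict(enrichment.get(ev.resource, {})))
            latest[key] = alert
            alerts.append(alert)
    return alerts
```

Real correlation usually also groups events across different resources (for example, by topology or learned patterns); the key-and-window rule here only shows the reduction step in its simplest form.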
12. Machine Learning for Intelligent Operations
• Analyze metrics and events from thousands of different sources
• Reduce monitoring false positives
• Warn early on emerging performance anomalies
• Learn event and anomaly patterns
• Automatically correlate common and rare event groups
• Pinpoint probable cause
Intelligent Operations and Machine Learning: Metrics, Events, Dashboards, Logs
Environments: On Premises, Cloud, Hybrid, IoT, SDN & NFV
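As a rough illustration of warning on emerging performance anomalies, the sketch below flags metric samples that deviate strongly from a rolling baseline. It uses a plain rolling z-score, which is a simple statistical stand-in and not the machine learning approach of any specific product. The function name `find_anomalies` and the `window`, `threshold`, and `min_sigma` parameters are assumptions made for this example.

```python
from collections import deque
from statistics import mean, pstdev


def find_anomalies(values, window=20, threshold=3.0, min_sigma=1e-6):
    """Return indexes of samples more than `threshold` standard deviations
    away from the mean of the previous `window` normal samples."""
    history = deque(maxlen=window)
    anomalies = []
    for index, value in enumerate(values):
        if len(history) == window:
            baseline = mean(history)
            # Floor the deviation so a perfectly flat baseline does not divide by zero.
            sigma = max(pstdev(history), min_sigma)
            if abs(value - baseline) / sigma > threshold:
                anomalies.append(index)
                continue  # keep anomalous samples out of the learned baseline
        history.append(value)
    return anomalies
```

Compared with a fixed threshold, a baseline learned from recent data adapts to each metric's normal range, which is one way to reduce false positives; the window and threshold still need tuning per metric.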
13. Dynamic Topology
• Near Real-Time & Historical Topology & State
• View in context for DevOps, Operations, SME, and Business
• Multi-domain topology (IT, Network, Storage, App, Orchestration++)
• Ease of Integration with any topology source via Observers & APIs
• Rapid Time-to-value
• Cloud native technology
14. Collaboration and Automation
- Alert Notification
- Runbook Automation
• Notify the right people about an emerging Incident
• Account for team calendars and contact preferences (chat, email, SMS…)
• Escalate automatically
• Leverage collaboration tools
• Support maturity growth in Incident response:
  • Manual
  • Machine-assisted
  • Automatic
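The notification points above (right people, calendars, contact preferences, automatic escalation) can be sketched as a small planning function. Everything here is hypothetical: the `Responder` class, the `plan_notifications` function, the five-minute acknowledgement timeout, and the on-call hours are illustrative assumptions, not a description of any notification service's API.

```python
from dataclasses import dataclass


@dataclass
class Responder:
    name: str
    channel: str          # preferred contact channel: "chat", "email", "sms", ...
    on_call_hours: range  # hours of the day (0-23) when this person is on call


def plan_notifications(policy, hour, ack_timeout_min=5):
    """Return (delay_minutes, name, channel) steps for an incident raised at `hour`.

    `policy` is a list of escalation levels, each a list of responders.
    A level is skipped when nobody in it is on call; otherwise the next level
    is notified if the incident is still unacknowledged after the timeout.
    """
    steps = []
    delay = 0
    for level in policy:
        available = [r for r in level if hour in r.on_call_hours]
        if not available:
            continue  # calendar says nobody is reachable, go straight to the next level
        for responder in available:
            steps.append((delay, responder.name, responder.channel))
        delay += ack_timeout_min
    return steps


# Hypothetical escalation policy: operator first, senior operator second.
POLICY = [
    [Responder("operator", "chat", range(9, 17))],
    [Responder("senior-operator", "sms", range(0, 24))],
]

if __name__ == "__main__":
    for delay, name, channel in plan_notifications(POLICY, hour=10):
        print(f"+{delay} min: notify {name} via {channel}")
```

Keeping the policy as data rather than code means teams can adjust calendars and escalation levels without changing the routing logic.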
15. Collaborative Operations - Notification
Channels: Chat, push
Flow: Incident, Notification, Route (Policy / Calendar)
Roles: App owner, Operator
Speech bubbles: “My App is Unresponsive!”; “I’m getting a coffee.”
16. Collaborative Operations - Escalation
Channels: Chat, SMS, e-mail
Flow: Incident, Notification, Escalate, Re-route (Escalation Policy)
Roles: App owner, Operator, Snr Operator
Speech bubbles: “Looks like Brock is on it”; “Ouch. Olivia’s app is unresponsive”
18. Collaborative Operations - Remediation
Channels: Chat, SMS, e-mail
Flow: Incident, Notification, Escalate, Chat, Run Books (Run Rollback)
Roles: App owner, Operator, Snr Operator
Speech bubbles: “That’s done. Has the app recovered?”
20. Collaborative Operations – Efficiency Improvement
Channels: Chat, SMS, e-mail
Flow: Incident, Notification, Escalate, Chat, Run Books (Run Rollback, Run App Check), Define
Roles: App owner, Operator, Snr Operator
Speech bubbles: “I’ll automate this for next time”
21. Collaborative Operations - Automation
Incident Resolved
Run Books
Run rollback
Run App Check
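The progression from manual to machine-assisted to automatic incident response can be sketched with one runbook executed in three modes. This is an illustrative sketch only: `run_runbook`, the mode names, the `approve` callback, and the placeholder `rollback` and `app_check` actions are hypothetical and do not represent a runbook automation product's API.

```python
def run_runbook(steps, mode="manual", approve=None):
    """Walk through runbook steps and return a log of (step name, outcome).

    manual:           steps are only listed; the operator performs them.
    machine-assisted: each step runs only after `approve(step_name)` returns True.
    automatic:        steps run without human approval.
    Execution stops at the first failed or unapproved step.
    """
    log = []
    for name, action in steps:
        if mode == "manual":
            log.append((name, "waiting for operator"))
            continue
        if mode == "machine-assisted" and not (approve and approve(name)):
            log.append((name, "not approved"))
            break
        succeeded = bool(action())
        log.append((name, "ok" if succeeded else "failed"))
        if not succeeded:
            break
    return log


def rollback():
    # Placeholder: a real step would call the deployment tooling here.
    return True


def app_check():
    # Placeholder: a real step would probe the application's health here.
    return True


RUNBOOK = [("Run rollback", rollback), ("Run App Check", app_check)]

if __name__ == "__main__":
    for step, outcome in run_runbook(RUNBOOK, mode="automatic"):
        print(f"{step}: {outcome}")
```

Because the same step definitions serve all three modes, a team can start by following the runbook by hand and move to automatic execution once the steps have proven reliable.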
22. Sources of Additional Information
• Cloud Service Management and Operations (http://ibm.biz/csmo_arch)
• IBM Cloud IT Operations Management (https://www.ibm.com/cloud/hybrid-it-management/it-operations-management)
• IBM Cloud Application Management (https://www.ibm.com/cloud/hybrid-it-management/application-management)
• IBM Cloud Event Management (https://www.ibm.com/cloud/event-management)
• IBM Cloud Private (https://www.ibm.com/cloud/private)