Meetup 27/6/2018: AIOps to support the challenges of a smart city
1. AI Ops for Interconnected Cities and IoT
Digipolis Meetup – 27 June 2018
Kristof Renders
Global Architect Dynatrace Services
3. On average, a single transaction uses 82 different types of technology
Browser
Multi-geo
Mobile Network
Code
Hosts
Logs
Sensors
Actors
3rd parties
Services
Cloud SDN
Containers
Application complexity
Gateways
Edge
7. Mobile
Application
Code
Database
Network
Container
Micro-service
Browser
Synthetic
Server
Mainframe
Log & Events
API
Cloud
high fidelity, full stack data
All transactions, all the time
Connected end-to-end
PurePath + Smartscape
Real-time dependency detection
Auto instrumentation
Automated workflows
Automated problem detection
Automated root cause analysis
Causation gives answers
Massive Automation
Automate the effort
Natural language interface
Automated business impact
OneAgent
Advanced analytics
Expert knowledge built-in
Self learning A.I.
Better data makes Dynatrace artificial intelligence and automation possible
9. Multi tier IoT
Data aggregation
Measurement
Control
Processing
Analytics
Machine Learning
Analytics
Machine Learning
Management
Archive
Integration
Business Logic
Sensing
Acting
Connecting physical and virtual world
Sensors
Actuators
Gateways
Access Points
Sensor Hubs
Edge IT
11. Edge devices
Layers: Edge devices, IoT Gateway, Infrastructure, Compute, Platform, Applications, Solutions
Cloud platforms: MS Azure, Cloud Foundry, IBM, SAP, AWS, OpenShift, Google
IoT Platform: Azure IoT, AWS IoT, Bosch IoT, GE Predix, Thingworx, IBM Watson, Google IoT Core, SAP Leonardo, C3 IoT
Monitoring: Application Monitoring, (Cloud) Platform Monitoring, Application and Device Monitoring, Business Monitoring, Communication & Transaction Monitoring
Solution areas: Logistics, Transportation, Fleet Management, Energy, Power Supply, Smart Metering, Fault Detection, Industry 4.0, Manufacturing, Predictive Maintenance, Smart City, Traffic & Lighting Control, Smart Home, Home Automation, Entertainment, Connected Car, Automotive, Healthcare, Patient & Hospital Management
38. Myth vs. Reality
Myth: “If you have an AI tool analyzing data, it will produce great output on any input.”
Reality: “Input data and algorithms matter. AI cannot find what is not in the data. Better input provides better output.”
39. AIOps
“AIOps platforms combine big data and machine learning functionality to
support IT operations.” – Gartner
understand dependencies in complex environments
reduce incident noise
find and resolve issues faster
40. Most AIOps tools are failing!
AI used to learn about dependencies
Too slow for modern environments
No root cause information
41. Why is Dynatrace different?
better data quality
stronger context and semantics
smarter algorithms
44. confidential
[Diagram] Reality (A: Mobile app, B: Service, C: Container) as seen by two approaches, each with a Machine Learning Algorithm & Model:
• 2nd Gen APM + Monitoring, AIOps metric based
• 2nd Gen APM + Monitoring, AIOps event based
45. [Diagram] Reality (A: Mobile app, B: Service, C: Container) as seen by three approaches, each with a Machine Learning Algorithm & Model:
• 2nd Gen APM + Monitoring, AIOps metric based
• 2nd Gen APM + Monitoring, AIOps event based
• Dynatrace AI based monitoring
Data shown: Logs, User sessions, Metrics, PurePath, TCP Connections, Events
46. [Diagram] Reality (A: Mobile app, B: Service, C: Container) as seen by three approaches, each with a Machine Learning Algorithm & Model:
• 2nd Gen APM + Monitoring, AIOps metric based
• 2nd Gen APM + Monitoring, AIOps event based
• Dynatrace AI based monitoring
Data shown: Logs, User sessions, Metrics, PurePath, TCP Connections, Events
New element in this build: D
47. Dynatrace AI leverages external data
Reality Dynatrace AI based monitoring
Your extensions
3rd party solution
Dynatrace API
Your DevOps teams
Dynatrace Partners
Dynatrace Services
49. Scaling to very large problems
820 billion dependencies
Network Problem
Mushroom cloud effect
50. [Diagram] Reality (A: Mobile app, B: Service, C: Container) as seen by three approaches, each with a Machine Learning Algorithm & Model:
• 2nd Gen APM + Monitoring, AIOps metric based
• 2nd Gen APM + Monitoring, AIOps event based
• Dynatrace AI based monitoring
Data shown: Logs, User sessions, Metrics, PurePath, TCP Connections, Events
Element added in the previous build: D
Value compared per approach: Root cause, Business impact, Dependency detection, Event noise reduction
51. better data quality
stronger context and semantics
smarter algorithms
real time
business impact & root cause detection
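The link between dependency context and root cause detection can be pictured with a toy example. The sketch below is only an illustration under stated assumptions: it is not Dynatrace's algorithm, and the `DEPENDS_ON` map and both function names are hypothetical. It treats an unhealthy entity as a root-cause candidate when none of its own dependencies are unhealthy, and it walks the dependency map upward to list what a given cause impacts.

```python
from collections import defaultdict

# Hypothetical topology: each entity lists the entities it depends on.
DEPENDS_ON = {
    "A: Mobile app": ["B: Service"],
    "B: Service": ["C: Container"],
    "C: Container": [],
}

def root_causes(depends_on, unhealthy):
    """Return unhealthy entities none of whose dependencies are unhealthy."""
    unhealthy = set(unhealthy)
    return sorted(
        entity for entity in unhealthy
        if not any(dep in unhealthy for dep in depends_on.get(entity, []))
    )

def impacted(depends_on, cause):
    """Return every entity that depends, directly or indirectly, on `cause`."""
    dependents = defaultdict(set)
    for entity, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(entity)
    seen, stack = set(), [cause]
    while stack:
        for parent in dependents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sorted(seen)

# Three alerts fire at once; the dependency context collapses them to one cause.
alerts = ["A: Mobile app", "B: Service", "C: Container"]
print(root_causes(DEPENDS_ON, alerts))
print(impacted(DEPENDS_ON, "C: Container"))
```

Without the dependency map, the same three alerts would remain three unrelated events, which is the noise problem the earlier slides describe.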
54. Mission: Bringing the Dynatrace culture to the world
[Chart] 2011 vs 2017 for four metrics: releases per year, production bugs reported by customers, AWS instances, daily deployments
Values shown: 7%, 100%, 26, 2, 450, 5, 0, 500
60. The Journey towards Autonomous Clouds
Tool Consolidation, PaaS Migration, Unbreakable Pipeline
Information Democracy
Runbook Automation
Self Protecting
Biz Automation
Biz Impact
Security Detection
Virtual Operations
Free up resources
Enable self-driving
Implement safety nets
Change culture
Automation
Self Healing
61. 4 key initiatives
Enterprise cloud: enable innovation w/ break, shift, re-platform!
Shift-left: engage Dev with earlier & automated feedback
Shift-right: empower Ops to deliver to users with lower risk
Self-heal: transform from traditional to zero-dashboard NOC
62. Shift-Right: Bringing it faster to your customer in a safer way!
64. Shift-right: empower Ops to deliver to users with lower risk
Shift-right: perfecting the management of a new, and complex, IT landscape
Mark Kaplan, BARBRI
• Automatic discovery
• Automatic baselining
• Automatic problem detection
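To make "automatic baselining" and "automatic problem detection" concrete, here is a minimal sketch. It is a simple stand-in, not Dynatrace's baselining method: the function name, the window of 5 samples, the 3-sigma threshold and the 5% floor are all assumptions chosen for illustration. The baseline for each point is the mean of the preceding window, and a point is flagged when it leaves the tolerated band around that baseline.

```python
from statistics import mean, stdev

def detect_anomalies(values, window=5, k=3.0):
    """Flag indexes whose value leaves mean +/- k*stdev of the previous `window` points."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # A small floor keeps a perfectly flat history from flagging tiny jitter.
        tolerance = max(k * sigma, 0.05 * abs(mu))
        if abs(values[i] - mu) > tolerance:
            anomalies.append(i)
    return anomalies

# Hypothetical response times in milliseconds with one obvious spike.
response_ms = [100, 102, 98, 101, 99, 100, 103, 250, 101, 100]
print(detect_anomalies(response_ms))  # prints [7]
```

Because the baseline is learned from the data itself, no static threshold has to be configured per service, which is the point of the "automatic" bullets above.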
65. Shift-right: empower Ops to deliver to users with lower risk
Upping agility in Ops: web operations transformation
Nestor Antonio Zapata, Citrix
• Enhanced communication and high customer satisfaction
• Improved ticket SLAs, faster MTTR, fewer outages
• Better work-life balance – more time for the good stuff!
• Stopped starting and started finishing
• More efficient processes including emergency response
66. Shift-right: empower Ops to deliver to users with lower risk
Evolving and sharpening performance engineering - on-premise cloud
Ramachandran, REI
• Integrate Dynatrace into CI/CD
• Integrate Feedback into Slack
• Monitor Cloud & Containers
67. Shift-right: empower Ops to deliver to users with lower risk
Dynatrace Automation: API, CLI, Auto-Detection
docker run -e DT_TAGS=BLUE
dtcli tag srv FinanceService BLUE
dtcli evt push host .*demo version=123 source={code_deploy}
dtcli evt push pg tomcat1 desc=JVMMemIncr hint=+100MB
Dynatrace Smartscape
Release Automation
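The `dtcli evt push` lines above attach deployment information to monitored entities from a release pipeline. As a rough sketch of what an equivalent REST call might look like, the snippet below builds, but does not send, a deployment event. The endpoint path, the `Api-Token` header and the payload field names are assumptions modeled on the Dynatrace Events API (v1) and should be checked against the current API documentation; the host `example.invalid`, the token value and both function names are placeholders.

```python
import json
import urllib.request

def build_deployment_event(tag, version, source):
    """Build a deployment-event body targeting services that carry `tag`."""
    return {
        "eventType": "CUSTOM_DEPLOYMENT",  # assumed event type name
        "source": source,
        "deploymentName": f"Deploy {version}",
        "deploymentVersion": version,
        "attachRules": {
            "tagRule": [{
                "meTypes": ["SERVICE"],
                "tags": [{"context": "CONTEXTLESS", "key": tag}],
            }]
        },
    }

def build_request(base_url, api_token, event):
    """Wrap the event in an HTTP POST; nothing is sent until urlopen() is called."""
    return urllib.request.Request(
        url=f"{base_url}/api/v1/events",  # assumed endpoint path
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Authorization": f"Api-Token {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

event = build_deployment_event(tag="BLUE", version="123", source="code_deploy")
request = build_request("https://example.invalid", "PLACEHOLDER_TOKEN", event)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without a Dynatrace environment.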
72. … research says ....
In what functional area do you feel you are understaffed the most?
Customer Service: 2%
Merchandising: 3%
Operations: 4%
Project management: 4%
Product management: 12%
Usability: 12%
Marketing: 12%
Business Analysis: 20%
IT/development/infrastructure: 24%
Other: 7%
Base: 64 eBusiness and channel strategy professionals
(percentages do not total 100 because of rounding)
Source: Forrester’s Q4 2015 Global eBusiness And Channel Strategy Professional Online Survey
74. The standup
How are we doing from an operational perspective?
Do I need to adjust the priorities of the team?
Anything important for my team?
75. The board meeting
Is our digital business growing?
Are our digital processes working well for customers?
Are we meeting availability, technical requirements and costs?
Are customers adopting new functionalities as expected?
77.
• Dynatrace problem notification
• Dynatrace-detected problems are mapped to CIs
• Smartscape acts as data source for Service Map
• Dynatrace feeds the ServiceNow ITOM module
82. Self-healing: Smart auto-remediation vs reactive War Rooms
83. Self-heal: path to NoOps with smart auto-remediation
Auto Mitigate!
1 CPU Exhausted? Add a new service instance!
2 High Garbage Collection? Adjust/Revert Memory Settings!
3 Issue with BLUE only? Switch back to GREEN!
4 Hung threads? Restart Service!
5 Still ongoing? Initiate Rollback!
Impact Mitigated?
Mark Bad Commits
Update Dev Tickets
…
Still ongoing?
Escalate
Escalate at 2AM?
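The numbered steps above form an escalation ladder: try the cheapest fix first, re-check the impact, and only involve a human when nothing helps. A minimal sketch of that control flow follows. It is a simplification under stated assumptions: the per-step symptom checks (CPU, garbage collection, BLUE only, hung threads) are omitted, and the function `remediate`, the action names and the simulated `state` are hypothetical.

```python
def remediate(problem_ongoing, actions, escalate):
    """Run actions in order until the problem clears; escalate if none helps.

    problem_ongoing: callable returning True while the impact persists.
    actions: list of (name, callable) pairs tried in order.
    escalate: callable invoked when the impact persists after the ladder ends.
    Returns the names of the actions that were executed.
    """
    executed = []
    for name, action in actions:
        if not problem_ongoing():
            break
        action()
        executed.append(name)
    if problem_ongoing():
        escalate()
    return executed

# Simulated environment: only switching back to GREEN clears the impact.
state = {"healthy": False, "escalated": False}

def no_effect():
    pass

def switch_to_green():
    state["healthy"] = True

actions = [
    ("add service instance", no_effect),
    ("revert memory settings", no_effect),
    ("switch BLUE back to GREEN", switch_to_green),
    ("restart service", no_effect),
    ("initiate rollback", no_effect),
]

executed = remediate(
    problem_ongoing=lambda: not state["healthy"],
    actions=actions,
    escalate=lambda: state.update(escalated=True),
)
print(executed)
print(state["escalated"])
```

In this run the ladder stops after the third step and nobody is paged; if every step had failed, `escalate` would fire once at the end.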