© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. SUMMIT
Using AIOps to Reduce Incident Volume
Itai David Njanji
Senior Consultant, OI
AWS Professional Services
Session ID: OPS1
Paul Ferguson
Global Practice Manager
AWS Professional Services
Poll
How many of you are using Amazon CloudWatch?
How many of you are using third-party monitoring tools?
How many of you are using ITSM tools for Incident Management?
Agenda
• Introduction: Challenges of IT Ops
• Review ITIL Guidelines
• IT Ops to AIOps
• AIOps
• Summary
Data overload
External systems and applications: Gaming, IoT sensors, Devices, Web content
Logs, logs, and more logs …
Internal systems and applications: Databases, Servers, Networking, Storage
Challenges of IT Ops
• Drowning in tickets and noise
• More automation
• Time to detect and fix issues is too long
• Need to be more agile and roll out changes faster
Ticketing System (ITSM) in Operating Models
[Figure: three operating models shown side by side: Sustain “Traditional Ops” (Transitional), Grow “DevOps” (Strategic), and Optimize “CloudOps” (Strategic). Each is drawn on Operations/Engineering and Platform/Applications axes and shows where ITSM sits relative to the DevOps teams or Dev Team / COTS, Cloud Platform Engineering, and Cloud Operations (App Ops, Platform Ops).]
AIOps Operating Model
[Figure: Scale “AIOps” (Strategic) operating model on Operations/Engineering and Platform/Applications axes, showing DevOps teams, Cloud Platform Engineering, and ITSM, with AI and ML added.]
IT Processes
Core Operations Functions
• Platform Architecture & Governance
• Event & Incident Management
• Provisioning & Configuration Management
• Availability & Continuity Management
Security & Control Functions
• Change Management
• Resource Inventory Management
• Identity & Access Management
• Security Management
Business Management Functions
• Financial Management
• Capacity Planning & Forecasting
• Organizational Change Management
• Vendor Management
Supporting Functions
• Reporting & Analytics
• Continuous Improvement
• Application Lifecycle Management
Best practices for modern application development
• Enable experimentation by creating a culture of ownership
• Componentize applications using microservices
• Update applications and infrastructure quickly by automating the release pipeline
• Model and provision application resources using infrastructure as code
• Simplify infrastructure management with serverless technologies
• Improve application performance by increasing observability
• Secure the entire application lifecycle by automating security
DNA of Modern Apps
• Large Data Volume
• Connected (APIs etc.)
• Evolves quickly (CI/CD)
Increased Incident Volume
Incidents: AWS-ITSM Integration
• AWS Cloud: AWS Config (discovery) and Amazon CloudWatch (alarm) publish to an Amazon Simple Notification Service (SNS) topic
• The SNS topic sends an HTTPS notification to the ITSM tool
• ITSM tool: JavaScript processes the notifications and creates tickets or updates the CMDB
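The processing step on the ITSM side can be illustrated with a minimal sketch. The slide names JavaScript inside the ITSM tool; the version below uses Python purely for illustration and only maps an SNS HTTPS notification carrying a CloudWatch alarm to a ticket draft. The function name `sns_notification_to_ticket`, the ticket field names, and the urgency mapping are assumptions, not part of any ITSM product.

```python
import json


def sns_notification_to_ticket(body: str) -> dict:
    """Map an SNS HTTPS notification carrying a CloudWatch alarm to a ticket draft."""
    envelope = json.loads(body)
    if envelope.get("Type") != "Notification":
        # SubscriptionConfirmation and UnsubscribeConfirmation need separate handling.
        raise ValueError(f"unsupported SNS message type: {envelope.get('Type')}")

    # For CloudWatch alarms, the SNS "Message" field is itself a JSON string.
    alarm = json.loads(envelope["Message"])
    state = alarm.get("NewStateValue", "UNKNOWN")

    return {
        "short_description": f"{alarm.get('AlarmName', 'unknown alarm')} is {state}",
        "description": alarm.get("NewStateReason", ""),
        "source_topic": envelope.get("TopicArn", ""),
        "correlation_id": envelope.get("MessageId", ""),
        # Hypothetical urgency mapping, chosen only for this example.
        "urgency": "high" if state == "ALARM" else "low",
    }


# Sample payload with made-up identifiers.
sample_body = json.dumps({
    "Type": "Notification",
    "MessageId": "example-message-id",
    "TopicArn": "arn:aws:sns:us-east-1:123456789012:ops-alarms",
    "Message": json.dumps({
        "AlarmName": "HighCPU",
        "NewStateValue": "ALARM",
        "NewStateReason": "Threshold Crossed: 1 datapoint was greater than the threshold.",
    }),
})

ticket = sns_notification_to_ticket(sample_body)
print(ticket["short_description"])
```

A real endpoint would also need to confirm the SNS subscription and verify message signatures before trusting the payload; both are omitted here.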
Incident and Event Management
More Monitoring Tools
ITSM Data Overload
AIOps can reduce noise and increase insights:
• Alert Clustering
• Anomaly Detection
• Neural Feedback
ITSM Hygiene
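Alert clustering can be sketched in a few lines. The example below is one simple interpretation, assuming alerts are grouped when they hit the same resource within a short time gap; the `Alert` fields, the function name `cluster_by_time`, and the 120-second gap are illustrative choices, not a method described in the talk.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    timestamp: float  # seconds since some reference point
    resource: str
    message: str


def cluster_by_time(alerts, max_gap_seconds=120.0):
    """Group alerts on the same resource whose consecutive timestamps
    are at most max_gap_seconds apart."""
    clusters = []
    open_clusters = {}  # resource -> most recent cluster for that resource
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = open_clusters.get(alert.resource)
        if current and alert.timestamp - current[-1].timestamp <= max_gap_seconds:
            current.append(alert)
        else:
            current = [alert]
            clusters.append(current)
            open_clusters[alert.resource] = current
    return clusters


alerts = [
    Alert(0, "db-1", "cpu high"),
    Alert(30, "db-1", "cpu high"),
    Alert(45, "web-1", "5xx spike"),
    Alert(90, "db-1", "connections saturated"),
    Alert(600, "db-1", "cpu high"),
]

clusters = cluster_by_time(alerts)
print(len(alerts), "alerts ->", len(clusters), "candidate incidents")
```

With this sample, five alerts collapse into three candidate incidents, which is the kind of ticket reduction the slide refers to.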
Step 1. Set up for Observability
• Platform Logs
• Events
• Application Logs
• Code telemetry
• API and Users
• Infrastructure Logs
Amazon CloudWatch
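As a sketch of what code telemetry might look like, the example below builds a single log line in the CloudWatch embedded metric format, which lets a structured log entry double as a metric. The slide does not say which format to use, so this choice, the helper name `emf_record`, and the namespace, dimension, and metric names are all assumptions.

```python
import json
import time


def emf_record(namespace, service, metric_name, value, unit="Milliseconds"):
    """Build one embedded-metric-format log line as a JSON string."""
    return json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [["Service"]],
                "Metrics": [{"Name": metric_name, "Unit": unit}],
            }],
        },
        # Dimension and metric values live at the top level of the record.
        "Service": service,
        metric_name: value,
    })


line = emf_record("Demo/Checkout", "checkout-api", "Latency", 182.5)
print(line)
```

Printing such a line to standard output is enough in environments whose logs are already shipped to CloudWatch Logs; elsewhere an agent or SDK call would be needed.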
Step 2. Set up Delivery, Ingestion, Collection, and Storage
Amazon CloudWatch, Amazon Kinesis, Amazon EMR, Amazon Elasticsearch Service, Amazon Redshift, Amazon Simple Storage Service (S3)
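For the ingestion part, a small batching helper is often the first thing needed. The sketch below, assuming events are plain dictionaries, shapes them into entries suitable for the Kinesis Data Streams PutRecords call and splits them into batches of at most 500 records, the per-request limit. The helper name `to_kinesis_batches`, the `resource` partition field, and the stream name in the comment are hypothetical.

```python
import json

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def to_kinesis_batches(events, partition_field="resource"):
    """Yield lists of PutRecords-style entries, one list per request."""
    batch = []
    for event in events:
        batch.append({
            "Data": json.dumps(event).encode("utf-8"),
            "PartitionKey": str(event.get(partition_field, "unknown")),
        })
        if len(batch) == MAX_RECORDS_PER_CALL:
            yield batch
            batch = []
    if batch:
        yield batch


events = [{"resource": f"host-{i % 3}", "msg": "disk usage 91%"} for i in range(1200)]
batches = list(to_kinesis_batches(events))
print([len(b) for b in batches])

# Each batch could then be sent with boto3, for example:
#   kinesis.put_records(StreamName="ops-events", Records=batch)  # hypothetical stream name
```

The sketch does not check payload size limits or retry failed records, both of which a production producer would need.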
Step 3. Querying and Pattern Mining
Amazon CloudWatch, Amazon Athena, Amazon Kinesis Data Analytics, Amazon SageMaker
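Pattern mining covers many techniques; a rolling z-score is one of the simplest ways to flag an unusual data point. The sketch below is a stand-in for illustration and is not the algorithm used by any of the services listed. The window size, threshold, and sample latency values are assumptions.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Return indexes whose value deviates from the preceding window
    by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Flat history: treat any change as anomalous.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


latency_ms = [101, 99, 102, 100, 98, 101, 100, 240, 99, 102]
print(rolling_zscore_anomalies(latency_ms))  # -> [7]
```

Only the spike at index 7 is flagged. The points after it are not, because the spike inflates the standard deviation of the following windows, which is a known weakness of this simple approach.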
Step 4. Alerting, Notification, and Remediation
Amazon CloudWatch, Amazon Simple Notification Service, AWS Lambda
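The alarm-to-remediation path can be sketched as a Lambda-style handler that receives SNS-delivered CloudWatch alarms and decides what to do with each one. The `RUNBOOK` table, the action labels, and the fallback of opening a ticket are hypothetical; the handler only returns decisions and makes no AWS calls.

```python
import json

# Hypothetical runbook: alarm name prefix -> remediation action label.
RUNBOOK = {
    "HighCPU": "scale_out",
    "DiskFull": "rotate_logs",
}


def handler(event, context=None):
    """Lambda-style entry point for SNS-delivered CloudWatch alarms."""
    decisions = []
    for record in event.get("Records", []):
        alarm = json.loads(record["Sns"]["Message"])
        if alarm.get("NewStateValue") != "ALARM":
            continue  # ignore OK and INSUFFICIENT_DATA transitions
        name = alarm.get("AlarmName", "")
        action = next(
            (act for prefix, act in RUNBOOK.items() if name.startswith(prefix)),
            "open_ticket",  # no known remediation: hand over to a human
        )
        decisions.append({"alarm": name, "action": action})
    return decisions


def _record(name, state):
    """Build a minimal SNS record for local testing."""
    return {"Sns": {"Message": json.dumps({"AlarmName": name, "NewStateValue": state})}}


sample_event = {"Records": [
    _record("HighCPU-web", "ALARM"),
    _record("DiskFull-db", "OK"),
    _record("Latency-api", "ALARM"),
]}

decisions = handler(sample_event)
print(decisions)
```

Keeping the decision logic separate from the code that executes an action makes it easy to test locally and to log every automated decision.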
Step 5. Integrate with ITSM Tooling
AIOps
[Figure: AIOps feedback loop. Log and event inputs: Infrastructure Logs, Platform Logs, Application Logs, Code telemetry, Events. Other inputs shown: Planned events, Budgets, Threat intel. Stages: Patterns, tools & data ingestion; Predictive & Preventive Insights; New Pattern discovery; Continuous Learning. Outcomes: Ticket Reductions, Decisive Response, Timely Recovery.]
AWS Marketplace Solutions
• Turnkey solutions: less time to production
• Address people challenges: expert guidance
• Proven solutions and more …
Capabilities: Event Noise Filtering, Incident Detection, Behavior Feedback
Techniques: Entropy, Time Proximity, Logical Topology, Linguistic Proximity, Neural Feedback
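The slide only names entropy as a technique, so the following is one plausible reading rather than any vendor's algorithm: score each alert type by its surprisal, -log2(p), where p is the type's share of all alerts, and drop the types that carry little information because they fire constantly. The function names and the 2-bit cutoff are assumptions.

```python
import math
from collections import Counter


def surprisal_by_type(alert_types):
    """Return -log2(p) per alert type, where p is its share of all alerts."""
    counts = Counter(alert_types)
    total = len(alert_types)
    return {t: -math.log2(c / total) for t, c in counts.items()}


def filter_noise(alert_types, min_bits=2.0):
    """Keep only alerts whose type carries at least min_bits of information."""
    scores = surprisal_by_type(alert_types)
    return [t for t in alert_types if scores[t] >= min_bits]


stream = ["heartbeat_missed"] * 12 + ["disk_latency"] * 3 + ["replication_lag"]

kept = filter_noise(stream)
print(len(stream), "alerts ->", len(kept), "kept")
```

In this sample, the heartbeat alert makes up three quarters of the stream, scores about 0.4 bits, and is filtered out, while the rarer alert types are kept.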
Old Way vs. AIOps Way
Enterprise Event Correlation | Old Way             | AIOps
Approach                     | Rules               | Algorithms and automated workflow
Dependency                   | Human               | Human & Machine
Configuration Definition     | Rule Logic + Inputs | Outcomes
Configuration Amount         | 1000+ Rules         | < 10 Definitions
Configuration Time           | Days & Continuous   | Mins & One Time
Correlation Technique        | Hard Matching       | Fuzzy Matching
Machine Learning             | None                | Supervised & Unsupervised
Accuracy                     | 20%?                | 80% percentile +
Tolerates App/Infra Changes  | No                  | Yes
Maintenance Resources        | 2-3 people          | 1 part-time
Cost of Ownership            | High                | Low
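The difference between hard and fuzzy matching in the table can be shown with a short sketch. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure; the 0.8 threshold and the sample messages are assumptions, and real AIOps products are likely to use different techniques.

```python
from difflib import SequenceMatcher


def hard_match(a, b):
    """Rule-style correlation: messages must be identical."""
    return a == b


def fuzzy_match(a, b, threshold=0.8):
    """Similarity-based correlation: messages must be close enough."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


known = "Connection timeout to db-prod-01 on port 5432"
incoming = "Connection timeout to db-prod-02 on port 5432"
unrelated = "Disk usage above 90% on web-07"

print(hard_match(known, incoming))    # False: the host name differs by one character
print(fuzzy_match(known, incoming))   # True: the messages are nearly identical
print(fuzzy_match(known, unrelated))  # False
```

A hard-matching rule set would need one rule per host, which is how rule counts grow into the thousands; a similarity threshold covers the whole family of messages with a single definition.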
Thank you!
Itai David Njanji
injanji@amazon.com