The document discusses the need for autonomous cloud management to reduce mean time to innovation and remediation by automating operations, deployment, monitoring, and quality using tools like Keptn. Keptn is a control plane that uses a declarative GitOps-based approach with standardized CloudEvents to define delivery and operations processes to enable continuous delivery and operations. It integrates with various tools to automate testing, deployment, monitoring and remediation through event-driven workflows.
3. 3
MTTI
Mean Time to Innovation
MTTR
Mean Time to Remediate
4.8 days
4 hours
~ 10min
12.5 days 2 days ~ 1 hour
The reality and evidence support the need for Autonomous Cloud Management (ACM)!
https://dynatrace.ai/acsurvey
Only < 5% are “Cloud Native”
4. 4
Increase Quality & Level of Automation
Increase Speed & Reduce Costs
Automation pillars:
• Automate Operations
• Automate Deployment
• Automate Monitoring
• Automate Quality
Building Blocks for ACM/Cloud Natives, Strategically Used as Pipeline Features:
• Automated Testing
• Continuous Performance
• Auto Quality Gates
• Feature Flagging
• Adaptive Scaling
• Auto Roll-Back
• Canary Releases
• Blue/Green Deployments
• Auto-Remediation
5. 5
That is why we are building Keptn
Because cloud-native delivery and operations are a BIG challenge for enterprises!
Cloud Native
6. 6
Which problem does Keptn solve?
CI/CD Pipeline
• This example: 350+ lines
• Information about
  • Target platform (k8s, …)
  • Environments (dev, hardening, …)
  • Tools (Terraform, Helm, hey, …)
  • Process (build, deploy, test, evaluate, …)
Pipelines seem to be becoming our future unmanageable legacy code!
8. 8
Which problem does Keptn solve?
Challenge
• Add hardening stage?
• Use different tool for deployment?
• Add notifications to all steps?
• Enforce manual approval before promoting to production for a period of time?
9. 9
How does Keptn solve that?
Keptn enables you to
• Define application delivery and operations processes declaratively
• Use predefined CloudEvents to separate the process from the tools
• Integrate and easily switch between different tools
13. 13
How does Keptn solve that?
Challenge → Challenge accepted
• Add hardening stage? → Add stage in Shipyard.
• Use different tool for deployment? → Switch tool in Uniform.
• Add notifications to all steps? → Add tool in Uniform on all events.
• Enforce manual approval before promoting to production for a period of time? → Change approval in Shipyard.
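The split between process (Shipyard) and tooling (Uniform) can be sketched in a few lines of Python. This is only an illustration, not Keptn's file format or API: the dictionary layout, the field names `name` and `approval`, and the helpers `add_stage` and `set_approval` are assumptions made for this example. The tool names are taken from the pipeline slide above.

```python
from copy import deepcopy

# Hypothetical in-memory model of a Shipyard: an ordered list of stages.
shipyard = {
    "stages": [
        {"name": "dev", "approval": "automatic"},
        {"name": "production", "approval": "automatic"},
    ]
}

# Hypothetical Uniform: which tool handles which task.
uniform = {"deploy": "helm", "test": "hey"}


def add_stage(shipyard, name, before, approval="automatic"):
    """Return a copy of the shipyard with a new stage inserted before `before`."""
    result = deepcopy(shipyard)
    names = [stage["name"] for stage in result["stages"]]
    index = names.index(before)  # raises ValueError if `before` is unknown
    result["stages"].insert(index, {"name": name, "approval": approval})
    return result


def set_approval(shipyard, stage_name, approval):
    """Return a copy of the shipyard with the approval mode of one stage changed."""
    result = deepcopy(shipyard)
    for stage in result["stages"]:
        if stage["name"] == stage_name:
            stage["approval"] = approval
    return result


# "Add hardening stage?" becomes a change to the process definition only.
hardened = add_stage(shipyard, "hardening", before="production")
# "Enforce manual approval?" is also a process change.
gated = set_approval(hardened, "production", "manual")
# "Use different tool for deployment?" touches only the tool mapping.
switched = {**uniform, "deploy": "terraform"}
```

None of the three changes touches the other definition, which is the point of keeping process and tools apart.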
16. 16
Continuous Delivery – Launch control
Launch operations are supervised and
controlled from several control rooms (also
known as firing rooms). The controllers are
in control of pre-launch checks, the booster
and spacecraft. Once the rocket has cleared
the launch tower (usually within the first
10–15 seconds), control is switched
over to the Mission Control Center.
17. 17
Continuous Operations – Mission Control
A mission control center (MCC, sometimes
called a flight control center or operations
center) is a facility that manages space flights,
usually from the point of launch until landing
or the end of the mission. It is part of
the ground segment of spacecraft operations.
A staff of flight controllers and other support
personnel monitor all aspects of the mission
using telemetry, and send commands to the
vehicle using ground stations.
18. 18
Quote: “We spend more time in manual communication than remediating issues”
Second: Keptn has Continuous Operations at its core!
Phases: ENGAGE → TRIAGE → FIND & ASSEMBLE → RESOLVE → RESTORE
Before: mostly manual (MANUAL COMMUNICATION between phases)
After: mostly automated (RESOLVE, RESTORE)
Axis: number of issues
19. 19
Mission Control
“Automated Operations” = Day 2 Ops
Launch Control
“Continuous Deployment” = Day 1 Ops
keptn accelerates building autonomous clouds
Event-driven runbook automation
Production problems can be automatically remediated in real time by executing runbooks that require no manual intervention.
Self-healing blue/green deployments
Deployments that follow the “Operations as Code” paradigm automatically remediate problems and get your deployment pipeline working again in under a minute.
Automated multistage unbreakable delivery pipelines
GitOps-enabled delivery pipelines with automated quality gates support automated testing and monitoring-as-a-service.
20. 20
Designed for modern applications
GitOps-based collaboration
All keptn workflows are based on the GitOps paradigm.
Operator patterns for all logic components
Logic components can be reused for other operational tasks.
Monitoring and operations as code
Developer-friendly definition of monitoring and operational tasks.
Built on and for Kubernetes
Built for modern cloud-native environments.
Event-driven and serverless
Powerful with a minimal resource footprint.
Pluggable tooling
All tools leveraged by keptn can be replaced based on your tool preferences.
21. 21
Example: Automated Operations
Files: SLI.yml, SLO.yml, REM.yml
Participants: Git provider, Monitoring provider
(1) Add operation instructions
(2) Store & version files
(3) Setup & configure monitoring
(4) Monitor services
(5) Detect issues based on SLO
(6) Alert Keptn
(7) Find remediation action
(8) Execute remediation action
(9) Receive monitoring feedback (success)
Scenario: CPU exhausted! → Scale up → Cope with load
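Steps (6) to (8) of this flow, from the alert to the executed remediation, can be sketched as a lookup followed by an action. This is a minimal illustration and not the actual REM.yml schema: the `REMEDIATIONS` table, the keys `action` and `replicas_delta`, the problem name `cpu_exhausted`, and the service name `my-service` are all assumptions made for this example.

```python
# Hypothetical stand-in for the contents of REM.yml: problem type -> action.
REMEDIATIONS = {
    "cpu_exhausted": {"action": "scale_up", "replicas_delta": 1},
}


def find_remediation(problem_type, remediations=REMEDIATIONS):
    """Step (7): look up the remediation action for a reported problem."""
    return remediations.get(problem_type)


def execute_remediation(service_state, remediation):
    """Step (8): apply the action to a simple service state and return the new state."""
    if remediation is None:
        # No known remediation: leave the state unchanged for manual handling.
        return service_state
    if remediation["action"] == "scale_up":
        new_replicas = service_state["replicas"] + remediation["replicas_delta"]
        return {**service_state, "replicas": new_replicas}
    raise ValueError(f"unsupported action: {remediation['action']}")


# Step (6): the monitoring provider alerts about an exhausted CPU.
state = {"service": "my-service", "replicas": 1}
new_state = execute_remediation(state, find_remediation("cpu_exhausted"))
```

Because the remediation table is plain data, it can be stored and versioned alongside the other files in step (2).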
25. 25
Roadmap
• Extend and collaborate on CloudEvent specification
  • Enable easy interoperability between common CNCF tools
• Add additional cloud-native practices (canary, feature-flag based self-healing)
• Enhance user interface and observability
  • Extend the UI and implement tracing that conforms to W3C Trace Context
• Build support for uniforms and Keptn’s wardrobe (service registry)
  • Integrate a number of tools and provide a public registry
• Improve self-healing and auto-remediation capabilities
  • Handle common problems out-of-the-box?
26. 26
MTTI
= Mean Time to Innovation
MTTR
Mean Time to Remediate
4.8 days
4 hours
~ 10min
12.5 days 2 days ~ 1 hour
We are building Keptn to re-shape this reality
GROW this number!
28. 28
Service categories: Config, ChatOps, IT Automation, Deploy, Test, Observe
Keptn core is fundamentally event-driven. Events as of 0.5.0:
• sh.keptn.internal.event.project.create
• sh.keptn.internal.event.service.create
• sh.keptn.event.configuration.change
• sh.keptn.event.monitoring.configure
• sh.keptn.events.deployment-finished
• sh.keptn.events.evaluation-done
• sh.keptn.events.tests-finished
• sh.keptn.events.problem
Service interaction patterns:
• Do something, then send an event
• Consume an event, then do something
• Consume an event, do something, then send an event
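The consume/do/send patterns can be illustrated with a small in-process event bus. This is a sketch of the idea only: Keptn distributes CloudEvents between services, whereas the `EventBus` class, its `subscribe` and `send` methods, the handler names, and the payload below are assumptions made for this example. Only the event type strings come from the list above.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical in-process stand-in for an event channel."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self.log = []  # event types in the order they were sent

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def send(self, event_type, data):
        event = {"type": event_type, "data": data}
        self.log.append(event_type)
        for handler in list(self._handlers[event_type]):
            handler(self, event)


notifications = []


def run_tests(bus, event):
    # Pattern: consume an event, do something, then send an event.
    bus.send("sh.keptn.events.tests-finished", event["data"])


def notify(bus, event):
    # Pattern: consume an event, then do something.
    notifications.append(event["type"])


bus = EventBus()
bus.subscribe("sh.keptn.events.deployment-finished", run_tests)
bus.subscribe("sh.keptn.events.tests-finished", notify)

# Pattern: do something (a deployment), then send an event.
bus.send("sh.keptn.events.deployment-finished", {"service": "my-service"})
```

The sender never refers to `run_tests` or `notify` directly; swapping a tool means changing a subscription, not the process.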
29. 29
Some example Keptn Contribution Use Cases
• Send new-artifact event when a new build artifact is generated
• Consume all or specific events for notification purposes
• Consume deployment-finished events to begin tests
• Consume configuration-changed events to execute automated tasks (e.g. Jenkins)
• Consume problem events to execute a remediation action (e.g. ServiceNow, XMatters, Ansible)
• Disregard events entirely and provide an additional source of metrics to Pitometer! (e.g. Prometheus)
30. 30
Example 1: JIRA Service
• Subscribes to:
  • sh.keptn.events.evaluation-done
• Creates JIRA ticket upon failed Pitometer deployment evaluation
• Ticket includes:
  • Failed deployment stage (e.g. Dev, Staging, Prod)
  • Failed service
  • Total Pitometer Score
  • Pitometer Score Pass Threshold
  • Pitometer Score Warning Threshold
  • Pitometer Indicator ID
  • Pitometer Evaluation Result
http://github.com/keptn-contrib/jira-service
1. Consume Event
2. Do Something
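A service of this kind boils down to a function that turns a failed evaluation-done event into ticket fields. The sketch below is not the code of the jira-service and calls no JIRA API: the payload keys (`passed`, `stage`, `service`, `score`, `pass_threshold`, `warning_threshold`) and the `build_ticket` helper are assumptions made for this example.

```python
EVALUATION_DONE = "sh.keptn.events.evaluation-done"


def build_ticket(event):
    """Return ticket fields for a failed evaluation, or None otherwise."""
    if event.get("type") != EVALUATION_DONE:
        return None  # not the event this service subscribes to
    data = event["data"]
    if data["passed"]:
        return None  # only failed evaluations create a ticket
    lines = [
        f"Stage: {data['stage']}",
        f"Service: {data['service']}",
        f"Total score: {data['score']}",
        f"Pass threshold: {data['pass_threshold']}",
        f"Warning threshold: {data['warning_threshold']}",
    ]
    return {
        "summary": f"Evaluation failed for {data['service']} in {data['stage']}",
        "description": "\n".join(lines),
    }


# Hypothetical event payload for illustration.
failed = {
    "type": EVALUATION_DONE,
    "data": {
        "passed": False,
        "stage": "staging",
        "service": "my-service",
        "score": 40,
        "pass_threshold": 90,
        "warning_threshold": 75,
    },
}
ticket = build_ticket(failed)
```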
31. 31
Example 2: Neotys
• Subscribes to:
  • sh.keptn.events.deployment-finished
• Publishes to:
  • sh.keptn.events.tests-finished
• Executes a NeoLoad test stored in source control alongside application source
• Consumes test-strategy from Shipyard file
• Pitometer NeoLoad source allows use of NeoLoad performance test results for build validation in Pitometer
https://github.com/keptn-contrib/neoload-service
https://github.com/neotyskeptn/pitometer-source-neoload
1. Consume Event
2. Do Something
3. Send Event
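The consume, do, send pattern of a testing service can be sketched as one handler that takes a deployment-finished event and returns a tests-finished event. This is an illustration, not the neoload-service implementation: the payload keys (`service`, `stage`, `test_strategy`, `tests_passed`), the `handle_event` function, and the `fake_load_test` stand-in are assumptions made for this example, and no NeoLoad API is called.

```python
DEPLOYMENT_FINISHED = "sh.keptn.events.deployment-finished"
TESTS_FINISHED = "sh.keptn.events.tests-finished"


def handle_event(event, run_test):
    """Consume deployment-finished, run the test, return a tests-finished event."""
    if event["type"] != DEPLOYMENT_FINISHED:
        return None  # ignore events this service does not subscribe to
    data = event["data"]
    passed = run_test(data["service"], data["test_strategy"])
    return {"type": TESTS_FINISHED, "data": {**data, "tests_passed": passed}}


def fake_load_test(service, strategy):
    # Hypothetical stand-in for launching a real load test.
    return strategy in ("functional", "performance")


incoming = {
    "type": DEPLOYMENT_FINISHED,
    "data": {"service": "my-service", "stage": "staging", "test_strategy": "performance"},
}
outgoing = handle_event(incoming, fake_load_test)
```

Passing the test runner in as a parameter keeps the event handling independent of the test tool.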
32. 32
Example 3: UFO Service
• Subscribes to:
  • sh.keptn.events.new-artifact
  • sh.keptn.events.deployment-finished
  • sh.keptn.events.tests-finished
  • sh.keptn.events.evaluation-done
• LED colors by event:
  • new-artifact event = blue LEDs
  • deployment-finished event = purple LEDs
  • evaluation-done Pass = green LEDs
  • evaluation-done Fail = red LEDs
https://github.com/keptn-contrib/ufo-service
1. Consume Event
2. Do Something
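The mapping from events to LED colors is a plain lookup, with the evaluation result deciding between green and red. The sketch below is not the ufo-service code: the `led_color` function and the `passed` payload key are assumptions made for this example. No color is listed above for tests-finished, so the sketch returns `None` for it.

```python
def led_color(event):
    """Return the LED color for an event, or None if no color is defined."""
    event_type = event["type"]
    if event_type == "sh.keptn.events.new-artifact":
        return "blue"
    if event_type == "sh.keptn.events.deployment-finished":
        return "purple"
    if event_type == "sh.keptn.events.evaluation-done":
        # `passed` is a hypothetical payload key for the evaluation result.
        return "green" if event["data"]["passed"] else "red"
    return None  # e.g. tests-finished: no color is listed for it


colors = [
    led_color({"type": "sh.keptn.events.new-artifact", "data": {}}),
    led_color({"type": "sh.keptn.events.deployment-finished", "data": {}}),
    led_color({"type": "sh.keptn.events.evaluation-done", "data": {"passed": True}}),
    led_color({"type": "sh.keptn.events.evaluation-done", "data": {"passed": False}}),
]
```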
33. 33
Relevant events by provider use-case:
• Testing Tools/Services:
  • Subscribe to sh.keptn.events.deployment-finished
  • Pitometer source for test results
  • Consumes test-strategy from Shipyard file
• Monitoring Tools/Services:
  • Pitometer source
• Continuous Integration or Build Tools/Services:
  • Publish to sh.keptn.event.configuration.change
• ChatOps Tools/Services:
  • Subscribe to ALL events
• Automation Tools/Services:
  • Subscribe to sh.keptn.events.problem to remediate a problem
  • Subscribe to sh.keptn.event.configuration.change to execute additional deployment tasks