Running Operations is not an easy job, especially these days. Ops teams have to ensure excellent user experiences, resolve incidents quickly and help developers stay productive. Yet at the same time, there is also the need to maintain systems security and keep downtime to a minimum.
While advances in cloud computing have helped address some of these challenges, many organizations find it difficult to leverage the cloud at scale because bottlenecks form around repetitive tasks, such as developers waiting for infrastructure to be provisioned. Even with access to abundant cloud resources, these speed bumps often make it difficult to achieve team objectives.
Join this talk to learn:
● How to safely delegate the management of your cloud deployment (to developers and other end users) with self-service operations.
● How to create powerful runbooks with guardrails that leverage existing scripting languages, infrastructure, and tools to remove bottlenecks that form around repetitive tasks.
● Strategies for getting started with self-service.
Self Service Cloud Operations: Safely Delegate the Management of your Cloud Operations with Rundeck
1. Shape Up
Skills Builder - September 4th, 2020
2. Before we begin...
● Attendee lines are muted for audio
quality
● Use Q&A in Zoom navigation
(see right)
● Questions answered at the end
● You’ll receive a link to the webinar
after it concludes
5. But let’s be real...
● Cost
● Architectural Complexity
● Tooling Sophistication
6. ‘Everything-as-Code’ has Consequences
Technical Expertise for the Cloud is Limited
● Public Cloud services are intended for those with
scripting and development skills.
● Tooling built for the cloud assumes familiarity with
the public-cloud services.
7. Status Quo:
● Operations and Cloud teams are
inundated with manual requests from
other teams (provisioning,
config-changes, data-snapshots).
● Interruptions prevent focus on high
value work.
9. Achieving Maximum Velocity
How can we maximize the “promises of the
cloud,” given the technical complexity and
disparities in cloud-acumen?
10. Case for Self-Service
How can we reduce the burden on Cloud experts
and empower developers, business-users and
other engineers?
Simplify and delegate!
11. Case for Self-Service
Self-Service Operations:
● Give developers, business-users and other
engineers the ability to utilize cloud resources
● Allow the Cloud experts to maintain a set of
standards and practices for accessing secure
internal operations.
12. Self-Service Runbook Automation
● Enable anyone to have self-service
access to operations tasks that were previously
available only to subject matter experts.
● Make existing automation more
secure, auditable, and easier to run.
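A minimal Python sketch of what "guardrails" around a delegated task could look like. It is an illustration only, not Rundeck's API: the `Runbook` class, the role names, and the `restart` example are all hypothetical. It shows three ideas from this slide: restricting who may run a task, restricting which inputs they may choose, and recording every attempt for audit.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class Runbook:
    """Hypothetical model of a delegated operations task with guardrails."""
    name: str
    allowed_roles: set[str]                 # who may invoke the task
    allowed_options: dict[str, set[str]]    # option name -> vetted values
    action: Callable[[dict[str, str]], str]  # the existing script or tool call
    audit_log: list[dict] = field(default_factory=list)

    def run(self, user: str, role: str, options: dict[str, str]) -> str:
        # Guardrail 1: only delegated roles may invoke the runbook.
        if role not in self.allowed_roles:
            self._record(user, options, "denied")
            raise PermissionError(f"role {role!r} may not run {self.name}")
        # Guardrail 2: every declared option must be supplied.
        missing = self.allowed_options.keys() - options.keys()
        if missing:
            self._record(user, options, "rejected")
            raise ValueError(f"missing options: {sorted(missing)}")
        # Guardrail 3: callers pick from values the experts have vetted.
        for key, value in options.items():
            if value not in self.allowed_options.get(key, set()):
                self._record(user, options, "rejected")
                raise ValueError(f"{key}={value!r} is not permitted")
        result = self.action(options)
        self._record(user, options, "succeeded")
        return result

    def _record(self, user: str, options: dict[str, str], outcome: str) -> None:
        # Every attempt is logged, including denied and rejected ones.
        self.audit_log.append({
            "runbook": self.name,
            "user": user,
            "options": dict(options),
            "outcome": outcome,
            "at": datetime.now(timezone.utc).isoformat(),
        })


# Hypothetical example: the action stands in for an existing restart script.
restart = Runbook(
    name="restart-service",
    allowed_roles={"developer", "ops"},
    allowed_options={"service": {"web", "worker"}, "env": {"staging"}},
    action=lambda o: f"restarted {o['service']} in {o['env']}",
)
```

With this sketch, `restart.run("dana", "developer", {"service": "web", "env": "staging"})` runs the action, while a request for an environment outside the vetted set raises `ValueError` and is still written to `audit_log`.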
13. Getting Started: What to consider.
Consider the following when beginning the Self-Service Operations journey:
● Manual and/or repetitive activities.
● Compatibility with existing tooling.
● Operational Frameworks: access, security, version-control.
● Usability across teams of varying expertise.
14. Automation Operational Maturity Model
Manual: Documented manual procedures.
● Customer struggles to document and update manual procedures into runbooks.
● Automation is only available and safe to run for the experts who built it.
● Request tickets and incidents almost always escalate to SMEs and developers.
Reactive: Automation in islands.
● Individuals and teams automate some of their own procedures with a variety of tools: build, provisioning, orchestration, maintenance.
● Automation is still only available and safe for experts.
● Tickets and incidents still escalate to SMEs and developers.
Responsive: Standardized and delegated automation.
● One or more teams delegate self-service operations through a runbook automation platform.
● Automated procedures are standardized, allowing less expert parties to safely use functions.
● Many recurring requests and incidents can be resolved through self-service invocations.
Proactive: Event-driven operations.
● Organizations further eliminate wait times and speed up operations with event-triggered automation.
● Common incidents resolve faster, often without human intervention.
● Frequent operations are automatically scheduled or invoked by business processes.
● Percentage of requests invoked through automation is also tracked against goals.
Reduction of Work and Toil
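The last item of the maturity model, tracking the percentage of requests invoked through automation against goals, comes down to simple arithmetic. A minimal Python sketch follows; the function names and the sample figures are hypothetical and are not taken from the talk.

```python
def automation_rate(automated: int, total: int) -> float:
    """Return the percentage of requests that were invoked through automation."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= automated <= total:
        raise ValueError("automated must be between 0 and total")
    return 100.0 * automated / total


def meets_goal(automated: int, total: int, goal_percent: float) -> bool:
    """Compare the measured automation rate with a target percentage."""
    return automation_rate(automated, total) >= goal_percent
```

For example, if 45 of 60 requests in a period were resolved through self-service or event-triggered jobs, `automation_rate(45, 60)` gives 75.0, which meets a goal of 70 percent but not a goal of 80 percent.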
17. Thank You!
See a demo of Rundeck
Enterprise at
rundeck.com/see-demo
18. Persona: Public Cloud Services
In the past, enterprise architects wandered the hallways looking for system development projects
that did not comply with platform and development standards.
The cloud makes it extremely easy to pick whatever type, brand, and function of a service you feel is
best of breed. Dev teams are encouraged to work fast with aggressive sprints, decoupled from other
dev teams and from any centralized IT compliance.
David Linthicum
Chief Cloud Strategy Officer, Deloitte
Consulting
20. 2021 Prediction for the Cloud:
“Worldwide end-user spending on
public cloud services is forecast to
grow 18.4%”
- Gartner Predictions for 2021
21. Complexity is the New Normal
Visual representation of
mid-size public SaaS
services
22. Status Quo:
● When business users need a Cloud
resource spun up, they fill out a ticket
and assign it to the Ops / Cloud team.
Diagram: Managing a Cloud Deployment at Scale (Biz User, Ops / Cloud Team)
23. Benefits of Self-Service
● Reducing TOIL activities
● Save time and money
● Reducing organizational silos
● Leverage current tooling for automation
24. Example Automation Tasks
Chart of example tasks plotted on two axes:
● Impact: from No-Impact (non-change action with no performance impact) to High-Impact (change action that could break things or impact performance).
● Sophistication: from Simple (single-step with no options) to Sophisticated (multi-step, multi-node workflow with input options, dependencies, and conditionals).
● Tasks shown: Healthchecks, Incident Enrichment, Diagnostics, Diagnostics (resource intensive), Fetch Logs, Performance Check, Simple Restart, Multi-Step Restart, Multi-Service Rolling Restart, Rollback and Redeploy, Failover, Config Change, Add/Remove Capacity, Emergency Firewall Change, Emergency Database Change.
25. Example Automation Tasks
Chart of example tasks plotted on two axes, with Crawl, Walk, and Run regions overlaid:
● Impact: from No-Impact (non-change action with no performance impact) to High-Impact (change action that could break things or impact performance).
● Sophistication: from Simple (single-step with no options) to Sophisticated (multi-step, multi-node workflow with input options, dependencies, and conditionals).
● Tasks shown: Healthchecks, Incident Enrichment, Diagnostics, Diagnostics (resource intensive), Fetch Logs, Performance Check, Simple Restart, Multi-Step Restart, Multi-Service Rolling Restart, Rollback and Redeploy, Failover, Config Change, Add/Remove Capacity, Emergency Firewall Change, Emergency Database Change.
27. Agenda
1 Status Quo for Operations
2 Case for Self-Service Operations
3 How to create runbooks that leverage existing scripting languages
4 Strategies for getting started
5 Demo