Clearing the Way
For SRE in the Enterprise
Damon Edwards
@damonedwards
Community
Ops Improvement
DevOps
Ops Tools
Damon Edwards
Digital
Agile
DevOps
CI/CD
Cloud
Docker
Kubernetes
Microservices
CHANGE
Wow
That is cool
I wish I could
work there
Ops Business
Idea
Shorter Time-to-Market
Fast Feedback
from Users
Dev Ops
Running
Services
Improved Quality
Digital and DevOps
Availability Auditing
Security Compliance
"Go faster!"
“Open up!”
“Lock it down!”
“Great for Dev, but what about Ops?”
Our transformation has largely
ignored Ops. Any ideas?
Have you heard of SRE?
Google does it.
Jane Doe
Systems Administrator
We have
SysAdmins
Jane Doe
Systems Administrator
They should be
SREs!
Jane Doe
SRE
ITIL Book 1
ITIL Book 2
ITIL Book 3
ITIL Book 4
ITIL Book 5
Quality!
is job
#1
Sys
Admin
CAB CALENDAR
Your new title is SRE.
Now write code and be better at ops.
PROVISIONING PROCESS
Dilbert characters © Scott Adams www.dilbert.com
SysAdmins
Overloaded. Constant
firefighting.
Waiting in ticket queues
for everything.
Things break. Break
again. And again.
Everyone is busy, but it
doesn’t get any better.
Our transformation has largely
ignored Ops. Any ideas?
Have you heard of SRE?
Google does it.
Everything takes too
long, costs too much,
and breaks too often!
Executive
View
(False) SRE
Overloaded. Constant
firefighting.
Waiting in ticket queues
for everything.
Things break. Break
again. And again.
Everyone is busy, but it
doesn’t get any better.
Our transformation has largely
ignored Ops. Any ideas?
Have you heard of SRE?
Google does it.
Everything takes too
long, costs too much,
and breaks too often!
Executive
View
Changing job titles or adding individual skills
doesn’t make systems administrators SREs.
Principles of SRE are what set SRE apart
1. SRE needs Service Level Objectives, with consequences
SLO and Error Budgets: Tools for Shared Responsibility
0
100
Service Level Objective
Error Budget*
Service Level Indicator
(*Use this to improve the service)
DEV
BIZ
Ops
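The arithmetic behind an error budget can be sketched in a few lines. This is a minimal illustration, not taken from the talk: the function name `error_budget_report`, the request-based SLI, and the sample numbers are all assumptions made for the example.

```python
def error_budget_report(slo_percent, total_requests, failed_requests):
    """Compare a measured SLI against an SLO and report the remaining error budget.

    Assumption: the SLI is the percentage of successful requests in a window.
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # Service Level Indicator: what was actually measured.
    sli_percent = 100.0 * (total_requests - failed_requests) / total_requests
    # Error budget: the failures the SLO tolerates in this window.
    allowed_failures = total_requests * (100.0 - slo_percent) / 100.0
    budget_remaining = allowed_failures - failed_requests
    return {
        "sli_percent": sli_percent,
        "allowed_failures": allowed_failures,
        "budget_remaining": budget_remaining,
        "budget_exhausted": budget_remaining < 0,
    }


# Hypothetical window: 99.9% SLO, one million requests, 400 failures.
report = error_budget_report(slo_percent=99.9, total_requests=1_000_000, failed_requests=400)
print(report)
```

A negative `budget_remaining` is the point at which the agreed consequences would apply. What those consequences are is for Dev, Biz, and Ops to decide together.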
Principles of SRE are what set SRE apart
Stephen Thorne
At DevOps Enterprise Summit
London 2018
“Principles of SRE”
https://youtu.be/c-w_GYvi0eA
1. SRE needs Service Level Objectives, with consequences
2. SREs have time to make tomorrow better than today
3. SRE teams have the ability to regulate their workload
Forces That Undermine SRE Principles
Silos Queues
Excessive Toil Low Trust
Silos cause disconnects and mismatches
Silo A and Silo B each have their own Backlog, Information, Priorities, and Tools
One silo: “I need X”. The other silo: “I do X”, fed by Requests for X
Mismatched between the silos: Context, Process, Tooling, Capacity
Silos interfere with feedback loops
Producer Consumer
Silos create labor pools of functional specialists
Ops: Function A, Function B, Function C
Requests fulfilled by semi-manual or manual effort
Primary management focus is on protecting team capacity
Silos Undermine SRE Principles
1. Org has Service Level Objectives, with consequences?
X Disjointed silos make meaningful SLOs and shared responsibility almost impossible
2. SREs have time to make tomorrow better than today?
X Siloed labor pools, disconnected processes and tools, and slow feedback loops tend to consume all available capacity
3. SRE teams have the ability to regulate their workload?
X Struggling to keep up with demand and unable to protect capacity
Forces That Undermine SRE Principles
Silos Queues
Toil Low Trust
How do we cover for our cross-silo disconnects and mismatches?
Silo A Silo B
Ticket
Queue
??
Silo A Silo B
We all know how well that works
Ticket
Queue
Request queues are an expensive way to manage work
Ticket
Queue
Queues Create…
Longer Cycle Time
Increased Risk
More Variability
More Overhead
Lower Quality
Less Motivation
Adapted from Donald G. Reinertsen, The Principles of Product Development Flow: Second Generation Lean Product Development
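One way to see why queues lengthen cycle time is the standard M/M/1 queueing result: mean time spent waiting in the queue equals utilization / (1 - utilization) times the mean service time. The sketch below assumes that simplified model (random arrivals, a single server). The function name and the utilization values are illustrative, and real ticket queues will not follow the model exactly.

```python
def mean_queue_wait(utilization, mean_service_time):
    """Mean time a request waits in an M/M/1 queue before work on it starts.

    utilization: fraction of capacity in use, 0 <= utilization < 1.
    mean_service_time: average hands-on time per request, in any time unit.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1.0 - utilization) * mean_service_time


# Waiting time grows sharply as the team fulfilling requests gets busier.
for u in (0.5, 0.8, 0.9, 0.95):
    wait = mean_queue_wait(u, mean_service_time=1.0)
    print(f"utilization {u:.0%}: wait is about {wait:.1f}x the service time")
```

Under this model a fully loaded team is exactly the team whose queue makes everyone else wait longest, which is consistent with the costs listed above.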
What do queues do to value streams?
Queue
A
Queue
B
Queues disintegrate and
obfuscate value streams
Ticket queues become “snowflake makers”
??
Silo A Silo B
Ticket
Queue
Snowflakes
(each unique, technically acceptable but unreproducible and brittle)
Ticket Queues Undermine SRE Principles
1. Org has Service Level Objectives, with consequences?
X Tickets reinforce siloed behaviors and obfuscate the value stream
2. SREs have time to make tomorrow better than today?
X Longer cycle time, more variability, more overhead, lower quality, and more snowflakes consume available capacity
3. SRE teams have the ability to regulate their workload?
X Queues obfuscate the pressure being put on request fulfillers
Forces That Undermine Operations
Silos Queues
Toil Low Trust
Toil is the enemy of SRE
“Toil is the kind of work tied to running a production
service that tends to be manual, repetitive,
automatable, tactical, devoid of enduring value, and
that scales linearly as a service grows.”
- Vivek Rau, Google
Toil vs. Engineering Work
Toil | Engineering Work
Lacks Enduring Value | Builds Enduring Value
Rote, Repetitive | Creative, Iterative
Tactical | Strategic
Increases With Scale | Enables Scaling
Can Be Automated | Requires Human Creativity
Excessive toil prevents fixing the system
Toil at manageable percentage of capacity: Engineering Work (E.W.) is left over to reduce toil and improve the business
Toil at unmanageable percentage of capacity (“Engineering Bankruptcy”): no capacity to reduce toil, no capacity to improve business
Excessive Toil Undermines SRE Principles
1. Org has Service Level Objectives, with consequences?
X Being buried in toil keeps the team from contributing the engineering work needed to uphold their end of the shared responsibility deal
2. SREs have time to make tomorrow better than today?
X Buried in toil… no capacity for engineering work to reduce toil.
3. SRE teams have the ability to regulate their workload?
X Buried in toil… no capacity for engineering work to reduce toil.
Forces That Undermine Operations
Silos Queues
Toil Low Trust
Where are decisions made? Who can take action?
1° → escalate → 2° → escalate → 3° → escalate → 4°
Decisions made here
All work is contextual
rm -rf $PATHNAME
Is this dangerous?
Answer is always “it depends”
John Allspaw
Where are decisions made? Who can take action?
1° → escalate → 2° → escalate → 3° → escalate → 4°
Context
Low trust + approvals = illusion of control
Ticket
System
Add up the total number of approval requests and
…subtract the info radiators (“I need to be in the loop”)
…subtract the CYAs (“Prove you followed the process”)
…subtract the too removed to judge (“mostly guessing”)
How many are you left with?
How many were the right call?
How many got rejected?
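The subtraction exercise above can be run against an export from a ticket system. The sketch below is only an illustration: the category labels, the `approval_summary` function, and the sample counts are invented for the example, and classifying each approval into a category is a manual judgment the code does not make for you.

```python
from collections import Counter

# Hypothetical labels for approvals that are not real decisions.
NOT_REAL_DECISIONS = {"info_radiator", "cya", "too_removed"}


def approval_summary(approvals):
    """Summarize (category, outcome) pairs using the subtraction exercise.

    approvals: iterable of (category, outcome), outcome is "approved" or "rejected".
    """
    approvals = list(approvals)
    by_category = Counter(category for category, _ in approvals)
    # What remains after subtracting info radiators, CYAs, and guesses.
    remaining = [outcome for category, outcome in approvals
                 if category not in NOT_REAL_DECISIONS]
    return {
        "total": len(approvals),
        "subtracted": {c: by_category[c] for c in sorted(NOT_REAL_DECISIONS)},
        "left_with": len(remaining),
        "rejected": sum(1 for outcome in remaining if outcome == "rejected"),
    }


# Made-up sample data, for illustration only.
sample = (
    [("info_radiator", "approved")] * 40
    + [("cya", "approved")] * 30
    + [("too_removed", "approved")] * 20
    + [("informed_judgment", "approved")] * 9
    + [("informed_judgment", "rejected")] * 1
)
summary = approval_summary(sample)
print(summary)
```

“How many were the right call?” cannot be computed from the approvals alone. It needs a later review of what happened after each decision.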
Low Trust Undermines SRE Principles
1. Org has Service Level Objectives, with consequences?
X Cultures of low trust have a really difficult time with shared responsibility
2. SREs have time to make tomorrow better than today?
X People closest to problems know what to fix, but tasking, priorities, and decisions are largely out of their control
3. SRE teams have the ability to regulate their workload?
X People aren’t trusted to plan or design their own work
Forces That Undermine Operations
Silos Queues
Toil Low Trust
So what can we do differently?
Lean on Lean to find what to fix
1. Map the end-to-end flow of information and artifacts (using a recent delivery or event)
2. Identify what slows lead times, undermines quality, and impacts flow
3. Identify countermeasures and create improvement storyboards (justification/plan)
All processes should be studied with an improvement discipline
Incidents are just as much a “process” as delivery
Look to Lean for proven improvement techniques (value stream mapping, waste analysis, improvement kata)
Make it a part of your organization’s discipline
Get rid of as many silos as possible
Old Silo A Old Silo B Old Silo C Old Silo D
Cross-Functional Team 1
Cross-Functional Team 2
Cross-Functional Team n
Key 1: Get rid of as many handoffs as possible
Key 2: “Horizontal” shared responsibility, not everyone doing everything!
Shared responsibility matters more than org model
“Netflix” Model: Cross-Functional Team 1, Cross-Functional Team 2, … Cross-Functional Team n
“Google” Model: Development Team 1, Development Team 2, … Development Team n, plus an SRE Team (clear handoff requirements, error budget consequences)
Same high-quality, high-velocity results!
Why focus on getting rid of handoffs?
1. Your people are your most valuable assets
2. The SRE skillset is expensive
3. Stay out of their way!
SREs are expensive, stay out of their way!
[Figure: two flows from Backlog to ✅, labeled “Not this:” and “This:”, compared by the number of Ticket Queues in between]
Reduce friction in the SRE OODA Loop:
Observe: Invest in the right instrumentation
Orient: Invest in collaboration, checklists, investigatory tools
Decide: Empower them to make decisions!
Act: Empower them to take action!
What about the handoffs you can’t get rid of?
Old Silo A Old Silo B Old Silo C Old Silo D
Cross-Functional Team 1
Cross-Functional Team 2
Cross-Functional Team n
[Figure: the cross-functional teams still depend on three Specialist Capabilities, each sitting behind Ticket Queues]
Operations as a Service: Turn handoffs into self-service
Operations as a Service: Works with any org model
Operations as a Service (On Demand): Ops builds & operates a set of Ops Capabilities, each supplied by SRE, Dev, or Specialist
Consumed by: Cross-Functional Product Team 1, 2, … n, each with Ops (embedded)
Or consumed by: Development Team 1, 2, … n and an Ops/SRE Team
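The shape of an on-demand Ops capability can be sketched as a small job catalog: specialists define a procedure once, and other teams run it themselves, with access checks and an audit trail built in. Everything below (`Job`, `JobCatalog`, the role names, the job name) is hypothetical and is not the API of any particular tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Job:
    """A procedure defined once by whoever owns the capability."""
    name: str
    run: Callable[[], str]
    allowed_roles: Set[str]


@dataclass
class JobCatalog:
    """Hypothetical self-service front door to Ops capabilities."""
    jobs: Dict[str, Job] = field(default_factory=dict)
    audit_log: List[str] = field(default_factory=list)

    def register(self, job: Job) -> None:
        self.jobs[job.name] = job

    def execute(self, job_name: str, user: str, role: str) -> str:
        job = self.jobs[job_name]
        if role not in job.allowed_roles:
            # Denied attempts are recorded too, which helps with compliance.
            self.audit_log.append(f"DENIED {user} ({role}) -> {job_name}")
            raise PermissionError(f"role {role!r} may not run {job_name!r}")
        result = job.run()
        self.audit_log.append(f"OK {user} ({role}) -> {job_name}")
        return result


catalog = JobCatalog()
catalog.register(Job(
    name="healthcheck-store2",
    run=lambda: "store2: all checks passed",  # placeholder for the real procedure
    allowed_roles={"sre", "dev"},
))
print(catalog.execute("healthcheck-store2", user="jane", role="dev"))
```

The requester gets the result without opening a ticket, and the team that owns the capability keeps control of how the procedure is defined and who may run it.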
Operations as a Service: Popular Uses for SRE
“I could fix it, if I could get to it”
Environment
OaaS
Operations as a Service: Popular Uses for SRE
“Avoiding the dogpile”
“I think it’s a problem with dbcluster07-store2.uswest.acme”
[Figure, before: many people each run “$ top” on dbcluster07-store2.uswest.acme at the same time]
[Figure, with OaaS: one “$ top” and one “Healthcheck store2 - all” request going through OaaS]
Operations as a Service: Popular Uses for SRE
“I don’t read wikis. I’m an expert.”
The docs have been updated: “Service has changed. This flag is now required or bad things will happen!” and “Pause monitoring first or we all get woken up!”
Before, later…: “I’ve done this before. I’ve got this.” and “restart -doit -now” is run against the Environment without reading the docs
With OaaS: the same two changes go into an updated Restart Job, so later… “I’ve done this before. I’ve got this.” and “restart” end in ✅
Operations as a Service: Popular Uses for SRE
“Uneven and hidden skills”
One person: “I know how to do X.” “I don’t know how to do Y.”
Another person: “I know how to do Y.” “I don’t know how to do X.”
With OaaS: “Define X Procedure” and “Define Y Procedure” let anyone “Do X”, “Do Y”, or “Do X+Y”
Operations as a Service: Popular Uses for SRE
“Let me do that for you again… and again”
Before: “I need you to do X” arrives as a Ticket, interrupts Other work, and ends with “Done.” Later… the same request, “Done.” Later… the same request again, “Sigh.. Done.”
With OaaS: the requester runs “Do X” through OaaS each time, while Other work 1, Other work 2, and Other work 3 carry on
Use tickets only for what they are good for
1. Documenting true problems/issues/exceptions
2. Routing for necessary approvals
Not as a general-purpose work management system!
Ticket System
But won’t Security or Compliance stop you?
Operations as a Service
[Figure: the Operations as a Service diagram shown earlier, with two callouts on it]
Build in Security here
Build in Compliance here
But what about ITIL®?
• Ask ITIL people and they say SRE is ITIL compatible
• Ask people who have seen ITIL implemented and they say “how?”
• Agile+DevOps+SRE have self-regulation and shared responsibility features that seem to undermine ITIL’s command-and-control nature
• ITIL “Standard Change” is often the focus of discussion, but it still implies an approval model
• Straight talk: are we doing contortions to defend a sunk cost?
“Shift Left” the ability to take action
1° → escalate → 2° → escalate → 3° → escalate → 4°
Push the ability to take action this direction (toward 1°)
OaaS Enablement and tooling
Reduce Toil
1. Track toil levels for each team
2. Set toil limits for each team
3. Fund efforts to reduce toil (with emphasis on teams over toil limits)
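The three steps above can be sketched as a small report. This is an illustration under stated assumptions: the hours per team are made up, the function names are invented, and the default limit of 50% is only a placeholder (Google's SRE book describes a goal of keeping toil below 50% of each SRE's time, but the right limit for a given team is a local decision).

```python
def toil_report(hours_by_team, toil_limit=0.5):
    """Steps 1 and 2: track each team's toil fraction and compare it to a limit.

    hours_by_team: {team_name: (toil_hours, total_hours)} for one period.
    """
    report = {}
    for team, (toil_hours, total_hours) in hours_by_team.items():
        fraction = toil_hours / total_hours if total_hours else 0.0
        report[team] = {
            "toil_fraction": fraction,
            "over_limit": fraction > toil_limit,
        }
    return report


def funding_priority(report):
    """Step 3: order teams for toil-reduction funding, over-limit teams first."""
    return sorted(
        report,
        key=lambda team: (not report[team]["over_limit"],
                          -report[team]["toil_fraction"]),
    )


# Made-up weekly hours per team, for illustration only.
sample_hours = {"payments": (30, 40), "search": (12, 40), "platform": (26, 40)}
report = toil_report(sample_hours)
priority = funding_priority(report)
print(priority)
```

How toil hours get measured (surveys, ticket tags, time tracking) is left open here. The point is only that a limit is meaningless until the level is tracked per team.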
Start a book club
Recap
SRE is more than a title
Leverage the Operations as a Service design pattern
“Shift-Left” control and decision making
Focus on removing silos and queues
Reduce toil to create capacity to change
Understand the forces undermining SRE
Let’s talk…
@damonedwards
damon@rundeck.com
Dive Deeper Into Operations as a Service:
https://www.rundeck.com/oaas

 
Mainframe Solutions Introduction
Mainframe Solutions IntroductionMainframe Solutions Introduction
Mainframe Solutions IntroductionMicro Focus
 
Innovation and Architecture
Innovation and ArchitectureInnovation and Architecture
Innovation and ArchitectureAdrian Cockcroft
 
Leveraging Your Company's DevOps Transformation (AppSec USA 2014)
Leveraging Your Company's DevOps Transformation (AppSec USA 2014)Leveraging Your Company's DevOps Transformation (AppSec USA 2014)
Leveraging Your Company's DevOps Transformation (AppSec USA 2014)dev2ops
 
8 Things That Make Continuous Delivery Go Nuts
8 Things That Make Continuous Delivery Go Nuts8 Things That Make Continuous Delivery Go Nuts
8 Things That Make Continuous Delivery Go NutsEduards Sizovs
 
My History with Atlassian Tools, and Why I'm Moving to Studio
My History with Atlassian Tools, and Why I'm Moving to StudioMy History with Atlassian Tools, and Why I'm Moving to Studio
My History with Atlassian Tools, and Why I'm Moving to StudioAtlassian
 
examkiller 000-938
examkiller 000-938examkiller 000-938
examkiller 000-938jimenoon
 
The 7 Principles of DevOps and Cloud Applications
The 7 Principles of DevOps and Cloud ApplicationsThe 7 Principles of DevOps and Cloud Applications
The 7 Principles of DevOps and Cloud ApplicationsSolarWinds
 

Tendances (20)

Modern Operations: Solving DevOps’ Last Mile Problem
Modern Operations: Solving DevOps’ Last Mile Problem Modern Operations: Solving DevOps’ Last Mile Problem
Modern Operations: Solving DevOps’ Last Mile Problem
 
Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE
 
Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE
 
Operations: The Last Mile
Operations: The Last Mile Operations: The Last Mile
Operations: The Last Mile
 
Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE
 
Failure Happens: Improving Incident Response In Enterprises
Failure Happens: Improving Incident Response In Enterprises Failure Happens: Improving Incident Response In Enterprises
Failure Happens: Improving Incident Response In Enterprises
 
Operations as a Service: Because Failure Still Happens
Operations as a Service: Because Failure Still Happens Operations as a Service: Because Failure Still Happens
Operations as a Service: Because Failure Still Happens
 
The "Ops" Side of DevSecOps
The "Ops" Side of DevSecOps The "Ops" Side of DevSecOps
The "Ops" Side of DevSecOps
 
Keeping Your DevOps Transformation From Crushing Your Ops Capacity
Keeping Your DevOps Transformation From Crushing Your Ops Capacity Keeping Your DevOps Transformation From Crushing Your Ops Capacity
Keeping Your DevOps Transformation From Crushing Your Ops Capacity
 
Self-Service Operations: Because Failure Still Happens (Developer Edition)
Self-Service Operations: Because Failure Still Happens (Developer Edition)Self-Service Operations: Because Failure Still Happens (Developer Edition)
Self-Service Operations: Because Failure Still Happens (Developer Edition)
 
Self-Service Operations: Because Ops Still Happens
Self-Service Operations: Because Ops Still HappensSelf-Service Operations: Because Ops Still Happens
Self-Service Operations: Because Ops Still Happens
 
Operations: The Last Mile Problem For DevOps
Operations: The Last Mile Problem For DevOpsOperations: The Last Mile Problem For DevOps
Operations: The Last Mile Problem For DevOps
 
Helping Ops Help You: Development’s Role in Enabling Self-Service Operations
Helping Ops Help You:  Development’s Role in Enabling Self-Service OperationsHelping Ops Help You:  Development’s Role in Enabling Self-Service Operations
Helping Ops Help You: Development’s Role in Enabling Self-Service Operations
 
Mainframe Solutions Introduction
Mainframe Solutions IntroductionMainframe Solutions Introduction
Mainframe Solutions Introduction
 
Innovation and Architecture
Innovation and ArchitectureInnovation and Architecture
Innovation and Architecture
 
Leveraging Your Company's DevOps Transformation (AppSec USA 2014)
Leveraging Your Company's DevOps Transformation (AppSec USA 2014)Leveraging Your Company's DevOps Transformation (AppSec USA 2014)
Leveraging Your Company's DevOps Transformation (AppSec USA 2014)
 
8 Things That Make Continuous Delivery Go Nuts
8 Things That Make Continuous Delivery Go Nuts8 Things That Make Continuous Delivery Go Nuts
8 Things That Make Continuous Delivery Go Nuts
 
My History with Atlassian Tools, and Why I'm Moving to Studio
My History with Atlassian Tools, and Why I'm Moving to StudioMy History with Atlassian Tools, and Why I'm Moving to Studio
My History with Atlassian Tools, and Why I'm Moving to Studio
 
examkiller 000-938
examkiller 000-938examkiller 000-938
examkiller 000-938
 
The 7 Principles of DevOps and Cloud Applications
The 7 Principles of DevOps and Cloud ApplicationsThe 7 Principles of DevOps and Cloud Applications
The 7 Principles of DevOps and Cloud Applications
 

Similaire à Clearing the Way For SRE In the Enterprise

2019-11 NewOpsDays Dallas - Sysadmin to SRE _v1.1
2019-11 NewOpsDays Dallas  - Sysadmin to SRE _v1.12019-11 NewOpsDays Dallas  - Sysadmin to SRE _v1.1
2019-11 NewOpsDays Dallas - Sysadmin to SRE _v1.1Jorn Knuttila
 
NewOps Days Boston 2019 - SysAdmin to SRE: Creating Capacity to Make Tomorrow...
NewOps Days Boston 2019 - SysAdmin to SRE: Creating Capacity to Make Tomorrow...NewOps Days Boston 2019 - SysAdmin to SRE: Creating Capacity to Make Tomorrow...
NewOps Days Boston 2019 - SysAdmin to SRE: Creating Capacity to Make Tomorrow...Jorn Knuttila
 
The Last Mile Continued: Incident Management
The Last Mile Continued: Incident Management The Last Mile Continued: Incident Management
The Last Mile Continued: Incident Management Rundeck
 
Be Agile. Scale Up. Stay Lean. And Have More Fun by Dean Leffingwell
Be Agile. Scale Up. Stay Lean. And Have More Fun by Dean LeffingwellBe Agile. Scale Up. Stay Lean. And Have More Fun by Dean Leffingwell
Be Agile. Scale Up. Stay Lean. And Have More Fun by Dean LeffingwellAgile Software Community of India
 
Opening the Mainframe world to Mobile Ecosystem in a seamless and beneficial ...
Opening the Mainframe world to Mobile Ecosystem in a seamless and beneficial ...Opening the Mainframe world to Mobile Ecosystem in a seamless and beneficial ...
Opening the Mainframe world to Mobile Ecosystem in a seamless and beneficial ...India Scrum Enthusiasts Community
 
DevOps Torino Meetup - SRE Concepts
DevOps Torino Meetup - SRE ConceptsDevOps Torino Meetup - SRE Concepts
DevOps Torino Meetup - SRE ConceptsRauno De Pasquale
 
SRE (service reliability engineer) on big DevOps platform running on the clou...
SRE (service reliability engineer) on big DevOps platform running on the clou...SRE (service reliability engineer) on big DevOps platform running on the clou...
SRE (service reliability engineer) on big DevOps platform running on the clou...DevClub_lv
 
NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...
NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...
NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...NUS-ISS
 
Site Reliability Engineer (SRE), We Keep The Lights On 24/7
Site Reliability Engineer (SRE), We Keep The Lights On 24/7Site Reliability Engineer (SRE), We Keep The Lights On 24/7
Site Reliability Engineer (SRE), We Keep The Lights On 24/7NUS-ISS
 
What needs to be true? Patterns of engineering agility
What needs to be true? Patterns of engineering agilityWhat needs to be true? Patterns of engineering agility
What needs to be true? Patterns of engineering agilityAndy Norton
 
Foundations of the Scaled Agile Framework: Be Agile. Scale Up. Stay Lean. And...
Foundations of the Scaled Agile Framework: Be Agile. Scale Up. Stay Lean. And...Foundations of the Scaled Agile Framework: Be Agile. Scale Up. Stay Lean. And...
Foundations of the Scaled Agile Framework: Be Agile. Scale Up. Stay Lean. And...IBM Rational software
 
DEV345_Tools Won’t Fix Your Broken DevOps
DEV345_Tools Won’t Fix Your Broken DevOpsDEV345_Tools Won’t Fix Your Broken DevOps
DEV345_Tools Won’t Fix Your Broken DevOpsAmazon Web Services
 
Atlassian Executive Business Forum - LinkedIn HQ
Atlassian Executive Business Forum - LinkedIn HQAtlassian Executive Business Forum - LinkedIn HQ
Atlassian Executive Business Forum - LinkedIn HQServiceRocket
 
Devops for Large Enterprises
Devops for Large EnterprisesDevops for Large Enterprises
Devops for Large EnterprisesMarcio Sete
 
Having the Correct Context for an Agile Transformation
Having the Correct Context for an Agile TransformationHaving the Correct Context for an Agile Transformation
Having the Correct Context for an Agile TransformationDerek Huether
 
Kanban India 2023 | Ravishankar N | Don’t implement SRE like this!
Kanban India 2023 | Ravishankar N | Don’t implement SRE like this!Kanban India 2023 | Ravishankar N | Don’t implement SRE like this!
Kanban India 2023 | Ravishankar N | Don’t implement SRE like this!LeanKanbanIndia
 
The Agile Manifesto in the Star Wars Universe
The Agile Manifesto in the Star Wars UniverseThe Agile Manifesto in the Star Wars Universe
The Agile Manifesto in the Star Wars UniverseAaron Griffith
 
Microservices, Microfrontends and Feature Teams
Microservices, Microfrontends and Feature TeamsMicroservices, Microfrontends and Feature Teams
Microservices, Microfrontends and Feature TeamsGiulio Roggero
 
Why Agile is Failing in Large Enterprises And What You Can Do About It
Why Agile is Failing in Large Enterprises And What You Can Do About ItWhy Agile is Failing in Large Enterprises And What You Can Do About It
Why Agile is Failing in Large Enterprises And What You Can Do About Itwjperez0629
 

Similaire à Clearing the Way For SRE In the Enterprise (20)

2019-11 NewOpsDays Dallas - Sysadmin to SRE _v1.1
2019-11 NewOpsDays Dallas  - Sysadmin to SRE _v1.12019-11 NewOpsDays Dallas  - Sysadmin to SRE _v1.1
2019-11 NewOpsDays Dallas - Sysadmin to SRE _v1.1
 
NewOps Days Boston 2019 - SysAdmin to SRE: Creating Capacity to Make Tomorrow...
NewOps Days Boston 2019 - SysAdmin to SRE: Creating Capacity to Make Tomorrow...NewOps Days Boston 2019 - SysAdmin to SRE: Creating Capacity to Make Tomorrow...
NewOps Days Boston 2019 - SysAdmin to SRE: Creating Capacity to Make Tomorrow...
 
The Last Mile Continued: Incident Management
The Last Mile Continued: Incident Management The Last Mile Continued: Incident Management
The Last Mile Continued: Incident Management
 
AgileCamp 2014 Track 1: Accelerating Agile Enterprise Adoption with Scaled Ag...
AgileCamp 2014 Track 1: Accelerating Agile Enterprise Adoption with Scaled Ag...AgileCamp 2014 Track 1: Accelerating Agile Enterprise Adoption with Scaled Ag...
AgileCamp 2014 Track 1: Accelerating Agile Enterprise Adoption with Scaled Ag...
 
Be Agile. Scale Up. Stay Lean. And Have More Fun by Dean Leffingwell
Be Agile. Scale Up. Stay Lean. And Have More Fun by Dean LeffingwellBe Agile. Scale Up. Stay Lean. And Have More Fun by Dean Leffingwell
Be Agile. Scale Up. Stay Lean. And Have More Fun by Dean Leffingwell
 
Opening the Mainframe world to Mobile Ecosystem in a seamless and beneficial ...
Opening the Mainframe world to Mobile Ecosystem in a seamless and beneficial ...Opening the Mainframe world to Mobile Ecosystem in a seamless and beneficial ...
Opening the Mainframe world to Mobile Ecosystem in a seamless and beneficial ...
 
DevOps Torino Meetup - SRE Concepts
DevOps Torino Meetup - SRE ConceptsDevOps Torino Meetup - SRE Concepts
DevOps Torino Meetup - SRE Concepts
 
SRE (service reliability engineer) on big DevOps platform running on the clou...
SRE (service reliability engineer) on big DevOps platform running on the clou...SRE (service reliability engineer) on big DevOps platform running on the clou...
SRE (service reliability engineer) on big DevOps platform running on the clou...
 
NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...
NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...
NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...
 
Site Reliability Engineer (SRE), We Keep The Lights On 24/7
Site Reliability Engineer (SRE), We Keep The Lights On 24/7Site Reliability Engineer (SRE), We Keep The Lights On 24/7
Site Reliability Engineer (SRE), We Keep The Lights On 24/7
 
What needs to be true? Patterns of engineering agility
What needs to be true? Patterns of engineering agilityWhat needs to be true? Patterns of engineering agility
What needs to be true? Patterns of engineering agility
 
Foundations of the Scaled Agile Framework: Be Agile. Scale Up. Stay Lean. And...
Foundations of the Scaled Agile Framework: Be Agile. Scale Up. Stay Lean. And...Foundations of the Scaled Agile Framework: Be Agile. Scale Up. Stay Lean. And...
Foundations of the Scaled Agile Framework: Be Agile. Scale Up. Stay Lean. And...
 
DEV345_Tools Won’t Fix Your Broken DevOps
DEV345_Tools Won’t Fix Your Broken DevOpsDEV345_Tools Won’t Fix Your Broken DevOps
DEV345_Tools Won’t Fix Your Broken DevOps
 
Atlassian Executive Business Forum - LinkedIn HQ
Atlassian Executive Business Forum - LinkedIn HQAtlassian Executive Business Forum - LinkedIn HQ
Atlassian Executive Business Forum - LinkedIn HQ
 
Devops for Large Enterprises
Devops for Large EnterprisesDevops for Large Enterprises
Devops for Large Enterprises
 
Having the Correct Context for an Agile Transformation
Having the Correct Context for an Agile TransformationHaving the Correct Context for an Agile Transformation
Having the Correct Context for an Agile Transformation
 
Kanban India 2023 | Ravishankar N | Don’t implement SRE like this!
Kanban India 2023 | Ravishankar N | Don’t implement SRE like this!Kanban India 2023 | Ravishankar N | Don’t implement SRE like this!
Kanban India 2023 | Ravishankar N | Don’t implement SRE like this!
 
The Agile Manifesto in the Star Wars Universe
The Agile Manifesto in the Star Wars UniverseThe Agile Manifesto in the Star Wars Universe
The Agile Manifesto in the Star Wars Universe
 
Microservices, Microfrontends and Feature Teams
Microservices, Microfrontends and Feature TeamsMicroservices, Microfrontends and Feature Teams
Microservices, Microfrontends and Feature Teams
 
Why Agile is Failing in Large Enterprises And What You Can Do About It
Why Agile is Failing in Large Enterprises And What You Can Do About ItWhy Agile is Failing in Large Enterprises And What You Can Do About It
Why Agile is Failing in Large Enterprises And What You Can Do About It
 

Plus de Rundeck

Rundeck Community Office Hours: Using Variables with Job Steps
Rundeck Community Office Hours:  Using Variables with Job Steps Rundeck Community Office Hours:  Using Variables with Job Steps
Rundeck Community Office Hours: Using Variables with Job Steps Rundeck
 
Introducing PagerDuty Process Automation
Introducing PagerDuty Process AutomationIntroducing PagerDuty Process Automation
Introducing PagerDuty Process AutomationRundeck
 
How to Build a Custom Plugin in Rundeck
How to Build a Custom Plugin in RundeckHow to Build a Custom Plugin in Rundeck
How to Build a Custom Plugin in RundeckRundeck
 
Lunch and learn: Getting started with Rundeck & Ansible
Lunch and learn:  Getting started with Rundeck & AnsibleLunch and learn:  Getting started with Rundeck & Ansible
Lunch and learn: Getting started with Rundeck & AnsibleRundeck
 
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...
Self Service Cloud Operations:  Safely Delegate the Management of your Cloud ...Self Service Cloud Operations:  Safely Delegate the Management of your Cloud ...
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...Rundeck
 
Rundeck Office Hours: Best Practices Access Control Policies
Rundeck Office Hours:  Best Practices Access Control PoliciesRundeck Office Hours:  Best Practices Access Control Policies
Rundeck Office Hours: Best Practices Access Control PoliciesRundeck
 
Mastering Secrets Management in Rundeck
Mastering Secrets Management in RundeckMastering Secrets Management in Rundeck
Mastering Secrets Management in RundeckRundeck
 
What's New in Rundeck 3.4
What's New in Rundeck 3.4   What's New in Rundeck 3.4
What's New in Rundeck 3.4 Rundeck
 
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...
Automate Yourself Out of a Job:  Safely Delegate the Management of your Azure...Automate Yourself Out of a Job:  Safely Delegate the Management of your Azure...
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...Rundeck
 
Super-Charge Your Site Reliability Practices with Runbook Automation
Super-Charge Your Site Reliability Practices with Runbook Automation Super-Charge Your Site Reliability Practices with Runbook Automation
Super-Charge Your Site Reliability Practices with Runbook Automation Rundeck
 
Introduction to Rundeck
Introduction to Rundeck Introduction to Rundeck
Introduction to Rundeck Rundeck
 
Automated Remediation with Rundeck + Sensu
Automated Remediation with Rundeck + SensuAutomated Remediation with Rundeck + Sensu
Automated Remediation with Rundeck + SensuRundeck
 
Modernizing Incident Response
Modernizing Incident Response Modernizing Incident Response
Modernizing Incident Response Rundeck
 
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]Rundeck
 
Datadog + Rundeck at DASH 2020
Datadog + Rundeck at DASH 2020Datadog + Rundeck at DASH 2020
Datadog + Rundeck at DASH 2020Rundeck
 
Rundeck Overview
Rundeck OverviewRundeck Overview
Rundeck OverviewRundeck
 
Empower Devs, Simplify Ops, and Accelerate your Digital Transformation
Empower Devs, Simplify Ops, and Accelerate your Digital TransformationEmpower Devs, Simplify Ops, and Accelerate your Digital Transformation
Empower Devs, Simplify Ops, and Accelerate your Digital TransformationRundeck
 
Advanced Cluster Settings
Advanced Cluster Settings Advanced Cluster Settings
Advanced Cluster Settings Rundeck
 
Maximizing Your Rundeck Migration
Maximizing Your Rundeck Migration Maximizing Your Rundeck Migration
Maximizing Your Rundeck Migration Rundeck
 
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...Rundeck
 

Plus de Rundeck (20)

Rundeck Community Office Hours: Using Variables with Job Steps
Rundeck Community Office Hours:  Using Variables with Job Steps Rundeck Community Office Hours:  Using Variables with Job Steps
Rundeck Community Office Hours: Using Variables with Job Steps
 
Introducing PagerDuty Process Automation
Introducing PagerDuty Process AutomationIntroducing PagerDuty Process Automation
Introducing PagerDuty Process Automation
 
How to Build a Custom Plugin in Rundeck
How to Build a Custom Plugin in RundeckHow to Build a Custom Plugin in Rundeck
How to Build a Custom Plugin in Rundeck
 
Lunch and learn: Getting started with Rundeck & Ansible
Lunch and learn:  Getting started with Rundeck & AnsibleLunch and learn:  Getting started with Rundeck & Ansible
Lunch and learn: Getting started with Rundeck & Ansible
 
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...
Self Service Cloud Operations:  Safely Delegate the Management of your Cloud ...Self Service Cloud Operations:  Safely Delegate the Management of your Cloud ...
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...
 
Rundeck Office Hours: Best Practices Access Control Policies
Rundeck Office Hours:  Best Practices Access Control PoliciesRundeck Office Hours:  Best Practices Access Control Policies
Rundeck Office Hours: Best Practices Access Control Policies
 
Mastering Secrets Management in Rundeck
Mastering Secrets Management in RundeckMastering Secrets Management in Rundeck
Mastering Secrets Management in Rundeck
 
What's New in Rundeck 3.4
What's New in Rundeck 3.4   What's New in Rundeck 3.4
What's New in Rundeck 3.4
 
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...
Automate Yourself Out of a Job:  Safely Delegate the Management of your Azure...Automate Yourself Out of a Job:  Safely Delegate the Management of your Azure...
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...
 
Super-Charge Your Site Reliability Practices with Runbook Automation
Super-Charge Your Site Reliability Practices with Runbook Automation Super-Charge Your Site Reliability Practices with Runbook Automation
Super-Charge Your Site Reliability Practices with Runbook Automation
 
Introduction to Rundeck
Introduction to Rundeck Introduction to Rundeck
Introduction to Rundeck
 
Automated Remediation with Rundeck + Sensu
Automated Remediation with Rundeck + SensuAutomated Remediation with Rundeck + Sensu
Automated Remediation with Rundeck + Sensu
 
Modernizing Incident Response
Modernizing Incident Response Modernizing Incident Response
Modernizing Incident Response
 
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
 
Datadog + Rundeck at DASH 2020
Datadog + Rundeck at DASH 2020Datadog + Rundeck at DASH 2020
Datadog + Rundeck at DASH 2020
 
Rundeck Overview
Rundeck OverviewRundeck Overview
Rundeck Overview
 
Empower Devs, Simplify Ops, and Accelerate your Digital Transformation
Empower Devs, Simplify Ops, and Accelerate your Digital TransformationEmpower Devs, Simplify Ops, and Accelerate your Digital Transformation
Empower Devs, Simplify Ops, and Accelerate your Digital Transformation
 
Advanced Cluster Settings
Advanced Cluster Settings Advanced Cluster Settings
Advanced Cluster Settings
 
Maximizing Your Rundeck Migration
Maximizing Your Rundeck Migration Maximizing Your Rundeck Migration
Maximizing Your Rundeck Migration
 
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
 

Dernier

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
A Journey Into the Emotions of Software Developers
Clearing the Way For SRE In the Enterprise

  • 1. Clearing the Way For SRE in the Enterprise Damon Edwards @damonedwards
  • 4. Digital and DevOps: Business, Dev, Ops. From Idea to Running Services: Shorter Time-to-Market, Fast Feedback from Users, Improved Quality. Ops concerns: Availability, Auditing, Security, Compliance. “Go faster!” “Open up!” “Lock it down!” “Great for Dev, but what about Ops?”
  • 5. Our transformation has largely ignored Ops. Any ideas? Have you heard of SRE? Google does it.
  • 10. Your new title is SRE. Now write code and be better at ops. (Cartoon: a Sys Admin surrounded by ITIL Books 1 to 5, a CAB calendar, a provisioning process, and a “Quality! is job #1” sign. Dilbert characters © Scott Adams www.dilbert.com)
  • 11. SysAdmins: Overloaded. Constant firefighting. Waiting in ticket queues for everything. Things break. Break again. And again. Everyone is busy, but it doesn’t get any better. Executive View: Everything takes too long, costs too much, and breaks too often!
  • 12. (False) SRE: Overloaded. Constant firefighting. Waiting in ticket queues for everything. Things break. Break again. And again. Everyone is busy, but it doesn’t get any better. Executive View (unchanged): Everything takes too long, costs too much, and breaks too often!
  • 13. Changing job titles or adding individual skills doesn’t make systems administrators SREs.
  • 14. Principles of SRE are what set SRE apart
  • 15. Principles of SRE are what set SRE apart 1. SRE needs Service Level Objectives, with consequences
  • 18. SLO and Error Budgets: Tools for Shared Responsibility. On a scale of 0 to 100, the Service Level Indicator is measured against the Service Level Objective; the room between the objective and 100 is the Error Budget (use this to improve the service). Shared by Dev, Biz, and Ops.
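
The relationship between SLI, SLO, and error budget on slide 18 can be sketched in a few lines of Python. This is only an illustration under assumptions the slides do not state: availability is measured per request, the 99.9 target is a made-up example, and the two "consequence" strings are hypothetical names for whatever policy Dev, Biz, and Ops agree on.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """One measurement window for a request-based SLO (hypothetical model)."""
    slo_percent: float      # agreed objective, e.g. 99.9 (example value only)
    total_requests: int
    failed_requests: int

    def __post_init__(self) -> None:
        if self.total_requests <= 0:
            raise ValueError("total_requests must be positive")

    @property
    def sli_percent(self) -> float:
        # Service Level Indicator: share of requests that succeeded.
        good = self.total_requests - self.failed_requests
        return 100.0 * good / self.total_requests

    @property
    def allowed_failures(self) -> float:
        # Error budget: the gap between the objective and 100%.
        return self.total_requests * (100.0 - self.slo_percent) / 100.0

    @property
    def budget_remaining(self) -> float:
        return self.allowed_failures - self.failed_requests

    def consequence(self) -> str:
        # The "with consequences" part: an agreed rule, not a suggestion.
        if self.budget_remaining > 0:
            return "ship features"
        return "prioritize reliability work"
```

A window with budget left keeps shipping; a window that has overspent switches the shared priority to reliability. The point of the sketch is that the rule is computed from data all three parties can see.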
  • 22. Principles of SRE are what set SRE apart: 1. SRE needs Service Level Objectives, with consequences 2. SREs have time to make tomorrow better than today 3. SRE teams have the ability to regulate their workload (Stephen Thorne, “Principles of SRE”, DevOps Enterprise Summit London 2018, https://youtu.be/c-w_GYvi0eA)
  • 23. Forces That Undermine SRE Principles Silos Queues Excessive Toil Low Trust
  • 28. Silos cause disconnects and mismatches. Silo A (“I need X”) and Silo B (“I do X”, receiving requests for X) each have their own backlog, information, priorities, and tools. The two sides differ in context, process, tooling, and capacity.
  • 30. Silos interfere with feedback loops (diagram: Producer and Consumer, with Ops at each of stages 1, 2, and 3)
  • 31. Function A, Function B, Function C: Silos create labor pools of functional specialists. Requests are fulfilled by semi-manual or manual effort. Primary management focus is on protecting team capacity.
  • 35. Silos Undermine SRE Principles. 1. Org has Service Level Objectives, with consequences? X: Disjointed silos make meaningful SLOs and shared responsibility almost impossible. 2. SREs have time to make tomorrow better than today? X: Siloed labor pools, disconnected processes and tools, and slow feedback loops tend to consume all available capacity. 3. SRE teams have the ability to regulate their workload? X: Struggling to keep up with demand and unable to protect capacity.
  • 36. Forces That Undermine SRE Principles Silos Queues Toil Low Trust
  • 38. How do we cover for our cross-silo disconnects and mismatches? With a Ticket Queue between Silo A and Silo B.
  • 39. We all know how well that works.
  • 40. Request queues are an expensive way to manage work. Queues create: Longer Cycle Time, Increased Risk, More Variability, More Overhead, Lower Quality, Less Motivation. (Adapted from Donald G. Reinertsen, The Principles of Product Development Flow: Second Generation Lean Product Development)
  • 43. What do queues do to value streams? Queues (Queue A, Queue B) disintegrate and obfuscate value streams.
  • 45. Ticket queues become “snowflake makers”: work passing from Silo A through the Ticket Queue to Silo B produces Snowflakes (each unique, technically acceptable but unreproducible and brittle).
  • 49. Ticket Queues Undermine SRE Principles. 1. Org has Service Level Objectives, with consequences? X: Tickets reinforce siloed behaviors and obfuscate the value stream. 2. SREs have time to make tomorrow better than today? X: Longer cycle time, more variability, more overhead, lower quality, and more snowflakes consume available capacity. 3. SRE teams have the ability to regulate their workload? X: Queues obfuscate the pressure being put on request fulfillers.
  • 50. Forces That Undermine Operations Silos Queues Toil Low Trust
  • 52. Toil is the enemy of SRE. “Toil is the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.” (Vivek Rau, Google)
  • 53. Toil vs. Engineering Work: Lacks Enduring Value vs. Builds Enduring Value; Rote, Repetitive vs. Creative, Iterative; Tactical vs. Strategic; Increases With Scale vs. Enables Scaling; Can Be Automated vs. Requires Human Creativity.
  • 55. Excessive toil prevents fixing the system. Toil at a manageable percentage of capacity: engineering work is left over to reduce toil and improve the business. Toil at an unmanageable percentage of capacity (“Engineering Bankruptcy”): no capacity to reduce toil, no capacity to improve the business.
  • 59. Excessive Toil Undermines SRE Principles. 1. Org has Service Level Objectives, with consequences? X: Being buried in toil keeps the team from contributing the engineering work needed to uphold its end of the shared responsibility deal. 2. SREs have time to make tomorrow better than today? X: Buried in toil… no capacity for engineering work to reduce toil. 3. SRE teams have the ability to regulate their workload? X: Buried in toil… no capacity for engineering work to reduce toil.
  • 60. Forces That Undermine Operations Silos Queues Toil Low Trust
  • 61. Where are decisions made? Who can take action? (Diagram: an escalation chain across tiers 1° to 4°, with “escalate” between tiers, annotated “Decisions made here”.)
  • 69. All work is contextual (John Allspaw). rm -rf $PATHNAME: Is this dangerous? The answer is always “it depends”.
  • 70. Where are decisions made? Who can take action? (Same escalation diagram across tiers 1° to 4°, now annotated “Context”.)
  • 81. Low trust + approvals = illusion of control. In your Ticket System, add up the total number of approval requests and …subtract the info radiators (“I need to be in the loop”) …subtract the CYAs (“Prove you followed the process”) …subtract the too removed to judge (“mostly guessing”). How many are you left with? How many were the right call? How many got rejected?
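
The subtraction exercise on slide 81 is simple arithmetic, and a rough Python sketch makes it concrete. Everything about the data shape here is an assumption: the `kind` and `outcome` fields and their label values are hypothetical, and someone would have to classify each approval by hand (or by convention) before running it. "How many were the right call?" needs hindsight that a ticket export does not contain, so it is left out.

```python
from collections import Counter

# Hypothetical labels for approvals that are not real decisions.
NOT_A_DECISION = {"info_radiator", "cya", "too_removed"}


def audit_approvals(approvals):
    """Summarize approval records.

    Each record is a dict with two assumed keys:
      "kind":    "info_radiator", "cya", "too_removed", or anything else
      "outcome": "approved" or "rejected"
    """
    kinds = Counter(a["kind"] for a in approvals)
    decisions = [a for a in approvals if a["kind"] not in NOT_A_DECISION]
    rejected = sum(1 for a in decisions if a["outcome"] == "rejected")
    return {
        "total": len(approvals),
        "subtracted": {k: kinds[k] for k in sorted(NOT_A_DECISION)},
        "left": len(decisions),
        "rejected": rejected,
    }
```

If "left" is a small fraction of "total" and "rejected" is close to zero, the approval step is adding queue time without exercising much control.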
  • 85. Low Trust Undermines SRE Principles. 1. Org has Service Level Objectives, with consequences? X: Cultures of low trust have a really difficult time with shared responsibility. 2. SREs have time to make tomorrow better than today? X: People closest to problems know what to fix but tasking, priorities, and decisions are largely out of their control. 3. SRE teams have the ability to regulate their workload? X: People aren’t trusted to plan or design their own work.
  • 86. Forces That Undermine Operations Silos Queues Toil Low Trust
  • 87. So what can we do differently?
  • 92. Lean on Lean to find what to fix. 1. Map the end-to-end flow of information and artifacts (using a recent delivery or event). 2. Identify what slows lead times, undermines quality, and impacts flow. 3. Identify countermeasures and create improvement storyboards (justification/plan). All processes should be studied with an improvement discipline. Incidents are just as much a “process” as delivery. Look to Lean for proven improvement techniques (value stream mapping, waste analysis, improvement kata). Make it a part of your organization’s discipline.
  • 96. Get rid of as many silos as possible: Old Silos A, B, C, and D become Cross-Functional Teams 1, 2, … n. Key 1: get rid of as many handoffs as possible. Key 2: “Horizontal” shared responsibility, not everyone doing everything!
  • 99. Shared responsibility matters more than org model. “Netflix” Model: Cross-Functional Teams 1, 2, … n. “Google” Model: Development Teams 1, 2, … n plus an SRE Team, with clear handoff requirements and error budget consequences. Same high-quality, high-velocity results!
  • 103. Why focus on getting rid of handoffs? 1. Your people are your most valuable assets 2. The SRE skillset is expensive 3. Stay out of their way!
  • 109. SREs are expensive, stay out of their way! Not this: a backlog that has to pass through one ticket queue after another. This: a backlog worked directly ✅. Reduce friction in the SRE OODA Loop. Observe: Invest in the right instrumentation. Orient: Invest in collaboration, checklists, investigatory tools. Decide: Empower them to make decisions! Action: Empower them to take action!
  • 112. What about the handoffs you can’t get rid of? Even after Old Silos A, B, C, and D become Cross-Functional Teams 1, 2, … n, some Specialist Capabilities remain, and each is reached through Ticket Queues.
  • 113. Operations as a Service: Turn handoffs into self-service. Cross-Functional Product Teams 1, 2, … n (each with embedded Ops) use Ops Capabilities On Demand. Ops builds and operates the service; each Ops Capability is provided by an SRE, Dev, or Specialist.
  • 114. Operations as a Service: Works with any org model. Development Teams 1, 2, … n and an Ops/SRE Team use Ops Capabilities On Demand. Ops builds and operates the service; each Ops Capability is provided by an SRE, Dev, or Specialist.
  • 116. Operations as a Service: Popular Uses for SRE. “I could fix it, if I could get to it”: OaaS sits in front of the Environment.
  • 118. Operations as a Service: Popular Uses for SRE. “Avoiding the dogpile”: Someone says “I think it’s a problem with dbcluster07-store2.uswest.acme” and everyone logs in to run “$ top” on the same host. With OaaS: a single “Healthcheck store2 - all” job replaces the pile of individual “$ top” sessions.
  • 120. Operations as a Service: Popular Uses for SRE. “I don’t read wikis. I’m an expert.”: The docs now say “Service has changed. This flag is now required or bad things will happen! Pause monitoring first or we all get woken up!” Later… the expert (“I’ve done this before. I’ve got this.”) runs “restart -doit -now” against the Environment from memory. With OaaS: the Restart Job is updated once ✅, and later the expert simply runs “restart”.
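
The “Update Restart Job” idea on slide 120 can be sketched as a procedure that lives in code rather than in a wiki. This is a hedged illustration, not the speaker's tooling: the command names, the `--required-flag` option, and the injected `run` callable are all hypothetical stand-ins for whatever the real service needs.

```python
from typing import Callable, List

Command = List[str]


def make_restart_job(run: Callable[[Command], None]) -> Callable[[str], None]:
    """Build a self-service "restart" job.

    `run` executes one command; in real use it might wrap subprocess.run.
    All commands below are hypothetical placeholders.
    """

    def restart(service: str) -> None:
        # Step the docs warned about: pause monitoring first.
        run(["monitoring", "pause", service])
        try:
            # The newly required flag lives here, updated once by the owner.
            run(["restart", service, "--required-flag"])
        finally:
            # Always resume monitoring, even if the restart fails.
            run(["monitoring", "resume", service])

    return restart
```

The caller still just asks for a restart. When the procedure changes, the owning team edits the job once and every later caller gets the current steps without reading anything.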
  • 122. Operations as a Service: Popular Uses for SRE. “Uneven and hidden skills”: One person knows how to do X but not Y; another knows how to do Y but not X. With OaaS: each defines the procedure they know (“Define X Procedure”, “Define Y Procedure”), and anyone can then “Do X”, “Do Y”, or “Do X+Y”.
  • 124. Operations as a Service: Popular Uses for SRE. “Let me do that for you again… and again”: Every “I need you to do X” arrives as a Ticket, interrupts other work, and ends with “Done.” Later… it happens again, and again (“Sigh.. Done.”). With OaaS: requesters “Do X” themselves each time, while Other work 1, 2, and 3 continues uninterrupted.
  • 128. Use tickets only for what they are good for: 1. Documenting true problems/issues/exceptions 2. Routing for necessary approvals. Not as a general purpose work management system!
  • 129. But won’t Security or Compliance stop you? In the Operations as a Service model (product teams with embedded Ops using On Demand Ops Capabilities that Ops builds and operates), build in Security and build in Compliance at the Operations as a Service layer.
  • 135. But what about ITIL®? • Ask ITIL people and they say SRE is ITIL compatible. • Ask people who have seen ITIL implemented and they say “how?” • Agile+DevOps+SRE have self-regulation and shared-responsibility features that seem to undermine ITIL’s command-and-control nature. • ITIL “Standard Change” is often the focus of discussion, but it still implies an approval model. • Straight talk: are we doing contortions to defend a sunk cost?
  • 138. “Shift Left” the ability to take action. [Diagram: levels 1° through 4°, with an “escalate” arrow from each level to the next.] Push the ability to take action in this direction (left). OaaS: enablement and tooling.
  • 142. Reduce Toil: 1. Track toil levels for each team. 2. Set toil limits for each team. 3. Fund efforts to reduce toil (with emphasis on teams over toil limits).
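
The three steps above can be sketched as a small calculation. This is an illustration under stated assumptions: the team names, hours, and the 0.5 limit are made-up sample values, and `TeamToil` and `funding_priority` are hypothetical names. Toil level is taken here to mean toil hours as a fraction of total hours.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TeamToil:
    team: str
    toil_hours: float    # step 1: tracked toil for the period
    total_hours: float   # total working hours for the same period
    limit: float         # step 2: agreed maximum toil fraction

    @property
    def toil_fraction(self) -> float:
        # Guard against division by zero for teams with no recorded hours.
        return self.toil_hours / self.total_hours if self.total_hours else 0.0

    @property
    def over_limit(self) -> bool:
        return self.toil_fraction > self.limit


def funding_priority(teams: List[TeamToil]) -> List[TeamToil]:
    # Step 3: rank teams by how far they exceed their limit, so teams
    # over the limit come first when toil-reduction work is funded.
    return sorted(teams, key=lambda t: t.toil_fraction - t.limit, reverse=True)


# Sample data only; none of these figures come from the talk.
teams = [
    TeamToil("platform", toil_hours=10, total_hours=40, limit=0.5),
    TeamToil("payments", toil_hours=30, total_hours=40, limit=0.5),
    TeamToil("search", toil_hours=24, total_hours=40, limit=0.5),
]

for t in funding_priority(teams):
    status = "OVER LIMIT" if t.over_limit else "ok"
    print(f"{t.team}: {t.toil_fraction:.0%} toil ({status})")
```

Sorting by the gap between measured toil and the limit, rather than by raw toil, lets teams with different agreed limits be compared on the same list.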
  • 143. Start a book club
  • 144. Recap: 1. SRE is more than a title. 2. Leverage the Operations as a Service design pattern (cross-functional product teams with embedded Ops consume on-demand Ops capabilities provided by an SRE, Dev, or Specialist). 3. “Shift-Left” control and decision making. 4. Focus on removing silos and queues (old silos A through D become cross-functional teams 1 through n). 5. Reduce toil to create capacity to change: with toil at a manageable percentage of capacity, engineering work can reduce toil and improve the business; with toil at an unmanageable percentage of capacity (“Engineering Bankruptcy”), there is no capacity to reduce toil and no capacity to improve the business. 6. Understand the forces undermining SRE (“Your new title is SRE. Now write code and be better at ops.”). Dilbert characters © Scott Adams www.dilbert.com