SRE for Everyone:
Making Tomorrow Better Than Today

Damon Edwards

@damonedwards
2019
Not that far away, maybe in a company just like yours…
Overloaded. Constant firefighting.
[Figure: a pile of tickets alongside Project A and Project B, marked “DUE: Yesterday!” and “DUE: Tomorrow!”]
Waiting in ticket queues for everything.
[Figure: every request joins a long queue of tickets.]
Things break. Break again. And again.
[Figure: the same failure recurs (“Later… same”), and each time it means “Help!”, a ticket, a wait, and an interrupt.]
Everyone is busy, but it doesn’t get any better.
[Figure: an improvement project squeezed between incidents and business delivery work.]
Not that far away, maybe in a company just like yours…
Overloaded. Constant firefighting.
Waiting in ticket queues for everything.
Things break. Break again. And again.
Everyone is busy, but it doesn’t get any better.
Executives: “Everything takes too long, costs too much, and breaks too often!”

Have you heard of SRE?
Google does it.
“SRE… When you ask software engineers to do operations”
“SRE… Next-generation, cloud-native Operations”
Class SRE implements DevOps
“SRE… When Ops does more engineering than Ops”
Jane Doe, Systems Administrator
“We have SysAdmins… They should be SREs!”
Jane Doe, SRE
[Cartoon: a SysAdmin, surrounded by ITIL Books 1 to 5, a “Quality is job #1” poster, a CAB calendar, and a provisioning process, is told: “Your new title is SRE. Now write code and be better at ops.”]
Dilbert characters © Scott Adams www.dilbert.com
SysAdmins:
• Overloaded. Constant firefighting.
• Waiting in ticket queues for everything.
• Things break. Break again. And again.
• Everyone is busy, but it doesn’t get any better.
Executive View:
• “Our transformation has largely ignored Ops. Any ideas?”
• “Have you heard of SRE? Google does it.”
• “Everything takes too long, costs too much, and breaks too often!”
SRE (new name):
• Overloaded. Constant firefighting.
• Waiting in ticket queues for everything.
• Things break. Break again. And again.
• Everyone is busy, but it doesn’t get any better.
Changing job titles or adding individual skills doesn’t make systems administrators SREs.
[Figure: a systems administrator gains Observability, Programming Skills, Distributed Systems Arch., and Blameless Post-Mortems, and the result is labeled “Not SRE”.]
SRE is a rethinking of how Operations work gets done.
Principles are what make SRE different
Stephen Thorne, Google, at DevOps Enterprise Summit London 2018, “Principles of SRE”: https://youtu.be/c-w_GYvi0eA
1. SRE needs Service Level Objectives, with consequences
SLO and Error Budgets: Tools for Shared Responsibility
[Figure: a 0 to 100 scale showing the Service Level Indicator, the Service Level Objective, and the Error Budget* (*Use this to improve the service), shared by Dev, Biz, and Ops.]
SLO takes priority!!
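The slide leaves the arithmetic implicit. As a minimal sketch, assuming a request-based SLI (the fraction of “good” requests) measured over one fixed window, the error budget is the share of events the SLO allows to fail. The class and field names are hypothetical, and what a team does once the budget is spent is a policy decision (on the slide: “SLO takes priority!!”).

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Request-based error budget for one SLO window (illustrative sketch)."""
    slo: float          # target fraction of good events, e.g. 0.999 for "99.9%"
    total_events: int   # all events (e.g. requests) observed in the window
    bad_events: int     # events that failed the SLI's definition of "good"

    @property
    def sli(self) -> float:
        """Measured Service Level Indicator: fraction of good events."""
        return 1 - self.bad_events / self.total_events

    @property
    def allowed_bad_events(self) -> float:
        """Size of the error budget: how many bad events the SLO tolerates."""
        return (1 - self.slo) * self.total_events

    @property
    def remaining_fraction(self) -> float:
        """Share of the budget still unspent (negative once overspent)."""
        return 1 - self.bad_events / self.allowed_bad_events

    @property
    def exhausted(self) -> bool:
        """True once the SLO is missed, i.e. the whole budget is spent."""
        return self.bad_events >= self.allowed_bad_events


budget = ErrorBudget(slo=0.999, total_events=1_000_000, bad_events=250)
print(round(budget.allowed_bad_events), round(budget.remaining_fraction, 2), budget.exhausted)
# prints: 1000 0.75 False
```

Here a 99.9% SLO over a million requests tolerates about 1,000 failures; 250 failures leave three quarters of the budget available for risky but valuable change.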
Principles of SRE are what set SRE apart
1. SRE needs Service Level Objectives, with consequences

2. SREs have time to make tomorrow better than today
Toil: Name For a Problem We’ve All Felt
“Toil is the kind of work tied to running a production
service that tends to be manual, repetitive,
automatable, tactical, devoid of enduring value, and
that scales linearly as a service grows.”
- Vivek Rau, Google
Toil vs. Engineering Work
Toil                    | Engineering Work
Lacks Enduring Value    | Builds Enduring Value
Rote, Repetitive        | Creative, Iterative
Tactical                | Strategic
Increases With Scale    | Enables Scaling
Can Be Automated        | Requires Human Creativity
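To make the definition usable when teams standardize what they count as toil, one option is a yes/no checklist over the six traits in Vivek Rau’s definition. This is a hedged sketch, not a rule from the talk: `TaskProfile` and `toil_score` are hypothetical names, and a task with only some of the traits still deserves a case-by-case look.

```python
from dataclasses import dataclass, fields


@dataclass
class TaskProfile:
    """Yes/no answers for the six traits in the toil definition."""
    manual: bool
    repetitive: bool
    automatable: bool
    tactical: bool
    no_enduring_value: bool
    scales_linearly_with_service: bool


def toil_score(task: TaskProfile) -> int:
    """Count how many toil traits a task has (0 to 6); higher means more toil-like."""
    # Each field is a bool, and True counts as 1 when summed.
    return sum(getattr(task, f.name) for f in fields(task))


manual_restart = TaskProfile(True, True, True, True, True, True)
design_review = TaskProfile(False, False, False, False, False, False)
print(toil_score(manual_restart), toil_score(design_review))  # prints: 6 0
```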
Excessive Toil Prevents Fixing the System
[Figure: capacity split between toil and engineering work. With toil at a manageable percentage of capacity, engineering work can reduce toil and improve the business. With toil at an unmanageable percentage of capacity (“Engineering Bankruptcy”), there is no capacity to reduce toil and no capacity to improve the business.]
Downward spiral is inevitable!
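To see why the spiral follows from toil that “scales linearly as a service grows,” here is a toy model under explicit assumptions: fixed team capacity, toil proportional to service size, compound growth each quarter, and optional engineering work that trims toil per unit. All numbers and names are hypothetical; the point is the shape of the curve, not a forecast.

```python
from typing import Optional


def quarters_until_bankruptcy(capacity_hours: float,
                              toil_hours_per_unit: float,
                              units: float,
                              growth_per_quarter: float,
                              toil_reduction_per_quarter: float = 0.0,
                              limit_fraction: float = 1.0,
                              max_quarters: int = 100) -> Optional[int]:
    """Return the first quarter in which toil reaches limit_fraction of capacity.

    Toil is modeled as linear in service size (units). The service grows by
    growth_per_quarter each quarter, and engineering work may cut toil per unit
    by toil_reduction_per_quarter. Returns None if the limit is never reached
    within max_quarters.
    """
    for quarter in range(max_quarters + 1):
        toil_hours = toil_hours_per_unit * units
        if toil_hours >= limit_fraction * capacity_hours:
            return quarter
        units *= 1 + growth_per_quarter
        toil_hours_per_unit *= 1 - toil_reduction_per_quarter
    return None


# Hypothetical team: 400 h/week of capacity, 2 h/week of toil per unit, 100 units.
print(quarters_until_bankruptcy(400, 2, 100, 0.10))        # prints: 8
print(quarters_until_bankruptcy(400, 2, 100, 0.10, 0.10))  # prints: None
```

With 10% quarterly growth and no toil reduction, this hypothetical team runs out of capacity in quarter 8. Cutting toil per unit by 10% a quarter makes total toil shrink slightly each quarter, so it never gets there.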
Principles of SRE are what set SRE apart
1. SRE needs Service Level Objectives, with consequences

2. SREs have time to make tomorrow better than today
3. SRE teams have the ability to regulate their workload
SRE teams have the ability to regulate their workload
Example: What if handing off responsibility to SRE/Ops wasn’t a right?
(Separate “running in production” from “run by SRE/Ops.”)
“?!?”
Where to start (the practical approach)
1. SRE needs Service Level Objectives, with consequences → Company-wide culture change (hard!)
2. SREs have time to make tomorrow better than today → Reduce toil. Everybody wins!
3. SRE teams have the ability to regulate their workload → Company-wide culture change (hard!)
Why focus on reducing toil?
1. Lots of value independent of “SRE”
2. Your people are your most expensive assets… stay out of their way!
Start reducing toil today
1. Track toil levels for each team
Track toil levels for each team
• Standardize (e.g. meetings and email are “overhead,” not “toil”)
• Track via:
  • Self-reporting
  • Periodic surveys
  • SM or PM interviews/sampling
• Don’t get lost in the time-tracking weeds!
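A minimal sketch of the bookkeeping, assuming self-reported hours tagged with one of three standardized categories. The category names, the `(team, category, hours)` tuple format, and measuring toil as a share of all reported hours are assumptions for illustration; the sketch stays deliberately coarse so the tracking itself doesn’t become a time sink.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Standardized categories: meetings and email count as "overhead", not "toil".
CATEGORIES = {"toil", "engineering", "overhead"}


def toil_share_by_team(entries: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Aggregate self-reported (team, category, hours) entries into a toil share per team."""
    hours: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for team, category, reported_hours in entries:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category {category!r}; expected one of {sorted(CATEGORIES)}")
        hours[team][category] += reported_hours
    shares = {}
    for team, by_category in hours.items():
        total = sum(by_category.values())
        if total > 0:
            shares[team] = by_category.get("toil", 0.0) / total
    return shares


reports = [("db", "toil", 30), ("db", "engineering", 10), ("db", "overhead", 10),
           ("web", "toil", 10), ("web", "engineering", 30)]
print(toil_share_by_team(reports))  # prints: {'db': 0.6, 'web': 0.25}
```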
Start reducing toil today
1. Track toil levels for each team
2. Set a toil limit for each team (50% is conventional wisdom)
3. Fund efforts to reduce toil (with emphasis on teams already over the limit)
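Steps 2 and 3 can be sketched as a simple check against the 50% conventional-wisdom limit, ranking teams by how far over they are so toil-reduction funding goes to the worst cases first. The function name and its input (a mapping from team to toil share) are hypothetical; the limit is a parameter because each organization sets its own.

```python
from typing import Dict, List


def teams_over_limit(toil_shares: Dict[str, float], limit: float = 0.5) -> List[str]:
    """Return teams whose toil share exceeds the limit, furthest over first."""
    excess = {team: share - limit for team, share in toil_shares.items() if share > limit}
    return sorted(excess, key=excess.get, reverse=True)


print(teams_over_limit({"db": 0.6, "web": 0.25, "network": 0.8}))  # prints: ['network', 'db']
```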
Example process: “Code Yellow” (Michael Kehoe and Todd Palino, LinkedIn, at SREcon Americas 2019)
Where to focus?
Toil:
• Reduce Technical Debt
• Re-Engineer Processes
• Enable Self-Service
Self-Service: “Do X.”
Eliminate Interruptions. Eliminate Waiting.
… and a lot less toil
How to enable self-service?
Empower teams to spot and fix the anti-patterns.
“Do this for me, do it again, then do it again.”
[Figure: Before: every “I need you to do X” becomes a ticket, a wait (“Later…”), and an interrupt before “Done.”, while your other work stalls. After: with Self-Service you do X yourself each time and keep moving on your other work.]
“I could fix it, but I can’t get to it.”
[Figure: Before: “I could fix it if I could get to it”, but reaching the environment means waiting and interrupting someone. After: “I’ve got this!”, with access to the environment through Self-Service.]
“The dog-pile.”
[Figure: Before: “I think it’s a problem with db07-store2.uswest.acme!!”, and everyone logs in to run “$ top” on the same host. After: a Self-Service job, “healthcheck store2 -all”, runs the checks in steps 1, 2, 3.]
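The slide names the job (“healthcheck store2 -all”) but not its internals. As a hedged Python sketch of the idea, one self-service job can fan the same check out to every host in a group and report once, so nobody has to pile onto a single box with `top`. The `check` callable and the extra hostnames (db06, db08) are hypothetical stand-ins; a real job would run actual probes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_healthchecks(hosts: List[str],
                     check: Callable[[str], bool],
                     max_workers: int = 8) -> Dict[str, bool]:
    """Run one health check per host in parallel and collect pass/fail results."""
    def safe_check(host: str) -> bool:
        try:
            return bool(check(host))
        except Exception:
            return False  # a crashing check counts as a failure instead of aborting the job
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as hosts
        return dict(zip(hosts, pool.map(safe_check, hosts)))


def unhealthy(results: Dict[str, bool]) -> List[str]:
    """Hosts that failed their check, sorted for a stable report."""
    return sorted(host for host, ok in results.items() if not ok)


hosts = ["db06-store2.uswest.acme", "db07-store2.uswest.acme", "db08-store2.uswest.acme"]


def fake_check(host: str) -> bool:
    return not host.startswith("db07")  # pretend only db07 is unhealthy


print(unhealthy(run_healthchecks(hosts, fake_check)))  # prints: ['db07-store2.uswest.acme']
```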
“I’m an expert, I don’t read the wiki.”
[Figure: Before: the docs warn “Service has changed. Use this flag or bad things will happen!” and “Pause monitoring first or we all get woken up!”, but the expert runs “restart -doit -now” (“I’ve done this before. I’ve got this…”), and the trouble shows up later. After: the docs feed an updated Restart Job ✅, and the expert simply runs “restart” through Self-Service. “I’ve done this before. I’ve got this.”]
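The “after” picture turns the wiki’s warnings into the job itself. A minimal sketch, with the monitoring and restart operations passed in as hypothetical callables and `--new-mode` standing in for whatever flag the changed service actually needs (the slide doesn’t name it):

```python
from typing import Callable, List


def restart_job(pause_monitoring: Callable[[], None],
                restart_service: Callable[[List[str]], None],
                resume_monitoring: Callable[[], None],
                required_flags: List[str]) -> None:
    """Encode the wiki's warnings as steps nobody can skip."""
    pause_monitoring()                   # "Pause monitoring first or we all get woken up!"
    try:
        restart_service(required_flags)  # "Use this flag or bad things will happen!"
    finally:
        resume_monitoring()              # re-enable alerts even if the restart fails


calls = []
restart_job(lambda: calls.append("pause"),
            lambda flags: calls.append("restart " + " ".join(flags)),
            lambda: calls.append("resume"),
            required_flags=["--new-mode"])
print(calls)  # prints: ['pause', 'restart --new-mode', 'resume']
```

The `try`/`finally` is the design choice that matters: the expert’s shortcut can no longer leave monitoring paused.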
“Known issue… doesn’t get permanent fix”
Self-Service Operations Design Pattern (in a nutshell)
[Figure: a consumer of Ops capabilities gets on-demand, self-service access to Ops capabilities that package specialist knowledge.]
• Pull-based
• Accept tools/languages that teams want to use
• Let people who “push buttons” define the buttons
• Define “guardrails” to provide work safety
• Build in security and compliance
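To make the pattern concrete, here is a hedged Python sketch of a single self-service “button” that combines three of the bullets: guardrails (only permitted option values), built-in security (only certain roles may run it), and compliance (every attempt is logged). The class, its fields, and the `restart-store2` job are hypothetical. Pull-based, on-demand use simply means consumers call `run` themselves instead of filing a ticket.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class SelfServiceJob:
    """A button defined by the people who push it, wrapped in guardrails."""
    name: str
    allowed_options: Dict[str, Set[str]]     # guardrails: option -> permitted values
    allowed_roles: Set[str]                  # built-in security: who may run it
    action: Callable[[Dict[str, str]], str]  # the packaged specialist knowledge
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)  # compliance trail

    def run(self, user: str, role: str, options: Dict[str, str]) -> str:
        if role not in self.allowed_roles:
            self.audit_log.append((user, self.name, "denied"))
            raise PermissionError(f"role {role!r} may not run {self.name}")
        bad = {k: v for k, v in options.items()
               if v not in self.allowed_options.get(k, set())}
        missing = set(self.allowed_options) - set(options)
        if bad or missing:
            self.audit_log.append((user, self.name, "rejected"))
            raise ValueError(f"invalid options {bad}, missing {sorted(missing)}")
        result = self.action(options)
        self.audit_log.append((user, self.name, "ok"))
        return result


restart = SelfServiceJob(
    name="restart-store2",
    allowed_options={"env": {"stage", "prod"}},
    allowed_roles={"dev", "ops"},
    action=lambda opts: f"restarted store2 in {opts['env']}",
)
print(restart.run("jane", "dev", {"env": "stage"}))  # prints: restarted store2 in stage
```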
Self-Service is ultimately about user experience
[Figure: consumers of Ops capabilities use on-demand self-service to reach Ops capabilities built on specialist knowledge.]
1. Work how they want to work (GUI, API, CLI)
2. “Guardrails” (smart options that helpfully constrain)
3. Dynamic resource model (up-to-date details of your environment)
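Points 2 and 3 work together: a guardrail is most helpful when its allowed values come from an up-to-date model of the environment rather than a hand-maintained list. A minimal sketch, assuming a hypothetical `fetch_inventory` callable that returns the current nodes as tag dictionaries (the extra hostnames are illustrative):

```python
from typing import Callable, Dict, List

Node = Dict[str, str]  # e.g. {"hostname": ..., "service": ..., "region": ...}


def host_choices(fetch_inventory: Callable[[], List[Node]], **tags: str) -> List[str]:
    """List hostnames of nodes that currently match every given tag."""
    nodes = fetch_inventory()  # re-read on each call so choices track the live environment
    return sorted(n["hostname"] for n in nodes
                  if all(n.get(key) == value for key, value in tags.items()))


inventory = [
    {"hostname": "db07-store2.uswest.acme", "service": "store2", "region": "uswest"},
    {"hostname": "db08-store2.uswest.acme", "service": "store2", "region": "uswest"},
    {"hostname": "db01-store9.useast.acme", "service": "store9", "region": "useast"},
]
print(host_choices(lambda: inventory, service="store2", region="uswest"))
# prints: ['db07-store2.uswest.acme', 'db08-store2.uswest.acme']
```

Feeding these choices into a job option (such as the target host) keeps users from picking a node that no longer exists or belongs to another service.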
Self-Service can also be a foundation for strategic initiatives
Strategic: Improve incident response times
Jody Mulkey at DOES ’15 SF: https://youtu.be/USYrDaPEFtM
[Figure: DEV, STAGE, and PROD environments, each with Services, Monitoring, and Scripts/Tools. Approved jobs are promoted between environments, and Self-Service empowers Dev & QA, NOC/Ops, and Dev to run them.]
• Reduced MTTR by 92%
• Reduced escalations by 50%
• Reduced overall support costs by 55%
Strategic: Reduce compliance burden & improve
Shaun Norris at DOES ’18 Las Vegas: https://youtu.be/d5IMvK0YHTg
Optimized for compliance:
• 86,000+ employees
• 60+ countries
• Highly regulated
[Figure: lines of business (LOB #1, LOB #2, LOB #3, … LOB n), each with Services and Scripts/Tools in the Data Center and in the Cloud, sit on a common Self-Service layer that provides Compliance and Consistency.]
12 months:
• Saved 28 person-years of time
• 13,000+ ops tasks in privileged environments that didn’t require a review
• ~200 fewer customer-impacting events
Recap: Make Tomorrow Better Than Today
• SRE is more than a title
• SRE is a new way to think about Ops work
• Be practical and start focusing on toil
• Find and fix toil anti-patterns
• Error Budgets and Toil Limits
• Apply the Self-Service Operations design pattern
Let’s talk…
@damonedwards
damon@rundeck.com

NewOps Days Boston 2019 - SysAdmin to SRE: Creating Capacity to Make Tomorrow...NewOps Days Boston 2019 - SysAdmin to SRE: Creating Capacity to Make Tomorrow...
NewOps Days Boston 2019 - SysAdmin to SRE: Creating Capacity to Make Tomorrow...
 
Getting Senior Management Support for It projects.pptx
Getting Senior Management Support for It projects.pptxGetting Senior Management Support for It projects.pptx
Getting Senior Management Support for It projects.pptx
 
Clean Code - 5
Clean Code - 5Clean Code - 5
Clean Code - 5
 
Let's bring the teams back together
Let's bring the teams back togetherLet's bring the teams back together
Let's bring the teams back together
 
Team Capability Assessment PowerPoint Presentation Slides
Team Capability Assessment PowerPoint Presentation Slides Team Capability Assessment PowerPoint Presentation Slides
Team Capability Assessment PowerPoint Presentation Slides
 
Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE
 
Be Agile. Scale Up. Stay Lean. And Have More Fun by Dean Leffingwell
Be Agile. Scale Up. Stay Lean. And Have More Fun by Dean LeffingwellBe Agile. Scale Up. Stay Lean. And Have More Fun by Dean Leffingwell
Be Agile. Scale Up. Stay Lean. And Have More Fun by Dean Leffingwell
 
Introduction to Agile
Introduction to AgileIntroduction to Agile
Introduction to Agile
 
Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE
 
EVO Energy Consulting Brand Development
EVO Energy Consulting Brand DevelopmentEVO Energy Consulting Brand Development
EVO Energy Consulting Brand Development
 
Agile v agility_v4_md
Agile v agility_v4_mdAgile v agility_v4_md
Agile v agility_v4_md
 
Competitor analysis
Competitor analysisCompetitor analysis
Competitor analysis
 
The 7 Deadly Sins Of Almost Being Agile
The 7 Deadly Sins Of Almost Being AgileThe 7 Deadly Sins Of Almost Being Agile
The 7 Deadly Sins Of Almost Being Agile
 
NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...
NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...
NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...
 
VMUG Melbourne - DevOps - Not Just for Open Source and Unicorns
VMUG Melbourne - DevOps - Not Just for Open Source and UnicornsVMUG Melbourne - DevOps - Not Just for Open Source and Unicorns
VMUG Melbourne - DevOps - Not Just for Open Source and Unicorns
 
The Last Mile Continued: Incident Management
The Last Mile Continued: Incident Management The Last Mile Continued: Incident Management
The Last Mile Continued: Incident Management
 

Plus de Rundeck

Rundeck Community Office Hours: Using Variables with Job Steps
Rundeck Community Office Hours:  Using Variables with Job Steps Rundeck Community Office Hours:  Using Variables with Job Steps
Rundeck Community Office Hours: Using Variables with Job Steps Rundeck
 
Introducing PagerDuty Process Automation
Introducing PagerDuty Process AutomationIntroducing PagerDuty Process Automation
Introducing PagerDuty Process AutomationRundeck
 
How to Build a Custom Plugin in Rundeck
How to Build a Custom Plugin in RundeckHow to Build a Custom Plugin in Rundeck
How to Build a Custom Plugin in RundeckRundeck
 
Lunch and learn: Getting started with Rundeck & Ansible
Lunch and learn:  Getting started with Rundeck & AnsibleLunch and learn:  Getting started with Rundeck & Ansible
Lunch and learn: Getting started with Rundeck & AnsibleRundeck
 
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...
Self Service Cloud Operations:  Safely Delegate the Management of your Cloud ...Self Service Cloud Operations:  Safely Delegate the Management of your Cloud ...
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...Rundeck
 
Rundeck Office Hours: Best Practices Access Control Policies
Rundeck Office Hours:  Best Practices Access Control PoliciesRundeck Office Hours:  Best Practices Access Control Policies
Rundeck Office Hours: Best Practices Access Control PoliciesRundeck
 
Mastering Secrets Management in Rundeck
Mastering Secrets Management in RundeckMastering Secrets Management in Rundeck
Mastering Secrets Management in RundeckRundeck
 
What's New in Rundeck 3.4
What's New in Rundeck 3.4   What's New in Rundeck 3.4
What's New in Rundeck 3.4 Rundeck
 
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...
Automate Yourself Out of a Job:  Safely Delegate the Management of your Azure...Automate Yourself Out of a Job:  Safely Delegate the Management of your Azure...
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...Rundeck
 
Super-Charge Your Site Reliability Practices with Runbook Automation
Super-Charge Your Site Reliability Practices with Runbook Automation Super-Charge Your Site Reliability Practices with Runbook Automation
Super-Charge Your Site Reliability Practices with Runbook Automation Rundeck
 
Introduction to Rundeck
Introduction to Rundeck Introduction to Rundeck
Introduction to Rundeck Rundeck
 
Automated Remediation with Rundeck + Sensu
Automated Remediation with Rundeck + SensuAutomated Remediation with Rundeck + Sensu
Automated Remediation with Rundeck + SensuRundeck
 
Modernizing Incident Response
Modernizing Incident Response Modernizing Incident Response
Modernizing Incident Response Rundeck
 
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]Rundeck
 
Datadog + Rundeck at DASH 2020
Datadog + Rundeck at DASH 2020Datadog + Rundeck at DASH 2020
Datadog + Rundeck at DASH 2020Rundeck
 
Rundeck Overview
Rundeck OverviewRundeck Overview
Rundeck OverviewRundeck
 
Empower Devs, Simplify Ops, and Accelerate your Digital Transformation
Empower Devs, Simplify Ops, and Accelerate your Digital TransformationEmpower Devs, Simplify Ops, and Accelerate your Digital Transformation
Empower Devs, Simplify Ops, and Accelerate your Digital TransformationRundeck
 
Advanced Cluster Settings
Advanced Cluster Settings Advanced Cluster Settings
Advanced Cluster Settings Rundeck
 
Maximizing Your Rundeck Migration
Maximizing Your Rundeck Migration Maximizing Your Rundeck Migration
Maximizing Your Rundeck Migration Rundeck
 
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...Rundeck
 

Plus de Rundeck (20)

Rundeck Community Office Hours: Using Variables with Job Steps
Rundeck Community Office Hours:  Using Variables with Job Steps Rundeck Community Office Hours:  Using Variables with Job Steps
Rundeck Community Office Hours: Using Variables with Job Steps
 
Introducing PagerDuty Process Automation
Introducing PagerDuty Process AutomationIntroducing PagerDuty Process Automation
Introducing PagerDuty Process Automation
 
How to Build a Custom Plugin in Rundeck
How to Build a Custom Plugin in RundeckHow to Build a Custom Plugin in Rundeck
How to Build a Custom Plugin in Rundeck
 
Lunch and learn: Getting started with Rundeck & Ansible
Lunch and learn:  Getting started with Rundeck & AnsibleLunch and learn:  Getting started with Rundeck & Ansible
Lunch and learn: Getting started with Rundeck & Ansible
 
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...
Self Service Cloud Operations:  Safely Delegate the Management of your Cloud ...Self Service Cloud Operations:  Safely Delegate the Management of your Cloud ...
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...
 
Rundeck Office Hours: Best Practices Access Control Policies
Rundeck Office Hours:  Best Practices Access Control PoliciesRundeck Office Hours:  Best Practices Access Control Policies
Rundeck Office Hours: Best Practices Access Control Policies
 
Mastering Secrets Management in Rundeck
Mastering Secrets Management in RundeckMastering Secrets Management in Rundeck
Mastering Secrets Management in Rundeck
 
What's New in Rundeck 3.4
What's New in Rundeck 3.4   What's New in Rundeck 3.4
What's New in Rundeck 3.4
 
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...
Automate Yourself Out of a Job:  Safely Delegate the Management of your Azure...Automate Yourself Out of a Job:  Safely Delegate the Management of your Azure...
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...
 
Super-Charge Your Site Reliability Practices with Runbook Automation
Super-Charge Your Site Reliability Practices with Runbook Automation Super-Charge Your Site Reliability Practices with Runbook Automation
Super-Charge Your Site Reliability Practices with Runbook Automation
 
Introduction to Rundeck
Introduction to Rundeck Introduction to Rundeck
Introduction to Rundeck
 
Automated Remediation with Rundeck + Sensu
Automated Remediation with Rundeck + SensuAutomated Remediation with Rundeck + Sensu
Automated Remediation with Rundeck + Sensu
 
Modernizing Incident Response
Modernizing Incident Response Modernizing Incident Response
Modernizing Incident Response
 
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
 
Datadog + Rundeck at DASH 2020
Datadog + Rundeck at DASH 2020Datadog + Rundeck at DASH 2020
Datadog + Rundeck at DASH 2020
 
Rundeck Overview
Rundeck OverviewRundeck Overview
Rundeck Overview
 
Empower Devs, Simplify Ops, and Accelerate your Digital Transformation
Empower Devs, Simplify Ops, and Accelerate your Digital TransformationEmpower Devs, Simplify Ops, and Accelerate your Digital Transformation
Empower Devs, Simplify Ops, and Accelerate your Digital Transformation
 
Advanced Cluster Settings
Advanced Cluster Settings Advanced Cluster Settings
Advanced Cluster Settings
 
Maximizing Your Rundeck Migration
Maximizing Your Rundeck Migration Maximizing Your Rundeck Migration
Maximizing Your Rundeck Migration
 
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
 

Dernier

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Dernier (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

SRE for Everyone: Making Tomorrow Better Than Today

  • 1. SRE for Everyone: Making Tomorrow Better Than Today Damon Edwards @damonedwards 2019
  • 2.
  • 3. Not that far away, maybe in a company just like yours…
  • 4. Not that far away, maybe in a company just like yours… Overloaded. Constant firefighting. [Figure: a growing pile of tickets alongside Project A and Project B; DUE: Yesterday! DUE: Tomorrow!]
  • 5. Waiting in ticket queues for everything. Not that far away, maybe in a company just like yours…
  • 6. Waiting in ticket queues for everything. [Figure: a single ticket] Not that far away, maybe in a company just like yours…
  • 7. Waiting in ticket queues for everything. [Figure: tickets piling up in a queue] Not that far away, maybe in a company just like yours…
  • 8. Things break. Break again. And again. [Figure: the same failure recurs "later… later…", and each time triggers Help! → Ticket → Wait → Interrupt] Not that far away, maybe in a company just like yours…
  • 9. Everyone is busy, but it doesn’t get any better. [Figure: time split among Business Delivery, Incidents, and an Improvement Project] Not that far away, maybe in a company just like yours…
  • 10. Overloaded. Constant firefighting. Waiting in ticket queues for everything. Things break. Break again. And again. Everyone is busy, but it doesn’t get any better. Not that far away, maybe in a company just like yours…
  • 11. Overloaded. Constant firefighting. Waiting in ticket queues for everything. Things break. Break again. And again. Everyone is busy, but it doesn’t get any better. Not that far away, maybe in a company just like yours… Everything takes too long, costs too much, and breaks too often! Executives Have you heard of SRE? Google does it.
  • 12.
  • 13. “SRE… When you ask software engineers to do operations” “SRE… Next-generation, cloud-native Operations” Class SRE implements DevOps “SRE… When Ops does more engineering than Ops”
  • 14. “SRE… When you ask software engineers to do operations” “SRE… Next-generation, cloud-native Operations” Class SRE implements DevOps “SRE… When Ops does more engineering than Ops” SRE
  • 15. Have you heard of SRE? Google does it.
  • 20. [Cartoon with Dilbert characters © Scott Adams, www.dilbert.com. Surrounded by ITIL Books 1 to 5, a “Quality is job #1” sign, a CAB calendar, and a provisioning process, the Sys Admin is told: “Your new title is SRE. Now write code and be better at ops.”]
  • 21. SysAdmins: Overloaded. Constant firefighting. Waiting in ticket queues for everything. Things break. Break again. And again. Everyone is busy, but it doesn’t get any better. Executive view: “Our transformation has largely ignored Ops. Any ideas?” “Have you heard of SRE? Google does it.” “Everything takes too long, costs too much, and breaks too often!”
  • 22. SysAdmins: Overloaded. Constant firefighting. Waiting in ticket queues for everything. Things break. Break again. And again. Everyone is busy, but it doesn’t get any better. SRE (new name): Overloaded. Constant firefighting. Waiting in ticket queues for everything. Things break. Break again. And again. Everyone is busy, but it doesn’t get any better. Executive view (unchanged): “Our transformation has largely ignored Ops. Any ideas?” “Have you heard of SRE? Google does it.” “Everything takes too long, costs too much, and breaks too often!”
  • 23. Changing job titles or adding individual skills doesn’t make systems administrators SREs.
  • 24. Changing job titles or adding individual skills doesn’t make systems administrators SREs.
  • 25. Changing job titles or adding individual skills doesn’t make systems administrators SREs. Observability Programming Skills Distributed Systems Arch. Blameless Post-Mortems
  • 26. Changing job titles or adding individual skills doesn’t make systems administrators SREs. Observability Programming Skills Distributed Systems Arch. Blameless Post-Mortems
  • 27. Changing job titles or adding individual skills doesn’t make systems administrators SREs. Not SRE: Observability, Programming Skills, Distributed Systems Arch., Blameless Post-Mortems
  • 28. Changing job titles or adding individual skills doesn’t make systems administrators SREs.
  • 29. Changing job titles or adding individual skills doesn’t make systems administrators SREs. SRE is a rethinking of how Operations work gets done.
  • 30. Principles are what make SRE different
  • 31. Principles are what make SRE different. Stephen Thorne, Google, at DevOps Enterprise Summit London 2018: “Principles of SRE” https://youtu.be/c-w_GYvi0eA
  • 32. Principles are what make SRE different. 1. SRE needs Service Level Objectives, with consequences. Stephen Thorne, Google, at DevOps Enterprise Summit London 2018: “Principles of SRE” https://youtu.be/c-w_GYvi0eA
  • 33. SLO and Error Budgets: Tools for Shared Responsibility [Figure: a 0 to 100 scale marking the Service Level Indicator, the Service Level Objective, and the Error Budget* between the SLO and 100] (*Use this to improve the service)
  • 34. SLO and Error Budgets: Tools for Shared Responsibility [Figure: a 0 to 100 scale marking the Service Level Indicator, the Service Level Objective, and the Error Budget* between the SLO and 100] (*Use this to improve the service)
  • 35. SLO and Error Budgets: Tools for Shared Responsibility [Figure: the same scale, with DEV, BIZ, and Ops all attached to the SLO] (*Use this to improve the service)
  • 36. SLO and Error Budgets: Tools for Shared Responsibility [Figure: the same scale, with DEV, BIZ, and Ops all attached to the SLO] (*Use this to improve the service) SLO takes priority!!
  • 37. Principles of SRE are what set SRE apart 1. SRE needs Service Level Objectives, with consequences
  • 38. Principles of SRE are what set SRE apart 1. SRE needs Service Level Objectives, with consequences 2. SREs have time to make tomorrow better than today
  • 39. Toil: Name For a Problem We’ve All Felt
  • 40. Toil: Name For a Problem We’ve All Felt “Toil is the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.” (Vivek Rau, Google)
  • 41. Toil vs. Engineering Work
      Toil                    | Engineering Work
      Lacks enduring value    | Builds enduring value
      Rote, repetitive        | Creative, iterative
      Tactical                | Strategic
      Increases with scale    | Enables scaling
      Can be automated        | Requires human creativity
  • 42. Excessive Toil Prevents Fixing the System [Figure: two capacity diagrams split into Toil and Engineering Work. Left, toil at a manageable percentage of capacity: engineering work goes to reducing toil and improving the business. Right, toil at an unmanageable percentage of capacity (“Engineering Bankruptcy”): no capacity to reduce toil, no capacity to improve the business.]
  • 43. Excessive Toil Prevents Fixing the System [Figure: two capacity diagrams split into Toil and Engineering Work. Left, toil at a manageable percentage of capacity: engineering work goes to reducing toil and improving the business. Right, toil at an unmanageable percentage of capacity (“Engineering Bankruptcy”): no capacity to reduce toil, no capacity to improve the business.]
  • 44. Excessive Toil Prevents Fixing the System [Figure: two capacity diagrams split into Toil and Engineering Work. Left, toil at a manageable percentage of capacity: engineering work goes to reducing toil and improving the business. Right, toil at an unmanageable percentage of capacity (“Engineering Bankruptcy”): no capacity to reduce toil, no capacity to improve the business.] Downward spiral is inevitable!
  • 45. Principles of SRE are what set SRE apart 1. SRE needs Service Level Objectives, with consequences 2. SREs have time to make tomorrow better than today
  • 46. Principles of SRE are what set SRE apart 1. SRE needs Service Level Objectives, with consequences 2. SREs have time to make tomorrow better than today 3. SRE teams have the ability to regulate their workload
  • 47. SRE teams have the ability to regulate their workload
  • 48. SRE teams have the ability to regulate their workload Example:
  • 49. SRE teams have the ability to regulate their workload Example: What if handing off responsibility to SRE/Ops wasn’t a right?
  • 50. SRE teams have the ability to regulate their workload Example: What if handing off responsibility to SRE/Ops wasn’t a right? (separate “running in production” from “run by SRE/Ops”)
  • 51. SRE teams have the ability to regulate their workload Example: What if handing off responsibility to SRE/Ops wasn’t a right? (separate “running in production” from “run by SRE/Ops”) “?!?”
  • 52. Principles of SRE are what set SRE apart 1. SRE needs Service Level Objectives, with consequences 2. SREs have time to make tomorrow better than today 3. SRE teams have the ability to regulate their workload
  • 58. Where to start (the practical approach)
    1. SRE needs Service Level Objectives, with consequences: company-wide culture change (hard!)
    2. SREs have time to make tomorrow better than today: reduce toil. Everybody wins!
    3. SRE teams have the ability to regulate their workload: company-wide culture change (hard!)
  • 61. Why focus on reducing toil?
    1. Lots of value independent of “SRE”
    2. Your people are your most expensive assets… stay out of their way!
  • 64. Start reducing toil today
    1. Track toil levels for each team
  • 68. Track toil levels for each team
    • Standardize (e.g., meetings and email are “overhead,” not “toil”)
    • Track:
        • Self-reporting
        • Periodic surveys
        • SM or PM interviews/sampling
    • Don’t get lost in the time-tracking weeds!
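Tracking toil does not need a heavy time-tracking system. As a minimal sketch, assuming each team self-reports hours against three standardized categories (the category names, the `TimeReport` record, the sample teams, and keeping overhead in the denominator are all assumptions, not part of the talk), per-team toil levels could be computed like this:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable

# Hypothetical standardized categories. Per the slide, meetings and email
# count as "overhead", not "toil".
CATEGORIES = {"toil", "overhead", "engineering"}


@dataclass
class TimeReport:
    team: str
    category: str  # must be one of CATEGORIES
    hours: float


def toil_percentage(reports: Iterable[TimeReport]) -> Dict[str, float]:
    """Return each team's toil as a percentage of all hours it reported.

    Assumption: overhead stays in the denominator, so a team's toil,
    overhead, and engineering shares add up to 100%.
    """
    total_hours: Dict[str, float] = defaultdict(float)
    toil_hours: Dict[str, float] = defaultdict(float)
    for report in reports:
        if report.category not in CATEGORIES:
            raise ValueError(f"unknown category: {report.category!r}")
        total_hours[report.team] += report.hours
        if report.category == "toil":
            toil_hours[report.team] += report.hours
    return {
        team: 100.0 * toil_hours[team] / total
        for team, total in total_hours.items()
        if total > 0
    }


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        TimeReport("payments", "toil", 24),
        TimeReport("payments", "engineering", 12),
        TimeReport("payments", "overhead", 4),
        TimeReport("search", "toil", 10),
        TimeReport("search", "engineering", 30),
    ]
    print(toil_percentage(sample))  # {'payments': 60.0, 'search': 25.0}
```

Rejecting unknown categories is the code-level version of “standardize”: a report filed as “meetings” fails loudly instead of silently skewing the numbers.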
  • 72. Start reducing toil today
    1. Track toil levels for each team
    2. Set a toil limit for each team (50% is conventional wisdom)
    3. Fund efforts to reduce toil (with emphasis on teams already over the limit)
    Example process: “Code Yellow” (Michael Kehoe and Todd Palino, LinkedIn, at SREcon Americas 2019)
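Steps 2 and 3 can be sketched the same way. Assuming per-team toil percentages are already being tracked (the `teams_over_limit` helper and the sample numbers are hypothetical), this flags teams above the 50% limit and orders them by how far over they are, so the teams already over the limit come first when funding toil reduction:

```python
from typing import Dict, List, Tuple

# The slide's "conventional wisdom" toil limit, as a percentage of capacity.
TOIL_LIMIT_PERCENT = 50.0


def teams_over_limit(
    toil_by_team: Dict[str, float], limit: float = TOIL_LIMIT_PERCENT
) -> List[Tuple[str, float]]:
    """Return (team, points over the limit) pairs, worst offenders first.

    A team exactly at the limit is not counted as over it.
    """
    over = [
        (team, percent - limit)
        for team, percent in toil_by_team.items()
        if percent > limit
    ]
    # Largest overage first, so funding goes to the worst cases first.
    return sorted(over, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(teams_over_limit({"payments": 60.0, "search": 25.0, "storage": 85.0}))
    # [('storage', 35.0), ('payments', 10.0)]
```

The `limit` parameter leaves room for a team to set its own limit instead of the 50% default.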
  • 76. Where to focus to reduce toil?
    • Reduce technical debt
    • Re-engineer processes
    • Enable self-service
  • 82. How to enable self-service? Empower teams to spot and fix the anti-patterns.
  • 83. “Do this for me, do it again, then do it again.”
    [Diagram. Before: each “I need you to do X” request becomes a ticket; the requester waits, the person who can do X is interrupted, and every repeat means another wait and another interrupt before “Done.” After: the requester does X through self-service and gets “Done.” right away, every time, while the other person keeps getting their own work done.]
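  • 86. “I could fix it, but I can’t get to it.”
    [Diagram. Before: “I could fix it if I could get to it.” Someone able to fix the problem has no access to the environment, so they wait on, and interrupt, someone who does. After: with self-service access to the environment, “I’ve got this!”]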
  • 87. “The dog-pile.”
    [Diagram. Before: someone says “I think it’s a problem with db07-store2.uswest.acme,” and everyone piles onto the host, each running “$ top.” After (self-service): the same report is answered with a single “healthcheck store2 -all” run instead of a pile of logins.]
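The self-service fix for the dog-pile is one shared check that anyone can run. A minimal sketch, assuming a hypothetical inventory that maps a service name to its hosts and a hypothetical check function (a real version would query monitoring instead of faking the numbers; `db08-store2.uswest.acme` is made up for the example):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A check takes a hostname and returns (ok, detail).
Check = Callable[[str], Tuple[bool, str]]


@dataclass
class CheckResult:
    host: str
    check: str
    ok: bool
    detail: str


def healthcheck(
    inventory: Dict[str, List[str]], service: str, checks: Dict[str, Check]
) -> List[CheckResult]:
    """Run every check once against every host of a service."""
    results = []
    for host in inventory[service]:
        for name, check in checks.items():
            ok, detail = check(host)
            results.append(CheckResult(host, name, ok, detail))
    return results


def summarize(results: List[CheckResult]) -> str:
    """One shared answer instead of everyone running `top` themselves."""
    failing = [r for r in results if not r.ok]
    if not failing:
        return "all checks passed"
    return "; ".join(f"{r.host} {r.check}: {r.detail}" for r in failing)


def fake_load_check(host: str) -> Tuple[bool, str]:
    # Hypothetical stand-in: pretend only db07 is overloaded.
    if host.startswith("db07"):
        return False, "load average 42.0"
    return True, "load average 0.7"


if __name__ == "__main__":
    inventory = {"store2": ["db07-store2.uswest.acme", "db08-store2.uswest.acme"]}
    results = healthcheck(inventory, "store2", {"load": fake_load_check})
    print(summarize(results))
    # db07-store2.uswest.acme load: load average 42.0
```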
  • 90. “I’m an expert, I don’t read the wiki.”
    [Diagram. Before: the docs warn “Service has changed. Use this flag or bad things will happen!” and “Pause monitoring first or we all get woken up!”, but the expert thinks “I’ve done this before. I’ve got this…” and runs “restart -doit -now” against the environment; the consequences arrive later. After: those warnings are built into an updated Restart Job ✅ that runs as a self-service “restart,” so “I’ve done this before. I’ve got this.” holds true.]
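“Update Restart Job” means moving the wiki’s warnings into the job itself, so the safe path is the only path. A minimal sketch, assuming hypothetical hooks for pausing and resuming monitoring and for running a command; `--required-flag` is a placeholder for whatever flag the changed service actually needs, and `web01` is a made-up host:

```python
from typing import Callable, List


class RestartJob:
    """Self-service restart with the wiki's warnings built in."""

    # Hypothetical placeholder for "use this flag or bad things will happen".
    REQUIRED_FLAG = "--required-flag"

    def __init__(
        self,
        pause_monitoring: Callable[[str], None],
        resume_monitoring: Callable[[str], None],
        run_command: Callable[[List[str]], None],
    ) -> None:
        self._pause = pause_monitoring
        self._resume = resume_monitoring
        self._run = run_command

    def run(self, host: str) -> None:
        # "Pause monitoring first or we all get woken up!"
        self._pause(host)
        try:
            self._run(["restart", self.REQUIRED_FLAG, host])
        finally:
            # Always turn monitoring back on, even if the restart fails.
            self._resume(host)


if __name__ == "__main__":
    log = []
    job = RestartJob(
        pause_monitoring=lambda h: log.append(("pause", h)),
        resume_monitoring=lambda h: log.append(("resume", h)),
        run_command=lambda cmd: log.append(("run", " ".join(cmd))),
    )
    job.run("web01")
    print(log)
    # [('pause', 'web01'), ('run', 'restart --required-flag web01'), ('resume', 'web01')]
```

Injecting the hooks keeps the job testable and lets the same job run against whatever monitoring and execution tools a team already uses.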
  • 91. “Known issue… doesn’t get a permanent fix”
  • 98. Self-Service Operations Design Pattern (in a nutshell)
    [Diagram: a consumer of ops capabilities uses self-service, on-demand ops capabilities that package specialist knowledge.]
    • Pull-based
    • Accept the tools/languages that teams want to use
    • Define “guardrails” to provide work safety
    • Let people who “push buttons” define the buttons
    • Build in security and compliance
  • 102. Self-Service is ultimately about user experience
    1. Work how they want to work (GUI, API, CLI)
    2. “Guardrails” (smart options that helpfully constrain)
    3. Dynamic resource model (up-to-date details of your environment)
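Guardrails and a dynamic resource model fit together: the options a user can pick come from the live environment, and anything outside them is rejected before a job runs. A minimal sketch, assuming a hypothetical `fetch_inventory` callable (standing in for a CMDB, cloud API, or discovery tool) and a made-up batch-size limit:

```python
from typing import Callable, Dict, List

# Hypothetical source of truth: returns {service: [hosts]} at call time.
FetchInventory = Callable[[], Dict[str, List[str]]]

MAX_BATCH = 5  # hypothetical guardrail: never touch more than 5 hosts at once


class GuardrailError(ValueError):
    """Raised when a request falls outside the job's guardrails."""


def allowed_hosts(service: str, fetch_inventory: FetchInventory) -> List[str]:
    """Current choices for a GUI dropdown, CLI completion, or API docs."""
    return sorted(fetch_inventory().get(service, []))


def validate_request(
    service: str, hosts: List[str], fetch_inventory: FetchInventory
) -> List[str]:
    """Return the hosts if the request is within the guardrails; raise otherwise."""
    known = set(allowed_hosts(service, fetch_inventory))
    if not known:
        raise GuardrailError(f"unknown service: {service!r}")
    unknown = [h for h in hosts if h not in known]
    if unknown:
        raise GuardrailError(f"not part of {service}: {unknown}")
    if not hosts or len(hosts) > MAX_BATCH:
        raise GuardrailError(f"pick between 1 and {MAX_BATCH} hosts")
    return hosts
```

Because the inventory is fetched on every call, the choices stay current as hosts come and go; and if the GUI, API, and CLI front ends all call the same `validate_request`, people can work however they like without bypassing the guardrails.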
  • 103. Self-Service can also be a foundation for strategic initiatives
  • 107. Strategic: Improve incident response times (Jody Mulkey at DOES ’15 SF, https://youtu.be/USYrDaPEFtM)
    [Diagram: DEV, STAGE, and PROD environments, each with services, monitoring, and scripts/tools. Approved jobs are promoted across environments and offered as self-service to Dev & QA and NOC/Ops, empowering Dev.]
    • Reduced MTTR by 92%
    • Reduced escalations by 50%
    • Reduced overall support costs by 55%
  • 111. Strategic: Reduce compliance burden & improve (Shaun Norris at DOES ’18 Las Vegas, https://youtu.be/d5IMvK0YHTg)
    Optimized for compliance:
    • 86,000+ employees
    • 60+ countries
    • Highly regulated
    [Diagram: lines of business (LOB #1, LOB #2, LOB #3, … LOB n), each with services and scripts/tools in data centers and the cloud, brought together by self-service for compliance and consistency.]
    12 months:
    • Saved 28 person-years of time
    • 13,000+ ops tasks in privileged environments that didn’t require a review
    • ~200 fewer customer-impacting events
  • 112. Recap: Make Tomorrow Better Than Today
    • SRE is more than a title
    • SRE is a new way to think about Ops work
    • Principles of SRE:
        1. SRE needs Service Level Objectives, with consequences
        2. SREs have time to make tomorrow better than today
        3. SRE teams have the ability to regulate their workload
    • Error Budgets and Toil Limits
    • Be practical and start focusing on toil
    • Find and fix toil anti-patterns
    • Apply the Self-Service Operations design pattern
    [Thumbnails of earlier slides: a Dilbert cartoon (“Your new title is SRE. Now write code and be better at ops.”) alongside ITIL Books 1 to 5, a CAB calendar, a provisioning process, and “Quality is job #1”; the toil vs. engineering work capacity diagram (“Engineering Bankruptcy”); a Service Level Indicator on a 0 to 100 scale against the Service Level Objective, with the error budget* (*use this to improve the service); the before/after self-service diagram; and the Self-Service Operations pattern.]
    Dilbert characters © Scott Adams, www.dilbert.com
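The error budget in the recap is simple arithmetic: with an SLO of 99.9%, the budget is the remaining 0.1% of events that may fail. A minimal sketch, assuming a request-based SLI (good events divided by total events); the names are hypothetical, and what happens when the budget runs out (for example, pausing feature launches in favor of reliability work) is a policy each organization has to choose:

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    sli_percent: float          # measured Service Level Indicator
    allowed_bad_events: float   # failures the SLO tolerates in this window
    remaining_fraction: float   # 1.0 = untouched, 0.0 or less = exhausted

    @property
    def exhausted(self) -> bool:
        return self.remaining_fraction <= 0.0


def error_budget(slo_percent: float, good_events: int, total_events: int) -> ErrorBudget:
    """Compare a request-based SLI against an SLO for one time window."""
    if total_events <= 0:
        raise ValueError("need at least one event")
    if not 0.0 <= slo_percent < 100.0:
        raise ValueError("SLO must be in [0, 100)")
    sli_percent = 100.0 * good_events / total_events
    allowed_bad = total_events * (100.0 - slo_percent) / 100.0
    bad_events = total_events - good_events
    remaining = 1.0 - bad_events / allowed_bad
    return ErrorBudget(sli_percent, allowed_bad, remaining)


if __name__ == "__main__":
    budget = error_budget(99.9, good_events=999_500, total_events=1_000_000)
    print(round(budget.sli_percent, 3), round(budget.remaining_fraction, 3), budget.exhausted)
    # 99.95 0.5 False
```

Here 500 failed requests out of 1,000,000 spend half of the roughly 1,000 failures a 99.9% SLO allows, leaving half the budget to spend on change.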