SlideShare une entreprise Scribd logo
SysAdmin to SRE:
Creating Capacity to Make Tomorrow Better Than Today
How Runbook Automation for Incident Management, and Other Self-Service Operations Practices
Can Ignite the way to True SRE Outcomes
jorn knuttila
@jorn_knuttila
Not that far away, maybe in a company just like yours…
🔥
Overloaded. Constant firefighting.
Ticket
Ticket
Project A
···
Project B
···
Ticket
Ticket
Ticket
Ticket
Ticket
Ticket
Ticket
Ticket
Ticket
Ticket
Ticket
DUE: Yesterday! DUE: Tomorrow!
Ticket
Ticket
Ticket
🔥
Waiting in ticket queues for everything.
Not that far away, maybe in a company just like yours…
Ticket
Ticket
Ticket
Ticket
Ticket
Ticket
🧨
Things break. Break again. And again.
Not that far away, maybe in a company just like yours…
Later…
Later…
same
same
Help!
Ticket
Wait Interrupt
Help!
Ticket
Wait Interrupt
Help!
Ticket
Wait Interrupt
⁉
Everyone is busy, but it doesn’t get any better.
Not that far away, maybe in a company just like yours…
Improvement
Project
Business
Delivery
Incidents
Business
Delivery
Business
Delivery
🔥
Overloaded. Constant firefighting.
🔥
Waiting in ticket queues for everything.
🔥
Things break. Break again. And again.
🔥
Everyone is busy, but it doesn’t get any better.
Not that far away, maybe in a company just like yours…
Everything takes too long, costs
too much, and breaks too often!
Executives
SRE
Have you heard of SRE?
Google does it.
Jane Doe
Systems Administrator
We have
SysAdmins
Jane Doe
SRE
We have
SysAdmins
They should be
SREs!
SysAdmins
Overloaded. Constant
firefighting.
Waiting in ticket queues
for everything.
Things break. Break again.
And again.
Everyone is busy, but it
doesn’t get any better.
Everything takes too
long, cost too much, and
break too often!
Executive
View
SRE (new name)
Overloaded. Constant
firefighting.
Waiting in ticket queues
for everything.
Things break. Break again.
And again.
Everyone is busy, but it
doesn’t get any better.
Everything takes too
long, cost too much, and
break too often!
Executive
View
Changing job titles or adding individual skills
doesn’t make systems administrators SREs.
Observability
Programming
Skills
Distributed
Systems Arch.
Blameless
Post-Mortems
000000000000000
Observability
Programming
Skills
Distributed
Systems Arch.
Blameless
Post-Mortems
000000000000000
Not SRE
Changing job titles or adding individual skills
doesn’t make systems administrators SREs.
Changing job titles or adding individual skills
doesn’t make systems administrators SREs.
SRE is a rethinking of how Operations work gets
done.
Principles are what makes SRE different
1. SRE needs Service Level Objectives, with consequences
Stephen Thorne, Google
At DevOps Enterprise Summit
London 2018
“Principles of
SRE”
https://youtu.be/c-w_GYvi0eA
SLO and Error Budgets: Tools for Shared Responsibility
0
100
Service Level Objective
Error Budget*
Service Level Indicator
(*Use this to improve the service)
SLO and Error Budgets: Tools for Shared Responsibility
DEV
BIZ
Ops
SLO takes priority!!
0
100
Service Level Objective
Error Budget*
Service Level Indicator
(*Use this to improve the service)
Principles of SRE are what set SRE apart
1. SRE needs Service Level Objectives, with consequences
2. SREs have time to make tomorrow better than today
Toil: Name For a Problem We’ve All Felt
“Toil is the kind of work tied to running a production
service that tends to be manual, repetitive,
automatable, tactical, devoid of enduring value, and
that scales linearly as a service grows.” -Vivek Rau (Google)
Toil vs. Engineering Work
Toil Engineering Work
Lacks Enduring Value Builds Enduring Value
Rote, Repetitive Creative, Iterative
Tactical Strategic
Increases With Scale Enables Scaling
Can Be Automated Requires Human Creativity
Toil Engineering Work
E.W.Toil
Reduce toil
Improve the business ǡ
No capacity to reduce toil
No capacity to improve business
Toil at manageable percentage of capacity
Toil at unmanageable percentage of capacity (“Engineering Bankruptcy”)
Excessive Toil Prevents Fixing the System
Downward spiral is inevitable!
Principles of SRE are what set SRE apart
1. SRE needs Service Level Objectives, with consequences
2. SREs have time to make tomorrow better than today
3. SRE teams have the ability to regulate their workload
SRE teams have the ability to regulate their workload
What if handing-off responsibility to SRE/Ops wasn’t a right?
(separate the “running in production” from “run by SRE/Ops”)
“?!?”
Where to start (the practical approach)
1. SRE needs Service Level Objectives, with consequences
2. SREs have time to make tomorrow better than today
3. SRE teams have the ability to regulate their workload
Company-wide culture change (hard!)
Company-wide culture change (hard!)
Reduce toil.
Everybody wins!
2. Your people are you most expensive assets
… stay out of their way!
Why focus on reducing toil?
1. Lots of value independent of “SRE”
Super easy to get started reducing toil
1. Track toil levels for each team
2. Set toil limit for each team (50% is conventional wisdom)
3. Fund efforts to reduce toil (with emphasis on teams already over limit)
Toil
↳ Refactor apps, tools, and processes
↳ Apply self-service design pattern
How to enable self-service?
Empower teams to spot and fix the anti-patterns.
“Do this for me, do it again, then do it again.”
Toil Toil
“I could fix it, but I can’t get to it.”
Toil Toil
“The dog-pile.”
Toil Toil
What do people read even less than
documentation?
“I’m an expert, I don’t read the wiki.”
Toil Toil
“Dev work is more expensive than Ops work”
Toil
Toil
Self-Service AutomationSelf-Service Automation
for
all
your
Self-Service Operations Design Pattern (in a nutshell)
Pull-Based
Accept tools/languages
that teams want to use
Let people who
“push buttons”
define the buttons
Build in security
and compliance
Define “guardrails” to
provide work safety
Recap: Creating Capacity to Make Tomorrow Better Than Today
SRE is more than a title
Be practical and start focusing
on toil
Find and fix toil anti-patterns
Error Budgets and Toil Limits
Apply Self-Service Operations
design pattern
SRE is a new way to think
about Ops work
1. SRE needs Service Level
Objectives, with consequences
2. SREs have time to make
tomorrow better than today
3. SRE teams have the ability to
regulate their workload
Toil
Let’s talk…
@jorn_knuttila
📧 jorn@rundeck.rocks

Contenu connexe

Similaire à 2019-11 NewOpsDays Dallas - Sysadmin to SRE _v1.1

It's a startup life: from idea to execution.
It's a startup life: from idea to execution.It's a startup life: from idea to execution.
It's a startup life: from idea to execution.Miet Claes
 
What needs to be true? Patterns of engineering agility
What needs to be true? Patterns of engineering agilityWhat needs to be true? Patterns of engineering agility
What needs to be true? Patterns of engineering agilityAndy Norton
 
The History of DevOps (and what you need to do about it)
The History of DevOps (and what you need to do about it)The History of DevOps (and what you need to do about it)
The History of DevOps (and what you need to do about it)dev2ops
 
Who owns the AV department - Creating an AV Master Plan
Who owns the AV department - Creating an AV Master PlanWho owns the AV department - Creating an AV Master Plan
Who owns the AV department - Creating an AV Master PlanHarald Steindl
 
Who owns the AV department - Creating an AV Master Plan
Who owns the AV department - Creating an AV Master PlanWho owns the AV department - Creating an AV Master Plan
Who owns the AV department - Creating an AV Master PlanHarald Unger
 
DevOps Paradox: Going Faster Brings Higher Quality, Lower Costs, & Better Out...
DevOps Paradox: Going Faster Brings Higher Quality, Lower Costs, & Better Out...DevOps Paradox: Going Faster Brings Higher Quality, Lower Costs, & Better Out...
DevOps Paradox: Going Faster Brings Higher Quality, Lower Costs, & Better Out...dev2ops
 
If you don't know where you're going it doesn't matter how fast you get there
If you don't know where you're going it doesn't matter how fast you get thereIf you don't know where you're going it doesn't matter how fast you get there
If you don't know where you're going it doesn't matter how fast you get thereNicole Forsgren
 
It's Okay to be Wrong (Accelerator Academy Oct '17)
It's Okay to be Wrong (Accelerator Academy Oct '17)It's Okay to be Wrong (Accelerator Academy Oct '17)
It's Okay to be Wrong (Accelerator Academy Oct '17)Matt Mower
 
Devops at scale is a hard problem challenges, insights and lessons learned
Devops at scale is a hard problem  challenges, insights and lessons learnedDevops at scale is a hard problem  challenges, insights and lessons learned
Devops at scale is a hard problem challenges, insights and lessons learnedkjalleda
 
Its not about the tooling
Its not about the toolingIts not about the tooling
Its not about the toolingBram Vogelaar
 
Master Technical Recruiting Workshop: How to Recruit Top Tech Talent
Master Technical Recruiting Workshop:  How to Recruit Top Tech TalentMaster Technical Recruiting Workshop:  How to Recruit Top Tech Talent
Master Technical Recruiting Workshop: How to Recruit Top Tech TalentRecruitingDaily.com LLC
 
A Self Funding Agile Transformation
A Self Funding Agile TransformationA Self Funding Agile Transformation
A Self Funding Agile TransformationDaniel Poon
 
Chasingwindmills agile success
Chasingwindmills agile successChasingwindmills agile success
Chasingwindmills agile successPaul Boos
 
Agile Practice in a DevOps World
Agile Practice in a DevOps WorldAgile Practice in a DevOps World
Agile Practice in a DevOps WorldMagnus Hedemark
 
Monitoring Complex Systems - Chicago Erlang, 2014
Monitoring Complex Systems - Chicago Erlang, 2014Monitoring Complex Systems - Chicago Erlang, 2014
Monitoring Complex Systems - Chicago Erlang, 2014Brian Troutwine
 
Let's bring the teams back together
Let's bring the teams back togetherLet's bring the teams back together
Let's bring the teams back togetherKris Buytaert
 
Wait A Moment? How High Workload Kills Efficiency! - Roman Pickl
Wait A Moment? How High Workload Kills Efficiency! - Roman PicklWait A Moment? How High Workload Kills Efficiency! - Roman Pickl
Wait A Moment? How High Workload Kills Efficiency! - Roman PicklPROIDEA
 
Devops is not about Tooling
Devops is not about ToolingDevops is not about Tooling
Devops is not about ToolingKris Buytaert
 

Similaire à 2019-11 NewOpsDays Dallas - Sysadmin to SRE _v1.1 (20)

Agile Coach Retreat - Montreal - Sep-2013
Agile Coach Retreat - Montreal - Sep-2013Agile Coach Retreat - Montreal - Sep-2013
Agile Coach Retreat - Montreal - Sep-2013
 
It's a startup life: from idea to execution.
It's a startup life: from idea to execution.It's a startup life: from idea to execution.
It's a startup life: from idea to execution.
 
What needs to be true? Patterns of engineering agility
What needs to be true? Patterns of engineering agilityWhat needs to be true? Patterns of engineering agility
What needs to be true? Patterns of engineering agility
 
The History of DevOps (and what you need to do about it)
The History of DevOps (and what you need to do about it)The History of DevOps (and what you need to do about it)
The History of DevOps (and what you need to do about it)
 
Who owns the AV department - Creating an AV Master Plan
Who owns the AV department - Creating an AV Master PlanWho owns the AV department - Creating an AV Master Plan
Who owns the AV department - Creating an AV Master Plan
 
Who owns the AV department - Creating an AV Master Plan
Who owns the AV department - Creating an AV Master PlanWho owns the AV department - Creating an AV Master Plan
Who owns the AV department - Creating an AV Master Plan
 
DevOps Paradox: Going Faster Brings Higher Quality, Lower Costs, & Better Out...
DevOps Paradox: Going Faster Brings Higher Quality, Lower Costs, & Better Out...DevOps Paradox: Going Faster Brings Higher Quality, Lower Costs, & Better Out...
DevOps Paradox: Going Faster Brings Higher Quality, Lower Costs, & Better Out...
 
If you don't know where you're going it doesn't matter how fast you get there
If you don't know where you're going it doesn't matter how fast you get thereIf you don't know where you're going it doesn't matter how fast you get there
If you don't know where you're going it doesn't matter how fast you get there
 
It's Okay to be Wrong (Accelerator Academy Oct '17)
It's Okay to be Wrong (Accelerator Academy Oct '17)It's Okay to be Wrong (Accelerator Academy Oct '17)
It's Okay to be Wrong (Accelerator Academy Oct '17)
 
Devops at scale is a hard problem challenges, insights and lessons learned
Devops at scale is a hard problem  challenges, insights and lessons learnedDevops at scale is a hard problem  challenges, insights and lessons learned
Devops at scale is a hard problem challenges, insights and lessons learned
 
Its not about the tooling
Its not about the toolingIts not about the tooling
Its not about the tooling
 
Master Technical Recruiting Workshop: How to Recruit Top Tech Talent
Master Technical Recruiting Workshop:  How to Recruit Top Tech TalentMaster Technical Recruiting Workshop:  How to Recruit Top Tech Talent
Master Technical Recruiting Workshop: How to Recruit Top Tech Talent
 
A Self Funding Agile Transformation
A Self Funding Agile TransformationA Self Funding Agile Transformation
A Self Funding Agile Transformation
 
The Future of Work
The Future of WorkThe Future of Work
The Future of Work
 
Chasingwindmills agile success
Chasingwindmills agile successChasingwindmills agile success
Chasingwindmills agile success
 
Agile Practice in a DevOps World
Agile Practice in a DevOps WorldAgile Practice in a DevOps World
Agile Practice in a DevOps World
 
Monitoring Complex Systems - Chicago Erlang, 2014
Monitoring Complex Systems - Chicago Erlang, 2014Monitoring Complex Systems - Chicago Erlang, 2014
Monitoring Complex Systems - Chicago Erlang, 2014
 
Let's bring the teams back together
Let's bring the teams back togetherLet's bring the teams back together
Let's bring the teams back together
 
Wait A Moment? How High Workload Kills Efficiency! - Roman Pickl
Wait A Moment? How High Workload Kills Efficiency! - Roman PicklWait A Moment? How High Workload Kills Efficiency! - Roman Pickl
Wait A Moment? How High Workload Kills Efficiency! - Roman Pickl
 
Devops is not about Tooling
Devops is not about ToolingDevops is not about Tooling
Devops is not about Tooling
 

Dernier

The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform EngineeringJemma Hussein Allen
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoTAnalytics
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsPaul Groth
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyJohn Staveley
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaRTTS
 
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»QADay
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxAbida Shariff
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...Sri Ambati
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backElena Simperl
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀DianaGray10
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...Product School
 

Dernier (20)

The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 

2019-11 NewOpsDays Dallas - Sysadmin to SRE _v1.1

  • 1. SysAdmin to SRE: Creating Capacity to Make Tomorrow Better Than Today How Runbook Automation for Incident Management, and Other Self-Service Operations Practices Can Ignite the way to True SRE Outcomes jorn knuttila @jorn_knuttila
  • 2. Not that far away, maybe in a company just like yours… 🔥 Overloaded. Constant firefighting. Ticket Ticket Project A ··· Project B ··· Ticket Ticket Ticket Ticket Ticket Ticket Ticket Ticket Ticket Ticket Ticket DUE: Yesterday! DUE: Tomorrow! Ticket Ticket Ticket
  • 3. 🔥 Waiting in ticket queues for everything. Not that far away, maybe in a company just like yours… Ticket Ticket Ticket Ticket Ticket Ticket
  • 4. 🧨 Things break. Break again. And again. Not that far away, maybe in a company just like yours… Later… Later… same same Help! Ticket Wait Interrupt Help! Ticket Wait Interrupt Help! Ticket Wait Interrupt
  • 5. ⁉ Everyone is busy, but it doesn’t get any better. Not that far away, maybe in a company just like yours… Improvement Project Business Delivery Incidents Business Delivery Business Delivery
  • 6. 🔥 Overloaded. Constant firefighting. 🔥 Waiting in ticket queues for everything. 🔥 Things break. Break again. And again. 🔥 Everyone is busy, but it doesn’t get any better. Not that far away, maybe in a company just like yours… Everything takes too long, costs too much, and breaks too often! Executives
  • 7. SRE
  • 8. Have you heard of SRE? Google does it.
  • 11. SysAdmins Overloaded. Constant firefighting. Waiting in ticket queues for everything. Things break. Break again. And again. Everyone is busy, but it doesn’t get any better. Everything takes too long, cost too much, and break too often! Executive View SRE (new name) Overloaded. Constant firefighting. Waiting in ticket queues for everything. Things break. Break again. And again. Everyone is busy, but it doesn’t get any better. Everything takes too long, cost too much, and break too often! Executive View
  • 12. Changing job titles or adding individual skills doesn’t make systems administrators SREs. Observability Programming Skills Distributed Systems Arch. Blameless Post-Mortems 000000000000000
  • 13. Observability Programming Skills Distributed Systems Arch. Blameless Post-Mortems 000000000000000 Not SRE Changing job titles or adding individual skills doesn’t make systems administrators SREs.
  • 14. Changing job titles or adding individual skills doesn’t make systems administrators SREs. SRE is a rethinking of how Operations work gets done.
  • 15. Principles are what makes SRE different 1. SRE needs Service Level Objectives, with consequences Stephen Thorne, Google At DevOps Enterprise Summit London 2018 “Principles of SRE” https://youtu.be/c-w_GYvi0eA
  • 16. SLO and Error Budgets: Tools for Shared Responsibility 0 100 Service Level Objective Error Budget* Service Level Indicator (*Use this to improve the service)
  • 17. SLO and Error Budgets: Tools for Shared Responsibility DEV BIZ Ops SLO takes priority!! 0 100 Service Level Objective Error Budget* Service Level Indicator (*Use this to improve the service)
  • 18. Principles of SRE are what set SRE apart 1. SRE needs Service Level Objectives, with consequences 2. SREs have time to make tomorrow better than today
  • 19. Toil: Name For a Problem We’ve All Felt “Toil is the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.” -Vivek Rau (Google)
  • 20. Toil vs. Engineering Work Toil Engineering Work Lacks Enduring Value Builds Enduring Value Rote, Repetitive Creative, Iterative Tactical Strategic Increases With Scale Enables Scaling Can Be Automated Requires Human Creativity
  • 21. Toil Engineering Work E.W.Toil Reduce toil Improve the business ǡ No capacity to reduce toil No capacity to improve business Toil at manageable percentage of capacity Toil at unmanageable percentage of capacity (“Engineering Bankruptcy”) Excessive Toil Prevents Fixing the System Downward spiral is inevitable!
  • 22. Principles of SRE are what set SRE apart 1. SRE needs Service Level Objectives, with consequences 2. SREs have time to make tomorrow better than today 3. SRE teams have the ability to regulate their workload
  • 23. SRE teams have the ability to regulate their workload What if handing-off responsibility to SRE/Ops wasn’t a right? (separate the “running in production” from “run by SRE/Ops”) “?!?”
  • 24. Where to start (the practical approach) 1. SRE needs Service Level Objectives, with consequences 2. SREs have time to make tomorrow better than today 3. SRE teams have the ability to regulate their workload Company-wide culture change (hard!) Company-wide culture change (hard!) Reduce toil. Everybody wins!
  • 25. 2. Your people are you most expensive assets … stay out of their way! Why focus on reducing toil? 1. Lots of value independent of “SRE”
  • 26. Super easy to get started reducing toil 1. Track toil levels for each team 2. Set toil limit for each team (50% is conventional wisdom) 3. Fund efforts to reduce toil (with emphasis on teams already over limit) Toil ↳ Refactor apps, tools, and processes ↳ Apply self-service design pattern
  • 27. How to enable self-service? Empower teams to spot and fix the anti-patterns.
  • 28. “Do this for me, do it again, then do it again.” Toil Toil
  • 29. “I could fix it, but I can’t get to it.” Toil Toil
  • 31. What do people read even less than documentation?
  • 32. “I’m an expert, I don’t read the wiki.” Toil Toil
  • 33. “Dev work is more expensive than Ops work” Toil Toil
  • 35. Self-Service Operations Design Pattern (in a nutshell) Pull-Based Accept tools/languages that teams want to use Let people who “push buttons” define the buttons Build in security and compliance Define “guardrails” to provide work safety
  • 36. Recap: Creating Capacity to Make Tomorrow Better Than Today SRE is more than a title Be practical and start focusing on toil Find and fix toil anti-patterns Error Budgets and Toil Limits Apply Self-Service Operations design pattern SRE is a new way to think about Ops work 1. SRE needs Service Level Objectives, with consequences 2. SREs have time to make tomorrow better than today 3. SRE teams have the ability to regulate their workload Toil