SlideShare une entreprise Scribd logo
1  sur  39
Lessons
Learned
from a
Parallel Universe
David N. Blank-Edelman
@otterbook
photo: Will Langenberg
SRE for
DevOps
David N. Blank-Edelman
@otterbook
What Makes SRE, SRE (dramatic recreation)
• hire only coders
• have an SLA for your service
• measure and report performance against SLA
• Use Error Budgets and gate launches on them
• Common staffing pool for SRE and DEV
• Excess Ops work overflows to DEV team
• Cap SRE operational load at 50%
• Share 5% of ops work with DEV team
• Oncall teams at least 8 people, or 6x2
• Maximum of 2 events per oncall shift
• Post mortem for every event
• Post mortems are blameless and focus on process and technology, not people
What Makes SRE, SRE (dramatic recreation)
• hire only coders
• have an SLA for your service
• measure and report performance against SLA
• Use Error Budgets and gate launches on them
• Common staffing pool for SRE and DEV
• Excess Ops work overflows to DEV team
• Cap SRE operational load at 50%
• Share 5% of ops work with DEV team
• Oncall teams at least 8 people, or 6x2
• Maximum of 2 events per oncall shift
• Post mortem for every event
• Post mortems are blameless and focus on process and technology, not people
SLO monitor decide
What Makes SRE, SRE (dramatic recreation)
• hire only coders
• have an SLA for your service
• measure and report performance against SLA
• Use Error Budgets and gate launches on them
• Common staffing pool for SRE and DEV
• Excess Ops work overflows to DEV team
• Cap SRE operational load at 50%
• Share 5% of ops work with DEV team
• Oncall teams at least 8 people, or 6x2
• Maximum of 2 events per oncall shift
• Post mortem for every event
• Post mortems are blameless and focus on process and technology, not people
What Makes SRE, SRE (dramatic recreation)
• hire only coders
• have an SLA for your service
• measure and report performance against SLA
• Use Error Budgets and gate launches on them
• Common staffing pool for SRE and DEV
• Excess Ops work overflows to DEV team
• Cap SRE operational load at 50%
• Share 5% of ops work with DEV team
• Oncall teams at least 8 people, or 6x2
• Maximum of 2 events per oncall shift
• Post mortem for every event
• Post mortems are blameless and focus on process and technology, not people
Observation #1:
Create virtuous and reinforcing
feedback loops
What Makes SRE, SRE (dramatic recreation)
• hire only coders
• have an SLA for your service
• measure and report performance against SLA
• Use Error Budgets and gate launches on them
• Common staffing pool for SRE and DEV
• Excess Ops work overflows to DEV team
• Cap SRE operational load at 50%
• Share 5% of ops work with DEV team
• Oncall teams at least 8 people, or 6x2
• Maximum of 2 events per oncall shift
• Post mortem for every event
• Post mortems are blameless and focus on process and technology, not people
What Makes SRE, SRE (dramatic recreation)
• hire only coders
• have an SLA for your service
• measure and report performance against SLA
• Use Error Budgets and gate launches on them
• Common staffing pool for SRE and DEV
• Excess Ops work overflows to DEV team
• Cap SRE operational load at 50%
• Share 5% of ops work with DEV team
• Oncall teams at least 8 people, or 6x2
• Maximum of 2 events per oncall shift
• Post mortem for every event
• Post mortems blameless and focus on process and technology, not people
Observation #2:
You can’t fire your way to reliable.
• Airbnb
• Amazon
• Apple
• Baidu
• Dropbox
• Etsy
• Facebook
• GitHub
• LinkedIn
•Microsoft
•Netflix
•Pinterest
•Spotify
•Stack Exchange
•Twitter
•Uber
•Yahoo!
•Yelp
• Airbnb
• Amazon
• Apple
• Baidu
• Dropbox
• Etsy
• Facebook
• GitHub
• LinkedIn
•Microsoft
•Netflix
•Pinterest
•Spotify
•Stack Exchange
•Twitter
•Uber
•Yahoo!
•Yelp
Facebook has Production Engineers
Facebook has Production Engineers
“Production Engineers at Facebook are hybrid
software/systems engineers who ensure that
Facebook's services run smoothly and have the
capacity for future growth. They are embedded in
every one of Facebook's product and infrastructure
teams, and are core participants in every significant
engineering effort underway in the company.”
Facebook has Production Engineers
“Production Engineers at Facebook are hybrid
software/systems engineers who ensure that
Facebook's services run smoothly and have the
capacity for future growth. They are embedded in
every one of Facebook's product and infrastructure
teams, and are core participants in every significant
engineering effort underway in the company.”
Facebook has Production Engineers
“Production Engineers at Facebook are hybrid
software/systems engineers who ensure that
Facebook's services run smoothly and have the
capacity for future growth. They are embedded
in every one of Facebook's product and infrastructure
teams, and are core participants in every significant
engineering effort underway in the company.”
Observation #3:
There is no one right way to
build an SRE team. But
there are wrong ways.
Facebook has Production Engineers
•~1-10 ratio
•18, 24, 36 months with a service
(not a S.W.A.T. team or ops monkeys)
•Also do Bootcamp
•Lead reports to Facebook head of engineering
(same person responsible for Ops and Eng)
Observation #4:
SRE requires management support
at the highest levels.
Production Engineering Maturity Model
•maturity of SW/PE team’s work
•maturity of level of service
•relationship status of PE/SWE teams
•stages:
• bootstrap/build out
• scale/initial deployment
• awesomize
Get in Touch!
www.apcera.com/dnb
dnb@apcera.com
@otterbook

Contenu connexe

Tendances

SOLVING MLOPS FROM FIRST PRINCIPLES, DEAN PLEBAN, DagsHub
SOLVING MLOPS FROM FIRST PRINCIPLES, DEAN PLEBAN, DagsHubSOLVING MLOPS FROM FIRST PRINCIPLES, DEAN PLEBAN, DagsHub
SOLVING MLOPS FROM FIRST PRINCIPLES, DEAN PLEBAN, DagsHub
DevOpsDays Tel Aviv
 

Tendances (20)

Tis The Season: Load Testing Tips and Checklist for Retail Seasonal Readiness
Tis The Season: Load Testing Tips and Checklist for Retail Seasonal ReadinessTis The Season: Load Testing Tips and Checklist for Retail Seasonal Readiness
Tis The Season: Load Testing Tips and Checklist for Retail Seasonal Readiness
 
The Anti-Transformation transformation @DevOps Summit Amsterdam
The Anti-Transformation transformation @DevOps Summit AmsterdamThe Anti-Transformation transformation @DevOps Summit Amsterdam
The Anti-Transformation transformation @DevOps Summit Amsterdam
 
DevOps: The Key to IT Performance
DevOps: The Key to IT PerformanceDevOps: The Key to IT Performance
DevOps: The Key to IT Performance
 
DevOps Enterprise Summit 2016
DevOps Enterprise Summit 2016DevOps Enterprise Summit 2016
DevOps Enterprise Summit 2016
 
Continuous Testing: Preparing for DevOps
Continuous Testing: Preparing for DevOpsContinuous Testing: Preparing for DevOps
Continuous Testing: Preparing for DevOps
 
Continuous Delivery in a Legacy Shop—One Step at a Time
Continuous Delivery in a Legacy Shop—One Step at a TimeContinuous Delivery in a Legacy Shop—One Step at a Time
Continuous Delivery in a Legacy Shop—One Step at a Time
 
Are We There Yet? Signposts On Your Journey to Awesome
Are We There Yet? Signposts On Your Journey to AwesomeAre We There Yet? Signposts On Your Journey to Awesome
Are We There Yet? Signposts On Your Journey to Awesome
 
Continuous Integration Is for Everyone—Especially DevOps
Continuous Integration Is for Everyone—Especially DevOpsContinuous Integration Is for Everyone—Especially DevOps
Continuous Integration Is for Everyone—Especially DevOps
 
Vmware2021 why even devop nicolefv
Vmware2021 why even devop nicolefvVmware2021 why even devop nicolefv
Vmware2021 why even devop nicolefv
 
Advance ALM and DevOps Practices with Continuous Improvement
Advance ALM and DevOps Practices with Continuous ImprovementAdvance ALM and DevOps Practices with Continuous Improvement
Advance ALM and DevOps Practices with Continuous Improvement
 
DevOps Picc12 Management Talk
DevOps Picc12 Management TalkDevOps Picc12 Management Talk
DevOps Picc12 Management Talk
 
Quality Jam 2016 Product Roadmap
Quality Jam 2016 Product RoadmapQuality Jam 2016 Product Roadmap
Quality Jam 2016 Product Roadmap
 
How Continuous Delivery and Lean Management Make your DevOps Amazeballs
How Continuous Delivery and Lean Management Make your DevOps AmazeballsHow Continuous Delivery and Lean Management Make your DevOps Amazeballs
How Continuous Delivery and Lean Management Make your DevOps Amazeballs
 
SOLVING MLOPS FROM FIRST PRINCIPLES, DEAN PLEBAN, DagsHub
SOLVING MLOPS FROM FIRST PRINCIPLES, DEAN PLEBAN, DagsHubSOLVING MLOPS FROM FIRST PRINCIPLES, DEAN PLEBAN, DagsHub
SOLVING MLOPS FROM FIRST PRINCIPLES, DEAN PLEBAN, DagsHub
 
What I learned from 5 years of sciencing the crap out of DevOps
What I learned from 5 years of sciencing the crap out of DevOpsWhat I learned from 5 years of sciencing the crap out of DevOps
What I learned from 5 years of sciencing the crap out of DevOps
 
DevOPs Transformation Workshop
DevOPs Transformation WorkshopDevOPs Transformation Workshop
DevOPs Transformation Workshop
 
DOES 2016 Sciencing the Crap Out of DevOps
DOES 2016 Sciencing the Crap Out of DevOpsDOES 2016 Sciencing the Crap Out of DevOps
DOES 2016 Sciencing the Crap Out of DevOps
 
Continuous Delivery + DevOps = Awesome
Continuous Delivery + DevOps = AwesomeContinuous Delivery + DevOps = Awesome
Continuous Delivery + DevOps = Awesome
 
QASymphony Atlanta Customer User Group Fall 2017
QASymphony Atlanta Customer User Group Fall 2017QASymphony Atlanta Customer User Group Fall 2017
QASymphony Atlanta Customer User Group Fall 2017
 
Making disaster routine
Making disaster routineMaking disaster routine
Making disaster routine
 

En vedette

DOES SFO 2016 - Avan Mathur - Planning for Huge Scale
DOES SFO 2016 - Avan Mathur - Planning for Huge ScaleDOES SFO 2016 - Avan Mathur - Planning for Huge Scale
DOES SFO 2016 - Avan Mathur - Planning for Huge Scale
Gene Kim
 
DOES SFO 2016 - Greg Maxey and Laurent Rochette - DSL at Scale
DOES SFO 2016 - Greg Maxey and Laurent Rochette - DSL at ScaleDOES SFO 2016 - Greg Maxey and Laurent Rochette - DSL at Scale
DOES SFO 2016 - Greg Maxey and Laurent Rochette - DSL at Scale
Gene Kim
 
DOES SFO 2016 - Chris Fulton - CD for DBs
DOES SFO 2016 - Chris Fulton - CD for DBsDOES SFO 2016 - Chris Fulton - CD for DBs
DOES SFO 2016 - Chris Fulton - CD for DBs
Gene Kim
 
DOES SFO 2016 - Ross Clanton and Chivas Nambiar - DevOps at Verizon
DOES SFO 2016 - Ross Clanton and Chivas Nambiar - DevOps at VerizonDOES SFO 2016 - Ross Clanton and Chivas Nambiar - DevOps at Verizon
DOES SFO 2016 - Ross Clanton and Chivas Nambiar - DevOps at Verizon
Gene Kim
 
DOES SFO 2016 - Kaimar Karu - ITIL. You keep using that word. I don't think i...
DOES SFO 2016 - Kaimar Karu - ITIL. You keep using that word. I don't think i...DOES SFO 2016 - Kaimar Karu - ITIL. You keep using that word. I don't think i...
DOES SFO 2016 - Kaimar Karu - ITIL. You keep using that word. I don't think i...
Gene Kim
 

En vedette (20)

DOES SFO 2016 - Paula Thrasher & Kevin Stanley - Building Brilliant Teams
DOES SFO 2016 - Paula Thrasher & Kevin Stanley - Building Brilliant Teams DOES SFO 2016 - Paula Thrasher & Kevin Stanley - Building Brilliant Teams
DOES SFO 2016 - Paula Thrasher & Kevin Stanley - Building Brilliant Teams
 
DOES SFO 2016 - David Habershon - Ministry of Social Development New Zealand
DOES SFO 2016 - David Habershon - Ministry of Social Development New ZealandDOES SFO 2016 - David Habershon - Ministry of Social Development New Zealand
DOES SFO 2016 - David Habershon - Ministry of Social Development New Zealand
 
DOES SFO 2016 - Avan Mathur - Planning for Huge Scale
DOES SFO 2016 - Avan Mathur - Planning for Huge ScaleDOES SFO 2016 - Avan Mathur - Planning for Huge Scale
DOES SFO 2016 - Avan Mathur - Planning for Huge Scale
 
DOES SFO 2016 - Courtney Kissler - Inspire and Nurture the Human Spirit
DOES SFO 2016 - Courtney Kissler - Inspire and Nurture the Human SpiritDOES SFO 2016 - Courtney Kissler - Inspire and Nurture the Human Spirit
DOES SFO 2016 - Courtney Kissler - Inspire and Nurture the Human Spirit
 
DOES SFO 2016 - Greg Maxey and Laurent Rochette - DSL at Scale
DOES SFO 2016 - Greg Maxey and Laurent Rochette - DSL at ScaleDOES SFO 2016 - Greg Maxey and Laurent Rochette - DSL at Scale
DOES SFO 2016 - Greg Maxey and Laurent Rochette - DSL at Scale
 
DOES SFO 2016 - Daniel Perez - Doubling Down on ChatOps in the Enterprise
DOES SFO 2016 - Daniel Perez - Doubling Down on ChatOps in the EnterpriseDOES SFO 2016 - Daniel Perez - Doubling Down on ChatOps in the Enterprise
DOES SFO 2016 - Daniel Perez - Doubling Down on ChatOps in the Enterprise
 
DOES SFO 2016 - Chris Fulton - CD for DBs
DOES SFO 2016 - Chris Fulton - CD for DBsDOES SFO 2016 - Chris Fulton - CD for DBs
DOES SFO 2016 - Chris Fulton - CD for DBs
 
DOES SFO 2016 - Steve Mayner - Transformational Leadership
DOES SFO 2016 - Steve Mayner - Transformational LeadershipDOES SFO 2016 - Steve Mayner - Transformational Leadership
DOES SFO 2016 - Steve Mayner - Transformational Leadership
 
DOES SFO 2016 - Greg Padak - Default to Open
DOES SFO 2016 - Greg Padak - Default to OpenDOES SFO 2016 - Greg Padak - Default to Open
DOES SFO 2016 - Greg Padak - Default to Open
 
DOES SFO 2016 - Mark Imbriaco - Lessons From the Bleeding Edge
DOES SFO 2016 - Mark Imbriaco - Lessons From the Bleeding EdgeDOES SFO 2016 - Mark Imbriaco - Lessons From the Bleeding Edge
DOES SFO 2016 - Mark Imbriaco - Lessons From the Bleeding Edge
 
DOES SFO 2016 - Marc Priolo - Are we there yet?
DOES SFO 2016 - Marc Priolo - Are we there yet? DOES SFO 2016 - Marc Priolo - Are we there yet?
DOES SFO 2016 - Marc Priolo - Are we there yet?
 
DOES SFO 2016 - Rich Jackson & Rosalind Radcliffe - The Mainframe DevOps Team...
DOES SFO 2016 - Rich Jackson & Rosalind Radcliffe - The Mainframe DevOps Team...DOES SFO 2016 - Rich Jackson & Rosalind Radcliffe - The Mainframe DevOps Team...
DOES SFO 2016 - Rich Jackson & Rosalind Radcliffe - The Mainframe DevOps Team...
 
DOES SFO 2016 - Alexa Alley - Value Stream Mapping
DOES SFO 2016 - Alexa Alley - Value Stream MappingDOES SFO 2016 - Alexa Alley - Value Stream Mapping
DOES SFO 2016 - Alexa Alley - Value Stream Mapping
 
DOES SFO 2016 - Cornelia Davis - DevOps: Who Does What?
DOES SFO 2016 - Cornelia Davis - DevOps: Who Does What?DOES SFO 2016 - Cornelia Davis - DevOps: Who Does What?
DOES SFO 2016 - Cornelia Davis - DevOps: Who Does What?
 
DOES SFO 2016 - Michael Nygard - Tempo, Maneuverability, Initiative
DOES SFO 2016 - Michael Nygard - Tempo, Maneuverability, InitiativeDOES SFO 2016 - Michael Nygard - Tempo, Maneuverability, Initiative
DOES SFO 2016 - Michael Nygard - Tempo, Maneuverability, Initiative
 
DOES SFO 2016 - Ross Clanton and Chivas Nambiar - DevOps at Verizon
DOES SFO 2016 - Ross Clanton and Chivas Nambiar - DevOps at VerizonDOES SFO 2016 - Ross Clanton and Chivas Nambiar - DevOps at Verizon
DOES SFO 2016 - Ross Clanton and Chivas Nambiar - DevOps at Verizon
 
DOES SFO 2016 - Topo Pal - DevOps at Capital One
DOES SFO 2016 - Topo Pal - DevOps at Capital OneDOES SFO 2016 - Topo Pal - DevOps at Capital One
DOES SFO 2016 - Topo Pal - DevOps at Capital One
 
DOES SFO 2016 - Kaimar Karu - ITIL. You keep using that word. I don't think i...
DOES SFO 2016 - Kaimar Karu - ITIL. You keep using that word. I don't think i...DOES SFO 2016 - Kaimar Karu - ITIL. You keep using that word. I don't think i...
DOES SFO 2016 - Kaimar Karu - ITIL. You keep using that word. I don't think i...
 
The Social Requirements Engineering (SRE) Approach to Developing a Large-scal...
The Social Requirements Engineering (SRE) Approach to Developing a Large-scal...The Social Requirements Engineering (SRE) Approach to Developing a Large-scal...
The Social Requirements Engineering (SRE) Approach to Developing a Large-scal...
 
SRE in Startup
SRE in StartupSRE in Startup
SRE in Startup
 

Similaire à DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe

Verification Bug Metrics: A Different Approach
Verification Bug Metrics: A Different ApproachVerification Bug Metrics: A Different Approach
Verification Bug Metrics: A Different Approach
DVClub
 

Similaire à DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe (20)

S.R.E - create ultra-scalable and highly reliable systems
S.R.E - create ultra-scalable and highly reliable systemsS.R.E - create ultra-scalable and highly reliable systems
S.R.E - create ultra-scalable and highly reliable systems
 
Turning Human Capital into High Performance Organizational Capital
Turning Human Capital into High Performance Organizational CapitalTurning Human Capital into High Performance Organizational Capital
Turning Human Capital into High Performance Organizational Capital
 
sitHH16 - The Implications of Becoming Agile
sitHH16 - The Implications of Becoming AgilesitHH16 - The Implications of Becoming Agile
sitHH16 - The Implications of Becoming Agile
 
Own Your Own Impact: Incident Response at Airbnb [FutureStack16]
Own Your Own Impact: Incident Response at Airbnb [FutureStack16]Own Your Own Impact: Incident Response at Airbnb [FutureStack16]
Own Your Own Impact: Incident Response at Airbnb [FutureStack16]
 
The Rationale for Continuous Delivery
The Rationale for Continuous DeliveryThe Rationale for Continuous Delivery
The Rationale for Continuous Delivery
 
Verification Bug Metrics: A Different Approach
Verification Bug Metrics: A Different ApproachVerification Bug Metrics: A Different Approach
Verification Bug Metrics: A Different Approach
 
All daydevops 2016 - Turning Human Capital into High Performance Organizati...
All daydevops   2016 - Turning Human Capital into High Performance Organizati...All daydevops   2016 - Turning Human Capital into High Performance Organizati...
All daydevops 2016 - Turning Human Capital into High Performance Organizati...
 
Having the Correct Context for an Agile Transformation
Having the Correct Context for an Agile TransformationHaving the Correct Context for an Agile Transformation
Having the Correct Context for an Agile Transformation
 
Technical Capabilities as enabler for Agile and DevOps
Technical Capabilities as enabler for Agile and DevOpsTechnical Capabilities as enabler for Agile and DevOps
Technical Capabilities as enabler for Agile and DevOps
 
Tom Gilb - Power to the Programmers @ I T.A.K.E. Unconference 2014, Bucharest
Tom Gilb - Power to the Programmers @ I T.A.K.E. Unconference 2014, BucharestTom Gilb - Power to the Programmers @ I T.A.K.E. Unconference 2014, Bucharest
Tom Gilb - Power to the Programmers @ I T.A.K.E. Unconference 2014, Bucharest
 
You build it, you run it
You build it, you run itYou build it, you run it
You build it, you run it
 
What a DevOps specialist has to know about static code analysis
What a DevOps specialist has to know about static code analysisWhat a DevOps specialist has to know about static code analysis
What a DevOps specialist has to know about static code analysis
 
My Dad Won't Buy Me DevOps
My Dad Won't Buy Me DevOpsMy Dad Won't Buy Me DevOps
My Dad Won't Buy Me DevOps
 
Beyond DevOps - How Netflix Bridges the Gap
Beyond DevOps - How Netflix Bridges the GapBeyond DevOps - How Netflix Bridges the Gap
Beyond DevOps - How Netflix Bridges the Gap
 
Moving Fast At Scale
Moving Fast At ScaleMoving Fast At Scale
Moving Fast At Scale
 
Jonny wooldridge DevOps Large and Small
Jonny wooldridge DevOps Large and SmallJonny wooldridge DevOps Large and Small
Jonny wooldridge DevOps Large and Small
 
How to Actually DO High-volume Automated Testing
How to Actually DO High-volume Automated TestingHow to Actually DO High-volume Automated Testing
How to Actually DO High-volume Automated Testing
 
5 Steps on the Way to Continuous Delivery
5 Steps on the Way to Continuous Delivery5 Steps on the Way to Continuous Delivery
5 Steps on the Way to Continuous Delivery
 
Agile Transformation: People, Process and Tools to Make Your Transformation S...
Agile Transformation: People, Process and Tools to Make Your Transformation S...Agile Transformation: People, Process and Tools to Make Your Transformation S...
Agile Transformation: People, Process and Tools to Make Your Transformation S...
 
DevOps, Common use cases, Architectures, Best Practices
DevOps, Common use cases, Architectures, Best PracticesDevOps, Common use cases, Architectures, Best Practices
DevOps, Common use cases, Architectures, Best Practices
 

Plus de Gene Kim

Plus de Gene Kim (9)

DOES SFO 2016 - Steve Brodie - The Future of DevOps in the Enterprise
DOES SFO 2016 - Steve Brodie - The Future of DevOps in the EnterpriseDOES SFO 2016 - Steve Brodie - The Future of DevOps in the Enterprise
DOES SFO 2016 - Steve Brodie - The Future of DevOps in the Enterprise
 
DOES SFO 2016 - Aimee Bechtle - Utilizing Distributed Dojos to Transform a Wo...
DOES SFO 2016 - Aimee Bechtle - Utilizing Distributed Dojos to Transform a Wo...DOES SFO 2016 - Aimee Bechtle - Utilizing Distributed Dojos to Transform a Wo...
DOES SFO 2016 - Aimee Bechtle - Utilizing Distributed Dojos to Transform a Wo...
 
DOES SFO 2016 - Kevina Finn-Braun & J. Paul Reed - Beyond the Retrospective: ...
DOES SFO 2016 - Kevina Finn-Braun & J. Paul Reed - Beyond the Retrospective: ...DOES SFO 2016 - Kevina Finn-Braun & J. Paul Reed - Beyond the Retrospective: ...
DOES SFO 2016 - Kevina Finn-Braun & J. Paul Reed - Beyond the Retrospective: ...
 
DOES SFO 2016 - Andy Cooper & Brandon Holcomb - When IT Closes the Deal
DOES SFO 2016 - Andy Cooper & Brandon Holcomb - When IT Closes the DealDOES SFO 2016 - Andy Cooper & Brandon Holcomb - When IT Closes the Deal
DOES SFO 2016 - Andy Cooper & Brandon Holcomb - When IT Closes the Deal
 
DOES SFO 2016 - Matthew Barr - Enterprise Git - the hard bits
DOES SFO 2016 - Matthew Barr - Enterprise Git - the hard bits DOES SFO 2016 - Matthew Barr - Enterprise Git - the hard bits
DOES SFO 2016 - Matthew Barr - Enterprise Git - the hard bits
 
DOES SFO 2016 - Sam Guckenheimer & Ed Blankenship "Moving to One Engineering ...
DOES SFO 2016 - Sam Guckenheimer & Ed Blankenship "Moving to One Engineering ...DOES SFO 2016 - Sam Guckenheimer & Ed Blankenship "Moving to One Engineering ...
DOES SFO 2016 - Sam Guckenheimer & Ed Blankenship "Moving to One Engineering ...
 
DOES16 San Francisco - Opal Perry - Technology Transformation: How Team Value...
DOES16 San Francisco - Opal Perry - Technology Transformation: How Team Value...DOES16 San Francisco - Opal Perry - Technology Transformation: How Team Value...
DOES16 San Francisco - Opal Perry - Technology Transformation: How Team Value...
 
DOES16 San Francisco - Dominica DeGrandis - Time Theft: How Hidden and Unplan...
DOES16 San Francisco - Dominica DeGrandis - Time Theft: How Hidden and Unplan...DOES16 San Francisco - Dominica DeGrandis - Time Theft: How Hidden and Unplan...
DOES16 San Francisco - Dominica DeGrandis - Time Theft: How Hidden and Unplan...
 
DOES16 San Francisco - Marc Ng - SAP’s DevOps Journey: From Building an App t...
DOES16 San Francisco - Marc Ng - SAP’s DevOps Journey: From Building an App t...DOES16 San Francisco - Marc Ng - SAP’s DevOps Journey: From Building an App t...
DOES16 San Francisco - Marc Ng - SAP’s DevOps Journey: From Building an App t...
 

Dernier

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Dernier (20)

WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 

DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe

  • 1. Lessons Learned from a Parallel Universe David N. Blank-Edelman @otterbook photo: Will Langenberg
  • 2. SRE for DevOps David N. Blank-Edelman @otterbook
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16. What Makes SRE, SRE (dramatic recreation) • hire only coders • have an SLA for your service • measure and report performance against SLA • Use Error Budgets and gate launches on them • Common staffing pool for SRE and DEV • Excess Ops work overflows to DEV team • Cap SRE operational load at 50% • Share 5% of ops work with DEV team • Oncall teams at least 8 people, or 6x2 • Maximum of 2 events per oncall shift • Post mortem for every event • Post mortems are blameless and focus on process and technology, not people
  • 17. What Makes SRE, SRE (dramatic recreation) • hire only coders • have an SLA for your service • measure and report performance against SLA • Use Error Budgets and gate launches on them • Common staffing pool for SRE and DEV • Excess Ops work overflows to DEV team • Cap SRE operational load at 50% • Share 5% of ops work with DEV team • Oncall teams at least 8 people, or 6x2 • Maximum of 2 events per oncall shift • Post mortem for every event • Post mortems are blameless and focus on process and technology, not people
  • 19. What Makes SRE, SRE (dramatic recreation) • hire only coders • have an SLA for your service • measure and report performance against SLA • Use Error Budgets and gate launches on them • Common staffing pool for SRE and DEV • Excess Ops work overflows to DEV team • Cap SRE operational load at 50% • Share 5% of ops work with DEV team • Oncall teams at least 8 people, or 6x2 • Maximum of 2 events per oncall shift • Post mortem for every event • Post mortems are blameless and focus on process and technology, not people
  • 20. What Makes SRE, SRE (dramatic recreation) • hire only coders • have an SLA for your service • measure and report performance against SLA • Use Error Budgets and gate launches on them • Common staffing pool for SRE and DEV • Excess Ops work overflows to DEV team • Cap SRE operational load at 50% • Share 5% of ops work with DEV team • Oncall teams at least 8 people, or 6x2 • Maximum of 2 events per oncall shift • Post mortem for every event • Post mortems are blameless and focus on process and technology, not people
  • 21. Observation #1: Create virtuous and reinforcing feedback loops
  • 22. What Makes SRE, SRE (dramatic recreation) • hire only coders • have an SLA for your service • measure and report performance against SLA • Use Error Budgets and gate launches on them • Common staffing pool for SRE and DEV • Excess Ops work overflows to DEV team • Cap SRE operational load at 50% • Share 5% of ops work with DEV team • Oncall teams at least 8 people, or 6x2 • Maximum of 2 events per oncall shift • Post mortem for every event • Post mortems are blameless and focus on process and technology, not people
  • 23. What Makes SRE, SRE (dramatic recreation) • hire only coders • have an SLA for your service • measure and report performance against SLA • Use Error Budgets and gate launches on them • Common staffing pool for SRE and DEV • Excess Ops work overflows to DEV team • Cap SRE operational load at 50% • Share 5% of ops work with DEV team • Oncall teams at least 8 people, or 6x2 • Maximum of 2 events per oncall shift • Post mortem for every event • Post mortems blameless and focus on process and technology, not people
  • 24. Observation #2: You can’t fire your way to reliable.
  • 25.
  • 26. • Airbnb • Amazon • Apple • Baidu • Dropbox • Etsy • Facebook • GitHub • LinkedIn •Microsoft •Netflix •Pinterest •Spotify •Stack Exchange •Twitter •Uber •Yahoo! •Yelp
  • 27. • Airbnb • Amazon • Apple • Baidu • Dropbox • Etsy • Facebook • GitHub • LinkedIn •Microsoft •Netflix •Pinterest •Spotify •Stack Exchange •Twitter •Uber •Yahoo! •Yelp
  • 28.
  • 30.
  • 31. Facebook has Production Engineers “Production Engineers at Facebook are hybrid software/systems engineers who ensure that Facebook's services run smoothly and have the capacity for future growth. They are embedded in every one of Facebook's product and infrastructure teams, and are core participants in every significant engineering effort underway in the company.”
  • 32. Facebook has Production Engineers “Production Engineers at Facebook are hybrid software/systems engineers who ensure that Facebook's services run smoothly and have the capacity for future growth. They are embedded in every one of Facebook's product and infrastructure teams, and are core participants in every significant engineering effort underway in the company.”
  • 33. Facebook has Production Engineers “Production Engineers at Facebook are hybrid software/systems engineers who ensure that Facebook's services run smoothly and have the capacity for future growth. They are embedded in every one of Facebook's product and infrastructure teams, and are core participants in every significant engineering effort underway in the company.”
  • 34. Observation #3: There is no one right way to build an SRE team. But there are wrong ways.
  • 35. Facebook has Production Engineers •~1-10 ratio •18, 24, 36 months with a service (not a S.W.A.T. team or ops monkeys) •Also do Bootcamp •Lead reports to Facebook head of engineering (same person responsible for Ops and Eng)
  • 36. Observation #4: SRE requires management support at the highest levels.
  • 37. Production Engineering Maturity Model •maturity of SW/PE team’s work •maturity of level of service •relationship status of PE/SWE teams •stages: • bootstrap/build out • scale/initial deployment • awesomize
  • 38.

Notes de l'éditeur

  1. toil
  2. deployment (work in production, instrumentation, canary, new features), awesomize (outliers last 5%, productionized)