SlideShare une entreprise Scribd logo
1  sur  17
Put Some SRE in Your Shipped Software
Hard-won lessons from the world of SRE
Theo Schlossnagle
CEO, Circonus
@postwait
The nature of the problem:
Software Sucks.
Once you’ve run software at scale,
you have a deep understanding of
how it is all tied together with loose
string and hope.
We spend massive effort to
operationalize our stacks.
The burning question:
Why Ship
Software?The trends are there.
You can choose to see them or not.
Rules.
1. Crash landings should be both fast and controlled.
2. Post-mortems are fundamental.
3. Use circuit breakers.
4. Behavior is complex. Understand it.
5. Have a failure budget.
6. Instrumentation & observability have no equals.
Build upon the right layers
Crash Analysis
If you don’t know why it failed,
you dont’ know anything at all.
When in doubt or even curious
Expose Telemetry
Ideally, any question you would ask of
a production system can be done so
nondisruptively.
Known unknowns
Events &
Distributed
TracingA clearer story of what just happened.
During failure reconstruction,
logs hold truth
Logging for humans
Computers talking to computers have
better ways than logs. Logs are for
computers talking to humans.
Real unknown unknowns
are solved by:
Dynamic Tracing
bpftrace / eBPF DTrace
Your m8g, o11y need to be
accessible
Internalized MVP
No additional apparatus. No additional
deployment constraints
Shipping software means
more operators
Codify Operational
Assessment &
Procedures
More operators, less average
knowledge. Ensure procedures are
repeatable.
Tools -> Solutions
Every effort to bring
SRE techniques to
software engineering
makes SRE more
accessible and useful
in Cloud/SaaS
engineering.
Thank You.
Theo Schlossnagle
CEO, Circonus
@postwait

Contenu connexe

Similaire à Put Some SRE in Your Shipped Software

How to Monitoring the SRE Golden Signals (E-Book)
How to Monitoring the SRE Golden Signals (E-Book)How to Monitoring the SRE Golden Signals (E-Book)
How to Monitoring the SRE Golden Signals (E-Book)Siglos
 
Hushcon 2016 Keynote: Test for Echo
Hushcon 2016 Keynote: Test for EchoHushcon 2016 Keynote: Test for Echo
Hushcon 2016 Keynote: Test for EchoDeja vu Security
 
Normal accidents and outpatient surgeries
Normal accidents and outpatient surgeriesNormal accidents and outpatient surgeries
Normal accidents and outpatient surgeriesJonathan Creasy
 
More Aim, Less Blame: How to use postmortems to turn failures into something ...
More Aim, Less Blame: How to use postmortems to turn failures into something ...More Aim, Less Blame: How to use postmortems to turn failures into something ...
More Aim, Less Blame: How to use postmortems to turn failures into something ...Daniel Kanchev
 
Survey Presentation About Application Security
Survey Presentation About Application SecuritySurvey Presentation About Application Security
Survey Presentation About Application SecurityNicholas Davis
 
Testing for the deeplearning folks
Testing for the deeplearning folksTesting for the deeplearning folks
Testing for the deeplearning folksVishwas N
 
IcingaCamp Stockholm - How to make your monitoring shut up
IcingaCamp Stockholm - How to make your monitoring shut upIcingaCamp Stockholm - How to make your monitoring shut up
IcingaCamp Stockholm - How to make your monitoring shut upIcinga
 
Mere Paas Teensy Hai (Nikhil Mittal)
Mere Paas Teensy Hai (Nikhil Mittal)Mere Paas Teensy Hai (Nikhil Mittal)
Mere Paas Teensy Hai (Nikhil Mittal)ClubHack
 
Prometheus - Open Source Forum Japan
Prometheus  - Open Source Forum JapanPrometheus  - Open Source Forum Japan
Prometheus - Open Source Forum JapanBrian Brazil
 
Approaches to unraveling a complex test problem
Approaches to unraveling a complex test problemApproaches to unraveling a complex test problem
Approaches to unraveling a complex test problemJohan Hoberg
 
Angus Fletcher - Error Handling in Concurrent Systems
Angus Fletcher - Error Handling in Concurrent SystemsAngus Fletcher - Error Handling in Concurrent Systems
Angus Fletcher - Error Handling in Concurrent SystemsMaritime DevCon
 
What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)Brian Brazil
 
Identify Development Pains and Resolve Them with Idea Flow
Identify Development Pains and Resolve Them with Idea FlowIdentify Development Pains and Resolve Them with Idea Flow
Identify Development Pains and Resolve Them with Idea FlowTechWell
 
RedisConf17 - Observability and the Glorious Future
RedisConf17 - Observability and the Glorious FutureRedisConf17 - Observability and the Glorious Future
RedisConf17 - Observability and the Glorious FutureRedis Labs
 
Daniel Lance - What "You've Got Mail" Taught Me About Cyber Security
Daniel Lance - What "You've Got Mail" Taught Me About Cyber SecurityDaniel Lance - What "You've Got Mail" Taught Me About Cyber Security
Daniel Lance - What "You've Got Mail" Taught Me About Cyber SecurityEnergySec
 
A Digital Conversation: The Next Web
A Digital Conversation: The Next Web A Digital Conversation: The Next Web
A Digital Conversation: The Next Web Reading Room
 
Nexus User Conference DevOps "Table Stakes": The minimum required to play the...
Nexus User Conference DevOps "Table Stakes": The minimum required to play the...Nexus User Conference DevOps "Table Stakes": The minimum required to play the...
Nexus User Conference DevOps "Table Stakes": The minimum required to play the...Aaron Rinehart
 
Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Brian Brazil
 

Similaire à Put Some SRE in Your Shipped Software (20)

How to Monitoring the SRE Golden Signals (E-Book)
How to Monitoring the SRE Golden Signals (E-Book)How to Monitoring the SRE Golden Signals (E-Book)
How to Monitoring the SRE Golden Signals (E-Book)
 
Hushcon 2016 Keynote: Test for Echo
Hushcon 2016 Keynote: Test for EchoHushcon 2016 Keynote: Test for Echo
Hushcon 2016 Keynote: Test for Echo
 
Normal accidents and outpatient surgeries
Normal accidents and outpatient surgeriesNormal accidents and outpatient surgeries
Normal accidents and outpatient surgeries
 
More Aim, Less Blame: How to use postmortems to turn failures into something ...
More Aim, Less Blame: How to use postmortems to turn failures into something ...More Aim, Less Blame: How to use postmortems to turn failures into something ...
More Aim, Less Blame: How to use postmortems to turn failures into something ...
 
Survey Presentation About Application Security
Survey Presentation About Application SecuritySurvey Presentation About Application Security
Survey Presentation About Application Security
 
Testing for the deeplearning folks
Testing for the deeplearning folksTesting for the deeplearning folks
Testing for the deeplearning folks
 
IcingaCamp Stockholm - How to make your monitoring shut up
IcingaCamp Stockholm - How to make your monitoring shut upIcingaCamp Stockholm - How to make your monitoring shut up
IcingaCamp Stockholm - How to make your monitoring shut up
 
Mere Paas Teensy Hai (Nikhil Mittal)
Mere Paas Teensy Hai (Nikhil Mittal)Mere Paas Teensy Hai (Nikhil Mittal)
Mere Paas Teensy Hai (Nikhil Mittal)
 
Prometheus - Open Source Forum Japan
Prometheus  - Open Source Forum JapanPrometheus  - Open Source Forum Japan
Prometheus - Open Source Forum Japan
 
Approaches to unraveling a complex test problem
Approaches to unraveling a complex test problemApproaches to unraveling a complex test problem
Approaches to unraveling a complex test problem
 
Angus Fletcher - Error Handling in Concurrent Systems
Angus Fletcher - Error Handling in Concurrent SystemsAngus Fletcher - Error Handling in Concurrent Systems
Angus Fletcher - Error Handling in Concurrent Systems
 
What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)
 
Identify Development Pains and Resolve Them with Idea Flow
Identify Development Pains and Resolve Them with Idea FlowIdentify Development Pains and Resolve Them with Idea Flow
Identify Development Pains and Resolve Them with Idea Flow
 
RedisConf17 - Observability and the Glorious Future
RedisConf17 - Observability and the Glorious FutureRedisConf17 - Observability and the Glorious Future
RedisConf17 - Observability and the Glorious Future
 
Chaos engineering
Chaos engineering Chaos engineering
Chaos engineering
 
Daniel Lance - What "You've Got Mail" Taught Me About Cyber Security
Daniel Lance - What "You've Got Mail" Taught Me About Cyber SecurityDaniel Lance - What "You've Got Mail" Taught Me About Cyber Security
Daniel Lance - What "You've Got Mail" Taught Me About Cyber Security
 
A Digital Conversation: The Next Web
A Digital Conversation: The Next Web A Digital Conversation: The Next Web
A Digital Conversation: The Next Web
 
Nexus User Conference DevOps "Table Stakes": The minimum required to play the...
Nexus User Conference DevOps "Table Stakes": The minimum required to play the...Nexus User Conference DevOps "Table Stakes": The minimum required to play the...
Nexus User Conference DevOps "Table Stakes": The minimum required to play the...
 
Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)
 
dist_systems.pdf
dist_systems.pdfdist_systems.pdf
dist_systems.pdf
 

Plus de Theo Schlossnagle

Plus de Theo Schlossnagle (20)

Monitoring 101
Monitoring 101Monitoring 101
Monitoring 101
 
Distributed Systems - Like It Or Not
Distributed Systems - Like It Or NotDistributed Systems - Like It Or Not
Distributed Systems - Like It Or Not
 
Applying SRE techniques to micro service design
Applying SRE techniques to micro service designApplying SRE techniques to micro service design
Applying SRE techniques to micro service design
 
SRECon Coherent Performance
SRECon Coherent PerformanceSRECon Coherent Performance
SRECon Coherent Performance
 
Commandments of scale
Commandments of scaleCommandments of scale
Commandments of scale
 
Adaptive availability
Adaptive availabilityAdaptive availability
Adaptive availability
 
Project reality
Project realityProject reality
Project reality
 
Monitoring the #DevOps way
Monitoring the #DevOps wayMonitoring the #DevOps way
Monitoring the #DevOps way
 
Operational Software Design
Operational Software DesignOperational Software Design
Operational Software Design
 
A Coherent Discussion About Performance
A Coherent Discussion About PerformanceA Coherent Discussion About Performance
A Coherent Discussion About Performance
 
The math behind big systems analysis.
The math behind big systems analysis.The math behind big systems analysis.
The math behind big systems analysis.
 
Understanding Slowness
Understanding SlownessUnderstanding Slowness
Understanding Slowness
 
OmniOS Motivation and Design ~ LISA 2012
OmniOS Motivation and Design ~ LISA 2012OmniOS Motivation and Design ~ LISA 2012
OmniOS Motivation and Design ~ LISA 2012
 
Monitoring and observability
Monitoring and observabilityMonitoring and observability
Monitoring and observability
 
Omnios and unix
Omnios and unixOmnios and unix
Omnios and unix
 
Monitoring and observability
Monitoring and observabilityMonitoring and observability
Monitoring and observability
 
Xtreme Deployment
Xtreme DeploymentXtreme Deployment
Xtreme Deployment
 
Atldevops
AtldevopsAtldevops
Atldevops
 
It's all about telemetry
It's all about telemetryIt's all about telemetry
It's all about telemetry
 
Is this normal?
Is this normal?Is this normal?
Is this normal?
 

Dernier

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 

Dernier (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 

Put Some SRE in Your Shipped Software

  • 1. Put Some SRE in Your Shipped Software Hard-won lessons from the world of SRE
  • 3. The nature of the problem: Software Sucks. Once you’ve run software at scale, you have a deep understanding of how it is all tied together with loose string and hope. We spend massive effort to operationalize our stacks.
  • 4. The burning question: Why Ship Software?The trends are there. You can choose to see them or not.
  • 5. Rules. 1. Crash landings should be both fast and controlled. 2. Post-mortems are fundamental. 3. Use circuit breakers. 4. Behavior is complex. Understand it. 5. Have a failure budget. 6. Instrumentation & observability have no equals.
  • 6. Build upon the right layers Crash Analysis If you don’t know why it failed, you dont’ know anything at all.
  • 7.
  • 8. When in doubt or even curious Expose Telemetry Ideally, any question you would ask of a production system can be done so nondisruptively.
  • 9.
  • 10. Known unknowns Events & Distributed TracingA clearer story of what just happened.
  • 11.
  • 12. During failure reconstruction, logs hold truth Logging for humans Computers talking to computers have better ways than logs. Logs are for computers talking to humans.
  • 13. Real unknown unknowns are solved by: Dynamic Tracing bpftrace / eBPF DTrace
  • 14. Your m8g, o11y need to be accessible Internalized MVP No additional apparatus. No additional deployment constraints
  • 15. Shipping software means more operators Codify Operational Assessment & Procedures More operators, less average knowledge. Ensure procedures are repeatable. Tools -> Solutions
  • 16. Every effort to bring SRE techniques to software engineering makes SRE more accessible and useful in Cloud/SaaS engineering.
  • 17. Thank You. Theo Schlossnagle CEO, Circonus @postwait

Notes de l'éditeur

  1. Maybe change to : bpftrace / eBPF and/or DTrace