Put Some SRE in Your Shipped Software

•Télécharger en tant que PPTX, PDF•

1 j'aime•2,053 vues

Theo Schlossnagle

A highlight of SRE techniques appropriate for enhancing shipped software.

Technologie

Put Some SRE in Your Shipped Software
Hard-won lessons from the world of SRE

Theo Schlossnagle
CEO, Circonus
@postwait

The nature of the problem:
Software Sucks.
Once you’ve run software at scale,
you have a deep understanding of
how it is all tied together with loose
string and hope.
We spend massive effort to
operationalize our stacks.

The burning question:
Why Ship
Software?The trends are there.
You can choose to see them or not.

Rules.
1. Crash landings should be both fast and controlled.
2. Post-mortems are fundamental.
3. Use circuit breakers.
4. Behavior is complex. Understand it.
5. Have a failure budget.
6. Instrumentation & observability have no equals.

Build upon the right layers
Crash Analysis
If you don’t know why it failed,
you dont’ know anything at all.

When in doubt or even curious
Expose Telemetry
Ideally, any question you would ask of
a production system can be done so
nondisruptively.

Known unknowns
Events &
Distributed
TracingA clearer story of what just happened.

During failure reconstruction,
logs hold truth
Logging for humans
Computers talking to computers have
better ways than logs. Logs are for
computers talking to humans.

Real unknown unknowns
are solved by:
Dynamic Tracing
bpftrace / eBPF DTrace

Your m8g, o11y need to be
accessible
Internalized MVP
No additional apparatus. No additional
deployment constraints

Shipping software means
more operators
Codify Operational
Assessment &
Procedures
More operators, less average
knowledge. Ensure procedures are
repeatable.
Tools -> Solutions

Every effort to bring
SRE techniques to
software engineering
makes SRE more
accessible and useful
in Cloud/SaaS
engineering.

Thank You.
Theo Schlossnagle
CEO, Circonus
@postwait

Recommandé

Adding Simplicity to ComplexityTheo Schlossnagle

Theoretical aspects of distributed systems - playfully illustrated (@pavlobaron)Pavlo Baron

Chaos Engineering 101: A Field Guidematthewbrahms

CraftsmanshipTheo Schlossnagle

Metric Abuse: Frequently Misused Metrics in OracleSteve Karam

Automatic Assessment of Failure Recovery in Erlang ApplicationsJan Henry Nystrom

Exploratory Testing in an Agile ContextElisabeth Hendrickson

Chaos Engineering Without Observability ... Is Just ChaosCharity Majors

Recommandé

Adding Simplicity to ComplexityTheo Schlossnagle

Theoretical aspects of distributed systems - playfully illustrated (@pavlobaron)Pavlo Baron

Chaos Engineering 101: A Field Guidematthewbrahms

CraftsmanshipTheo Schlossnagle

Metric Abuse: Frequently Misused Metrics in OracleSteve Karam

Automatic Assessment of Failure Recovery in Erlang ApplicationsJan Henry Nystrom

Exploratory Testing in an Agile ContextElisabeth Hendrickson

Chaos Engineering Without Observability ... Is Just ChaosCharity Majors

How to Monitoring the SRE Golden Signals (E-Book)Siglos

Hushcon 2016 Keynote: Test for EchoDeja vu Security

Normal accidents and outpatient surgeriesJonathan Creasy

More Aim, Less Blame: How to use postmortems to turn failures into something ...Daniel Kanchev

Survey Presentation About Application SecurityNicholas Davis

Testing for the deeplearning folksVishwas N

IcingaCamp Stockholm - How to make your monitoring shut upIcinga

Mere Paas Teensy Hai (Nikhil Mittal)ClubHack

Prometheus - Open Source Forum JapanBrian Brazil

Approaches to unraveling a complex test problemJohan Hoberg

Angus Fletcher - Error Handling in Concurrent SystemsMaritime DevCon

What does "monitoring" mean? (FOSDEM 2017)Brian Brazil

Identify Development Pains and Resolve Them with Idea FlowTechWell

RedisConf17 - Observability and the Glorious FutureRedis Labs

Chaos engineering Alberto Acerbis

Daniel Lance - What "You've Got Mail" Taught Me About Cyber SecurityEnergySec

A Digital Conversation: The Next Web Reading Room

Nexus User Conference DevOps "Table Stakes": The minimum required to play the...Aaron Rinehart

Prometheus (Prometheus London, 2016)Brian Brazil

dist_systems.pdfCherenetToma

Monitoring 101Theo Schlossnagle

Distributed Systems - Like It Or NotTheo Schlossnagle

Contenu connexe

Similaire à Put Some SRE in Your Shipped Software

How to Monitoring the SRE Golden Signals (E-Book)Siglos

Hushcon 2016 Keynote: Test for EchoDeja vu Security

Normal accidents and outpatient surgeriesJonathan Creasy

More Aim, Less Blame: How to use postmortems to turn failures into something ...Daniel Kanchev

Survey Presentation About Application SecurityNicholas Davis

Testing for the deeplearning folksVishwas N

IcingaCamp Stockholm - How to make your monitoring shut upIcinga

Mere Paas Teensy Hai (Nikhil Mittal)ClubHack

Prometheus - Open Source Forum JapanBrian Brazil

Approaches to unraveling a complex test problemJohan Hoberg

Angus Fletcher - Error Handling in Concurrent SystemsMaritime DevCon

What does "monitoring" mean? (FOSDEM 2017)Brian Brazil

Identify Development Pains and Resolve Them with Idea FlowTechWell

RedisConf17 - Observability and the Glorious FutureRedis Labs

Chaos engineering Alberto Acerbis

Daniel Lance - What "You've Got Mail" Taught Me About Cyber SecurityEnergySec

A Digital Conversation: The Next Web Reading Room

Nexus User Conference DevOps "Table Stakes": The minimum required to play the...Aaron Rinehart

Prometheus (Prometheus London, 2016)Brian Brazil

dist_systems.pdfCherenetToma

Similaire à Put Some SRE in Your Shipped Software (20)

How to Monitoring the SRE Golden Signals (E-Book)

Hushcon 2016 Keynote: Test for Echo

Normal accidents and outpatient surgeries

More Aim, Less Blame: How to use postmortems to turn failures into something ...

Survey Presentation About Application Security

Testing for the deeplearning folks

IcingaCamp Stockholm - How to make your monitoring shut up

Mere Paas Teensy Hai (Nikhil Mittal)

Prometheus - Open Source Forum Japan

Approaches to unraveling a complex test problem

Angus Fletcher - Error Handling in Concurrent Systems

What does "monitoring" mean? (FOSDEM 2017)

Identify Development Pains and Resolve Them with Idea Flow

RedisConf17 - Observability and the Glorious Future

Chaos engineering

Daniel Lance - What "You've Got Mail" Taught Me About Cyber Security

A Digital Conversation: The Next Web

Nexus User Conference DevOps "Table Stakes": The minimum required to play the...

Prometheus (Prometheus London, 2016)

dist_systems.pdf

Plus de Theo Schlossnagle

Monitoring 101Theo Schlossnagle

Distributed Systems - Like It Or NotTheo Schlossnagle

Applying SRE techniques to micro service designTheo Schlossnagle

SRECon Coherent PerformanceTheo Schlossnagle

Commandments of scaleTheo Schlossnagle

Adaptive availabilityTheo Schlossnagle

Project realityTheo Schlossnagle

Monitoring the #DevOps wayTheo Schlossnagle

Operational Software DesignTheo Schlossnagle

A Coherent Discussion About PerformanceTheo Schlossnagle

The math behind big systems analysis.Theo Schlossnagle

Understanding SlownessTheo Schlossnagle

OmniOS Motivation and Design ~ LISA 2012Theo Schlossnagle

Monitoring and observabilityTheo Schlossnagle

Omnios and unixTheo Schlossnagle

Monitoring and observabilityTheo Schlossnagle

Xtreme DeploymentTheo Schlossnagle

AtldevopsTheo Schlossnagle

It's all about telemetryTheo Schlossnagle

Is this normal?Theo Schlossnagle

Plus de Theo Schlossnagle (20)

Monitoring 101

Distributed Systems - Like It Or Not

Applying SRE techniques to micro service design

SRECon Coherent Performance

Commandments of scale

Adaptive availability

Project reality

Monitoring the #DevOps way

Operational Software Design

A Coherent Discussion About Performance

The math behind big systems analysis.

Understanding Slowness

OmniOS Motivation and Design ~ LISA 2012

Monitoring and observability

Omnios and unix

Monitoring and observability

Xtreme Deployment

Atldevops

It's all about telemetry

Is this normal?

Dernier

TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc

A Year of the Servo Reboot: Where Are We Now?Igalia

A Domino Admins Adventures (Engage 2024)Gabriella Davis

Boost PC performance: How more available memory can improve productivityPrincipled Technologies

Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge

Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies

CNv6 Instructor Chapter 6 Quality of Servicegiselly40

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo

Real Time Object Detection Using Open CVKhem

Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge

Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC

08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls

08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls

GenCyber Cyber Security Day PresentationMichael W. Hawkins

IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge

Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer

What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco

A Call to Action for Generative AI in 2024Results

Dernier (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments

A Year of the Servo Reboot: Where Are We Now?

A Domino Admins Adventures (Engage 2024)

Boost PC performance: How more available memory can improve productivity

Advantages of Hiring UIUX Design Service Providers for Your Business

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf

Factors to Consider When Choosing Accounts Payable Services Providers.pptx

CNv6 Instructor Chapter 6 Quality of Service

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...

Real Time Object Detection Using Open CV

Driving Behavioral Change for Information Management through Data-Driven Gree...

Breaking the Kubernetes Kill Chain: Host Path Mount

08448380779 Call Girls In Civil Lines Women Seeking Men

08448380779 Call Girls In Greater Kailash - I Women Seeking Men

GenCyber Cyber Security Day Presentation

IAC 2024 - IA Fast Track to Search Focused AI Solutions

Tata AIG General Insurance Company - Insurer Innovation Award 2024

What Are The Drone Anti-jamming Systems Technology?

A Call to Action for Generative AI in 2024

Put Some SRE in Your Shipped Software

1. Put Some SRE in Your Shipped Software Hard-won lessons from the world of SRE

2. Theo Schlossnagle CEO, Circonus @postwait

3. The nature of the problem: Software Sucks. Once you’ve run software at scale, you have a deep understanding of how it is all tied together with loose string and hope. We spend massive effort to operationalize our stacks.

4. The burning question: Why Ship Software?The trends are there. You can choose to see them or not.

5. Rules. 1. Crash landings should be both fast and controlled. 2. Post-mortems are fundamental. 3. Use circuit breakers. 4. Behavior is complex. Understand it. 5. Have a failure budget. 6. Instrumentation & observability have no equals.

6. Build upon the right layers Crash Analysis If you don’t know why it failed, you dont’ know anything at all.

8. When in doubt or even curious Expose Telemetry Ideally, any question you would ask of a production system can be done so nondisruptively.

10. Known unknowns Events & Distributed TracingA clearer story of what just happened.

11.

12. During failure reconstruction, logs hold truth Logging for humans Computers talking to computers have better ways than logs. Logs are for computers talking to humans.

13. Real unknown unknowns are solved by: Dynamic Tracing bpftrace / eBPF DTrace

14. Your m8g, o11y need to be accessible Internalized MVP No additional apparatus. No additional deployment constraints

15. Shipping software means more operators Codify Operational Assessment & Procedures More operators, less average knowledge. Ensure procedures are repeatable. Tools -> Solutions

16. Every effort to bring SRE techniques to software engineering makes SRE more accessible and useful in Cloud/SaaS engineering.

17. Thank You. Theo Schlossnagle CEO, Circonus @postwait

Notes de l'éditeur

Maybe change to : bpftrace / eBPF and/or DTrace