SlideShare une entreprise Scribd logo
1  sur  16
Télécharger pour lire hors ligne
Theo Schlossnagle, Founder @Circonus, @postwait on Twitter
Operational Software
Ignorance
Buys
Pain
https://www.flickr.com/photos/hoyvinmayvin/4906678960
Tenet #1
Designing for the unknown
should never burden today
Compromise when forced.
Design with tomorrow’s expected volume,
but today’s functional requirements.
https://www.flickr.com/photos/alexander/20090860
Tenet #2
Never be left curious as to
what your software is doing
Observability is the only fast-path to success.
https://www.flickr.com/photos/jox1989/4764186425
Tenet #3
Understand what your
software was doing
This is one of the trickiest tenets.
You can’t log every instruction performed, every packet
sent & received, every context switch, every I/O.
Log things that are infrequent,
measure things that are frequent.
What’s frequent? I said this was tricky.
https://www.flickr.com/photos/guest_family/6175062186
Tenet #4
Understand internal failures
The only thing worse than a malfunction in production,
is one without sufficient actionable data.
Core dump, stack trace analysis, etc.
These can wake people up.
https://www.flickr.com/photos/skipthefiller/2451481428
Tenet #5
Operator remediation of an
external failure is a bug
A system failing, a disk failing, a network partition, etc.
Once the external failure is remediated, no human being
should be involved in returning the (internal) system to
correct operation.
https://www.flickr.com/photos/timzim/2308099322
Tenet #6 & #7
Tight coupling

reduces resiliency
Loose coupling

reduces debug-ability
Tenet #8
Avoid difficult problems

if possible
One does not “just add” replication, consensus and fault
tolerance into systems.
Never solve a harder problem than you are presented.
The best solution to a problem is to remove the problem.
https://www.flickr.com/photos/shakestercody/2124972276
A tail of two systems
Queueing
Storage
https://www.flickr.com/photos/skynoir/8300610952
From RabbitMQ to Fq
RabbitMQ: Oh, the outages we’ve had.
violates #1, #2, #3, #5, #8
Fq:
(#1) build only semantics we need
(#2/#3) DTrace probes
(#4) cores dumps and backtrace.io
(#5) handled by simplicity of design and use
(#6/#7) push decoupling out, gain debug-ability
(#8) punt clustering downstream to clients
https://www.flickr.com/photos/fotologic/2165675515
ns level deep routing
From Postgres to Snowth
Postgres 8… hacked in a column store using arrays
Tenet #3: build pg_statsd
Tenet #5: … build Snowth
Snowth: ring topology time-series store
(#1/#8) only commutative operations,
(#2) DTrace probes
(#3) Circonus itself (statsd, histograms, etc.)
(#4) backtrace.io
(#7) implemented Zipkin (Dapper-like) tracing
quartile bands
“mvalue” best-guess modes

Contenu connexe

Tendances

Chaos Engineering: Why the World Needs More Resilient Systems
Chaos Engineering: Why the World Needs More Resilient SystemsChaos Engineering: Why the World Needs More Resilient Systems
Chaos Engineering: Why the World Needs More Resilient SystemsC4Media
 
AI-Powered DevOps: Injecting Speed & Quality Across Verizon’s Cloud Pipelines
AI-Powered DevOps: Injecting Speed & Quality Across Verizon’s Cloud PipelinesAI-Powered DevOps: Injecting Speed & Quality Across Verizon’s Cloud Pipelines
AI-Powered DevOps: Injecting Speed & Quality Across Verizon’s Cloud PipelinesDynatrace
 
Chaos Engineering 101: A Field Guide
Chaos Engineering 101: A Field GuideChaos Engineering 101: A Field Guide
Chaos Engineering 101: A Field Guidematthewbrahms
 
30 days or less: New Features to Production
30 days or less: New Features to Production30 days or less: New Features to Production
30 days or less: New Features to ProductionKarthik Gaekwad
 
Chaos Engineering – why we should all practice breaking things on purpose by ...
Chaos Engineering – why we should all practice breaking things on purpose by ...Chaos Engineering – why we should all practice breaking things on purpose by ...
Chaos Engineering – why we should all practice breaking things on purpose by ...Alex Cachia
 
Casino In The Clouds
Casino In The CloudsCasino In The Clouds
Casino In The Cloudsgojkoadzic
 
Gaining visibility into your Openshift application container platform with Dy...
Gaining visibility into your Openshift application container platform with Dy...Gaining visibility into your Openshift application container platform with Dy...
Gaining visibility into your Openshift application container platform with Dy...Dynatrace
 
An Introduction to Chaos Engineering
An Introduction to Chaos EngineeringAn Introduction to Chaos Engineering
An Introduction to Chaos EngineeringGremlin
 
Scaling a Start-up DevOps team to 10x while scaling the system 50x
Scaling a Start-up DevOps team to 10x while scaling the system 50x Scaling a Start-up DevOps team to 10x while scaling the system 50x
Scaling a Start-up DevOps team to 10x while scaling the system 50x Stefan Zier
 
A Coherent Discussion About Performance
A Coherent Discussion About PerformanceA Coherent Discussion About Performance
A Coherent Discussion About PerformanceTheo Schlossnagle
 
The Rise of DevSecOps - Fabian Lim - DevSecOpsSg
The Rise of DevSecOps - Fabian Lim - DevSecOpsSgThe Rise of DevSecOps - Fabian Lim - DevSecOpsSg
The Rise of DevSecOps - Fabian Lim - DevSecOpsSgDevSecOpsSg
 
Ops Happen: Improve Security Without Getting in the Way
Ops Happen: Improve Security Without Getting in the WayOps Happen: Improve Security Without Getting in the Way
Ops Happen: Improve Security Without Getting in the WaySeniorStoryteller
 
Enabling Lean IT with AWS by Carlos Condé at the Lean IT Summit 2014
Enabling Lean IT with AWS by Carlos Condé at the Lean IT Summit 2014Enabling Lean IT with AWS by Carlos Condé at the Lean IT Summit 2014
Enabling Lean IT with AWS by Carlos Condé at the Lean IT Summit 2014Institut Lean France
 
Pragmatic Security and Rugged DevOps - SXSW 2015
Pragmatic Security and Rugged DevOps - SXSW 2015Pragmatic Security and Rugged DevOps - SXSW 2015
Pragmatic Security and Rugged DevOps - SXSW 2015James Wickett
 
AppSec is Eating Security
AppSec is Eating SecurityAppSec is Eating Security
AppSec is Eating SecurityAlex Stamos
 
4 Node.js Gotchas: What your ops team needs to know
4 Node.js Gotchas: What your ops team needs to know4 Node.js Gotchas: What your ops team needs to know
4 Node.js Gotchas: What your ops team needs to knowDynatrace
 
Security as Code: A DevSecOps Approach
Security as Code: A DevSecOps ApproachSecurity as Code: A DevSecOps Approach
Security as Code: A DevSecOps ApproachVMware Tanzu
 
Tales from a radically polyglot team
Tales from a radically polyglot teamTales from a radically polyglot team
Tales from a radically polyglot teamThoughtworks
 
Chaos Driven Development
Chaos Driven DevelopmentChaos Driven Development
Chaos Driven DevelopmentBruce Wong
 

Tendances (20)

Chaos Engineering: Why the World Needs More Resilient Systems
Chaos Engineering: Why the World Needs More Resilient SystemsChaos Engineering: Why the World Needs More Resilient Systems
Chaos Engineering: Why the World Needs More Resilient Systems
 
AI-Powered DevOps: Injecting Speed & Quality Across Verizon’s Cloud Pipelines
AI-Powered DevOps: Injecting Speed & Quality Across Verizon’s Cloud PipelinesAI-Powered DevOps: Injecting Speed & Quality Across Verizon’s Cloud Pipelines
AI-Powered DevOps: Injecting Speed & Quality Across Verizon’s Cloud Pipelines
 
Chaos Engineering 101: A Field Guide
Chaos Engineering 101: A Field GuideChaos Engineering 101: A Field Guide
Chaos Engineering 101: A Field Guide
 
Chaos engineering intro
Chaos engineering introChaos engineering intro
Chaos engineering intro
 
30 days or less: New Features to Production
30 days or less: New Features to Production30 days or less: New Features to Production
30 days or less: New Features to Production
 
Chaos Engineering – why we should all practice breaking things on purpose by ...
Chaos Engineering – why we should all practice breaking things on purpose by ...Chaos Engineering – why we should all practice breaking things on purpose by ...
Chaos Engineering – why we should all practice breaking things on purpose by ...
 
Casino In The Clouds
Casino In The CloudsCasino In The Clouds
Casino In The Clouds
 
Gaining visibility into your Openshift application container platform with Dy...
Gaining visibility into your Openshift application container platform with Dy...Gaining visibility into your Openshift application container platform with Dy...
Gaining visibility into your Openshift application container platform with Dy...
 
An Introduction to Chaos Engineering
An Introduction to Chaos EngineeringAn Introduction to Chaos Engineering
An Introduction to Chaos Engineering
 
Scaling a Start-up DevOps team to 10x while scaling the system 50x
Scaling a Start-up DevOps team to 10x while scaling the system 50x Scaling a Start-up DevOps team to 10x while scaling the system 50x
Scaling a Start-up DevOps team to 10x while scaling the system 50x
 
A Coherent Discussion About Performance
A Coherent Discussion About PerformanceA Coherent Discussion About Performance
A Coherent Discussion About Performance
 
The Rise of DevSecOps - Fabian Lim - DevSecOpsSg
The Rise of DevSecOps - Fabian Lim - DevSecOpsSgThe Rise of DevSecOps - Fabian Lim - DevSecOpsSg
The Rise of DevSecOps - Fabian Lim - DevSecOpsSg
 
Ops Happen: Improve Security Without Getting in the Way
Ops Happen: Improve Security Without Getting in the WayOps Happen: Improve Security Without Getting in the Way
Ops Happen: Improve Security Without Getting in the Way
 
Enabling Lean IT with AWS by Carlos Condé at the Lean IT Summit 2014
Enabling Lean IT with AWS by Carlos Condé at the Lean IT Summit 2014Enabling Lean IT with AWS by Carlos Condé at the Lean IT Summit 2014
Enabling Lean IT with AWS by Carlos Condé at the Lean IT Summit 2014
 
Pragmatic Security and Rugged DevOps - SXSW 2015
Pragmatic Security and Rugged DevOps - SXSW 2015Pragmatic Security and Rugged DevOps - SXSW 2015
Pragmatic Security and Rugged DevOps - SXSW 2015
 
AppSec is Eating Security
AppSec is Eating SecurityAppSec is Eating Security
AppSec is Eating Security
 
4 Node.js Gotchas: What your ops team needs to know
4 Node.js Gotchas: What your ops team needs to know4 Node.js Gotchas: What your ops team needs to know
4 Node.js Gotchas: What your ops team needs to know
 
Security as Code: A DevSecOps Approach
Security as Code: A DevSecOps ApproachSecurity as Code: A DevSecOps Approach
Security as Code: A DevSecOps Approach
 
Tales from a radically polyglot team
Tales from a radically polyglot teamTales from a radically polyglot team
Tales from a radically polyglot team
 
Chaos Driven Development
Chaos Driven DevelopmentChaos Driven Development
Chaos Driven Development
 

En vedette

Pack iTS case study analysis
Pack iTS case study analysisPack iTS case study analysis
Pack iTS case study analysisSumit Singh
 
Optimizing the Upstreaming Workflow: Flexibly Scale Storage for Seismic Proce...
Optimizing the Upstreaming Workflow: Flexibly Scale Storage for Seismic Proce...Optimizing the Upstreaming Workflow: Flexibly Scale Storage for Seismic Proce...
Optimizing the Upstreaming Workflow: Flexibly Scale Storage for Seismic Proce...Avere Systems
 
Optical network architecture
Optical network architectureOptical network architecture
Optical network architectureSiddharth Singh
 
Starbucks organisational culture
Starbucks organisational cultureStarbucks organisational culture
Starbucks organisational cultureJoel Sebastian
 
Google Analytics vs. Omniture Comparative Guide
Google Analytics vs. Omniture Comparative GuideGoogle Analytics vs. Omniture Comparative Guide
Google Analytics vs. Omniture Comparative GuideJimmy Jay
 
Common terminologies of obstetrics
Common terminologies of obstetricsCommon terminologies of obstetrics
Common terminologies of obstetricsZeeshan Khan
 
The Payments Value Chain
The Payments Value ChainThe Payments Value Chain
The Payments Value ChainPYMNTS.com
 
Porter's five force analysis on computer industry
Porter's five force analysis on computer industryPorter's five force analysis on computer industry
Porter's five force analysis on computer industryRajath Menon
 
Overview Of Oil & Gas Accounting
Overview Of Oil & Gas AccountingOverview Of Oil & Gas Accounting
Overview Of Oil & Gas Accountinghori
 
Pop up! a manual of paper mechanisms - duncan birmingham (tarquin books) [pop...
Pop up! a manual of paper mechanisms - duncan birmingham (tarquin books) [pop...Pop up! a manual of paper mechanisms - duncan birmingham (tarquin books) [pop...
Pop up! a manual of paper mechanisms - duncan birmingham (tarquin books) [pop...eme2525
 
Mobile Network Capacity Issues
Mobile Network Capacity IssuesMobile Network Capacity Issues
Mobile Network Capacity IssuesPhilip Corsano
 
Elements of a fairytale
Elements of a fairytaleElements of a fairytale
Elements of a fairytaleamandakuhl
 
Grid computing Seminar PPT
Grid computing Seminar PPTGrid computing Seminar PPT
Grid computing Seminar PPTUpender Upr
 
Project Management Office Roles Functions And Benefits
Project Management Office Roles Functions And BenefitsProject Management Office Roles Functions And Benefits
Project Management Office Roles Functions And BenefitsMaria Erland, PMP
 
Allergy and Hypersensitivity
Allergy and HypersensitivityAllergy and Hypersensitivity
Allergy and HypersensitivityMedicineAndHealth
 
Organization development and change
Organization development and changeOrganization development and change
Organization development and changesomanishalaka
 
Slides cloud computing
Slides cloud computingSlides cloud computing
Slides cloud computingHaslina
 
Compensation management
Compensation managementCompensation management
Compensation management805984
 

En vedette (20)

Pack iTS case study analysis
Pack iTS case study analysisPack iTS case study analysis
Pack iTS case study analysis
 
Optimizing the Upstreaming Workflow: Flexibly Scale Storage for Seismic Proce...
Optimizing the Upstreaming Workflow: Flexibly Scale Storage for Seismic Proce...Optimizing the Upstreaming Workflow: Flexibly Scale Storage for Seismic Proce...
Optimizing the Upstreaming Workflow: Flexibly Scale Storage for Seismic Proce...
 
Optical network architecture
Optical network architectureOptical network architecture
Optical network architecture
 
Os security issues
Os security issuesOs security issues
Os security issues
 
Starbucks organisational culture
Starbucks organisational cultureStarbucks organisational culture
Starbucks organisational culture
 
Google Analytics vs. Omniture Comparative Guide
Google Analytics vs. Omniture Comparative GuideGoogle Analytics vs. Omniture Comparative Guide
Google Analytics vs. Omniture Comparative Guide
 
Common terminologies of obstetrics
Common terminologies of obstetricsCommon terminologies of obstetrics
Common terminologies of obstetrics
 
The Payments Value Chain
The Payments Value ChainThe Payments Value Chain
The Payments Value Chain
 
Porter's five force analysis on computer industry
Porter's five force analysis on computer industryPorter's five force analysis on computer industry
Porter's five force analysis on computer industry
 
ADVANTAGES OF CAVITY WALLS
ADVANTAGES OF CAVITY WALLSADVANTAGES OF CAVITY WALLS
ADVANTAGES OF CAVITY WALLS
 
Overview Of Oil & Gas Accounting
Overview Of Oil & Gas AccountingOverview Of Oil & Gas Accounting
Overview Of Oil & Gas Accounting
 
Pop up! a manual of paper mechanisms - duncan birmingham (tarquin books) [pop...
Pop up! a manual of paper mechanisms - duncan birmingham (tarquin books) [pop...Pop up! a manual of paper mechanisms - duncan birmingham (tarquin books) [pop...
Pop up! a manual of paper mechanisms - duncan birmingham (tarquin books) [pop...
 
Mobile Network Capacity Issues
Mobile Network Capacity IssuesMobile Network Capacity Issues
Mobile Network Capacity Issues
 
Elements of a fairytale
Elements of a fairytaleElements of a fairytale
Elements of a fairytale
 
Grid computing Seminar PPT
Grid computing Seminar PPTGrid computing Seminar PPT
Grid computing Seminar PPT
 
Project Management Office Roles Functions And Benefits
Project Management Office Roles Functions And BenefitsProject Management Office Roles Functions And Benefits
Project Management Office Roles Functions And Benefits
 
Allergy and Hypersensitivity
Allergy and HypersensitivityAllergy and Hypersensitivity
Allergy and Hypersensitivity
 
Organization development and change
Organization development and changeOrganization development and change
Organization development and change
 
Slides cloud computing
Slides cloud computingSlides cloud computing
Slides cloud computing
 
Compensation management
Compensation managementCompensation management
Compensation management
 

Similaire à Operational Software Design

10+ Deploys Per Day: Dev and Ops Cooperation at Flickr
10+ Deploys Per Day: Dev and Ops Cooperation at Flickr10+ Deploys Per Day: Dev and Ops Cooperation at Flickr
10+ Deploys Per Day: Dev and Ops Cooperation at FlickrJohn Allspaw
 
Hands On, Duchess 10/17/2012
Hands On, Duchess 10/17/2012Hands On, Duchess 10/17/2012
Hands On, Duchess 10/17/2012slandelle
 
2019-12-11-OWASP-IoT-Top-10---Introduction-and-Root-Causes.pdf
2019-12-11-OWASP-IoT-Top-10---Introduction-and-Root-Causes.pdf2019-12-11-OWASP-IoT-Top-10---Introduction-and-Root-Causes.pdf
2019-12-11-OWASP-IoT-Top-10---Introduction-and-Root-Causes.pdfdino715195
 
Dev and Ops Collaboration and Awareness at Etsy and Flickr
Dev and Ops Collaboration and Awareness at Etsy and FlickrDev and Ops Collaboration and Awareness at Etsy and Flickr
Dev and Ops Collaboration and Awareness at Etsy and FlickrJohn Allspaw
 
Living system or build factory - Chris Maxwell
Living system or build factory  - Chris MaxwellLiving system or build factory  - Chris Maxwell
Living system or build factory - Chris MaxwellDevopsdays
 
Gatling - JUGL, 2012-09-13
Gatling  - JUGL, 2012-09-13Gatling  - JUGL, 2012-09-13
Gatling - JUGL, 2012-09-13Nicolas Rémond
 
Ways to minimise performance risks in continuous delivery
Ways to minimise performance risks in continuous deliveryWays to minimise performance risks in continuous delivery
Ways to minimise performance risks in continuous deliverya32an
 
Easy recovery621 user guide en
Easy recovery621 user guide enEasy recovery621 user guide en
Easy recovery621 user guide enjmav1502
 
Comment j'ai mis ma suite de tests au régime en 5 minutes par jour
Comment j'ai mis ma suite de tests au régime en 5 minutes par jourComment j'ai mis ma suite de tests au régime en 5 minutes par jour
Comment j'ai mis ma suite de tests au régime en 5 minutes par jourCARA_Lyon
 
Let's make this test suite run faster! SoftShake 2010
Let's make this test suite run faster! SoftShake 2010Let's make this test suite run faster! SoftShake 2010
Let's make this test suite run faster! SoftShake 2010David Gageot
 
Growing pains - PosKeyErrors and other malaises
Growing pains - PosKeyErrors and other malaisesGrowing pains - PosKeyErrors and other malaises
Growing pains - PosKeyErrors and other malaisesPhilip Bauer
 
[Latest] Samsung Galaxy S3 Clone (SP8810 S930) How to root merun na! check it
[Latest] Samsung Galaxy S3 Clone (SP8810 S930) How to root merun na! check it[Latest] Samsung Galaxy S3 Clone (SP8810 S930) How to root merun na! check it
[Latest] Samsung Galaxy S3 Clone (SP8810 S930) How to root merun na! check itChristian Earl Magpayo
 
NCET Tech Bite - Cloud Storage and Data Backup - June 2015
NCET Tech Bite - Cloud Storage and Data Backup - June 2015NCET Tech Bite - Cloud Storage and Data Backup - June 2015
NCET Tech Bite - Cloud Storage and Data Backup - June 2015Archersan
 
BSides London 2015 - Proprietary network protocols - risky business on the wire.
BSides London 2015 - Proprietary network protocols - risky business on the wire.BSides London 2015 - Proprietary network protocols - risky business on the wire.
BSides London 2015 - Proprietary network protocols - risky business on the wire.Jakub Kałużny
 
Crash dump analysis - experience sharing
Crash dump analysis - experience sharingCrash dump analysis - experience sharing
Crash dump analysis - experience sharingJames Hsieh
 
What you wanted to know about MySQL, but could not find using inernal instrum...
What you wanted to know about MySQL, but could not find using inernal instrum...What you wanted to know about MySQL, but could not find using inernal instrum...
What you wanted to know about MySQL, but could not find using inernal instrum...Sveta Smirnova
 

Similaire à Operational Software Design (20)

10+ Deploys Per Day: Dev and Ops Cooperation at Flickr
10+ Deploys Per Day: Dev and Ops Cooperation at Flickr10+ Deploys Per Day: Dev and Ops Cooperation at Flickr
10+ Deploys Per Day: Dev and Ops Cooperation at Flickr
 
Hands On, Duchess 10/17/2012
Hands On, Duchess 10/17/2012Hands On, Duchess 10/17/2012
Hands On, Duchess 10/17/2012
 
Netgear router error codes
Netgear router error codesNetgear router error codes
Netgear router error codes
 
2019-12-11-OWASP-IoT-Top-10---Introduction-and-Root-Causes.pdf
2019-12-11-OWASP-IoT-Top-10---Introduction-and-Root-Causes.pdf2019-12-11-OWASP-IoT-Top-10---Introduction-and-Root-Causes.pdf
2019-12-11-OWASP-IoT-Top-10---Introduction-and-Root-Causes.pdf
 
Dev and Ops Collaboration and Awareness at Etsy and Flickr
Dev and Ops Collaboration and Awareness at Etsy and FlickrDev and Ops Collaboration and Awareness at Etsy and Flickr
Dev and Ops Collaboration and Awareness at Etsy and Flickr
 
Living system or build factory - Chris Maxwell
Living system or build factory  - Chris MaxwellLiving system or build factory  - Chris Maxwell
Living system or build factory - Chris Maxwell
 
Gatling - JUGL, 2012-09-13
Gatling  - JUGL, 2012-09-13Gatling  - JUGL, 2012-09-13
Gatling - JUGL, 2012-09-13
 
Ways to minimise performance risks in continuous delivery
Ways to minimise performance risks in continuous deliveryWays to minimise performance risks in continuous delivery
Ways to minimise performance risks in continuous delivery
 
Easy recovery621 user guide en
Easy recovery621 user guide enEasy recovery621 user guide en
Easy recovery621 user guide en
 
Comment j'ai mis ma suite de tests au régime en 5 minutes par jour
Comment j'ai mis ma suite de tests au régime en 5 minutes par jourComment j'ai mis ma suite de tests au régime en 5 minutes par jour
Comment j'ai mis ma suite de tests au régime en 5 minutes par jour
 
Silos are for farmers
Silos are for farmersSilos are for farmers
Silos are for farmers
 
Let's make this test suite run faster! SoftShake 2010
Let's make this test suite run faster! SoftShake 2010Let's make this test suite run faster! SoftShake 2010
Let's make this test suite run faster! SoftShake 2010
 
Growing pains - PosKeyErrors and other malaises
Growing pains - PosKeyErrors and other malaisesGrowing pains - PosKeyErrors and other malaises
Growing pains - PosKeyErrors and other malaises
 
[Latest] Samsung Galaxy S3 Clone (SP8810 S930) How to root merun na! check it
[Latest] Samsung Galaxy S3 Clone (SP8810 S930) How to root merun na! check it[Latest] Samsung Galaxy S3 Clone (SP8810 S930) How to root merun na! check it
[Latest] Samsung Galaxy S3 Clone (SP8810 S930) How to root merun na! check it
 
NCET Tech
NCET Tech NCET Tech
NCET Tech
 
NCET Tech Bite - Cloud Storage and Data Backup - June 2015
NCET Tech Bite - Cloud Storage and Data Backup - June 2015NCET Tech Bite - Cloud Storage and Data Backup - June 2015
NCET Tech Bite - Cloud Storage and Data Backup - June 2015
 
BSides London 2015 - Proprietary network protocols - risky business on the wire.
BSides London 2015 - Proprietary network protocols - risky business on the wire.BSides London 2015 - Proprietary network protocols - risky business on the wire.
BSides London 2015 - Proprietary network protocols - risky business on the wire.
 
DAC
DACDAC
DAC
 
Crash dump analysis - experience sharing
Crash dump analysis - experience sharingCrash dump analysis - experience sharing
Crash dump analysis - experience sharing
 
What you wanted to know about MySQL, but could not find using inernal instrum...
What you wanted to know about MySQL, but could not find using inernal instrum...What you wanted to know about MySQL, but could not find using inernal instrum...
What you wanted to know about MySQL, but could not find using inernal instrum...
 

Plus de Theo Schlossnagle

Plus de Theo Schlossnagle (20)

Adding Simplicity to Complexity
Adding Simplicity to ComplexityAdding Simplicity to Complexity
Adding Simplicity to Complexity
 
Put Some SRE in Your Shipped Software
Put Some SRE in Your Shipped SoftwarePut Some SRE in Your Shipped Software
Put Some SRE in Your Shipped Software
 
Monitoring 101
Monitoring 101Monitoring 101
Monitoring 101
 
Distributed Systems - Like It Or Not
Distributed Systems - Like It Or NotDistributed Systems - Like It Or Not
Distributed Systems - Like It Or Not
 
Craftsmanship
CraftsmanshipCraftsmanship
Craftsmanship
 
Commandments of scale
Commandments of scaleCommandments of scale
Commandments of scale
 
Adaptive availability
Adaptive availabilityAdaptive availability
Adaptive availability
 
Project reality
Project realityProject reality
Project reality
 
The math behind big systems analysis.
The math behind big systems analysis.The math behind big systems analysis.
The math behind big systems analysis.
 
Understanding Slowness
Understanding SlownessUnderstanding Slowness
Understanding Slowness
 
OmniOS Motivation and Design ~ LISA 2012
OmniOS Motivation and Design ~ LISA 2012OmniOS Motivation and Design ~ LISA 2012
OmniOS Motivation and Design ~ LISA 2012
 
Monitoring and observability
Monitoring and observabilityMonitoring and observability
Monitoring and observability
 
Omnios and unix
Omnios and unixOmnios and unix
Omnios and unix
 
Monitoring and observability
Monitoring and observabilityMonitoring and observability
Monitoring and observability
 
Xtreme Deployment
Xtreme DeploymentXtreme Deployment
Xtreme Deployment
 
Atldevops
AtldevopsAtldevops
Atldevops
 
It's all about telemetry
It's all about telemetryIt's all about telemetry
It's all about telemetry
 
Monitoring is easy, why are we so bad at it presentation
Monitoring is easy, why are we so bad at it  presentationMonitoring is easy, why are we so bad at it  presentation
Monitoring is easy, why are we so bad at it presentation
 
Social improvements in monitoring
Social improvements in monitoringSocial improvements in monitoring
Social improvements in monitoring
 
What's in a number?
What's in a number?What's in a number?
What's in a number?
 

Dernier

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 

Dernier (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 

Operational Software Design

  • 1. Theo Schlossnagle, Founder @Circonus, @postwait on Twitter Operational Software Ignorance Buys Pain https://www.flickr.com/photos/hoyvinmayvin/4906678960
  • 2. Tenet #1 Designing for the unknown should never burden today Compromise when forced. Design with tomorrow’s expected volume, but today’s functional requirements. https://www.flickr.com/photos/alexander/20090860
  • 3. Tenet #2 Never be left curious as to what your software is doing Observability is the only fast-path to success. https://www.flickr.com/photos/jox1989/4764186425
  • 4. Tenet #3 Understand what your software was doing This is one of the trickiest tenets. You can’t log every instruction performed, every packet sent & received, every context switch, every I/O. Log things that are infrequent, measure things that are frequent. What’s frequent? I said this was tricky. https://www.flickr.com/photos/guest_family/6175062186
  • 5. Tenet #4 Understand internal failures The only thing worse than a malfunction in production, is one without sufficient actionable data. Core dump, stack trace analysis, etc. These can wake people up. https://www.flickr.com/photos/skipthefiller/2451481428
  • 6. Tenet #5 Operator remediation of an external failure is a bug A system failing, a disk failing, a network partition, etc. Once the external failure is remediated, no human being should be involved in returning the (internal) system to correct operation. https://www.flickr.com/photos/timzim/2308099322
  • 7. Tenet #6 & #7 Tight coupling
 reduces resiliency Loose coupling
 reduces debug-ability
  • 8. Tenet #8 Avoid difficult problems
 if possible One does not “just add” replication, consensus and fault tolerance into systems. Never solve a harder problem than you are presented. The best solution to a problem is to remove the problem. https://www.flickr.com/photos/shakestercody/2124972276
  • 9. A tail of two systems Queueing Storage https://www.flickr.com/photos/skynoir/8300610952
  • 10. From RabbitMQ to Fq RabbitMQ: Oh, the outages we’ve had. violates #1, #2, #3, #5, #8 Fq: (#1) build only semantics we need (#2/#3) DTrace probes (#4) cores dumps and backtrace.io (#5) handled by simplicity of design and use (#6/#7) push decoupling out, gain debug-ability (#8) punt clustering downstream to clients https://www.flickr.com/photos/fotologic/2165675515
  • 11. ns level deep routing
  • 12. From Postgres to Snowth Postgres 8… hacked in a column store using arrays Tenet #3: build pg_statsd Tenet #5: … build Snowth Snowth: ring topology time-series store (#1/#8) only commutative operations, (#2) DTrace probes (#3) Circonus itself (statsd, histograms, etc.) (#4) backtrace.io (#7) implemented Zipkin (Dapper-like) tracing
  • 13.
  • 14.