The document discusses the Erlang programming language and its focus on fault tolerance. It summarizes the "Big Six" principles of Erlang: concurrency, fault detection, fault identification, error encapsulation, code upgrade, and stable storage. It explains how Erlang implements processes that can crash and restart independently, with supervision to restart failed processes when needed. The principles of loose coupling and monitoring are also discussed in the context of building fault-tolerant systems and organizations.
This is a story about unexpectedness. The only constant is change.
We’ve all been confronted by this
Some of us have been confronted by this
And you always get asked this
There is only so much planning you can do. At some point, the 1000-year flood hits.
The point being: shit happens. How are you going to deal with it?
Do more than one thing at a time
Know when something breaks
Know what broke
Don’t let it spread
Fix it
‘Save game’ so you can have a do-over
But we know this.
Let’s take this one at a time.
The Six Essential Characteristics of a Fault Tolerant System
Do more than one thing at a time
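Erlang gets "more than one thing at a time" from lots of cheap, isolated processes. As a loose analogue only (Python threads share memory and are far heavier than Erlang processes), this sketch runs independent tasks on a thread pool and records each outcome separately, so one task blowing up doesn't stop the rest. `run_all` and the task functions are hypothetical names for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_all(tasks):
    """Run each named task concurrently; return {name: ("ok", value) | ("error", exc)}.

    A failure in one task is recorded, not propagated, so the others still finish.
    """
    outcomes = {}
    # ThreadPoolExecutor rejects max_workers=0, hence the "or 1" for an empty dict.
    with ThreadPoolExecutor(max_workers=len(tasks) or 1) as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        for name, future in futures.items():
            try:
                outcomes[name] = ("ok", future.result())
            except Exception as exc:  # this task crashed; keep collecting the others
                outcomes[name] = ("error", exc)
    return outcomes


# Hypothetical tasks: one succeeds, one fails.
def fetch_prices():
    return [10, 20]


def fetch_weather():
    raise ConnectionError("weather service down")


print(run_all({"prices": fetch_prices, "weather": fetch_weather}))
```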
The Six Essential Characteristics of a Fault Tolerant System
Only one success case here
Fail fast and fail stop
The Six Essential Characteristics of a Fault Tolerant System
The process tells everyone what happened. Stack traces are your friend. It’s not Java!
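When an Erlang process dies, anything monitoring it receives a `'DOWN'` message carrying the exit reason, so the failure identifies itself. A loose Python sketch of that idea, with a queue standing in for the mailbox; `spawn_monitored` and the message tuple layout are made up for illustration.

```python
import queue
import threading
import traceback


def spawn_monitored(name, fn, mailbox):
    """Run fn in a thread; when it exits, post ("DOWN", name, reason, trace) to mailbox.

    Loosely modelled on Erlang's 'DOWN' monitor message: the dying worker tells
    its watcher *why* it died, including the full stack trace.
    """
    def wrapper():
        try:
            fn()
        except Exception as exc:
            mailbox.put(("DOWN", name, repr(exc), traceback.format_exc()))
        else:
            mailbox.put(("DOWN", name, "normal", None))

    thread = threading.Thread(target=wrapper, name=name)
    thread.start()
    return thread


def parse_order():
    # Hypothetical worker that hits a bug.
    return int("not a number")


mailbox = queue.Queue()
spawn_monitored("order_parser", parse_order, mailbox).join()
tag, who, reason, trace = mailbox.get(timeout=1)
print(tag, who, reason)
print(trace)
```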
The Six Essential Characteristics of a Fault Tolerant System
Let it Crash
Do more than one thing at a time
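Let it crash means the worker doesn't try to defend against every possible error; a supervisor notices the crash and restarts it from a clean state. OTP supervisors do this with restart strategies such as `one_for_one` and a restart intensity limit. The sketch below is a single-worker, synchronous simplification in Python: `supervise` and `max_restarts` are hypothetical names, and unlike OTP there is no time window on the restart count.

```python
import logging


def supervise(start_worker, max_restarts=3):
    """Run start_worker(); if it crashes, restart it from a clean state.

    The worker has no defensive error handling; the supervisor owns recovery.
    After max_restarts crashes it gives up and re-raises, escalating the
    failure to whatever supervises the supervisor.
    """
    restarts = 0
    while True:
        try:
            return start_worker()
        except Exception as exc:
            if restarts >= max_restarts:
                raise  # too many crashes: escalate instead of looping forever
            restarts += 1
            logging.warning("worker crashed (%r); restart %d/%d",
                            exc, restarts, max_restarts)


attempts = {"n": 0}


def flaky_worker():
    # Hypothetical worker: crashes on its first two runs, then succeeds.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient fault")
    return "done"


print(supervise(flaky_worker))
```

Because each restart begins fresh, a transient fault clears itself, while a permanent one exhausts the restarts and gets escalated.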
The Six Essential Characteristics of a Fault Tolerant System
Go get catfood
Let it Crash
The Six Essential Characteristics of a Fault Tolerant System
The backup bag
Processes are units of error encapsulation (and good for garbage collection, too!)
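The backup bag is stable storage, the ‘save game’ from the list of six: keep the state you can't afford to lose somewhere that outlives the crash, so a restarted process can pick up where it left off. Erlang ships tools such as DETS and Mnesia for this; the Python sketch below shows only the core idea, an atomically replaced checkpoint file. The JSON format and the function names are assumptions for illustration.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state as JSON to a temp file, fsync it, then rename it over the old checkpoint.

    On POSIX, os.replace within one filesystem is atomic, so a crash mid-write
    leaves the previous checkpoint intact rather than a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # clean up the partial temp file, keep the old checkpoint
        raise


def load_checkpoint(path, default):
    """Return the last saved state, or default if nothing has been saved yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "game.json")
    state = load_checkpoint(path, {"level": 1})
    state["level"] += 1
    save_checkpoint(path, state)
    # A crash and restart here would resume from the saved state.
    print(load_checkpoint(path, {"level": 1}))
```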
Fault Tolerance and Systems: Think of this FORMALLY
The bottom line: can your organization deal with the above issues?
The Six Essential Characteristics of a Fault Tolerant System
Loose Coupling, of course, gives us all these benefits
Builds trust: trust in the stupidity of people, trust that things will fail, trust that you will be affected.
Loose Coupling, of course, gives us all these benefits
The amount of brainpower we have is limited. Reduce complexity by being able to focus on specific, limited areas.
Loose Coupling, of course, gives us all these benefits
Andrew!
Isn’t Performance an issue w/ Loose Coupling?
Remember the bit about failure? Well, why optimize if you're going to fail anyhow? Yeah, yeah, you might fail because you don't perform, but that is rarely the problem.
Work / Elegant / Fast. Yes, that Minecraft plugin you built might get a million signups. It won’t. Seriously: it doesn't register statistically.
Fault detection and identification?
Monitoring!
Dashboards: otherwise, how do you know what’s going on?
Out-of-band access: don’t rely on the system to always tell you what’s happening.
Be polyglot: everything fails, even Erlang. (Noooo!)
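Out-of-band monitoring means not waiting for a broken component to report its own breakage. One common way to do that is a heartbeat watchdog: components check in regularly, and silence counts as failure. A minimal Python sketch; `HeartbeatMonitor`, the component names, and the 5-second timeout are hypothetical, and the clock is injectable only so the example is deterministic.

```python
import time


class HeartbeatMonitor:
    """Flag components that have gone quiet.

    Each component calls beat(name) periodically; check() returns every
    component whose last heartbeat is older than `timeout` seconds.
    """

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock  # injectable so tests can control time
        self.last_seen = {}

    def beat(self, name):
        self.last_seen[name] = self.clock()

    def check(self):
        now = self.clock()
        return sorted(name for name, seen in self.last_seen.items()
                      if now - seen > self.timeout)


fake_now = [0.0]
monitor = HeartbeatMonitor(timeout=5.0, clock=lambda: fake_now[0])
monitor.beat("db")
monitor.beat("web")
fake_now[0] = 3.0
monitor.beat("web")      # web is still alive
fake_now[0] = 7.0
print(monitor.check())   # → ['db']
```

Treating silence as failure is what makes this out of band: a component that hangs or dies without a word still shows up.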
Loose Coupling, of course, gives us all these benefits
Don’t cross the streams!
Fault Tolerance and Organizations: Think of this FORMALLY
People fall ill
AWS will go down
CFOs run off to Brazil
Tail Risk (Things that can never happen): this deserves its own section (the financial crisis)
The bottom line: can your organization deal with the above issues?
The Six Essential Characteristics of a Fault Tolerant Organization
Easy, right? Things can happen in parallel?
You’d be surprised at how poorly this gets done: micromanagement (supervision gone bad), and chaining (good for AR/AP, bad for decision making), such as memos that get passed around for approval.
“Bob is ill” Ahrens Fox!
"Bob was working on the financial projections. He has the flu, so won't be able to get to it for another week. And that means that we won't know what to buy for two weeks…”
The Six Essential Characteristics of a Fault Tolerant Organization
If Bob is ill, will you survive?
The Six Essential Characteristics of a Fault Tolerant Organization
Onboarding people is important. This is not what it should feel like for Bob’s replacement.
The Six Essential Characteristics of a Fault Tolerant Organization
Documentation, policies, corporate knowledge, corporate culture, methods, access codes. Try to have work and knowledge distributed so that if (or when) you need to let someone go, things can continue until you get someone new!
Ask yourself this. Over and over again…
Loose Coupling, of course, gives us all these benefits