Lambda gives you multi-AZ out of the box, but things can still go wrong in production. There are region-wide outages, and performance degradation in the services your function depends on can cause it to time out or error. And what if you're dealing with downstream systems that just aren't as scalable and can't handle the load you put on them? The bottom line is that many things can go wrong, and they often do at the worst of times. The goal of building resilient systems is not to prevent failures, but to build systems that can withstand them. In this talk, we will look at a number of practices and architectural patterns that can help you build more resilient serverless applications, such as going multi-region and active-active, and employing DLQs (dead-letter queues) and surge queues. We'll also see how chaos experiments can help us identify failure modes before they manifest in production.
56. @theburningmonk theburningmonk.com
The Saga pattern
Begin transaction
Start book hotel request
End book hotel request
Start book flight request
End book flight request
Start book car rental request
End book car rental request
End transaction
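The sequence above can be sketched in a few lines of Python. This is a minimal, in-process illustration, not how the talk implements it: `run_saga`, the `Step` tuple and the log wording for the failure path are all hypothetical names chosen for this example. The slide only shows the happy path; the compensation branch reflects the usual definition of a saga, where each completed step is undone in reverse order when a later step fails.

```python
from typing import Callable, List, Tuple

# (name, action, compensating action) -- a hypothetical shape for this sketch
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run each step in order; on failure, compensate completed steps in reverse."""
    log: List[str] = ["Begin transaction"]
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        log.append(f"Start {name} request")
        try:
            action()
        except Exception:
            log.append(f"Failed {name} request")
            # Undo the steps that already succeeded, most recent first.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"Cancelled {done_name}")
            log.append("Abort transaction")
            return log
        log.append(f"End {name} request")
        completed.append(step)
    log.append("End transaction")
    return log


def ok() -> None:
    """Stand-in for a booking (or cancellation) call that succeeds."""


def fail() -> None:
    """Stand-in for a booking call that fails."""
    raise RuntimeError("no cars available")


happy = run_saga([
    ("book hotel", ok, ok),
    ("book flight", ok, ok),
    ("book car rental", ok, ok),
])
failed = run_saga([
    ("book hotel", ok, ok),
    ("book flight", ok, ok),
    ("book car rental", fail, ok),
])
```

In the `failed` run, the car rental step raises, so the flight and then the hotel are cancelled before the saga aborts. In a serverless setting the orchestration would more likely live in a workflow service than in a single function, but the control flow is the same.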
96.
circuit breaker pattern
After X consecutive timeouts, trip the circuit
When circuit is open, fail fast
but allow 1 request through every Y mins
If that request succeeds, close the circuit
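One way to read these rules is as a small state machine. The sketch below is an assumption-laden illustration rather than the implementation from the talk: `CircuitBreaker`, `CircuitOpenError`, `max_timeouts` (X) and `probe_interval_s` (Y, in seconds here) are hypothetical names, only `TimeoutError` counts towards tripping, and the state is held in memory with no locking, so it is neither thread-safe nor shared between concurrent function instances.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the circuit is open and the call is rejected without being tried."""


class CircuitBreaker:
    def __init__(self, max_timeouts: int, probe_interval_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_timeouts = max_timeouts          # X consecutive timeouts
        self.probe_interval_s = probe_interval_s  # Y, expressed in seconds
        self.clock = clock                        # injectable so tests can fake time
        self.consecutive_timeouts = 0
        self.opened_at: Optional[float] = None    # None means the circuit is closed

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.probe_interval_s:
                # Circuit is open: fail fast without calling the dependency.
                raise CircuitOpenError("circuit is open")
            # Interval elapsed: let this one request through as a probe and
            # restart the interval so later calls keep failing fast.
            self.opened_at = self.clock()
        try:
            result = fn()
        except TimeoutError:
            self.consecutive_timeouts += 1
            if self.opened_at is None and self.consecutive_timeouts >= self.max_timeouts:
                self.opened_at = self.clock()     # trip the circuit
            raise
        # Success: reset the counter and close the circuit.
        self.consecutive_timeouts = 0
        self.opened_at = None
        return result
```

A caller would wrap each downstream request, for example `breaker.call(lambda: fetch_profile(user_id))`, where `fetch_profile` is a placeholder for whatever client call can time out, and treat `CircuitOpenError` as the signal to serve a fallback.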
117.
“the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production”
principlesofchaos.org
119. by Russ Miles @russmiles
source https://medium.com/russmiles/chaos-engineering-for-the-business-17b723f26361
120.
chaos monkey kills an EC2 instance
latency monkey induces artificial delay in APIs
chaos gorilla kills an AWS Availability Zone
chaos kong kills an entire AWS region
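Of these, artificial latency is the easiest to reproduce inside a function's own code. The decorator below is a hypothetical sketch of that idea, not the latency monkey tool itself: `inject_latency`, `probability` and `delay_s` are names made up for this example, and the `sleep` and `rand` parameters exist only so the behaviour can be tested without real waiting or randomness.

```python
import functools
import random
import time
from typing import Any, Callable


def inject_latency(probability: float, delay_s: float,
                   sleep: Callable[[float], None] = time.sleep,
                   rand: Callable[[], float] = random.random) -> Callable:
    """Delay a fraction of calls by delay_s seconds before running the wrapped function."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # rand() returns a float in [0, 1), so probability=1.0 always delays
            # and probability=0.0 never does.
            if rand() < probability:
                sleep(delay_s)
            return fn(*args, **kwargs)
        return wrapper
    return decorator


@inject_latency(probability=0.1, delay_s=0.5)
def handler(event: dict, context: Any) -> dict:
    # Placeholder handler body for illustration only.
    return {"statusCode": 200}
```

Running an experiment like this against a function with a short timeout is one way to check whether callers, retries and circuit breakers behave as expected when a dependency slows down.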