15. “It's just one of these cases where Mars is going to give us
a new deal, and we're going to have to play the cards we
get, not the ones we want.”
Jim Erickson / Project Manager at NASA for the Mars rover missions
21. One might rephrase “calculation and
correction of error” as “recognition of
and response to difference”.
Jeff Sussna / Designing Delivery: Rethinking IT in the Digital Service Economy
50. What does normal look like? Your steady state
{
  "probes": {
    "steady": {
      "title": "All services must be healthy before we begin",
      "layer": "application",
      "type": "python",
      "module": "chaosk8s.probes",
      "func": "all_microservices_healthy"
    }
  }
}
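A probe is declared as `"type": "python"` plus a `module` and a `func`. One way a runner could turn such a declaration into a call is to import the module and look the function up by name. The sketch below assumes exactly that, and also assumes that `arguments` map directly onto keyword parameters; `run_python_probe` and `steady_state_holds` are hypothetical names, and this is not the Chaos Toolkit's own implementation (it ignores secrets and every other field).

```python
import importlib
from typing import Any, Dict


def run_python_probe(probe: Dict[str, Any]) -> Any:
    """Import probe["module"], look up probe["func"] and call it with probe["arguments"].

    Hypothetical sketch: secrets, configuration and other fields are ignored.
    """
    if probe.get("type") != "python":
        raise ValueError(f"unsupported probe type: {probe.get('type')!r}")
    module = importlib.import_module(probe["module"])  # e.g. "chaosk8s.probes"
    func = getattr(module, probe["func"])              # e.g. "all_microservices_healthy"
    return func(**probe.get("arguments", {}))


def steady_state_holds(probes: Dict[str, Dict[str, Any]]) -> bool:
    """The steady state holds only if every probe returns a truthy value."""
    return all(bool(run_python_probe(p)) for p in probes.values())


if __name__ == "__main__":
    # json.loads stands in for chaosk8s.probes.all_microservices_healthy,
    # so this runs without a Kubernetes cluster.
    probes = {
        "steady": {
            "title": "All services must be healthy before we begin",
            "type": "python",
            "module": "json",
            "func": "loads",
            "arguments": {"s": "true"},
        }
    }
    print(steady_state_holds(probes))
```

Resolving providers by module and function name keeps the experiment file declarative: swapping `chaosk8s` for another extension only changes two strings.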
51. Add sources of information with probes
"probes": {
  "close": {
    "title": "Fetch the CPU usage for our service",
    "layer": "application",
    "type": "python",
    "module": "chaosprometheus.probes",
    "func": "query",
    "arguments": {
      "query": "process_cpu_seconds_total{job='websvc'}",
      "when": "2 minutes ago"
    }
  }
}
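Prometheus answers instant queries over HTTP at `/api/v1/query`, taking a `query` expression and an optional evaluation `time`. The sketch below shows how a probe like the one above might turn its `query` and `when` arguments into such a request. The `resolve_when` helper, the `instant_query_url` function and the `http://localhost:9090` address are hypothetical; the chaosprometheus extension's own handling of `when` may differ, and this helper only understands phrases of the form "N seconds/minutes/hours ago".

```python
import re
import time
from typing import Dict, Optional
from urllib.parse import urlencode

# Hypothetical: only "<N> second(s)/minute(s)/hour(s) ago" is understood.
_RELATIVE = re.compile(r"^(\d+)\s+(second|minute|hour)s?\s+ago$")
_SECONDS = {"second": 1, "minute": 60, "hour": 3600}


def resolve_when(when: str, now: Optional[float] = None) -> float:
    """Turn a phrase such as '2 minutes ago' into a Unix timestamp."""
    now = time.time() if now is None else now
    match = _RELATIVE.match(when.strip().lower())
    if match is None:
        raise ValueError(f"unsupported time expression: {when!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return now - amount * _SECONDS[unit]


def instant_query_url(base_url: str, query: str, when: Optional[str] = None,
                      now: Optional[float] = None) -> str:
    """Build a GET URL for the Prometheus instant-query endpoint."""
    params: Dict[str, object] = {"query": query}
    if when is not None:
        params["time"] = resolve_when(when, now)  # Prometheus accepts Unix timestamps
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


if __name__ == "__main__":
    print(instant_query_url("http://localhost:9090",
                            "process_cpu_seconds_total{job='websvc'}",
                            "2 minutes ago"))
```

Building the URL separately from sending it keeps the time arithmetic testable without a running Prometheus server.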
52. Set the conditions for a change in normality
"action": {
  "title": "Let's max out the CPU of a node",
  "layer": "application",
  "type": "python",
  "module": "chaosgremlin.actions",
  "func": "attack",
  "secrets": "gremlin",
  "arguments": {
    "command": {
      "type": "cpu"
    },
    "target": {
      "type": "Random"
    }
  }
}
53. Before learning
$ chaos run experiment.json
[2017-10-06 17:37:33 INFO] Running experiment: System is resilient to provider's failures
[2017-10-06 17:37:33 INFO] Observing steady state: All services must be healthy before we begin
[2017-10-06 17:37:33 INFO] Steady State succeeded
[2017-10-06 17:37:33 INFO] Observing steady state: Before we kill it, our microservice should be alive
[2017-10-06 17:37:33 INFO] Steady State succeeded
[2017-10-06 17:37:33 INFO] Observing action: Let's stop our provider
[2017-10-06 17:37:33 INFO] Action succeeded
[2017-10-06 17:37:33 INFO] Observing close state: All services must be healthy before we begin
[2017-10-06 17:37:33 INFO] Close State succeeded
[2017-10-06 17:37:33 INFO] Observing steady state: Consumer should respond as if nothing
[2017-10-06 17:37:44 ERROR] Steady State failed: {"timestamp":1507304264100,"status":500,"error":"Internal Server Error","exception":"feign.RetryableException","message":"connect timed out executing GET http://my-provider-service:8080/","path":"/invokeConsumedService"}
[2017-10-06 17:37:44 INFO] Experiment is now complete
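The log follows a fixed order: steady-state probes first, then the action, then the close probes, and finally the steady state observed again, which is where the consumer failed with an HTTP 500. A minimal sketch of that loop, assuming probes and actions are plain Python callables; the `Experiment` and `run` names are hypothetical and this is not the Chaos Toolkit's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Probe = Callable[[], bool]


@dataclass
class Experiment:
    title: str
    steady_state: Dict[str, Probe]  # must hold before we disturb anything
    action: Callable[[], None]      # the disturbance, e.g. stopping the provider
    close: Dict[str, Probe] = field(default_factory=dict)       # checked right after the action
    hypothesis: Dict[str, Probe] = field(default_factory=dict)  # steady state checked again at the end


def run(experiment: Experiment) -> List[str]:
    """Run the phases in order and return a journal of what happened."""
    journal: List[str] = [f"Running experiment: {experiment.title}"]

    def check(phase: str, probes: Dict[str, Probe]) -> bool:
        for title, probe in probes.items():
            try:
                ok = bool(probe())
            except Exception as exc:  # a probe that raises counts as a failed probe
                ok = False
                journal.append(f"{phase}: {title} raised {exc!r}")
            journal.append(f"{phase}: {title} -> {'succeeded' if ok else 'failed'}")
            if not ok:
                return False
        return True

    if not check("steady state", experiment.steady_state):
        journal.append("aborted: system was not in its steady state")
        return journal

    experiment.action()
    journal.append("action succeeded")

    if check("close state", experiment.close) and check("steady state", experiment.hypothesis):
        journal.append("complete: steady state held")
    else:
        journal.append("complete: steady state deviated")
    return journal


if __name__ == "__main__":
    provider = {"up": True}

    def consumer_responds() -> bool:
        # Stand-in for the consumer calling the provider over HTTP.
        if not provider["up"]:
            raise TimeoutError("connect timed out")
        return True

    experiment = Experiment(
        title="System is resilient to provider's failures",
        steady_state={"Before we kill it, our microservice should be alive": lambda: provider["up"]},
        action=lambda: provider.update(up=False),  # "Let's stop our provider"
        hypothesis={"Consumer should respond as if nothing": consumer_responds},
    )
    for line in run(experiment):
        print(line)
```

Checking the steady state first means a deviation at the end can be blamed on the action rather than on a system that was already unhealthy.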
54. Respond to the non-functional force of change
Do not merely correct the error
55. Adaptation
$ chaos run experiment.json
[2017-10-06 17:40:25 INFO] Running experiment: System is resilient to provider's failures
[2017-10-06 17:40:25 INFO] Observing steady state: All services must be healthy before we begin
[2017-10-06 17:40:25 INFO] Steady State succeeded
[2017-10-06 17:40:25 INFO] Observing steady state: Before we kill it, our microservice should be alive
[2017-10-06 17:40:26 INFO] Steady State succeeded
[2017-10-06 17:40:26 INFO] Observing action: Let's stop our provider
[2017-10-06 17:40:26 INFO] Action succeeded
[2017-10-06 17:40:26 INFO] Observing close state: All services must be healthy before we begin
[2017-10-06 17:40:26 INFO] Close State succeeded
[2017-10-06 17:40:26 INFO] Observing steady state: Consumer should respond as if nothing
[2017-10-06 17:40:30 INFO] Steady State succeeded
[2017-10-06 17:40:30 INFO] Experiment is now complete
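The slides do not say what changed between the failing run and this one. One common adaptation for a consumer whose provider times out is to retry briefly and then return a degraded response instead of an HTTP 500. The Python sketch below illustrates that idea only: `call_with_fallback` is a hypothetical helper, and the services in the log are likely Java (`feign.RetryableException` comes from the OpenFeign HTTP client), so a real change would live in that stack.

```python
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_fallback(
    call: Callable[[], T],
    fallback: Callable[[Exception], T],
    retries: int = 1,
    transient: Tuple[Type[Exception], ...] = (TimeoutError, ConnectionError),
) -> T:
    """Try `call` up to `retries + 1` times; if every attempt fails with a
    transient error, answer with `fallback(error)` instead of propagating it."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    last_error: Optional[Exception] = None
    for _ in range(retries + 1):
        try:
            return call()
        except transient as exc:  # non-transient errors still propagate
            last_error = exc
    assert last_error is not None  # every attempt failed, so an error was recorded
    return fallback(last_error)


if __name__ == "__main__":
    def invoke_provider() -> str:
        # Hypothetical stand-in for the consumer's GET to the provider service.
        raise TimeoutError("connect timed out")

    print(call_with_fallback(invoke_provider, lambda err: f"degraded response ({err})"))
```

Only timeouts and connection errors trigger the fallback; any other exception still surfaces, so genuine bugs are not silently masked.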