2. AGENDA
1. Deployment & Replicas: are we really safe?
2. Understand Pod Eviction Lifecycle
3. Avoid Outages
4. Beyond the Outages
3. 1. Deployment & Replicas: really safe?
We have:
● Replicas : 2
● RollingUpdate Strategy
● maxUnavailable: 1
* Everything seems robust enough to avoid downtime
* What happens if one pod disappears?
* What about existing & upcoming traffic?
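The setup above can be written down as a Deployment manifest. The following is a minimal sketch as a Python dict, assuming an integer `maxUnavailable`; the `nginx` name, labels and image are illustrative assumptions, not taken from the slides. The helper shows how many pods the rolling update keeps available.

```python
import json

# Hypothetical Deployment matching the slide: 2 replicas, RollingUpdate,
# maxUnavailable 1. Name, labels and image are illustrative assumptions.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "nginx"},
    "spec": {
        "replicas": 2,
        "strategy": {
            "type": "RollingUpdate",
            "rollingUpdate": {"maxUnavailable": 1},
        },
        "selector": {"matchLabels": {"app": "nginx"}},
        "template": {
            "metadata": {"labels": {"app": "nginx"}},
            "spec": {"containers": [{"name": "nginx", "image": "nginx"}]},
        },
    },
}


def min_available_during_rollout(manifest):
    """Pods the rolling update keeps available: replicas - maxUnavailable."""
    spec = manifest["spec"]
    return spec["replicas"] - spec["strategy"]["rollingUpdate"]["maxUnavailable"]


def to_json(manifest):
    """Serialize the manifest to JSON, which kubectl accepts as well as YAML."""
    return json.dumps(manifest, indent=2)
```

With these numbers a rollout keeps only one pod available, so whatever that single pod mishandles is directly visible to clients.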
4. 1. Deployment & Replicas: really safe?
Downtime will occur IF:
- Existing traffic is not handled properly
- The application does not handle graceful shutdown
- ……….
5. 2. Understand Pod Eviction Lifecycle
● kubectl delete / drain / upgrade
● The request → the node where the pod is located
● kubelet sends SIGTERM to the pod's containers (after the preStop hook, if one is defined)
● kubelet sends SIGKILL after the grace period (which covers preStop + time to stop the app)
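On the application side, the sequence above means the process has to react to SIGTERM and finish before SIGKILL arrives. The following is a minimal sketch of that pattern; `serve`, `handle_one_request` and `install_handler` are hypothetical names chosen for illustration.

```python
import signal
import threading

# Set when the kubelet (or anyone else) sends SIGTERM to this process.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    # Only flag the shutdown here; the serving loop stops on its own.
    shutdown_requested.set()


def install_handler():
    # Must be called from the main thread of the process.
    signal.signal(signal.SIGTERM, handle_sigterm)


def serve(handle_one_request, poll_interval=0.1):
    """Handle requests until SIGTERM is seen; return how many were handled.

    handle_one_request is a hypothetical callable that returns True when it
    processed a request and False when there was nothing to do.
    """
    handled = 0
    while not shutdown_requested.is_set():
        if handle_one_request():
            handled += 1
        else:
            # Idle: wait briefly, but wake up at once if SIGTERM arrives.
            shutdown_requested.wait(poll_interval)
    return handled
```

A real service would call `install_handler()` at startup and, once `serve` returns, close its listeners and connections before exiting. The request in progress when the signal arrives is always completed, because the flag is only checked between requests.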
6. 2. Understand Pod Eviction Lifecycle
Add a preStop hook to shut nginx down gracefully
→ Make sure the app finishes handling existing connections before it quits
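A preStop hook for nginx could look like the following container fragment, written here as a Python dict and printed as JSON. This is a sketch: the container name, image and binary path are assumptions, while `nginx -s quit` is nginx's own graceful-stop command.

```python
import json

# Container spec fragment with a preStop hook that asks nginx to finish
# in-flight requests and then exit ("nginx -s quit" is a graceful stop).
# The container name, image and binary path are illustrative assumptions.
container = {
    "name": "nginx",
    "image": "nginx",
    "lifecycle": {
        "preStop": {
            "exec": {"command": ["/usr/sbin/nginx", "-s", "quit"]}
        }
    },
}

print(json.dumps(container, indent=2))
```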
7. 2. Understand Pod Eviction Lifecycle
- Drain “node 1”
- preStop hook is executed
(nginx quit)
- SIGTERM is sent to the nginx container
8. 2. Understand Pod Eviction Lifecycle
+ A new request comes in
+ It is routed to the stopping nginx pod
+ Error….
10. 2. Understand Pod Eviction Lifecycle
- Why does this sh*t happen?
- Why does stupid K8s still route traffic to a “terminating” pod?
- said CT Engineer -
11. 3. Avoid the Outages
Recall pod shutdown sequence
● kubectl delete / drain / upgrade
● The request → the node where the pod is located
● kubelet sends SIGTERM to the pod's containers (after the preStop hook, if one is defined)
● kubelet sends SIGKILL after the grace period (which covers preStop + time to stop the app)
……………………….
RIGHT, but NOT ENOUGH
12. 3. Avoid the Outages
Figure 1: Sequence of events when a pod is deleted
13. 3. Avoid the Outages
Figure 2: Timeline view of the pod deletion events
- Two flows run in parallel
- No guarantee that [A] finishes after [B]
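The race can be put in numbers. The sketch below assumes the two flows are the pod shutdown on its node and the removal of the pod from the Service endpoints; since the figure is not reproduced here, that mapping onto [A] and [B] is an assumption, and the function name and timings are hypothetical.

```python
def failed_request_window(endpoint_removal_s, pod_stop_s):
    """Seconds during which traffic can still be routed to a stopped pod.

    endpoint_removal_s: time until the pod is no longer a routing target
                        for the Service (hypothetical value).
    pod_stop_s:         time until the pod stops accepting connections
                        (hypothetical value).
    """
    return max(0.0, endpoint_removal_s - pod_stop_s)
```

For example, `failed_request_window(3.0, 0.5)` gives 2.5 seconds of possible errors, while a pod that keeps serving for 5 seconds, as in `failed_request_window(3.0, 5.0)`, gives 0.0.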
20. 4. Beyond the Outages
- Introducing: PodDisruptionBudget
- An indicator of the number of disruptions that
can be tolerated at a given time for a class of
pods (a budget of faults).
- If evicting a pod would drop the number of available
pods below the budget, the drain operation is halted
(it waits for a new pod to come up so the count rises
back above the threshold)
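The budget check can be sketched as follows, assuming the budget is expressed as `minAvailable`. The `nginx-pdb` name and the `app: nginx` label are illustrative assumptions, and `eviction_allowed` is a hypothetical helper that mirrors the rule rather than the actual controller code.

```python
# Hypothetical PodDisruptionBudget: keep at least one nginx pod available.
pdb = {
    "apiVersion": "policy/v1",
    "kind": "PodDisruptionBudget",
    "metadata": {"name": "nginx-pdb"},
    "spec": {
        "minAvailable": 1,
        "selector": {"matchLabels": {"app": "nginx"}},
    },
}


def eviction_allowed(healthy_pods, min_available):
    """True if evicting one pod still leaves at least min_available healthy."""
    return healthy_pods - 1 >= min_available
```

With 2 healthy pods and `minAvailable: 1`, the first eviction is allowed; the second is refused until a replacement pod becomes healthy.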
22. Summary
Application:
- Handle SIGTERM for graceful shutdown
System:
- Apply a preStop lifecycle hook
- Add a sleep so the pod's endpoint is deregistered from the Service before
the app stops, which avoids new incoming traffic
- Use PodDisruptionBudgets to avoid all pods being down at the same time
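The system-side items can be combined in one pod spec fragment. The sketch below assumes an nginx container whose image provides `/bin/sh`; the function name, the sleep length, the stop time and the safety margin are all hypothetical values to be tuned per cluster.

```python
def pod_spec_with_sleep(sleep_s, app_stop_s, margin_s=5):
    """Pod spec fragment: sleep in preStop, then let nginx quit gracefully.

    The grace period must cover the sleep plus the time the app needs to
    stop, otherwise the kubelet sends SIGKILL before the shutdown completes.
    """
    return {
        "terminationGracePeriodSeconds": sleep_s + app_stop_s + margin_s,
        "containers": [{
            "name": "nginx",   # illustrative name and image
            "image": "nginx",
            "lifecycle": {"preStop": {"exec": {"command": [
                "/bin/sh", "-c",
                f"sleep {sleep_s}; /usr/sbin/nginx -s quit",
            ]}}},
        }],
    }
```

For instance, `pod_spec_with_sleep(10, 15)` keeps the pod serving for 10 seconds while its endpoint is removed, then asks nginx to quit, within a 30-second grace period.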