K8s-zero-downtime-the-missing-part

AGENDA
1. Deployment & Replicas: are we really safe ?
2. Understand Pod Eviction Lifecycle
3. Avoid Outages
4. Beyond the Outages

1. Deployment & Replicas: really safe?
We have:
● Replicas: 2
● RollingUpdate strategy
● maxUnavailable: 1
* Everything seems robust enough to avoid downtime
* What happens if one pod disappears?
* What about existing and incoming traffic?

1. Deployment & Replicas: really safe?
Downtime will occur IF:
- Existing traffic is not handled properly
- The application does not handle graceful shutdown
- ……….
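
A minimal sketch of what "handling graceful shutdown" can look like on the application side. `GracefulWorker` is a hypothetical stand-in for a real server loop, not something from the slides: on SIGTERM it only flips a flag, so work already accepted is kept while new work is refused.

```python
import signal
import threading


class GracefulWorker:
    """Toy worker (hypothetical): stands in for a real server loop."""

    def __init__(self):
        self.stopping = threading.Event()
        self.completed = []

    def request_stop(self, signum=None, frame=None):
        # Usable as a signal handler: only set a flag here and let the
        # main loop decide when it is safe to exit.
        self.stopping.set()

    def accept(self, request):
        # Refuse new work once shutdown has started.
        if self.stopping.is_set():
            return False
        self.completed.append(request)
        return True


if __name__ == "__main__":
    worker = GracefulWorker()
    # signal.signal must be called from the main thread.
    signal.signal(signal.SIGTERM, worker.request_stop)
```

A real server would also wait for in-flight connections to finish before exiting; that part depends on the framework and is left out here.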

2. Understand Pod Eviction Lifecycle
● kubectl delete / drain / upgrade
● The request reaches the node where the pod is located
● kubelet sends SIGTERM to the pod's containers
● kubelet sends SIGKILL after the grace period (preStop + time to stop the app)
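
The grace period in the last bullet is a single budget shared by the preStop hook and the application's own shutdown. A small sketch of that arithmetic follows; the function name and the sample durations are illustrative assumptions, while 30 seconds is the Kubernetes default for `terminationGracePeriodSeconds`.

```python
DEFAULT_GRACE_SECONDS = 30  # Kubernetes default for terminationGracePeriodSeconds


def fits_in_grace_period(prestop_seconds, app_stop_seconds,
                         grace_seconds=DEFAULT_GRACE_SECONDS):
    """Return True if the preStop hook plus app shutdown finish before SIGKILL."""
    return prestop_seconds + app_stop_seconds <= grace_seconds


if __name__ == "__main__":
    # Illustrative numbers: a 10 s preStop hook and a 15 s app shutdown.
    print(fits_in_grace_period(10, 15))
    # A longer hook needs a larger grace period to avoid SIGKILL.
    print(fits_in_grace_period(20, 15, grace_seconds=60))
```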

2. Understand Pod Eviction Lifecycle
Add a preStop hook to gracefully shut down nginx
→ Make sure the app finishes handling existing connections before it quits

2. Understand Pod Eviction Lifecycle
- Drain “node 1”
- kubelet starts terminating the nginx pod
- preStop hook is executed (nginx quits), then SIGTERM is sent

2. Understand Pod Eviction Lifecycle
+ A new request comes in
+ It is routed to the stopping nginx
+ Error….

2. Understand Pod Eviction Lifecycle
- Why does this sh*t happen?
- Why does stupid K8s still route traffic to a “terminating” pod?
- said CT Engineer -

3. Avoid the Outages
Recall pod shutdown sequence
● kubectl delete / drain / upgrade
● The request reaches the node where the pod is located
● kubelet sends SIGTERM to the pod's containers
● kubelet sends SIGKILL after the grace period (preStop + time to stop the app)
……………………….
RIGHT, but NOT ENOUGH

3. Avoid the Outages
Figure 1: Sequence of events when a pod is deleted

3. Avoid the Outages
Figure 2: Timeline view of the pod deletion events
- Two flows run in parallel
- No guarantee that [A] finishes after [B]
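
A rough way to reason about this race, assuming [A] is the kubelet shutdown flow and [B] is the flow that deregisters the pod from the Service. The function and the timings are hypothetical. It only computes how long traffic can still be routed to a pod that has already stopped.

```python
def failed_request_window(shutdown_done_at, deregistered_at):
    """Seconds during which traffic can still reach a pod that already stopped.

    Both arguments are seconds since the delete request. The values used
    below are illustrative, not measured.
    """
    return max(0.0, deregistered_at - shutdown_done_at)


if __name__ == "__main__":
    # [A] finishes at 1 s, [B] at 3 s: requests may fail during the gap.
    print(failed_request_window(1.0, 3.0))
    # [A] finishes after [B]: no gap.
    print(failed_request_window(5.0, 3.0))
```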

3. Avoid the Outages
● Don’t do any work, just SLEEP
● … and wait for the deregister flow (B) to complete
before the graceful shutdown
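
One possible shape for such a hook, sketched as a Python helper that builds the container `lifecycle` fragment. The helper name, the 10-second sleep, and the presence of `/bin/sh` and `sleep` in the image are assumptions; `lifecycle.preStop.exec.command` is the standard pod spec field and `nginx -s quit` is nginx's graceful stop.

```python
import json


def prestop_hook(sleep_seconds, stop_command="nginx -s quit"):
    """Build a container `lifecycle` fragment that sleeps, then stops the app."""
    script = f"sleep {int(sleep_seconds)} && {stop_command}"
    return {
        "lifecycle": {
            "preStop": {"exec": {"command": ["/bin/sh", "-c", script]}}
        }
    }


if __name__ == "__main__":
    # 10 seconds is an illustrative value; tune it to how long
    # deregistration takes in the cluster at hand.
    print(json.dumps(prestop_hook(10), indent=2))
```

The sleep counts against the pod's grace period, so the grace period has to cover the sleep plus the application's own shutdown time.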

4. Beyond the Outages
- Introducing: PodDisruptionBudgets
- An indicator of the number of disruptions that
can be tolerated at a given time for a class of
pods (a budget of faults).
- If an eviction would leave fewer pods available than
the PodDisruptionBudget requires, the drain operation is halted
(it waits for a new pod to come up and bring the count back
above the threshold)
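
The check behind the halted drain can be sketched as follows, assuming a budget expressed as an integer `minAvailable` (a PodDisruptionBudget can also be written with `maxUnavailable` or percentages, which this sketch ignores). The function name is hypothetical.

```python
def eviction_allowed(healthy_pods, min_available):
    """Return True if evicting one pod keeps at least `min_available` healthy."""
    return healthy_pods - 1 >= min_available


if __name__ == "__main__":
    # Two healthy replicas with minAvailable 1: one eviction is allowed.
    print(eviction_allowed(2, 1))
    # Only one healthy replica left: the drain has to wait.
    print(eviction_allowed(1, 1))
```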

Summary
Application:
- Handle SIGTERM for graceful shutdown
System:
- Apply a preStop lifecycle hook
- Apply a sleep to make sure the pod's Endpoint is deregistered from the Service,
so that no new traffic arrives
- Use PodDisruptionBudgets to avoid all pods going down at the same time

Appendix: Service Disruption
Involuntary disruptions:
- HW failure
- node disappears from cluster
Voluntary disruptions:
- deployment upgrade
- delete pod
- node upgrade
- node drain


ZERO-DOWNTIME DEPLOYMENT on K8S: the missing part (Bảo Huỳnh, Site Reliability Engineering, 12-Jun-2020)

