SlideShare une entreprise Scribd logo
1  sur  28
Télécharger pour lire hors ligne
11.12.2018
Hardening Kubernetes
Setups: War Stories from
the Trenches of Production
Puja Abbassi - @puja108
- Developer Advocate /
Product Owner @ Giant
Swarm
- #CKA #Security
#Operators
- Data & Network
Science “Almost-PhD”
Puja
@puja108
@puja108 2
customer
product community
1. On running 100+
clusters
2. Postmortems - Lots of
them!
3. Hardening and
Best-Practices
Agenda
@puja108 3
On running 100+
clusters
@puja108 4
- Different Clouds
- Different Regions
- On-Premise
- China
100+ clusters
@puja108 5
- Companies
- Industries
- Users
- Use Cases
Diversity
@puja108 6
- Opt for Freedom
- Educate Users
- Harden up
Freedom vs.
Control
@puja108 7
Postmortems - Lots of them!
@puja108 8
“The primary goals of writing a postmortem are to
ensure that the incident is documented, that all
contributing root cause(s) are well understood, and,
especially, that effective preventive actions are put
in place to reduce the likelihood and/or impact of
recurrence.”
- Google SRE book
Postmortem Philosophy
@puja108 9
1. Gather Issues
2. Fix in Code
3. Roll out continuously
4. Profit 😉
Single Product
@puja108 10
- Issue Template
- High Priority
- Assigned to
x-functional team
Postmortem
Practice
@puja108 11
Load Balancing
Postmortems
@puja108
Team 1
PM Team 2
Team 3
12
500+ Postmortems
@puja108 13
War Stories
@puja108 14
@puja108
Kubernetes
upstream issue:
#57992
(fixed in 1.11.4
and 1.12.0)
15
- Faulty ingress
objects can break
controller
- Lots of teams + lots
of freedom
= lots of issues
Ingress
Controller
Misconfiguration
@puja108 16
@puja108
Ever built a
full-mesh IPIP
tunnels ICMP
pinger?
17
@puja108
Customer Load
Test goes bad?
You take the
blame!
- “Must be Calico,
kube-proxy, IC!”
- Turns out EC2 network
saturation was the
bottleneck
- Solution: More
workers!
18
Hardening and Best-Practices
@puja108 19
- Old versions
- Ingress (~15%)
- Networking & DNS
- Resource Pressure
- Multi-tenancy
Postmortem
Hotspots
@puja108 20
- Issues might have
been solved already
- CVEs
- Test Upgrades
extensively
- Automate Upgrades (or
have a process)
Old versions
@puja108 21
- NGINX IC: Newer
versions are less
prone to
misconfiguration
- Separate controllers
- Load- and
failover-testing
- Last resort:
SVC of type LB
Ingress
@puja108 22
- Monitor network
health
- Monitor DNS latency
- Check for known
issues
- Apply best practices
Networking & DNS
@puja108 23
- Resource Management!
- Include Buffers (lots
of them)
- Protect K8s and
critical addons
(priority)
Resource
Pressure
@puja108 24
Multi-tenancy
@puja108 25
- Separate and isolate
namespaces with RBAC
- No cluster-admins!
- Separate clusters if
possible
- Automate with CI/CD
- Minimize manual ops
Best Practices
@puja108 26
- Preemptive Monitoring &
Alerting are key!
- Logging (and Tracing)
help debugging
- Fix issues fast
- Educate users
- Have a postmortem process
- Train Recovery
Stand on the Shoulders of
Giants!
@puja108 27
- Kubernetes the very hard way - Datadog
- Scaling Kubernetes to 2,500 Nodes - OpenAI
- 5 - 15s DNS lookups on Kubernetes? - BitMEX
- Scaling CoreDNS in Kubernetes Clusters
- Inside Kubernetes Resource Management (QoS) -
Michael Gasch
- List of Kubernetes Best Practice talks/blogs
- Kubernetes Office Hours
Questions?
Stay in touch
- Twitter: @puja108
- Github: puja108
- Slack/Discuss: puja
@puja108 28
Thank you!

Contenu connexe

Similaire à Hardening Kubernetes Setups: War Stories from the Trenches of Production - Puja Abbassi, Giant Swarm

Scaling MLOps on NVIDIA DGX Systems
Scaling MLOps on NVIDIA DGX SystemsScaling MLOps on NVIDIA DGX Systems
Scaling MLOps on NVIDIA DGX Systems
cnvrg.io AI OS - Hands-on ML Workshops
 
Systemd evolution revolution_regression
Systemd evolution revolution_regressionSystemd evolution revolution_regression
Systemd evolution revolution_regression
Susant Sahani
 
Choosing the right parallel compute architecture
Choosing the right parallel compute architecture Choosing the right parallel compute architecture
Choosing the right parallel compute architecture
corehard_by
 

Similaire à Hardening Kubernetes Setups: War Stories from the Trenches of Production - Puja Abbassi, Giant Swarm (20)

Kubernetes Monitoring & Best Practices
Kubernetes Monitoring & Best PracticesKubernetes Monitoring & Best Practices
Kubernetes Monitoring & Best Practices
 
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
 
Dataservices based on mesos and kafka kostiantyn bokhan dataconf 21 04 18
Dataservices based on mesos and kafka kostiantyn bokhan dataconf 21 04 18Dataservices based on mesos and kafka kostiantyn bokhan dataconf 21 04 18
Dataservices based on mesos and kafka kostiantyn bokhan dataconf 21 04 18
 
Desplegar en la nube y no morir en el intento - Plain Concepts Dev Day
Desplegar en la nube y no morir en el intento - Plain Concepts Dev DayDesplegar en la nube y no morir en el intento - Plain Concepts Dev Day
Desplegar en la nube y no morir en el intento - Plain Concepts Dev Day
 
Scaling MLOps on NVIDIA DGX Systems
Scaling MLOps on NVIDIA DGX SystemsScaling MLOps on NVIDIA DGX Systems
Scaling MLOps on NVIDIA DGX Systems
 
Practical Petabyte Pushing
Practical Petabyte PushingPractical Petabyte Pushing
Practical Petabyte Pushing
 
Systemd evolution revolution_regression
Systemd evolution revolution_regressionSystemd evolution revolution_regression
Systemd evolution revolution_regression
 
Kubernetes Cairo Meetup_dec_2019
Kubernetes Cairo Meetup_dec_2019Kubernetes Cairo Meetup_dec_2019
Kubernetes Cairo Meetup_dec_2019
 
Journey to Containerized Application / Google Container Engine
Journey to Containerized Application / Google Container EngineJourney to Containerized Application / Google Container Engine
Journey to Containerized Application / Google Container Engine
 
DataStax | Best Practices for Securing DataStax Enterprise (Matt Kennedy) | C...
DataStax | Best Practices for Securing DataStax Enterprise (Matt Kennedy) | C...DataStax | Best Practices for Securing DataStax Enterprise (Matt Kennedy) | C...
DataStax | Best Practices for Securing DataStax Enterprise (Matt Kennedy) | C...
 
Choosing the right parallel compute architecture
Choosing the right parallel compute architecture Choosing the right parallel compute architecture
Choosing the right parallel compute architecture
 
3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...
 
Unleashing k8 s to reduce complexities of an entire middleware platform
Unleashing k8 s to reduce complexities of an entire middleware platformUnleashing k8 s to reduce complexities of an entire middleware platform
Unleashing k8 s to reduce complexities of an entire middleware platform
 
The SaltStack Pub Crawl - Fosscomm 2016
The SaltStack Pub Crawl - Fosscomm 2016The SaltStack Pub Crawl - Fosscomm 2016
The SaltStack Pub Crawl - Fosscomm 2016
 
Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-Flight
 
Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server Solution
 
Dataweave Libraries and ObjectStore
Dataweave Libraries and ObjectStoreDataweave Libraries and ObjectStore
Dataweave Libraries and ObjectStore
 
Kubernetes the deltatre way the basics - introduction to containers and orc...
Kubernetes the deltatre way   the basics - introduction to containers and orc...Kubernetes the deltatre way   the basics - introduction to containers and orc...
Kubernetes the deltatre way the basics - introduction to containers and orc...
 
sun solaris
sun solarissun solaris
sun solaris
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache Mesos
 

Dernier

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Dernier (20)

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

Hardening Kubernetes Setups: War Stories from the Trenches of Production - Puja Abbassi, Giant Swarm