SlideShare une entreprise Scribd logo
1  sur  93
Télécharger pour lire hors ligne
Kubernetes
Failure Stories
CONTAINER DAYS
2019-06-26
HENNING JACOBS
@try_except_
2
ZALANDO AT A GLANCE
~ 5.4billion EUR
revenue 2018
> 250
million
visits
per
month
> 15.000
employees in
Europe
> 79%
of visits via
mobile devices
> 26
million
active customers
> 300.000
product choices
~ 2.000
brands
17
countries
3
SCALE
130Clusters
396Accounts
4
DEVELOPERS USING KUBERNETES
5
47+ cluster
components
INCIDENTS ARE FINE
INCIDENT
#0
8
INCIDENT #0: IMPACT
9
INCIDENT #0: CONTRIBUTING FACTORS
● Pods couldn’t get AWS IAM credentials, timing out and failing
● kube2iam could not get the Pod’s IP address
● kubelet was delayed in updating Pod statuses for multiple minutes
● Default kubelet configuration has a low rate limit for calls to the API
server (--kube-api-qps)
● Due to rescaling, only one node was available for builder Pods
● Rapid creation and deletion of Pods caused kubelet to fall behind
10
INCIDENT #0: FIX
github.com/zalando-incubator/kubernetes-on-aws/pull/2247
INCIDENT
#1
12
INCIDENT #1: CUSTOMER IMPACT
13
INCIDENT #1: CUSTOMER IMPACT
14
INCIDENT #1: INGRESS ERRORS
15
INCIDENT #1: AWS ALB 502
github.com/zalando/riptide
16
INCIDENT #1: AWS ALB 502
github.com/zalando/riptide
502 Bad Gateway
Server: awselb/2.0
...
17
INCIDENT #1: ALB HEALTHY HOST COUNT
3 healthy hosts
zero healthy hosts
2xx requests
18
LIFE OF A REQUEST (INGRESS)
Node Node
MyApp MyApp MyApp
EC2 network
K8s network
TLS
HTTP
Skipper Skipper
ALB
19
INCIDENT #1: SKIPPER MEMORY USAGE
Memory Limit
Memory Usage
20
INCIDENT #1: SKIPPER OOM
Node Node
MyApp MyApp MyApp
TLS
HTTP
Skipper Skipper
ALB
OOMKill
21
INCIDENT #1: CONTRIBUTING FACTORS
• Shared Ingress (per cluster)
• High latency of unrelated app (Solr)
caused high number of in-flight requests
• Skipper creates goroutine per HTTP request.
Goroutine costs 2kB memory + http.Request
• Memory limit was fixed at 500Mi (4x regular usage)
Fix for the memory issue in Skipper:
https://opensource.zalando.com/skipper/operation/operation/#scheduler
INCIDENT
#2
23
INCIDENT #2: CUSTOMER IMPACT
24
INCIDENT #1: IAM RETURNING 404
25
INCIDENT #1: NUMBER OF PODS
26
LIFE OF A REQUEST (INGRESS)
Node Node
MyApp MyApp MyApp
EC2 network
K8s network
TLS
HTTP
Skipper Skipper
ALB
27
ROUTES FROM API SERVER
Node Node
MyApp MyApp MyApp
Skipper
ALBAPI Server
Skipper
28
API SERVER DOWN
Node Node
MyApp MyApp MyApp
Skipper
ALBAPI Server
Skipper
OOMKill
29
INCIDENT #2: INNOCENT MANIFEST
apiVersion: batch/v2alpha1
kind: CronJob
metadata:
name: "foobar"
spec:
schedule: "*/15 9-19 * * Mon-Fri"
jobTemplate:
spec:
template:
spec:
restartPolicy: Never
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 1
failedJobsHistoryLimit: 1
containers:
...
30
INCIDENT #2: FIXED CRON JOB
apiVersion: batch/v2alpha1
kind: CronJob
metadata:
name: "foobar"
spec:
schedule: "7 8-18 * * Mon-Fri"
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 1
failedJobsHistoryLimit: 1
jobTemplate:
spec:
activeDeadlineSeconds: 120
template:
spec:
restartPolicy: Never
containers:
31
INCIDENT #2: LESSONS LEARNED
• Fix Ingress to stay “healthy” during API server problems
• Fix Ingress to retain last known set of routes
• Use quota for number of pods
apiVersion: v1
kind: ResourceQuota
metadata:
name: compute-resources
spec:
hard:
pods: "1500"
NOTE: we dropped quotas recently
github.com/zalando-incubator/kubernetes-
on-aws/pull/2059
INCIDENT
#3
33
INCIDENT #3: INGRESS ERRORS
34
INCIDENT #3: COREDNS OOMKILL
coredns invoked oom-killer:
gfp_mask=0x14000c0(GFP_KERNEL),
nodemask=(null), order=0, oom_score_adj=994
Memory cgroup out of memory: Kill process 6428
(coredns) score 2050 or sacrifice child
oom_reaper: reaped process 6428 (coredns),
now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
restarts
35
STOP THE BLEEDING: INCREASE MEMORY LIMIT
4Gi
2Gi
200Mi
36
SPIKE IN HTTP REQUESTS
37
SPIKE IN DNS QUERIES
38
INCREASE IN MEMORY USAGE
39
INCIDENT #3: CONTRIBUTING FACTORS
• HTTP retries
• No DNS caching
• Kubernetes ndots:5 problem
• Short maximum lifetime of HTTP connections
• Fixed memory limit for CoreDNS
• Monitoring affected by DNS outage
github.com/zalando-incubator/kubernetes-on-aws/blob/dev/docs/postmortems/jan-2019-dns-outage.md
INCIDENT
#4
41
INCIDENT #4: CLUSTER DOWN
42
INCIDENT #4: MANUAL OPERATION
% etcdctl del -r /registry-kube-1/certificatesigningrequest prefix
43
INCIDENT #4: RTFM
% etcdctl del -r /registry-kube-1/certificatesigningrequest prefix
help: etcdctl del [options] <key> [range_end]
44
Junior Engineers are Features, not Bugs
https://www.youtube.com/watch?v=cQta4G3ge44
https://www.outcome-eng.com/human-error-never-root-cause/
46
INCIDENT #4: LESSONS LEARNED
• Disaster Recovery Plan?
• Backup etcd to S3
• Monitor the snapshots
INCIDENT
#5
48
INCIDENT #5: API LATENCY SPIKES
49
INCIDENT #5: CONNECTION ISSUES
...
Kubernetes worker and master nodes sporadically fail to connect to etcd
causing timeouts in the APIserver and disconnects in the pod network.
...
Master Node
API Server
etcd
etcd-member
50
INCIDENT #5: STOP THE BLEEDING
#!/bin/bash
while true; do
echo "sleep for 60 seconds"
sleep 60
timeout 5 curl http://localhost:8080/api/v1/nodes > /dev/null
if [ $? -eq 0 ]; then
echo "all fine, no need to restart etcd member"
continue
else
echo "restarting etcd-member"
systemctl restart etcd-member
fi
done
51
INCIDENT #5: CONFIRMATION FROM AWS
[...]
We can’t go into the details [...] that resulted the networking problems during
the “non-intrusive maintenance”, as it relates to internal workings of EC2.
We can confirm this only affected the T2 instance types, ...
[...]
We don’t explicitly recommend against running production services on T2
[...]
52
INCIDENT #5: LESSONS LEARNED
• It's never the AWS infrastructure until it is
• Treat t2 instances with care
• Kubernetes components are not necessarily "cloud native"
Cloud Native? Declarative, dynamic, resilient, and scalable
INCIDENT
#6
54
INCIDENT #6: IMPACT
Ingress
5XXs
55
INCIDENT #6: CLUSTER DOWN?
56
INCIDENT #6: THE TRIGGER
https://www.outcome-eng.com/human-error-never-root-cause/
58
CLUSTER UPGRADE
FLOW
59
CLUSTER LIFECYCLE MANAGER (CLM)
github.com/zalando-incubator/cluster-lifecycle-manager
60
CLUSTER CHANNELS
github.com/zalando-incubator/kubernetes-on-aws
Channel Description Clusters
dev Development and playground clusters. 3
alpha Main infrastructure clusters (important to us). 2
beta
Product clusters for the rest of the
organization (non-prod). 57+
stable
Product clusters for the rest of the
organization (prod). 57+
61
E2E TESTS ON EVERY PR
github.com/zalando-incubator/kubernetes-on-aws
62
RUNNING E2E TESTS (BEFORE)
Control plane
nodenode
branch: dev
Create Cluster Run e2e tests Delete Cluster
Testing dev to alpha upgrade
Control plane Control plane
63
RUNNING E2E TESTS (NOW)
Control plane
nodenode
Control plane
nodenode
branch: alpha (base) branch: dev (head)
Create Cluster Update Cluster Run e2e tests Delete Cluster
Testing dev to alpha upgrade
Control plane Control plane
64
INCIDENT #6: LESSONS LEARNED
• Automated e2e tests are pretty good, but not enough
• Test the diff/migration automatically
• Bootstrap new cluster with previous configuration
• Apply new configuration
• Run end-to-end & conformance tests
github.com/zalando-incubator/kubernetes-on-aws/tree/dev/test/e2e
INCIDENT
#7
66
⇒ all containers
on this node down
#7: KERNEL OOM KILLER
67
INCIDENT #7: KUBELET MEMORY
68
UPSTREAM ISSUE REPORTED
https://github.com/kubernetes/kubernetes/issues/73587
69
INCIDENT #7: THE PATCH
https://github.com/kubernetes/kubernetes/issues/73587
INCIDENT
#8
71
INCIDENT #8: IMPACT
Error during Pod creation:
MountVolume.SetUp failed for volume
"outfit-delivery-api-credentials" :
secrets "outfit-delivery-api-credentials" not found
⇒ All new Kubernetes deployments fail
72
INCIDENT #8: CREDENTIALS QUEUE
17:30:07 | [pool-6-thread-1 ] | Current queue size: 7115, current number of active workers: 20
17:31:07 | [pool-6-thread-1 ] | Current queue size: 7505, current number of active workers: 20
17:32:07 | [pool-6-thread-1 ] | Current queue size: 7886, current number of active workers: 20
..
17:37:07 | [pool-6-thread-1 ] | Current queue size: 9686, current number of active workers: 20
..
17:44:07 | [pool-6-thread-1 ] | Current queue size: 11976, current number of active workers: 20
..
19:16:07 | [pool-6-thread-1 ] | Current queue size: 58381, current number of active workers: 20
73
INCIDENT #8: CPU THROTTLING
74
INCIDENT #8: WHAT HAPPENED
Scaled down IAM provider
to reduce Slack
+ Number of deployments increased
⇒ Process could not process credentials fast enough
75
CPU/memory requests "block" resources on nodes.
Difference between actual usage and requests → Slack
SLACK
CPU
Memory
Node
"Slack"
76
DISABLING CPU THROTTLING
[Announcement] CPU limits will be disabled
⇒ Ingress Latency Improvements
kubelet … --cpu-cfs-quota=false
77
DISABLING CPU THROTTLING
78
A MILLION WAYS TO CRASH YOUR CLUSTER?
• Switch to latest Docker to fix issues with Docker daemon freezing
• Redesign of DNS setup due to high DNS latencies (5s),
switch from kube-dns to node-local dnsmasq+CoreDNS
• Disabling CPU throttling (CFS quota) to avoid latency issues
• Quick fix for timeouts using etcd-proxy: client-go still seems to have
issues with timeouts
• 502's during cluster updates: race condition during network setup
79
MORE TOPICS
• Graceful Pod shutdown and
race conditions (endpoints, Ingress)
• Incompatible Kubernetes changes
• CoreOS ContainerLinux "stable" won't boot
• Kubernetes EBS volume handling
• Docker
80
RACE CONDITIONS..
• Switch to the latest Docker version available to fix the issues with Docker daemon freezing
• Redesign of DNS setup due to high DNS latencies (5s), switch from kube-dns to CoreDNS
• Disabling CPU throttling (CFS quota) to avoid latency issues
• Quick fix for timeouts using etcd-proxy, since client-go still seems to have issues with timeouts
• 502's during cluster updates: race condition
•
github.com/zalando-incubator/kubernetes-on-aws
81
TIMEOUTS TO API SERVER..
github.com/zalando-incubator/kubernetes-on-aws
82
MANAGED
KUBERNETES?
83
WILL MANAGED K8S SAVE US?
GKE: monthly uptime percentage at 99.95% for regional clusters
84
WILL MANAGED K8S SAVE US?
NO(not really)
e.g. AWS EKS uptime SLA is only for API server
85
PRODUCTION PROOFING AWS EKS
List of things you might
want to look at for EKS
in production
https://medium.com/glia-tech/productionproofing-e
ks-ed52951ffd6c
86
AWS EKS IN PRODUCTION
https://kubedex.com/90-days-of-aws-eks-in-production/
87
DOCKER.. (ON GKE)
https://github.com/kubernetes/kubernetes/blob/8fd414537b5143ab0
39cb910590237cabf4af783/cluster/gce/gci/health-monitor.sh#L29
WELCOME TO
CLOUD NATIVE!
89
90
KUBERNETES FAILURE STORIES
A compiled list of links to public failure stories related to Kubernetes.
k8s.af
We need more failure talks!
Istio? Anyone?
91
DISCLAIMER
92
OPEN SOURCE
Kubernetes on AWS
github.com/zalando-incubator/kubernetes-on-aws
Skipper HTTP Router & Ingress controller
github.com/zalando/skipper
External DNS
github.com/kubernetes-incubator/external-dns
Postgres Operator
github.com/zalando-incubator/postgres-operator
Kubernetes Resource Report
github.com/hjacobs/kube-resource-report
Kubernetes Downscaler
github.com/hjacobs/kube-downscaler
Kubernetes Operator Pythonic Framework (Kopf)
github.com/zalando-incubator/kopf
QUESTIONS?
HENNING JACOBS
HEAD OF
DEVELOPER PRODUCTIVITY
henning@zalando.de
@try_except_
Illustrations by @01k

Contenu connexe

Tendances

애자일 S/W 개발
애자일 S/W 개발애자일 S/W 개발
애자일 S/W 개발
영기 김
 

Tendances (20)

Protecting Apps from Hacks in Kubernetes with NGINX
Protecting Apps from Hacks in Kubernetes with NGINXProtecting Apps from Hacks in Kubernetes with NGINX
Protecting Apps from Hacks in Kubernetes with NGINX
 
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model Serving
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model ServingKubecon 2023 EU - KServe - The State and Future of Cloud-Native Model Serving
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model Serving
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 
Terraform AWS modules and some best practices - September 2019
Terraform AWS modules and some best practices - September 2019Terraform AWS modules and some best practices - September 2019
Terraform AWS modules and some best practices - September 2019
 
Ansible ではじめる ネットワーク自動化(Ansible 2.9版)
Ansible ではじめる ネットワーク自動化(Ansible 2.9版)Ansible ではじめる ネットワーク自動化(Ansible 2.9版)
Ansible ではじめる ネットワーク自動化(Ansible 2.9版)
 
Prometheus monitoring
Prometheus monitoringPrometheus monitoring
Prometheus monitoring
 
CODT2020 OpenStack Version Up and VMHA Masakari in Enterprise Cloud
CODT2020 OpenStack Version Up and VMHA Masakari in Enterprise CloudCODT2020 OpenStack Version Up and VMHA Masakari in Enterprise Cloud
CODT2020 OpenStack Version Up and VMHA Masakari in Enterprise Cloud
 
ansible why ?
ansible why ?ansible why ?
ansible why ?
 
Ansible module development 101
Ansible module development 101Ansible module development 101
Ansible module development 101
 
Tcp ip & io model
Tcp ip & io modelTcp ip & io model
Tcp ip & io model
 
Cluster-as-code. The Many Ways towards Kubernetes
Cluster-as-code. The Many Ways towards KubernetesCluster-as-code. The Many Ways towards Kubernetes
Cluster-as-code. The Many Ways towards Kubernetes
 
How to test infrastructure code: automated testing for Terraform, Kubernetes,...
How to test infrastructure code: automated testing for Terraform, Kubernetes,...How to test infrastructure code: automated testing for Terraform, Kubernetes,...
How to test infrastructure code: automated testing for Terraform, Kubernetes,...
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at Netflix
 
ネットワークエンジニア的Ansibleの始め方
ネットワークエンジニア的Ansibleの始め方ネットワークエンジニア的Ansibleの始め方
ネットワークエンジニア的Ansibleの始め方
 
Ndc14 분산 서버 구축의 ABC
Ndc14 분산 서버 구축의 ABCNdc14 분산 서버 구축의 ABC
Ndc14 분산 서버 구축의 ABC
 
애자일 S/W 개발
애자일 S/W 개발애자일 S/W 개발
애자일 S/W 개발
 
CloudStack - Top 5 Technical Issues and Troubleshooting
CloudStack - Top 5 Technical Issues and TroubleshootingCloudStack - Top 5 Technical Issues and Troubleshooting
CloudStack - Top 5 Technical Issues and Troubleshooting
 
Alexei Vladishev - Zabbix - Monitoring Solution for Everyone
Alexei Vladishev - Zabbix - Monitoring Solution for EveryoneAlexei Vladishev - Zabbix - Monitoring Solution for Everyone
Alexei Vladishev - Zabbix - Monitoring Solution for Everyone
 
Schedulers and Timers in Akka
Schedulers and Timers in AkkaSchedulers and Timers in Akka
Schedulers and Timers in Akka
 
카카오스토리 웹팀의 코드리뷰 경험
카카오스토리 웹팀의 코드리뷰 경험카카오스토리 웹팀의 코드리뷰 경험
카카오스토리 웹팀의 코드리뷰 경험
 

Similaire à Kubernetes Failure Stories, or: How to Crash Your Cluster - ContainerDays EU 2019

Similaire à Kubernetes Failure Stories, or: How to Crash Your Cluster - ContainerDays EU 2019 (20)

Kubernetes Failure Stories - KubeCon Europe Barcelona
Kubernetes Failure Stories - KubeCon Europe BarcelonaKubernetes Failure Stories - KubeCon Europe Barcelona
Kubernetes Failure Stories - KubeCon Europe Barcelona
 
Why I love Kubernetes Failure Stories and you should too - GOTO Berlin
Why I love Kubernetes Failure Stories and you should too - GOTO BerlinWhy I love Kubernetes Failure Stories and you should too - GOTO Berlin
Why I love Kubernetes Failure Stories and you should too - GOTO Berlin
 
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...
 
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - Cont...
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - Cont...Running Kubernetes in Production: A Million Ways to Crash Your Cluster - Cont...
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - Cont...
 
How Zalando runs Kubernetes clusters at scale on AWS - AWS re:Invent
How Zalando runs Kubernetes clusters at scale on AWS - AWS re:InventHow Zalando runs Kubernetes clusters at scale on AWS - AWS re:Invent
How Zalando runs Kubernetes clusters at scale on AWS - AWS re:Invent
 
Developer Experience at Zalando - CNCF End User SIG-DX
Developer Experience at Zalando - CNCF End User SIG-DXDeveloper Experience at Zalando - CNCF End User SIG-DX
Developer Experience at Zalando - CNCF End User SIG-DX
 
Large Scale Kubernetes on AWS at Europe's Leading Online Fashion Platform - C...
Large Scale Kubernetes on AWS at Europe's Leading Online Fashion Platform - C...Large Scale Kubernetes on AWS at Europe's Leading Online Fashion Platform - C...
Large Scale Kubernetes on AWS at Europe's Leading Online Fashion Platform - C...
 
12.07.2017 Docker Meetup - KUBERNETES ON AWS @ ZALANDO TECH
12.07.2017 Docker Meetup - KUBERNETES ON AWS @ ZALANDO TECH12.07.2017 Docker Meetup - KUBERNETES ON AWS @ ZALANDO TECH
12.07.2017 Docker Meetup - KUBERNETES ON AWS @ ZALANDO TECH
 
A Hitchhiker’s Guide to the Cloud Native Stack. #CDS17
A Hitchhiker’s Guide to the Cloud Native Stack. #CDS17A Hitchhiker’s Guide to the Cloud Native Stack. #CDS17
A Hitchhiker’s Guide to the Cloud Native Stack. #CDS17
 
A hitchhiker‘s guide to the cloud native stack
A hitchhiker‘s guide to the cloud native stackA hitchhiker‘s guide to the cloud native stack
A hitchhiker‘s guide to the cloud native stack
 
Why Kubernetes? Cloud Native and Developer Experience at Zalando - OWL Tech &...
Why Kubernetes? Cloud Native and Developer Experience at Zalando - OWL Tech &...Why Kubernetes? Cloud Native and Developer Experience at Zalando - OWL Tech &...
Why Kubernetes? Cloud Native and Developer Experience at Zalando - OWL Tech &...
 
Cloud-native .NET Microservices mit Kubernetes
Cloud-native .NET Microservices mit KubernetesCloud-native .NET Microservices mit Kubernetes
Cloud-native .NET Microservices mit Kubernetes
 
How do we use Kubernetes
How do we use KubernetesHow do we use Kubernetes
How do we use Kubernetes
 
Why use Gitlab
Why use GitlabWhy use Gitlab
Why use Gitlab
 
Why and How to Run Your Own Gitlab Runners as Your Company Grows
Why and How to Run Your Own Gitlab Runners as Your Company GrowsWhy and How to Run Your Own Gitlab Runners as Your Company Grows
Why and How to Run Your Own Gitlab Runners as Your Company Grows
 
Walls Within Walls: What if your attacker knows parkour?
Walls Within Walls: What if your attacker knows parkour?Walls Within Walls: What if your attacker knows parkour?
Walls Within Walls: What if your attacker knows parkour?
 
PGConf.ASIA 2019 Bali - PostgreSQL on K8S at Zalando - Alexander Kukushkin
PGConf.ASIA 2019 Bali - PostgreSQL on K8S at Zalando - Alexander KukushkinPGConf.ASIA 2019 Bali - PostgreSQL on K8S at Zalando - Alexander Kukushkin
PGConf.ASIA 2019 Bali - PostgreSQL on K8S at Zalando - Alexander Kukushkin
 
Successful K8S Platforms in Airgapped Environments
Successful K8S Platforms in Airgapped EnvironmentsSuccessful K8S Platforms in Airgapped Environments
Successful K8S Platforms in Airgapped Environments
 
Let's talk about Failures with Kubernetes - Hamburg Meetup
Let's talk about Failures with Kubernetes - Hamburg MeetupLet's talk about Failures with Kubernetes - Hamburg Meetup
Let's talk about Failures with Kubernetes - Hamburg Meetup
 
Kubernetes + Python = ❤ - Cloud Native Prague
Kubernetes + Python = ❤ - Cloud Native PragueKubernetes + Python = ❤ - Cloud Native Prague
Kubernetes + Python = ❤ - Cloud Native Prague
 

Plus de Henning Jacobs

Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...
Henning Jacobs
 

Plus de Henning Jacobs (20)

Open Source at Zalando - OSB Open Source Day 2019
Open Source at Zalando - OSB Open Source Day 2019Open Source at Zalando - OSB Open Source Day 2019
Open Source at Zalando - OSB Open Source Day 2019
 
Why Kubernetes? Cloud Native and Developer Experience at Zalando - Enterprise...
Why Kubernetes? Cloud Native and Developer Experience at Zalando - Enterprise...Why Kubernetes? Cloud Native and Developer Experience at Zalando - Enterprise...
Why Kubernetes? Cloud Native and Developer Experience at Zalando - Enterprise...
 
Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...
 
Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...
 
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
 
Ensuring Kubernetes Cost Efficiency across (many) Clusters - DevOps Gathering...
Ensuring Kubernetes Cost Efficiency across (many) Clusters - DevOps Gathering...Ensuring Kubernetes Cost Efficiency across (many) Clusters - DevOps Gathering...
Ensuring Kubernetes Cost Efficiency across (many) Clusters - DevOps Gathering...
 
Developer Experience at Zalando - Handelsblatt Strategisches IT-Management 2019
Developer Experience at Zalando - Handelsblatt Strategisches IT-Management 2019Developer Experience at Zalando - Handelsblatt Strategisches IT-Management 2019
Developer Experience at Zalando - Handelsblatt Strategisches IT-Management 2019
 
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
 
API First with Connexion - PyConWeb 2018
API First with Connexion - PyConWeb 2018API First with Connexion - PyConWeb 2018
API First with Connexion - PyConWeb 2018
 
Developer Journey at Zalando - Idea to Production with Containers in the Clou...
Developer Journey at Zalando - Idea to Production with Containers in the Clou...Developer Journey at Zalando - Idea to Production with Containers in the Clou...
Developer Journey at Zalando - Idea to Production with Containers in the Clou...
 
Kubernetes on AWS at Zalando: Failures & Learnings - DevOps NRW
Kubernetes on AWS at Zalando: Failures & Learnings - DevOps NRWKubernetes on AWS at Zalando: Failures & Learnings - DevOps NRW
Kubernetes on AWS at Zalando: Failures & Learnings - DevOps NRW
 
From AWS/STUPS to Kubernetes on AWS @Zalando - Berlin Kubernetes Meetup
From AWS/STUPS to Kubernetes on AWS @Zalando - Berlin Kubernetes MeetupFrom AWS/STUPS to Kubernetes on AWS @Zalando - Berlin Kubernetes Meetup
From AWS/STUPS to Kubernetes on AWS @Zalando - Berlin Kubernetes Meetup
 
Kubernetes on AWS @Zalando - Berlin AWS User Group 2017-05-09
Kubernetes on AWS @Zalando - Berlin AWS User Group 2017-05-09Kubernetes on AWS @Zalando - Berlin AWS User Group 2017-05-09
Kubernetes on AWS @Zalando - Berlin AWS User Group 2017-05-09
 
Kubernetes at Zalando - CNCF End User Committee Presentation
Kubernetes at Zalando - CNCF End User Committee PresentationKubernetes at Zalando - CNCF End User Committee Presentation
Kubernetes at Zalando - CNCF End User Committee Presentation
 
Kubernetes on AWS at Europe's Leading Online Fashion Platform
Kubernetes on AWS at Europe's Leading Online Fashion PlatformKubernetes on AWS at Europe's Leading Online Fashion Platform
Kubernetes on AWS at Europe's Leading Online Fashion Platform
 
Plan B: Service to Service Authentication with OAuth
Plan B: Service to Service Authentication with OAuthPlan B: Service to Service Authentication with OAuth
Plan B: Service to Service Authentication with OAuth
 
Docker Berlin Meetup Nov 2015: Zalando Intro
Docker Berlin Meetup Nov 2015: Zalando IntroDocker Berlin Meetup Nov 2015: Zalando Intro
Docker Berlin Meetup Nov 2015: Zalando Intro
 
STUPS @ AWS Enterprise Web Day Oktober 2015
STUPS @ AWS Enterprise Web Day Oktober 2015STUPS @ AWS Enterprise Web Day Oktober 2015
STUPS @ AWS Enterprise Web Day Oktober 2015
 
Python at Zalando Technology @ Python Users Berlin Meetup September 2015
Python at Zalando Technology @ Python Users Berlin Meetup September 2015Python at Zalando Technology @ Python Users Berlin Meetup September 2015
Python at Zalando Technology @ Python Users Berlin Meetup September 2015
 
STUPS by Zalando @WHD.local Frankfurt: STUPS.io - an Open Source Cloud Framew...
STUPS by Zalando @WHD.local Frankfurt: STUPS.io - an Open Source Cloud Framew...STUPS by Zalando @WHD.local Frankfurt: STUPS.io - an Open Source Cloud Framew...
STUPS by Zalando @WHD.local Frankfurt: STUPS.io - an Open Source Cloud Framew...
 

Dernier

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Dernier (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 

Kubernetes Failure Stories, or: How to Crash Your Cluster - ContainerDays EU 2019