Evolution of kube-proxy
Laurent Bernaille
@lbernail
Datadog
Monitoring service
Over 350 integrations
Over 1,200 employees
Over 8,000 customers
Runs on millions of hosts
Trillions of data points per day
10000s hosts in our infra
10s of Kubernetes clusters
Clusters from 50 to 3000 nodes
Multi-cloud
Very fast growth
What is kube-proxy?
● Network proxy running on each Kubernetes node
● Implements the Kubernetes service abstraction
○ Map service Cluster IPs (virtual IPs) to pods
○ Enable ingress traffic
Kube-proxy
Deployment and service
apiVersion: apps/v1
kind: Deployment
metadata:
name: echodeploy
labels:
app: echo
spec:
replicas: 3
selector:
matchLabels:
app: echo
template:
metadata:
labels:
app: echo
spec:
containers:
- name: echopod
image: lbernail/echo:0.5
apiVersion: v1
kind: Service
metadata:
name: echo
labels:
app: echo
spec:
type: ClusterIP
selector:
app: echo
ports:
- name: http
protocol: TCP
port: 80
targetPort: 5000
Created resources
Deployment ReplicaSet Pod 1
label: app=echo
Pod 2
label: app=echo
Pod 3
label: app=echo
Service
Selector: app=echo
kubectl get all
NAME AGE
deploy/echodeploy 16s
NAME AGE
rs/echodeploy-75dddcf5f6 16s
NAME READY
po/echodeploy-75dddcf5f6-jtjts 1/1
po/echodeploy-75dddcf5f6-r7nmk 1/1
po/echodeploy-75dddcf5f6-zvqhv 1/1
NAME TYPE CLUSTER-IP
svc/echo ClusterIP 10.200.246.139
The endpoint object
Deployment ReplicaSet Pod 1
label: app=echo
Pod 2
label: app=echo
Pod 3
label: app=echo
kubectl describe endpoints echo
Name: echo
Namespace: datadog
Labels: app=echo
Annotations: <none>
Subsets:
Addresses: 10.150.4.10,10.150.6.16,10.150.7.10
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
http 5000 TCP
Endpoints
Addresses:
10.150.4.10
10.150.6.16
10.150.7.10
Service
Selector: app=echo
Readiness
readinessProbe:
httpGet:
path: /ready
port: 5000
periodSeconds: 2
successThreshold: 2
failureThreshold: 2
● A pod can be started but not ready to serve requests
○ Initialization
○ Connection to backends
● Kubernetes provides an abstraction for this: Readiness Probes
Accessing the service
kubectl run -it test --image appropriate/curl ash
# while true ; do curl 10.200.246.139 ; sleep 1 ; done
Container: 10.150.7.10 | Source: 10.150.6.17 | Version: v2
Container: 10.150.6.16 | Source: 10.150.6.17 | Version: v2
Container: 10.150.4.10 | Source: 10.150.6.17 | Version: v2
Container: 10.150.7.10 | Source: 10.150.6.17 | Version: v2
Container: 10.150.6.16 | Source: 10.150.6.17 | Version: v2
Container: 10.150.4.10 | Source: 10.150.6.17 | Version: v2
Container: 10.150.7.10 | Source: 10.150.6.17 | Version: v2
Container: 10.150.6.16 | Source: 10.150.6.17 | Version: v2
Container: 10.150.4.10 | Source: 10.150.6.17 | Version: v2
Endpoints only consist of pods that are Ready
How does it work?
Pod statuses
API Server
Node
kubelet pod
HC
Status updates
Node
kubelet pod
HC
ETCD
pods
Endpoints
API Server
Node
kubelet pod
HC
Status updates
Controller Manager
Watch
- pods
- services
endpoint
controller
Node
kubelet pod
HC
Sync endpoints:
- list pods matching selector
- add IP to endpoints
ETCD
pods
services
endpoints
Accessing services
API Server
Node A
Controller Manager
Watch
- pods
- services
endpoint
controller
Sync endpoints:
- list pods matching selector
- add IP to endpoints
ETCD
pods
services
endpoints
kube-proxy proxier
Watch
- services
- endpoints
Accessing services
API Server
Node A
Controller Manager
endpoint
controller
ETCD
pods
services
endpoints
kube-proxy proxier
client Node Bpod 1
Node Cpod 2
Implementations
userspace
proxy-mode=userspace
API Server
Node A
kube-proxy
iptables client
Node B
Node C
pod 1
pod 2
kube-proxy binds 1 port per service
Client traffic redirected to kube-proxy
kube-proxy connects to pods
proxy-mode=userspace
PREROUTING
any / any =>
KUBE-PORTALS-CONTAINER
All traffic is processed by kube chains
proxy-mode=userspace
PREROUTING
any / any =>
KUBE-PORTALS-CONTAINER
All traffic is processed by kube chains
KUBE-PORTALS-CONTAINER
any / VIP:PORT => NodeIP:PortX
Portal chain
One rule per service
One port allocated for each service (and bound by kube-proxy)
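To see this on a node running in userspace mode, the portal chain can be listed directly (sketch; the chain name is the one shown above):
sudo iptables -t nat -L KUBE-PORTALS-CONTAINER -n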
● Performance (userland proxying)
● Source IP cannot be preserved
● Not recommended anymore
● Since Kubernetes 1.2, default is iptables
Limitations
iptables
proxy-mode=iptables
API Server
Node A
kube-proxy iptables
client
Node B
Node C
pod 1
pod 2
Outgoing traffic
1. Client to Service IP
2. DNAT: Client to Pod1 IP
Reverse path
1. Pod1 IP to Client
2. Reverse NAT: Service IP to client
proxy-mode=iptables
PREROUTING / OUTPUT
any / any => KUBE-SERVICES
All traffic is processed by kube chains
proxy-mode=iptables
PREROUTING / OUTPUT
any / any => KUBE-SERVICES
KUBE-SERVICES
any / VIP:PORT => KUBE-SVC-XXX
Global Service chain
Identify service and jump to appropriate service chain
proxy-mode=iptables
PREROUTING / OUTPUT
any / any => KUBE-SERVICES
KUBE-SERVICES
any / VIP:PORT => KUBE-SVC-XXX
KUBE-SVC-XXX
any / any proba 33% => KUBE-SEP-AAA
any / any proba 50% => KUBE-SEP-BBB
any / any => KUBE-SEP-CCC
Service chain (one per service)
Use statistic iptables module (probability of rule being applied)
Rules are evaluated sequentially (hence the 33%, 50%, 100%)
proxy-mode=iptables
PREROUTING / OUTPUT
any / any => KUBE-SERVICES
KUBE-SERVICES
any / VIP:PORT => KUBE-SVC-XXX
KUBE-SVC-XXX
any / any proba 33% => KUBE-SEP-AAA
any / any proba 50% => KUBE-SEP-BBB
any / any => KUBE-SEP-CCC
KUBE-SEP-AAA
endpoint IP / any => KUBE-MARK-MASQ
any / any => DNAT endpoint IP:Port
Endpoint Chain
Mark hairpin traffic (client = target) for SNAT
DNAT to the endpoint
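Put together, the generated NAT rules look roughly like this (simplified sketch of iptables-save -t nat output; chain suffixes and IPs are placeholders based on the echo example):
-A KUBE-SERVICES -d 10.200.246.139/32 -p tcp --dport 80 -j KUBE-SVC-XXX
-A KUBE-SVC-XXX -m statistic --mode random --probability 0.33333 -j KUBE-SEP-AAA
-A KUBE-SVC-XXX -m statistic --mode random --probability 0.50000 -j KUBE-SEP-BBB
-A KUBE-SVC-XXX -j KUBE-SEP-CCC
-A KUBE-SEP-AAA -s 10.150.4.10/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-AAA -p tcp -j DNAT --to-destination 10.150.4.10:5000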
Hairpin traffic?
API Server
Node A
kube-proxy iptables
pod 1
Node B
Node C
pod 2
pod 3
Client can also be a destination
After DNAT:
Src IP= Pod1, Dst IP= Pod1
No reverse NAT possible
=> SNAT on host for this traffic
1. Pod1 IP => SVC IP
2. SNAT: HostIP => SVC IP
3. DNAT: HostIP => Pod1 IP
Reverse path
1. Pod1 IP => Host IP
2. Reverse NAT: SVC IP => Pod1IP
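The SNAT itself is implemented with a packet mark: KUBE-MARK-MASQ sets it and a POSTROUTING rule masquerades marked packets (simplified sketch; 0x4000 is the usual kube-proxy default mark):
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m mark --mark 0x4000/0x4000 -j MASQUERADE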
Persistency
spec:
type: ClusterIP
sessionAffinity: ClientIP
sessionAffinityConfig:
clientIP:
timeoutSeconds: 600
KUBE-SEP-AAA
endpoint IP / any => KUBE-MARK-MASQ
any / any => DNAT endpoint IP:Port
recent : set rsource KUBE-SEP-AAA
Use “recent” module
Add Source IP to set named KUBE-SEP-AAA
Persistency
KUBE-SEP-AAA
endpoint IP / any => KUBE-MARK-MASQ
any / any => DNAT endpoint IP:Port
recent : set rsource KUBE-SEP-AAA
Use “recent” module
Add Source IP to set named KUBE-SEP-AAA
KUBE-SVC-XXX
any / any recent: rcheck KUBE-SEP-AAA => KUBE-SEP-AAA
any / any recent: rcheck KUBE-SEP-BBB => KUBE-SEP-BBB
any / any recent: rcheck KUBE-SEP-CCC => KUBE-SEP-CCC
Load-balancing rules
Use recent module
If Source IP is in set named KUBE-SEP-AAA,
jump to KUBE-SEP-AAA
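As a sketch, the corresponding rules look like this (illustrative; the service timeoutSeconds value is passed to --seconds):
-A KUBE-SEP-AAA -m recent --name KUBE-SEP-AAA --set --rsource -j DNAT --to-destination 10.150.4.10:5000
-A KUBE-SVC-XXX -m recent --name KUBE-SEP-AAA --rcheck --seconds 600 --reap -j KUBE-SEP-AAA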
● iptables was not designed for load-balancing
● hard to debug (very quickly, 10000s of rules)
● performance impact
○ control plane: syncing rules
○ data plane: going through rules
Limitations
“Scale Kubernetes to Support 50,000 Services” Haibin Xie & Quinton Hoole
Kubecon Berlin 2017
“Scale Kubernetes to Support 50,000 Services” Haibin Xie & Quinton Hoole
Kubecon Berlin 2017
ipvs
● L4 load-balancer built into the Linux kernel
● Many load-balancing algorithms
● Native persistency
● Fast atomic update (add/remove endpoint)
● Still relies on iptables for some use cases (SNAT)
● GA since 1.11
proxy-mode=ipvs
proxy-mode=ipvs
Pod X
Pod Y
Pod Z
Service ClusterIP:Port S
Backed by pod:port X, Y, Z
Virtual Server S
Realservers
- X
- Y
- Z
kube-proxy apiserver
Watches endpoints
Updates IPVS
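On a node, the resulting IPVS configuration can be inspected with ipvsadm (illustrative output for the echo example; connection counters omitted):
sudo ipvsadm -Ln
TCP  10.200.246.139:80 rr
  -> 10.150.4.10:5000    Masq    1
  -> 10.150.6.16:5000    Masq    1
  -> 10.150.7.10:5000    Masq    1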
New connection
Pod X
Pod Y
Pod Z
Virtual Server S
Realservers
- X
- Y
- Z
kube-proxy apiserver
IPVS conntrack
App A => S >> A => X
App establishes connection to S
IPVS associates Realserver X
Pod X deleted
Pod X
Pod Y
Pod Z
Virtual Server S
Realservers
- Y
- Z
kube-proxy apiserver
IPVS conntrack
App A => S >> A => X
Apiserver removes X from S endpoints
Kube-proxy removes X from realservers
Kernel drops traffic (no realserver)
Clean-up
Pod Y
Pod Z
Virtual Server S
Realservers
- Y
- Z
kube-proxy apiserver
IPVS conntrack
App A => S >> A => X
net/ipv4/vs/expire_nodest_conn
Delete conntrack entry on next packet
Forcefully terminate (RST)
RST
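This behavior is controlled by an IPVS sysctl, which can be checked or enabled with (sketch; kube-proxy's IPVS mode normally enables it itself):
sysctl net.ipv4.vs.expire_nodest_conn
sysctl -w net.ipv4.vs.expire_nodest_conn=1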
Limit?
● No graceful termination
● As soon as a pod is Terminating, its connections are destroyed
● Addressing this took time
Graceful termination (1.12)
Pod X
Pod Y
Pod Z
Virtual Server S
Realservers
- X Weight:0
- Y
- Z
kube-proxy apiserver
IPVS conntrack
App A => S >> A => X
Apiserver removes X from S endpoints
Kube-proxy sets Weight to 0
No new connection
Garbage collection
Pod X
Pod Y
Pod Z
Virtual Server S
Realservers
- Y
- Z
kube-proxy apiserver
IPVS conntrack
App A => S >> A => X
Pod exits and sends FIN
Kube-proxy removes the realserver when it has no active connections
FIN
What if X doesn’t send FIN?
Pod Y
Pod Z
Virtual Server S
Realservers
- X Weight:0
- Y
- Z
kube-proxy
IPVS conntrack
App A => S >> A => X
Conntrack entries expire after 900s
If A sends traffic, the entry never expires
Traffic is blackholed until the app detects the issue
(~15 min)
> Mitigation: lower tcp_retries2
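The mitigation boils down to lowering the number of TCP retransmissions on client nodes, for example (sketch; the right value is a trade-off with legitimately slow paths):
sysctl -w net.ipv4.tcp_retries2=8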
● Default timeouts
○ tcp / tcpfin : 900s / 120s
○ udp: 300s
● Under high load, ports can be reused quickly, keeping entries active
○ Especially with conn_reuse_mode=0
● Very bad for DNS => No graceful termination for UDP
IPVS connection tracking
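The default timeouts above can be checked and tuned directly with ipvsadm (values are tcp / tcpfin / udp, in seconds; sketch):
ipvsadm -L --timeout
ipvsadm --set 900 120 300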
● Very recent feature: configurable IPVS timeouts
● Ideally we’d want:
○ Terminating pod: set weight to 0
○ Deleted pod: remove backend
> This would require an API change for Endpoints
IPVS connection tracking
● Works pretty well at large scale
● Not 100% feature parity (a few edge cases)
● Tuning the IPVS parameters took time
● Improvements under discussion
IPVS status
Common challenges
Control plane scalability
● On each update to a service (new pod, readiness change)
○ The full endpoint object is recomputed
○ The endpoint object is sent to every kube-proxy
● Example of a single endpoint change
○ Service with 2000 endpoints, ~100B / endpoint, 5000 nodes
○ Each node gets 2000 x 100B = 200kB
○ Apiservers send 5000 x 200kB = 1GB
● Rolling update?
○ Total traffic => 2000 x 1GB = 2TB
=> Endpoint Slices
● Service potentially mapped to multiple EndpointSlices
● Maximum 100 endpoints in each slice
● Much more efficient for services with many endpoints
● Beta in 1.17
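Slices for a given service can be listed by label (sketch, reusing the echo service):
kubectl get endpointslices -l kubernetes.io/service-name=echo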
Conntrack size
● kube-proxy relies on the conntrack
○ Not great for services receiving a lot of connections
○ Pretty bad for the cluster DNS
● DNS strategy
○ Use node-local cache and force TCP for upstream
○ Much more manageable
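Conntrack table usage on a node can be checked with (sketch; the last command requires conntrack-tools):
sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max
conntrack -C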
Conntrack and UDP
● Very good surprise with Kernel 5.0+
Recent features
New features
● Dual-stack support
○ Alpha in 1.16
● Topology aware routing
○ Use topology keys to route service traffic to matching endpoints
○ Alpha in 1.17
Conclusion
Conclusion
● kube-proxy works at significant scale
● A lot of ongoing efforts to improve kube-proxy
● iptables and IPVS are not perfect matches for the service abstraction
○ Not designed for client-side load-balancing
● Very promising eBPF alternative: Cilium
○ No need for iptables / IPVS
○ Works without conntrack
○ See Daniel Borkmann’s talk
Thank you