Kubernetes from scratch @Veepee
SUMMARY
1. Study: Kubernetes components
2. Control plane deployment
3. Node architecture: network, security, runtime, proxy, ...
4. Tools & exploitation: observability, isolation, discovery
Study
Kubernetes components
Components
● Control plane
○ Storage (etcd)
○ API
○ Scheduler
○ Controller-manager
● Nodes
○ Container runtime
○ Node agent (kubelet)
○ Service proxy
○ Network agent
Components: storage
● Key-value store
● Raft-based distributed storage
● Client-to-server & server-to-server TLS support
Project page: https://etcd.io/
Incubating at the CNCF
Components: API server
● Stores data in etcd
● Stateless REST API
● HTTP/2 + TLS
● gRPC support:
○ WATCH events over HTTP
○ Reactive, event-based triggers on Kubernetes components
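As a quick illustration, any client can subscribe to that WATCH stream; a minimal sketch with kubectl and the raw REST API (API server address and credentials are placeholders):

kubectl get pods --watch
# or directly against the REST API (authentication omitted):
curl -k "https://<apiserver>:6443/api/v1/pods?watch=true"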
Components: Scheduler
● Connected to the API server only
● Watches for pod objects
● Selects the node to run on based on criteria:
○ Hardware (CPU available, CPU architecture, memory available, disk space)
○ (Anti-)affinity patterns
○ Policy constraints (labels)
● 1 active master per quorum (leader election via a token in etcd)
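For illustration, a pod can express those criteria in its spec; a minimal sketch, with hypothetical labels and values not taken from the talk:

apiVersion: v1
kind: Pod
metadata:
  name: demo
  labels:
    app: demo
spec:
  nodeSelector:              # policy constraint (labels)
    disktype: ssd
  affinity:
    podAntiAffinity:         # anti-affinity pattern: one pod per node
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: demo
          topologyKey: kubernetes.io/hostname
  containers:
    - name: app
      image: nginx:1.17
      resources:
        requests:            # hardware criteria (CPU/memory available)
          cpu: 500m
          memory: 256Mi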
Components: Controller manager
● Core controllers:
○ Node: monitors node status responses
○ Replication: ensures the pod count on replication controllers
○ Endpoints: maintains Endpoints objects for Services
○ Namespace: creates default Service Accounts & Tokens
● 1 active master per quorum (leader election via a token in etcd)
Node components
● Container runtime: runs containers (Docker, containerd.io…)
● Node agent: connects to the API server to handle containers & volumes
● Service proxy: load balances service IPs to pod endpoints
● Network agent: connects nodes together (flannel, calico, kube-router…)
Control plane
Deployment
Datacenter deployment
● 3 Kubernetes clusters per datacenter:
○ Benchmark
○ Staging
○ Production
● No cross-DC cluster: no DC split-brain situation to manage
Etcd deployment
● 3 etcd per datacenter
○ TLSv1.2 enabled
○ Authentication through TLS client certificates
○ Hardware: 4 CPU, 32GB RAM
○ OS: Debian 10.1
○ Version 3.4 deployed:
■ reduced latency
■ large write performance improvements
■ reads not affected by commits
■ will be the default version as of K8S 1.17
■ see: https://kubernetes.io/blog/2019/08/30/announcing-etcd-3-4/
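As a sketch, the TLS and client-authentication parts of such an etcd member's command line could look like this (member name and certificate paths are hypothetical):

etcd --name etcd1 \
  --listen-client-urls https://0.0.0.0:2379 \
  --advertise-client-urls https://etcd1.dc1.example:2379 \
  --client-cert-auth --trusted-ca-file /etc/etcd/ca.pem \
  --cert-file /etc/etcd/server.pem --key-file /etc/etcd/server-key.pem \
  --peer-client-cert-auth --peer-trusted-ca-file /etc/etcd/ca.pem \
  --peer-cert-file /etc/etcd/peer.pem --peer-key-file /etc/etcd/peer-key.pem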
API server deployment
● API version: 1.15.x (old clusters) and 1.16.x (new clusters)
● 2 API servers load balanced by haproxy (TCP mode)
○ Horizontally scalable
○ Vertically scalable
○ Current setup: 4 CPU, 32GB RAM
○ OS: Debian 10.1
● The API servers load balance etcd themselves
○ We discovered a bug in k8s < 1.16.3 when using TLS; ensure you have at least this version
○ Issue: https://github.com/kubernetes/kubernetes/issues/83028
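A minimal haproxy sketch of this TCP-mode setup (backend addresses are hypothetical):

frontend k8s-api
    bind *:6443
    mode tcp
    default_backend k8s-api-servers

backend k8s-api-servers
    mode tcp
    balance roundrobin
    server apiserver1 10.1.0.11:6443 check
    server apiserver2 10.1.0.12:6443 check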
API server deployment
● Enabled/enforced features (admission controllers):
○ LimitRanger: resource limitation validator
○ NodeRestriction: limits kubelet permissions on node/pod objects
○ PodSecurityPolicy: security policies to run pods
○ PodNodeSelector: limits node selection for pods
● See the full list of admission controllers here:
○ https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers
● Enabled extra feature: Secret encryption in etcd with AES-256
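A sketch of how this can be wired on kube-apiserver, using the upstream EncryptionConfiguration format (the key material is a placeholder; an aescbc provider with a 32-byte key gives AES-256):

# kube-apiserver flags (excerpt)
--enable-admission-plugins=LimitRanger,NodeRestriction,PodSecurityPolicy,PodNodeSelector
--encryption-provider-config=/etc/kubernetes/encryption.yaml

# /etc/kubernetes/encryption.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded-32-byte-key>
      - identity: {}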
Controller-Manager & scheduler deployment
● 3 nodes per DC
○ Each runs a scheduler
○ Each runs a controller manager
○ Hardware: 2 CPU, 8GB RAM
○ OS: Debian 10.1
Controller-Manager & scheduler deployment
● Enabled features on controller-manager: all defaults plus
○ BootstrapSigner: authenticates kubelets on cluster join
○ TokenCleaner: cleans expired tokens
● Supplementary features on scheduler:
○ NodeRestrictions: restrict pods on some nodes
Control plane global overview
Node architecture
Network, security, runtime, proxy, ...
Node architecture: container runtime
● Valid choice: Docker (https://www.docker.com/)
○ The default one
○ Known by “everyone” in the container world
○ Owned by a company
○ Simple to use
Node architecture: container runtime
● Valid choice: Containerd (https://containerd.io/)
○ Younger than Docker
○ Extracted from Docker
○ A CNCF project
○ Some limitations:
■ No docker API v1!
■ K8S integration poorly documented
Node architecture: container runtime
● Veepee choice: Containerd
○ Supported by CNCF and community
○ Used by Docker as underlying container runtime
○ We use Artifactory; Docker API v2 is fully supported
○ Smaller footprint, less code, lower latency for the kubelet
Node architecture: system configuration
● Pod DNS configuration
○ clusterDomain: root DNS name for the pods/services
○ clusterDNS: DNS servers configured on pods
■ except if hostNetwork: true and pod DNS policy is default
● Protect system from pods: Ensure node system daemons can run
■ 128MiB memory reserved
■ 0.2 CPU reserved
■ Disk soft & hard limits
● Soft: don’t allow new pods to run if limit reached
● Hard: evict pods if limit reached
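A KubeletConfiguration sketch matching those settings; the reserved values come from the slide, while the DNS addresses and eviction thresholds are illustrative:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDomain: cluster.local        # root DNS name for pods/services
clusterDNS:
  - 10.96.0.10                      # DNS server configured on pods
systemReserved:                     # keep node system daemons alive
  cpu: 200m
  memory: 128Mi
evictionSoft:                       # soft threshold, applied after the grace period
  nodefs.available: "15%"
evictionSoftGracePeriod:
  nodefs.available: 2m
evictionHard:                       # hard threshold, immediate eviction
  nodefs.available: "10%"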
Node architecture: service proxy
● Exposes K8S service IP on nodes to access pods
● Multiple ways
○ IPTables
○ IPVS
○ External load balancer (e.g. AWS ELB at layer 4 or layer 7)
● Multiple possibilities
○ Kube-proxy (iptables, ipvs)
○ Kube-router (ipvs)
○ Calico
○ ...
Node architecture: service proxy
● Veepee solution choice: kube-proxy
○ Stay close to the Kubernetes distribution: don't add more complexity
○ No default need for layer 7 load balancing (service type: LoadBalancer); can be added as an extra proxy in the future
○ Next challenge: IPTables vs IPVS
Node architecture: kube-proxy mode
● Kube-proxy: iptables mode
○ Default recommended mode (faster)
○ Works quite well… but:
■ Doesn’t integrate with Debian 10 and later (thanks to Debian’s iptables-to-nftables switch) => restore legacy iptables mode
■ Has locking problems when multiple programs need it
● https://github.com/weaveworks/weave/issues/3351
● https://github.com/kubernetes/kubernetes/issues/82587
● https://github.com/kubernetes/kubernetes/issues/46103
■ We need kube-proxy and Kubernetes Network Policies
■ We have to take care of conntrack :(
Node architecture: kube-proxy mode
● Kube-proxy: ipvs mode
○ Works well technically (no locking issue/hacks!)
○ ipvsadm is a much better friend than iptables -t nat
○ ipvs also chosen by some other tools like kube-router
○ Calico’s performance comparison convinced us (https://www.projectcalico.org/comparing-kube-proxy-modes-iptables-or-ipvs/)
Node architecture: kube-proxy mode
● Veepee final choice: kube-proxy + IPVS
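The corresponding kube-proxy configuration is essentially a one-line switch; a minimal sketch (the scheduler choice is illustrative):

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "rr"   # round-robin; other IPVS schedulers (lc, sh, ...) exist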
Node architecture: network layer
● Interconnects nodes
○ Ensures pod-to-pod and pod-to-service communication
○ Can be fully private (our choice) or shared with the regular network
● Various ways to achieve it
○ Static routing
○ Dynamic routing (generally BGP)
○ VXLan VPN
○ IPIP VPN
● Multiple ways to allocate node CIDRs
○ Statically (enjoy)
○ Dynamically
Node architecture: network layer
Warning: reading this slide can drive your network engineers crazy
● Allocate two CIDRs for your cluster
○ 1 for nodes and pods
○ 1 for service IPs
● Don’t be conservative; give thousands of IPs to K8S, each node requires a /24
○ CIDR /14 for nodes (up to 1024 nodes)
○ CIDR /16 for services (service IP randomness party)
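With dynamic allocation, these ranges map to kube-controller-manager flags; a sketch with hypothetical example CIDRs:

# nodes and pods: a /14 yields up to 1024 per-node /24s
# service IPs get their own /16
kube-controller-manager \
  --allocate-node-cidrs=true \
  --cluster-cidr=10.32.0.0/14 \
  --node-cidr-mask-size=24 \
  --service-cluster-ip-range=10.96.0.0/16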
Node architecture: network layer
● Needs:
○ Each solution must learn the CIDR of the current node through the API
○ The network mesh setup should be automagic
● Select the right solution
○ Flannel (default recommended one): VXLan, host-gw
○ Kube-router: IPIP or BGP
○ Calico: IPIP
○ WeaveNet: VXLan
Node architecture: network layer
First test: flannel in VXLan
● Works quite well
● Very easy setup
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
● Yes, it’s like curl blah | bash
● No, we didn’t install it like this :)
Node architecture: network layer
First test: flannel in VXLan (https://github.com/coreos/flannel)
● Before a big sale, we load tested an app and… saw very bad network performance on nodes
○ iperf showed the outside network was good, around 9.8Gbps out of 10Gbps
○ Node-to-pod perf was at maximum too
○ Node-to-node using the regular net is around 9.7Gbps
○ Node-to-node using VXLan is around 3.2Gbps and kernel load is very high
○ Investigation into the recommended way to run VXLan: offload VXLan to the network cards
○ Not possible in our case since we are using Libvirt/KVM VMs: discard VXLan
Node architecture: network layer
Second test: kube-router in BGP mode (https://www.kube-router.io/)
● Drops the need for offloading to the network card
● Easy setup too
kubectl apply -f https://raw.githubusercontent.com/cloudnativelabs/kube-router/master/daemonset/kube-router-all-service-daemonset.yaml
● Don’t forget to read the yaml and ensure you publish on the right cluster :)
● As suspected, using BGP restores the full bandwidth capacity
● Other interesting features:
○ Service proxy (IPVS)
○ Network Policy support
○ Network LB using BGP
Node architecture: network layer
● Our choice: kube-router
○ The BGP choice is very nice
○ We can extend BGP to the fabric if needed in the future
○ We need network policy isolation for some sensitive apps
○ One binary for both network mesh and policies: less maintenance
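As an illustration of that isolation, a minimal default-deny NetworkPolicy of the kind kube-router can enforce (the namespace name is hypothetical):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: sensitive-app
spec:
  podSelector: {}        # applies to every pod in the namespace
  policyTypes:
    - Ingress            # no ingress rules listed: deny all inbound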
Tools & exploitation
DNS, metrology, logging, ...
Kubernetes is not magic: tooling
With the previous setup we have:
● API
● Container scheduling
● Network communication
We have some limits:
● No access from outside
● No DNS resolution
● No metrology/alerting
● Volatile logging on nodes
Tooling: DNS resolution
Two methods:
● External, using the host resolv.conf: no DNS for inside-cluster communication; we can use DNS for external resources only
● Internal: inside-cluster DNS records, enables service discovery
○ We need it, go ahead
Tooling: DNS resolution
Two main solutions:
● Kube-dns: the legacy one, should not be used for new clusters
○ dnsmasq C layer, single-threaded
○ 3 containers for a single daemon?
● CoreDNS: the modern one
○ Golang multithreaded implementation (goroutines)
○ 1 container only
● Some benchmarks (from the CoreDNS team, so be careful)
○ https://coredns.io/2018/11/27/cluster-dns-coredns-vs-kube-dns/
Tooling: DNS resolution
● CoreDNS is the most reasonable choice.
● Our deployment
○ Deployed as a Kubernetes Deployment
○ Runs on master nodes (3 pods)
○ Configured as the default DNS service on all kubelets
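For reference, a Corefile close to the upstream default for this kind of deployment (assuming the usual cluster.local domain):

.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
}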
Tooling: Access from outside
Ingress: access from outside of the cluster
Various choices on the market:
● Nginx (the default one)
● Traefik
● Envoy
● Kong
● Ambassador
● Haproxy
● And more...
Tooling: Access from outside
We studied five:
● Ambassador: promising but very young (https://www.getambassador.io/)
● Nginx: the OSS model of Nginx is unclear since F5 bought Nginx Inc. (http://nginx.org/)
● Haproxy: mature product, but its ingress controller is very young, as are HTTP/2 and gRPC support (http://www.haproxy.org/)
● Kong: built on top of Nginx; not general-purpose, but can be a very nice API gateway (https://konghq.com/kong/)
● Traefik: good licensing, mature and updated regularly (https://traefik.io/)
Tooling: Access from outside
Because of the risks on some products, we benched Traefik:
● Kubernetes API ready
● HTTP/2 ready
● TLS/1.3 ready (Veepee minimum: TLS/1.2)
● Scalable & reactive configuration deployments
● TLS certificate reconfiguration in less than 10sec
● TCP/UDP raw balancing (traefik v2)
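A sketch of what a Traefik-routed Ingress object looks like on a 1.15/1.16-era cluster (host and service names are hypothetical):

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: shop
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
    - host: shop.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: shop
              servicePort: 80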
Tooling: Access from outside
Traefik bench:
● Very good performance in lab:
○ Tested using the k6 and ab tools
○ The test backend was a raw Golang HTTP service
○ HTTP: up to 10krps with 2 pods on a VM with 1 CPU and 2GB RAM
○ HTTPS: up to 6.3krps with 2 pods on a VM with 1 CPU and 2GB RAM
○ Scaling pods doesn’t increase performance; it’s sufficient anyway
Tooling: Access from outside
Traefik bench:
● Load testing with a real product:
○ More than 1krps
○ A not-so-recent .NET Core app
○ The app isn’t container-aware and suffers from some contention
○ Anyway, the rate is sufficient for the sale: go ahead to prod
○ During a big event sale we sold ~32k concert tickets in 1h40 without problems
Tooling: Access from outside
Traefik bench:
● Before the production sale:
○ We increased nodes from 2 to 3
○ We increased the application from 2 to 10 instances
● Production sale day (starting at 7am):
○ No incident
○ We sold 32k concert tickets in 1h40
Tooling: metrology/alerting
Need:
● collect metrics on pods to do nice graphs
Solution:
● A solution to rule them all
Tooling: metrology/alerting
Implementation:
● Pods expose a /metrics endpoint through their HTTP listener
● Prometheus scrapes it
● Writing Prometheus scraping configuration by hand is painful
● Thankfully, there is https://github.com/coreos/kube-prometheus
Tooling: metrology/alerting
● Kube-prometheus implementation:
○ HA Prometheus instances
○ HA Alertmanager instances
○ Grafana for a local metrics view (not reusable for something else)
○ Gathers node metrics
○ ServiceMonitor Kubernetes API extension object
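A minimal ServiceMonitor sketch showing how scrape targets are declared (labels and port name are hypothetical):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
spec:
  selector:
    matchLabels:          # selects the Service exposing the pods
      app: my-app
  endpoints:
    - port: http          # named port on the Service
      path: /metrics
      interval: 30s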
Tooling: metrology/alerting
Pod discovery
Tooling: metrology/alerting
Veepee ecosystem integration
Tooling: metrology/alerting
Pod resource overview
Tooling: metrology/alerting
Kube-prometheus graphs (+ some custom ones)
Tooling: logging
How to retrieve logs properly?
● Logging is volatile on containers
● On Docker hosts: just mount a volume from the host and write to it
● On K8S: I don’t know where my container runs, I don’t know the host, the host doesn’t want me to write on it, help me doctor!
Tooling: logging
● You can prevent open-heart surgery in production by knowing the rules
Tooling: logging
● Never write logs to disk
○ If you need it, use a sidecar to read them and don’t forget rotation!
● Write to stdout/stderr in a parsable way
○ JSON comes to the rescue: known by every development language, easy to serialize & implement
● Choose software to gather container logs and push them:
○ filebeat
○ fluentd
○ fluentbit
○ logstash
Tooling: logging
● Our choice: fluentd
○ CNCF graduated project (https://www.cncf.io/announcement/2019/04/11/cncf-announces-fluentd-graduation/)
○ Some features we need are in fluentd but not in fluentbit
○ Already used by many SREs at Veepee
● Our deployment model: K8S DaemonSet
○ Rolling upgrade flexibility
○ Ensures logs are gathered on each running node
○ Ensures the configuration is the same everywhere
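A trimmed DaemonSet sketch for such a deployment (image tag and mounts are illustrative; the kubelet/CRI writes pod logs under /var/log/pods):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      tolerations:                  # also run on tainted (e.g. master) nodes
        - operator: Exists
      containers:
        - name: fluentd
          image: fluent/fluentd:v1.7
          volumeMounts:
            - name: varlog
              mountPath: /var/log   # pod logs live under /var/log/pods
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log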
Tooling: logging
Fluentd object deployment
Tooling: logging
Fluentd log ingestion pipeline
Tooling: client/product isolation
Need:
● Ensure a client or product will not steal CPU/memory/disk resources from another
Two axes of work:
● Node-level isolation
● Pod-level isolation
Tooling: client/product isolation
Work axis: node level
● Ensure a client (tribe) or a product owns the underlying node
● Billing per customer
● Resources per customer, then per SRE team
Solution:
● Use an enforced NodeSelector on namespaces
scheduler.alpha.kubernetes.io/node-selector: k8s.veepee.tech/tribe=foundation,k8s.veepee.tech=platform
○ Pods can only be scheduled on a node with at minimum those labels
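That annotation lives on the Namespace object and is enforced by the PodNodeSelector admission controller enabled earlier; a sketch with a hypothetical namespace name:

apiVersion: v1
kind: Namespace
metadata:
  name: foundation-platform
  annotations:
    scheduler.alpha.kubernetes.io/node-selector: k8s.veepee.tech/tribe=foundation,k8s.veepee.tech=platform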
Tooling: client/product isolation
Work axis: pod level
● Ensure pods are not stealing other pods’ resources
● Ensure the scheduler makes the right node choice according to available resources
● Forbid pod allocation if no resources are available (no overcommit)
Solution:
● LimitRanges
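A LimitRange sketch of the kind applied here (all values are illustrative):

apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: foundation-platform
spec:
  limits:
    - type: Container
      defaultRequest:     # applied when a pod declares no request
        cpu: 100m
        memory: 128Mi
      default:            # applied when a pod declares no limit
        cpu: 500m
        memory: 512Mi
      max:                # hard cap per container
        cpu: "2"
        memory: 2Gi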
Tooling: client/product isolation
Applied LimitRanges
Questions?
THANK YOU