
Implement Advanced Scheduling Techniques in Kubernetes

Is advanced scheduling in Kubernetes achievable? Yes. But how do you properly accommodate every real-life scenario that a Kubernetes user might encounter? And how do you leverage advanced scheduling techniques to shape and describe each scenario in easy-to-use rules and configurations?

Oleg Chunikhin addressed those questions and demonstrated techniques for implementing advanced scheduling: for example, using spot instances and other cost-effective resources on AWS while still delivering a minimum set of functionality that covers the majority of needs, without configuration complexity. You'll also get a rundown of the pitfalls and things to keep in mind along the way.



  1. 1. Implement Advanced Scheduling Techniques in Kubernetes Oleg Chunikhin | CTO, Kublr | February 2018
  2. 2. Introduction • Oleg Chunikhin • CTO @ Kublr • Chief Software Architect @ EastBanc Technologies • Kublr • Enterprise Kubernetes cluster manager • Application delivery platform
  3. 3. What to Look For • Kubernetes overview • Scheduling algorithm • Scheduling controls • Advanced scheduling techniques • Examples, use cases, and recommendations
  4. 4. Kubernetes | Technology Stack Kubernetes • Orchestration • Network • Configuration • Service discovery • Ingress • Persistence • … Docker • Distribution • Configuration • Isolation
  5. 5. Docker | Architecture Docker image repository Instance Images App data Docker CLI Overlay network Docker daemon Application containers
  6. 6. Kubernetes | Architecture Master Node K8s master components: etcd, scheduler, api, controller K8s metadata Docker kubelet App data K8s node components: overlay network, discovery, connectivity Infrastructure and application containers Infrastructure and application containers Overlay network
  7. 7. Kubernetes | Nodes and Pods Node2 Pod A-2 10.0.1.5 Cnt1 Cnt2 Node 1 Pod A-1 10.0.0.3 Cnt1 Cnt2 Pod B-1 10.0.0.8 Cnt3
  8. 8. Node 1 Kubernetes | Container Orchestration Docker Kubelet K8S Master API K8S Scheduler(s) Pod A Pod B K8S Controller(s) User Node 1 Pod A Pod B Node 2 Pod C
  9. 9. Node 1 Kubernetes | Container Orchestration Docker Kubelet K8S Master API K8S Scheduler(s) K8S Controller(s) User It all starts empty
  10. 10. Node 1 Kubernetes | Container Orchestration Docker Kubelet K8S Master API K8S Scheduler(s) K8S Controller(s) User Kubelet registers node object in master
  11. 11. Node 1 Kubernetes | Container Orchestration Docker Kubelet K8S Master API K8S Scheduler(s) K8S Controller(s) User Node 1 Node 2
  12. 12. Node 1 Kubernetes | Container Orchestration Docker Kubelet K8S Master API K8S Scheduler(s) K8S Controller(s) User Node 1 Node 2 User creates (unscheduled) Pod object(s) in Master
  13. 13. Node 1 Kubernetes | Container Orchestration Docker Kubelet K8S Master API K8S Scheduler(s) K8S Controller(s) User Node 1 Node 2 Scheduler notices unscheduled Pods ...
  14. 14. Node 1 Kubernetes | Container Orchestration Docker Kubelet K8S Master API K8S Scheduler(s) K8S Controller(s) User Node 1 Node 2 …identifies the best node to run them on… Pod A Pod B Pod C
  15. 15. Node 1 Kubernetes | Container Orchestration Docker Kubelet K8S Master API K8S Scheduler(s) K8S Controller(s) User Node 1 Node 2 …and marks the pods as scheduled on corresponding nodes. Pod A Pod B Pod C
  16. 16. Node 1 Kubernetes | Container Orchestration Docker Kubelet K8S Master API K8S Scheduler(s) K8S Controller(s) User Node 1 Node 2 Kubelet notices pods scheduled to its nodes… Pod A Pod B Pod C
  17. 17. Node 1 Kubernetes | Container Orchestration Docker Kubelet K8S Master API K8S Scheduler(s) K8S Controller(s) User Node 1 Node 2 …and starts pods’ containers. Pod A Pod B Pod C Pod A Pod B
  18. 18. Node 1 Kubernetes | Container Orchestration Docker Kubelet K8S Master API K8S Scheduler(s) K8S Controller(s) User Node 1 Node 2 Scheduler finds the best node to run pods. HOW? Pod A Pod B Pod C Pod A Pod B
  19. 19. Kubernetes | Scheduling Algorithm For each pod that needs scheduling: 1. Filter nodes 2. Calculate nodes priorities 3. Schedule pod if possible
  20. 20. Kubernetes | Scheduling Algorithm Volume filters • Do the pod’s requested volumes’ zones fit the node’s zone? • Can the node attach to the volumes? • Are there mounted volume conflicts? • Are there additional volume topology constraints? Volume filters Resource filters Topology filters Prioritization
  21. 21. Kubernetes | Scheduling Algorithm Resource filters • Do the pod’s requested resources (CPU, RAM, GPU, etc.) fit the node’s available resources? • Can the pod’s requested ports be opened on the node? • Is the node free of memory and disk pressure? Volume filters Resource filters Topology filters Prioritization
  22. 22. Kubernetes | Scheduling Algorithm Topology filters • Is the pod requested to run on this node? • Are there inter-pod affinity constraints? • Does the node match the pod’s node selector? • Can the pod tolerate the node’s taints? Volume filters Resource filters Topology filters Prioritization
  23. 23. Kubernetes | Scheduling Algorithm Prioritize with weights for • Pod replicas distribution • Least (or most) node utilization • Balanced resource usage • Inter-pod affinity priority • Node affinity priority • Taint toleration priority Volume filters Resource filters Topology filters Prioritization
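The filter-then-prioritize loop described on the preceding slides can be sketched in plain Python. This is an illustrative sketch, not the real kube-scheduler: the node/pod shapes and the example predicate and priority functions are simplified stand-ins for the filters and weighted priorities listed above.

```python
# Sketch of the two-phase scheduling algorithm: filter nodes with
# predicates, then score the survivors with weighted priority functions
# and pick the highest-scoring node. Data shapes are illustrative.

def fits_resources(pod, node):
    """Predicate: does the node have enough free CPU/memory for the pod's requests?"""
    return (node["free_cpu"] >= pod["cpu"] and
            node["free_mem"] >= pod["mem"])

def least_requested(pod, node):
    """Priority: favor nodes with more CPU left after placing the pod."""
    return node["free_cpu"] - pod["cpu"]

def schedule(pod, nodes, predicates, priorities):
    """Return the best node name for the pod, or None if no node fits."""
    # Phase 1: filter out nodes that fail any predicate
    feasible = [n for n in nodes if all(p(pod, n) for p in predicates)]
    if not feasible:
        return None  # pod stays Pending
    # Phase 2: weighted prioritization over the feasible nodes
    def score(node):
        return sum(weight * fn(pod, node) for fn, weight in priorities)
    return max(feasible, key=score)["name"]

nodes = [
    {"name": "node1", "free_cpu": 2.0, "free_mem": 4096},
    {"name": "node2", "free_cpu": 0.5, "free_mem": 8192},
]
pod = {"name": "pod-a", "cpu": 1.0, "mem": 1024}
# node2 fails the resource predicate, so node1 is chosen
best = schedule(pod, nodes, [fits_resources], [(least_requested, 1)])
```

If no node survives the filter phase, the pod simply remains unscheduled until conditions change, which matches the behavior shown in the orchestration walkthrough earlier.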
  24. 24. Scheduling | Controlling Pods’ Destination • Specify resource requirements • Be aware of volumes • Use node constraints • Use affinity and anti-affinity • Scheduler configuration • Custom / multiple schedulers
  25. 25. Scheduling Controlled | Resources • CPU, RAM, other (GPU) • Requests and limits • Reserved resources kind: Node status: allocatable: cpu: "4" memory: 8070796Ki pods: "110" capacity: cpu: "4" memory: 8Gi pods: "110" kind: Pod spec: containers: - name: main resources: requests: cpu: 100m memory: 1Gi
  26. 26. Scheduling Controlled | Volumes • Request volumes in the right zones • Make sure the node can attach enough volumes • Avoid volume location conflicts • Use volume topology constraints (alpha in 1.7) Node 1 Pod A Node 2 Volume 2 Pod B Unschedulable Zone A Pod C Requested Volume Zone B
  27. 27. Scheduling Controlled | Volumes • Request volumes in the right zones • Make sure the node can attach enough volumes • Avoid volume location conflicts • Use volume topology constraints (alpha in 1.7) Node 1 Pod A Volume 2Pod B Pod C Requested Volume Volume 1
  28. 28. Scheduling Controlled | Volumes • Request volumes in the right zones • Make sure node can attach enough volumes • Avoid volume location conflicts • Use volume topology constraints (alpha in 1.7) Node 1 Volume 1Pod A Node 2 Volume 2Pod B Pod C
  29. 29. Scheduling Controlled | Volumes • Request volumes in the right zones • Make sure node can attach enough volumes • Avoid volume location conflicts • Use volume topology constraints (alpha in 1.7) annotations: "volume.alpha.kubernetes.io/node-affinity": '{ "requiredDuringSchedulingIgnoredDuringExecution": { "nodeSelectorTerms": [{ "matchExpressions": [{ "key": "kubernetes.io/hostname", "operator": "In", "values": ["docker03"] }] }] }}'
  30. 30. Scheduling Controlled | Constraints • Host constraints • Labels and node selectors • Taints and tolerations Node 1Pod A kind: Pod spec: nodeName: node1 kind: Node metadata: name: node1
  31. 31. Scheduling Controlled | Node Constraints • Host constraints • Labels and node selectors • Taints and tolerations Node 1 Pod A Node 2 Node 3 label: tier: backend kind: Node metadata: labels: tier: backend kind: Pod spec: nodeSelector: tier: backend
  32. 32. Scheduling Controlled | Node Constraints • Host constraints • Labels and node selectors • Taints and tolerations kind: Pod spec: tolerations: - key: error value: disk operator: Equal effect: NoExecute tolerationSeconds: 60 kind: Node spec: taints: - effect: NoSchedule key: error value: disk timeAdded: null Pod B Node 1 tainted Pod A tolerate
  33. 33. Scheduling Controlled | Taints Taints communicate node conditions • Key – condition category • Value – specific condition • Operator – value wildcard • Equal • Exists • Effect • NoSchedule – filter at scheduling time • PreferNoSchedule – prioritize at scheduling time • NoExecute – filter at scheduling time, evict if executing • TolerationSeconds – time to tolerate “NoExecute” taint kind: Pod spec: tolerations: - key: <taint key> value: <taint value> operator: <match operator> effect: <taint effect> tolerationSeconds: 60
  34. 34. Scheduling Controlled | Affinity • Node affinity • Inter-pod affinity • Inter-pod anti-affinity kind: Pod spec: affinity: nodeAffinity: { ... } podAffinity: { ... } podAntiAffinity: { ... }
  35. 35. Scheduling Controlled | Node Affinity Scope • Preferred during scheduling, ignored during execution • Required during scheduling, ignored during execution kind: Pod spec: affinity: nodeAffinity: preferredDuringSchedulingIgnoredDuringExecution: - weight: 10 preference: { <node selector term> } - ... requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - { <node selector term> } - ...
  36. 36. Interlude | Node Selector vs Node Selector Term ... nodeSelector: <label 1 key>: <label 1 value> ... ... <node selector term>: matchExpressions: - key: <label key> operator: In | NotIn | Exists | DoesNotExist | Gt | Lt values: - <label value 1> ... ...
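To make the interlude concrete, here is a hedged side-by-side sketch using a hypothetical `disktype: ssd` node label: `nodeSelector` supports only exact label equality, while a node selector term inside node affinity supports the richer operators listed above.

```yaml
# Simple equality-only form: nodeSelector
kind: Pod
spec:
  nodeSelector:
    disktype: ssd            # node must carry the label disktype=ssd
---
# Equivalent node selector term inside node affinity; this form also
# supports NotIn / Exists / DoesNotExist / Gt / Lt operators
kind: Pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd"]
```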
  37. 37. Scheduling Controlled | Inter-pod Affinity Scope • Preferred during scheduling, ignored during execution • Required during scheduling, ignored during execution kind: Pod spec: affinity: podAffinity: preferredDuringSchedulingIgnoredDuringExecution: - weight: 10 podAffinityTerm: { <pod affinity term> } - ... requiredDuringSchedulingIgnoredDuringExecution: - { <pod affinity term> } - ...
  38. 38. Scheduling Controlled | Inter-pod Anti-affinity Scope • Preferred during scheduling, ignored during execution • Required during scheduling, ignored during execution kind: Pod spec: affinity: podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: - weight: 10 podAffinityTerm: { <pod affinity term> } - ... requiredDuringSchedulingIgnoredDuringExecution: - { <pod affinity term> } - ...
  39. 39. Scheduling Controlled | Pod Affinity Terms • topologyKey – nodes’ label key defining co-location • labelSelector and namespaces – select group of pods <pod affinity term>: topologyKey: <topology label key> namespaces: [ <namespace>, ... ] labelSelector: matchLabels: <label key>: <label value> ... matchExpressions: - key: <label key> operator: In | NotIn | Exists | DoesNotExist values: [ <value 1>, ... ] ...
  40. 40. Scheduling Controlled | Affinity Example affinity: topologyKey: tier labelSelector: matchLabels: group: a Node 1 tier: a Pod B group: a Node 3 tier: b tier: a Node 4 tier: b tier: b Pod B group: a Node 1 tier: a
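Expanded into a full manifest, the abbreviated affinity example on this slide might look as follows; the pod name and container image are placeholders, not part of the original slide.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-b              # illustrative name
  labels:
    group: a
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: tier        # co-locate within nodes sharing a "tier" label value
        labelSelector:
          matchLabels:
            group: a             # ...relative to pods labeled group=a
  containers:
  - name: main
    image: nginx               # placeholder image
```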
  41. 41. Scheduling Controlled | Scheduler Configuration • Algorithm provider • Policy configuration file / ConfigMap • Extender
  42. 42. Default Scheduler | Algorithm Provider kube-scheduler --scheduler-name=default-scheduler --algorithm-provider=DefaultProvider --algorithm-provider=ClusterAutoscalerProvider
  43. 43. Default Scheduler | Custom Policy Config kube-scheduler --config=<file> --policy-config-file=<file> --use-legacy-policy-config=<true|false> --policy-configmap=<config map name> --policy-configmap-namespace=<config map ns>
  44. 44. Default Scheduler | Custom Policy Config { "kind" : "Policy", "apiVersion" : "v1", "predicates" : [ {"name" : "PodFitsHostPorts"}, ... {"name" : "HostName"} ], "priorities" : [ {"name" : "LeastRequestedPriority", "weight" : 1}, ... {"name" : "EqualPriority", "weight" : 1} ], "hardPodAffinitySymmetricWeight" : 10, "alwaysCheckAllPredicates" : false }
  45. 45. Default Scheduler | Scheduler Extender { "kind" : "Policy", "apiVersion" : "v1", "predicates" : [...], "priorities" : [...], "extenders" : [{ "urlPrefix": "http://127.0.0.1:12346/scheduler", "filterVerb": "filter", "bindVerb": "bind", "prioritizeVerb": "prioritize", "weight": 5, "enableHttps": false, "nodeCacheCapable": false }], "hardPodAffinitySymmetricWeight" : 10, "alwaysCheckAllPredicates" : false }
  46. 46. Default Scheduler | Scheduler Extender func filter(pod, nodes) api.NodeList func prioritize(pod, nodes) HostPriorityList func bind(pod, node)
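The extender verbs above sit behind the HTTP endpoints configured in the policy (`urlPrefix` plus `filterVerb`/`prioritizeVerb`). A hedged sketch of the logic such handlers might compute, with payload shapes simplified from the real `ExtenderArgs`/`HostPriorityList` and a toy policy invented purely for illustration:

```python
# Illustrative extender logic only; a real extender wraps these in an
# HTTP server and exchanges JSON with kube-scheduler.

def filter_handler(pod, node_names):
    """Filter verb: keep only nodes this extender approves of.
    Toy assumption: pods named "gpu-*" must land on nodes named "gpu-*"."""
    if pod["name"].startswith("gpu-"):
        return [n for n in node_names if "gpu" in n]
    return node_names

def prioritize_handler(pod, node_names):
    """Prioritize verb: return a HostPriorityList-style score (0-10) per node.
    Toy policy: prefer nodes earlier in sorted name order."""
    return [{"Host": n, "Score": max(0, 10 - i)}
            for i, n in enumerate(sorted(node_names))]

nodes = ["gpu-node1", "cpu-node2"]
kept = filter_handler({"name": "gpu-train"}, nodes)     # only the gpu node survives
scores = prioritize_handler({"name": "web"}, nodes)
```

The scheduler combines the extender's scores with its built-in priorities using the `weight` given in the policy file.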
  47. 47. Scheduling Controlled | Multiple Schedulers kind: Pod metadata: name: pod2 spec: schedulerName: my-scheduler kind: Pod metadata: name: pod1 spec: ...
  48. 48. Scheduling Controlled | Custom Scheduler Naive implementation • In an infinite loop: • Get list of Nodes: /api/v1/nodes • Get list of Pods: /api/v1/pods • Select Pods with status.phase == Pending and spec.schedulerName == our-name • For each pod: • Calculate target Node • Create a new Binding object: POST /api/v1/bindings apiVersion: v1 kind: Binding metadata: namespace: default name: pod1 target: apiVersion: v1 kind: Node name: node1
  49. 49. Scheduling Controlled | Custom Scheduler Better implementation • Watch Pods: /api/v1/pods • On each Pod event: • Process only Pods with status.phase == Pending and spec.schedulerName == our-name • Get list of Nodes: /api/v1/nodes • Calculate target Node • Create a new Binding object: POST /api/v1/bindings apiVersion: v1 kind: Binding metadata: namespace: default name: pod1 target: apiVersion: v1 kind: Node name: node1
  50. 50. Scheduling Controlled | Custom Scheduler Even better implementation • Watch Nodes: /api/v1/nodes • On each Node event: • Update Node cache • Watch Pods: /api/v1/pods • On each Pod event: • Process only Pods with status.phase == Pending and spec.schedulerName == our-name • Calculate target Node • Create a new Binding object: POST /api/v1/bindings apiVersion: v1 kind: Binding metadata: namespace: default name: pod1 target: apiVersion: v1 kind: Node name: node1
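The final step shared by all three implementations above is binding: once a target node is chosen, the custom scheduler POSTs a Binding object to the API server. A minimal sketch of the payload construction, with a placeholder node-choice function standing in for whatever "Calculate target Node" logic a real scheduler would use:

```python
# Sketch only: the API call itself (POST /api/v1/bindings, as on the
# slides) is omitted; only payload construction and a toy node choice
# are shown.

def choose_node(pod, nodes):
    """Placeholder node choice: first node with enough free CPU."""
    for node in nodes:
        if node["free_cpu"] >= pod["cpu"]:
            return node["name"]
    return None  # leave the pod Pending

def make_binding(pod_name, node_name, namespace="default"):
    """Build the Binding object shown on the slides."""
    return {
        "apiVersion": "v1",
        "kind": "Binding",
        "metadata": {"namespace": namespace, "name": pod_name},
        "target": {"apiVersion": "v1", "kind": "Node", "name": node_name},
    }

pod = {"name": "pod1", "cpu": 0.5}
nodes = [{"name": "node1", "free_cpu": 1.0}]
binding = make_binding(pod["name"], choose_node(pod, nodes))
```

Watching pods and nodes rather than polling them, as the "better" variants do, keeps the scheduler responsive without hammering the API server on every loop iteration.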
  51. 51. Custom Scheduler | Standard Filters • Minimal set of filters • kube-scheduler • Extend • Re-implement GitHub kubernetes/kubernetes plugin/pkg/scheduler/scheduler.go plugin/pkg/scheduler/algorithm/predicates/predicates.go
  52. 52. Use Case | Distributed Pods apiVersion: v1 kind: Pod metadata: name: db-replica-3 labels: component: db spec: affinity: podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: - topologyKey: kubernetes.io/hostname labelSelector: matchExpressions: - key: component operator: In values: [ "db" ] Node 2 db-replica-2 Node 1 Node 3 db-replica-1 db-replica-3
  53. 53. Use Case | Co-located Pods apiVersion: v1 kind: Pod metadata: name: app-replica-1 labels: component: web spec: affinity: podAffinity: requiredDuringSchedulingIgnoredDuringExecution: - topologyKey: kubernetes.io/hostname labelSelector: matchExpressions: - key: component operator: In values: [ "db" ] Node 2 db-replica-2 Node 1 Node 3 db-replica-1 app-replica-1
  54. 54. Use Case | Reliable Service on Spot Nodes • “fixed” node group Expensive, more reliable, fixed number Tagged with label nodeGroup: fixed • “spot” node group Inexpensive, unreliable, auto-scaled Tagged with label nodeGroup: spot • Scheduling rules: • At least two pods on “fixed” nodes • All other pods favor “spot” nodes • Custom scheduler
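The "favor spot nodes" half of this use case can be approximated with weighted node affinity, sketched below against the `nodeGroup` labels from the slide; the pod name and image are placeholders. Note that the "at least two pods on fixed nodes" rule cannot be expressed with affinity alone, which is why the slide calls for a custom scheduler.

```yaml
# Hedged sketch: prefer (but do not require) nodes labeled nodeGroup=spot.
apiVersion: v1
kind: Pod
metadata:
  name: web-replica          # illustrative name
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: nodeGroup
            operator: In
            values: ["spot"]
  containers:
  - name: main
    image: nginx             # placeholder image
```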
  55. 55. Scheduling | Dos and Don’ts DO • Use resource-based scheduling instead of node-based • Specify resource requests • Keep requests == limits • Especially for non-elastic resources • Memory is non-elastic! • Safeguard against missing resource specs • Namespace default limits • Admission controllers • Plan architecture of localized volumes (EBS, local) • Use inter-pod affinity/anti-affinity if possible DON’T • ... assign pods to nodes directly • ... run pods with no resource requests • ... rely on node-based rather than resource-based scheduling • ... use node affinity or direct node assignment when it can be avoided
  56. 56. Scheduling | Key Takeaways • Scheduling filters and priorities • Resource requests and availability • Inter-pod affinity/anti-affinity • Volumes localization (AZ) • Node labels and selectors • Node affinity/anti-affinity • Node taints and tolerations • Scheduler(s) tweaking and customization
  57. 57. Oleg Chunikhin Chief Technology Officer oleg@kublr.com kublr.com Thank you!
