The scheduler of a container orchestration system, such as YARN or Kubernetes (K8s), is a critical component that users rely on to plan resources and manage applications.
Today, YARN has two powerful schedulers, the Fair Scheduler and the Capacity Scheduler, and both serve many strong use cases in the big data ecosystem. YARN can scale up to 50k nodes per cluster, schedule 20k containers per second, and is extremely efficient at managing batch workloads.
The K8s default scheduler is an industry-proven solution for efficiently managing long-running services. However, as more big data apps move to K8s and the cloud, many features they need, such as hierarchical queues for better multi-tenancy, fair resource sharing, and preemption, are either missing or not yet mature enough to support big data apps running on K8s.
At this point, no solution exists that offers a unified resource scheduling experience across platforms. That makes it extremely difficult to manage workloads running in different environments, from on-premises to cloud.
Hence, we are evolving a common scheduler that builds on the existing capabilities of YARN and K8s and improves on them for the cloud, focusing on use cases such as:
Better bin-packing scheduling (and gang scheduling)
Policy management for autoscaling up and shrinking
Running batch workloads and services effectively with clear SLAs
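To make the first use case concrete, the following is a minimal sketch of best-fit bin-packing combined with all-or-nothing gang placement. It is an illustration only, not the scheduler's actual algorithm: the function names `best_fit_node` and `schedule_gang`, the single abstract resource unit (for example vcores), and the node names are all assumptions made for this example.

```python
from typing import Dict, List, Optional


def best_fit_node(free: Dict[str, int], demand: int) -> Optional[str]:
    """Return the node with the least free capacity that still fits the demand."""
    candidates = [(cap, name) for name, cap in free.items() if cap >= demand]
    if not candidates:
        return None
    # Picking the tightest fit keeps larger nodes free for larger requests.
    return min(candidates)[1]


def schedule_gang(free: Dict[str, int], demands: List[int]) -> Optional[Dict[int, str]]:
    """Place every task of a gang, or none of them (hypothetical helper)."""
    trial = dict(free)  # work on a copy so a failed gang leaves capacity untouched
    placement: Dict[int, str] = {}
    # Place the largest tasks first, a common bin-packing heuristic.
    for idx in sorted(range(len(demands)), key=lambda i: -demands[i]):
        node = best_fit_node(trial, demands[idx])
        if node is None:
            return None  # one task does not fit, so the whole gang is rejected
        trial[node] -= demands[idx]
        placement[idx] = node
    free.update(trial)  # commit only after every task has found a node
    return placement
```

With `free = {"n1": 4, "n2": 8}`, calling `schedule_gang(free, [3, 4])` places both tasks and reduces the free capacity, while a following call with `[3, 3]` returns `None` and leaves the remaining capacity unchanged, because only one of the two tasks would fit.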
In summary, as a separate initiative, we are improving core scheduling capabilities to build a cloud-aware scheduler that manages both K8s and YARN clusters, and the use cases above are the core focus of this initiative. More details of our work will be presented in this talk.