Orchestration
for the rest of us
1 / 52
Disclaimer
I gave this talk in 2015. Since then, the landscape of container
orchestration has changed quite a bit.
While the ideas, concepts, and challenges that I mention
remain valid today, take the examples with a grain of salt.
(In fact, you should always take everything I say with a grain
of salt, lest I become lazy and complacent.)
Thank you!
2 / 52
Who am I?
French software engineer living in California
I have built and scaled the dotCloud PaaS
I know a few things about running containers
(in production)
3 / 52
Outline
What's orchestration?
(And when do we need it?)
What's scheduling?
(And why is it hard?)
Taxonomy of schedulers
(Depending on how they handle concurrency)
Mesos in action
Swarm in action
4 / 52
What's
orchestration?
5 / 52
6 / 52
Wikipedia to the rescue!
Orchestration describes the automated arrangement,
coordination, and management of complex computer systems,
middleware, and services.
[...] orchestration is often discussed in the context of service-
oriented architecture, virtualization, provisioning, Converged
Infrastructure and dynamic datacenter topics.
Uhhh, OK, what exactly does that mean?
9 / 52
Example 1: dynamic cloud instances
Q: do we always use 100% of our servers?
A: obviously not!
12 / 52
Example 1: dynamic cloud instances
Every night, scale down
(by shutting down extraneous replicated instances)
Every morning, scale up
(by deploying new copies)
"Pay for what you use"
(i.e. save big $$$ here)
13 / 52
Example 1: dynamic cloud instances
How do we implement this?
Crontab
Autoscaling (save even bigger $$$)
That's relatively easy.
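Both approaches fit in a few lines. Here is a minimal sketch, assuming hypothetical business hours (8:00 to 20:00) for the crontab-style schedule, and a proportional rule for autoscaling: resize the pool so that average utilization lands near a target (60% here, an arbitrary choice). The Kubernetes Horizontal Pod Autoscaler uses a rule of this shape; the function names and defaults below are made up for illustration.

```python
# Hypothetical helpers: a sketch of two scaling policies, not a real cloud API.


def scheduled_replicas(hour: int, day_replicas: int, night_replicas: int,
                       day_start: int = 8, day_end: int = 20) -> int:
    """Crontab-style scaling: one fixed replica count by day, another by night.

    `day_start` and `day_end` are assumed business hours on a 24h clock.
    """
    if not 0 <= hour < 24:
        raise ValueError("hour must be in [0, 24)")
    return day_replicas if day_start <= hour < day_end else night_replicas


def autoscaled_replicas(current_replicas: int, utilization_pct: int,
                        target_pct: int = 60, min_replicas: int = 1,
                        max_replicas: int = 100) -> int:
    """Utilization-driven scaling.

    desired = ceil(current_replicas * utilization_pct / target_pct),
    clamped to [min_replicas, max_replicas]. Integer math avoids float rounding.
    """
    desired = -(-(current_replicas * utilization_pct) // target_pct)  # ceiling division
    return max(min_replicas, min(max_replicas, desired))
```

For example, `autoscaled_replicas(10, 90)` asks for 15 replicas (10 replicas at 90% busy, aiming for 60%), while `autoscaled_replicas(10, 15)` scales in to 3.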
Now, how do things look for our IaaS provider?
14 / 52
Example 2: dynamic datacenter
Q: what's the #1 cost in a datacenter?
A: electricity!
Q: what uses electricity?
A: servers, obviously
A: ... and associated cooling
Q: do we always use 100% of our servers?
A: obviously not!
20 / 52
Example 2: dynamic datacenter
If only we could turn off unused servers during the night...
Problem: we can only turn off a server if it's totally empty!
(i.e. all VMs on it are stopped/moved)
Solution: migrate VMs and shut down empty servers
(e.g. consolidate two hypervisors at 40% load into 80% + 0%,
and shut down the one at 0%)
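If we squint and treat each host's load as a single percentage (a big simplification: real hosts have separate RAM, CPU, and disk limits, and every move is a live migration with its own cost), consolidation becomes a bin-packing problem. Here is a minimal sketch using first-fit decreasing, a classic heuristic and not necessarily what any hypervisor manager actually does; `consolidate` is a hypothetical name.

```python
# Hypothetical helper: a sketch of VM consolidation, not a real hypervisor API.


def consolidate(host_loads: dict[str, list[int]], capacity: int = 100) -> dict[str, list[int]]:
    """Repack VMs onto as few hosts as possible with first-fit decreasing.

    `host_loads` maps each host to the loads of its VMs, in percent of one host.
    Returns the new placement; hosts left with an empty list can be shut down.
    """
    # Biggest VMs first: they are the hardest to fit, so place them while hosts are empty.
    vms = sorted((load for loads in host_loads.values() for load in loads), reverse=True)
    hosts = sorted(host_loads)  # deterministic order; earlier hosts fill up first
    placement: dict[str, list[int]] = {host: [] for host in hosts}
    for load in vms:
        for host in hosts:
            if sum(placement[host]) + load <= capacity:
                placement[host].append(load)
                break
        else:
            raise ValueError(f"a VM with load {load}% does not fit on any host")
    return placement
```

With `consolidate({"hv1": [25, 15], "hv2": [30, 10]})`, both hosts start at 40%; the result puts all four VMs on `hv1` (80%) and leaves `hv2` empty, ready to be shut down. This naive version ignores which VMs actually have to move; a real rebalancer would also try to minimize migrations.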
21 / 52
Example 2: dynamic datacenter
How do we implement this?
Shut down empty hosts
(but make sure that there is spare capacity!)
Restart hosts when capacity is low
Ability to "live migrate" VMs
(Xen already did this 10+ years ago)
Rebalance VMs on a regular basis
- what if a VM is stopped while we move it?
- should we allow provisioning on hosts involved in a migration?
Scheduling becomes more complex.
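The first two bullets boil down to a simple rule: keep enough hosts running to cover current usage plus some headroom, shut down extra hosts only if they are already empty, and start hosts again when headroom runs out. Here is a sketch, assuming usage is measured in a single unit (say, GB of RAM) and a hypothetical headroom of one spare host. It deliberately leaves out the migration questions above, which is where the real complexity lives.

```python
# Hypothetical helper: a sketch of a power-management rule, not a real API.


def power_plan(active_hosts: int, empty_hosts: int, used: int,
               host_capacity: int, spare_hosts: int = 1) -> int:
    """Return how many hosts to start (positive) or shut down (negative).

    `used` and `host_capacity` share one unit (e.g. GB of RAM).
    Only hosts that are already empty are candidates for shutdown.
    """
    # ceil(used / host_capacity) hosts cover current usage; add the headroom on top.
    needed = -(-used // host_capacity) + spare_hosts
    if active_hosts < needed:
        return needed - active_hosts  # capacity is low: restart hosts
    return -min(active_hosts - needed, empty_hosts)
```

For example, with ten 64 GB hosts, 300 GB in use, and 4 of those hosts empty, `power_plan(10, 4, 300, 64)` returns -4: six hosts (five to hold 300 GB, plus one spare) stay on.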
22 / 52
What is
scheduling?
23 / 52
Wikipedia to the rescue! (Again!)
In computing, scheduling is the method by which threads,
processes or data flows are given access to system resources.
The scheduler is concerned mainly with:
throughput (total amount of work done per time unit);
turnaround time (between submission and completion);
response time (between submission and start);
waiting time (between job readiness and execution);
fairness (appropriate times according to priorities).
In practice, these goals often conflict.
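To make these definitions concrete, here is a small sketch that computes them for a set of finished jobs. The `Job` record and its four timestamps are assumptions for illustration; fairness is left out because it depends on a priority policy rather than on timestamps alone.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical job record: the four timestamps are an assumption for illustration.


@dataclass
class Job:
    submitted: float  # when the request reached the scheduler
    ready: float      # when the job became runnable (e.g. inputs available)
    started: float    # when it began executing
    finished: float   # when it completed


def scheduling_metrics(jobs: list[Job]) -> dict[str, float]:
    """Compute the slide's metrics over a non-empty list of completed jobs."""
    span = max(j.finished for j in jobs) - min(j.submitted for j in jobs)
    return {
        "throughput": len(jobs) / span,                              # jobs per time unit
        "turnaround": mean(j.finished - j.submitted for j in jobs),  # submission -> completion
        "response": mean(j.started - j.submitted for j in jobs),     # submission -> start
        "waiting": mean(j.started - j.ready for j in jobs),          # readiness -> execution
    }
```

With `Job(0, 0, 0, 4)` and `Job(0, 1, 4, 6)`, the second job's response time (4) is pure queueing behind the first job.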
"Scheduling" = decide which resources to use.
24 / 52
Exercise 1
You have:
5 hypervisors (physical machines)
Each server has:
16 GB RAM, 8 cores, 1 TB disk
Each week, your team asks:
one VM with X RAM, Y CPU, Z disk
Scheduling = deciding which hypervisor to use for each VM.
Difficulty: easy!
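With five identical hypervisors and one request a week, a first-fit rule is plenty. Here is a minimal sketch, assuming RAM and disk are counted in GB and the slide's 16 GB / 8 cores / 1 TB machines; `Resources` and `first_fit` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical types and helper: a sketch of first-fit placement, not a real API.


@dataclass(frozen=True)
class Resources:
    ram_gb: int
    cores: int
    disk_gb: int

    def fits_in(self, free: "Resources") -> bool:
        """True if this request fits within `free` on every dimension."""
        return (self.ram_gb <= free.ram_gb and self.cores <= free.cores
                and self.disk_gb <= free.disk_gb)

    def __sub__(self, other: "Resources") -> "Resources":
        return Resources(self.ram_gb - other.ram_gb, self.cores - other.cores,
                         self.disk_gb - other.disk_gb)


def first_fit(free: list[Resources], request: Resources) -> Optional[int]:
    """Place one VM on the first hypervisor with room; return its index, or None.

    `free` holds each hypervisor's remaining resources and is updated in place.
    """
    for i, host in enumerate(free):
        if request.fits_in(host):
            free[i] = host - request  # reserve the resources on that hypervisor
            return i
    return None
```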
25 / 52
Exercise 2
You have:
1000+ hypervisors (and counting!)
Each server has different resources:
8-500 GB of RAM, 4-64 cores, 1-100 TB disk
Multiple times a day, a different team asks for:
up to 50 VMs with different characteristics
Scheduling = deciding which hypervisor to use for each VM.
Difficulty: ???
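With heterogeneous machines and batches of VMs, both the placement order and the choice of host start to matter. One common family of heuristics (an illustration, not what any specific product does): place the "biggest" VMs first, measuring size by each VM's largest demand relative to the largest machine, and put each one on the host it fills most tightly (best fit), so small VMs don't fragment the big machines. Resources are (RAM in GB, cores, disk in TB) tuples; all names are hypothetical.

```python
from typing import Optional

# Hypothetical helpers: a sketch of batch placement, not a real scheduler API.
Vec = tuple[int, int, int]  # (ram_gb, cores, disk_tb)


def fits(free: Vec, need: Vec) -> bool:
    return all(n <= f for n, f in zip(need, free))


def place_batch(capacity: list[Vec], free: list[Vec],
                batch: list[Vec]) -> dict[int, Optional[int]]:
    """Place a batch of VMs on heterogeneous hypervisors.

    Returns {vm index: hypervisor index, or None if nothing fits}.
    `free` is updated in place as VMs are placed.
    """
    # Normalize demands against the largest machine in each dimension.
    biggest = tuple(max(c[k] for c in capacity) for k in range(3))

    def dominant_share(need: Vec) -> float:
        return max(n / b for n, b in zip(need, biggest))

    # Hardest-to-place VMs first.
    order = sorted(range(len(batch)), key=lambda i: dominant_share(batch[i]), reverse=True)
    placement: dict[int, Optional[int]] = {}
    for i in order:
        need = batch[i]
        candidates = [h for h in range(len(free)) if fits(free[h], need)]
        if not candidates:
            placement[i] = None
            continue
        # Best fit: the host whose tightest dimension keeps the smallest fraction free.
        best = min(candidates, key=lambda h: min(
            (f - n) / c for f, n, c in zip(free[h], need, capacity[h])))
        free[best] = tuple(f - n for f, n in zip(free[best], need))
        placement[i] = best
    return placement
```

With one small (8 GB, 4 cores, 1 TB) and one large (500 GB, 64 cores, 100 TB) hypervisor, a small VM lands on the small machine and keeps the large one's room for large requests. A production scheduler would also have to care about how long each decision takes.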
26 / 52
Exercise 3
You have machines (physical and/or virtual)
You have containers
You are trying to put the containers on the machines
Sounds familiar?
28 / 52
Scheduling with one resource
Can we do better?
29 / 52
Scheduling with one resource
Yup!
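The images from these slides aren't reproduced here. Assuming they contrast a naive placement with a smarter one, the classic one-resource illustration is first fit versus first-fit decreasing: placing items in arrival order can strand space that sorting them largest-first avoids. Here is a minimal sketch with made-up item sizes and bins of capacity 10.

```python
# A sketch of two classic bin-packing heuristics; sizes are made up.


def first_fit(sizes: list[int], capacity: int) -> list[list[int]]:
    """Put each item in the first bin with room, opening a new bin when none fits."""
    bins: list[list[int]] = []
    for size in sizes:
        for b in bins:
            if sum(b) + size <= capacity:
                b.append(size)
                break
        else:
            bins.append([size])
    return bins


def first_fit_decreasing(sizes: list[int], capacity: int) -> list[list[int]]:
    """Same as first_fit, but place the biggest items first."""
    return first_fit(sorted(sizes, reverse=True), capacity)


sizes = [2, 5, 4, 7, 1, 3, 8]
print(len(first_fit(sizes, 10)))             # 4 bins: [[2, 5, 1], [4, 3], [7], [8]]
print(len(first_fit_decreasing(sizes, 10)))  # 3 bins: [[8, 2], [7, 3], [5, 4, 1]]
```

Both are heuristics: exact bin packing is NP-hard, so schedulers settle for "good enough, fast".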
30 / 52
Scheduling with two resources
31 / 52
Scheduling with three resources
32 / 52
You need to be good at this
33 / 52
But also, you must be quick!
34 / 52
And be web scale!
35 / 52
And think outside (?) of the box!
36 / 52
Good luck!
37 / 52
TL;DR
Scheduling with multiple resources (dimensions) is hard
Don't expect to solve the problem with a Tiny Shell Script
There are literally tons of research papers written on this
Speaking of which...
39 / 52
Taxonomy of
schedulers
(According to the famous
"Omega paper")
40 / 52
Monolithic schedulers
Concurrency model: none
All scheduling requests go through a central place
The scheduler examines requests one at a time (usually)
No conflict is possible
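A monolithic scheduler is essentially one loop draining one queue. This toy version (hypothetical names, memory as the only resource) places requests strictly in arrival order. Because only this loop ever modifies the cluster state, two decisions can never conflict. It also shows a simple form of head-of-line blocking: a request that can't be placed stalls everything behind it.

```python
from collections import deque

# A toy monolithic scheduler; the function name and data shapes are hypothetical.


def monolithic_schedule(free: dict[str, int],
                        requests: list[tuple[str, int]]) -> tuple[dict[str, str], list[str]]:
    """Process requests one at a time, in arrival order, from a single queue.

    `free` maps host -> free memory (GB) and is updated in place;
    `requests` is a list of (job name, memory in GB).
    Returns (job -> host for placed jobs, names of jobs still waiting).
    """
    queue = deque(requests)
    placed: dict[str, str] = {}
    while queue:
        job, mem = queue[0]
        host = next((h for h, f in free.items() if f >= mem), None)
        if host is None:
            break  # strict FIFO: the request at the head blocks everyone behind it
        free[host] -= mem  # only this loop writes the state, so no conflicts
        placed[job] = host
        queue.popleft()
    return placed, [job for job, _ in queue]
```

With hosts `{"a": 16, "b": 8}` and requests `web` (8 GB), `db` (32 GB), `cache` (4 GB), only `web` is placed: `cache` would fit, but it waits behind `db`.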
41 / 52
Monolithic schedulers ranking
Pros:
simple to understand
no concurrency issues
Cons:
SPOF (need replication + master election)
prone to feature creep
head-of-line blocking (slow jobs blocking everybody)
supposedly not web scale (more on this later)
42 / 52
Monolithic schedulers examples
one-person manual scheduling ("Hello IT?")
Hadoop YARN
most grid schedulers for scientific compute
Google Borg (so they kind of scale anyway...)
http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/43438.pdf
43 / 52
We are not sure where the ultimate scalability limit to Borg’s
centralized architecture will come from; so far, every time we
have approached a limit, we’ve managed to eliminate it. A
single Borgmaster can manage many thousands of machines in
a cell, and several cells have arrival rates above 10 000 tasks
per minute. A busy Borgmaster uses 10–14 CPU cores and up to
50 GiB RAM. We use several techniques to achieve this scale.
44 / 52
Two-level schedulers
Concurrency model: pessimistic
Top level: a master that holds all the resources
Second level: frameworks
To run something, you talk to a framework
The frameworks are given offers by the master
(chunks of resources)
A given resource is offered only once
(hence "pessimistic" concurrency; no conflict can happen)
If a framework needs more resources, it hoards them
(i.e. keeps them, without using them, waiting for more)
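Here is a toy model of that offer cycle. Every class and method name is hypothetical, and the real Mesos API is considerably richer. It only captures the two properties on the slide: a resource is offered to one framework at a time, and a framework that needs more than its current offers keeps holding them.

```python
# A toy model of offer-based (two-level) scheduling. Every name here is
# hypothetical; this is not the Mesos API.


class Master:
    """Top level: holds every agent's free CPUs; each chunk is offered to one framework at a time."""

    def __init__(self, free_cpus: dict[str, int]):
        self.free = dict(free_cpus)
        self.outstanding: set[str] = set()  # agents whose resources are currently offered

    def make_offers(self) -> list[tuple[str, int]]:
        """Offer every agent's free CPUs that aren't already offered to someone."""
        offers = [(a, c) for a, c in self.free.items() if c > 0 and a not in self.outstanding]
        self.outstanding.update(a for a, _ in offers)
        return offers

    def launch(self, agent: str, cpus: int) -> None:
        """Consume part of an offer; whatever is left becomes offerable again."""
        self.free[agent] -= cpus
        self.outstanding.discard(agent)

    def decline(self, agent: str) -> None:
        self.outstanding.discard(agent)


class GangFramework:
    """Second level: a framework whose job needs `need` CPUs all at once."""

    def __init__(self, need: int):
        self.need = need
        self.hoard: list[tuple[str, int]] = []

    def receive(self, master: Master, offers: list[tuple[str, int]]) -> bool:
        """Return True once the job has been launched."""
        self.hoard.extend(offers)
        if sum(c for _, c in self.hoard) < self.need:
            return False  # not enough yet: keep the offers, unused, and wait for more
        remaining = self.need
        for agent, cpus in self.hoard:
            take = min(cpus, remaining)
            if take:
                master.launch(agent, take)
            else:
                master.decline(agent)
            remaining -= take
        self.hoard.clear()
        return True
```

With two 4-CPU agents, a framework needing 12 CPUs accepts both offers and sits on them. The next `make_offers()` call returns an empty list, so a framework needing only 2 CPUs gets nothing while 8 CPUs sit idle. Only when a third agent shows up (simulated here by adding it to `master.free`) does the big job launch.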
45 / 52
Two-level schedulers examples
Mesos
Frameworks correspond to different ways to consume resources:
Marathon (keep something running forever)
Chronos (cron-like periodic execution)
Jenkins (spin up a Jenkins slave on demand)
and many more
46 / 52
Two-level schedulers ranking
Pros:
easy to implement custom behavior
(supposedly) reduced wait times
DEM SCALES! (run multiple copies of a framework)
Cons:
SPOF (need replication + master election)
hoarding is inefficient
well-suited for small, short-lived jobs;
not so much for big, long-lived ones
(increased decision time = bad!)
47 / 52
Shared state schedulers
Concurrency model: optimistic
A master holds the authoritative state of the whole cluster
Multiple schedulers hold a (read-only) copy of that state
(and keep it in sync)
You submit jobs to one of those schedulers
The scheduler does its magic and submits a transaction
The master can accept the transaction fully or partially
(e.g. if another transaction caused overcommit on a
specific resource: memory >100% on a machine)
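Here is a toy model of that optimistic flow. Every name is hypothetical, and memory is the only resource. Each scheduler plans against its own snapshot without taking any lock, and the master checks each claim against the authoritative state at commit time. The `all_or_nothing` flag models the "fully" case. By default the master accepts the claims that still fit and rejects the rest, which the losing scheduler would then retry with a fresh snapshot.

```python
# A toy model of shared-state (optimistic) scheduling; all names are hypothetical.


class CellState:
    """The master's authoritative view: free memory (GB) per machine."""

    def __init__(self, free_gb: dict[str, int]):
        self.free = dict(free_gb)

    def snapshot(self) -> dict[str, int]:
        """A scheduler's private copy; it may be stale by the time it commits."""
        return dict(self.free)

    def commit(self, claims: list[tuple[str, str, int]],
               all_or_nothing: bool = False) -> list[str]:
        """Apply (task, machine, gb) claims; return the names of accepted tasks."""
        trial = dict(self.free)
        accepted = []
        for task, machine, gb in claims:
            if trial[machine] >= gb:  # accepting this claim would not overcommit
                trial[machine] -= gb
                accepted.append(task)
        if all_or_nothing and len(accepted) != len(claims):
            return []  # conflict: reject the whole transaction, state unchanged
        self.free = trial
        return accepted


def plan(snapshot: dict[str, int], tasks: list[tuple[str, int]]) -> list[tuple[str, str, int]]:
    """Scheduler-side logic: place tasks against a snapshot, without locking anything."""
    free = dict(snapshot)
    claims = []
    for task, gb in tasks:
        machine = max(free, key=lambda m: free[m])  # e.g. the machine with most free memory
        if free[machine] >= gb:
            free[machine] -= gb
            claims.append((task, machine, gb))
    return claims
```

If two schedulers snapshot a 16 GB machine, one commits a 10 GB task, and the other then commits 4 GB + 8 GB, the master accepts the 4 GB task and rejects the 8 GB one, because accepting it would push memory past 100%.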
48 / 52
Shared state schedulers examples
Flynn
?
49 / 52
Shared state schedulers ranking
Pros:
easy to implement custom behavior
reduced wait times
super duper awesome scalability
Cons:
SPOF (need replication + master election)
need to handle partial transactions (I think)
haven't seen it in action at scale yet
(but I'd be delighted to be enlightened!)
50 / 52
Demo
51 / 52
Thanks!
Questions?
@jpetazzo
@docker
52 / 52

Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Orchestration for the rest of us

  • 2. Disclaimer. I gave this talk in 2015. Since then, the landscape of container orchestration has changed quite a bit. While the ideas, concepts, and challenges that I mention remain valid today, take the examples with a grain of salt. (In fact, you should always take everything I say with a grain of salt, lest I become lazy and complacent.) Thank you!
  • 3. Who am I? A French software engineer living in California. I have built and scaled the dotCloud PaaS. I know a few things about running containers (in production).
  • 4. Outline. What's orchestration? (And when do we need it?) What's scheduling? (And why is it hard?) Taxonomy of schedulers (depending on how they handle concurrency). Mesos in action. Swarm in action.
  • 7. Wikipedia to the rescue! "Orchestration describes the automated arrangement, coordination, and management of complex computer systems, middleware, and services."
  • 8. Wikipedia to the rescue! "Orchestration describes the automated arrangement, coordination, and management of complex computer systems, middleware, and services. [...] orchestration is often discussed in the context of service-oriented architecture, virtualization, provisioning, Converged Infrastructure and dynamic datacenter topics."
  • 9. Wikipedia to the rescue! "Orchestration describes the automated arrangement, coordination, and management of complex computer systems, middleware, and services. [...] orchestration is often discussed in the context of service-oriented architecture, virtualization, provisioning, Converged Infrastructure and dynamic datacenter topics." Uhhh, OK, what exactly does that mean?
  • 10. Example 1: dynamic cloud instances
  • 11. Example 1: dynamic cloud instances. Q: Do we always use 100% of our servers?
  • 12. Example 1: dynamic cloud instances. Q: Do we always use 100% of our servers? A: Obviously not!
  • 13. Example 1: dynamic cloud instances. Every night, scale down (by shutting down extraneous replicated instances). Every morning, scale up (by deploying new copies). "Pay for what you use" (i.e., save big $$$ here).
  • 14. Example 1: dynamic cloud instances. How do we implement this? Crontab. Autoscaling (save even bigger $$$). That's relatively easy. Now, how are things for our IaaS provider?
  • 15. Example 2: dynamic datacenter. Q: What's the #1 cost in a datacenter?
  • 16. Example 2: dynamic datacenter. Q: What's the #1 cost in a datacenter? A: Electricity!
  • 17. Example 2: dynamic datacenter. Q: What's the #1 cost in a datacenter? A: Electricity! Q: What uses electricity?
  • 18. Example 2: dynamic datacenter. Q: What's the #1 cost in a datacenter? A: Electricity! Q: What uses electricity? A: Servers, obviously. A: ... and associated cooling.
  • 19. Example 2: dynamic datacenter. Q: What's the #1 cost in a datacenter? A: Electricity! Q: What uses electricity? A: Servers, obviously. A: ... and associated cooling. Q: Do we always use 100% of our servers?
  • 20. Example 2: dynamic datacenter. Q: What's the #1 cost in a datacenter? A: Electricity! Q: What uses electricity? A: Servers, obviously. A: ... and associated cooling. Q: Do we always use 100% of our servers? A: Obviously not!
  • 21. Example 2: dynamic datacenter. If only we could turn off unused servers during the night... Problem: we can only turn off a server if it's totally empty (i.e., all VMs on it are stopped or moved)! Solution: migrate VMs and shut down empty servers (e.g., combine two hypervisors with 40% load into 80%+0%, and shut down the one at 0%).
  • 22. Example 2: dynamic datacenter. How do we implement this? Shut down empty hosts (but make sure that there is spare capacity!). Restart hosts when capacity is low. Be able to "live migrate" VMs (Xen already did this 10+ years ago). Rebalance VMs on a regular basis: what if a VM is stopped while we move it? Should we allow provisioning on hosts involved in a migration? Scheduling becomes more complex.
  • 24. Wikipedia to the rescue! (Again!) "In computing, scheduling is the method by which threads, processes or data flows are given access to system resources." The scheduler is concerned mainly with: throughput (total amount of work done per time unit); turnaround time (between submission and completion); response time (between submission and start); waiting time (between job readiness and execution); fairness (appropriate times according to priorities). In practice, these goals often conflict. "Scheduling" = deciding which resources to use.
  • 25. Exercise 1. You have: 5 hypervisors (physical machines). Each server has: 16 GB RAM, 8 cores, 1 TB disk. Each week, your team asks for: one VM with X RAM, Y CPU, Z disk. Scheduling = deciding which hypervisor to use for each VM. Difficulty: easy!
  • 26. Exercise 2. You have: 1000+ hypervisors (and counting!). Each server has different resources: 8-500 GB of RAM, 4-64 cores, 1-100 TB disk. Multiple times a day, a different team asks for: up to 50 VMs with different characteristics. Scheduling = deciding which hypervisor to use for each VM. Difficulty: ???
  • 27. Exercise 2. You have: 1000+ hypervisors (and counting!). Each server has different resources: 8-500 GB of RAM, 4-64 cores, 1-100 TB disk. Multiple times a day, a different team asks for: up to 50 VMs with different characteristics. Scheduling = deciding which hypervisor to use for each VM.
  • 28. Exercise 3. You have machines (physical and/or virtual). You have containers. You are trying to put the containers on the machines. Sounds familiar?
  • 29. Scheduling with one resource. Can we do better?
  • 30. Scheduling with one resource. Yup!
  • 31. Scheduling with two resources
  • 32. Scheduling with three resources
  • 33. You need to be good at this
  • 34. But also, you must be quick!
  • 35. And be web scale!
  • 36. And think outside (?) of the box!
  • 38. TL;DR. Scheduling with multiple resources (dimensions) is hard. Don't expect to solve the problem with a Tiny Shell Script. There are literally tons of research papers written on this.
  • 39. TL;DR. Scheduling with multiple resources (dimensions) is hard. Don't expect to solve the problem with a Tiny Shell Script. There are literally tons of research papers written on this. Speaking of which...
  • 40. Taxonomy of schedulers (according to the famous "Omega paper")
  • 41. Monolithic schedulers. Concurrency model: none. All scheduling requests go through a central place. The scheduler examines requests one at a time (usually). No conflict is possible.
  • 42. Monolithic schedulers ranking. Pros: simple to understand; no concurrency issues. Cons: SPOF (need replication + master election); prone to feature creep; head-of-line blocking (slow jobs blocking everybody); supposedly not web scale (more on this later).
  • 43. Monolithic schedulers examples. One-person manual scheduling ("Hello IT?"); Hadoop YARN; most grid schedulers for scientific compute; Google Borg (so they kind of scale anyway...): http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/43438.pdf
  • 44. "We are not sure where the ultimate scalability limit to Borg's centralized architecture will come from; so far, every time we have approached a limit, we've managed to eliminate it. A single Borgmaster can manage many thousands of machines in a cell, and several cells have arrival rates above 10 000 tasks per minute. A busy Borgmaster uses 10–14 CPU cores and up to 50 GiB RAM. We use several techniques to achieve this scale."
  • 45. Two-level schedulers. Concurrency model: pessimistic. Top level: a master that holds all the resources. Second level: frameworks. To run something, you talk to a framework. The master gives the frameworks offers (chunks of resources). A given resource is offered to only one framework at a time (hence "pessimistic" concurrency: no conflict can happen). If a framework needs more resources, it hoards them (i.e., keeps them, without using them, while waiting for more).
  • 46. Two-level schedulers examples. Mesos. Frameworks correspond to different ways to consume resources: Marathon (keep something running forever); Chronos (cron-like periodic execution); Jenkins (spin up Jenkins slaves on demand); and many more.
  • 47. Two-level schedulers ranking. Pros: easy to implement custom behavior; (supposedly) reduced wait times; DEM SCALES! (run multiple copies of a framework). Cons: SPOF (need replication + master election); hoarding is inefficient; well-suited for small, short-lived jobs, not so much for big, long-lived ones (increased decision time = bad!).
  • 48. Shared state schedulers. Concurrency model: optimistic. A master holds the authoritative state of the whole cluster. Multiple schedulers hold a (read-only) copy of that state (and keep it in sync). You submit jobs to one of those schedulers. The scheduler does its magic and submits a transaction. The master can accept the transaction fully or partially (e.g., if another transaction caused overcommit on a specific resource, such as memory >100% on a machine).
  • 49. Shared state schedulers examples. Flynn?
  • 50. Shared state schedulers ranking. Pros: easy to implement custom behavior; reduced wait times; super duper awesome scalability. Cons: SPOF (need replication + master election); need to handle partial transactions (I think); haven't seen it in action at scale yet (but I'd be delighted to be enlightened!).
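Slides 13 and 14 implement Example 1 with a crontab and with autoscaling. This Python sketch shows both decision rules under some assumptions. The day/night replica counts, the 08:00 to 20:00 window, the 70% default target utilization and the function names are all hypothetical. Actually starting or stopping instances is left to whatever cloud API you use.

```python
import math

# Hypothetical sizes: full fleet during the day, a skeleton crew at night.
DAY_REPLICAS = 10
NIGHT_REPLICAS = 2


def scheduled_replicas(hour: int) -> int:
    """Crontab-style rule: scale up at 08:00, scale down at 20:00."""
    return DAY_REPLICAS if 8 <= hour < 20 else NIGHT_REPLICAS


def autoscaled_replicas(current_load: float, capacity_per_replica: float,
                        target_utilization: float = 0.7,
                        min_replicas: int = 1, max_replicas: int = 50) -> int:
    """Load-based rule: run just enough replicas to stay near the target utilization."""
    needed = math.ceil(current_load / (capacity_per_replica * target_utilization))
    # Clamp so we never scale to zero or beyond an (assumed) budget limit.
    return max(min_replicas, min(max_replicas, needed))
```

The crontab rule only knows the clock. The autoscaling rule follows the actual load, which is where the "even bigger $$$" savings come from.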
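Slides 21 and 22 describe consolidating VMs so that empty hypervisors can be shut down. This greedy sketch tracks a single resource: each VM's load as a fraction of one host. It assumes VMs can be live-migrated freely. The host names, the `consolidate` function and the 90% ceiling that preserves spare capacity are hypothetical. It does not handle the harder questions from slide 22, such as VMs stopping mid-migration.

```python
from typing import Dict, List, Tuple


def consolidate(hosts: Dict[str, List[float]], max_load: float = 0.9
                ) -> Tuple[Dict[str, List[float]], List[str]]:
    """Greedily empty lightly loaded hosts by migrating their VMs elsewhere.

    `hosts` maps a host name to the loads of its VMs. Returns the new
    placement and the list of hosts that are now empty (safe to shut down).
    `max_load` < 1.0 keeps some spare capacity on every remaining host.
    """
    placement = {h: list(vms) for h, vms in hosts.items()}
    to_shutdown: List[str] = []
    # Try to empty the least-loaded hosts first (order fixed up front).
    for donor in sorted(placement, key=lambda h: sum(placement[h])):
        if not placement[donor]:
            to_shutdown.append(donor)  # already empty
            continue
        targets = [h for h in placement if h != donor and h not in to_shutdown]
        # Plan on copies, so a failed attempt leaves the placement untouched.
        trial = {h: list(placement[h]) for h in targets}
        ok = True
        for vm in sorted(placement[donor], reverse=True):
            # Prefer the busiest target that still has room, to pack tightly.
            for h in sorted(targets, key=lambda t: sum(trial[t]), reverse=True):
                if sum(trial[h]) + vm <= max_load:
                    trial[h].append(vm)
                    break
            else:
                ok = False  # this VM fits nowhere: give up on this donor
                break
        if ok:
            placement.update(trial)
            placement[donor] = []
            to_shutdown.append(donor)
    return placement, to_shutdown
```

With the slide's example, two hypervisors at 40% each, the sketch moves one VM so that one host ends up at 80% and the other at 0%, ready to be shut down.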
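For Exercise 1 (slide 25), a plain first-fit placement that checks all three dimensions is enough. This sketch assumes 1 TB = 1000 GB and tracks only free capacity. `Host` and `place_first_fit` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Host:
    """A hypervisor and its remaining (free) capacity."""
    name: str
    free_ram_gb: int
    free_cores: int
    free_disk_gb: int
    vms: List[str] = field(default_factory=list)

    def fits(self, ram_gb: int, cores: int, disk_gb: int) -> bool:
        # A VM fits only if ALL three dimensions fit at the same time.
        return (ram_gb <= self.free_ram_gb
                and cores <= self.free_cores
                and disk_gb <= self.free_disk_gb)


def place_first_fit(hosts: List[Host], vm: str,
                    ram_gb: int, cores: int, disk_gb: int) -> Optional[Host]:
    """Put the VM on the first host that can hold it; return None if none can."""
    for host in hosts:
        if host.fits(ram_gb, cores, disk_gb):
            host.free_ram_gb -= ram_gb
            host.free_cores -= cores
            host.free_disk_gb -= disk_gb
            host.vms.append(vm)
            return host
    return None


# Exercise 1: five identical hypervisors with 16 GB RAM, 8 cores, 1 TB disk.
hosts = [Host(f"hv{i}", 16, 8, 1000) for i in range(1, 6)]
```

With one VM a week, this is fine. Exercise 2 has 1000+ heterogeneous hosts and batches of up to 50 VMs (slides 26 and 27). The same loop still runs there, but the order in which VMs and hosts are considered starts to matter a lot.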
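Slides 29 and 30 ask whether a one-resource placement can be improved. Their figures are not in this transcript. One classic improvement over placing items in arrival order is first-fit decreasing: sort the items largest first, then pack. The sketch below compares the two on hypothetical container sizes (in GB) and hypothetical 10 GB machines.

```python
from typing import List


def first_fit_bins(sizes: List[int], capacity: int) -> List[List[int]]:
    """Place each item, in the given order, into the first bin with room."""
    bins: List[List[int]] = []
    for size in sizes:
        for b in bins:
            if sum(b) + size <= capacity:
                b.append(size)
                break
        else:
            bins.append([size])  # no existing bin fits: open a new one
    return bins


def first_fit_decreasing(sizes: List[int], capacity: int) -> List[List[int]]:
    """Same as first fit, but handle the biggest items first."""
    return first_fit_bins(sorted(sizes, reverse=True), capacity)


containers = [3, 3, 3, 7, 7, 7]  # hypothetical memory sizes, in arrival order
```

In arrival order, the three small containers fill the first machine, and each big one then needs its own machine: 4 machines. Sorted first, each machine gets one 7 and one 3: 3 machines. With two or three resources (slides 31 and 32), "biggest" is no longer well defined, which is part of why the problem gets hard.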
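The head-of-line blocking in the monolithic scheduler's cons (slide 42) follows directly from the one-request-at-a-time model of slide 41. This toy simulation assumes that all requests arrive together. It also assumes that each request takes a fixed, hypothetical decision time in milliseconds.

```python
from collections import deque
from typing import Deque, List, Tuple


def run_monolithic(requests: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Handle scheduling requests one at a time, in arrival order.

    Each request is (name, decision_time_ms); all are assumed to arrive at t=0.
    Returns (name, time in ms at which its placement decision is made).
    """
    queue: Deque[Tuple[str, int]] = deque(requests)
    clock = 0
    decided: List[Tuple[str, int]] = []
    while queue:
        name, cost_ms = queue.popleft()  # a single request is examined at a time
        clock += cost_ms  # nothing runs concurrently, so nothing can conflict
        decided.append((name, clock))
    return decided
```

If a hypothetical 30-second batch job is first in line, two 100 ms web requests wait more than 30 seconds for their decisions. Put them first, and they are done in 200 ms.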
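This is a toy model of the two-level offer cycle from slides 45 to 47. It makes several assumptions:
- There is a single resource (CPUs).
- A framework accepts or declines each offer as a whole.
- All class, framework and agent names are hypothetical.

This is not the Mesos API. It only illustrates why each resource is offered to one framework at a time, and how hoarding can starve other frameworks.

```python
from typing import Dict, List


class Framework:
    """Hypothetical framework that needs `cpus_needed` CPUs before it can launch."""

    def __init__(self, name: str, cpus_needed: int) -> None:
        self.name = name
        self.cpus_needed = cpus_needed
        self.held: Dict[str, int] = {}  # agent -> CPUs accepted but not yet used
        self.launched = False

    def on_offer(self, agent: str, cpus: int) -> bool:
        """Return True to accept the whole offer, False to decline it."""
        if self.launched:
            return False
        self.held[agent] = cpus  # hoard: keep it even if it is not enough yet
        if sum(self.held.values()) >= self.cpus_needed:
            self.launched = True
        return True


class OfferingMaster:
    """Top level: holds every agent's free CPUs and hands them out as offers."""

    def __init__(self, agents: Dict[str, int]) -> None:
        self.free = dict(agents)  # agent -> unallocated CPUs

    def offer_round(self, frameworks: List[Framework]) -> None:
        for agent in list(self.free):
            if self.free[agent] == 0:
                continue
            # Each offer goes to ONE framework at a time, so two frameworks can
            # never claim the same CPUs (pessimistic concurrency, no conflicts).
            for fw in frameworks:
                if fw.on_offer(agent, self.free[agent]):
                    self.free[agent] = 0
                    break


master = OfferingMaster({"agent-1": 4, "agent-2": 4})
big = Framework("big", cpus_needed=16)
small = Framework("small", cpus_needed=2)
master.offer_round([big, small])
```

After one round, `big` hoards all 8 CPUs and still cannot launch. Meanwhile `small`, which needs only 2 CPUs, never receives an offer: this is the inefficiency slide 47 lists.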
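Slide 48's optimistic model can be sketched the same way. Schedulers plan against their own snapshot of the cluster and submit their claims as a transaction. The master then accepts each claim only if it still fits. In this hypothetical sketch, conflict detection is just a memory capacity check at commit time; real systems may detect conflicts differently. `SharedStateMaster` and `schedule` are made-up names.

```python
from typing import Dict, List, Tuple

Claim = Tuple[str, str, int]  # (task, machine, memory_gb)


class SharedStateMaster:
    """Holds the authoritative cluster state (memory only, for simplicity)."""

    def __init__(self, capacity_gb: Dict[str, int]) -> None:
        self.capacity = dict(capacity_gb)
        self.used = {m: 0 for m in capacity_gb}

    def snapshot(self) -> Dict[str, int]:
        """Copy of free memory per machine, handed to a scheduler."""
        return {m: self.capacity[m] - self.used[m] for m in self.capacity}

    def commit(self, claims: List[Claim]) -> Tuple[List[str], List[str]]:
        """Apply each claim that still fits; reject those that would overcommit."""
        accepted: List[str] = []
        rejected: List[str] = []
        for task, machine, mem in claims:
            if self.used[machine] + mem <= self.capacity[machine]:
                self.used[machine] += mem
                accepted.append(task)
            else:
                rejected.append(task)  # another transaction got there first
        return accepted, rejected


def schedule(snapshot: Dict[str, int], tasks: List[Tuple[str, int]]) -> List[Claim]:
    """A scheduler's 'magic' (here: first fit) against its possibly stale copy."""
    free = dict(snapshot)
    claims: List[Claim] = []
    for task, mem in tasks:
        for machine, avail in free.items():
            if mem <= avail:
                free[machine] -= mem
                claims.append((task, machine, mem))
                break
    return claims
```

Here is a usage example. Two schedulers snapshot an empty cluster with two 16 GB machines at the same time. Scheduler A claims 12 GB on `m1` and 8 GB on `m2`; scheduler B plans 10 GB and 4 GB, both on `m1`. A commits first, so B's transaction is only partially accepted. Its 10 GB claim is rejected, but its 4 GB claim still fits. Handling that rejected remainder is the "partial transactions" con from slide 50.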