1. Resource Management Part II
Ohad Shai, Spring 2015
Challenges in Modern Data Centers Management, Spring 2015
2. Information provided in these slides is
for educational purposes only
3. Check point
• Resource matching challenge
• One job at-a-time
• Single and multiple dimensions
• Considering multiple jobs (look-ahead)
• Max jobs and dynamic programming
• Handling jobs that cannot be scheduled challenge
• First-come-first-served (FCFS)
• Improving utilization while avoiding starvation
• Reservations
• Information based (easy and conservative backfilling)
• Information less (fixed and floating)
4. Reminder: two actions by the scheduler
1. Job scheduling – previous lecture
• Selecting the next job(s) to execute
2. Resource matching (allocation) – this lecture
• Match the job(s) with available resources
• Both are interdependent
• One affects the other
• Both must be done fast
• Hundreds of jobs per-second in large scale installations
5. Resource matching challenge
• Each server comes with resource capacity
• Cores, memory, local disk, etc.
• E.g., 16 cores and 128GB of memory
• Part of the capacity might already be used by running jobs
• Each job has resource requirements
• Memory and core requirements, disk space to use, etc.
• E.g., 2 cores and 4GB of memory
• Additional constraints
• E.g., OS, CPU architecture, accelerator (GPU), etc.
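The matching test described on this slide can be sketched as follows. This is a minimal illustration, not the scheduler's actual data model: the names `Server`, `Job`, `fits`, `attributes`, and `constraints` are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    cores: int                      # total capacity
    memory_gb: int
    used_cores: int = 0             # capacity already taken by running jobs
    used_memory_gb: int = 0
    attributes: dict = field(default_factory=dict)  # e.g. {"os": "linux"}

    @property
    def free_cores(self) -> int:
        return self.cores - self.used_cores

    @property
    def free_memory_gb(self) -> int:
        return self.memory_gb - self.used_memory_gb


@dataclass
class Job:
    cores: int
    memory_gb: int
    constraints: dict = field(default_factory=dict)  # hypothetical key/value constraints


def fits(job: Job, server: Server) -> bool:
    """True if the server has enough free capacity and satisfies every constraint."""
    if job.cores > server.free_cores or job.memory_gb > server.free_memory_gb:
        return False
    return all(server.attributes.get(k) == v for k, v in job.constraints.items())
```

For example, a 2-core job does not fit a 16-core server on which 15 cores are already in use, even if plenty of memory is free.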
6. Resource matching challenge cont.
• Match the stream of incoming jobs with available resources on the servers
• Jobs have already been ordered by the job scheduling step (proportional share…)
• Goal now is to minimize fragmentation (maximize utilization)
• Big numbers
• Thousands of servers
• Tens of thousands of waiting (and running) jobs
• NP-hard problem
• Bin-packing optimization
• Needs to be done extremely fast
• Default to using heuristics
• Often perform close to optimal
7. Two approaches
1. One job at-a-time
• Find “best” matching for first waiting job
• Find “best” matching for second waiting job
• etc.
• Once found – start executing the job immediately
2. Considering multiple jobs (look-ahead)
• Calculate a “match” between several jobs in the queue and available resources
• Start executing all jobs on the selected servers together
8. One job at-a-time: Common heuristics
• Random
• Randomly pick server (with enough available resources) and assign it to the job
• First-fit
• Sort the servers in some fixed order and assign the first one that fits
• Best-fit
• Assign the job to the fitting server with the least free capacity
• Packs the jobs on servers at the cost of unbalanced resource usage
• Useful if anticipating large jobs
• Worst-fit
• Assign the job to the fitting server with the most free capacity
• Maintains balanced resource usage across the servers
• Great for workloads that are mostly homogeneous
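The four heuristics can be sketched as small selection functions. This is an illustrative sketch only: servers are plain dictionaries with hypothetical keys `free_cores` and `free_mem`, and best-fit/worst-fit are applied to a single dimension chosen by the caller.

```python
import random


def candidates(job, servers):
    """Servers that have enough free cores and memory for the job."""
    return [s for s in servers
            if s["free_cores"] >= job["cores"] and s["free_mem"] >= job["mem"]]


def pick_random(job, servers, rng=random):
    fit = candidates(job, servers)
    return rng.choice(fit) if fit else None


def pick_first_fit(job, servers):
    # The list order plays the role of the fixed server ordering.
    fit = candidates(job, servers)
    return fit[0] if fit else None


def pick_best_fit(job, servers, dim="free_mem"):
    # Least free capacity in the chosen dimension: packs jobs tightly.
    fit = candidates(job, servers)
    return min(fit, key=lambda s: s[dim]) if fit else None


def pick_worst_fit(job, servers, dim="free_mem"):
    # Most free capacity in the chosen dimension: spreads jobs out.
    fit = candidates(job, servers)
    return max(fit, key=lambda s: s[dim]) if fit else None
```

Each function returns `None` when no server can host the job; what the scheduler does in that case is a separate question.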
9. Example 1: Worst-fit is better
• 2 machines A and B
• Each with 4 cores and 32GB of memory
• 8 jobs arriving
• 2 x 1 core & 16GB of memory (“blue”)
• 6 x 1 core & 4GB of memory (“green”)
10. Example 2: Best-fit is better
• 2 machines A and B
• Each with 4 cores and 32GB of memory
• 4 jobs arriving
• 3 x 1 core & 8GB of memory (“blue”)
• 1 x 1 core & 32GB of memory (“green”)
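Examples 1 and 2 can be replayed with a tiny simulation. The slides do not state the arrival order or the dimension used, so this sketch assumes that the "blue" jobs arrive first in both examples, that best-fit and worst-fit look at free memory, and that ties go to the first server; the function name `run` is hypothetical.

```python
def run(jobs, heuristic, n_servers=2, cores=4, mem=32):
    """Place jobs one at a time; return how many of them were matched."""
    servers = [{"cores": cores, "mem": mem} for _ in range(n_servers)]
    matched = 0
    for job_cores, job_mem in jobs:
        fit = [s for s in servers
               if s["cores"] >= job_cores and s["mem"] >= job_mem]
        if not fit:
            continue  # this job stays in the queue
        choose = min if heuristic == "best" else max
        target = choose(fit, key=lambda s: s["mem"])  # ties go to the first server
        target["cores"] -= job_cores
        target["mem"] -= job_mem
        matched += 1
    return matched


# (cores, memory in GB) per job, in assumed arrival order
example1 = [(1, 16)] * 2 + [(1, 4)] * 6
example2 = [(1, 8)] * 3 + [(1, 32)]

print(run(example1, "best"), run(example1, "worst"))
print(run(example2, "best"), run(example2, "worst"))
```

Under these assumptions worst-fit matches all 8 jobs of Example 1 while best-fit matches 6 (both 16GB jobs fill machine A's memory and strand two of its cores), and best-fit matches all 4 jobs of Example 2 while worst-fit matches 3 (no machine keeps 32GB free for the large job).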
11. One job approach: # of dimensions
• Single-dimension
• Choose between memory or cores, and optimize for either
• Multiple dimensions
• Optimize for both memory and cores at the same time
• Can single-dimension heuristics be optimal?
• This is what we try to answer next…
12. Real-world example
• Paper by Ohad Shai, Edi Shmueli, and Dror G. Feitelson, on “Heuristics
for Resource Matching in Intel's Compute Farm”
• Presented in Job Scheduling Strategies for Parallel Processing (JSSPP), 2013
• Used traces from 4 large Intel sites (pools)
• Each trace contains more than a month of activity
• Each trace contains 10 – 13 million jobs
15. Resource requirements by jobs
• Most jobs require 1 core
• Most jobs require less than 5 GB memory
• But still, there are bursts of higher demand
• Buckets of 1000 jobs
• Ordered by arrival
17. Which (single-dimension) heuristic is best?
• Approach
• Divide the jobs into buckets of 1000 jobs each
• Two heuristics X two dimensions (4 combinations)
• Heuristics: Best Fit, Worst Fit
• Dimensions: cores, memory
• Run each combination on all jobs in the bucket
• Combination that matched the highest # of jobs wins
• Gets 1 point
19. Dealing with multiple dimensions: Mix-Fit
• As seen before, no single-dimension heuristic is optimal when considering
one job-at-a-time
• Mix-Fit
• Attempt to “Best-Fit” on both dimensions
20. Mix-Fit: Results
• Same bucket experiment
• Yet the experiment shows that Mix-Fit does not win in 100% of the buckets either
22. Considering multiple jobs (look-ahead)
• Look deeper into the queue and try to assemble the optimal schedule
• Matching between multiple jobs and multiple servers
• Two types
1. “Sophisticated”: e.g., dynamic programming
• “Backfilling with look-ahead to optimize the packing of parallel jobs”, by Edi Shmueli and Dror G. Feitelson, 2005
2. Meta-heuristic (heuristic of heuristics)
• Heuristics for Resource Matching in Intel's Compute Farm, by Ohad Shai, Edi Shmueli, and Dror G.
Feitelson, JSSPP, 2013
23. Max-jobs (meta-heuristic)
• Run each of the heuristics (best-fit cores/memory, worst fit cores/memory)
on the list of waiting jobs
• Without actually starting the jobs
• Count the # of jobs matched by each heuristic
• Select the heuristic that matched the highest number of jobs
• Possible target functions
• Max # of matched jobs (==max-jobs)
• Max # of utilized cores
• Max amount of utilized memory
• Etc.
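The Max-jobs idea can be sketched as a dry run of each heuristic on a copy of the server state. This is a simplified illustration, not the implementation from the paper: the dictionary keys, the function names `simulate` and `max_jobs`, and the choice to skip jobs that do not fit are assumptions.

```python
import copy


def simulate(jobs, servers, fit, dim):
    """Dry-run one heuristic; return the planned (job index, server name) pairs."""
    servers = copy.deepcopy(servers)  # nothing is actually started
    plan = []
    for i, job in enumerate(jobs):
        ok = [s for s in servers
              if s["cores"] >= job["cores"] and s["mem"] >= job["mem"]]
        if not ok:
            continue
        pick = min if fit == "best" else max
        target = pick(ok, key=lambda s: s[dim])
        target["cores"] -= job["cores"]
        target["mem"] -= job["mem"]
        plan.append((i, target["name"]))
    return plan


def max_jobs(jobs, servers):
    """Return the (fit, dimension) combination that matches the most jobs, and its plan."""
    combos = [(f, d) for f in ("best", "worst") for d in ("cores", "mem")]
    plans = {c: simulate(jobs, servers, *c) for c in combos}
    winner = max(combos, key=lambda c: len(plans[c]))  # target function: # of matched jobs
    return winner, plans[winner]
```

Swapping the `key` in `max_jobs` for the number of utilized cores or the amount of utilized memory would give the other target functions listed above.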
24. Max-jobs: results
• Up to 22% reduction in wait time for jobs
25. Max-jobs: results
• Up to 22% reduction in number of waiting jobs
26. Resource Matching Challenge: Summary
• Single job at a time
• Single-dimension: Best-fit memory, Best-fit cores, Worst-fit memory, Worst-fit cores
• Multiple-dimensions: Mix-Fit
• Multiple jobs (look-ahead)
• Heuristic: Max-jobs (meta-heuristic)
• Near-optimal: LOS (dynamic programming) (Shmueli, Feitelson, 2005)
28. Handling jobs that cannot be scheduled challenge
• So far we covered the challenge of matching jobs with available
resources on the servers
• We implicitly assumed we can always find resources
• What if there are not enough resources for the jobs?
• E.g., if a job is “large”, and the jobs already running on the servers do
not leave enough space to accommodate the job
29. First-Come-First-Served (FCFS)
• Traverse the queue in a FIFO order
• Recall the jobs have been ordered in the job scheduling step (proportional share…)
• If a job does not “fit” any of the servers – stop
• Do not attempt to schedule further jobs
• Pros
• Simple
• Intuitively fair (jobs do not bypass jobs that arrived earlier)
• Cons
• Poor utilization (up to 30% waste reported when scheduling parallel jobs)
30. Improving utilization: Skipping to the next job(s)
• Idea: skip “problematic” job(s) and continue matching the rest
• A great means of improving utilization, but…
• It introduces a starvation problem
• Jobs may get ‘stuck’ in the scheduler’s queue because they never get the resources they need in order to execute (they are always bypassed by later jobs)
[Figure: a server with 32 GB of memory in four 8 GB slices, three taken by running jobs and one left with an empty core. The wait queue holds four jobs, one requesting 32 GB and the others 8 GB each. An 8 GB job will be scheduled next; the 32 GB job will be scheduled only if 32 GB become available, so it will most likely be starved.]
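The difference between strict FCFS and skipping can be sketched on a single server and a single dimension (memory). The function name `schedule` and the queue order (the 32 GB job first, then three 8 GB jobs) are assumptions made for this illustration.

```python
def schedule(queue, free_mem, skip_blocked):
    """Return the indices of queued jobs that can start now.

    queue: memory requests in GB, in queue order.
    free_mem: memory currently free on the server, in GB.
    skip_blocked: False for strict FCFS, True to skip jobs that do not fit.
    """
    started = []
    for i, request in enumerate(queue):
        if request <= free_mem:
            free_mem -= request
            started.append(i)
        elif not skip_blocked:
            break  # FCFS: the first job that does not fit blocks everything behind it
    return started


queue = [32, 8, 8, 8]   # the large job is at the head of the queue
print(schedule(queue, free_mem=8, skip_blocked=False))  # FCFS
print(schedule(queue, free_mem=8, skip_blocked=True))   # skipping
```

With 8 GB free, FCFS starts nothing and leaves the memory idle, while skipping starts the first 8 GB job. Each time another 8 GB slice frees up, the next small job takes it, so the server never accumulates the 32 GB the head job needs.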
31. Handling jobs that cannot be scheduled challenge
• So far we covered the challenge of matching jobs with available
resources on the servers
• We implicitly assumed we can always find resources
• What if there are not enough resources for the jobs?
1. FCFS (intuitively fair, poor utilization)
2. Skip problematic jobs (unfair, good utilization)
• Can we combine the best of both worlds?
33. What are reservations?
• Technique used to keep fairness while improving utilization
1. The scheduler ‘marks’ certain resources ‘unavailable’, excluding them from being
used by other waiting jobs
2. The scheduler ‘remembers’ the job(s) for which these resources have been
reserved
3. When enough resources accumulate the scheduler launches the job(s) on the
reserved resources
[Figure labeled “Reservation”. Courtesy of Ahuva W. Mu’alem & Dror G. Feitelson, “Utilization, Predictability, Workloads, and User Runtime Estimates in Scheduling the IBM SP2 with Backfilling”, used for educational purposes only]
34. Reservations: Two flavors
1. Job runtimes are known in advance (information-based)
• Usually the runtimes are estimated (not accurate)
• Predicting runtime is complex…
2. Job runtimes are unknown (information-less)
• Very common (practical) scenario, especially when serving dynamic usage models with low visibility to the system
• E.g., jobs whose runtime depends on a random seed or on another variable that is not visible to the system
35. Information-based (runtime known/predicted)
• We’ll describe each job as a rectangle
• Horizontal axis describes the job runtime & vertical axis describes its resource
consumption (cores/memory/ …)
• Let’s look at the jobs’ representation in a server:
[Figure: a job drawn as a rectangle; horizontal axis: run time, vertical axis: resource consumption (processors). Courtesy of Mu’alem & Feitelson, as credited above]
36. Backfilling
• Moving small jobs from the back of the queue to fill “holes” in the schedule
to improve utilization
• Using reservations to ensure that the jobs that have been bypassed
(skipped over) will not starve
• Job runtimes must be known in advance
• Estimated or predicted
[Figure: backfilling example showing the 1st through 6th jobs, a queued job, and a reservation. Courtesy of Mu’alem & Feitelson, as credited above]
37. Backfilling flavors
1. Conservative backfilling
2. EASY backfilling
3. Selective backfilling
38. Before we begin… performance metrics
1. Wait time
• Time that the job waits in the scheduler’s queue until it starts executing (running)
2. Response time
• Total time that the job spent in the system (wait + runtime)
3. Slowdown
• Response time divided by actual runtime
4. Utilization
• The % of used resources in the pool, at a given moment
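The four metrics translate directly into arithmetic on a job's submit, start, and end times. A minimal sketch; the function and parameter names are chosen for this illustration only.

```python
def wait_time(submit, start):
    """Time spent in the scheduler's queue."""
    return start - submit


def response_time(submit, end):
    """Total time in the system: wait time plus runtime."""
    return end - submit


def slowdown(submit, start, end):
    """Response time divided by the actual runtime."""
    return (end - submit) / (end - start)


def utilization(used, total):
    """Percentage of the pool's resources in use at a given moment."""
    return 100.0 * used / total
```

A job submitted at t=0 that starts at t=30 and ends at t=40 waits 30 time units, has a response time of 40, and a slowdown of 4: short jobs are penalized heavily by waits that a long job would barely notice.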
39. Conservative backfilling
• The scheduler provides a reservation for every job at arrival time
• Newly arriving jobs can move ahead if they don’t violate any previous reservation
• Pros
• For every job we know exactly when it will start executing
• Limits the slowdown of jobs that would otherwise have difficulty backfilling e.g.,
high resource consumers (jobs that need many cores)
• Cons
• Reduces backfilling opportunities due to blocking effect of the reservations
• e.g., long-running jobs with low resource requirements
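Conservative backfilling can be sketched for a single resource (cores) as "give every arriving job the earliest slot that does not overlap any existing reservation beyond capacity". This is a simplified sketch under stated assumptions: runtimes are the estimates, reservations are `(start, end, cores)` tuples, running jobs are represented as reservations that have already started, and the schedule is not compressed when jobs finish early. The names `usage_at`, `earliest_start`, and `submit` are hypothetical.

```python
def usage_at(t, reservations):
    """Cores reserved at time t; each reservation is (start, end, cores)."""
    return sum(c for s, e, c in reservations if s <= t < e)


def earliest_start(now, cores, runtime, reservations, total_cores):
    """Earliest time >= now at which the job fits for its whole estimated runtime."""
    if cores > total_cores:
        raise ValueError("job needs more cores than the machine has")
    # Usage only drops when a reservation ends, so those are the only candidates.
    candidates = sorted({now} | {e for _, e, _ in reservations if e > now})
    for t in candidates:
        # Usage only rises when a reservation starts, so check t and those starts.
        points = [t] + [s for s, _, _ in reservations if t < s < t + runtime]
        if all(usage_at(p, reservations) + cores <= total_cores for p in points):
            return t
    raise AssertionError("unreachable: the last candidate always fits")


def submit(now, cores, runtime, reservations, total_cores):
    """Reserve a slot for an arriving job and return its guaranteed start time."""
    start = earliest_start(now, cores, runtime, reservations, total_cores)
    reservations.append((start, start + runtime, cores))
    return start
```

On a 10-core machine, a 4-core job with a runtime of 10 can start immediately next to a running 6-core job because it ends before the next reservation begins, whereas a 4-core job with a runtime of 12 would overlap that reservation and is pushed behind it.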
40. Conservative backfilling: Example
[Figure: conservative backfilling example. Courtesy of Mu’alem & Feitelson, as credited above]
41. EASY (aggressive) backfilling
• Only the first job in the queue gets a reservation
• Jobs may move ahead as long as they do not violate the first job’s reservation
• Pros
• Provides many more opportunities for backfilling compared to conservative backfilling (better utilization)
• Cons
• We can only tell when the first job in the queue will start
• Jobs that inherently have difficulty backfilling may suffer relative to conservative backfilling, since they get a reservation only when they reach the head of the queue
• E.g., high resource consumers (jobs that need many cores)
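One scheduling pass of EASY backfilling can be sketched for a single resource (cores). This follows the commonly described formulation: compute the time at which the head job can start (often called the shadow time) and the cores left over at that time, then let later jobs start only if they finish by the shadow time or fit into the leftover cores. Names and data layout are assumptions, and runtimes are the estimates.

```python
def easy_backfill(now, free, running, queue):
    """Return the indices of queued jobs to start now.

    free: cores free right now.
    running: list of (estimated end time, cores) for running jobs.
    queue: list of (cores, estimated runtime) in queue order.
    """
    started, running, i = [], list(running), 0
    # Start jobs from the head of the queue for as long as they fit.
    while i < len(queue) and queue[i][0] <= free:
        cores, runtime = queue[i]
        free -= cores
        running.append((now + runtime, cores))
        started.append(i)
        i += 1
    if i == len(queue):
        return started
    # Reservation for the blocked head job: when will enough cores be free?
    head_cores, avail = queue[i][0], free
    for end, cores in sorted(running):
        avail += cores
        if avail >= head_cores:
            shadow, extra = end, avail - head_cores
            break
    else:
        raise ValueError("head job needs more cores than the machine has")
    # Backfill: later jobs may start only if they do not delay the head job.
    for j in range(i + 1, len(queue)):
        cores, runtime = queue[j]
        if cores > free:
            continue
        ends_in_time = now + runtime <= shadow
        if ends_in_time or cores <= extra:
            free -= cores
            if not ends_in_time:
                extra -= cores  # this job will still hold cores at the shadow time
            started.append(j)
    return started
```

Only the head job is protected: a job further back can be delayed by backfilled jobs, which is the source of the con listed above.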
42. EASY backfilling: Example
[Figure: EASY backfilling example. Courtesy of Mu’alem & Feitelson, as credited above]
43. Selective backfilling
• Jobs get a reservation only when their expected slowdown exceeds a threshold
• If the threshold is chosen judiciously, few jobs should have a reservation at any time, but the neediest jobs are assured of getting one
• Pros
• Provides many more backfilling opportunities relative to Conservative (good for long narrow jobs)
• Reduces slowdown for short resource-consuming jobs (short wide) relative to EASY
• Cons
• More complicated e.g., how to choose the optimal threshold?
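The reservation test can be sketched in a few lines. The slides do not define "expected slowdown", so this sketch assumes it is the time waited so far plus the estimated runtime, divided by the estimated runtime; the function names and the threshold value in the example are hypothetical.

```python
def expected_slowdown(now, submit, est_runtime):
    """Slowdown the job would have if it started right now (assumed definition)."""
    return ((now - submit) + est_runtime) / est_runtime


def needs_reservation(now, submit, est_runtime, threshold):
    """Selective backfilling: reserve only for jobs that have waited 'too long'."""
    return expected_slowdown(now, submit, est_runtime) > threshold


# Two jobs that have both waited 100 time units, checked against a threshold of 2.5
print(needs_reservation(now=100, submit=0, est_runtime=50, threshold=2.5))
print(needs_reservation(now=100, submit=0, est_runtime=200, threshold=2.5))
```

Under this definition the short job (expected slowdown 3.0) gets a reservation and the long one (1.5) does not, which matches the intent of helping short jobs that suffer most from waiting.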
46. Information-less: fixed reservation
• The scheduler makes a reservation for the job on a specific server
• The job ‘sticks’ there until enough resources are accumulated to satisfy its
requirements – then it starts executing
• Pros
• Simple
• Cons
• Waits can be significant (bad luck scenario)
47. Information-less: floating reservation
• The scheduler makes a reservation for the job on a specific server
• The job ‘sticks’ there for a limited duration, e.g., until timeout expires – it
then may be relocated to a different server
• Pros
• Reduces risk of long waits (compared to fixed reservation)
• Cons
• Theoretically does not preclude starvation
Practices from production environments:
• In large systems the potential for starvation with floating reservations is very low, since at any given moment many jobs finish (or are about to finish)
• In production environments a fixed reservation might cause jobs with “bad luck” to wait a significant amount of time. “Bad luck” might be caused by a server that does not free its resources, e.g., because of runaway jobs
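Fixed and floating reservations differ only in what happens when the reserved server has still not freed enough resources. The following sketch models one scheduling pass for a single waiting job on a single dimension (memory). The function name `tick`, the relocation policy (move to the server with the most free memory), and the data layout are assumptions; a real scheduler would also have to keep other jobs off the reserved server, which is not modeled here.

```python
def tick(now, job_mem, free_mem, reservation, timeout=None):
    """One scheduling pass for a single waiting job whose runtime is unknown.

    free_mem: dict mapping server name -> free memory in GB.
    reservation: None, or (server name, time the reservation was made).
    timeout: None for a fixed reservation, a duration for a floating one.
    Returns ("run", server name) or ("wait", reservation).
    """
    if reservation is not None:
        server, since = reservation
        if free_mem[server] >= job_mem:
            return ("run", server)
        if timeout is None or now - since < timeout:
            return ("wait", reservation)  # keep sticking to the same server
    # No reservation yet, or the floating reservation timed out: (re)locate.
    target = max(free_mem, key=free_mem.get)
    if free_mem[target] >= job_mem:
        return ("run", target)
    return ("wait", (target, now))
```

With `timeout=None` the job stays on its first server no matter how long that takes, which is the "bad luck" scenario; with a timeout it can move to a server that has freed more memory in the meantime.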
48. Reservations: Summary
• Information-based (runtimes known)
• Conservative backfilling
• EASY backfilling
• Selective backfilling
• Information-less (runtimes unknown)
• Fixed: static reservations
• Dynamic: floating reservations
Practices from production environment:
• In the real world, predicting the runtimes of jobs is a difficult problem
50. Next lecture: RM part III
• Managing multiple data centers (meta-scheduling)