5 - RM-Part-II

Resource Management Part II
Ohad Shai, Spring 2015
Challenges in Modern Data Centers Management, Spring 2015

Information provided in these slides is
for educational purposes only

Check point
• Resource matching challenge
• One job at-a-time
• Single and multiple dimensions
• Considering multiple jobs (look-ahead)
• Max jobs and dynamic programming
• Handling jobs that cannot be scheduled challenge
• First-come-first-served (FCFS)
• Improving utilization while avoiding starvation
• Reservations
• Information based (easy and conservative backfilling)
• Information less (fixed and floating)

Reminder: two actions by the scheduler
1. Job scheduling – previous lecture
• Selecting the next job(s) to execute
2. Resource matching (allocation) – this lecture
• Match the job(s) with available resources
• Both are interdependent
• One affects the other
• Both must be done fast
• Hundreds of jobs per-second in large scale installations

Resource matching challenge
• Each server comes with resource capacity
• Cores, memory, local disk, etc.
• E.g., 16 cores and 128GB of memory
• Part of the capacity might already be used by running jobs
• Each job has resource requirements
• Memory and core requirements, disk space to use, etc.
• E.g., 2-cores X 4GB of memory
• Additional constraints
• E.g., OS, CPU architecture, accelerator (GPU), etc.

Resource matching challenge cont.
• Match the stream of incoming jobs with available resources on the servers
• Jobs have already been ordered by the job scheduling step (proportional share…)
• Goal now is to minimize fragmentation (maximize utilization)
• Big numbers
• Thousands of servers
• Tens of thousands of waiting (and running) jobs
• NP-complete problem
• Bin-packing optimization
• Needs to be done extremely fast
• Default to using heuristics
• Often perform close to optimal

Two approaches
1. One job at-a-time
• Find “best” matching for first waiting job
• Find “best” matching for second waiting job
• etc.
• Once found – start executing the job immediately
2. Considering multiple jobs (look-ahead)
• Calculate a “match” between several jobs in the queue and available resources
• Start executing all jobs on the selected servers together

One job at-a-time: Common heuristics
• Random
• Randomly pick server (with enough available resources) and assign it to the job
• First-fit
• Sort the servers in some fixed order and assign the first one that fits
• Best-fit
• Packs the jobs on servers at the cost of unbalanced resource usage
• Useful if anticipating large jobs
• Worst-fit
• Maintains balanced resource usage across the servers
• Great for workloads that are mostly homogeneous
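The heuristics above can be sketched for a single job and a single dimension as follows. This is only an illustration: the `Server` class, the `pick_server` function, the use of free memory as the ranking dimension, and the tie-breaking rule (first server in the list wins) are assumptions made for the example, not part of the slides.

```python
import random
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    free_cores: int
    free_mem_gb: int

def fits(server, cores, mem_gb):
    """True if the server has enough free cores and memory for the job."""
    return server.free_cores >= cores and server.free_mem_gb >= mem_gb

def pick_server(servers, cores, mem_gb, heuristic, rng=random):
    """Return the server chosen by the heuristic, or None if nothing fits.

    Best-fit and worst-fit are ranked on one dimension (free memory);
    ties go to the server that appears first in the list.
    """
    candidates = [s for s in servers if fits(s, cores, mem_gb)]
    if not candidates:
        return None
    if heuristic == "random":
        return rng.choice(candidates)
    if heuristic == "first-fit":
        return candidates[0]  # servers are kept in a fixed order
    if heuristic == "best-fit":
        return min(candidates, key=lambda s: s.free_mem_gb)  # tightest fit
    if heuristic == "worst-fit":
        return max(candidates, key=lambda s: s.free_mem_gb)  # most headroom
    raise ValueError(f"unknown heuristic: {heuristic}")

# Hypothetical pool: three servers with different amounts of free memory.
servers = [Server("A", 4, 16), Server("B", 4, 32), Server("C", 4, 8)]
first = pick_server(servers, 1, 4, "first-fit").name
best = pick_server(servers, 1, 4, "best-fit").name
worst = pick_server(servers, 1, 4, "worst-fit").name
print(first, best, worst)
```

For a 1-core, 4 GB job, first-fit takes the first server in the list, best-fit takes the server with the least free memory that still fits, and worst-fit takes the one with the most.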

Example 1: Worst-fit is better
• 2 machines A and B
• Each with 4 cores and 32GB of memory
• 8 jobs arriving
• 2 x 1 core & 16GB of memory (“blue”)
• 6 x 1 core & 4GB of memory (“green”)

Example 2: Best-fit is better
• 2 machines A and B
• Each with 4 cores and 32GB of memory
• 4 jobs arriving
• 3 x 1 core & 8GB of memory (“blue”)
• 1 x 1 core & 32GB of memory (“green”)
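The two examples can be replayed with a small simulation. The slides do not state the arrival order or the dimension used, so this sketch assumes the “blue” jobs arrive first, that best-fit and worst-fit rank servers by free memory, and that ties go to machine A; the `schedule` function is hypothetical.

```python
def schedule(jobs, heuristic, n_servers=2, cores=4, mem_gb=32):
    """Place jobs one at a time and return how many were placed.

    jobs: list of (cores, mem_gb) tuples in arrival order.
    heuristic: "best-fit" or "worst-fit", ranked on free memory.
    """
    free = [[cores, mem_gb] for _ in range(n_servers)]  # [free cores, free mem]
    placed = 0
    for need_cores, need_mem in jobs:
        fitting = [s for s in free if s[0] >= need_cores and s[1] >= need_mem]
        if not fitting:
            continue  # this job cannot run now; try the next one
        pick = min if heuristic == "best-fit" else max
        target = pick(fitting, key=lambda s: s[1])  # ties: first server
        target[0] -= need_cores
        target[1] -= need_mem
        placed += 1
    return placed

example1 = [(1, 16)] * 2 + [(1, 4)] * 6  # 2 blue jobs, then 6 green jobs
example2 = [(1, 8)] * 3 + [(1, 32)]      # 3 blue jobs, then 1 green job

results = {}
for name, jobs in (("example1", example1), ("example2", example2)):
    for heuristic in ("best-fit", "worst-fit"):
        results[(name, heuristic)] = schedule(jobs, heuristic)
        print(name, heuristic, results[(name, heuristic)])
```

Under these assumptions, Example 1 places all 8 jobs with worst-fit but only 6 with best-fit (both blue jobs land on A and exhaust its memory), while Example 2 places all 4 jobs with best-fit but only 3 with worst-fit (no machine keeps 32 GB free for the green job).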

One job approach: # of dimensions
• Single-dimension
• Choose between memory or cores, and optimize for either
• Multiple dimensions
• Optimize for both memory and cores at the same time
• Can single-dimension heuristics be optimal?
• This is what we try to answer next…

Real-world example
• Paper by Ohad Shai, Edi Shmueli, and Dror G. Feitelson, on “Heuristics
for Resource Matching in Intel's Compute Farm”
• Presented in Job Scheduling Strategies for Parallel Processing (JSSPP), 2013
• Used traces from 4 large Intel sites (pools)
• Each trace contains more than a month of activity
• Each trace contains 10 – 13 million jobs

Resource requirements by jobs
• Most jobs require 1 core

• Most jobs require less than 5 GB memory
• But still, there are bursts of higher demand
• Buckets of 1000 jobs
• Ordered by arrival

Which (single-dimension) heuristic is best?
• Approach
• Divide the jobs into buckets of 1000 jobs each
• Two heuristics X two dimensions (4 combinations)
• Heuristics: Best Fit, Worst Fit
• Dimensions: cores, memory
• Run each combination on all jobs in the bucket
• Combination that matched the highest # of jobs wins
• Gets 1 point

Results
Conclusion: no single-dimension heuristic is optimal in all cases

Dealing with multiple dimensions: Mix-Fit
• As seen before, no single-dimension heuristic is optimal when considering
one job-at-a-time
• Mix-Fit
• Attempt to “Best-Fit” on both dimensions

Mix-Fit: Results
• Same bucket experiment
• Yet the experiment shows that “Mix-Fit” does not win in 100% of the buckets either

Check point
• Resource matching challenge 
• One job at-a-time 
• Single and multiple dimensions 
• Considering multiple jobs (look-ahead)
• Max jobs and dynamic programming
• Reservations

Considering multiple jobs (look-ahead)
• Look deeper into the queue and try to assemble the optimal schedule
• Matching between multiple jobs and multiple servers
• Two types
1. “Sophisticated”: e.g., dynamic programming
• Backfilling with look-ahead to optimize the packing of parallel jobs, by Edi Shmueli and Dror G. Feitelson, 2005
2. Meta-heuristic (heuristic of heuristics)
• Heuristics for Resource Matching in Intel's Compute Farm, by Ohad Shai, Edi Shmueli, and Dror G. Feitelson, JSSPP, 2013

Max-jobs (meta-heuristic)
• Run each of the heuristics (best-fit cores/memory, worst fit cores/memory)
on the list of waiting jobs
• Without actually starting the jobs
• Count the # of jobs matched by each heuristic
• Select the heuristic that matched the highest number of jobs
• Possible target functions
• Max # of matched jobs (==max-jobs)
• Max # of utilized cores
• Max amount of utilized memory
• Etc.
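A minimal sketch of the Max-jobs idea: dry-run each of the four combinations on a copy of the server state, then keep the one that scores highest. The function names, the dictionary layout, and the tie-breaking (first combination in the list wins) are assumptions for illustration, not the implementation from the paper.

```python
import copy

def run_heuristic(jobs, servers, fit, dim):
    """Dry-run one heuristic on a copy of the servers; return matched job indices.

    jobs: list of dicts with "cores" and "mem" requirements.
    servers: list of dicts with free "cores" and "mem".
    fit: "best" or "worst"; dim: "cores" or "mem".
    """
    servers = copy.deepcopy(servers)  # nothing is started for real
    matched = []
    for i, job in enumerate(jobs):
        fitting = [s for s in servers
                   if s["cores"] >= job["cores"] and s["mem"] >= job["mem"]]
        if not fitting:
            continue
        choose = min if fit == "best" else max
        target = choose(fitting, key=lambda s: s[dim])
        target["cores"] -= job["cores"]
        target["mem"] -= job["mem"]
        matched.append(i)
    return matched

def max_jobs(jobs, servers, score=len):
    """Try all four combinations and keep the one with the highest score."""
    combos = [(f, d) for f in ("best", "worst") for d in ("cores", "mem")]
    outcomes = {c: run_heuristic(jobs, servers, *c) for c in combos}
    winner = max(combos, key=lambda c: score(outcomes[c]))
    return winner, outcomes[winner]

# Hypothetical input: two 4-core / 32 GB servers and four waiting jobs.
servers = [{"cores": 4, "mem": 32}, {"cores": 4, "mem": 32}]
jobs = [{"cores": 1, "mem": 8}] * 3 + [{"cores": 1, "mem": 32}]
winner, matched = max_jobs(jobs, servers)
print(winner, matched)
```

The default `score=len` corresponds to the max-jobs target function; another target, such as utilized cores, could be expressed by passing a different `score` callable, e.g. `lambda m: sum(jobs[i]["cores"] for i in m)`.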

Max-jobs: results
• Up to 22% reduction in wait time for jobs

Max-jobs: results
• Up to 22% reduction in number of waiting jobs

Resource Matching Challenge: Summary
Single job at a time
• Single-dimension: best-fit memory, best-fit cores, worst-fit memory, worst-fit cores
• Multiple-dimensions: Mix-fit
Multiple jobs (look-ahead)
• Heuristic: Max-jobs (meta-heuristic)
• Near-optimal: LOS (dynamic programming) (Shmueli, Feitelson, 2005)

Check point
• Considering multiple jobs (look-ahead) 
• Max jobs and dynamic programming 
• Reservations

Handling jobs that cannot be scheduled challenge
• So far we covered the challenge of matching jobs with available
resources on the servers
• We implicitly assumed we can always find resources
• What if there are not enough resources for the jobs?
• E.g., if a job is “large”, and the jobs already running on the servers do
not leave enough space to accommodate the job

First-Come-First-Served (FCFS)
• Traverse the queue in a FIFO order
• Recall the jobs have been ordered in the job scheduling step (proportional share…)
• If a job does not “fit” any of the servers – stop
• Do not attempt to schedule further jobs
• Pros 
• Simple
• Intuitively fair (jobs do not bypass jobs that arrived earlier)
• Cons 
• Poor utilization (up to 30% waste reported when scheduling parallel jobs)

Improving utilization: Skipping to the next job(s)
• Idea: skip “problematic” job(s) and continue matching the rest
• A great means of improving utilization, but…
• Introduces a starvation problem
• Jobs may get ‘stuck’ in the scheduler’s queue since they never get the resources they need in order to execute (they are always bypassed by later jobs)
Figure: a server with 32 GB of memory, of which three 8 GB slots are taken by running jobs and one core is empty; the wait queue (1st to 4th job) holds one job requesting 32 GB and jobs requesting 8 GB. An 8 GB job will be scheduled next; the 32 GB job will be scheduled only if 32 GB become available and will most likely be starved.
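The difference between strict FCFS and skipping can be sketched on a single server and a single dimension (memory). The queue contents mirror the slide's scenario, but the queue order and the `match_queue` function are assumptions made for this illustration.

```python
def match_queue(queue_mem_gb, free_mem_gb, skip_blocked):
    """Return indices of queued jobs started on one server, by memory only.

    skip_blocked=False is FCFS: stop at the first job that does not fit.
    skip_blocked=True skips such jobs and keeps matching the rest.
    """
    started = []
    for i, need in enumerate(queue_mem_gb):
        if need <= free_mem_gb:
            free_mem_gb -= need
            started.append(i)
        elif not skip_blocked:
            break  # FCFS: do not attempt to schedule further jobs
    return started

queue = [32, 8, 8, 8]  # the first job wants the whole 32 GB server
free = 8               # three 8 GB jobs are already running
fcfs = match_queue(queue, free, skip_blocked=False)
skipping = match_queue(queue, free, skip_blocked=True)
print(fcfs, skipping)
```

With FCFS nothing starts and 8 GB stay idle; with skipping the second job starts immediately. If 8 GB jobs keep arriving, the 32 GB job never sees the whole server free, which is the starvation problem described above.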

Handling jobs that cannot be scheduled challenge
• So far we covered the challenge of matching jobs with available
resources on the servers
• We implicitly assumed we can always find resources
• What if there are not enough resources for the jobs?
1. FCFS (intuitively fair, poor utilization)
2. Skip problematic jobs (unfair, good utilization)
• Can we combine the best of both worlds?

What are reservations?
• Technique used to keep fairness while improving utilization
1. The scheduler ‘marks’ certain resources ‘unavailable’, excluding them from being
used by other waiting jobs
2. The scheduler ‘remembers’ the job(s) for which these resources have been
reserved
3. When enough resources accumulate the scheduler launches the job(s) on the
reserved resources
Figure: reservation. Courtesy of Ahuva W. Mu’alem & Dror G. Feitelson, “Utilization, Predictability, Workloads, and User Runtime Estimates in Scheduling the IBM SP2 with Backfilling”, used for educational purposes only

Reservations: Two flavors
1. Job runtimes are known in advance (information-based)
• Usually the runtimes are estimated (not accurate)
• Predicting runtime is complex…
2. Job runtimes are unknown (information-less)
• Very common (practical) scenario, especially when serving dynamic usage models with low visibility to the system
• E.g., jobs whose runtime depends on a random seed or on a variable that is not visible to the system

Information-based (runtime known/predicted)
• We’ll describe each job as a rectangle
• Horizontal axis describes the job runtime & vertical axis describes its resource
consumption (cores/memory/ …)
• Let’s look at the jobs’ representation in a server:
Figure: a job drawn as a rectangle; horizontal axis: run time, vertical axis: resource consumption (processors). Courtesy of Mu’alem & Feitelson, used for educational purposes only

Backfilling
• Moving small jobs from the back of the queue to fill “holes” in the schedule
to improve utilization
• Using reservations to ensure that the jobs that have been bypassed
(skipped over) will not starve
• Job runtimes must be known in advance
• Estimated or predicted
Figure: backfilling schedule showing the 1st through 6th jobs, a queued job, and a reservation. Courtesy of Mu’alem & Feitelson, used for educational purposes only

Backfilling flavors
1. Conservative backfilling
2. EASY backfilling
3. Selective backfilling

Before we begin…. performance metrics
1. Wait time
• Time that the job waits in the scheduler’s queue until it starts executing (running)
2. Response time
• Total time that the job spent in the system (wait + runtime)
3. Slowdown
• Response time divided by actual runtime
4. Utilization
• The % of used resources in the pool, at a given moment
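The four metrics can be written down directly from the definitions above. A minimal sketch, assuming each job is described by three timestamps in the same time unit; the function and parameter names are hypothetical.

```python
def job_metrics(submit, start, end):
    """Per-job metrics from three timestamps (same time unit for all)."""
    wait = start - submit          # time spent in the scheduler's queue
    runtime = end - start          # actual execution time
    response = wait + runtime      # total time in the system
    slowdown = response / runtime  # 1.0 means the job never waited
    return {"wait": wait, "response": response, "slowdown": slowdown}

def utilization(used_cores, total_cores):
    """Percentage of the pool's cores in use at a given moment."""
    return 100.0 * used_cores / total_cores

m = job_metrics(submit=0, start=30, end=90)
u = utilization(used_cores=12, total_cores=16)
print(m, u)
```

A job that waits 30 time units and runs for 60 has a response time of 90 and a slowdown of 1.5; short jobs are hurt far more by the same wait, which is why slowdown is tracked alongside wait time.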

Conservative backfilling
• The scheduler provides reservation for every job at arrival time
• Newly arriving jobs can move ahead if they don’t violate any previous reservation
• Pros 
• For every job we know exactly when it will start executing
• Limits the slowdown of jobs that would otherwise have difficulty backfilling, e.g., high resource consumers (jobs that need many cores)
• Cons 
• Reduces backfilling opportunities due to blocking effect of the reservations
• e.g., long-running jobs with low resource requirements

Conservative backfilling: Example
Figure: conservative backfilling example. Courtesy of Mu’alem & Feitelson, used for educational purposes only

EASY (aggressive) backfilling
• Only the first job in the queue gets a reservation
• Jobs may move ahead as long as they do not violate the first job’s reservation
• Pros 
• Provides many more opportunities for backfilling compared to conservative backfilling (better utilization)
• Cons 
• We can only tell when the first job in the queue will start
• Jobs that inherently have difficulty backfilling may suffer relative to conservative backfilling, since they will get a reservation only when they reach the head of the queue
• E.g., high resource consumers (jobs that need many cores)
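One scheduling pass of EASY backfilling can be sketched as follows, assuming a single resource (cores) and treating the estimated runtimes as exact. The function name, the tuple layouts, and the variable names `shadow_time` and `extra` are choices made for this illustration.

```python
def easy_backfill(now, total_cores, running, queue):
    """One scheduling pass of EASY backfilling on a single resource (cores).

    running: list of (end_time, cores) for jobs already executing.
    queue:   list of (job_id, cores, est_runtime) in queue order.
    Returns the ids of the jobs to start at time `now`.
    """
    free = total_cores - sum(c for _, c in running)
    running = list(running)
    queue = list(queue)
    started = []

    # Start jobs from the head of the queue for as long as they fit.
    while queue and queue[0][1] <= free:
        job_id, cores, runtime = queue.pop(0)
        free -= cores
        running.append((now + runtime, cores))
        started.append(job_id)
    if not queue:
        return started

    # Reservation for the head job: the earliest time enough cores are free.
    head_cores = queue[0][1]
    avail = free
    shadow_time = None
    for end_time, cores in sorted(running):
        avail += cores
        if avail >= head_cores:
            shadow_time = end_time
            break
    if shadow_time is None:
        return started  # the head job asks for more cores than exist
    extra = avail - head_cores  # cores the head job leaves unused at shadow_time

    # Backfill: a later job may start if it does not delay the reservation.
    for job_id, cores, runtime in queue[1:]:
        if cores > free:
            continue
        ends_before_shadow = now + runtime <= shadow_time
        if ends_before_shadow or cores <= extra:
            free -= cores
            if not ends_before_shadow:
                extra -= cores
            started.append(job_id)
    return started

# Hypothetical state: 8 cores, one running job holds 5 of them until t=10.
picked = easy_backfill(
    now=0, total_cores=8, running=[(10, 5)],
    queue=[("A", 7, 5), ("B", 2, 20), ("C", 2, 8), ("D", 1, 30)],
)
print(picked)
```

Here the head job A (7 cores) is reserved for t=10 with one spare core. B is rejected because it would still hold 2 cores at t=10, C is backfilled because it ends before t=10, and D is backfilled because it fits in the spare core.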

EASY backfilling: Example
Figure: EASY backfilling example. Courtesy of Mu’alem & Feitelson, used for educational purposes only

Selective backfilling
• Jobs get a reservation only when their expected slowdown exceeds a threshold
• If the threshold is chosen judiciously, few jobs should have a reservation at any time, but the neediest jobs are assured of getting one
• Pros 
• Provides much more backfilling opportunities relative to Conservative (good for long
narrow jobs)
• Reduces slowdown for short resource-consuming jobs (short wide) relative to EASY
• Cons 
• More complicated, e.g., how to choose the optimal threshold?
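The reservation test in selective backfilling can be sketched in a few lines. The slides do not define “expected slowdown”; this sketch assumes it is the slowdown the job would have if it started now, computed from the wait so far and the estimated runtime. The function name and the example threshold are hypothetical.

```python
def needs_reservation(wait_so_far, est_runtime, threshold):
    """True if the job's expected slowdown exceeds the threshold."""
    expected_slowdown = (wait_so_far + est_runtime) / est_runtime
    return expected_slowdown > threshold

short_job = needs_reservation(wait_so_far=100, est_runtime=50, threshold=2.0)
long_job = needs_reservation(wait_so_far=10, est_runtime=100, threshold=2.0)
print(short_job, long_job)
```

The short job has already reached an expected slowdown of 3.0 and gets a reservation; the long job is at 1.1 and stays an ordinary backfilling candidate.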

Check point
• Handling jobs that cannot be scheduled challenge 
• First-come-first-served (FCFS) 
• Improving utilization while avoiding starvation 
• Reservations 
• Information based (easy and conservative backfilling) 

Information-less (runtimes unknown)
1. Fixed reservation
2. Floating reservation

Information-less: fixed reservation
• The scheduler performs reservation for the job on a specific server
• The job ‘sticks’ there until enough resources are accumulated to satisfy its
requirements – then it starts executing
• Pros 
• Simple
• Cons 
• Waits can be significant (bad luck scenario)

Information-less: floating reservation
• The scheduler performs reservation for the job on a specific server
• The job ‘sticks’ there for a limited duration, e.g., until timeout expires – it
then may be relocated to a different server
• Pros 
• Reduces risk of long waits (compared to fixed reservation)
• Cons 
• Theoretically does not preclude starvation
Practices from production environment:
• In large systems the potential for starvation with floating reservations is very low, since at any given moment many jobs finish (or are about to finish)
• In a production environment, fixed reservations might cause jobs with “bad luck” to wait a significant amount of time. “Bad luck” might be caused by a server that does not free its resources, e.g., because of runaway jobs
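Fixed and floating reservations differ only in what happens when the wait drags on. A minimal sketch, assuming memory is the only resource and that a floating reservation is relocated to the server with the most free memory once its timeout expires; the `Reservation` class, the `update_reservation` function, and the relocation policy are assumptions, not taken from the slides.

```python
from dataclasses import dataclass

@dataclass
class Reservation:
    job_mem_gb: int
    server: str
    made_at: float

def update_reservation(res, free_mem_gb, now, timeout=None):
    """Decide what to do with a reservation at time `now`.

    free_mem_gb: dict mapping server name to currently free memory.
    timeout=None models a fixed reservation; a number models a floating one.
    Returns "start", "wait", or "relocated".
    """
    if free_mem_gb[res.server] >= res.job_mem_gb:
        return "start"
    if timeout is not None and now - res.made_at >= timeout:
        # Assumed policy: move to the server with the most free memory.
        res.server = max(free_mem_gb, key=free_mem_gb.get)
        res.made_at = now
        return "relocated"
    return "wait"

free = {"A": 8, "B": 24}  # server A is held by a runaway job
fixed = Reservation(job_mem_gb=32, server="A", made_at=0)
floating = Reservation(job_mem_gb=32, server="A", made_at=0)
fixed_action = update_reservation(fixed, free, now=100)
floating_action = update_reservation(floating, free, now=100, timeout=60)
print(fixed_action, fixed.server, floating_action, floating.server)
```

The fixed reservation keeps waiting on server A, while the floating one moves to server B after the timeout and starts there once B has 32 GB free.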

Reservations: Summary
Information-based (runtimes known)
• Conservative backfilling
• EASY backfilling
• Selective backfilling
Information-less (runtimes unknown)
• Fixed: static reservations
• Dynamic: floating reservations
Practices from production environment:
• In real world, predicting the runtimes of jobs is a difficult problem

Check point
• Handling jobs that cannot be scheduled challenge 
• First-come-first-served (FCFS) 
• Improving utilization while avoiding starvation 
• Reservations 
• Information based (easy and conservative backfilling) 
• Information less (fixed and floating) 

Next lecture: RM part III
• Managing multiple data centers (meta-scheduling)
