Resource Management Part II
Ohad Shai, Spring 2015
Challenges in Modern Data Centers Management, Spring 2015
Information provided in these slides is
for educational purposes only
Check point
• Resource matching challenge
• One job at-a-time
• Single and multiple dimensions
• Considering multiple jobs (look-ahead)
• Max jobs and dynamic programming
• Handling jobs that cannot be scheduled challenge
• First-come-first-served (FCFS)
• Improving utilization while avoiding starvation
• Reservations
• Information-based (EASY and conservative backfilling)
• Information-less (fixed and floating)
Reminder: two actions by the scheduler
1. Job scheduling – previous lecture
• Selecting the next job(s) to execute
2. Resource matching (allocation) – this lecture
• Match the job(s) with available resources
• Both are interdependent
• One affects the other
• Both must be done fast
• Hundreds of jobs per second in large-scale installations
Resource matching challenge
• Each server comes with resource capacity
• Cores, memory, local disk, etc.
• E.g., 16 cores and 128GB of memory
• Part of the capacity might already be used by running jobs
• Each job has resource requirements
• Memory and core requirements, disk space to use, etc.
• E.g., 2 cores and 4 GB of memory
• Additional constraints
• E.g., OS, CPU architecture, accelerator (GPU), etc.
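The matching test described above can be sketched as follows. This is a minimal illustration, not the scheduler used in the course: the `Server` and `Job` classes, the `fits` function, and the attribute names (`os`, `arch`, `gpu`) are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Server:
    free_cores: int                             # capacity not used by running jobs
    free_mem_gb: int
    attrs: dict = field(default_factory=dict)   # hypothetical, e.g. {"os": "linux"}

@dataclass
class Job:
    cores: int
    mem_gb: int
    constraints: dict = field(default_factory=dict)

def fits(job: Job, server: Server) -> bool:
    """True if the server has enough free capacity and meets every constraint."""
    if job.cores > server.free_cores or job.mem_gb > server.free_mem_gb:
        return False
    return all(server.attrs.get(k) == v for k, v in job.constraints.items())

server = Server(free_cores=16, free_mem_gb=128,
                attrs={"os": "linux", "arch": "x86_64"})
small = Job(cores=2, mem_gb=4, constraints={"os": "linux"})
gpu_job = Job(cores=2, mem_gb=4, constraints={"gpu": True})
print(fits(small, server))    # True
print(fits(gpu_job, server))  # False: the server does not advertise a GPU
```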
Resource matching challenge cont.
• Match the stream of incoming jobs with available resources on the servers
• Jobs have already been ordered by the job scheduling step (proportional share…)
• Goal now is to minimize fragmentation (maximize utilization)
• Big numbers
• Thousands of servers
• Tens of thousands of waiting (and running) jobs
• NP-hard problem
• Bin-packing optimization (its decision version is NP-complete)
• Needs to be done extremely fast
• Default to using heuristics
• Often perform close to optimal
Two approaches
1. One job at-a-time
• Find “best” matching for first waiting job
• Find “best” matching for second waiting job
• etc.
• Once found – start executing the job immediately
2. Considering multiple jobs (look-ahead)
• Calculate a “match” between several jobs in the queue and available resources
• Start executing all jobs on the selected servers together
One job at-a-time: Common heuristics
• Random
• Randomly pick a server (with enough available resources) and assign it to the job
• First-fit
• Order the servers by some fixed criterion and assign the first one that fits
• Best-fit
• Packs the jobs on servers at the cost of unbalanced resource usage
• Useful if anticipating large jobs
• Worst-fit
• Maintains balanced resource usage across the servers
• Great for workloads that are mostly homogeneous
Example 1: Worst-fit is better
• 2 machines A and B
• Each with 4 cores and 32GB of memory
• 8 jobs arriving
• 2 x 1 core & 16GB of memory (“blue”)
• 6 x 1 core & 4GB of memory (“green”)
Example 2: Best-fit is better
• 2 machines A and B
• Each with 4 cores and 32GB of memory
• 4 jobs arriving
• 3 x 1 core & 8GB of memory (“blue”)
• 1 x 1 core & 32GB of memory (“green”)
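Both examples can be replayed with a small simulation. The sketch below makes several assumptions that the slides do not state: the heuristics rank servers by free memory only, ties go to the lowest-numbered server, and the "blue" jobs arrive before the "green" ones. The function name `place` is hypothetical.

```python
def place(jobs, heuristic, n_servers=2, cores=4, mem=32):
    """Place jobs one at a time and return how many were matched.

    jobs: list of (cores, mem_gb) in arrival order.
    heuristic: "best" picks the fitting server with the least free memory,
               "worst" picks the one with the most free memory.
    """
    free = [[cores, mem] for _ in range(n_servers)]
    matched = 0
    for need_cores, need_mem in jobs:
        candidates = [i for i, (c, m) in enumerate(free)
                      if c >= need_cores and m >= need_mem]
        if not candidates:
            continue  # the job stays in the queue
        choose = min if heuristic == "best" else max
        pick = choose(candidates, key=lambda i: free[i][1])
        free[pick][0] -= need_cores
        free[pick][1] -= need_mem
        matched += 1
    return matched

example1 = [(1, 16)] * 2 + [(1, 4)] * 6   # 2 blue, then 6 green
example2 = [(1, 8)] * 3 + [(1, 32)]       # 3 blue, then 1 green

print(place(example1, "best"), place(example1, "worst"))  # 6 8
print(place(example2, "best"), place(example2, "worst"))  # 4 3
```

In Example 1, best-fit stacks both 16 GB jobs on one machine and leaves two of its cores without memory. In Example 2, worst-fit spreads the 8 GB jobs across both machines, so no machine keeps 32 GB free.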
One job approach: # of dimensions
• Single-dimension
• Choose between memory or cores, and optimize for either
• Multiple dimensions
• Optimize for both memory and cores at the same time
• Can single-dimension heuristics be optimal?
• This is what we try to answer next…
Real-world example
• Paper by Ohad Shai, Edi Shmueli, and Dror G. Feitelson, on “Heuristics
for Resource Matching in Intel's Compute Farm”
• Presented at Job Scheduling Strategies for Parallel Processing (JSSPP), 2013
• Used traces from 4 large Intel sites (pools)
• Each trace contains more than a month of activity
• Each trace contains 10–13 million jobs
Resource requirements by jobs
• Most jobs require 1 core
• Most jobs require less than 5 GB memory
• But still, there are bursts of higher demand
• Buckets of 1000 jobs
• Ordered by arrival
Which (single-dimension) heuristic is best?
• Approach
• Divide the jobs into buckets of 1000 jobs each
• Two heuristics X two dimensions (4 combinations)
• Heuristics: Best Fit, Worst Fit
• Dimensions: cores, memory
• Run each combination on all jobs in the bucket
• Combination that matched the highest # of jobs wins
• Gets 1 point
Results
Conclusion: no single-dimension heuristic is optimal in all cases
Dealing with multiple dimensions: Mix-Fit
• As seen before, no single-dimension heuristic is optimal when considering
one job-at-a-time
• Mix-Fit
• Attempt to “Best-Fit” on both dimensions
Mix-Fit: Results
• Same bucket experiment
• Yet the experiment shows that “Mix-Fit” does not win in 100% of the buckets either
Check point
• Resource matching challenge ✓
• One job at-a-time ✓
• Single and multiple dimensions ✓
• Considering multiple jobs (look-ahead)
• Max jobs and dynamic programming
• Handling jobs that cannot be scheduled challenge
• First-come-first-served (FCFS)
• Improving utilization while avoiding starvation
• Reservations
• Information based (easy and conservative backfilling)
• Information less (fixed and floating)
Considering multiple jobs (look-ahead)
• Look deeper into the queue and try to assemble the optimal schedule
• Matching between multiple jobs and multiple servers
• Two types
1. “Sophisticated”: e.g., dynamic programming
• “Backfilling with look-ahead to optimize the packing of parallel jobs”, by Edi Shmueli and Dror G. Feitelson, 2005
2. Meta-heuristic (heuristic of heuristics)
• “Heuristics for Resource Matching in Intel's Compute Farm”, by Ohad Shai, Edi Shmueli, and Dror G. Feitelson, JSSPP, 2013
Max-jobs (meta-heuristic)
• Run each of the heuristics (best-fit cores/memory, worst-fit cores/memory)
on the list of waiting jobs
• Without actually starting the jobs
• Count the # of jobs matched by each heuristic
• Select the heuristic that matched the highest number of jobs
• Possible target functions
• Max # of matched jobs (==max-jobs)
• Max # of utilized cores
• Max amount of utilized memory
• Etc.
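The steps above can be sketched as a dry run of the four single-dimension heuristics followed by a selection. This is a simplified illustration under assumptions, not the implementation from the paper: the function names `run_heuristic` and `max_jobs`, the dictionary layout, and the tie-breaking rule (lowest-numbered server, first heuristic tried) are hypothetical.

```python
import copy

def run_heuristic(jobs, servers, dim, best):
    """Dry-run one single-dimension heuristic; return (job, server) matches.

    jobs and servers are lists of dicts with keys "cores" and "mem";
    for servers the values are the free capacity.
    dim: "cores" or "mem"; best=True for best-fit, False for worst-fit.
    """
    servers = copy.deepcopy(servers)  # do not touch the real state
    matches = []
    for j, job in enumerate(jobs):
        fitting = [i for i, s in enumerate(servers)
                   if s["cores"] >= job["cores"] and s["mem"] >= job["mem"]]
        if not fitting:
            continue
        choose = min if best else max
        i = choose(fitting, key=lambda k: servers[k][dim])
        servers[i]["cores"] -= job["cores"]
        servers[i]["mem"] -= job["mem"]
        matches.append((j, i))
    return matches

def max_jobs(jobs, servers):
    """Return (name, matches) for the heuristic that matched the most jobs."""
    results = {}
    for dim in ("cores", "mem"):
        for best in (True, False):
            name = f"{'best' if best else 'worst'}-fit {dim}"
            results[name] = run_heuristic(jobs, servers, dim, best)
    winner = max(results, key=lambda n: len(results[n]))
    return winner, results[winner]

servers = [{"cores": 4, "mem": 32}, {"cores": 4, "mem": 32}]
jobs = [{"cores": 1, "mem": 16}] * 2 + [{"cores": 1, "mem": 4}] * 6
winner, matches = max_jobs(jobs, servers)
print(winner, len(matches))  # worst-fit cores 8
```

Swapping `len(results[n])` for the number of cores or the amount of memory covered by the matches gives the other target functions listed above.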
Max-jobs: results
• Up to 22% reduction in wait time for jobs
• Up to 22% reduction in number of waiting jobs
Resource Matching Challenge: Summary
Single job at a time
• Single-dimension: 1. Best-fit memory, 2. Best-fit cores, 3. Worst-fit memory, 4. Worst-fit cores
• Multiple dimensions: 1. Mix-fit
Multiple jobs (look-ahead)
• Heuristic: 1. Max-jobs (meta-heuristic)
• Near-optimal: 1. LOS (dynamic programming) (Shmueli, Feitelson, 2005)
Check point
• Resource matching challenge ✓
• One job at-a-time ✓
• Single and multiple dimensions ✓
• Considering multiple jobs (look-ahead) ✓
• Max jobs and dynamic programming ✓
• Handling jobs that cannot be scheduled challenge
• First-come-first-served (FCFS)
• Improving utilization while avoiding starvation
• Reservations
• Information based (easy and conservative backfilling)
• Information less (fixed and floating)
Handling jobs that cannot be scheduled challenge
• So far we covered the challenge of matching jobs with available
resources on the servers
• We implicitly assumed we can always find resources
• What if there are not enough resources for the jobs?
• E.g., if a job is “large”, and the jobs already running on the servers do
not leave enough space to accommodate the job
First-Come-First-Served (FCFS)
• Traverse the queue in a FIFO order
• Recall the jobs have been ordered in the job scheduling step (proportional share…)
• If a job does not “fit” any of the servers – stop
• Do not attempt to schedule further jobs
• Pros
• Simple
• Intuitively fair (jobs do not bypass jobs that arrived earlier)
• Cons
• Poor utilization (up to 30% waste reported when scheduling parallel jobs)
Improving utilization: Skipping to the next job(s)
• Idea: skip “problematic” job(s) and continue matching the rest
• A great means of improving utilization, but…
• Introduces a starvation problem
• Jobs may get ‘stuck’ in the scheduler’s queue because they never get the resources
they need in order to execute (they are always bypassed by later jobs)
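The difference between strict FCFS and skipping can be sketched on a single server with one resource dimension. The function name `schedule` and the queue contents are hypothetical and chosen only for illustration.

```python
def schedule(queue, free_mem, skip=False):
    """Return the indices of the queued jobs that are started.

    queue: memory requests (GB) in queue order; free_mem: free memory (GB).
    skip=False is FCFS: stop at the first job that does not fit.
    skip=True bypasses jobs that do not fit and keeps matching the rest.
    """
    started = []
    for i, need in enumerate(queue):
        if need <= free_mem:
            free_mem -= need
            started.append(i)
        elif not skip:
            break  # FCFS: never bypass a waiting job
    return started

queue = [8, 32, 8, 8]  # hypothetical memory requests in GB
print(schedule(queue, free_mem=16))             # [0]
print(schedule(queue, free_mem=16, skip=True))  # [0, 2]
```

With skipping, the second 8 GB of free memory is put to use, but the 32 GB job keeps being bypassed for as long as smaller jobs arrive to take whatever memory is released.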
[Figure: a server with 32 GB of memory, three running jobs and one empty core. The wait queue (1st to 4th job) holds jobs requesting 8 GB and one job requesting 32 GB. Annotations: an 8 GB job “will be scheduled next”; the 32 GB job “will be scheduled only if there will be 32 GB available” and “will be most likely starved”.]
Handling jobs that cannot be scheduled challenge
• So far we covered the challenge of matching jobs with available
resources on the servers
• We implicitly assumed we can always find resources
• What if there are not enough resources for the jobs?
1. FCFS (intuitively fair, poor utilization)
2. Skip problematic jobs (unfair, good utilization)
• Can we combine the best of both worlds?
What are reservations?
• Technique used to keep fairness while improving utilization
1. The scheduler ‘marks’ certain resources ‘unavailable’, excluding them from being
used by other waiting jobs
2. The scheduler ‘remembers’ the job(s) for which these resources have been
reserved
3. When enough resources accumulate the scheduler launches the job(s) on the
reserved resources
[Figure: reservation. Courtesy of Ahuva W. Mu’alem & Dror G. Feitelson, “Utilization, Predictability, Workloads, and User Runtime Estimates in Scheduling the IBM SP2 with Backfilling”, used for educational purposes only]
Reservations: Two flavors
1. Job runtimes are known in advance (information-based)
• Usually the runtimes are estimated (not accurate)
• Predicting runtime is complex…
2. Job runtimes are unknown (information-less)
• A very common (practical) scenario, especially when serving dynamic usage
models with low visibility to the system
• E.g., jobs whose runtime depends on a random seed, or on a variable that is
not visible to the system
Information-based (runtime known/predicted)
• We’ll describe each job as a rectangle
• Horizontal axis describes the job runtime & vertical axis describes its resource
consumption (cores/memory/ …)
• Let’s look at the jobs’ representation in a server:
[Figure: a job drawn as a rectangle; horizontal axis: run time, vertical axis: resource consumption (processors). Courtesy of Ahuva W. Mu’alem & Dror G. Feitelson, “Utilization, Predictability, Workloads, and User Runtime Estimates in Scheduling the IBM SP2 with Backfilling”, used for educational purposes only]
Backfilling
• Moving small jobs from the back of the queue to fill “holes” in the schedule
to improve utilization
• Using reservations to ensure that the jobs that have been bypassed
(skipped over) will not starve
• Job runtimes must be known in advance
• Estimated or predicted
[Figure: a backfilling schedule showing the queued jobs (1st to 6th job) and a reservation. Courtesy of Ahuva W. Mu’alem & Dror G. Feitelson, “Utilization, Predictability, Workloads, and User Runtime Estimates in Scheduling the IBM SP2 with Backfilling”, used for educational purposes only]
Backfilling flavors
1. Conservative backfilling
2. EASY backfilling
3. Selective backfilling
Before we begin…. performance metrics
1. Wait time
• Time that the job waits in the scheduler’s queue until it starts executing (running)
2. Response time
• Total time that the job spent in the system (wait + runtime)
3. Slowdown
• Response time divided by actual runtime
4. Utilization
• The % of used resources in the pool, at a given moment
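The four metrics translate directly into arithmetic on a job's submit, start, and end times. The sketch below assumes all times are in the same unit and that the runtime is greater than zero; the function names are hypothetical.

```python
def job_metrics(submit, start, end):
    """Return (wait time, response time, slowdown) for one job."""
    wait = start - submit
    runtime = end - start          # assumed to be > 0
    response = wait + runtime
    slowdown = response / runtime
    return wait, response, slowdown

def utilization(used_cores, total_cores):
    """Percentage of the pool's cores in use at a given moment."""
    return 100.0 * used_cores / total_cores

# A job submitted at t=0 that waits 30 time units and then runs for 10.
print(job_metrics(submit=0, start=30, end=40))  # (30, 40, 4.0)
print(utilization(used_cores=12, total_cores=16))  # 75.0
```

A slowdown of 4.0 means the job spent four times its own runtime in the system; a job that starts immediately has a slowdown of 1.0.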
Conservative backfilling
• The scheduler gives every job a reservation at arrival time
• Newly arriving jobs can move ahead if they don’t violate any previous reservation
• Pros
• For every job we know exactly when it will start executing
• Limits the slowdown of jobs that would otherwise have difficulty backfilling, e.g.,
high resource consumers (jobs that need many cores)
• Cons
• Reduces backfilling opportunities due to the blocking effect of the reservations
• E.g., for long-running jobs with low resource requirements
Conservative backfilling: Example
[Figure: conservative backfilling example. Courtesy of Ahuva W. Mu’alem & Dror G. Feitelson, “Utilization, Predictability, Workloads, and User Runtime Estimates in Scheduling the IBM SP2 with Backfilling”, used for educational purposes only]
EASY (aggressive) backfilling
• Only the first job in the queue gets a reservation
• Jobs may move ahead as long as they do not violate the first job’s reservation
• Pros
• Provides many more opportunities for backfilling compared to conservative backfilling (better utilization)
• Cons
• We can only tell when the first job in the queue will start
• Jobs that inherently have difficulty backfilling may suffer relative to conservative
backfilling, since they get a reservation only when they reach the head of the
queue
• E.g., high resource consumers (jobs that need many cores)
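One scheduling pass of EASY backfilling can be sketched as follows. This is a simplified illustration under assumptions: a single pool with cores as the only resource, runtime estimates that are taken at face value, and a hypothetical function name and data layout. The reservation is expressed through the usual "shadow time" (when the head job can start) and "extra" cores (cores still free at that time).

```python
def easy_backfill(now, free, running, queue):
    """Return indices of the queued jobs to start now.

    free:    number of free cores right now.
    running: list of (expected_end_time, cores) for running jobs.
    queue:   list of (cores, estimated_runtime) in queue order.
    """
    started = []
    waiting = list(enumerate(queue))
    # Start jobs from the head of the queue while they fit.
    while waiting and waiting[0][1][0] <= free:
        idx, (cores, est) = waiting.pop(0)
        free -= cores
        running = running + [(now + est, cores)]
        started.append(idx)
    if not waiting:
        return started
    # Reservation for the head job: find when enough cores will be free
    # (shadow time) and how many cores are left over at that time (extra).
    head_cores = waiting[0][1][0]
    avail, shadow, extra = free, None, 0
    for end, cores in sorted(running):
        avail += cores
        if avail >= head_cores:
            shadow, extra = end, avail - head_cores
            break
    if shadow is None:
        return started  # the head job needs more cores than the pool has
    # Backfill later jobs that fit now and do not delay the head job.
    for idx, (cores, est) in waiting[1:]:
        if cores > free:
            continue
        if now + est <= shadow:
            free -= cores          # ends before the reservation starts
        elif cores <= extra:
            free -= cores          # runs past it, but only on spare cores
            extra -= cores
        else:
            continue
        started.append(idx)
    return started

# 10-core pool: 6 cores busy until t=10, 4 free. The head job needs 8 cores.
print(easy_backfill(0, 4, [(10, 6)], [(8, 5), (3, 20), (2, 8), (2, 50)]))  # [2, 3]
```

In the usage line the head job is reserved for t=10 with 2 extra cores. The 3-core job would still be running at t=10 on more than the spare cores, so it waits; the short 2-core job ends in time, and the long 2-core job fits in the spare cores.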
EASY backfilling: Example
[Figure: EASY backfilling example. Courtesy of Ahuva W. Mu’alem & Dror G. Feitelson, “Utilization, Predictability, Workloads, and User Runtime Estimates in Scheduling the IBM SP2 with Backfilling”, used for educational purposes only]
Selective backfilling
• Jobs get a reservation only when their expected slowdown exceeds a threshold
• If the threshold is chosen judiciously, few jobs should have a reservation at any time,
but the neediest jobs are assured of getting one
• Pros
• Provides many more backfilling opportunities relative to conservative backfilling (good for long
narrow jobs)
• Reduces slowdown for short resource-consuming jobs (short wide) relative to EASY
• Cons
• More complicated, e.g., how to choose the optimal threshold?
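The reservation test can be sketched as a threshold check. The slides do not define "expected slowdown", so the formula below, (time waited so far + estimated runtime) / estimated runtime, is an assumption, as are the function names and the threshold value.

```python
def expected_slowdown(waited, estimated_runtime):
    """Assumed definition: slowdown the job would see if it started right now."""
    return (waited + estimated_runtime) / estimated_runtime

def needs_reservation(waited, estimated_runtime, threshold):
    """True once the job's expected slowdown exceeds the threshold."""
    return expected_slowdown(waited, estimated_runtime) > threshold

# Hypothetical threshold of 3: both jobs have waited 100 time units.
print(needs_reservation(waited=100, estimated_runtime=20, threshold=3))    # True
print(needs_reservation(waited=100, estimated_runtime=1000, threshold=3))  # False
```

Under this definition a short job crosses the threshold after a short wait, while a long job can wait much longer before it is given a reservation.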
Check point
• Resource matching challenge ✓
• One job at-a-time ✓
• Single and multiple dimensions ✓
• Considering multiple jobs (look-ahead) ✓
• Max jobs and dynamic programming ✓
• Handling jobs that cannot be scheduled challenge ✓
• First-come-first-served (FCFS) ✓
• Improving utilization while avoiding starvation ✓
• Reservations ✓
• Information based (easy and conservative backfilling) ✓
• Information less (fixed and floating)
Information-less (runtimes unknown)
1. Fixed reservation
2. Floating reservation
Information-less: fixed reservation
• The scheduler makes a reservation for the job on a specific server
• The job ‘sticks’ there until enough resources accumulate to satisfy its
requirements – then it starts executing
• Pros
• Simple
• Cons
• Waits can be significant (bad luck scenario)
Information-less: floating reservation
• The scheduler makes a reservation for the job on a specific server
• The job ‘sticks’ there for a limited duration, e.g., until a timeout expires – it
may then be relocated to a different server
• Pros
• Reduces the risk of long waits (compared to fixed reservation)
• Cons
• Theoretically does not preclude starvation
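Fixed and floating reservations can be contrasted in a few lines. The sketch below is an illustration under assumptions: memory is the only resource, and the relocation policy (move to the server with the most free memory once the timeout expires) is hypothetical, since the slides do not specify one. The names `Reservation` and `step` are also hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Reservation:
    job_mem: int     # GB the job needs
    server: int      # index of the server holding the reservation
    made_at: float   # time the reservation was placed on that server

def step(res, free_mem, now, timeout=None):
    """Advance one reservation; return "start", "wait", or "moved".

    free_mem: free memory (GB) per server right now.
    timeout=None models a fixed reservation, a number a floating one.
    """
    if free_mem[res.server] >= res.job_mem:
        return "start"
    if timeout is not None and now - res.made_at >= timeout:
        # Hypothetical policy: relocate to the server with the most free memory.
        res.server = max(range(len(free_mem)), key=lambda i: free_mem[i])
        res.made_at = now
        return "moved"
    return "wait"

fixed = Reservation(job_mem=32, server=0, made_at=0)
floating = Reservation(job_mem=32, server=0, made_at=0)
print(step(fixed, [8, 24], now=5))                 # wait
print(step(floating, [8, 24], now=5, timeout=5))   # moved
print(floating.server)                             # 1
print(step(floating, [8, 32], now=6, timeout=5))   # start
```

The fixed reservation keeps waiting on server 0 however long its resources stay busy, which is the "bad luck" scenario. The floating one gets another chance elsewhere, but nothing guarantees that the new server frees up either.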
Practices from production environment:
• In large systems the potential for starvation with floating reservations is very low, since at any given moment many
jobs finish (or are about to finish)
• In a production environment, fixed reservations might cause jobs with “bad luck” to wait a significant amount of time.
“Bad luck” might be caused by a server that does not free its resources, e.g., because of runaway jobs
Reservations: Summary
• Information-based (runtimes known): conservative backfilling, EASY backfilling, selective backfilling
• Information-less (runtimes unknown): fixed (static reservations), dynamic (floating reservations)
Practices from production environment:
• In the real world, predicting the runtimes of jobs is a difficult problem
Check point
• Resource matching challenge ✓
• One job at-a-time ✓
• Single and multiple dimensions ✓
• Considering multiple jobs (look-ahead) ✓
• Max jobs and dynamic programming ✓
• Handling jobs that cannot be scheduled challenge ✓
• First-come-first-served (FCFS) ✓
• Improving utilization while avoiding starvation ✓
• Reservations ✓
• Information based (easy and conservative backfilling) ✓
• Information less (fixed and floating) ✓
Next lecture: RM part III
• Managing multiple data centers (meta-scheduling)
Challenges in Modern Data Centers Management, Spring 2015

More Related Content

Similar to 5 - RM-Part-II

The final frontier
The final frontierThe final frontier
The final frontier
Terry Bunio
 
Sunny Min Intern Presentation (Secure)
Sunny Min Intern Presentation (Secure)Sunny Min Intern Presentation (Secure)
Sunny Min Intern Presentation (Secure)
Sunny Min
 
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
Srinath Perera
 
BigQuery at AppsFlyer - past, present and future
BigQuery at AppsFlyer - past, present and futureBigQuery at AppsFlyer - past, present and future
BigQuery at AppsFlyer - past, present and future
Nir Rubinstein
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
Mr SMAK
 

Similar to 5 - RM-Part-II (20)

Pragmatic Machine Learning @ ML Spain
Pragmatic Machine Learning @ ML SpainPragmatic Machine Learning @ ML Spain
Pragmatic Machine Learning @ ML Spain
 
Enterprise Architecture for Small and Medium-Sized Enterprises: PhD Overview
Enterprise Architecture for Small and Medium-Sized Enterprises: PhD OverviewEnterprise Architecture for Small and Medium-Sized Enterprises: PhD Overview
Enterprise Architecture for Small and Medium-Sized Enterprises: PhD Overview
 
An In-Depth Look at Pinpointing and Addressing Sources of Performance Problem...
An In-Depth Look at Pinpointing and Addressing Sources of Performance Problem...An In-Depth Look at Pinpointing and Addressing Sources of Performance Problem...
An In-Depth Look at Pinpointing and Addressing Sources of Performance Problem...
 
Real life forms to adf
Real life forms to adfReal life forms to adf
Real life forms to adf
 
Real life forms to adf
Real life forms to adfReal life forms to adf
Real life forms to adf
 
Big Machine Learning Libraries & Open Challenges
Big Machine Learning Libraries & Open ChallengesBig Machine Learning Libraries & Open Challenges
Big Machine Learning Libraries & Open Challenges
 
PAC 2019 virtual Alexander Podelko
PAC 2019 virtual Alexander Podelko PAC 2019 virtual Alexander Podelko
PAC 2019 virtual Alexander Podelko
 
The final frontier
The final frontierThe final frontier
The final frontier
 
Building a Recommendation Engine - A Balancing act
Building a Recommendation Engine - A Balancing actBuilding a Recommendation Engine - A Balancing act
Building a Recommendation Engine - A Balancing act
 
Levelling up your data infrastructure
Levelling up your data infrastructureLevelling up your data infrastructure
Levelling up your data infrastructure
 
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talkNYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
 
Sunny Min Intern Presentation (Secure)
Sunny Min Intern Presentation (Secure)Sunny Min Intern Presentation (Secure)
Sunny Min Intern Presentation (Secure)
 
Wbs
WbsWbs
Wbs
 
Wbs, estimation and scheduling
Wbs, estimation and schedulingWbs, estimation and scheduling
Wbs, estimation and scheduling
 
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
 
BigQuery at AppsFlyer - past, present and future
BigQuery at AppsFlyer - past, present and futureBigQuery at AppsFlyer - past, present and future
BigQuery at AppsFlyer - past, present and future
 
Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?
 
Vitriol
VitriolVitriol
Vitriol
 
Hadoop bangalore-meetup-dec-2011-yoda
Hadoop bangalore-meetup-dec-2011-yodaHadoop bangalore-meetup-dec-2011-yoda
Hadoop bangalore-meetup-dec-2011-yoda
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 

5 - RM-Part-II

  • 1. Resource Management Part II Ohad Shai, Spring 2015 Challenges in Modern Data Centers Management, Spring 2015
  • 2. Information provided in these slides is for educational purposes only Challenges in Modern Data Centers Management, Spring 2015
  • 3. Check point • Resource matching challenge • One job at-a-time • Single and multiple dimensions • Considering multiple jobs (look-ahead) • Max jobs and dynamic programming • Handling jobs that cannot be scheduled challenge • First-come-first-served (FCFS) • Improving utilization while avoiding starvation • Reservations • Information based (easy and conservative backfilling) • Information less (fixed and floating) Challenges in Modern Data Centers Management, Spring 2015
  • 4. Reminder: two actions by the scheduler 1. Job scheduling – previous lecture • Selecting the next job(s) to execute 2. Resource matching (allocation) – this lecture • Match the job(s) with available resources • Both are interdependent • One affects the other • Both must be done fast • Hundreds of jobs per-second in large scale installations Challenges in Modern Data Centers Management, Spring 2015
  • 5. Resource matching challenge • Each server comes with resource capacity • Cores, memory, local disk, etc. • E.g., 16 cores and 128GB of memory • Part of the capacity might already be used by running jobs • Each job has resource requirements • Memory and core requirements, disk space to use, etc. • E.g., 2-cores X 4GB of memory • Additional constraints • E.g., OS, CPU architecture, accelerator (GPU), etc. Challenges in Modern Data Centers Management, Spring 2015
  • 6. Resource matching challenge cont. • Match the stream of incoming jobs with available resources on the servers • Jobs have already been ordered by the job scheduling step (proportional share…) • Goal now is to minimize fragmentation (maximize utilization) • Big numbers • Thousands of servers • Tens-of-thousands waiting (and running) jobs • NP-complete problem • Bin-packing optimization • Needs to be done extremely fast • Default to using heuristics • Often perform close to optimal Challenges in Modern Data Centers Management, Spring 2015
  • 7. Two approaches 1. One job at-a-time • Find “best” matching for first waiting job • Find “best” matching for second waiting job • etc. • Once found – start executing the job immediately 2. Considering multiple jobs (look-ahead) • Calculate a “match” between several jobs in the queue and available resources • Start executing all jobs on the selected severs together Challenges in Modern Data Centers Management, Spring 2015
  • 8. One job at-a-time: Common heuristics • Random • Randomly pick server (with enough available resources) and assign it to the job • First-fit • Sort the servers by some constant algorithm and assign the first that fits • Best-fit • Packs the jobs on servers at the cost of unbalanced resource usage • Useful if anticipating large jobs • Worst-fit • Maintains balanced resource usage across the servers • Great for workloads that are mostly homogeneous Challenges in Modern Data Centers Management, Spring 2015
  • 9. Example 1: Worst-fit is better • 2 machines A and B • Each with 4 cores and 32GB of memory • 8 jobs arriving • 2 x 1 core & 16GB of memory (“blue”) • 6 x 1 core & 4GB of memory (“green”) Challenges in Modern Data Centers Management, Spring 2015
  • 10. Example 2: Best-fit is better • 2 machines A and B • Each with 4 cores and 32GB of memory • 4 jobs arriving • 3 x 1 core & 8GB of memory (“blue”) • 1 x 1 core & 32GB of memory (“green”) Challenges in Modern Data Centers Management, Spring 2015
  • 11. One job approach: # of dimensions • Single-dimension • Choose between memory or cores, and optimize for either • Multiple dimensions • Optimize for both memory and cores at the same time • Can single-dimension heuristics be optimal? • This is what we try answer next… Challenges in Modern Data Centers Management, Spring 2015
  • 12. Real-world example • Paper by Ohad Shai, Edi Shmueli, and Dror G. Feitelson, on “Heuristics for Resource Matching in Intel's Compute Farm” • Presented in Job Scheduling Strategies for Parallel Processing (JSSPP), 2013 • Used traces from 4 large Intel sites (pools) • Each trace contains more than month of activity • Each trace contains 10 – 13 million jobs Challenges in Modern Data Centers Management, Spring 2015
  • 13. Resource requirements by jobs • Most jobs require 1 core Challenges in Modern Data Centers Management, Spring 2015
  • 14. Resource requirements by jobs • Most jobs require 1 core • Most jobs require less than 5 GB memory Challenges in Modern Data Centers Management, Spring 2015
  • 15. Resource requirements by jobs • Most jobs require 1 core • Most jobs require less than 5 GB memory • But still, there are bursts of higher demand • Buckets of 1000 jobs • Ordered by arrival Challenges in Modern Data Centers Management, Spring 2015
  • 16. Resource requirements by jobs • Most jobs require 1 core • Most jobs require less than 5 GB memory • But still, there are bursts of higher demand • Buckets of 1000 jobs • Ordered by arrival Challenges in Modern Data Centers Management, Spring 2015
  • 17. Which (single-dimension) heuristic is best? • Approach • Divide the jobs into buckets of 1000 jobs each • Two heuristics X two dimensions (4 combinations) • Heuristics: Best Fit, Worst Fit • Dimensions: cores, memory • Run each combination on all jobs in the bucket • Combination that matched the highest # of jobs wins • Gets 1 point Challenges in Modern Data Centers Management, Spring 2015
  • 18. Results Conclusion: no single-dimension heuristic is optimal in all cases Challenges in Modern Data Centers Management, Spring 2015
  • 19. Dealing with multiple dimensions: Mix-Fit • As seen before, no single-dimension heuristic is optimal when considering one job-at-a-time • Mix-Fit • Attempt to “Best-Fit” on both dimensions Challenges in Modern Data Centers Management, Spring 2015
  • 20. Mix-Fit: Results • Same bucket experiment • Yet, experiment shows “Mix-Fit” is not 100% either Challenges in Modern Data Centers Management, Spring 2015
  • 21. Check point • Resource matching challenge  • One job at-a-time  • Single and multiple dimensions  • Considering multiple jobs (look-ahead) • Max jobs and dynamic programming • Handling jobs that cannot be scheduled challenge • First-come-first-served (FCFS) • Improving utilization while avoiding starvation • Reservations • Information based (easy and conservative backfilling) • Information less (fixed and floating) Challenges in Modern Data Centers Management, Spring 2015
  • 22. Considering multiple jobs (look-ahead) • Look deeper into the queue and try to assemble the optimal schedule • Matching between multiple jobs and multiple servers • Two types 1. “Sophisticated”: e.g., dynamic programming • Backfilling with look-ahead to optimize the packing of parallel jobs, by Edi Shmueli , Dror G. Feitelson, 2005 2. Meta-heuristic (heuristic of heuristics) • Heuristics for Resource Matching in Intel's Compute Farm, by Ohad Shai, Edi Shmueli, and Dror G. Feitelson, JSSPP, 2013 Challenges in Modern Data Centers Management, Spring 2015
  • 23. Max-jobs (meta-heuristic) • Run each of the heuristics (best-fit cores/memory, worst fit cores/memory) on the list of waiting jobs • Without actually starting the jobs • Count the # of jobs matched by each heuristic • Select the heuristic that matched the highest number of jobs • Possible target functions • Max # of matched jobs (==max-jobs) • Max # of utilized cores • Max amount of utilized memory • Etc. Challenges in Modern Data Centers Management, Spring 2015
  • 24. Max-jobs: results • Up to 22% reduction in wait time for jobs Challenges in Modern Data Centers Management, Spring 2015
  • 25. Max-jobs: results • Up to 22% reduction in number of waiting jobs Challenges in Modern Data Centers Management, Spring 2015
  • 26. Resource Matching Challenge: Summary Challenges in Modern Data Centers Management, Spring 2015 Single-dimension Multiple-dimensions Single-job at a time 1. Best fit Memory 2. Best fit Cores 3. Worst fit Memory 4. Worst fit Cores 1. Mix-fit Heuristic Near-optimal Multiple jobs (Look- ahead) 1. Max-jobs (meta-heuristics) 1. LOS (Dynamic programing) (Shmueli, Feitelson, 2005)
  • 27. Check point • Resource matching challenge  • One job at-a-time  • Single and multiple dimensions  • Considering multiple jobs (look-ahead)  • Max jobs and dynamic programming  • Handling jobs that cannot be scheduled challenge • First-come-first-served (FCFS) • Improving utilization while avoiding starvation • Reservations • Information based (easy and conservative backfilling) • Information less (fixed and floating) Challenges in Modern Data Centers Management, Spring 2015
  • 28. Handling jobs that cannot be scheduled challenge • So far we covered the challenge of matching jobs with available resources on the servers • We implicitly assumed we can always find resources • What if there are not enough resources for the jobs? • E.g., if a job is “large”, and the jobs already running on the servers do not leave enough space to accommodate the job
  • 29. First-Come-First-Served (FCFS) • Traverse the queue in a FIFO order • Recall the jobs have been ordered in the job scheduling step (proportional share…) • If a job does not “fit” any of the servers – stop • Do not attempt to schedule further jobs • Pros • Simple • Intuitively fair (jobs do not bypass jobs that arrived earlier) • Cons • Poor utilization (up to 30% waste reported when scheduling parallel jobs)
  • 30. Improving utilization: Skipping to the next job(s) • Idea: skip “problematic” job(s) and continue matching the rest • A great means of improving utilization, but… • Introduces a starvation problem • Jobs may get ‘stuck’ in the scheduler’s queue because they never get the resources they need in order to execute (they are always bypassed by later jobs) • [Figure: a server with 32 GB of memory split into four 8 GB slots, three held by running jobs and one empty; the wait queue holds four jobs, three requesting 8 GB and one requesting 32 GB. An 8 GB job will be scheduled next; the 32 GB job will be scheduled only if 32 GB become available and will most likely be starved]
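The difference between strict FCFS and skipping can be shown with a short sketch. The function name `match_queue`, the single memory dimension, and the 16 GB of free memory are assumptions chosen for illustration only.

```python
def match_queue(free_gb, queue_gb, skip):
    """Walk the queue in order and return the indices of the jobs started.

    skip=False: strict FCFS, stop at the first job that does not fit.
    skip=True:  bypass a job that does not fit and keep matching the rest.
    """
    started = []
    for idx, req in enumerate(queue_gb):
        if req <= free_gb:
            free_gb -= req
            started.append(idx)
        elif not skip:
            break  # FCFS: nothing behind the blocked job may start
    return started


queue = [8, 32, 8, 8]  # memory requests in GB, in queue order
fcfs = match_queue(16, queue, skip=False)
skipping = match_queue(16, queue, skip=True)
```

With skipping, the 32 GB job at index 1 is bypassed every time smaller jobs keep consuming the memory that frees up, which is exactly the starvation risk described above.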
  • 31. Handling jobs that cannot be scheduled challenge • So far we covered the challenge of matching jobs with available resources on the servers • We implicitly assumed we can always find resources • What if there are not enough resources for the jobs? 1. FCFS (intuitively fair, poor utilization) 2. Skip problematic jobs (unfair, good utilization) • Can we combine the best of both worlds?
  • 33. What are reservations? • Technique used to keep fairness while improving utilization 1. The scheduler ‘marks’ certain resources ‘unavailable’, excluding them from being used by other waiting jobs 2. The scheduler ‘remembers’ the job(s) for which these resources have been reserved 3. When enough resources accumulate the scheduler launches the job(s) on the reserved resources • [Figure: Reservation] Courtesy of Ahuva W. Mu’alem & Dror G. Feitelson, “Utilization, Predictability, Workloads, and User Runtime Estimates in Scheduling the IBM SP2 with Backfilling”, used for educational purposes only
  • 34. Reservations: Two flavors 1. Job runtimes are known in advance (information-based) • Usually the runtimes are estimated (not accurate) • Predicting runtime is complex… 2. Job runtimes are unknown (information-less) • Very common (practical) scenario, especially when serving dynamic usage models with low visibility to the system • E.g., jobs that use a random seed that affects the runtime, or a variable that affects the runtime and is not visible to the system
  • 35. Information-based (runtime known/predicted) • We’ll describe each job as a rectangle • Horizontal axis describes the job runtime & vertical axis describes its resource consumption (cores/memory/…) • Let’s look at the jobs’ representation in a server: • [Figure: a job drawn as a rectangle; horizontal axis: job run time; vertical axis: resource consumption (processors)] Courtesy of Ahuva W. Mu’alem & Dror G. Feitelson, “Utilization, Predictability, Workloads, and User Runtime Estimates in Scheduling the IBM SP2 with Backfilling”, used for educational purposes only
  • 36. Backfilling • Moving small jobs from the back of the queue to fill “holes” in the schedule to improve utilization • Using reservations to ensure that the jobs that have been bypassed (skipped over) will not starve • Job runtimes must be known in advance • Estimated or predicted • [Figure: backfilling schedule showing the 1st to 6th jobs, a queued job, and its reservation] Courtesy of Ahuva W. Mu’alem & Dror G. Feitelson, “Utilization, Predictability, Workloads, and User Runtime Estimates in Scheduling the IBM SP2 with Backfilling”, used for educational purposes only
  • 37. Backfilling flavors 1. Conservative backfilling 2. EASY backfilling 3. Selective backfilling
  • 38. Before we begin…. performance metrics 1. Wait time • Time that the job waits in the scheduler’s queue until it starts executing (running) 2. Response time • Total time that the job spent in the system (wait + runtime) 3. Slowdown • Response time divided by actual runtime 4. Utilization • The % of used resources in the pool, at a given moment
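The four metrics translate directly into arithmetic on a job's submit, start, and end times. The sketch below assumes a hypothetical `JobRecord` class and measures utilization over cores only.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    submit: float  # time the job entered the queue
    start: float   # time the job started running
    end: float     # time the job finished

    @property
    def wait(self):
        return self.start - self.submit

    @property
    def runtime(self):
        return self.end - self.start

    @property
    def response(self):
        # wait + runtime
        return self.end - self.submit

    @property
    def slowdown(self):
        # response time divided by actual runtime; 1.0 means no waiting at all
        return self.response / self.runtime


def utilization(used_cores, total_cores):
    """Percentage of the pool's cores in use at a given moment."""
    return 100.0 * used_cores / total_cores


job = JobRecord(submit=0.0, start=30.0, end=40.0)
```

Here the job waits 30 time units to run for 10, so its response time is 40 and its slowdown is 4. Slowdown penalizes waits on short jobs much more than the same wait on long jobs, which matters for the backfilling flavors that follow.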
  • 39. Conservative backfilling • The scheduler provides a reservation for every job at arrival time • Newly arriving jobs can move ahead if they don’t violate any previous reservation • Pros • For every job we know exactly when it will start executing • Limits the slowdown of jobs that would otherwise have difficulty backfilling, e.g., high resource consumers (jobs that need many cores) • Cons • Reduces backfilling opportunities due to the blocking effect of the reservations • E.g., for long-running jobs with low resource requirements
  • 40. Conservative backfilling: Example • [Figure] Courtesy of Ahuva W. Mu’alem & Dror G. Feitelson, “Utilization, Predictability, Workloads, and User Runtime Estimates in Scheduling the IBM SP2 with Backfilling”, used for educational purposes only
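A minimal sketch of conservative backfilling on a single resource (cores), assuming runtime estimates are exact. The function names and the tuple layouts are hypothetical; a production scheduler would also compress the schedule when jobs finish early, which is left out here.

```python
def usage_at(t, reservations):
    """Cores reserved at time t; reservations are (start, end, cores) tuples."""
    return sum(c for s, e, c in reservations if s <= t < e)


def earliest_start(now, total, reservations, cores, runtime):
    """Earliest time the job can run without violating any existing reservation."""
    if cores > total:
        raise ValueError("job can never fit")
    # A feasible start is either 'now' or the end of some reservation.
    candidates = sorted({now} | {e for _, e, _ in reservations if e > now})
    for t in candidates:
        # Usage only rises at reservation starts, so checking t and every
        # start inside (t, t + runtime) covers the whole interval.
        points = [t] + [s for s, _, _ in reservations if t < s < t + runtime]
        if all(usage_at(p, reservations) + cores <= total for p in points):
            return t
    raise ValueError("no feasible start found")


def conservative_schedule(now, total, running, queue):
    """Give every queued job a reservation, in arrival order.

    running: list of (cores, estimated_end); queue: list of (name, cores, runtime).
    Returns {name: reserved start time}; jobs with start == now run immediately.
    """
    reservations = [(now, end, cores) for cores, end in running]
    plan = {}
    for name, cores, runtime in queue:
        start = earliest_start(now, total, reservations, cores, runtime)
        reservations.append((start, start + runtime, cores))
        plan[name] = start
    return plan


plan = conservative_schedule(
    now=0, total=10, running=[(6, 10)],
    queue=[("A", 8, 5), ("B", 4, 20), ("C", 2, 8), ("D", 2, 50)],
)
```

In this example A is reserved at time 10, B cannot start now because it would collide with A's reservation and is reserved at 15, while C and D fit into the hole and start immediately. Every job has a known start time, which is the predictability benefit listed above.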
  • 41. EASY (aggressive) backfilling • Only the first job in the queue gets a reservation • Jobs may move ahead as long as they do not violate the first job’s reservation • Pros • Provides many more opportunities for backfilling compared to conservative backfilling (better utilization) • Cons • We can only tell when the first job in the queue will start • Jobs that inherently have difficulty backfilling may suffer relative to conservative backfilling, since they will get a reservation only when they reach the head of the queue • E.g., high resource consumers (jobs that need many cores)
  • 42. EASY backfilling: Example • [Figure] Courtesy of Ahuva W. Mu’alem & Dror G. Feitelson, “Utilization, Predictability, Workloads, and User Runtime Estimates in Scheduling the IBM SP2 with Backfilling”, used for educational purposes only
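EASY backfilling is commonly described in terms of a "shadow time" (when the first queued job is guaranteed to start) and the "extra" cores left over at that moment. The sketch below follows that description for a single resource (cores); the class and function names are hypothetical and runtime estimates are assumed exact.

```python
from dataclasses import dataclass


@dataclass
class Running:
    cores: int
    end: float  # estimated termination time


@dataclass
class Waiting:
    name: str
    cores: int
    runtime: float  # estimated runtime


def easy_backfill(now, total_cores, running, queue):
    """Return the names of the jobs to start at time 'now'."""
    free = total_cores - sum(r.cores for r in running)
    queue = list(queue)
    started = []
    # Plain FCFS first: start jobs from the head while they fit.
    while queue and queue[0].cores <= free:
        job = queue.pop(0)
        free -= job.cores
        running = running + [Running(job.cores, now + job.runtime)]
        started.append(job.name)
    if not queue:
        return started
    # Reservation for the head job only: find its shadow time and extra cores.
    head = queue[0]
    avail, shadow, extra = free, None, 0
    for r in sorted(running, key=lambda r: r.end):
        avail += r.cores
        if avail >= head.cores:
            shadow, extra = r.end, avail - head.cores
            break
    if shadow is None:
        raise ValueError("head job needs more cores than the pool has")
    # Backfill: a later job may start now if it does not delay the head job.
    for job in queue[1:]:
        if job.cores > free:
            continue
        if now + job.runtime <= shadow:
            free -= job.cores  # finishes before the reservation begins
            started.append(job.name)
        elif job.cores <= extra:
            free -= job.cores  # runs past the shadow time on spare cores
            extra -= job.cores
            started.append(job.name)
    return started


to_start = easy_backfill(
    now=0, total_cores=10, running=[Running(6, 10)],
    queue=[Waiting("A", 8, 5), Waiting("B", 4, 20),
           Waiting("C", 2, 8), Waiting("D", 2, 50)],
)
```

Here A (the head) gets the only reservation, at time 10 with 2 extra cores. B would delay A and stays queued without any guarantee about its own start time, while C (short) and D (narrow enough for the extra cores) are backfilled.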
  • 43. Selective backfilling • Jobs get a reservation only when their expected slowdown exceeds a threshold • If the threshold is chosen judiciously, few jobs should have a reservation at any time, but the most needy jobs are assured of getting one • Pros • Provides many more backfilling opportunities relative to Conservative (good for long narrow jobs) • Reduces slowdown for short resource-consuming jobs (short wide) relative to EASY • Cons • More complicated, e.g., how to choose the optimal threshold?
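The reservation test in selective backfilling can be sketched as follows. The expected slowdown formula used here (wait so far plus estimated runtime, divided by estimated runtime) is one common definition and is an assumption, as are the function names and the threshold value.

```python
def expected_slowdown(wait_so_far, est_runtime):
    """Slowdown the job would see if it started right now."""
    return (wait_so_far + est_runtime) / est_runtime


def jobs_needing_reservation(queue, now, threshold):
    """queue: list of (name, submit_time, estimated_runtime) tuples.

    Only jobs whose expected slowdown exceeds the threshold get a reservation;
    all other jobs remain candidates for backfilling without one.
    """
    return [name for name, submit, est in queue
            if expected_slowdown(now - submit, est) > threshold]


queue = [("short", 0, 10), ("long", 0, 1000)]
needy = jobs_needing_reservation(queue, now=50, threshold=3.0)
```

After waiting 50 time units the short job already has an expected slowdown of 6 and earns a reservation, while the long job sits at 1.05 and does not. A lower threshold moves the behavior toward conservative backfilling, a higher one toward fewer reservations.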
  • 44. Check point • Resource matching challenge ✓ • One job at-a-time ✓ • Single and multiple dimensions ✓ • Considering multiple jobs (look-ahead) ✓ • Max jobs and dynamic programming ✓ • Handling jobs that cannot be scheduled challenge ✓ • First-come-first-served (FCFS) ✓ • Improving utilization while avoiding starvation ✓ • Reservations ✓ • Information based (easy and conservative backfilling) ✓ • Information less (fixed and floating)
  • 45. Information-less (runtimes unknown) 1. Fixed reservation 2. Floating reservation
  • 46. Information-less: fixed reservation • The scheduler performs a reservation for the job on a specific server • The job ‘sticks’ there until enough resources are accumulated to satisfy its requirements – then it starts executing • Pros • Simple • Cons • Waits can be significant (bad luck scenario)
  • 47. Information-less: floating reservation • The scheduler performs a reservation for the job on a specific server • The job ‘sticks’ there for a limited duration, e.g., until a timeout expires – it then may be relocated to a different server • Pros • Reduces risk of long waits (compared to fixed reservation) • Cons • Theoretically does not preclude starvation • Practices from production environment: • In large systems the potential for starvation with floating reservation is very low, since at any given moment many jobs finish (or are about to finish) • In a production environment, fixed reservation might cause jobs with “bad luck” to wait a significant amount of time. “Bad luck” might be caused by a server that does not free its resources, e.g., because of runaway jobs
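Fixed and floating reservations differ only in what happens when the wait drags on, which a small sketch can make concrete. The `Reservation` class, the `step` function, the memory-only resource model, and the relocation policy (move to the server with the most free memory) are all assumptions for illustration; the slides do not specify how the new server is chosen.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Reservation:
    job_mem: int   # GB the job needs
    server: str    # server currently holding the reservation
    since: float   # time the reservation was placed on that server


def step(res: Reservation, free_mem: Dict[str, int], now: float,
         timeout: Optional[float]) -> Tuple[str, str]:
    """One scheduler pass over a reservation.

    timeout=None models a fixed reservation; a number models a floating one.
    """
    if free_mem[res.server] >= res.job_mem:
        return ("start", res.server)
    if timeout is None or now - res.since < timeout:
        return ("wait", res.server)  # a fixed reservation always ends up here
    # Floating: the timeout expired, so move the reservation elsewhere.
    # Assumed policy: pick the server with the most free memory right now.
    target = max(free_mem, key=free_mem.get)
    res.server, res.since = target, now
    return ("relocate", target)


free = {"s1": 8, "s2": 24}
floating = Reservation(job_mem=32, server="s1", since=0.0)
first = step(floating, free, now=5.0, timeout=10.0)
second = step(floating, free, now=10.0, timeout=10.0)
fixed = Reservation(job_mem=32, server="s1", since=0.0)
stuck = step(fixed, free, now=1000.0, timeout=None)
```

The fixed reservation is still waiting on `s1` at time 1000 (for example behind a runaway job), while the floating one moved to `s2` once its timeout expired. Nothing bounds the number of relocations, which is why starvation is not precluded in theory.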
  • 48. Reservations: Summary • Type (Fixed / Dynamic) • Information-based (runtimes known): Conservative backfilling, EASY backfilling, Selective backfilling • Information-less (runtimes unknown): Static reservations, Floating reservations • Practices from production environment: • In the real world, predicting the runtimes of jobs is a difficult problem
  • 49. Check point • Resource matching challenge ✓ • One job at-a-time ✓ • Single and multiple dimensions ✓ • Considering multiple jobs (look-ahead) ✓ • Max jobs and dynamic programming ✓ • Handling jobs that cannot be scheduled challenge ✓ • First-come-first-served (FCFS) ✓ • Improving utilization while avoiding starvation ✓ • Reservations ✓ • Information based (easy and conservative backfilling) ✓ • Information less (fixed and floating) ✓
  • 50. Next lecture: RM part III • Managing multiple data centers (meta-scheduling)