SlideShare une entreprise Scribd logo
1  sur  33
Télécharger pour lire hors ligne
Scheduling DFRS Heuristics Experiments Conclusions
Dynamic Fractional Resource Scheduling
for HPC Workloads
Mark Stillwell1 Frédéric Vivien2,1 Henri Casanova1
1Department of Information and Computer Sciences
University of Hawai’i at M¯anoa
2INRIA, France
Invited Talk, October 8, 2009
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
Formalization
HPC Job Scheduling Problem
0 < N homogeneous nodes
0 < J jobs, each job j has:
arrival time 0 ≤ rj
0 < tj ≤ N tasks
compute time 0 < cj
J not known
rj and tj not known before rj
cj not known until j completes
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
Formalization
Schedule Evaluation
make span not relevant for unrelated jobs
flow time over-emphasizes very long jobs
stretch re-balances in favor of short jobs
average stretch prone to starvation
max stretch helps with average while bounding worst case
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
Current Approaches
Current Approaches
Batch Scheduling, which no one likes
usually FCFS with backfilling
backfilling needs (unreliable) compute time estimates
unbounded wait times
poor resource utilization
No particular objective
Gang Scheduling, which no one uses
globally coordinated time sharing
complicated and slow
memory pressure a concern
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
Dynamic Fractional Resource Scheduling
VM Technology
basically, time sharing
pooling of discrete resources (e.g., multiple CPUs)
hard limits on resource consumption
job preemption and task migration
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
Dynamic Fractional Resource Scheduling
Problem Formulation
extends basic HPC problem
jobs now have per-task CPU need αj and memory
requirement mj
multiple tasks can run on one node if total memory
requirement ≤ 100%
job tasks must be assigned equal amounts of CPU
resource
assigning less than the need results in proportional
slowdown
assigned allocations can change
no run-time estimates
so we need another metric to optimize
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
Dynamic Fractional Resource Scheduling
Yield
Definition
The yield, yj(t) of job j at time t is the ratio of the CPU
allocation given to the job to the job’s CPU need.
requires no knowledge of flow or compute times
can be optimized for at each scheduling event
maximizing minimum yield related to minimizing maximum
stretch
How do we keep track of job progress when the yield can
vary?
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
Dynamic Fractional Resource Scheduling
Virtual Time
Definition
The virtual time vj(t) of job j at time t is the subjective time
experienced by the job.
vj(t) =
t
rj
yj(τ)dτ
job completes when vj(t) = cj
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
Dynamic Fractional Resource Scheduling
The Need for Preemption
final goal is to minimize maximum stretch
without preemption, stretch of non-clairvoyant on-line
algorithms unbounded
consider 2 jobs
both require all of the system resources
one has cj = 1
other has cj = ∆
need criteria to decide which jobs should be preempted
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
Dynamic Fractional Resource Scheduling
Priority
Jobs should be preempted in order by increasing priority.
newly arrived jobs may have infinite priority
1/vj(t) performs well, but subject to starvation
(t − rj)/vj(t) time avoids starvation, but does not perform
well
(t − rj)/(vj(t))2 seems a reasonable compromise
other possibilities exist
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
Greedy Heuristics
Greedy Scheduling Heuristics
GREEDY– Put tasks on the host with the lowest CPU
demand on which it can fit into memory; new jobs may
have to be resubmitted using bounded exponential backoff.
GREEDY-PMTN– Like GREEDY, but older tasks may be
preempted
GREEDY-PMTN-MIGR– Like GREEDY-PMTN, but older tasks
may be migrated as well as preempted
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
MCB Heuristics
Connection to multi-capacity bin packing
For each discrete scheduling event:
problem similar to multi-capacity (vector) bin packing, but
has optimization target and variable CPU allocations
can formulate as an MILP [Stillwell et al., 2009]
(NP-complete)
relaxed LP heuristics slow, give low quality solutions
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
MCB Heuristics
Applying MCB heuristics
yield is continuous, so choose a granularity (0.01)
perform a binary search on yield, seeking to maximize
for each fixed yield, set CPU requirement and apply
heuristic
found yield is the maximized minimum, leftover CPU used
to improve average
if a solution cannot be found at any yield, remove the
lowest priority job and try again
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
MCB Heuristics
MCB8 Heuristic
Based on [Leinberger et al., 1999], simplified to 2-dimensional
case:
1 Put job tasks in two lists: CPU-intensive and
memory-intensive
2 Sort lists by “some criterion”. (MCB8: descending order by
maximum)
3 Starting with the first host, pick tasks that fit in order from
the list that goes against the current imbalance. Example:
current host tasks total 50% CPU and 60% memory
Assign the next task that fits from the list of CPU-intensive
jobs.
4 When no tasks can fit on a host, go to the next host.
5 If all tasks can be placed, then success, otherwise failure.
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
MCB Heuristics
MCB8 Heuristic
Based on [Leinberger et al., 1999], simplified to 2-dimensional
case:
1 Put job tasks in two lists: CPU-intensive and
memory-intensive
2 Sort lists by “some criterion”. (MCB8: descending order by
maximum)
3 Starting with the first host, pick tasks that fit in order from
the list that goes against the current imbalance. Example:
current host tasks total 50% CPU and 60% memory
Assign the next task that fits from the list of CPU-intensive
jobs.
4 When no tasks can fit on a host, go to the next host.
5 If all tasks can be placed, then success, otherwise failure.
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
MCB Heuristics
MCB8 Heuristic
Based on [Leinberger et al., 1999], simplified to 2-dimensional
case:
1 Put job tasks in two lists: CPU-intensive and
memory-intensive
2 Sort lists by “some criterion”. (MCB8: descending order by
maximum)
3 Starting with the first host, pick tasks that fit in order from
the list that goes against the current imbalance. Example:
current host tasks total 50% CPU and 60% memory
Assign the next task that fits from the list of CPU-intensive
jobs.
4 When no tasks can fit on a host, go to the next host.
5 If all tasks can be placed, then success, otherwise failure.
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
MCB Heuristics
MCB8 Heuristic
Based on [Leinberger et al., 1999], simplified to 2-dimensional
case:
1 Put job tasks in two lists: CPU-intensive and
memory-intensive
2 Sort lists by “some criterion”. (MCB8: descending order by
maximum)
3 Starting with the first host, pick tasks that fit in order from
the list that goes against the current imbalance. Example:
current host tasks total 50% CPU and 60% memory
Assign the next task that fits from the list of CPU-intensive
jobs.
4 When no tasks can fit on a host, go to the next host.
5 If all tasks can be placed, then success, otherwise failure.
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
MCB Heuristics
MCB8 Heuristic
Based on [Leinberger et al., 1999], simplified to 2-dimensional
case:
1 Put job tasks in two lists: CPU-intensive and
memory-intensive
2 Sort lists by “some criterion”. (MCB8: descending order by
maximum)
3 Starting with the first host, pick tasks that fit in order from
the list that goes against the current imbalance. Example:
current host tasks total 50% CPU and 60% memory
Assign the next task that fits from the list of CPU-intensive
jobs.
4 When no tasks can fit on a host, go to the next host.
5 If all tasks can be placed, then success, otherwise failure.
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
MCB Heuristics
MCB8 Heuristic
Based on [Leinberger et al., 1999], simplified to 2-dimensional
case:
1 Put job tasks in two lists: CPU-intensive and
memory-intensive
2 Sort lists by “some criterion”. (MCB8: descending order by
maximum)
3 Starting with the first host, pick tasks that fit in order from
the list that goes against the current imbalance. Example:
current host tasks total 50% CPU and 60% memory
Assign the next task that fits from the list of CPU-intensive
jobs.
4 When no tasks can fit on a host, go to the next host.
5 If all tasks can be placed, then success, otherwise failure.
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
MCB Heuristics
MCB8 Heuristic
Based on [Leinberger et al., 1999], simplified to 2-dimensional
case:
1 Put job tasks in two lists: CPU-intensive and
memory-intensive
2 Sort lists by “some criterion”. (MCB8: descending order by
maximum)
3 Starting with the first host, pick tasks that fit in order from
the list that goes against the current imbalance. Example:
current host tasks total 50% CPU and 60% memory
Assign the next task that fits from the list of CPU-intensive
jobs.
4 When no tasks can fit on a host, go to the next host.
5 If all tasks can be placed, then success, otherwise failure.
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
MCB Heuristics
MCB8 Scheduling Heuristics
DYNMCB8– Apply heuristic on every event
DYNMCB8-PER– Apply heuristic periodically
DYNMCB8-ASAP-PER– like DYNMCB8-PER, but try to
greedily schedule incoming jobs
DYNMCB8-STRETCH-PER– like DYNMCB8-PER, but try to
optimize worst-case max stretch
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
Methodology
Methodology
discrete event simulator takes list of jobs and returns
stretch values
workloads based on synthetic and real traces
synthetic workload arrival times scaled to show
performance on different load conditions
algorithms evaluated by per-trace degredation factor
experiment with “free” preemption/migration and
experiment where preemption/migration costs job a
constant amount of wall clock time.
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
Results
Average Maximum Yield, No preemption/migration
penalty
1
10
100
1000
10000
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
DegredationFactor
Load
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
Results
Average Maximum Yield, No preemption/migration
penalty
1
10
100
1000
10000
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
DegredationFactor
Load
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
Results
Average Maximum Yield, No preemption/migration
penalty
1
10
100
1000
10000
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
DegredationFactor
Load
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
Results
Average Maximum Yield, 5 minute
preemption/migration penalty
1
10
100
1000
10000
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
DegredationFactor
Load
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
Results
Average Maximum Yield, 5 minute
preemption/migration penalty
1
10
100
1000
10000
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
DegredationFactor
Load
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
Results
Average Maximum Yield, 5 minute
preemption/migration penalty
1
10
100
1000
10000
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
DegredationFactor
Load
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
Results
Comparison of Synthetic vs. Real workload results
Scaled synth. Unscaled synth. Real-world
Algs Deg. factor Deg. factor Deg. factor
avg. max avg. max avg. max
EASY 167 560 139 443 94 1476
FCFS 186 569 154 476 118 2219
greedy 294 1093 249 1050 153 1527
greedyp 41 875 35 785 9 147
greedypm 62 835 37 773 17 759
mcb 32 162 11 162 11 231
mcbp 1 12 2 21 3 20
gmcbp 1 9 2 22 2 20
mcbsp 1 12 2 21 3 23
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
Results
Computation Times
Most scheduling events involve 10 or fewer jobs and
require negligible time for all schedulers.
Even when there are about 100 jobs, the time for MCB8 is
under 5 seconds on a 3.2Ghz machine
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
Results
Costs
Greedy approaches use significantly less bandwidth than
MCB approaches (<1GB/s in the worst case)
MCB approaches cause jobs to be preempted around 5
times on average.
DYNMCB8 uses 1.3GB/s on average, 5.1GB/s maximum
periodic algorithms 0.6GB/s on average, 2.1GB/s
maximum
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Scheduling DFRS Heuristics Experiments Conclusions
Conclusions
DFRS potentially much better than batch scheduling
multi-capacity bin packing heuristics perform best
targeting yield does as well as targeting worst case max
stretch
periodic MCB approaches perform nearly as well as
aggressive ones when there is no migration cost and much
better when there is a fixed migration cost
adding an opportunistic greedy scheduling heuristic to
DYNMCB8-PER gives no real benefit to max stretch
MCB approaches can calculate resource allocations
reasonably quickly
MCB approaches need to try to mitigate
migration/preemptions costs
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads
Appendix
For Further Reading
References I
Leinberger, W., Karypis, G., and Kumar, V. (1999).
Multi-capacity bin packing algorithms with applications to
job scheduling under multiple constraints.
In Proc. of the Intl. Conf. on Parallel Processing, pages
404–412. IEEE.
Stillwell, M., Shanzenbach, D., Vivien, F., and Casanova, H.
(2009).
Resource Allocation using Virtual Clusters.
In Proc. of CCGrid 2009, pages 260–267. IEEE.
M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA
Dynamic Fractional Resource Schedulingfor HPC Workloads

Contenu connexe

Tendances

A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...
A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...
A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...IRJET Journal
 
Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems
Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems
Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems iosrjce
 
Improved Max-Min Scheduling Algorithm
Improved Max-Min Scheduling AlgorithmImproved Max-Min Scheduling Algorithm
Improved Max-Min Scheduling Algorithmiosrjce
 
Max Min Fair Scheduling Algorithm using In Grid Scheduling with Load Balancing
Max Min Fair Scheduling Algorithm using In Grid Scheduling with Load Balancing Max Min Fair Scheduling Algorithm using In Grid Scheduling with Load Balancing
Max Min Fair Scheduling Algorithm using In Grid Scheduling with Load Balancing IJORCS
 
TEMPORALLY EXTENDED ACTIONS FOR REINFORCEMENT LEARNING BASED SCHEDULERS
TEMPORALLY EXTENDED ACTIONS FOR REINFORCEMENT LEARNING BASED SCHEDULERSTEMPORALLY EXTENDED ACTIONS FOR REINFORCEMENT LEARNING BASED SCHEDULERS
TEMPORALLY EXTENDED ACTIONS FOR REINFORCEMENT LEARNING BASED SCHEDULERSijscai
 
Modified heuristic time deviation Technique for job sequencing and Computatio...
Modified heuristic time deviation Technique for job sequencing and Computatio...Modified heuristic time deviation Technique for job sequencing and Computatio...
Modified heuristic time deviation Technique for job sequencing and Computatio...ijcsit
 
Reinforcement learning based multi core scheduling (RLBMCS) for real time sys...
Reinforcement learning based multi core scheduling (RLBMCS) for real time sys...Reinforcement learning based multi core scheduling (RLBMCS) for real time sys...
Reinforcement learning based multi core scheduling (RLBMCS) for real time sys...IJECEIAES
 
(Slides) Task scheduling algorithm for multicore processor system for minimiz...
(Slides) Task scheduling algorithm for multicore processor system for minimiz...(Slides) Task scheduling algorithm for multicore processor system for minimiz...
(Slides) Task scheduling algorithm for multicore processor system for minimiz...Naoki Shibata
 
Resource Management for Computer Operating Systems
Resource Management for Computer Operating SystemsResource Management for Computer Operating Systems
Resource Management for Computer Operating Systemsinside-BigData.com
 

Tendances (12)

K017446974
K017446974K017446974
K017446974
 
A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...
A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...
A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...
 
genetic paper
genetic papergenetic paper
genetic paper
 
Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems
Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems
Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems
 
Improved Max-Min Scheduling Algorithm
Improved Max-Min Scheduling AlgorithmImproved Max-Min Scheduling Algorithm
Improved Max-Min Scheduling Algorithm
 
Max Min Fair Scheduling Algorithm using In Grid Scheduling with Load Balancing
Max Min Fair Scheduling Algorithm using In Grid Scheduling with Load Balancing Max Min Fair Scheduling Algorithm using In Grid Scheduling with Load Balancing
Max Min Fair Scheduling Algorithm using In Grid Scheduling with Load Balancing
 
TEMPORALLY EXTENDED ACTIONS FOR REINFORCEMENT LEARNING BASED SCHEDULERS
TEMPORALLY EXTENDED ACTIONS FOR REINFORCEMENT LEARNING BASED SCHEDULERSTEMPORALLY EXTENDED ACTIONS FOR REINFORCEMENT LEARNING BASED SCHEDULERS
TEMPORALLY EXTENDED ACTIONS FOR REINFORCEMENT LEARNING BASED SCHEDULERS
 
Modified heuristic time deviation Technique for job sequencing and Computatio...
Modified heuristic time deviation Technique for job sequencing and Computatio...Modified heuristic time deviation Technique for job sequencing and Computatio...
Modified heuristic time deviation Technique for job sequencing and Computatio...
 
Job shop scheduling problem using genetic algorithm
Job shop scheduling problem using genetic algorithmJob shop scheduling problem using genetic algorithm
Job shop scheduling problem using genetic algorithm
 
Reinforcement learning based multi core scheduling (RLBMCS) for real time sys...
Reinforcement learning based multi core scheduling (RLBMCS) for real time sys...Reinforcement learning based multi core scheduling (RLBMCS) for real time sys...
Reinforcement learning based multi core scheduling (RLBMCS) for real time sys...
 
(Slides) Task scheduling algorithm for multicore processor system for minimiz...
(Slides) Task scheduling algorithm for multicore processor system for minimiz...(Slides) Task scheduling algorithm for multicore processor system for minimiz...
(Slides) Task scheduling algorithm for multicore processor system for minimiz...
 
Resource Management for Computer Operating Systems
Resource Management for Computer Operating SystemsResource Management for Computer Operating Systems
Resource Management for Computer Operating Systems
 

En vedette

DevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and dockerDevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and dockerMark Stillwell
 
Dynamic Fractional Resource Scheduling -- 2010, ARCS
Dynamic Fractional Resource Scheduling -- 2010, ARCSDynamic Fractional Resource Scheduling -- 2010, ARCS
Dynamic Fractional Resource Scheduling -- 2010, ARCSMark Stillwell
 
ギークハウスエチオピアProject 一部削除版
ギークハウスエチオピアProject 一部削除版ギークハウスエチオピアProject 一部削除版
ギークハウスエチオピアProject 一部削除版Naoki Kitamura
 
Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...
Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...
Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...Mark Stillwell
 
Resource Allocation using Virtual Clusters
Resource Allocation using Virtual ClustersResource Allocation using Virtual Clusters
Resource Allocation using Virtual ClustersMark Stillwell
 
DevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and dockerDevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and dockerMark Stillwell
 

En vedette (9)

DevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and dockerDevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and docker
 
Dynamic Fractional Resource Scheduling -- 2010, ARCS
Dynamic Fractional Resource Scheduling -- 2010, ARCSDynamic Fractional Resource Scheduling -- 2010, ARCS
Dynamic Fractional Resource Scheduling -- 2010, ARCS
 
ギークハウスエチオピアProject 一部削除版
ギークハウスエチオピアProject 一部削除版ギークハウスエチオピアProject 一部削除版
ギークハウスエチオピアProject 一部削除版
 
Main
MainMain
Main
 
H2C
H2CH2C
H2C
 
Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...
Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...
Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...
 
Resource Allocation using Virtual Clusters
Resource Allocation using Virtual ClustersResource Allocation using Virtual Clusters
Resource Allocation using Virtual Clusters
 
DevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and dockerDevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and docker
 
HARNESS project Demo
HARNESS project DemoHARNESS project Demo
HARNESS project Demo
 

Similaire à Dynamic Fractional Resource Scheduling For HPC Workloads -- 2009, Lyon

Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsLEGATO project
 
Real Time most famous algorithms
Real Time most famous algorithmsReal Time most famous algorithms
Real Time most famous algorithmsAndrea Tino
 
mrcspbayraksan
mrcspbayraksanmrcspbayraksan
mrcspbayraksanYifan Liu
 
Adaptive check-pointing and replication strategy to tolerate faults in comput...
Adaptive check-pointing and replication strategy to tolerate faults in comput...Adaptive check-pointing and replication strategy to tolerate faults in comput...
Adaptive check-pointing and replication strategy to tolerate faults in comput...IOSR Journals
 
An Energy Efficient Demand- Response Model for High performance Computing System
An Energy Efficient Demand- Response Model for High performance Computing SystemAn Energy Efficient Demand- Response Model for High performance Computing System
An Energy Efficient Demand- Response Model for High performance Computing SystemJason Liu
 
Parallel & Distributed Computing
Parallel & Distributed ComputingParallel & Distributed Computing
Parallel & Distributed Computingrohit_ainapure
 
Cpu scheduling topics
Cpu scheduling topicsCpu scheduling topics
Cpu scheduling topicsSarzamin Khan
 
A performance-aware power capping orchestrator for the Xen hypervisor
A performance-aware power capping orchestrator for the Xen hypervisorA performance-aware power capping orchestrator for the Xen hypervisor
A performance-aware power capping orchestrator for the Xen hypervisorNECST Lab @ Politecnico di Milano
 
LCU13: Power-efficient scheduling, and the latest news from the kernel summit
LCU13: Power-efficient scheduling, and the latest news from the kernel summitLCU13: Power-efficient scheduling, and the latest news from the kernel summit
LCU13: Power-efficient scheduling, and the latest news from the kernel summitLinaro
 
ROBUST OPTIMIZATION FOR RCPSP UNDER UNCERTAINTY
ROBUST OPTIMIZATION FOR RCPSP UNDER UNCERTAINTYROBUST OPTIMIZATION FOR RCPSP UNDER UNCERTAINTY
ROBUST OPTIMIZATION FOR RCPSP UNDER UNCERTAINTYijseajournal
 
XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...
XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...
XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...NECST Lab @ Politecnico di Milano
 
fpbm- pg subject in Construction Managament
fpbm- pg subject in Construction Managamentfpbm- pg subject in Construction Managament
fpbm- pg subject in Construction Managamentdeepika977036
 

Similaire à Dynamic Fractional Resource Scheduling For HPC Workloads -- 2009, Lyon (20)

Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
 
task_sched2.ppt
task_sched2.ppttask_sched2.ppt
task_sched2.ppt
 
Grds conferences icst and icbelsh (5)
Grds conferences icst and icbelsh (5)Grds conferences icst and icbelsh (5)
Grds conferences icst and icbelsh (5)
 
Real Time most famous algorithms
Real Time most famous algorithmsReal Time most famous algorithms
Real Time most famous algorithms
 
mrcspbayraksan
mrcspbayraksanmrcspbayraksan
mrcspbayraksan
 
Adaptive check-pointing and replication strategy to tolerate faults in comput...
Adaptive check-pointing and replication strategy to tolerate faults in comput...Adaptive check-pointing and replication strategy to tolerate faults in comput...
Adaptive check-pointing and replication strategy to tolerate faults in comput...
 
K017126670
K017126670K017126670
K017126670
 
K017617786
K017617786K017617786
K017617786
 
Real time-embedded-system-lec-02
Real time-embedded-system-lec-02Real time-embedded-system-lec-02
Real time-embedded-system-lec-02
 
Real time-embedded-system-lec-02
Real time-embedded-system-lec-02Real time-embedded-system-lec-02
Real time-embedded-system-lec-02
 
An Energy Efficient Demand- Response Model for High performance Computing System
An Energy Efficient Demand- Response Model for High performance Computing SystemAn Energy Efficient Demand- Response Model for High performance Computing System
An Energy Efficient Demand- Response Model for High performance Computing System
 
Parallel & Distributed Computing
Parallel & Distributed ComputingParallel & Distributed Computing
Parallel & Distributed Computing
 
ESC UNIT 3.ppt
ESC UNIT 3.pptESC UNIT 3.ppt
ESC UNIT 3.ppt
 
Cpu scheduling topics
Cpu scheduling topicsCpu scheduling topics
Cpu scheduling topics
 
A performance-aware power capping orchestrator for the Xen hypervisor
A performance-aware power capping orchestrator for the Xen hypervisorA performance-aware power capping orchestrator for the Xen hypervisor
A performance-aware power capping orchestrator for the Xen hypervisor
 
LCU13: Power-efficient scheduling, and the latest news from the kernel summit
LCU13: Power-efficient scheduling, and the latest news from the kernel summitLCU13: Power-efficient scheduling, and the latest news from the kernel summit
LCU13: Power-efficient scheduling, and the latest news from the kernel summit
 
ROBUST OPTIMIZATION FOR RCPSP UNDER UNCERTAINTY
ROBUST OPTIMIZATION FOR RCPSP UNDER UNCERTAINTYROBUST OPTIMIZATION FOR RCPSP UNDER UNCERTAINTY
ROBUST OPTIMIZATION FOR RCPSP UNDER UNCERTAINTY
 
XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...
XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...
XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...
 
fpbm- pg subject in Construction Managament
fpbm- pg subject in Construction Managamentfpbm- pg subject in Construction Managament
fpbm- pg subject in Construction Managament
 
Chap 4.ppt
Chap 4.pptChap 4.ppt
Chap 4.ppt
 

Dernier

TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 

Dernier (20)

TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 

Dynamic Fractional Resource Scheduling For HPC Workloads -- 2009, Lyon

  • 1. Scheduling DFRS Heuristics Experiments Conclusions Dynamic Fractional Resource Scheduling for HPC Workloads Mark Stillwell1 Frédéric Vivien2,1 Henri Casanova1 1Department of Information and Computer Sciences University of Hawai’i at M¯anoa 2INRIA, France Invited Talk, October 8, 2009 M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 2. Scheduling DFRS Heuristics Experiments Conclusions Formalization HPC Job Scheduling Problem 0 < N homogeneous nodes 0 < J jobs, each job j has: arrival time 0 ≤ rj 0 < tj ≤ N tasks compute time 0 < cj J not known rj and tj not known before rj cj not known until j completes M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 3. Scheduling DFRS Heuristics Experiments Conclusions Formalization Schedule Evaluation make span not relevant for unrelated jobs flow time over-emphasizes very long jobs stretch re-balances in favor of short jobs average stretch prone to starvation max stretch helps with average while bounding worst case M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 4. Scheduling DFRS Heuristics Experiments Conclusions Current Approaches Current Approaches Batch Scheduling, which no one likes usually FCFS with backfilling backfilling needs (unreliable) compute time estimates unbounded wait times poor resource utilization No particular objective Gang Scheduling, which no one uses globally coordinated time sharing complicated and slow memory pressure a concern M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 5. Scheduling DFRS Heuristics Experiments Conclusions Dynamic Fractional Resource Scheduling VM Technology basically, time sharing pooling of discrete resources (e.g., multiple CPUs) hard limits on resource consumption job preemption and task migration M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 6. Scheduling DFRS Heuristics Experiments Conclusions Dynamic Fractional Resource Scheduling Problem Formulation extends basic HPC problem jobs now have per-task CPU need αj and memory requirement mj multiple tasks can run on one node if total memory requirement ≤ 100% job tasks must be assigned equal amounts of CPU resource assigning less than the need results in proportional slowdown assigned allocations can change no run-time estimates so we need another metric to optimize M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 7. Scheduling DFRS Heuristics Experiments Conclusions Dynamic Fractional Resource Scheduling Yield Definition The yield, yj(t) of job j at time t is the ratio of the CPU allocation given to the job to the job’s CPU need. requires no knowledge of flow or compute times can be optimized for at each scheduling event maximizing minimum yield related to minimizing maximum stretch How do we keep track of job progress when the yield can vary? M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 8. Scheduling DFRS Heuristics Experiments Conclusions Dynamic Fractional Resource Scheduling Virtual Time Definition The virtual time vj(t) of job j at time t is the subjective time experienced by the job. vj(t) = t rj yj(τ)dτ job completes when vj(t) = cj M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 9. Scheduling DFRS Heuristics Experiments Conclusions Dynamic Fractional Resource Scheduling The Need for Preemption final goal is to minimize maximum stretch without preemption, stretch of non-clairvoyant on-line algorithms unbounded consider 2 jobs both require all of the system resources one has cj = 1 other has cj = ∆ need criteria to decide which jobs should be preempted M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 10. Scheduling DFRS Heuristics Experiments Conclusions Dynamic Fractional Resource Scheduling Priority Jobs should be preempted in order by increasing priority. newly arrived jobs may have infinite priority 1/vj(t) performs well, but subject to starvation (t − rj)/vj(t) time avoids starvation, but does not perform well (t − rj)/(vj(t))2 seems a reasonable compromise other possibilities exist M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 11. Scheduling DFRS Heuristics Experiments Conclusions Greedy Heuristics Greedy Scheduling Heuristics GREEDY– Put tasks on the host with the lowest CPU demand on which it can fit into memory; new jobs may have to be resubmitted using bounded exponential backoff. GREEDY-PMTN– Like GREEDY, but older tasks may be preempted GREEDY-PMTN-MIGR– Like GREEDY-PMTN, but older tasks may be migrated as well as preempted M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 12. Scheduling DFRS Heuristics Experiments Conclusions MCB Heuristics Connection to multi-capacity bin packing For each discrete scheduling event: problem similar to multi-capacity (vector) bin packing, but has optimization target and variable CPU allocations can formulate as an MILP [Stillwell et al., 2009] (NP-complete) relaxed LP heuristics slow, give low quality solutions M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 13. Scheduling DFRS Heuristics Experiments Conclusions MCB Heuristics Applying MCB heuristics yield is continuous, so choose a granularity (0.01) perform a binary search on yield, seeking to maximize for each fixed yield, set CPU requirement and apply heuristic found yield is the maximized minimum, leftover CPU used to improve average if a solution cannot be found at any yield, remove the lowest priority job and try again M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 14. Scheduling DFRS Heuristics Experiments Conclusions MCB Heuristics MCB8 Heuristic Based on [Leinberger et al., 1999], simplified to 2-dimensional case: 1 Put job tasks in two lists: CPU-intensive and memory-intensive 2 Sort lists by “some criterion”. (MCB8: descending order by maximum) 3 Starting with the first host, pick tasks that fit in order from the list that goes against the current imbalance. Example: current host tasks total 50% CPU and 60% memory Assign the next task that fits from the list of CPU-intensive jobs. 4 When no tasks can fit on a host, go to the next host. 5 If all tasks can be placed, then success, otherwise failure. M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 15. Scheduling DFRS Heuristics Experiments Conclusions MCB Heuristics MCB8 Heuristic Based on [Leinberger et al., 1999], simplified to 2-dimensional case: 1 Put job tasks in two lists: CPU-intensive and memory-intensive 2 Sort lists by “some criterion”. (MCB8: descending order by maximum) 3 Starting with the first host, pick tasks that fit in order from the list that goes against the current imbalance. Example: current host tasks total 50% CPU and 60% memory Assign the next task that fits from the list of CPU-intensive jobs. 4 When no tasks can fit on a host, go to the next host. 5 If all tasks can be placed, then success, otherwise failure. M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 16. Scheduling DFRS Heuristics Experiments Conclusions MCB Heuristics MCB8 Heuristic Based on [Leinberger et al., 1999], simplified to 2-dimensional case: 1 Put job tasks in two lists: CPU-intensive and memory-intensive 2 Sort lists by “some criterion”. (MCB8: descending order by maximum) 3 Starting with the first host, pick tasks that fit in order from the list that goes against the current imbalance. Example: current host tasks total 50% CPU and 60% memory Assign the next task that fits from the list of CPU-intensive jobs. 4 When no tasks can fit on a host, go to the next host. 5 If all tasks can be placed, then success, otherwise failure. M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 17. Scheduling DFRS Heuristics Experiments Conclusions MCB Heuristics MCB8 Heuristic Based on [Leinberger et al., 1999], simplified to 2-dimensional case: 1 Put job tasks in two lists: CPU-intensive and memory-intensive 2 Sort lists by “some criterion”. (MCB8: descending order by maximum) 3 Starting with the first host, pick tasks that fit in order from the list that goes against the current imbalance. Example: current host tasks total 50% CPU and 60% memory Assign the next task that fits from the list of CPU-intensive jobs. 4 When no tasks can fit on a host, go to the next host. 5 If all tasks can be placed, then success, otherwise failure. M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 18. Scheduling DFRS Heuristics Experiments Conclusions MCB Heuristics MCB8 Heuristic Based on [Leinberger et al., 1999], simplified to 2-dimensional case: 1 Put job tasks in two lists: CPU-intensive and memory-intensive 2 Sort lists by “some criterion”. (MCB8: descending order by maximum) 3 Starting with the first host, pick tasks that fit in order from the list that goes against the current imbalance. Example: current host tasks total 50% CPU and 60% memory Assign the next task that fits from the list of CPU-intensive jobs. 4 When no tasks can fit on a host, go to the next host. 5 If all tasks can be placed, then success, otherwise failure. M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 19. Scheduling DFRS Heuristics Experiments Conclusions MCB Heuristics MCB8 Heuristic Based on [Leinberger et al., 1999], simplified to 2-dimensional case: 1 Put job tasks in two lists: CPU-intensive and memory-intensive 2 Sort lists by “some criterion”. (MCB8: descending order by maximum) 3 Starting with the first host, pick tasks that fit in order from the list that goes against the current imbalance. Example: current host tasks total 50% CPU and 60% memory Assign the next task that fits from the list of CPU-intensive jobs. 4 When no tasks can fit on a host, go to the next host. 5 If all tasks can be placed, then success, otherwise failure. M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 20. Scheduling DFRS Heuristics Experiments Conclusions MCB Heuristics MCB8 Heuristic Based on [Leinberger et al., 1999], simplified to 2-dimensional case: 1 Put job tasks in two lists: CPU-intensive and memory-intensive 2 Sort lists by “some criterion”. (MCB8: descending order by maximum) 3 Starting with the first host, pick tasks that fit in order from the list that goes against the current imbalance. Example: current host tasks total 50% CPU and 60% memory Assign the next task that fits from the list of CPU-intensive jobs. 4 When no tasks can fit on a host, go to the next host. 5 If all tasks can be placed, then success, otherwise failure. M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 21. Scheduling DFRS Heuristics Experiments Conclusions MCB Heuristics MCB8 Scheduling Heuristics DYNMCB8– Apply heuristic on every event DYNMCB8-PER– Apply heuristic periodically DYNMCB8-ASAP-PER– like DYNMCB8-PER, but try to greedily schedule incoming jobs DYNMCB8-STRETCH-PER– like DYNMCB8-PER, but try to optimize worst-case max stretch M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 22. Scheduling DFRS Heuristics Experiments Conclusions Methodology Methodology discrete event simulator takes list of jobs and returns stretch values workloads based on synthetic and real traces synthetic workload arrival times scaled to show performance on different load conditions algorithms evaluated by per-trace degredation factor experiment with “free” preemption/migration and experiment where preemption/migration costs job a constant amount of wall clock time. M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 23. Scheduling DFRS Heuristics Experiments Conclusions Results Average Maximum Yield, No preemption/migration penalty 1 10 100 1000 10000 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 DegredationFactor Load M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 24. Scheduling DFRS Heuristics Experiments Conclusions Results Average Maximum Yield, No preemption/migration penalty 1 10 100 1000 10000 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 DegredationFactor Load M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 25. Scheduling DFRS Heuristics Experiments Conclusions Results Average Maximum Yield, No preemption/migration penalty 1 10 100 1000 10000 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 DegredationFactor Load M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 26. Scheduling DFRS Heuristics Experiments Conclusions Results Average Maximum Yield, 5 minute preemption/migration penalty 1 10 100 1000 10000 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 DegredationFactor Load M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 27. Scheduling DFRS Heuristics Experiments Conclusions Results Average Maximum Yield, 5 minute preemption/migration penalty 1 10 100 1000 10000 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 DegredationFactor Load M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 28. Scheduling DFRS Heuristics Experiments Conclusions Results Average Maximum Yield, 5 minute preemption/migration penalty 1 10 100 1000 10000 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 DegredationFactor Load M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 29. Scheduling DFRS Heuristics Experiments Conclusions Results Comparison of Synthetic vs. Real workload results Scaled synth. Unscaled synth. Real-world Algs Deg. factor Deg. factor Deg. factor avg. max avg. max avg. max EASY 167 560 139 443 94 1476 FCFS 186 569 154 476 118 2219 greedy 294 1093 249 1050 153 1527 greedyp 41 875 35 785 9 147 greedypm 62 835 37 773 17 759 mcb 32 162 11 162 11 231 mcbp 1 12 2 21 3 20 gmcbp 1 9 2 22 2 20 mcbsp 1 12 2 21 3 23 M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 30. Scheduling DFRS Heuristics Experiments Conclusions Results Computation Times Most scheduling events involve 10 or fewer jobs and require negligible time for all schedulers. Even when there are about 100 jobs, the time for MCB8 is under 5 seconds on a 3.2Ghz machine M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 31. Scheduling DFRS Heuristics Experiments Conclusions Results Costs Greedy approaches use significantly less bandwidth than MCB approaches (<1GB/s in the worst case) MCB approaches cause jobs to be preempted around 5 times on average. DYNMCB8 uses 1.3GB/s on average, 5.1GB/s maximum periodic algorithms 0.6GB/s on average, 2.1GB/s maximum M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 32. Scheduling DFRS Heuristics Experiments Conclusions Conclusions DFRS potentially much better than batch scheduling multi-capacity bin packing heuristics perform best targeting yield does as well as targeting worst case max stretch periodic MCB approaches perform nearly as well as aggressive ones when there is no migration cost and much better when there is a fixed migration cost adding an opportunistic greedy scheduling heuristic to DYNMCB8-PER gives no real benefit to max stretch MCB approaches can calculate resource allocations reasonably quickly MCB approaches need to try to mitigate migration/preemptions costs M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads
  • 33. Appendix For Further Reading References I Leinberger, W., Karypis, G., and Kumar, V. (1999). Multi-capacity bin packing algorithms with applications to job scheduling under multiple constraints. In Proc. of the Intl. Conf. on Parallel Processing, pages 404–412. IEEE. Stillwell, M., Shanzenbach, D., Vivien, F., and Casanova, H. (2009). Resource Allocation using Virtual Clusters. In Proc. of CCGrid 2009, pages 260–267. IEEE. M Stillwell, F Vivien, H Casanova UH Manoa ICS, INRIA Dynamic Fractional Resource Schedulingfor HPC Workloads