The document describes a resource-aware scheduler for Hadoop that aims to improve task scheduling by considering both job resource demands and node resource availability. It captures job and node profiles, estimates task execution times, and applies scheduling policies like shortest job first. Evaluation on word count and Pi estimation workloads showed the estimated task times closely matched the actual times, demonstrating the accuracy of the scheduler's resource modeling and estimations.
Resource-Aware Scheduling for Hadoop [Final Presentation]
1. National University of Singapore
School of Computing
Department of Information Systems
Lu Wei
Project No: H064420
Supervisor: Professor Tan Kian-Lee
RESOURCE-AWARE SCHEDULING FOR HADOOP
6. Early Schedulers
• FIFO: MapReduce default, by Google
– Priority level & submission time
– Data locality
– Problem: starvation of other jobs in the presence of a long-running job
• Hadoop On Demand (HOD): by Yahoo!
– Fairness: static node allocation using the Torque Resource Manager
– Problem: poor data locality & underutilization
7. Mainstream Schedulers
• Fair Scheduler: by Facebook
– Fairness: dynamic resource redistribution
– Challenges:
• Data locality – solved with delay scheduling
• Reduce/map dependence – solved with copy-compute splitting
• Capacity Scheduler: by Yahoo!
– Similar to Fair Scheduler
– Special support for memory-intensive jobs
8. Alternative Schedulers
• Adaptive Scheduler (2010-2011)
– Goal/deadline oriented
– Adaptively establishes predictions by job matching
– Problem: strong assumptions & questionable performance
• Machine Learning Approach (2010)
– Naïve Bayes & Perceptron with the aid of user hints
– Better performance than FIFO
– Underutilization during the learning phase & overhead
9. Existing Schedulers

| Scheduler | Pro | Con | Resource-Awareness |
|---|---|---|---|
| FIFO | High throughput | Starvation of short jobs | Data locality |
| HOD | Sharing of cluster | Poor data locality & underutilization | – |
| Fair Scheduler | Fairness & dynamic resource re-allocation | Complicated configuration | Data locality; copy-compute splitting |
| Capacity Scheduler | Similar to FS | Similar to FS | Special support for memory-intensive jobs |
| Adaptive Scheduler | Adaptive approach | Strong assumptions & questionable performance | Resource utilization control using job matching |
| Machine Learning | Reported better performance than FIFO | Underutilization during learning phase & overhead | Resource utilization control using pattern classification |
10. Motivations
• Heterogeneity by Configuration
– Hardware capacity differences across the cluster
• Heterogeneity by Usage
– All task slots are treated equally, without consideration of the resource status of the current node or the resource demand of queuing jobs
– Possible that a CPU-busy node is assigned a CPU-intensive job, and an I/O-busy node an I/O-intensive job
12. Design Overview
1. Capture
– the job's resource demand characteristics
– the TaskTracker's static capability & runtime usage status
2. Combine and transform into quantified measurements
3. Predict how fast a given TaskTracker is expected to finish a given task
4. Apply the scheduling policy of choice
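The four steps above can be sketched as a minimal scheduling loop. This is an illustrative sketch only: all class and function names here are assumptions for exposition, not identifiers from the thesis' actual implementation.

```python
# Hypothetical sketch of the four-step design; names are illustrative only.
from dataclasses import dataclass

@dataclass
class JobProfile:          # step 1a: job resource-demand characteristics
    cpu_time: float        # sampled CPU seconds per task
    disk_bytes: int        # sampled disk I/O per task

@dataclass
class NodeProfile:         # step 1b: TaskTracker static capability + runtime usage
    cpu_capacity: float    # normalized CPU score
    cpu_available: float   # share of CPU currently free
    disk_bandwidth: float  # bytes/sec

def estimate_task_time(job: JobProfile, node: NodeProfile) -> float:
    """Steps 2-3: combine both profiles into a predicted task time on this node."""
    cpu_part = job.cpu_time / max(node.cpu_capacity * node.cpu_available, 1e-9)
    disk_part = job.disk_bytes / node.disk_bandwidth
    return cpu_part + disk_part

def pick_job(jobs, node):
    """Step 4: apply the scheduling policy of choice -- here, shortest job first."""
    return min(jobs, key=lambda j: estimate_task_time(j, node))
```

A slower or busier node raises the estimate for every job, so the same job list can yield different picks on different TaskTrackers.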
15. Design Details
• Task Processing Time Estimation

$$t_{estimate} = t_{e\text{-}cpu} + t_{e\text{-}disk} + t_{e\text{-}network}$$

$$t_{estimate} = t_{s\text{-}cpu} \times \frac{c_{s\text{-}cpu}}{c_{cpu}} + t_{e\text{-}disk\text{-}in} + t_{e\text{-}disk\text{-}out} + t_{e\text{-}disk\text{-}spill} + t_{e\text{-}network\text{-}in} + t_{e\text{-}network\text{-}out}$$

$$t_{e\text{-}disk\text{-}in} = t_{s\text{-}disk\text{-}in} \times \frac{c_{s\text{-}disk\text{-}read}}{c_{disk\text{-}read}} \times \frac{s_{disk\text{-}in}}{s_{s\text{-}disk\text{-}in}}$$

$$s_{disk\text{-}spill} = \frac{s_{s\text{-}disk\text{-}spill}}{s_{s\text{-}in}} \times s_{in}$$

$$s_{network\text{-}out} = \frac{s_{out}}{N_{total\text{-}reduce}} = \frac{\beta_{s\text{-}oi\text{-}ratio} \times s_{in}}{N_{total\text{-}reduce}}$$
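The estimation formulas can be transcribed directly into code. The sketch below is illustrative only; it assumes (based on the later "map phase sample" slide) that the s-prefixed symbols are values measured from a sample task, while the unprefixed ones describe the target node and task.

```python
# Illustrative-only transcription of the estimation formulas; argument names
# mirror the slide's symbols ("s_" prefixes denote sample-task measurements).

def scaled_cpu_time(t_s_cpu, c_s_cpu, c_cpu):
    # CPU term: sample CPU time scaled by the ratio of sample-node
    # to target-node CPU capacity.
    return t_s_cpu * (c_s_cpu / c_cpu)

def disk_in_time(t_s_disk_in, c_s_disk_read, c_disk_read, s_disk_in, s_s_disk_in):
    # Disk-read term: scaled by both the read-speed ratio and the
    # input-size ratio relative to the sample task.
    return t_s_disk_in * (c_s_disk_read / c_disk_read) * (s_disk_in / s_s_disk_in)

def spill_size(s_s_disk_spill, s_s_in, s_in):
    # Spill volume is assumed to grow proportionally with input size.
    return (s_s_disk_spill / s_s_in) * s_in

def network_out_size(beta_s_oi_ratio, s_in, n_total_reduce):
    # Map output shipped per reducer: the sampled output/input ratio times
    # the input size, divided evenly across all reducers.
    return (beta_s_oi_ratio * s_in) / n_total_reduce
```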
16. Design Details
• Scheduling policies
– Map Tasks
• Shortest Job First (SJF)
• Starvation of long-running jobs: addressed by periodic re-sampling
– Reduce Tasks
• Naïve I/O Biasing
– Do not schedule an I/O-intensive job on an I/O-busy node when there are other reduce slots with higher disk I/O availability
– I/O-intensive job: judged using the map-phase sample
– I/O-busy node: disk I/O score below the cluster average
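The naïve I/O-biasing rule above can be sketched as a slot-selection function. This is a hedged sketch, not the thesis' implementation; the `ReduceSlot` structure and field names are assumptions.

```python
# Minimal sketch of the naive I/O-biasing reduce policy; names are illustrative.
from dataclasses import dataclass

@dataclass
class ReduceSlot:
    node: str
    disk_io_score: float   # measured disk I/O availability on that node

def choose_reduce_slot(slots, job_is_io_intensive):
    """Avoid placing an I/O-intensive job on an I/O-busy node (score below
    the cluster average) when slots with higher disk availability exist."""
    avg = sum(s.disk_io_score for s in slots) / len(slots)
    if job_is_io_intensive:
        candidates = [s for s in slots if s.disk_io_score >= avg]
        if candidates:
            return max(candidates, key=lambda s: s.disk_io_score)
    return slots[0]  # otherwise take the first free slot, as FIFO would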
19. Estimation Accuracy
• Cluster Configuration I
– Shared with other users and other applications
– 1 master, 10 slave nodes
– 1 Gbps network, same rack
– Each node:
• 4 processors: Intel Xeon E5607 quad-core CPU (2.26 GHz)
• 32 GB memory
• 1 TB hard disk
• Hadoop Configuration
– HDFS block size: 64 MB
– Data replication factor: 1
– Each node:
• Map slots: 1
• Reduce slots: 2
– Speculative map & reduce tasks: off
– Completed maps required before scheduling reduces: 1 out of 1000 total maps
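For reference, the Hadoop settings above roughly correspond to the following Hadoop 1.x configuration properties. This is a sketch of the likely settings, not the actual configuration files used in the project:

```xml
<!-- hdfs-site.xml / mapred-site.xml fragments approximating the slide's settings -->
<property><name>dfs.block.size</name><value>67108864</value></property> <!-- 64 MB -->
<property><name>dfs.replication</name><value>1</value></property>
<property><name>mapred.tasktracker.map.tasks.maximum</name><value>1</value></property>
<property><name>mapred.tasktracker.reduce.tasks.maximum</name><value>2</value></property>
<property><name>mapred.map.tasks.speculative.execution</name><value>false</value></property>
<property><name>mapred.reduce.tasks.speculative.execution</name><value>false</value></property>
<property><name>mapred.reduce.slowstart.completed.maps</name><value>0.001</value></property>
```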
20. Estimation Accuracy
• Workload description:
– I/O workload: word count
• Counts the occurrences of each word in the given input files
• Mapper: scans through the input; outputs each word as the key with 1 as the value, sorted on the key
• Reducer: collects pairs with the same key by adding up the values; outputs each key and its total occurrence count
– CPU workload: pi estimation
• Approximates the value of pi by counting the number of points that fall within the unit quarter circle
• Mapper: reads coordinates of points; counts points inside/outside the inscribed circle of the square
• Reducer: accumulates the inside/outside counts from the mappers
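The word-count mapper/reducer described above can be illustrated with a toy in-memory version; this is not the thesis' actual Hadoop job, just a sketch of the same map/reduce logic.

```python
# Toy, in-memory version of the word-count workload described above.
from itertools import groupby

def mapper(line):
    # Emit (word, 1) for every word in the input line.
    return [(word, 1) for word in line.split()]

def reducer(pairs):
    # Pairs arrive sorted on the key; sum the values per word.
    counts = {}
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        counts[word] = sum(v for _, v in group)
    return counts

# e.g. reducer(mapper("a b a")) -> {"a": 2, "b": 1}
```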
28. Performance Benchmark: Resource Scheduler vs. FIFO Scheduler
• Analysis of FIFO vs. Resource Scheduler in a resource-homogeneous environment
– Negligible overhead (simultaneous submission of an I/O job and a CPU job)
– Resource Scheduler performs worse: slowdown in all measured dimensions and cases
– Reason: the Resource Scheduler has more concurrent running reducers competing for resources
– Expectation: same performance in a busy cluster (all reduce slots constantly filled with running tasks)
[Bar chart: total map time (sec) and total job time (sec) for the FIFO and Resource schedulers; best, average, and worst runs]
32. Performance Benchmark: Resource Scheduler vs. FIFO Scheduler
• FIFO vs. Resource Scheduler in a resource-heterogeneous environment (simultaneous submission of an I/O job and a CPU job)
[Bar charts: total map time (sec) and total job time (sec) for the FIFO and Resource schedulers (best/average/worst runs); percentage slowdown of the Resource scheduler relative to FIFO for total map time and total job time, comparing the homogeneous and heterogeneous environments]
33. Conclusion
• Resource-based map task processing time estimation is satisfactory
• The Resource scheduler did not manage to outperform the FIFO scheduler in the resource-homogeneous environment, nor in most cases of the resource-heterogeneous environment, due to the extra concurrent reduce tasks
• However, we verified that the Resource scheduler is indeed resource-aware: it performs better when moved from a resource-homogeneous environment to a resource-heterogeneous environment:
– Smaller percentage slowdown compared to FIFO in all cases and all measured dimensions
– Observed speedup compared to FIFO in worst cases, due to I/O-biasing scheduling during the reduce stage
34. Recommendations for Future Work
• Evaluation
– Heavier workload & busy cluster
• Observe overhead
• Benchmark performance
• Scheduling policy
– Map Task
• Highest Response Ratio Next (HRRN)

$$\text{priority} = \frac{t_{estimated} + t_{waiting}}{t_{estimated}} = 1 + \frac{t_{waiting}}{t_{estimated}}$$

– Reduce Task
• CPU Biasing for CPU-intensive jobs
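The HRRN priority formula trades throughput against starvation: a job's priority grows with its waiting time, so even a long job eventually runs. A minimal sketch, with an illustrative job tuple layout not taken from the thesis:

```python
# Sketch of Highest Response Ratio Next for map-task ordering.

def hrrn_priority(t_estimated, t_waiting):
    """priority = (t_estimated + t_waiting) / t_estimated
                = 1 + t_waiting / t_estimated
    Waiting raises priority, so long jobs cannot starve indefinitely."""
    return 1.0 + t_waiting / t_estimated

def pick_next(jobs):
    # jobs: list of (name, t_estimated, t_waiting); highest ratio runs next.
    return max(jobs, key=lambda j: hrrn_priority(j[1], j[2]))
```

With zero waiting time every job has priority 1 and HRRN degenerates to shortest-job-first on ties broken by estimate.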