HFSP: the Hadoop Fair Sojourn Protocol
Mario Pastorelli, Antonio Barbuzzi, Damiano Carra, Matteo Dell’Amico, Pietro Michiardi
May 13, 2013
Outline
1 Hadoop and MapReduce
2 Fair Sojourn Protocol
3 HFSP Implementation
4 Experiments
Hadoop and MapReduce
MapReduce
Bring the computation to the data – split in blocks across the cluster
MAP
One task per block
Hadoop filesystem (HDFS) block size: 64 MB by default
Stores locally key-value pairs
e.g., for word count: [(manzana, 15) , (melocoton, 7) , . . .]
REDUCE
# of tasks set by the programmer
Mapper output is partitioned by key and pulled from “mappers”
The REDUCE function operates on all values for a single key
e.g., (melocoton, [7, 42, 13, . . .])
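The MAP and REDUCE steps above can be sketched in a few lines of plain Python. This is a toy, single-process illustration of the word-count example, not Hadoop code: the function names, the use of strings as "blocks", and the CRC-based partitioner are all assumptions made for the sketch.

```python
import zlib
from collections import defaultdict


def map_block(block):
    """MAP: one task per block; returns local (word, count) pairs."""
    counts = defaultdict(int)
    for word in block.split():
        counts[word] += 1
    return list(counts.items())


def shuffle(map_outputs, num_reducers):
    """Partition mapper output by key, so each key reaches exactly one reducer."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in map_outputs:
        for key, value in pairs:
            # Hypothetical partitioner: a stable hash of the key.
            reducer = zlib.crc32(key.encode("utf-8")) % num_reducers
            buckets[reducer][key].append(value)
    return buckets


def reduce_key(key, values):
    """REDUCE: operates on all values collected for a single key."""
    return key, sum(values)


def word_count(blocks, num_reducers=2):
    """Run MAP on every block, shuffle, then REDUCE every key."""
    map_outputs = [map_block(block) for block in blocks]
    result = {}
    for bucket in shuffle(map_outputs, num_reducers):
        for key, values in bucket.items():
            word, total = reduce_key(key, values)
            result[word] = total
    return result
```

Here `num_reducers` plays the role of the programmer-chosen number of REDUCE tasks, while the number of MAP tasks follows from the number of blocks.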
The Problem With Scheduling
Current Workloads
Huge job size variance
Running time: seconds to hours
I/O: KBs to TBs
[Chen et al., VLDB ’12; Ren et al., CMU TR ’12]
Consequence
Interactive jobs are delayed by long ones
In smaller clusters long queues exacerbate the problem
Fair Sojourn Protocol
Fair Sojourn Protocol [Friedman & Henderson, SIGMETRICS ’03]
[Figure: two plots of cluster usage (%) over time (s) for job 1, job 2 and job 3]
Simulate completion times under a simulated processor-sharing discipline
Schedule all resources to the job that would complete first
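The two steps above can be sketched as follows. This is a minimal illustration under simplifying assumptions: all jobs are already present, a single shared resource stands in for the cluster, and the names `ps_completion_times` and `fsp_pick` are hypothetical. A full FSP implementation would redo this computation at every arrival and completion.

```python
def ps_completion_times(virtual_remaining):
    """Finish times under processor sharing for jobs that are all present now.

    virtual_remaining maps job id -> work left in the simulated system,
    expressed in seconds of the whole resource. Returns job id -> finish time.
    """
    order = sorted(virtual_remaining.items(), key=lambda item: (item[1], item[0]))
    finish = {}
    now = 0.0
    served = 0.0          # work every still-active job has received so far
    active = len(order)   # jobs currently sharing the resource
    for job, size in order:
        # Each active job gets 1/active of the resource until this job ends.
        now += (size - served) * active
        served = size
        finish[job] = now
        active -= 1
    return finish


def fsp_pick(virtual_remaining):
    """FSP rule: give all resources to the job that would finish first."""
    finish = ps_completion_times(virtual_remaining)
    return min(finish, key=lambda job: (finish[job], job))
```

With `{"job1": 30, "job2": 10, "job3": 10}`, the two short jobs would both finish at time 30 under processor sharing and the long one at 50, so the sketch serves one of the short jobs first.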
Multi-Processor FSP
[Figure: two plots of cluster usage (%) over time (s) for job 1, job 2 and job 3]
In our case, some jobs may not require all cluster resources
HFSP Implementation
HFSP In A Nutshell
Job Size Estimation
Naive estimation at first
After the first s “training” tasks have run, we make a better
estimation
s = 5 by default
On t task slots, we give priority to training tasks
t avoids starving “old” jobs
“shortcut” for very small jobs
Scheduling Policy
We treat MAP and REDUCE phases as separate jobs
A virtual cluster outputs a per-job simulated completion time
Preempt running tasks of jobs that complete later in the virtual
cluster
Job Size Estimation (1)
Initial Estimation
ξ · k · l
k: # of tasks
l: average size of past MAP/REDUCE tasks
ξ ∈ [1, ∞]: aggressiveness in scheduling jobs that are still in the training phase
ξ = 1 (default): tend to schedule training jobs right away
they may have to be preempted
ξ = ∞: wait for training to end before deciding
may require more “waves”
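As a small worked example of the formula ξ · k · l, the sketch below computes the initial estimate. The function and parameter names are hypothetical; only the formula and the meaning of ξ come from the slide.

```python
import math


def initial_size_estimate(num_tasks, avg_past_task_size, xi=1.0):
    """Initial job size estimate: xi * k * l.

    num_tasks is k, avg_past_task_size is l, xi >= 1 is the aggressiveness
    factor. With xi = math.inf the estimate is infinite, so a job in its
    training phase is never ranked ahead of jobs with a known size.
    """
    if xi < 1.0:
        raise ValueError("xi must be at least 1")
    if num_tasks <= 0 or avg_past_task_size <= 0:
        raise ValueError("num_tasks and avg_past_task_size must be positive")
    return xi * num_tasks * avg_past_task_size
```

For instance, a job with 20 tasks and an average past task size of 30 s gets an initial estimate of 600 s with the default ξ = 1.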
Job Size Estimation (2)
MAP Phase
From the size of the s samples, generate an empirical CDF
(Least-square) fit to a parametric distribution
Predicted job size: k times the expected value of the fitted distribution
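The MAP-phase procedure can be illustrated with a deliberately simple choice of distribution. The slides do not say which parametric family is fitted, so the exponential distribution used here is purely an assumption of this sketch, as are the function names and the midpoint plotting positions for the empirical CDF. The fit is a least-squares regression through the origin on the linearised CDF, -ln(1 - F(x)) = x / mean.

```python
import math


def fit_exponential_mean(samples):
    """Least-squares fit of an exponential CDF to the empirical CDF.

    Returns the mean of the fitted distribution. The exponential family
    is an assumption of this sketch, not the authors' stated choice.
    """
    xs = sorted(samples)
    s = len(xs)
    if s == 0 or xs[0] < 0 or xs[-1] <= 0:
        raise ValueError("need non-negative samples, at least one positive")
    # Empirical CDF at midpoint plotting positions, so F never reaches 1.
    ys = [-math.log(1.0 - (i + 0.5) / s) for i in range(s)]
    # Regression through the origin: y = rate * x.
    rate = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return 1.0 / rate


def predicted_job_size(samples, num_tasks):
    """Predicted job size: k times the expected value of the fitted distribution."""
    return num_tasks * fit_exponential_mean(samples)
```

A call such as `predicted_job_size([10, 12, 9, 11, 30], 100)` would take the sizes of s = 5 training tasks and scale the fitted mean by the job's 100 tasks.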
Data Locality
Experimentally, we find that it is not an issue
For the s sample tasks, there are plenty of unprocessed blocks around
We use delay scheduling [Zaharia et al., EuroSys ’10]
Job Size Estimation (3)
REDUCE Phase
Shuffle time: getting data to the reducer
time between scheduling a REDUCE task and executing a REDUCE
function the first time
average of sample shuffle sizes, weighted by data size
Execution time
we set a timeout ∆ (default 60s)
if the timeout is hit, the estimated execution time is ∆ / p, where progress p is the fraction of data processed
Compute estimated reduce time as before
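The timeout rule for the REDUCE execution time can be sketched as below. The function name, its arguments and the handling of the "still waiting" and "no progress" cases are assumptions; only the ∆ / p extrapolation and the 60 s default come from the slide.

```python
def estimate_reduce_execution_time(elapsed, progress, timeout=60.0):
    """Estimate the execution time of one sample REDUCE task.

    elapsed:  seconds the task has been executing its REDUCE function
    progress: fraction of its input processed so far, in [0, 1]
    timeout:  the slide's delta, 60 s by default

    Returns the measured time if the task already finished, None while the
    timeout has not been hit yet, and timeout / progress once it has.
    """
    if progress >= 1.0:
        return elapsed                 # finished: use the measurement
    if elapsed < timeout:
        return None                    # keep waiting for the sample
    if progress <= 0.0:
        return float("inf")            # nothing processed: no finite estimate
    return timeout / progress          # extrapolate linearly from progress
```

For example, a task that has processed a quarter of its data when the 60 s timeout fires is estimated at 240 s.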
Virtual Cluster
Estimated job size is in a “serialized” single-machine format
Simulates a processor-sharing cluster to compute completion
time, based on
number of tasks per job
available task slots in the real cluster
Simulation is updated when
new jobs arrive
tasks complete
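A virtual cluster of this kind can be sketched as a small event-driven simulation. The version below is a simplification under stated assumptions: slots are shared evenly (possibly fractionally) among jobs, a job never receives more slots than its task count, and that task count stays fixed for the job's lifetime. The names `share_slots` and `virtual_completion_times` are hypothetical.

```python
def share_slots(task_counts, total_slots):
    """Split slots evenly among jobs, capped by each job's number of tasks."""
    alloc = {}
    slots_left = float(total_slots)
    # Serve the jobs with the fewest tasks first; their unused share
    # flows to the jobs that can use more slots.
    pending = sorted(task_counts.items(), key=lambda item: (item[1], item[0]))
    for index, (job, tasks) in enumerate(pending):
        fair_share = slots_left / (len(pending) - index)
        alloc[job] = min(float(tasks), fair_share)
        slots_left -= alloc[job]
    return alloc


def virtual_completion_times(jobs, total_slots):
    """jobs maps job id -> (serialized size in slot-seconds, number of tasks).

    Returns job id -> simulated completion time under processor sharing.
    """
    remaining = {job: float(size) for job, (size, _) in jobs.items()}
    now, finish = 0.0, {}
    while remaining:
        rate = share_slots({job: jobs[job][1] for job in remaining}, total_slots)
        time_left = {job: remaining[job] / rate[job] for job in remaining}
        step = min(time_left.values())     # time until the next completion
        now += step
        for job, t in time_left.items():
            if t <= step:                  # this job finishes at this event
                finish[job] = now
                del remaining[job]
            else:
                remaining[job] -= rate[job] * step
    return finish
```

In a scheduler, this function would be re-run whenever a job arrives or a task completes, and the resulting completion times would give the order in which jobs are served.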
Job Preemption
Supported in Hadoop
KILL running tasks
wastes work
WAIT for them to finish
may take long
Our Choice
MAP tasks: WAIT
generally small
For REDUCE tasks, we implemented SUSPEND and RESUME
avoids the drawbacks of both WAIT and KILL
Job Preemption: SUSPEND and RESUME
Our Solution
We delegate to the OS: SIGSTOP and SIGCONT
The OS will swap tasks if and when memory is needed
no risk of thrashing: swapped data is loaded only when resuming
Configurable maximum number of suspended tasks
if reached, switch to WAIT
hard limit on memory allocated to suspended tasks
If not all running tasks should be preempted, suspend the
youngest
likely to finish later
may have smaller memory footprint
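The SUSPEND and RESUME mechanism, together with the selection rules above, can be sketched with the Python standard library on a POSIX system. This is not the HFSP implementation: the function names, the `(pid, start_time)` task representation and the `demo` helper are assumptions made for illustration.

```python
import os
import signal
import subprocess
import sys


def suspend(pid):
    """Freeze a task process; the OS may swap its pages out if memory is needed."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid):
    """Let a previously suspended task process continue."""
    os.kill(pid, signal.SIGCONT)


def plan_preemption(running, to_preempt, suspended_now, max_suspended):
    """Choose which running tasks to suspend and which to wait for.

    running is a list of (pid, start_time). The youngest tasks (latest
    start_time) are preempted first. Once the cap on suspended tasks is
    reached, the remaining victims fall back to WAIT.
    Returns (pids_to_suspend, pids_to_wait_for).
    """
    youngest_first = sorted(running, key=lambda task: task[1], reverse=True)
    victims = [pid for pid, _ in youngest_first[:to_preempt]]
    room = max(0, max_suspended - suspended_now)
    return victims[:room], victims[room:]


def demo():
    """Freeze and thaw a throwaway child process; True if it really stopped."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        suspend(child.pid)
        # WUNTRACED makes waitpid report a stopped child, not only an exited one.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status)
        resume(child.pid)
        return stopped
    finally:
        child.kill()
        child.wait()
```

Because SIGSTOP cannot be caught or ignored, the task code itself needs no changes for this approach to work.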
Experiments
Experimental Setup
Platform
100 m1.xlarge Amazon EC2 instances
4 x 2 GHz cores, 1.6 TB storage, 15 GB RAM each
Workloads
Generated with the SWIM workload generator [Chen et al., MASCOTS ’11]
Synthesized from Facebook traces [Chen et al., VLDB ’12]
FB2009: 100 jobs, most are small; 22 minutes submission schedule
FB2010: 93 jobs, small jobs filtered out; 1h submission schedule
Configuration
We compare to Hadoop’s FAIR scheduler
similar to a processor-sharing discipline
Delay scheduling enabled both for FAIR and HFSP
FB2009
[Figure: fraction of completed jobs versus sojourn time (min), HFSP vs. FAIR, in three panels: small jobs, medium jobs, large jobs]
The FIFO scheduler would mostly fall outside of the graph
Small jobs (few tasks) are not problematic in either case
they are allocated enough tasks
Medium and large jobs instead require a significant amount of
the cluster resources
“focusing” all resources of the cluster pays off
FB2010
[Figure: fraction of completed jobs versus time (min), HFSP vs. FAIR, in three panels: MAP phase (map time), REDUCE phase (reduce time), aggregate (sojourn time)]
Larger jobs, longer queues, more pressure on the scheduler
Median MAP sojourn time is more than halved
Main reason: fewer “waves” because cluster resources are focused
On aggregate, when the first job completes with FAIR, 20% of jobs
are already done with HFSP.
Cluster Size
[Figure: average sojourn time (min) versus number of cluster nodes (10 to 100), HFSP vs. FAIR]
Experiment run with Mumak, the official Hadoop emulator, on FB2009
For smaller clusters, scheduling makes a bigger difference
Robustness to Estimation Errors
[Figure: average sojourn time (s) versus α (0.1 to 1); legend: FAIR, HFSP (α=0)]
Experimental settings as before: FB2009 and Mumak again
For a job size estimation of θ, we introduce an error and pick a
value uniformly in
[(1 − α) θ, (1 + α) θ]
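The error-injection step can be written directly from the interval above. The function name and the optional `rng` argument are hypothetical; restricting α to [0, 1] is an assumption that matches the range shown in the experiment and keeps the lower bound non-negative.

```python
import random


def perturb_estimate(theta, alpha, rng=random):
    """Return a value drawn uniformly from [(1 - alpha) * theta, (1 + alpha) * theta]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is expected in [0, 1]")
    return rng.uniform((1.0 - alpha) * theta, (1.0 + alpha) * theta)
```

Passing a seeded `random.Random` instance as `rng` makes a simulation run reproducible.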
Preemption: Costs
Question
Could the costs associated with swapping make SUSPEND not worth it?
Measurements
Linux can read and write swap close to maximum disk speed
100 MB/s for us
Worst-Case Analysis
In the FB2010 experiment, 10% of REDUCE tasks are suspended
The JVM heap space for REDUCE tasks is 1GB
as advised in Hadoop docs
Therefore, a SUSPEND/RESUME induces swapping for at most 20 s
one order of magnitude less than the average size of preempted tasks
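The 20 s bound is consistent with a simple back-of-the-envelope calculation, sketched below. It assumes that the whole heap is written to swap once on SUSPEND and read back once on RESUME, and that 1 GB is taken as 1000 MB; neither assumption is stated on the slide, and the function name is hypothetical.

```python
def worst_case_swap_seconds(heap_mb, disk_mb_per_s):
    """Worst case: write the whole heap to swap, then read all of it back."""
    return 2.0 * heap_mb / disk_mb_per_s
```

With a 1000 MB heap and a 100 MB/s disk, this gives 20 s.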
Take-Home Messages
Size-based scheduling on Hadoop is viable, and particularly appealing
for companies with (semi-)interactive jobs and smaller clusters
Even simple approximate means for size estimation are sufficient, as
HFSP is robust with respect to errors
Delegating to the OS, via the POSIX SIGSTOP and SIGCONT signals, is an efficient
way to perform preemption in Hadoop
HFSP is available as free software at
http://bitbucket.org/bigfootproject/hfsp
Paper at http://arxiv.org/abs/1302.2749
Unblocking The Main Thread Solving ANRs and Frozen Frames
 

HFSP: the Hadoop Fair Sojourn Protocol

  • 1. HFSP: the Hadoop Fair Sojourn Protocol. Mario Pastorelli, Antonio Barbuzzi, Damiano Carra, Matteo Dell’Amico, Pietro Michiardi. May 13, 2013.
  • 2. Outline: (1) Hadoop and MapReduce; (2) Fair Sojourn Protocol; (3) HFSP Implementation; (4) Experiments.
  • 3. Section 1: Hadoop and MapReduce.
  • 5. MapReduce. Bring the computation to the data, which is split in blocks across the cluster. MAP: one task per block (Hadoop filesystem (HDFS) blocks: 64 MB by default); stores key-value pairs locally, e.g., for word count: [(manzana, 15), (melocoton, 7), ...]. REDUCE: the number of tasks is set by the programmer; mapper output is partitioned by key and pulled from the “mappers”; the REDUCE function operates on all values for a single key, e.g., (melocoton, [7, 42, 13, ...]).
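As a rough illustration of the MAP and REDUCE steps described on this slide, here is a minimal single-process Python sketch of word count. The function names (`map_block`, `partition`, `reduce_key`, `word_count`) and the hash-based partitioner are assumptions made for illustration; this is not Hadoop's API.

```python
from collections import Counter, defaultdict

def map_block(block):
    """MAP: one task per block; emit local (word, count) pairs."""
    return list(Counter(block.split()).items())

def partition(map_outputs, num_reducers):
    """Shuffle: route each key to one reducer and group its values."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in map_outputs:
        for word, count in pairs:
            reducers[hash(word) % num_reducers][word].append(count)
    return reducers

def reduce_key(word, counts):
    """REDUCE: operate on all values collected for a single key."""
    return word, sum(counts)

def word_count(blocks, num_reducers=2):
    grouped = partition([map_block(b) for b in blocks], num_reducers)
    result = {}
    for reducer in grouped:
        for word, counts in reducer.items():
            key, total = reduce_key(word, counts)
            result[key] = total
    return result
```

Each string in `blocks` stands in for one HDFS block, and `num_reducers` plays the role of the programmer-chosen number of REDUCE tasks.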
  • 6. The Problem With Scheduling. Current workloads: huge job size variance (running time: seconds to hours; I/O: KBs to TBs) [Chen et al., VLDB ’12; Ren et al., CMU TR ’12]. Consequence: interactive jobs are delayed by long ones; in smaller clusters, long queues exacerbate the problem.
  • 7. Section 2: Fair Sojourn Protocol.
  • 8. Fair Sojourn Protocol [Friedman & Henderson, SIGMETRICS ’03]. [Figure: two plots of cluster usage (%) versus time (s) for jobs 1, 2 and 3.] Simulate completion times using a simulated processor-sharing discipline. Schedule all resources to the job that would complete first.
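The two steps on this slide (simulate processor sharing, then serve the job that would finish first) can be sketched as follows. This is a simplified single-resource sketch that ignores future arrivals; the function names and the representation of jobs as a dictionary of remaining work are assumptions, not taken from the HFSP code.

```python
def ps_completion_times(remaining):
    """Simulate processor sharing: all unfinished jobs get an equal share.

    `remaining` maps job id -> remaining work, in seconds at full capacity.
    Returns job id -> simulated completion time, measured from now.
    """
    left = dict(remaining)
    now = 0.0
    done = {}
    while left:
        n = len(left)
        job = min(left, key=left.get)   # next job to finish under sharing
        work = left.pop(job)
        now += work * n                 # each job progresses at rate 1/n
        done[job] = now
        for other in left:
            left[other] -= work
    return done

def fsp_next_job(remaining):
    """Pick the job that completes first in the simulated system."""
    times = ps_completion_times(remaining)
    return min(times, key=times.get)
```

For example, with remaining work of 30 s, 10 s and 20 s for three jobs, the simulation finishes them in the order 10 s job, 20 s job, 30 s job, so the 10 s job would receive all resources first.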
  • 9. Multi-Processor FSP. [Figure: two plots of cluster usage (%) versus time (s) for jobs 1, 2 and 3.] In our case, some jobs may not require all cluster resources.
  • 10. Section 3: HFSP Implementation.
  • 12. HFSP In A Nutshell. Job size estimation: naive estimation at first; after the first s “training” tasks have run, we make a better estimation (s = 5 by default). On t task slots, we give priority to training tasks; t avoids starving “old” jobs and provides a “shortcut” for very small jobs. Scheduling policy: we treat MAP and REDUCE phases as separate jobs; a virtual cluster outputs a per-job simulated completion time; preempt running tasks of jobs that complete later in the virtual cluster.
  • 13. Job Size Estimation (1). Initial estimation: ξ · k · l, where k is the number of tasks, l is the average size of past MAP/REDUCE tasks, and ξ ∈ [1, ∞] is the aggressivity for scheduling jobs in the training phase. ξ = 1 (default): tend to schedule training jobs right away; they may have to be preempted. ξ = ∞: wait for training to end before deciding; may require more “waves”.
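The initial estimate ξ · k · l is simple enough to write down directly. The sketch below is illustrative only: the function and parameter names are assumptions, and it assumes at least one task and a positive average task size.

```python
import math

def initial_estimate(k, l, xi=1.0):
    """Initial job size estimate xi * k * l.

    k  : number of tasks in the job (assumed >= 1)
    l  : average size of past MAP/REDUCE tasks (assumed > 0)
    xi : value in [1, inf]; math.inf gives an infinite estimate
    """
    if xi < 1:
        raise ValueError("xi must be in [1, inf]")
    return xi * k * l
```

With `xi=math.inf` every job still in training gets an infinite size estimate, so a size-ordered scheduler would place it after all jobs whose size is already known.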
  • 15. Job Size Estimation (2). MAP phase: from the sizes of the s samples, generate an empirical CDF; fit it (by least squares) to a parametric distribution; the predicted job size is k times the expected value of the fitted distribution. Data locality: experimentally, we find that it is not an issue; for the s sample tasks, there are plenty of unprocessed blocks around; we use delay scheduling [Zaharia et al., EuroSys ’10].
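The MAP-phase estimate can be sketched as below. The slides do not say which parametric distribution is fitted, so the exponential distribution used here is purely an assumption chosen to keep the example short, as are the function names and the (i + 0.5) / n plotting positions for the empirical CDF.

```python
import math

def fit_exponential_mean(samples):
    """Least-squares fit of an exponential CDF to the empirical CDF.

    Uses the linearised form -ln(1 - F(x)) = x / mean and a regression
    through the origin, so the fitted mean is sum(x^2) / sum(x * y).
    """
    xs = sorted(samples)
    n = len(xs)
    # Empirical CDF value for the i-th smallest sample; (i + 0.5) / n
    # keeps F strictly below 1 so the logarithm stays finite.
    ys = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxx / sxy

def predict_map_phase_size(samples, k):
    """Predicted size: k times the expected value of the fitted distribution."""
    return k * fit_exponential_mean(samples)
```

Here `samples` would hold the measured sizes of the s training tasks and `k` the total number of MAP tasks in the job.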
  • 16. Job Size Estimation (3). REDUCE phase. Shuffle time (getting data to the reducer): the time between scheduling a REDUCE task and executing a REDUCE function for the first time; estimated as the average of sample shuffle sizes, weighted by data size. Execution time: we set a timeout Δ (default 60 s); if the timeout is hit, the estimated execution time is Δ / p, where the progress p is the fraction of data processed. Compute the estimated reduce time as before.
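The timeout-based extrapolation for REDUCE execution time can be illustrated with a short sketch. The function name and the `finished_after` parameter are hypothetical, and using the measured time for a task that finishes before the timeout is an assumption not stated on the slide.

```python
def estimate_execution_time(progress, finished_after=None, timeout=60.0):
    """Estimate the execution time of one sample REDUCE task.

    progress       : fraction of input processed when the timeout fired
    finished_after : measured time if the task ended before the timeout
    timeout        : the timeout, 60 s by default as on the slide
    """
    if finished_after is not None:
        return finished_after
    if not 0.0 < progress <= 1.0:
        raise ValueError("progress must be in (0, 1]")
    # A task that processed a fraction p of its data in `timeout` seconds
    # is extrapolated to need timeout / p seconds in total.
    return timeout / progress
```

For instance, a task that has processed a quarter of its data when the 60 s timeout fires is estimated at 240 s.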
  • 17. Virtual Cluster. The estimated job size is in a “serialized” single-machine format. The virtual cluster simulates a processor-sharing cluster to compute completion times, based on the number of tasks per job and the available task slots in the real cluster. The simulation is updated when new jobs arrive and when tasks complete.
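One way to picture the virtual cluster is a slot-aware processor-sharing simulation. The sketch below is an assumption-laden illustration rather than the HFSP implementation: it splits slots with max-min fairness, caps each job at its task count, and keeps that cap fixed as the job progresses. All names are hypothetical.

```python
def fair_shares(demands, slots):
    """Max-min fair split of `slots` among jobs, capped by each demand."""
    shares = {}
    pending = dict(demands)
    free = float(slots)
    while pending:
        equal = free / len(pending)
        capped = {j: d for j, d in pending.items() if d <= equal}
        if not capped:
            shares.update({j: equal for j in pending})
            break
        for j, d in capped.items():
            shares[j] = float(d)
            free -= d
            del pending[j]
    return shares

def virtual_completion_times(jobs, slots):
    """jobs maps id -> (serialized size in slot-seconds, number of tasks >= 1)."""
    left = {j: float(size) for j, (size, _) in jobs.items()}
    now, done = 0.0, {}
    while left:
        shares = fair_shares({j: jobs[j][1] for j in left}, slots)
        # Advance to the moment the next job finishes at the current rates.
        step = min(left[j] / shares[j] for j in left)
        now += step
        for j in list(left):
            left[j] -= step * shares[j]
            if left[j] <= 1e-9:
                done[j] = now
                del left[j]
    return done
```

A real scheduler would rerun such a simulation whenever a job arrives or a task completes, as the slide notes.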
  • 19. Job Preemption. Supported in Hadoop: KILL running tasks (wastes work) or WAIT for them to finish (may take long). Our choice: for MAP tasks, WAIT (they are generally small); for REDUCE tasks, we implemented SUSPEND and RESUME, which avoids the drawbacks of both WAIT and KILL.
  • 23. Job Preemption: SUSPEND and RESUME. Our solution: we delegate to the OS, using SIGSTOP and SIGCONT. The OS will swap tasks if and when memory is needed; there is no risk of thrashing, since swapped data is loaded only when resuming. Configurable maximum number of suspended tasks: if reached, switch to WAIT; this is a hard limit on the memory allocated to suspended tasks. If not all running tasks should be preempted, suspend the youngest: they are likely to finish later and may have a smaller memory footprint.
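The signal-based mechanism can be sketched on a POSIX system with Python's standard `os` and `signal` modules. The bookkeeping (`running` as a list of `(start_time, pid)` pairs, `suspended` as a set, the `preempt` helper) is hypothetical and only illustrates the "suspend the youngest, fall back to WAIT at the limit" policy; it is not how HFSP is wired into Hadoop.

```python
import os
import signal

def suspend(pid):
    """Stop the process; the OS may swap out its memory if it needs room."""
    os.kill(pid, signal.SIGSTOP)

def resume(pid):
    """Let a previously stopped process continue."""
    os.kill(pid, signal.SIGCONT)

def preempt(running, suspended, max_suspended):
    """Suspend the youngest running task, or signal a fallback to WAIT.

    running   : list of (start_time, pid) pairs
    suspended : set of pids that are currently stopped
    Returns the suspended pid, or None when the caller should WAIT.
    """
    if len(suspended) >= max_suspended or not running:
        return None
    start_time, pid = max(running)   # latest start time = youngest task
    suspend(pid)
    running.remove((start_time, pid))
    suspended.add(pid)
    return pid
```

SIGSTOP cannot be caught or ignored by the target process, which is why the task itself needs no cooperation for this to work.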
  • 24. Section 4: Experiments.
  • 25. Experimental Setup. Platform: 100 m1.xlarge Amazon EC2 instances, each with 4 x 2 GHz cores, 1.6 TB storage and 15 GB RAM. Workloads: generated with the SWIM workload generator [Chen et al., MASCOTS ’11]; synthesized from Facebook traces [Chen et al., VLDB ’12]. FB2009: 100 jobs, most are small; 22-minute submission schedule. FB2010: 93 jobs, small jobs filtered out; 1-hour submission schedule. Configuration: we compare to Hadoop’s FAIR scheduler, which is similar to a processor-sharing discipline; delay scheduling is enabled for both FAIR and HFSP.
  • 26. Results: FB2009. [Figure: three plots of the fraction of completed jobs versus sojourn time (min), comparing HFSP and FAIR, for small, medium and large jobs.] The FIFO scheduler would mostly fall outside of the graph. Small jobs (few tasks) are not problematic in either case: they are allocated enough tasks. Medium and large jobs instead require a significant amount of the cluster resources: “focusing” all resources of the cluster pays off.
  • 27. Results: FB2010. [Figure: three plots of the fraction of completed jobs versus time (min), comparing HFSP and FAIR: map time (MAP phase), reduce time (REDUCE phase) and sojourn time (aggregate).] Larger jobs, longer queues, more pressure on the scheduler. Median MAP sojourn time is more than halved. Main reason: fewer “waves”, because cluster resources are focused. On aggregate, when the first job completes with FAIR, 20% of jobs are done with HFSP.
  • 28. Results: Cluster Size. [Figure: average sojourn time (min) versus number of cluster nodes, comparing HFSP and FAIR.] Experiment done with Mumak, the official Hadoop emulator, and FB2009. For smaller clusters, scheduling makes a bigger difference.
  • 29. Results: Robustness to Estimation Errors. [Figure: average sojourn time (s) versus α; legend: FAIR, HFSP (α=0).] Experimental settings as before: FB2009 and Mumak again. For a job size estimation of θ, we introduce an error and pick a value uniformly in [(1 − α) θ, (1 + α) θ].
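The error-injection step described on this slide amounts to one uniform draw per estimate. A minimal sketch follows; the function name and the restriction of α to [0, 1], which keeps the lower bound non-negative, are assumptions.

```python
import random

def perturb_estimate(theta, alpha, rng=random):
    """Draw an estimate uniformly in [(1 - alpha) * theta, (1 + alpha) * theta]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return rng.uniform((1.0 - alpha) * theta, (1.0 + alpha) * theta)
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable; with `alpha=0` the estimate is returned unchanged.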
  • 30. Results: Preemption Costs. Question: could the costs associated with swapping make SUSPEND not worth it? Measurements: Linux can read and write swap at close to maximum disk speed (100 MB/s for us). Worst-case analysis: in the FB2010 experiment, 10% of REDUCE tasks are suspended; the JVM heap space for REDUCE tasks is 1 GB, as advised in the Hadoop docs; therefore, a SUSPEND/RESUME induces swapping for at most 20 s, one order of magnitude less than the average size of preempted tasks.
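One reading of the 20 s figure is that the whole 1 GB heap is written to swap on SUSPEND and read back on RESUME, each at 100 MB/s. The sketch below reproduces that arithmetic; the function name is hypothetical and treating 1 GB as 1000 MB is an assumption.

```python
def worst_case_swap_seconds(heap_mb=1000, disk_mb_per_s=100):
    """Worst case: write the whole heap to swap, then read all of it back."""
    write_time = heap_mb / disk_mb_per_s   # swap out after SUSPEND
    read_time = heap_mb / disk_mb_per_s    # swap in on RESUME
    return write_time + read_time
```

With the default values this gives 10 s out plus 10 s in, matching the slide's bound of 20 s.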
  • 31. Conclusions: Take-Home Messages. Size-based scheduling on Hadoop is viable, and particularly appealing for companies with (semi-)interactive jobs and smaller clusters. Even simple approximate means for size estimation are sufficient, as HFSP is robust with respect to errors. OS delegation to POSIX SIGSTOP and SIGCONT signals is an efficient way to perform preemption in Hadoop. HFSP is available as free software at http://bitbucket.org/bigfootproject/hfsp and the paper is at http://arxiv.org/abs/1302.2749