SlideShare une entreprise Scribd logo
1  sur  15
Télécharger pour lire hors ligne
Economic Scheduling of
Hadoop Jobs
-The Dynamic Priority MapReduce Scheduler



Thomas Sandholm, HP Labs, Palo Alto
Kevin Lai, HP Labs, Palo Alto




                                            1
The Problem

»   Allocate slots on compute nodes for job tasks
»   Classic Approach: Throughput optimization
»   Cross User Priorities inferred based on heuristics
»   Social Scheduling
»   Our Approach: User value optimization
»   Users are given an incentive to scale up or down
»   Automate demand conflict resolution




                                                         2
Other Hadoop Schedulers

»   FIFO
»   HOD
»   Fairshare
»   Capacity

» Designed for no queues or few static fixed QoS
  queues
» Works well in corporate clusters




                                                   3
Dynamic Priority Scheduler Requirements

» Users may come and go frequently
» Users may be unknown to providers
» Users may want to schedule jobs across data
  centers and Hadoop installations




         manual, social scheduling of users
     (assumed to be cooperating) breaks down



                                                4
Architecture




               4/(4+1.5+2)*15=8




                                  5
Our Solution: Automated Resource Allocation




                               Budget Remaining




Running                              Share
Tasks

 Pending                  Spending Rate
 Tasks
                                                  6
Proportional-Share Scheduling

» qi = bi/(bi + p)
» p = ∑ b-i

» Huberman et al Spawn ‘92
» Waldspurger et al Lottery Scheduling ‘95
» Lai et al Tycoon ‘05




                                             7
Key Design Principles

» Pay-per-use: spending rate is only deducted from budget
  if a job performed work
» Work-conserving: users are never charged more than
  their spending rates but can get more slots if other users
  are idle
» Preemptive: higher spending users may cause tasks from
  lower spending users to be killed
» Scalable: No memory, or history-based fair-share
  smoothing




                                                        8
Implementation

» Standalone Hadoop MapReduce JobTracker Scheduler
  Plugin
» HTTP/XML/REST Servlet to provide secure management
  and monitoring of queues
» Generic queue allocation/accounting classes (could
  move into mapred core)
» Pluggable scheduler enforcing shares, when scheduling
  jobs (could be replaced by capacity/fairshare
  enforcers)




                                                    9
Configuration
Option                                    Examples
mapred.jobtracker.taskScheduler           org.apache.hadoop.mapred.DynamicPriorityScheduler



mapred.priority-scheduler.kill-interval   0



mapred.dynamic-scheduler.alloc-interval   20



mapred.dynamic-scheduler.budget-file      /etc/hadoop.budget



mapred.priority-scheduler.acl-file        /etc/hadoop.acl




                                                                                      10
Experiment


 Fairshare vs Capacity vs FIFO vs DP
 2-80 simulated users/queues
 2 Clusters
 PiEstimator Simulation




                                       11
Budget Dynamics
                                        DynPrio preempt FIFO scheduler
                     DynPrio no                   Capacity scheduler
                     preempt




Funding runs out   Budget replenished


                                                                         12
Service Differentiation




               DynPrio    FIFO

                                 13
Dynamic Adjustment




                     14
More info

» Papers
  › SIGMETRICS 2009
  › Workshop on Job Scheduling for Parallel Processing
    (JSSPP’10)
  › International Conference on Cloud Computing and
    Virtualization (CCV’10)
» HADOOP-4768 JIRA
» Source:
  http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/dynamic-scheduler/




                                                                                  15

Contenu connexe

Tendances

Self adjusting slot configurations for homogeneous and heterogeneous hadoop c...
Self adjusting slot configurations for homogeneous and heterogeneous hadoop c...Self adjusting slot configurations for homogeneous and heterogeneous hadoop c...
Self adjusting slot configurations for homogeneous and heterogeneous hadoop c...LeMeniz Infotech
 
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations Sumeet Singh
 
Advanced Hadoop Tuning and Optimization
Advanced Hadoop Tuning and Optimization Advanced Hadoop Tuning and Optimization
Advanced Hadoop Tuning and Optimization Shivkumar Babshetty
 
Distributed Processing Frameworks
Distributed Processing FrameworksDistributed Processing Frameworks
Distributed Processing FrameworksAntonios Katsarakis
 
Analysing of big data using map reduce
Analysing of big data using map reduceAnalysing of big data using map reduce
Analysing of big data using map reducePaladion Networks
 
Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Lu Wei
 
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...Renato Bonomini
 
MapReduce Scheduling Algorithms
MapReduce Scheduling AlgorithmsMapReduce Scheduling Algorithms
MapReduce Scheduling AlgorithmsLeila panahi
 
Present & Future of Greenplum Database A massively parallel Postgres Database...
Present & Future of Greenplum Database A massively parallel Postgres Database...Present & Future of Greenplum Database A massively parallel Postgres Database...
Present & Future of Greenplum Database A massively parallel Postgres Database...VMware Tanzu
 
Greenplum-Spark November 2018
Greenplum-Spark November 2018Greenplum-Spark November 2018
Greenplum-Spark November 2018KongYew Chan, MBA
 
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...Sumeet Singh
 
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...Sumeet Singh
 
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...VMware Tanzu
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceCsaba Toth
 
Mixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache SparkMixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache SparkVMware Tanzu
 
Greenplum for Kubernetes - Greenplum Summit 2019
Greenplum for Kubernetes - Greenplum Summit 2019Greenplum for Kubernetes - Greenplum Summit 2019
Greenplum for Kubernetes - Greenplum Summit 2019VMware Tanzu
 

Tendances (20)

Self adjusting slot configurations for homogeneous and heterogeneous hadoop c...
Self adjusting slot configurations for homogeneous and heterogeneous hadoop c...Self adjusting slot configurations for homogeneous and heterogeneous hadoop c...
Self adjusting slot configurations for homogeneous and heterogeneous hadoop c...
 
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
 
Advanced Hadoop Tuning and Optimization
Advanced Hadoop Tuning and Optimization Advanced Hadoop Tuning and Optimization
Advanced Hadoop Tuning and Optimization
 
Hadoop scheduler
Hadoop schedulerHadoop scheduler
Hadoop scheduler
 
Distributed Processing Frameworks
Distributed Processing FrameworksDistributed Processing Frameworks
Distributed Processing Frameworks
 
Analysing of big data using map reduce
Analysing of big data using map reduceAnalysing of big data using map reduce
Analysing of big data using map reduce
 
Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]
 
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
 
MapReduce Scheduling Algorithms
MapReduce Scheduling AlgorithmsMapReduce Scheduling Algorithms
MapReduce Scheduling Algorithms
 
Anju
AnjuAnju
Anju
 
Present & Future of Greenplum Database A massively parallel Postgres Database...
Present & Future of Greenplum Database A massively parallel Postgres Database...Present & Future of Greenplum Database A massively parallel Postgres Database...
Present & Future of Greenplum Database A massively parallel Postgres Database...
 
Greenplum-Spark November 2018
Greenplum-Spark November 2018Greenplum-Spark November 2018
Greenplum-Spark November 2018
 
May 2013 HUG: HCatalog/Hive Data Out
May 2013 HUG: HCatalog/Hive Data OutMay 2013 HUG: HCatalog/Hive Data Out
May 2013 HUG: HCatalog/Hive Data Out
 
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...
 
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
 
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce
 
Tensor Processing Unit (TPU)
Tensor Processing Unit (TPU)Tensor Processing Unit (TPU)
Tensor Processing Unit (TPU)
 
Mixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache SparkMixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache Spark
 
Greenplum for Kubernetes - Greenplum Summit 2019
Greenplum for Kubernetes - Greenplum Summit 2019Greenplum for Kubernetes - Greenplum Summit 2019
Greenplum for Kubernetes - Greenplum Summit 2019
 

En vedette

My other computer_is_a_datacentre
My other computer_is_a_datacentreMy other computer_is_a_datacentre
My other computer_is_a_datacentreSteve Loughran
 
Battle At Goliad
Battle At GoliadBattle At Goliad
Battle At Goliadcompd
 
Overview of slider project
Overview of slider projectOverview of slider project
Overview of slider projectSteve Loughran
 
Farms, Fabrics and Clouds
Farms, Fabrics and CloudsFarms, Fabrics and Clouds
Farms, Fabrics and CloudsSteve Loughran
 
2013 11-19-hoya-status
2013 11-19-hoya-status2013 11-19-hoya-status
2013 11-19-hoya-statusSteve Loughran
 
Taming Deployment With Smart Frog
Taming Deployment With Smart FrogTaming Deployment With Smart Frog
Taming Deployment With Smart FrogSteve Loughran
 
HDP-1 introduction for HUG France
HDP-1 introduction for HUG FranceHDP-1 introduction for HUG France
HDP-1 introduction for HUG FranceSteve Loughran
 
A New Approach To Organization
A New Approach To OrganizationA New Approach To Organization
A New Approach To Organizationcompd
 
Did you really want that data?
Did you really want that data?Did you really want that data?
Did you really want that data?Steve Loughran
 

En vedette (14)

My other computer_is_a_datacentre
My other computer_is_a_datacentreMy other computer_is_a_datacentre
My other computer_is_a_datacentre
 
Extended essay overview
Extended essay overviewExtended essay overview
Extended essay overview
 
Battle At Goliad
Battle At GoliadBattle At Goliad
Battle At Goliad
 
H is for_hadoop
H is for_hadoopH is for_hadoop
H is for_hadoop
 
Overview of slider project
Overview of slider projectOverview of slider project
Overview of slider project
 
Farms, Fabrics and Clouds
Farms, Fabrics and CloudsFarms, Fabrics and Clouds
Farms, Fabrics and Clouds
 
2013 11-19-hoya-status
2013 11-19-hoya-status2013 11-19-hoya-status
2013 11-19-hoya-status
 
Taming Deployment With Smart Frog
Taming Deployment With Smart FrogTaming Deployment With Smart Frog
Taming Deployment With Smart Frog
 
HDP-1 introduction for HUG France
HDP-1 introduction for HUG FranceHDP-1 introduction for HUG France
HDP-1 introduction for HUG France
 
A New Approach To Organization
A New Approach To OrganizationA New Approach To Organization
A New Approach To Organization
 
Graphs
GraphsGraphs
Graphs
 
Scholarly articles
Scholarly articlesScholarly articles
Scholarly articles
 
Echolocation
EcholocationEcholocation
Echolocation
 
Did you really want that data?
Did you really want that data?Did you really want that data?
Did you really want that data?
 

Similaire à Economic Scheduling of Hadoop Jobs

Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopHortonworks
 
SOME WORKLOAD SCHEDULING ALTERNATIVES 11.07.2013
SOME WORKLOAD SCHEDULING ALTERNATIVES 11.07.2013SOME WORKLOAD SCHEDULING ALTERNATIVES 11.07.2013
SOME WORKLOAD SCHEDULING ALTERNATIVES 11.07.2013James McGalliard
 
Apache Airflow (incubating) NL HUG Meetup 2016-07-19
Apache Airflow (incubating) NL HUG Meetup 2016-07-19Apache Airflow (incubating) NL HUG Meetup 2016-07-19
Apache Airflow (incubating) NL HUG Meetup 2016-07-19Bolke de Bruin
 
Map-Reduce Synchronized and Comparative Queue Capacity Scheduler in Hadoop fo...
Map-Reduce Synchronized and Comparative Queue Capacity Scheduler in Hadoop fo...Map-Reduce Synchronized and Comparative Queue Capacity Scheduler in Hadoop fo...
Map-Reduce Synchronized and Comparative Queue Capacity Scheduler in Hadoop fo...iosrjce
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopJosh Patterson
 
Dache: A Data Aware Caching for Big-Data Applications Using the MapReduce Fra...
Dache: A Data Aware Caching for Big-Data Applications Usingthe MapReduce Fra...Dache: A Data Aware Caching for Big-Data Applications Usingthe MapReduce Fra...
Dache: A Data Aware Caching for Big-Data Applications Using the MapReduce Fra...Govt.Engineering college, Idukki
 
Review of Calculation Paradigm and its Components
Review of Calculation Paradigm and its ComponentsReview of Calculation Paradigm and its Components
Review of Calculation Paradigm and its ComponentsNamuk Park
 
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Deanna Kosaraju
 
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Sumeet Singh
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingBikas Saha
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.pptSathish24111
 
Hadoop ensma poitiers
Hadoop ensma poitiersHadoop ensma poitiers
Hadoop ensma poitiersRim Moussa
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache HadoopChristopher Pezza
 
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃Etu Solution
 
Bigdata and Hadoop with Docker
Bigdata and Hadoop with DockerBigdata and Hadoop with Docker
Bigdata and Hadoop with Dockerharidasnss
 
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...DataWorks Summit
 
Drill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleDrill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleMapR Technologies
 

Similaire à Economic Scheduling of Hadoop Jobs (20)

Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
 
SOME WORKLOAD SCHEDULING ALTERNATIVES 11.07.2013
SOME WORKLOAD SCHEDULING ALTERNATIVES 11.07.2013SOME WORKLOAD SCHEDULING ALTERNATIVES 11.07.2013
SOME WORKLOAD SCHEDULING ALTERNATIVES 11.07.2013
 
Apache Airflow (incubating) NL HUG Meetup 2016-07-19
Apache Airflow (incubating) NL HUG Meetup 2016-07-19Apache Airflow (incubating) NL HUG Meetup 2016-07-19
Apache Airflow (incubating) NL HUG Meetup 2016-07-19
 
L017656475
L017656475L017656475
L017656475
 
Map-Reduce Synchronized and Comparative Queue Capacity Scheduler in Hadoop fo...
Map-Reduce Synchronized and Comparative Queue Capacity Scheduler in Hadoop fo...Map-Reduce Synchronized and Comparative Queue Capacity Scheduler in Hadoop fo...
Map-Reduce Synchronized and Comparative Queue Capacity Scheduler in Hadoop fo...
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on Hadoop
 
Dache: A Data Aware Caching for Big-Data Applications Using the MapReduce Fra...
Dache: A Data Aware Caching for Big-Data Applications Usingthe MapReduce Fra...Dache: A Data Aware Caching for Big-Data Applications Usingthe MapReduce Fra...
Dache: A Data Aware Caching for Big-Data Applications Using the MapReduce Fra...
 
Review of Calculation Paradigm and its Components
Review of Calculation Paradigm and its ComponentsReview of Calculation Paradigm and its Components
Review of Calculation Paradigm and its Components
 
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
 
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Hadoop tutorial
Hadoop tutorialHadoop tutorial
Hadoop tutorial
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.ppt
 
Hadoop ensma poitiers
Hadoop ensma poitiersHadoop ensma poitiers
Hadoop ensma poitiers
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache Hadoop
 
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
 
Bigdata and Hadoop with Docker
Bigdata and Hadoop with DockerBigdata and Hadoop with Docker
Bigdata and Hadoop with Docker
 
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
 
Drill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleDrill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is Possible
 

Plus de Steve Loughran

The age of rename() is over
The age of rename() is overThe age of rename() is over
The age of rename() is overSteve Loughran
 
What does Rename Do: (detailed version)
What does Rename Do: (detailed version)What does Rename Do: (detailed version)
What does Rename Do: (detailed version)Steve Loughran
 
Put is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit EditionPut is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit EditionSteve Loughran
 
@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!Steve Loughran
 
PUT is the new rename()
PUT is the new rename()PUT is the new rename()
PUT is the new rename()Steve Loughran
 
Extreme Programming Deployed
Extreme Programming DeployedExtreme Programming Deployed
Extreme Programming DeployedSteve Loughran
 
What does rename() do?
What does rename() do?What does rename() do?
What does rename() do?Steve Loughran
 
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and HiveDancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and HiveSteve Loughran
 
Apache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User GroupApache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User GroupSteve Loughran
 
Spark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSpark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSteve Loughran
 
Hadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object StoresHadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object StoresSteve Loughran
 
Apache Spark and Object Stores
Apache Spark and Object StoresApache Spark and Object Stores
Apache Spark and Object StoresSteve Loughran
 
Household INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony EraHousehold INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony EraSteve Loughran
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionSteve Loughran
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateSteve Loughran
 
Slider: Applications on YARN
Slider: Applications on YARNSlider: Applications on YARN
Slider: Applications on YARNSteve Loughran
 

Plus de Steve Loughran (20)

Hadoop Vectored IO
Hadoop Vectored IOHadoop Vectored IO
Hadoop Vectored IO
 
The age of rename() is over
The age of rename() is overThe age of rename() is over
The age of rename() is over
 
What does Rename Do: (detailed version)
What does Rename Do: (detailed version)What does Rename Do: (detailed version)
What does Rename Do: (detailed version)
 
Put is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit EditionPut is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit Edition
 
@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!
 
PUT is the new rename()
PUT is the new rename()PUT is the new rename()
PUT is the new rename()
 
Extreme Programming Deployed
Extreme Programming DeployedExtreme Programming Deployed
Extreme Programming Deployed
 
Testing
TestingTesting
Testing
 
I hate mocking
I hate mockingI hate mocking
I hate mocking
 
What does rename() do?
What does rename() do?What does rename() do?
What does rename() do?
 
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and HiveDancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
 
Apache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User GroupApache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User Group
 
Spark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSpark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object stores
 
Hadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object StoresHadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object Stores
 
Apache Spark and Object Stores
Apache Spark and Object StoresApache Spark and Object Stores
Apache Spark and Object Stores
 
Household INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony EraHousehold INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony Era
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the Gate
 
Slider: Applications on YARN
Slider: Applications on YARNSlider: Applications on YARN
Slider: Applications on YARN
 
YARN Services
YARN ServicesYARN Services
YARN Services
 

Dernier

"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 

Dernier (20)

"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 

Economic Scheduling of Hadoop Jobs

  • 1. Economic Scheduling of Hadoop Jobs -The Dynamic Priority MapReduce Scheduler Thomas Sandholm, HP Labs, Palo Alto Kevin Lai, HP Labs, Palo Alto 1
  • 2. The Problem » Allocate slots on compute nodes for job tasks » Classic Approach: Throughput optimization » Cross User Priorities inferred based on heuristics » Social Scheduling » Our Approach: User value optimization » Users are given an incentive to scale up or down » Automate demand conflict resolution 2
  • 3. Other Hadoop Schedulers » FIFO » HOD » Fairshare » Capacity » Designed for no queues or few static fixed QoS queues » Works well in corporate clusters 3
  • 4. Dynamic Priority Scheduler Requirements » Users may come and go frequently » Users may be unknown to providers » Users may want to schedule jobs across data centers and Hadoop installations manual, social scheduling of users (assumed to be cooperating) breaks down 4
  • 5. Architecture 4/(4+1.5+2)*15=8 5
  • 6. Our Solution: Automated Resource Allocation Budget Remaining Running Share Tasks Pending Spending Rate Tasks 6
  • 7. Proportional-Share Scheduling » qi = bi/(bi + p) » p = ∑ b-i » Huberman et al Spawn ‘92 » Waldspurger et al Lottery Scheduling ‘95 » Lai et al Tycoon ‘05 7
  • 8. Key Design Principles » Pay-per-use: spending rate is only deducted from budget if a job performed work » Work-conserving: users are never charged more than their spending rates but can get more slots if other users are idle » Preemptive: higher spending users may cause tasks from lower spending users to be killed » Scalable: No memory, or history-based fair-share smoothing 8
  • 9. Implementation » Standalone Hadoop MapReduce JobTracker Scheduler Plugin » HTTP/XML/REST Servlet to provide secure management and monitoring of queues » Generic queue allocation/accounting classes (could move into mapred core) » Pluggable scheduler enforcing shares, when scheduling jobs (could be replaced by capacity/fairshare enforcers) 9
  • 10. Configuration Option Examples mapred.jobtracker.taskScheduler org.apache.hadoop.mapred.DynamicPriorityScheduler mapred.priority-scheduler.kill-interval 0 mapred.dynamic-scheduler.alloc-interval 20 mapred.dynamic-scheduler.budget-file /etc/hadoop.budget mapred.priority-scheduler.acl-file /etc/hadoop.acl 10
  • 11. Experiment Fairshare vs Capacity vs FIFO vs DP 2-80 simulated users/queues 2 Clusters PiEstimator Simulation 11
  • 12. Budget Dynamics DynPrio preempt FIFO scheduler DynPrio no Capacity scheduler preempt Funding runs out Budget replenished 12
  • 13. Service Differentiation DynPrio FIFO 13
  • 15. More info » Papers › SIGMETRICS 2009 › Workshop on Job Scheduling for Parallel Processing (JSSPP’10) › International Conference on Cloud Computing and Virtualization (CCV’10) » HADOOP-4768 JIRA » Source: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/dynamic-scheduler/ 15