1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Scheduling Policies in YARN
Wangda Tan, Varun Vasudev
San Jose, June 2016
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Who we are
⬢ Wangda Tan
– Apache Hadoop PMC member
⬢ Varun Vasudev
– Apache Hadoop committer
3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
⬢ Existing scheduling in YARN
⬢ Adding resource types and resource profiles
⬢ Resource scheduling for services
⬢ GUTS (Grand Unified Theory of Scheduling) API
⬢ Q & A
4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Existing scheduling in YARN
5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Current resource types
⬢ Currently, scheduling is only supported based on memory and CPU
⬢ Depending on the resource calculator configured, the scheduler may or may not take CPU into account (see the config fragment below)
⬢ Most applications are unaware of the resources being used for scheduling
–Applications may not get the containers they expect due to a mismatch
⬢ No support for resources like GPU, disk, network
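For context, the choice of resource calculator is a single property in capacity-scheduler.xml. A minimal fragment (stock Hadoop class names) that switches from the memory-only default to dominant-resource scheduling, so CPU is considered too:

<!-- capacity-scheduler.xml -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <!-- the default, DefaultResourceCalculator, schedules on memory only -->
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>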
6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Locality for containers
⬢ Applications can request host or rack locality (see the sketch after this list)
–If the request can’t be satisfied within a certain number of scheduling opportunities, the container is allocated on the next node to heartbeat
–Good for MapReduce-type applications
⬢ Insufficient for services
–Services need support for affinity, anti-affinity, gang scheduling
–Need support for fallback strategies
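A minimal sketch of today’s locality request from an application master, using the stock Hadoop 2.x AMRMClient (host and rack names are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class LocalityRequestSketch {
  public static void main(String[] args) {
    AMRMClient<ContainerRequest> client = AMRMClient.createAMRMClient();
    client.init(new Configuration());
    client.start();
    // Prefer host1/rack1; relaxLocality=true means YARN may place the
    // container on any node once the delay-scheduling window expires.
    ContainerRequest req = new ContainerRequest(
        Resource.newInstance(2048, 2),        // 2 GB, 2 vcores
        new String[] {"host1.example.com"},   // preferred hosts
        new String[] {"/rack1"},              // preferred racks
        Priority.newInstance(1),
        true);                                // relaxLocality
    client.addContainerRequest(req);
  }
}

Note there is no way to say “try rack1 for 30 seconds, then rack2”: the fallback order is fixed, which is exactly the gap the rest of this talk addresses.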
7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Placement and capacity options
⬢ Node partitions (see the sketch after this list)
–End up partitioning the cluster – akin to sub-clusters
–Support for non-exclusive partitions is available
⬢ Reservations
–Let you plan for capacity in advance
–Help you guarantee capacity for high-priority, large jobs
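As a rough illustration of the partition workflow (label, host, and queue names are placeholders), a non-exclusive partition is created, attached to a node, and made accessible to a queue roughly like this:

# Create a non-exclusive partition and attach a node to it
yarn rmadmin -addToClusterNodeLabels "gpu(exclusive=false)"
yarn rmadmin -replaceLabelsOnNode "host1.example.com=gpu"

<!-- capacity-scheduler.xml: let a queue use the partition -->
<property>
  <name>yarn.scheduler.capacity.root.engineering.accessible-node-labels</name>
  <value>gpu</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.engineering.accessible-node-labels.gpu.capacity</name>
  <value>100</value>
</property>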
8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Resource types and resource profiles
9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Extending resource types in YARN
⬢ Add support for generalized resource types
⬢ Users can add and remove resource types from the scheduler via configuration (see the sketch after this list)
⬢ Allows users to experiment with resource types
–For resources like network, modeling is hard – should you use ops or bandwidth?
–No need to touch the code
⬢ Current work covers countable resource types
–Support for exclusive resource types (like ports) is future work
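A sketch of what configuration-driven resource types could look like; property names here are illustrative, since the format is still being worked out under YARN-3926:

<!-- resource-types.xml (illustrative sketch, not a final format) -->
<property>
  <name>yarn.resource-types</name>
  <value>yarn.io/gpu,yarn.io/disk-bandwidth</value>
</property>
<property>
  <!-- a countable resource with a unit; no code change needed to add it -->
  <name>yarn.resource-types.yarn.io/disk-bandwidth.units</name>
  <value>M</value>
</property>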
10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Resource profiles
⬢ Analogous to instance types in EC2
⬢ It is hard for users to reason about resources like disk bandwidth
–A profile is a collection of resource types
–Allows admins to define a set of profiles that users can use to request containers
–Users don’t need to worry about resource types like disk bandwidth
–New resource types can be added and removed without users needing to change their job submissions
⬢ Profiles are stored on the RM
–Users just pass the name of the profile they want (“small”, “medium”, “large”)
⬢ YARN-3926 is the umbrella JIRA for the feature
11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Resource profiles examples
resource-profiles.json (minimal):
{
  "minimum": {
    "yarn.io/memory": 1024,
    "yarn.io/cpu": 1
  },
  "maximum": {
    "yarn.io/memory": 8192,
    "yarn.io/cpu": 8
  },
  "default": {
    "yarn.io/memory": 2048,
    "yarn.io/cpu": 2
  }
}

resource-profiles.json (with custom profiles):
{
  "minimum": {
    "yarn.io/memory": 1024,
    "yarn.io/cpu": 1
  },
  "maximum": {
    "yarn.io/memory": 8192,
    "yarn.io/cpu": 8
  },
  "default": {
    "yarn.io/memory": 2048,
    "yarn.io/cpu": 2
  },
  "small": {
    "yarn.io/memory": 1024,
    "yarn.io/cpu": 1
  },
  "medium": {
    "yarn.io/memory": 3072,
    "yarn.io/cpu": 3
  },
  "large": {
    "yarn.io/memory": 8192,
    "yarn.io/cpu": 8
  }
}
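Conceptually, the application master then names only a profile and the RM expands it server-side. A hypothetical request shape (not a final API) might be:

// AM-side request (hypothetical shape):
{
  "priority": 1,
  "num_containers": 10,
  "profile": "medium"  // the RM expands this to { "yarn.io/memory": 3072, "yarn.io/cpu": 3 }
}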
12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Resource Scheduling for Services
13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Affinity and Anti-affinity – Overview
⬢ Anti-Affinity
–Some services don’t want their daemons running on the same host/rack, for better fault recovery or performance.
–For example, don’t run more than one HBase region server in the same fault zone.
14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Affinity and Anti-affinity – Overview
⬢ Affinity
–Some services want to run their daemons close to each other for performance.
–For example, run Storm workers as close as possible for better data-exchange performance. (SW = Storm Worker)
15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Affinity and Anti-affinity – Requirements
⬢ Requirements
–Be able to specify affinity/anti-affinity within and across applications
•Intra-application
•Inter-application
•Example of inter-application anti-affinity
–Hard and soft affinity/anti-affinity
•Hard: reject allocations that don’t match the constraint.
•Soft: best effort.
•Example of inter-application soft anti-affinity
16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Affinity and Anti-affinity
⬢ YARN-1042 is the umbrella JIRA
⬢ Demo
17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Affinity/Anti-affinity Demo
18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Container Resizing – Overview
⬢ Use cases
–Services can change the size of a running container as workload changes.
–For example, running HBase region servers can return excess resources to the RM when workload drops, improving utilization.
⬢ Before this feature
–The application has to ask YARN for a new container of a different size.
–Any state held in task memory is lost.
⬢ Status
–An alpha version of the feature will be included in Hadoop 2.8 (see the sketch below)
–YARN-1197 is the umbrella JIRA
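A rough sketch of the resize call with the Hadoop 2.8 alpha AMRMClient; the exact method name below reflects the alpha and may change in later releases:

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class ResizeSketch {
  // Grow an already-running container to 4 GB / 4 vcores without
  // restarting the process inside it.
  static void grow(AMRMClient<ContainerRequest> client, Container running) {
    client.requestContainerResourceChange(running, Resource.newInstance(4096, 4));
    // The RM confirms the change in a later allocate() response; the AM
    // then tells the NM to apply the new limits to the running container.
  }
}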
19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
GUTS (Grand Unified Theory of Scheduling) API
20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Requirements
⬢ We have more and more new scheduling requirements:
–Scheduling fallbacks
•Try plan A first; fall back to plan B if plan A cannot be satisfied within X seconds.
•Currently YARN supports only one scheduling fallback (node/rack/off-switch via delay scheduling), and users cannot specify the order of fallbacks.
–Affinity / anti-affinity
21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Requirements
–Node partitions
•Already supported by YARN-796; partitions divide a big cluster into several smaller clusters according to hardware and purpose, and capacities and ACLs can be specified per partition.
–Node constraints
•A way to tag nodes without complexities like ACLs/capacity configuration (YARN-3409).
22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Requirements
–Gang scheduling
•Give me N containers at once, or nothing.
–Resource reservation
•Give me resources at time T. This has been supported since YARN-1051 (Hadoop 2.6); we need to consider unifying the APIs.
–Combinations of the above
•Gang scheduling + anti-affinity: give me 10 containers at once, but avoid nodes that have containers from application X.
•Scheduling fallbacks + node partition: give me 10 containers from partition X; if I cannot get them within 5 minutes, any host is fine.
23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Problems with the existing ResourceRequest API
⬢ The existing ResourceRequest API is not extensible
–Cannot specify relationships between ResourceRequests
–Fragmentation of resource request APIs (see the sketch after this list)
•We have ResourceRequest (what I want now), BlacklistRequest (what I dislike), and ReservationRequest (what I want in the future) APIs for different purposes.
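A minimal sketch of that fragmentation in today’s client API (host names are placeholders): three related intents, three unrelated request types:

import java.util.Arrays;
import java.util.Collections;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceBlacklistRequest;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class FragmentationSketch {
  public static void main(String[] args) {
    // "What I want now": 10 containers of 1 GB / 1 vcore, anywhere
    ResourceRequest want = ResourceRequest.newInstance(
        Priority.newInstance(1), ResourceRequest.ANY,
        Resource.newInstance(1024, 1), 10);
    // "What I dislike": keep my containers off this host
    ResourceBlacklistRequest dislike = ResourceBlacklistRequest.newInstance(
        Arrays.asList("badhost.example.com"), Collections.<String>emptyList());
    // "What I want in the future" goes through yet another path entirely:
    // ReservationSubmissionRequest via the reservation system (YARN-1051).
    System.out.println(want + " / " + dislike);
  }
}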
24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Proposal
⬢ We need a unified API to specify resource requirements; the following requirements will be considered:
–Allocation tag
•Tag the purpose of the allocated container (like Hbase_regionserver)
–Quantities of the request
•Total number of containers
•Minimum concurrency (give me at least N containers at once)
•Maximum concurrency (don’t give me more than N containers at once)
–Relationships between placement requests
•And/Or/Not: give me resources according to the specified conditions
•Order and delay of fallbacks: try to allocate request #1 first, fall back to request #2 after waiting X seconds
–Time
•Give me resources between [T1, T2]
25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
In simple words …
⬢ Applications can use one unified API to request resources with different constraints/conditions.
⬢ It is easier to understand, and combinations of resource requests can be supported.
⬢ Let’s see some examples:
26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Examples:
⬢ Gang scheduling: I want 8 containers allocated to me at once.
⬢ Reservation + anti-affinity: give me 5 containers tomorrow, and not on the same hosts as application_..._0005.

"12345": { // allocation_id
  // Other fields..
  // Quantity conditions
  allocation_size: 2G,
  maximum_allocations: 8,
  minimum_concurrency: 8,
}

"12345": { // allocation_id
  allocation_size: 1G,
  maximum_allocations: 5,
  placement_strategy: {
    NOT {
      // do not place me with this application
      target_app_id: application_123456789_0015
    }
  },
  time_conditions: {
    allocation_start_time: [ 10:50 pm tomorrow - * ]
  }
}
27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Examples:
⬢ Request with fallbacks: try to allocate on the GPU partition first, then fall back to any host after 5 minutes.

"567890": { // allocation_id
  allocation_size: 2G,
  maximum_allocations: 10,
  placement_strategy: {
    ORDERED_OR [
      {
        node_partition: GPU,
        delay_to_next: 5 min
      },
      {
        host: *
      }
    ]
  }
}
28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Status & Plan
⬢ Working on the API definition to make sure it covers all target scenarios.
⬢ Will start a POC soon.
⬢ This is intended to replace the existing ResourceRequest API; the old API will be kept and automatically converted to the new request format (old applications will not be affected).
⬢ If you want more details, please take a look at the design doc and the discussion on YARN-4902.
29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Q & A
⬢ Thank you!