Scheduling Policies in YARN
- 1. 1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Scheduling Policies in
YARN
Wangda Tan, Varun Vasudev
San Jose, June 2016
- 2.
Who we are
⬢ Wangda Tan
– Apache Hadoop PMC member
⬢ Varun Vasudev
– Apache Hadoop committer
- 3.
Agenda
⬢ Existing scheduling in YARN
⬢ Adding resource types and resource profiles
⬢ Resource scheduling for services
⬢ GUTS(Grand Unified Theory of Scheduling) API
⬢ Q & A
- 4.
Existing scheduling in YARN
- 5.
Current resource types
⬢ Scheduling is currently supported only for memory and CPU
⬢ Depending on the configured resource calculator, the scheduler may or may not take CPU into account
⬢ Most applications are unaware of which resources are used for scheduling
–Applications may not get the containers they expect due to this mismatch
⬢ No support for resources like GPU, disk, network
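The difference the calculator makes can be sketched as follows. This is a hedged Python illustration: the function names mirror YARN's real DefaultResourceCalculator and DominantResourceCalculator classes, but the logic is simplified.

```python
# Sketch: how the two resource calculators order requests differently.
# Simplified and illustrative, not the actual Java implementation.

def default_compare(cluster, a, b):
    """DefaultResourceCalculator: only memory is considered."""
    return a["memory"] - b["memory"]

def dominant_compare(cluster, a, b):
    """DominantResourceCalculator: compare by each request's dominant
    share of the cluster, across both memory and cpu."""
    def dominant_share(r):
        return max(r["memory"] / cluster["memory"],
                   r["cpu"] / cluster["cpu"])
    return dominant_share(a) - dominant_share(b)

cluster = {"memory": 100 * 1024, "cpu": 100}  # 100 GB, 100 vcores
a = {"memory": 4096, "cpu": 1}  # memory-heavy request
b = {"memory": 1024, "cpu": 8}  # cpu-heavy request

# Memory-only comparison says a is larger; DRF says b is, because b's
# dominant share (8% of cluster vcores) exceeds a's (4% of memory).
```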
- 6.
Locality for containers
⬢ Applications can request host or rack locality
–If the request can't be satisfied within a certain number of tries, the container is allocated on the next node to heartbeat
–Good for MapReduce-type applications
⬢ Insufficient for services
–Services need support for affinity, anti-affinity, gang scheduling
–Need support for fallback strategies
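The "certain number of tries" behavior above is delay scheduling. A hedged sketch of the idea, in Python: the threshold mirrors the real `yarn.scheduler.capacity.node-locality-delay` setting, but the 2x rule for relaxing to off-switch is a simplification for illustration.

```python
# Sketch of delay scheduling: after enough missed scheduling
# opportunities, the allowed locality relaxes from node to rack
# to off-switch (any node). Illustrative only.

NODE_LOCALITY_DELAY = 40  # missed opportunities before relaxing

def allowed_locality(missed_opportunities):
    """Return the loosest locality the scheduler may use right now."""
    if missed_opportunities < NODE_LOCALITY_DELAY:
        return "NODE_LOCAL"
    if missed_opportunities < 2 * NODE_LOCALITY_DELAY:
        return "RACK_LOCAL"
    return "OFF_SWITCH"
```

Note there is only this one fixed fallback chain; applications cannot express "try X, then fall back to Y", which is part of what motivates the GUTS API later in the deck.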
- 7.
Placement and capacity options
⬢ Node partitions
–End up partitioning the cluster – akin to sub-clusters
– Support for non-exclusive partitions is available
⬢ Reservations
–Let you plan for capacity in advance
–Help you guarantee capacity for high-priority, large jobs
- 8.
Resource types and resource profiles
- 9.
Extending resource types in YARN
⬢ Add support for generalized resource types
⬢ Users can use configuration to add and remove resource types from the scheduler
⬢ Allows users to experiment with resource types
–For resources like network, modeling is hard - should you use ops or bandwidth?
–No need to touch the code
⬢ Current work is for countable resource types
–Support for exclusive resource types (like ports) is future work
- 10.
Resource profiles
⬢ Analogous to instance types in EC2
⬢ It's hard for users to conceptualize resources like disk bandwidth
⬢ A profile is a collection of resource types
–Allows admins to define a set of profiles that users can use to request containers
–Users don't need to worry about resource types like disk bandwidth
–New resource types can be added and removed without users needing to change their job submissions
⬢ Profiles are stored on the RM
–Users just pass the name of the profile they want ("small", "medium", "large")
⬢ YARN-3926 is the umbrella jira for the feature
- 11.
Resource profiles examples
resource-profiles.json
{
  "minimum": {
    "yarn.io/memory": 1024,
    "yarn.io/cpu": 1
  },
  "maximum": {
    "yarn.io/memory": 8192,
    "yarn.io/cpu": 8
  },
  "default": {
    "yarn.io/memory": 2048,
    "yarn.io/cpu": 2
  }
}
resource-profiles.json
{
  "minimum": {
    "yarn.io/memory": 1024,
    "yarn.io/cpu": 1
  },
  "maximum": {
    "yarn.io/memory": 8192,
    "yarn.io/cpu": 8
  },
  "default": {
    "yarn.io/memory": 2048,
    "yarn.io/cpu": 2
  },
  "small": {
    "yarn.io/memory": 1024,
    "yarn.io/cpu": 1
  },
  "medium": {
    "yarn.io/memory": 3072,
    "yarn.io/cpu": 3
  },
  "large": {
    "yarn.io/memory": 8192,
    "yarn.io/cpu": 8
  }
}
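On the RM side, expanding a profile name into a concrete resource vector could look like the following hedged Python sketch (`resolve_profile` is a hypothetical helper, not actual YARN code; the profile data matches the JSON above):

```python
import json

# Profiles as the RM might load them from resource-profiles.json
# (same contents as the example above).
PROFILES = json.loads("""
{
  "minimum": {"yarn.io/memory": 1024, "yarn.io/cpu": 1},
  "maximum": {"yarn.io/memory": 8192, "yarn.io/cpu": 8},
  "default": {"yarn.io/memory": 2048, "yarn.io/cpu": 2},
  "small":   {"yarn.io/memory": 1024, "yarn.io/cpu": 1},
  "medium":  {"yarn.io/memory": 3072, "yarn.io/cpu": 3},
  "large":   {"yarn.io/memory": 8192, "yarn.io/cpu": 8}
}
""")

def resolve_profile(name="default"):
    """The user sends only the profile name; the RM expands it into
    the full resource vector, so new resource types can be added to
    profiles without changing job submissions."""
    if name not in PROFILES:
        raise ValueError("unknown profile: " + name)
    return PROFILES[name]
```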
- 12.
Resource Scheduling for Services
- 13.
Affinity and Anti-affinity
⬢ Anti-Affinity
–Some services don't want their daemons running on the same host/rack, for better fault recovery or performance.
–For example, don't run more than one HBase region server in the same fault zone.
- 14.
Affinity and Anti-affinity
⬢ Affinity
–Some services want to run their daemons close to each other for performance.
–For example, run Storm workers as close together as possible for better data-exchange performance. (SW = Storm Worker)
- 15.
Affinity and Anti-affinity
⬢ Requirements
–Be able to specify affinity/anti-affinity for intra/inter application(s)
•Intra-application
•Inter-application
•Example of inter-application anti-affinity
–Hard and soft affinity/anti-affinity
•Hard: reject resources that don't satisfy the constraint.
•Soft: best effort
•Example of inter-application soft anti-affinity
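The hard/soft distinction can be sketched in a few lines of hedged Python (hypothetical placement helper, not YARN code): hard anti-affinity rejects the allocation rather than violate the constraint, soft anti-affinity only prefers conforming nodes.

```python
# Sketch of hard vs. soft anti-affinity placement (illustrative).

def place(nodes, conflicts, hard=True):
    """nodes: candidate host names, in scheduler preference order.
    conflicts: hosts that already run a container we are anti-affine to."""
    clean = [n for n in nodes if n not in conflicts]
    if clean:
        return clean[0]
    # Hard anti-affinity: reject rather than violate the constraint.
    # Soft anti-affinity: best effort, accept a conflicting host.
    return None if hard else nodes[0]
```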
- 16.
Affinity and Anti-affinity
⬢ YARN-1042 is the umbrella JIRA
⬢ Demo
- 18.
Container Resizing
⬢ Use cases
–Services can modify the size of their running containers according to workload changes.
–For example, when an HBase region server's workload drops, it can return excess resources to the RM to improve utilization.
⬢ Before this feature
–The application has to release the container and ask YARN for a new one with a different size.
–State held in task memory is lost.
⬢ Status
–An alpha version of the feature will be included in Hadoop 2.8
–YARN-1197 is the umbrella JIRA
- 19.
GUTS (Grand Unified Theory of Scheduling) API
- 20.
Requirements
⬢ We have more and more new scheduling requirements:
–Scheduling fallbacks
•Try plan-A first, fall back to plan-B if plan-A cannot be satisfied in X secs.
•Currently YARN supports only one kind of scheduling fallback: node/rack/off-switch via delay scheduling, and users cannot specify the order of fallbacks.
–Affinity / Anti-affinity
- 21.
Requirements
–Node partitions
•Already supported by YARN-796, which can divide a big cluster into several smaller clusters according to hardware and purpose; we can specify capacities and ACLs for node partitions.
–Node constraints
•A way to tag nodes without complexities like ACLs/capacity configurations. (YARN-3409)
- 22.
Requirements
–Gang scheduling
•Give me N containers at once, or nothing.
–Resource reservation
•Give me resources at time T. This has been supported since YARN-1051 (Hadoop 2.6); we need to consider unifying the APIs.
–Combination of the above
•Gang scheduling + anti-affinity: give me 10 containers at once, but avoid nodes which have containers from application-X.
•Scheduling fallbacks + node partition: give me 10 containers from partition X; if I cannot get them within 5 mins, any hosts are fine.
- 23.
Problems of existing ResourceRequest API
⬢ The existing ResourceRequest API is not extensible
–Cannot specify relationships between ResourceRequests
–Fragmentation of resource request APIs
•We have ResourceRequest (what I want now), BlacklistRequest (what I dislike), and ReservationRequest (what I want in the future) APIs for different purposes.
- 24.
Proposal
⬢ We need a unified API to specify resource requirements; the following requirements will be considered:
–Allocation tag
•Tag the purpose of the allocated container (like Hbase_regionserver)
–Quantities of the request
•Total number of containers
•Minimum concurrency (give me at least N containers at once)
•Maximum concurrency (don't give me more than N containers at once)
–Relationships between placement requests
•And/Or/Not: give me resources according to specified conditions
•Order and delay of fallbacks: try to allocate request #1 first, fall back to request #2 after waiting X seconds
–Time
•Give me resources between [T1, T2]
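The And/Or/Not relationships above could be evaluated against a candidate node roughly as in this hedged Python sketch. All names here are hypothetical; the actual proposal lives in YARN-4902.

```python
# Sketch: evaluating composable placement conditions against a node.
# A strategy is a (operator, argument) pair; leaves test node facts.

def matches(strategy, node):
    """node: dict like {"partition": "GPU", "apps": {"app_0015"}}."""
    op, arg = strategy
    if op == "NOT":
        return not matches(arg, node)
    if op == "AND":
        return all(matches(s, node) for s in arg)
    if op == "OR":
        return any(matches(s, node) for s in arg)
    if op == "node_partition":
        return node.get("partition") == arg
    if op == "target_app_id":
        # true if the node runs a container of the given application
        return arg in node.get("apps", set())
    raise ValueError("unknown condition: " + op)

# Anti-affinity to app_0015, restricted to the GPU partition:
strategy = ("AND", [("node_partition", "GPU"),
                    ("NOT", ("target_app_id", "app_0015"))])
```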
- 25.
In simple words …
⬢ Applications can use a unified API to request resources with different constraints/conditions.
⬢ Easier to understand, and combinations of resource requests can be supported.
⬢ Let’s see some examples:
- 26.
Examples:
⬢ Gang scheduling: I want 8 containers allocated to me at once.
⬢ Reservation + anti-affinity: give me 5 containers tomorrow, and not on the same hosts as application_..._0005.
"12345": { // allocation_id
  // Other fields...
  // Quantity conditions
  allocation_size: 2G,
  maximum_allocations: 8,
  minimum_concurrency: 8,
}
"12345": { // allocation_id
  allocation_size: 1G,
  maximum_allocations: 5,
  placement_strategy: {
    NOT {
      // do not place me with this application
      target_app_id: application_123456789_0015
    }
  },
  time_conditions: {
    allocation_start_time: [10:50 pm tomorrow - *]
  }
}
- 27.
Examples:
⬢ Request with fallbacks: try to allocate on the GPU partition first, then fall back to any hosts after 5 mins.
"567890": { // allocation_id
  allocation_size: 2G,
  maximum_allocations: 10,
  placement_strategy: {
    ORDERED_OR [
      {
        node_partition: GPU,
        delay_to_next: 5 min
      },
      {
        host: *
      }
    ]
  }
}
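One plausible reading of ORDERED_OR semantics, as a hedged Python sketch (hypothetical, since the API is still being defined): only the first strategy is active until its `delay_to_next` budget expires, then the next one also becomes eligible.

```python
# Sketch: which fallback strategies the scheduler may currently try,
# given how long the request has been waiting. Illustrative only.

def active_strategies(strategies, waited_secs):
    """strategies: ordered list of dicts, each optionally carrying
    a "delay_to_next" in seconds before the next one unlocks."""
    active, budget = [], 0
    for s in strategies:
        active.append(s)
        budget += s.get("delay_to_next", float("inf"))
        if waited_secs < budget:
            break  # later strategies are not unlocked yet
    return active

# GPU partition first; any host after 5 minutes (300 s).
strategies = [{"node_partition": "GPU", "delay_to_next": 300},
              {"host": "*"}]
```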
- 28.
Status & Plan
⬢ Working on the API definition to make sure it covers all target scenarios.
⬢ Will start a POC soon.
⬢ This is intended to replace the existing ResourceRequest API; the old API will be kept and automatically converted to the new request format (existing applications will not be affected).
⬢ If you want more details, please take a look at the design doc and discussions on YARN-4902.