SlideShare a Scribd company logo
1 of 38
Download to read offline
1Document Title
YARN 3.1 and Beyond !
Naganarasimha
ApacheCon
2Document Title
Introduction
Naganarasimha GR
 Apache Hadoop PMC
 Contributing since 5 years
 Senior BigData Architect @ Standard Chartered Bank SG
 Contributed in key YARN features
o Node attributes
o ATS V2
3Document Title
4Document Title
• Introduction
• Present
• Upcoming
5Document Title
Introduction
6Document Title
A brief timeline of Hadoop Releases:
2.9.0: Nov’17
• YARN Federation
• Opportunistic
Container
Backported from 3.0 :
• New YARN UI
• Timeline V2
• Global Scheduling
• Multiple Resource
types
• New YARN UI
• Timeline service V2
• GPU/FPGA
• YARN Native Service
• Placement
Constraints
• Node Attributes
• Multi Node
Scheduling
• Hadoop Submarine
Project
• Support service
upgrade
3.0.0: Dec’17 3.1.0: Apr’18 3.2.0: Sep’18
7Document Title
Community Update: JIRAs in 3.1.0
8Document Title
Community Update: Source code changes
9Document Title
Categorization:
Topics
User Centric
Right Node Selection
Admin Centric
Ease of use
Right resource to select
Faster Allocation
Ease of monitoring
Resource Isolation
Ease of Configuration
10Document Title
YARN Overview :
11Document Title
Apache Hadoop 3.1
12Document Title
Moving towards Global & Fast Scheduling
YARN-5139
• Problems
 Current design of one-node-at-a-time allocation cycle can lead to suboptimal
decisions.
 Several coarse grained locks.
• With this, we improved to
 Look at several nodes at a time
 Fine grained locks
 Multiple allocator threads
 YARN scheduler can allocate 3k+ containers per second ≈ 10 mil allocations / hour!
 10X throughput gains
 Much better placement decisions
Admin Centric
13Document Title
Traditional scheduling
Root
(memory=60G, vcores=40, gpu=10)
Compliance
(min=10%, max =30%)
Retail
(min=40%, max =50%)
Engineering
(min=50%, max =100%)
Compliance1
(min=50%, max =50%)
Compliance2
(min=50%, max =50%)
Retail1
(min=50%, max =50%)
Retail2
(min=50%, max =50%)
Engineering1
(min=50%, max =50%)
Enginering2
(min=50%, max =50%)
14Document Title
Global Scheduling explained
15Document Title
Better placement strategies (YARN-6592)
• Past
 Supported constraints in form of Node Locality.
• Now YARN can support a lot more use cases
 Co-locate the allocations of a job on the same rack (affinity)
 Spread allocations across machines (anti-affinity) to minimize resource
interference
 Allow up to a specific number of allocations in a node group (cardinality)
User Centric
16Document Title
Better placement strategies (YARN-6592)
• Affinity
• Anti-affinity
17Document Title
Absolute Resources Configuration in CS – YARN-5881
• Gives ability to configure Queue resources as below
<memory=24GB, vcores=20, yarn.io/gpu=2>
• Enables admins to assign different quotas of different resource-types
• No more “Single percentage value limitation for all resource-types”
Root
(memory=60G, vcores=40, gpu=10)
Compliance
(memory=60G, vcores=40, gpu=10)
Retail
(memory=60G, vcores=40, gpu=0)
Engineering
(memory=20G, vcores=40, gpu=10)
Admin Centric
18Document Title
Auto Creation of Leaf Queues - YARN-7117
• Easily map a queue explicitly to user or group with out additional configs
• For e.g, User X comes in, automatically create a queue for user X
with a templated capacity requirements
• Auto created Queues will be
• created runtime based on user mapping
• cleaned up after use
• adhering to ACLs
Admin Centric
19Document Title
Usability : UI 1/2 Admin Centric
20Document Title
Usability : UI 2/2
21Document Title
Timeline Service 2.0 Improvements
Understanding and Monitoring a Hadoop cluster
itself is a BigData problem
• Using HBase as backend for better
scalability for read/write
• More robust storage fault tolerance
• Migration and compatibility with v.1.5
User Centric Admin Centric
22Document Title
Resource profiles and custom resource types
• YARN supported only Memory and CPU
• Now
• A generalized vector for all resources
• Admin could add arbitrary resource
types!
Ease of resource requesting model
using profiles for apps
User Centric Admin Centric
23Document Title
GPU support on YARN
• Why?
 No need to setup separate clusters
 Leverage shared compute!
• Why need isolation?
 Multiple processes use the single GPU will be:
 Serialized.
 Cause OOM easily.
 GPU isolation on YARN:
 Granularity is for per-GPU device.
 Use cgroups / docker to enforce isolation.
User Centric
24Document Title
FPGA on YARN
• FPGA isolation on YARN:
 Granularity is for per-FPGA device
 Use Cgroups to enforce the isolation
• Currently, only Intel OpenCL SDK for FPGA is supported.
• Implementation is extensible to other FPGA SDK.
User Centric
25Document Title
Services support in YARN
• A native YARN services framework (YARN-5079)
 Native Yarn support to Services
 Apache Slider retired from Incubator – lessons and key code carried over to
YARN
• Simplified discovery of services via DNS mechanisms: YARN-4757
 regionserver-0.hbase-app-3.hadoop.yarn.site
 Application & Services upgrades: YARN-4726
 “Do an upgrade of my HBase app with minimal impact to end-users”.
User Centric
26Document Title
Simplified APIs for service definitions
• Applications need simple APIs
• Need to be deployable “easily”
• Simple REST API layer (YARN-4793)
• Spawn services & Manage them
User Centric
27Document Title
How to run a new service in YARN ?
28Document Title
Apache Hadoop 3.2
29Document Title
Node Attributes (YARN-3409)
• “Take me to a node with JDK 10”
• Node Partition vs. Node Attribute
• Node Partition
 One partition for one node
 ACL
 Shares between queues
 Pre emption enforced
• Attribute
 For Container placement
 No ACL’s and Shares
 First come first serve
User Centric
30Document Title
Node Attributes (YARN-3409)
• Distributed Node Attributes
 NM can detect its attributes
 Script based and Config based
detection.
 Attribute prefix : yarn.nm.io
• Centralised Node Attributes
 Configured through CLI and REST
 Admin ACL’s to configure
 Attribute prefix : yarn.rm.io
User Centric
31Document Title
Node Attributes (YARN-3409)
Use cases :
 Hardware Constraints : To identify specific kind of resources like
 GPU, FPGA,
 SSD, # of disks,
 InfiniBand,
 (dual) network cards,
 Task Constraints: Task or container specific constraints like
 To run on specific Operating system versions.
 Processor architecture
 software library versions
 Experimental : Based on dynamic attributes like
 Load average,
 disk usage
 Network
•
User Centric
32Document Title
Container overcommit (YARN-1011)
• Every user says “Give me 16GB for my task”, even though it’s only needed at peak
• Each node has some allocated but unutilized capacity. Use such capacity to run
opportunistic tasks
• Preempt such tasks when needed
Admin Centric
33Document Title
Auto-spawning of system services (YARN-8048)
• “Start this service when YARN starts”
• “initd for YARN”
• Services are started during the yarn
bootstrap
 For example YARN ATSv2 needs Hbase,
so Hbase is system service of YARN.
 Only Admin can configure
 Started along with ResourceManager
 Place spec files under
yarn.service.systemservice.dir FS path
Admin Centric
34Document Title
TensorFlow on YARN (YARN-8220)
• Run deep learning workloads on the same cluster as analytics, stream processing etc!
• Integrated with latest TensorFlow 1.8 and has GPU support
 Use simple command to run TensorFlow app by using Native Service spec file
(Yarnfile)
yarn app -launch distributed-tf <path-to-saved-yarnfile>
 A simple python command line utility also could be used to auto-create Yarnfile
python submit_tf_job.py
--remote_conf_path hdfs:///tf-job-conf
--input_spec example_tf_job_spec.json
--docker_image gpu.cuda_9.0.tf_1.8.0
--job_name distributed-tf-gpu
--user tf-user
--domain tensorflow.site
--distributed --kerberos
35Document Title
TensorFlow on YARN (YARN-8220)
• Run deep learning workloads on the same cluster as analytics, stream processing etc!
• Integrated with latest TensorFlow 1.8 and has GPU support
 Use simple command to run TensorFlow app by using Native Service spec file
(Yarnfile)
yarn app -launch distributed-tf <path-to-saved-yarnfile>
 A simple python command line utility also could be used to auto-create Yarnfile
python submit_tf_job.py
--remote_conf_path hdfs:///tf-job-conf
--input_spec example_tf_job_spec.json
--docker_image gpu.cuda_9.0.tf_1.8.0
--job_name distributed-tf-gpu
--user tf-user
--domain tensorflow.site
--distributed --kerberos
36Document Title
TensorFlow on YARN (YARN-8220)
• Sample Yarnfile for TensorFlow job
User Centric
37Document Title
Other related talks :
• Deep learning on YARN - Running distributed Tensorflow / MXNet / Caffe /
XGBoost on Hadoop clusters
• Speakers : Wangda Tan
• Thursday, 27th Sep, 11:20
• Ballroom
• Running distributed TensorFlow in production: challenges and solutions on
YARN 3.0
• Speakers : Wangda Tan & Yanbo Liang
• Thursday, 27th Sep, 15:40
• Ballroom
38Document Title
Queries ?

More Related Content

What's hot

How to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop ClusterHow to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop ClusterAltoros
 
What's new in hadoop 3.0
What's new in hadoop 3.0What's new in hadoop 3.0
What's new in hadoop 3.0Heiko Loewe
 
From docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native wayFrom docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native wayDataWorks Summit
 
Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuningVitthal Gogate
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudDatabricks
 
Database backed coherence cache
Database backed coherence cacheDatabase backed coherence cache
Database backed coherence cachearagozin
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Kathleen Ting
 
Hazelcast Jet v0.4 - August 9, 2017
Hazelcast Jet v0.4 - August 9, 2017Hazelcast Jet v0.4 - August 9, 2017
Hazelcast Jet v0.4 - August 9, 2017Rahul Gupta
 
UberCloud HPC Experiment Introduction for Beginners
UberCloud HPC Experiment Introduction for BeginnersUberCloud HPC Experiment Introduction for Beginners
UberCloud HPC Experiment Introduction for Beginnershpcexperiment
 
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep LearningApache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep LearningDataWorks Summit
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideDouglas Bernardini
 
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...Spark Summit
 
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...DataWorks Summit
 
The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...
The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...
The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...spinningmatt
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?DataWorks Summit
 
Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges DataWorks Summit
 

What's hot (20)

Chicago spark meetup-april2017-public
Chicago spark meetup-april2017-publicChicago spark meetup-april2017-public
Chicago spark meetup-april2017-public
 
How to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop ClusterHow to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop Cluster
 
What's new in hadoop 3.0
What's new in hadoop 3.0What's new in hadoop 3.0
What's new in hadoop 3.0
 
From docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native wayFrom docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native way
 
Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuning
 
Tutorial Haddop 2.3
Tutorial Haddop 2.3Tutorial Haddop 2.3
Tutorial Haddop 2.3
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
 
Database backed coherence cache
Database backed coherence cacheDatabase backed coherence cache
Database backed coherence cache
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)
 
Hazelcast Jet v0.4 - August 9, 2017
Hazelcast Jet v0.4 - August 9, 2017Hazelcast Jet v0.4 - August 9, 2017
Hazelcast Jet v0.4 - August 9, 2017
 
UberCloud HPC Experiment Introduction for Beginners
UberCloud HPC Experiment Introduction for BeginnersUberCloud HPC Experiment Introduction for Beginners
UberCloud HPC Experiment Introduction for Beginners
 
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep LearningApache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config Guide
 
Hadoop and OpenStack
Hadoop and OpenStackHadoop and OpenStack
Hadoop and OpenStack
 
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
 
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
 
The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...
The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...
The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?
 
Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges
 

Similar to Yarn 3.1 and Beyond in ApacheCon North America 2018

Oracle db architecture
Oracle db architectureOracle db architecture
Oracle db architectureSimon Huang
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateDataWorks Summit
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyPeter Clapham
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...DataWorks Summit/Hadoop Summit
 
System Device Tree and Lopper: Concrete Examples - ELC NA 2022
System Device Tree and Lopper: Concrete Examples - ELC NA 2022System Device Tree and Lopper: Concrete Examples - ELC NA 2022
System Device Tree and Lopper: Concrete Examples - ELC NA 2022Stefano Stabellini
 
Spark and Deep Learning frameworks with distributed workloads
Spark and Deep Learning frameworks with distributed workloadsSpark and Deep Learning frameworks with distributed workloads
Spark and Deep Learning frameworks with distributed workloadsS N
 
Lessons Learned From Running Spark On Docker
Lessons Learned From Running Spark On DockerLessons Learned From Running Spark On Docker
Lessons Learned From Running Spark On DockerSpark Summit
 
virtualization-vs-containerization-paas
virtualization-vs-containerization-paasvirtualization-vs-containerization-paas
virtualization-vs-containerization-paasrajdeep
 
Container & kubernetes
Container & kubernetesContainer & kubernetes
Container & kubernetesTed Jung
 
Design installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttuDesign installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttuAlan Sill
 
Ippevent : openshift Introduction
Ippevent : openshift IntroductionIppevent : openshift Introduction
Ippevent : openshift Introductionkanedafromparis
 
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Databricks
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...Chester Chen
 
Dragonflow Austin Summit Talk
Dragonflow Austin Summit Talk Dragonflow Austin Summit Talk
Dragonflow Austin Summit Talk Eran Gampel
 
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-PremiseTackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-PremiseDatabricks
 
Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Community
 
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...OpenStack
 
Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...
Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...
Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...Databricks
 

Similar to Yarn 3.1 and Beyond in ApacheCon North America 2018 (20)

Oracle db architecture
Oracle db architectureOracle db architecture
Oracle db architecture
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Spark on YARN
Spark on YARNSpark on YARN
Spark on YARN
 
System Device Tree and Lopper: Concrete Examples - ELC NA 2022
System Device Tree and Lopper: Concrete Examples - ELC NA 2022System Device Tree and Lopper: Concrete Examples - ELC NA 2022
System Device Tree and Lopper: Concrete Examples - ELC NA 2022
 
Spark and Deep Learning frameworks with distributed workloads
Spark and Deep Learning frameworks with distributed workloadsSpark and Deep Learning frameworks with distributed workloads
Spark and Deep Learning frameworks with distributed workloads
 
Lessons Learned From Running Spark On Docker
Lessons Learned From Running Spark On DockerLessons Learned From Running Spark On Docker
Lessons Learned From Running Spark On Docker
 
GIST AI-X Computing Cluster
GIST AI-X Computing ClusterGIST AI-X Computing Cluster
GIST AI-X Computing Cluster
 
virtualization-vs-containerization-paas
virtualization-vs-containerization-paasvirtualization-vs-containerization-paas
virtualization-vs-containerization-paas
 
Container & kubernetes
Container & kubernetesContainer & kubernetes
Container & kubernetes
 
Design installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttuDesign installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttu
 
Ippevent : openshift Introduction
Ippevent : openshift IntroductionIppevent : openshift Introduction
Ippevent : openshift Introduction
 
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
 
Dragonflow Austin Summit Talk
Dragonflow Austin Summit Talk Dragonflow Austin Summit Talk
Dragonflow Austin Summit Talk
 
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-PremiseTackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
 
Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK
 
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
 
Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...
Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...
Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...
 

Recently uploaded

University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringmulugeta48
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756dollysharma2066
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLManishPatel169454
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...SUHANI PANDEY
 

Recently uploaded (20)

(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 

Yarn 3.1 and Beyond in ApacheCon North America 2018

  • 1. 1Document Title YARN 3.1 and Beyond ! Naganarasimha ApacheCon
  • 2. 2Document Title Introduction Naganarasimha GR  Apache Hadoop PMC  Contributing since 5 years  Senior BigData Architect @ Standard Chartered Bank SG  Contributed in key YARN features o Node attributes o ATS V2
  • 6. 6Document Title A brief timeline of Hadoop Releases: 2.9.0: Nov’17 • YARN Federation • Opportunistic Container Backported from 3.0 : • New YARN UI • Timeline V2 • Global Scheduling • Multiple Resource types • New YARN UI • Timeline service V2 • GPU/FPGA • YARN Native Service • Placement Constraints • Node Attributes • Multi Node Scheduling • Hadoop Submarine Project • Support service upgrade 3.0.0: Dec’17 3.1.0: Apr’18 3.2.0: Sep’18
  • 8. 8Document Title Community Update: Source code changes
  • 9. 9Document Title Categorization: Topics User Centric Right Node Selection Admin Centric Ease of use Right resource to select Faster Allocation Ease of monitoring Resource Isolation Ease of Configuration
  • 12. 12Document Title Moving towards Global & Fast Scheduling YARN-5139 • Problems  Current design of one-node-at-a-time allocation cycle can lead to suboptimal decisions.  Several coarse grained locks. • With this, we improved to  Look at several nodes at a time  Fine grained locks  Multiple allocator threads  YARN scheduler can allocate 3k+ containers per second ≈ 10 mil allocations / hour!  10X throughput gains  Much better placement decisions Admin Centric
  • 13. 13Document Title Traditional scheduling Root (memory=60G, vcores=40, gpu=10) Compliance (min=10%, max =30%) Retail (min=40%, max =50%) Engineering (min=50%, max =100%) Compliance1 (min=50%, max =50%) Compliance2 (min=50%, max =50%) Retail1 (min=50%, max =50%) Retail2 (min=50%, max =50%) Engineering1 (min=50%, max =50%) Enginering2 (min=50%, max =50%)
  • 15. 15Document Title Better placement strategies (YARN-6592) • Past  Supported constraints in form of Node Locality. • Now YARN can support a lot more use cases  Co-locate the allocations of a job on the same rack (affinity)  Spread allocations across machines (anti-affinity) to minimize resource interference  Allow up to a specific number of allocations in a node group (cardinality) User Centric
  • 16. 16Document Title Better placement strategies (YARN-6592) • Affinity • Anti-affinity
  • 17. 17Document Title Absolute Resources Configuration in CS – YARN-5881 • Gives ability to configure Queue resources as below <memory=24GB, vcores=20, yarn.io/gpu=2> • Enables admins to assign different quotas of different resource-types • No more “Single percentage value limitation for all resource-types” Root (memory=60G, vcores=40, gpu=10) Compliance (memory=60G, vcores=40, gpu=10) Retail (memory=60G, vcores=40, gpu=0) Engineering (memory=20G, vcores=40, gpu=10) Admin Centric
  • 18. 18Document Title Auto Creation of Leaf Queues - YARN-7117 • Easily map a queue explicitly to user or group with out additional configs • For e.g, User X comes in, automatically create a queue for user X with a templated capacity requirements • Auto created Queues will be • created runtime based on user mapping • cleaned up after use • adhering to ACLs Admin Centric
  • 19. 19Document Title Usability : UI 1/2 Admin Centric
  • 21. 21Document Title Timeline Service 2.0 Improvements Understanding and Monitoring a Hadoop cluster itself is a BigData problem • Using HBase as backend for better scalability for read/write • More robust storage fault tolerance • Migration and compatibility with v.1.5 User Centric Admin Centric
  • 22. 22Document Title Resource profiles and custom resource types • YARN supported only Memory and CPU • Now • A generalized vector for all resources • Admin could add arbitrary resource types! Ease of resource requesting model using profiles for apps User Centric Admin Centric
  • 23. 23Document Title GPU support on YARN • Why?  No need to setup separate clusters  Leverage shared compute! • Why need isolation?  Multiple processes use the single GPU will be:  Serialized.  Cause OOM easily.  GPU isolation on YARN:  Granularity is for per-GPU device.  Use cgroups / docker to enforce isolation. User Centric
  • 24. 24Document Title FPGA on YARN • FPGA isolation on YARN:  Granularity is for per-FPGA device  Use Cgroups to enforce the isolation • Currently, only Intel OpenCL SDK for FPGA is supported. • Implementation is extensible to other FPGA SDK. User Centric
  • 25. 25Document Title Services support in YARN • A native YARN services framework (YARN-5079)  Native Yarn support to Services  Apache Slider retired from Incubator – lessons and key code carried over to YARN • Simplified discovery of services via DNS mechanisms: YARN-4757  regionserver-0.hbase-app-3.hadoop.yarn.site  Application & Services upgrades: YARN-4726  “Do an upgrade of my HBase app with minimal impact to end-users”. User Centric
  • 26. 26Document Title Simplified APIs for service definitions • Applications need simple APIs • Need to be deployable “easily” • Simple REST API layer (YARN-4793) • Spawn services & Manage them User Centric
  • 27. 27Document Title How to run a new service in YARN ?
  • 29. 29Document Title Node Attributes (YARN-3409) • “Take me to a node with JDK 10” • Node Partition vs. Node Attribute • Node Partition  One partition for one node  ACL  Shares between queues  Pre emption enforced • Attribute  For Container placement  No ACL’s and Shares  First come first serve User Centric
  • 30. 30Document Title Node Attributes (YARN-3409) • Distributed Node Attributes  NM can detect its attributes  Script based and Config based detection.  Attribute prefix : yarn.nm.io • Centralised Node Attributes  Configured through CLI and REST  Admin ACL’s to configure  Attribute prefix : yarn.rm.io User Centric
  • 31. 31Document Title Node Attributes (YARN-3409) Use cases :  Hardware Constraints : To identify specific kind of resources like  GPU, FPGA,  SSD, # of disks,  InfiniBand,  (dual) network cards,  Task Constraints: Task or container specific constraints like  To run on specific Operating system versions.  Processor architecture  software library versions  Experimental : Based on dynamic attributes like  Load average,  disk usage  Network • User Centric
  • 32. 32Document Title Container overcommit (YARN-1011) • Every user says “Give me 16GB for my task”, even though it’s only needed at peak • Each node has some allocated but unutilized capacity. Use such capacity to run opportunistic tasks • Preempt such tasks when needed Admin Centric
  • 33. 33Document Title Auto-spawning of system services (YARN-8048) • “Start this service when YARN starts” • “initd for YARN” • Services are started during the yarn bootstrap  For example YARN ATSv2 needs Hbase, so Hbase is system service of YARN.  Only Admin can configure  Started along with ResourceManager  Place spec files under yarn.service.systemservice.dir FS path Admin Centric
  • 34. 34Document Title TensorFlow on YARN (YARN-8220) • Run deep learning workloads on the same cluster as analytics, stream processing etc! • Integrated with latest TensorFlow 1.8 and has GPU support  Use simple command to run TensorFlow app by using Native Service spec file (Yarnfile) yarn app -launch distributed-tf <path-to-saved-yarnfile>  A simple python command line utility also could be used to auto-create Yarnfile python submit_tf_job.py --remote_conf_path hdfs:///tf-job-conf --input_spec example_tf_job_spec.json --docker_image gpu.cuda_9.0.tf_1.8.0 --job_name distributed-tf-gpu --user tf-user --domain tensorflow.site --distributed --kerberos
  • 35. 35Document Title TensorFlow on YARN (YARN-8220) • Run deep learning workloads on the same cluster as analytics, stream processing etc! • Integrated with latest TensorFlow 1.8 and has GPU support  Use simple command to run TensorFlow app by using Native Service spec file (Yarnfile) yarn app -launch distributed-tf <path-to-saved-yarnfile>  A simple python command line utility also could be used to auto-create Yarnfile python submit_tf_job.py --remote_conf_path hdfs:///tf-job-conf --input_spec example_tf_job_spec.json --docker_image gpu.cuda_9.0.tf_1.8.0 --job_name distributed-tf-gpu --user tf-user --domain tensorflow.site --distributed --kerberos
  • 36. 36Document Title TensorFlow on YARN (YARN-8220) • Sample Yarnfile for TensorFlow job User Centric
  • 37. 37Document Title Other related talks : • Deep learning on YARN - Running distributed Tensorflow / MXNet / Caffe / XGBoost on Hadoop clusters • Speakers : Wangda Tan • Thursday, 27th Sep, 11:20 • Ballroom • Running distributed TensorFlow in production: challenges and solutions on YARN 3.0 • Speakers : Wangda Tan & Yanbo Liang • Thursday, 27th Sep, 15:40 • Ballroom