SlideShare une entreprise Scribd logo
1  sur  39
Apache Hadoop YARN:
State of the union
Wangda Tan, Billie Rinaldi
@ Hortonworks
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Speaker intro
 Wangda Tan: Apache Hadoop PMC member, mostly focus on GPU/deep learning on
YARN, worked on features of scheduler like node label / preemption, etc.
 Billie Rinaldi: Apache Hadoop committer, PMC member of various other top-level
Apache projects and incubating projects, currently focusing on long running services
and Docker containers on Apache Hadoop YARN
3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
 Introduction
 Past
 State of the Union
4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Multi colored YARN
 Multi-colored YARN
– Apps
– Long running services
 It’s all about data!
 Layers that enable applications and
higher order frameworks that interact
with data
https://www.flickr.com/photos/happyskrappy/15699919424
5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Page 5
Containerization
Containers
GPUs /
FPGAs
More
powerful
scheduling
Much faster
scheduling
Scale
SLAs
Usability
Service
workloads
Categories of recent initiatives
6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Hadoop Compute Platform – Today and Tomorrow
Platform Services
Storage
Resource
Management
Service
Discovery Cluster Mgmt
Monitoring
Alerts
IOT Assembly
Kafk
a
Storm HBase Solr
Security
Governance
MR Tez
Spark
Hive / Pig
LLAP
Flink
REEF
7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Past: A quick history
Page8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
A brief Timeline: Pre GA
• Sub-project of Apache Hadoop
• Alphas and betas
– In production at several large sites for MapReduce already by that time
June-July 2010 August 2011 May 2012 August 2013
Page9 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
A brief Timeline: GA Releases 1/3
2.2 2.3 2.4 2.5 2.6 2.7
15 October 2013 24 February 2014 07 April 2014 11 August 2014
• 1st GA
• MR binary
compatibility
• YARN API
cleanup
• Testing!
• 1st Post GA
• Bug fixes
• Alpha features
• RM Fail-over
• CS
Preemption
• Timeline
Service V1
• Writable
REST APIs
• Timeline
Service V1
security
• Rolling
Upgrades
• Docker
• Node labels
18 November 2014
• Moving to
JDK 7+
• Pluggable
YARN
authentication
21 Apr 2015
Most Essential Requirements for enterprise usage
Page10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
A brief Timeline: GA Releases 2/3
2.7.2 2.7.3 2.6.5 2.7.4 2.8.0 2.8.1 2.8.2 2.8.3
25 January 2016 25 August 2016 18 October 2016 04 August 2017
• Application
Priority
• Reservations
• Node labels
improvements
22 March 2017 08 June 2017 03 Oct 2017 12 Dec 2017
Enterprise consumption, need stablization
Page11 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
A brief Timeline: GA Releases 3/3
3.0.0-alpha1-4 3.0.0-beta1 3.0.0-GA2.9.0 3.0.1 3.1.0
Sep 16 – Aug 17 03 Oct 2017 13 Dec 2017
• GPU/FPGA
• Native
Service
• Placement
Constraints
06 April 201825 March 201817 Nov 2017
• YARN
Federation
• Opportunistic
Container
• Resource
types
• New YARN UI
• Timeline
service V2
More requirements comes (computation intensive, larger, services)
12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache Hadoop 2.8/2.9
13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Application priorities – YARN-1963
• Allocate resource to important apps first.
• Within a leaf-queue
FIFO Policy App 1 App 2 App 3 App 4
FIFO Policy
With priorities
App 1App 2App 3App 4
Higher priority  Lower priority
14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Queue priorities – YARN-5864
 For interactive / SLA sentitive workload
 Today
– Give to the least satisfied queue first
 With priorities
– Give to the highest priority queue first (for important workload).
root
A
20% Configured Capacity
But 5% of used Capacity
Usage = 5/20 = 25%
B
80% Configured Capacity
But 8% of used Capacity
Usage 8/80 = 10%
16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Reservations – YARN-1051
• “Run my workload tomorrow at 6AM”
• Persistence of the plans with RM failover: YARN-2573
Reservation-based Scheduling: If You’re Late Don’t Blame Us! - Carlo, et al. 2015
17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache Hadoop 3.0/3.1
18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Looking at the Scale!
 Tons of sites with clusters made up of large amount of nodes
– Yahoo!, Twitter, LinkedIn, Microsoft, Alibaba etc.
 Previously, largest clusters
– 6K-8K
 Now: 40K nodes (federated), 20K nodes (single cluster).
 Roadmap: To 100K and beyond
19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Moving towards Global & Fast Scheduling
 Problems
– Current design of one-node-at-a-time allocation cycle can lead to suboptimal decisions.
– Several coarse grained locks
 Current effort made us where we improved to
– Look at several nodes at a time
– Fine grained locks
– Multiple allocator threads
– YARN scheduler can allocate 3k+ containers per second ≈ 10 mil allocations / hour!
– 10X throughput gains with enhancement added recently
– Much better placement decisions
20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Resource profiles and custom resource types
 Past
– Supports only Memory and CPU
 Now
– A generalized vector
– Custom Resource Types!
 Ease of resource requesting model using
profiles
NodeManager
Memory
CPU
GPU
FPGA
Profile Memory CPU GPU
Small 2 GB 4 Cores 0 Cores
Medium 4 GB 8 Cores 0 Cores
Large 16 GB 16 Cores 4 Cores
21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
GPU support on YARN
 Why need isolation?
– Multiple processes use the single GPU will be:
• Serialized.
• Cause OOM easily.
 GPU isolation on YARN: .
– Granularity is for per-GPU device.
– Use Cgroups / docker to enforce the isolation.
22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
FPGA on YARN!
 FPGA isolation on YARN: .
– Granularity is for per-FPGA device.
– Use Cgroups to enforce the isolation.
 Currently, only Intel OpenCL SDK for FPGA is supported. But impl is extensible to other
FPGA SDK.
23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Better placement strategies (YARN-6592)
 Affinity  Anti-affinity
HBase Sto
rm
Hbase-
Region
Server
Hbase-
Region
Server
24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
YARN Federation!
 Enables applications to scale to 100k of thousands of nodes
 Federation divides a large (10-100k nodes) cluster into smaller units called sub-clusters
 Federation negotiates with sub-clusters RM’s and provide resources to the application
 Applications can schedule tasks on any node
25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Packaging
 Containers
– Lightweight mechanism for packaging and resource isolation
– Popularized and made accessible by Docker
– Can replace VMs in some cases
– Or more accurately, VMs got used in places where they didn’t
need to be
 Native integration ++ in YARN
– Support for “Container Runtimes” in LCE: YARN-3611
– Process runtime
– Docker runtime
26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Services support
 Application & Services upgrades
– “Do an upgrade of my Spark / HBase apps with minimal impact to end-users”
– YARN-4726
 Simplified discovery of services via DNS mechanisms: YARN-4757
– regionserver-0.hbase-app-3.hadoop.yarn.site
 Placement policies
 Container restart
27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Simplified APIs for service definitions
 Applications need simple APIs
 Need to be deployable “easily”
 Simple REST API layer fronting YARN
– YARN-4793 Simplified API layer for services and beyond
 Spawn services & Manage them
28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Services Framework
 Platform is only as good as the tools
 A native YARN services framework
– YARN-4692
– [Umbrella] Native YARN framework layer for services and
beyond
 Assembly: Supporting a DAG of apps:
– SLIDER-875
29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
User experience
API based queue management
Decentralized
(YARN-5734)
Improved logs
management
(YARN-4904)
Live application logs
30 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
User experience
New web UI
31 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
User experience
New web UI
32 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Timeline Service
 Application History
– “Where did my containers run?”
– “Why is my application slow?”
– “Is it really slow?”
– “Why is my application failing?”
– “What happened with my application?
Succeeded?”
 Cluster History
– Run analytics on historical apps!
– “User with most resource utilization”
– “Largest application run”
– “Why is my cluster slow?”
– “Why is my cluster down?”
– “What happened in my clusters?”
 Collect and use past data
– To schedule “my application” better
– To do better capacity planning
33 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Timeline Service 2.0
• Next generation
– Today’s solution helped us understand the space
– Limited scalability and availability
• “Analyzing Hadoop Clusters is becoming a big-data problem”
– Don’t want to throw away the Hadoop application metadata
– Large scale
– Enable near real-time analysis: “Find me the user who is hammering the FileSystem with rouge applications. Now.”
• Timeline data stored in HBase and accessible to queries
34 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache Hadoop 3.2 and beyond
35 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Node Attributes (YARN-3409)
• Node Partition vs. Node Attribute
• Partition:
• One partition for one node
• ACL
• Shares between queues
• Preemption enforced.
• Attribute:
• For container placement
• No ACL/Shares on attributes
• First-come-first-serve
36 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Container overcommit (YARN-1011)
 Each node has some allocated but unutilized capacities
 Use such capacity to run opportunistic tasks
 Preemption such tasks when needed
37 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Auto-spawning of system services
(YARN-8048)
• System services is services required by
YARN, need to be started during
bootstrap.
• For example YARN ATSv2 needs Hbase, so
Hbase is system service of YARN.
• Only Admin can configure
• Started along with ResourceManager
• Place spec files under
yarn.service.system-service.dir FS path
38 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Lessons learned running a container cloud
on YARN
https://dataworkssummit.com/berlin-2018/session/lessons-learned-running-a-container-cloud-on-
yarn/
4PM, Room I, Wed April 18th
-- Related Session --
Billie Rinaldi
39 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Deep learning on YARN: running
distributed Tensorflow, etc. on Hadoop
clusters
https://dataworkssummit.com/berlin-2018/session/deep-learning-on-yarn-running-distributed-
tensorflow-mxnet-caffe-xgboost-on-hadoop-clusters/
2PM, Room II, Wed April 18th
-- Related Session --
Wangda Tan
40 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
BoF’s: Apache Hadoop – YARN, HDFS
https://dataworkssummit.com/berlin-2018/bofs/#apache-hadoop-8211-yarn-hdfs
Thursday April 19th
-- Related Session --

Contenu connexe

Tendances

Innovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data WarehouseInnovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data WarehouseDataWorks Summit
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...Hortonworks
 
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies DataWorks Summit/Hadoop Summit
 
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...DataWorks Summit/Hadoop Summit
 
Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics OptimizationHortonworks
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?DataWorks Summit
 
YARN - Past, Present, & Future
YARN - Past, Present, & FutureYARN - Past, Present, & Future
YARN - Past, Present, & FutureDataWorks Summit
 
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks
 
Apache Metron in the Real World
Apache Metron in the Real WorldApache Metron in the Real World
Apache Metron in the Real WorldDataWorks Summit
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?DataWorks Summit
 
Successes, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
Successes, Challenges, and Pitfalls Migrating a SAAS business to HadoopSuccesses, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
Successes, Challenges, and Pitfalls Migrating a SAAS business to HadoopDataWorks Summit/Hadoop Summit
 
Achieving a 360-degree view of manufacturing via open source industrial data ...
Achieving a 360-degree view of manufacturing via open source industrial data ...Achieving a 360-degree view of manufacturing via open source industrial data ...
Achieving a 360-degree view of manufacturing via open source industrial data ...DataWorks Summit
 
Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Hortonworks
 

Tendances (20)

Innovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data WarehouseInnovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data Warehouse
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
 
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
 
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
 
Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics Optimization
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
IOT, Streaming Analytics and Machine Learning
IOT, Streaming Analytics and Machine Learning IOT, Streaming Analytics and Machine Learning
IOT, Streaming Analytics and Machine Learning
 
Integrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data LakesIntegrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data Lakes
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?
 
YARN - Past, Present, & Future
YARN - Past, Present, & FutureYARN - Past, Present, & Future
YARN - Past, Present, & Future
 
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
Apache Metron in the Real World
Apache Metron in the Real WorldApache Metron in the Real World
Apache Metron in the Real World
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
Falcon Meetup
Falcon Meetup Falcon Meetup
Falcon Meetup
 
Intro to Spark & Zeppelin - Crash Course - HS16SJ
Intro to Spark & Zeppelin - Crash Course - HS16SJIntro to Spark & Zeppelin - Crash Course - HS16SJ
Intro to Spark & Zeppelin - Crash Course - HS16SJ
 
Containers and Big Data
Containers and Big DataContainers and Big Data
Containers and Big Data
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
Successes, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
Successes, Challenges, and Pitfalls Migrating a SAAS business to HadoopSuccesses, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
Successes, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
 
Achieving a 360-degree view of manufacturing via open source industrial data ...
Achieving a 360-degree view of manufacturing via open source industrial data ...Achieving a 360-degree view of manufacturing via open source industrial data ...
Achieving a 360-degree view of manufacturing via open source industrial data ...
 
Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014
 

Similaire à Apache Hadoop YARN: state of the union

Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureDataWorks Summit
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureDataWorks Summit
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionDataWorks Summit
 
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureHadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureVinod Kumar Vavilapalli
 
Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo DataWorks Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionDataWorks Summit
 
Apache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration storyApache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration storySunil Govindan
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionDataWorks Summit
 
Running Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache HadoopRunning Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache Hadoophitesh1892
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARNDataWorks Summit
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnhdhappy001
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformBikas Saha
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopHortonworks
 
How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopPOSSCON
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark Hortonworks
 
Combine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNCombine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNHortonworks
 

Similaire à Apache Hadoop YARN: state of the union (20)

Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
Apache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduceApache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduce
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the Union
 
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureHadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
 
Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
 
Apache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration storyApache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration story
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
 
Running Services on YARN
Running Services on YARNRunning Services on YARN
Running Services on YARN
 
A Multi Colored YARN
A Multi Colored YARNA Multi Colored YARN
A Multi Colored YARN
 
Running Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache HadoopRunning Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache Hadoop
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARN
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute Platform
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
 
How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in Hadoop
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark
 
Combine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNCombine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARN
 

Plus de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Plus de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Dernier

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 

Dernier (20)

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 

Apache Hadoop YARN: state of the union

  • 1. Apache Hadoop YARN: State of the union Wangda Tan, Billie Rinaldi @ Hortonworks
  • 2. 2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Speaker intro  Wangda Tan: Apache Hadoop PMC member, mostly focus on GPU/deep learning on YARN, worked on features of scheduler like node label / preemption, etc.  Billie Rinaldi: Apache Hadoop committer, PMC member of various other top-level Apache projects and incubating projects, currently focusing on long running services and Docker containers on Apache Hadoop YARN
  • 3. 3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Agenda  Introduction  Past  State of the Union
  • 4. 4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Multi colored YARN  Multi-colored YARN – Apps – Long running services  It’s all about data!  Layers that enable applications and higher order frameworks that interact with data https://www.flickr.com/photos/happyskrappy/15699919424
  • 5. 5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Page 5 Containerization Containers GPUs / FPGAs More powerful scheduling Much faster scheduling Scale SLAs Usability Service workloads Categories of recent initiatives
  • 6. 6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Hadoop Compute Platform – Today and Tomorrow Platform Services Storage Resource Management Service Discovery Cluster Mgmt Monitoring Alerts IOT Assembly Kafk a Storm HBase Solr Security Governance MR Tez Spark Hive / Pig LLAP Flink REEF
  • 7. 7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Past: A quick history
  • 8. Page8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved A brief Timeline: Pre GA • Sub-project of Apache Hadoop • Alphas and betas – In production at several large sites for MapReduce already by that time June-July 2010 August 2011 May 2012 August 2013
  • 9. Page9 © Hortonworks Inc. 2011 – 2015. All Rights Reserved A brief Timeline: GA Releases 1/3 2.2 2.3 2.4 2.5 2.6 2.7 15 October 2013 24 February 2014 07 April 2014 11 August 2014 • 1st GA • MR binary compatibility • YARN API cleanup • Testing! • 1st Post GA • Bug fixes • Alpha features • RM Fail-over • CS Preemption • Timeline Service V1 • Writable REST APIs • Timeline Service V1 security • Rolling Upgrades • Docker • Node labels 18 November 2014 • Moving to JDK 7+ • Pluggable YARN authentication 21 Apr 2015 Most Essential Requirements for enterprise usage
  • 10. Page10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved A brief Timeline: GA Releases 2/3 2.7.2 2.7.3 2.6.5 2.7.4 2.8.0 2.8.1 2.8.2 2.8.3 25 January 2016 25 August 2016 18 October 2016 04 August 2017 • Application Priority • Reservations • Node labels improvements 22 March 2017 08 June 2017 03 Oct 2017 12 Dec 2017 Enterprise consumption, need stablization
  • 11. Page11 © Hortonworks Inc. 2011 – 2015. All Rights Reserved A brief Timeline: GA Releases 3/3 3.0.0-alpha1-4 3.0.0-beta1 3.0.0-GA2.9.0 3.0.1 3.1.0 Sep 16 – Aug 17 03 Oct 2017 13 Dec 2017 • GPU/FPGA • Native Service • Placement Constraints 06 April 201825 March 201817 Nov 2017 • YARN Federation • Opportunistic Container • Resource types • New YARN UI • Timeline service V2 More requirements comes (computation intensive, larger, services)
  • 12. 12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache Hadoop 2.8/2.9
  • 13. 13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Application priorities – YARN-1963 • Allocate resource to important apps first. • Within a leaf-queue FIFO Policy App 1 App 2 App 3 App 4 FIFO Policy With priorities App 1App 2App 3App 4 Higher priority  Lower priority
  • 14. 14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Queue priorities – YARN-5864  For interactive / SLA sentitive workload  Today – Give to the least satisfied queue first  With priorities – Give to the highest priority queue first (for important workload). root A 20% Configured Capacity But 5% of used Capacity Usage = 5/20 = 25% B 80% Configured Capacity But 8% of used Capacity Usage 8/80 = 10%
  • 15. 16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Reservations – YARN-1051 • “Run my workload tomorrow at 6AM” • Persistence of the plans with RM failover: YARN-2573 Reservation-based Scheduling: If You’re Late Don’t Blame Us! - Carlo, et al. 2015
  • 16. 17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache Hadoop 3.0/3.1
  • 17. 18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Looking at the Scale!  Tons of sites with clusters made up of large amount of nodes – Yahoo!, Twitter, LinkedIn, Microsoft, Alibaba etc.  Previously, largest clusters – 6K-8K  Now: 40K nodes (federated), 20K nodes (single cluster).  Roadmap: To 100K and beyond
  • 18. 19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Moving towards Global & Fast Scheduling  Problems – Current design of one-node-at-a-time allocation cycle can lead to suboptimal decisions. – Several coarse grained locks  Current effort made us where we improved to – Look at several nodes at a time – Fine grained locks – Multiple allocator threads – YARN scheduler can allocate 3k+ containers per second ≈ 10 mil allocations / hour! – 10X throughput gains with enhancement added recently – Much better placement decisions
  • 19. 20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Resource profiles and custom resource types  Past – Supports only Memory and CPU  Now – A generalized vector – Custom Resource Types!  Ease of resource requesting model using profiles NodeManager Memory CPU GPU FPGA Profile Memory CPU GPU Small 2 GB 4 Cores 0 Cores Medium 4 GB 8 Cores 0 Cores Large 16 GB 16 Cores 4 Cores
  • 20. 21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved GPU support on YARN  Why need isolation? – Multiple processes use the single GPU will be: • Serialized. • Cause OOM easily.  GPU isolation on YARN: . – Granularity is for per-GPU device. – Use Cgroups / docker to enforce the isolation.
  • 21. 22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved FPGA on YARN!  FPGA isolation on YARN: . – Granularity is for per-FPGA device. – Use Cgroups to enforce the isolation.  Currently, only Intel OpenCL SDK for FPGA is supported. But impl is extensible to other FPGA SDK.
  • 22. 23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Better placement strategies (YARN-6592)  Affinity  Anti-affinity HBase Sto rm Hbase- Region Server Hbase- Region Server
  • 23. 24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved YARN Federation!  Enables applications to scale to 100k of thousands of nodes  Federation divides a large (10-100k nodes) cluster into smaller units called sub-clusters  Federation negotiates with sub-clusters RM’s and provide resources to the application  Applications can schedule tasks on any node
  • 24. 25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Packaging  Containers – Lightweight mechanism for packaging and resource isolation – Popularized and made accessible by Docker – Can replace VMs in some cases – Or more accurately, VMs got used in places where they didn’t need to be  Native integration ++ in YARN – Support for “Container Runtimes” in LCE: YARN-3611 – Process runtime – Docker runtime
  • 25. 26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Services support  Application & Services upgrades – “Do an upgrade of my Spark / HBase apps with minimal impact to end-users” – YARN-4726  Simplified discovery of services via DNS mechanisms: YARN-4757 – regionserver-0.hbase-app-3.hadoop.yarn.site  Placement policies  Container restart
  • 26. 27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Simplified APIs for service definitions  Applications need simple APIs  Need to be deployable “easily”  Simple REST API layer fronting YARN – YARN-4793 Simplified API layer for services and beyond  Spawn services & Manage them
  • 27. 28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Services Framework  Platform is only as good as the tools  A native YARN services framework – YARN-4692 – [Umbrella] Native YARN framework layer for services and beyond  Assembly: Supporting a DAG of apps: – SLIDER-875
  • 28. 29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved User experience API based queue management Decentralized (YARN-5734) Improved logs management (YARN-4904) Live application logs
  • 29. 30 © Hortonworks Inc. 2011 – 2016. All Rights Reserved User experience New web UI
  • 30. 31 © Hortonworks Inc. 2011 – 2016. All Rights Reserved User experience New web UI
  • 31. 32 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Timeline Service  Application History – “Where did my containers run?” – “Why is my application slow?” – “Is it really slow?” – “Why is my application failing?” – “What happened with my application? Succeeded?”  Cluster History – Run analytics on historical apps! – “User with most resource utilization” – “Largest application run” – “Why is my cluster slow?” – “Why is my cluster down?” – “What happened in my clusters?”  Collect and use past data – To schedule “my application” better – To do better capacity planning
  • 32. 33 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Timeline Service 2.0 • Next generation – Today’s solution helped us understand the space – Limited scalability and availability • “Analyzing Hadoop Clusters is becoming a big-data problem” – Don’t want to throw away the Hadoop application metadata – Large scale – Enable near real-time analysis: “Find me the user who is hammering the FileSystem with rouge applications. Now.” • Timeline data stored in HBase and accessible to queries
  • 33. 34 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache Hadoop 3.2 and beyond
  • 34. 35 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Node Attributes (YARN-3409) • Node Partition vs. Node Attribute • Partition: • One partition for one node • ACL • Shares between queues • Preemption enforced. • Attribute: • For container placement • No ACL/Shares on attributes • First-come-first-serve
  • 35. 36 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Container overcommit (YARN-1011)  Each node has some allocated but unutilized capacities  Use such capacity to run opportunistic tasks  Preemption such tasks when needed
  • 36. 37 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Auto-spawning of system services (YARN-8048) • System services is services required by YARN, need to be started during bootstrap. • For example YARN ATSv2 needs Hbase, so Hbase is system service of YARN. • Only Admin can configure • Started along with ResourceManager • Place spec files under yarn.service.system-service.dir FS path
  • 37. 38 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Lessons learned running a container cloud on YARN https://dataworkssummit.com/berlin-2018/session/lessons-learned-running-a-container-cloud-on- yarn/ 4PM, Room I, Wed April 18th -- Related Session -- Billie Rinaldi
  • 38. 39 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Deep learning on YARN: running distributed Tensorflow, etc. on Hadoop clusters https://dataworkssummit.com/berlin-2018/session/deep-learning-on-yarn-running-distributed- tensorflow-mxnet-caffe-xgboost-on-hadoop-clusters/ 2PM, Room II, Wed April 18th -- Related Session -- Wangda Tan
  • 39. 40 © Hortonworks Inc. 2011 – 2016. All Rights Reserved BoF’s: Apache Hadoop – YARN, HDFS https://dataworkssummit.com/berlin-2018/bofs/#apache-hadoop-8211-yarn-hdfs Thursday April 19th -- Related Session --

Notes de l'éditeur

  1. For new people, 10%
  2. Appplication centric
  3. Categories of recent initiatives
  4. Many users are requesting most necessary features, so that’s why we release so fast. Many of this featuress a necessary to run YARN cluster, such as RM fail-over/ HA, etc
  5. 2.6/2.7/2.8 are versions which most prod cluster are using, so many feature development remains in the background and many community effort are focusing on stablizing features.
  6. Again, the most essential YARN doesn’t meet user’s requirement anymore, that’s why we released 3 minor releases in 6 months, and include features like: YARN Federation (larger cluster) GPU / FPGA, etc.
  7. Even though TF provide options to use GPU memory less than whole device provided. But we cannot enforce this from external.
  8. High level talk on ATSv2, it is scalable solution compared to 1.5.