© Hortonworks Inc. 2015
Apache Hadoop YARN 2015
Present and Future
Vinod Kumar Vavilapalli
vinodkv [at] apache.org
@tshooter
Who am I?
• 7.75 Hadoop-years old
– Don’t fall for the job postings asking
for 10 years #Hadoop Experience yet

• Past
– 2007: Last thing at school: a two-node Tomcat cluster. Three months later, first thing at my job: brought down an 800-node cluster ;)
– Team that ran Hadoop @ Yahoo!
• Present: @Hortonworks
• Two hats
– Hortonworks: Hadoop MapReduce
and YARN Development lead
– Apache: Apache Hadoop PMC,
Apache Member
• Worked/working on
– YARN, Hadoop MapReduce,
HadoopOnDemand,
CapacityScheduler, Hadoop security
– Apache Ambari: Kickstarted the
project’s first release
– Stinger: High performance data
processing with Hadoop/Hive
• Lots of troubleshooting on clusters (@tshooter)
• 99%+ of code in Apache Hadoop
– Open source
– Community-driven
Architecting the Future of Big Data
Agenda
• Apache Hadoop YARN: Overview
• Past
• Present
• Future
Overview
The Why and the What
Why Hadoop YARN?
• Resource Management
• A messy problem
– Multiple apps, frameworks, their lifecycles and evolution
• Varied expectations
– On isolation, capacity allocations,
scheduling
– Admin: “Best use of my cluster”
– Users: “Get me as much as possible,
as fast as possible”
• Tenancy
– “I am running this cluster for one
user”
– It almost never stops there
– Groups, Teams, Users
• Ad hoc structures go bad really fast
• What’s different?
– Centered around Data
• -ilities
– Admission policies. Sharing. Security.
Elasticity. SLAs. ROI
[Figure: Data, Applications, Admins and Users, with a question mark]
What is Hadoop YARN?
[Figure: the Hadoop stack: Applications (Running Natively in Hadoop) on top of YARN (Cluster Resource Management) on top of HDFS (Scalable, Reliable Storage)]
• Store all your data in one place … (HDFS)
• Interact with that data in multiple ways … (YARN Platform + Apps)
• Scale as you go, shared, multi-tenant, secure … (The Hadoop Stack)
[Figure labels: Queues, Admins/Users, Cluster Resources, Pipelines]
Past
A quick history
A brief Timeline before the BigBang
• Sub-project of Apache Hadoop
• Releases tied to Hadoop releases
• Gmail-like alphas and betas
– Already in production for MapReduce at several large sites by then
• June-July 2010: First line of code
• August 2011: Open sourced
• May 2012: First 2.0 alpha
• August 2013: First 2.0 beta
Apache Hadoop YARN releases: Apache Hadoop 2.2
• 15 October 2013
• The first GA release of Apache Hadoop 2.x
• YARN
– First stable and supported release of YARN
– Binary compatibility for MapReduce applications built on Hadoop 1.x
– YARN-level APIs solidified for the future
– Performance
– Scale from the get-go!
• Support for running Hadoop on Microsoft Windows
• Substantial integration testing with the rest of the projects in the ecosystem
Releases (contd): Apache Hadoop 2.3
• 24 February 2014
• First post-GA release of 2014
• A number of bug fixes and enhancements
• Alpha features in YARN
– ResourceManager fail-over
– Application History
Releases (contd): Apache Hadoop 2.4
• 07 April 2014
• YARN
– ResourceManager fail-over
– Preemption-aided scheduling
– Application History and Timeline Service V1
Releases (contd): Apache Hadoop 2.5
• 11 August 2014
• YARN
– YARN's REST APIs: submitting and killing applications
– Timeline Service V1 security
Present
Apache Hadoop releases (contd): Apache Hadoop 2.6
• 18 November 2014
• Last major release at the time of this talk
• YARN
– Support for rolling upgrades
– Support for long-running services
– Support for node labels
– Alpha/beta features: time-based resource reservations, running applications natively in Docker containers
Rolling Upgrades
At the click of a button
Work-preserving ResourceManager restart
• The ResourceManager persists some state (e.g. submitted applications)
• Reconstructs the rest (e.g. running containers) from the NodeManagers and ApplicationMasters
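To make the recovery step concrete, here is a minimal Python sketch. It assumes a hypothetical, much-simplified model: the persisted state is just a set of application IDs, and each NodeManager re-reports its running containers as (application ID, container ID) pairs when it re-syncs. The real ResourceManager state store and resync protocol carry far more detail.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class RecoveredState:
    apps: Set[str]  # application IDs restored from the (hypothetical) state store
    containers: Dict[str, List[str]] = field(default_factory=dict)  # app ID -> container IDs


def recover(stored_apps: Set[str],
            nm_reports: Dict[str, List[Tuple[str, str]]]) -> RecoveredState:
    """Rebuild the ResourceManager's view after a work-preserving restart.

    stored_apps: what the RM persisted before going down.
    nm_reports: NodeManager -> (app ID, container ID) pairs it reports on resync.
    """
    state = RecoveredState(apps=set(stored_apps))
    for node in sorted(nm_reports):  # deterministic order for the example
        for app_id, container_id in nm_reports[node]:
            # Simplification: containers of apps the RM never persisted are skipped.
            if app_id in state.apps:
                state.containers.setdefault(app_id, []).append(container_id)
    return state


state = recover({"app_1"}, {"nm1": [("app_1", "c1"), ("app_9", "c7")],
                            "nm2": [("app_1", "c2")]})
print(state.containers)  # {'app_1': ['c1', 'c2']}
```

Skipping containers of unknown applications is a choice made for this sketch only, not a statement about how YARN treats them.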
Work-preserving NodeManager restart
• The NodeManager remembers its state on each machine
• Reconnects to running containers after a restart
ResourceManager Fail-over
• Active/Standby Mode
• Depends on fast recovery
[Figure: Active/Standby ResourceManagers with ZooKeeper]
YARN Rolling Upgrades Workflow
• Servers first
– Masters followed by Slaves
• Upgrade of Applications/Frameworks is decoupled!
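A rough sketch of this ordering, written as a planning function over hypothetical host lists (it is not Ambari's or Hadoop's actual upgrade tooling): masters are restarted one at a time, then slaves in small batches. Applications and frameworks do not appear in the plan at all, which mirrors the decoupling noted above.

```python
from typing import List


def plan_rolling_upgrade(masters: List[str], slaves: List[str],
                         batch_size: int = 1) -> List[List[str]]:
    """Return upgrade steps; each step is a list of hosts restarted together.

    Hypothetical sketch: masters (e.g. ResourceManagers) go first, one per step,
    so an active/standby pair is never down at the same time; slaves
    (e.g. NodeManagers) follow in batches of `batch_size`.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    steps: List[List[str]] = [[m] for m in masters]
    for i in range(0, len(slaves), batch_size):
        steps.append(slaves[i:i + batch_size])
    return steps


plan = plan_rolling_upgrade(["rm1", "rm2"], ["nm1", "nm2", "nm3"], batch_size=2)
print(plan)  # [['rm1'], ['rm2'], ['nm1', 'nm2'], ['nm3']]
```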
YARN Rolling Upgrades Snapshot
Stack Rolling Upgrades
Rolling Updates Session by Sanjay Radia
Thursday April 16, 2015 11:45-12:25
@ Silver Hall
Services on YARN
Long-running services
• You could already run them before 2.6!
• Enhancements needed
– Logs
– Security
– Management/monitoring
– Sharing and Placement
– Discovery
• Resource sharing across
workload types
• Fault tolerance of long-running services
– Work-preserving AM restart
– AM forgetting faults
• Service registry
• Project Slider:
http://slider.incubator.apache.org/
• HBase, Storm, Kafka already!
“Bringing Long Running Services to Hadoop YARN”
by Steve Loughran
Thursday April 16, 2015 12:40-13:20
@ Copper Hall
Cluster Management Features
Preemption-aided Scheduling
• Admins
– “Make the best use of cluster resources”
• Users
– “Give me resources fast”
• Solution
– Elastic queues
– Loan idle capacities to others
– Take it back on demand
– Balance across queues: In
– Balance across users in a queue: WIP
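A minimal sketch of the "loan idle capacity, take it back on demand" idea, assuming a toy model where each queue has a guaranteed capacity, a current allocation and pending demand, all in one abstract resource unit. The greedy choice of which queue gives resources back is an illustrative simplification; the CapacityScheduler's own preemption policy (ProportionalCapacityPreemptionPolicy) balances queues proportionally and is considerably more involved.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Queue:
    guaranteed: int  # guaranteed capacity, in abstract resource units
    used: int        # resources the queue currently holds
    pending: int     # demand the queue is still waiting for


def preemption_targets(queues: Dict[str, Queue], cluster_free: int) -> Dict[str, int]:
    """How much to take back from each queue running above its guarantee."""
    # Demand from under-served queues, counted only up to their guarantee.
    owed = sum(min(q.pending, q.guaranteed - q.used)
               for q in queues.values() if q.used < q.guaranteed)
    owed = max(0, owed - cluster_free)  # idle resources are handed out first

    # Capacity other queues have borrowed beyond their guarantee.
    over = {name: q.used - q.guaranteed
            for name, q in queues.items() if q.used > q.guaranteed}

    targets: Dict[str, int] = {}
    # Greedy: reclaim from the queue furthest above its guarantee first.
    for name, amount in sorted(over.items(), key=lambda kv: kv[1], reverse=True):
        if owed == 0:
            break
        take = min(amount, owed)
        targets[name] = take
        owed -= take
    return targets


queues = {"A": Queue(guaranteed=50, used=80, pending=0),
          "B": Queue(guaranteed=50, used=20, pending=40)}
print(preemption_targets(queues, cluster_free=0))   # {'A': 30}
print(preemption_targets(queues, cluster_free=10))  # {'A': 20}
```

Queue B is owed 30 units up to its guarantee, so queue A is asked to return whatever free capacity cannot cover; B's remaining 10 units of demand above its guarantee do not trigger preemption.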
Fine-grained isolation for multi-tenancy
• Memory
– Custom monitoring
– Inelastic Resource
• CPU
– cgroups on Linux
– Elastic Resource
• Support on Windows
– WIP
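The difference between an inelastic and an elastic resource can be sketched as follows. The container records and weight values are hypothetical; only a simple physical-memory check is modeled, and the CPU function just computes the proportional split that cgroups-style CPU shares give when every container is busy.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ContainerUsage:
    mem_limit_mb: int  # memory granted by the scheduler
    mem_used_mb: int   # memory the container's process tree currently uses
    cpu_weight: int    # relative CPU weight, e.g. derived from its vcores (assumption)


def over_memory_limit(usage: Dict[str, ContainerUsage]) -> List[str]:
    """Memory is inelastic: containers above their limit are candidates to kill."""
    return sorted(cid for cid, u in usage.items() if u.mem_used_mb > u.mem_limit_mb)


def cpu_share_under_contention(usage: Dict[str, ContainerUsage], cid: str) -> float:
    """CPU is elastic: when all containers are busy, each gets CPU time in
    proportion to its weight; when a neighbour idles, busy containers may use more."""
    total = sum(u.cpu_weight for u in usage.values())
    return usage[cid].cpu_weight / total


usage = {"c1": ContainerUsage(1024, 1500, 1024),
         "c2": ContainerUsage(2048, 1000, 3072)}
print(over_memory_limit(usage))                 # ['c1']
print(cpu_share_under_contention(usage, "c2"))  # 0.75
```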
Multi-resource scheduling
• Multi-dimensional bin-packing
– Application A says “I want 8GB RAM
and 2 CPUs”
– Application B says “I want 1GB RAM
and 10 CPUs”
• Today: memory and CPU
– Physical memory / virtual memory
– CPU cores: virtual cores
• Scheduling is constrained by the “bottleneck” resource
– Watch out for a utilization drop on the non-scarce resource
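A small sketch of multi-dimensional fitting and of the "bottleneck" (dominant) resource, using the two requests from the slide. The node and cluster sizes are made-up assumptions, and the dominant-share function follows the idea behind Dominant Resource Fairness rather than reproducing YARN's DominantResourceCalculator.

```python
from typing import Dict

Resources = Dict[str, float]  # e.g. {"memory_gb": 8, "vcores": 2}


def fits(request: Resources, available: Resources) -> bool:
    """A request fits a node only if every dimension fits (multi-dimensional bin-packing)."""
    return all(request[k] <= available.get(k, 0) for k in request)


def dominant_share(request: Resources, cluster: Resources) -> float:
    """Fraction of the cluster the request takes on its bottleneck (dominant) dimension."""
    return max(request[k] / cluster[k] for k in request)


node = {"memory_gb": 16, "vcores": 8}      # hypothetical node size
cluster = {"memory_gb": 100, "vcores": 50}  # hypothetical cluster size
app_a = {"memory_gb": 8, "vcores": 2}   # "I want 8GB RAM and 2 CPUs"
app_b = {"memory_gb": 1, "vcores": 10}  # "I want 1GB RAM and 10 CPUs"

print(fits(app_a, node), fits(app_b, node))                        # True False
print(dominant_share(app_a, cluster), dominant_share(app_b, cluster))  # 0.08 0.2
```

Application A's bottleneck is memory and B's is CPU; packing by the bottleneck alone can leave the other dimension underused, which is the utilization drop mentioned above.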
Node Labels
• Partitions
– Admin: “I have machines of different
types”
– Impact on capacity planning: “Hey,
we bought those Windows machines”
• Types
– Exclusive: “This is my Precious!”
– Non-exclusive: “I get binding
preference. Use it for others when
idle”
• Constraints
– “Take me to a machine running JDK
version 9”
– No impact on capacity planning
– WIP
[Figure: cluster partitions: Default Partition; Partition B (Linux); Partition C (Windows); nodes labeled JDK 8, JDK 7, JDK 7]
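The exclusive/non-exclusive distinction can be sketched as a node-eligibility filter. Which partition is exclusive is an assumption invented for the example, and lending out only idle nodes of a non-exclusive partition is a simplified reading of "use it for others when idle". Function and type names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Partition:
    exclusive: bool  # exclusive partitions never lend their nodes to other requests


def eligible_nodes(request_label: Optional[str],
                   node_labels: Dict[str, Optional[str]],
                   partitions: Dict[str, Partition],
                   idle_nodes: Set[str]) -> List[str]:
    """Nodes a request may use. None stands for the default (unlabeled) partition."""
    result = []
    for node, label in node_labels.items():
        if label == request_label:
            result.append(node)  # same partition as the request
        elif (request_label is None and label is not None
              and not partitions[label].exclusive and node in idle_nodes):
            result.append(node)  # idle node of a non-exclusive partition, on loan
    return sorted(result)


nodes = {"n1": None, "n2": "linux", "n3": "windows"}
parts = {"linux": Partition(exclusive=False), "windows": Partition(exclusive=True)}
print(eligible_nodes(None, nodes, parts, idle_nodes={"n2", "n3"}))       # ['n1', 'n2']
print(eligible_nodes("windows", nodes, parts, idle_nodes={"n2", "n3"}))  # ['n3']
```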
Operational and Developer tooling
Application History and Timeline Service
• Before
– A few MapReduce-specific implementations: history and web UI
• Not just MR anymore!
• History
– “Why was my application slow?”
– “Where did my containers run?”
– MapReduce-specific JobHistory Server
– Need a generic solution beyond ResourceManager restart
• Run analytics on historical apps!
– “User with most resource utilization”
– “Largest application run”
• Application Timeline
– Framework-specific event collection and UIs
– “Show me the Counters for my
running MapReduce task”
– “Show me the slowest Storm stream
processing bolt while it is running”
• Present
– A LevelDB-based implementation
– Integrated into MapReduce, Apache
Tez, Apache Hive
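A sketch of the kind of historical analytics listed above, over a hypothetical list of finished-application records. The record fields are assumptions (memory-seconds as the utilization measure, container count as the size of a run); the real Timeline Service exposes its own entity and event model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AppRecord:
    app_id: str
    user: str
    memory_mb_seconds: int  # assumed utilization measure: memory held x time
    num_containers: int     # assumed size measure for "largest application run"


def heaviest_user(apps: List[AppRecord]) -> Tuple[str, int]:
    """User with the most aggregate resource utilization across all apps."""
    totals: Dict[str, int] = defaultdict(int)
    for app in apps:
        totals[app.user] += app.memory_mb_seconds
    user = max(totals, key=totals.get)
    return user, totals[user]


def largest_app(apps: List[AppRecord]) -> str:
    """Application that ran the most containers."""
    return max(apps, key=lambda a: a.num_containers).app_id


history = [AppRecord("app_1", "alice", 5000, 10),
           AppRecord("app_2", "bob", 3000, 40),
           AppRecord("app_3", "alice", 1000, 5)]
print(heaviest_user(history))  # ('alice', 6000)
print(largest_app(history))    # app_2
```

Both functions raise `ValueError` on an empty history, which is acceptable for a sketch.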
Other features
• Web Services
– No need for installed Hadoop clients
– Submit an app
– Monitor / Kill it
• Multi-homing Environments
– Clients on a public network
– Cluster traffic on a private network
– Fault tolerance
– Security
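Because the web services need nothing but HTTP, a client can be sketched with Python's standard library alone. The paths follow the ResourceManager REST API (the release list above places REST submission and kill in 2.5); the host name and application ID are placeholders. On a secured cluster the write calls also need an authenticated caller, which this sketch does not handle. The functions only build requests; nothing is sent.

```python
import json
import urllib.request

RM = "http://rm.example.com:8088"  # hypothetical ResourceManager web address


def new_application_request() -> urllib.request.Request:
    """POST to obtain a new application ID before submitting an app."""
    return urllib.request.Request(f"{RM}/ws/v1/cluster/apps/new-application",
                                  data=b"", method="POST",
                                  headers={"Accept": "application/json"})


def app_state_request(app_id: str) -> urllib.request.Request:
    """GET the current state of an application (monitoring)."""
    return urllib.request.Request(f"{RM}/ws/v1/cluster/apps/{app_id}/state",
                                  headers={"Accept": "application/json"})


def kill_request(app_id: str) -> urllib.request.Request:
    """PUT a target state of KILLED to ask the RM to kill the application."""
    body = json.dumps({"state": "KILLED"}).encode("utf-8")
    return urllib.request.Request(f"{RM}/ws/v1/cluster/apps/{app_id}/state",
                                  data=body, method="PUT",
                                  headers={"Content-Type": "application/json",
                                           "Accept": "application/json"})


req = kill_request("application_1428000000000_0001")
print(req.get_method(), req.full_url)
```

Passing one of these requests to `urllib.request.urlopen` would send it; the state endpoint answers with a small JSON document containing the application's state.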
Future
Apache Hadoop releases (contd): Apache Hadoop 2.7, 2.8 and beyond
• Hadoop 2.7
– Likely the week of April 19-24, 2015
– Moving to JDK 7 and beyond
• Future
Future: Timeline Service Next Generation
• Next generation
– Today’s solution helped us understand the space
– Limited scalability and availability
• Analyzing Hadoop Clusters is a big-data problem
– Don’t want to throw away the Hadoop application metadata
– Large scale
– Enable near real-time analysis: “Find me the user who is hammering the FileSystem with rogue applications. Now.”
• Timeline data stored in HBase and accessible to queries
Future: Improved Usability
• Generic run-time information
– “What is my actual usage by the running container?”
– “How many rack-local containers did I get?”
– “How healthy is the scheduler?”
– “Why is my application stuck? What limits did it hit?”
• With Timeline Service
– Why is my application slow?
– Why is my cluster slow?
– Why is my application failing?
– Why is my cluster down?
– What happened with my application? Succeeded?
– What happened in my clusters?
• Collect and use past data
– To schedule my application better
– To do better capacity planning
Future: Containerized Applications
• Running Containerized
Applications on YARN
• Docker
• Multiple use-cases
– Run my existing service on YARN
– Slider + Docker
– Run my existing MapReduce application on YARN via a Docker image
Future: Scheduling
• Support priorities across
applications within the same
queue
• Policy-driven scheduling
– “I want app level fairness in queue A,
user level fairness in queue B, and
throughput focus in all other queues”
• Node anti-affinity
– “Do not run two copies of my service
daemon on the same machine”
• Gang scheduling
– “Run all of my app at once”
• Dynamic scheduling of containers
based on actual utilization
• Stabilized App Reservations
– “Create a reservation for my app with
X resources to run at 6AM tomorrow”
• Time based policies
– “10% of cluster capacity for queue A from 6-9AM, but 20% from 9AM-12PM”
• Prioritized queues
– Admin’s queue takes precedence
over everything else
• Lots more...
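Node anti-affinity can be sketched as a filter plus an all-or-nothing choice. The service name, node list and placement map are hypothetical. Returning `None` instead of a partial placement is a design choice of this sketch: it avoids doubling up replicas on one machine when the cluster is too small, and it loosely resembles the all-at-once requirement of gang scheduling.

```python
from typing import Dict, List, Optional


def place_with_anti_affinity(service: str, replicas: int, nodes: List[str],
                             running: Dict[str, List[str]]) -> Optional[List[str]]:
    """Pick one distinct node per replica, skipping nodes already running `service`.

    Returns None when there are not enough eligible nodes (the request would wait).
    """
    eligible = [n for n in nodes if service not in running.get(n, [])]
    if len(eligible) < replicas:
        return None
    return eligible[:replicas]


nodes = ["n1", "n2", "n3", "n4"]
running = {"n2": ["hbase-rs"]}  # a copy of the daemon already runs on n2
print(place_with_anti_affinity("hbase-rs", 2, nodes, running))  # ['n1', 'n3']
print(place_with_anti_affinity("hbase-rs", 4, nodes, running))  # None
```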
Future: More Resource Types
• Node level Isolation and Cluster
level Scheduling
• Disks
– Space
– IOPS: Read/Write
• Network
– Incoming bandwidth
– Outgoing bandwidth
Thank you!
Sandbox: Hadoop in a VM!
Question time!

Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Plus de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Dernier

Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 

Dernier (20)

Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 

Apache Hadoop YARN 2015: Present and Future

  • 1. Apache Hadoop YARN 2015: Present and Future
      Vinod Kumar Vavilapalli, vinodkv [at] apache.org, @tshooter
      © Hortonworks Inc. 2015
  • 2. Who am I?
      • 7.75 Hadoop-years old
        – Don’t fall for the job postings asking for 10 years of #Hadoop experience yet
      • Past
        – 2007: Last thing at school, a two-node Tomcat cluster. Three months later, first thing at my job: brought down an 800-node cluster ;)
        – Team that ran Hadoop @ Yahoo!
      • Present: @Hortonworks
      • Two hats
        – Hortonworks: Hadoop MapReduce and YARN development lead
        – Apache: Apache Hadoop PMC, Apache Member
      • Worked/working on
        – YARN, Hadoop MapReduce, HadoopOnDemand, CapacityScheduler, Hadoop security
        – Apache Ambari: kickstarted the project’s first release
        – Stinger: high-performance data processing with Hadoop/Hive
      • Lots of troubleshooting on clusters (@tshooter)
      • 99%+ of code in Apache Hadoop
        – Open source
        – Community driven
  • 3. Agenda
      • Apache Hadoop YARN: Overview
      • Past
      • Present
      • Future
  • 4. Overview: The Why and the What
  • 5. Why Hadoop YARN?
      • Resource management
      • A messy problem
        – Multiple apps and frameworks, their lifecycles and evolution
      • Varied expectations
        – On isolation, capacity allocations, scheduling
        – Admin: “Best use of my cluster”
        – Users: “Get me as much as possible, as fast as possible”
      • Tenancy
        – “I am running this cluster for one user”
        – It almost never stops there
        – Groups, teams, users
      • Ad hoc structures get bad real fast
      • What’s different?
        – Centered around data
      • ‘-ilities’
        – Admission policies. Sharing. Security. Elasticity. SLAs. ROI
      [Diagram: Data, Applications, Admins, Users]
  • 6. What is Hadoop YARN?
      [Diagram: HDFS (Scalable, Reliable Storage); YARN (Cluster Resource Management); Applications (Running Natively in Hadoop)]
      • Store all your data in one place … (HDFS)
      • Interact with that data in multiple ways … (YARN Platform + Apps)
      • Scale as you go, shared, multi-tenant, secure … (The Hadoop Stack)
      [Diagram labels: Queues, Admins/Users, Cluster Resources, Pipelines]
  • 7. Past: A quick history
  • 8. A brief timeline before the Big Bang
      • Sub-project of Apache Hadoop
      • Releases tied to Hadoop releases
      • Gmail-like alphas and betas
        – Already in production for MapReduce at several large sites by that time
      • Timeline: first line of code (June-July 2010); open sourced (August 2011); first 2.0 alpha (May 2012); first 2.0 beta (August 2013)
  • 9. Apache Hadoop YARN releases: Apache Hadoop 2.2
      • 15 October 2013
      • The first GA release of Apache Hadoop 2.x
      • YARN
        – First stable and supported release of YARN
        – Binary compatibility for MapReduce applications built on Hadoop 1.x
        – YARN-level APIs solidified for the future
        – Performance
        – Scale from the get-go!
      • Support for running Hadoop on Microsoft Windows
      • Substantial integration testing with the rest of the projects in the ecosystem
  • 10. Releases (contd): Apache Hadoop 2.3
      • 24 February 2014
      • First post-GA release of 2014
      • A number of bug fixes and enhancements
      • Alpha features in YARN
        – ResourceManager failover
        – Application History
  • 11. Releases (contd): Apache Hadoop 2.4
      • 7 April 2014
      • YARN
        – ResourceManager Fail-over
        – Preemption-aided scheduling
        – Application History and Timeline Service V1
  • 12. Releases (contd): Apache Hadoop 2.5
      • 11 August 2014
      • YARN
        – YARN's REST APIs
        – Submitting & killing applications
        – Timeline Service V1 security
  • 13. Present
  • 14. Apache Hadoop releases (contd): Apache Hadoop 2.6
      • 18 November 2014
      • Last major release at the time of this talk
      • YARN
        – Support for rolling upgrades
        – Support for long-running services
        – Support for node labels
        – Alpha/beta features: time-based resource reservations, running applications natively in Docker containers
  • 15. Rolling Upgrades: At the click of a button
  • 16. Work-preserving ResourceManager restart
      • ResourceManager remembers some state
      • Reconstructs the rest from nodes and applications
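The restart slide describes a split: the ResourceManager persists some state and rebuilds the rest from what nodes and applications report. Below is a minimal Python sketch of that reconstruction step. It assumes a simplified model in which the persisted state is just a set of application ids and each NodeManager reports (container id, application id) pairs. `recover` and `RecoveredState` are hypothetical names, not YARN APIs, and treating containers of unknown applications as cleanup candidates is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveredState:
    """Scheduler view rebuilt after a ResourceManager restart (hypothetical model)."""
    apps: set
    containers: dict = field(default_factory=dict)  # container id -> (app id, node id)
    orphaned: list = field(default_factory=list)    # containers whose app is unknown


def recover(persisted_apps, node_reports):
    """Combine persisted application records with live NodeManager reports.

    persisted_apps: application ids the RM had saved to its state store.
    node_reports: node id -> list of (container id, app id) still running there.
    """
    state = RecoveredState(apps=set(persisted_apps))
    for node_id, running in node_reports.items():
        for container_id, app_id in running:
            if app_id in state.apps:
                # Work is preserved: the container keeps running, the RM re-learns it.
                state.containers[container_id] = (app_id, node_id)
            else:
                # No record of the owner; this sketch marks it for cleanup.
                state.orphaned.append(container_id)
    return state
```

The key property is that nothing in `node_reports` is restarted. The RM only rebuilds its bookkeeping around containers that kept running.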
  • 17. Work-preserving NodeManager restart
      • NodeManager remembers state on each machine
      • Reconnects to running containers
  • 18. ResourceManager Fail-over
      • Active/Standby mode
      • Depends on fast recovery
      [Diagram: ZooKeeper]
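Active/standby failover can be pictured as a race for a single lock that ZooKeeper would normally arbitrate. The sketch below replaces ZooKeeper with a hypothetical in-process `ElectionLock` and marks recovery with a flag. It only illustrates the control flow: one RM is active, and a standby takes over once the lock is released. It deliberately omits fencing of the old active RM, which a real deployment needs. In this sketch, a deposed RM would still report itself as active.

```python
import threading


class ElectionLock:
    """In-process stand-in for a ZooKeeper-arbitrated leader lock (hypothetical)."""

    def __init__(self):
        self._mutex = threading.Lock()
        self.holder = None

    def try_acquire(self, rm_id):
        """Take the lock if it is free; report whether `rm_id` now holds it."""
        with self._mutex:
            if self.holder is None:
                self.holder = rm_id
            return self.holder == rm_id

    def release(self, rm_id):
        """Models the holder's session ending, e.g. because the active RM died."""
        with self._mutex:
            if self.holder == rm_id:
                self.holder = None


class ResourceManager:
    def __init__(self, rm_id, lock):
        self.rm_id = rm_id
        self.lock = lock
        self.role = "standby"
        self.recovered = False

    def elect(self):
        """Try to become active; a newly active RM must recover state first."""
        if self.lock.try_acquire(self.rm_id) and self.role != "active":
            # Placeholder for loading persisted state (the 'fast recovery' dependency).
            self.recovered = True
            self.role = "active"
        return self.role
```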
  • 19. YARN Rolling Upgrades Workflow
      • Servers first
        – Masters followed by slaves
      • Upgrade of applications/frameworks is decoupled!
  • 20. YARN Rolling Upgrades Snapshot
  • 21. Stack Rolling Upgrades
      • Rolling Updates session by Sanjay Radia, Thursday April 16, 2015, 11:45-12:25 @ Silver Hall
  • 22. Services on YARN
  • 23. Long-running services
      • You could run them already before 2.6!
      • Enhancements needed
        – Logs
        – Security
        – Management/monitoring
        – Sharing and placement
        – Discovery
      • Resource sharing across workload types
      • Fault tolerance of long-running services
        – Work-preserving AM restart
        – AM forgetting faults
      • Service registry
      • Project Slider: http://slider.incubator.apache.org/
      • HBase, Storm, Kafka already!
      • Session: “Bringing Long Running Services to Hadoop YARN” by Steve Loughran, Thursday April 16, 2015, 12:40-13:20 @ Copper Hall
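“AM forgetting faults” refers to counting ApplicationMaster failures only within a recent time window. That way, a long-running service that fails once a week is not killed for reaching its attempt limit over months. Below is a sketch of that bookkeeping, assuming timestamps in seconds and a hypothetical class `AttemptFailureTracker`. As far as we know, YARN exposes this idea as an attempt-failures validity interval on the application submission context, but this code does not call any YARN API.

```python
from collections import deque


class AttemptFailureTracker:
    """Counts AM failures only inside a sliding time window (hypothetical sketch)."""

    def __init__(self, max_attempts, validity_interval_s):
        self.max_attempts = max_attempts
        self.window = validity_interval_s
        self.failures = deque()  # failure timestamps, oldest first

    def record_failure(self, now):
        """Record a failure at time `now`; return True if the app may be restarted."""
        self.failures.append(now)
        # Forget failures that fell out of the validity window.
        while self.failures and self.failures[0] <= now - self.window:
            self.failures.popleft()
        return len(self.failures) < self.max_attempts
```

With `max_attempts=2` and a 600-second window, failures at t=0 and t=100 end the application. Failures at t=0 and t=1000 do not, because the first one has been forgotten by the time the second occurs.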
  • 24. Cluster Management Features
  • 25. Preemption-aided Scheduling
      • Admins
        – “Make the best use of cluster resources”
      • Users
        – “Give me resources fast”
      • Solution
        – Elastic queues
        – Loan idle capacities to others
        – Take it back on demand
        – Balance across queues: done
        – Balance across users in a queue: WIP
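The elastic-queue idea (lend idle capacity, take it back on demand) reduces to some arithmetic:

- An under-served queue may reclaim pending demand up to its guarantee.
- Free capacity is used first.
- Only the remainder is preempted from queues running above their guarantees.

The function below is a simplified sketch under those assumptions, with capacity in abstract units and the hypothetical name `preemption_targets`. It is not the CapacityScheduler's actual preemption policy, which accounts for more factors.

```python
def preemption_targets(total, capacity, usage, pending):
    """Units to reclaim from each over-capacity queue (simplified sketch).

    total: cluster size in abstract units; capacity: queue -> guaranteed units;
    usage: queue -> units in use; pending: queue -> units requested, not yet allocated.
    """
    # Each queue may reclaim pending demand only up to its own guarantee.
    claim = sum(min(pending.get(q, 0), max(capacity[q] - usage[q], 0)) for q in capacity)
    # Idle capacity is handed out first; only the shortfall triggers preemption.
    free = total - sum(usage.values())
    need = max(claim - free, 0)
    # Capacity on loan: what each queue holds beyond its guarantee.
    over = {q: usage[q] - capacity[q] for q in capacity if usage[q] > capacity[q]}
    targets = {}
    # Reclaim from the most over-extended queue first (a choice made for this sketch).
    for q, extra in sorted(over.items(), key=lambda kv: (-kv[1], kv[0])):
        if need <= 0:
            break
        take = min(extra, need)
        targets[q] = take
        need -= take
    return targets
```

Consider a 100-unit cluster split 50/50, where queue A uses 80 units and queue B uses 10 but wants 30 more. The 10 free units go to B first, so 20 units are preempted from A.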
  • 26. Fine-grained isolation for multi-tenancy
      • Memory
        – Custom monitoring
        – Inelastic resource
      • CPU
        – cgroups on Linux
        – Elastic resource
      • Support on Windows
        – WIP
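The memory/CPU distinction on the isolation slide is about what happens under contention. The sketch below uses hypothetical function names and abstract core counts:

- CPU is split in proportion to cgroups-style shares, and CPU that one container does not use is redistributed to busier ones (water-filling).
- Memory is a hard limit, and violators are flagged for killing.

It models the policy only, not the Linux cgroups interface or the NodeManager's monitor.

```python
def cpu_split(shares, demand, cores):
    """Divide `cores` among containers in proportion to `shares`, capped by demand.

    CPU left unused by a satisfied container is redistributed to the others,
    which is what makes CPU an 'elastic' resource.
    """
    alloc = {c: 0.0 for c in shares}
    active = {c for c in shares if demand[c] > 0}
    remaining = float(cores)
    while active and remaining > 1e-9:
        total = sum(shares[c] for c in active)
        # Tentative proportional split of whatever is still unallocated.
        grant = {c: remaining * shares[c] / total for c in active}
        satisfied = {c for c in active if alloc[c] + grant[c] >= demand[c]}
        if not satisfied:
            for c in active:
                alloc[c] += grant[c]
            break
        # Give satisfied containers exactly their demand; recycle the rest.
        for c in satisfied:
            remaining -= demand[c] - alloc[c]
            alloc[c] = float(demand[c])
        active -= satisfied
    return alloc


def over_memory_limit(limits_mb, usage_mb):
    """Memory is inelastic: containers above their limit are flagged for killing."""
    return sorted(c for c in limits_mb if usage_mb[c] > limits_mb[c])
```

Suppose two containers have equal shares on 4 cores, one wanting 1 core and the other 8. The first gets 1 core and the second gets 3, more than its proportional 2.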
  • 27. Multi-resource scheduling
      • Multi-dimensional bin-packing
        – Application A says “I want 8GB RAM and 2 CPUs”
        – Application B says “I want 1GB RAM and 10 CPUs”
      • Today: memory & CPU
        – Physical memory / virtual memory
        – CPU cores
        – Virtual cores
      • Scheduling constrained by the “bottleneck” resource
        – Watch out for utilization drop on the non-scarce resource
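The slide's two requests show the "bottleneck" effect on a hypothetical node with 64 GB and 16 vcores. The scarcest dimension caps how many containers fit, and the other dimension goes idle. `dominant_share` follows the Dominant Resource Fairness idea, which YARN's `DominantResourceCalculator` is based on. The code below is an illustration with hypothetical names, not that class.

```python
def max_containers(node, request):
    """How many copies of `request` fit on `node`: the scarcest resource decides."""
    return min(node[r] // request[r] for r in request if request[r] > 0)


def leftover(node, request):
    """Resources left idle on `node` after packing as many copies as fit."""
    n = max_containers(node, request)
    return {r: node[r] - n * request.get(r, 0) for r in node}


def dominant_share(cluster, usage):
    """DRF-style dominant share: the largest fraction of any single resource used."""
    return max(usage.get(r, 0) / cluster[r] for r in cluster)


node = {"memory_gb": 64, "vcores": 16}  # hypothetical node size
app_a = {"memory_gb": 8, "vcores": 2}   # "8GB RAM and 2 CPUs"
app_b = {"memory_gb": 1, "vcores": 10}  # "1GB RAM and 10 CPUs"
```

Application A packs the node exactly with 8 containers. Application B fits only once, because 10 of the 16 vcores are taken. That leaves 63 GB of memory and 6 vcores unused, which is the utilization drop the slide warns about. B's dominant share is 10/16 = 0.625, driven by CPU.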
  • 28. Node Labels
      • Partitions
        – Admin: “I have machines of different types”
        – Impact on capacity planning: “Hey, we bought those Windows machines”
      • Types
        – Exclusive: “This is my Precious!”
        – Non-exclusive: “I get binding preference. Use it for others when idle”
      • Constraints
        – “Take me to a machine running JDK version 9”
        – No impact on capacity planning
        – WIP
      [Diagram labels: Default Partition, Partition B (Linux), Partition C (Windows), JDK 8, JDK 7, JDK 7]
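The exclusive/non-exclusive distinction can be expressed as an eligibility filter. This sketch assumes each node belongs to exactly one partition, with an empty string for the default partition, and that a partition's exclusivity is a simple flag. `Node`, `eligible_nodes`, and the partition names are hypothetical, and the real scheduler's handling of labels is more involved.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    partition: str  # "" means the default partition
    idle: bool


def eligible_nodes(nodes, exclusive, requested_partition=""):
    """Nodes a request may use under exclusive/non-exclusive partition rules.

    exclusive: partition -> True if only requests for that partition may use it.
    """
    if requested_partition:
        # Labeled requests are confined to their own partition.
        return [n.name for n in nodes if n.partition == requested_partition]
    picks = []
    for n in nodes:
        if n.partition == "":
            picks.append(n.name)
        elif not exclusive.get(n.partition, True) and n.idle:
            # Non-exclusive partitions lend idle nodes to default-partition work.
            picks.append(n.name)
    return picks
```

Here is an example with one default node, two nodes in a non-exclusive "linux" partition (one idle, one busy), and one idle node in an exclusive "windows" partition. Unlabeled work may use the default node and the idle Linux node, but never the Windows node.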
  • 29. Operational and Developer tooling
  • 30. Application History and Timeline Service
      • Before
        – A few MapReduce-specific implementations: History and web UI
      • Not just MapReduce anymore!
      • History
        – “Why was my application slow?”
        – “Where did my containers run?”
        – MapReduce-specific Job History Server
        – Need a generic solution beyond ResourceManager restart
      • Run analytics on historical apps!
        – “User with most resource utilization”
        – “Largest application run”
      • Application Timeline
        – Framework-specific event collection and UIs
        – “Show me the counters for my running MapReduce task”
        – “Show me the slowest Storm stream-processing bolt while it is running”
      • Present
        – A LevelDB-based implementation
        – Integrated into MapReduce, Apache Tez, Apache Hive
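The two analytics questions on the slide become simple aggregations once per-application history records exist. The record fields below (`memory_mb_seconds`, `containers`) are hypothetical stand-ins for whatever the history store actually exposes. Measuring the "largest" run by container count is one possible definition among several.

```python
from collections import defaultdict
from typing import NamedTuple


class AppRecord(NamedTuple):
    app_id: str
    user: str
    memory_mb_seconds: int  # memory held, integrated over the app's lifetime
    containers: int         # containers allocated over the app's lifetime


def top_user(records):
    """User whose applications consumed the most memory-seconds in total."""
    totals = defaultdict(int)
    for r in records:
        totals[r.user] += r.memory_mb_seconds
    return max(totals, key=totals.get)


def largest_app(records):
    """Application that used the most containers."""
    return max(records, key=lambda r: r.containers).app_id
```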
  • 31. Other features
      • Web services
        – No need for installed Hadoop clients
        – Submit an app
        – Monitor / kill it
      • Multi-homing environments
        – Clients on a public network
        – Cluster traffic on a private network
        – Fault tolerance
        – Security
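The web-services bullet means a client needs only HTTP, not an installed Hadoop client. The sketch builds (but does not send) requests to the ResourceManager REST endpoints for three operations:

- listing running applications
- obtaining a new application id
- killing an application by setting its state

The host name and application id are hypothetical, and 8088 is the usual default RM web port. A secured cluster would also need authentication (for example SPNEGO), which is omitted here.

```python
import json
import urllib.request

RM = "http://rm.example.com:8088"  # hypothetical ResourceManager web address


def list_apps_request(rm=RM, state="RUNNING"):
    """GET the cluster's applications, filtered by state."""
    return urllib.request.Request(f"{rm}/ws/v1/cluster/apps?states={state}", method="GET")


def new_app_request(rm=RM):
    """POST to obtain a new application id before submitting an app."""
    return urllib.request.Request(f"{rm}/ws/v1/cluster/apps/new-application", data=b"", method="POST")


def kill_app_request(app_id, rm=RM):
    """PUT a KILLED state to the application's state resource."""
    body = json.dumps({"state": "KILLED"}).encode("utf-8")
    return urllib.request.Request(
        f"{rm}/ws/v1/cluster/apps/{app_id}/state",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Sending a request is then a matter of `urllib.request.urlopen(kill_app_request("application_1428000000000_0001"))`. Keeping request construction separate from sending makes the endpoints easy to inspect and test.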
  • 32. Future
  • 33. Apache Hadoop releases (contd): Apache Hadoop 2.7, 2.8 and beyond
      • Hadoop 2.7
        – Likely the week of April 19-24, 2015
        – Moving to JDK 7 and beyond
      • Future
  • 34. Future: Timeline Service Next Generation
      • Next generation
        – Today’s solution helped us understand the space
        – Limited scalability and availability
      • Analyzing Hadoop clusters is a big-data problem
        – Don’t want to throw away the Hadoop application metadata
        – Large scale
        – Enable near real-time analysis: “Find me the user who is hammering the FileSystem with rogue applications. Now.”
      • Timeline data stored in HBase and accessible to queries
  • 35. Future: Improved Usability
      • Generic run-time information
        – “What is the actual usage of my running container?”
        – “How many rack-local containers did I get?”
        – “How healthy is the scheduler?”
        – “Why is my application stuck? What limits did it hit?”
      • With Timeline Service
        – Why is my application slow?
        – Why is my cluster slow?
        – Why is my application failing?
        – Why is my cluster down?
        – What happened with my application? Succeeded?
        – What happened in my clusters?
      • Collect and use past data
        – To schedule my application better
        – To do better capacity planning
  • 36. Future: Containerized Applications
      • Running containerized applications on YARN
      • Docker
      • Multiple use-cases
        – Run my existing service on YARN
        – Slider + Docker
        – Run my existing MapReduce application on YARN via a Docker image
  • 37. Future: Scheduling
      • Support priorities across applications within the same queue
      • Policy-driven scheduling
        – “I want app-level fairness in queue A, user-level fairness in queue B, and throughput focus in all other queues”
      • Node anti-affinity
        – “Do not run two copies of my service daemon on the same machine”
      • Gang scheduling
        – “Run all of my app at once”
      • Dynamic scheduling of containers based on actual utilization
      • Stabilized app reservations
        – “Create a reservation for my app with X resources to run at 6AM tomorrow”
      • Time-based policies
        – “10% cluster capacity for queue A from 6-9AM, but 20% from 9AM-12PM”
      • Prioritized queues
        – Admin’s queue takes precedence over everything else
      • Lots more...
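Node anti-affinity and gang scheduling combine naturally: place every copy of a service on a distinct node, and start none unless all can start. Below is a minimal sketch of that all-or-nothing placement, assuming per-node free slot counts and a hypothetical `gang_place` function. Preferring nodes with the most free slots is an arbitrary choice made here so the result is deterministic.

```python
def gang_place(free_slots, copies):
    """All-or-nothing placement of `copies` containers, at most one per node.

    free_slots: node -> containers the node can still host.
    Returns the chosen nodes, or None when the whole gang cannot start together.
    """
    # Anti-affinity: each node is used at most once, so only nodes with room count.
    ranked = sorted(free_slots.items(), key=lambda kv: (-kv[1], kv[0]))
    candidates = [n for n, slots in ranked if slots > 0]
    if len(candidates) < copies:
        return None  # gang semantics: never start a partial set
    return candidates[:copies]
```

With free slots of n1=2, n2=0, n3=1 and n4=3, three copies land on n4, n1 and n3. Four copies are refused outright, even though the cluster has six free slots in total, because only three distinct nodes have room.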
  • 38. Future: More Resource Types
      • Node-level isolation and cluster-level scheduling
      • Disks
        – Space
        – IOPS: read/write
      • Network
        – Incoming bandwidth
        – Outgoing bandwidth
  • 39. Thank you!
      • Sandbox: Hadoop in a VM!
      • Question time!

Editor's Notes

  1. YARN is not the first general Resource Management platform. So what’s different? It’s data!
  2. Queues reflect org structures. Hierarchical in nature.