SlideShare une entreprise Scribd logo
1  sur  38
© Hortonworks Inc. 2014
Apache Hadoop YARN
Present and Future
Vinod Kumar Vavilapalli
vinodkv [at] apache.org
@tshooter
Page 1
© Hortonworks Inc. 2014
A quick show of hands..
• Hadoop 2
Page 2
Architecting the Future of Big Data
Real life Hadoop Logo
© Hortonworks Inc. 2014
Who am I?
• 6.75 Hadoop-years old
• Last thing at School – a two node Tomcat cluster. Three months later,
first thing at job, brought down a 800 node cluster ;)
• Previously @Yahoo!
• Now @Hortonworks
• Two hats
– Hortonworks: Hadoop MapReduce and YARN Development lead
– Apache: Apache Hadoop YARN lead. Apache Hadoop PMC, Apache Member
• Worked/working on
– YARN, Hadoop MapReduce, HadoopOnDemand, CapacityScheduler, Hadoop
security
– Apache Ambari: Kickstarted the project and its first release
– Stinger: High performance data processing with Hadoop/Hive
• Lots of trouble shooting on clusters
• 99% + code in Apache, Hadoop
Page 3
Architecting the Future of Big Data
© Hortonworks Inc. 2014
Agenda
• Apache Hadoop 2 : Overview
• Past
• Present
• Future
Page 4
Architecting the Future of Big Data
© Hortonworks Inc. 2014
Apache Hadoop 2
Next Generation Architecture
Architecting the Future of Big Data
Page 5
© Hortonworks Inc. 2014
What is YARN?
• Resource Management Platform
– MapReduce v2
– Beyond MapReduce with Tez, Storm, Spark; in Hadoop!
– Did I mention Services like HBase, Accumulo on YARN with HoYA/Slider?
• How is it different from Hadoop 1? ..
Page 6
Architecting the Future of Big Data
© Hortonworks Inc. 2014
Hadoop 1 vs Hadoop 2
HADOOP 1.0
HDFS
(redundant, reliable storage)
MapReduce
(cluster resource management
& data processing)
HDFS2
(redundant, highly-available & reliable storage)
YARN
(cluster resource management)
MapReduce
(data processing)
Others
HADOOP 2.0
Single Use System
Batch Apps
Multi Purpose Platform
Batch, Interactive, Online, Streaming, …
Page 7
© Hortonworks Inc. 2014
Key Benefits of YARN
• Scale
• New Programming Models & Services
• Improved cluster utilization
• Agility
• To infinity and beyond ..
Page 8
© Hortonworks Inc. 2014
Why Migrate?
• 2.0 >= 2 * 1.0
– HDFS: Lots of ground-breaking features
– YARN: Next generation architecture
• Return on Investment: 2x throughput on same hardware!
• Ready for improvements in hardware
• Not convinced? Let’s see what others are saying!
Page 9
Architecting the Future of Big Data
© Hortonworks Inc. 2014
Yahoo!
• Leader/Visionary on all things Hadoop!
• On YARN (0.23.x)
• Moving fast to 2.x
Page 10
Architecting the Future of Big Data
http://developer.yahoo.com/blogs/ydn/hadoop-yahoo-more-ever-54421.html
© Hortonworks Inc. 2014
Twitter
Page 11
Architecting the Future of Big Data
© Hortonworks Inc. 2014
Ebay
• Has one of the largest Hadoop clusters in the industry with many
petabytes of data
• Migrated production clusters to Hadoop-2
• Go to Mayank’s talk
– “Hadoop-2 @ ebay”!
– Thursday, April 3
– Track : Deployment and Operations
• Should be convinced by now .. . No?
Page 12
Architecting the Future of Big Data
© Hortonworks Inc. 2014
YARN: the Data Operating System
Page 13
Architecting the Future of Big Data
© Hortonworks Inc. 2014
Present
Architecting the Future of Big Data
Page 14
© Hortonworks Inc. 2014
Apache Hadoop releases
• 15 October, 2013
• The 1st GA release of Apache Hadoop 2.x
• YARN
– First stable and supported release of YARN
– Binary Compatibility for MapReduce applications built on hadoop-1.x
– YARN level APIs solidified for the future
– Performance
– Scale!
• HDFS
– High Availability for HDFS
– HDFS Federation
– HDFS Snapshots
– NFSv3 access to data in HDFS
• Support for running Hadoop on Microsoft Windows
• Substantial amount of integration testing with rest of projects in the
ecosystem
Page 15
Architecting the Future of Big Data
Apache Hadoop 2.2
© Hortonworks Inc. 2014
Apache Hadoop releases (contd)
• 24 February, 2014
• First post GA release for the year 2014
• Alpha features in YARN
– ResourceManager HA
– Application History
– Will cover in the 2.4 content
• HDFS
– Details follow..
• Number of bug-fixes, enhancements
Page 16
Architecting the Future of Big Data
Apache Hadoop 2.3
© Hortonworks Inc. 2014
HDFS: Heterogeneous Storage
Page 17
Architecting the Future of Big Data
© Hortonworks Inc. 2014
HDFS: DataNode caching
Page 18
Architecting the Future of Big Data
© Hortonworks Inc. 2014
Apache Hadoop releases (contd)
• Very soon!
• YARN
– Details follow..
– ResourceManager restart fail-over for high availability
– Preemption
– Application History and timeline
• HDFS
– FileSystem ACLs
– Rolling upgrades
Page 19
Architecting the Future of Big Data
Apache Hadoop 2.4
© Hortonworks Inc. 2014
ResourceManager Restart and fail-over
Page 20
Architecting the Future of Big Data
ZooKeeper
© Hortonworks Inc. 2014
Capacity Scheduler Preemption
Page 21
Architecting the Future of Big Data
© Hortonworks Inc. 2014
Application History and Timeline
• Few MR specific implementations: History and web-UI
• Not just MR anymore!
• History
– MapReduce specific Job History Server
– Beyond ResourceManager Restart
• Timeline
– Framework specific event collection and UIs
• Run analytics on historical apps!
Page 22
Architecting the Future of Big Data
© Hortonworks Inc. 2014
Future
Architecting the Future of Big Data
Page 23
© Hortonworks Inc. 2014
Future: Operational enhancements
• Rolling upgrades
– No/minimal impact to users
– Ideal: Always rolling!
• HDFS in
• YARN
Page 24
Architecting the Future of Big Data
© Hortonworks Inc. 2014
Future: Enabling more apps
• Beyond MR
• Discussing next
– Long running services
– Isolation
– Multi-dimensional resource
scheduling
Page 25
Architecting the Future of Big Data
© Hortonworks Inc. 2014
Future: Long running services
• You can run them already!
• Few enhancements needed
– Logs
– Security
– Management/monitoring
• Resource sharing across
workload types
• Project Slider
Page 26
Architecting the Future of Big Data
© Hortonworks Inc. 2014
Fine-grain isolation for multi-tenancy
• Custom memory-monitoring
• Cgroups
• Linux Containers
• VMs
Page 27
Architecting the Future of Big Data
© Hortonworks Inc. 2014
Multi-resource scheduling
• Today – memory & cpu
– Physical memory / virtual memory
– Cpu Cores – Virtual cores
• CPU stuff: More bake in
• Disks
– Space
– IOPS
• Network
Page 28
Architecting the Future of Big Data
© Hortonworks Inc. 2014
Other features
• Application SLAs
• Node labels
• Node affinity/anti-affinity
• Better online queue-management
Page 29
Architecting the Future of Big Data
© Hortonworks Inc. 2014
YARN Ecosystem
Beyond the core YARN project: Briefly
Architecting the Future of Big Data
Page 30
© Hortonworks Inc. 2014
Eco-system
Page 31
Applications Powered by YARN
Apache Giraph – Graph Processing
Apache Hama – BSP
Apache Hadoop MapReduce – Batch
Apache Tez – Batch/Interactive
Apache S4 – Stream Processing
Apache Samza – Stream Processing
Apache Storm – Stream Processing
Apache Spark – Iterative applications
HOYA – HBase on YARN
YARN Frameworks
Apache Twill
REEF by Microsoft
Spring support for Hadoop 2
There's an app for that...
YARN App Marketplace!
© Hortonworks Inc. 2014
Apache TEZ
• Moving beyond MR
• A data processing framework that can execute a complex DAG of
tasks.
• “Apache Tez - A New Chapter in Hadoop Data Processing”
– By Siddharth Seth: YARN & Tez Committer/PMC Member
– Thursday, April 3 (4:20-5:00pm)
Page 32
Architecting the Future of Big Data
© Hortonworks Inc. 2014
Recap
Architecting the Future of Big Data
Page 33
© Hortonworks Inc. 2014
Recap
Page 34
Architecting the Future of Big Data
• Apache Hadoop 2 is, at least, twice as good!
• Exciting journey with Hadoop for this decade…
– Hadoop is no longer a one-trick pony, err elephant
– Beyond just HDFS & MapReduce
• Architecture for the future
– Centralized data
– Exciting spectrum of application types, workloads and usecases
© Hortonworks Inc. 2014
Couple more things..
Architecting the Future of Big Data
Page 35
© Hortonworks Inc. 2014
The Book is out!
Page 36
Architecting the Future of Big Data
© Hortonworks Inc. 2014
Page 37
Architecting the Future of Big Data
© Hortonworks Inc. 2014
Thank you!
Page 38
Download Sandbox: Experience Apache Hadoop
Both 2.x and 1.x Versions Available!
http://hortonworks.com/products/hortonworks-sandbox/
Questions Time!

Contenu connexe

Tendances

Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsHortonworks
 
Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...
Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...
Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...DataWorks Summit
 
Running Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache HadoopRunning Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache Hadoophitesh1892
 
Hadoop 2 - Going beyond MapReduce
Hadoop 2 - Going beyond MapReduceHadoop 2 - Going beyond MapReduce
Hadoop 2 - Going beyond MapReduceUwe Printz
 
NextGen Apache Hadoop MapReduce
NextGen Apache Hadoop MapReduceNextGen Apache Hadoop MapReduce
NextGen Apache Hadoop MapReduceHortonworks
 
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele Hakka Labs
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformBikas Saha
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesDataWorks Summit
 
Apache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data ProcessingApache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data Processinghitesh1892
 
Operationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the CloudOperationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the CloudDataWorks Summit/Hadoop Summit
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureDataWorks Summit
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksDataWorks Summit
 
Towards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersTowards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersDataWorks Summit
 
堵俊平:Hadoop virtualization extensions
堵俊平:Hadoop virtualization extensions堵俊平:Hadoop virtualization extensions
堵俊平:Hadoop virtualization extensionshdhappy001
 
Tez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaTez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaData Con LA
 

Tendances (20)

Yarns About Yarn
Yarns About YarnYarns About Yarn
Yarns About Yarn
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data Applications
 
Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...
Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...
Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...
 
Running Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache HadoopRunning Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache Hadoop
 
Hadoop 2 - Going beyond MapReduce
Hadoop 2 - Going beyond MapReduceHadoop 2 - Going beyond MapReduce
Hadoop 2 - Going beyond MapReduce
 
NextGen Apache Hadoop MapReduce
NextGen Apache Hadoop MapReduceNextGen Apache Hadoop MapReduce
NextGen Apache Hadoop MapReduce
 
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute Platform
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practices
 
Apache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data ProcessingApache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data Processing
 
Empower Hive with Spark
Empower Hive with SparkEmpower Hive with Spark
Empower Hive with Spark
 
Operationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the CloudOperationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the Cloud
 
Hive Now Sparks
Hive Now SparksHive Now Sparks
Hive Now Sparks
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
Yarn
YarnYarn
Yarn
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
Towards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersTowards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN Clusters
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
 
堵俊平:Hadoop virtualization extensions
堵俊平:Hadoop virtualization extensions堵俊平:Hadoop virtualization extensions
堵俊平:Hadoop virtualization extensions
 
Tez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaTez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_saha
 

En vedette

Apache Oozie Workflow Scheduler - Module 10
Apache Oozie Workflow Scheduler - Module 10Apache Oozie Workflow Scheduler - Module 10
Apache Oozie Workflow Scheduler - Module 10Rohit Agrawal
 
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review Sumeet Singh
 
Apache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and FriendsApache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and FriendsStephan Ewen
 
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords   The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords Stephan Ewen
 
Oozie in Practice - Big Data Workflow Scheduler - Oozie Case Study
Oozie in Practice - Big Data Workflow Scheduler - Oozie Case StudyOozie in Practice - Big Data Workflow Scheduler - Oozie Case Study
Oozie in Practice - Big Data Workflow Scheduler - Oozie Case StudyFX Live Group
 
Oozie sweet
Oozie sweetOozie sweet
Oozie sweetmislam77
 
Apache Spark vs Apache Flink
Apache Spark vs Apache FlinkApache Spark vs Apache Flink
Apache Spark vs Apache FlinkAKASH SIHAG
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingHortonworks
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopHortonworks
 
Fine-Grained Scheduling with Helix (ApacheCon NA 2014)
Fine-Grained Scheduling with Helix (ApacheCon NA 2014)Fine-Grained Scheduling with Helix (ApacheCon NA 2014)
Fine-Grained Scheduling with Helix (ApacheCon NA 2014)Kanak Biscuitwala
 
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)Amy W. Tang
 

En vedette (18)

Apache Oozie Workflow Scheduler - Module 10
Apache Oozie Workflow Scheduler - Module 10Apache Oozie Workflow Scheduler - Module 10
Apache Oozie Workflow Scheduler - Module 10
 
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review
 
Apache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and FriendsApache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and Friends
 
Yahoo's Experience Running Pig on Tez at Scale
Yahoo's Experience Running Pig on Tez at ScaleYahoo's Experience Running Pig on Tez at Scale
Yahoo's Experience Running Pig on Tez at Scale
 
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords   The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
 
Oozie in Practice - Big Data Workflow Scheduler - Oozie Case Study
Oozie in Practice - Big Data Workflow Scheduler - Oozie Case StudyOozie in Practice - Big Data Workflow Scheduler - Oozie Case Study
Oozie in Practice - Big Data Workflow Scheduler - Oozie Case Study
 
Oozie sweet
Oozie sweetOozie sweet
Oozie sweet
 
Apache Spark vs Apache Flink
Apache Spark vs Apache FlinkApache Spark vs Apache Flink
Apache Spark vs Apache Flink
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Setting up Hadoop YARN Clustering
Setting up Hadoop YARN ClusteringSetting up Hadoop YARN Clustering
Setting up Hadoop YARN Clustering
 
Apache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduceApache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduce
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
Fine-Grained Scheduling with Helix (ApacheCon NA 2014)
Fine-Grained Scheduling with Helix (ApacheCon NA 2014)Fine-Grained Scheduling with Helix (ApacheCon NA 2014)
Fine-Grained Scheduling with Helix (ApacheCon NA 2014)
 
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
 
Apache Oozie
Apache OozieApache Oozie
Apache Oozie
 
Hadoop 1.x vs 2
Hadoop 1.x vs 2Hadoop 1.x vs 2
Hadoop 1.x vs 2
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 

Similaire à Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN ApplicationsHortonworks
 
Hadoop 2 - Beyond MapReduce
Hadoop 2 - Beyond MapReduceHadoop 2 - Beyond MapReduce
Hadoop 2 - Beyond MapReduceUwe Printz
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchHortonworks
 
How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopPOSSCON
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3tcloudcomputing-tw
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnhdhappy001
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSHortonworks
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataHortonworks
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks
 
Mrinal devadas, Hortonworks Making Sense Of Big Data
Mrinal devadas, Hortonworks Making Sense Of Big DataMrinal devadas, Hortonworks Making Sense Of Big Data
Mrinal devadas, Hortonworks Making Sense Of Big DataPatrickCrompton
 
Hadoop: today and tomorrow
Hadoop: today and tomorrowHadoop: today and tomorrow
Hadoop: today and tomorrowSteve Loughran
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopVigen Sahakyan
 
Hortonworks tech workshop in-memory processing with spark
Hortonworks tech workshop   in-memory processing with sparkHortonworks tech workshop   in-memory processing with spark
Hortonworks tech workshop in-memory processing with sparkHortonworks
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksData Con LA
 
Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011Hortonworks
 

Similaire à Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future (20)

MHUG - YARN
MHUG - YARNMHUG - YARN
MHUG - YARN
 
Hadoop In Action
Hadoop In ActionHadoop In Action
Hadoop In Action
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN Applications
 
Hadoop 2 - Beyond MapReduce
Hadoop 2 - Beyond MapReduceHadoop 2 - Beyond MapReduce
Hadoop 2 - Beyond MapReduce
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and Search
 
Yarnthug2014
Yarnthug2014Yarnthug2014
Yarnthug2014
 
How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in Hadoop
 
Munich HUG 21.11.2013
Munich HUG 21.11.2013Munich HUG 21.11.2013
Munich HUG 21.11.2013
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
 
Mrinal devadas, Hortonworks Making Sense Of Big Data
Mrinal devadas, Hortonworks Making Sense Of Big DataMrinal devadas, Hortonworks Making Sense Of Big Data
Mrinal devadas, Hortonworks Making Sense Of Big Data
 
201305 hadoop jpl-v3
201305 hadoop jpl-v3201305 hadoop jpl-v3
201305 hadoop jpl-v3
 
Hadoop: today and tomorrow
Hadoop: today and tomorrowHadoop: today and tomorrow
Hadoop: today and tomorrow
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Hortonworks tech workshop in-memory processing with spark
Hortonworks tech workshop   in-memory processing with sparkHortonworks tech workshop   in-memory processing with spark
Hortonworks tech workshop in-memory processing with spark
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
 
Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011
 

Dernier

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaOmar Fathy
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdfKamal Acharya
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxSCMS School of Architecture
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesMayuraD1
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwaitjaanualu31
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilVinayVitekari
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityMorshed Ahmed Rahath
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Servicemeghakumariji156
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapRishantSharmaFr
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.Kamal Acharya
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Call Girls Mumbai
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsvanyagupta248
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptxJIT KUMAR GUPTA
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARKOUSTAV SARKAR
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...Amil baba
 

Dernier (20)

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech Civil
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 

Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

  • 1. © Hortonworks Inc. 2014 Apache Hadoop YARN Present and Future Vinod Kumar Vavilapalli vinodkv [at] apache.org @tshooter Page 1
  • 2. © Hortonworks Inc. 2014 A quick show of hands.. • Hadoop 2 Page 2 Architecting the Future of Big Data Real life Hadoop Logo
  • 3. © Hortonworks Inc. 2014 Who am I? • 6.75 Hadoop-years old • Last thing at School – a two node Tomcat cluster. Three months later, first thing at job, brought down a 800 node cluster ;) • Previously @Yahoo! • Now @Hortonworks • Two hats – Hortonworks: Hadoop MapReduce and YARN Development lead – Apache: Apache Hadoop YARN lead. Apache Hadoop PMC, Apache Member • Worked/working on – YARN, Hadoop MapReduce, HadoopOnDemand, CapacityScheduler, Hadoop security – Apache Ambari: Kickstarted the project and its first release – Stinger: High performance data processing with Hadoop/Hive • Lots of trouble shooting on clusters • 99% + code in Apache, Hadoop Page 3 Architecting the Future of Big Data
  • 4. © Hortonworks Inc. 2014 Agenda • Apache Hadoop 2 : Overview • Past • Present • Future Page 4 Architecting the Future of Big Data
  • 5. © Hortonworks Inc. 2014 Apache Hadoop 2 Next Generation Architecture Architecting the Future of Big Data Page 5
  • 6. © Hortonworks Inc. 2014 What is YARN? • Resource Management Platform – MapReduce v2 – Beyond MapReduce with Tez, Storm, Spark; in Hadoop! – Did I mention Services like HBase, Accumulo on YARN with HoYA/Slider? • How is it different from Hadoop 1? .. Page 6 Architecting the Future of Big Data
  • 7. © Hortonworks Inc. 2014 Hadoop 1 vs Hadoop 2 HADOOP 1.0 HDFS (redundant, reliable storage) MapReduce (cluster resource management & data processing) HDFS2 (redundant, highly-available & reliable storage) YARN (cluster resource management) MapReduce (data processing) Others HADOOP 2.0 Single Use System Batch Apps Multi Purpose Platform Batch, Interactive, Online, Streaming, … Page 7
  • 8. © Hortonworks Inc. 2014 Key Benefits of YARN • Scale • New Programming Models & Services • Improved cluster utilization • Agility • To infinity and beyond .. Page 8
  • 9. © Hortonworks Inc. 2014 Why Migrate? • 2.0 >= 2 * 1.0 – HDFS: Lots of ground-breaking features – YARN: Next generation architecture • Return on Investment: 2x throughput on same hardware! • Ready for improvements in hardware • Not convinced? Let’s see what others are saying! Page 9 Architecting the Future of Big Data
  • 10. © Hortonworks Inc. 2014 Yahoo! • Leader/Visionary on all things Hadoop! • On YARN (0.23.x) • Moving fast to 2.x Page 10 Architecting the Future of Big Data http://developer.yahoo.com/blogs/ydn/hadoop-yahoo-more-ever-54421.html
  • 11. © Hortonworks Inc. 2014 Twitter Page 11 Architecting the Future of Big Data
  • 12. © Hortonworks Inc. 2014 Ebay • Has one of the largest Hadoop clusters in the industry with many petabytes of data • Migrated production clusters to Hadoop-2 • Go to Mayank’s talk – “Hadoop-2 @ ebay”! – Thursday, April 3 – Track : Deployment and Operations • Should be convinced by now .. . No? Page 12 Architecting the Future of Big Data
  • 13. © Hortonworks Inc. 2014 YARN: the Data Operating System Page 13 Architecting the Future of Big Data
  • 14. © Hortonworks Inc. 2014 Present Architecting the Future of Big Data Page 14
  • 15. © Hortonworks Inc. 2014 Apache Hadoop releases • 15 October, 2013 • The 1st GA release of Apache Hadoop 2.x • YARN – First stable and supported release of YARN – Binary Compatibility for MapReduce applications built on hadoop-1.x – YARN level APIs solidified for the future – Performance – Scale! • HDFS – High Availability for HDFS – HDFS Federation – HDFS Snapshots – NFSv3 access to data in HDFS • Support for running Hadoop on Microsoft Windows • Substantial amount of integration testing with rest of projects in the ecosystem Page 15 Architecting the Future of Big Data Apache Hadoop 2.2
  • 16. © Hortonworks Inc. 2014 Apache Hadoop releases (contd) • 24 February, 2014 • First post GA release for the year 2014 • Alpha features in YARN – ResourceManager HA – Application History – Will cover in the 2.4 content • HDFS – Details follow.. • Number of bug-fixes, enhancements Page 16 Architecting the Future of Big Data Apache Hadoop 2.3
  • 17. © Hortonworks Inc. 2014 HDFS: Heterogeneous Storage Page 17 Architecting the Future of Big Data
  • 18. © Hortonworks Inc. 2014 HDFS: DataNode caching Page 18 Architecting the Future of Big Data
  • 19. © Hortonworks Inc. 2014 Apache Hadoop releases (contd) • Very soon! • YARN – Details follow.. – ResourceManager restart fail-over for high availability – Preemption – Application History and timeline • HDFS – FileSystem ACLs – Rolling upgrades Page 19 Architecting the Future of Big Data Apache Hadoop 2.4
  • 20. © Hortonworks Inc. 2014 ResourceManager Restart and fail-over Page 20 Architecting the Future of Big Data ZooKeeper
  • 21. © Hortonworks Inc. 2014 Capacity Scheduler Preemption Page 21 Architecting the Future of Big Data
  • 22. © Hortonworks Inc. 2014 Application History and Timeline • Few MR specific implementations: History and web-UI • Not just MR anymore! • History – MapReduce specific Job History Server – Beyond ResourceManager Restart • Timeline – Framework specific event collection and UIs • Run analytics on historical apps! Page 22 Architecting the Future of Big Data
  • 23. © Hortonworks Inc. 2014 Future Architecting the Future of Big Data Page 23
  • 24. © Hortonworks Inc. 2014 Future: Operational enhancements • Rolling upgrades – No/minimal impact to users – Ideal: Always rolling! • HDFS in • YARN Page 24 Architecting the Future of Big Data
  • 25. © Hortonworks Inc. 2014 Future: Enabling more apps • Beyond MR • Discussing next – Long running services – Isolation – Multi-dimensional resource scheduling Page 25 Architecting the Future of Big Data
  • 26. © Hortonworks Inc. 2014 Future: Long running services • You can run them already! • Few enhancements needed – Logs – Security – Management/monitoring • Resource sharing across workload types • Project Slider Page 26 Architecting the Future of Big Data
  • 27. © Hortonworks Inc. 2014 Fine-grain isolation for multi-tenancy • Custom memory-monitoring • Cgroups • Linux Containers • VMs Page 27 Architecting the Future of Big Data
  • 28. © Hortonworks Inc. 2014 Multi-resource scheduling • Today – memory & cpu – Physical memory / virtual memory – Cpu Cores – Virtual cores • CPU stuff: More bake in • Disks – Space – IOPS • Network Page 28 Architecting the Future of Big Data
  • 29. © Hortonworks Inc. 2014 Other features • Application SLAs • Node labels • Node affinity/anti-affinity • Better online queue-management Page 29 Architecting the Future of Big Data
  • 30. © Hortonworks Inc. 2014 YARN Ecosystem Beyond the core YARN project: Briefly Architecting the Future of Big Data Page 30
  • 31. © Hortonworks Inc. 2014 Eco-system Page 31 Applications Powered by YARN Apache Giraph – Graph Processing Apache Hama – BSP Apache Hadoop MapReduce – Batch Apache Tez – Batch/Interactive Apache S4 – Stream Processing Apache Samza – Stream Processing Apache Storm – Stream Processing Apache Spark – Iterative applications HOYA – HBase on YARN YARN Frameworks Apache Twill REEF by Microsoft Spring support for Hadoop 2 There's an app for that... YARN App Marketplace!
  • 32. © Hortonworks Inc. 2014 Apache TEZ • Moving beyond MR • A data processing framework that can execute a complex DAG of tasks. • “Apache Tez - A New Chapter in Hadoop Data Processing” – By Siddharth Seth: YARN & Tez Committer/PMC Member – Thursday, April 3 (4:20-5:00pm) Page 32 Architecting the Future of Big Data
  • 33. © Hortonworks Inc. 2014 Recap Architecting the Future of Big Data Page 33
  • 34. © Hortonworks Inc. 2014 Recap Page 34 Architecting the Future of Big Data • Apache Hadoop 2 is, at least, twice as good! • Exciting journey with Hadoop for this decade… – Hadoop is no longer a one-trick pony, err elephant – Beyond just HDFS & MapReduce • Architecture for the future – Centralized data – Exciting spectrum of application types, workloads and usecases
  • 35. © Hortonworks Inc. 2014 Couple more things.. Architecting the Future of Big Data Page 35
  • 36. © Hortonworks Inc. 2014 The Book is out! Page 36 Architecting the Future of Big Data
  • 37. © Hortonworks Inc. 2014 Page 37 Architecting the Future of Big Data
  • 38. © Hortonworks Inc. 2014 Thank you! Page 38 Download Sandbox: Experience Apache Hadoop Both 2.x and 1.x Versions Available! http://hortonworks.com/products/hortonworks-sandbox/ Questions Time!

Notes de l'éditeur

  1. Graph processing – Giraph, HamaStream proessing – Smaza, Storm, Spark, DataTorrentMapReduceTez – fast query executionWeave/REEF – frameworks to help with writing applicationsList of some of the applications which already support YARN, in some form.Smaza, Storm, S4 and DataTorrent are streaming frameworksVarious types of graph processing frameworks – Giraph and Hama are graph processing systemsThere’s some github projects – caching systems, on-demand web-server spin up Wave and REEF are frameworks on top of YARN to make writing applications easier