SlideShare une entreprise Scribd logo
1  sur  26
An Introduction to
                        MapReduce 2 and
                              YARN

                            Tom White, Cloudera
                              @tom_e_white
                               June 4, 2012
                               Chicago HUG
Tuesday, June 5, 2012
Road Trip




Tuesday, June 5, 2012
About me
                    •   Apache Hadoop Committer,
                        PMC Member, Apache
                        Member

                    •   Engineer at Cloudera
                        working on core Hadoop

                    •   Founder of Apache Whirr

                    •   Author of “Hadoop: The
                        Definitive Guide”

                        •   http://hadoopbook.com

Tuesday, June 5, 2012
First, whatʼs
                        MapReduce 1?


Tuesday, June 5, 2012
Tuesday, June 5, 2012
Whatʼs wrong with
                             MR1?


Tuesday, June 5, 2012
Motivation 1


                    •   Scaling >4000 nodes
                    • Fewer, larger clusters



Tuesday, June 5, 2012
Motivation 2


                    •   HA of Job Tracker
                    • Large, complex state



Tuesday, June 5, 2012
Motivation 3

                    •   Poor resource utilization
                    • Slots in MR1 are for either
                        map or reduce


Tuesday, June 5, 2012
Yet Another Resource Negotiator




Tuesday, June 5, 2012
Tuesday, June 5, 2012
Tuesday, June 5, 2012
Node Manager
               is a generalized Task Tracker
                    • Task Tracker
                     • fixed number of map or reduce
                        slots
                    • Node Manager
                     • containers with variable resource
                        limits

Tuesday, June 5, 2012
Tuesday, June 5, 2012
Tuesday, June 5, 2012
MR is user space
                         YARN is kernel


Tuesday, June 5, 2012
Bonus Apps

                    •   Distributed shell
                    • MPI   (MAPREDUCE-2911)



                    • Master-worker            (MAPREDUCE-3315)



                    • Apache Giraph, Hama

Tuesday, June 5, 2012
Tuesday, June 5, 2012
Tuesday, June 5, 2012
Old API ≠ MR1
                        New API ≠ MR2


Tuesday, June 5, 2012
Old API         New API
                              o.a.h.mapred   o.a.h.mapreduce



                        MR1       ✓                ✓

                        MR2       ✓                ✓

Tuesday, June 5, 2012
Tuesday, June 5, 2012
Try out MR2

                    • Apache Hadoop 2.0.0-alpha
                     • hadoop.apache.org
                    • CDH4 and Cloudera Manager
                     • cloudera.com
                    • Cloud - Apache Whirr
Tuesday, June 5, 2012
MR1
   <dependency>
       <groupId>org.apache.hadoop</groupId>
       <artifactId>hadoop-client</artifactId>
       <version>1.0.3</version>
   </dependency>


     MR2
   <dependency>
       <groupId>org.apache.hadoop</groupId>
       <artifactId>hadoop-client</artifactId>
       <version>2.0.0-alpha</version>
   </dependency>

Tuesday, June 5, 2012
TODO

                    • Still alpha status
                    • Performance tuning
                    • Usability bug fixes
                    • RM recovery
                    • Security in MR2 not complete

Tuesday, June 5, 2012
Questions?



Tuesday, June 5, 2012

Contenu connexe

En vedette

An Introduction to MapReduce 2 and YARN
An Introduction to MapReduce 2 and YARNAn Introduction to MapReduce 2 and YARN
An Introduction to MapReduce 2 and YARN
tomwhite
 

En vedette (19)

An Introduction to MapReduce 2 and YARN
An Introduction to MapReduce 2 and YARNAn Introduction to MapReduce 2 and YARN
An Introduction to MapReduce 2 and YARN
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
 
Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5
 
Apache Spark Operations
Apache Spark OperationsApache Spark Operations
Apache Spark Operations
 
Realtime BigData Step by Step mit Lambda, Kafka, Storm und Hadoop
Realtime BigData Step by Step mit Lambda, Kafka, Storm und HadoopRealtime BigData Step by Step mit Lambda, Kafka, Storm und Hadoop
Realtime BigData Step by Step mit Lambda, Kafka, Storm und Hadoop
 
Spark on YARN: The Road Ahead
Spark on YARN: The Road AheadSpark on YARN: The Road Ahead
Spark on YARN: The Road Ahead
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
 
The Vortex of Change - Digital Transformation (Presented by Intel)
The Vortex of Change - Digital Transformation (Presented by Intel)The Vortex of Change - Digital Transformation (Presented by Intel)
The Vortex of Change - Digital Transformation (Presented by Intel)
 
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1

 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
 
[225]yarn 기반의 deep learning application cluster 구축 김제민
[225]yarn 기반의 deep learning application cluster 구축 김제민[225]yarn 기반의 deep learning application cluster 구축 김제민
[225]yarn 기반의 deep learning application cluster 구축 김제민
 
Top 5 IoT Use Cases
Top 5 IoT Use CasesTop 5 IoT Use Cases
Top 5 IoT Use Cases
 
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudPart 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
 
Install Apache Hadoop for Development/Production
Install Apache Hadoop for  Development/ProductionInstall Apache Hadoop for  Development/Production
Install Apache Hadoop for Development/Production
 
Apache Spark & Hadoop : Train-the-trainer
Apache Spark & Hadoop : Train-the-trainerApache Spark & Hadoop : Train-the-trainer
Apache Spark & Hadoop : Train-the-trainer
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
 
HDFS Internals
HDFS InternalsHDFS Internals
HDFS Internals
 
Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)
 
Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360
 

Similaire à Map Reduce v2 and YARN - CHUG - 20120604

The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)
The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)
The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)
Chiradeep Vittal
 

Similaire à Map Reduce v2 and YARN - CHUG - 20120604 (7)

Ruby CI with Jenkins
Ruby CI with JenkinsRuby CI with Jenkins
Ruby CI with Jenkins
 
Michael mahlberg exploratory-testing-the_missing_half_of_bdd
Michael mahlberg exploratory-testing-the_missing_half_of_bddMichael mahlberg exploratory-testing-the_missing_half_of_bdd
Michael mahlberg exploratory-testing-the_missing_half_of_bdd
 
Introduction to Apache Pig
Introduction to Apache PigIntroduction to Apache Pig
Introduction to Apache Pig
 
The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)
The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)
The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)
 
High Availability in YARN
High Availability in YARNHigh Availability in YARN
High Availability in YARN
 
Real-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera ImpalaReal-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera Impala
 
Hadoop-2 @ eBay
Hadoop-2 @ eBayHadoop-2 @ eBay
Hadoop-2 @ eBay
 

Plus de Chicago Hadoop Users Group

Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Chicago Hadoop Users Group
 
Choosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChoosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your Business
Chicago Hadoop Users Group
 
HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917
Chicago Hadoop Users Group
 

Plus de Chicago Hadoop Users Group (18)

Kinetica master chug_9.12
Kinetica master chug_9.12Kinetica master chug_9.12
Kinetica master chug_9.12
 
Chug dl presentation
Chug dl presentationChug dl presentation
Chug dl presentation
 
Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
 
Using Apache Drill
Using Apache DrillUsing Apache Drill
Using Apache Drill
 
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
 
Meet Spark
Meet SparkMeet Spark
Meet Spark
 
Choosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChoosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your Business
 
An Overview of Ambari
An Overview of AmbariAn Overview of Ambari
An Overview of Ambari
 
Hadoop and Big Data Security
Hadoop and Big Data SecurityHadoop and Big Data Security
Hadoop and Big Data Security
 
Advanced Oozie
Advanced OozieAdvanced Oozie
Advanced Oozie
 
Scalding for Hadoop
Scalding for HadoopScalding for Hadoop
Scalding for Hadoop
 
Financial Data Analytics with Hadoop
Financial Data Analytics with HadoopFinancial Data Analytics with Hadoop
Financial Data Analytics with Hadoop
 
Everything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about OozieEverything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about Oozie
 
An Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache HadoopAn Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache Hadoop
 
HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917
 
Hadoop in a Windows Shop - CHUG - 20120416
Hadoop in a Windows Shop - CHUG - 20120416Hadoop in a Windows Shop - CHUG - 20120416
Hadoop in a Windows Shop - CHUG - 20120416
 
Running R on Hadoop - CHUG - 20120815
Running R on Hadoop - CHUG - 20120815Running R on Hadoop - CHUG - 20120815
Running R on Hadoop - CHUG - 20120815
 
Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416
 

Dernier

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Dernier (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Map Reduce v2 and YARN - CHUG - 20120604

  • 1. An Introduction to MapReduce 2 and YARN Tom White, Cloudera @tom_e_white June 4, 2012 Chicago HUG Tuesday, June 5, 2012
  • 3. About me • Apache Hadoop Committer, PMC Member, Apache Member • Engineer at Cloudera working on core Hadoop • Founder of Apache Whirr • Author of “Hadoop: The Definitive Guide” • http://hadoopbook.com Tuesday, June 5, 2012
  • 4. First, whatʼs MapReduce 1? Tuesday, June 5, 2012
  • 6. Whatʼs wrong with MR1? Tuesday, June 5, 2012
  • 7. Motivation 1 • Scaling >4000 nodes • Fewer, larger clusters Tuesday, June 5, 2012
  • 8. Motivation 2 • HA of Job Tracker • Large, complex state Tuesday, June 5, 2012
  • 9. Motivation 3 • Poor resource utilization • Slots in MR1 are for either map or reduce Tuesday, June 5, 2012
  • 10. Yet Another Resource Negotiator Tuesday, June 5, 2012
  • 13. Node Manager is a generalized Task Tracker • Task Tracker • fixed number of map or reduce slots • Node Manager • containers with variable resource limits Tuesday, June 5, 2012
  • 16. MR is user space YARN is kernel Tuesday, June 5, 2012
  • 17. Bonus Apps • Distributed shell • MPI (MAPREDUCE-2911) • Master-worker (MAPREDUCE-3315) • Apache Giraph, Hama Tuesday, June 5, 2012
  • 20. Old API ≠ MR1 New API ≠ MR2 Tuesday, June 5, 2012
  • 21. Old API New API o.a.h.mapred o.a.h.mapreduce MR1 ✓ ✓ MR2 ✓ ✓ Tuesday, June 5, 2012
  • 23. Try out MR2 • Apache Hadoop 2.0.0-alpha • hadoop.apache.org • CDH4 and Cloudera Manager • cloudera.com • Cloud - Apache Whirr Tuesday, June 5, 2012
  • 24. MR1 <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-client</artifactId> <version>1.0.3</version> </dependency> MR2 <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-client</artifactId> <version>2.0.0-alpha</version> </dependency> Tuesday, June 5, 2012
  • 25. TODO • Still alpha status • Performance tuning • Usability bug fixes • RM recovery • Security in MR2 not complete Tuesday, June 5, 2012