SlideShare une entreprise Scribd logo
1  sur  17
The Next Generation of  Hadoop Map-Reduce Sharad Agarwal sharadag@yahoo-inc.com sharad@apache.org
About Me Hadoop Committer and PMC member Architect at Yahoo!
Hadoop Map-Reduce Today JobTracker Manages cluster resources and job scheduling TaskTracker Per-node agent Manage tasks
Current Limitations Scalability Maximum Cluster size – 4,000 nodes Maximum concurrent tasks – 40,000 Coarse synchronization in JobTracker Single point of failure	 Failure kills all queued and running jobs Jobs need to be re-submitted by users Restart is very tricky due to complex state Hard partition of resources into map and reduce slots
Current Limitations Lacks support for alternate paradigms Iterative applications implemented using Map-Reduce are 10x slower.  Example: K-Means, PageRank Lack of wire-compatible protocols  Client and cluster must be of same version Applications and workflows cannot migrate to different clusters
Next Generation Map-Reduce Requirements Reliability Availability Scalability - Clusters of 6,000 machines Each machine with 16 cores, 48G RAM, 24TB disks 100,000 concurrent tasks 10,000 concurrent jobs Wire Compatibility Agility & Evolution – Ability for customers to control upgrades to the grid software stack.
Next Generation Map-Reduce – Design Centre Split up the two major functions of JobTracker Cluster resource management Application life-cycle management Map-Reduce becomes user-land library
Architecture
Architecture Resource Manager Global resource scheduler Hierarchical queues Node Manager Per-machine agent Manages the life-cycle of container Container resource monitoring Application Master Per-application Manages application scheduling and task execution E.g. Map-Reduce Application Master
 Improvements vis-à-vis current Map-Reduce Scalability  Application life-cycle management is very expensive Partition resource management and application life-cycle management Application management is distributed Hardware trends - Currently run clusters of 4,000 machines 6,000 2012 machines > 12,000 2009 machines <8 cores, 16G, 4TB> v/s <16+ cores, 48/96G, 24TB>
 Improvements vis-à-vis current Map-Reduce Availability  Application Master Optional failover via application-specific checkpoint Map-Reduce applications pick up where they left off Resource Manager No single point of failure - failover via ZooKeeper Application Masters are restarted automatically
 Improvements vis-à-vis current Map-Reduce Wire Compatibility  Protocols are wire-compatible Old clients can talk to new servers Rolling upgrades
 Improvements vis-à-vis current Map-Reduce Agility / Evolution  Map-Reduce now becomes a user-land library Multiple versions of Map-Reduce can run in the same cluster (ala Apache Pig) Faster deployment cycles for improvements Customers upgrade Map-Reduce versions on their schedule
 Improvements vis-à-vis current Map-Reduce Utilization Generic resource model  Memory CPU Disk b/w Network b/w Remove fixed partition of map and reduce slots
 Improvements vis-à-vis current Map-Reduce Support for programming paradigms other than Map-Reduce MPI Master-Worker Machine Learning Iterative processing Enabled by allowing use of paradigm-specific Application Master Run all on the same Hadoop cluster
Summary The next generation of Map-Reduce takes Hadoop to the next level Scale-out even further High availability Cluster Utilization  Support for paradigms other than Map-Reduce
Questions?

Contenu connexe

En vedette

07 2
07 207 2
07 2a_b_g
 
Market Basket Analysis Algorithm with Map/Reduce of Cloud Computing
Market Basket Analysis Algorithm with Map/Reduce of Cloud ComputingMarket Basket Analysis Algorithm with Map/Reduce of Cloud Computing
Market Basket Analysis Algorithm with Map/Reduce of Cloud ComputingJongwook Woo
 
Market Basket Analysis Algorithm with no-SQL DB HBase and Hadoop
Market Basket Analysis Algorithm with no-SQL DB HBase and Hadoop Market Basket Analysis Algorithm with no-SQL DB HBase and Hadoop
Market Basket Analysis Algorithm with no-SQL DB HBase and Hadoop Jongwook Woo
 
Super Barcode Training Camp - Motorola AirDefense Wireless Security Presentation
Super Barcode Training Camp - Motorola AirDefense Wireless Security PresentationSuper Barcode Training Camp - Motorola AirDefense Wireless Security Presentation
Super Barcode Training Camp - Motorola AirDefense Wireless Security PresentationSystem ID Warehouse
 
Hadoop World Vertica
Hadoop World VerticaHadoop World Vertica
Hadoop World VerticaOmer Trajman
 
DFA Minimization in Map-Reduce
DFA Minimization in Map-ReduceDFA Minimization in Map-Reduce
DFA Minimization in Map-ReduceIraj Hedayati
 
Big Data Analysis With RHadoop
Big Data Analysis With RHadoopBig Data Analysis With RHadoop
Big Data Analysis With RHadoopDavid Chiu
 
Wireless hacking and security
Wireless hacking and securityWireless hacking and security
Wireless hacking and securityAdel Zalok
 
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your ApplicationHadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your ApplicationYahoo Developer Network
 
2nd year pre clinical RPD Terminology, Components and Classification of parti...
2nd year pre clinical RPD Terminology, Components and Classification of parti...2nd year pre clinical RPD Terminology, Components and Classification of parti...
2nd year pre clinical RPD Terminology, Components and Classification of parti...Sajjad Hussain
 
Presentation on Wireless border security system
Presentation on  Wireless border security systemPresentation on  Wireless border security system
Presentation on Wireless border security systemStudent
 
Wireless Security null seminar
Wireless Security null seminarWireless Security null seminar
Wireless Security null seminarNilesh Sapariya
 
Application of MapReduce in Cloud Computing
Application of MapReduce in Cloud ComputingApplication of MapReduce in Cloud Computing
Application of MapReduce in Cloud ComputingMohammad Mustaqeem
 
Dental surveying of Removal partial denture
Dental surveying of Removal partial dentureDental surveying of Removal partial denture
Dental surveying of Removal partial dentureAli Alarasy
 
Mouth preparation for removable partial denture/ dental education in india
Mouth preparation for removable partial denture/ dental education in indiaMouth preparation for removable partial denture/ dental education in india
Mouth preparation for removable partial denture/ dental education in indiaIndian dental academy
 
Diagnosis and treatment planning of Removable Partial Denture
Diagnosis and treatment planning of Removable Partial Denture Diagnosis and treatment planning of Removable Partial Denture
Diagnosis and treatment planning of Removable Partial Denture dwijk
 
Mouth preparation for removable partial dentures /certified fixed orthodontic...
Mouth preparation for removable partial dentures /certified fixed orthodontic...Mouth preparation for removable partial dentures /certified fixed orthodontic...
Mouth preparation for removable partial dentures /certified fixed orthodontic...Indian dental academy
 

En vedette (20)

The 3 to 5 Rule
The 3 to 5 RuleThe 3 to 5 Rule
The 3 to 5 Rule
 
07 2
07 207 2
07 2
 
Market Basket Analysis Algorithm with Map/Reduce of Cloud Computing
Market Basket Analysis Algorithm with Map/Reduce of Cloud ComputingMarket Basket Analysis Algorithm with Map/Reduce of Cloud Computing
Market Basket Analysis Algorithm with Map/Reduce of Cloud Computing
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Market Basket Analysis Algorithm with no-SQL DB HBase and Hadoop
Market Basket Analysis Algorithm with no-SQL DB HBase and Hadoop Market Basket Analysis Algorithm with no-SQL DB HBase and Hadoop
Market Basket Analysis Algorithm with no-SQL DB HBase and Hadoop
 
Super Barcode Training Camp - Motorola AirDefense Wireless Security Presentation
Super Barcode Training Camp - Motorola AirDefense Wireless Security PresentationSuper Barcode Training Camp - Motorola AirDefense Wireless Security Presentation
Super Barcode Training Camp - Motorola AirDefense Wireless Security Presentation
 
Hadoop World Vertica
Hadoop World VerticaHadoop World Vertica
Hadoop World Vertica
 
DFA Minimization in Map-Reduce
DFA Minimization in Map-ReduceDFA Minimization in Map-Reduce
DFA Minimization in Map-Reduce
 
Big Data Analysis With RHadoop
Big Data Analysis With RHadoopBig Data Analysis With RHadoop
Big Data Analysis With RHadoop
 
Wireless hacking and security
Wireless hacking and securityWireless hacking and security
Wireless hacking and security
 
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your ApplicationHadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
 
2nd year pre clinical RPD Terminology, Components and Classification of parti...
2nd year pre clinical RPD Terminology, Components and Classification of parti...2nd year pre clinical RPD Terminology, Components and Classification of parti...
2nd year pre clinical RPD Terminology, Components and Classification of parti...
 
Presentation on Wireless border security system
Presentation on  Wireless border security systemPresentation on  Wireless border security system
Presentation on Wireless border security system
 
Wireless Security null seminar
Wireless Security null seminarWireless Security null seminar
Wireless Security null seminar
 
types of dental surveyor
types of dental surveyortypes of dental surveyor
types of dental surveyor
 
Application of MapReduce in Cloud Computing
Application of MapReduce in Cloud ComputingApplication of MapReduce in Cloud Computing
Application of MapReduce in Cloud Computing
 
Dental surveying of Removal partial denture
Dental surveying of Removal partial dentureDental surveying of Removal partial denture
Dental surveying of Removal partial denture
 
Mouth preparation for removable partial denture/ dental education in india
Mouth preparation for removable partial denture/ dental education in indiaMouth preparation for removable partial denture/ dental education in india
Mouth preparation for removable partial denture/ dental education in india
 
Diagnosis and treatment planning of Removable Partial Denture
Diagnosis and treatment planning of Removable Partial Denture Diagnosis and treatment planning of Removable Partial Denture
Diagnosis and treatment planning of Removable Partial Denture
 
Mouth preparation for removable partial dentures /certified fixed orthodontic...
Mouth preparation for removable partial dentures /certified fixed orthodontic...Mouth preparation for removable partial dentures /certified fixed orthodontic...
Mouth preparation for removable partial dentures /certified fixed orthodontic...
 

Similaire à Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce" by Sharad Agrawal

YARN Hadoop Summit Bangalore 2011
YARN Hadoop Summit Bangalore 2011YARN Hadoop Summit Bangalore 2011
YARN Hadoop Summit Bangalore 2011Sharad Agarwal
 
Next Generation of Hadoop MapReduce
Next Generation of Hadoop MapReduceNext Generation of Hadoop MapReduce
Next Generation of Hadoop MapReducehuguk
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopHortonworks
 
Hadoop bangalore-meetup-dec-2011-hadoop nextgen
Hadoop bangalore-meetup-dec-2011-hadoop nextgenHadoop bangalore-meetup-dec-2011-hadoop nextgen
Hadoop bangalore-meetup-dec-2011-hadoop nextgenInMobi
 
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...Cloudera, Inc.
 
YARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopYARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopHortonworks
 
NextGen Apache Hadoop MapReduce
NextGen Apache Hadoop MapReduceNextGen Apache Hadoop MapReduce
NextGen Apache Hadoop MapReduceHortonworks
 
A sdn based application aware and network provisioning
A sdn based application aware and network provisioningA sdn based application aware and network provisioning
A sdn based application aware and network provisioningStanley Wang
 
Hadoop World 2011, Apache Hadoop MapReduce Next Gen
Hadoop World 2011, Apache Hadoop MapReduce Next GenHadoop World 2011, Apache Hadoop MapReduce Next Gen
Hadoop World 2011, Apache Hadoop MapReduce Next GenHortonworks
 
February 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and InsidesFebruary 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and InsidesYahoo Developer Network
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnhdhappy001
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformBikas Saha
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopHortonworks
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN ApplicationsHortonworks
 
YARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo HadoopYARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo HadoopHortonworks
 
Sunx4450 Intel7460 GigaSpaces XAP Platform Benchmark
Sunx4450 Intel7460 GigaSpaces XAP Platform BenchmarkSunx4450 Intel7460 GigaSpaces XAP Platform Benchmark
Sunx4450 Intel7460 GigaSpaces XAP Platform BenchmarkShay Hassidim
 
Hadoop 2 @Twitter, Elephant Scale. Presented at
Hadoop 2 @Twitter, Elephant Scale. Presented at Hadoop 2 @Twitter, Elephant Scale. Presented at
Hadoop 2 @Twitter, Elephant Scale. Presented at lohitvijayarenu
 
Hadoop 2 @ Twitter, Elephant Scale
Hadoop 2 @ Twitter, Elephant ScaleHadoop 2 @ Twitter, Elephant Scale
Hadoop 2 @ Twitter, Elephant ScaleDataWorks Summit
 
Running Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache HadoopRunning Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache Hadoophitesh1892
 

Similaire à Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce" by Sharad Agrawal (20)

YARN Hadoop Summit Bangalore 2011
YARN Hadoop Summit Bangalore 2011YARN Hadoop Summit Bangalore 2011
YARN Hadoop Summit Bangalore 2011
 
Next Generation of Hadoop MapReduce
Next Generation of Hadoop MapReduceNext Generation of Hadoop MapReduce
Next Generation of Hadoop MapReduce
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
 
Hadoop bangalore-meetup-dec-2011-hadoop nextgen
Hadoop bangalore-meetup-dec-2011-hadoop nextgenHadoop bangalore-meetup-dec-2011-hadoop nextgen
Hadoop bangalore-meetup-dec-2011-hadoop nextgen
 
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
 
YARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopYARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache Hadoop
 
NextGen Apache Hadoop MapReduce
NextGen Apache Hadoop MapReduceNextGen Apache Hadoop MapReduce
NextGen Apache Hadoop MapReduce
 
A sdn based application aware and network provisioning
A sdn based application aware and network provisioningA sdn based application aware and network provisioning
A sdn based application aware and network provisioning
 
Hadoop World 2011, Apache Hadoop MapReduce Next Gen
Hadoop World 2011, Apache Hadoop MapReduce Next GenHadoop World 2011, Apache Hadoop MapReduce Next Gen
Hadoop World 2011, Apache Hadoop MapReduce Next Gen
 
February 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and InsidesFebruary 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and Insides
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute Platform
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
 
Mantle for Developers
Mantle for DevelopersMantle for Developers
Mantle for Developers
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN Applications
 
YARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo HadoopYARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo Hadoop
 
Sunx4450 Intel7460 GigaSpaces XAP Platform Benchmark
Sunx4450 Intel7460 GigaSpaces XAP Platform BenchmarkSunx4450 Intel7460 GigaSpaces XAP Platform Benchmark
Sunx4450 Intel7460 GigaSpaces XAP Platform Benchmark
 
Hadoop 2 @Twitter, Elephant Scale. Presented at
Hadoop 2 @Twitter, Elephant Scale. Presented at Hadoop 2 @Twitter, Elephant Scale. Presented at
Hadoop 2 @Twitter, Elephant Scale. Presented at
 
Hadoop 2 @ Twitter, Elephant Scale
Hadoop 2 @ Twitter, Elephant ScaleHadoop 2 @ Twitter, Elephant Scale
Hadoop 2 @ Twitter, Elephant Scale
 
Running Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache HadoopRunning Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache Hadoop
 

Plus de Yahoo Developer Network

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaYahoo Developer Network
 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Yahoo Developer Network
 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanYahoo Developer Network
 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Yahoo Developer Network
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathYahoo Developer Network
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuYahoo Developer Network
 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolYahoo Developer Network
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Yahoo Developer Network
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Yahoo Developer Network
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathYahoo Developer Network
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Yahoo Developer Network
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathYahoo Developer Network
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsYahoo Developer Network
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Yahoo Developer Network
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondYahoo Developer Network
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Yahoo Developer Network
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...Yahoo Developer Network
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexYahoo Developer Network
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsYahoo Developer Network
 

Plus de Yahoo Developer Network (20)

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
 
CICD at Oath using Screwdriver
CICD at Oath using ScrewdriverCICD at Oath using Screwdriver
CICD at Oath using Screwdriver
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, Oath
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI Applications
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step Beyond
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
 

Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce" by Sharad Agrawal

  • 1. The Next Generation of Hadoop Map-Reduce Sharad Agarwal sharadag@yahoo-inc.com sharad@apache.org
  • 2. About Me Hadoop Committer and PMC member Architect at Yahoo!
  • 3. Hadoop Map-Reduce Today JobTracker Manages cluster resources and job scheduling TaskTracker Per-node agent Manage tasks
  • 4. Current Limitations Scalability Maximum Cluster size – 4,000 nodes Maximum concurrent tasks – 40,000 Coarse synchronization in JobTracker Single point of failure Failure kills all queued and running jobs Jobs need to be re-submitted by users Restart is very tricky due to complex state Hard partition of resources into map and reduce slots
  • 5. Current Limitations Lacks support for alternate paradigms Iterative applications implemented using Map-Reduce are 10x slower. Example: K-Means, PageRank Lack of wire-compatible protocols Client and cluster must be of same version Applications and workflows cannot migrate to different clusters
  • 6. Next Generation Map-Reduce Requirements Reliability Availability Scalability - Clusters of 6,000 machines Each machine with 16 cores, 48G RAM, 24TB disks 100,000 concurrent tasks 10,000 concurrent jobs Wire Compatibility Agility & Evolution – Ability for customers to control upgrades to the grid software stack.
  • 7. Next Generation Map-Reduce – Design Centre Split up the two major functions of JobTracker Cluster resource management Application life-cycle management Map-Reduce becomes user-land library
  • 9. Architecture Resource Manager Global resource scheduler Hierarchical queues Node Manager Per-machine agent Manages the life-cycle of container Container resource monitoring Application Master Per-application Manages application scheduling and task execution E.g. Map-Reduce Application Master
  • 10. Improvements vis-à-vis current Map-Reduce Scalability Application life-cycle management is very expensive Partition resource management and application life-cycle management Application management is distributed Hardware trends - Currently run clusters of 4,000 machines 6,000 2012 machines > 12,000 2009 machines <8 cores, 16G, 4TB> v/s <16+ cores, 48/96G, 24TB>
  • 11. Improvements vis-à-vis current Map-Reduce Availability Application Master Optional failover via application-specific checkpoint Map-Reduce applications pick up where they left off Resource Manager No single point of failure - failover via ZooKeeper Application Masters are restarted automatically
  • 12. Improvements vis-à-vis current Map-Reduce Wire Compatibility Protocols are wire-compatible Old clients can talk to new servers Rolling upgrades
  • 13. Improvements vis-à-vis current Map-Reduce Agility / Evolution Map-Reduce now becomes a user-land library Multiple versions of Map-Reduce can run in the same cluster (ala Apache Pig) Faster deployment cycles for improvements Customers upgrade Map-Reduce versions on their schedule
  • 14. Improvements vis-à-vis current Map-Reduce Utilization Generic resource model Memory CPU Disk b/w Network b/w Remove fixed partition of map and reduce slots
  • 15. Improvements vis-à-vis current Map-Reduce Support for programming paradigms other than Map-Reduce MPI Master-Worker Machine Learning Iterative processing Enabled by allowing use of paradigm-specific Application Master Run all on the same Hadoop cluster
  • 16. Summary The next generation of Map-Reduce takes Hadoop to the next level Scale-out even further High availability Cluster Utilization Support for paradigms other than Map-Reduce