Apache Hadoop has made giant strides since the last Hadoop Summit: the community released hadoop-1.0 after nearly six years and is now on the cusp of Hadoop.next (think of it as hadoop-2.0). With the next generation of MapReduce out in 0.23.0 and 0.23.1, the community has requested a new set of features. This talk covers that next set of features, such as preemption, web services, and near-real-time analysis, and how we plan to tackle them in the near future. It also covers the roadmap and timelines for next-generation MapReduce, along with the release schedule for Apache Hadoop.
2. Hello! I’m Arun
• Founder/Architect at Hortonworks Inc.
– Lead, MapReduce
– Formerly, Architect Hadoop MapReduce, Yahoo
– Responsible for running Hadoop MR as a service for all of Yahoo (50k-node footprint)
• Apache Hadoop, ASF
– VP, Apache Hadoop, ASF (Chair of Apache Hadoop PMC)
– Long-term Committer/PMC member (full time >6 years)
– Release Manager for hadoop-2
Page 2
3. Agenda
• Yesterday: Hadoop MapReduce, circa 2011
• Today: Hadoop YARN
– Overview
– State of the art
• Art of the possible
– YARN Runtime
– MapReduce Framework
• Q&A
6. Current Limitations
• Utilization
• Scalability
– Maximum Cluster size – 4,000 nodes
– Maximum concurrent tasks – 40,000
– Coarse synchronization in JobTracker
• Single point of failure
– Failure kills all queued and running jobs
– Jobs restarted on bounce
7. Current Limitations
• Hard partition of resources into map and reduce slots
– Low resource utilization
• Lacks support for alternate paradigms
– Iterative applications implemented using MapReduce are 10x slower
– Hacks for the likes of MPI/Graph Processing
• Lack of wire-compatible protocols
– Client and cluster must be of same version
– Applications and workflows cannot migrate to different clusters
9. Requirements
• Reliability
• Availability
• Utilization
• Wire Compatibility
• Agility & Evolution – Ability for customers to control upgrades to the grid software stack.
• Scalability - Clusters of 6,000-10,000 machines
– Each machine with 16 cores, 48G/96G RAM, 24TB/36TB disks
– 100,000+ concurrent tasks
– 10,000 concurrent jobs
10. Design Centre
• Split up the two major functions of JobTracker
– Cluster resource management
– Application life-cycle management
• MapReduce becomes a user-land library
11. Architecture
• Application
– Application is a job submitted to the framework
– Example – a MapReduce job
• Container
– Basic unit of allocation
– Example – container A = 2 GB, 1 CPU
– Replaces the fixed map/reduce slots
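The container-versus-slot idea can be sketched in plain Java. This is a minimal sketch with no Hadoop dependencies; all class and method names here are illustrative, not part of the YARN API. The point is that a container is a variable-size allocation (memory + CPU) carved out of a node's capacity, rather than a fixed map or reduce slot:

```java
import java.util.ArrayList;
import java.util.List;

public class ContainerSketch {
    // Basic unit of allocation: a (memory, CPU) pair, not a fixed slot type.
    record Container(int memoryMB, int vcores) {}

    static class Node {
        int freeMemoryMB, freeVcores;
        final List<Container> allocated = new ArrayList<>();

        Node(int memoryMB, int vcores) {
            freeMemoryMB = memoryMB;
            freeVcores = vcores;
        }

        // Grant a request only if the node still has room for it.
        boolean allocate(Container c) {
            if (c.memoryMB() > freeMemoryMB || c.vcores() > freeVcores) return false;
            freeMemoryMB -= c.memoryMB();
            freeVcores -= c.vcores();
            allocated.add(c);
            return true;
        }
    }

    public static void main(String[] args) {
        Node node = new Node(8192, 4);                      // 8 GB, 4 cores
        boolean a = node.allocate(new Container(2048, 1));  // "container A = 2 GB, 1 CPU"
        boolean b = node.allocate(new Container(6144, 2));  // a larger request also fits
        boolean c = node.allocate(new Container(1024, 1));  // no memory left: rejected
        System.out.println(a + " " + b + " " + c);          // true true false
    }
}
```

Because requests are sized per-application instead of per-slot-type, a node never idles map capacity while reduce work waits, or vice versa.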
12. Architecture
• Resource Manager
– Global resource scheduler
– Hierarchical queues
• Node Manager
– Per-machine agent
– Manages the life-cycle of containers
– Container resource monitoring
• Application Master
– Per-application
– Manages application scheduling and task execution
– E.g. MapReduce Application Master
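The Resource Manager's hierarchical queues are configured through the CapacityScheduler. A minimal sketch of `capacity-scheduler.xml` follows; the queue names ("prod", "dev", "eng", "science") and the capacity percentages are made up for illustration:

```xml
<!-- Two top-level queues under root; "dev" is further split into two
     sub-queues. Capacities are percentages of the parent queue. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>prod,dev</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.prod.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.queues</name>
  <value>eng,science</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.eng.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.science.capacity</name>
  <value>50</value>
</property>
```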
14. How do I get it?
• Available in hadoop-2.0.0-alpha release
15. Performance
• 2x+ across the board
• MapReduce
– Unlocks many improvements made since the Terasort record (Owen/Arun, 2009)
– Shuffle 30%+
– Merge improvements
– Small Jobs – Uber AM
– Re-use task slots (containers)
More details: http://hortonworks.com/delivering-on-hadoop-next-benchmarking-performance/
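The "Uber AM" optimization for small jobs runs a sufficiently small job entirely inside the MapReduce ApplicationMaster's own container, skipping the cost of launching separate task containers. A configuration sketch (the threshold values shown are illustrative):

```xml
<!-- mapred-site.xml: run small-enough jobs inside the AM's container. -->
<property>
  <name>mapreduce.job.ubertask.enable</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.job.ubertask.maxmaps</name>
  <value>9</value>
</property>
<property>
  <name>mapreduce.job.ubertask.maxreduces</name>
  <value>1</value>
</property>
```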
20. YARN - Data Processing Applications
• OpenMPI on Hadoop
• Spark (UC Berkeley)
– Shark is Hive-on-Spark
• Real-time data processing
– Storm (Twitter)
– Apache S4
• Graph processing – Apache Giraph
21. YARN - Beyond Data Processing Apps
• Apache HBase
– Deployment via YARN (HBASE-4329)
– Co-processors via YARN (HBASE-4047)
• Simple deployment for cluster services
22. MapReduce – Way Forward
• MapReduce Framework Runtime
– Monolithic software
• MR Runtime?
– Sort, Merge, Shuffle et al
• Unpack into smaller building blocks!
– Allow applications and Pig/Hive to ‘plug-n-play’
– The MR framework, as we know it today, becomes a particular configuration of the building blocks
23. MapReduce – Pluggable Sort
• Pig & Hive benefit from hash-based aggregation
– Several queries don’t need full-sort of map-outputs
– Aggregation suffices
– Allow for pluggable MapOutputBuffer in MapTask
– Sort Avoidance - MAPREDUCE-4039
– External sort plugin – MAPREDUCE-2454
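The win from hash-based aggregation can be sketched in plain Java. This has no Hadoop dependencies and only illustrates the idea behind a pluggable MapOutputBuffer, not its API: for per-key aggregates, a hash map reaches the same totals as sort-then-scan without ever ordering the map outputs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HashAggSketch {
    // Aggregate without sorting: one hash lookup per key.
    static Map<String, Integer> hashAggregate(List<String> keys) {
        Map<String, Integer> counts = new HashMap<>();
        for (String k : keys) counts.merge(k, 1, Integer::sum);
        return counts;
    }

    // Aggregate the MapReduce way: sort all keys, then scan runs of equal keys.
    static Map<String, Integer> sortAggregate(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);               // the full sort of map outputs
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String k : sorted) counts.merge(k, 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        List<String> mapOutputKeys = List.of("hive", "pig", "hive", "yarn", "pig", "hive");
        System.out.println(hashAggregate(mapOutputKeys));
        // Same aggregates, with or without the sort:
        System.out.println(hashAggregate(mapOutputKeys).equals(sortAggregate(mapOutputKeys))); // true
    }
}
```

When a query only needs GROUP BY-style aggregation rather than globally ordered output, skipping the sort saves both CPU and spill I/O; that is the case MAPREDUCE-4039 and MAPREDUCE-2454 target.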
24. MapReduce – Pluggable Shuffle
• Push vs. pull shuffle
• Plug shuffle implementation (already in hadoop-2)
– E.g. RDMA for shuffle
– MAPREDUCE-4049
• Collation tasks
– Sailfish - Yahoo Research (includes auto-tuning of reduces)
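Plugging in an alternate shuffle implementation (MAPREDUCE-4049) is a matter of job configuration. A sketch follows; the plugin class named here is a hypothetical RDMA implementation, not one that ships with Hadoop:

```xml
<!-- mapred-site.xml: swap in an alternate reduce-side shuffle
     implementation. The class below is a hypothetical example. -->
<property>
  <name>mapreduce.job.reduce.shuffle.consumer.plugin.class</name>
  <value>com.example.RdmaShuffleConsumerPlugin</value>
</property>
```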
25. MapReduce – More ideas
• Allow for Map-Reduce-Reduce
– Allow for reduce output to be sorted/shuffled
– JOIN followed by ORDER BY
– Really big deal for Pig/Hive
26. MapReduce – How do we get there?
• Multiple, concurrent implementations of MapReduce
– YARN is a really big deal…
– Allows for safe experiments, much less risky!
– Exposure surface is highly limited