1. An Introduction to
MapReduce 2 and
YARN
Tom White, Cloudera
@tom_e_white
June 4, 2012
Chicago HUG
Tuesday, June 5, 2012
3. About me
• Apache Hadoop Committer, PMC Member, Apache Member
• Engineer at Cloudera working on core Hadoop
• Founder of Apache Whirr
• Author of “Hadoop: The Definitive Guide”
• http://hadoopbook.com
7. Motivation 1
• Scaling >4000 nodes
• Fewer, larger clusters
8. Motivation 2
• HA of Job Tracker
• Large, complex state
9. Motivation 3
• Poor resource utilization
• Slots in MR1 are for either map or reduce
13. Node Manager
is a generalized Task Tracker
• Task Tracker
  • fixed number of map or reduce slots
• Node Manager
  • containers with variable resource limits
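The difference between slots and containers can be sketched with a toy model. This is a minimal illustration and not Hadoop code: the class `SlotsVsContainers`, its `Task` type, and the memory figures are all hypothetical, and memory stands in for the "variable resource limits" on the slide. It counts how many waiting tasks a single node could start under each scheme.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model (not Hadoop code): tasks one node can start under each scheme. */
final class SlotsVsContainers {

    /** A waiting task: either a map or a reduce, with a requested memory size. */
    static final class Task {
        final boolean isMap;
        final int memoryMb;

        Task(boolean isMap, int memoryMb) {
            this.isMap = isMap;
            this.memoryMb = memoryMb;
        }
    }

    /** Task Tracker style: a task starts only if a slot of its own kind is free. */
    static int startedWithSlots(List<Task> tasks, int mapSlots, int reduceSlots) {
        int started = 0;
        for (Task t : tasks) {
            if (t.isMap && mapSlots > 0) {
                mapSlots--;
                started++;
            } else if (!t.isMap && reduceSlots > 0) {
                reduceSlots--;
                started++;
            }
        }
        return started;
    }

    /** Node Manager style: a task starts if its requested memory still fits. */
    static int startedWithContainers(List<Task> tasks, int nodeMemoryMb) {
        int started = 0;
        for (Task t : tasks) {
            if (t.memoryMb <= nodeMemoryMb) {
                nodeMemoryMb -= t.memoryMb;
                started++;
            }
        }
        return started;
    }

    public static void main(String[] args) {
        // Four map tasks waiting and no reduces, as in the map phase of a job.
        List<Task> tasks = Arrays.asList(
                new Task(true, 1024), new Task(true, 1024),
                new Task(true, 1024), new Task(true, 1024));

        // Same hypothetical node: 2 map + 2 reduce slots, or 4096 MB of memory.
        System.out.println("slots:      " + startedWithSlots(tasks, 2, 2));      // 2
        System.out.println("containers: " + startedWithContainers(tasks, 4096)); // 4
    }
}
```

In the slot model the two reduce slots sit idle while map tasks wait, which is the poor utilization described under Motivation 3. In the container model the same capacity can go to whichever tasks are waiting.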
16. MR is user space
YARN is kernel
17. Bonus Apps
• Distributed shell
• MPI (MAPREDUCE-2911)
• Master-worker (MAPREDUCE-3315)
• Apache Giraph, Hama
20. Old API ≠ MR1
New API ≠ MR2
21.    Old API         New API
       o.a.h.mapred    o.a.h.mapreduce
MR1    ✓               ✓
MR2    ✓               ✓
23. Try out MR2
• Apache Hadoop 2.0.0-alpha
  • hadoop.apache.org
• CDH4 and Cloudera Manager
  • cloudera.com
• Cloud - Apache Whirr
24. MR1
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>1.0.3</version>
</dependency>
MR2
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.0.0-alpha</version>
</dependency>
25. TODO
• Still alpha status
• Performance tuning
• Usability bug fixes
• RM recovery
• Security in MR2 not complete