This is the presentation on Hadoop 2.0 YARN for the webinar held on 16th Nov 2013.
Link to the webinar: https://plus.google.com/u/0/events/cq1u9u027fdd0emd8h0k55kcnu8
2. Hadoop Intro
● Apache Hadoop is an open-source software framework that supports data-intensive distributed applications.
● Supports running applications on large clusters of commodity hardware.
● Tasks are divided into map and reduce steps by the MapReduce framework.
● Provides a distributed file system that stores data on the compute nodes.
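To make the map and reduce split concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use Hadoop at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and a real job would run the same phases spread across many nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["hadoop stores data", "hadoop processes data"])
print(counts)
```

In a cluster, each map call would typically run on the node that already holds its slice of the input, which is why storing data on the compute nodes matters.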
5. Drawbacks of Hadoop 1.0
● Cluster is tightly coupled with Hadoop.
● Cascading failures.
6. What is Hadoop 2.0
● Re-architected Hadoop: a complete overhaul of the 0.23 branch.
● Introduced YARN and MR2.
● Enhanced resource scheduler.
● More efficient cluster utilization by running applications other than MapReduce jobs.
9. Components of YARN
● ResourceManager
● NodeManager
● ApplicationMaster
● History Server
10. ResourceManager
The ResourceManager is the ultimate authority in a Hadoop cluster. It arbitrates
resources among all the applications in the system. All resource negotiations
go through the ResourceManager.
11. Components of Resource Manager
Scheduler
The Scheduler is responsible for allocating resources to the various running
applications.
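As a rough illustration of what "allocating resources" means, the sketch below grants memory requests first-fit across a list of nodes. This is a toy model, not the YARN scheduler: the `Node` class, the `allocate` function and the first-fit policy are all assumptions made for this example, and real YARN schedulers apply queue and capacity policies that are not modelled here.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_mb: int

def allocate(nodes, requests):
    """Grant each (app, memory_mb) request on the first node with room.

    Returns a list of (app, node_name, memory_mb) grants. A request that
    fits on no node is skipped, so the application would have to ask again.
    """
    grants = []
    for app, memory_mb in requests:
        for node in nodes:
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb  # reserve the memory on this node
                grants.append((app, node.name, memory_mb))
                break
    return grants

nodes = [Node("node1", 4096), Node("node2", 2048)]
grants = allocate(nodes, [("app-a", 3072), ("app-b", 2048), ("app-c", 2048)])
print(grants)
```

Here `app-c` receives nothing because neither node has 2048 MB left once the first two requests are served.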
ApplicationsManager
The ApplicationsManager is responsible for accepting job submissions, negotiating
the first container for executing the application-specific ApplicationMaster, and
providing the service for restarting the ApplicationMaster container on failure.
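The restart-on-failure behaviour can be pictured as a bounded retry loop. The sketch below is only a simulation: `run_with_restarts`, the `launch_am` callable and the `max_attempts` value are hypothetical names and numbers chosen for this example, not part of any YARN API.

```python
def run_with_restarts(launch_am, max_attempts=2):
    """Launch the ApplicationMaster, restarting it when an attempt fails.

    launch_am is a hypothetical callable that takes the attempt number and
    returns True on success. Returns the number of attempts used, or raises
    RuntimeError when every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        if launch_am(attempt):
            return attempt
    raise RuntimeError("ApplicationMaster failed after %d attempts" % max_attempts)

# Simulated ApplicationMaster that crashes on its first attempt only.
attempts_used = run_with_restarts(lambda attempt: attempt > 1)
print(attempts_used)
```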
12. NodeManager
The NodeManager is the per-machine agent responsible for monitoring the
resources of the machine it runs on and reporting them to the
ResourceManager.
Containers are allocated on NodeManagers to perform the assigned tasks.
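The monitor-and-report role can be sketched as an object that tracks its containers and builds a status report. This is a toy model under stated assumptions: `ToyNodeManager`, its methods and the report fields are invented for illustration and do not mirror the real NodeManager protocol.

```python
class ToyNodeManager:
    """Tracks containers on one machine and reports usage (illustrative only)."""

    def __init__(self, name, total_mb):
        self.name = name
        self.total_mb = total_mb
        self.containers = {}  # container id -> memory in MB

    def used_mb(self):
        return sum(self.containers.values())

    def start_container(self, container_id, memory_mb):
        # Refuse containers that would exceed this machine's memory.
        if memory_mb > self.total_mb - self.used_mb():
            raise ValueError("not enough free memory on " + self.name)
        self.containers[container_id] = memory_mb

    def heartbeat(self):
        """Build the status report that would be sent to the ResourceManager."""
        return {
            "node": self.name,
            "total_mb": self.total_mb,
            "used_mb": self.used_mb(),
            "containers": sorted(self.containers),
        }

nm = ToyNodeManager("node1", 4096)
nm.start_container("c1", 1024)
nm.start_container("c2", 2048)
report = nm.heartbeat()
print(report)
```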
13. ApplicationMaster
● It is a framework-specific library for negotiating resources from the ResourceManager and
working with the NodeManager(s) to execute tasks in containers and
monitor them.
● The ApplicationMaster is responsible for negotiating resource containers
from the Scheduler for its tasks.
● It provides a communication port through which users can communicate with the
ApplicationMaster.
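The negotiate, execute and monitor cycle described above can be sketched as a simple loop. Everything here is a simulation: `run_application`, `request_container`, `launch` and `max_rounds` are hypothetical names for this example, standing in for the calls a real ApplicationMaster would make to the ResourceManager and NodeManagers.

```python
def run_application(tasks, request_container, launch, max_rounds=10):
    """Toy ApplicationMaster loop: get a container per task, launch, track status.

    request_container() returns a container id, or None when nothing is free.
    launch(container, task) returns True if the task succeeded.
    """
    pending = list(tasks)
    status = {}
    for _ in range(max_rounds):
        if not pending:
            break
        task = pending.pop(0)
        container = request_container()
        if container is None:
            pending.append(task)  # no resources granted yet, ask again later
            continue
        if launch(container, task):
            status[task] = "SUCCEEDED"
        else:
            pending.append(task)  # task failed, retry in a new container
    for task in pending:
        status[task] = "PENDING"
    return status

# Simulated scheduler: grants c1, then has nothing free, then grants c2.
offers = iter(["c1", None, "c2"])
status = run_application(
    ["t1", "t2"],
    request_container=lambda: next(offers, None),
    launch=lambda container, task: True,
)
print(status)
```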
16. YARN Solution
● Apache YARN provides a framework on which various applications
can execute.
● Hadoop backers expect that the advent of YARN could open the
floodgates for new applications built to run on Hadoop.
● Various projects, like Apache Tez, have been created to do more
advanced data processing than what MapReduce specializes in.
● YARN promotes effective utilization of resources while providing
a distributed environment for application execution.
17. Current use cases on YARN
Samza: LinkedIn release
Apache Samza is a distributed stream
processing framework. It uses Apache Kafka
for messaging, and Apache Hadoop YARN to
provide fault tolerance, processor isolation,
security, and resource management.
Storm-YARN
Streaming in Hadoop: Yahoo! release
Storm-YARN enables Storm applications to use the computational
resources in a Hadoop cluster and to access Hadoop storage
resources such as HBase and HDFS.