2. An Overview of Hadoop
Hadoop is an open-source framework that can be used effectively to process huge volumes of data. It works in a distributed computing environment and is one of the best solutions for addressing the challenges of big data.
Newyorksys has the best trainers, who provide the best online training for Hadoop using state-of-the-art training methodologies.
3. Agenda
What is Hadoop.
Why do we need Hadoop.
How Hadoop works.
HDFS Architecture.
What is MapReduce.
Hadoop Cluster.
Hadoop Processes.
Topology of a Hadoop Cluster.
Distinctions of the Hadoop Framework.
Prerequisites to learn Hadoop.
4. What is Hadoop
Hadoop is an open-source framework.
Developed by the Apache Software Foundation.
Used for distributed processing of large data sets.
It works across clusters of computers using a simple programming model (MapReduce).
5. Why do we need Hadoop
Data is growing faster than ever.
We need to process multiple petabytes of data.
The performance of traditional applications is decreasing.
The number of machines in a cluster is not constant.
Failure is expected rather than exceptional.
6. How Hadoop Works
The Hadoop core consists of two modules:
Hadoop Distributed File System (HDFS) [Storage].
MapReduce [Processing], which is split into a Mapper phase and a Reducer phase.
8. What is MapReduce
MapReduce plays a key role in the Hadoop framework.
MapReduce is a programming model for writing applications that rapidly process large amounts of data in parallel.
Mapper – a function that processes input data to generate intermediate output data.
Reducer – merges the intermediate data from all mappers and generates the final output data.
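To make the model concrete, below is a minimal word-count sketch in Java against Hadoop's mapreduce API. The class and member names (WordCount, TokenizerMapper, SumReducer) are illustrative choices for this example, not something from the slides.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Mapper: processes input lines and emits intermediate (word, 1) pairs.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(line.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE); // intermediate output data
            }
        }
    }

    // Reducer: merges the intermediate data from all mappers into final counts.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable count : counts) {
                sum += count.get();
            }
            context.write(word, new IntWritable(sum)); // final output data
        }
    }
}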
9. Hadoop Cluster
A Hadoop cluster consists of multiple machines, which can be classified into three types:
NameNode
Secondary NameNode
DataNode
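As a sketch of how a client talks to these machines, the Java snippet below writes and then reads a small HDFS file: the client asks the NameNode for metadata and block locations, while the file bytes themselves travel to and from the DataNodes. The NameNode URI and file path are assumptions for illustration, and on Hadoop 1.x the configuration property is fs.default.name rather than fs.defaultFS.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; adjust to your cluster.
        conf.set("fs.defaultFS", "hdfs://namenode-host:9000");

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/demo/hello.txt");

        // Write: metadata goes to the NameNode, data blocks to the DataNodes.
        try (FSDataOutputStream out = fs.create(file)) {
            out.write("Hello, HDFS!\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read it back the same way.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
            System.out.println(in.readLine());
        }
    }
}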
10. Hadoop Processes
Below are the daemons (processes) that run in a cluster:
NameNode (runs on the master machine)
JobTracker (runs on the master machine)
DataNode (runs on slave machines)
TaskTracker (runs on slave machines)
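To show where these daemons come into play, here is a minimal driver sketch that submits the word-count job from the earlier sketch: the client hands the job to the JobTracker, which schedules map and reduce tasks on the TaskTrackers running beside the DataNodes. Job.getInstance is the Hadoop 2.x form; on Hadoop 1.x you would write new Job(conf, ...) instead.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");

        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCount.TokenizerMapper.class);
        job.setReducerClass(WordCount.SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // HDFS input and output locations, taken from the command line.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit and wait; the tasks run across the slave machines.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a jar, this would typically be launched with the hadoop jar command, passing the HDFS input and output directories as its two arguments.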
12. Distinctions
Simple – Hadoop allows users to quickly write efficient parallel code.
Reliable – Because Hadoop runs on commodity hardware, it can face frequent failures, and the framework handles such failures automatically.
Scalable – We can increase or decrease the number of nodes (machines) in a Hadoop cluster.
13. Prerequisites
A Linux-based or Unix-like operating system (Mac OS, Red Hat, Ubuntu).
Java 1.6 or higher.
Disk space (to hold HDFS data and its replicas).
RAM (2 GB recommended).
A cluster of computers.
You can even install Hadoop on a single machine.
14. Newyorksys.com
NewyorkSys is one of the leading training and consulting companies in the US. We have certified trainers. We provide online training and fast-track online training, with job assistance. We offer excellent training in all courses, help you with resume preparation, and provide job assistance until you get a job.
For more details, visit: http://www.newyorksys.com
15 Roaring Brook Rd, Chappaqua, NY 10514.
USA: +1-718-313-0499 & 718-305-1757
E: enquiry@newyorksys.us