SlideShare une entreprise Scribd logo
1  sur  139
Machine
Learning Basics
An Introduction
What’s in it for you?
Big Data Challenges
What is HDFS?
HDFS Cluster Architecture
HDFS Data Blocks
Data Node Failure
Rack Awareness
General Architecture of HDFS
Read/Write Mechanism
What’s in it for you?
What’s in it for you?
Why Hadoop?
What’s in it for you?
Why Hadoop?
What is Hadoop?
What’s in it for you?
Why Hadoop?
What is Hadoop?
Hadoop HDFS
What’s in it for you?
Why Hadoop?
What is Hadoop?
Hadoop HDFSHadoop MapReduce
What’s in it for you?
Why Hadoop?
What is Hadoop?
Hadoop HDFSHadoop MapReduce
Hadoop YARN
What’s in it for you?
Why Hadoop?
What is Hadoop?
Hadoop HDFSHadoop MapReduce
Hadoop YARN
Use case of Hadoop
What’s in it for you?
Why Hadoop?
What is Hadoop?
Hadoop HDFSHadoop MapReduce
Hadoop YARN
Use case of Hadoop
Demo on HDFS, MapReduce
and YARN
What’s in it for you?
Big Data Challenges
What is HDFS?
HDFS Cluster Architecture
HDFS Data Blocks
Data Node Failure
Rack Awareness
General Architecture of HDFS
Read/Write Mechanism
Why Hadoop?
In a town far away..
Tim sells food grains in his shop
The customers were happy as Tim was very quick
with the orders
Tim sensed a good demand for other products, so he
thought of expanding his business
He started selling fruits, vegetables, meat, and dairy
products in addition to food grains
But it wasn’t as easy as he expected it to be. The
number of customers increased, and he was not
able to cater to their needs on time
He had to look into assisting his customers with
each of their orders and billing. It was too difficult for
him to manage alone
To start delivering orders on time and to manage the
customers’ demands, Tim hired 3 more people to
work with him
Matt took care of the fruits and vegetable section.
Luke handled the dairy and meat section. Ann was
appointed as the cashier
Matt
Luke
Ann
Tim
However, this was still not a solution to Tim’s problem
as there was not enough space in the shop for all the
items
Storage area
The storage was a bottleneck since storing and accessing
became more and more difficult with increased supply and
demand
Storage area
Tim came up with an idea to overcome this issue. He
decided to expand the storage area and distribute each
category of product on different floors
Now, customers were happy, and after picking up their
products from the respective sections, it was then billed
Now, customers were happy, and after picking up their
products from the respective sections, it was then billed
Now, let us compare this story to big data
Earlier, data was generated at a moderate rate, and all the
data was structured in nature. One processor was enough to
process all of it
With the increase in data generation, different types of data
were generated at high speed. It became difficult for a single
processor to process different types of data
Massive amount of different types of data which cannot be
processed and stored using traditional databases is known as
big data
To overcome this issue, multiple processors were used to
process each type of data
But now the problem was that one storage system was
accessed by all the processors and the storage became the
bottleneck
Just like how Tim adopted the distributed approach, the
storage system was also distributed and by doing so, the data
was stored in individual databases
Just like how Tim adopted the distributed approach, the
storage system was also distributed and by doing so, the data
was stored in individual databases
Through this story, we see the two approaches that are
used by Hadoop that is HDFS and MapReduce
HDFS refers to the distributed storage space just like how Tim distributed the
storage space amongst the various sections
Each person took care of a separate section and at the end the customers
went to the cashier for the final billing, this sorted the process and made it
easier. This is how Hadoop MapReduce works
This was a rough story of big data
generation and why Hadoop is required. I
will now explain in detail as to what
Hadoop is
This sounds interesting. I would like
to know more about Hadoop
What’s in it for you?
Big Data Challenges
What is HDFS?
HDFS Cluster Architecture
HDFS Data Blocks
Data Node Failure
Rack Awareness
General Architecture of HDFS
Read/Write Mechanism
What is Hadoop?
What is Hadoop?
Hadoop is a framework which stores and processes big data in a distributed and parallel fashion
What is Hadoop?
Hadoop is a framework which stores and processes big data in a distributed and parallel fashion
BIG DATA
That sounds interesting, so how
does Hadoop store and process all
of this big data?
Hadoop has individual components, which
are used for storing and processing big
data
One day in an office..
HDFS
MapReduce
YARN
Components of Hadoop
The storage unit of Hadoop
One day in an office..
HDFS
MapReduce
YARN
Components of Hadoop
The storage unit of Hadoop
The processing unit of Hadoop
One day in an office..
HDFS
MapReduce
YARN
Components of Hadoop
The storage unit of Hadoop
The processing unit of Hadoop
The resource management unit of Hadoop
What’s in it for you?
Big Data Challenges
What is HDFS?
HDFS Cluster Architecture
HDFS Data Blocks
Data Node Failure
Rack Awareness
General Architecture of HDFS
Read/Write Mechanism
Hadoop HDFS
What is HDFS?
Each block of data is stored on multiple
systems and by default has 128 MB of data
Data
Datanode Datanode Datanode
Hadoop Distributed File System (HDFS) is known for its distributed storage method.
It distributes the data amongst many computers. In addition to this, replication of
data is also done to avoid loss of data
What is HDFS?
Let us now see how 500 MB of data is stored in the traditional method
Let us now see how 500 MB of data is stored in the
traditional method
500 MB data
What is HDFS?
Let us now see how 500 MB of data is stored in the traditional method
Let us now see how 500 MB of data is stored in the
traditional method
Here, the entire set of data is stored in one
database. This overloads the database, and if it
crashes, we lose all our data
500 MB data
What is HDFS?
Let us now see how 500 MB of data is stored in the traditional method
What is HDFS?
Using Hadoop HDFS, this problem is taken care of as data is distributed amongst
many systems
Using Hadoop HDFS, this problem is taken care of
as data is distributed amongst many databases
By doing so, a single database is not
overloaded
500 MB data
What is HDFS?
.
.
.
Using Hadoop HDFS, this problem is taken care of as data is distributed amongst
many systems
Hadoop Distributed File System (HDFS) is specially designed for
storing massive datasets in commodity hardware
What is HDFS?
What is HDFS?
HDFS has two main components that help
with its storage
NameNode DataNode
Hadoop Distributed File System (HDFS) is specially designed for
storing massive datasets in commodity hardware
What is HDFS?
DataNode DataNode DataNode DataNode
NameNode
• NameNode is the master of the
system
• It stores all the metadata
NameNode
What is HDFS?
DataNode DataNode DataNode DataNode
• NameNode is the master of the
system
• It stores all the metadata• DataNode is known as the slave
node. There are multiple
DataNodes
• It performs the read/write
operations and stores the actual
data
What is HDFS?
NameNode
DataNode DataNode DataNode DataNode
• NameNode manages all the
DataNodes
• The DataNodes send signals
known as heartbeats to the
NameNode. This signal gives the
status of the DataNode
As mentioned earlier, the actual data is stored in DataNodes. Data is stored in the
form of blocks here. The default size of each block is 128 MB
What is HDFS?
What is HDFS?
Now, let’s consider storing a file of size 530
MB in HDFS
As mentioned earlier, the actual data is stored in DataNodes. Data is stored in the
form of blocks here. The default size of each block is 128 MB
What is HDFS?
Now, let’s consider storing a file of size 530
MB in HDFS
File.txt
530 MB
As mentioned earlier, the actual data is stored in DataNodes. Data is stored in the
form of blocks here. The default size of each block is 128 MB
What is HDFS?
Now, let’s consider storing a file of size 530
MB in HDFS
File.txt
530 MB
Block B Block DBlock C
128 MB 128 MB128 MB 128 MB
Block A
As mentioned earlier, the actual data is stored in DataNodes. Data is stored in the
form of blocks here. The default size of each block is 128 MB
What is HDFS?
Now, let’s consider storing a file of size 530
MB in HDFS
File.txt
530 MB
Block B Block D Block E
18 MB
Block C
128 MB 128 MB128 MB 128 MB
Block A
As mentioned earlier, the actual data is stored in DataNodes. Data is stored in the
form of blocks here. The default size of each block is 128 MB
What is HDFS?
Now, let’s consider storing a file of size 530
MB in HDFS
File.txt
530 MB
Block B Block D Block E
18 MB
Block C
128 MB 128 MB128 MB 128 MB
Block A
The final block uses
only the remaining
space for storage
As mentioned earlier, the actual data is stored in DataNodes. Data is stored in the
form of blocks here. The default size of each block is 128 MB
What is HDFS?
Now, let’s consider storing a file of size 530
MB in HDFS
File.txt
530 MB
Block B Block D Block E
18 MB
Block C
128 MB 128 MB128 MB 128 MB
Block A
DataNode 1 DataNode 2 DataNode 3 DataNode 4 DataNode 5
As mentioned earlier, the actual data is stored in DataNodes. Data is stored in the
form of blocks here. The default size of each block is 128 MB
What is HDFS?
Now, let’s consider storing a file of size 530
MB in HDFS
File.txt
530 MB
Block B Block D Block E
18 MB
Block C
128 MB 128 MB128 MB 128 MB
Block A
All these data blocks are stored
in DataNodes – computers
DataNode 1 DataNode 2 DataNode 3 DataNode 4 DataNode 5
As mentioned earlier, the actual data is stored in DataNodes. Data is stored in the
form of blocks here. The default size of each block is 128 MB
What happens if the computer that
contains block A crashes? Do we lose
the data in block A?
No, we don’t. That’s the beauty of Hadoop
HDFS. It uses replication to prevent the
loss of data
c
Rack 1
Replication in HDFS
HDFS overcomes the issue of DataNode failure by creating copies of the
data; this is known as the replication method
Block ADN 1
c
Rack 1 Rack 2
Replication in HDFS
HDFS overcomes the issue of DataNode failure by creating copies of the
data; this is known as the replication method
Block ADN 1 DN 1
Block ADN 5
Block A is replicated. The
replication factor is 3. The
replicas are stored in
different DataNodes
Block ADN 6
2 replicas cannot be stored on the same datanode
c
Rack 1 Rack 2
Replication in HDFS
HDFS overcomes the issue of DataNode failure by creating copies of the
data; this is known as the replication method
Rack 3 Rack 4 Rack 5
Similarly, every other
block is replicated
Block ADN 1
Block DDN 2
DN 1
Block ADN 5
Block DDN 10Block BDN 4 Block CDN 7
Block CDN 11
Block EDN 13
Block DDN 14
DN 12Block ADN 6
Block BDN 8
Block BDN 9 Block CDN 15Block EDN 3 Block EDN 12
Architecture of HDFS
Stores
Metadata (Name, replicas, ….)
NameNode
Stores
DataNodes
Metadata (Name, replicas, ….)
DataNodes
NameNode
…..….
Architecture of HDFS
…..….
Stores
DataNodes
Metadata (Name, replicas, ….)
DataNodes
NameNode
Rack is a collection
of DataNodes
Replication
Rack 1 Rack 2
Architecture of HDFS
Metadata ops Stores
Client
DataNodes
Metadata (Name, replicas, ….)
DataNodes
NameNode
Read
request?
Replication
…..….
Architecture of HDFS
Stores
Client
DataNodes
Metadata (Name, replicas, ….)
DataNodes
NameNode
Replication
…..….
Architecture of HDFS
Read
request?
Okay, read data from
DataNodes
Read permission
Stores
DataNodes
Metadata (Name, replicas, ….)
DataNodes
NameNode
Here is the data
that is read
ReplicationRead
data
…..….
Architecture of HDFS
Metadata ops
Client
Read
request?
Metadata ops Stores
Client
DataNodes
Metadata (Name, replicas, ….)
DataNodes
NameNode
Write Write
Client
ReplicationRead
data
…..….
Architecture of HDFS
Features of HDFS
HDFS is fault tolerant as
multiple copies of data are
made
Fault tolerant Data security Scalability Flexibility
Features of HDFS
Provides end-to-end
encryption that protects
data
Fault tolerant Data security Scalability Flexibility
Features of HDFS
Multiple nodes can be
added to the cluster
depending on the
requirement
Fault tolerant Data security Scalability Flexibility
Features of HDFS
Hadoop is flexible in storing any type
of data, like structured, semi
structured or unstructured data
Fault tolerant Data security Scalability Flexibility
Now that we have stored data in
HDFS, how can we process it?
For processing data, Hadoop has a unit
known as MapReduce
In the traditional approach, big data was processed at the master node
Why MapReduce?
big data
In the traditional approach, big data was processed at the master node
Why MapReduce?
Master
Slave Slave
Slave Slave
big data
This was a disadvantage as it consumed more time to process various types of
data
Master
Slave Slave
Slave Slave
Why MapReduce?
big data
To overcome this issue, data was processed at each slave node. This approach
is known as MapReduce
Master
Slave Slave
Slave Slave
Why MapReduce?
big data
What’s in it for you?
Big Data Challenges
What is HDFS?
HDFS Cluster Architecture
HDFS Data Blocks
Data Node Failure
Rack Awareness
General Architecture of HDFS
Read/Write Mechanism
Hadoop MapReduce
What is MapReduce?
Programming technique where huge data is processed in a parallel and
distributed fashion is known as Hadoop MapReduce
What is MapReduce?
Programming technique where huge data is processed in a parallel and
distributed fashion is known as Hadoop MapReduce
MapReduce tasks
Map tasks Reduce tasks
What is MapReduce?
Programming technique where huge data is processed in a parallel and
distributed fashion is known as Hadoop MapReduce
Map and Reduce steps
Input Data Output Data
map()
map()
map()
Shuffle and
Sort
reduce()
reduce()
Input Data is divided to form the input splits
What is MapReduce?
Programming technique where huge data is processed in a parallel and
distributed fashion is known as Hadoop MapReduce
Map and Reduce steps
Input Data Output Data
map()
map()
map()
Shuffle and
Sort
reduce()
reduce()
Map phase is the first phase, here data in each split is passed to produce output
values
What is MapReduce?
Programming technique where huge data is processed in a parallel and
distributed fashion is known as Hadoop MapReduce
Map and Reduce steps
Input Data Output Data
map()
map()
map()
Shuffle and
Sort
reduce()
reduce()
In the shuffle and sort phase, output of mapping phase is taken and similar data
is grouped
What is MapReduce?
Programming technique where huge data is processed in a parallel and
distributed fashion is known as Hadoop MapReduce
Map and Reduce steps
Input Data Output Data
map()
map()
map()
Shuffle and
Sort
reduce()
reduce()
Here, the output values from the shuffling phase are aggregated. It then returns
a single output value
What is MapReduce?
Programming technique where huge data is processed in a parallel and
distributed fashion is known as Hadoop MapReduce
Let us now see how MapReduce works with an example
What is MapReduce?
Programming technique where huge data is processed in a parallel and
distributed fashion is known as Hadoop MapReduce
Let us now see how MapReduce works with an example
Input data
Welcome to Hadoop
Hadoop is interesting
Hadoop is easy
What is MapReduce?
Programming technique where huge data is processed in a parallel and
distributed fashion is known as Hadoop MapReduce
Let us now see how MapReduce works with an example
Input data
Welcome to Hadoop
Hadoop is interesting
Hadoop is easy
Welcome to Hadoop
Hadoop is interesting
Hadoop is easy
Input Splits
What is MapReduce?
Programming technique where huge data is processed in a parallel and
distributed fashion is known as Hadoop MapReduce
Let us now see how MapReduce works with an example
Input data
Welcome to Hadoop
Hadoop is interesting
Hadoop is easy
Welcome to Hadoop
Hadoop is interesting
Hadoop is easy
Input Splits
Hadoop, 1
is, 1
interesting, 1
Welcome, 1
to, 1
Hadoop, 1
Hadoop, 1
is, 1
easy, 1
Map phase
What is MapReduce?
Programming technique where huge data is processed in a parallel and
distributed fashion is known as Hadoop MapReduce
Let us now see how MapReduce works with an example
Map phase Shuffle and Sort phase
Hadoop, 1
is, 1
interesting, 1
Welcome, 1
to, 1
Hadoop, 1
Hadoop, 1
is, 1
easy, 1
to, 1
Hadoop, 1
Hadoop, 1
Hadoop, 1
is, 1
is, 1
interesting, 1
Welcome, 1
easy, 1
What is MapReduce?
Programming technique where huge data is processed in a parallel and
distributed fashion is known as Hadoop MapReduce
Let us now see how MapReduce works with an example
Map phase Shuffle and Sort phase
Hadoop, 1
is, 1
interesting, 1
Welcome, 1
to, 1
Hadoop, 1
Hadoop, 1
is, 1
easy, 1
to, 1
Hadoop, 1
Hadoop, 1
Hadoop, 1
is, 1
is, 1
interesting, 1
Welcome, 1
Reducer phase
easy, 1easy, 1
Hadoop, 3
interesting, 1
is, 2
to, 1
Welcome, 1
What is MapReduce?
Programming technique where huge data is processed in a parallel and
distributed fashion is known as Hadoop MapReduce
Let us now see how MapReduce works with an example
Map phase Shuffle and Sort phase Final Output
Hadoop, 1
is, 1
interesting, 1
Welcome, 1
to, 1
Hadoop, 1
Hadoop, 1
is, 1
easy, 1
to, 1
Hadoop, 1
Hadoop, 1
Hadoop, 1
is, 1
is, 1
interesting, 1
Welcome, 1
Reducer phase
easy, 1
easy 1
Hadoop 3
interesting 1
is 2
to 1
Welcome 1
easy, 1
Hadoop, 3
interesting, 1
is, 2
to, 1
Welcome, 1
Features of MapReduce
Good load
balancing
Re-execution of
tasks
Simple programming
model
Map task +
Reduce task
Splitting the stages into Map and
Reduce tasks improves the load
balancing
Features of MapReduce
Good load
balancing
Re-execution of
tasks
Simple programming
model
There is an automatic re-execution if a
certain task fails
Map task +
Reduce task
Features of MapReduce
Good load
balancing
Re-execution of
tasks
Simple programming
model
MapReduce has one of the simplest
programming model which is based on
Java. Java is a very common
programming language
Map task +
Reduce task
HDFS and MapReduce were the two units
of Hadoop 1.0
Hadoop 1.0 was also known as
MapReduce Version 1
The disadvantage with this version was
that the Job tracker did both the
processing of data and resource
allocation
As a result, Job tracker was overburdened
due to handling job scheduling, and
resource management
To overcome this issue, Hadoop 2
introduced YARN as the processing layer
that supported many frameworks
What’s in it for you?
Big Data Challenges
What is HDFS?
HDFS Cluster Architecture
HDFS Data Blocks
Data Node Failure
Rack Awareness
General Architecture of HDFS
Read/Write Mechanism
Hadoop YARN
What is YARN?
Yet Another Resource Negotiator (YARN) acts as the resource management
unit of Hadoop
What is YARN?
Yet Another Resource Negotiator (YARN) acts as the resource management
unit of Hadoop
Apache YARN consists of
Resource
Manager
It is the master daemon. Manages the assignment of
resources such as CPU, memory
What is YARN?
Yet Another Resource Negotiator (YARN) acts as the resource management
unit of Hadoop
Apache YARN consists of
Resource
Manager
Node
Manager
It is the slave daemon. It reports the resource
usage to the Resource Manager
What is YARN?
Yet Another Resource Negotiator (YARN) acts as the resource management
unit of Hadoop
Apache YARN consists of
Resource
Manager
Application
Master
Node
Manager
Works with the negotiation of resources from resource
manager and works with node manager
What is YARN?
Client
Client
What is YARN?
Client
Client
Resource
Manager
What is YARN?
Client
Client
Resource
Manager
Node
Manager
Node
Manager
Node
Manager
What is YARN?
Client
Client
Resource
Manager
Node
Manager
container
Node
Manager
Node
Manager
container
container container
Container is a collection of physical resources such as CPU, RAM
What is YARN?
Client
Client
Resource
Manager
Node
Manager
App Master
App Master
Node
Manager
Node
Manager
container
container
container container
App Master requests container to Resource Manager. It uses
container allocated by Node Manager
Node
Manager
App Master
App Master
Node
Manager
Node
Manager
container
container
container container
What is YARN?
Client
Client
Application
Resource
Manager
Client program sends application request to the resource
manager
What is YARN?
Node
Manager
container
App Master
App Master
container
Node
Manager
Node
Manager
container container
Client
Client
Resource
Manager
Node status
Job request
Node manager updates the status of the nodes to the resource
manager
What is YARN?
Node
Manager
container
App Master
App Master
container
Node
Manager
Node
Manager
container container
Client
Client
Resource
Manager
Job request
Resource Manager contacts the Node Manager requesting for
resources(containers). The Node Manager grants the request
What is YARN?
Node
Manager
container
App Master
App Master
container
Node
Manager
Node
Manager
container container
Client
Client
Resource
Manager
Job request
App Master contacts the Node Manager to use the container and runs in
one of the container allocated on one of the nodes
Features of YARN
Job scheduling Multitenancy
YARN is responsible to
process job requests and
allocate resources
Scalability
Features of YARN
Job scheduling Multitenancy
Different versions of MapReduce
can run on YARN. This makes
upgrading of MapReduce
manageable
Scalability
Features of YARN
Job scheduling Multitenancy Scalability
Depending on the requirement, the
number of nodes can be increased
Many companies use Hadoop for storing
and processing data. Now, let me tell you
about one such company
What’s in it for you?
Big Data Challenges
What is HDFS?
HDFS Cluster Architecture
HDFS Data Blocks
Data Node Failure
Rack Awareness
General Architecture of HDFS
Read/Write Mechanism
Use case - Pinterest
You would have probably heard of the
popular image sharing website Pinterest
`
Pinterest is a social media platform which allows you to pin any
interesting information you find on its site
Pinterest is a social media platform which allows you to pin any
interesting information you find on its site
`
Pinterest is a social media platform which allows you to pin any
interesting information you find on its site
Pinterest has more than 250 million users and nearly 30 billion pins. All these
account to big data concerning Pinterest
`
Pinterest is a social media platform which allows you to pin any
interesting information you find on its site
Problem
Pinterest faced a challenge in processing tremendous amount of data
Pinterest has more than 250 million users and nearly 30 billion pins. All these
account to big data concerning Pinterest
`
Pinterest is a social media platform which allows you to pin any
interesting information you find on its site
Problem
Pinterest faced a challenge in processing tremendous amount of data
There was a difficulty in analyzing which data needs to be displayed in a user’s
personalized discovery engine
Pinterest has more than 250 million users and nearly 30 billion pins. All these
account to big data concerning Pinterest
`
Pinterest is a social media platform which allows you to pin any
interesting information you find on its site
Solution
Pinterest has more than 250 million users and nearly 30 billion pins. All these
account to big data concerning Pinterest
`
Pinterest is a social media platform which allows you to pin any
interesting information you find on its site
Solution
Pinterest uses Hadoop to process and analyze big data in a way that it helps
the company to show the most relevant content to its users
Pinterest has more than 250 million users and nearly 30 billion pins. All these
account to big data concerning Pinterest
`
Pinterest is a social media platform which allows you to pin any
interesting information you find on its site
Solution
Pinterest uses Hadoop to process and analyze big data in a way that it helps
the company to show the most relevant content to its users
Through continuous analysis of the data, Pinterest can provide its users with
features such as related pins, guided search and so on
Pinterest has more than 250 million users and nearly 30 billion pins. All these
account to big data concerning Pinterest
This is how Pinterest benefited from
Hadoop. Let’s also start using Hadoop to
put an end to the big data challenges we
are facing
What’s in it for you?
Big Data Challenges
What is HDFS?
HDFS Cluster Architecture
HDFS Data Blocks
Data Node Failure
Rack Awareness
General Architecture of HDFS
Read/Write Mechanism
Demo on HDFS, MapReduce
and YARN
Key Takeaways
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop Tutorial | Simplilearn

Contenu connexe

Tendances

Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Simplilearn
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
Data Streaming in Big Data Analysis
Data Streaming in Big Data AnalysisData Streaming in Big Data Analysis
Data Streaming in Big Data AnalysisVincenzo Gulisano
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Simplilearn
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
Introduction to Hadoop Technology
Introduction to Hadoop TechnologyIntroduction to Hadoop Technology
Introduction to Hadoop TechnologyManish Borkar
 
Hadoop MapReduce Framework
Hadoop MapReduce FrameworkHadoop MapReduce Framework
Hadoop MapReduce FrameworkEdureka!
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopApache Apex
 
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...Simplilearn
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and HadoopFlavio Vit
 

Tendances (20)

PPT on Hadoop
PPT on HadoopPPT on Hadoop
PPT on Hadoop
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Data Streaming in Big Data Analysis
Data Streaming in Big Data AnalysisData Streaming in Big Data Analysis
Data Streaming in Big Data Analysis
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Introduction to Hadoop Technology
Introduction to Hadoop TechnologyIntroduction to Hadoop Technology
Introduction to Hadoop Technology
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
 
Data Engineering Basics
Data Engineering BasicsData Engineering Basics
Data Engineering Basics
 
Big data Analytics Hadoop
Big data Analytics HadoopBig data Analytics Hadoop
Big data Analytics Hadoop
 
Hadoop MapReduce Framework
Hadoop MapReduce FrameworkHadoop MapReduce Framework
Hadoop MapReduce Framework
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
 

Similaire à Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop Tutorial | Simplilearn

What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...Simplilearn
 
OPERATING SYSTEM .pptx
OPERATING SYSTEM .pptxOPERATING SYSTEM .pptx
OPERATING SYSTEM .pptxAltafKhadim
 
Introduction to Hadoop Distributed File System(HDFS).pptx
Introduction to Hadoop Distributed File System(HDFS).pptxIntroduction to Hadoop Distributed File System(HDFS).pptx
Introduction to Hadoop Distributed File System(HDFS).pptxSakthiVinoth78
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?sudhakara st
 
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...Edureka!
 
Apache hadoop basics
Apache hadoop basicsApache hadoop basics
Apache hadoop basicssaili mane
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystemrohitraj268
 
Apache Hadoop Big Data Technology
Apache Hadoop Big Data TechnologyApache Hadoop Big Data Technology
Apache Hadoop Big Data TechnologyJay Nagar
 
What is HDFS | Hadoop Distributed File System | Edureka
What is HDFS | Hadoop Distributed File System | EdurekaWhat is HDFS | Hadoop Distributed File System | Edureka
What is HDFS | Hadoop Distributed File System | EdurekaEdureka!
 
data analytics lecture 3.2.ppt
data analytics lecture 3.2.pptdata analytics lecture 3.2.ppt
data analytics lecture 3.2.pptRutujaPatil247341
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation Shivanee garg
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherJanBask Training
 
Hadoop file system
Hadoop file systemHadoop file system
Hadoop file systemJohn Veigas
 
Pillars of Heterogeneous HDFS Storage
Pillars of Heterogeneous HDFS StoragePillars of Heterogeneous HDFS Storage
Pillars of Heterogeneous HDFS StoragePete Kisich
 
Big data with HDFS and Mapreduce
Big data  with HDFS and MapreduceBig data  with HDFS and Mapreduce
Big data with HDFS and Mapreducesenthil0809
 
October 2016 HUG: The Pillars of Effective Data Archiving and Tiering in Hadoop
October 2016 HUG: The Pillars of Effective Data Archiving and Tiering in HadoopOctober 2016 HUG: The Pillars of Effective Data Archiving and Tiering in Hadoop
October 2016 HUG: The Pillars of Effective Data Archiving and Tiering in HadoopYahoo Developer Network
 

Similaire à Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop Tutorial | Simplilearn (20)

What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
 
OPERATING SYSTEM .pptx
OPERATING SYSTEM .pptxOPERATING SYSTEM .pptx
OPERATING SYSTEM .pptx
 
Introduction to Hadoop Distributed File System(HDFS).pptx
Introduction to Hadoop Distributed File System(HDFS).pptxIntroduction to Hadoop Distributed File System(HDFS).pptx
Introduction to Hadoop Distributed File System(HDFS).pptx
 
Hadoop – big deal
Hadoop – big dealHadoop – big deal
Hadoop – big deal
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
Hadoop
HadoopHadoop
Hadoop
 
Seminar ppt
Seminar pptSeminar ppt
Seminar ppt
 
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
 
Apache hadoop basics
Apache hadoop basicsApache hadoop basics
Apache hadoop basics
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
 
Apache Hadoop Big Data Technology
Apache Hadoop Big Data TechnologyApache Hadoop Big Data Technology
Apache Hadoop Big Data Technology
 
What is HDFS | Hadoop Distributed File System | Edureka
What is HDFS | Hadoop Distributed File System | EdurekaWhat is HDFS | Hadoop Distributed File System | Edureka
What is HDFS | Hadoop Distributed File System | Edureka
 
data analytics lecture 3.2.ppt
data analytics lecture 3.2.pptdata analytics lecture 3.2.ppt
data analytics lecture 3.2.ppt
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for Fresher
 
Hadoop file system
Hadoop file systemHadoop file system
Hadoop file system
 
Pillars of Heterogeneous HDFS Storage
Pillars of Heterogeneous HDFS StoragePillars of Heterogeneous HDFS Storage
Pillars of Heterogeneous HDFS Storage
 
Big data with HDFS and Mapreduce
Big data  with HDFS and MapreduceBig data  with HDFS and Mapreduce
Big data with HDFS and Mapreduce
 
October 2016 HUG: The Pillars of Effective Data Archiving and Tiering in Hadoop
October 2016 HUG: The Pillars of Effective Data Archiving and Tiering in HadoopOctober 2016 HUG: The Pillars of Effective Data Archiving and Tiering in Hadoop
October 2016 HUG: The Pillars of Effective Data Archiving and Tiering in Hadoop
 
hadoop
hadoophadoop
hadoop
 

Plus de Simplilearn

ChatGPT in Cybersecurity
ChatGPT in CybersecurityChatGPT in Cybersecurity
ChatGPT in CybersecuritySimplilearn
 
Whatis SQL Injection.pptx
Whatis SQL Injection.pptxWhatis SQL Injection.pptx
Whatis SQL Injection.pptxSimplilearn
 
Top 5 High Paying Cloud Computing Jobs in 2023
 Top 5 High Paying Cloud Computing Jobs in 2023  Top 5 High Paying Cloud Computing Jobs in 2023
Top 5 High Paying Cloud Computing Jobs in 2023 Simplilearn
 
Types Of Cloud Jobs In 2024
Types Of Cloud Jobs In 2024Types Of Cloud Jobs In 2024
Types Of Cloud Jobs In 2024Simplilearn
 
Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...
Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...
Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...Simplilearn
 
What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...
What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...
What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...Simplilearn
 
Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...
Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...
Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...Simplilearn
 
React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...
React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...
React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...Simplilearn
 
Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...
Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...
Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...Simplilearn
 
How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...
How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...
How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...Simplilearn
 
Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...
Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...
Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...Simplilearn
 
Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...
Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...
Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...Simplilearn
 
Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...
Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...
Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...Simplilearn
 
React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...
React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...
React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...Simplilearn
 
What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...
What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...
What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...Simplilearn
 
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...Simplilearn
 
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...Simplilearn
 
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...Simplilearn
 
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...Simplilearn
 
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...Simplilearn
 

Plus de Simplilearn (20)

ChatGPT in Cybersecurity
ChatGPT in CybersecurityChatGPT in Cybersecurity
ChatGPT in Cybersecurity
 
Whatis SQL Injection.pptx
Whatis SQL Injection.pptxWhatis SQL Injection.pptx
Whatis SQL Injection.pptx
 
Top 5 High Paying Cloud Computing Jobs in 2023
 Top 5 High Paying Cloud Computing Jobs in 2023  Top 5 High Paying Cloud Computing Jobs in 2023
Top 5 High Paying Cloud Computing Jobs in 2023
 
Types Of Cloud Jobs In 2024
Types Of Cloud Jobs In 2024Types Of Cloud Jobs In 2024
Types Of Cloud Jobs In 2024
 
Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...
Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...
Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...
 
What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...
What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...
What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...
 
Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...
Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...
Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...
 
React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...
React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...
React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...
 
Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...
Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...
Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...
 
How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...
How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...
How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...
 
Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...
Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...
Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...
 
Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...
Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...
Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...
 
Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...
Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...
Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...
 
React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...
React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...
React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...
 
What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...
What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...
What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...
 
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...
 
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...
 
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...
 
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...
 
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...
 

Dernier

Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...RKavithamani
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 

Dernier (20)

Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 

Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop Tutorial | Simplilearn

  • 2. What’s in it for you? Big Data Challenges What is HDFS? HDFS Cluster Architecture HDFS Data Blocks Data Node Failure Rack Awareness General Architecture of HDFS Read/Write Mechanism What’s in it for you?
  • 3. What’s in it for you? Why Hadoop?
  • 4. What’s in it for you? Why Hadoop? What is Hadoop?
  • 5. What’s in it for you? Why Hadoop? What is Hadoop? Hadoop HDFS
  • 6. What’s in it for you? Why Hadoop? What is Hadoop? Hadoop HDFSHadoop MapReduce
  • 7. What’s in it for you? Why Hadoop? What is Hadoop? Hadoop HDFSHadoop MapReduce Hadoop YARN
  • 8. What’s in it for you? Why Hadoop? What is Hadoop? Hadoop HDFSHadoop MapReduce Hadoop YARN Use case of Hadoop
  • 9. What’s in it for you? Why Hadoop? What is Hadoop? Hadoop HDFSHadoop MapReduce Hadoop YARN Use case of Hadoop Demo on HDFS, MapReduce and YARN
  • 10. What’s in it for you? Big Data Challenges What is HDFS? HDFS Cluster Architecture HDFS Data Blocks Data Node Failure Rack Awareness General Architecture of HDFS Read/Write Mechanism Why Hadoop?
  • 11. In a town far away..
  • 12. Tim sells food grains in his shop
  • 13. The customers were happy as Tim was very quick with the orders
  • 14. Tim sensed a good demand for other products, so he thought of expanding his business
  • 15. He started selling fruits, vegetables, meat, and dairy products in addition to food grains
  • 16. But it wasn’t as easy as he expected it to be. The number of customers increased, and he was not able to cater to their needs on time
  • 17. He had to look into assisting his customers with each of their orders and billing. It was too difficult for him to manage alone
  • 18. To start delivering orders on time and to manage the customers’ demands, Tim hired 3 more people to work with him
  • 19. Matt took care of the fruits and vegetable section. Luke handled the dairy and meat section. Ann was appointed as the cashier Matt Luke Ann Tim
  • 20. However, this was still not a solution to Tim’s problem as there was not enough space in the shop for all the items Storage area
  • 21. The storage was a bottleneck since storing and accessing became more and more difficult with increased supply and demand Storage area
  • 22. Tim came up with an idea to overcome this issue. He decided to expand the storage area and distribute each category of product on different floors
  • 23. Now, customers were happy, and after picking up their products from the respective sections, it was then billed
  • 24. Now, customers were happy, and after picking up their products from the respective sections, it was then billed Now, let us compare this story to big data
  • 25. Earlier, data was generated at a moderate rate, and all the data was structured in nature. One processor was enough to process all of it
  • 26. With the increase in data generation, different types of data were generated at high speed. It became difficult for a single processor to process different types of data
  • 27. Massive amount of different types of data which cannot be processed and stored using traditional databases is known as big data
  • 28. To overcome this issue, multiple processors were used to process each type of data
  • 29. But now the problem was that one storage system was accessed by all the processors and the storage became the bottleneck
  • 30. Just like how Tim adopted the distributed approach, the storage system was also distributed and by doing so, the data was stored in individual databases
  • 31. Just like how Tim adopted the distributed approach, the storage system was also distributed and by doing so, the data was stored in individual databases Through this story, we see the two approaches that are used by Hadoop that is HDFS and MapReduce
  • 32. HDFS refers to the distributed storage space just like how Tim distributed the storage space amongst the various sections
  • 33. Each person took care of a separate section and at the end the customers went to the cashier for the final billing, this sorted the process and made it easier. This is how Hadoop MapReduce works
  • 34. This was a rough story of big data generation and why Hadoop is required. I will now explain in detail as to what Hadoop is
  • 35. This sounds interesting. I would like to know more about Hadoop
  • 36. What’s in it for you? Big Data Challenges What is HDFS? HDFS Cluster Architecture HDFS Data Blocks Data Node Failure Rack Awareness General Architecture of HDFS Read/Write Mechanism What is Hadoop?
  • 37. What is Hadoop? Hadoop is a framework which stores and processes big data in a distributed and parallel fashion
  • 38. What is Hadoop? Hadoop is a framework which stores and processes big data in a distributed and parallel fashion BIG DATA
  • 39. That sounds interesting, so how does Hadoop store and process all of this big data?
  • 40. Hadoop has individual components, which are used for storing and processing big data
  • 41. One day in an office.. HDFS MapReduce YARN Components of Hadoop The storage unit of Hadoop
  • 42. One day in an office.. HDFS MapReduce YARN Components of Hadoop The storage unit of Hadoop The processing unit of Hadoop
  • 43. One day in an office.. HDFS MapReduce YARN Components of Hadoop The storage unit of Hadoop The processing unit of Hadoop The resource management unit of Hadoop
  • 44. What’s in it for you? Big Data Challenges What is HDFS? HDFS Cluster Architecture HDFS Data Blocks Data Node Failure Rack Awareness General Architecture of HDFS Read/Write Mechanism Hadoop HDFS
  • 45. What is HDFS? Each block of data is stored on multiple systems and by default has 128 MB of data Data Datanode Datanode Datanode Hadoop Distributed File System (HDFS) is known for its distributed storage method. It distributes the data amongst many computers. In addition to this, replication of data is also done to avoid loss of data
  • 46. What is HDFS? Let us now see how 500 MB of data is stored in the traditional method
  • 47. Let us now see how 500 MB of data is stored in the traditional method 500 MB data What is HDFS? Let us now see how 500 MB of data is stored in the traditional method
  • 48. Let us now see how 500 MB of data is stored in the traditional method Here, the entire set of data is stored in one database. This overloads the database, and if it crashes, we lose all our data 500 MB data What is HDFS? Let us now see how 500 MB of data is stored in the traditional method
  • 49. What is HDFS? Using Hadoop HDFS, this problem is taken care of as data is distributed amongst many systems
  • 50. Using Hadoop HDFS, this problem is taken care of as data is distributed amongst many databases By doing so, a single database is not overloaded 500 MB data What is HDFS? . . . Using Hadoop HDFS, this problem is taken care of as data is distributed amongst many systems
  • 51. Hadoop Distributed File System (HDFS) is specially designed for storing massive datasets in commodity hardware What is HDFS?
  • 52. What is HDFS? HDFS has two main components that help with its storage NameNode DataNode Hadoop Distributed File System (HDFS) is specially designed for storing massive datasets in commodity hardware
  • 53. What is HDFS? DataNode DataNode DataNode DataNode NameNode • NameNode is the master of the system • It stores all the metadata
  • 54. NameNode What is HDFS? DataNode DataNode DataNode DataNode • NameNode is the master of the system • It stores all the metadata• DataNode is known as the slave node. There are multiple DataNodes • It performs the read/write operations and stores the actual data
  • 55. What is HDFS? NameNode DataNode DataNode DataNode DataNode • NameNode manages all the DataNodes • The DataNodes send signals known as heartbeats to the NameNode. This signal gives the status of the DataNode
  • 56. As mentioned earlier, the actual data is stored in DataNodes. Data is stored in the form of blocks here. The default size of each block is 128 MB What is HDFS?
  • 57. What is HDFS? Now, let’s consider storing a file of size 530 MB in HDFS As mentioned earlier, the actual data is stored in DataNodes. Data is stored in the form of blocks here. The default size of each block is 128 MB
  • 58. What is HDFS? Now, let’s consider storing a file of size 530 MB in HDFS File.txt 530 MB As mentioned earlier, the actual data is stored in DataNodes. Data is stored in the form of blocks here. The default size of each block is 128 MB
  • 59. What is HDFS? Now, let’s consider storing a file of size 530 MB in HDFS File.txt 530 MB Block B Block DBlock C 128 MB 128 MB128 MB 128 MB Block A As mentioned earlier, the actual data is stored in DataNodes. Data is stored in the form of blocks here. The default size of each block is 128 MB
  • 60. What is HDFS? Now, let’s consider storing a file of size 530 MB in HDFS File.txt 530 MB Block B Block D Block E 18 MB Block C 128 MB 128 MB128 MB 128 MB Block A As mentioned earlier, the actual data is stored in DataNodes. Data is stored in the form of blocks here. The default size of each block is 128 MB
  • 61. What is HDFS? Now, let’s consider storing a file of size 530 MB in HDFS File.txt 530 MB Block B Block D Block E 18 MB Block C 128 MB 128 MB128 MB 128 MB Block A The final block uses only the remaining space for storage As mentioned earlier, the actual data is stored in DataNodes. Data is stored in the form of blocks here. The default size of each block is 128 MB
  • 62. What is HDFS? Now, let’s consider storing a file of size 530 MB in HDFS File.txt 530 MB Block B Block D Block E 18 MB Block C 128 MB 128 MB128 MB 128 MB Block A DataNode 1 DataNode 2 DataNode 3 DataNode 4 DataNode 5 As mentioned earlier, the actual data is stored in DataNodes. Data is stored in the form of blocks here. The default size of each block is 128 MB
  • 63. What is HDFS? Now, let’s consider storing a file of size 530 MB in HDFS File.txt 530 MB Block B Block D Block E 18 MB Block C 128 MB 128 MB128 MB 128 MB Block A All these data blocks are stored in DataNodes – computers DataNode 1 DataNode 2 DataNode 3 DataNode 4 DataNode 5 As mentioned earlier, the actual data is stored in DataNodes. Data is stored in the form of blocks here. The default size of each block is 128 MB
  • 64. What happens if the computer that contains block A crashes? Do we lose the data in block A?
  • 65. No, we don’t. That’s the beauty of Hadoop HDFS. It uses replication to prevent the loss of data
  • 66. c Rack 1 Replication in HDFS HDFS overcomes the issue of DataNode failure by creating copies of the data; this is known as the replication method Block ADN 1
  • 67. c Rack 1 Rack 2 Replication in HDFS HDFS overcomes the issue of DataNode failure by creating copies of the data; this is known as the replication method Block ADN 1 DN 1 Block ADN 5 Block A is replicated. The replication factor is 3. The replicas are stored in different DataNodes Block ADN 6 2 replicas cannot be stored on the same datanode
  • 68. c Rack 1 Rack 2 Replication in HDFS HDFS overcomes the issue of DataNode failure by creating copies of the data; this is known as the replication method Rack 3 Rack 4 Rack 5 Similarly, every other block is replicated Block ADN 1 Block DDN 2 DN 1 Block ADN 5 Block DDN 10Block BDN 4 Block CDN 7 Block CDN 11 Block EDN 13 Block DDN 14 DN 12Block ADN 6 Block BDN 8 Block BDN 9 Block CDN 15Block EDN 3 Block EDN 12
  • 69. Architecture of HDFS Stores Metadata (Name, replicas, ….) NameNode
  • 70. Stores DataNodes Metadata (Name, replicas, ….) DataNodes NameNode …..…. Architecture of HDFS
  • 71. …..…. Stores DataNodes Metadata (Name, replicas, ….) DataNodes NameNode Rack is a collection of DataNodes Replication Rack 1 Rack 2 Architecture of HDFS
  • 72. Metadata ops Stores Client DataNodes Metadata (Name, replicas, ….) DataNodes NameNode Read request? Replication …..…. Architecture of HDFS
  • 73. Stores Client DataNodes Metadata (Name, replicas, ….) DataNodes NameNode Replication …..…. Architecture of HDFS Read request? Okay, read data from DataNodes Read permission
  • 74. Stores DataNodes Metadata (Name, replicas, ….) DataNodes NameNode Here is the data that is read ReplicationRead data …..…. Architecture of HDFS Metadata ops Client Read request?
  • 75. Metadata ops Stores Client DataNodes Metadata (Name, replicas, ….) DataNodes NameNode Write Write Client ReplicationRead data …..…. Architecture of HDFS
  • 76. Features of HDFS HDFS is fault tolerant as multiple copies of data are made Fault tolerant Data security Scalability Flexibility
  • 77. Features of HDFS Provides end-to-end encryption that protects data Fault tolerant Data security Scalability Flexibility
  • 78. Features of HDFS Multiple nodes can be added to the cluster depending on the requirement Fault tolerant Data security Scalability Flexibility
  • 79. Features of HDFS Hadoop is flexible in storing any type of data, like structured, semi structured or unstructured data Fault tolerant Data security Scalability Flexibility
  • 80. Now that we have stored data in HDFS, how can we process it?
  • 81. For processing data, Hadoop has a unit known as MapReduce
  • 82. In the traditional approach, big data was processed at the master node Why MapReduce? big data
  • 83. In the traditional approach, big data was processed at the master node Why MapReduce? Master Slave Slave Slave Slave big data
  • 84. This was a disadvantage as it consumed more time to process various types of data Master Slave Slave Slave Slave Why MapReduce? big data
  • 85. To overcome this issue, data was processed at each slave node. This approach is known as MapReduce Master Slave Slave Slave Slave Why MapReduce? big data
  • 86. What’s in it for you? Big Data Challenges What is HDFS? HDFS Cluster Architecture HDFS Data Blocks Data Node Failure Rack Awareness General Architecture of HDFS Read/Write Mechanism Hadoop MapReduce
  • 87. What is MapReduce? Programming technique where huge data is processed in a parallel and distributed fashion is known as Hadoop MapReduce
  • 88. What is MapReduce? Programming technique where huge data is processed in a parallel and distributed fashion is known as Hadoop MapReduce MapReduce tasks Map tasks Reduce tasks
  • 89. What is MapReduce? Programming technique where huge data is processed in a parallel and distributed fashion is known as Hadoop MapReduce Map and Reduce steps Input Data Output Data map() map() map() Shuffle and Sort reduce() reduce() Input Data is divided to form the input splits
  • 90. What is MapReduce? Programming technique where huge data is processed in a parallel and distributed fashion is known as Hadoop MapReduce Map and Reduce steps Input Data Output Data map() map() map() Shuffle and Sort reduce() reduce() Map phase is the first phase, here data in each split is passed to produce output values
  • 91. What is MapReduce? Programming technique where huge data is processed in a parallel and distributed fashion is known as Hadoop MapReduce Map and Reduce steps Input Data Output Data map() map() map() Shuffle and Sort reduce() reduce() In the shuffle and sort phase, output of mapping phase is taken and similar data is grouped
  • 92. What is MapReduce? Programming technique where huge data is processed in a parallel and distributed fashion is known as Hadoop MapReduce Map and Reduce steps Input Data Output Data map() map() map() Shuffle and Sort reduce() reduce() Here, the output values from the shuffling phase are aggregated. It then returns a single output value
  • 93. What is MapReduce? Programming technique where huge data is processed in a parallel and distributed fashion is known as Hadoop MapReduce Let us now see how MapReduce works with an example
  • 94. What is MapReduce? Programming technique where huge data is processed in a parallel and distributed fashion is known as Hadoop MapReduce Let us now see how MapReduce works with an example Input data Welcome to Hadoop Hadoop is interesting Hadoop is easy
  • 95. What is MapReduce? Programming technique where huge data is processed in a parallel and distributed fashion is known as Hadoop MapReduce Let us now see how MapReduce works with an example Input data Welcome to Hadoop Hadoop is interesting Hadoop is easy Welcome to Hadoop Hadoop is interesting Hadoop is easy Input Splits
  • 96. What is MapReduce? Programming technique where huge data is processed in a parallel and distributed fashion is known as Hadoop MapReduce Let us now see how MapReduce works with an example Input data Welcome to Hadoop Hadoop is interesting Hadoop is easy Welcome to Hadoop Hadoop is interesting Hadoop is easy Input Splits Hadoop, 1 is, 1 interesting, 1 Welcome, 1 to, 1 Hadoop, 1 Hadoop, 1 is, 1 easy, 1 Map phase
  • 97. What is MapReduce? Programming technique where huge data is processed in a parallel and distributed fashion is known as Hadoop MapReduce Let us now see how MapReduce works with an example Map phase Shuffle and Sort phase Hadoop, 1 is, 1 interesting, 1 Welcome, 1 to, 1 Hadoop, 1 Hadoop, 1 is, 1 easy, 1 to, 1 Hadoop, 1 Hadoop, 1 Hadoop, 1 is, 1 is, 1 interesting, 1 Welcome, 1 easy, 1
  • 98. What is MapReduce? Programming technique where huge data is processed in a parallel and distributed fashion is known as Hadoop MapReduce Let us now see how MapReduce works with an example Map phase Shuffle and Sort phase Hadoop, 1 is, 1 interesting, 1 Welcome, 1 to, 1 Hadoop, 1 Hadoop, 1 is, 1 easy, 1 to, 1 Hadoop, 1 Hadoop, 1 Hadoop, 1 is, 1 is, 1 interesting, 1 Welcome, 1 Reducer phase easy, 1easy, 1 Hadoop, 3 interesting, 1 is, 2 to, 1 Welcome, 1
  • 99. What is MapReduce? Programming technique where huge data is processed in a parallel and distributed fashion is known as Hadoop MapReduce Let us now see how MapReduce works with an example Map phase Shuffle and Sort phase Final Output Hadoop, 1 is, 1 interesting, 1 Welcome, 1 to, 1 Hadoop, 1 Hadoop, 1 is, 1 easy, 1 to, 1 Hadoop, 1 Hadoop, 1 Hadoop, 1 is, 1 is, 1 interesting, 1 Welcome, 1 Reducer phase easy, 1 easy 1 Hadoop 3 interesting 1 is 2 to 1 Welcome 1 easy, 1 Hadoop, 3 interesting, 1 is, 2 to, 1 Welcome, 1
  • 100. Features of MapReduce Good load balancing Re-execution of tasks Simple programming model Map task + Reduce task Splitting the stages into Map and Reduce tasks improves the load balancing
  • 101. Features of MapReduce Good load balancing Re-execution of tasks Simple programming model There is an automatic re-execution if a certain task fails Map task + Reduce task
  • 102. Features of MapReduce Good load balancing Re-execution of tasks Simple programming model MapReduce has one of the simplest programming model which is based on Java. Java is a very common programming language Map task + Reduce task
  • 103. HDFS and MapReduce were the two units of Hadoop 1.0
  • 104. Hadoop 1.0 was also known as MapReduce Version 1
  • 105. The disadvantage with this version was that the Job tracker did both the processing of data and resource allocation
  • 106. As a result, Job tracker was overburdened due to handling job scheduling, and resource management
  • 107. To overcome this issue, Hadoop 2 introduced YARN as the processing layer that supported many frameworks
  • 108. What’s in it for you? Big Data Challenges What is HDFS? HDFS Cluster Architecture HDFS Data Blocks Data Node Failure Rack Awareness General Architecture of HDFS Read/Write Mechanism Hadoop YARN
  • 109. What is YARN? Yet Another Resource Negotiator (YARN) acts as the resource management unit of Hadoop
  • 110. What is YARN? Yet Another Resource Negotiator (YARN) acts as the resource management unit of Hadoop Apache YARN consists of Resource Manager It is the master daemon. Manages the assignment of resources such as CPU, memory
  • 111. What is YARN? Yet Another Resource Negotiator (YARN) acts as the resource management unit of Hadoop Apache YARN consists of Resource Manager Node Manager It is the slave daemon. It reports the resource usage to the Resource Manager
  • 112. What is YARN? Yet Another Resource Negotiator (YARN) acts as the resource management unit of Hadoop Apache YARN consists of Resource Manager Application Master Node Manager Works with the negotiation of resources from resource manager and works with node manager
  • 116. What is YARN? Client Client Resource Manager Node Manager container Node Manager Node Manager container container container Container is a collection of physical resources such as CPU, RAM
  • 117. What is YARN? Client Client Resource Manager Node Manager App Master App Master Node Manager Node Manager container container container container App Master requests container to Resource Manager. It uses container allocated by Node Manager
  • 118. Node Manager App Master App Master Node Manager Node Manager container container container container What is YARN? Client Client Application Resource Manager Client program sends application request to the resource manager
  • 119. What is YARN? Node Manager container App Master App Master container Node Manager Node Manager container container Client Client Resource Manager Node status Job request Node manager updates the status of the nodes to the resource manager
  • 120. What is YARN? Node Manager container App Master App Master container Node Manager Node Manager container container Client Client Resource Manager Job request Resource Manager contacts the Node Manager requesting for resources(containers). The Node Manager grants the request
  • 121. What is YARN? Node Manager container App Master App Master container Node Manager Node Manager container container Client Client Resource Manager Job request App Master contacts the Node Manager to use the container and runs in one of the container allocated on one of the nodes
  • 122. Features of YARN Job scheduling Multitenancy YARN is responsible to process job requests and allocate resources Scalability
  • 123. Features of YARN Job scheduling Multitenancy Different versions of MapReduce can run on YARN. This makes upgrading of MapReduce manageable Scalability
  • 124. Features of YARN Job scheduling Multitenancy Scalability Depending on the requirement, the number of nodes can be increased
  • 125. Many companies use Hadoop for storing and processing data. Now, let me tell you about one such company
  • 126. What’s in it for you? Big Data Challenges What is HDFS? HDFS Cluster Architecture HDFS Data Blocks Data Node Failure Rack Awareness General Architecture of HDFS Read/Write Mechanism Use case - Pinterest
  • 127. You would have probably heard of the popular image sharing website Pinterest
  • 128.
  • 129. ` Pinterest is a social media platform which allows you to pin any interesting information you find on its site Pinterest is a social media platform which allows you to pin any interesting information you find on its site
  • 130. ` Pinterest is a social media platform which allows you to pin any interesting information you find on its site Pinterest has more than 250 million users and nearly 30 billion pins. All these account to big data concerning Pinterest
  • 131. ` Pinterest is a social media platform which allows you to pin any interesting information you find on its site Problem Pinterest faced a challenge in processing tremendous amount of data Pinterest has more than 250 million users and nearly 30 billion pins. All these account to big data concerning Pinterest
  • 132. ` Pinterest is a social media platform which allows you to pin any interesting information you find on its site Problem Pinterest faced a challenge in processing tremendous amount of data There was a difficulty in analyzing which data needs to be displayed in a user’s personalized discovery engine Pinterest has more than 250 million users and nearly 30 billion pins. All these account to big data concerning Pinterest
  • 133. ` Pinterest is a social media platform which allows you to pin any interesting information you find on its site Solution Pinterest has more than 250 million users and nearly 30 billion pins. All these account to big data concerning Pinterest
  • 134. ` Pinterest is a social media platform which allows you to pin any interesting information you find on its site Solution Pinterest uses Hadoop to process and analyze big data in a way that it helps the company to show the most relevant content to its users Pinterest has more than 250 million users and nearly 30 billion pins. All these account to big data concerning Pinterest
  • 135. ` Pinterest is a social media platform which allows you to pin any interesting information you find on its site Solution Pinterest uses Hadoop to process and analyze big data in a way that it helps the company to show the most relevant content to its users Through continuous analysis of the data, Pinterest can provide its users with features such as related pins, guided search and so on Pinterest has more than 250 million users and nearly 30 billion pins. All these account to big data concerning Pinterest
  • 136. This is how Pinterest benefited from Hadoop. Let’s also start using Hadoop to put an end to the big data challenges we are facing
  • 137. What’s in it for you? Big Data Challenges What is HDFS? HDFS Cluster Architecture HDFS Data Blocks Data Node Failure Rack Awareness General Architecture of HDFS Read/Write Mechanism Demo on HDFS, MapReduce and YARN

Notes de l'éditeur

  1. Style - 01
  2. Style - 01
  3. Style - 01
  4. Style - 01
  5. Style - 01
  6. Style - 01
  7. Style - 01
  8. Style - 01
  9. Style - 01
  10. Style - 01
  11. Style - 01
  12. Style - 01
  13. Style - 01
  14. Style - 01
  15. Style - 01
  16. Style - 01
  17. Style - 01
  18. Style - 01
  19. Style - 01
  20. Style - 01
  21. Style - 01
  22. Style - 01
  23. Style - 01
  24. Style - 01
  25. Style - 01
  26. Style - 01
  27. Style - 01
  28. Style - 01
  29. Style - 01
  30. Style - 01
  31. Style - 01
  32. Style - 01
  33. Style - 01
  34. Style - 01
  35. Style - 01
  36. Style - 01
  37. Style - 01
  38. Style - 01
  39. Style - 01
  40. Style - 01
  41. Style - 01
  42. Style - 01
  43. Style - 01
  44. Style - 01
  45. Style - 01
  46. Style - 01
  47. Style - 01
  48. Style - 01
  49. Style - 01
  50. Style - 01
  51. Style - 01
  52. Style - 01
  53. Style - 01
  54. Style - 01
  55. Style - 01
  56. Style - 01
  57. Style - 01
  58. Style - 01
  59. Style - 01
  60. Style - 01
  61. Style - 01
  62. Style - 01
  63. Style - 01
  64. Style - 01
  65. Style - 01
  66. Style - 01
  67. Style - 01
  68. Style - 01
  69. Style - 01
  70. Style - 01
  71. Style - 01
  72. Style - 01
  73. Style - 01
  74. Style - 01
  75. Style - 01
  76. Style - 01
  77. Style - 01
  78. Style - 01
  79. Style - 01
  80. Style - 01
  81. Style - 01
  82. Style - 01
  83. Style - 01
  84. Style - 01
  85. Style - 01
  86. Style - 01
  87. Style - 01
  88. Style - 01
  89. Style - 01
  90. Style - 01
  91. Style - 01
  92. Style - 01
  93. Style - 01
  94. Style - 01
  95. Style - 01