Hadoop is gaining interest all over the world. To get comfortable with this technology, check this presentation. It explains the basics of Hadoop and the workflow of a cluster.
2. Cnu Federer ExplainToday.blogspot.com
What is Hadoop?
● A powerful framework to process big data
● Parallel processing and distributed storage
[Diagram: Big Data → HADOOP → Analytics, Recommendations, Insights]
Why is it significant?
● Data is growing rapidly
● Need for proper analytics
● Saves power and time
● Traditional methods failed to cope with data at this scale
Key terms in Hadoop
● NameNode
– Master machine that stores the HDFS metadata: the file names and which DataNodes hold each block of data
● Resource Manager (Job Tracker in Hadoop 1)
– Manages the available resources (the DataNodes' memory and processing power)
These two are considered the masters
Key terms (contd.)
● DataNode
– Stores data and runs MapReduce tasks
– We can add as many as we want to scale the cluster
● Secondary NameNode
– Periodically takes checkpoints (image files) of the NameNode's metadata
– Useful when recovering from a NameNode failure
– Reduces the burden on the NameNode
Key terms (contd.)
● HDFS
– Hadoop Distributed File System
– Each machine has its own local file system, but HDFS is distributed and available to all machines
● History Server
– Saves the history of jobs that ran on the cluster
What is MapReduce?
● A software framework used to process data
● Introduced by Google
● Map and Reduce are its two phases
[Diagram: Data → Mapping phase → Key-Value pairs → Reducing phase → Results]
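To make the two phases concrete, here is a minimal word-count sketch in plain Python. It only imitates the idea on a single machine and does not use Hadoop; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this illustration.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) key-value pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group the values by key, which happens between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle and reduce over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

if __name__ == "__main__":
    data = ["big data needs hadoop", "hadoop processes big data"]
    print(word_count(data))
```

On a real cluster the map calls would run in parallel on different nodes, and the grouping step would move the key-value pairs between machines.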
How does Hadoop work?
➔ Data is stored in HDFS across all the nodes
➔ The NameNode stores the metadata about which DataNode holds which data
➔ A task is given to the Hadoop cluster
➔ The Resource Manager checks with the NameNode about which DataNode has which data
How does Hadoop work? (contd.)
➔ Based on the NameNode's inputs, the Resource Manager gives MapReduce tasks to the DataNodes
➔ The DataNodes perform the MapReduce tasks, and the task history is stored in the History Server
➔ After the tasks have completed, the results are collected and given back to the user
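The scheduling idea in these steps (send each task to a node that already holds the data) can be sketched as a toy model. This is not a Hadoop API: the dictionary stands in for the NameNode's metadata, and the block names, node names and the assign_tasks function are all hypothetical. Picking the least-loaded holder is a simplifying assumption, not the real scheduler's policy.

```python
# Toy model only: a plain dictionary stands in for the NameNode's metadata.
# All names here are hypothetical; none of this is a Hadoop API.

# Which DataNodes hold a copy of each block of the input data.
block_locations = {
    "block-0": ["datanode-1", "datanode-2"],
    "block-1": ["datanode-2", "datanode-3"],
    "block-2": ["datanode-1", "datanode-3"],
}

def assign_tasks(locations):
    """Give each block's task to the least-loaded node that stores the block."""
    load = {}        # number of tasks already assigned to each node
    assignment = {}  # block name -> node chosen to process it
    for block in sorted(locations):
        holders = locations[block]
        # Only nodes that already hold the block are candidates (data locality).
        node = min(holders, key=lambda n: load.get(n, 0))
        assignment[block] = node
        load[node] = load.get(node, 0) + 1
    return assignment

if __name__ == "__main__":
    for block, node in assign_tasks(block_locations).items():
        print(block, "->", node)
```

The point of the sketch is that the computation moves to the data rather than the data moving to the computation, which is what the Resource Manager's check with the NameNode makes possible.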
Commercial products
● CDH (Cloudera Distribution Including Apache Hadoop)
● IBM InfoSphere BigInsights
● MapR Apache Hadoop distributions
● Hortonworks Hadoop distributions
● ... and many more