15. Hadoop Architecture
Hadoop Client: contacts the Name Node for data, or the Job Tracker to submit jobs
Name Node: maintains the mapping of file blocks to Data Node slaves
Job Tracker: schedules jobs across Task Tracker slaves
Data Node: stores and serves blocks of data
Task Tracker: runs tasks (work units) within a job
Data Node and Task Tracker share a physical node
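The storage roles on this slide can be illustrated with a toy, in-memory model. This is only a sketch and not the Hadoop API: the class names, `client_write`, `client_read`, the round-robin placement, and the 4-byte block size are all assumptions made for illustration, and replication and the Job Tracker side are left out.

```python
# Toy model of the storage roles on the slide; all names here are hypothetical.

class DataNode:
    """Stores and serves blocks of data."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def serve(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Maintains the mapping of file blocks to data nodes (metadata only)."""

    def __init__(self):
        self.block_map = {}  # filename -> ordered list of (block_id, DataNode)

    def register(self, filename, block_id, node):
        self.block_map.setdefault(filename, []).append((block_id, node))

    def locate(self, filename):
        return self.block_map[filename]


def client_write(name_node, data_nodes, filename, data, block_size=4):
    """Split data into fixed-size blocks and place them round-robin (an assumption)."""
    for offset in range(0, len(data), block_size):
        index = offset // block_size
        block_id = f"{filename}#{index}"
        node = data_nodes[index % len(data_nodes)]
        node.store(block_id, data[offset:offset + block_size])
        name_node.register(filename, block_id, node)


def client_read(name_node, filename):
    """Ask the name node where the blocks live, then fetch them from the data nodes."""
    return b"".join(node.serve(block_id)
                    for block_id, node in name_node.locate(filename))
```

The point of the sketch is the division of labor: the name node object never holds file contents, only the block-to-node mapping, while the data node objects hold the bytes.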
16. Hadoop Process
MapReduce Example for Word Count
cat *.txt | mapper.pl | sort | reducer.pl > out.txt
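The slide does not show `mapper.pl` or `reducer.pl`, so the following is a hedged Python sketch of what the three pipeline stages might do. The functions `mapper`, `reducer`, and `word_count`, the tab-separated record format, and the tokenization rule are assumptions, not the original scripts.

```python
import re
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word (hypothetical stand-in for mapper.pl)."""
    for line in lines:
        for word in re.findall(r"[a-z']+", line.lower()):
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts of each run of identical keys (hypothetical stand-in for reducer.pl).

    Relies on the input being sorted, so all records for a word are adjacent.
    """
    parsed = (record.split("\t") for record in sorted_records)
    for word, group in groupby(parsed, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def word_count(lines):
    """In-process equivalent of: mapper | sort | reducer."""
    return list(reducer(sorted(mapper(lines))))
```

The `sort` stage matters: the reducer only compares neighboring records, so it can stream through arbitrarily large input without holding all words in memory.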
[Diagram: word-count data flow]
Input text: “To Be Or Not To Be?”
Split 1 ... Split i ... Split N: (docid, text)
Map 1 ... Map i ... Map M: emit (words, counts), e.g. Be, 5; Be, 12; Be, 7; Be, 6
Shuffle: routes (words, counts) to the reducers as (sorted words, counts)
Reduce 1 ... Reduce i ... Reduce R: emit (sorted words, sum of counts), e.g. Be, 30
Output File 1 ... Output File i ... Output File R
Map(in_key, in_value) => list of (out_key, intermediate_value)
Reduce(out_key, list of intermediate_values) => out_value(s)
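The Map and Reduce signatures above can be sketched in plain Python as a single-process simulation. The names `map_fn`, `shuffle`, `reduce_fn`, and `run_job` are hypothetical, and the per-split pre-counting inside `map_fn` is an assumption chosen so that each map emits partial counts like those in the diagram; a real Hadoop job would distribute these steps across Task Trackers.

```python
import re
from collections import Counter, defaultdict


def map_fn(doc_id, text):
    """Map(in_key, in_value) => list of (out_key, intermediate_value).

    Emits one (word, partial count) pair per distinct word in this split.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return sorted(Counter(words).items())


def shuffle(mapped_outputs):
    """Group intermediate values by key and order the keys, as the shuffle stage does."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for word, count in pairs:
            grouped[word].append(count)
    return dict(sorted(grouped.items()))


def reduce_fn(word, counts):
    """Reduce(out_key, list of intermediate_values) => out_value."""
    return sum(counts)


def run_job(splits):
    """Run map over every (doc_id, text) split, shuffle, then reduce each key."""
    mapped = [map_fn(doc_id, text) for doc_id, text in splits]
    return {word: reduce_fn(word, counts)
            for word, counts in shuffle(mapped).items()}
```

With the diagram's partial counts, `reduce_fn("be", [5, 12, 7, 6])` returns 30, matching the "Be, 30" shown in the output.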