1. Decide On Cluster Layout
There are four Hadoop components that we want to spread out
across the cluster:
◦ DataNode – actually stores and manages the data blocks;
◦ NameNode – acts as a catalogue service, recording which data is stored
where;
◦ JobTracker – tracks and manages submitted MapReduce jobs;
◦ TaskTracker – low-level worker that is issued tasks by the JobTracker.
Let's go with the following setup. This is a fairly typical layout:
DataNodes and TaskTrackers spread across the cluster, with a single
instance of the NameNode and JobTracker:
Node      Hostname            Components
Master    ec2-23-22-133-70    NameNode, JobTracker
Slave 1   ec2-23-20-53-36     DataNode, TaskTracker
Slave 2   ec2-184-73-42-163   DataNode, TaskTracker
2a. Configure Server Names
Log out of all of the machines and log back into the master
server.
The Hadoop configuration is located here on the server:
cd /home/ubuntu/hadoop-1.0.3/conf
Open the file ‘masters’ and replace the word ‘localhost’ with
the hostname of the server that you have allocated as the master:
cd /home/ubuntu/hadoop-1.0.3/conf
vi masters
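If you would rather not edit the file interactively, the same change can be made from the shell. A minimal sketch, using the example master hostname from the layout above (substitute your own hostname and conf path):

```shell
# Overwrite the 'masters' file with the master's hostname.
# ec2-23-22-133-70 is the example host from the layout above.
CONF_DIR=/home/ubuntu/hadoop-1.0.3/conf
echo "ec2-23-22-133-70" > "$CONF_DIR/masters"

# Confirm the file now contains exactly the master hostname.
cat "$CONF_DIR/masters"
```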
Open the file ‘slaves’ and replace the word ‘localhost’ with the
hostnames of the two servers you have allocated as slaves, one
per line:
cd /home/ubuntu/hadoop-1.0.3/conf
vi slaves
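As with ‘masters’, this edit can be scripted instead of done in vi. A minimal sketch, using the two example slave hostnames from the layout above (substitute your own):

```shell
# Overwrite the 'slaves' file with the slave hostnames, one per line.
# The two hosts below are the example slaves from the layout above.
CONF_DIR=/home/ubuntu/hadoop-1.0.3/conf
printf '%s\n' "ec2-23-20-53-36" "ec2-184-73-42-163" > "$CONF_DIR/slaves"

# Confirm the file lists both slaves on separate lines.
cat "$CONF_DIR/slaves"
```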