2. Install Java
Do I have Java? Type in the terminal: java -version
If the terminal reports that the java command was not found, Java is not installed; follow the instructions on the next slide
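The check above can also be scripted. This is only a minimal sketch: the helper name have_cmd is made up for illustration and is not part of Java or Ubuntu.

```shell
# Return success if the named command is found on PATH.
# (have_cmd is a hypothetical helper name used only in this sketch.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
    # Java is present: print its version (java writes this to stderr).
    java -version
else
    echo "Java is not installed; continue with the next slide."
fi
```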
10/24/2016 Enrique Davila, Big Data Instructor, enrique.davila@gmail.com
3. Install Java
Type:
sudo apt-get install openjdk-8-jdk
Type Y to continue the installation process (it will take a while to complete)
4. Do I have java?
To confirm Java is installed on the Ubuntu system, type:
java -version
The output shows the installed Java version
5. Install OpenSSH
Installing the OpenSSH server is mandatory:
sudo apt-get install openssh-server
Once the SSH server is installed, generate keys with the command below:
ssh-keygen -t rsa
At "Enter file in which to save the key", press Enter
At "Enter passphrase", press Enter
At "Enter same passphrase again", press Enter
6. SSH Keys
Now copy the key to the user and host; in my case the user is hadoop and the host is hadoopdev:
ssh-copy-id hadoop@hadoopdev
8. Download Apache Hadoop
Type the following command in the terminal to create a new folder inside my Linux home folder, in this case /home/hadoop/:
mkdir hadoop_install
Then go into this new folder:
cd hadoop_install
And run the command below:
wget http://www-eu.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
9. Download Apache Hadoop
You will see the progress of the download in the terminal window
10. Unzip Hadoop folder
Once the download is complete, type the following command:
tar -xvf hadoop-2.7.3.tar.gz
The folder now contains two entries: the downloaded archive and the new directory, hadoop-2.7.3
11. Setup bashrc
This is the java location (very important for next steps):
Edit .bashrc
Type:
sudo gedit ~/.bashrc
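The Java location can also be looked up from the terminal. This is a minimal sketch that assumes the java command resolves, through symlinks, to a path ending in /jre/bin/java or /bin/java; the helper name java_home_from_bin is made up for illustration.

```shell
# Strip the trailing /jre/bin/java or /bin/java from a resolved java binary path.
# (java_home_from_bin is a hypothetical helper name used only in this sketch.)
java_home_from_bin() {
    printf '%s\n' "$1" | sed -e 's#/jre/bin/java$##' -e 's#/bin/java$##'
}

# If Java is installed, follow the symlinks and print the candidate Java path.
if command -v java >/dev/null 2>&1; then
    java_home_from_bin "$(readlink -f "$(command -v java)")"
fi
```

The printed directory is a candidate value for JAVA_HOME; compare it with what is listed under /usr/lib/jvm before using it.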
12. Setup ~/.bashrc
Add these lines to .bashrc
Please note that the previous slide displays the Java path; JAVA_HOME must point to the actual Java path on your system
#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export HADOOP_INSTALL=/home/hadoop/hadoop_install/hadoop-2.7.3
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
13. Testing hadoop installation
Type the following command to reload the ~/.bashrc changes (no need to restart):
source ~/.bashrc
Then type the command below; if it prints the Hadoop version, you're doing well:
hadoop version
15. Point the hadoop conf file to your Java
Go to the path:
/home/hadoop/hadoop_install/hadoop-2.7.3/etc/hadoop
Edit the file:
sudo gedit hadoop-env.sh
16. Modifying hadoop-env.sh
Modify the value of JAVA_HOME in the file hadoop-env.sh
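As a sketch, the edited line in hadoop-env.sh could look like the one below. The path is the one used for JAVA_HOME in the .bashrc slide and is an assumption about your machine; replace it with the actual Java path on your system.

```shell
# In hadoop-env.sh: set JAVA_HOME to an explicit path instead of relying on the environment.
# The path below is taken from the .bashrc slide; adjust it if your Java lives elsewhere.
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
```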
17. Modify core-site.xml
Create a folder called tmp in /home/hadoop/hadoop_install
Add the following text to core-site.xml; the file is in the path:
/home/hadoop/hadoop_install/hadoop-2.7.3/etc/hadoop
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop_install/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system.</description>
</property>
</configuration>
18. Modify mapred-site.xml
By default there is a file called mapred-site.xml.template; rename it to mapred-site.xml and then add the code below.
The file is in the path: /home/hadoop/hadoop_install/hadoop-2.7.3/etc/hadoop
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs at.</description>
</property>
</configuration>
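Creating mapred-site.xml from the template can be done in the terminal. This is a minimal sketch: it copies the template instead of renaming it, so the original stays available as a backup (a choice made here, not something the slide prescribes), and the function name create_mapred_site is made up for illustration.

```shell
# Copy mapred-site.xml.template to mapred-site.xml inside the given config directory.
# (create_mapred_site is a hypothetical helper name used only in this sketch.)
create_mapred_site() {
    conf_dir=$1
    cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
}

# On the machine used in these slides it would be called as:
#   create_mapred_site /home/hadoop/hadoop_install/hadoop-2.7.3/etc/hadoop
```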
19. Modify hdfs-site.xml
We need to create 2 new folders, which will hold the name node and data node data:
I placed these 2 folders in: /home/hadoop/hadoop_install/
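A minimal sketch for creating the two folders from the terminal. The folder names namenode and datanode match the paths used in hdfs-site.xml; the function name make_hdfs_dirs is made up for illustration.

```shell
# Create the namenode and datanode folders under the given base directory.
# mkdir -p does not fail if the folders already exist.
# (make_hdfs_dirs is a hypothetical helper name used only in this sketch.)
make_hdfs_dirs() {
    mkdir -p "$1/namenode" "$1/datanode"
}

# On the machine used in these slides it would be called as:
#   make_hdfs_dirs /home/hadoop/hadoop_install
```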
20. Modify hdfs-site.xml
Add the code below to the file hdfs-site.xml; the paths for namenode and datanode are the 2 new folders you just created on the previous slide.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/hadoop_install/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/hadoop_install/datanode</value>
</property>
</configuration>
#hdfs-site.xml is located on the path: /home/hadoop/hadoop_install/hadoop-2.7.3/etc/hadoop
21. Format the namenode
Run the following command:
hadoop namenode -format
22. Format the namenode part 2
If everything is OK, the output includes a message saying that the storage directory has been successfully formatted
23. Running Hadoop Single node
Run the command:
start-all.sh
Then execute the command:
jps
You will see the running Hadoop daemons listed in the output
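The jps output can be checked with a small script. This is a sketch under one assumption: that a Hadoop 2.x single node started with start-all.sh runs NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. The function name missing_daemons is made up for illustration.

```shell
# Daemons expected on a Hadoop 2.x single node (an assumption, see the text above).
EXPECTED_DAEMONS="NameNode DataNode SecondaryNameNode ResourceManager NodeManager"

# Read jps-style lines ("<pid> <name>") on stdin and print every expected daemon that is absent.
# (missing_daemons is a hypothetical helper name used only in this sketch.)
missing_daemons() {
    running=$(awk '{print $2}')
    for d in $EXPECTED_DAEMONS; do
        # -x matches the whole line, so NameNode does not match SecondaryNameNode.
        printf '%s\n' "$running" | grep -qx "$d" || echo "$d"
    done
}

# After starting the cluster: list whatever is missing (no output means all are running).
if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
fi
```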
24. Stop Cluster
To stop the cluster, run stop-all.sh
25. Web Interface: localhost:50070
In the browser go to: localhost:50070
26. Applies to:
This installation runs under:
Ubuntu 16
Hadoop 2.7.3
Virtual Machine:
2 Processors
2 GB RAM
2 network interfaces: the 1st as Bridge, the 2nd as NAT
27. Need help?
Contact name:
Enrique Davila Gutierrez
enrique.davila@gmail.com