2. Problem Statement
The specifics of the problem include:
Interoperability between Hadoop and OpenStack.
Hadoop assumes that it has direct control over resources.
When installed on OpenStack, however, the compute and storage
resources of a Hadoop node may be distributed remotely over
the network. This introduces latency between the storage and
the compute components.
Minimizing the data transfer over iSCSI.
3. Literature Survey
Moving to the Cloud (Dr. Dinkar Sitaram et al.)
hastexo OpenStack Essex installation guide
(http://www.hastexo.com/resources/docs/installing-openstack-essex-20121-ubuntu-1204-precise-pangolin)
DevStack multi-node lab guide
(http://devstack.org/guides/multinode-lab.html)
OpenStack Folsom install guide
(https://github.com/mseknibilel/OpenStack-Folsom-Install-guide)
OpenStack Compute Administration Manual
(docs.openstack.org)
StackGeek OpenStack Guide
(http://www.stackgeek.com/blog/kordless/guides/gettingstarted.html)
Hadoop Installation Guide
(http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/)
4. Proposed Solution Description
The solution consists of the following stages:
Using the MRLU / Simple (Max Resource, Least Usage) scheduling
algorithm for allocating VMs.
Disabling the option for Live Migration.
Using the OpenStack root-disk for creating HDFS.
Using the Swift service to store user input data and results.
Writing bootstrap scripts to set up the IP addresses and perform
other initialization tasks.
5. Solution Description
MRLU
The VMs spawned by Nova should be placed on the machine with
the most available resources and the least utilization.
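The placement rule above can be sketched as follows. This is only an illustration, not the Nova scheduler code: the `Host` fields, the ranking order (free RAM, then free disk, then fewest running VMs), and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical view of a compute host; field names are illustrative."""
    name: str
    free_ram_mb: int
    free_disk_gb: int
    running_vms: int


def pick_host(hosts, ram_mb, disk_gb):
    """Return the host with the most free resources and the least usage."""
    # Keep only hosts that can actually fit the requested VM.
    candidates = [h for h in hosts
                  if h.free_ram_mb >= ram_mb and h.free_disk_gb >= disk_gb]
    if not candidates:
        return None
    # Assumed ranking: most free RAM, then most free disk,
    # then fewest running VMs as the tie-breaker.
    return max(candidates,
               key=lambda h: (h.free_ram_mb, h.free_disk_gb, -h.running_vms))


def place_vms(hosts, count, ram_mb, disk_gb):
    """Place `count` identical VMs one by one, updating host capacity."""
    placement = []
    for _ in range(count):
        host = pick_host(hosts, ram_mb, disk_gb)
        if host is None:
            break  # no host can fit another VM
        host.free_ram_mb -= ram_mb
        host.free_disk_gb -= disk_gb
        host.running_vms += 1
        placement.append(host.name)
    return placement
```

Because capacity is updated after each placement, consecutive VMs spread to another host once the first one is no longer the least loaded.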
Live Migration
To minimize the traffic over iSCSI, the solution requires
that we disable the live migration of VMs on OpenStack.
Root Disk
Instead of allocating Cinder storage for HDFS, we plan to use
the root-disk located at /var/lib/nova/instances/ on the local
machine. This ensures that HDFS is not accessed
over iSCSI.
6. Solution Description
Swift
To provide flexibility and abstraction for the user to interact
with the service, we use Swift to store the user input. Hadoop
reads this data, runs the computation, and stores the results
back on Swift.
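The data flow described above (input in Swift, computation, results back in Swift) can be sketched as follows. `InMemoryStore` is a hypothetical stand-in for a Swift connection so that the example runs without a cluster; its method names are loosely modeled on a typical Swift client, which is an assumption. The word count stands in for whatever Hadoop job is actually run.

```python
class InMemoryStore:
    """Hypothetical stand-in for a Swift connection (not the real client)."""

    def __init__(self):
        self._containers = {}

    def put_container(self, container):
        # Create the container if it does not exist yet.
        self._containers.setdefault(container, {})

    def put_object(self, container, name, contents):
        self._containers[container][name] = contents

    def get_object(self, container, name):
        # Return a (headers, body) pair; headers are left empty here.
        return {}, self._containers[container][name]


def word_count(text):
    """Placeholder for the Hadoop job: count occurrences of each word."""
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def run_job(store, container, input_name, output_name):
    """Read the input object, compute, and write the result object back."""
    _, body = store.get_object(container, input_name)
    counts = word_count(body)
    result = "\n".join(f"{w}\t{c}" for w, c in sorted(counts.items()))
    store.put_object(container, output_name, result)
    return result
```

The user only ever touches the object store: they upload the input object and later download the output object, without needing access to HDFS itself.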
Bootstrapping
We define a set of tasks that need to be performed
before/after spawning the VMs. These tasks include, for example,
assigning IP addresses to the Hadoop nodes. This can be achieved
with simple bootstrap scripts.
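One such bootstrap task can be sketched as follows: given the list of Hadoop nodes, generate the host-name mappings and the master/slave lists that each node needs. The function names, the host names in the usage example, and the choice of `hosts`, `masters`, and `slaves` as output files are assumptions for illustration; the actual scripts may differ.

```python
def render_hosts_file(nodes):
    """Build /etc/hosts style content from (hostname, ip) pairs."""
    lines = ["127.0.0.1 localhost"]
    lines += [f"{ip} {name}" for name, ip in nodes]
    return "\n".join(lines) + "\n"


def render_cluster_files(nodes):
    """Return file contents keyed by file name.

    Assumes the first entry in `nodes` is the master and the rest are slaves.
    """
    master, *slaves = [name for name, _ in nodes]
    return {
        "hosts": render_hosts_file(nodes),
        "masters": master + "\n",
        "slaves": "".join(s + "\n" for s in slaves),
    }


if __name__ == "__main__":
    # Hypothetical host names; addresses chosen from the 10.10.10.32/27 range.
    cluster = [("hadoop-master", "10.10.10.34"),
               ("hadoop-slave1", "10.10.10.35")]
    for file_name, content in render_cluster_files(cluster).items():
        print(f"--- {file_name} ---")
        print(content, end="")
```

A bootstrap script could write these strings to the corresponding files on each VM after it boots.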
7. Overview of the Solution
[Figure: four VMs (one Hadoop Master and three Hadoop Slaves), each
labeled 32 GB and each with its own HDFS storage, shown together with
the Nova Controller, Horizon, and Swift. VM network: 10.10.10.32/27.]
8. Network Configuration of the setup
[Figure: Nova Controller, Nova Compute 1, and Nova Compute 2, shown
with a Public Switch, a Private Switch, and the College Network Router.
Address pairs shown in the figure:
192.168.0.66 / 10.10.10.5
192.168.0.67 / 10.10.10.9
192.168.0.65 / 10.10.10.6]
9. Hadoop deployment on OpenStack
Hosts: Nova Controller, Nova Compute 1, Nova Compute 2

Hadoop node       192.168.0.x address   10.10.10.x address
Hadoop Master     192.168.0.33          10.10.10.34
Hadoop Slave 1    192.168.0.34          10.10.10.35
Hadoop Slave 2    192.168.0.36          10.10.10.36
Hadoop Slave 3    192.168.0.35          10.10.10.37
Hadoop Slave 4    192.168.0.38          10.10.10.38
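As a quick sanity check on the addressing, the sketch below uses Python's standard `ipaddress` module to confirm that the 10.10.10.x addresses of the Hadoop nodes fall inside the 10.10.10.32/27 range shown in the overview. That the nodes are meant to live in that range is an assumption drawn from the two slides; the helper name `outside_network` is illustrative.

```python
import ipaddress

# VM network taken from the overview slide.
vm_network = ipaddress.ip_network("10.10.10.32/27")

# 10.10.10.x addresses of the Hadoop nodes from the deployment slide.
node_addresses = {
    "Hadoop Master": "10.10.10.34",
    "Hadoop Slave 1": "10.10.10.35",
    "Hadoop Slave 2": "10.10.10.36",
    "Hadoop Slave 3": "10.10.10.37",
    "Hadoop Slave 4": "10.10.10.38",
}


def outside_network(addresses, network):
    """Return the names of nodes whose address is not in `network`."""
    return [name for name, addr in addresses.items()
            if ipaddress.ip_address(addr) not in network]


if __name__ == "__main__":
    # A /27 prefix leaves 5 host bits, i.e. 2**5 = 32 addresses.
    print("addresses in range:", vm_network.num_addresses)
    print("first/last:", vm_network.network_address,
          vm_network.broadcast_address)
    print("nodes outside range:", outside_network(node_addresses, vm_network))
```

A /27 block spans 10.10.10.32 through 10.10.10.63, so there is room for more slaves than the five nodes listed.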
10. Future Enhancements
Explore Swift as the backend storage for HDFS.
Write bootstrap scripts that automatically configure the Hadoop
cluster using snapshots of the images.
11. Team Members
Akshay MS (1PI09IS010)
Sandeep Raju P (1PI09CS081)
Suhas Mohan (1PI09IS104)
Vijesh M (1PI09CS119)
Vivek P (1PI09IS119)