Hadoop on OpenStack
Cloud Computing Course
Final Project Assessment
Guided by Dr. Dinkar Sitaram
Problem Statement
The specifics of the problem include:
• Interoperability between Hadoop and OpenStack.
• Hadoop assumes that it has direct control over resources.
But when installed on OpenStack, the compute and storage
resources of a Hadoop node may be distributed remotely over
the network. This introduces latency between the storage and
the compute components.
• Minimizing the data transfer over iSCSI.
Literature Survey
• Moving to the Cloud (Dr. Dinkar Sitaram et al.)
• http://www.hastexo.com/resources/docs/installing-openstack-essex-20121-ubuntu-1204-precise-pangolin
• http://devstack.org/guides/multinode-lab.html
• https://github.com/mseknibilel/OpenStack-Folsom-Install-guide
• OpenStack Compute Administration Manual (docs.openstack.org)
• StackGeek OpenStack Guide (http://www.stackgeek.com/blog/kordless/guides/gettingstarted.html)
• Hadoop Installation Guide (http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/)
Proposed Solution Description
The solution consists of the following stages:
• Using the MRLU / Simple (Max Resource, Least Usage) scheduling
algorithm for allocating VMs.
• Disabling the option for Live Migration.
• Using the OpenStack root disk for creating HDFS.
• Using the Swift service to store user input data and results.
• Writing bootstrap scripts to set up the IP address and perform other
initialization tasks.
Solution Description
• MRLU
The VMs spawned by Nova should be placed on the machine with the
most available resources and the least utilization.
• Live Migration
To minimize the traffic over iSCSI, the solution requires
that we disable live migration of VMs on OpenStack.
• Root Disk
Instead of allocating Cinder storage for HDFS, we plan to use the
root disk located at /var/lib/nova/instances/ on the local
machine. This ensures that HDFS is not connected
over iSCSI.
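
The MRLU placement rule above can be illustrated with a minimal Python sketch. It is not Nova's scheduler code: the `Host` class, its fields, and the tie-breaking order (free RAM, then free disk, then fewer running VMs) are assumptions made for illustration, since the slides do not say how "max resource" and "least usage" are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical view of a compute host; not a Nova data structure."""
    name: str
    free_ram_mb: int   # unallocated memory on the host
    free_disk_gb: int  # unallocated local disk on the host
    running_vms: int   # assumed stand-in for "usage"


def pick_host(hosts, ram_mb, disk_gb):
    """Return the host with the most free resources and the least usage.

    Hosts that cannot fit the request are skipped. Ties on free RAM are
    broken by free disk, then by the smaller number of running VMs.
    """
    candidates = [h for h in hosts
                  if h.free_ram_mb >= ram_mb and h.free_disk_gb >= disk_gb]
    if not candidates:
        raise ValueError("no host can fit the requested VM")
    return max(candidates,
               key=lambda h: (h.free_ram_mb, h.free_disk_gb, -h.running_vms))


# Example inventory; the names and sizes are made up.
hosts = [
    Host("compute1", free_ram_mb=8192, free_disk_gb=200, running_vms=2),
    Host("compute2", free_ram_mb=8192, free_disk_gb=200, running_vms=1),
    Host("controller", free_ram_mb=2048, free_disk_gb=50, running_vms=0),
]
print(pick_host(hosts, ram_mb=4096, disk_gb=32).name)  # compute2
```

Here `compute1` and `compute2` tie on free resources, so the host with fewer running VMs is chosen; `controller` is skipped because it cannot fit the request.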
Solution Description
• Swift
To give the user a flexible, abstract way to interact
with the service, we use Swift to store the user input. Hadoop
reads this data, runs the computation, and stores the results back on Swift.
• Bootstrapping
We define a set of tasks that need to be performed
before or after spawning the VMs, such as
assigning IP addresses to the Hadoop nodes. This can be achieved
with simple bootstrap scripts.
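
As a rough idea of what such a bootstrap step might look like, the sketch below renders an /etc/hosts file and a Hadoop 1.x style conf/slaves file from a node list. The function names and hostnames are hypothetical; the addresses are the 10.10.10.x addresses listed for the master and the first two slaves on the deployment slide. The slides do not show the actual scripts, so this is only one plausible shape for them.

```python
def render_hosts_file(nodes):
    """Build /etc/hosts content from (hostname, ip) pairs."""
    lines = ["127.0.0.1 localhost"]
    lines += [f"{ip} {name}" for name, ip in nodes]
    return "\n".join(lines) + "\n"


def render_slaves_file(nodes, master):
    """List every non-master hostname, one per line (conf/slaves)."""
    return "".join(f"{name}\n" for name, _ in nodes if name != master)


# Hostnames are hypothetical; addresses come from the deployment slide.
nodes = [
    ("hadoop-master", "10.10.10.34"),
    ("hadoop-slave1", "10.10.10.35"),
    ("hadoop-slave2", "10.10.10.36"),
]
print(render_hosts_file(nodes), end="")
print(render_slaves_file(nodes, master="hadoop-master"), end="")
```

A real bootstrap script would write these strings to the files on each VM after it is spawned; keeping the rendering separate from the file writes makes the logic easy to check before touching a node.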
Overview of the Solution
[Diagram: four VMs, each labeled 32 GB and each hosting HDFS: one Hadoop Master and three Slaves. Also shown: Nova Controller, Horizon, and Swift. Network: 10.10.10.32/27.]
Network Configuration of the setup
[Diagram: Nova Controller, Nova Compute 1, and Nova Compute 2, together with a Public Switch, a Private Switch, a Router, and the College Network. Address pairs shown in the diagram: 192.168.0.66 / 10.10.10.5, 192.168.0.67 / 10.10.10.9, 192.168.0.65 / 10.10.10.6.]
Hadoop deployment on OpenStack
Hosts: Nova Controller, Nova Compute 1, Nova Compute 2

Node             192.168.0.x address   10.10.10.x address
Hadoop Master    192.168.0.33          10.10.10.34
Hadoop Slave 1   192.168.0.34          10.10.10.35
Hadoop Slave 2   192.168.0.36          10.10.10.36
Hadoop Slave 3   192.168.0.35          10.10.10.37
Hadoop Slave 4   192.168.0.38          10.10.10.38
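
The 10.10.10.x addresses of the Hadoop VMs can be checked against the 10.10.10.32/27 network shown on the overview slide. The short Python sketch below does the arithmetic with the standard `ipaddress` module; the dictionary layout is just an illustrative way to hold the values from the slide.

```python
import ipaddress

# Network from the overview slide.
network = ipaddress.ip_network("10.10.10.32/27")

# 10.10.10.x addresses from the deployment slide.
vm_addresses = {
    "Hadoop Master": "10.10.10.34",
    "Hadoop Slave 1": "10.10.10.35",
    "Hadoop Slave 2": "10.10.10.36",
    "Hadoop Slave 3": "10.10.10.37",
    "Hadoop Slave 4": "10.10.10.38",
}

# A /27 leaves 5 host bits, so the block holds 2**5 = 32 addresses.
print(network.num_addresses)          # 32
usable = list(network.hosts())        # excludes network and broadcast
print(usable[0], usable[-1])          # 10.10.10.33 10.10.10.62

for name, addr in vm_addresses.items():
    inside = ipaddress.ip_address(addr) in network
    print(f"{name}: {addr} in {network} -> {inside}")
```

All five VM addresses fall inside the /27, which spans 10.10.10.32 through 10.10.10.63.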
Future Enhancements
• Explore Swift as the backend storage for HDFS.
• Bootstrap scripts to auto-configure the Hadoop cluster
using snapshots of the images.
Team Members
• Akshay MS (1PI09IS010)
• Sandeep Raju P (1PI09CS081)
• Suhas Mohan (1PI09IS104)
• Vijesh M (1PI09CS119)
• Vivek P (1PI09IS119)
