SlideShare a Scribd company logo
1 of 13
Implementing
Hadoop on a Single
Cluster
- S A L IL NAVG IR E
Basic Setup
1.

Install Ubuntu

2.

Install Java, Python and update

3.

Add group ‘hadoop’ and ‘hduser’ as user (for security and
backup)

4.

Configure SSH
a)
b)

Configure it by editing file ssh_config and save a backup

c)

Generate ssh key for hduser

d)

Enable ssh access to your local machine with the newly created RSA
key

e)

5.

Install OpenSSH Server

hduser@Ubuntu:~$ ssh localhost

Disable IPv6 in sysctl.conf file in editor
Installing Hadoop
1. Download hadoop from the collection of Apache
Download Mirrors
• salil@ubuntu:/usr/local$ sudo tar xzf hadoop-2.0.6-alphasrc.tar.gz

2. Make sure to change the owner to hduser in
hadoop group
• $ sudo chown -R hduser:hadoop hadoop (change the
permissions)

3. Update $HOME/.bashrc – hadoop related
environment variables
Configuration
1. Edit environment variables in conf/hadoop-env.sh
2. Change settings in conf/*site.xml
3. Make directory and set the required ownerships and
permissions
• Now we create the directory and set the required ownerships
and permissions:
• $ sudo mkdir -p /app/hadoop/tmp
• $ sudo chown hduser:hadoop /app/hadoop/tmp
• $ sudo chmod 750 /app/hadoop/tmp

4. Add configurations snippets between <configuration>
... </configuration> tags in core-site.xml, mapredsite.xml and hdfs-site.xml
Starting your single node cluster
• First format the namenode
•

hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop
namenode -format

• Start your single node cluster
• Running a MapReduce job
• Download data and copy from local file to hdfs
• hduser@ubuntu:~$ hadoop dfs -copyFromLocal
/home/hduser/project.txt /user/new
• hduser@ubuntu:~$ hadoop dfs -copyFromLocal
/home/hduser/hadoop/project.txt /user/lol
• hduser@ubuntu:~$ hadoop dfs -ls /user/lol
Found 2 items
drwxr-xr-x - hduser supergroup
0 2013-10-10
06:30 /user/lol/output
-rw-r--r-- 1 hduser supergroup 969039 2013-1005 20:20 /user/lol/project.txt
• hduser@ubuntu:~$ hadoop jar
/home/hduser/hadoop/hadoop-examples-1.0.3.jar
wordcount /user/lol/project.txt /user/lol/output/
• Hadoop Web interfaces
• http://localhost:50070/ – web UI of the NameNode daemon
• http://localhost:50030/ – web UI of the JobTracker daemon
• http://localhost:50060/ – web UI of the TaskTracker daemon
• The NameNode
Web interface gives
us a cluster
summary about
total /remaining,
capacity, live and
dead nodes.
• Aditionally we can
browse the HDFS to
view contents of
files and log
• The Jobtracker
Web interface
provides general
job statistics
about Hadoop
cluster,
running/complet
ed/failed jobs
and a job history
log file
• Tasktracker
provides info
about running
and non-running
tasks
Writing MapReduce programs
• Hadoop framework is written in java, which is
complicated to code for Non-CS guys
• Can be written in Python and converted to .jar file using
Jython to run on a Hadoop cluster

• But Jython has incomplete standard library because
some Python features not provided in Jython
• Alternative is to use Hadoop Streaming

• Hadoop streaming is the utility that comes with Hadoop
distribution; able to run any executable script as a
mapper and reducer
• Write mapper.py and reducer.py in python
• Download and copy data to HDFS

• Run same as previous java implementation
• There are other third party solutions of Python
Mapreduce which are similar to Streaming/Jython
but can be easily used as a library in Python
Python implementation stratagies
• Streaming
• mrjob
• dumbo
• Hadoopy

• Non-Hadoop
• disco

• Prefer Hadoop streaming if possible because it is
easy and has the lowest overhead
• Prefer mrjob where you need higher abstraction
and integration with AWS
Future Work….
• Python implementation in Hadoop
• Running Hadoop in Multi node cluster
• Pig and its implementation on linux
• Apache Mahout, Hive, Solr

More Related Content

What's hot

8a. How To Setup HBase with Docker
8a. How To Setup HBase with Docker8a. How To Setup HBase with Docker
8a. How To Setup HBase with DockerFabio Fumarola
 
An example Hadoop Install
An example Hadoop InstallAn example Hadoop Install
An example Hadoop InstallMike Frampton
 
Apache HDFS - Lab Assignment
Apache HDFS - Lab AssignmentApache HDFS - Lab Assignment
Apache HDFS - Lab AssignmentFarzad Nozarian
 
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)Nag Arvind Gudiseva
 
Web scraping with nutch solr part 2
Web scraping with nutch solr part 2Web scraping with nutch solr part 2
Web scraping with nutch solr part 2Mike Frampton
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache HiveAvkash Chauhan
 
Install hadoop in a cluster
Install hadoop in a clusterInstall hadoop in a cluster
Install hadoop in a clusterXuhong Zhang
 
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Apache HBase - Lab Assignment
Apache HBase - Lab AssignmentApache HBase - Lab Assignment
Apache HBase - Lab AssignmentFarzad Nozarian
 
Boulder dev ops-meetup-11-2012-rundeck
Boulder dev ops-meetup-11-2012-rundeckBoulder dev ops-meetup-11-2012-rundeck
Boulder dev ops-meetup-11-2012-rundeckWill Sterling
 
Beeswax Hive editor in Hue
Beeswax Hive editor in HueBeeswax Hive editor in Hue
Beeswax Hive editor in HueRomain Rigaux
 
Large Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and FriendsLarge Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and Friendslucenerevolution
 
Introduction to Linux | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Linux | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Linux | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Linux | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Configuringahadoop
ConfiguringahadoopConfiguringahadoop
Configuringahadoopmensb
 

What's hot (17)

8a. How To Setup HBase with Docker
8a. How To Setup HBase with Docker8a. How To Setup HBase with Docker
8a. How To Setup HBase with Docker
 
An example Hadoop Install
An example Hadoop InstallAn example Hadoop Install
An example Hadoop Install
 
Apache HDFS - Lab Assignment
Apache HDFS - Lab AssignmentApache HDFS - Lab Assignment
Apache HDFS - Lab Assignment
 
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
 
Shark - Lab Assignment
Shark - Lab AssignmentShark - Lab Assignment
Shark - Lab Assignment
 
Web scraping with nutch solr part 2
Web scraping with nutch solr part 2Web scraping with nutch solr part 2
Web scraping with nutch solr part 2
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache Hive
 
Install hadoop in a cluster
Install hadoop in a clusterInstall hadoop in a cluster
Install hadoop in a cluster
 
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
 
Apache HBase - Lab Assignment
Apache HBase - Lab AssignmentApache HBase - Lab Assignment
Apache HBase - Lab Assignment
 
Boulder dev ops-meetup-11-2012-rundeck
Boulder dev ops-meetup-11-2012-rundeckBoulder dev ops-meetup-11-2012-rundeck
Boulder dev ops-meetup-11-2012-rundeck
 
Beeswax Hive editor in Hue
Beeswax Hive editor in HueBeeswax Hive editor in Hue
Beeswax Hive editor in Hue
 
Large Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and FriendsLarge Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and Friends
 
Introduction to Linux | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Linux | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Linux | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Linux | Big Data Hadoop Spark Tutorial | CloudxLab
 
Configuringahadoop
ConfiguringahadoopConfiguringahadoop
Configuringahadoop
 
Pig
PigPig
Pig
 
Run wordcount job (hadoop)
Run wordcount job (hadoop)Run wordcount job (hadoop)
Run wordcount job (hadoop)
 

Viewers also liked

Salil presentation 11.07
Salil presentation 11.07Salil presentation 11.07
Salil presentation 11.07Salil Navgire
 
MapReduce and Hadoop
MapReduce and HadoopMapReduce and Hadoop
MapReduce and HadoopSalil Navgire
 
Challenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopChallenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopDataWorks Summit
 
Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)
Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)
Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)Michael Arnold
 
Data Mining and Recommendation Systems
Data Mining and Recommendation SystemsData Mining and Recommendation Systems
Data Mining and Recommendation SystemsSalil Navgire
 
Data-Ed Webinar: A Framework for Implementing NoSQL, Hadoop
Data-Ed Webinar: A Framework for Implementing NoSQL, HadoopData-Ed Webinar: A Framework for Implementing NoSQL, Hadoop
Data-Ed Webinar: A Framework for Implementing NoSQL, HadoopDATAVERSITY
 
Modeling with Hadoop kdd2011
Modeling with Hadoop kdd2011Modeling with Hadoop kdd2011
Modeling with Hadoop kdd2011Milind Bhandarkar
 
Monitor PowerKVM using Ganglia, Nagios
Monitor PowerKVM using Ganglia, NagiosMonitor PowerKVM using Ganglia, Nagios
Monitor PowerKVM using Ganglia, NagiosPradeep Kumar
 
Hadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programHadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programPraveen Kumar Donta
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Hortonworks
 
Hadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHumza Naseer
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 DataWorks Summit
 
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsBest Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsCloudera, Inc.
 
Implementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceImplementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceHortonworks
 
Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseDataWorks Summit
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Hortonworks
 
Hadoop and Your Data Warehouse
Hadoop and Your Data WarehouseHadoop and Your Data Warehouse
Hadoop and Your Data WarehouseCaserta
 
Large scale ETL with Hadoop
Large scale ETL with HadoopLarge scale ETL with Hadoop
Large scale ETL with HadoopOReillyStrata
 

Viewers also liked (20)

Salil presentation 11.07
Salil presentation 11.07Salil presentation 11.07
Salil presentation 11.07
 
MapReduce and Hadoop
MapReduce and HadoopMapReduce and Hadoop
MapReduce and Hadoop
 
Anomaly Detection
Anomaly DetectionAnomaly Detection
Anomaly Detection
 
Hadoop Overview kdd2011
Hadoop Overview kdd2011Hadoop Overview kdd2011
Hadoop Overview kdd2011
 
Challenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopChallenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on Hadoop
 
Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)
Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)
Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)
 
Data Mining and Recommendation Systems
Data Mining and Recommendation SystemsData Mining and Recommendation Systems
Data Mining and Recommendation Systems
 
Data-Ed Webinar: A Framework for Implementing NoSQL, Hadoop
Data-Ed Webinar: A Framework for Implementing NoSQL, HadoopData-Ed Webinar: A Framework for Implementing NoSQL, Hadoop
Data-Ed Webinar: A Framework for Implementing NoSQL, Hadoop
 
Modeling with Hadoop kdd2011
Modeling with Hadoop kdd2011Modeling with Hadoop kdd2011
Modeling with Hadoop kdd2011
 
Monitor PowerKVM using Ganglia, Nagios
Monitor PowerKVM using Ganglia, NagiosMonitor PowerKVM using Ganglia, Nagios
Monitor PowerKVM using Ganglia, Nagios
 
Hadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programHadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce program
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
 
Hadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing Architectures
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
 
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsBest Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
 
Implementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceImplementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data Governance
 
Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data Warehouse
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
 
Hadoop and Your Data Warehouse
Hadoop and Your Data WarehouseHadoop and Your Data Warehouse
Hadoop and Your Data Warehouse
 
Large scale ETL with Hadoop
Large scale ETL with HadoopLarge scale ETL with Hadoop
Large scale ETL with Hadoop
 

Similar to Implementing Hadoop on a single cluster

02 Hadoop deployment and configuration
02 Hadoop deployment and configuration02 Hadoop deployment and configuration
02 Hadoop deployment and configurationSubhas Kumar Ghosh
 
Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Mandakini Kumari
 
Single node hadoop cluster installation
Single node hadoop cluster installation Single node hadoop cluster installation
Single node hadoop cluster installation Mahantesh Angadi
 
Hadoop single node setup
Hadoop single node setupHadoop single node setup
Hadoop single node setupMohammad_Tariq
 
DC HUG Hadoop for Windows
DC HUG Hadoop for WindowsDC HUG Hadoop for Windows
DC HUG Hadoop for WindowsTerry Padgett
 
Cloudera hadoop installation
Cloudera hadoop installationCloudera hadoop installation
Cloudera hadoop installationSumitra Pundlik
 
Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)Søren Lund
 
Big data using Hadoop, Hive, Sqoop with Installation
Big data using Hadoop, Hive, Sqoop with InstallationBig data using Hadoop, Hive, Sqoop with Installation
Big data using Hadoop, Hive, Sqoop with Installationmellempudilavanya999
 
Big data processing using hadoop poster presentation
Big data processing using hadoop poster presentationBig data processing using hadoop poster presentation
Big data processing using hadoop poster presentationAmrut Patil
 
Hadoop cluster 安裝
Hadoop cluster 安裝Hadoop cluster 安裝
Hadoop cluster 安裝recast203
 
Distro-independent Hadoop cluster management
Distro-independent Hadoop cluster managementDistro-independent Hadoop cluster management
Distro-independent Hadoop cluster managementDataWorks Summit
 
Hadoop installation with an example
Hadoop installation with an exampleHadoop installation with an example
Hadoop installation with an exampleNikita Kesharwani
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configurationprabakaranbrick
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop OverviewBrian Enochson
 
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopYahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopHortonworks
 
Configure h base hadoop and hbase client
Configure h base hadoop and hbase clientConfigure h base hadoop and hbase client
Configure h base hadoop and hbase clientShashwat Shriparv
 
LuisRodriguezLocalDevEnvironmentsDrupalOpenDays
LuisRodriguezLocalDevEnvironmentsDrupalOpenDaysLuisRodriguezLocalDevEnvironmentsDrupalOpenDays
LuisRodriguezLocalDevEnvironmentsDrupalOpenDaysLuis Rodríguez Castromil
 
Hadoop 20111117
Hadoop 20111117Hadoop 20111117
Hadoop 20111117exsuns
 

Similar to Implementing Hadoop on a single cluster (20)

02 Hadoop deployment and configuration
02 Hadoop deployment and configuration02 Hadoop deployment and configuration
02 Hadoop deployment and configuration
 
Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04
 
Single node hadoop cluster installation
Single node hadoop cluster installation Single node hadoop cluster installation
Single node hadoop cluster installation
 
Hadoop single node setup
Hadoop single node setupHadoop single node setup
Hadoop single node setup
 
DC HUG Hadoop for Windows
DC HUG Hadoop for WindowsDC HUG Hadoop for Windows
DC HUG Hadoop for Windows
 
Cloudera hadoop installation
Cloudera hadoop installationCloudera hadoop installation
Cloudera hadoop installation
 
Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)
 
Big data using Hadoop, Hive, Sqoop with Installation
Big data using Hadoop, Hive, Sqoop with InstallationBig data using Hadoop, Hive, Sqoop with Installation
Big data using Hadoop, Hive, Sqoop with Installation
 
Big data processing using hadoop poster presentation
Big data processing using hadoop poster presentationBig data processing using hadoop poster presentation
Big data processing using hadoop poster presentation
 
Exp-3.pptx
Exp-3.pptxExp-3.pptx
Exp-3.pptx
 
Hadoop cluster 安裝
Hadoop cluster 安裝Hadoop cluster 安裝
Hadoop cluster 安裝
 
Distro-independent Hadoop cluster management
Distro-independent Hadoop cluster managementDistro-independent Hadoop cluster management
Distro-independent Hadoop cluster management
 
Hadoop installation with an example
Hadoop installation with an exampleHadoop installation with an example
Hadoop installation with an example
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configuration
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop Overview
 
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopYahoo! Hack Europe Workshop
Yahoo! Hack Europe Workshop
 
Configure h base hadoop and hbase client
Configure h base hadoop and hbase clientConfigure h base hadoop and hbase client
Configure h base hadoop and hbase client
 
LuisRodriguezLocalDevEnvironmentsDrupalOpenDays
LuisRodriguezLocalDevEnvironmentsDrupalOpenDaysLuisRodriguezLocalDevEnvironmentsDrupalOpenDays
LuisRodriguezLocalDevEnvironmentsDrupalOpenDays
 
Hadoop 20111117
Hadoop 20111117Hadoop 20111117
Hadoop 20111117
 
#WeSpeakLinux Session
#WeSpeakLinux Session#WeSpeakLinux Session
#WeSpeakLinux Session
 

Recently uploaded

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 

Recently uploaded (20)

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 

Implementing Hadoop on a single cluster

  • 1. Implementing Hadoop on a Single Cluster - S A L IL NAVG IR E
  • 2. Basic Setup 1. Install Ubuntu 2. Install Java, Python and update 3. Add group ‘hadoop’ and ‘hduser’ as user (for security and backup) 4. Configure SSH a) b) Configure it by editing file ssh_config and save a backup c) Generate ssh key for hduser d) Enable ssh access to your local machine with the newly created RSA key e) 5. Install OpenSSH Server hduser@Ubuntu:~$ ssh localhost Disable IPv6 in sysctl.conf file in editor
  • 3. Installing Hadoop 1. Download hadoop from the collection of Apache Download Mirrors • salil@ubuntu:/usr/local$ sudo tar xzf hadoop-2.0.6-alphasrc.tar.gz 2. Make sure to change the owner to hduser in hadoop group • $ sudo chown -R hduser:hadoop hadoop (change the permissions) 3. Update $HOME/.bashrc – hadoop related environment variables
  • 4. Configuration 1. Edit environment variables in conf/hadoop-env.sh 2. Change settings in conf/*site.xml 3. Make directory and set the required ownerships and permissions • Now we create the directory and set the required ownerships and permissions: • $ sudo mkdir -p /app/hadoop/tmp • $ sudo chown hduser:hadoop /app/hadoop/tmp • $ sudo chmod 750 /app/hadoop/tmp 4. Add configurations snippets between <configuration> ... </configuration> tags in core-site.xml, mapredsite.xml and hdfs-site.xml
  • 5. Starting your single node cluster • First format the namenode • hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format • Start your single node cluster
  • 6. • Running a MapReduce job • Download data and copy from local file to hdfs • hduser@ubuntu:~$ hadoop dfs -copyFromLocal /home/hduser/project.txt /user/new • hduser@ubuntu:~$ hadoop dfs -copyFromLocal /home/hduser/hadoop/project.txt /user/lol
  • 7. • hduser@ubuntu:~$ hadoop dfs -ls /user/lol Found 2 items drwxr-xr-x - hduser supergroup 0 2013-10-10 06:30 /user/lol/output -rw-r--r-- 1 hduser supergroup 969039 2013-1005 20:20 /user/lol/project.txt • hduser@ubuntu:~$ hadoop jar /home/hduser/hadoop/hadoop-examples-1.0.3.jar wordcount /user/lol/project.txt /user/lol/output/ • Hadoop Web interfaces • http://localhost:50070/ – web UI of the NameNode daemon • http://localhost:50030/ – web UI of the JobTracker daemon • http://localhost:50060/ – web UI of the TaskTracker daemon
  • 8. • The NameNode Web interface gives us a cluster summary about total /remaining, capacity, live and dead nodes. • Aditionally we can browse the HDFS to view contents of files and log
  • 9. • The Jobtracker Web interface provides general job statistics about Hadoop cluster, running/complet ed/failed jobs and a job history log file • Tasktracker provides info about running and non-running tasks
  • 10. Writing MapReduce programs • Hadoop framework is written in java, which is complicated to code for Non-CS guys • Can be written in Python and converted to .jar file using Jython to run on a Hadoop cluster • But Jython has incomplete standard library because some Python features not provided in Jython • Alternative is to use Hadoop Streaming • Hadoop streaming is the utility that comes with Hadoop distribution; able to run any executable script as a mapper and reducer
  • 11. • Write mapper.py and reducer.py in python • Download and copy data to HDFS • Run same as previous java implementation • There are other third party solutions of Python Mapreduce which are similar to Streaming/Jython but can be easily used as a library in Python
  • 12. Python implementation stratagies • Streaming • mrjob • dumbo • Hadoopy • Non-Hadoop • disco • Prefer Hadoop streaming if possible because it is easy and has the lowest overhead • Prefer mrjob where you need higher abstraction and integration with AWS
  • 13. Future Work…. • Python implementation in Hadoop • Running Hadoop in Multi node cluster • Pig and its implementation on linux • Apache Mahout, Hive, Solr