A tutorial presentation based on hadoop.apache.org documentation.
I gave this presentation at Amirkabir University of Technology as a Teaching Assistant for Dr. Amir H. Payberah's Cloud Computing course in the spring semester of 2015.
2. Purpose
How to set up and configure a single-node Hadoop installation so that you
can quickly perform simple operations using Hadoop Distributed File System
(HDFS).
3. Supported Platforms
• GNU/Linux is supported as a development and production platform.
Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
• Windows is also a supported platform, but the following steps are for
Linux only.
4. Required Software
• Java™ must be installed. Recommended Java versions are described at
http://wiki.apache.org/hadoop/HadoopJavaVersions
• ssh must be installed and sshd must be running to use the Hadoop scripts
that manage remote Hadoop daemons.
• To get a Hadoop distribution, download a recent stable release from one
of the Apache Download Mirrors.
If your machine lacks the requisite software, install it; for example, on Ubuntu Linux:
$ sudo apt-get install ssh
$ sudo apt-get install rsync
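The HDFS scripts used later (e.g. start-dfs.sh) ssh into localhost, so it helps to check now that this works without a passphrase; a minimal setup sketch, following the Apache single-node guide:
$ ssh localhost
# if the line above asks for a password:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys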
5. Prepare to Start the Hadoop Cluster
• Unpack the downloaded Hadoop distribution. In the distribution, edit the
file etc/hadoop/hadoop-env.sh to define some parameters as follows:
# set to the root of your Java installation
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0
# Assuming your installation directory is /usr/local/hadoop
export HADOOP_PREFIX=/usr/local/hadoop
• Try the following command:
$ bin/hadoop
This will display the usage documentation for the hadoop script.
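The jdk1.7.0 path above is only an example; one way to find where Java actually lives on your machine before editing the file:
$ readlink -f $(which java)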
6. Prepare to Start the Hadoop Cluster (Cont.)
• Now you are ready to start your Hadoop cluster in one of the three
supported modes:
• Local (Standalone) Mode
• By default, Hadoop is configured to run in a non-distributed mode, as a single Java
process. This is useful for debugging (a quick sanity check is sketched after this list).
• Pseudo-Distributed Mode
• Hadoop can also be run on a single node in a pseudo-distributed mode where each
Hadoop daemon runs in a separate Java process.
• Fully-Distributed Mode
• Hadoop daemons run on separate hosts across a cluster of machines; this mode is not covered in this tutorial.
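As a quick sanity check of standalone mode, the Apache guide runs the bundled grep example over the XML config files; the jar name below uses a wildcard because the version suffix differs per release:
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*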
8. Lab Assignment
1. Start HDFS and verify that it's running.
2. Create a new directory /sics on HDFS.
3. Create a file, name it big, on your local filesystem
and upload it to HDFS under /sics.
4. View the content of /sics directory.
5. Determine the size of big on HDFS.
6. Print the first 5 lines to screen from big on HDFS.
7. Copy big to /sics/big_hdfscopy on HDFS.
8. Copy big back to local filesystem and name it
big_localcopy.
9. Check the entire HDFS filesystem for
inconsistencies/problems.
10. Delete big from HDFS.
11. Delete /sics directory from HDFS.
9. 1- Start HDFS and verify that it's running
1. Format the filesystem:
$ bin/hdfs namenode -format
2. Start NameNode daemon and DataNode daemon:
$ sbin/start-dfs.sh
The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
3. Browse the web interface for the NameNode; by default it is available at:
• NameNode - http://localhost:50070/
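If start-dfs.sh fails or the NameNode does not come up, note that the Apache single-node guide also expects a minimal pseudo-distributed configuration before formatting. Following that guide, etc/hadoop/core-site.xml sets the default filesystem:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
and etc/hadoop/hdfs-site.xml sets the replication factor to 1 for a single node:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>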
10. 2- Create a new directory /sics on HDFS
hdfs dfs -mkdir /sics
3- Create a file, name it big, on your local
filesystem and upload it to HDFS under /sics
hdfs dfs -put big /sics
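The contents of big are up to you; one quick way to generate a throwaway local file before running the put above (a sketch, any text file works):
seq 1 1000000 > big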
11. 4- View the content of /sics directory
hdfs dfs -ls /sics
5- Determine the size of big on HDFS
hdfs dfs -du -h /sics/big
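As an alternative, -stat with a format string prints just the size in bytes:
hdfs dfs -stat %b /sics/big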
12. 6- Print the first 5 lines to screen from big on
HDFS
hdfs dfs -cat /sics/big | head -n 5
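The pipe to head stops after five lines; to peek at the end of the file instead, -tail prints its last kilobyte:
hdfs dfs -tail /sics/big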
7- Copy big to /sics/big_hdfscopy on HDFS
hdfs dfs -cp /sics/big /sics/big_hdfscopy
13. 8- Copy big back to local filesystem and name it
big_localcopy
hdfs dfs -get /sics/big big_localcopy
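To confirm the round trip preserved the data, you can compare the two local files:
diff big big_localcopy && echo files are identical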
9- Check the entire HDFS filesystem for
inconsistencies/problems
hdfs fsck /
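For more detail, fsck also accepts flags that report per-file block information:
hdfs fsck / -files -blocks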
14. 10- Delete big from HDFS
hdfs dfs -rm /sics/big
11- Delete /sics directory from HDFS
hdfs dfs -rm -r /sics
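When the lab is finished, stop the daemons the same way they were started (per the Apache guide):
$ sbin/stop-dfs.sh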