8. NameNode Configuration
● Create a directory for the namenode metadata
– /data/hadoop/name
● Open core-site.xml
– Define fs.default.name=hdfs://<host>:8020
● Open hdfs-site.xml
– Define dfs.name.dir=/data/hadoop/name
– Define dfs.replication=3
– Create dir /data/hadoop/hdfs
– Define dfs.data.dir=/data/hadoop/hdfs
– Define dfs.http.address=localhost:50070
● Format the namenode
– /opt/hadoop/bin/hadoop namenode -format
– After formatting, start the namenode with service hadoop-namenode start
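Put together, the properties above might look like the following XML. This is a sketch: the hostname namenode.example.com is a placeholder, and in a real cluster core-site.xml and hdfs-site.xml are two separate files under /opt/hadoop/conf.

```xml
<!-- core-site.xml: the HDFS entry point -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>

<!-- hdfs-site.xml on the namenode -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/data/hadoop/name</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/hadoop/hdfs</value>
  </property>
</configuration>
```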
9. JobTracker Configuration
● Open /opt/hadoop/conf/mapred-site.xml
– Define mapred.job.tracker=<host>:8021
– Create the directory /data/hadoop/mapred
– Define mapred.local.dir=/data/hadoop/mapred
● Start the JobTracker with service hadoop-jobtracker start
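As a sketch, the resulting mapred-site.xml on the JobTracker could look like this (jobtracker.example.com is a placeholder host):

```xml
<!-- /opt/hadoop/conf/mapred-site.xml on the JobTracker -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:8021</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/data/hadoop/mapred</value>
  </property>
</configuration>
```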
10. DataNode Configuration
● On each datanode create the directory /data/hadoop/hdfs (one directory per disk)
● Open /opt/hadoop/conf/hdfs-site.xml
– Define dfs.http.address=<host>:50070
– Define dfs.data.dir=/data/hadoop/hdfs
● Start the datanodes with service hadoop-datanode start
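A sketch of the datanode's hdfs-site.xml follows. The hostnames are placeholders; note that dfs.data.dir accepts a comma-separated list, which is how the "one directory per disk" advice above is expressed in practice.

```xml
<!-- /opt/hadoop/conf/hdfs-site.xml on each datanode -->
<configuration>
  <property>
    <name>dfs.http.address</name>
    <value>namenode.example.com:50070</value>
  </property>
  <property>
    <!-- one entry per physical disk -->
    <name>dfs.data.dir</name>
    <value>/data/hadoop/hdfs,/data2/hadoop/hdfs</value>
  </property>
</configuration>
```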
11. MapReduce Configuration
● On each datanode create the directory /data/hadoop/mapred
● Open /opt/hadoop/conf/mapred-site.xml
– Define mapred.local.dir=/data/hadoop/mapred
– Define mapred.tasktracker.map.tasks.maximum=<number of map tasks>
– Define mapred.tasktracker.reduce.tasks.maximum=<number of reduce tasks>
● Start the TaskTrackers with service hadoop-tasktracker start
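The TaskTracker side of mapred-site.xml might then look like the sketch below. The slot counts are illustrative assumptions, not recommendations; a common starting point is to size them relative to the node's cores and RAM.

```xml
<!-- /opt/hadoop/conf/mapred-site.xml on each TaskTracker node -->
<configuration>
  <property>
    <name>mapred.local.dir</name>
    <value>/data/hadoop/mapred</value>
  </property>
  <property>
    <!-- example value: roughly one map slot per core -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <!-- example value: typically fewer reduce slots than map slots -->
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>
```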
12. Monitoring
● Web HTML scraping
– https://github.com/gerritjvv/hadoop-monitoring
● Ganglia
– http://ganglia.info/?p=88
● Cacti
– http://blog.cloudera.com/blog/2009/07/hadoop-graphing
13. NameNode Edits
● Writes/updates/deletes are applied in RAM and appended to a write-ahead log (the edits file)
● The in-memory metadata is only merged into the on-disk binary image (fsimage) during the secondary namenode checkpoint
● This file corrupts easily
● Recovery is a manual task
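Because recovery is manual, a common approach in Hadoop 1.x is to restore from the secondary namenode's last checkpoint. The commands below are only a sketch of that procedure using the standard -importCheckpoint option; paths and service names match the earlier slides.

```
# Sketch: restore the namenode metadata from the secondary
# namenode's checkpoint after an fsimage corruption.
service hadoop-namenode stop
# Point fs.checkpoint.dir at a copy of the secondary namenode's
# checkpoint directory, use an empty dfs.name.dir, then import:
/opt/hadoop/bin/hadoop namenode -importCheckpoint
service hadoop-namenode start
```

Any edits written after the last checkpoint are lost with this procedure, which is why the checkpoint interval matters.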
14. High Availability (HA)
● Yarn and Hadoop 2.0.0
● Experimental
● http://hadoop.apache.org/docs/current/hadoop-yarn