
UPGRADING HADOOP

Ways of upgrading Hadoop

INFINITY | DWIVEDISHASHWAT@GMAIL.COM | HTTP://HELPMETOCODE.BLOGSPOT.COM
Table of Contents

I.  Upgrading Cluster with New Cluster (Method 1)
      Objective
      Pre-requisites
      Process Flow
      Methods and Process Flow
      Pros and Cons
II. Upgrading Existing Cluster Inline (Method 2)
      Objective
      Pre-requisites
      Common Assumptions
      Methods and Process Flow
      Pros and Cons
Upgrading Cluster with New Cluster (Method 1)

Objective:
Upgrade a cluster by configuring a new cluster of the same capacity running the newer Hadoop version, and then migrating the files from the old cluster to the new one.
Pre-requisites:
1. A full-fledged running cluster.
2. A newly configured cluster with the newer version and the same amount of resources, or better.
3. A method to migrate files from the older cluster to the new one.

Process Flow:
Data is migrated from the existing cluster (v1.0) to the new cluster (v2.0) in one of three ways:
1. Using CopyToLocal and CopyFromLocal.
2. Using the Hadoop cp command to copy data from one cluster to the other.
3. Using Hadoop distcp to copy data from one cluster to the other.
Methods and Process Flow:
1. CopyToLocal/CopyFromLocal:
The files are copied to a local drive using the Hadoop command CopyToLocal and are then pushed to the new cluster using CopyFromLocal, after which the older cluster can be decommissioned.
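A hedged sketch of this flow; the /data path and the local staging directory are illustrative, and each command must be run with a client configuration pointing at the respective cluster:

bin/hadoop fs -copyToLocal /data /tmp/staging        # run against the old cluster
bin/hadoop fs -copyFromLocal /tmp/staging/data /data # run against the new cluster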
2. Using the Hadoop cp command:
This is a cluster-to-cluster copy: using the Hadoop 'cp' command, the files are transferred from one HDFS to the other. As the versions are different, we need to copy from HFTP, where the command is executed from the target cluster, with the source defined using the HFTP protocol (old cluster) and the target using the HDFS protocol.
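For example, executed from the new (target) cluster; host names and ports below are illustrative (50070 is the default name node HTTP port used by HFTP, and 8020 a common HDFS port):

bin/hadoop fs -cp hftp://old-namenode:50070/data hdfs://new-namenode:8020/data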
3. Using the Hadoop distcp command:
This is also a cluster-to-cluster copy: using the Hadoop 'distcp' command, the files are transferred from one HDFS to the other. As the versions are different, we again copy from HFTP, executing the command from the target cluster with the source defined using the HFTP protocol and the target using the HDFS protocol. In addition, MapReduce must be running on the cluster (the JobTracker and the TaskTrackers must be up), because distcp is implemented as a MapReduce job. This is the fastest approach for migrating data from one cluster to the other.
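A hedged distcp sketch, again executed from the new (target) cluster; host names and ports are illustrative:

bin/hadoop distcp hftp://old-namenode:50070/data hdfs://new-namenode:8020/data

Because distcp copies files in parallel as a MapReduce job, it is what makes this the fastest of the three options.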

Pros and Cons:

Cons:
- Slow process.
- Additional intermediate storage is required in the case of CopyToLocal/CopyFromLocal.
- Overhead of copying files in the case of cp and distcp.

Pros:
- Safe.
- The old cluster is always there as a backup.
- Online: no downtime is required.
Upgrading Existing Cluster Inline (Method 2)

[Diagram: Hadoop V1 HDFS -> (metadata upgraded) -> Hadoop V2 HDFS]
Objective:
Upgrade the existing cluster from V1 to V2 inline, by installing/configuring the new version in place and updating the metadata.
Pre-requisites:
1. Backed-up metadata.
2. The metadata kept at a safe location, so that it can be restored in case the upgrade process is not successful.
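A hedged sketch of such a backup, assuming dfs.name.dir points at /data/dfs/name (the path is illustrative; the edits and fsimage locations follow step 8 of the procedure later in this document):

BACKUP_DIR=/backup/dfs-name-$(date +%Y%m%d)   # dated backup directory (illustrative)
mkdir -p "$BACKUP_DIR"
cp /data/dfs/name/edits "$BACKUP_DIR/"
cp /data/dfs/name/image/fsimage "$BACKUP_DIR/"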

Common assumptions:
- Newer versions should provide automatic support for, and conversion of, the older versions' data structures.
- Downgrades are not supported. In some cases, e.g. when data structure layouts are not affected by a particular version, a downgrade may be possible; in general, Hadoop does not provide tools to convert data from newer versions to older ones.
- Different Hadoop components should be upgraded simultaneously.
- Inter-version compatibility is not supported. In some cases, e.g. when communication protocols remain unchanged, different versions of different components may be compatible; for example, JobTracker v0.4.0 can communicate with NameNode v0.3.2. In general, Hadoop does not guarantee compatibility between components of different versions.
Points to keep in mind while upgrading!
If any of the following happens during the upgrade, there may be full data loss:
- Hardware failure
- Software errors
- Human mistakes

Methods and Process Flow:
1. Stop the MapReduce cluster(s) and all client applications running on the DFS cluster.
2. Stop DFS using the shutdown command.
3. Install the new version of the Hadoop software.
4. Start the DFS cluster with the -upgrade option.
5. Start the MapReduce cluster.
6. Verify that the components run properly, and finalize the upgrade when convinced; this is done using the -finalizeUpgrade option of the hadoop dfsadmin command. A command-line sketch of this sequence follows below.
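A minimal command-line sketch of this sequence, using the control scripts referenced in the step-by-step procedure later in this document (the dedicated clean-shutdown command is only proposed in the Enhancements section below, so stop-dfs.sh stands in for it here):

bin/stop-mapred.sh                     # stop the MapReduce cluster
bin/stop-dfs.sh                        # stop DFS
# ... install the new Hadoop version ...
bin/start-dfs.sh -upgrade              # start DFS with the -upgrade option
bin/start-mapred.sh                    # start the MapReduce cluster
bin/hadoop dfsadmin -finalizeUpgrade   # finalize once everything checks out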
Pros and Cons:

Cons:
- Chance of data loss if not handled properly.
- Requires downtime.
- Business impact if 100% uptime is required.
- Rollback overhead in case of failure.

Pros:
- No extra storage is required.
- The upgrade happens inline with the metadata update.
- Less time is taken, as there is no bulk data migration.

Step-by-step upgrade process
Link:
http://wiki.apache.org/hadoop/Hadoop_Upgrade
An upgrade is an important part of the lifecycle of any software system, especially a distributed, multi-component system like Hadoop. This is a step-by-step procedure a Hadoop cluster administrator should follow in order to safely transition the cluster to a newer software version. It is a general procedure; for version-specific instructions, please additionally refer to the release notes and version change descriptions.
The purpose of the procedure is to minimize damage to the data stored in Hadoop during upgrades, which could be the result of the following three types of errors:
1. Hardware failure, which is considered normal for the operation of the system and should be handled by the software.
2. Software errors, and
3. Human mistakes,
the latter two of which can lead to partial or complete data loss.
In our experience, the worst damage to the system is incurred when, as a result of a software error or human mistake, the name node decides that some blocks/files are redundant and issues a command for the data nodes to remove the blocks. Although a lot has been done to prevent this behavior, the scenario is still possible.

Common assumptions:
- Newer versions should provide automatic support for, and conversion of, the older versions' data structures.
- Downgrades are not supported. In some cases, e.g. when data structure layouts are not affected by a particular version, a downgrade may be possible; in general, Hadoop does not provide tools to convert data from newer versions to older ones.
- Different Hadoop components should be upgraded simultaneously.
- Inter-version compatibility is not supported. In some cases, e.g. when communication protocols remain unchanged, different versions of different components may be compatible; for example, JobTracker v0.4.0 can communicate with NameNode v0.3.2. In general, Hadoop does not guarantee compatibility between components of different versions.

Instructions:
1. Stop the MapReduce cluster(s):
bin/stop-mapred.sh
and all client applications running on the DFS cluster.
2. Run the fsck command:
bin/hadoop fsck / -files -blocks -locations > dfs-v-old-fsck-1.log
Fix DFS until there are no errors. The resulting file will contain the complete block map of the file system.
Note: redirecting the fsck output is recommended for large clusters in order to avoid time-consuming output to stdout.
3. Run the lsr command:
bin/hadoop dfs -lsr / > dfs-v-old-lsr-1.log
The resulting file will contain the complete namespace of the file system.
4. Run the report command to create a list of the data nodes participating in the cluster:
bin/hadoop dfsadmin -report > dfs-v-old-report-1.log
5. Optionally, copy all data (or only the unrecoverable data) stored in DFS to a local file system or to a backup instance of DFS.
6. Optionally, stop and restart the DFS cluster, in order to create an up-to-date namespace checkpoint of the old version:
bin/stop-dfs.sh
bin/start-dfs.sh

7. Optionally, repeat steps 3, 4, and 5, and compare the results with the previous run to ensure that the state of the file system has remained unchanged.
8. Copy the following checkpoint files into a backup directory:
dfs.name.dir/edits
dfs.name.dir/image/fsimage
9. Stop the DFS cluster:
bin/stop-dfs.sh
Verify that DFS has really stopped and that there are no DataNode processes running on any node.
10. Install the new version of the Hadoop software. See GettingStartedWithHadoop and HowToConfigure for details.
11. Optionally, update the conf/slaves file before starting, to reflect the current set of active nodes.
12. Optionally, change the configured port numbers of the name node and the job tracker, to keep unreachable nodes still running the old version from connecting and disrupting system operation. The relevant properties are:
fs.default.name
mapred.job.tracker
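For illustration only, a sketch of what such a change might look like; in Hadoop 1.x these properties typically live in conf/core-site.xml and conf/mapred-site.xml, and the host names and port numbers below are made up:

<!-- conf/core-site.xml -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode-host:9010</value>  <!-- moved from 9000 so old-version nodes cannot connect -->
</property>

<!-- conf/mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>jobtracker-host:9011</value>  <!-- moved from 9001 -->
</property>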
13. Optionally, start the name node only:
bin/hadoop-daemon.sh start namenode -upgrade
This should convert the checkpoint to the new version format.
14. Optionally, run the lsr command:
bin/hadoop dfs -lsr / > dfs-v-new-lsr-0.log
and compare with dfs-v-old-lsr-1.log.
15. Start the DFS cluster:
bin/start-dfs.sh
16. Run the report command:
bin/hadoop dfsadmin -report > dfs-v-new-report-1.log
and compare with dfs-v-old-report-1.log to ensure that all data nodes previously belonging to the cluster are up and running.
17. Run the lsr command:
bin/hadoop dfs -lsr / > dfs-v-new-lsr-1.log
and compare with dfs-v-old-lsr-1.log. These files should be identical, unless the format of the lsr reporting or the data structures have changed in the new version.
18. Run the fsck command:
bin/hadoop fsck / -files -blocks -locations > dfs-v-new-fsck-1.log
and compare with dfs-v-old-fsck-1.log. These files should be identical, unless the fsck reporting format has changed in the new version.
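A simple way to run these comparisons, assuming the log files were written to the working directory (the report files may show benign differences such as usage figures and timestamps):

diff dfs-v-old-lsr-1.log dfs-v-new-lsr-1.log
diff dfs-v-old-fsck-1.log dfs-v-new-fsck-1.log
diff dfs-v-old-report-1.log dfs-v-new-report-1.log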
19. Start the MapReduce cluster:
bin/start-mapred.sh

In case of failure, the administrator should have the checkpoint files in order to be able to repeat the procedure from the appropriate point, or to restart the old version of Hadoop. The *.log files should help in investigating what went wrong during the upgrade.

Enhancements:
This is a list of enhancements intended to simplify the upgrade procedure and to make upgrades safer in general.
1. A shutdown function is required for Hadoop that would cleanly shut down the cluster, merging the edits into the image and avoiding the restart-DFS phase.
2. The safe mode implementation will further help to prevent the name node from making voluntary decisions on block deletion and replication.
3. A faster fsck is required. Currently fsck processes 1-2 TB per minute.
4. Hadoop should provide a backup solution as a stand-alone application.
5. Introduce an explicit -upgrade option for DFS (see below).
6. Introduce a related finalize upgrade command.

Shutdown command:
During the shutdown, the name node performs the following actions:
- It locks the namespace against further modifications and waits for active leases to expire and for pending block replications and deletions to complete.
- It runs fsck, and optionally saves the result in a file, if one is provided.
- It checkpoints and replicates the namespace image.
- It sends the shutdown command to all data nodes and verifies that they have actually turned themselves off, by waiting for as long as 5 heartbeat intervals during which no heartbeats should be reported.
- It stops all running threads and terminates itself.

Upgrade option for DFS:
The main idea of the upgrade is that each version that modifies the on-disk data structures has its own distinct working directory. For instance, we'd have a "v0.6" and a "v0.7" directory for the name node and for all data nodes. These version directories are created automatically when a particular file system version is brought up for the first time. If DFS is started with the -upgrade option, the new file system version will do the following:
- The name node will start in read-only mode and will read in the old version's checkpoint, converting it to the new format.
- It will create a new working directory corresponding to the new version and save the new image into it. The old checkpoint will remain untouched in the working directory corresponding to the old version.
- The name node will pass the upgrade request to the data nodes.
- Each data node will create a working directory corresponding to the new version. If there is metadata in side files, it will be regenerated in the new working directory.
- Then the data node will hard-link blocks from the old working directory to the new one. The existing blocks will remain untouched in their old directories.
- The data node will confirm the upgrade and send its new block report to the name node.
- Once the name node has received the upgrade confirmations from all data nodes, it will run fsck and then switch to normal mode, where it is ready to serve clients' requests.
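To make the hard-link step concrete, here is a purely illustrative shell sketch of what a data node conceptually does; the directory layout shown is hypothetical, not the real DFS on-disk format:

mkdir -p data/v0.7
for blk in data/v0.6/blk_*; do
  ln "$blk" "data/v0.7/$(basename "$blk")"  # hard link: both versions share one copy of the data
done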
This ensures that a snapshot of the old data is preserved until the new version is validated and tested to function properly. Following the upgrade, the file system can be run for a week or so to gain confidence. It can be rolled back to the old snapshot if it breaks, or the upgrade can be "finalized" by the admin using the "finalize upgrade" command, which removes the old version's working directories.
Care must be taken to deal with data nodes that are missing during the upgrade stage. In order to deal with such nodes, the name node should store the list of data nodes that have completed the upgrade and reject data nodes that did not confirm it.
When DFS allows modification of blocks, this will require copying blocks into the current version's working directory before modifying them.
Hard linking allows the data from several versions of Hadoop to coexist, and even evolve, on the same hardware without duplicating the common parts.

Finalize Upgrade:
When the Hadoop administrator is convinced that the new version works properly, they can issue a "finalize upgrade" request.
- The finalize request is first passed to the data nodes, so that they can remove their previous version's working directories with all the block files. This does not necessarily lead to physical removal of the blocks, as long as they are still referenced from the new version.
- When the name node receives confirmation from all data nodes that the current upgrade is finalized, it removes its own old version directory and the checkpoint in it, thus completing the upgrade and making it permanent.
The finalize upgrade procedure can run in the background without disrupting cluster performance. While in finalize mode, the name node will periodically verify confirmations from the data nodes and finalize itself when the load is light.
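As used in the step-by-step procedure above, this request corresponds to the -finalizeUpgrade option of the dfsadmin command:

bin/hadoop dfsadmin -finalizeUpgrade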

UPGRADING HADOOP - OCTOBER 2013

8
Simplified Upgrade Procedure:
The new utilities will substantially simplify the upgrade procedure:
1. Stop the MapReduce cluster(s) and all client applications running on the DFS cluster.
2. Stop DFS using the shutdown command.
3. Install the new version of the Hadoop software.
4. Start the DFS cluster with the -upgrade option.
5. Start the MapReduce cluster.
6. Verify that the components run properly, and finalize the upgrade when convinced. This is done using the -finalizeUpgrade option of the hadoop dfsadmin command.
