SlideShare une entreprise Scribd logo
1  sur  35
Télécharger pour lire hors ligne
View Hadoop Administration course details at www.edureka.co/hadoop-admin
Top 5 Hadoop admin tasks
www.edureka.co/hadoop-adminSlide 2
Objectives of this Session
At the end of this module, you will be able to
Understand Cluster Planning
Understand Hadoop fully distributed cluster set up
Add further nodes to the running cluster
Upgrade existing Hadoop cluster
Understand name node High availability
www.edureka.co/hadoop-adminSlide 3
Why Hadoop Administration
www.edureka.co/hadoop-adminSlide 4
With the Rise of Hadoop Adoption and usage across various industries, the role of Hadoop Administrator has
become very important and is in demand.
Hadoop Administrator
www.edureka.co/hadoop-adminSlide 5
Hadoop Administration Responsibilities
www.edureka.co/hadoop-adminSlide 6
HDFS Support & Maintenance Monitor Hadoop ClusterProviding Security
Integrating Different Frameworks Hadoop Infrastructure Maintenance
Hadoop Admin Responsibilities
www.edureka.co/hadoop-adminSlide 7
Top 5 Hadoop Admin Tasks
www.edureka.co/hadoop-adminSlide 8
Top 5 Hadoop Admin Tasks
Task-1
Cluster Planning
Task-2
Hadoop Cluster set up Hadoop Version upgrade
Task-3
Adding or Removing Nodes to Cluster Providing High Availability to Cluster
Task-4 Task-5
www.edureka.co/hadoop-adminSlide 9
Cluster Planning
Task-1
www.edureka.co/hadoop-adminSlide 10
RAM: 16GB
Hard disk: 6 x 2TB
Processor: Xenon with 2 cores
Ethernet: 3 x 10 GB/s
OS: 64-bit CentOS
Hadoop Cluster: A Typical Use Case
RAM: 16GB
Hard disk: 6 x 2TB
Processor: Xenon with 2 cores.
Ethernet: 3 x 10 GB/s
OS: 64-bit CentOS
RAM: 64 GB,
Hard disk: 1 TB
Processor: Xenon with 8 Cores
Ethernet: 3 x 10 GB/s
OS: 64-bit CentOS
Power: Redundant Power Supply
RAM: 32 GB,
Hard disk: 1 TB
Processor: Xenon with 4 Cores
Ethernet: 3 x 10 GB/s
OS: 64-bit CentOS
Power: Redundant Power Supply
Active NameNodeSecondary NameNode
DataNode DataNode
RAM: 64 GB,
Hard disk: 1 TB
Processor: Xenon with 8 Cores
Ethernet: 3 x 10 GB/s
OS: 64-bit CentOS
Power: Redundant Power Supply
StandBy NameNode
Optional
RAM: 16GB
Hard disk: 6 x 2TB
Processor: Xenon with 2 cores
Ethernet: 3 x 10 GB/s
OS: 64-bit CentOS
DataNode
DataNode DataNode DataNode
www.edureka.co/hadoop-adminSlide 11
Seeking cluster growth on storage capacity is often a good method to use!
Cluster Growth Based On Storage Capacity
Data grows by approximately
5TB per week
HDFS set up to replicate each
block three times
Thus, 15TB of extra storage
space required per week
Assuming machines with 5x3TB
hard drives, equating to a new
machine required each week
Assume Overheads to be 30%
www.edureka.co/hadoop-adminSlide 12
Slave Nodes: Recommended Configuration
Higher-performance vs lower performance components
Save the Money, Buy more Nodes!
 General ( Depends on requirement
‘base’ configuration for a slave Node
» 4 x 1 TB or 2 TB hard drives, in a
JBOD* configuration
» Do not use RAID!
» 2 x Quad-core CPUs
» 24 -32GB RAM
» Gigabit Ethernet
General Configuration
 Multiples of ( 1 hard drive + 2 cores
+ 6-8GB RAM) generally work well
for many types of applications
Special Configuration
Slave Nodes
“A cluster with more nodes performs better than one with fewer, slightly faster nodes”
www.edureka.co/hadoop-adminSlide 13
Slave Nodes: More Details (RAM)
Slave Nodes (RAM)
Generally each Map or Reduce task
will take 1GB to 2GB of RAM
Slave nodes should not be using
virtual memory
RULE OF THUMB!
Total number of tasks = 1.5 x number
of processor core
Ensure enough RAM is present to
run all tasks, plus the DataNode,
TaskTracker daemons, plus the
operating system
www.edureka.co/hadoop-adminSlide 14
Master Node Hardware Recommendations
Carrier-class hardware
(Not commodity hardware)
Dual power supplies
Dual Ethernet cards
(Bonded to provide failover)
Raided hard drives
At least 32GB of RAM
Master
Node
Requires
www.edureka.co/hadoop-adminSlide 15
Hadoop Cluster Set up
Task-2
www.edureka.co/hadoop-adminSlide 16
Hadoop Cluster Modes
Hadoop can run in any of the following three modes:
Fully-Distributed Mode
Pseudo-Distributed Mode
 No daemons, everything runs in a single JVM
 Suitable for running MapReduce programs during development
 Has no DFS
 Hadoop daemons run on the local machine
 Hadoop daemons run on a cluster of machines
Standalone (or Local) Mode
www.edureka.co/hadoop-adminSlide 17
Core
HDFS
core-site.xml
hdfs-site.xml
yarn-site.xmlYARN
mapred-site.xml
Map
Reduce
Hadoop 2.x Configuration Files – Apache Hadoop
www.edureka.co/hadoop-admin
www.edureka.co/hadoop-adminSlide 18
Configuration Files
Configuration
Filenames
Description of Log Files
hadoop-env.sh
yarn-env.sh
Settings for Hadoop Daemon’s process environment.
core-site.xml
Configuration settings for Hadoop Core such as I/O settings that common to both HDFS and
YARN.
hdfs-site.xml Configuration settings for HDFS Daemons, the Name Node and the Data Nodes.
yarn-site.xml Configuration setting for Resource Manager and Node Manager.
mapred-site.xml Configuration settings for MapReduce Applications.
slaves A list of machines (one per line) that each run DataNode and Node Manager.
www.edureka.co/hadoop-adminSlide 19
Hadoop Daemons
NameNode daemon
» Runs on master node of the Hadoop Distributed File System (HDFS)
» Directs Data Nodes to perform their low-level I/O tasks
DataNode daemon
» Runs on each slave machine in the HDFS
» Does the low-level I/O work
Resource Manager
» Runs on master node of the Data processing System(MapReduce)
» Global resource Scheduler
Node Manager
» Runs on each slave node of Data processing System
» Platform for the Data processing tasks
Job HistoryServer
» JobHistoryServer is responsible for servicing all job history related requests from client
www.edureka.co/hadoop-adminSlide 20
Hadoop 1.x and Hadoop 2.x Ecosystem
Pig Latin
Data Analysis
Hive
DW System
Other
YARN
Frameworks
(MPI, GIRAPH)
HBaseMapReduce Framework
YARN
Cluster Resource Management
Apache Oozie
(Workflow)
HDFS
(Hadoop Distributed File System)
Pig Latin
Data Analysis
Hive
DW System
MapReduce Framework
Apache Oozie
(Workflow)
HDFS
(Hadoop Distributed File System)
Pig Latin
Data Analysis
HBase
Structured DataUnstructured/
Semi-structured Data
Hadoop 1.x Hadoop 2.x
www.edureka.co/hadoop-adminSlide 21
Demo On Hadoop Cluster Set Up
www.edureka.co/hadoop-adminSlide 22
Hadoop Version upgrade
Task-3
www.edureka.co/hadoop-adminSlide 23
Stop map-reduce cluster and all client applications running on the DFS cluster
Take the back up of File System Name Space
Install new version of Hadoop software
Update the all configuration files in new Hadoop
start name node with Upgrade command
Compare the new HDFS file system with previous version file system name space
finalize upgrade.
Hadoop Version Upgrade
www.edureka.co/hadoop-adminSlide 24
1) Run Report
• FSCK
• LSR
• DFSADMIN
2) Take Back up
• Configuration
• Applications
• Data and Meta Data
3) Install new Version of Hadoop
4) Upgrade
hadoop-daemon.sh start namenode -upgrade
Hadoop Version Upgrade
5) Run New Reports
• FSCK
• LSR
• DFSADMIN
Compare old and new Reports
Test new Cluster
6) Finalize upgrade
• hadoop dfsadmin -finalizeUpgrade
www.edureka.co/hadoop-adminSlide 25
Adding or Removing Nodes
from Cluster
Task-4
www.edureka.co/hadoop-adminSlide 26
Commissioning and Decommissioning of DataNode
DataNode
Master Node
DataNode
DataNode DataNode DataNode
DataNodeDataNode
DataNode
DecommissioningCommissioning
www.edureka.co/hadoop-adminSlide 27
Add (Commission) DataNodes
Update the network
addresses in the
‘include’ files
dfs.include
mapred.include
Update the
NameNode:
hadoop dfsadmin
-refreshNodes
Update the Job
Tracker:
hadoop mradmin
-refreshNodes Update the
‘slaves’ file
Start the DataNode
and TaskTracker
hadoop-daemon.sh
start tasktracker
hadoop-daemon.sh
start datanode
Cross Check the Web
6 UI to ensure the
successful addition
Run Balancer to
7 move the HDFS
blocks to
DataNodes
1 2 3
4
5
www.edureka.co/hadoop-adminSlide 28
Demo On Commissioning Data Node
www.edureka.co/hadoop-adminSlide 29
Providing High Availability to Cluster
Task-5
www.edureka.co/hadoop-adminSlide 30
Prior to Hadoop 2.0.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a
single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable
until the NameNode was either restarted or brought up on a separate machine.
Achieve the High Availability in two different ways
 HA using the Quorum Journal Manager (QJM) to share edit logs between the Active and Standby NameNodes.
 HA using NFS for shared storage instead of the QJM
High Availability (HA)
www.edureka.co/hadoop-adminSlide 31
Slave NodeSlave NodeSlave Node
Standby NodeActive Node
Journal Nodes
(Shared Edits)
Failover Controller
Standby
Failover Controller
Active
Zookeeper Service
Block Report & Heart
beat
Monitor status and
health. Manage HA
state
HA Architecture
Monitor status and
health. Manage HA
state
Write Read
www.edureka.co/hadoop-adminSlide 32
Demo On NameNode High Availability
www.edureka.co/hadoop-adminSlide 33
Hadoop admin Job Trends
Questions
www.edureka.co/hadoop-adminSlide 34 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions
Introduction to hadoop administration   jk

Contenu connexe

Tendances

Top 5 Hadoop Admin Tasks
Top 5 Hadoop Admin TasksTop 5 Hadoop Admin Tasks
Top 5 Hadoop Admin TasksEdureka!
 
Secure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosSecure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosEdureka!
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterEdureka!
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configurationprabakaranbrick
 
Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14jijukjoseph
 
Hadoop installation by santosh nage
Hadoop installation by santosh nageHadoop installation by santosh nage
Hadoop installation by santosh nageSantosh Nage
 
Introduction to apache hadoop
Introduction to apache hadoopIntroduction to apache hadoop
Introduction to apache hadoopShashwat Shriparv
 
Apache kafka configuration-guide
Apache kafka configuration-guideApache kafka configuration-guide
Apache kafka configuration-guideChetan Khatri
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slidesryancox
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapakapa rohit
 
Setting High Availability in Hadoop Cluster
Setting High Availability in Hadoop ClusterSetting High Availability in Hadoop Cluster
Setting High Availability in Hadoop ClusterEdureka!
 
Improving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxImproving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxAlex Moundalexis
 
Configure h base hadoop and hbase client
Configure h base hadoop and hbase clientConfigure h base hadoop and hbase client
Configure h base hadoop and hbase clientShashwat Shriparv
 
A day in the life of hadoop administrator!
A day in the life of hadoop administrator!A day in the life of hadoop administrator!
A day in the life of hadoop administrator!Edureka!
 
Optimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for HadoopOptimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for HadoopDataWorks Summit
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationAlex Moundalexis
 
Hadoop Tutorial
Hadoop TutorialHadoop Tutorial
Hadoop Tutorialawesomesos
 

Tendances (20)

Top 5 Hadoop Admin Tasks
Top 5 Hadoop Admin TasksTop 5 Hadoop Admin Tasks
Top 5 Hadoop Admin Tasks
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
Secure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosSecure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With Kerberos
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node Cluster
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configuration
 
Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14
 
Hadoop installation by santosh nage
Hadoop installation by santosh nageHadoop installation by santosh nage
Hadoop installation by santosh nage
 
Introduction to apache hadoop
Introduction to apache hadoopIntroduction to apache hadoop
Introduction to apache hadoop
 
Apache kafka configuration-guide
Apache kafka configuration-guideApache kafka configuration-guide
Apache kafka configuration-guide
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapa
 
Setting High Availability in Hadoop Cluster
Setting High Availability in Hadoop ClusterSetting High Availability in Hadoop Cluster
Setting High Availability in Hadoop Cluster
 
Improving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxImproving Hadoop Performance via Linux
Improving Hadoop Performance via Linux
 
Configure h base hadoop and hbase client
Configure h base hadoop and hbase clientConfigure h base hadoop and hbase client
Configure h base hadoop and hbase client
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
A day in the life of hadoop administrator!
A day in the life of hadoop administrator!A day in the life of hadoop administrator!
A day in the life of hadoop administrator!
 
Optimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for HadoopOptimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for Hadoop
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux Configuration
 
Hadoop Tutorial
Hadoop TutorialHadoop Tutorial
Hadoop Tutorial
 
Hadoop 1.x vs 2
Hadoop 1.x vs 2Hadoop 1.x vs 2
Hadoop 1.x vs 2
 

En vedette

Bn1028 demo hadoop administration and development
Bn1028 demo  hadoop administration and developmentBn1028 demo  hadoop administration and development
Bn1028 demo hadoop administration and developmentconline training
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterEdureka!
 
Introduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeopleIntroduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeopleSpringPeople
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop AdministrationEdureka!
 
Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdfEdureka!
 

En vedette (6)

Bn1028 demo hadoop administration and development
Bn1028 demo  hadoop administration and developmentBn1028 demo  hadoop administration and development
Bn1028 demo hadoop administration and development
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop Cluster
 
Introduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeopleIntroduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeople
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdf
 

Similaire à Introduction to hadoop administration jk

Power Hadoop Cluster with AWS Cloud
Power Hadoop Cluster with AWS CloudPower Hadoop Cluster with AWS Cloud
Power Hadoop Cluster with AWS CloudEdureka!
 
Webinar: Top 5 Hadoop Admin Tasks
Webinar: Top 5 Hadoop Admin TasksWebinar: Top 5 Hadoop Admin Tasks
Webinar: Top 5 Hadoop Admin TasksEdureka!
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFSEdureka!
 
Session 01 - Into to Hadoop
Session 01 - Into to HadoopSession 01 - Into to Hadoop
Session 01 - Into to HadoopAnandMHadoop
 
Hadoop 2.x HDFS Cluster Installation (VirtualBox)
Hadoop 2.x  HDFS Cluster Installation (VirtualBox)Hadoop 2.x  HDFS Cluster Installation (VirtualBox)
Hadoop 2.x HDFS Cluster Installation (VirtualBox)Amir Sedighi
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduceUday Vakalapudi
 
ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON Padma shree. T
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfsshrey mehrotra
 
Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)Edureka!
 
Design and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on RaspberryDesign and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on RaspberryIJRESJOURNAL
 
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...Simplilearn
 
Best Practices for Deploying Hadoop (BigInsights) in the Cloud
Best Practices for Deploying Hadoop (BigInsights) in the CloudBest Practices for Deploying Hadoop (BigInsights) in the Cloud
Best Practices for Deploying Hadoop (BigInsights) in the CloudLeons Petražickis
 
Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2Rohit Agrawal
 
Data analysis on hadoop
Data analysis on hadoopData analysis on hadoop
Data analysis on hadoopFrank Y
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxDanishMahmood23
 
Big data with HDFS and Mapreduce
Big data  with HDFS and MapreduceBig data  with HDFS and Mapreduce
Big data with HDFS and Mapreducesenthil0809
 

Similaire à Introduction to hadoop administration jk (20)

Power Hadoop Cluster with AWS Cloud
Power Hadoop Cluster with AWS CloudPower Hadoop Cluster with AWS Cloud
Power Hadoop Cluster with AWS Cloud
 
Webinar: Top 5 Hadoop Admin Tasks
Webinar: Top 5 Hadoop Admin TasksWebinar: Top 5 Hadoop Admin Tasks
Webinar: Top 5 Hadoop Admin Tasks
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFS
 
Session 01 - Into to Hadoop
Session 01 - Into to HadoopSession 01 - Into to Hadoop
Session 01 - Into to Hadoop
 
Upgrading hadoop
Upgrading hadoopUpgrading hadoop
Upgrading hadoop
 
Hadoop 2.x HDFS Cluster Installation (VirtualBox)
Hadoop 2.x  HDFS Cluster Installation (VirtualBox)Hadoop 2.x  HDFS Cluster Installation (VirtualBox)
Hadoop 2.x HDFS Cluster Installation (VirtualBox)
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce
 
ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON
 
Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
 
Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)
 
Design and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on RaspberryDesign and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on Raspberry
 
BIGDATA ANALYTICS LAB MANUAL final.pdf
BIGDATA  ANALYTICS LAB MANUAL final.pdfBIGDATA  ANALYTICS LAB MANUAL final.pdf
BIGDATA ANALYTICS LAB MANUAL final.pdf
 
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
 
Best Practices for Deploying Hadoop (BigInsights) in the Cloud
Best Practices for Deploying Hadoop (BigInsights) in the CloudBest Practices for Deploying Hadoop (BigInsights) in the Cloud
Best Practices for Deploying Hadoop (BigInsights) in the Cloud
 
Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2
 
Data analysis on hadoop
Data analysis on hadoopData analysis on hadoop
Data analysis on hadoop
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptx
 
Unit 1
Unit 1Unit 1
Unit 1
 
Big data with HDFS and Mapreduce
Big data  with HDFS and MapreduceBig data  with HDFS and Mapreduce
Big data with HDFS and Mapreduce
 

Plus de Edureka!

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaEdureka!
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaEdureka!
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaEdureka!
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaEdureka!
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaEdureka!
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaEdureka!
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaEdureka!
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaEdureka!
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaEdureka!
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaEdureka!
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | EdurekaEdureka!
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEdureka!
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEdureka!
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaEdureka!
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaEdureka!
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaEdureka!
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Edureka!
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaEdureka!
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaEdureka!
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | EdurekaEdureka!
 

Plus de Edureka! (20)

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | Edureka
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | Edureka
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | Edureka
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | Edureka
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | Edureka
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | Edureka
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | Edureka
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| Edureka
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | Edureka
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | Edureka
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | Edureka
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | Edureka
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | Edureka
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | Edureka
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | Edureka
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | Edureka
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | Edureka
 

Dernier

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 

Dernier (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Introduction to hadoop administration jk

  • 1. View Hadoop Administration course details at www.edureka.co/hadoop-admin Top 5 Hadoop admin tasks
  • 2. www.edureka.co/hadoop-adminSlide 2 Objectives of this Session At the end of this module, you will be able to Understand Cluster Planning Understand Hadoop fully distributed cluster set up Add further nodes to the running cluster Upgrade existing Hadoop cluster Understand name node High availability
  • 4. www.edureka.co/hadoop-adminSlide 4 With the Rise of Hadoop Adoption and usage across various industries, the role of Hadoop Administrator has become very important and is in demand. Hadoop Administrator
  • 6. www.edureka.co/hadoop-adminSlide 6 HDFS Support & Maintenance Monitor Hadoop ClusterProviding Security Integrating Different Frameworks Hadoop Infrastructure Maintenance Hadoop Admin Responsibilities
  • 8. www.edureka.co/hadoop-adminSlide 8 Top 5 Hadoop Admin Tasks Task-1 Cluster Planning Task-2 Hadoop Cluster set up Hadoop Version upgrade Task-3 Adding or Removing Nodes to Cluster Providing High Availability to Cluster Task-4 Task-5
  • 10. www.edureka.co/hadoop-adminSlide 10 RAM: 16GB Hard disk: 6 x 2TB Processor: Xenon with 2 cores Ethernet: 3 x 10 GB/s OS: 64-bit CentOS Hadoop Cluster: A Typical Use Case RAM: 16GB Hard disk: 6 x 2TB Processor: Xenon with 2 cores. Ethernet: 3 x 10 GB/s OS: 64-bit CentOS RAM: 64 GB, Hard disk: 1 TB Processor: Xenon with 8 Cores Ethernet: 3 x 10 GB/s OS: 64-bit CentOS Power: Redundant Power Supply RAM: 32 GB, Hard disk: 1 TB Processor: Xenon with 4 Cores Ethernet: 3 x 10 GB/s OS: 64-bit CentOS Power: Redundant Power Supply Active NameNodeSecondary NameNode DataNode DataNode RAM: 64 GB, Hard disk: 1 TB Processor: Xenon with 8 Cores Ethernet: 3 x 10 GB/s OS: 64-bit CentOS Power: Redundant Power Supply StandBy NameNode Optional RAM: 16GB Hard disk: 6 x 2TB Processor: Xenon with 2 cores Ethernet: 3 x 10 GB/s OS: 64-bit CentOS DataNode DataNode DataNode DataNode
  • 11. www.edureka.co/hadoop-adminSlide 11 Seeking cluster growth on storage capacity is often a good method to use! Cluster Growth Based On Storage Capacity Data grows by approximately 5TB per week HDFS set up to replicate each block three times Thus, 15TB of extra storage space required per week Assuming machines with 5x3TB hard drives, equating to a new machine required each week Assume Overheads to be 30%
  • 12. www.edureka.co/hadoop-adminSlide 12 Slave Nodes: Recommended Configuration Higher-performance vs lower performance components Save the Money, Buy more Nodes!  General ( Depends on requirement ‘base’ configuration for a slave Node » 4 x 1 TB or 2 TB hard drives, in a JBOD* configuration » Do not use RAID! » 2 x Quad-core CPUs » 24 -32GB RAM » Gigabit Ethernet General Configuration  Multiples of ( 1 hard drive + 2 cores + 6-8GB RAM) generally work well for many types of applications Special Configuration Slave Nodes “A cluster with more nodes performs better than one with fewer, slightly faster nodes”
  • 13. www.edureka.co/hadoop-adminSlide 13 Slave Nodes: More Details (RAM) Slave Nodes (RAM) Generally each Map or Reduce task will take 1GB to 2GB of RAM Slave nodes should not be using virtual memory RULE OF THUMB! Total number of tasks = 1.5 x number of processor core Ensure enough RAM is present to run all tasks, plus the DataNode, TaskTracker daemons, plus the operating system
  • 14. www.edureka.co/hadoop-adminSlide 14 Master Node Hardware Recommendations Carrier-class hardware (Not commodity hardware) Dual power supplies Dual Ethernet cards (Bonded to provide failover) Raided hard drives At least 32GB of RAM Master Node Requires
  • 16. www.edureka.co/hadoop-adminSlide 16 Hadoop Cluster Modes Hadoop can run in any of the following three modes: Fully-Distributed Mode Pseudo-Distributed Mode  No daemons, everything runs in a single JVM  Suitable for running MapReduce programs during development  Has no DFS  Hadoop daemons run on the local machine  Hadoop daemons run on a cluster of machines Standalone (or Local) Mode
  • 18. www.edureka.co/hadoop-adminSlide 18 Configuration Files Configuration Filenames Description of Log Files hadoop-env.sh yarn-env.sh Settings for Hadoop Daemon’s process environment. core-site.xml Configuration settings for Hadoop Core such as I/O settings that common to both HDFS and YARN. hdfs-site.xml Configuration settings for HDFS Daemons, the Name Node and the Data Nodes. yarn-site.xml Configuration setting for Resource Manager and Node Manager. mapred-site.xml Configuration settings for MapReduce Applications. slaves A list of machines (one per line) that each run DataNode and Node Manager.
  • 19. www.edureka.co/hadoop-adminSlide 19 Hadoop Daemons NameNode daemon » Runs on master node of the Hadoop Distributed File System (HDFS) » Directs Data Nodes to perform their low-level I/O tasks DataNode daemon » Runs on each slave machine in the HDFS » Does the low-level I/O work Resource Manager » Runs on master node of the Data processing System(MapReduce) » Global resource Scheduler Node Manager » Runs on each slave node of Data processing System » Platform for the Data processing tasks Job HistoryServer » JobHistoryServer is responsible for servicing all job history related requests from client
  • 20. www.edureka.co/hadoop-adminSlide 20 Hadoop 1.x and Hadoop 2.x Ecosystem Pig Latin Data Analysis Hive DW System Other YARN Frameworks (MPI, GIRAPH) HBaseMapReduce Framework YARN Cluster Resource Management Apache Oozie (Workflow) HDFS (Hadoop Distributed File System) Pig Latin Data Analysis Hive DW System MapReduce Framework Apache Oozie (Workflow) HDFS (Hadoop Distributed File System) Pig Latin Data Analysis HBase Structured DataUnstructured/ Semi-structured Data Hadoop 1.x Hadoop 2.x
  • 23. www.edureka.co/hadoop-adminSlide 23 Stop map-reduce cluster and all client applications running on the DFS cluster Take the back up of File System Name Space Install new version of Hadoop software Update the all configuration files in new Hadoop start name node with Upgrade command Compare the new HDFS file system with previous version file system name space finalize upgrade. Hadoop Version Upgrade
  • 24. www.edureka.co/hadoop-adminSlide 24 1) Run Report • FSCK • LSR • DFSADMIN 2) Take Back up • Configuration • Applications • Data and Meta Data 3) Install new Version of Hadoop 4) Upgrade hadoop-daemon.sh start namenode -upgrade Hadoop Version Upgrade 5) Run New Reports • FSCK • LSR • DFSADMIN Compare old and new Reports Test new Cluster 6) Finalize upgrade • hadoop dfsadmin -finalizeUpgrade
  • 25. www.edureka.co/hadoop-adminSlide 25 Adding or Removing Nodes from Cluster Task-4
  • 26. www.edureka.co/hadoop-adminSlide 26 Commissioning and Decommissioning of DataNode DataNode Master Node DataNode DataNode DataNode DataNode DataNodeDataNode DataNode DecommissioningCommissioning
  • 27. www.edureka.co/hadoop-adminSlide 27 Add (Commission) DataNodes Update the network addresses in the ‘include’ files dfs.include mapred.include Update the NameNode: hadoop dfsadmin -refreshNodes Update the Job Tracker: hadoop mradmin -refreshNodes Update the ‘slaves’ file Start the DataNode and TaskTracker hadoop-daemon.sh start tasktracker hadoop-daemon.sh start datanode Cross Check the Web 6 UI to ensure the successful addition Run Balancer to 7 move the HDFS blocks to DataNodes 1 2 3 4 5
  • 29. www.edureka.co/hadoop-adminSlide 29 Providing High Availability to Cluster Task-5
  • 30. www.edureka.co/hadoop-adminSlide 30 Prior to Hadoop 2.0.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate machine. Achieve the High Availability in two different ways  HA using the Quorum Journal Manager (QJM) to share edit logs between the Active and Standby NameNodes.  HA using NFS for shared storage instead of the QJM High Availability (HA)
  • 31. www.edureka.co/hadoop-adminSlide 31 Slave NodeSlave NodeSlave Node Standby NodeActive Node Journal Nodes (Shared Edits) Failover Controller Standby Failover Controller Active Zookeeper Service Block Report & Heart beat Monitor status and health. Manage HA state HA Architecture Monitor status and health. Manage HA state Write Read
  • 32. www.edureka.co/hadoop-adminSlide 32 Demo On NameNode High Availability
  • 34. Questions www.edureka.co/hadoop-adminSlide 34 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions