SlideShare une entreprise Scribd logo
1  sur  28
www.edureka.in/hadoop-admin
How It Works…
 LIVE classes
 Class recordings
 Module wise Quizzes and Practical Assignments
 24x7 on-demand technical support
 Deployment of different clusters
 Online certification exam
 Lifetime access to the Learning Management System

www.edureka.in/hadoop-admin
Course Topics
 Week 1
–
–
–

Understanding Big Data
Hadoop Components
Introduction to Hadoop 2.0

 Week 2
–
–
–

Hadoop 2.0
Hadoop Configuration
Hadoop Cluster Architecture

 Week 3
–
–
–

Different Hadoop Server Roles
Data processing flow
Cluster Network Configuration

 Week 4
–
–
–

Job Scheduling
Fair Scheduler
Monitoring a Hadoop Cluster

 Week 5
–
–
–

Securing your Hadoop Cluster
Kerberos and HDFS Federation
Backup and Recovery

 Week 6
–
–
–

Oozie and Hive Administration
HBase Architecture
HBase Administration
www.edureka.in/hadoop-admin
Topics for Today
 What is Big Data?
 Limitations of the existing solutions
 Solving the problem with Hadoop
 Introduction to Hadoop

 Hadoop Eco-System
 Hadoop Core Components
 MapReduce software framework
 Hadoop Cluster Administrator: Roles and Responsibilities
 Introduction to Hadoop 2.0

www.edureka.in/hadoop-admin
What Is Big Data?
 Lots of Data (Terabytes or Petabytes).
 Systems / Enterprises generate huge amount of data from Terabytes to and even Petabytes of
information.

A airline jet collects 10 terabytes of sensor data
for every 30 minutes of flying time.

NYSE generates about one terabyte of new trade data
per day to Perform stock trading analytics to determine
trends for optimal trades.
www.edureka.in/hadoop-admin
IBM’s Definition
 IBM’s definition – Big Data Characteristics
http://www-01.ibm.com/software/data/bigdata/

Characteristics Of Big Data

Volume

Velocity

Variety

12 Terabytes of
tweets created
each day

Scrutinizes 5 million
trade events created
each day to identify
potential fraud

Sensor
data, audio, video, cli
ck streams, log files
and more
www.edureka.in/hadoop-admin
Data Volume Is Growing Exponentially


Estimated Global Data Volume:
 2011: 1.8 ZB
 2015: 7.9 ZB



The world's information doubles every two years



Over the next 10 years:
 The number of servers worldwide will grow by 10x

 Amount of information managed by enterprise data
centers will grow by 50x
 Number of “files” enterprise data center handle will
grow by 75x

Source: http://www.emc.com/leadership/programs/digital-universe.htm,
which was based on the 2011 IDC Digital Universe Study

www.edureka.in/hadoop-admin
What Big Companies Have To Say…
“Analyzing Big Data sets will become a key basis for competition.”

McKinsey

“Leaders in every sector will have to grapple the implications of Big Data.”

“Big Data analytics are rapidly emerging as the preferred solution to business and
technology trends that are disrupting.”

Gartner

“Enterprises should not delay implementation of Big Data Analytics.”
“Use Hadoop to gain a competitive advantage over more risk-averse enterprises.”

Forrester
Research

“Prioritize Big Data projects that might benefit from Hadoop.”

www.edureka.in/hadoop-admin
Some Of the Hadoop Users

www.edureka.in/hadoop-admin
Hadoop Users – In Detail

http://wiki.apache.org/hadoop/PoweredBy

www.edureka.in/hadoop-admin
What Is Hadoop?
 Apache Hadoop is a framework that allows for the distributed processing of large data sets
across clusters of commodity computers using a simple programming model.
 It is an Open-source Data Management with scale-out storage & distributed processing.

www.edureka.in/hadoop-admin
Hadoop Key Characteristics

Reliable

Flexible

Hadoop

Features

Economical

Scalable

www.edureka.in/hadoop-admin
Hadoop History
Fastest sort of a TB, 3.5mins
over 910 nodes
Doug Cutting adds DFS &
MapReduce support to Nutch

Doug Cutting & Mike Cafarella
started working on Nutch

2002

2003

Fastest sort of a TB,
62secs over 1,460 nodes
Sorted a PB in 16.25hours
Over 3.658 nodes

NY Times converts 4TB of
Image archives over 100 EC2s

2004

Google publishes GFS &
MapReduce papers

2005

2006

2007

2008

Yahoo! hires Cutting,
Hadoop spins out of Nutch
Facebook launches Hive:
SQL Support for Hadoop

2009

Founded

Doug Cutting
Joins Cloudera

Hadoop Summit 2009,
750 attendees

www.edureka.in/hadoop-admin
Hadoop 1.x Eco-System
Apache Oozie (Workflow)
Hive

Pig Latin

DW System

Data Analysis

Mahout
Machine Learning

MapReduce Framework

HBase
HDFS (Hadoop Distributed File System)
Flume

Sqoop
Import Or Export

Unstructured or
Semi-Structured data

Structured Data
www.edureka.in/hadoop-admin
Hadoop 1.x Core Components
Hadoop is a system for large scale data processing.
It has two main components:
 HDFS – Hadoop Distributed File System (Storage)

 Distributed across “nodes”
 Natively redundant
 NameNode tracks locations.

 Additional Administration
Tools:
 Filesystem utilities
 Job scheduling and monitoring
 Web UI

 MapReduce (Processing)
 Splits a task across processors
 “near” the data & assembles results
 Self-Healing, High Bandwidth
 Clustered storage
 JobTracker manages the TaskTrackers
www.edureka.in/hadoop-admin
Hadoop 1.x Core Components (Contd.)

MapReduce
Engine

Job Tracker

Task
Tracker

Task
Tracker

Task
Tracker

Task
Tracker

HDFS
Cluster

Admin Node
Name node

Data Node

Data Node

Data Node

Data Node

www.edureka.in/hadoop-admin
Name Node and Data Nodes
 NameNode:
 master of the system
 maintains and manages the blocks which are present on the
DataNodes

 DataNodes:
 slaves which are deployed on each machine and provide the
actual storage
 responsible for serving read and write requests for the clients

www.edureka.in/hadoop-admin
Secondary Name Node
metadata

 Secondary NameNode:

NameNode

Single Point
Failure

 Not a hot standby for the NameNode
 Connects to NameNode every hour*

 Housekeeping, backup of NemeNode metadata
 Saved metadata can build a failed NameNode

Secondary
NameNode

You give me
metadata every
hour, I will make
it secure

metadata
www.edureka.in/hadoop-admin
What Is MapReduce?
 MapReduce is a programming model
 It is neither platform- nor language-specific
 Record-oriented data processing (key and value)
 Task distributed across multiple nodes

Key

Value

 Where possible, each node processes data
stored on that node

 Consists of two phases

MapReduce

 Map
 Reduce

www.edureka.in/hadoop-admin
What Is MapReduce? (Contd.)
Process can be considered as being similar to a Unix pipeline

cat /my/log | grep '.html' | sort | uniq –c > /my/outfile

MAP

SORT

REDUCE

www.edureka.in/hadoop-admin
Hadoop 1.x – In Summary
Client

HDFS
Name Node

Data Node

Data Node

Map Reduce
Secondary
Name Node

Job Tracker

Task Tracker
Map

Reduce

Task Tracker
Map

Reduce

….
Data
Blocks
www.edureka.in/hadoop-admin
Poll Questions

www.edureka.in/hadoop-admin
Hadoop Cluster Administrator
Roles and Responsibilities
 Deploying the cluster
 Performance and availability of the cluster
 Job scheduling and Management
 Upgrades
 Backup and Recovery
 Monitoring the cluster
 Troubleshooting

www.edureka.in/hadoop-admin
Hadoop 1.0 Vs. Hadoop 2.0

Property

Hadoop 1.x

Hadoop 2.x

NameNodes

1

Many

High Availability

Not present

Highly Available

Processing Control

JobTracker, Task Tracker

Resource Manager, Node
Manager, App Master

www.edureka.in/hadoop-admin
MRv1 Vs. MRv2
Hadoop 1.0

Hadoop 2.0
MapReduce
(data processing)

MapReduce
(Data Processing)
Job Tracker

HDFS
(Data Storage)

 Problems with Resource utilization
 Slots only for Map and Reduce

Others
(data Processing)

Data Node
YARN

(Cluster Resource Management)
Scheduler

Applications
Manager (AsM)

HDFS
(Data Storage)
 Provides a Cluster Level Resource Manager
 Application Level Resource Management (Node
Manager??)
 Provides slots for Jobs other than Map and Reduce
www.edureka.in/hadoop-admin
Hadoop 2.0 - Architecture
HDFS
All name space edits
logged to shared NFS
storage; single writer
(fencing)
Secondary
Name Node

Shared
edit logs

YARN

Read edit logs and applies
to its own namespace
Standby
NameNode

Active
NameNode

Data Node

Client

Data Node

Resource Manager

Node Manager
Container

Node Manager
Container

App
Master

Node Manager
Container

App
Master

Data Node

Node Manager
Container

App
Master

Data Node

App
Master

www.edureka.in/hadoop-admin
Assignments
 Attempt the following Assignments using the documents present in the LMS:
 Apache Hadoop 1.0 Installation on Ubuntu in Pseudo-Distributed Mode
 Execute Linux Basic Commands
 Execute HDFS Hands On
 Cloudera CDH3 and CDH4 Quick VM installation on your local machine

www.edureka.in/hadoop-admin
Thank You
See You in Class Next Week

Contenu connexe

Tendances

Hadoop- A Highly Available and Secure Enterprise DataWarehousing solution
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solutionHadoop- A Highly Available and Secure Enterprise DataWarehousing solution
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solutionEdureka!
 
Hadoop MapReduce Framework
Hadoop MapReduce FrameworkHadoop MapReduce Framework
Hadoop MapReduce FrameworkEdureka!
 
Hadoop for Data Warehousing professionals
Hadoop for Data Warehousing professionalsHadoop for Data Warehousing professionals
Hadoop for Data Warehousing professionalsEdureka!
 
Top 5 Hadoop Admin Tasks
Top 5 Hadoop Admin TasksTop 5 Hadoop Admin Tasks
Top 5 Hadoop Admin TasksEdureka!
 
Hadoop admin training
Hadoop admin trainingHadoop admin training
Hadoop admin trainingArun Kumar
 
Webinar: Big Data & Hadoop - When not to use Hadoop
Webinar: Big Data & Hadoop - When not to use HadoopWebinar: Big Data & Hadoop - When not to use Hadoop
Webinar: Big Data & Hadoop - When not to use HadoopEdureka!
 
Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdfEdureka!
 
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...Edureka!
 
Big Data & Hadoop Tutorial
Big Data & Hadoop TutorialBig Data & Hadoop Tutorial
Big Data & Hadoop TutorialEdureka!
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and HadoopEdureka!
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and HadoopEdureka!
 
Learn Hadoop Administration
Learn Hadoop AdministrationLearn Hadoop Administration
Learn Hadoop AdministrationEdureka!
 
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Edureka!
 
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Edureka!
 
Introduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -IIntroduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -IEdureka!
 
Setting High Availability in Hadoop Cluster
Setting High Availability in Hadoop ClusterSetting High Availability in Hadoop Cluster
Setting High Availability in Hadoop ClusterEdureka!
 
Learn Big Data & Hadoop
Learn Big Data & Hadoop Learn Big Data & Hadoop
Learn Big Data & Hadoop Edureka!
 
Big Data and Hadoop Introduction
 Big Data and Hadoop Introduction Big Data and Hadoop Introduction
Big Data and Hadoop IntroductionDzung Nguyen
 
Hadoop online training
Hadoop online training Hadoop online training
Hadoop online training Keylabs
 

Tendances (20)

Hadoop- A Highly Available and Secure Enterprise DataWarehousing solution
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solutionHadoop- A Highly Available and Secure Enterprise DataWarehousing solution
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solution
 
Hadoop MapReduce Framework
Hadoop MapReduce FrameworkHadoop MapReduce Framework
Hadoop MapReduce Framework
 
Hadoop for Data Warehousing professionals
Hadoop for Data Warehousing professionalsHadoop for Data Warehousing professionals
Hadoop for Data Warehousing professionals
 
Top 5 Hadoop Admin Tasks
Top 5 Hadoop Admin TasksTop 5 Hadoop Admin Tasks
Top 5 Hadoop Admin Tasks
 
Hadoop admin training
Hadoop admin trainingHadoop admin training
Hadoop admin training
 
Webinar: Big Data & Hadoop - When not to use Hadoop
Webinar: Big Data & Hadoop - When not to use HadoopWebinar: Big Data & Hadoop - When not to use Hadoop
Webinar: Big Data & Hadoop - When not to use Hadoop
 
Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdf
 
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
 
Big Data & Hadoop Tutorial
Big Data & Hadoop TutorialBig Data & Hadoop Tutorial
Big Data & Hadoop Tutorial
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
 
Learn Hadoop Administration
Learn Hadoop AdministrationLearn Hadoop Administration
Learn Hadoop Administration
 
Hadoop and Big Data
Hadoop and Big DataHadoop and Big Data
Hadoop and Big Data
 
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
 
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
 
Introduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -IIntroduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -I
 
Setting High Availability in Hadoop Cluster
Setting High Availability in Hadoop ClusterSetting High Availability in Hadoop Cluster
Setting High Availability in Hadoop Cluster
 
Learn Big Data & Hadoop
Learn Big Data & Hadoop Learn Big Data & Hadoop
Learn Big Data & Hadoop
 
Big Data and Hadoop Introduction
 Big Data and Hadoop Introduction Big Data and Hadoop Introduction
Big Data and Hadoop Introduction
 
Hadoop online training
Hadoop online training Hadoop online training
Hadoop online training
 

Similaire à Hadoop Adminstration with Latest Release (2.0)

Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Hw09   Rethinking The Data Warehouse With Hadoop And HiveHw09   Rethinking The Data Warehouse With Hadoop And Hive
Hw09 Rethinking The Data Warehouse With Hadoop And HiveCloudera, Inc.
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchHortonworks
 
Bigdata and hadoop
Bigdata and hadoopBigdata and hadoop
Bigdata and hadoopAditi Yadav
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Ranjith Sekar
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopJosh Patterson
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop DeveloperEdureka!
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and HadoopFlavio Vit
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony NguyenThanh Nguyen
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14John Sing
 
THE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATHE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATarak Tar
 
THE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATHE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATarak Tar
 
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune amrutupre
 
Hadoop Big Data A big picture
Hadoop Big Data A big pictureHadoop Big Data A big picture
Hadoop Big Data A big pictureJ S Jodha
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1Thanh Nguyen
 

Similaire à Hadoop Adminstration with Latest Release (2.0) (20)

Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Hw09   Rethinking The Data Warehouse With Hadoop And HiveHw09   Rethinking The Data Warehouse With Hadoop And Hive
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and Search
 
Bigdata and hadoop
Bigdata and hadoopBigdata and hadoop
Bigdata and hadoop
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on Hadoop
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop Developer
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony Nguyen
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 
Hadoop in action
Hadoop in actionHadoop in action
Hadoop in action
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14
 
Unit IV.pdf
Unit IV.pdfUnit IV.pdf
Unit IV.pdf
 
Hadoop content
Hadoop contentHadoop content
Hadoop content
 
THE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATHE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATA
 
THE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATHE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATA
 
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
 
Hadoop Big Data A big picture
Hadoop Big Data A big pictureHadoop Big Data A big picture
Hadoop Big Data A big picture
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1
 

Plus de Edureka!

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaEdureka!
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaEdureka!
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaEdureka!
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaEdureka!
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaEdureka!
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaEdureka!
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaEdureka!
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaEdureka!
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaEdureka!
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaEdureka!
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | EdurekaEdureka!
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEdureka!
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEdureka!
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaEdureka!
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaEdureka!
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaEdureka!
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Edureka!
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaEdureka!
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaEdureka!
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | EdurekaEdureka!
 

Plus de Edureka! (20)

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | Edureka
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | Edureka
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | Edureka
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | Edureka
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | Edureka
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | Edureka
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | Edureka
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| Edureka
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | Edureka
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | Edureka
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | Edureka
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | Edureka
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | Edureka
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | Edureka
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | Edureka
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | Edureka
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | Edureka
 

Dernier

Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docxPoojaSen20
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIFood Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIShubhangi Sonawane
 
Role Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptxRole Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptxNikitaBankoti2
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Shubhangi Sonawane
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesShubhangi Sonawane
 

Dernier (20)

Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIFood Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
 
Role Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptxRole Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptx
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
 

Hadoop Adminstration with Latest Release (2.0)

  • 2. How It Works…  LIVE classes  Class recordings  Module wise Quizzes and Practical Assignments  24x7 on-demand technical support  Deployment of different clusters  Online certification exam  Lifetime access to the Learning Management System www.edureka.in/hadoop-admin
  • 3. Course Topics  Week 1 – – – Understanding Big Data Hadoop Components Introduction to Hadoop 2.0  Week 2 – – – Hadoop 2.0 Hadoop Configuration Hadoop Cluster Architecture  Week 3 – – – Different Hadoop Server Roles Data processing flow Cluster Network Configuration  Week 4 – – – Job Scheduling Fair Scheduler Monitoring a Hadoop Cluster  Week 5 – – – Securing your Hadoop Cluster Kerberos and HDFS Federation Backup and Recovery  Week 6 – – – Oozie and Hive Administration HBase Architecture HBase Administration www.edureka.in/hadoop-admin
  • 4. Topics for Today  What is Big Data?  Limitations of the existing solutions  Solving the problem with Hadoop  Introduction to Hadoop  Hadoop Eco-System  Hadoop Core Components  MapReduce software framework  Hadoop Cluster Administrator: Roles and Responsibilities  Introduction to Hadoop 2.0 www.edureka.in/hadoop-admin
  • 5. What Is Big Data?  Lots of Data (Terabytes or Petabytes).  Systems / Enterprises generate huge amount of data from Terabytes to and even Petabytes of information. A airline jet collects 10 terabytes of sensor data for every 30 minutes of flying time. NYSE generates about one terabyte of new trade data per day to Perform stock trading analytics to determine trends for optimal trades. www.edureka.in/hadoop-admin
  • 6. IBM’s Definition  IBM’s definition – Big Data Characteristics http://www-01.ibm.com/software/data/bigdata/ Characteristics Of Big Data Volume Velocity Variety 12 Terabytes of tweets created each day Scrutinizes 5 million trade events created each day to identify potential fraud Sensor data, audio, video, cli ck streams, log files and more www.edureka.in/hadoop-admin
  • 7. Data Volume Is Growing Exponentially  Estimated Global Data Volume:  2011: 1.8 ZB  2015: 7.9 ZB  The world's information doubles every two years  Over the next 10 years:  The number of servers worldwide will grow by 10x  Amount of information managed by enterprise data centers will grow by 50x  Number of “files” enterprise data center handle will grow by 75x Source: http://www.emc.com/leadership/programs/digital-universe.htm, which was based on the 2011 IDC Digital Universe Study www.edureka.in/hadoop-admin
  • 8. What Big Companies Have To Say… “Analyzing Big Data sets will become a key basis for competition.” McKinsey “Leaders in every sector will have to grapple the implications of Big Data.” “Big Data analytics are rapidly emerging as the preferred solution to business and technology trends that are disrupting.” Gartner “Enterprises should not delay implementation of Big Data Analytics.” “Use Hadoop to gain a competitive advantage over more risk-averse enterprises.” Forrester Research “Prioritize Big Data projects that might benefit from Hadoop.” www.edureka.in/hadoop-admin
  • 9. Some Of the Hadoop Users www.edureka.in/hadoop-admin
  • 10. Hadoop Users – In Detail http://wiki.apache.org/hadoop/PoweredBy www.edureka.in/hadoop-admin
  • 11. What Is Hadoop?  Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.  It is an Open-source Data Management with scale-out storage & distributed processing. www.edureka.in/hadoop-admin
  • 13. Hadoop History Fastest sort of a TB, 3.5mins over 910 nodes Doug Cutting adds DFS & MapReduce support to Nutch Doug Cutting & Mike Cafarella started working on Nutch 2002 2003 Fastest sort of a TB, 62secs over 1,460 nodes Sorted a PB in 16.25hours Over 3.658 nodes NY Times converts 4TB of Image archives over 100 EC2s 2004 Google publishes GFS & MapReduce papers 2005 2006 2007 2008 Yahoo! hires Cutting, Hadoop spins out of Nutch Facebook launches Hive: SQL Support for Hadoop 2009 Founded Doug Cutting Joins Cloudera Hadoop Summit 2009, 750 attendees www.edureka.in/hadoop-admin
  • 14. Hadoop 1.x Eco-System Apache Oozie (Workflow) Hive Pig Latin DW System Data Analysis Mahout Machine Learning MapReduce Framework HBase HDFS (Hadoop Distributed File System) Flume Sqoop Import Or Export Unstructured or Semi-Structured data Structured Data www.edureka.in/hadoop-admin
  • 15. Hadoop 1.x Core Components Hadoop is a system for large scale data processing. It has two main components:  HDFS – Hadoop Distributed File System (Storage)  Distributed across “nodes”  Natively redundant  NameNode tracks locations.  Additional Administration Tools:  Filesystem utilities  Job scheduling and monitoring  Web UI  MapReduce (Processing)  Splits a task across processors  “near” the data & assembles results  Self-Healing, High Bandwidth  Clustered storage  JobTracker manages the TaskTrackers www.edureka.in/hadoop-admin
  • 16. Hadoop 1.x Core Components (Contd.) MapReduce Engine Job Tracker Task Tracker Task Tracker Task Tracker Task Tracker HDFS Cluster Admin Node Name node Data Node Data Node Data Node Data Node www.edureka.in/hadoop-admin
  • 17. Name Node and Data Nodes  NameNode:  master of the system  maintains and manages the blocks which are present on the DataNodes  DataNodes:  slaves which are deployed on each machine and provide the actual storage  responsible for serving read and write requests for the clients www.edureka.in/hadoop-admin
  • 18. Secondary Name Node metadata  Secondary NameNode: NameNode Single Point Failure  Not a hot standby for the NameNode  Connects to NameNode every hour*  Housekeeping, backup of NemeNode metadata  Saved metadata can build a failed NameNode Secondary NameNode You give me metadata every hour, I will make it secure metadata www.edureka.in/hadoop-admin
  • 19. What Is MapReduce?  MapReduce is a programming model  It is neither platform- nor language-specific  Record-oriented data processing (key and value)  Task distributed across multiple nodes Key Value  Where possible, each node processes data stored on that node  Consists of two phases MapReduce  Map  Reduce www.edureka.in/hadoop-admin
  • 20. What Is MapReduce? (Contd.) Process can be considered as being similar to a Unix pipeline cat /my/log | grep '.html' | sort | uniq –c > /my/outfile MAP SORT REDUCE www.edureka.in/hadoop-admin
  • 21. Hadoop 1.x – In Summary Client HDFS Name Node Data Node Data Node Map Reduce Secondary Name Node Job Tracker Task Tracker Map Reduce Task Tracker Map Reduce …. Data Blocks www.edureka.in/hadoop-admin
  • 23. Hadoop Cluster Administrator Roles and Responsibilities  Deploying the cluster  Performance and availability of the cluster  Job scheduling and Management  Upgrades  Backup and Recovery  Monitoring the cluster  Troubleshooting www.edureka.in/hadoop-admin
  • 24. Hadoop 1.0 Vs. Hadoop 2.0 Property Hadoop 1.x Hadoop 2.x NameNodes 1 Many High Availability Not present Highly Available Processing Control JobTracker, Task Tracker Resource Manager, Node Manager, App Master www.edureka.in/hadoop-admin
  • 25. MRv1 Vs. MRv2 Hadoop 1.0 Hadoop 2.0 MapReduce (data processing) MapReduce (Data Processing) Job Tracker HDFS (Data Storage)  Problems with Resource utilization  Slots only for Map and Reduce Others (data Processing) Data Node YARN (Cluster Resource Management) Scheduler Applications Manager (AsM) HDFS (Data Storage)  Provides a Cluster Level Resource Manager  Application Level Resource Management (Node Manager??)  Provides slots for Jobs other than Map and Reduce www.edureka.in/hadoop-admin
  • 26. Hadoop 2.0 - Architecture HDFS All name space edits logged to shared NFS storage; single writer (fencing) Secondary Name Node Shared edit logs YARN Read edit logs and applies to its own namespace Standby NameNode Active NameNode Data Node Client Data Node Resource Manager Node Manager Container Node Manager Container App Master Node Manager Container App Master Data Node Node Manager Container App Master Data Node App Master www.edureka.in/hadoop-admin
  • 27. Assignments  Attempt the following Assignments using the documents present in the LMS:  Apache Hadoop 1.0 Installation on Ubuntu in Pseudo-Distributed Mode  Execute Linux Basic Commands  Execute HDFS Hands On  Cloudera CDH3 and CDH4 Quick VM installation on your local machine www.edureka.in/hadoop-admin
  • 28. Thank You See You in Class Next Week

Notes de l'éditeur

  1. NameNode Single Point of Failure
  2. Any production cluster larger than 20-30 nodes requires a full time admin. This admin is responsible for:- the performance and availability of the cluster, the data it contains, and the jobs that run there.- deployment, upgrades, troubleshooting, configuration, tuning, job management, installing tools, architecting processes, monitoring, backups, recovery, etc. There is not a single organization with production Hadoop cluster that didn’t have a full-time admin. The fact that Cloudera is offering Hadoop Administrator Certification and that O’Reilly is selling a book called “Hadoop Operations” demonstrates the importance of a Hadoop Cluster Administrator role in industry.