Big Data – Apache Hadoop Administrator Training
Objective
This training aims to give participants a comprehensive understanding
of all the steps necessary to operate and maintain a Hadoop cluster, from
installation and configuration through load balancing and tuning.
Participants will learn the complete installation of a Hadoop cluster, understand
the basic and advanced concepts of MapReduce, and learn the best practices for Apache
Hadoop development as experienced by the developers and architects of core
Apache Hadoop. Through hands-on exercises, participants will cover the
following topics during the course:
1. The internals of MapReduce and HDFS and how to build a Hadoop
architecture.
2. Proper cluster configuration and deployment to integrate with systems
and hardware in the data centre.
3. How to load data into the cluster from dynamically generated files using
Flume and from an RDBMS using Sqoop.
4. Configuring the FairScheduler to provide service-level agreements for
multiple users of a cluster.
5. Discussing Kerberos-based security for your cluster.
6. Best practices for preparing and maintaining Apache Hadoop in
production.
7. Troubleshooting, diagnosing, tuning and solving Hadoop issues.
Note: The course consists of 20% theoretical discussion and 80% hands-on
practice.
Audience & Pre-Requisites
This course is designed for Systems Administrators and IT Managers who have
basic Linux experience. No prior knowledge of Apache Hadoop is required.
Duration: 30 hours
Course Outline
• Introduction
• The Case for Apache Hadoop
o A Brief History of Hadoop
o Core Hadoop Components
o Fundamental Concepts
• The Hadoop Distributed File System
o HDFS Features
o HDFS Design Assumptions
o Overview of HDFS Architecture
• MapReduce and YARN
o What Is MapReduce?
o Features of MapReduce
o Basic MapReduce Concepts
o Architectural Overview
o Hands-On Exercise
• An Overview of the Hadoop Ecosystem
o What is the Hadoop Ecosystem?
o Analysis Tools
o Data Storage and Retrieval Tools
• Overview of Cloudera Distributions of Hadoop
o What is CDH?
• Overview of Hortonworks Distributions of Hadoop
• Planning your Hadoop Cluster
o General Planning Considerations
o Choosing the Right Hardware
o Network Considerations
• Gen1 – Pseudo and 4-Node Cluster – Vanilla Hadoop
o Installation
o Configuration
o Performance Aspects
• Installing a 4-Node Cluster with NameNode (NN), Secondary NameNode (SNN), and JobTracker (JT) in EC2
• Hadoop Installation
o Deployment Types
o Installing Hadoop
o Basic Configuration Parameters
o Hands-On Exercise
• Advanced Configuration
o Advanced Parameters
o Configuring Rack Awareness
• Hadoop Security
o Why Hadoop Security Is Important
o Hadoop’s Security System Concepts
o What Kerberos Is and How It Works
• Gen2 Pseudo Cluster – Vanilla Cluster
o Installation of Hadoop
o Hadoop 2 Configuration
o Hadoop Federation Capability
• Configuring HA in Gen2
• Configuring Federation in Gen2
• Managing and Scheduling Jobs
o Managing Running Jobs
o Hands-On Exercise
o The Capacity Scheduler
• Cluster Maintenance
o Checking HDFS Status
o Hands-On Exercise
o Copying Data Between Clusters
o Adding and Removing Cluster Nodes [Node Maintenance]
o Rebalancing the Cluster
o Hands-On Exercise
o NameNode Metadata Backup
o Cluster Upgrading
o User Management
o Quota Management
• Cluster Monitoring and Troubleshooting
o General System Monitoring
o Managing Hadoop’s Log Files
o Using the NameNode and JobTracker Web UIs
o Hands-On Exercise
o Cluster Monitoring with Ganglia
o Common Troubleshooting Issues
o Benchmarking Your Cluster
• Installing and Managing Other Hadoop Projects
o Hive
o Pig
o Sqoop
• Working with Apache Ambari
o Installation of a 4-Node Cluster
o WebHDFS
o Security in Ambari
o Adding a New Host via Ambari
o Configuring the Capacity Scheduler
o Mounting HDFS
o HDFS Snapshots
