TIB Academy,
5/3, Varathur Road, Kundalahalli Gate,
Bangalore-560066.
+91-9513332301 / 02 www.tibacademy.in
 Open source software framework designed
for storage and processing of large scale
data on clusters of commodity hardware
 Created by Doug Cutting and Mike Cafarella
in 2005.
 Cutting named the program after his son’s
toy elephant.
 Data-intensive text processing
 Assembly of large genomes
 Graph mining
 Machine learning and data mining
 Large scale social network analysis
• Hadoop Common: contains libraries and other modules
• HDFS: Hadoop Distributed File System
• Hadoop YARN: Yet Another Resource Negotiator
• Hadoop MapReduce: a programming model for large-scale data processing
 What were the limitations of earlier large-
scale computing?
 What requirements should an alternative
approach have?
 How does Hadoop address those
requirements?
 Historically computation was processor-
bound
› Data volume has been relatively small
› Complicated computations are performed on that
data
 Advances in computer technology have
historically centered around improving the
power of a single machine
 Moore’s Law
› The number of transistors on a dense integrated
circuit doubles approximately every two years
 Single-core computing can’t scale with
current computing needs
 Power consumption limits the speed
increase we get from transistor density
 Allows developers
to use multiple
machines for a
single task
 Programming on a distributed system is
much more complex
› Synchronizing data exchanges
› Managing a finite bandwidth
› Controlling computation timing is complicated
“You know you have a distributed system when
the crash of a computer you’ve never
heard of stops you from getting any work
done.” –Leslie Lamport
 Distributed systems must be designed with
the expectation of failure
 Typically divided into Data Nodes and
Compute Nodes
 At compute time, data is copied to the
Compute Nodes
 Fine for relatively small amounts of data
 Modern systems deal with far more data
than was gathered in the past
 Facebook
› 500 TB per day
 Yahoo
› Over 170 PB
 eBay
› Over 6 PB
 Getting the data to the processors becomes
the bottleneck
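A rough back-of-the-envelope sketch can make this bottleneck concrete. The per-stream throughput (100 MB/s) and the number of parallel streams (1000) are assumptions chosen for illustration; only the 500 TB figure comes from the slide above.

```python
# Back-of-the-envelope estimate; throughput and stream count are assumed values.
TB = 10**12  # bytes in a decimal terabyte
MB = 10**6   # bytes in a decimal megabyte


def transfer_hours(data_bytes, bytes_per_second, parallel_streams=1):
    """Hours needed to move data_bytes over equally fast parallel streams."""
    seconds = data_bytes / (bytes_per_second * parallel_streams)
    return seconds / 3600


daily_volume = 500 * TB   # the Facebook figure quoted above
throughput = 100 * MB     # assumed: 100 MB/s per disk or network stream

single = transfer_hours(daily_volume, throughput)
spread = transfer_hours(daily_volume, throughput, parallel_streams=1000)
print(f"one stream: {single:.0f} h, 1000 streams: {spread:.2f} h")
```

Under these assumptions, pushing a day of data through one channel takes weeks, while reading it in place on many machines at once takes on the order of an hour.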
 Must support partial
failure
 Must be scalable
 Failure of a single component must not cause
the failure of the entire system, only a
degradation of the application performance
 Failure should not
result in the loss of
any data
 If a component fails, it should be able to
recover without restarting the entire system
 Component failure or recovery during a job
must not affect the final output
 Increasing resources should increase load
capacity
 Increasing the load on the system should
result in a graceful decline in performance
for all jobs
› Not system failure
 Based on work done by Google in the early
2000s
› “The Google File System” in 2003
› “MapReduce: Simplified Data Processing on
Large Clusters” in 2004
 The core idea was to distribute the data as it
is initially stored
› Each node can then perform computation on the
data it stores without moving the data for the
initial processing
 Applications are written in a high-level
programming language
› No network programming or temporal dependency
 Nodes should communicate as little as possible
› A “shared nothing” architecture
 Data is spread among the machines in advance
› Perform computation where the data is already
stored as often as possible
 When data is loaded onto the system it is
divided into blocks
› Typically 64MB or 128MB
 Tasks are divided into two phases
› Map tasks which are done on small portions of data
where the data is stored
› Reduce tasks which combine data to produce the
final output
 A master program allocates work to individual
nodes
 Failures are detected by the master program
which reassigns the work to a different node
 Restarting a task does not affect the nodes
working on other portions of the data
 If a failed node restarts, it is added back to the
system and assigned new tasks
 The master can redundantly execute the same
task to avoid slow running nodes
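The failure handling described above can be sketched as a small simulation. This is a toy model under stated assumptions, not Hadoop's scheduler: `run_job`, the node names, and the round-robin assignment are all hypothetical.

```python
from collections import deque


def run_job(tasks, nodes, failing_nodes):
    """Assign tasks round-robin; requeue a task when its node has failed.

    tasks: list of task ids
    nodes: list of node names (hypothetical)
    failing_nodes: set of nodes that fail any task given to them
    Returns a dict mapping each task to the node that completed it.
    """
    healthy = list(nodes)
    pending = deque(tasks)
    completed = {}
    turn = 0
    while pending:
        if not healthy:
            raise RuntimeError("no healthy nodes left")
        task = pending.popleft()
        node = healthy[turn % len(healthy)]
        turn += 1
        if node in failing_nodes:
            # The master notices the failure, stops using the node,
            # and requeues only the affected task.
            healthy.remove(node)
            pending.append(task)
        else:
            completed[task] = node
    return completed


result = run_job(
    ["map-0", "map-1", "map-2", "map-3"],
    ["node-a", "node-b", "node-c"],
    failing_nodes={"node-b"},
)
print(result)
```

Only the task that landed on the failed node is rerun; tasks already completed elsewhere are left untouched.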
 Responsible for storing data on the cluster
 Data files are split into blocks and distributed
across the nodes in the cluster
 Each block is replicated multiple times
 HDFS is a file system written in Java, based
on Google’s GFS
 Provides redundant storage for massive
amounts of data
 HDFS works best with a smaller number of
large files
› Millions as opposed to billions of files
› Typically 100MB or more per file
 Files in HDFS are write once
 Optimized for streaming reads of large files
and not random reads
 Files are split into blocks
 Blocks are split across many machines at load
time
› Different blocks from the same file will be stored on
different machines
 Blocks are replicated across multiple machines
 The NameNode keeps track of which blocks
make up a file and where they are stored
 Default replication is 3-fold
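The block splitting and replication above can be sketched as follows. The round-robin placement and the DataNode names are assumptions made for illustration; real HDFS placement also takes rack topology into account.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, one of the typical block sizes
REPLICATION = 3                 # the default replication factor


def place_blocks(file_size, data_nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return {block_index: [node, ...]} using simple round-robin placement."""
    if replication > len(data_nodes):
        raise ValueError("need at least as many DataNodes as replicas")
    num_blocks = math.ceil(file_size / block_size)
    placement = {}
    for b in range(num_blocks):
        # Each replica of a block goes to a different node.
        placement[b] = [data_nodes[(b + r) % len(data_nodes)] for r in range(replication)]
    return placement


nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]      # hypothetical DataNodes
block_map = place_blocks(1_000_000_000, nodes)   # a 1 GB file (decimal)
for block, replicas in block_map.items():
    print(block, replicas)
```

The resulting mapping from blocks to nodes is the kind of metadata the NameNode keeps track of.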
 When a client wants to retrieve data
› Communicates with the NameNode to determine
which blocks make up a file and on which data
nodes those blocks are stored
› Then communicates directly with the data nodes
to read the data
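The two-step read path can be sketched with toy classes. The class and method names below are hypothetical stand-ins, not the HDFS client API; the point is that the NameNode serves only metadata while the bytes come from the DataNodes.

```python
class NameNode:
    """Holds only metadata: which blocks make up a file and where they live."""

    def __init__(self):
        self.file_blocks = {}      # path -> ordered list of block ids
        self.block_locations = {}  # block id -> list of DataNode names

    def get_block_locations(self, path):
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


class DataNode:
    """Stores the actual block contents."""

    def __init__(self):
        self.blocks = {}  # block id -> bytes

    def read_block(self, block_id):
        return self.blocks[block_id]


def read_file(path, name_node, data_nodes):
    """Ask the NameNode for locations, then read each block from a live replica."""
    data = b""
    for block_id, locations in name_node.get_block_locations(path):
        for node_name in locations:
            node = data_nodes.get(node_name)  # None models an unreachable node
            if node is not None and block_id in node.blocks:
                data += node.read_block(block_id)
                break
        else:
            raise IOError(f"no live replica for block {block_id}")
    return data


nn = NameNode()
nn.file_blocks["/logs/a.txt"] = ["blk_1", "blk_2"]
nn.block_locations = {"blk_1": ["dn1", "dn2"], "blk_2": ["dn2", "dn3"]}

dns = {"dn2": DataNode(), "dn3": DataNode()}  # dn1 is down in this scenario
dns["dn2"].blocks = {"blk_1": b"hello ", "blk_2": b"world"}
dns["dn3"].blocks = {"blk_2": b"world"}

content = read_file("/logs/a.txt", nn, dns)
print(content)
```

Because every block has more than one replica, the read still succeeds when one DataNode is unavailable.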
 A method for distributing computation across
multiple nodes
 Each node processes the data that is stored at
that node
 Consists of two main phases
› Map
› Reduce
 Automatic parallelization and distribution
 Fault-Tolerance
 Provides a clean abstraction for
programmers to use
 Reads data as key/value pairs
› The key is often discarded
 Outputs zero or more key/value pairs
 Output from the mapper is sorted by key
 All values with the same key are guaranteed
to go to the same machine
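One common way to guarantee that equal keys reach the same machine is to hash the key modulo the number of reducers. The sketch below assumes that approach; `partition` and `shuffle` are hypothetical helper names, and `zlib.crc32` stands in for a key hash because it is stable across Python runs.

```python
import zlib


def partition(key, num_reducers):
    """Pick a reducer index for a key; equal keys always get the same index."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapper_output, num_reducers):
    """Route (key, value) pairs to reducers, then sort each reducer's input by key."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in mapper_output:
        buckets[partition(key, num_reducers)].append((key, value))
    return [sorted(bucket, key=lambda kv: kv[0]) for bucket in buckets]


pairs = [("hadoop", 1), ("hdfs", 1), ("hadoop", 1), ("yarn", 1), ("hdfs", 1)]
buckets = shuffle(pairs, num_reducers=2)
for index, bucket in enumerate(buckets):
    print(index, bucket)
```

Which reducer a given key lands on is arbitrary; what matters is that the choice depends only on the key.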
 Called once for each unique key
 Gets a list of all values associated with a key
as input
 The reducer outputs zero or more final
key/value pairs
› Usually just one output per input key
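The mapper and reducer contracts described above can be illustrated with the classic word count, run here in a single process. This is a minimal sketch of the programming model, not Hadoop's Java API; `run_mapreduce` is a hypothetical driver that imitates the map, shuffle-and-sort, and reduce steps.

```python
from collections import defaultdict


def mapper(key, line):
    """Input key is the line's offset (discarded); emit (word, 1) per word."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Called once per unique key with all of its values."""
    yield word, sum(counts)


def run_mapreduce(lines):
    # Map phase: each input record yields zero or more key/value pairs.
    intermediate = []
    offset = 0
    for line in lines:
        intermediate.extend(mapper(offset, line))
        offset += len(line) + 1

    # Shuffle and sort: group all values by key.
    grouped = defaultdict(list)
    for key, value in intermediate:
        grouped[key].append(value)

    # Reduce phase: one call per unique key, in sorted key order.
    output = []
    for key in sorted(grouped):
        output.extend(reducer(key, grouped[key]))
    return output


word_counts = run_mapreduce(["Hadoop stores data", "Hadoop processes data"])
print(word_counts)
```

On a real cluster the map calls run on the nodes that hold the input blocks, and the grouping step moves data across the network.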
 NameNode
› Holds the metadata for the HDFS
 Secondary NameNode
› Performs housekeeping functions for the NameNode
 DataNode
› Stores the actual HDFS data blocks
 JobTracker
› Manages MapReduce jobs
 TaskTracker
› Monitors individual Map and Reduce tasks
 Stores the HDFS file system information in an
fsimage file
 Updates to the file system (add/remove blocks)
do not change the fsimage file
› They are instead written to a log file
 When starting, the NameNode loads the fsimage
file and then applies the changes in the log file
 NOT a backup for the NameNode
 Periodically reads the log file and applies the
changes to the fsimage file, bringing it up to
date
 Allows the NameNode to restart faster when
required
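The relationship between the fsimage, the log file, and the periodic merge can be sketched with plain dictionaries. The operation names and record layout are assumptions for illustration and do not reflect the on-disk formats Hadoop uses.

```python
def apply_edits(fsimage, edit_log):
    """Return a new namespace with the logged operations applied in order."""
    image = {path: list(blocks) for path, blocks in fsimage.items()}
    for op, path, block in edit_log:
        if op == "add_block":
            image.setdefault(path, []).append(block)
        elif op == "remove_block":
            image[path].remove(block)
        else:
            raise ValueError(f"unknown operation: {op}")
    return image


def checkpoint(fsimage, edit_log):
    """Merge the log into the image and start over with an empty log."""
    return apply_edits(fsimage, edit_log), []


fsimage = {"/data/a.csv": ["blk_1"]}
edit_log = [
    ("add_block", "/data/a.csv", "blk_2"),
    ("add_block", "/data/b.csv", "blk_3"),
    ("remove_block", "/data/a.csv", "blk_1"),
]

# At startup the NameNode replays the whole log on top of the stored image.
startup_view = apply_edits(fsimage, edit_log)

# After a checkpoint the same state is reached with nothing left to replay.
new_image, new_log = checkpoint(fsimage, edit_log)
print(startup_view, new_log)
```

The longer the log grows between merges, the more work a restart has to do, which is why the periodic merge shortens restart time.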
 JobTracker
› Determines the execution plan for the job
› Assigns individual tasks
 TaskTracker
› Keeps track of the performance of an individual
mapper or reducer
 MapReduce is very powerful, but can be
awkward to master
 These tools allow programmers who are
familiar with other programming styles to
take advantage of the power of MapReduce
 Hive
› Hadoop processing with SQL
 Pig
› Hadoop processing with scripting
 Cascading
› Pipe and Filter processing model
 HBase
› Database model built on top of Hadoop
 Flume
› Designed for large scale data movement

Hadoop tutorial for beginners-tibacademy.in

  • 1. TIB Academy, 5/3, Varathur Road, Kundalahalli Gate, Bangalore-560066. +91-9513332301 / 02 www.tibacademy.in
  • 2.
      • Open-source software framework designed for the storage and processing of large-scale data on clusters of commodity hardware
      • Created by Doug Cutting and Mike Cafarella in 2005
      • Cutting named the program after his son’s toy elephant
  • 3.
      • Data-intensive text processing
      • Assembly of large genomes
      • Graph mining
      • Machine learning and data mining
      • Large-scale social network analysis
  • 4.
  • 5.
      • Hadoop Common
          › Contains libraries and other modules
      • HDFS
          › Hadoop Distributed File System
      • Hadoop YARN
          › Yet Another Resource Negotiator
      • Hadoop MapReduce
          › A programming model for large-scale data processing
  • 6.
  • 7.
      • What were the limitations of earlier large-scale computing?
      • What requirements should an alternative approach have?
      • How does Hadoop address those requirements?
  • 8.
      • Historically, computation was processor-bound
          › Data volume has been relatively small
          › Complicated computations are performed on that data
      • Advances in computer technology have historically centered on improving the power of a single machine
  • 9.
  • 10.
      • Moore’s Law
          › The number of transistors on a dense integrated circuit doubles every two years
      • Single-core computing can’t scale with current computing needs
  • 11.
      • Power consumption limits the speed increase we get from transistor density
  • 12.
      • Allows developers to use multiple machines for a single task
  • 13.
      • Programming on a distributed system is much more complex
          › Synchronizing data exchanges
          › Managing a finite bandwidth
          › Controlling computation timing
  • 14.
      “You know you have a distributed system when the crash of a computer you’ve never heard of stops you from getting any work done.” –Leslie Lamport
      • Distributed systems must be designed with the expectation of failure
  • 15.
      • Typically divided into Data Nodes and Compute Nodes
      • At compute time, data is copied to the Compute Nodes
      • Fine for relatively small amounts of data
      • Modern systems deal with far more data than was gathered in the past
  • 16.
      • Facebook
          › 500 TB per day
      • Yahoo
          › Over 170 PB
      • eBay
          › Over 6 PB
      • Getting the data to the processors becomes the bottleneck
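
To make the bottleneck concrete, here is a rough back-of-the-envelope sketch of how long it would take to copy one day's worth of data (the 500 TB figure above) to a compute node. The 10 Gb/s link speed is an assumption chosen for illustration, not a figure from the slides, and protocol overhead is ignored.

```python
def transfer_days(num_bytes, link_bits_per_second):
    """Days needed to push num_bytes through a single link (no overhead modeled)."""
    seconds = num_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day

TB = 10**12                      # decimal terabyte
ASSUMED_LINK = 10 * 10**9        # hypothetical 10 Gb/s link

days = transfer_days(500 * TB, ASSUMED_LINK)
print(round(days, 2))            # 4.63
```

Under these assumptions a single link cannot even keep up with the data arriving each day, which is why moving the computation to the data becomes attractive.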
  • 17.
      • Must support partial failure
      • Must be scalable
  • 18.
      • Failure of a single component must not cause the failure of the entire system, only a degradation of application performance
      • Failure should not result in the loss of any data
  • 19.
      • If a component fails, it should be able to recover without restarting the entire system
      • Component failure or recovery during a job must not affect the final output
  • 20.
      • Increasing resources should increase load capacity
      • Increasing the load on the system should result in a graceful decline in performance for all jobs
          › Not system failure
  • 21.
      • Based on work done by Google in the early 2000s
          › “The Google File System” in 2003
          › “MapReduce: Simplified Data Processing on Large Clusters” in 2004
      • The core idea was to distribute the data as it is initially stored
          › Each node can then perform computation on the data it stores without moving the data for the initial processing
  • 22.
      • Applications are written in a high-level programming language
          › No network programming or temporal dependency
      • Nodes should communicate as little as possible
          › A “shared nothing” architecture
      • Data is spread among the machines in advance
          › Perform computation where the data is already stored as often as possible
  • 23.
      • When data is loaded onto the system it is divided into blocks
          › Typically 64 MB or 128 MB
      • Tasks are divided into two phases
          › Map tasks, which are done on small portions of data where the data is stored
          › Reduce tasks, which combine data to produce the final output
      • A master program allocates work to individual nodes
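
As a small illustration of the block division described above, the following sketch computes the byte ranges a file would be cut into. The function name and the 300 MB example file are hypothetical; the 128 MB block size is one of the typical values mentioned on the slide.

```python
MB = 1024 * 1024

def split_into_blocks(file_size, block_size=128 * MB):
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

blocks = split_into_blocks(300 * MB)
print(len(blocks))             # 3
print(blocks[-1][1] // MB)     # 44 (the final, partial block)
```

Each of these ranges is a natural unit of work for one map task.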
  • 24.
      • Failures are detected by the master program, which reassigns the work to a different node
      • Restarting a task does not affect the nodes working on other portions of the data
      • If a failed node restarts, it is added back to the system and assigned new tasks
      • The master can redundantly execute the same task to avoid slow-running nodes
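
The reassignment behaviour can be sketched with a toy scheduler. This is a simplified simulation under stated assumptions (round-robin assignment, a fixed set of failed nodes known to the master), not how Hadoop's scheduler is implemented; `run_job` and the node names are hypothetical, and redundant execution of slow tasks is not modeled.

```python
from collections import deque

def run_job(tasks, nodes, failed_nodes):
    """Assign tasks round-robin; reschedule only the tasks that hit a failed node.

    Returns a dict mapping each task to the node that completed it.
    """
    healthy = [n for n in nodes if n not in failed_nodes]
    if not healthy:
        raise RuntimeError("no healthy nodes left")
    queue = deque((task, nodes[i % len(nodes)]) for i, task in enumerate(tasks))
    completed = {}
    retries = 0
    while queue:
        task, node = queue.popleft()
        if node in failed_nodes:
            # The master notices the failure and moves just this task elsewhere.
            queue.append((task, healthy[retries % len(healthy)]))
            retries += 1
        else:
            completed[task] = node
    return completed

result = run_job(["t0", "t1", "t2", "t3"], ["a", "b", "c"], failed_nodes={"b"})
print(result["t1"])   # a  (t1 was first assigned to the failed node b)
```

Tasks that were running on healthy nodes keep their original assignment; only the failed task is restarted.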
  • 25.
  • 26.
      • Responsible for storing data on the cluster
      • Data files are split into blocks and distributed across the nodes in the cluster
      • Each block is replicated multiple times
  • 27.
      • HDFS is a file system written in Java, based on Google’s GFS
      • Provides redundant storage for massive amounts of data
  • 28.
      • HDFS works best with a smaller number of large files
          › Millions as opposed to billions of files
          › Typically 100 MB or more per file
      • Files in HDFS are write once
      • Optimized for streaming reads of large files and not random reads
  • 29.
      • Files are split into blocks
      • Blocks are split across many machines at load time
          › Different blocks from the same file will be stored on different machines
      • Blocks are replicated across multiple machines
      • The NameNode keeps track of which blocks make up a file and where they are stored
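
A minimal sketch of block placement with replication follows. The round-robin rule here is a deliberately simple stand-in and is an assumption for illustration; it is not the placement policy HDFS actually uses. The replication factor of 3 matches the default noted in the editor's notes; the DataNode names are hypothetical.

```python
def place_blocks(num_blocks, datanodes, replication=3):
    """Toy placement: block i is stored on `replication` consecutive nodes."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            datanodes[(block_id + r) % len(datanodes)] for r in range(replication)
        ]
    return placement

placement = place_blocks(3, ["dn1", "dn2", "dn3", "dn4"])
print(placement[0])   # ['dn1', 'dn2', 'dn3']
print(placement[2])   # ['dn3', 'dn4', 'dn1']
```

The resulting dictionary is the kind of block-to-location metadata the NameNode has to keep track of.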
  • 31.
      • When a client wants to retrieve data, it
          › Communicates with the NameNode to determine which blocks make up a file and on which data nodes those blocks are stored
          › Then communicates directly with the data nodes to read the data
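
The two-step read path can be sketched as follows. The classes `ToyNameNode` and `ToyDataNode`, the file path, and the block ids are all hypothetical stand-ins for illustration; they are not the HDFS client API.

```python
class ToyNameNode:
    """Holds metadata only: file -> ordered block ids, block id -> DataNode names."""
    def __init__(self):
        self.file_blocks = {}
        self.block_locations = {}

    def get_block_locations(self, path):
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]

class ToyDataNode:
    """Holds the actual block contents."""
    def __init__(self):
        self.blocks = {}

    def read_block(self, block_id):
        return self.blocks[block_id]

def read_file(path, namenode, datanodes):
    data = b""
    # Step 1: ask the NameNode which blocks make up the file and where they live.
    for block_id, locations in namenode.get_block_locations(path):
        # Step 2: read each block directly from the first reachable replica.
        for name in locations:
            node = datanodes.get(name)
            if node is not None and block_id in node.blocks:
                data += node.read_block(block_id)
                break
        else:
            raise IOError(f"no live replica for block {block_id}")
    return data

namenode = ToyNameNode()
namenode.file_blocks["/logs/a.txt"] = ["b0", "b1"]
namenode.block_locations = {"b0": ["dn1", "dn2"], "b1": ["dn2"]}
datanodes = {"dn1": ToyDataNode(), "dn2": ToyDataNode()}
datanodes["dn1"].blocks["b0"] = b"hello "
datanodes["dn2"].blocks["b0"] = b"hello "
datanodes["dn2"].blocks["b1"] = b"world"

contents = read_file("/logs/a.txt", namenode, datanodes)
print(contents)   # b'hello world'
```

Note that file contents never pass through the NameNode; it only answers the metadata lookup.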
  • 32.
  • 33.
      • A method for distributing computation across multiple nodes
      • Each node processes the data that is stored at that node
      • Consists of two main phases
          › Map
          › Reduce
  • 34.
      • Automatic parallelization and distribution
      • Fault tolerance
      • Provides a clean abstraction for programmers to use
  • 35.
      • Reads data as key/value pairs
          › The key is often discarded
      • Outputs zero or more key/value pairs
  • 36.
      • Output from the mapper is sorted by key
      • All values with the same key are guaranteed to go to the same machine
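
One common way to guarantee that equal keys land on the same machine is to hash the key and take the result modulo the number of reducers. The sketch below illustrates that idea; the function name is hypothetical, and the use of CRC32 is an assumption made so the result is stable across Python runs (the built-in `hash()` for strings is not).

```python
import zlib

def partition(key, num_reducers):
    """Map a string key to a reducer index in [0, num_reducers)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

# Every occurrence of the same key is routed to the same reducer index.
print(partition("hadoop", 4) == partition("hadoop", 4))   # True
```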
  • 37.
      • Called once for each unique key
      • Gets a list of all values associated with a key as input
      • The reducer outputs zero or more final key/value pairs
          › Usually just one output per input key
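
The mapper, sort, and reducer steps described above can be put together in a single-process word count. This is a minimal in-memory sketch, not Hadoop code: `run_mapreduce` is a hypothetical driver, and the input keys (byte offsets) are made up and ignored by the mapper, as the slides say is often the case.

```python
from itertools import groupby
from operator import itemgetter

def mapper(key, line):
    # The input key is discarded; emit (word, 1) for every word.
    for word in line.lower().split():
        yield word, 1

def reducer(word, counts):
    # Called once per unique key with all of its values.
    yield word, sum(counts)

def run_mapreduce(records, mapper, reducer):
    mapped = [pair for key, value in records for pair in mapper(key, value)]
    mapped.sort(key=itemgetter(0))          # sort mapper output by key
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        values = [v for _, v in group]
        output.extend(reducer(key, values))
    return output

records = [(0, "the quick brown fox"), (20, "the lazy dog"), (33, "The fox")]
result = run_mapreduce(records, mapper, reducer)
print(result)
# [('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
```

On a real cluster the map calls run where the blocks are stored and the sorted groups are spread across reducer machines; the data flow is the same.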
  • 38.
  • 39.
  • 40.
      • NameNode
          › Holds the metadata for HDFS
      • Secondary NameNode
          › Performs housekeeping functions for the NameNode
      • DataNode
          › Stores the actual HDFS data blocks
      • JobTracker
          › Manages MapReduce jobs
      • TaskTracker
          › Monitors individual Map and Reduce tasks
  • 41.
      • Stores the HDFS file system information in an fsimage file
      • Updates to the file system (add/remove blocks) do not change the fsimage file
          › They are instead written to a log file
      • When starting, the NameNode loads the fsimage file and then applies the changes in the log file
  • 42.
      • NOT a backup for the NameNode
      • Periodically reads the log file and applies the changes to the fsimage file, bringing it up to date
      • Allows the NameNode to restart faster when required
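
The relationship between the fsimage, the log file, and the Secondary NameNode's checkpoint can be sketched with plain dictionaries and lists. The data layout and the operation names (`add_block`, `remove_block`) are assumptions for illustration and do not reflect the real on-disk formats.

```python
def apply_edits(fsimage, edit_log):
    """Replay logged changes on top of a snapshot; the snapshot itself is not modified."""
    state = {path: list(blocks) for path, blocks in fsimage.items()}
    for op, path, block in edit_log:
        if op == "add_block":
            state.setdefault(path, []).append(block)
        elif op == "remove_block":
            state[path].remove(block)
        else:
            raise ValueError(f"unknown op: {op}")
    return state

def checkpoint(fsimage, edit_log):
    """What the Secondary NameNode does periodically: fold the log into a new fsimage."""
    return apply_edits(fsimage, edit_log), []

fsimage = {"/a": ["b1"]}
edit_log = [
    ("add_block", "/a", "b2"),
    ("add_block", "/b", "b3"),
    ("remove_block", "/a", "b1"),
]

# NameNode startup: load the fsimage, then replay the log.
print(apply_edits(fsimage, edit_log))   # {'/a': ['b2'], '/b': ['b3']}

# After a checkpoint the log is empty, so the next startup has nothing to replay.
new_fsimage, new_log = checkpoint(fsimage, edit_log)
print(new_log)                          # []
```

The shorter the log, the less work there is at startup, which is why regular checkpoints let the NameNode restart faster.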
  • 43.
      • JobTracker
          › Determines the execution plan for the job
          › Assigns individual tasks
      • TaskTracker
          › Keeps track of the performance of an individual mapper or reducer
  • 44.
  • 45.
      • MapReduce is very powerful, but can be awkward to master
      • These tools allow programmers who are familiar with other programming styles to take advantage of the power of MapReduce
  • 46.
      • Hive
          › Hadoop processing with SQL
      • Pig
          › Hadoop processing with scripting
      • Cascading
          › Pipe and Filter processing model
      • HBase
          › Database model built on top of Hadoop
      • Flume
          › Designed for large-scale data movement

Editor's notes

  1. Example of failure issues: the Linux lab uses a distributed file system; what happens if the file server fails?
  2. Example of map and reduce
  3. Default replication is 3-fold