HADOOP ECOSYSTEM
Sandip K. Darwade
MNIT Jaipur
May 27, 2014
Sandip K. Darwade (MNIT) HADOOP ECOSYSTEM May 27, 2014 1 / 29
Outline
Hadoop
Hadoop Ecosystem
HDFS
MapReduce
YARN
Avro
Pig
Hive
HBase
Mahout
Sqoop
ZooKeeper
Chukwa
HCatalog
References
What is Hadoop?
The Apache Hadoop software library is a framework that
allows for the distributed processing of large data sets
across clusters of computers using simple programming
models.
Hadoop is best known for MapReduce, its distributed
filesystem (HDFS), and large-scale data processing.
What is the Hadoop Ecosystem?
The Hadoop ecosystem is Hadoop together with its core related
software projects. There are countless commercial
Hadoop-integrated products focused on making Hadoop
more usable and accessible to non-specialists, but the projects covered here
were chosen because they provide core functionality and
speed in Hadoop. Together they are called the Hadoop ecosystem.
Hadoop Ecosystem
Figure: Hadoop Ecosystem Architecture
HDFS
Hadoop Distributed File System.
Files stored in HDFS are divided into blocks, which
are then copied to multiple DataNodes.
A Hadoop cluster contains one NameNode and many
DataNodes.
Data blocks are replicated for high availability and fast
access.
Figure: HDFS Architecture
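The block splitting and replication described above can be sketched in a few lines of Python. This is only an illustration, not HDFS code: the 64 MB block size is the default quoted later in these slides, the replication factor of 3 and the DataNode names are assumptions, and real HDFS placement is rack-aware rather than round-robin.

```python
BLOCK_SIZE = 64 * 1024 * 1024   # 64 MB, the default quoted in these slides
REPLICATION = 3                 # assumed replication factor for this sketch


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block a file of file_size occupies."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes, round-robin.

    Real HDFS placement is rack-aware; round-robin is a simplification.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    return {
        block_id: [datanodes[(block_id + i) % len(datanodes)]
                   for i in range(replication)]
        for block_id in range(num_blocks)
    }


blocks = split_into_blocks(200 * 1024 * 1024)   # a hypothetical 200 MB file
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
for block_id, nodes in placement.items():
    print(block_id, blocks[block_id], nodes)
```

Because every block lives on several DataNodes, losing one machine leaves at least two copies of each block readable.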
HDFS
NameNode
Runs on a separate machine.
Manages the file system namespace and controls access by external
clients.
Stores file system metadata in memory:
file information, the list of blocks in each file, and the DataNodes
that hold each block.
DataNode
Runs on a separate machine; it is the basic unit of file storage.
Periodically sends the NameNode a report of all the blocks it holds.
Serves read and write requests from clients, and carries out
block create, delete, and copy commands from the
NameNode.
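A minimal sketch of the bookkeeping this slide describes, assuming a toy in-memory model: `ToyNameNode`, its method names, the block ids, and the DataNode names are all hypothetical and do not correspond to Hadoop classes.

```python
from collections import defaultdict


class ToyNameNode:
    """Holds namespace and block metadata in memory (no real I/O)."""

    def __init__(self):
        self.file_to_blocks = {}                     # path -> ordered block ids
        self.block_to_datanodes = defaultdict(set)   # block id -> DataNode names

    def add_file(self, path, block_ids):
        self.file_to_blocks[path] = list(block_ids)

    def receive_block_report(self, datanode, block_ids):
        """Record which blocks a DataNode says it currently stores."""
        # Drop stale entries for this DataNode, then apply the fresh report.
        for holders in self.block_to_datanodes.values():
            holders.discard(datanode)
        for block_id in block_ids:
            self.block_to_datanodes[block_id].add(datanode)

    def locate(self, path):
        """Return [(block id, sorted DataNode names)] for a file."""
        return [(b, sorted(self.block_to_datanodes[b]))
                for b in self.file_to_blocks[path]]


nn = ToyNameNode()
nn.add_file("/logs/a.txt", ["blk_1", "blk_2"])
nn.receive_block_report("dn1", ["blk_1", "blk_2"])
nn.receive_block_report("dn2", ["blk_1"])
print(nn.locate("/logs/a.txt"))
```

A client would ask the NameNode where the blocks of a file live and then read the bytes directly from the listed DataNodes.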
MapReduce
Programming model for data processing.
Hadoop can run MapReduce programs written in various
languages, e.g. Java and Python.
MapReduce programs are inherently parallel, which makes
very large-scale data analysis practical.
Mappers produce intermediate results.
Reducers aggregate the results.
MapReduce
Files are split into fixed-size blocks and stored on
DataNodes (default 64 MB in Hadoop 1.x; 128 MB in Hadoop 2.x).
Programs, once written, run in parallel across a distributed
cluster.
The input is a set of key/value pairs, and the output is also
a set of key/value pairs.
There are two main phases: Map and Reduce.
MapReduce (continued)
Figure: MapReduce Process Architecture
MapReduce (continued)
Map
Map tasks process each block separately, in parallel.
Each generates a set of intermediate key/value pairs.
The results from these logical blocks are then reassembled, grouped by key.
Reduce
Accepts an intermediate key and the values associated with it.
Processes the intermediate key and its values.
Produces a relatively small set of output values.
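The two phases can be illustrated with a word count run entirely in memory. This is a sketch of the programming model only, not Hadoop's API: the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample blocks are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(block):
    """Map: emit an intermediate (word, 1) pair for every word in a block."""
    for word in block.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


blocks = ["Hadoop stores data", "Hadoop processes data in parallel"]
intermediate = [pair for block in blocks for pair in map_phase(block)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts)
```

On a cluster, each call to `map_phase` would run on the node holding that block, and each key's values would be routed to one reducer.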
YARN
YARN (Yet Another Resource Negotiator).
MapReduce 1.0 had issues with scalability, memory usage
and synchronization.
YARN addresses problems with MapReduce 1.0’s
architecture, specifically with the JobTracker service.
YARN splits up the two major functionalities of the
JobTracker, resource management and job
scheduling/monitoring, into separate daemons.
Rather than burdening a single node with handling
scheduling and resource management for the entire
cluster, YARN now distributes this responsibility across
the cluster.
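The split of responsibilities can be sketched as two small classes. In YARN the cluster-wide daemon is called the ResourceManager and the per-job one the ApplicationMaster; the toy classes below borrow those names but are hypothetical simplifications, with container counts and task names invented for the example.

```python
class ToyResourceManager:
    """Cluster-wide resource bookkeeping only; knows nothing about job progress."""

    def __init__(self, free_containers):
        self.free = dict(free_containers)   # node name -> free container count

    def allocate(self, count):
        """Grant up to `count` containers, returning the nodes they live on."""
        granted = []
        for node in sorted(self.free):
            while self.free[node] > 0 and len(granted) < count:
                self.free[node] -= 1
                granted.append(node)
        return granted

    def release(self, nodes):
        for node in nodes:
            self.free[node] += 1


class ToyApplicationMaster:
    """Per-job scheduling and monitoring; one instance per application."""

    def __init__(self, name, tasks, rm):
        self.name, self.pending, self.rm = name, list(tasks), rm
        self.finished = []

    def run(self):
        while self.pending:
            nodes = self.rm.allocate(len(self.pending))
            if not nodes:
                raise RuntimeError("no containers available")
            for node in nodes:
                # "Running" a task here just records where it was placed.
                self.finished.append((self.pending.pop(0), node))
            self.rm.release(nodes)
        return self.finished


rm = ToyResourceManager({"node1": 2, "node2": 1})
job = ToyApplicationMaster("wordcount", ["map-0", "map-1", "map-2", "reduce-0"], rm)
print(job.run())
```

Because each job brings its own master, tracking task progress no longer funnels through one central service the way it did with the JobTracker.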
YARN (continued)
Figure: YARN Architecture (via Apache)
Avro
Avro is a framework for performing remote procedure
calls and data serialization.
It can be used to pass data from one program or language
to another, e.g. from C to Pig.
It is well suited to scripting languages such as Pig
because data is always stored with its schema in Avro and
is therefore self-describing.
Avro can also handle changes in schema while still preserving
access to the data.
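To make "self-describing" and schema changes concrete, here is a toy container format that stores the writer's schema next to the data. It uses JSON for readability and is not the Avro library or Avro's binary encoding; `write_container`, `read_container`, and the `User` schemas are hypothetical.

```python
import json


def write_container(schema, records):
    """Serialize records with the writer's schema as the first line."""
    lines = [json.dumps(schema)]
    for record in records:
        lines.append(json.dumps([record[f["name"]] for f in schema["fields"]]))
    return "\n".join(lines)


def read_container(text, reader_schema=None):
    """Decode records using the embedded schema; resolve against reader_schema."""
    header, *rows = text.split("\n")
    writer_schema = json.loads(header)
    names = [f["name"] for f in writer_schema["fields"]]
    records = [dict(zip(names, json.loads(row))) for row in rows]
    if reader_schema is None:
        return records
    # Fields missing from old data take the reader's default value;
    # fields the reader does not know about are dropped.
    return [{f["name"]: record.get(f["name"], f.get("default"))
             for f in reader_schema["fields"]}
            for record in records]


v1 = {"name": "User", "fields": [{"name": "id"}, {"name": "email"}]}
v2 = {"name": "User", "fields": [{"name": "id"},
                                 {"name": "country", "default": "unknown"}]}

data = write_container(v1, [{"id": 1, "email": "a@example.com"}])
print(read_container(data))        # decoded purely from the embedded schema
print(read_container(data, v2))    # old data read through a newer schema
```

The second call shows the idea behind schema evolution: a reader with a newer schema can still consume data written with the old one.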
Pig
Pig is a framework consisting of a high-level scripting
language (Pig Latin) and a
run-time environment that allows users to execute
MapReduce on a Hadoop cluster.
Like HiveQL in Hive, Pig Latin is a higher-level language
that compiles to MapReduce.
Pig is more flexible than Hive with respect to possible
data formats.
Pig's data model is similar to the relational data model,
except that tuples (a.k.a. records or rows) can be nested.
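The nesting mentioned above is easiest to see in what a grouping operation returns: one tuple per key whose second field is itself a collection of tuples. The Python below only mimics that shape; it is not Pig Latin, and the `visits` relation and `group_by` helper are made up for the illustration.

```python
from collections import defaultdict

# Flat relation, as a relational table would hold it: (user, url, seconds)
visits = [
    ("alice", "/home", 12),
    ("bob", "/docs", 40),
    ("alice", "/docs", 7),
]


def group_by(relation, key_index):
    """Return one tuple per key: (key, bag), where the bag holds whole tuples."""
    bags = defaultdict(list)
    for row in relation:
        bags[row[key_index]].append(row)
    return [(key, bag) for key, bag in sorted(bags.items())]


grouped = group_by(visits, 0)
for user, bag in grouped:
    # Each output tuple contains a nested collection of the original tuples.
    print(user, bag, sum(seconds for _, _, seconds in bag))
```

A strictly relational model would need a second table or a join to express the same thing, since a column cannot hold a set of rows.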
Hive
Apache Hive is a data warehouse infrastructure built on
top of Hadoop for providing data summarization, query
and analysis.
Using Hadoop directly was not easy for end users who were
not familiar with the MapReduce framework.
A Hive query is converted to MapReduce tasks.
Figure: Hive Architecture
Hive (continued)
Building blocks of Hive:
Metastore: stores the system catalog and metadata about tables,
columns, partitions, etc.
Driver: manages the lifecycle of a HiveQL statement as it moves
through Hive.
Query Compiler: compiles HiveQL into a directed acyclic graph of
MapReduce tasks.
Execution Engine: executes the tasks produced by the compiler in
proper dependency order.
Hive Server: provides a Thrift interface and a JDBC/ODBC server.
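"Proper dependency order" means a topological order of the task graph. The sketch below uses Python's standard `graphlib` module on a made-up plan; the stage names are hypothetical and are not what Hive's compiler actually emits.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical plan for a join followed by an aggregation and a sort.
# Each key is a stage; its value is the set of stages it depends on.
plan = {
    "scan_orders": set(),
    "scan_customers": set(),
    "join": {"scan_orders", "scan_customers"},
    "group_by": {"join"},
    "order_by": {"group_by"},
}


def execution_order(dag):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(plan)
print(order)
```

The two scans have no dependency on each other, so an engine is free to run them in either order or at the same time.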
HBase
HBase is a distributed column-oriented database built on
top of HDFS.
HBase is not relational and does not support SQL, but
given the proper problem space it is able to do what an
RDBMS cannot: host very large, sparsely populated tables
on clusters of commodity hardware.
HBase is modeled with an HBase master node
orchestrating a cluster of one or more regionserver workers.
The HBase master is responsible for bootstrapping a fresh
install, for assigning regions to registered regionservers,
and for recovering from regionserver failures.
HBase depends on ZooKeeper and, by default, manages a
ZooKeeper instance as the authority on cluster state.
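Region assignment and recovery can be sketched with a sorted list of region start keys. This assumes that each region covers a contiguous range of row keys; the start keys, server names, and the `locate_region` and `reassign` helpers are hypothetical and are not HBase APIs.

```python
from bisect import bisect_right

# Hypothetical layout: each region covers row keys from its start key
# (inclusive) up to the next region's start key (exclusive).
region_start_keys = ["", "g", "n", "t"]
region_to_server = {"": "rs1", "g": "rs2", "n": "rs1", "t": "rs3"}


def locate_region(row_key):
    """Return (region start key, regionserver) responsible for row_key."""
    index = bisect_right(region_start_keys, row_key) - 1
    start = region_start_keys[index]
    return start, region_to_server[start]


def reassign(failed_server, live_servers):
    """Master-style recovery: move a failed server's regions to live ones."""
    orphaned = [r for r, s in region_to_server.items() if s == failed_server]
    for i, region in enumerate(orphaned):
        region_to_server[region] = live_servers[i % len(live_servers)]
    return orphaned


print(locate_region("hadoop"))   # falls in the region starting at "g"
reassign("rs2", ["rs1", "rs3"])
print(locate_region("hadoop"))   # same region, now served elsewhere
```

The row key still maps to the same region after the failure; only the server responsible for that region changes.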
HBase (continued)
Figure: HBase Architecture
Mahout
Mahout is a scalable machine-learning and data mining
library.
There are currently four main groups of algorithms in
Mahout:
Recommendations, a.k.a. collaborative filtering.
Classification, a.k.a. categorization.
Clustering.
Frequent itemset mining, a.k.a. parallel frequent pattern mining.
Mahout is not simply a collection of pre-existing
algorithms.
Algorithms in the Mahout library belong to the subset
that can be executed in a distributed fashion, and have
been written to be executable in MapReduce.
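As an illustration of the first group, here is a tiny item co-occurrence recommender. It is a single-machine sketch of one collaborative filtering idea, not Mahout's API or its actual algorithm; the purchase histories and function names are invented. The counting step is the kind of work that splits naturally into a map (emit pairs per user) and a reduce (sum per pair).

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase histories: user -> set of items.
histories = {
    "u1": {"hdfs-book", "pig-book", "hive-book"},
    "u2": {"hdfs-book", "hive-book"},
    "u3": {"pig-book", "hbase-book"},
}


def cooccurrence(histories):
    """Count how often each ordered pair of items appears for the same user."""
    counts = Counter()
    for items in histories.values():
        counts.update(permutations(sorted(items), 2))
    return counts


def recommend(user, histories, top_n=2):
    """Score unseen items by their co-occurrence with the user's items."""
    counts = cooccurrence(histories)
    seen = histories[user]
    scores = Counter()
    for (a, b), n in counts.items():
        if a in seen and b not in seen:
            scores[b] += n
    return [item for item, _ in scores.most_common(top_n)]


print(recommend("u2", histories))
```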
Mahout (continued)
Figure: Mahout Architecture
Sqoop
Sqoop allows easy import and export of data from
structured data stores.
It is a command-line tool that can import data from any
JDBC-supported database into Hadoop.
It generates Writables for use in MapReduce jobs.
It offers high-performance connectors for some RDBMSs.
For data that arrives continuously, such as logs, the companion tool
Apache Flume is a distributed, reliable, available service for
efficiently moving large amounts of data as it is produced.
Flume is suited to gathering logs from multiple systems
and inserting them into HDFS as they are generated.
Flume design goals: reliability, scalability, manageability,
extensibility.
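A parallel import can be pictured as cutting the key range of a table into slices, one per map task. The sketch below shows only that arithmetic, assuming an integer key column; the function names and the `orders` table and `id` column are placeholders, and the generated SQL is illustrative rather than what Sqoop itself issues.

```python
def split_ranges(min_id, max_id, num_mappers):
    """Divide the inclusive key range [min_id, max_id] into contiguous slices."""
    total = max_id - min_id + 1
    size, extra = divmod(total, num_mappers)
    ranges, start = [], min_id
    for i in range(num_mappers):
        length = size + (1 if i < extra else 0)
        if length == 0:
            continue   # more mappers than keys: skip empty slices
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def to_queries(table, column, ranges):
    """Render one bounded SELECT per slice (table and column are placeholders)."""
    return [f"SELECT * FROM {table} WHERE {column} BETWEEN {lo} AND {hi}"
            for lo, hi in ranges]


ranges = split_ranges(1, 10, 4)
for query in to_queries("orders", "id", ranges):
    print(query)
```

Each slice can then be fetched by a separate task, so the import speed scales with the number of tasks until the database becomes the bottleneck.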
Sqoop (continued)
Figure: Sqoop Architecture
ZooKeeper
ZooKeeper is a distributed, open-source coordination
service for distributed applications.
Coordination services are notoriously hard to get right; they are
especially prone to errors such as race conditions and deadlock.
The motivation behind ZooKeeper is to relieve distributed applications of the
responsibility of implementing coordination services from
scratch.
ZooKeeper allows distributed processes to coordinate
with each other through a shared hierarchical namespace.
The namespace consists of data registers called znodes,
which are similar to files and directories.
ZooKeeper data is kept in memory, which means it can
achieve high throughput and low latency.
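The hierarchical namespace can be modelled as a dictionary keyed by path. This toy keeps everything in one process and has no replication, watches, or sessions; `ToyZnodeTree`, its methods, and the paths used are hypothetical and are not the ZooKeeper client API.

```python
class ToyZnodeTree:
    """In-memory hierarchical namespace; paths look like file system paths."""

    def __init__(self):
        self.data = {"/": b""}

    def create(self, path, value=b""):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.data:
            raise KeyError(f"parent znode {parent} does not exist")
        if path in self.data:
            raise KeyError(f"znode {path} already exists")
        self.data[path] = value

    def get(self, path):
        return self.data[path]

    def children(self, path):
        """Return the names of the direct children of path, sorted."""
        prefix = path.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.data
                      if p.startswith(prefix) and p != "/"
                      and "/" not in p[len(prefix):])


tree = ToyZnodeTree()
tree.create("/app")
tree.create("/app/workers")
tree.create("/app/workers/w1", b"host-a:9000")
tree.create("/app/workers/w2", b"host-b:9000")
print(tree.children("/app/workers"), tree.get("/app/workers/w1"))
```

Unlike an ordinary file system, a znode can both hold data and have children, which is why `/app/workers` could carry a value of its own as well as the two worker entries.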
ZooKeeper (continued)
Figure: ZooKeeper Architecture
Chukwa
Chukwa is a Hadoop subproject devoted to large-scale log
collection and analysis.
Chukwa is built on top of HDFS and the MapReduce
framework and inherits Hadoop's scalability and
robustness.
Chukwa has four components:
Agents that run on each machine and emit data.
Collectors that receive data from the agents and write it to stable storage.
MapReduce jobs for parsing and archiving the data.
HICC, the Hadoop Infrastructure Care Center: a web-portal-style interface
for displaying data.
Chukwa (continued)
Figure: Chukwa Architecture
HCatalog
Began as an incubator-level project at Apache; it has since been merged into Hive.
HCatalog is a metadata and table storage management
service for HDFS.
HCatalog depends on the Hive metastore and exposes it
to other services such as MapReduce and Pig.
HCatalog's goal is to simplify the user's interaction with
HDFS data
and to enable data sharing between tools and execution
platforms.
Bibliography
G. Yang, “The application of MapReduce in the cloud computing,” Intelligence
Information Processing and Trusted Computing (IPTC) 2011, vol. 9,
pp. 154–156, Oct. 2011.
T. White, Hadoop: The Definitive Guide, 3rd ed.
Sebastopol, CA: O'Reilly Media, Inc.,
2012.

Contenu connexe

Tendances (20)

Hadoop
HadoopHadoop
Hadoop
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
 
Map reduce in BIG DATA
Map reduce in BIG DATAMap reduce in BIG DATA
Map reduce in BIG DATA
 
PPT on Hadoop
PPT on HadoopPPT on Hadoop
PPT on Hadoop
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Introduction to Hadoop Technology
Introduction to Hadoop TechnologyIntroduction to Hadoop Technology
Introduction to Hadoop Technology
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 
Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
 
Hadoop
Hadoop Hadoop
Hadoop
 
Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map Reduce
 
Hadoop Overview kdd2011
Hadoop Overview kdd2011Hadoop Overview kdd2011
Hadoop Overview kdd2011
 
Apache hive
Apache hiveApache hive
Apache hive
 

Similaire à Hadoop Ecosystem

Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHitendra Kumar
 
project report on hadoop
project report on hadoopproject report on hadoop
project report on hadoopManoj Jangalva
 
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methodspaperpublications3
 
Big Data and Hadoop Guide
Big Data and Hadoop GuideBig Data and Hadoop Guide
Big Data and Hadoop GuideSimplilearn
 
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISONMAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISONijcsit
 
Understanding hadoop
Understanding hadoopUnderstanding hadoop
Understanding hadoopRexRamos9
 
Distributed Systems Hadoop.pptx
Distributed Systems Hadoop.pptxDistributed Systems Hadoop.pptx
Distributed Systems Hadoop.pptxUttara University
 
Big Data and Hadoop Basics
Big Data and Hadoop BasicsBig Data and Hadoop Basics
Big Data and Hadoop BasicsSonal Tiwari
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introductionChirag Ahuja
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Ranjith Sekar
 
Introduction to Apache hadoop
Introduction to Apache hadoopIntroduction to Apache hadoop
Introduction to Apache hadoopOmar Jaber
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1Thanh Nguyen
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopVigen Sahakyan
 

Similaire à Hadoop Ecosystem (20)

Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
project report on hadoop
project report on hadoopproject report on hadoop
project report on hadoop
 
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
 
Hadoop map reduce
Hadoop map reduceHadoop map reduce
Hadoop map reduce
 
2.1-HADOOP.pdf
2.1-HADOOP.pdf2.1-HADOOP.pdf
2.1-HADOOP.pdf
 
Big Data and Hadoop Guide
Big Data and Hadoop GuideBig Data and Hadoop Guide
Big Data and Hadoop Guide
 
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISONMAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
 
Understanding hadoop
Understanding hadoopUnderstanding hadoop
Understanding hadoop
 
Big data Analytics Hadoop
Big data Analytics HadoopBig data Analytics Hadoop
Big data Analytics Hadoop
 
Distributed Systems Hadoop.pptx
Distributed Systems Hadoop.pptxDistributed Systems Hadoop.pptx
Distributed Systems Hadoop.pptx
 
Big Data and Hadoop Basics
Big Data and Hadoop BasicsBig Data and Hadoop Basics
Big Data and Hadoop Basics
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar Report
 
The solution for big data
The solution for big dataThe solution for big data
The solution for big data
 
Hadoop ppt2
Hadoop ppt2Hadoop ppt2
Hadoop ppt2
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
 
Hadoop info
Hadoop infoHadoop info
Hadoop info
 
Introduction to Apache hadoop
Introduction to Apache hadoopIntroduction to Apache hadoop
Introduction to Apache hadoop
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 

Dernier

Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...amitlee9823
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx9to5mart
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...amitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 

Dernier (20)

Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 

Hadoop Ecosystem

  • 1. HADOOP ECOSYSTEM Sandip K. Darwade MNIT Jaipur May 27, 2014 Sandip K. Darwade (MNIT) HADOOP ECOSYSTEM May 27, 2014 1 / 29
  • 3. What is Hadoop ? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. Hadoop is best known for MapReduce and its distributed filesystem (HDFS),and large-scale data processing. Sandip K. Darwade (MNIT) HADOOP ECOSYSTEM May 27, 2014 3 / 29
  • 4. What is Hadoop Ecosystem ? Introduction to the world of Hadoop and the core related software projects. There are countless commercial Hadoop-integrated products focused on making Hadoop more usable and layman-accessible, but the ones here were chosen because they provide core functionality and speed in Hadoop so called Hadoop Ecosystem. Sandip K. Darwade (MNIT) HADOOP ECOSYSTEM May 27, 2014 4 / 29
  • 5. Hadoop Ecosystem Figure : Hadoop Ecosystem Architecture Sandip K. Darwade (MNIT) HADOOP ECOSYSTEM May 27, 2014 5 / 29
  • 6. HDFS Hadoop Distributed File System. Files are stored in HDFS and divided into blocks, which are then copied to multiple Data Nodes. Hadoop cluster contains only one NameNode and many DataNodes. Data blocks are replicated for High Availability and fast access. Figure : HDFS Architecture Sandip K. Darwade (MNIT) HADOOP ECOSYSTEM May 27, 2014 6 / 29
  • 7. HDFS NameNode Run on a separate machine. Manage the file system namespace,and control access of external clients. Store file system Meta-data in memory. File information, each block information of files, and every file block information in Data Node . DataNode Run on Separate machine,which is the basic unit of file storage. Sent all messages of existing Blocks periodically to Name Node. Data Node response read and write request from the Name Node,and also respond, create, delete, and copy the block command from Name Node. Sandip K. Darwade (MNIT) HADOOP ECOSYSTEM May 27, 2014 7 / 29
  • 8. MapReduce Programming model for data processing. Hadoop can run MapReduce programs written in various languages Java,Python. Parallel Processing,put Mapreduce in very large-scale data analysis. Mapper produce intermediate results. Reducer aggregates the results. Sandip K. Darwade (MNIT) HADOOP ECOSYSTEM May 27, 2014 8 / 29
  • 9. MapReduce Files are split into fixed sized blocks and stored on data nodes (Default 64MB). Programs written, can process on distributed clusters in parallel. Input data is a set of key/value pairs, the output is also the key/value pairs. Mainly Two Phase Map and Reduce. Sandip K. Darwade (MNIT) HADOOP ECOSYSTEM May 27, 2014 9 / 29
  • 10. MapReduce (continue...) Figure : MapReduce Process Architecture Sandip K. Darwade (MNIT) HADOOP ECOSYSTEM May 27, 2014 10 / 29
  • 11. MapReduce (continue...) Map Map process each block separately in parallel. Generate an intermediate key/value pairs set. Results of these logic blocks are reassembled. Reduce Accepts an intermediate key and related value. Processed the intermediate key and value. Form a set of relatively small value set. Sandip K. Darwade (MNIT) HADOOP ECOSYSTEM May 27, 2014 11 / 29
  • 12. YARN YARN (Yet Another Resource Negotiator). MapReduce 1.0 had issues with scalability, memory usage and synchronization. YARN addresses problems with MapReduce 1.0’s architecture, specifically with the JobTracker service. YARN splits up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. Rather than burdening a single node with handling scheduling and resource management for the entire cluster, YARN now distributes this responsibility across the cluster. Sandip K. Darwade (MNIT) HADOOP ECOSYSTEM May 27, 2014 12 / 29
  • 13. YARN (continue...) Figure : Yarn Architecture Via Apache Sandip K. Darwade (MNIT) HADOOP ECOSYSTEM May 27, 2014 13 / 29
  • 14. Avro Avro is a framework for performing remote procedure calls and data serialization. It can be used to pass data from one program or language to another, e.g. from C to Pig. Suited for use with scripting languages such as Pig because data is always stored with its schema in Avro and therefore the data is self-describing. Avro can also handle changes in schema still preserving access to the data. Sandip K. Darwade (MNIT) HADOOP ECOSYSTEM May 27, 2014 14 / 29
  • 15. Pig
    - Pig is a framework consisting of a high-level scripting language (Pig Latin) and a run-time environment that allows users to execute MapReduce on a Hadoop cluster.
    - Like HiveQL in Hive, Pig Latin is a higher-level language that compiles to MapReduce.
    - Pig is more flexible than Hive with respect to the data formats it accepts.
    - Pig's data model is similar to the relational data model, except that tuples (a.k.a. records or rows) can be nested.
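Nesting is what distinguishes Pig's data model from a flat relational table: grouping a relation yields tuples that each contain a bag of the original tuples. The Python sketch below imitates that shape with lists and tuples; it is not Pig Latin, and the `visits` data and the function name are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# A flat relation of (user, page) tuples.
visits = [("alice", "/home"), ("bob", "/cart"), ("alice", "/cart")]


def group_by_first_field(tuples):
    """Return (key, bag) tuples, where each bag is a list of the original tuples."""
    ordered = sorted(tuples, key=itemgetter(0))  # groupby needs sorted input
    return [(key, list(bag)) for key, bag in groupby(ordered, key=itemgetter(0))]


grouped = group_by_first_field(visits)
print(grouped)
# [('alice', [('alice', '/home'), ('alice', '/cart')]), ('bob', [('bob', '/cart')])]

# Aggregating over the nested bag, one result per group.
counts = [(key, len(bag)) for key, bag in grouped]
print(counts)  # [('alice', 2), ('bob', 1)]
```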
  • 16. Hive
    - Apache Hive is a data warehouse infrastructure built on top of Hadoop that provides data summarization, query, and analysis.
    - Hadoop was not easy to use for end users who were unfamiliar with the MapReduce framework; Hive lets them write queries in the SQL-like HiveQL instead.
    - A Hive query is converted into MapReduce tasks.
    [Figure: Hive Architecture]
  • 17. Hive (continued)
    Building blocks of Hive:
    - Metastore: stores the system catalog and metadata about tables, columns, partitions, etc.
    - Driver: manages the lifecycle of a HiveQL statement as it moves through Hive.
    - Query Compiler: compiles HiveQL into a directed acyclic graph of MapReduce tasks.
    - Execution Engine: executes the tasks produced by the compiler in proper dependency order.
    - Hive Server: provides a Thrift interface and a JDBC/ODBC server.
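Executing a directed acyclic graph "in proper dependency order" amounts to a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+) on a made-up plan; the stage names are hypothetical and do not come from a real Hive query plan.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical plan: each stage maps to the set of stages it depends on.
plan = {
    "scan_orders": set(),
    "scan_customers": set(),
    "join": {"scan_orders", "scan_customers"},
    "aggregate": {"join"},
}


def execution_order(dag):
    """Return the stages in an order where every dependency runs first."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(plan)
print(order)
```

The two scan stages have no dependency on each other, so either may come first; only the constraint that `join` follows both scans and `aggregate` follows `join` is guaranteed.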
  • 18. HBase
    - HBase is a distributed, column-oriented database built on top of HDFS.
    - HBase is not relational and does not support SQL, but given the proper problem space it can do what an RDBMS cannot: host very large, sparsely populated tables on clusters of commodity hardware.
    - An HBase cluster consists of a master node that orchestrates one or more regionservers.
    - The HBase master is responsible for bootstrapping a fresh install, assigning regions to registered regionservers, and recovering from regionserver failures.
    - HBase manages a ZooKeeper instance as the authority on cluster state.
  • 19. HBase (continued)
    [Figure: HBase Architecture]
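The master's two duties described on the HBase slide, assigning regions to registered regionservers and recovering from a regionserver failure, can be sketched as follows. The `Master` class, its methods, and the least-loaded assignment policy are all assumptions made for illustration; HBase's real balancer and recovery procedure are considerably more involved.

```python
class Master:
    """Toy stand-in for the HBase master's region bookkeeping."""

    def __init__(self):
        self.assignments = {}  # regionserver name -> list of regions

    def register(self, server):
        self.assignments.setdefault(server, [])

    def assign(self, region):
        """Give the region to the registered server that currently holds the fewest."""
        if not self.assignments:
            raise RuntimeError("no regionservers registered")
        server = min(self.assignments, key=lambda s: (len(self.assignments[s]), s))
        self.assignments[server].append(region)
        return server

    def recover(self, failed_server):
        """Reassign every region of a failed regionserver to the survivors."""
        orphaned = self.assignments.pop(failed_server)
        return {region: self.assign(region) for region in orphaned}


master = Master()
master.register("rs1")
master.register("rs2")
for region in ["r1", "r2", "r3", "r4"]:
    master.assign(region)
print(master.assignments)  # {'rs1': ['r1', 'r3'], 'rs2': ['r2', 'r4']}

moved = master.recover("rs1")
print(moved)               # {'r1': 'rs2', 'r3': 'rs2'}
```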
  • 20. Mahout
    - Mahout is a scalable machine-learning and data-mining library.
    - There are currently four main groups of algorithms in Mahout:
      - Recommendations, a.k.a. collaborative filtering.
      - Classification, a.k.a. categorization.
      - Clustering.
      - Frequent itemset mining, a.k.a. parallel frequent pattern mining.
    - Mahout is not simply a collection of pre-existing algorithms.
    - Algorithms in the Mahout library belong to the subset that can be executed in a distributed fashion, and they have been written to be executable in MapReduce.
  • 21. Mahout (continued)
    [Figure: Mahout Architecture]
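To show what it means for a mining algorithm to fit the MapReduce shape, here is a deliberately naive frequent-pair counter: a map step emits item pairs per basket, and a reduce step sums them. This is not Mahout's parallel frequent pattern mining implementation; the function names, the baskets, and the `min_support` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_pairs(basket):
    """Map: emit every unordered pair of distinct items in one basket."""
    return list(combinations(sorted(set(basket)), 2))


def frequent_pairs(baskets, min_support):
    """Reduce: sum the pair counts and keep pairs seen at least min_support times."""
    counts = Counter()
    for basket in baskets:  # each basket could be mapped on a different node
        counts.update(map_pairs(basket))
    return {pair: n for pair, n in counts.items() if n >= min_support}


baskets = [["milk", "bread", "eggs"], ["milk", "bread"], ["bread", "eggs"]]
print(frequent_pairs(baskets, min_support=2))
# {('bread', 'eggs'): 2, ('bread', 'milk'): 2}
```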
  • 22. Sqoop
    - Sqoop allows easy import and export of data from structured data stores.
    - It is a command-line tool that can import data from any JDBC-supported database into Hadoop.
    - It generates Writables for use in MapReduce jobs.
    - High-performance connectors are available for some RDBMSs.
    - Distributed, reliable, and available service for efficiently moving large amounts of data as it is produced.
    - Suited to gathering logs from multiple systems and inserting them into HDFS as they are generated.
    - Design goals: reliability, scalability, manageability, extensibility.
  • 23. Sqoop (continued)
    [Figure: Sqoop Architecture]
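The import direction, reading rows from a relational table and writing them out as delimited text records, can be sketched in a few lines. The real tool connects over JDBC and runs the transfer as MapReduce jobs; this sketch substitutes Python's built-in `sqlite3` as the database, and the `import_table` function, the `employees` table, and the comma delimiter are assumptions for illustration.

```python
import sqlite3


def import_table(connection, table, columns, delimiter=","):
    """Read every row of `table` and render it as one delimited text line."""
    column_list = ", ".join(columns)
    # Identifiers are interpolated directly, which is acceptable only because
    # this sketch uses trusted, hard-coded table and column names.
    query = f"SELECT {column_list} FROM {table} ORDER BY {columns[0]}"
    cursor = connection.execute(query)
    return [delimiter.join(str(value) for value in row) for row in cursor]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])

lines = import_table(conn, "employees", ["id", "name"])
print(lines)  # ['1,Asha', '2,Ravi']
```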
  • 24. ZooKeeper
    - ZooKeeper is a distributed, open-source coordination service for distributed applications.
    - Coordination services are notoriously hard to get right; they are especially prone to errors such as race conditions and deadlock.
    - ZooKeeper's aim is to relieve distributed applications of the responsibility of implementing coordination services from scratch.
    - ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical namespace.
    - The namespace consists of data registers called znodes, which are similar to files and directories.
    - ZooKeeper data is kept in memory, which allows it to achieve high throughput and low latency.
  • 25. ZooKeeper (continued)
    [Figure: ZooKeeper Architecture]
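The hierarchical namespace can be pictured as an in-memory tree in which every node holds data and may also have children, which is how a znode resembles both a file and a directory. The `ZNodeTree` class below is a single-process toy with hypothetical method names; it has none of ZooKeeper's replication, watches, or ordering guarantees.

```python
class ZNodeTree:
    """In-memory toy namespace keyed by slash-separated paths."""

    def __init__(self):
        self.nodes = {"/": b""}

    def create(self, path, data=b""):
        parent = path.rsplit("/", 1)[0] or "/"
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        self.nodes[path] = data

    def get(self, path):
        return self.nodes[path]

    def get_children(self, path):
        """Return the names of the direct children of `path`, sorted."""
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self.nodes
            if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


tree = ZNodeTree()
tree.create("/app")
tree.create("/app/config", b"timeout=30")
tree.create("/app/workers")
print(tree.get_children("/app"))  # ['config', 'workers']
print(tree.get("/app/config"))    # b'timeout=30'
```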
  • 26. Chukwa
    - Chukwa is a Hadoop subproject devoted to large-scale log collection and analysis.
    - Chukwa is built on top of HDFS and the MapReduce framework and inherits Hadoop's scalability and robustness.
    - Chukwa has four components:
      - Agents that run on each machine and emit data.
      - Collectors that receive data from the agents and write it to stable storage.
      - MapReduce jobs for parsing and archiving the data.
      - HICC, the Hadoop Infrastructure Care Center: a web-portal-style interface for displaying data.
  • 27. Chukwa (continued)
    [Figure: Chukwa Architecture]
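The first three components form a pipeline: agents emit data, collectors gather it into storage, and a job parses it. The sketch below models each stage as a plain function. The function names, the host names, and the "LEVEL message" log format are all assumptions made for illustration and are not Chukwa's actual interfaces or data formats.

```python
def agent(host, log_lines):
    """Agent: tag each raw log line with the machine it came from."""
    return [{"host": host, "line": line} for line in log_lines]


def collector(chunks_from_agents):
    """Collector: merge chunks from many agents into one store (a list here)."""
    store = []
    for chunks in chunks_from_agents:
        store.extend(chunks)
    return store


def parse_job(store):
    """Parsing job: count log lines per (host, level)."""
    counts = {}
    for chunk in store:
        level = chunk["line"].split(" ", 1)[0]  # assumes "LEVEL message" lines
        key = (chunk["host"], level)
        counts[key] = counts.get(key, 0) + 1
    return counts


store = collector([
    agent("web1", ["ERROR disk full", "INFO started"]),
    agent("web2", ["ERROR timeout"]),
])
summary = parse_job(store)
print(summary)
# {('web1', 'ERROR'): 1, ('web1', 'INFO'): 1, ('web2', 'ERROR'): 1}
```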
  • 28. HCatalog
    - An incubator-level project at Apache.
    - HCatalog is a metadata and table storage management service for HDFS.
    - HCatalog depends on the Hive metastore and exposes it to other services such as MapReduce and Pig.
    - HCatalog's goal is to simplify the user's interaction with HDFS data.
    - It enables data sharing between tools and execution platforms.
  • 29. Bibliography
    - G. Yang, "The application of MapReduce in the cloud computing," Intelligence Information Processing and Trusted Computing (IPTC) 2011, vol. 9, pp. 154–156, Oct. 2011.
    - T. White, Hadoop: The Definitive Guide, Third Edition. Sebastopol, CA: O'Reilly Media, Inc., 2012.