HADOOP ECOSYSTEM
In the previous blog in the Hadoop Tutorial series, we discussed Hadoop, its features and its core components. The next step forward is to understand the Hadoop Ecosystem. It is an essential topic to understand before you start working with Hadoop. This Hadoop Ecosystem blog will familiarize you with the Big Data frameworks used across the industry, which are required for Hadoop certification.
 HDFS -> Hadoop Distributed File System
 YARN -> Yet Another Resource Negotiator
 MapReduce -> Batch data processing using the map/reduce programming model
 Spark -> In-memory data processing
 Pig, Hive -> Data processing services using SQL-like queries
 HBase -> NoSQL database
 Mahout, Spark MLlib -> Machine learning
 Apache Drill -> SQL on Hadoop
 ZooKeeper -> Cluster coordination and management
HDFS
 The Hadoop Distributed File System (HDFS) is the core component, or you could say the backbone, of the Hadoop Ecosystem.
 HDFS is what makes it possible to store different types of large data sets (structured, semi-structured and unstructured).
 HDFS creates a level of abstraction over the underlying storage resources, so that the whole of HDFS can be seen as a single unit.
 It stores our data across various nodes and maintains metadata about the stored data, much like a log file.
 HDFS has two core components, i.e. the NameNode and the DataNodes.
◦ The NameNode is the master node and does not store the actual data. It holds the metadata, much like a log file or a table of contents, so it needs less storage but more computational resources.
◦ All of your data, on the other hand, is stored on the DataNodes, which therefore need more storage. DataNodes are commodity hardware (like your laptops and desktops) in the distributed environment, which is why Hadoop solutions are so cost-effective.
◦ You always communicate with the NameNode when writing data; it responds with the set of DataNodes on which the client should store and replicate the data.
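To make the NameNode/DataNode split concrete, here is a minimal, hypothetical sketch using Hadoop's Java FileSystem API (the path and file contents are made up): metadata calls such as create and getFileStatus are resolved by the NameNode, while the file bytes themselves are streamed to and replicated across DataNodes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();      // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);          // client handle; metadata calls go to the NameNode

    Path file = new Path("/user/demo/hello.txt");  // hypothetical path
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.writeUTF("Hello HDFS");                  // blocks are written to and replicated across DataNodes
    }

    FileStatus status = fs.getFileStatus(file);    // metadata served by the NameNode
    System.out.println("Replication: " + status.getReplication()
        + ", block size: " + status.getBlockSize());
  }
}
```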
YARN
 Consider YARN as the brain of your Hadoop
Ecosystem. It performs all your processing
activities by allocating resources and
scheduling tasks.
 It has two major components,
i.e. ResourceManager and NodeManager.
◦ The ResourceManager is again the master node on the processing side.
◦ It receives processing requests and then passes the relevant parts of each request to the corresponding NodeManagers, where the actual processing takes place.
◦ NodeManagers are installed on every DataNode and are responsible for executing tasks on that node.
◦ The ResourceManager itself has two components: the Scheduler and the ApplicationsManager.
◦ Scheduler: Based on your application's resource requirements, the Scheduler runs scheduling algorithms and allocates resources.
◦ ApplicationsManager: The ApplicationsManager accepts job submissions, negotiates the container (i.e. the DataNode environment in which a process executes) for running the application-specific ApplicationMaster, and monitors its progress. ApplicationMasters are daemons that reside on the DataNodes and communicate with their containers to execute tasks on each DataNode.
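To see these pieces from a client's point of view, here is a hedged sketch (not from the slides) using the YarnClient API: it connects to the ResourceManager and lists the applications it is tracking, each of which has an ApplicationMaster running in a container on some NodeManager. Configuration is assumed to come from yarn-site.xml.

```java
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListYarnApps {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());   // reads yarn-site.xml (ResourceManager address etc.)
    yarnClient.start();

    // Each report describes one application accepted by the ResourceManager;
    // its ApplicationMaster runs inside a container managed by a NodeManager.
    for (ApplicationReport report : yarnClient.getApplications()) {
      System.out.println(report.getApplicationId() + " -> " + report.getYarnApplicationState());
    }
    yarnClient.stop();
  }
}
```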
MAPREDUCE
 MapReduce is the core processing component of the Hadoop Ecosystem, as it provides the logic of processing. In other words, MapReduce is a software framework that helps you write applications that process large data sets using distributed and parallel algorithms inside the Hadoop environment.
 In a MapReduce program, Map() and Reduce() are the two functions.
◦ The Map function performs actions like filtering, grouping and sorting.
◦ The Reduce function aggregates and summarizes the results produced by the Map function.
◦ The result generated by the Map function is a set of key-value pairs (K, V), which act as the input for the Reduce function.
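The canonical illustration of this Map/Reduce split is word count (a standard example, not taken from the slides): the Map function tokenizes each line and emits (word, 1) pairs, and the Reduce function aggregates the counts for each key.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);                  // emit (K, V) = (word, 1)
      }
    }
  }

  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();                          // aggregate all values for one key
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);     // local aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```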
APACHE PIG
 Apache Pig has two parts: Pig Latin, the language, and the Pig runtime, the execution environment. You can think of the relationship as that between Java and the JVM.
 Pig Latin has a SQL-like command structure.
 Not everyone comes from a programming background, and Apache Pig makes their lives easier. You might be curious to know how?
 Well, here is an interesting fact:
 10 lines of Pig Latin ≈ 200 lines of MapReduce Java code.
 At the back end of a Pig job, a MapReduce job executes.
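To illustrate that ratio, here is a hedged sketch of the same word count expressed as a handful of Pig Latin statements, embedded via Pig's Java API (PigServer); the input path is hypothetical, and at the back end the script compiles down to MapReduce jobs.

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigWordCount {
  public static void main(String[] args) throws Exception {
    PigServer pig = new PigServer(ExecType.LOCAL);   // or ExecType.MAPREDUCE on a cluster

    // Roughly the same logic as the ~50-line MapReduce WordCount above
    pig.registerQuery("lines  = LOAD 'input.txt' AS (line:chararray);");
    pig.registerQuery("words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
    pig.registerQuery("grpd   = GROUP words BY word;");
    pig.registerQuery("counts = FOREACH grpd GENERATE group, COUNT(words);");

    pig.store("counts", "wordcount-out");            // triggers the underlying MapReduce job(s)
  }
}
```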
HIVE
 Facebook created HIVE for people who are fluent with SQL. Thus, HIVE
makes them feel at home while working in a Hadoop Ecosystem.
 Basically, HIVE is a data warehousing component which performs reading,
writing and managing large data sets in a distributed environment using
SQL-like interface.
 HIVE + SQL = HQL
 The query language of Hive is called Hive Query Language (HQL), which is very similar to SQL.
 It has two basic components: the Hive command line and the JDBC/ODBC driver.
 The Hive command-line interface is used to execute HQL commands.
 Java Database Connectivity (JDBC) and Open Database Connectivity (ODBC) drivers are used to establish connections from applications to the data store (a minimal JDBC sketch appears after this list).
 Secondly, Hive is highly scalable, as it serves both purposes: large data set processing (i.e. batch query processing) and real-time (i.e. interactive) query processing.
 It supports all primitive data types of SQL.
 You can use predefined functions, or write tailored user-defined functions (UDFs), to meet your specific needs.
 As an alternative, you may go to this comprehensive video tutorial where
each tool present in Hadoop Ecosystem has been discussed:
 Hadoop Ecosystem | Edureka
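Here is the JDBC path mentioned above as a minimal, hypothetical sketch: HQL is submitted to HiveServer2 through the standard JDBC API. The host, port, database, table and credentials are placeholders to adapt to your own setup.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");   // HiveServer2 JDBC driver

    try (Connection con = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default", "hiveuser", "");
         Statement stmt = con.createStatement();
         // HQL reads much like SQL
         ResultSet rs = stmt.executeQuery(
             "SELECT category, COUNT(*) FROM products GROUP BY category")) {

      while (rs.next()) {
        System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
      }
    }
  }
}
```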
APACHE MAHOUT
Now, let us talk about Mahout, which is renowned for machine learning. Mahout provides an environment for creating scalable machine-learning applications. It supports the following machine-learning tasks:
 Collaborative filtering: Mahout mines user behaviour, patterns and characteristics, and based on these it predicts and makes recommendations to users. The typical use case is an e-commerce website (see the recommender sketch after this list).
 Clustering: It organizes similar groups of data together; for example, articles can be grouped into blogs, news, research papers, etc.
 Classification: It means classifying and categorizing data into various sub-categories; for example, articles can be categorized into blogs, news, essays, research papers and other categories.
 Frequent itemset mining: Here Mahout checks which items are likely to appear together and makes suggestions accordingly. For example, a cell phone and a cover are generally bought together, so if you search for a cell phone it will also recommend covers and cases.
 Mahout provides a command line to invoke the various algorithms. It has a predefined library that already contains built-in algorithms for different use cases.
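As a concrete example of the collaborative-filtering use case, here is a hedged sketch using Mahout's "Taste" recommender API: given a userID,itemID,preference CSV (the file name and user id are hypothetical), it recommends items based on what similar users liked.

```java
import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class ProductRecommender {
  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("ratings.csv"));   // userID,itemID,preference rows
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
    Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

    // Recommend 3 items for user 42 based on users with similar behaviour
    List<RecommendedItem> items = recommender.recommend(42, 3);
    for (RecommendedItem item : items) {
      System.out.println("item " + item.getItemID() + ", score " + item.getValue());
    }
  }
}
```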
APACHE SPARK
 Apache Spark is a framework for real-time data analytics in a distributed computing environment.
 Spark is written in Scala and was originally developed at the University of California, Berkeley.
 It executes in-memory computations to increase the speed of data processing over MapReduce.
 By exploiting in-memory computations and other optimizations, it can be up to 100x faster than Hadoop MapReduce for large-scale data processing.
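The point about in-memory computation can be sketched from Java (assuming the Spark 2.x+ Java API; the file path is a placeholder): an RDD is cached in memory and then reused across two actions without re-reading the input from disk.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkInMemoryExample {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("in-memory-demo").setMaster("local[*]");
    try (JavaSparkContext sc = new JavaSparkContext(conf)) {

      JavaRDD<String> words = sc.textFile("input.txt")
          .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
          .cache();                               // keep the RDD in memory

      long total = words.count();                 // first action materializes and caches the RDD
      long distinct = words.distinct().count();   // second action is served from the cached data

      System.out.println("total words: " + total + ", distinct words: " + distinct);
    }
  }
}
```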
APACHE HBASE
 HBase is an open source, non-relational
distributed database. In other words, it is a NoSQL
database.
 It supports all types of data, which is why it is capable of handling anything and everything inside a Hadoop ecosystem.
 It is modelled after Google's BigTable, a distributed storage system designed to cope with large data sets.
 HBase was designed to run on top of HDFS and provides BigTable-like capabilities.
 It gives us a fault-tolerant way of storing sparse data, which is common in many Big Data use cases.
 HBase itself is written in Java, while HBase applications can interact with it through its REST, Avro and Thrift APIs.
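A hedged sketch of the HBase Java client API, writing and reading back a single cell; the table, column family and row key are hypothetical and the table is assumed to already exist.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();      // reads hbase-site.xml (ZooKeeper quorum etc.)
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf("users"))) {

      // Write a sparse, schemaless column under a column family
      Put put = new Put(Bytes.toBytes("user#1001"));
      put.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("name"), Bytes.toBytes("Ada"));
      table.put(put);

      // Read the same cell back
      Result result = table.get(new Get(Bytes.toBytes("user#1001")));
      byte[] name = result.getValue(Bytes.toBytes("profile"), Bytes.toBytes("name"));
      System.out.println("name = " + Bytes.toString(name));
    }
  }
}
```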
APACHE DRILL
 As the name suggests, Apache Drill is used to drill into any kind of data. It is an open-source application that works in a distributed environment to analyze large data sets.
 It is the open-source counterpart of Google's Dremel.
 It supports different kinds of NoSQL databases and file systems, which is a powerful feature of Drill. For example: Azure Blob Storage, Google Cloud Storage, HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Swift, NAS and local files.
 So the main aim of Apache Drill is to provide scalability, so that we can process petabytes and exabytes of data efficiently (or, you could say, in minutes).
 The main power of Apache Drill lies in combining a variety of data stores in a single query.
 Apache Drill broadly follows ANSI SQL.
 It has a powerful scalability factor, supporting millions of users and serving their query requests over large-scale data.
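To make the "SQL on anything" idea concrete, here is a hedged sketch of querying a raw file through Drill's JDBC driver; the driver class name, connection string and file path are assumptions to adapt to your setup.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DrillJdbcExample {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.drill.jdbc.Driver");

    // ANSI SQL issued directly against a JSON file, with no schema defined up front
    try (Connection con = DriverManager.getConnection("jdbc:drill:drillbit=localhost");
         Statement stmt = con.createStatement();
         ResultSet rs = stmt.executeQuery(
             "SELECT name, age FROM dfs.`/data/customers.json` WHERE age > 30")) {

      while (rs.next()) {
        System.out.println(rs.getString("name") + " -> " + rs.getInt("age"));
      }
    }
  }
}
```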