Big Data Analysis Using
Hadoop Cluster
By:
Syed Furqan Haider Shah #176
Introduction
What is
BIG DATA
The term Big data is used to describe a massive volume
of both structured and unstructured data that is so large
that it's difficult to process using traditional database
and software techniques.
BIG DATA(contd.)
• Big data consists of a heterogeneous mixture of structured and
unstructured data.
• Big data refers to datasets whose size is beyond the ability of
typical database software tools to capture, store, manage, process
and analyze.
Challenges
• These statistical records keep growing, and they grow
very fast.
• Unfortunately, as the data grows it becomes a tedious task
to process such a large data set and extract meaningful
information.
• If the data is generated in various formats, processing it
poses new challenges.
Challenges(contd.)
• Big data is often stored in NoSQL systems, which typically have no
standard Data Description Language.
• Web-scale data is also heterogeneous and follows no universal format.
When analyzing big data, integrating and cleaning the data is much
harder than in traditional data mining.
Solution
• Parallel programming
• An efficient computing platform does not rely on centralized data
storage; instead, storage is distributed across the platform at
large scale.
• Restricting access to the data
HADOOP
HADOOP
Hadoop is a tool that operates on a distributed file system.
In this architecture, all the Data Nodes work in parallel,
but each individual Data Node still processes its own work
sequentially.
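The idea of nodes running in parallel while each node works sequentially can be sketched in a few lines of Python. This is only an illustration on one machine, not Hadoop code: the threads stand in for Data Nodes, and the function names and the data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def process_on_node(records):
    """Simulate one Data Node: handle its records one after another."""
    total = 0
    for value in records:  # sequential loop inside a single node
        total += value
    return total

def process_on_cluster(partitions):
    """Simulate the cluster: every node works on its own partition concurrently."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # pool.map returns results in the same order as the input partitions
        return list(pool.map(process_on_node, partitions))

# Hypothetical data, split across three simulated nodes.
partitions = [[1, 2, 3], [4, 5], [6]]
partial_sums = process_on_cluster(partitions)
print(partial_sums)       # [6, 9, 6]
print(sum(partial_sums))  # 21
```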
HADOOP Architecture
• Hadoop is an Apache Software Foundation project: an
open-source software platform for scalable, distributed
computing.
• The Apache Hadoop software library is a framework that allows
for the distributed processing of large data sets across
clusters of computers using simple programming models.
HADOOP Architecture(contd.)
• Hadoop provides fast and reliable analysis of both
structured and unstructured data.
• It is designed to scale up from single servers to thousands of
machines, each offering local computation and storage.
• Hadoop uses the MapReduce programming model to mine
data.
• A MapReduce program splits the input dataset into independent subsets,
which are processed in parallel by map tasks.
• Map(): a procedure that performs filtering and sorting.
• Reduce(): a procedure that performs a summary operation.
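The two procedures above can be illustrated with a small word count in plain Python. This is a minimal single-process sketch, not the Hadoop API: the function names are hypothetical, and the grouping and sorting by key that a framework would perform between the two phases is written here as a separate `shuffle` helper.

```python
from collections import defaultdict

def map_words(line):
    """Map(): emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, in sorted key order, between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_counts(key, values):
    """Reduce(): summarize all values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_words(line)]
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())

lines = ["big data needs hadoop", "hadoop processes big data"]
print(word_count(lines))
# {'big': 2, 'data': 2, 'hadoop': 2, 'needs': 1, 'processes': 1}
```

Each call to `map_words` depends only on its own line, which is what allows the map tasks to run on independent subsets in parallel.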
METHODOLOGY
Methodology
Hadoop’s library is designed to deliver a highly available service on
top of a cluster of computers. A Hadoop cluster as a whole can be seen
as consisting of:
1. Core Hadoop
2. Hadoop Ecosystem
Relationship between Core Hadoop and the Hadoop
Ecosystem
Core Hadoop consists of:
• HDFS
• MapReduce
Since the project began, a lot of other software has grown
around it. This is called the Hadoop Ecosystem.
HDFS (Hadoop Distributed File System)
• An HDFS instance may consist of a large number of server machines,
each storing a part of the file system data.
• Detection of faults and quick automatic recovery from them is a core
architectural objective of HDFS.
• Applications that run on HDFS need streaming access to their datasets.
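How spreading a file over many machines supports recovery from faults can be sketched as follows. HDFS does store files as blocks and keeps replicas of them on different machines, but the round-robin placement, the function names, the node names and the tiny block size below are assumptions made for illustration; they are not HDFS's actual placement policy or API.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumes replication <= len(nodes), so the chosen nodes are distinct.
    """
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement

def readable_after_failure(placement, failed_node):
    """The file stays readable if every block still has a live replica."""
    return all(
        any(node != failed_node for node in holders)
        for holders in placement.values()
    )

# Hypothetical 10-byte file, 4-byte blocks, three simulated server machines.
blocks = split_into_blocks(b"x" * 10, block_size=4)
nodes = ["node-a", "node-b", "node-c"]
placement = place_replicas(len(blocks), nodes, replication=2)
print(placement)
# {0: ['node-a', 'node-b'], 1: ['node-b', 'node-c'], 2: ['node-c', 'node-a']}
print(readable_after_failure(placement, "node-b"))  # True
```

With `replication=1` the same failure would make block 1 unreadable, which is why keeping more than one copy of each block matters for automatic recovery.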
MapReduce
It is the basic logic flow of task execution. It consists
mainly of Mappers and Reducers.
Mappers:
Mappers do the job of extracting the required raw information from
the whole dataset. For example, in one case they extract the date of sale,
product name, selling price and cost price of various products.
MapReduce(contd.)
• Reducers:
The Mappers' output is then sorted by key and passed to the
Reducers. Reducers do the actual processing on this
reduced data provided by the Mappers and accomplish the final
task, yielding the desired output.
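The Mapper and Reducer roles described above can be sketched with the sales example. This is a single-process illustration in plain Python, not Hadoop code. The records, the field layout and the choice of "profit per product" as the final task are assumptions made for the example.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical raw records:
# (date of sale, product, selling price, cost price, store)
SALES = [
    ("2024-01-05", "pen", 12.0, 8.0, "store-1"),
    ("2024-01-05", "book", 50.0, 35.0, "store-2"),
    ("2024-01-06", "pen", 12.0, 8.0, "store-2"),
]

def mapper(record):
    """Extract only the required fields, keyed by product name."""
    _date, product, selling_price, cost_price, _store = record
    return product, (selling_price, cost_price)

def reducer(product, values):
    """Do the actual processing: total profit for one product."""
    return product, sum(sell - cost for sell, cost in values)

def run_job(records):
    # Sort the Mappers' output by key before handing it to the Reducers.
    mapped = sorted((mapper(r) for r in records), key=itemgetter(0))
    results = {}
    for product, group in groupby(mapped, key=itemgetter(0)):
        key, profit = reducer(product, [value for _, value in group])
        results[key] = profit
    return results

print(run_job(SALES))  # {'book': 15.0, 'pen': 8.0}
```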