SlideShare une entreprise Scribd logo
1  sur  18
Hadoop
Development
Series
By Sandeep Patil
11/1/2017 1Footer Text
Introduction to Big Data
and Hadoop
11/1/2017Footer Text 2
What is Big Data??
• Large amount of Data .
• Its a popular term used to express exponential growth of
data .
• Big data is difficult to store , collect , maintain , Analyze
and Visualize .
11/1/2017Footer Text 3
Big Data characteristics
• Volume :-
Large amount of data .
• Velocity :-
The rate at which data is getting generated
• Variety :-
Different types of Data
- Structured data ,eg MySql
- Semi-Structured data, eg xml , json
- Unstructured data, eg text , audio, video
11/1/2017Footer Text 4
Big Data sources
• Social Media
• Banks
• Instruments
• Websites
• Stock Market
11/1/2017Footer Text 5
Use cases of Big Data
• Recommendation engines
• Analyzing Call Detail Record(CDR)
• Fraud Detection
• Market Basket Analysis
• Sentimental Analysis
11/1/2017Footer Text 6
Hadoop Introduction
• Open source framework that allows distributed
processing of large datasets on the cluster of commodity
hardware
• Hadoop is a data management tool and uses scale out
storage .
11/1/2017Footer Text 7
Defining Hadoop Cluster
• Size of data is most important factor while defining
hadoop cluster
11/1/2017Footer Text 8
5 Servers with 10 TB storage
capacity each
Total Storage Capacity : - 50TB
Defining Hadoop Cluster
11/1/2017Footer Text 9
7 Servers with 10 TB storage
capacity each
Total storage capacity : 70TB
Hadoop Components
• Hadoop 1 Componets
- HDFS (Hadoop distributed file system)
- MapReduce
• Hadoop 2 Component
- HDFS (Hadoop distributed file system)
- YARN/MRv2
11/1/2017Footer Text 10
HDFS
MR/
YARN
Storage/
Reads-Writes
Processing
Hadoop Daemons
• Hadoop 1 Daemos
Namenode
Datanode
Secondary Namenode
job Tracker
Task Tracker
11/1/2017Footer Text 11
HDFS MapReduce
NameNode
DataNode
Job Tracker
Task Tracker
Hadoop Daemons
• Hadoop 2 Daemos
Namenode
Datanode
Secondary Namenode
Resource Manager
Node Manager
11/1/2017Footer Text 12
HDFS YARN
NameNode
DataNode
Resource Manager
Node Manager
Hadoop Master Slave
Architecture
11/1/2017Footer Text 13
HDFS MR/YARN
NameNode DataNode ResourceManager NodeManager
Master Slave Master Slave
Hadoop Cluster
• Assume that we have hadoop cluster with 4 nodes
11/1/2017Footer Text 14
Master
NameNode
ResourceManager
Slave
DataNode
NodeManager
Secondary Name Node
• Secondary Namenode is not a hot backup for Namenode
.
• It just takes hourly backup of Namenode metadata
• It is can be used to Restart a crashed Hadoop Cluster
• Secondary Namenode is an important demon for
Hadoop1 , However in hadoop2 It is not that much
Important .
11/1/2017Footer Text 15
Modes of Operation
• Stand Alone
• Pseudo Distributed
• Fully Distributed
11/1/2017Footer Text 16
Next Video
• Comparison between Hadoop1 and Hadoop2
11/1/2017Footer Text 17
Like and Subscribe
11/1/2017Footer Text 18
sdp117@gmail.com

Contenu connexe

Tendances

Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
Mohamed Ali Mahmoud khouder
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Simplilearn
 

Tendances (20)

Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best Practices
 
Hadoop and Big Data
Hadoop and Big DataHadoop and Big Data
Hadoop and Big Data
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFS
 
Apache Hive Tutorial
Apache Hive TutorialApache Hive Tutorial
Apache Hive Tutorial
 
Hadoop hive presentation
Hadoop hive presentationHadoop hive presentation
Hadoop hive presentation
 
Federated Queries Across Both Different Storage Mediums and Different Data En...
Federated Queries Across Both Different Storage Mediums and Different Data En...Federated Queries Across Both Different Storage Mediums and Different Data En...
Federated Queries Across Both Different Storage Mediums and Different Data En...
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
 
Ozone: scaling HDFS to trillions of objects
Ozone: scaling HDFS to trillions of objectsOzone: scaling HDFS to trillions of objects
Ozone: scaling HDFS to trillions of objects
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
 
Ozone: An Object Store in HDFS
Ozone: An Object Store in HDFSOzone: An Object Store in HDFS
Ozone: An Object Store in HDFS
 
HDFS Architecture
HDFS ArchitectureHDFS Architecture
HDFS Architecture
 
What is HDFS | Hadoop Distributed File System | Edureka
What is HDFS | Hadoop Distributed File System | EdurekaWhat is HDFS | Hadoop Distributed File System | Edureka
What is HDFS | Hadoop Distributed File System | Edureka
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data Processing
 
Design of Hadoop Distributed File System
Design of Hadoop Distributed File SystemDesign of Hadoop Distributed File System
Design of Hadoop Distributed File System
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1
 
Cloudera Hadoop Distribution
Cloudera Hadoop DistributionCloudera Hadoop Distribution
Cloudera Hadoop Distribution
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
Hadoop
HadoopHadoop
Hadoop
 

Similaire à Introduction to Big Data and hadoop

Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Simplilearn
 
HADOOP AND MAPREDUCE ARCHITECTURE-Unit-5.ppt
HADOOP AND MAPREDUCE ARCHITECTURE-Unit-5.pptHADOOP AND MAPREDUCE ARCHITECTURE-Unit-5.ppt
HADOOP AND MAPREDUCE ARCHITECTURE-Unit-5.ppt
ManiMaran230751
 

Similaire à Introduction to Big Data and hadoop (20)

Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
 
Hadoop development series(1)
Hadoop development series(1)Hadoop development series(1)
Hadoop development series(1)
 
Aziksa hadoop architecture santosh jha
Aziksa hadoop architecture santosh jhaAziksa hadoop architecture santosh jha
Aziksa hadoop architecture santosh jha
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
Apache hadoop: POSH Meetup Palo Alto, CA April 2014
Apache hadoop: POSH Meetup Palo Alto, CA April 2014Apache hadoop: POSH Meetup Palo Alto, CA April 2014
Apache hadoop: POSH Meetup Palo Alto, CA April 2014
 
Unit IV.pdf
Unit IV.pdfUnit IV.pdf
Unit IV.pdf
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
 
Apache Hadoop Big Data Technology
Apache Hadoop Big Data TechnologyApache Hadoop Big Data Technology
Apache Hadoop Big Data Technology
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop Technology
Hadoop TechnologyHadoop Technology
Hadoop Technology
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Big data and hadoop overvew
Big data and hadoop overvewBig data and hadoop overvew
Big data and hadoop overvew
 
HADOOP AND MAPREDUCE ARCHITECTURE-Unit-5.ppt
HADOOP AND MAPREDUCE ARCHITECTURE-Unit-5.pptHADOOP AND MAPREDUCE ARCHITECTURE-Unit-5.ppt
HADOOP AND MAPREDUCE ARCHITECTURE-Unit-5.ppt
 
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
 
Big data
Big dataBig data
Big data
 

Dernier

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Dernier (20)

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 

Introduction to Big Data and hadoop

  • 2. Introduction to Big Data and Hadoop 11/1/2017Footer Text 2
  • 3. What is Big Data?? • Large amount of Data . • Its a popular term used to express exponential growth of data . • Big data is difficult to store , collect , maintain , Analyze and Visualize . 11/1/2017Footer Text 3
  • 4. Big Data characteristics • Volume :- Large amount of data . • Velocity :- The rate at which data is getting generated • Variety :- Different types of Data - Structured data ,eg MySql - Semi-Structured data, eg xml , json - Unstructured data, eg text , audio, video 11/1/2017Footer Text 4
  • 5. Big Data sources • Social Media • Banks • Instruments • Websites • Stock Market 11/1/2017Footer Text 5
  • 6. Use cases of Big Data • Recommendation engines • Analyzing Call Detail Record(CDR) • Fraud Detection • Market Basket Analysis • Sentimental Analysis 11/1/2017Footer Text 6
  • 7. Hadoop Introduction • Open source framework that allows distributed processing of large datasets on the cluster of commodity hardware • Hadoop is a data management tool and uses scale out storage . 11/1/2017Footer Text 7
  • 8. Defining Hadoop Cluster • Size of data is most important factor while defining hadoop cluster 11/1/2017Footer Text 8 5 Servers with 10 TB storage capacity each Total Storage Capacity : - 50TB
  • 9. Defining Hadoop Cluster 11/1/2017Footer Text 9 7 Servers with 10 TB storage capacity each Total storage capacity : 70TB
  • 10. Hadoop Components • Hadoop 1 Componets - HDFS (Hadoop distributed file system) - MapReduce • Hadoop 2 Component - HDFS (Hadoop distributed file system) - YARN/MRv2 11/1/2017Footer Text 10 HDFS MR/ YARN Storage/ Reads-Writes Processing
  • 11. Hadoop Daemons • Hadoop 1 Daemos Namenode Datanode Secondary Namenode job Tracker Task Tracker 11/1/2017Footer Text 11 HDFS MapReduce NameNode DataNode Job Tracker Task Tracker
  • 12. Hadoop Daemons • Hadoop 2 Daemos Namenode Datanode Secondary Namenode Resource Manager Node Manager 11/1/2017Footer Text 12 HDFS YARN NameNode DataNode Resource Manager Node Manager
  • 13. Hadoop Master Slave Architecture 11/1/2017Footer Text 13 HDFS MR/YARN NameNode DataNode ResourceManager NodeManager Master Slave Master Slave
  • 14. Hadoop Cluster • Assume that we have hadoop cluster with 4 nodes 11/1/2017Footer Text 14 Master NameNode ResourceManager Slave DataNode NodeManager
  • 15. Secondary Name Node • Secondary Namenode is not a hot backup for Namenode . • It just takes hourly backup of Namenode metadata • It is can be used to Restart a crashed Hadoop Cluster • Secondary Namenode is an important demon for Hadoop1 , However in hadoop2 It is not that much Important . 11/1/2017Footer Text 15
  • 16. Modes of Operation • Stand Alone • Pseudo Distributed • Fully Distributed 11/1/2017Footer Text 16
  • 17. Next Video • Comparison between Hadoop1 and Hadoop2 11/1/2017Footer Text 17
  • 18. Like and Subscribe 11/1/2017Footer Text 18 sdp117@gmail.com