Introduction to Big Data and Hadoop

Hadoop Development Series
By Sandeep Patil
11/1/2017

What is Big Data?
• A large amount of data.
• It is a popular term used to describe the exponential growth of data.
• Big data is difficult to collect, store, maintain, analyze, and visualize.

Big Data characteristics
• Volume: the large amount of data.
• Velocity: the rate at which data is generated.
• Variety: the different types of data
- Structured data, e.g. MySQL tables
- Semi-structured data, e.g. XML, JSON
- Unstructured data, e.g. text, audio, video
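
The three kinds of variety can be illustrated with a small sketch. The record contents, field names, and the `city_of` helper are made up for illustration and are not from the slides.

```python
import json

# Structured: fixed columns in a fixed order, like a row in a MySQL table.
structured_row = ("u42", "Pune", 3)  # (user_id, city, order_count)

# Semi-structured: self-describing keys; fields can differ between records.
semi_structured = json.loads('{"user_id": "u42", "city": "Pune", "tags": ["new"]}')

# Unstructured: free text with no schema at all.
unstructured = "Customer u42 called from Pune about a late order."


def city_of(record):
    """Return the city from a structured row or a semi-structured record."""
    if isinstance(record, tuple):
        return record[1]           # position is known from the table schema
    if isinstance(record, dict):
        return record.get("city")  # key may be absent, so use .get
    return None                    # free text needs parsing before it can be queried


print(city_of(structured_row), city_of(semi_structured), city_of(unstructured))
```

The less structure the data has, the more work is needed before a simple question such as "which city?" can be answered.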

Big Data sources
• Social Media
• Banks
• Instruments
• Websites
• Stock Market

Use cases of Big Data
• Recommendation engines
• Analyzing Call Detail Records (CDR)
• Fraud Detection
• Market Basket Analysis
• Sentiment Analysis

Hadoop Introduction
• An open-source framework that allows distributed processing of large datasets on a cluster of commodity hardware.
• Hadoop is a data management tool and uses scale-out storage.

Defining Hadoop Cluster
• The size of the data is the most important factor when defining a Hadoop cluster.
• 5 servers with 10 TB storage capacity each: total storage capacity 50 TB
• 7 servers with 10 TB storage capacity each: total storage capacity 70 TB
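
The sizing arithmetic above can be sketched as follows. This is a minimal sketch that assumes every server has the same capacity and counts raw capacity only; it ignores replication and other overhead, which the slides do not cover. The function names are illustrative.

```python
def raw_capacity_tb(servers: int, tb_per_server: int) -> int:
    """Total raw storage of a cluster whose servers all have the same capacity."""
    return servers * tb_per_server


def servers_needed(data_tb: int, tb_per_server: int) -> int:
    """Smallest number of servers whose combined raw capacity covers data_tb."""
    return -(-data_tb // tb_per_server)  # ceiling division on integers


print(raw_capacity_tb(5, 10))  # 50
print(raw_capacity_tb(7, 10))  # 70
print(servers_needed(65, 10))  # 7
```

Scale-out storage means that growing from 50 TB to 70 TB is done by adding two more servers rather than replacing the existing ones.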

Hadoop Components
• Hadoop 1 components
- HDFS (Hadoop Distributed File System)
- MapReduce
• Hadoop 2 components
- HDFS (Hadoop Distributed File System)
- YARN/MRv2
• HDFS handles storage (reads and writes); MR/YARN handles processing.

Hadoop Daemons
• Hadoop 1 daemons
- HDFS: NameNode, DataNode, Secondary NameNode
- MapReduce: JobTracker, TaskTracker

Hadoop Daemons
• Hadoop 2 daemons
- HDFS: NameNode, DataNode, Secondary NameNode
- YARN: ResourceManager, NodeManager
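
The daemon lists for the two versions can be written down as a small lookup table to make the difference explicit. This is only an illustrative data structure; the names `HADOOP_DAEMONS`, `all_daemons`, and `replaced_daemons` are made up here.

```python
# Daemons per Hadoop major version, grouped by layer, as listed in the slides.
HADOOP_DAEMONS = {
    1: {
        "HDFS": ["NameNode", "DataNode", "Secondary NameNode"],
        "MapReduce": ["JobTracker", "TaskTracker"],
    },
    2: {
        "HDFS": ["NameNode", "DataNode", "Secondary NameNode"],
        "YARN": ["ResourceManager", "NodeManager"],
    },
}


def all_daemons(version):
    """Flat list of every daemon for the given Hadoop major version."""
    return [d for layer in HADOOP_DAEMONS[version].values() for d in layer]


def replaced_daemons(old, new):
    """Daemons that exist in version `old` but not in version `new`."""
    return [d for d in all_daemons(old) if d not in all_daemons(new)]


print(replaced_daemons(1, 2))  # ['JobTracker', 'TaskTracker']
print(replaced_daemons(2, 1))  # ['ResourceManager', 'NodeManager']
```

The HDFS daemons are the same in both versions; only the processing-layer daemons change.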

Hadoop Master-Slave Architecture
• HDFS: NameNode (master), DataNode (slave)
• MR/YARN: ResourceManager (master), NodeManager (slave)

Hadoop Cluster
• Assume that we have a Hadoop cluster with 4 nodes
• Master: NameNode, ResourceManager
• Slave: DataNode, NodeManager
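
One possible layout for such a 4-node cluster is sketched below. It assumes one master and three slaves, which the slides do not state explicitly, and the host names `node1` to `node4` are hypothetical.

```python
MASTER_DAEMONS = ["NameNode", "ResourceManager"]
SLAVE_DAEMONS = ["DataNode", "NodeManager"]


def cluster_layout(hosts):
    """Map each host to its daemons: first host is the master, the rest are slaves.

    Treating the first host as the only master is an assumption of this sketch.
    """
    master, *slaves = hosts
    layout = {master: list(MASTER_DAEMONS)}
    for host in slaves:
        layout[host] = list(SLAVE_DAEMONS)
    return layout


layout = cluster_layout(["node1", "node2", "node3", "node4"])  # hypothetical hosts
for host, daemons in layout.items():
    print(host, daemons)
```

Each slave runs both a storage daemon (DataNode) and a processing daemon (NodeManager), so the processing runs on the same machines that hold the data.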

Secondary NameNode
• The Secondary NameNode is not a hot backup for the NameNode.
• It only takes a periodic checkpoint (hourly by default) of the NameNode metadata.
• That checkpoint can be used to restart a crashed Hadoop cluster.
• The Secondary NameNode is an important daemon in Hadoop 1; in Hadoop 2 it is less important.
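
The idea behind the periodic metadata backup can be sketched as a toy model: a saved snapshot of the namespace is combined with the log of changes made since that snapshot. This is not Hadoop's actual code or file format; the `checkpoint` function, the set-of-paths snapshot, and the `(operation, path)` edit tuples are simplifications assumed for illustration.

```python
def checkpoint(fsimage, edits):
    """Apply logged edits to a copy of the snapshot and return the merged snapshot."""
    merged = set(fsimage)  # copy, so the old snapshot is left untouched
    for op, path in edits:
        if op == "create":
            merged.add(path)
        elif op == "delete":
            merged.discard(path)
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return merged


fsimage = {"/logs/a.txt", "/logs/b.txt"}                         # last saved snapshot
edits = [("create", "/logs/c.txt"), ("delete", "/logs/a.txt")]   # changes since then
new_fsimage = checkpoint(fsimage, edits)
print(sorted(new_fsimage))  # ['/logs/b.txt', '/logs/c.txt']
```

Because the snapshot is only refreshed periodically, changes made after the last checkpoint are not in it, which is why the Secondary NameNode is not a hot backup.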

Modes of Operation
• Standalone
• Pseudo-distributed
• Fully distributed

Next Video
• Comparison between Hadoop 1 and Hadoop 2

Like and Subscribe
sdp117@gmail.com
