Technological geeks Hindi Video 1 -
https://youtu.be/LSvAoo4pYjs
Contents:
What is Big Data?
Big Data characteristics
Big Data sources
Use cases of Big Data
Hadoop daemons
Hadoop master-slave architecture
Hadoop cluster
Secondary NameNode
3. What is Big Data?
• A large amount of data.
• It's a popular term used to describe the exponential growth of
data.
• Big data is difficult to collect, store, maintain, analyze,
and visualize.
11/1/2017Footer Text 3
4. Big Data characteristics
• Volume:
The large amount of data.
• Velocity:
The rate at which data is generated.
• Variety:
The different types of data
- Structured data, e.g. MySQL tables
- Semi-structured data, e.g. XML, JSON
- Unstructured data, e.g. text, audio, video
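The three varieties can be illustrated with a minimal Python sketch. The sample records are made up for illustration, and the standard-library sqlite3 module stands in for MySQL here only so that the example runs without a database server.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

# Structured: fixed schema, queried with SQL.
# (sqlite3 is a stand-in for MySQL; the table and row are hypothetical.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'asha')")
structured_name = conn.execute(
    "SELECT name FROM users WHERE id = 1"
).fetchone()[0]
conn.close()

# Semi-structured: self-describing keys or tags, but no fixed table schema.
json_name = json.loads('{"id": 1, "name": "asha", "tags": ["new"]}')["name"]
xml_name = ET.fromstring("<user id='1'><name>asha</name></user>").findtext("name")

# Unstructured: free text; any structure must be extracted by the program.
review = "Asha loved the phone but the battery drains fast"
word_count = len(review.split())

print(structured_name, json_name, xml_name, word_count)
```

The same fact (a user's name) is a column lookup in the structured case and a key or tag lookup in the semi-structured case, while the unstructured review offers no field to look up at all.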
5. Big Data sources
• Social Media
• Banks
• Instruments
• Websites
• Stock Market
6. Use cases of Big Data
• Recommendation engines
• Analyzing Call Detail Records (CDR)
• Fraud Detection
• Market Basket Analysis
• Sentiment Analysis
7. Hadoop Introduction
• Open-source framework that allows distributed
processing of large datasets across a cluster of commodity
hardware.
• Hadoop is a data management tool that uses scale-out
storage: capacity grows by adding more servers to the cluster.
8. Defining Hadoop Cluster
• The size of the data is the most important factor when defining
a Hadoop cluster.
5 servers with 10 TB storage capacity each
Total storage capacity: 50 TB
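The sizing arithmetic above can be sketched in Python. The raw-capacity figures come from the slide. The replication factor of 3 is HDFS's default (dfs.replication) and is an added assumption, not something the slide states. The function names are hypothetical, and the estimate ignores space reserved for the operating system and temporary data.

```python
import math

def raw_capacity_tb(servers: int, tb_per_server: float) -> float:
    """Total disk space across all servers in the cluster."""
    return servers * tb_per_server

def usable_capacity_tb(raw_tb: float, replication_factor: int = 3) -> float:
    """Data that fits when every block is stored replication_factor times."""
    return raw_tb / replication_factor

def servers_needed(data_tb: float, tb_per_server: float,
                   replication_factor: int = 3) -> int:
    """Smallest server count whose raw capacity holds all replicas."""
    return math.ceil(data_tb * replication_factor / tb_per_server)

raw = raw_capacity_tb(5, 10)        # 5 servers x 10 TB, as on the slide
usable = usable_capacity_tb(raw)    # assumes the default replication of 3
print(f"raw: {raw} TB, usable: {usable:.2f} TB")
print("servers for 50 TB of data:", servers_needed(50, 10))
```

Under that assumption, the 50 TB of raw capacity holds roughly 16.67 TB of data, and storing 50 TB of data would take 15 such servers. This is why data size drives the cluster definition.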
14. Hadoop Cluster
• Assume that we have a Hadoop cluster with 4 nodes.
Master
- NameNode
- ResourceManager
Slave
- DataNode
- NodeManager
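The daemon placement above can be written down as a small Python sketch. The split of the 4 nodes into 1 master and 3 slaves is an assumption, since the slide's diagram did not survive. The host names node1 to node4 and the build_cluster helper are hypothetical.

```python
# Daemons per role, as listed on the slide.
MASTER_DAEMONS = ("NameNode", "ResourceManager")
SLAVE_DAEMONS = ("DataNode", "NodeManager")

def build_cluster(total_nodes: int) -> dict:
    """Map each host to its daemons: first node is the master, rest are slaves."""
    if total_nodes < 2:
        raise ValueError("need at least one master and one slave")
    cluster = {"node1": MASTER_DAEMONS}
    for i in range(2, total_nodes + 1):
        cluster[f"node{i}"] = SLAVE_DAEMONS
    return cluster

cluster = build_cluster(4)  # assumed layout: 1 master + 3 slaves
for host, daemons in cluster.items():
    print(f"{host}: {', '.join(daemons)}")
```

The master runs the coordinating daemons (NameNode for storage metadata, ResourceManager for scheduling), while every slave pairs a storage daemon (DataNode) with a compute daemon (NodeManager).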
15. Secondary Name Node
• The Secondary NameNode is not a hot backup for the NameNode.
• It just takes an hourly backup (checkpoint) of the NameNode metadata.
• That backup can be used to restart a crashed Hadoop cluster.
• The Secondary NameNode is an important daemon in
Hadoop 1; in Hadoop 2, however, it is much less
important.
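A toy Python model can make the periodic backup concrete. In HDFS the NameNode metadata consists of a snapshot (the fsimage) plus a log of later changes (the edit log), and a checkpoint merges the two. The dictionary and tuple formats below are invented for illustration and are not Hadoop's real file formats or API.

```python
def apply_edits(fsimage: dict, edits: list) -> dict:
    """Return a new snapshot with the logged operations replayed on top."""
    merged = dict(fsimage)  # copy so the old snapshot stays untouched
    for op, path in edits:
        if op == "create":
            merged[path] = "file"
        elif op == "delete":
            merged.pop(path, None)
    return merged

# Hypothetical metadata: last snapshot plus the changes made since then.
fsimage = {"/data/a.txt": "file"}
edit_log = [("create", "/data/b.txt"), ("delete", "/data/a.txt")]

# Checkpoint: merge snapshot and log; the log can then start over empty.
checkpoint = apply_edits(fsimage, edit_log)
edit_log_after_checkpoint = []

print(checkpoint)
```

Because the checkpoint is only as fresh as the last merge, changes made after it are not captured, which is why this daemon is not a hot backup.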
16. Modes of Operation
• Standalone
• Pseudo-Distributed
• Fully Distributed