This Edureka Hadoop Training tutorial (Hadoop Blog series: https://goo.gl/LFesy8) will help you understand how Big Data emerged as a problem and how Hadoop solved that problem. The tutorial discusses Hadoop architecture, HDFS and its architecture, YARN, and MapReduce, with a practical Aadhaar use case. Below are the topics covered in this tutorial:
1) What is Big Data?
2) Big Data in Different Domains
3) Problems Associated with Big Data
4) What is Hadoop?
5) HDFS
6) YARN
7) MapReduce
8) Hadoop Ecosystem
9) Aadhaar Use-case
10) Edureka Big Data & Hadoop Training
Hadoop Training For Beginners | Hadoop Tutorial | Big Data Training |Edureka
2. www.edureka.co/big-data-and-hadoop EDUREKA HADOOP CERTIFICATION TRAINING
Agenda
1. What is Big Data?
2. Big Data Growth Driver
3. Big Data Application in Different Domains
4. Problem with Big Data Processing
5. What is Hadoop
6. Hadoop Ecosystem
7. Aadhaar Case Study
8. Hadoop Job Trends
9. Edureka Hadoop Training
Big Data!!!
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database system tools or traditional data processing applications.
5 V's of Big Data
• Volume
• Variety: Different kinds of data are being generated from various sources
• Velocity: Data is being generated at an alarming rate
• Veracity: Uncertainty and inconsistencies in the data
• Value: Mechanism to bring the correct meaning out of the data
The V's associated with Big Data may grow with time.
Big Data!!!
Cisco Forecasts 30.6 Exabytes per Month of Mobile Data Traffic by 2020
Three major trends are contributing to the growth of mobile data traffic:
• Adapting to Smarter Mobile Devices
• Defining Cell Network Advances—2G, 3G, and 4G (5G Perspectives)
• Reviewing Tiered Pricing—Unlimited Data and Shared Plans
[Chart: mobile data traffic in exabytes per month. Labels: Mobile Devices, IoT, Mobile, Social Media]
Different Domains
Banking & Finance
• Early warning for securities fraud & trade visibility
• Card fraud detection & audit trails
• Enterprise credit risk reporting
• Customer data transformation and analytics
Communication, Media & Entertainment
• Collecting, analyzing and utilizing consumer insights
• Leveraging mobile and social media content
• Understanding patterns of real-time media content usage
Different Domains
Healthcare
• Rising medical costs
• Unavailable or unusable data
• Patient history and disease case histories
Education
• Collecting, analyzing and utilizing consumer insights
• Leveraging mobile and social media content
• Understanding patterns of real-time media content usage
Different Domains
Government
• Integration and interoperability of Big Data from different government schemes
Energy & Utilities
• 60% of electricity grid assets will need replacement
• Global installed wind capacity increased by 12.4%
• Smart meters become mainstream, while consumers want more control over and insight into energy consumption
Problems with Big Data
Highly Scalable
1. Storing huge and exponentially growing datasets
2. Processing data with a complex structure (structured, unstructured, semi-structured)
3. Bringing a huge amount of data to the computation unit becomes a bottleneck
What is Hadoop?
• HDFS (Storage): lets you dump any kind of data across the cluster
• MapReduce (Processing): allows parallel processing of the data stored in HDFS
[Diagram: Hadoop cluster with one master and multiple slaves]
Hadoop-as-a-Solution
1. Storing exponentially growing huge datasets
• HDFS, the storage unit of Hadoop, is a distributed file system
2. Storing unstructured data
• Allows storing any kind of data, be it structured, semi-structured or unstructured
3. Processing data faster
• Provides parallel processing of the data present in HDFS
• Allows processing data locally, i.e. each node works with the part of the data that is stored on it
HDFS
[Diagram: NameNode (master node) and three DataNodes (slave nodes)]
▪ Storage unit of Hadoop
▪ Distributed File System
▪ Divides files (input data) into smaller chunks and stores them across the cluster
▪ Horizontal Scaling as per requirement
▪ Stores any kind of data
▪ No schema validation is done while dumping data
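To make the chunking concrete, here is a minimal Python sketch of how a file's size maps onto fixed-size blocks. It assumes a 128 MB block size (the default in Hadoop 2.x and later, and configurable per cluster); the function name `split_into_blocks` is hypothetical and not part of any Hadoop API.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumed block size; configurable in a real cluster


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes (in bytes) of the blocks a file would occupy."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block only holds the leftover bytes
    return sizes


if __name__ == "__main__":
    for size_mb in (512, 300):
        blocks = split_into_blocks(size_mb * MB)
        print(size_mb, "MB ->", [b // MB for b in blocks], "MB")
```

Under these assumptions a 512 MB file occupies four 128 MB blocks, while a 300 MB file occupies two full blocks plus one 44 MB block, since the last block is not padded.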
NameNode
• Master daemon
• Maintains and Manages DataNodes
• Records metadata e.g. location of blocks stored,
the size of the files, permissions, hierarchy, etc.
• Receives heartbeat and block report from all the
DataNodes
[Diagram: NameNode, Secondary NameNode, and three DataNodes]
DataNode
▪ Slave daemons
▪ Stores actual data
▪ Serves read and write requests
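The division of labour described above can be sketched as a toy in-memory model in Python. This is only an illustration, not how HDFS is implemented: the class `ToyNameNode`, its method names, and the 30-second timeout are all hypothetical. It shows the NameNode keeping metadata (which DataNodes hold which blocks) and using heartbeats to decide which DataNodes are alive.

```python
import time


class ToyNameNode:
    """Toy model: tracks block locations and DataNode liveness in memory."""

    def __init__(self, heartbeat_timeout=30.0):
        # heartbeat_timeout is an illustrative value, not a Hadoop default
        self.heartbeat_timeout = heartbeat_timeout
        self.last_heartbeat = {}   # datanode id -> time of last heartbeat
        self.block_locations = {}  # block id -> set of datanode ids

    def receive_heartbeat(self, datanode, now=None):
        self.last_heartbeat[datanode] = time.time() if now is None else now

    def receive_block_report(self, datanode, block_ids):
        # Drop stale entries for this DataNode, then record what it reports.
        for holders in self.block_locations.values():
            holders.discard(datanode)
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(datanode)

    def live_datanodes(self, now=None):
        now = time.time() if now is None else now
        return {dn for dn, seen in self.last_heartbeat.items()
                if now - seen <= self.heartbeat_timeout}

    def locate(self, block_id, now=None):
        """Return the live DataNodes that hold a given block."""
        holders = self.block_locations.get(block_id, set())
        return holders & self.live_datanodes(now)
```

In this sketch the NameNode never stores file contents; a client would ask `locate()` where a block lives and then read it from one of the returned DataNodes. A DataNode that stops sending heartbeats simply drops out of the results.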
Secondary NameNode
• Checkpointing is the process of combining the edit logs with the FsImage
• Allows faster recovery after a NameNode failure, as we have a backup of the metadata
• Checkpointing happens periodically (default: 1 hour)
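The idea behind checkpointing can be illustrated with a small Python sketch. It treats the FsImage as a plain dictionary and the edit log as a list of operations, which is a deliberate simplification; the operation names (`create`, `delete`, `rename`) and both function names are hypothetical and do not reflect the real on-disk formats.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to an in-memory namespace (path -> metadata)."""
    op, path, *args = edit
    if op == "create":
        namespace[path] = {"size": args[0]}
    elif op == "delete":
        namespace.pop(path, None)
    elif op == "rename":
        namespace[args[0]] = namespace.pop(path)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fsimage, edit_log):
    """Merge the edit log into a copy of the FsImage; return (new image, empty log)."""
    new_image = {path: dict(meta) for path, meta in fsimage.items()}
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []
```

After a checkpoint the edit log starts empty again, so a restart only needs to load the latest image and replay the few edits made since then, rather than the whole history.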
YARN
[Diagram: ResourceManager and three NodeManagers]
ResourceManager
• Receives the processing requests
• Passes the parts of requests to corresponding
NodeManagers
NodeManagers
• Installed on every DataNode
• Responsible for executing tasks on every single DataNode
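As a rough illustration of how parts of a request might be handed to the corresponding NodeManagers, the Python sketch below assigns one task per data block and prefers a node that already stores that block. It is a heavy simplification of real YARN scheduling, and the function `assign_tasks` and the node and block names are hypothetical.

```python
def assign_tasks(block_locations, node_managers):
    """Assign one task per block, preferring a node that stores the block.

    block_locations: dict of block id -> list of node names holding that block
    node_managers:   non-empty list of node names running a NodeManager
    """
    load = {node: 0 for node in node_managers}
    assignments = {}
    for block_id, holders in sorted(block_locations.items()):
        local = [n for n in holders if n in load]
        candidates = local or node_managers  # fall back to any node
        # Pick the least-loaded candidate; break ties by node name.
        chosen = min(candidates, key=lambda n: (load[n], n))
        assignments[block_id] = chosen
        load[chosen] += 1
    return assignments


if __name__ == "__main__":
    blocks = {"blk_1": ["node1", "node2"], "blk_2": ["node1"], "blk_3": ["node3"]}
    print(assign_tasks(blocks, ["node1", "node2", "node3"]))
```

The preference for a node that holds the block reflects the data-locality idea from earlier in the deck: moving the computation to the data avoids shipping large blocks across the network.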
MapReduce
MapReduce is a software framework that helps in writing applications that process large data sets using distributed and parallel algorithms inside the Hadoop environment.
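The programming model can be sketched in plain Python with the classic word-count example. This runs in a single process and does not use Hadoop at all; it only shows the three logical steps (map, shuffle, reduce) that the framework distributes across a cluster. All function names here are illustrative.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data"]))
```

Because each line is mapped independently and each key is reduced independently, both phases can run in parallel on different nodes, which is what makes the model suitable for data spread across HDFS.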
Aadhaar Case Study
• Aadhaar is a 12-digit unique identity number issued to all Indian residents
based on their biometric and demographic data.
• The data is collected by the Unique Identification Authority of India (UIDAI),
a statutory authority established on 12 July 2016 by the Government of India
• Aadhaar is the world's largest biometric ID system, with over 1.133 billion
enrolled members as of 31 March 2017.
• As of this date, over 99% of Indians aged 18 and above had been enrolled in
Aadhaar.
Aadhaar is similar to the Social Security number (SSN) in the United States.
Aadhaar Case Study (Problem Statements)
1. Find the total number of cards approved, by state.
2. Find the total number of cards rejected, by state.
3. Find the total number of cards approved, by city.
4. Find the total number of cards rejected, by city.
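All four problem statements are the same kind of aggregation: group the records by a key and sum a count. The Python sketch below shows that pattern on a tiny made-up sample. The column names (`state`, `city`, `approved`, `rejected`) and the numbers are assumptions for illustration only; the real data set's layout may differ, and a real job would run as MapReduce over data in HDFS rather than in memory.

```python
import csv
import io
from collections import Counter

# Hypothetical sample; the real data set's columns and values may differ.
SAMPLE = """state,city,approved,rejected
Karnataka,Bengaluru,120,5
Karnataka,Mysuru,40,2
Maharashtra,Pune,75,3
Karnataka,Bengaluru,30,1
"""


def totals_by(rows, key_field, value_field):
    """Sum value_field per distinct key_field (map: emit key/value; reduce: sum)."""
    totals = Counter()
    for row in rows:
        totals[row[key_field]] += int(row[value_field])
    return dict(totals)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
approved_by_state = totals_by(rows, "state", "approved")
rejected_by_state = totals_by(rows, "state", "rejected")
approved_by_city = totals_by(rows, "city", "approved")
rejected_by_city = totals_by(rows, "city", "rejected")
```

In MapReduce terms, the mapper would emit the state or city as the key and the approved or rejected count as the value, and the reducer would sum the values for each key.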
Some Big Data & Hadoop Projects @ Edureka
Project #1: Analyze social bookmarking sites
Industry: Social Media
Project #2: Customer Complaints Analysis
Industry: Retail
Project #3: Tourism Data Analysis
Industry: Tourism
Some Big Data & Hadoop Projects @ Edureka
Project #4: Airline Data Analysis
Industry: Aviation
Project #5: Analyze Loan Dataset
Industry: Banking and Finance
Project #6: Analyze Movie Ratings
Industry: Media
Session In A Minute
Big Data In Different Domains
Big Data & Hadoop Training By Edureka
Growth Drivers
Hadoop-as-a-Solution
What is Big Data
Problems with Big Data
[Diagram: a 512 MB file divided into four 128 MB blocks]