SlideShare une entreprise Scribd logo
1  sur  63
Module-1
Understanding Big Data And Hadoop
www.edureka.co/big-data-and-hadoop
Verifiable Certificate
Experienced Instructor
Live Online Class
Survey Feedback
24x7 Support
In-class Questions
Class Recording in LMS
Module Wise Assessment
Project Work
Android & iOS App
www.edureka.co/big-data-and-hadoopSlide 2
How it Works?
Objectives
www.edureka.co/big-data-and-hadoopSlide 3
At the end of this module, you will be able to
 Understand What is Big data
 Analyse limitations and solutions of existing
Data Analytics Architecture
 Understand What is Hadoop and its features
 Hadoop Ecosystem
 Understand Hadoop 2.x core components
 Perform Read and Write in Hadoop
 Understand Rack Awareness concept
 Work on Edureka’s VM
 Lots of Data (Terabytes or Petabytes)
 Big data is the term for a collection of data sets so
large and complex that it becomes difficult to process
using on-hand database management tools or
traditional data processing applications
 The challenges include capture, curation, storage,
search, sharing, transfer, analysis, and visualization
What is Big Data?
cloud
www.edureka.co/big-data-and-hadoopSlide 4
tools
statistics
No SQL
compression
storage
support
database
analyze
information
terabytes
mobile
processing
Big Data
What is Big Data?
Amazon handles 15 million
customer click stream user
data per day to
recommend products.
www.edureka.co/big-data-and-hadoopSlide 5
Stock market generates about one
terabyte of new trade data per day
to perform stock trading analytics to
determine trends for optimal trades.
294 billion emails sent every
day. Services analyse this data
to find the spams.
Systems / Enterprises generate huge amount of data from Terabytes to Petabytes of information
Un-structured Data is Exploding
40,000 EB, or 40 Zettabytes By 2020, IDC (International Data Corporation) predicts the number will have reached
(ZB)
 The world’s information is doubling every two years. By 2020, there will be 5,200 GB of data for every person on
Earth.
Un-structured Data
7000
6000
5000
4000
3000
2000
1000
0
2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
Structured Data Un-structured Data
www.edureka.co/big-data-and-hadoopSlide 6
IBM’s Definition – Big Data Characteristics
http://www-01.ibm.com/software/data/bigdata/
IBM’s Definition of Big Data
VOLUME
Web
logs
Audios
Images
Videos
Sensor
Data
VARIETYVELOCITY VERACITY
Min Max Mean SD
4.3 7.9 5.84 0.83
2.0 4.4 3.05 0.43
0.1 2.5 1.20 0.76
www.edureka.co/big-data-and-hadoopSlide 7
Annie’s Introduction
www.edureka.co/big-data-and-hadoopSlide 8
Hello There!!
My name is Annie.
I love quizzes and
puzzles and I am here to
make you guys think and
answer my questions.
Annie’s Question
Map the following to corresponding data type:
» XML files, e-mail body
» Audio, Video, Images, Archived documents
» Data from Enterprise systems (ERP, CRM etc.)
www.edureka.co/big-data-and-hadoopSlide 9
Annie’s Answer
Ans. XML files, e-mail body  Semi-structured data
Audio, Video, Image, Files, Archived documents  Unstructured
data
Data from Enterprise systems (ERP, CRM etc.)  Structured
data
www.edureka.co/big-data-and-hadoopSlide 10
Further Reading
More on Big Data
http://www.edureka.in/blog/the-hype-behind-big-data/
Why Hadoop?
http://www.edureka.in/blog/why-hadoop/
Opportunities in Hadoop
http://www.edureka.in/blog/jobs-in-hadoop/
Big Data
http://en.wikipedia.org/wiki/Big_Data
IBM’s definition – Big Data Characteristics
http://www-01.ibm.com/software/data/bigdata/
www.edureka.co/big-data-and-hadoopSlide 11
Common Big Data Customer Scenarios
 Web and e-tailing
» Recommendation Engines
» Ad Targeting
» SearchQuality
» Abuse and Click Fraud Detection
 Telecommunications
» Customer Churn Prevention
» Network Performance Optimization
» Calling Data Record (CDR) Analysis
» Analysing Network to Predict Failure
http://wiki.apache.org/hadoop/PoweredBy
www.edureka.co/big-data-and-hadoopSlide 12
 Government
» Fraud Detection and Cyber Security
» Welfare Schemes
» Justice
 Healthcare and Life Sciences
» Health Information Exchange
» Gene Sequencing
» Serialization
» Healthcare Service Quality Improvements
» Drug Safety
Common Big Data Customer Scenarios (Contd.)
http://wiki.apache.org/hadoop/PoweredBy
www.edureka.co/big-data-and-hadoopSlide 13
Common Big Data Customer Scenarios (Contd.)
 Banks and Financial services
» Modeling True Risk
» ThreatAnalysis
» Fraud Detection
» Trade Surveillance
» Credit Scoring and Analysis
 Retail
» Point of Sales Transaction Analysis
» Customer Churn Analysis
» Sentiment Analysis
http://wiki.apache.org/hadoop/PoweredBy
www.edureka.co/big-data-and-hadoopSlide 14
Slide 17 www.edureka.co/big-data-and-hadoop
Hidden Treasure
*Sears was using traditional systems such as Oracle Exadata, Teradata and SAS etc., to store and process the customer activity and sales data.
 Insight into data can provide Business Advantage.
Case Study: Sears Holding Corporation
 Some key early indicators can mean Fortunes to Business.
 More Precise Analysis with more data.
http://www.informationweek.com/it-leadership/why-sears-is-going-all-in-on-hadoop/d/d-id/1107038?
Mostly Append
BI Reports + Interactive Apps
RDBMS (Aggregated Data)
ETL Compute Grid
Storage only Grid (Original Raw Data)
Collection
Inctrumentation
A meagre
10% of the
~2PB data is
available for
BI
Storage
2. Moving data to compute
doesn’t scale
90% of
the ~2PB
archived
Processing
3. Premature data
death
1. Can’t explore original
high fidelity raw data
Limitations of Existing Data Analytics Architecture
www.edureka.co/big-data-and-hadoopSlide 16
BI Reports + Interactive Apps
RDBMS (Aggregated Data)
Hadoop : Storage + Compute Grid
Collection
Instrumentation
Both
Storage
And
Processing
Entire ~2PB
Data is
available for
processing
No Data
Archiving
1. Data Exploration &
Advanced analytics
2. Scalable throughput for ETL &
aggregation
Mostly Append
3. Keep data alive
forever
*Sears moved to a 300-Node Hadoop cluster to keep 100% of its data available for processing rather than a meagre 10% as
was the case with existing Non-Hadoop solutions.
www.edureka.co/big-data-and-hadoopSlide 17
Solution: A Combined Storage Computer Layer
Why DFS?
Read 1 TB Data
1 Machine
4 I/O Channels
Each Channel – 100 MB/s
10 Machine
4 I/O Channels
Each Channel – 100 MB/s
www.edureka.co/big-data-and-hadoopSlide 18
Why DFS? (Contd.)
1 Machine
4 I/O Channels
Each Channel – 100 MB/s
10 Machine
4 I/O Channels
Each Channel – 100 MB/s
www.edureka.co/big-data-and-hadoopSlide 19
43 Minutes
Read 1 TB Data
Why DFS? (Contd.)
1 Machine
4 I/O Channels
Each Channel – 100 MB/s
10 Machine
4 I/O Channels
Each Channel – 100 MB/s
www.edureka.co/big-data-and-hadoopSlide 20
4.3 Minutes43 Minutes
Read 1 TB Data
What is Hadoop?
 Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters
of commodity computers using a simple programming model.
 It is an Open-source Data Management with scale-out storage and distributed processing.
www.edureka.co/big-data-and-hadoopSlide 21
Hadoop Key Characteristics
Reliable
EconomicalFlexible
Scalable
Hadoop
Features
www.edureka.co/big-data-and-hadoopSlide 22
Hadoop Design Principles
www.edureka.co/big-data-and-hadoopSlide 23
 Facilitate the storage and processing of large and/or rapidly growing data sets
» Structured and unstructured data
» Simple programming models
 Scale-Out rather than Scale-Up
 Bring Code to Data rather than data to code
 High scalability and availability
 Use commodity hardware
 Fault-tolerance
Hadoop – It’s about Scale and Structure
Structured Data Types Multi and Unstructured
Limited, No Data Processing Processing Processing coupled with Data
Standards & Structured Governance Loosely Structured
Required On Write Schema Required On Read
Reads are Fast Speed Writes are Fast
Software License Cost Support Only
Known Entity Resources Growing, Complexities, Wide
OLTP
Complex ACID Transactions
Operational Data Store
Best Fit Use Data Discovery
Processing Unstructured Data
Massive Storage/Processing
RDBMS HADOOP
www.edureka.co/big-data-and-hadoopSlide 24
Annie’s Question
Hadoop is a framework that allows for the distributed
processing of:
» Small Data Sets
» Large Data Sets
www.edureka.co/big-data-and-hadoopSlide 25
Annie’s Answer
Ans. Large Data Sets.
It is also capable of processing small data-sets. However, to
experience the true power of Hadoop, one needs to have
data in TB’s. Because this is where RDBMS takes hours and
fails whereas Hadoop does the same in couple of minutes.
www.edureka.co/big-data-and-hadoopSlide 26
Hadoop Ecosystem
Pig Latin
Data Analysis
Hive
DWSystem
Other
YARN
Frameworks
(MPI,GRAPH)
HBaseMapReduce Framework
YARN
Cluster Resource Management
Apache Oozie
(Workflow)
HDFS
(Hadoop Distributed File System)
Pig Latin
Data Analysis
Hive
DW System
Apache Oozie
(Workflow)
HDFS
(Hadoop Distributed File System)
Mahout
MachineLearning
MapReduce Framework
HBase
Hadoop 1.0 Hadoop 2.0
Sqoop
Unstructured or
Semi-structured Data Structured Data
Flume
Sqoop
Unstructured or
Semi-structured Data Structured Data
Flume
Mahout
Machine Learning
www.edureka.co/big-data-and-hadoopSlide 27
Machine Learning with Mahout
https://cwiki.apache.org/confluence/display/MAHOUT/Powered+By+Mahout
Write intelligent applications using Apache Mahout
LinkedIn Recommendations
Hadoop and
MapReduce magic in
action
www.edureka.co/big-data-and-hadoopSlide 28
Hadoop 2.x Core Components
Hadoop 2.x Core Components
HDFS YARN
Storage Processing
DataNode
NameNode Resource Manager
Node Manager
Master
Slave
Secondary
NameNode
www.edureka.co/big-data-and-hadoopSlide 29
Hadoop 2.x Core Components ( Contd.)
DataNode
Node
Manager
DataNode DataNode DataNode
YARN
HDFS
Cluster
Resource
Manager
NameNode
Node
Manager
Node
Manager
Node
Manager
www.edureka.co/big-data-and-hadoopSlide 30
Main Components of HDFS
 NameNode:
» Master of the system
» Maintains and manages the blocks which are present on
the DataNodes
 DataNodes:
» Slaves which are deployed on each machine and provide
the actual storage
» Responsible for serving read and write requests for the
clients
www.edureka.co/big-data-and-hadoopSlide 31
NameNode Metadata
 Meta-data in Memory
» The entire metadata is in main memory
» No demand paging of FS meta-data
 Types of Metadata
» List of files
» List of Blocks for each file
» List of DataNode for each block
» File attributes, e.g. access time, replication factor
 A Transaction Log
» Records file creations, file deletions etc.
NameNode
(Stores metadata only)
METADATA:
/user/doug/hinfo -> 1 3 5
/user/doug/pdetail -> 4 2
www.edureka.co/big-data-and-hadoopSlide 32
NameNode:
Keeps track of overall file directory
structure and the placement of Data Block
File Blocks
 By Default, block size is 128mb in Hadoop 2.x and 64mb in Hadoop 1.x
 Why block size is large?
» The main reason for having the HDFS blocks in large size is to reduce the cost of seek time.
» The large block size is to account for proper usage of storage space while considering the limit on the
memory of name node.
200mb – emp.dat
64mb – Block1
64mb – Block2
64mb – Block3
8mb – Block4
200mb – abc.txt
128mb – Block 1
72mb – Block 2
Hadoop 2.x
www.edureka.co/big-data-and-hadoopSlide 33
Hadoop 1.x
Replication and Rack Awareness
www.edureka.co/big-data-and-hadoopSlide 34
Replication and Rack Awareness (Contd.)
www.edureka.co/big-data-and-hadoopSlide 35
Replication and Rack Awareness (Contd.)
www.edureka.co/big-data-and-hadoopSlide 36
Replication and Rack Awareness (Contd.)
www.edureka.co/big-data-and-hadoopSlide 37
Replication and Rack Awareness (Contd.)
www.edureka.co/big-data-and-hadoopSlide 38
Replication and Rack Awareness (Contd.)
www.edureka.co/big-data-and-hadoopSlide 39
Replication and Rack Awareness (Contd.)
www.edureka.co/big-data-and-hadoopSlide 40
Pipelined Write
www.edureka.co/big-data-and-hadoopSlide 41
Client reading file from HDFS
www.edureka.co/big-data-and-hadoopSlide 42
Annie’s Question
In HDFS, blocks of a file are written in parallel, however
the replication of the blocks are done sequentially:
a. TRUE
b. FALSE
www.edureka.co/big-data-and-hadoopSlide 43
Annie’s Answer
www.edureka.co/big-data-and-hadoopSlide 44
Ans. True. A file is divided into Blocks, these blocks are
written in parallel, but the block replication happen in
sequence.
Annie’s Question
A file of 400MB is being copied to HDFS. The system has
finished copying 250MB. What happens if a client tries to
access that file:
a. Can read up to block that's successfully written.
b. Can read up to last bit successfully written.
c. Will throw an exception.
d. Cannot see that file until its finished copying.
www.edureka.co/big-data-and-hadoopSlide 45
Annie’s Answer
www.edureka.co/big-data-and-hadoopSlide 46
Ans. Option (a)
Client can read up to the successfully written data block.
Another Hadoop Distributors
Cloudera distributes a platform of open-source projects called
Cloudera's Distribution including Apache Hadoop or CDH.
These include architectural services and technical support for
Hadoop clusters in development or in production.
Another major player in the Hadoop market, Hortonworks
has the largest number of committers and code contributors
for the Hadoop ecosystem components.Provider of expert
technical support, training and partner-enablement services
for both end-user organizations and technology vendors.
The MapR Distribution including Apache Hadoop provides
you with an enterprise-grade distributed data platform to
reliably store and process big data. MapR's Apache Hadoop
distribution claims to provide full data protection, no single
points of failure, improved performance
www.edureka.co/big-data-and-hadoopSlide 47
Another Hadoop Distributors
www.edureka.co/big-data-and-hadoopSlide 48
Further Reading
 Apache Hadoop and HDFS
http://www.edureka.in/blog/introduction-to-apache-hadoop-hdfs/
 Apache Hadoop HDFS Architecture
http://www.edureka.in/blog/apache-hadoop-hdfs-architecture/
www.edureka.co/big-data-and-hadoopSlide 49
A tour of Edureka’s Virtual Machine
www.edureka.co/big-data-and-hadoopSlide 50
Edureka VM
Pre-requisites
Remote Login usYinEgS
Putty Hadoop-2.2.0Condition
NO
YES
Run the commands to
start the daemons
manually
Edureka VM
Split Files
If the file is
downloaded
completely
NO
YES
Proceed with the steps
mentioned in the “Edureka
VM Installation “document
Import Edureka
VM to Virtual Box
Check for
daemons
YES
NO
Refer README.txt from
Edureka VM Desktop
Edureka VM
Installation
www.edureka.co/big-data-and-hadoopSlide 51
Pre-requisites
 Pre-requisites to install Edureka VM
» Minimum 4 GB RAM
» Dual Core Processor or above
If you don’t have the
mentioned hardware
requirements, refer to slide 57
www.edureka.co/big-data-and-hadoopSlide 52
Installing Edureka VM
 With reference to “ Edureka VM Installation ” document present in the LMS, you can install Edureka VM
We suggest you to use
Download Manager
the
while
downloading Edureka VM to avoid
any network issues that may occur.
You can download it from here
for different platforms which is an
open source tool.
www.edureka.co/big-data-and-hadoopSlide 53
Challenges Faced During Installation of VM
 File may not get downloaded completely, due to its huge size, without displaying any error.
After downloading VM always
check for its file size it should
be 4.5 GB
Download Complete Download Incomplete
Proceed with the steps
mentioned in the “Edureka
VM Installation “document Using “Edureka VM Split Files”
document present in LMS, you
can download Edureka VM
www.edureka.co/big-data-and-hadoopSlide 54
If your system does not meet
minimum requirements please refer
to the document “ Remote Login
using Putty Hadoop-3.0” present in
the LMS
 With reference to “ Remote Login Using Putty – Hadoop 2.2.0 ” document present in the LMS, you can
access our Edureka VM Remote Server
Edureka VM Remote Server
www.edureka.co/big-data-and-hadoopSlide 55
Challenges Faced While Importing VM in Virtual Box
 Below is the error you may face while importing the Edureka VM in Virtual Box, if the Edureka VM file has
not been downloaded completely
In case you are using split
files, then before importing
split files in Virtual Box
combine all the downloaded
split files using hjsplit
application.
www.edureka.co/big-data-and-hadoopSlide 56
 If daemons are not up and running, execute below commands

 After executing the above commands check if all daemons are running fine by using:
 After importing VM in Virtual Box, start the VM and check if all the daemons are up and running by using:
Command: sudo jps
Challenges Faced After Importing VM in Virtual Box
Command: sudo service hadoop-master stop
Command: sudo service hadoop-master start
Command: hadoop dfsadmin -safemode leave
www.edureka.co/big-data-and-hadoopSlide 57
Command: sudo jps
 Please refer to the README.txt file present in the Desktop of Edureka VM to know more about it
Edureka VM Desktop
www.edureka.co/big-data-and-hadoopSlide 58
Assignment
Referring the documents present in the LMS under assignment solve the below problem.
How many such DataNodes you would need to read 100TB data in 5 minutes in your Hadoop Cluster?
www.edureka.co/big-data-and-hadoopSlide 59
Pre-work for next Class
Setup the Hadoop development environment using the documents present in the LMS.
» Setup Edureka’s VirtualMachine
» Execute Linux BasicCommands
» Execute HDFS Hands On commands
Attempt the Module-1 Assignments present in the LMS.
www.edureka.co/big-data-and-hadoopSlide 60
 Hadoop 2.x Cluster Architecture
 Hadoop clustermodes
 Basic Hadoopcommands
 Hadoop 2.x configuration files and its parameters
 Password-less SSH on Hadoopcluster
 Dump of a MapReduceprogram
 Data loadingtechniques
www.edureka.co/big-data-and-hadoopSlide 61
Agenda for Next Class
Your feedback is important to us, be it a compliment, a suggestion or a complaint. It helps us to make
the course better!
Please spare few minutes to take the survey after the webinar.
www.edureka.co/big-data-and-hadoopSlide 62
Survey
Understanding Big Data And Hadoop

Contenu connexe

Tendances

Big Data and Hadoop Introduction
 Big Data and Hadoop Introduction Big Data and Hadoop Introduction
Big Data and Hadoop IntroductionDzung Nguyen
 
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...Edureka!
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop AdministrationEdureka!
 
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Edureka!
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and HadoopEdureka!
 
Hadoop Career Path and Interview Preparation
Hadoop Career Path and Interview PreparationHadoop Career Path and Interview Preparation
Hadoop Career Path and Interview PreparationEdureka!
 
Learn Big Data & Hadoop
Learn Big Data & Hadoop Learn Big Data & Hadoop
Learn Big Data & Hadoop Edureka!
 
Introduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -IIntroduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -IEdureka!
 
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solution
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solutionHadoop- A Highly Available and Secure Enterprise DataWarehousing solution
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solutionEdureka!
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?sudhakara st
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop DeveloperEdureka!
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideDanairat Thanabodithammachari
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and HadoopFlavio Vit
 
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka Edureka!
 
Hadoop 101
Hadoop 101Hadoop 101
Hadoop 101EMC
 
Webinar: Big Data & Hadoop - When not to use Hadoop
Webinar: Big Data & Hadoop - When not to use HadoopWebinar: Big Data & Hadoop - When not to use Hadoop
Webinar: Big Data & Hadoop - When not to use HadoopEdureka!
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceeakasit_dpu
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopGiovanna Roda
 

Tendances (20)

Big Data and Hadoop Introduction
 Big Data and Hadoop Introduction Big Data and Hadoop Introduction
Big Data and Hadoop Introduction
 
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
 
Hadoop and Big Data
Hadoop and Big DataHadoop and Big Data
Hadoop and Big Data
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
 
Hadoop Career Path and Interview Preparation
Hadoop Career Path and Interview PreparationHadoop Career Path and Interview Preparation
Hadoop Career Path and Interview Preparation
 
Learn Big Data & Hadoop
Learn Big Data & Hadoop Learn Big Data & Hadoop
Learn Big Data & Hadoop
 
Introduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -IIntroduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -I
 
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solution
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solutionHadoop- A Highly Available and Secure Enterprise DataWarehousing solution
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solution
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop Developer
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guide
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
 
Hadoop 101
Hadoop 101Hadoop 101
Hadoop 101
 
Webinar: Big Data & Hadoop - When not to use Hadoop
Webinar: Big Data & Hadoop - When not to use HadoopWebinar: Big Data & Hadoop - When not to use Hadoop
Webinar: Big Data & Hadoop - When not to use Hadoop
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 

En vedette

Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and HadoopEdureka!
 
Fault Tolerance with Kafka
Fault Tolerance with KafkaFault Tolerance with Kafka
Fault Tolerance with KafkaEdureka!
 
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka Edureka!
 
5 reasons why spark is in demand!
5 reasons why spark is in demand!5 reasons why spark is in demand!
5 reasons why spark is in demand!Edureka!
 
Spark Will Replace Hadoop ! Know Why
Spark Will Replace Hadoop ! Know Why Spark Will Replace Hadoop ! Know Why
Spark Will Replace Hadoop ! Know Why Edureka!
 
5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!Edureka!
 
Apache spark
Apache spark Apache spark
Apache spark Edureka!
 
Converting Visitors into Leads
Converting Visitors into LeadsConverting Visitors into Leads
Converting Visitors into LeadsEdureka!
 
Manipulating Data with Talend.
Manipulating Data with Talend.Manipulating Data with Talend.
Manipulating Data with Talend.Edureka!
 
AWS Cloud Essentials - An Overview
AWS Cloud Essentials - An OverviewAWS Cloud Essentials - An Overview
AWS Cloud Essentials - An OverviewEdureka!
 
Why Talend for Big Data?
Why Talend for Big Data?Why Talend for Big Data?
Why Talend for Big Data?Edureka!
 
SEO Techniques
SEO TechniquesSEO Techniques
SEO TechniquesEdureka!
 
Apache Spark beyond Hadoop MapReduce
Apache Spark beyond Hadoop MapReduceApache Spark beyond Hadoop MapReduce
Apache Spark beyond Hadoop MapReduceEdureka!
 
Spark for big data analytics
Spark for big data analyticsSpark for big data analytics
Spark for big data analyticsEdureka!
 
Big data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & ScalaBig data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & ScalaEdureka!
 
Spark SQL | Apache Spark
Spark SQL | Apache SparkSpark SQL | Apache Spark
Spark SQL | Apache SparkEdureka!
 
Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & HadoopEdureka!
 
Performance of Spark vs MapReduce
Performance of Spark vs MapReducePerformance of Spark vs MapReduce
Performance of Spark vs MapReduceEdureka!
 
Sentiment Analysis in R
Sentiment Analysis in RSentiment Analysis in R
Sentiment Analysis in REdureka!
 
Spark For Faster Batch Processing
Spark For Faster Batch ProcessingSpark For Faster Batch Processing
Spark For Faster Batch ProcessingEdureka!
 

En vedette (20)

Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
 
Fault Tolerance with Kafka
Fault Tolerance with KafkaFault Tolerance with Kafka
Fault Tolerance with Kafka
 
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
 
5 reasons why spark is in demand!
5 reasons why spark is in demand!5 reasons why spark is in demand!
5 reasons why spark is in demand!
 
Spark Will Replace Hadoop ! Know Why
Spark Will Replace Hadoop ! Know Why Spark Will Replace Hadoop ! Know Why
Spark Will Replace Hadoop ! Know Why
 
5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!
 
Apache spark
Apache spark Apache spark
Apache spark
 
Converting Visitors into Leads
Converting Visitors into LeadsConverting Visitors into Leads
Converting Visitors into Leads
 
Manipulating Data with Talend.
Manipulating Data with Talend.Manipulating Data with Talend.
Manipulating Data with Talend.
 
AWS Cloud Essentials - An Overview
AWS Cloud Essentials - An OverviewAWS Cloud Essentials - An Overview
AWS Cloud Essentials - An Overview
 
Why Talend for Big Data?
Why Talend for Big Data?Why Talend for Big Data?
Why Talend for Big Data?
 
SEO Techniques
SEO TechniquesSEO Techniques
SEO Techniques
 
Apache Spark beyond Hadoop MapReduce
Apache Spark beyond Hadoop MapReduceApache Spark beyond Hadoop MapReduce
Apache Spark beyond Hadoop MapReduce
 
Spark for big data analytics
Spark for big data analyticsSpark for big data analytics
Spark for big data analytics
 
Big data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & ScalaBig data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & Scala
 
Spark SQL | Apache Spark
Spark SQL | Apache SparkSpark SQL | Apache Spark
Spark SQL | Apache Spark
 
Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & Hadoop
 
Performance of Spark vs MapReduce
Performance of Spark vs MapReducePerformance of Spark vs MapReduce
Performance of Spark vs MapReduce
 
Sentiment Analysis in R
Sentiment Analysis in RSentiment Analysis in R
Sentiment Analysis in R
 
Spark For Faster Batch Processing
Spark For Faster Batch ProcessingSpark For Faster Batch Processing
Spark For Faster Batch Processing
 

Similaire à Understanding Big Data And Hadoop

Hadoop : The Pile of Big Data
Hadoop : The Pile of Big DataHadoop : The Pile of Big Data
Hadoop : The Pile of Big DataEdureka!
 
Whatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopWhatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopEdureka!
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation Shivanee garg
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarioskcmallu
 
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data TorrentSeagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data TorrentSeeling Cheung
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the OrganizationSeeling Cheung
 
Lesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxLesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxPankajkumar496281
 
Is Hadoop a Necessity for Data Science
Is Hadoop a Necessity for Data ScienceIs Hadoop a Necessity for Data Science
Is Hadoop a Necessity for Data ScienceEdureka!
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointInside Analysis
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKRajesh Jayarman
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Martin Bém
 
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookHow Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookAmr Awadallah
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & HadoopBlackvard
 
Exploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis KapsalisExploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis KapsalisNetAppUK
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Simplilearn
 

Similaire à Understanding Big Data And Hadoop (20)

Hadoop : The Pile of Big Data
Hadoop : The Pile of Big DataHadoop : The Pile of Big Data
Hadoop : The Pile of Big Data
 
Whatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopWhatisbigdataandwhylearnhadoop
Whatisbigdataandwhylearnhadoop
 
Hadoop(Term Paper)
Hadoop(Term Paper)Hadoop(Term Paper)
Hadoop(Term Paper)
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
 
Hadoop & Data Warehouse
Hadoop & Data Warehouse Hadoop & Data Warehouse
Hadoop & Data Warehouse
 
Hadoop_Presentation
Hadoop_PresentationHadoop_Presentation
Hadoop_Presentation
 
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data TorrentSeagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
 
Lesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxLesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptx
 
Is Hadoop a Necessity for Data Science
Is Hadoop a Necessity for Data ScienceIs Hadoop a Necessity for Data Science
Is Hadoop a Necessity for Data Science
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RK
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookHow Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & Hadoop
 
Exploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis KapsalisExploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis Kapsalis
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
 

Plus de Edureka!

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaEdureka!
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaEdureka!
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaEdureka!
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaEdureka!
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaEdureka!
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaEdureka!
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaEdureka!
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaEdureka!
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaEdureka!
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaEdureka!
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | EdurekaEdureka!
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEdureka!
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEdureka!
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaEdureka!
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaEdureka!
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaEdureka!
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Edureka!
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaEdureka!
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaEdureka!
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | EdurekaEdureka!
 

Plus de Edureka! (20)

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | Edureka
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | Edureka
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | Edureka
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | Edureka
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | Edureka
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | Edureka
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | Edureka
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| Edureka
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | Edureka
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | Edureka
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | Edureka
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | Edureka
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | Edureka
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | Edureka
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | Edureka
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | Edureka
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | Edureka
 

Dernier

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 

Dernier (20)

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

Understanding Big Data And Hadoop

  • 1. Module-1 Understanding Big Data And Hadoop www.edureka.co/big-data-and-hadoop
  • 2. Verifiable Certificate Experienced Instructor Live Online Class Survey Feedback 24x7 Support In-class Questions Class Recording in LMS Module Wise Assessment Project Work Android & iOS App www.edureka.co/big-data-and-hadoopSlide 2 How it Works?
  • 3. Objectives www.edureka.co/big-data-and-hadoopSlide 3 At the end of this module, you will be able to  Understand What is Big data  Analyse limitations and solutions of existing Data Analytics Architecture  Understand What is Hadoop and its features  Hadoop Ecosystem  Understand Hadoop 2.x core components  Perform Read and Write in Hadoop  Understand Rack Awareness concept  Work on Edureka’s VM
  • 4.  Lots of Data (Terabytes or Petabytes)  Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications  The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization What is Big Data? cloud www.edureka.co/big-data-and-hadoopSlide 4 tools statistics No SQL compression storage support database analyze information terabytes mobile processing Big Data
  • 5. What is Big Data? Amazon handles 15 million customer click stream user data per day to recommend products. www.edureka.co/big-data-and-hadoopSlide 5 Stock market generates about one terabyte of new trade data per day to perform stock trading analytics to determine trends for optimal trades. 294 billion emails sent every day. Services analyse this data to find the spams. Systems / Enterprises generate huge amount of data from Terabytes to Petabytes of information
  • 6. Un-structured Data is Exploding 40,000 EB, or 40 Zettabytes By 2020, IDC (International Data Corporation) predicts the number will have reached (ZB)  The world’s information is doubling every two years. By 2020, there will be 5,200 GB of data for every person on Earth. Un-structured Data 7000 6000 5000 4000 3000 2000 1000 0 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 Structured Data Un-structured Data www.edureka.co/big-data-and-hadoopSlide 6
  • 7. IBM’s Definition – Big Data Characteristics http://www-01.ibm.com/software/data/bigdata/ IBM’s Definition of Big Data VOLUME Web logs Audios Images Videos Sensor Data VARIETYVELOCITY VERACITY Min Max Mean SD 4.3 7.9 5.84 0.83 2.0 4.4 3.05 0.43 0.1 2.5 1.20 0.76 www.edureka.co/big-data-and-hadoopSlide 7
  • 8. Annie’s Introduction www.edureka.co/big-data-and-hadoopSlide 8 Hello There!! My name is Annie. I love quizzes and puzzles and I am here to make you guys think and answer my questions.
  • 9. Annie’s Question Map the following to corresponding data type: » XML files, e-mail body » Audio, Video, Images, Archived documents » Data from Enterprise systems (ERP, CRM etc.) www.edureka.co/big-data-and-hadoopSlide 9
  • 10. Annie’s Answer Ans. XML files, e-mail body  Semi-structured data Audio, Video, Image, Files, Archived documents  Unstructured data Data from Enterprise systems (ERP, CRM etc.)  Structured data www.edureka.co/big-data-and-hadoopSlide 10
  • 11. Further Reading More on Big Data http://www.edureka.in/blog/the-hype-behind-big-data/ Why Hadoop? http://www.edureka.in/blog/why-hadoop/ Opportunities in Hadoop http://www.edureka.in/blog/jobs-in-hadoop/ Big Data http://en.wikipedia.org/wiki/Big_Data IBM’s definition – Big Data Characteristics http://www-01.ibm.com/software/data/bigdata/ www.edureka.co/big-data-and-hadoopSlide 11
  • 12. Common Big Data Customer Scenarios  Web and e-tailing » Recommendation Engines » Ad Targeting » SearchQuality » Abuse and Click Fraud Detection  Telecommunications » Customer Churn Prevention » Network Performance Optimization » Calling Data Record (CDR) Analysis » Analysing Network to Predict Failure http://wiki.apache.org/hadoop/PoweredBy www.edureka.co/big-data-and-hadoopSlide 12
  • 13.  Government » Fraud Detection and Cyber Security » Welfare Schemes » Justice  Healthcare and Life Sciences » Health Information Exchange » Gene Sequencing » Serialization » Healthcare Service Quality Improvements » Drug Safety Common Big Data Customer Scenarios (Contd.) http://wiki.apache.org/hadoop/PoweredBy www.edureka.co/big-data-and-hadoopSlide 13
  • 14. Common Big Data Customer Scenarios (Contd.)  Banks and Financial services » Modeling True Risk » ThreatAnalysis » Fraud Detection » Trade Surveillance » Credit Scoring and Analysis  Retail » Point of Sales Transaction Analysis » Customer Churn Analysis » Sentiment Analysis http://wiki.apache.org/hadoop/PoweredBy www.edureka.co/big-data-and-hadoopSlide 14
  • 15. Slide 17 www.edureka.co/big-data-and-hadoop Hidden Treasure *Sears was using traditional systems such as Oracle Exadata, Teradata and SAS etc., to store and process the customer activity and sales data.  Insight into data can provide Business Advantage. Case Study: Sears Holding Corporation  Some key early indicators can mean Fortunes to Business.  More Precise Analysis with more data. http://www.informationweek.com/it-leadership/why-sears-is-going-all-in-on-hadoop/d/d-id/1107038?
  • 16. Mostly Append BI Reports + Interactive Apps RDBMS (Aggregated Data) ETL Compute Grid Storage only Grid (Original Raw Data) Collection Inctrumentation A meagre 10% of the ~2PB data is available for BI Storage 2. Moving data to compute doesn’t scale 90% of the ~2PB archived Processing 3. Premature data death 1. Can’t explore original high fidelity raw data Limitations of Existing Data Analytics Architecture www.edureka.co/big-data-and-hadoopSlide 16
  • 17. BI Reports + Interactive Apps RDBMS (Aggregated Data) Hadoop : Storage + Compute Grid Collection Instrumentation Both Storage And Processing Entire ~2PB Data is available for processing No Data Archiving 1. Data Exploration & Advanced analytics 2. Scalable throughput for ETL & aggregation Mostly Append 3. Keep data alive forever *Sears moved to a 300-Node Hadoop cluster to keep 100% of its data available for processing rather than a meagre 10% as was the case with existing Non-Hadoop solutions. www.edureka.co/big-data-and-hadoopSlide 17 Solution: A Combined Storage Computer Layer
  • 18. Why DFS? Read 1 TB Data 1 Machine 4 I/O Channels Each Channel – 100 MB/s 10 Machine 4 I/O Channels Each Channel – 100 MB/s www.edureka.co/big-data-and-hadoopSlide 18
  • 19. Why DFS? (Contd.) 1 Machine 4 I/O Channels Each Channel – 100 MB/s 10 Machine 4 I/O Channels Each Channel – 100 MB/s www.edureka.co/big-data-and-hadoopSlide 19 43 Minutes Read 1 TB Data
  • 20. Why DFS? (Contd.) 1 Machine 4 I/O Channels Each Channel – 100 MB/s 10 Machine 4 I/O Channels Each Channel – 100 MB/s www.edureka.co/big-data-and-hadoopSlide 20 4.3 Minutes43 Minutes Read 1 TB Data
  • 21. What is Hadoop?  Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.  It is an Open-source Data Management with scale-out storage and distributed processing. www.edureka.co/big-data-and-hadoopSlide 21
  • 23. Hadoop Design Principles www.edureka.co/big-data-and-hadoopSlide 23  Facilitate the storage and processing of large and/or rapidly growing data sets » Structured and unstructured data » Simple programming models  Scale-Out rather than Scale-Up  Bring Code to Data rather than data to code  High scalability and availability  Use commodity hardware  Fault-tolerance
  • 24. Hadoop – It’s about Scale and Structure Structured Data Types Multi and Unstructured Limited, No Data Processing Processing Processing coupled with Data Standards & Structured Governance Loosely Structured Required On Write Schema Required On Read Reads are Fast Speed Writes are Fast Software License Cost Support Only Known Entity Resources Growing, Complexities, Wide OLTP Complex ACID Transactions Operational Data Store Best Fit Use Data Discovery Processing Unstructured Data Massive Storage/Processing RDBMS HADOOP www.edureka.co/big-data-and-hadoopSlide 24
  • 25. Annie’s Question Hadoop is a framework that allows for the distributed processing of: » Small Data Sets » Large Data Sets www.edureka.co/big-data-and-hadoopSlide 25
  • 26. Annie’s Answer Ans. Large Data Sets. It is also capable of processing small data-sets. However, to experience the true power of Hadoop, one needs to have data in TB’s. Because this is where RDBMS takes hours and fails whereas Hadoop does the same in couple of minutes. www.edureka.co/big-data-and-hadoopSlide 26
  • 27. Hadoop Ecosystem Pig Latin Data Analysis Hive DWSystem Other YARN Frameworks (MPI,GRAPH) HBaseMapReduce Framework YARN Cluster Resource Management Apache Oozie (Workflow) HDFS (Hadoop Distributed File System) Pig Latin Data Analysis Hive DW System Apache Oozie (Workflow) HDFS (Hadoop Distributed File System) Mahout MachineLearning MapReduce Framework HBase Hadoop 1.0 Hadoop 2.0 Sqoop Unstructured or Semi-structured Data Structured Data Flume Sqoop Unstructured or Semi-structured Data Structured Data Flume Mahout Machine Learning www.edureka.co/big-data-and-hadoopSlide 27
  • 28. Machine Learning with Mahout https://cwiki.apache.org/confluence/display/MAHOUT/Powered+By+Mahout Write intelligent applications using Apache Mahout LinkedIn Recommendations Hadoop and MapReduce magic in action www.edureka.co/big-data-and-hadoopSlide 28
  • 29. Hadoop 2.x Core Components Hadoop 2.x Core Components HDFS YARN Storage Processing DataNode NameNode Resource Manager Node Manager Master Slave Secondary NameNode www.edureka.co/big-data-and-hadoopSlide 29
  • 30. Hadoop 2.x Core Components ( Contd.) DataNode Node Manager DataNode DataNode DataNode YARN HDFS Cluster Resource Manager NameNode Node Manager Node Manager Node Manager www.edureka.co/big-data-and-hadoopSlide 30
  • 31. Main Components of HDFS  NameNode: » Master of the system » Maintains and manages the blocks which are present on the DataNodes  DataNodes: » Slaves which are deployed on each machine and provide the actual storage » Responsible for serving read and write requests for the clients www.edureka.co/big-data-and-hadoopSlide 31
  • 32. NameNode Metadata  Meta-data in Memory » The entire metadata is in main memory » No demand paging of FS meta-data  Types of Metadata » List of files » List of Blocks for each file » List of DataNode for each block » File attributes, e.g. access time, replication factor  A Transaction Log » Records file creations, file deletions etc. NameNode (Stores metadata only) METADATA: /user/doug/hinfo -> 1 3 5 /user/doug/pdetail -> 4 2 www.edureka.co/big-data-and-hadoopSlide 32 NameNode: Keeps track of overall file directory structure and the placement of Data Block
  • 33. File Blocks  By Default, block size is 128mb in Hadoop 2.x and 64mb in Hadoop 1.x  Why block size is large? » The main reason for having the HDFS blocks in large size is to reduce the cost of seek time. » The large block size is to account for proper usage of storage space while considering the limit on the memory of name node. 200mb – emp.dat 64mb – Block1 64mb – Block2 64mb – Block3 8mb – Block4 200mb – abc.txt 128mb – Block 1 72mb – Block 2 Hadoop 2.x www.edureka.co/big-data-and-hadoopSlide 33 Hadoop 1.x
  • 34. Replication and Rack Awareness www.edureka.co/big-data-and-hadoopSlide 34
  • 35. Replication and Rack Awareness (Contd.) www.edureka.co/big-data-and-hadoopSlide 35
  • 36. Replication and Rack Awareness (Contd.) www.edureka.co/big-data-and-hadoopSlide 36
  • 37. Replication and Rack Awareness (Contd.) www.edureka.co/big-data-and-hadoopSlide 37
  • 38. Replication and Rack Awareness (Contd.) www.edureka.co/big-data-and-hadoopSlide 38
  • 39. Replication and Rack Awareness (Contd.) www.edureka.co/big-data-and-hadoopSlide 39
  • 40. Replication and Rack Awareness (Contd.) www.edureka.co/big-data-and-hadoopSlide 40
  • 42. Client reading file from HDFS www.edureka.co/big-data-and-hadoopSlide 42
  • 43. Annie’s Question In HDFS, blocks of a file are written in parallel, however the replication of the blocks are done sequentially: a. TRUE b. FALSE www.edureka.co/big-data-and-hadoopSlide 43
  • 44. Annie’s Answer www.edureka.co/big-data-and-hadoopSlide 44 Ans. True. A file is divided into Blocks, these blocks are written in parallel, but the block replication happen in sequence.
  • 45. Annie’s Question A file of 400MB is being copied to HDFS. The system has finished copying 250MB. What happens if a client tries to access that file: a. Can read up to block that's successfully written. b. Can read up to last bit successfully written. c. Will throw an exception. d. Cannot see that file until its finished copying. www.edureka.co/big-data-and-hadoopSlide 45
  • 46. Annie’s Answer www.edureka.co/big-data-and-hadoopSlide 46 Ans. Option (a) Client can read up to the successfully written data block.
  • 47. Another Hadoop Distributors Cloudera distributes a platform of open-source projects called Cloudera's Distribution including Apache Hadoop or CDH. These include architectural services and technical support for Hadoop clusters in development or in production. Another major player in the Hadoop market, Hortonworks has the largest number of committers and code contributors for the Hadoop ecosystem components.Provider of expert technical support, training and partner-enablement services for both end-user organizations and technology vendors. The MapR Distribution including Apache Hadoop provides you with an enterprise-grade distributed data platform to reliably store and process big data. MapR's Apache Hadoop distribution claims to provide full data protection, no single points of failure, improved performance www.edureka.co/big-data-and-hadoopSlide 47
  • 49. Further Reading  Apache Hadoop and HDFS http://www.edureka.in/blog/introduction-to-apache-hadoop-hdfs/  Apache Hadoop HDFS Architecture http://www.edureka.in/blog/apache-hadoop-hdfs-architecture/ www.edureka.co/big-data-and-hadoopSlide 49
  • 50. A tour of Edureka’s Virtual Machine www.edureka.co/big-data-and-hadoopSlide 50
  • 51. Edureka VM Pre-requisites Remote Login usYinEgS Putty Hadoop-2.2.0Condition NO YES Run the commands to start the daemons manually Edureka VM Split Files If the file is downloaded completely NO YES Proceed with the steps mentioned in the “Edureka VM Installation “document Import Edureka VM to Virtual Box Check for daemons YES NO Refer README.txt from Edureka VM Desktop Edureka VM Installation www.edureka.co/big-data-and-hadoopSlide 51
  • 52. Pre-requisites  Pre-requisites to install Edureka VM » Minimum 4 GB RAM » Dual Core Processor or above If you don’t have the mentioned hardware requirements, refer to slide 57 www.edureka.co/big-data-and-hadoopSlide 52
  • 53. Installing Edureka VM  With reference to “ Edureka VM Installation ” document present in the LMS, you can install Edureka VM We suggest you to use Download Manager the while downloading Edureka VM to avoid any network issues that may occur. You can download it from here for different platforms which is an open source tool. www.edureka.co/big-data-and-hadoopSlide 53
  • 54. Challenges Faced During Installation of VM  File may not get downloaded completely, due to its huge size, without displaying any error. After downloading VM always check for its file size it should be 4.5 GB Download Complete Download Incomplete Proceed with the steps mentioned in the “Edureka VM Installation “document Using “Edureka VM Split Files” document present in LMS, you can download Edureka VM www.edureka.co/big-data-and-hadoopSlide 54
  • 55. If your system does not meet minimum requirements please refer to the document “ Remote Login using Putty Hadoop-3.0” present in the LMS  With reference to “ Remote Login Using Putty – Hadoop 2.2.0 ” document present in the LMS, you can access our Edureka VM Remote Server Edureka VM Remote Server www.edureka.co/big-data-and-hadoopSlide 55
  • 56. Challenges Faced While Importing VM in Virtual Box  Below is the error you may face while importing the Edureka VM in Virtual Box, if the Edureka VM file has not been downloaded completely In case you are using split files, then before importing split files in Virtual Box combine all the downloaded split files using hjsplit application. www.edureka.co/big-data-and-hadoopSlide 56
  • 57.  If daemons are not up and running, execute below commands   After executing the above commands check if all daemons are running fine by using:  After importing VM in Virtual Box, start the VM and check if all the daemons are up and running by using: Command: sudo jps Challenges Faced After Importing VM in Virtual Box Command: sudo service hadoop-master stop Command: sudo service hadoop-master start Command: hadoop dfsadmin -safemode leave www.edureka.co/big-data-and-hadoopSlide 57 Command: sudo jps
  • 58.  Please refer to the README.txt file present in the Desktop of Edureka VM to know more about it Edureka VM Desktop www.edureka.co/big-data-and-hadoopSlide 58
  • 59. Assignment Referring the documents present in the LMS under assignment solve the below problem. How many such DataNodes you would need to read 100TB data in 5 minutes in your Hadoop Cluster? www.edureka.co/big-data-and-hadoopSlide 59
  • 60. Pre-work for next Class Setup the Hadoop development environment using the documents present in the LMS. » Setup Edureka’s VirtualMachine » Execute Linux BasicCommands » Execute HDFS Hands On commands Attempt the Module-1 Assignments present in the LMS. www.edureka.co/big-data-and-hadoopSlide 60
  • 61.  Hadoop 2.x Cluster Architecture  Hadoop clustermodes  Basic Hadoopcommands  Hadoop 2.x configuration files and its parameters  Password-less SSH on Hadoopcluster  Dump of a MapReduceprogram  Data loadingtechniques www.edureka.co/big-data-and-hadoopSlide 61 Agenda for Next Class
  • 62. Your feedback is important to us, be it a compliment, a suggestion or a complaint. It helps us to make the course better! Please spare few minutes to take the survey after the webinar. www.edureka.co/big-data-and-hadoopSlide 62 Survey