1. © 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Predicting Consumer Behaviour via Hadoop
2.
Session Objectives
In this session you will understand:
ᗍ Big Data and Hadoop
ᗍ HDFS
ᗍ MapReduce with examples and Scenarios
ᗍ Predictive Analytics and its process
ᗍ Three Pillars of Predictive Analytics
ᗍ Applications of Predictive Analytics
3.
Big Data and its Challenges
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
Systems and enterprises generate huge amounts of data, from terabytes to petabytes of information.
Data at this scale is very difficult to manage.
4.
Who Generates Big Data?
Have you ever wondered how Google, Facebook or LinkedIn manage to store and use such huge volumes of data?
Today, managing big data is becoming a problem for all of us.
5.
Hadoop and its Characteristics
Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
It is an open-source data management technology with scale-out storage and distributed processing.
Hadoop Characteristics
ᗍ Flexible
ᗍ Reliable
ᗍ Economical
ᗍ Scalable
6.
Hadoop Ecosystem
ᗍ Flume: import or export of unstructured or semi-structured data
ᗍ Sqoop: import or export of structured data
ᗍ Apache Oozie: workflow
ᗍ Pig Latin: data analysis
ᗍ Hive: DW system
ᗍ HBase
ᗍ MapReduce Framework
ᗍ Other YARN Frameworks (MPI, Giraph)
ᗍ YARN: cluster resource management
ᗍ HDFS (Hadoop Distributed File System)
7.
Predictive Analysis
Data (Sources, Types, Forms)
Capture
Predict
• Data Mining
• Text Mining
• Statistical Analytics
Act
• Act on the model
8.
Why Predictive Analytics?
ᗍ Predictive analytics automatically synthesizes big data, mathematical sciences, business rules, and machine learning to make predictions, and then suggests decision options to take advantage of a future opportunity
ᗍ The purpose of predictive analytics is to estimate what is likely to happen in the future
ᗍ Predictive analytics is a branch of data mining
ᗍ One example of predictive analytics in use is optimizing customer relationship management systems
9.
Predictive Analytics – Process
1. Draw Hypothesis
2. Extract data needed
3. Check the data fits the tool
4. Run Analysis
5. Draw Conclusions
6. Implement Results
7. Monitor Progress
10.
Three Pillars of Predictive Analytics
Predictive Operational Analytics
ᗍ Plan
ᗍ Manage
ᗍ Maximize
Predictive Threat and Fraud Analytics
ᗍ Monitor
ᗍ Detect
ᗍ Control
Predictive Customer Analytics
ᗍ Acquire
ᗍ Grow
ᗍ Retain
11.
Most Common Predictive Modelling Tasks
ᗍ Classification
ᗍ Clustering
ᗍ Association
ᗍ Detection
ᗍ Estimation and Time Series
ᗍ Link Analysis
ᗍ Web and Text Mining
12.
Applications of Predictive Analytics
ᗍ Analytical customer relationship management (CRM)
ᗍ Clinical decision support systems
ᗍ Customer retention
ᗍ Direct marketing
ᗍ Fraud detection
ᗍ Risk management
ᗍ Underwriting
13.
What is Predictive Analytics all about?
Predictive analytics is really about solving problems with data.
Predictive analytics is technology that learns from experience (data) to predict the future behaviour of individuals in order to drive better decisions.
It helps connect data to effective action by drawing reliable conclusions about current conditions and future events.
It enables businesses to use predictive models that exploit patterns found in historical data to identify potential risks and opportunities before they occur.
14.
Map Reduce – Scenario
Let us consider a real-life scenario to understand the importance of MapReduce in Hadoop.
Suppose you are handling a project that has x tasks and takes 100 hours for one resource to complete:
1 resource x 100 hours = 100 hours
100 hours / 10 resources = 10 hours
15.
Map Reduce – Scenario
Similarly, processing that takes 100 hours in one piece takes 100/10 = 10 hours when split across 10 parallel workers.
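The arithmetic in this scenario can be sketched in a few lines of Python. This is an idealized model: it assumes the work divides evenly and that coordination costs nothing, which real clusters only approximate. The function name `ideal_hours` is made up for illustration.

```python
def ideal_hours(total_hours: float, resources: int) -> float:
    """Idealized completion time: work divides evenly, no coordination overhead."""
    if resources < 1:
        raise ValueError("resources must be at least 1")
    return total_hours / resources


# The scenario from the slide: 100 hours of work, 1 resource versus 10.
one_resource = ideal_hours(100, 1)
ten_resources = ideal_hours(100, 10)
print(one_resource, ten_resources)
```

In practice the gain is smaller than this, because splitting the work and collecting the results take time of their own.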
16.
More Scenarios on Map-Reduce
Problem Statement:
Find maximum stock market levels recorded in a span of 5 years
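As a rough sketch of how this first problem fits the map and reduce pattern, the Python below runs entirely in memory rather than on Hadoop. The record layout (symbol, date, level) and all the values are assumptions made up for illustration, as are the function names.

```python
from collections import defaultdict

# Hypothetical records: (symbol, date, level). Values are invented for illustration.
records = [
    ("AAA", "2011-03-01", 101.5),
    ("AAA", "2013-07-15", 140.2),
    ("BBB", "2012-01-09", 55.0),
    ("BBB", "2014-11-20", 48.3),
    ("AAA", "2015-02-02", 133.9),
]


def map_record(record):
    """Map step: emit one (key, value) pair per input record."""
    symbol, _date, level = record
    yield symbol, level


def reduce_max(symbol, levels):
    """Reduce step: keep only the highest level seen for a key."""
    return symbol, max(levels)


def max_levels(all_records):
    # Group mapped values by key, standing in for the framework's shuffle.
    grouped = defaultdict(list)
    for record in all_records:
        for key, value in map_record(record):
            grouped[key].append(value)
    return dict(reduce_max(key, values) for key, values in grouped.items())


result = max_levels(records)
print(result)
```

Because taking a maximum works on any subset of the values, each split of the input can be processed separately and the partial maxima combined afterwards.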
Problem Statement:
De-identify personal identifier information
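De-identification is a natural map-only job, since each record can be cleaned independently of every other record. The sketch below is one possible approach under stated assumptions, not the method used in the course: the field names, the salt, and the choice of a truncated salted SHA-256 digest are all illustrative, and hashing of this kind is pseudonymization rather than full anonymization.

```python
import hashlib

SALT = b"example-salt"                  # assumption: a secret managed outside the job
IDENTIFIER_FIELDS = ("name", "email")   # hypothetical field names


def deidentify(record: dict) -> dict:
    """Map step: replace identifier fields with a salted hash, leave other fields alone."""
    cleaned = dict(record)
    for field in IDENTIFIER_FIELDS:
        if field in cleaned:
            raw = SALT + str(cleaned[field]).encode("utf-8")
            cleaned[field] = hashlib.sha256(raw).hexdigest()[:16]
    return cleaned


# Invented sample data for illustration.
records = [
    {"name": "Jane Doe", "email": "jane@example.com", "purchase": 42.0},
    {"name": "John Roe", "email": "john@example.com", "purchase": 17.5},
]
deidentified = [deidentify(r) for r in records]
print(deidentified)
```

The same input always maps to the same token, so records belonging to one person can still be joined after the identifiers are removed.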
17.
Traditional Solution
[Diagram: Very Big Data is divided into Split Data pieces; grep runs on each piece and produces matches; cat combines them into All matches]
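The split, grep and cat pipeline in this diagram can be imitated in Python to show how much of the plumbing the programmer has to write by hand. This is a minimal sketch: the function names are made up, matching is a plain substring test rather than a regular expression, and a thread pool stands in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def split_data(lines, num_splits):
    """Cut the input into contiguous chunks of roughly equal size."""
    size = max(1, -(-len(lines) // num_splits))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def grep(pattern, lines):
    """Return the lines that contain the pattern."""
    return [line for line in lines if pattern in line]


def grep_all(pattern, lines, num_splits=4):
    splits = split_data(lines, num_splits)
    with ThreadPoolExecutor(max_workers=num_splits) as pool:
        # One grep per split; map() returns results in split order.
        partial_matches = pool.map(lambda chunk: grep(pattern, chunk), splits)
        # The "cat" step: concatenate the partial results.
        return [match for part in partial_matches for match in part]


log_lines = ["error: a", "ok", "error: b", "ok", "warn", "error: c"]
print(grep_all("error", log_lines, num_splits=3))
```

Splitting, scheduling and collecting the results are all explicit here, and none of it handles a failed worker.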
18.
MapReduce Solution
[Diagram: Very Big Input is divided into Split Data pieces; the MapReduce Framework runs MAP over the pieces and REDUCE combines the results into All matches]
19.
MapReduce Advantages
Two biggest advantages:
ᗍ Takes processing to the data
ᗍ Allows processing data in parallel
[Diagram: Data Center, Rack, Node, HDFS Blocks (a, b, c), Map Task]
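"Taking processing to the data" means preferring to run a map task on a node that already stores the block, then on a node in the same rack, and only then anywhere else. The Python below is a simplified illustration of that preference order, not Hadoop's scheduler code; the `Node` type, the `pick_node` function and the node and rack names are all hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Node(NamedTuple):
    name: str
    rack: str


def pick_node(replica_nodes: List[Node], free_nodes: List[Node]) -> Tuple[Optional[Node], str]:
    """Choose where to run a map task for a block stored on replica_nodes."""
    replica_names = {node.name for node in replica_nodes}
    replica_racks = {node.rack for node in replica_nodes}
    # First choice: a free node that already holds the block.
    for node in free_nodes:
        if node.name in replica_names:
            return node, "node-local"
    # Second choice: a free node on the same rack as a replica.
    for node in free_nodes:
        if node.rack in replica_racks:
            return node, "rack-local"
    # Last resort: any free node, at the cost of moving data across racks.
    if free_nodes:
        return free_nodes[0], "off-rack"
    return None, "none"


n1, n2, n3 = Node("node1", "rackA"), Node("node2", "rackA"), Node("node3", "rackB")
print(pick_node([n1], [n3, n1]))
print(pick_node([n1], [n3, n2]))
print(pick_node([n1], [n3]))
```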
20.
MapReduce Flow
1. Input data is present on the data nodes
2. The number of map tasks equals the number of input splits
3. Mappers produce intermediate data
4. Data is exchanged among nodes during shuffling
5. All data with the same key goes to the same reducer
6. Reducer output is stored at the output location
[Diagram: INPUT DATA flows into Map tasks on Node 1 and Node 2, then into Reduce tasks]
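The flow above can be walked through with a small in-memory simulation. This is a sketch under stated assumptions, not Hadoop's implementation: word count is chosen as the example job, the function names are made up, and a CRC32 hash modulo the number of reducers stands in for the framework's partitioner.

```python
import zlib
from collections import defaultdict


def mapper(line):
    """Emit (word, 1) for every word in a line of input."""
    for word in line.split():
        yield word.lower(), 1


def partition(key, num_reducers):
    """Deterministically assign a key to one reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(splits, num_reducers=2):
    # Steps 2 and 3: one map task per input split, each producing intermediate pairs.
    map_outputs = [[pair for line in split for pair in mapper(line)] for split in splits]
    # Steps 4 and 5: shuffle, so every value for a given key reaches the same reducer.
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    for output in map_outputs:
        for key, value in output:
            reducer_inputs[partition(key, num_reducers)][key].append(value)
    # Step 6: each reducer writes its own output.
    return [{key: sum(values) for key, values in inputs.items()} for inputs in reducer_inputs]


splits = [["big data big"], ["data hadoop"]]  # step 1: input already divided into splits
outputs = run_job(splits)
print(outputs)
```

Each word appears in exactly one reducer's output, even though "big" and "data" were produced by different map tasks.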
21.
Job Trends – Hadoop
22.
Course Topics
Module 1: Introduction to Big Data and Hadoop
Module 2: HDFS Internals, Hadoop Configurations and Data Loading
Module 3: Introduction to Map Reduce
Module 4: Advanced Map Reduce Concepts
Module 5: Introduction to Pig
Module 6: Advanced Pig and Introduction to Hive
Module 7: Advanced Hive Concepts
Module 8: Extending Hive and HBase Introduction
Module 9: Advanced HBase and Oozie Introduction
Module 10: Project Set-up Discussion
23.
Why SkillSpeed?
ᗍ Course Curriculum from Industry Experts
ᗍ Instructor Led Live Virtual Sessions
ᗍ Lifetime access to Course Content via LMS
ᗍ 100% Placement Assistance
ᗍ 24x7 Support
24.
Corporate Partners
25.
Contact us
Lines open 24/7
To know more about the course, please contact:
IND +91-90660-20904 USA 1866-607-6547 (Toll Free)
Or reach us at sales@skillspeed.com
Editor's notes
SkillSpeed offers virtual instructor-led courses designed to bridge the time-to-competency gap experienced by technology companies. SkillSpeed's USP is the subject matter expert (SME). SMEs are industry experts who have a good understanding of the technology and hands-on industry experience with it.
This industry expert designs, develops, and delivers the course.
SkillSpeed provides you:
- Course curriculum from industry experts
- Instructor-led live virtual sessions
- Real-life industry case studies
- Live virtual interaction with industry experts
- Lifetime access to all course content via the LMS
- 24x7 support
- 100% placement assistance