SlideShare une entreprise Scribd logo
1  sur  27
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Predicting Consumer
Behaviour via Hadoop
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Session Objectives
In this session you will understand
ᗍ Big Data and Hadoop
ᗍ HDFS
ᗍ MapReduce with examples and Scenarios
ᗍ Predictive Analytics and its process
ᗍ Three Pillars of Predictive Analytics
ᗍ Applications of Predictive Analytics
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Big Data and its Challenges
Big data is the term for a collection of data sets so
large and complex that it becomes difficult to
process using on-hand database management tools
or traditional data processing applications
Systems / Enterprises generate huge amount of data
from Terabytes to and even Petabytes of information
It’s very difficult to manage such huge data……
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Who Generates Big Data?
Have you ever wondered how Google, Facebook or LinkedIn manages to store and utilize the huge data?
Today, it is becoming a problem for all of us to manage such BIG DATA….
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Hadoop and its Characteristics
Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of
commodity computers using a simple programming model
It is an Open-source Data Management technology with scale-out storage and distributed processing
Hadoop
Characteristics
Flexible
Reliable
Economical
Scalable
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Hadoop Ecosystem
Flume Sqoop
Import Or Export
Unstructured or
Semi-Structured data Structured Data
Apache Oozie (Workflow)
HDFS
(Hadoop Distributed File System)
Pig Latin
Data Analysis
Hive
DW System
MapReduce Framework HBase
Other
YARN
Frameworks (MPI,
GIRAPH)
YARN
Cluster Resource Management
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Data(Sources, Types,
Forms)
Capture Predict
• Data Mining
• Text Mining
• Statistical Analytics
Act
Act on the model
Predictive Analysis
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Why Predictive Analytics?
ᗍ Predictive analytics automatically synthesizes big data, mathematical sciences, business rules, and machine
learning to make predictions and then suggests decision options to take advantage of a future opportunity
ᗍ The purpose of predictive analytics is to tell you what will happen in the future
ᗍ Predictive Analytics is branch of the Data Mining process
ᗍ An example of using predictive analytics is optimizing customer relationship management systems
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Monitor Progress
Implement
Results
Draw Conclusions
Run Analysis
Check the data fits
the tool
Draw Hypothesis
Implement
Results
Extract data
needed
Predictive Analytics – Process
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Three Pillars of Predictive Analytics
Predictive Operational Analytics
ᗍ Plan
ᗍ Manage
ᗍ Maximize
Predictive Threat and Fraud
Analytics
ᗍ Monitor
ᗍ Detect
ᗍ Control
Predictive Customer Analytics
ᗍ Acquire
ᗍ Grow
ᗍ Retain
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Most Common Predictive Modelling Tasks
ᗍ Classification
ᗍ Clustering
ᗍ Association
ᗍ Detection
ᗍ Estimation and Time Series
ᗍ Link Analysis
ᗍ Web and Text Mining
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Applications of Predictive Analytics
ᗍ Analytical customer relationship management (CRM)
ᗍ Clinical decision support systems
ᗍ Customer retention
ᗍ Direct marketing
ᗍ Fraud detection
ᗍ Risk management
ᗍ Underwriting
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
What is Predictive Analytics all about?
Predictive analytics is really about solving problems with data
Predictive Analytics is the technology that learns from experience(data) to predict the future behaviour of
individuals in order to drive better decisions
Predictive Analytics helps to connect data to effective action by drawing reliable conclusions about current
conditions and future events
Enables businesses to use predictive models to exploit patterns found in historical data to identify potential risks
and opportunities before they occur
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Map Reduce – Scenario
Let us consider a real life scenario to understand the importance of “Map Reduce” in Hadoop
Suppose, you are the handling
a project which has x tasks and
takes 100 hours for one
resource to complete
1 x 100 = 100 hours
100/10(resources) = 10 hours
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Similarly,
= 100 hours 100/10 = 10 hours
Map Reduce – Scenario
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
More Scenarios on Map-Reduce
Problem Statement:
Find maximum stock market levels recorded in a span of 5 years
Problem Statement:
De-identify personal identifier information
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Traditional Solution
matchesSplit Data
Very
Big
Data
All
matches
grep
grep
grep
cat
grep
:
matches
matches
matches
Split Data
Split Data
Split Data
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
MapReduce Solution
Very
Big
Input
Split Data
All
matches
:
Split Data
Split Data
Split Data
M
A
P
R
E
D
U
C
E
MapReduce Framework
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
MapReduce Advantages
Two biggest advantages:
ᗍ Takes processing to the data
ᗍ Allows processing data in parallel
a b
c
Map Task
HDFS Block
Data Center
Rack
Node
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
MapReduce Flow
1. Input data is present in data nodes
2. Map tasks = Input Splits
3. Mappers produce intermediate data
4. Data exchanged among nodes in “shuffling”
5. All data of same key goes to same reducer
6. Reducer output stored at output location
Node 1
INPUT DATA
Map
Node 2
Map
Node 1
Reduce
Node 1
Reduce
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Job Trends – Hadoop
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Course Topics
Module 1
Introduction to Big
Data and Hadoop
Module 2
HDFS Internals, Hadoop
Configurations and Data
Loading
Module 3
Introduction to Map
Reduce
Module 4
Advanced Map Reduce
Concepts
Module 5
Introduction to Pig
Module 6
Advanced Pig and
Introduction to Hive
Module 7
Advanced Hive Concepts
Module 8
Extending Hive and HBase
Introduction
Module 9
Advanced HBase and
Oozie Introduction
Module 10
Project Set-up Discussion
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Why SkillSpeed?
Course
Curriculum
from Industry
Experts
Instructor Led
Live Virtual
Sessions
Lifetime
access to
Course
Content via
LMS
100%
Placement
Assistance
24x7 Support
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Corporate Partners
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Lines open 24/7
To know more about the course, Please contact:
IND +91-90660-20904 USA 1866-607-6547 (Toll Free)
Or reach us at
sales@skillspeed.com
Contact us..
Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Image References
Images Credits: Google, Facebook and LinkedIn LOGO and Snapshots
http://findicons.com/icon/66444/user_group
http://www.virtualizor.com/tour
https://accounts.it.et.byu.edu/
http://www.clipartsfree.net/tag/server.html
http://www.gopixpic.com/16/time-clock-icon-png-download
http://blog.smartbear.com/requirements/how-to-interview-users-to-find-out-what-they-really-want/
http://www.lincs.fr/research/areas/big-data/
http://www.counsellingpages.co.uk/
http://langfordsconsultancy.com/langfords-training-support-package/
http://cbsepathshala.blogspot.in/2012/05/physics-class-x-chapter-electricity.html
http://mmatycoon.com/tycoontimes/tycoontimesstory.php?SID=1010
Predicting Consumer Behaviour via Hadoop

Contenu connexe

Tendances

IoT: How Data Science Driven Software is Eating the Connected World
IoT: How Data Science Driven Software is Eating the Connected WorldIoT: How Data Science Driven Software is Eating the Connected World
IoT: How Data Science Driven Software is Eating the Connected World
DataWorks Summit
 
Dataiku productive application to production - pap is may 2015
Dataiku    productive application to production - pap is may 2015 Dataiku    productive application to production - pap is may 2015
Dataiku productive application to production - pap is may 2015
Dataiku
 
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
DataWorks Summit
 
Yaroslav Ravlinko “Evolution of Data Processing platform from Hadoop to nowad...
Yaroslav Ravlinko “Evolution of Data Processing platform from Hadoop to nowad...Yaroslav Ravlinko “Evolution of Data Processing platform from Hadoop to nowad...
Yaroslav Ravlinko “Evolution of Data Processing platform from Hadoop to nowad...
Lviv Startup Club
 
Data Science: Driving Smarter Finance and Workforce Decsions for the Enterprise
Data Science: Driving Smarter Finance and Workforce Decsions for the EnterpriseData Science: Driving Smarter Finance and Workforce Decsions for the Enterprise
Data Science: Driving Smarter Finance and Workforce Decsions for the Enterprise
DataWorks Summit
 

Tendances (20)

Using Neo4j and Machine Learning to Create a Decision Engine, CluedIn
Using Neo4j and Machine Learning  to Create a Decision Engine, CluedInUsing Neo4j and Machine Learning  to Create a Decision Engine, CluedIn
Using Neo4j and Machine Learning to Create a Decision Engine, CluedIn
 
Agile, Automated, Aware: How to Model for Success
Agile, Automated, Aware: How to Model for SuccessAgile, Automated, Aware: How to Model for Success
Agile, Automated, Aware: How to Model for Success
 
Hadoop for Business Intelligence Professionals
Hadoop for Business Intelligence ProfessionalsHadoop for Business Intelligence Professionals
Hadoop for Business Intelligence Professionals
 
MetaScale Case Study: Hadoop Extends DataStage ETL Capacity
MetaScale Case Study: Hadoop Extends DataStage ETL CapacityMetaScale Case Study: Hadoop Extends DataStage ETL Capacity
MetaScale Case Study: Hadoop Extends DataStage ETL Capacity
 
Data Science with Hadoop: A Primer
Data Science with Hadoop: A PrimerData Science with Hadoop: A Primer
Data Science with Hadoop: A Primer
 
IoT: How Data Science Driven Software is Eating the Connected World
IoT: How Data Science Driven Software is Eating the Connected WorldIoT: How Data Science Driven Software is Eating the Connected World
IoT: How Data Science Driven Software is Eating the Connected World
 
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
 
Lambda Architecture 2.0 Convergence between Real-Time Analytics, Context-awar...
Lambda Architecture 2.0 Convergence between Real-Time Analytics, Context-awar...Lambda Architecture 2.0 Convergence between Real-Time Analytics, Context-awar...
Lambda Architecture 2.0 Convergence between Real-Time Analytics, Context-awar...
 
Dataiku productive application to production - pap is may 2015
Dataiku    productive application to production - pap is may 2015 Dataiku    productive application to production - pap is may 2015
Dataiku productive application to production - pap is may 2015
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
 
Parallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSWParallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSW
 
The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products
 
The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016
 
Yaroslav Ravlinko “Evolution of Data Processing platform from Hadoop to nowad...
Yaroslav Ravlinko “Evolution of Data Processing platform from Hadoop to nowad...Yaroslav Ravlinko “Evolution of Data Processing platform from Hadoop to nowad...
Yaroslav Ravlinko “Evolution of Data Processing platform from Hadoop to nowad...
 
IBM Big Data Analytics Concepts and Use Cases
IBM Big Data Analytics Concepts and Use CasesIBM Big Data Analytics Concepts and Use Cases
IBM Big Data Analytics Concepts and Use Cases
 
Data Science: Driving Smarter Finance and Workforce Decsions for the Enterprise
Data Science: Driving Smarter Finance and Workforce Decsions for the EnterpriseData Science: Driving Smarter Finance and Workforce Decsions for the Enterprise
Data Science: Driving Smarter Finance and Workforce Decsions for the Enterprise
 
Operationalizing analytics to scale
Operationalizing analytics to scaleOperationalizing analytics to scale
Operationalizing analytics to scale
 
10 Step Guide to Analytics
10 Step Guide to Analytics10 Step Guide to Analytics
10 Step Guide to Analytics
 

En vedette

Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to ProductionReal-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Revolution Analytics
 
Application of MapReduce in Cloud Computing
Application of MapReduce in Cloud ComputingApplication of MapReduce in Cloud Computing
Application of MapReduce in Cloud Computing
Mohammad Mustaqeem
 
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce FundamentalsIntroduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Skillspeed
 

En vedette (20)

Big Data & Analytics MapReduce/Hadoop – A programmer’s perspective
Big Data & Analytics MapReduce/Hadoop – A programmer’s perspectiveBig Data & Analytics MapReduce/Hadoop – A programmer’s perspective
Big Data & Analytics MapReduce/Hadoop – A programmer’s perspective
 
mapReduce for machine learning
mapReduce for machine learning mapReduce for machine learning
mapReduce for machine learning
 
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to ProductionReal-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
 
Daniel Abadi HadoopWorld 2010
Daniel Abadi HadoopWorld 2010Daniel Abadi HadoopWorld 2010
Daniel Abadi HadoopWorld 2010
 
Design patterns in MapReduce
Design patterns in MapReduceDesign patterns in MapReduce
Design patterns in MapReduce
 
Predictive Analytics on Big Data. DIY or BUY?
Predictive Analytics on Big Data. DIY or BUY?Predictive Analytics on Big Data. DIY or BUY?
Predictive Analytics on Big Data. DIY or BUY?
 
Population Health Management, Predictive Analytics, Big Data and Text Analytics
Population Health Management, Predictive Analytics, Big Data and Text AnalyticsPopulation Health Management, Predictive Analytics, Big Data and Text Analytics
Population Health Management, Predictive Analytics, Big Data and Text Analytics
 
Decision trees in hadoop
Decision trees in hadoopDecision trees in hadoop
Decision trees in hadoop
 
Evaluating Big Data Predictive Analytics Platforms
Evaluating Big Data Predictive Analytics PlatformsEvaluating Big Data Predictive Analytics Platforms
Evaluating Big Data Predictive Analytics Platforms
 
Big data and Predictive Analytics By : Professor Lili Saghafi
Big data and Predictive Analytics By : Professor Lili SaghafiBig data and Predictive Analytics By : Professor Lili Saghafi
Big data and Predictive Analytics By : Professor Lili Saghafi
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
Application of MapReduce in Cloud Computing
Application of MapReduce in Cloud ComputingApplication of MapReduce in Cloud Computing
Application of MapReduce in Cloud Computing
 
Big Data & Artificial Intelligence
Big Data & Artificial IntelligenceBig Data & Artificial Intelligence
Big Data & Artificial Intelligence
 
Predictive Analytics - Big Data & Artificial Intelligence
Predictive Analytics - Big Data & Artificial IntelligencePredictive Analytics - Big Data & Artificial Intelligence
Predictive Analytics - Big Data & Artificial Intelligence
 
Mapreduce Algorithms
Mapreduce AlgorithmsMapreduce Algorithms
Mapreduce Algorithms
 
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce FundamentalsIntroduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
 
Hadoop Real Life Use Case & MapReduce Details
Hadoop Real Life Use Case & MapReduce DetailsHadoop Real Life Use Case & MapReduce Details
Hadoop Real Life Use Case & MapReduce Details
 
Predictive Analytics - An Overview
Predictive Analytics - An OverviewPredictive Analytics - An Overview
Predictive Analytics - An Overview
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
Deploying Predictive Analytics in Healthcare
Deploying Predictive Analytics in HealthcareDeploying Predictive Analytics in Healthcare
Deploying Predictive Analytics in Healthcare
 

Similaire à Predicting Consumer Behaviour via Hadoop

Top 5 Tasks Of A Hadoop Developer Webinar
Top 5 Tasks Of A Hadoop Developer WebinarTop 5 Tasks Of A Hadoop Developer Webinar
Top 5 Tasks Of A Hadoop Developer Webinar
Skillspeed
 

Similaire à Predicting Consumer Behaviour via Hadoop (20)

Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureHadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
 
Top 5 Tasks Of A Hadoop Developer Webinar
Top 5 Tasks Of A Hadoop Developer WebinarTop 5 Tasks Of A Hadoop Developer Webinar
Top 5 Tasks Of A Hadoop Developer Webinar
 
Introduction to Pig | Pig Architecture | Pig Fundamentals
Introduction to Pig | Pig Architecture | Pig FundamentalsIntroduction to Pig | Pig Architecture | Pig Fundamentals
Introduction to Pig | Pig Architecture | Pig Fundamentals
 
BIG Data & Hadoop Applications in Logistics
BIG Data & Hadoop Applications in LogisticsBIG Data & Hadoop Applications in Logistics
BIG Data & Hadoop Applications in Logistics
 
Big Data for Data Scientists - Info Session
Big Data for Data Scientists - Info SessionBig Data for Data Scientists - Info Session
Big Data for Data Scientists - Info Session
 
Fit For Purpose: Preventing a Big Data Letdown
Fit For Purpose: Preventing a Big Data LetdownFit For Purpose: Preventing a Big Data Letdown
Fit For Purpose: Preventing a Big Data Letdown
 
BIG Data & Hadoop Applications in Social Media
BIG Data & Hadoop Applications in Social MediaBIG Data & Hadoop Applications in Social Media
BIG Data & Hadoop Applications in Social Media
 
BIG Data & Hadoop Applications in Finance
BIG Data & Hadoop Applications in FinanceBIG Data & Hadoop Applications in Finance
BIG Data & Hadoop Applications in Finance
 
The Journey to Big Data Analytics
The Journey to Big Data AnalyticsThe Journey to Big Data Analytics
The Journey to Big Data Analytics
 
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaIs your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
 
The new dominant companies are running on data
The new dominant companies are running on data The new dominant companies are running on data
The new dominant companies are running on data
 
The Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopThe Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of Hadoop
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on Data
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on Data
 
Ramesh kutumbaka resume
Ramesh kutumbaka resumeRamesh kutumbaka resume
Ramesh kutumbaka resume
 
BIG Data & Hadoop Applications in E-Commerce
BIG Data & Hadoop Applications in E-CommerceBIG Data & Hadoop Applications in E-Commerce
BIG Data & Hadoop Applications in E-Commerce
 
Achieving Agility and Scale for Your Data Lake - Talend
Achieving Agility and Scale for Your Data Lake - TalendAchieving Agility and Scale for Your Data Lake - Talend
Achieving Agility and Scale for Your Data Lake - Talend
 
Business Data Lake Best Practices
Business Data Lake Best PracticesBusiness Data Lake Best Practices
Business Data Lake Best Practices
 
Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI
 
Paving The Way To Data Driven
Paving The Way To Data DrivenPaving The Way To Data Driven
Paving The Way To Data Driven
 

Plus de Skillspeed

Sentiment Analysis via R Programming
Sentiment Analysis via R ProgrammingSentiment Analysis via R Programming
Sentiment Analysis via R Programming
Skillspeed
 
Decoding Puppet & Jenkins via DevOps
Decoding Puppet & Jenkins via DevOpsDecoding Puppet & Jenkins via DevOps
Decoding Puppet & Jenkins via DevOps
Skillspeed
 
Python and BIG Data analytics | Python Fundamentals | Python Architecture
Python and BIG Data analytics | Python Fundamentals | Python ArchitecturePython and BIG Data analytics | Python Fundamentals | Python Architecture
Python and BIG Data analytics | Python Fundamentals | Python Architecture
Skillspeed
 

Plus de Skillspeed (6)

Sentiment Analysis via R Programming
Sentiment Analysis via R ProgrammingSentiment Analysis via R Programming
Sentiment Analysis via R Programming
 
Decoding Puppet & Jenkins via DevOps
Decoding Puppet & Jenkins via DevOpsDecoding Puppet & Jenkins via DevOps
Decoding Puppet & Jenkins via DevOps
 
Skillspeed Affiliate Program
Skillspeed Affiliate ProgramSkillspeed Affiliate Program
Skillspeed Affiliate Program
 
Python and BIG Data analytics | Python Fundamentals | Python Architecture
Python and BIG Data analytics | Python Fundamentals | Python ArchitecturePython and BIG Data analytics | Python Fundamentals | Python Architecture
Python and BIG Data analytics | Python Fundamentals | Python Architecture
 
BIG Data & Hadoop Applications in Healthcare
BIG Data & Hadoop Applications in HealthcareBIG Data & Hadoop Applications in Healthcare
BIG Data & Hadoop Applications in Healthcare
 
BIG Data & Hadoop Applications in Retail
BIG Data & Hadoop Applications in RetailBIG Data & Hadoop Applications in Retail
BIG Data & Hadoop Applications in Retail
 

Dernier

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Dernier (20)

Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

Predicting Consumer Behaviour via Hadoop

  • 1. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Predicting Consumer Behaviour via Hadoop
  • 2. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Session Objectives In this session you will understand ᗍ Big Data and Hadoop ᗍ HDFS ᗍ MapReduce with examples and Scenarios ᗍ Predictive Analytics and its process ᗍ Three Pillars of Predictive Analytics ᗍ Applications of Predictive Analytics
  • 3. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Big Data and its Challenges Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications Systems / Enterprises generate huge amount of data from Terabytes to and even Petabytes of information It’s very difficult to manage such huge data……
  • 4. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Who Generates Big Data? Have you ever wondered how Google, Facebook or LinkedIn manages to store and utilize the huge data? Today, it is becoming a problem for all of us to manage such BIG DATA….
  • 5. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Hadoop and its Characteristics Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model It is an Open-source Data Management technology with scale-out storage and distributed processing Hadoop Characteristics Flexible Reliable Economical Scalable
  • 6. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Hadoop Ecosystem Flume Sqoop Import Or Export Unstructured or Semi-Structured data Structured Data Apache Oozie (Workflow) HDFS (Hadoop Distributed File System) Pig Latin Data Analysis Hive DW System MapReduce Framework HBase Other YARN Frameworks (MPI, GIRAPH) YARN Cluster Resource Management
  • 7. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Data(Sources, Types, Forms) Capture Predict • Data Mining • Text Mining • Statistical Analytics Act Act on the model Predictive Analysis
  • 8. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Why Predictive Analytics? ᗍ Predictive analytics automatically synthesizes big data, mathematical sciences, business rules, and machine learning to make predictions and then suggests decision options to take advantage of a future opportunity ᗍ The purpose of predictive analytics is to tell you what will happen in the future ᗍ Predictive Analytics is branch of the Data Mining process ᗍ An example of using predictive analytics is optimizing customer relationship management systems
  • 9. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Monitor Progress Implement Results Draw Conclusions Run Analysis Check the data fits the tool Draw Hypothesis Implement Results Extract data needed Predictive Analytics – Process
  • 10. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Three Pillars of Predictive Analytics Predictive Operational Analytics ᗍ Plan ᗍ Manage ᗍ Maximize Predictive Threat and Fraud Analytics ᗍ Monitor ᗍ Detect ᗍ Control Predictive Customer Analytics ᗍ Acquire ᗍ Grow ᗍ Retain
  • 11. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Most Common Predictive Modelling Tasks ᗍ Classification ᗍ Clustering ᗍ Association ᗍ Detection ᗍ Estimation and Time Series ᗍ Link Analysis ᗍ Web and Text Mining
  • 12. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Applications of Predictive Analytics ᗍ Analytical customer relationship management (CRM) ᗍ Clinical decision support systems ᗍ Customer retention ᗍ Direct marketing ᗍ Fraud detection ᗍ Risk management ᗍ Underwriting
  • 13. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com What is Predictive Analytics all about? Predictive analytics is really about solving problems with data Predictive Analytics is the technology that learns from experience(data) to predict the future behaviour of individuals in order to drive better decisions Predictive Analytics helps to connect data to effective action by drawing reliable conclusions about current conditions and future events Enables businesses to use predictive models to exploit patterns found in historical data to identify potential risks and opportunities before they occur
  • 14. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Map Reduce – Scenario Let us consider a real life scenario to understand the importance of “Map Reduce” in Hadoop Suppose, you are the handling a project which has x tasks and takes 100 hours for one resource to complete 1 x 100 = 100 hours 100/10(resources) = 10 hours
  • 15. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Similarly, = 100 hours 100/10 = 10 hours Map Reduce – Scenario
  • 16. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com More Scenarios on Map-Reduce Problem Statement: Find maximum stock market levels recorded in a span of 5 years Problem Statement: De-identify personal identifier information
  • 17. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Traditional Solution matchesSplit Data Very Big Data All matches grep grep grep cat grep : matches matches matches Split Data Split Data Split Data
  • 18. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com MapReduce Solution Very Big Input Split Data All matches : Split Data Split Data Split Data M A P R E D U C E MapReduce Framework
  • 19. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com MapReduce Advantages Two biggest advantages: ᗍ Takes processing to the data ᗍ Allows processing data in parallel a b c Map Task HDFS Block Data Center Rack Node
  • 20. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com MapReduce Flow 1. Input data is present in data nodes 2. Map tasks = Input Splits 3. Mappers produce intermediate data 4. Data exchanged among nodes in “shuffling” 5. All data of same key goes to same reducer 6. Reducer output stored at output location Node 1 INPUT DATA Map Node 2 Map Node 1 Reduce Node 1 Reduce
  • 21. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Job Trends – Hadoop
  • 22. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Course Topics Module 1 Introduction to Big Data and Hadoop Module 2 HDFS Internals, Hadoop Configurations and Data Loading Module 3 Introduction to Map Reduce Module 4 Advanced Map Reduce Concepts Module 5 Introduction to Pig Module 6 Advanced Pig and Introduction to Hive Module 7 Advanced Hive Concepts Module 8 Extending Hive and HBase Introduction Module 9 Advanced HBase and Oozie Introduction Module 10 Project Set-up Discussion
  • 23. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Why SkillSpeed? Course Curriculum from Industry Experts Instructor Led Live Virtual Sessions Lifetime access to Course Content via LMS 100% Placement Assistance 24x7 Support
  • 24. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Corporate Partners
  • 25. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Lines open 24/7 To know more about the course, Please contact: IND +91-90660-20904 USA 1866-607-6547 (Toll Free) Or reach us at sales@skillspeed.com Contact us..
  • 26. Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Image References Images Credits: Google, Facebook and LinkedIn LOGO and Snapshots http://findicons.com/icon/66444/user_group http://www.virtualizor.com/tour https://accounts.it.et.byu.edu/ http://www.clipartsfree.net/tag/server.html http://www.gopixpic.com/16/time-clock-icon-png-download http://blog.smartbear.com/requirements/how-to-interview-users-to-find-out-what-they-really-want/ http://www.lincs.fr/research/areas/big-data/ http://www.counsellingpages.co.uk/ http://langfordsconsultancy.com/langfords-training-support-package/ http://cbsepathshala.blogspot.in/2012/05/physics-class-x-chapter-electricity.html http://mmatycoon.com/tycoontimes/tycoontimesstory.php?SID=1010

Notes de l'éditeur

  1. SkillSpeed offer virtual instructor lead courses designed to bridge the time to competency gap experienced by the technology companies. USP of SkillSpeed is the subject matter expert (SME). SMEs are industry experts and has a good understanding and hands-on industry experience of the technology. This industry expert designs, develops, and delivers the course. SkillSpeed provides you: Course Curriculum from Industry Experts Instructor Led Live Virtual Sessions Real life industry case studies  - Live Virtual Interactions Interaction with industry experts  - Lifetime access to all course content via the LMS   - 24*7 support   - 100% placement assistance