Retail Analytics
Online Retail Big Data Landscape
Product Recommendation
Problem: Product Recommendation
Type of Data | Source of Data
Product Information | Product Catalogue
Customer Information | Customer Data (demographic)
Customer Purchase History | Transactional Data (RDBMS and HDFS)
User Activity Tracking Data | Third-party hybrid cloud data: heat map, click data, demographic data (heat map tools: Crazy Egg, Google Analytics, etc.); Real Time Bidding (RTB) ad inventory data (e.g. Deltax); user activity log from on-premise data, e.g. browser cookie, local storage data
Social Network Activity (analytics data about products, likes, usage, shares, no. of participants, etc.) | Social Networks (Facebook, Twitter, Google+, Instagram)
Is it a Big Data Problem: Product Recommendation
Characteristic | Applies | Remarks
Volume | Yes | As per the capacity planning, the total data generated over 7 years is 5932 TB.
Velocity | Yes | Speed of data generation and analysis for transactional and social network activities; speed of capturing browser cookie data and of structured/unstructured data generation and analysis
Variety | Yes | Use of incompatible and non-integrated data from heterogeneous sources such as customer purchase data, activity logs, social network
ML Approach: Product Recommendation
Machine Learning Problem | Reasoning
Unsupervised | The problem requires deriving product recommendations from observed similarity among customer data.
Clustering | We will cluster customers with similar attributes (browsing patterns, purchase history, demographic information, and behavioral data) based on the observed data set. [K-means clustering]
Recommendation | Within a cluster, we will use user-based collaborative filtering, since the recommendation will be driven by customer attributes. [User-based Collaborative Filtering]
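The two steps in the table above (cluster customers, then recommend within a cluster) can be illustrated with a small sketch. It is not the deck's implementation and does not use Mahout: it is plain Python on hypothetical toy data, with Jaccard overlap of purchase sets standing in as the user-similarity measure. The customer IDs, feature vectors, product IDs, and the explicit `seeds` argument are all assumptions made for illustration.

```python
import math

# Hypothetical toy data: a numeric feature vector per customer
# (e.g. scaled age and visit frequency) plus the products they bought.
CUSTOMERS = {
    "c1": {"features": (0.10, 0.20), "bought": {"p1", "p2"}},
    "c2": {"features": (0.20, 0.10), "bought": {"p2", "p3"}},
    "c3": {"features": (0.15, 0.15), "bought": {"p1", "p2", "p4"}},
    "c4": {"features": (0.90, 0.80), "bought": {"p7", "p8"}},
    "c5": {"features": (0.80, 0.90), "bought": {"p8", "p9"}},
}


def kmeans(points, seeds, iterations=10):
    """Plain k-means; `seeds` names the customers used as initial centroids."""
    centroids = [points[s] for s in seeds]
    assignment = {}
    for _ in range(iterations):
        # Assignment step: each customer joins the nearest centroid.
        assignment = {
            name: min(range(len(centroids)),
                      key=lambda i: math.dist(vec, centroids[i]))
            for name, vec in points.items()
        }
        # Update step: move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [points[n] for n, c in assignment.items() if c == i]
            if members:
                centroids[i] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return assignment


def jaccard(a, b):
    """Overlap of two purchase sets, between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, customers, assignment, top_n=3):
    """User-based collaborative filtering restricted to the target's cluster."""
    own = customers[target]["bought"]
    scores = {}
    for name, data in customers.items():
        if name == target or assignment[name] != assignment[target]:
            continue
        similarity = jaccard(own, data["bought"])
        # Products the neighbour bought that the target has not.
        for product in data["bought"] - own:
            scores[product] = scores.get(product, 0.0) + similarity
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked[:top_n]


POINTS = {name: data["features"] for name, data in CUSTOMERS.items()}
ASSIGNMENT = kmeans(POINTS, seeds=("c1", "c4"))
```

With this toy data, c1 to c3 fall into one cluster and c4 and c5 into the other. `recommend("c1", CUSTOMERS, ASSIGNMENT)` ranks p4 ahead of p3 because c3's purchases overlap more with c1's than c2's do.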
Big Data Components: Product Recommendation
Component | Remarks
Hadoop Distributed File System (HDFS) - Primary | Used to store structured and semi-structured data (activity log, purchase history, user information, social analytics data, etc.) in raw format
Sqoop | Brings transactional data into HDFS and vice versa
Flume | Collects, aggregates, and moves large amounts of user activity log data
Chukwa | Moves the log data generated on the primary HDFS to another HDFS for analysis
Pig/Hive | Help to write MapReduce scripts that get data into a key-value structured format
Mahout/R-Hadoop | To get product recommendations, we can use Mahout's core algorithms, which implement clustering, classification, and batch-based collaborative filtering
Zookeeper | Coordinates common services such as naming, configuration management, and synchronization of data and services among namenodes and datanodes in Hadoop
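To make the "key-value structured format" in the Pig/Hive row concrete, here is a minimal map and reduce sketch in plain Python. It is not Pig Latin or HiveQL and does not call any Hadoop API; the log layout `customer_id,action,product_id` and the sample lines are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical activity log lines: "<customer_id>,<action>,<product_id>".
LOG_LINES = [
    "c1,view,p1",
    "c1,view,p1",
    "c1,purchase,p2",
    "c2,view,p1",
    "malformed line",
]


def map_line(line):
    """Map step: emit ((customer_id, product_id), 1) for a well-formed line."""
    parts = line.strip().split(",")
    if len(parts) != 3:
        return []  # skip lines that do not match the assumed layout
    customer_id, _action, product_id = parts
    return [((customer_id, product_id), 1)]


def reduce_pairs(pairs):
    """Reduce step: sum the values that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run(lines):
    """Chain the map and reduce steps over all lines."""
    pairs = [pair for line in lines for pair in map_line(line)]
    return reduce_pairs(pairs)
```

Calling `run(LOG_LINES)` yields one interaction count per (customer, product) key, which is the shape of input a clustering or collaborative filtering job would typically consume.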
Demand Analysis and Forecasting
Problem: Demand Analysis and Forecasting for existing product line
Type of Data | Source of Data
Product Information | Product Catalogue
Customer Information | Customer Data (demographic)
Product Purchase Information, inventory lifetime, wish list, product sales volume | Transaction Data (RDBMS and HDFS)
Social Network Activity (analytics data about products, likes, usage, shares, no. of participants, etc.) | Social Networks (Facebook, Twitter, Google+, Instagram)
Is it a Big Data Problem: Demand Analysis and Forecasting for existing product line
Characteristic | Applies | Remarks
Volume | Yes | As per the capacity planning, the total data generated over 7 years is 5932 TB.
Velocity | Yes | Speed of data generation and analysis for transactional and social network activities; speed of capturing product inventory lifetime, point of sale (POS), and sales volume data, and of structured/unstructured data generation and analysis
Variety | Yes | Use of incompatible and non-integrated data from heterogeneous sources such as customer purchase data, activity logs, social network
ML Approach: Demand Analysis and Forecasting for existing product line
Machine Learning Problem | Reasoning
Supervised | Our target is to determine the future demand for merchandise.
Prediction | We are predicting the future demand for merchandise.
Regression | Based on the observed data set, we try to predict future demand by establishing a correlation between the data set and the outcome. [Linear Regression Tree]
Time Series | We try to establish a pattern of merchandise demand over continuous time intervals, based on the correlation between demand and the observed data set. [ARIMA parametric time series modeling]
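As an illustration of the Regression row, the sketch below fits an ordinary least-squares line relating one observed variable to demand. It is a single-variable simplification, not a regression tree and not ARIMA. The choice of "discount percent" as the explanatory variable and the sample figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new x value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical observations: discount offered (%) and units sold that week.
DISCOUNT_PERCENT = [0, 5, 10, 15, 20]
UNITS_SOLD = [102, 118, 131, 149, 160]

MODEL = fit_line(DISCOUNT_PERCENT, UNITS_SOLD)
EXPECTED_UNITS_AT_25 = predict(MODEL, 25)
```

A positive fitted slope here indicates that demand rose with the discount in the sample. A production model would use many more features and a held-out set to check the fit.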
Big Data Components: Demand Analysis and Forecasting for existing product line
Component | Remarks
Hadoop Distributed File System (HDFS) - Primary | Used to store structured and semi-structured data (purchase history, product inventory lifetime, wish list, user information, social analytics data, etc.) in raw format
Sqoop | Brings transactional data, product inventory lifetime, POS, wish list, etc. into HDFS and vice versa
Flume | Collects, aggregates, and moves large amounts of product activity log and purchase log information
Chukwa | Moves the log data generated on the primary HDFS to another HDFS for analysis
Pig/Hive | Help to write MapReduce scripts that get data into a key-value structured format
Mahout/R-Hadoop | Time series data consists of four components: trend, season, cycle, and noise. We need to estimate the trend and seasonal components (e.g. day of week, or month in a year) for any specific region or location from the data and use these to forecast the future. ML packages allow for forecasting that is quick and effective in collaboration.
Zookeeper | Coordinates common services such as naming, configuration management, and synchronization of data and services among namenodes and datanodes in Hadoop
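The Mahout/R-Hadoop row describes estimating a trend and a seasonal component and using both to forecast. A rough sketch of that idea follows, assuming an additive model with a straight-line trend and one average offset per position in the season (for example, per day of week). It ignores the cycle and noise components, is not ARIMA, and all function names are made up for this example.

```python
def linear_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    stt = sum((t - mean_t) ** 2 for t in range(n))
    stv = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    slope = stv / stt
    return slope, mean_v - slope * mean_t


def seasonal_offsets(values, slope, intercept, period):
    """Average detrended value for each position within the season."""
    sums = [0.0] * period
    counts = [0] * period
    for t, v in enumerate(values):
        sums[t % period] += v - (slope * t + intercept)
        counts[t % period] += 1
    return [s / c for s, c in zip(sums, counts)]


def forecast(values, period, steps):
    """Extend the trend and re-apply the seasonal offsets for `steps` points."""
    if len(values) < max(period, 2):
        raise ValueError("need at least one full season of observations")
    slope, intercept = linear_trend(values)
    offsets = seasonal_offsets(values, slope, intercept, period)
    start = len(values)
    return [slope * t + intercept + offsets[t % period]
            for t in range(start, start + steps)]
```

For daily sales with a weekly pattern, `forecast(daily_units, period=7, steps=7)` would project the next week. Because the trend is fitted before the seasonal offsets are removed, a strong seasonal pattern can bias the slope, so this is only a starting point.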
Customer Churn
Problem: Customer Churn
Type of Data | Source of Data
Customer Purchase History | Transaction Database
Customer complaints (rating, sentiment score, etc.) | Complaint data (NoSQL, e.g. MongoDB)
User Activity (page navigation, Product Catalogue visit) | Heat map, click data, navigation data, demographic data (heat map tools: Crazy Egg, Google Analytics, etc.); Real Time Bidding (RTB) ad inventory data (e.g. Deltax)
User Activity (e.g. Wish List, Abandoned Cart) | User Activity Logs
Comparative Product Analysis (reviews, price, product description, etc.) | Third-party vendor data, e.g. Compareraja.in, compare.buy.hatke.com
Customer Sentiment Score | Aggregated data from different Social Networks (Facebook, Twitter, Google+, Instagram)
Customer Loyalty | Transaction Database, User Activity Logs
Is it a Big Data Problem: Customer Churn?
Characteristic | Applies | Remarks
Volume | Yes | As per the capacity planning, the total data generated over 7 years is 5932 TB.
Velocity | Yes | Speed of data generation and analysis for transactional, sentiment, and social network activities
Variety | Yes | Use of incompatible and non-integrated data from heterogeneous sources such as customer purchase data, activity logs, social network
ML Approach: Customer Churn
Machine Learning Model | Reasoning
Supervised | Our target is to determine whether a customer will churn or not.
Classification | The problem asks whether a customer will churn or not, so it asks for a categorical outcome.
Binary | The outcome has two classes: the customer will churn or will not. [Decision Tree]
Unbiased | The initial probability of customer churn is taken to be equally positive and negative; hence an unbiased model is used. [C5.0]
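To show what a decision tree does with a binary churn label, the sketch below trains a one-level tree (a decision stump) by choosing the feature with the highest information gain. This is the basic entropy-based split idea behind decision tree learners, not C5.0 itself. The feature names and the six labeled rows are hypothetical, and the classes are balanced to mirror the "equally positive and negative" assumption in the table.

```python
import math
from collections import Counter

# Hypothetical labeled customers (three churned, three did not).
ROWS = [
    {"complaints": "high", "recent_purchase": "no", "churned": True},
    {"complaints": "high", "recent_purchase": "yes", "churned": True},
    {"complaints": "low", "recent_purchase": "no", "churned": False},
    {"complaints": "low", "recent_purchase": "yes", "churned": False},
    {"complaints": "high", "recent_purchase": "no", "churned": True},
    {"complaints": "low", "recent_purchase": "no", "churned": False},
]


def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(rows, feature, target="churned"):
    """Entropy reduction obtained by splitting `rows` on `feature`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [row[target] for row in rows if row[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def train_stump(rows, features, target="churned"):
    """Pick the best feature and map each of its values to the majority label."""
    best = max(features, key=lambda f: information_gain(rows, f, target))
    rule = {}
    for value in {row[best] for row in rows}:
        labels = [row[target] for row in rows if row[best] == value]
        rule[value] = Counter(labels).most_common(1)[0][0]
    return best, rule


def predict(stump, row):
    """Classify one customer; raises KeyError for a value unseen in training."""
    feature, rule = stump
    return rule[row[feature]]


STUMP = train_stump(ROWS, ["complaints", "recent_purchase"])
```

On this toy data the stump splits on `complaints`, since that feature separates churners from non-churners perfectly while `recent_purchase` carries no information. A full tree would repeat the split recursively on each branch.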
Big Data Components: Customer Churn
Component | Remarks
Hadoop Distributed File System (HDFS) - Primary | Used to store structured and semi-structured data (purchase history, activity log, competitive analysis data, aggregated social data, RTB data, etc.) in raw format
Sqoop | Brings transactional data, real-time wish list, and cart information into HDFS and vice versa
Flume | Collects, aggregates, and moves large amounts of product activity log and purchase log information
Chukwa | Moves the log data generated on the primary HDFS to another HDFS for analysis
Pig/Hive | Help to write MapReduce scripts that get data into a key-value structured format
Mahout/R-Hadoop | To predict customer churn, we can use the Decision Tree / C5.0 algorithm
NLP Toolkit (nltk.org)/IBM Watson | Can be used to parse customer feedback and comments about products to derive sentiment scores/insight analysis data, and then feed the output to Hadoop
Zookeeper | Coordinates common services such as naming, configuration management, and synchronization of data and services among namenodes and datanodes in Hadoop
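The sentiment scoring mentioned in the NLP row can be illustrated with a tiny lexicon-based scorer. This is a plain-Python stand-in and uses neither NLTK nor IBM Watson. The word list and its weights are hypothetical and far smaller than anything usable in practice.

```python
import re

# Hypothetical sentiment lexicon: +1 for positive words, -1 for negative ones.
LEXICON = {
    "great": 1, "good": 1, "love": 1,
    "bad": -1, "poor": -1, "terrible": -1, "late": -1,
}


def sentiment_score(text):
    """Mean lexicon weight of the matched words; 0.0 when nothing matches."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [LEXICON[word] for word in words if word in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def score_feedback(comments):
    """Attach a score to each comment, ready to be written out as key-value rows."""
    return [(comment, sentiment_score(comment)) for comment in comments]
```

Scores range from -1.0 (all matched words negative) to 1.0 (all positive). The resulting (comment, score) pairs are the kind of output that could then be loaded into HDFS alongside the complaint data.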
[Architecture diagram. Data sources: Product & Service Offerings, Customer Profile, Customer Feedback/Social Media, Account Transactions, Customer Service Logs & Surveys, Marketing Campaigns. Components: Hadoop cluster (HDFS), Big Data Infrastructure, NLP Data Processing, Analytics Systems, Visualization.]
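Assumptions
Type | Assumption | Value (M = millions, MB = megabytes) | Reference
Baseline Assumptions | No. of Online Customers | 100 M | http://goo.gl/hHb66n
Baseline Assumptions | Our Market Share | 25 M | Assume 25% share
Baseline Assumptions | No. of Products | 12 M |
Problem Space Assumptions | Customer Growth Rate | 40% | http://goo.gl/pm9ydJ
Problem Space Assumptions | Product Growth Rate | 15% | Avg
Problem Space Assumptions | Avg Monthly Transactions | 9 M | http://tinyurl.com/gw9dm43
Problem Space Assumptions | Avg Monthly Complaints | 0.12 M | Assume 0.01%
Data/Infrastructure | Avg Customer Info Size | 1 MB |
Data/Infrastructure | Avg Complaint Info Size | 0.5 MB |
Data/Infrastructure | Avg Data Node RAM Size | 8 GB |
Data/Infrastructure | Replica Factor | 3 |
Data/Infrastructure | Data Block Size | 128 MB |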
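The assumptions above feed a capacity calculation whose worksheet is not reproduced in the slides. As a partial illustration only, the sketch below projects the customer base and the replicated storage needed for customer records alone. It assumes the 40% growth rate compounds yearly over the 7-year horizon mentioned in the Volume remarks. It does not attempt to reproduce the 5932 TB total or the data node counts, which depend on inputs not shown here.

```python
# Values taken from the Assumptions table; the compounding rule is an assumption.
INITIAL_CUSTOMERS = 25_000_000   # our market share (25% of 100 M)
GROWTH_PERCENT = 40              # customer growth rate, assumed to be per year
CUSTOMER_INFO_MB = 1             # average customer info size
REPLICA_FACTOR = 3               # HDFS replica factor
YEARS = 7                        # planning horizon used in the slides


def customers_after(years):
    """Compound growth with integer arithmetic to avoid float rounding."""
    return INITIAL_CUSTOMERS * (100 + GROWTH_PERCENT) ** years // 100 ** years


def customer_storage_mb(years):
    """Replicated storage for customer records only, in MB."""
    return customers_after(years) * CUSTOMER_INFO_MB * REPLICA_FACTOR


def to_terabytes(size_mb):
    """Convert MB to TB using decimal units (1 TB = 1,000,000 MB)."""
    return size_mb / 1_000_000


CUSTOMERS_YEAR_7 = customers_after(YEARS)
CUSTOMER_STORAGE_TB = to_terabytes(customer_storage_mb(YEARS))
```

Under these assumptions the customer base grows from 25 M to roughly 263.5 M in seven years, and the customer records alone need about 790 TB once replicated three times. Transactions, complaints, activity logs, and social data would be added on top in a full plan.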
Capacity Planning
Problem | No. of Data Nodes | RAM Capacity
Product Recommendation | 23713 | 2145 GB
Demand Forecasting | 23713 | 2145 GB
Customer Churn | 23724 | 2147 GB
Detailed Planning: Microsoft Excel Worksheet

Online retail a look at data consulting approach
