CAPSTONE PRESENTATION ON
“PURCHASE PREDICTION ON
BLACK FRIDAY”
Submitted towards partial fulfilment of the criteria
for award of PGP-DSE by GLIM
Submitted By
Group No. 8 [Batch: 2018-19]
Group Members
Arjun Thumbayil – DSEFTCJUL18006
Sahil Bansal - DSEFTCJUL18014
Shahrukh Buland Iqbal – DSEFTCJUL18042
Research Supervisor
P V Subramanian
Contents
Introduction
• Background
• Objective
• Motivation
Dataset
• Collection
• Description
• Pre-processing
• Exploratory Data Analysis
• Statistical Analysis
Feature Engineering
• Data Conversion
• Discretization
• Polychotomization
• Response/Target Transformation
• Feature Creation
Modeling
• Model Selection
• Model Development
• Model Evaluation
• Model Optimization
• Model in Production
Statistical Learning
• Residual Analysis
Results
Future Scope
• Model Deployment
Background
• The day after Thanksgiving in the U.S. is called Black Friday (BF) and serves as the traditional start to the holiday shopping season.
• Known for deep discounts (e.g., doorbusters), Black Friday shopping brings a sense of adventure, competition and urgency around getting great deals.
• Although Cyber Monday is gaining popularity, Black Friday shopping continues to be popular because of an abundance of doorbuster deals, instant gratification, and the benefit of social shopping.
Objective
• Predicting Purchase
• Build a simple Machine Learning model that can predict how much a Customer is likely to spend on the eve of Black Friday.
• Pattern Recognition
• Reveal and understand the most important factors, from predictors such as Age, Gender, City of Residence etc., that influence the spending of a Customer.
• Establish the quantitative impact of the revealed factors and how they influence Purchase by a Customer on a personal level, i.e., whether they make a positive or negative contribution to the Purchase.
Motivation
• Black Friday sales in the US still account for a whopping $6 billion in revenue.[1]
• In order to compete with Online Shopping Platforms, Brick-and-Mortar Retailers need to figure out how to boost Sales during the most important Shopping Day of the Year.
• By understanding the Purchase Patterns of their Customers, Retailers can provide improved Service Quality.
• Improve Staffing and Inventory of the Retail Store.
• Increase Revenue and Sales.
[1] https://www.forbes.com/sites/andriacheng/2018/11/26/black-friday-cyber-monday-sales-are-hitting-another-high-but-its-not-time-to-cheer-yet/#6d2ac36256c6
Tools
Dataset
• Collection:
• The data comes from a competition hosted by Analytics Vidhya[2].
• Description:
• The Dataset comprises 550000 observations about Black Friday in a retail store.
• It contains variables that are either Numeric or Categorical in nature. The dataset contains 2 columns with missing values:
• 166986 observations missing in column ‘Product_Category_2’.
• 373299 observations missing in column ‘Product_Category_3’.
[2] https://www.kaggle.com/mehdidag/black-friday/home
Description
Name                    Data Type
User ID                 Integer (Discrete)
Product ID              Categorical (Discrete)
Gender                  Categorical (Nominal)
Age                     Categorical (Ordinal)
Occupation              Categorical (Nominal) [Masked]
City_Category           Categorical (Nominal)
Stay_In_Current_City    Categorical (Ordinal)
Marital_Status          Categorical (Nominal)
Product_Category_1      Categorical (Nominal) [Masked]
Product_Category_2      Categorical (Nominal) [Masked]
Product_Category_3      Categorical (Nominal) [Masked]
Purchase                Integer (Continuous)
Pre-Processing
• Most of the raw data contained in any given Dataset is usually unprocessed, incomplete, and noisy.
• In order to be useful for data mining purposes, the Dataset needs to undergo pre-processing, in the form of ‘Data Cleaning’ and ‘Data Transformation’.
• Handling Missing Values[3].
• Handling Outliers.
[3] Galit Shmueli, Nitin Patel, and Peter Bruce, Data Mining for Business Intelligence, 2nd edition, John Wiley and Sons, 2010
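The two cleaning steps listed above could be sketched as follows. This is a minimal illustration only: the tiny DataFrame is made up, and both the fill value of 0 and the 1.5 × IQR outlier rule are assumptions, since the slides do not state which techniques were actually applied.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Black Friday data (values are made up for illustration).
df = pd.DataFrame({
    "Product_Category_2": [2.0, np.nan, 8.0, np.nan, 14.0, 6.0],
    "Product_Category_3": [np.nan, np.nan, 16.0, np.nan, 5.0, np.nan],
    "Purchase": [8370, 15200, 1422, 1057, 7969, 95000],
})

# Handling missing values, part 1: count them per column.
missing_counts = df.isna().sum()

# Part 2 (assumption): treat a missing category as its own level, coded 0.
filled = df.fillna({"Product_Category_2": 0, "Product_Category_3": 0})

# Handling outliers (assumption): flag Purchase values outside 1.5 * IQR.
q1, q3 = filled["Purchase"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (filled["Purchase"] < q1 - 1.5 * iqr) | (filled["Purchase"] > q3 + 1.5 * iqr)

print(missing_counts)
print(filled[is_outlier])
```

Whether flagged rows are dropped, capped, or kept is a modeling decision that the sketch deliberately leaves open.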
Exploring Categorical Variables
• Male shoppers are more frequent than Female shoppers.
Exploring Categorical Variables
• Age bracket 18-45 shops the most.
Exploring Categorical Variables
• Top 5 Customers by Purchase: 4, 0, 7, 1, 17
• Lowest 5 Customers by Purchase: 19, 13, 18, 9, 8
Exploring Categorical Variables
• Un-Married People are more frequent shoppers.
Exploring Categorical Variables
• Top 5 Product Categories account for 82% of the items sold.
• Products belonging to categories 5, 1 and 8 are most likely to be sold on
Exploring Multivariate Relationships
Statistical Analysis
• Univariate Statistical Analysis
• Multivariate Statistics
• Chi-square Test of Independence
• One-Way ANOVA
Univariate Statistical Analysis
Parameter             Purchase (in US $)
Mean (µ)              9333.86
Standard Deviation    4981.02
Median                8062
Minimum               185
Maximum               23961
Multivariate Statistics: Chi Square Test of Independence
                     AGE   CITY CATEGORY   GENDER   MARITAL STATUS   OCCUPATION   PRODUCT CATEGORY-1   STAY
AGE
CITY CATEGORY        YES
GENDER               YES   YES
MARITAL STATUS       YES   YES             YES
OCCUPATION           YES   YES             YES      YES
PRODUCT CATEGORY-1   YES   YES             YES      YES              YES
STAY                 YES   YES             YES      YES              YES          YES
• A chi-square analysis was performed on each pair of categorical variables to determine whether the categories of one variable were represented across the groups of the other in proportion to their numbers in the sample. The analysis produced a significant χ2 value for every pair (marked YES), indicating that the variables are not independent: some groups are over- or under-represented in some categories.
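A single cell of the matrix above corresponds to one test of independence on a two-way contingency table. The sketch below uses SciPy's `chi2_contingency`; the counts are invented for illustration and the Gender by City_Category pairing is just one example, not the authors' actual table.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table (made-up counts):
# rows = Gender (F, M), columns = City_Category (A, B, C).
observed = np.array([
    [300, 450, 250],
    [900, 1100, 1000],
])

chi2, p_value, dof, expected = chi2_contingency(observed)

# Degrees of freedom = (rows - 1) * (columns - 1).
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4g}")
print("dependent" if p_value < 0.05 else "no evidence of dependence")
```

With a sample as large as this dataset, even weak associations tend to come out significant, which is consistent with every pair in the matrix being marked YES.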
Multivariate Statistics: One Way ANOVA
• GENDER
• We performed a one-way ANOVA to compare the average Purchase of the two groups on the eve of Black Friday. This analysis produced a statistically significant result (F(1, 9998) = 47.34, p < .05).
• A post hoc Tukey test confirmed a significant difference between Male (µ = 9504.77) and Female (µ = 8809.76), with Male shoppers spending significantly more than Female shoppers.
• CITY CATEGORY
• We performed a one-way ANOVA to compare the average Purchase of the three groups on the eve of Black Friday. This analysis produced a statistically significant result (F(2, 9997) = 37.26, p < .05).
• A post hoc Tukey test revealed significant differences between City A (µ = 8958.01), City B (µ = 9198.65), and City C (µ = 9844.44), with City C purchasing significantly more than City A and City B.
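The City Category comparison can be illustrated with SciPy's `f_oneway`. The data below is simulated: the group means are borrowed from the slide only to seed the simulation, while the group sizes (chosen to total 10,000, matching the reported degrees of freedom) and the common standard deviation are assumptions.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

# Simulated Purchase samples for three city groups (sizes and spread are assumed).
city_a = rng.normal(loc=8958, scale=4900, size=3000)
city_b = rng.normal(loc=9199, scale=4900, size=4000)
city_c = rng.normal(loc=9844, scale=4900, size=3000)

f_stat, p_value = f_oneway(city_a, city_b, city_c)

# Degrees of freedom: between groups = k - 1, within groups = n - k.
k = 3
n = len(city_a) + len(city_b) + len(city_c)
df_between, df_within = k - 1, n - k

print(f"F({df_between}, {df_within}) = {f_stat:.2f}, p = {p_value:.3g}")
```

A significant F only says that at least one group mean differs; a post hoc procedure such as Tukey's test is what identifies which pairs differ.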
Feature Engineering
Variable                  Conversion Type
‘User_ID’                 Used as Raw Feature.
‘Product_ID’              Used as Raw Feature.
‘Gender’                  Converted to Binary.
‘Age’                     Converted to Numeric.
‘Marital_Status’          Converted to Binary.
‘Occupation’              Used as Raw Feature.
‘City_Category’           One-Hot Encoded.
‘Stay_In_Current_City’    Converted to Numeric.
‘Product_Category_1’      Used as Raw Feature.
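The conversions in the table might look like the following in pandas. This is a sketch under stated assumptions: the sample rows are invented, the binary coding (M = 1, F = 0) and the representative age chosen for each bracket are hypothetical, and the stay column is named `Stay_In_Current_City_Years` as it appears in the regression output later in the deck.

```python
import pandas as pd

# Invented sample rows using the dataset's column names.
df = pd.DataFrame({
    "Gender": ["M", "F", "M"],
    "Age": ["0-17", "26-35", "55+"],
    "City_Category": ["A", "C", "B"],
    "Stay_In_Current_City_Years": ["2", "4+", "0"],
})

# Converted to binary (assumed coding: F -> 0, M -> 1).
df["Gender"] = df["Gender"].map({"F": 0, "M": 1})

# Converted to numeric (assumed representative age per bracket, keeps the ordering).
age_map = {"0-17": 15, "18-25": 21, "26-35": 30, "36-45": 40,
           "46-50": 48, "51-55": 53, "55+": 60}
df["Age"] = df["Age"].map(age_map)

# Converted to numeric: strip the trailing "+" from "4+" and cast to int.
df["Stay_In_Current_City_Years"] = (
    df["Stay_In_Current_City_Years"].str.rstrip("+").astype(int)
)

# One-hot encoded: one indicator column per city category.
df = pd.get_dummies(df, columns=["City_Category"], dtype=int)

print(df)
```

Mapping the ordinal brackets to numbers preserves their order, which a one-hot encoding would discard.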
Feature Engineering: Incorporating Ordinality
Feature Engineering
• Discretization
• Polychotomization
• Response/Target Transformation
• Feature Creation:
• Based on Average Feature Purchase
• Based on Feature Frequency
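The two created-feature types listed above can be sketched with pandas. `Product_ID_Counts` is the frequency feature name that appears in the regression output later in the deck; `Product_ID_MeanPurchase` is a hypothetical name for the average-purchase feature, and the rows are invented.

```python
import pandas as pd

# Invented transactions.
df = pd.DataFrame({
    "Product_ID": ["P1", "P1", "P2", "P1", "P3", "P2"],
    "Purchase": [8000, 9000, 15000, 10000, 2000, 13000],
})

# Feature frequency: how often each Product_ID occurs in the data.
df["Product_ID_Counts"] = df["Product_ID"].map(df["Product_ID"].value_counts())

# Average feature purchase: mean Purchase per Product_ID (hypothetical column name).
df["Product_ID_MeanPurchase"] = df.groupby("Product_ID")["Purchase"].transform("mean")

print(df)
```

Because the average-purchase feature is built from the target itself, it is usually computed on the training split only and then mapped onto the validation rows, so that the target does not leak into the predictors.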
Model Selection: Multiple Linear Regression
• Model selection criteria:
• Simple
• Retains explainability
• Easy to understand and Implement
• Model that helps in answering important business-related questions such as:
• Is there a relationship between Purchase on Black Friday by a Customer and the Predictor variables?
• How strong is the relationship?
• Which Predictor contributes to the Purchase on the eve of Black Friday?
• How large is the effect of each predictor on Purchase?
• How accurately can we predict the Purchase?
• Is the relationship linear?
Model Development
• Step 1: Data Transformation
• Step 2: Data division using ‘Validation Set Approach’[4]
• Step 3: Model Development
[4] G. James et al., An Introduction to Statistical Learning: with Applications in R, Springer Texts in Statistics, © Springer Science+Business Media New York 2013
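Steps 2 and 3 can be sketched with NumPy alone. The data is synthetic and the 70/30 split ratio is an assumption; the slides name the validation set approach but do not state the proportions used.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in: 1,000 rows, 3 numeric features, linear response plus noise.
n = 1000
X = rng.normal(size=(n, 3))
y = 9000 + X @ np.array([500.0, -300.0, 100.0]) + rng.normal(scale=200.0, size=n)

# Step 2: validation set approach, i.e. one random split (70/30 is assumed).
idx = rng.permutation(n)
n_train = n * 7 // 10
train_idx, val_idx = idx[:n_train], idx[n_train:]

# Step 3: fit ordinary least squares on the training part only.
X_train = np.column_stack([np.ones(n_train), X[train_idx]])
X_val = np.column_stack([np.ones(n - n_train), X[val_idx]])
beta, *_ = np.linalg.lstsq(X_train, y[train_idx], rcond=None)

# Evaluate on rows the model has never seen.
val_pred = X_val @ beta
val_rmse = np.sqrt(np.mean((y[val_idx] - val_pred) ** 2))

print("coefficients:", np.round(beta, 1))
print("validation RMSE:", round(float(val_rmse), 1))
```

A single split is cheap on a dataset of this size, though the resulting error estimate depends on which rows happen to land in the validation set.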
Model Evaluation
• Metrics used:
• RMSE
• R2
• Adjusted R2
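For reference, the three metrics can be written out directly. The function names and the four-point example are illustrative only.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the response."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Share of the response variance explained by the predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def adjusted_r_squared(r2, n_obs, n_predictors):
    """R2 penalised for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

y_true = [3, 5, 7, 9]
y_pred = [2.5, 5.5, 6.5, 9.5]

r2 = r_squared(y_true, y_pred)
print(rmse(y_true, y_pred), r2, adjusted_r_squared(r2, n_obs=4, n_predictors=1))
```

With several hundred thousand observations and a few dozen predictors, the penalty term is close to 1, so R2 and Adjusted R2 are expected to be nearly identical.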
Model Evaluation
Feature Engineering Techniques
DC     Data Conversion
DB     Data Binning
AFP    Average Feature Purchase
FF     Feature Frequency

Regression Models             Training Set                       Validation Set
                              RMSE       R2      Adjusted R2     RMSE       R2      Adjusted R2
Baseline Model                4707.53    0.11    0.11            4715.49    0.11    0.11
Model 1 (DB)                  3888.17    0.39    0.39            3895.55    0.39    0.39
Model 2 (AFP + FF)            4979.67    0       0               4984.44    0       0
Model 3 (DC + FF)             2903.5     0.66    0.66            2906.65    0.66    0.66
Model 4 (DC + AFP)            4979.71    0       0               4984.36    0       0
Ridge Regression (Model 3)    2903.84    0.66    0.66            2906.96    0.66    0.66
LASSO Regression (Model 3)    2928.48    0.65    0.65            2930.12    0.66    0.66
LASSO Regression
• Performs variable selection by forcing some of the coefficient estimates to be exactly zero.
• Yields a simpler and more interpretable model than Ridge.
• Handles Multicollinearity.
• Model 3 initially contained 52 variables.
• After LASSO regularization, 18 variables were left.
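How the L1 penalty drives coefficients to exactly zero can be seen in a small coordinate-descent sketch. This is not the authors' implementation (in practice a library routine would normally be used); the data is synthetic, the penalty strength `alpha=0.5` is arbitrary, and the model has no intercept for brevity.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; return 0 if it is within the threshold."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1 / 2n) * ||y - X b||^2 + alpha * ||b||_1 (no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove every feature's contribution except feature j.
            partial = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ partial / n
            b[j] = soft_threshold(rho, alpha) / col_sq[j]
    return b

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
# Only the first two features truly matter; the other three are noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=500)
y = y - y.mean()  # centre the response since no intercept is fitted

coef = lasso_coordinate_descent(X, y, alpha=0.5)
n_kept = int(np.count_nonzero(coef))

print("coefficients:", np.round(coef, 3))
print("variables kept:", n_kept)
```

The soft-threshold step is what distinguishes LASSO from Ridge: any coefficient whose correlation with the partial residual falls below `alpha` is set to exactly zero rather than merely shrunk.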
Statistical Learning
OLS Regression Results
Dep. Variable:       Purchase            R-squared:             0.653
Model:               OLS                 Adj. R-squared:        0.653
Method:              Least Squares       F-statistic:           3.935e+04
Date:                Sun, 06 Jan 2019    Prob (F-statistic):    0.00
Time:                17:40:07            Log-Likelihood:        -3.5381e+06
No. Observations:    376303              AIC:                   7.076e+06
Df Residuals:        376284              BIC:                   7.076e+06
Df Model:            18
Covariance Type:     nonrobust

Variable                      coef          std err    t           P>|t|    [0.025       0.975]
Const                         1.134e+04     20.082     564.769     0.000    1.13e+04     1.14e+04
Product_Category_1_10         6534.2674     50.434     129.561     0.000    6435.419     6633.116
Product_Category_1_7          4267.2267     58.652     72.755      0.000    4152.270     4382.183
Product_Category_1_6          2659.5272     26.192     101.541     0.000    2608.192     2710.862
Product_Category_1_16         2026.6088     36.722     55.187      0.000    1954.634     2098.583
Product_Category_1_15         2123.6187     45.426     46.749      0.000    2034.586     2212.651
City_Category_C               283.9126      10.471     27.114      0.000    263.389      304.436
Age                           10.0330       0.359      27.939      0.000    9.329        10.737
Product_ID_Counts             2.5978        0.014      185.461     0.000    2.570        2.625
Stay_In_Current_City_Years    7.8901        3.708      2.128       0.033    0.622        15.158
Occupation_1                  -162.6174     17.166     -9.473      0.000    -196.262     -128.973
Product_Category_1_3          -2811.2377    26.454     -106.270    0.000    -2863.086    -2759.389
Product_Category_1_8          -5218.7197    13.907     -375.253    0.000    -5245.977    -5191.462
Product_Category_1_18         -9453.6809    64.223     -147.202    0.000    -9579.555    -9327.806
Product_Category_1_11         -7742.6858    24.644     -314.179    0.000    -7790.988    -7694.384
Product_Category_1_5          -6633.2756    12.698     -522.406    0.000    -6658.162    -6608.389
Product_Category_1_12         -1.122e+04    56.755     -197.758    0.000    -1.13e+04    -1.11e+04
Product_Category_1_4          -1.045e+04    33.805     -309.155    0.000    -1.05e+04    -1.04e+04
Product_Category_1_13         -1.191e+04    48.513     -245.426    0.000    -1.2e+04     -1.18e+04
Residual Analysis
• Normality of the Residuals
Residual Analysis
• Non-Linearity of the Response-Predictor Relationship:
• No visible pattern in the residuals.
Residual Analysis
• Heteroskedasticity:
• Funnel shape is evident.
• Response log-transformed in order to achieve Homoskedasticity.
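The effect of the log transform on a funnel-shaped residual plot can be illustrated on synthetic data. Everything below is an assumption made for the demonstration: the response is generated with multiplicative noise so that its spread grows with its mean, and the "spread ratio" (residual standard deviation for high versus low fitted values) is an ad hoc stand-in for inspecting the plot.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(1, 10, size=n)
# Multiplicative noise: the spread of y grows with its mean (funnel shape).
y = (2000 + 1000 * x) * np.exp(rng.normal(scale=0.2, size=n))

def residual_spread_ratio(x, response):
    """Fit a straight line, then compare residual std for high vs. low fitted values."""
    slope, intercept = np.polyfit(x, response, deg=1)
    fitted = slope * x + intercept
    residuals = response - fitted
    order = np.argsort(fitted)
    half = len(x) // 2
    low, high = residuals[order[:half]], residuals[order[half:]]
    return float(high.std() / low.std())

raw_ratio = residual_spread_ratio(x, y)          # well above 1: heteroskedastic
log_ratio = residual_spread_ratio(x, np.log(y))  # close to 1: roughly constant spread

print(f"raw response: {raw_ratio:.2f}, log response: {log_ratio:.2f}")
```

After fitting on the log scale, predictions have to be exponentiated to get back to the original Purchase units.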
Results
• Based on Descriptive Analytics
• Based on Behavioural Analytics
• Based on Predictive Analytics
• Based on Prescriptive Analytics
Results
• Based on Descriptive Analytics:
• Male Shoppers are likely to buy more Products than Female Shoppers.
• Older (40+) people are likely to spend more irrespective of their marital status.
• Customers who arrived recently in City-B and City-C are likely to shop less frequently than those who have stayed longer (acclimatization can be an issue).
Results
• Based on Behavioural Analytics:
• Keeping Products that are more likely to sell at the front of the store will lead to an increase in Sales.[6]
• Products ‘1’, ‘5’ and ‘8’ of Product_Category_1 are the highest selling Products, so they should be kept at the front of the Store.
[6] Fließ, Sabine & Hogreve, Jens & Nonnenmacher, Dirk. (2004). Emotional Effects of Shop Window Displays on Consumer Behavior.
Results
• Based on Predictive Analytics:
• Purchase is heavily influenced by Product Category.
• People aged 60+ will spend as much as $600 more than Teenagers.
• People belonging to Occupation-1 are likely to spend less.
• Product Categories that have an average price over $9000 are likely to influence Purchase positively, and vice versa.
• City C Customers will spend $283 more than Customers from other cities.
Results
• Based on Prescriptive Analytics:
• If the Price of ‘Product-5’ is increased by 5%, ‘Product-1’ by 3% and ‘Product-8’ by 4%, then the Revenue will increase by $150 million, which is higher than the combined Revenue of the eight lowest selling Products.
Future Scope: Model Deployment
  • 1. CAPSTONE PRESENTATION ON “PURCHASE PREDICTION ON BLACK FRIDAY” Submitted towards partial fulfilment of the criteria for award of PGP-DSE by GLIM Submitted By Group No. 8 [Batch: 2018-19] Group Members Arjun Thumbayil – DSEFTCJUL18006 Sahil Bansal - DSEFTCJUL18014 Shahrukh Buland Iqbal – DSEFTCJUL18042 Research Supervisor P V Subramanian
  • 2. Contents Introduction • Background • Objective • Motivation Dataset • Collection • Description • Pre-procession • Exploratory Data Analysis • Statistical Analysis Feature Engineering • Data Conversion • Discretization • Polychotomization • Response/Target Transformation • Feature Creation Modeling • Model Selection • Model Development • Model Evaluation • Model Optimization • Model in Production Statistical Learning • Residual Analysis Results Future Scope • Model Deployment
  • 3. Background • The day after Thanksgiving in the U.S. is called Black Friday (BF) and serves as the traditional start to the holiday shopping season. • It is known for deep discounts (e.g., doorbusters), Black Friday shopping manifests adventure, competition and urgency around getting great deals.
  • 4.
  • 5. Background • The day after Thanksgiving in the U.S. is called Black Friday (BF) and serves as the traditional start to the holiday shopping season. • It is known for deep discounts (e.g., doorbusters), Black Friday shopping manifests adventure, competition and urgency around getting great deals. • Although Cyber Monday is gaining popularity, Black Friday shopping continues to be popular because of an abundance of doorbuster deals, instant gratification, and the benefit of social shopping.
  • 6. Objective • Predicting Purchase • Build a simple Machine Learning model that can predict how much a Customer is likely to spend on the eve of Black Friday. • Pattern Recognition • Reveal and Understand the most important factors from predictors such as Age, Gender, City of Residence etc., that influence the spending of a Customer. • Establish a quantitative impact of the revealed factors and how they influence Purchase by a Customer on a personal level i.e., whether they have a positive or negative contribution on the Purchase.
  • 7. • Black Friday sales in US still accounts for a whopping 6 Billion $ in revenue.[1] • In order to compete with Online Shopping Platforms, Brick and Mortar based Retailers need to figure out how to boost Sales during the most important Shopping Day of the Year. • By understanding the Purchase Patterns of the Customers Retailers can provide improved Service Quality. • Improve Staffing and Inventory of the Retail Store. • Increase Revenue and Sales. Motivation [1] https://www.forbes.com/sites/andriacheng/2018/11/26/black-friday-cyber-monday-sales-are-hitting-another- high-but-its-not-time-to-cheer-yet/#6d2ac36256c6
  • 9. Dataset • Collection: • The data comes from a competition hosted by Analytics Vidhya[2]. • Description: • The Dataset comprises of 550000 observations about the Black Friday in a retail store. • It contains various kinds of variables either Numeric or Categorical in nature. The dataset contains 2 columns with missing values: • 166986 observations missing in column ‘Product_Category_2’. • 373299 observations missing in column ‘Product_Category_3’. [2] https://www.kaggle.com/mehdidag/black-friday/home
  • 10. Description
      Name                    Data Type
      User ID                 Integer (Discrete)
      Product ID              Categorical (Discrete)
      Gender                  Categorical (Nominal)
      Age                     Categorical (Ordinal)
      Occupation              Categorical (Nominal) [Masked]
      City_Category           Categorical (Nominal)
      Stay_In_Current_City    Categorical (Ordinal)
      Marital_Status          Categorical (Nominal)
      Product_Category_1      Categorical (Nominal) [Masked]
      Product_Category_2      Categorical (Nominal) [Masked]
      Product_Category_3      Categorical (Nominal) [Masked]
      Purchase                Integer (Continuous)
  • 11. Pre-Processing • Most of the raw data contained in any given Dataset is usually unprocessed, incomplete, and noisy. • In order to be useful for data mining purposes, the Dataset needs to undergo pre-processing, in the form of ‘Data Cleaning’ and ‘Data Transformation’. • Handling Missing Values.[3] • Handling Outliers. [3] Galit Shmueli, Nitin Patel, and Peter Bruce, Data Mining for Business Intelligence, 2nd edition, John Wiley and Sons, 2010
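The slide names the two cleaning steps but not the techniques used. A minimal sketch of both, assuming missing product categories are filled with a placeholder code 0 and outliers in Purchase are flagged with the 1.5 × IQR rule (both choices are assumptions, not the authors' stated method; the sample rows are made up):

```python
import pandas as pd

# Tiny hypothetical sample that reuses the dataset's column names.
df = pd.DataFrame({
    "Product_Category_2": [2.0, None, 14.0, None],
    "Product_Category_3": [None, None, 5.0, None],
    "Purchase": [8370, 15200, 1422, 23961],
})

# Count missing values per column.
missing = df.isna().sum()

# Assumption: a missing category means "no category" and is encoded as 0.
filled = df.fillna({"Product_Category_2": 0, "Product_Category_3": 0})

# Assumption: flag outliers in Purchase with the 1.5 * IQR rule.
q1, q3 = filled["Purchase"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (filled["Purchase"] < q1 - 1.5 * iqr) | (filled["Purchase"] > q3 + 1.5 * iqr)
```

Whether flagged rows are dropped, capped, or kept is a separate modelling decision that the slides do not specify.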
  • 12. Exploring Categorical Variables • Male shoppers are more frequent than Female Shoppers.
  • 13. Exploring Categorical Variables • Age bracket 18-45 shops the most.
  • 14. Exploring Categorical Variables • Top 5 Customers by Purchase: 4, 0, 7, 1, 17 • Lowest 5 Customers by Purchase: 19, 13, 18, 9, 8
  • 15. Exploring Categorical Variables • Unmarried People are more frequent shoppers.
  • 16. Exploring Categorical Variables • Top 5 Product Categories account for 82% of the items sold. • Products belonging to categories 5, 1 and 8 are most likely to be sold on
  • 20. Statistical Analysis • Univariate Statistical Analysis • Multivariate Statistics • Chi-square Test of Independence • One-Way ANOVA
  • 21. Univariate Statistical Analysis
      Parameter             Purchase (in US $)
      Mean (µ)              9333.86
      Standard Deviation    4981.02
      Median                8062
      Minimum               185
      Maximum               23961
  • 22. Multivariate Statistics: Chi Square Test of Independence
                            AGE   CITY CATEGORY   GENDER   MARITAL STATUS   OCCUPATION   PRODUCT CATEGORY-1   STAY
      AGE
      CITY CATEGORY         YES
      GENDER                YES   YES
      MARITAL STATUS        YES   YES             YES
      OCCUPATION            YES   YES             YES      YES
      PRODUCT CATEGORY-1    YES   YES             YES      YES              YES
      STAY                  YES   YES             YES      YES              YES          YES
    • A chi-square test of independence was performed for each pair of categorical variables to determine whether each category was represented across the groups of the other variable in proportion to their numbers in the sample. Every test produced a significant χ² value (YES in the matrix), indicating that some groups were over- or under-represented, i.e., the paired variables are not independent.
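To make the test behind each cell of the matrix concrete, here is a minimal standard-library sketch of the chi-square statistic for one pair of categorical variables. The counts are hypothetical, not taken from the dataset, and no continuity correction is applied (library routines such as `scipy.stats.chi2_contingency` apply one to 2×2 tables by default, so their value would differ slightly).

```python
from collections import Counter

def chi_square_independence(pairs):
    """Return (chi2 statistic, degrees of freedom) for a list of (a, b) category pairs."""
    observed = Counter(pairs)
    row_totals = Counter(a for a, _ in pairs)
    col_totals = Counter(b for _, b in pairs)
    n = len(pairs)
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n  # count expected if the variables were independent
            chi2 += (observed[(a, b)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof

# Hypothetical counts for Gender vs Marital_Status.
pairs = (
    [("M", 0)] * 60 + [("M", 1)] * 40 +
    [("F", 0)] * 30 + [("F", 1)] * 70
)
chi2, dof = chi_square_independence(pairs)

# 3.841 is the 5% critical value of the chi-square distribution with 1 degree of freedom.
significant = dof == 1 and chi2 > 3.841
```

With samples as large as this dataset, even weak associations produce a significant statistic, so a matrix of all YES entries says little about the strength of each relationship.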
  • 23. Multivariate Statistics: One Way ANOVA • GENDER • We performed a one-way ANOVA to compare the two groups' average Purchase on the eve of Black Friday. This analysis produced a statistically significant result (F(1, 9998) = 47.34, p < .05). • A post hoc Tukey test confirmed a significant difference between Male (µ = 9504.77) and Female (µ = 8809.76), with Males spending significantly more on Purchase than Females. • CITY CATEGORY • We performed a one-way ANOVA to compare the three groups' average Purchase on the eve of Black Friday. This analysis produced a statistically significant result (F(2, 9997) = 37.26, p < .05). • A post hoc Tukey test revealed significant differences among City A (µ = 8958.01), City B (µ = 9198.65), and City C (µ = 9844.44), with City C purchasing significantly more than City A and City B.
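The F statistics on this slide follow the usual one-way ANOVA recipe: between-group variance divided by within-group variance, with k − 1 and N − k degrees of freedom for k groups and N observations. The reported F(1, 9998) and F(2, 9997) are therefore consistent with a sample of 10,000 rows. A minimal standard-library sketch, using made-up Purchase values for three city categories:

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples, one list per group."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical Purchase samples for City A, City B and City C.
groups = [
    [1000, 2000, 3000],
    [2000, 3000, 4000],
    [5000, 6000, 7000],
]
f_stat, df_between, df_within = one_way_anova(groups)
```

The p-value comes from comparing `f_stat` with the F distribution for the two degrees of freedom; the post hoc Tukey test is a further step not shown here.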
  • 24. Feature Engineering
      Variable                  Conversion Type
      ‘User_ID’                 Used as Raw Feature.
      ‘Product_ID’              Used as Raw Feature.
      ‘Gender’                  Converted to Binary.
      ‘Age’                     Converted to Numeric.
      ‘Marital_Status’          Converted to Binary.
      ‘Occupation’              Used as Raw Feature.
      ‘City_Category’           One-Hot Encoded.
      ‘Stay_In_Current_City’    Converted to Numeric.
      ‘Product_Category_1’      Used as Raw Feature.
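A minimal pandas sketch of the conversions in this table. The raw category labels ("0-17", "4+", and so on), the 0/1 assignment for Gender, and the representative age chosen for each bracket are all assumptions; the slides only state the conversion type.

```python
import pandas as pd

# Hypothetical rows; the label formats are assumed, not quoted from the slides.
df = pd.DataFrame({
    "Gender": ["M", "F", "M"],
    "Age": ["0-17", "26-35", "55+"],
    "City_Category": ["A", "C", "B"],
    "Stay_In_Current_City_Years": ["2", "4+", "0"],
})

# Binary conversion (which gender maps to 1 is an assumption).
df["Gender"] = df["Gender"].map({"F": 0, "M": 1})

# Numeric conversion: one representative age per bracket (assumed values).
age_map = {"0-17": 17, "18-25": 25, "26-35": 35, "36-45": 45,
           "46-50": 50, "51-55": 55, "55+": 60}
df["Age"] = df["Age"].map(age_map)

# Numeric conversion: strip the trailing "+" so "4+" becomes the integer 4.
df["Stay_In_Current_City_Years"] = (
    df["Stay_In_Current_City_Years"].str.rstrip("+").astype(int)
)

# One-hot encoding: City_Category becomes City_Category_A/B/C indicator columns.
df = pd.get_dummies(df, columns=["City_Category"])
```

The resulting `City_Category_C` indicator has the same name as the column that appears in the OLS coefficient table later in the deck.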
  • 26. Feature Engineering • Discretization • Polychotomization • Response/Target Transformation • Feature Creation: • Based on Average Feature Purchase • Based on Feature Frequency
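The two created feature families can be sketched with pandas group-by transforms. `Product_ID_Counts` is the name that appears in the regression output; `Product_ID_MeanPurchase` is a hypothetical name, and the rows below are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "Product_ID": ["P1", "P1", "P2", "P1", "P3"],
    "Purchase": [100, 300, 50, 200, 500],
})

# Feature frequency: how often each Product_ID occurs in the data.
df["Product_ID_Counts"] = df.groupby("Product_ID")["Product_ID"].transform("count")

# Average feature purchase: mean Purchase over all rows sharing the Product_ID.
df["Product_ID_MeanPurchase"] = df.groupby("Product_ID")["Purchase"].transform("mean")
```

Because the average-purchase feature is built from the target, it is usually computed on the training rows only and then mapped onto the validation rows, so that no validation information leaks into the model.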
  • 27. Model Selection: Multiple Linear Regression • Model selection criteria: • Simple • Retains explainability • Easy to understand and Implement • Model that helps in answering important Business related Questions such as: • Is there a relationship between Purchase on Black Friday by a Customer and Predictor variables? • How strong is the relationship? • Which Predictor contributes to the Purchase on the eve of Black Friday? • How large is the effect of each predictor on Purchase? • How accurately can we predict the Purchase? • Is the relationship linear?
  • 28. Model Development • Step 1: Data Transformation • Step 2: Data division using ‘Validation Set Approach’[4] • Step 3: Model Development [4] G. James et al., An Introduction to Statistical Learning: with Applications in R, Springer Texts in Statistics, © Springer Science+Business Media New York 2013
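Steps 2 and 3 can be sketched with NumPy alone: one random split into training and validation rows, an ordinary least squares fit on the training part, and scoring on the held-out part. The data is synthetic and the 70/30 split ratio is an assumption; the slides do not state the ratio used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the prepared feature matrix and the Purchase target.
n_rows = 1000
X = rng.normal(size=(n_rows, 3))
y = 9000 + X @ np.array([500.0, -300.0, 120.0]) + rng.normal(scale=50.0, size=n_rows)

# Step 2: validation set approach, a single random split (70/30 is an assumption).
order = rng.permutation(n_rows)
n_train = int(0.7 * n_rows)
train_idx, val_idx = order[:n_train], order[n_train:]

def add_intercept(features):
    """Prepend a column of ones so the first coefficient is the intercept."""
    return np.column_stack([np.ones(len(features)), features])

# Step 3: multiple linear regression fitted by least squares on the training rows.
coef, *_ = np.linalg.lstsq(add_intercept(X[train_idx]), y[train_idx], rcond=None)

# Score on the held-out validation rows only.
val_pred = add_intercept(X[val_idx]) @ coef
val_rmse = float(np.sqrt(np.mean((y[val_idx] - val_pred) ** 2)))
```

A single split is cheap but its validation error depends on which rows land in each part; k-fold cross-validation is the usual alternative when that variability matters.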
  • 29. Model Evaluation • Metrics used: • RMSE • R2 • Adjusted R2
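The three metrics can be computed directly from their definitions. A minimal standard-library sketch with made-up values; `n_predictors` is the number of predictors excluding the intercept.

```python
from math import sqrt

def regression_metrics(y_true, y_pred, n_predictors):
    """Return (RMSE, R2, adjusted R2) for paired true and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total variation
    rmse = sqrt(ss_res / n)
    r2 = 1 - ss_res / ss_tot
    # Penalise R2 for the number of predictors used.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return rmse, r2, adj_r2

# Hypothetical true and predicted values.
y_true = [10.0, 20.0, 30.0, 40.0, 50.0]
y_pred = [12.0, 18.0, 33.0, 37.0, 50.0]
rmse, r2, adj_r2 = regression_metrics(y_true, y_pred, n_predictors=1)
```

With several hundred thousand rows and at most a few dozen predictors, the factor (n − 1) / (n − p − 1) is very close to 1, so R2 and Adjusted R2 agree to two decimals, as they do in the evaluation table.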
  • 30. Model Evaluation
      Feature Engineering Techniques
      DC     Data Conversion
      DB     Data Binning
      AFP    Average Feature Purchase
      FF     Feature Frequency
                                    Training Set                    Validation Set
      Regression Models             RMSE      R2     Adjusted R2    RMSE      R2     Adjusted R2
      Baseline Model                4707.53   0.11   0.11           4715.49   0.11   0.11
      Model 1 (DB)                  3888.17   0.39   0.39           3895.55   0.39   0.39
      Model 2 (AFP + FF)            4979.67   0      0              4984.44   0      0
      Model 3 (DC + FF)             2903.5    0.66   0.66           2906.65   0.66   0.66
      Model 4 (DC + AFP)            4979.71   0      0              4984.36   0      0
      Ridge Regression (Model 3)    2903.84   0.66   0.66           2906.96   0.66   0.66
      LASSO Regression (Model 3)    2928.48   0.65   0.65           2930.12   0.66   0.66
  • 31. LASSO Regression • Performs variable selection by forcing some of the coefficient estimates to be exactly zero. • Simpler and more interpretable model than Ridge. • Handles Multicollinearity. • Model 3 initially contained 52 variables. • After LASSO Regularization, 18 variables were left.
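How LASSO forces coefficients to exactly zero can be seen in a small coordinate-descent sketch: each coefficient is updated by soft-thresholding, which returns 0 whenever the feature's correlation with the remaining residual is below the penalty `alpha`. This is an illustrative NumPy implementation on synthetic data with no intercept or feature scaling, not the authors' code; in practice a library estimator such as scikit-learn's `Lasso` would normally be used.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero; return exactly 0 when |value| <= threshold."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept)."""
    n_rows, n_cols = X.shape
    w = np.zeros(n_cols)
    for _ in range(n_iter):
        for j in range(n_cols):
            # Residual with feature j's current contribution added back.
            partial_residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ partial_residual / n_rows
            w[j] = soft_threshold(rho, alpha) / (X[:, j] @ X[:, j] / n_rows)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first two features truly drive the response.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

w = lasso_coordinate_descent(X, y, alpha=0.5)
n_kept = int(np.count_nonzero(w))  # number of variables that survive the penalty
```

The surviving coefficients are shrunk toward zero as well, which is one reason to refit an unpenalised model on the selected variables when unbiased effect sizes are needed.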
  • 32. Statistical Learning
      OLS Regression Results
      Dep. Variable:      Purchase            R-squared:            0.653
      Model:              OLS                 Adj. R-squared:       0.653
      Method:             Least Squares       F-statistic:          3.935e+04
      Date:               Sun, 06 Jan 2019    Prob (F-statistic):   0.00
      Time:               17:40:07            Log-Likelihood:       -3.5381e+06
      No. Observations:   376303              AIC:                  7.076e+06
      Df Residuals:       376284              BIC:                  7.076e+06
      Df Model:           18
      Covariance Type:    nonrobust
  • 33.
                                    coef          std err   t           P>|t|   [0.025       0.975]
      Const                         1.134e+04     20.082    564.769     0.000   1.13e+04     1.14e+04
      Product_Category_1_10         6534.2674     50.434    129.561     0.000   6435.419     6633.116
      Product_Category_1_7          4267.2267     58.652    72.755      0.000   4152.270     4382.183
      Product_Category_1_6          2659.5272     26.192    101.541     0.000   2608.192     2710.862
      Product_Category_1_16         2026.6088     36.722    55.187      0.000   1954.634     2098.583
      Product_Category_1_15         2123.6187     45.426    46.749      0.000   2034.586     2212.651
      City_Category_C               283.9126      10.471    27.114      0.000   263.389      304.436
      Age                           10.0330       0.359     27.939      0.000   9.329        10.737
      Product_ID_Counts             2.5978        0.014     185.461     0.000   2.570        2.625
      Stay_In_Current_City_Years    7.8901        3.708     2.128       0.033   0.622        15.158
      Occupation_1                  -162.6174     17.166    -9.473      0.000   -196.262     -128.973
      Product_Category_1_3          -2811.2377    26.454    -106.270    0.000   -2863.086    -2759.389
      Product_Category_1_8          -5218.7197    13.907    -375.253    0.000   -5245.977    -5191.462
      Product_Category_1_18         -9453.6809    64.223    -147.202    0.000   -9579.555    -9327.806
      Product_Category_1_11         -7742.6858    24.644    -314.179    0.000   -7790.988    -7694.384
      Product_Category_1_5          -6633.2756    12.698    -522.406    0.000   -6658.162    -6608.389
      Product_Category_1_12         -1.122e+04    56.755    -197.758    0.000   -1.13e+04    -1.11e+04
      Product_Category_1_4          -1.045e+04    33.805    -309.155    0.000   -1.05e+04    -1.04e+04
      Product_Category_1_13         -1.191e+04    48.513    -245.426    0.000   -1.2e+04     -1.18e+04
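To show how the coefficient table turns into a prediction, the sketch below adds the intercept to each coefficient multiplied by its feature value. The coefficients are copied from the table above; the customer and product values are made up, and the sketch assumes that Age is coded in years and that the coefficients are on the original Purchase scale.

```python
# Coefficients copied from the table above (dummy columns are 0/1 indicators).
COEF = {
    "const": 11340.0,  # reported as 1.134e+04, so already rounded
    "Product_Category_1_5": -6633.2756,
    "City_Category_C": 283.9126,
    "Age": 10.0330,
    "Product_ID_Counts": 2.5978,
    "Stay_In_Current_City_Years": 7.8901,
    "Occupation_1": -162.6174,
}

def predict_purchase(row):
    """Linear prediction: intercept plus coefficient * value for each feature."""
    return COEF["const"] + sum(COEF[name] * value for name, value in row.items())

# Hypothetical customer and product; every value here is made up for illustration.
row = {
    "Product_Category_1_5": 1,
    "City_Category_C": 1,
    "Age": 30,
    "Product_ID_Counts": 500,
    "Stay_In_Current_City_Years": 2,
    "Occupation_1": 0,
}
predicted = predict_purchase(row)
```

Read this way, the `City_Category_C` coefficient says that, with all other features held fixed, a City C customer's predicted Purchase is about 284 higher than that of a customer from another city.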
  • 35. Residual Analysis • Non-Linearity of the Response-Predictor Relationship: • No visible pattern in the residuals.
  • 36. Residual Analysis • Heteroskedasticity: • Funnel shape is evident • Response Log-Transformed in order to achieve Homoskedasticity
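The effect of log-transforming the response can be illustrated on synthetic data whose noise is multiplicative, so that the spread of the residuals grows with the fitted value (the funnel shape). The sketch compares the residual spread for high and low values of a single predictor before and after the transform; the data and the helper name `residual_spread_ratio` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with multiplicative noise: the spread of y grows with its level.
x = rng.uniform(0.0, 2.0, size=2000)
y = np.exp(1.0 + x + rng.normal(scale=0.3, size=x.size))

def residual_spread_ratio(x, target):
    """Fit target ~ a + b*x by least squares; compare residual spread for high vs low x."""
    slope, intercept = np.polyfit(x, target, deg=1)
    residuals = target - (intercept + slope * x)
    high = x > np.median(x)
    return float(residuals[high].std() / residuals[~high].std())

ratio_raw = residual_spread_ratio(x, y)          # well above 1: funnel shape
ratio_log = residual_spread_ratio(x, np.log(y))  # close to 1: roughly constant spread

# Predictions made on the log scale are mapped back to the original scale with exp.
slope, intercept = np.polyfit(x, np.log(y), deg=1)
pred_at_1 = float(np.exp(intercept + slope * 1.0))
```

After a log transform, each coefficient describes an approximately proportional change in Purchase rather than a change in dollars, so the coefficients are no longer directly comparable with those of a model fitted on the raw response.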
  • 38. Results • Based on Descriptive Analytics • Based on Behavioural Analytics • Based on Predictive Analytics • Based on Prescriptive Analytics
  • 39. Results • Based on Descriptive Analytics: • Male Shoppers are likely to buy more Products than Female Shoppers. • Older (40+) people are likely to spend more irrespective of their marital status. • Customers who arrived recently in City-B and City-C are likely to shop less frequently than those who have stayed longer (acclimatization can be an issue).
  • 40. Results • Based on Behavioural Analytics: • Keeping Products that are more likely to sell at the front of the Store will lead to an increase in Sales.[6] • Products ‘1’, ‘5’ and ‘8’ of Product_Category_1 are the highest selling Products, so they should be kept at the front of the Store. [6] Fließ, Sabine & Hogreve, Jens & Nonnenmacher, Dirk. (2004). Emotional Effects of Shop Window Displays on Consumer Behavior.
  • 41. Results • Based on Predictive Analytics: • Purchase is heavily influenced by Product Category. • People aged 60+ will spend as much as $600 more than Teenagers. • People belonging to Occupation-1 are likely to spend less. • Product Categories that have an average price over $9000 are likely to influence Purchase positively, and vice versa. • City C Customers will spend $283 more than Customers from other cities.
  • 42. Results • Based on Prescriptive Analytics: • If the Price of ‘Product-5’ is increased by 5%, ‘Product-1’ by 3% and ‘Product-8’ by 4%, then the Revenue will increase by $150 million, which is higher than the combined Revenue of the eight lowest selling Products.
  • 43. Future Scope: Model Deployment