CAPSTONE PRESENTATION ON
“PURCHASE PREDICTION ON
BLACK FRIDAY”
Submitted towards partial fulfilment of the criteria
for award of PGP-DSE by GLIM
Submitted By
Group No. 8 [Batch: 2018-19]
Group Members
Arjun Thumbayil – DSEFTCJUL18006
Sahil Bansal - DSEFTCJUL18014
Shahrukh Buland Iqbal – DSEFTCJUL18042
Research Supervisor
P V Subramanian
Contents
Introduction
• Background
• Objective
• Motivation
Dataset
• Collection
• Description
• Pre-processing
• Exploratory Data Analysis
• Statistical Analysis
Feature Engineering
• Data Conversion
• Discretization
• Polychotomization
• Response/Target Transformation
• Feature Creation
Modeling
• Model Selection
• Model Development
• Model Evaluation
• Model Optimization
• Model in Production
Statistical Learning
• Residual Analysis
Results
Future Scope
• Model Deployment
Background
• The day after Thanksgiving in the U.S. is called Black Friday (BF) and serves as the traditional start to the holiday shopping season.
• It is known for deep discounts (e.g., doorbusters); Black Friday shopping manifests adventure, competition and urgency around getting great deals.
• Although Cyber Monday is gaining popularity, Black Friday shopping continues to be popular
because of an abundance of doorbuster deals, instant gratification, and the benefit of social
shopping.
Objective
• Predicting Purchase
• Build a simple Machine Learning model that can predict how much a Customer is likely to spend on the eve of Black Friday.
• Pattern Recognition
• Reveal and understand which predictors, such as Age, Gender and City of Residence, most influence the spending of a Customer.
• Quantify the impact of these factors on the Purchase of an individual Customer, i.e., whether each contributes positively or negatively to the Purchase.
Motivation
• Black Friday sales in the US still account for a whopping $6 billion in revenue.[1]
• In order to compete with online shopping platforms, brick-and-mortar retailers need to figure out how to boost sales during the most important shopping day of the year.
• By understanding the purchase patterns of their customers, retailers can provide improved service quality.
• Improve staffing and inventory of the retail store.
• Increase revenue and sales.
[1] https://www.forbes.com/sites/andriacheng/2018/11/26/black-friday-cyber-monday-sales-are-hitting-another-high-but-its-not-time-to-cheer-yet/#6d2ac36256c6
Tools
Dataset
• Collection:
• The data comes from a competition hosted by Analytics Vidhya[2].
• Description:
• The dataset comprises 550000 observations of Black Friday purchases in a retail store.
• It contains both numeric and categorical variables.
• Two columns contain missing values:
• 166986 observations are missing in column ‘Product_Category_2’.
• 373299 observations are missing in column ‘Product_Category_3’.
[2] https://www.kaggle.com/mehdidag/black-friday/home
Description
Name                   Data Type
User_ID                Integer (Discrete)
Product_ID             Categorical (Discrete)
Gender                 Categorical (Nominal)
Age                    Categorical (Ordinal)
Occupation             Categorical (Nominal) [Masked]
City_Category          Categorical (Nominal)
Stay_In_Current_City   Categorical (Ordinal)
Marital_Status         Categorical (Nominal)
Product_Category_1     Categorical (Nominal) [Masked]
Product_Category_2     Categorical (Nominal) [Masked]
Product_Category_3     Categorical (Nominal) [Masked]
Purchase               Integer (Continuous)
Pre-Processing
• Most of the raw data in any given dataset is unprocessed, incomplete, and noisy.
• To be useful for data mining, the dataset needs to undergo pre-processing in the form of ‘Data Cleaning’ and ‘Data Transformation’.
• Handling Missing Values[3].
• Handling Outliers.
[3] Galit Shmueli, Nitin Patel, and Peter Bruce, Data Mining for Business Intelligence, 2nd edition, John Wiley and Sons, 2010
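The slides do not say how missing values and outliers were handled, so the following is only a minimal sketch under stated assumptions: missing product categories are replaced with a sentinel code of 0, and outliers in Purchase are flagged with the 1.5 × IQR rule. The toy rows and the helper names are hypothetical.

```python
from statistics import quantiles

# Toy rows standing in for the real data; None marks a missing value.
rows = [
    {"Product_Category_2": 8, "Product_Category_3": None, "Purchase": 8370},
    {"Product_Category_2": None, "Product_Category_3": None, "Purchase": 15200},
    {"Product_Category_2": 14, "Product_Category_3": 17, "Purchase": 1422},
    {"Product_Category_2": 2, "Product_Category_3": None, "Purchase": 7969},
]


def count_missing(rows, column):
    """Number of rows in which `column` is missing."""
    return sum(1 for row in rows if row[column] is None)


def fill_missing(rows, column, sentinel=0):
    """Copy the rows, replacing missing values with a sentinel code.

    Using 0 as a 'no category' code is an assumption, not the authors' method.
    """
    return [
        {**row, column: sentinel if row[column] is None else row[column]}
        for row in rows
    ]


def iqr_bounds(values, k=1.5):
    """Lower and upper fences of the k * IQR outlier rule."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


missing_before = count_missing(rows, "Product_Category_3")
cleaned = fill_missing(rows, "Product_Category_3")
low, high = iqr_bounds([row["Purchase"] for row in rows])
outliers = [row for row in cleaned if not low <= row["Purchase"] <= high]
```

A sentinel keeps every row available for modeling; dropping the two sparse columns altogether would be an equally defensible choice.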
Exploring Categorical Variables
• Male shoppers are more frequent than female shoppers.
• The 18-45 age bracket shops the most.
• Top 5 Customers by Purchase: 4, 0, 7, 1, 17
• Lowest 5 Customers by Purchase: 19, 13, 18, 9, 8
• Unmarried people are more frequent shoppers.
• The top 5 Product Categories account for 82% of the items sold.
• Products belonging to categories 5, 1 and 8 are the most likely to be sold.
Exploring Multivariate Relationships
Statistical Analysis
• Univariate Statistical Analysis
• Multivariate Statistics
• Chi-square Test of Independence
• One-Way ANOVA
Univariate Statistical Analysis
Parameter            Purchase (in US $)
Mean (µ)             9333.86
Standard Deviation   4981.02
Median               8062
Minimum              185
Maximum              23961
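A summary like the one above can be computed with Python's standard `statistics` module. This is a sketch on made-up values, not the actual data; the slides do not say whether the reported standard deviation is the sample or the population version, so the sample version is assumed here.

```python
from statistics import mean, median, stdev


def summarize(values):
    """Univariate summary of a numeric column."""
    return {
        "mean": mean(values),
        "std": stdev(values),  # sample standard deviation (n - 1 denominator)
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical Purchase-like values, only to exercise the function.
summary = summarize([2, 4, 4, 4, 5, 5, 7, 9])
```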
Multivariate Statistics: Chi Square Test of Independence
                     AGE   CITY CATEGORY   GENDER   MARITAL STATUS   OCCUPATION   PRODUCT CATEGORY-1   STAY
AGE
CITY CATEGORY        YES
GENDER               YES   YES
MARITAL STATUS       YES   YES             YES
OCCUPATION           YES   YES             YES      YES
PRODUCT CATEGORY-1   YES   YES             YES      YES              YES
STAY                 YES   YES             YES      YES              YES          YES
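• A chi-square test of independence was performed on each pair of categorical variables to determine whether the categories of one variable are represented across the groups of the other in proportion to their numbers in the sample. Every pair produced a significant χ2 value (YES in the table), indicating that some groups are over- or under-represented in some categories, i.e., the variables are not independent.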
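To make the test concrete, the sketch below computes the chi-square statistic and its degrees of freedom from a contingency table of observed counts. The 2 × 2 table is invented for illustration and the function name is hypothetical. The statistic is then compared with the critical value for those degrees of freedom (3.841 for 1 degree of freedom at the 5% level).

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts, e.g. Gender (rows) by Marital_Status (columns).
observed = [[30, 10], [20, 40]]
statistic, dof = chi_square_statistic(observed)
is_significant = statistic > 3.841  # 5% critical value for dof == 1
```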
Multivariate Statistics: One Way ANOVA
• GENDER
• We performed a one-way ANOVA to compare the average Purchase of the two groups on the eve of Black Friday. This analysis produced a statistically significant result (F(1, 9998) = 47.34, p < .05).
• A post hoc Tukey test confirmed a significant difference between Male (µ = 9504.77) and Female (µ = 8809.76), with males spending significantly more than females.
• CITY CATEGORY
• We performed a one-way ANOVA to compare the average Purchase of the three groups on the eve of Black Friday. This analysis produced a statistically significant result (F(2, 9997) = 37.26, p < .05).
• A post hoc Tukey test revealed significant differences between City A (µ = 8958.01), City B (µ = 9198.65), and City C (µ = 9844.44), with City C purchasing significantly more than City A and City B.
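The F statistic reported above is the ratio of between-group to within-group variance. A minimal sketch of that computation follows, using three tiny invented groups and a hypothetical function name. Its degrees of freedom are (k - 1, N - k), so the reported F(1, 9998) and F(2, 9997) both correspond to N = 10000 observations.

```python
def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical Purchase values for three city categories.
f_value, df_between, df_within = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```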
Feature Engineering
Variable                  Conversion Type
‘User_ID’                 Used as Raw Feature.
‘Product_ID’              Used as Raw Feature.
‘Gender’                  Converted to Binary.
‘Age’                     Converted to Numeric.
‘Marital_Status’          Converted to Binary.
‘Occupation’              Used as Raw Feature.
‘City_Category’           One-Hot Encoded.
‘Stay_In_Current_City’    Converted to Numeric.
‘Product_Category_1’      Used as Raw Feature.
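The conversions in the table could look roughly like the sketch below. The category labels and the numeric codes are assumptions, because the slides do not list them; in particular the representative ages in `AGE_TO_NUMERIC` are hypothetical, and plain ordinal ranks would preserve the ordering just as well.

```python
# Assumed category labels and numeric codes (not given in the slides).
AGE_TO_NUMERIC = {
    "0-17": 15, "18-25": 21, "26-35": 30, "36-45": 40,
    "46-50": 48, "51-55": 53, "55+": 60,
}
STAY_TO_NUMERIC = {"0": 0, "1": 1, "2": 2, "3": 3, "4+": 4}
CITY_LEVELS = ("A", "B", "C")


def encode_row(row):
    """Apply the conversions from the table to one raw record."""
    encoded = {
        # Used as raw features.
        "User_ID": row["User_ID"],
        "Product_ID": row["Product_ID"],
        "Occupation": row["Occupation"],
        "Product_Category_1": row["Product_Category_1"],
        # Converted to binary.
        "Gender": 1 if row["Gender"] == "M" else 0,
        "Marital_Status": int(row["Marital_Status"]),
        # Converted to numeric, keeping the natural order of the brackets.
        "Age": AGE_TO_NUMERIC[row["Age"]],
        "Stay_In_Current_City_Years": STAY_TO_NUMERIC[row["Stay_In_Current_City_Years"]],
    }
    # One-hot encoded: one 0/1 indicator per city category.
    for level in CITY_LEVELS:
        encoded[f"City_Category_{level}"] = 1 if row["City_Category"] == level else 0
    return encoded


# Hypothetical raw record.
example = encode_row({
    "User_ID": 1000001, "Product_ID": "P00069042", "Gender": "F",
    "Age": "26-35", "Occupation": 10, "City_Category": "C",
    "Stay_In_Current_City_Years": "4+", "Marital_Status": 0,
    "Product_Category_1": 3,
})
```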
Feature Engineering: Incorporating Ordinality
Feature Engineering
• Discretization
• Polychotomization
• Response/Target Transformation
• Feature Creation:
• Based on Average Feature Purchase
• Based on Feature Frequency
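The two created feature types can be sketched as lookup tables keyed by a feature value: how often the value occurs (Feature Frequency) and the mean Purchase observed for it (Average Feature Purchase). The function names and toy rows are hypothetical. Since an average-purchase feature is derived from the target, it is assumed here that the tables would be built from the training split only.

```python
from collections import Counter, defaultdict


def frequency_feature(rows, column):
    """Map each value of `column` to the number of rows it appears in."""
    return dict(Counter(row[column] for row in rows))


def average_purchase_feature(rows, column, target="Purchase"):
    """Map each value of `column` to the mean of `target` over its rows."""
    totals = defaultdict(float)
    counts = Counter()
    for row in rows:
        totals[row[column]] += row[target]
        counts[row[column]] += 1
    return {value: totals[value] / counts[value] for value in totals}


# Hypothetical training rows.
train_rows = [
    {"Product_ID": "P1", "Purchase": 100},
    {"Product_ID": "P1", "Purchase": 300},
    {"Product_ID": "P2", "Purchase": 50},
]
product_counts = frequency_feature(train_rows, "Product_ID")
product_avg = average_purchase_feature(train_rows, "Product_ID")

# Attach the new columns; values unseen in training fall back to 0.
for row in train_rows:
    row["Product_ID_Counts"] = product_counts.get(row["Product_ID"], 0)
```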
Model Selection: Multiple Linear Regression
• Model selection criteria:
• Simple
• Retains explainability
• Easy to understand and Implement
• Model that helps in answering important Business related Questions such
as:
• Is there a relationship between Purchase on Black Friday by a Customer and
Predictor variables?
• How strong is the relationship?
• Which Predictor contributes to the Purchase on the eve of Black Friday?
• How large is the effect of each predictor on Purchase?
• How accurately can we predict the Purchase?
• Is the relationship linear?
Model Development
• Step 1: Data Transformation
• Step 2: Data division using ‘Validation Set Approach’[4]
• Step 3: Model Development
[4] G. James et al., An Introduction to Statistical Learning: with Applications in R, Springer Texts in Statistics, © Springer Science+Business Media New York 2013
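In the validation set approach, the observations are randomly divided once into a training part and a held-out validation part. A minimal sketch follows; the 70/30 ratio, the seed and the function name are assumptions, not values stated in the slides.

```python
import random


def validation_split(rows, train_fraction=0.7, seed=42):
    """Randomly split rows into a training set and a validation set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    cut = int(len(rows) * train_fraction)
    train = [rows[i] for i in indices[:cut]]
    valid = [rows[i] for i in indices[cut:]]
    return train, valid


# Integers stand in for data rows.
all_rows = list(range(10))
train, valid = validation_split(all_rows)
```

The model is then fitted on `train` only and its error is measured on `valid`.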
Model Evaluation
• Metrics used:
• RMSE
• R2
• Adjusted R2
Feature Engineering Techniques
DC    Data Conversion
DB    Data Binning
AFP   Average Feature Purchase
FF    Feature Frequency

                              Training Set                    Validation Set
Regression Models             RMSE      R2     Adjusted R2    RMSE      R2     Adjusted R2
Baseline Model                4707.53   0.11   0.11           4715.49   0.11   0.11
Model 1 (DB)                  3888.17   0.39   0.39           3895.55   0.39   0.39
Model 2 (AFP + FF)            4979.67   0      0              4984.44   0      0
Model 3 (DC + FF)             2903.5    0.66   0.66           2906.65   0.66   0.66
Model 4 (DC + AFP)            4979.71   0      0              4984.36   0      0
Ridge Regression (Model 3)    2903.84   0.66   0.66           2906.96   0.66   0.66
LASSO Regression (Model 3)    2928.48   0.65   0.65           2930.12   0.66   0.66
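For reference, the three metrics can be computed as in the sketch below, using their standard definitions; the function names and toy numbers are illustrative. Adjusted R2 penalizes R2 for the number of predictors p, which is why the two are practically identical here: with hundreds of thousands of rows and a few dozen predictors the penalty is negligible.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """R2 corrected for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical actual and predicted values.
actual = [3, 5, 7, 9]
predicted = [2, 6, 6, 10]
error = rmse(actual, predicted)
r2 = r_squared(actual, predicted)
adj_r2 = adjusted_r_squared(r2, n_obs=len(actual), n_predictors=1)
```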
LASSO Regression
• Performs variable selection by forcing some of the coefficient estimates to be exactly zero.
• Gives a simpler and more interpretable model than Ridge.
• Handles multicollinearity.
• Model 3 initially had 52 variables.
• After LASSO regularization, 18 variables were left.
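Why LASSO produces exact zeros can be seen in a small coordinate-descent sketch. Each coefficient update applies a soft-threshold, so any coefficient whose correlation with the current residual is below the penalty `alpha` is set to exactly zero. This is an illustrative implementation on toy data, not the solver used for the reported models; it assumes centered data (no intercept) and no all-zero columns.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1 / 2n) * ||y - Xb||^2 + alpha * ||b||_1 by cyclic updates."""
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, alpha) / (z / n)
    return beta


# Toy data: y depends strongly on the first feature, weakly on the second.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2.1, -1.9, 1.9, -2.1]
beta = lasso_coordinate_descent(X, y, alpha=0.5)
```

In this toy fit the weak second feature is dropped and the strong first one is shrunk, the same mechanism by which the model went from 52 variables to 18.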
Statistical Learning
OLS Regression Results
Dep. Variable:      Purchase            R-squared:            0.653
Model:              OLS                 Adj. R-squared:       0.653
Method:             Least Squares       F-statistic:          3.935e+04
Date:               Sun, 06 Jan 2019    Prob (F-statistic):   0.00
Time:               17:40:07            Log-Likelihood:       -3.5381e+06
No. Observations:   376303              AIC:                  7.076e+06
Df Residuals:       376284              BIC:                  7.076e+06
Df Model:           18
Covariance Type:    nonrobust

                                   coef  std err         t  P>|t|     [0.025     0.975]
Const                         1.134e+04   20.082   564.769  0.000   1.13e+04   1.14e+04
Product_Category_1_10         6534.2674   50.434   129.561  0.000   6435.419   6633.116
Product_Category_1_7          4267.2267   58.652    72.755  0.000   4152.270   4382.183
Product_Category_1_6          2659.5272   26.192   101.541  0.000   2608.192   2710.862
Product_Category_1_16         2026.6088   36.722    55.187  0.000   1954.634   2098.583
Product_Category_1_15         2123.6187   45.426    46.749  0.000   2034.586   2212.651
City_Category_C                283.9126   10.471    27.114  0.000    263.389    304.436
Age                             10.0330    0.359    27.939  0.000      9.329     10.737
Product_ID_Counts                2.5978    0.014   185.461  0.000      2.570      2.625
Stay_In_Current_City_Years       7.8901    3.708     2.128  0.033      0.622     15.158
Occupation_1                  -162.6174   17.166    -9.473  0.000   -196.262   -128.973
Product_Category_1_3         -2811.2377   26.454  -106.270  0.000  -2863.086  -2759.389
Product_Category_1_8         -5218.7197   13.907  -375.253  0.000  -5245.977  -5191.462
Product_Category_1_18        -9453.6809   64.223  -147.202  0.000  -9579.555  -9327.806
Product_Category_1_11        -7742.6858   24.644  -314.179  0.000  -7790.988  -7694.384
Product_Category_1_5         -6633.2756   12.698  -522.406  0.000  -6658.162  -6608.389
Product_Category_1_12        -1.122e+04   56.755  -197.758  0.000  -1.13e+04  -1.11e+04
Product_Category_1_4         -1.045e+04   33.805  -309.155  0.000  -1.05e+04  -1.04e+04
Product_Category_1_13        -1.191e+04   48.513  -245.426  0.000   -1.2e+04  -1.18e+04
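A fitted linear model predicts Purchase as the constant plus each coefficient times its feature value. The sketch below applies the coefficients from the table above to one hypothetical customer. The feature values are invented, and the constant and the three largest negative coefficients are only available rounded to four significant figures, so the result is approximate.

```python
# Coefficients copied from the OLS summary (some are rounded in the output).
CONST = 1.134e4
COEF = {
    "Product_Category_1_10": 6534.2674,
    "Product_Category_1_7": 4267.2267,
    "Product_Category_1_6": 2659.5272,
    "Product_Category_1_16": 2026.6088,
    "Product_Category_1_15": 2123.6187,
    "City_Category_C": 283.9126,
    "Age": 10.0330,
    "Product_ID_Counts": 2.5978,
    "Stay_In_Current_City_Years": 7.8901,
    "Occupation_1": -162.6174,
    "Product_Category_1_3": -2811.2377,
    "Product_Category_1_8": -5218.7197,
    "Product_Category_1_18": -9453.6809,
    "Product_Category_1_11": -7742.6858,
    "Product_Category_1_5": -6633.2756,
    "Product_Category_1_12": -1.122e4,
    "Product_Category_1_4": -1.045e4,
    "Product_Category_1_13": -1.191e4,
}


def predict_purchase(features):
    """Constant plus coefficient * value; features not listed count as 0."""
    return CONST + sum(COEF[name] * value for name, value in features.items())


# Hypothetical customer: lives in City C, numeric Age 30, 2 years in the city,
# buying a category-10 product whose Product_ID occurs 500 times.
customer = {
    "Product_Category_1_10": 1,
    "City_Category_C": 1,
    "Age": 30,
    "Product_ID_Counts": 500,
    "Stay_In_Current_City_Years": 2,
}
predicted = predict_purchase(customer)
```

Setting `City_Category_C` to 0 for the same customer lowers the prediction by exactly the City C coefficient, which is how a single coefficient is read as an effect with all other features held fixed.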
Residual Analysis
• Normality of the Residuals
• Non-Linearity of the Response-Predictor Relationship:
• No visible pattern in the residuals.
• Heteroskedasticity:
• Funnel shape is evident.
• Response log-transformed in order to achieve homoskedasticity.
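A log transformation of the response compresses large Purchase values more than small ones, which is what dampens a funnel-shaped residual spread. A minimal sketch follows, assuming the natural logarithm (the slides do not state the base); predictions made on the log scale have to be mapped back with the exponential.

```python
import math


def to_log_scale(purchases):
    """Natural log of the response; valid because Purchase is strictly positive."""
    return [math.log(p) for p in purchases]


def from_log_scale(log_values):
    """Map values on the log scale back to the original Purchase scale."""
    return [math.exp(v) for v in log_values]


# Minimum, median and maximum Purchase from the univariate summary.
purchases = [185, 8062, 23961]
log_purchases = to_log_scale(purchases)
restored = from_log_scale(log_purchases)

# The max-to-min spread shrinks from a ratio of about 130 to a
# difference of under 5 log units.
spread_raw = purchases[-1] / purchases[0]
spread_log = log_purchases[-1] - log_purchases[0]
```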
Results
• Based on Descriptive Analytics:
• Male shoppers are likely to buy more products than female shoppers.
• Older (40+) people are likely to spend more irrespective of their marital status.
• Customers who arrived recently in City-B and City-C are likely to shop less frequently than those who have stayed longer (acclimatization can be an issue).
• Based on Behavioural Analytics:
• Keeping products that are more likely to sell at the front of the store will lead to an increase in sales.[6]
• Products ‘1’, ‘5’ and ‘8’ of Product_Category_1 are the highest-selling products, so they should be kept at the front of the store.
[6] Fließ, Sabine & Hogreve, Jens & Nonnenmacher, Dirk. (2004). Emotional Effects of Shop Window Displays on Consumer Behavior.
• Based on Predictive Analytics:
• Purchase is heavily influenced by Product Category.
• People aged 60+ will spend as much as $600 more than teenagers.
• People belonging to Occupation-1 are likely to spend less.
• Product categories with an average price over $9000 are likely to influence Purchase positively, and vice versa.
• City C customers will spend $283 more than customers from other cities.
• Based on Prescriptive Analytics:
• If the price of ‘Product-5’ is increased by 5%, ‘Product-1’ by 3% and ‘Product-8’ by 4%, then revenue will increase by $150 million, which is higher than the combined revenue of the eight lowest-selling products.
Future Scope: Model Deployment
Black Friday Shopping Prediction_ PPT

Contenu connexe

Tendances

Machine Learning project presentation
Machine Learning project presentationMachine Learning project presentation
Machine Learning project presentationRamandeep Kaur Bagri
 
Big Data and Classification
Big Data and ClassificationBig Data and Classification
Big Data and Classification303Computing
 
project sentiment analysis
project sentiment analysisproject sentiment analysis
project sentiment analysissneha penmetsa
 
Predictive Analytics - An Overview
Predictive Analytics - An OverviewPredictive Analytics - An Overview
Predictive Analytics - An OverviewMachinePulse
 
Image classification using convolutional neural network
Image classification using convolutional neural networkImage classification using convolutional neural network
Image classification using convolutional neural networkKIRAN R
 
Data science life cycle
Data science life cycleData science life cycle
Data science life cycleManoj Mishra
 
14.05.12 Analysis and Prediction of Flight Prices using Historical Pricing Da...
14.05.12 Analysis and Prediction of Flight Prices using Historical Pricing Da...14.05.12 Analysis and Prediction of Flight Prices using Historical Pricing Da...
14.05.12 Analysis and Prediction of Flight Prices using Historical Pricing Da...Swiss Big Data User Group
 
Text summarization
Text summarizationText summarization
Text summarizationkareemhashem
 
Data Analytics PowerPoint Presentation Slides
Data Analytics PowerPoint Presentation SlidesData Analytics PowerPoint Presentation Slides
Data Analytics PowerPoint Presentation SlidesSlideTeam
 
Stock Market Prediction
Stock Market PredictionStock Market Prediction
Stock Market PredictionMRIDUL GUPTA
 
Stock Market Prediction using Machine Learning
Stock Market Prediction using Machine LearningStock Market Prediction using Machine Learning
Stock Market Prediction using Machine LearningAravind Balaji
 
Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment AnalysisJaganadh Gopinadhan
 
Predicting house price
Predicting house pricePredicting house price
Predicting house priceDivya Tiwari
 
Stock Price Prediction PPT
Stock Price Prediction  PPTStock Price Prediction  PPT
Stock Price Prediction PPTPrashantGanji4
 
Applied Data Science for E-Commerce
Applied Data Science for E-CommerceApplied Data Science for E-Commerce
Applied Data Science for E-CommerceArul Bharathi
 

Tendances (20)

Machine Learning project presentation
Machine Learning project presentationMachine Learning project presentation
Machine Learning project presentation
 
Big Data and Classification
Big Data and ClassificationBig Data and Classification
Big Data and Classification
 
Image captioning
Image captioningImage captioning
Image captioning
 
Data Mining
Data MiningData Mining
Data Mining
 
project sentiment analysis
project sentiment analysisproject sentiment analysis
project sentiment analysis
 
Predictive Analytics - An Overview
Predictive Analytics - An OverviewPredictive Analytics - An Overview
Predictive Analytics - An Overview
 
Image classification using convolutional neural network
Image classification using convolutional neural networkImage classification using convolutional neural network
Image classification using convolutional neural network
 
Data science
Data scienceData science
Data science
 
Data science life cycle
Data science life cycleData science life cycle
Data science life cycle
 
14.05.12 Analysis and Prediction of Flight Prices using Historical Pricing Da...
14.05.12 Analysis and Prediction of Flight Prices using Historical Pricing Da...14.05.12 Analysis and Prediction of Flight Prices using Historical Pricing Da...
14.05.12 Analysis and Prediction of Flight Prices using Historical Pricing Da...
 
Text summarization
Text summarizationText summarization
Text summarization
 
Data Analytics PowerPoint Presentation Slides
Data Analytics PowerPoint Presentation SlidesData Analytics PowerPoint Presentation Slides
Data Analytics PowerPoint Presentation Slides
 
Stock Market Prediction
Stock Market PredictionStock Market Prediction
Stock Market Prediction
 
Stock Market Prediction using Machine Learning
Stock Market Prediction using Machine LearningStock Market Prediction using Machine Learning
Stock Market Prediction using Machine Learning
 
Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment Analysis
 
Predicting house price
Predicting house pricePredicting house price
Predicting house price
 
Project report
Project reportProject report
Project report
 
Stock Price Prediction PPT
Stock Price Prediction  PPTStock Price Prediction  PPT
Stock Price Prediction PPT
 
Customer Segmentation
Customer SegmentationCustomer Segmentation
Customer Segmentation
 
Applied Data Science for E-Commerce
Applied Data Science for E-CommerceApplied Data Science for E-Commerce
Applied Data Science for E-Commerce
 

Similaire à Black Friday Shopping Prediction_ PPT

Black Friday Shopping Prediction
Black Friday Shopping PredictionBlack Friday Shopping Prediction
Black Friday Shopping PredictionSBIqbal
 
Intelli-Global Overview 051313
Intelli-Global Overview 051313Intelli-Global Overview 051313
Intelli-Global Overview 051313Intelli-Global
 
INFORMATICA EASY LEARNING ONLINE TRAINING
INFORMATICA EASY LEARNING ONLINE TRAININGINFORMATICA EASY LEARNING ONLINE TRAINING
INFORMATICA EASY LEARNING ONLINE TRAININGZaranTech LLC
 
Decision Analytics: Revealing Customer Preferences and Behaviors
Decision Analytics: Revealing Customer Preferences and BehaviorsDecision Analytics: Revealing Customer Preferences and Behaviors
Decision Analytics: Revealing Customer Preferences and BehaviorsLarry Boyer
 
Is Your Marketing Database "Model Ready"?
Is Your Marketing Database "Model Ready"?Is Your Marketing Database "Model Ready"?
Is Your Marketing Database "Model Ready"?Vivastream
 
Is Your Marketing Database "Model Ready"?
Is Your Marketing Database "Model Ready"?Is Your Marketing Database "Model Ready"?
Is Your Marketing Database "Model Ready"?Vivastream
 
Promotion Analytics in Consumer Electronics - Module 1: Data
Promotion Analytics in Consumer Electronics - Module 1: DataPromotion Analytics in Consumer Electronics - Module 1: Data
Promotion Analytics in Consumer Electronics - Module 1: DataMinha Hwang
 
Retail Design
Retail DesignRetail Design
Retail Designjagishar
 
Data Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisionsData Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisionsVivastream
 
Power Up Your Competitive Price Intelligence With Web Data
Power Up Your Competitive Price Intelligence With Web DataPower Up Your Competitive Price Intelligence With Web Data
Power Up Your Competitive Price Intelligence With Web DataConnotate
 
Customer analytics
Customer analyticsCustomer analytics
Customer analyticsKarl Melo
 
Improving profitability of campaigns through data science
Improving profitability of campaigns through data scienceImproving profitability of campaigns through data science
Improving profitability of campaigns through data scienceswebi
 
Mba 433 MIS - Data Warehouse
Mba 433 MIS - Data WarehouseMba 433 MIS - Data Warehouse
Mba 433 MIS - Data WarehouseVinita Prasad
 
Delivering Personalized Experiences using the Power of Data
Delivering Personalized Experiences using the Power of Data Delivering Personalized Experiences using the Power of Data
Delivering Personalized Experiences using the Power of Data ShiSh Shridhar
 

Similaire à Black Friday Shopping Prediction_ PPT (20)

Black Friday Shopping Prediction
Black Friday Shopping PredictionBlack Friday Shopping Prediction
Black Friday Shopping Prediction
 
Bmgt 411 week3
Bmgt 411 week3Bmgt 411 week3
Bmgt 411 week3
 
Intelli-Global Overview 051313
Intelli-Global Overview 051313Intelli-Global Overview 051313
Intelli-Global Overview 051313
 
Final presentation
Final presentationFinal presentation
Final presentation
 
INFORMATICA EASY LEARNING ONLINE TRAINING
INFORMATICA EASY LEARNING ONLINE TRAININGINFORMATICA EASY LEARNING ONLINE TRAINING
INFORMATICA EASY LEARNING ONLINE TRAINING
 
Decision Analytics: Revealing Customer Preferences and Behaviors
Decision Analytics: Revealing Customer Preferences and BehaviorsDecision Analytics: Revealing Customer Preferences and Behaviors
Decision Analytics: Revealing Customer Preferences and Behaviors
 
Is Your Marketing Database "Model Ready"?
Is Your Marketing Database "Model Ready"?Is Your Marketing Database "Model Ready"?
Is Your Marketing Database "Model Ready"?
 
Is Your Marketing Database "Model Ready"?
Is Your Marketing Database "Model Ready"?Is Your Marketing Database "Model Ready"?
Is Your Marketing Database "Model Ready"?
 
Promotion Analytics in Consumer Electronics - Module 1: Data
Promotion Analytics in Consumer Electronics - Module 1: DataPromotion Analytics in Consumer Electronics - Module 1: Data
Promotion Analytics in Consumer Electronics - Module 1: Data
 
Retail Design
Retail DesignRetail Design
Retail Design
 
6sigma1 167
6sigma1 1676sigma1 167
6sigma1 167
 
Data Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisionsData Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisions
 
Power Up Your Competitive Price Intelligence With Web Data
Power Up Your Competitive Price Intelligence With Web DataPower Up Your Competitive Price Intelligence With Web Data
Power Up Your Competitive Price Intelligence With Web Data
 
Data ware housing- Introduction to olap .
Data ware housing- Introduction to  olap .Data ware housing- Introduction to  olap .
Data ware housing- Introduction to olap .
 
uae views on big data
  uae views on  big data  uae views on  big data
uae views on big data
 
Customer analytics
Customer analyticsCustomer analytics
Customer analytics
 
Improving profitability of campaigns through data science
Improving profitability of campaigns through data scienceImproving profitability of campaigns through data science
Improving profitability of campaigns through data science
 
Mba 433 MIS - Data Warehouse
Mba 433 MIS - Data WarehouseMba 433 MIS - Data Warehouse
Mba 433 MIS - Data Warehouse
 
Delivering Personalized Experiences using the Power of Data
Delivering Personalized Experiences using the Power of Data Delivering Personalized Experiences using the Power of Data
Delivering Personalized Experiences using the Power of Data
 
Tata steel ideation contest
Tata steel ideation contestTata steel ideation contest
Tata steel ideation contest
 

Dernier

Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 

Dernier (20)

(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...

Black Friday Shopping Prediction_ PPT

  • 1. CAPSTONE PRESENTATION ON “PURCHASE PREDICTION ON BLACK FRIDAY” Submitted towards partial fulfilment of the criteria for award of PGP-DSE by GLIM Submitted By Group No. 8 [Batch: 2018-19] Group Members Arjun Thumbayil – DSEFTCJUL18006 Sahil Bansal - DSEFTCJUL18014 Shahrukh Buland Iqbal – DSEFTCJUL18042 Research Supervisor P V Subramanian
  • 2. Contents • Introduction: Background, Objective, Motivation • Dataset: Collection, Description, Pre-processing, Exploratory Data Analysis, Statistical Analysis • Feature Engineering: Data Conversion, Discretization, Polychotomization, Response/Target Transformation, Feature Creation • Modeling: Model Selection, Model Development, Model Evaluation, Model Optimization, Model in Production • Statistical Learning: Residual Analysis • Results • Future Scope: Model Deployment
  • 5. Background • The day after Thanksgiving in the U.S. is called Black Friday (BF) and serves as the traditional start to the holiday shopping season. • It is known for deep discounts (e.g., doorbusters); Black Friday shopping creates a sense of adventure, competition and urgency around getting great deals. • Although Cyber Monday is gaining popularity, Black Friday shopping remains popular because of its abundance of doorbuster deals, instant gratification, and the benefit of social shopping.
  • 6. Objective • Predicting Purchase: Build a simple machine learning model that predicts how much a customer is likely to spend on the eve of Black Friday. • Pattern Recognition: Reveal and understand which predictors, such as Age, Gender and City of Residence, most influence a customer's spending. • Quantify the impact of each revealed factor on an individual customer's Purchase, i.e., whether its contribution to the Purchase is positive or negative.
  • 7. Motivation • Black Friday sales in the US still account for a whopping $6 billion in revenue.[1] • To compete with online shopping platforms, brick-and-mortar retailers need to figure out how to boost sales during the most important shopping day of the year. • By understanding the purchase patterns of customers, retailers can provide improved service quality. • Improve staffing and inventory of the retail store. • Increase revenue and sales. [1] https://www.forbes.com/sites/andriacheng/2018/11/26/black-friday-cyber-monday-sales-are-hitting-another-high-but-its-not-time-to-cheer-yet/#6d2ac36256c6
  • 9. Dataset • Collection: • The data comes from a competition hosted by Analytics Vidhya[2]. • Description: • The dataset comprises 550,000 observations of Black Friday purchases in a retail store. • Its variables are either numeric or categorical. Two columns contain missing values: • 166,986 observations are missing in column ‘Product_Category_2’. • 373,299 observations are missing in column ‘Product_Category_3’. [2] https://www.kaggle.com/mehdidag/black-friday/home
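A minimal sketch of how per-column missing-value counts like the ones above could be tallied, using only the Python standard library. The rows are hypothetical and are not taken from the dataset; `missing_counts` is an illustrative helper name, and `None` is assumed to mark a missing value.

```python
# Hypothetical records mimicking three of the dataset's columns.
# None stands in for a missing value (an assumption of this sketch).
rows = [
    {"Product_Category_1": 3, "Product_Category_2": None, "Product_Category_3": None},
    {"Product_Category_1": 1, "Product_Category_2": 6, "Product_Category_3": 14},
    {"Product_Category_1": 12, "Product_Category_2": None, "Product_Category_3": None},
    {"Product_Category_1": 8, "Product_Category_2": 2, "Product_Category_3": None},
]


def missing_counts(records):
    """Return {column: number of records in which the column is None}."""
    counts = {}
    for record in records:
        for column, value in record.items():
            counts[column] = counts.get(column, 0) + (1 if value is None else 0)
    return counts


counts = missing_counts(rows)
print(counts)
```

With a DataFrame library the same tally is usually a one-liner, but the explicit loop makes clear what is being counted.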
  • 10. Description
      Name                   Data Type
      User ID                Integer (Discrete)
      Product ID             Categorical (Discrete)
      Gender                 Categorical (Nominal)
      Age                    Categorical (Ordinal)
      Occupation             Categorical (Nominal) [Masked]
      City_Category          Categorical (Nominal)
      Stay_In_Current_City   Categorical (Ordinal)
      Marital_Status         Categorical (Nominal)
      Product_Category_1     Categorical (Nominal) [Masked]
      Product_Category_2     Categorical (Nominal) [Masked]
      Product_Category_3     Categorical (Nominal) [Masked]
      Purchase               Integer (Continuous)
  • 11. Pre-Processing • Most of the raw data in any given dataset is unprocessed, incomplete, and noisy. • To be useful for data mining, the dataset needs to undergo pre-processing in the form of ‘Data Cleaning’ and ‘Data Transformation’. • Handling Missing Values[3]. • Handling Outliers. [3] Galit Shmueli, Nitin Patel, and Peter Bruce, Data Mining for Business Intelligence, 2nd edition, John Wiley and Sons, 2010
  • 12. Exploring Categorical Variables • Male shoppers are more frequent than Female Shoppers.
  • 13. Exploring Categorical Variables • Age bracket 18-45 shops the most.
  • 14. Exploring Categorical Variables • Top 5 Customers by Purchase: 4, 0, 7, 1, 17 • Lowest 5 Customers by Purchase: 19, 13, 18, 9, 8
  • 15. Exploring Categorical Variables • Unmarried people are more frequent shoppers.
  • 16. Exploring Categorical Variables • Top 5 Product Categories account for 82% of the items sold. • Products belonging to categories 5, 1 and 8 are most likely to be sold on
  • 20. Statistical Analysis • Univariate Statistical Analysis • Multivariate Statistics • Chi-square Test of Independence • One-Way ANOVA
  • 21. Univariate Statistical Analysis
      Parameter            Purchase (in US $)
      Mean (µ)             9333.86
      Standard Deviation   4981.02
      Median               8062
      Minimum              185
      Maximum              23961
  • 22. Multivariate Statistics: Chi-Square Test of Independence
                            AGE   CITY CATEGORY   GENDER   MARITAL STATUS   OCCUPATION   PRODUCT CATEGORY-1   STAY
      AGE
      CITY CATEGORY         YES
      GENDER                YES   YES
      MARITAL STATUS        YES   YES             YES
      OCCUPATION            YES   YES             YES      YES
      PRODUCT CATEGORY-1    YES   YES             YES      YES              YES
      STAY                  YES   YES             YES      YES              YES          YES
    • A chi-square analysis was performed for each pair of categorical variables to determine whether each category was represented across all the groups in proportion to their numbers in the sample. The analysis produced a significant χ2 value for every pair (YES in the table), indicating that the categories are not proportionally represented across groups, i.e., the variables are not independent of one another.
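The test above can be sketched as follows, assuming a small hypothetical contingency table of Gender by City_Category (the counts are invented for illustration). The helper `chi_square_statistic` is an illustrative name; it computes the Pearson statistic by hand, and the result is compared with 5.991, the 5% critical value of the chi-square distribution with 2 degrees of freedom.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows = Gender (F, M), columns = City_Category (A, B, C).
observed = [
    [30, 45, 25],
    [70, 105, 125],
]
stat, dof = chi_square_statistic(observed)

CRITICAL_VALUE_DF2_5PCT = 5.991  # chi-square critical value, 2 dof, alpha = 0.05
print(f"chi2 = {stat:.3f}, dof = {dof}, significant = {stat > CRITICAL_VALUE_DF2_5PCT}")
```

In practice a statistics library such as SciPy (`scipy.stats.chi2_contingency`) would also return the p-value; the manual version is shown only to make the expected-count calculation visible.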
  • 23. Multivariate Statistics: One-Way ANOVA • GENDER • We performed a one-way ANOVA to compare the two groups' average Purchase on the eve of Black Friday. The result was statistically significant (F(1, 9998) = 47.34, p < .05). • A post hoc Tukey test confirmed the difference between Male (µ = 9504.77) and Female (µ = 8809.76), with Male customers spending significantly more than Female customers. • CITY CATEGORY • We performed a one-way ANOVA to compare the three groups' average Purchase on the eve of Black Friday. The result was statistically significant (F(2, 9997) = 37.26, p < .05). • A post hoc Tukey test revealed significant differences between City A (µ = 8958.01), City B (µ = 9198.65) and City C (µ = 9844.44), with City C customers purchasing significantly more than those in City A and City B.
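A minimal sketch of the F statistic behind a one-way ANOVA, assuming three tiny hypothetical groups of purchase amounts (not the deck's data). `one_way_anova_f` is an illustrative helper; it returns the F value together with the between-group and within-group degrees of freedom, which is the F(df1, df2) notation used above.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical purchases (arbitrary units) for City A, City B and City C.
city_a = [8, 9, 10]
city_b = [9, 10, 11]
city_c = [11, 12, 13]
f_value, df_between, df_within = one_way_anova_f([city_a, city_b, city_c])
print(f"F({df_between}, {df_within}) = {f_value:.2f}")
```

A large F means the group means differ by more than the within-group scatter would explain; the post hoc Tukey test then identifies which pairs of groups differ.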
  • 24. Feature Engineering
      Variable                 Conversion Type
      ‘User_ID’                Used as raw feature
      ‘Product_ID’             Used as raw feature
      ‘Gender’                 Converted to binary
      ‘Age’                    Converted to numeric
      ‘Marital_Status’         Converted to binary
      ‘Occupation’             Used as raw feature
      ‘City_Category’          One-hot encoded
      ‘Stay_In_Current_City’   Converted to numeric
      ‘Product_Category_1’     Used as raw feature
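The conversions listed above could look roughly like the sketch below. The exact mappings the authors used are not given in the slides, so every mapping here is an assumption: Gender is coded M = 1 and F = 0, an age bracket such as "26-35" is replaced by its lower bound, "4+" years of stay becomes 4, and City_Category is expanded into one indicator column per city. `convert_row` is a hypothetical helper name.

```python
def convert_row(row):
    """Encode one raw record; all mappings are illustrative assumptions."""
    # "26-35" -> 26, "55+" -> 55 (lower bound of the bracket).
    age_lower_bound = int(row["Age"].rstrip("+").split("-")[0])
    # "4+" -> 4, "2" -> 2.
    stay_years = int(row["Stay_In_Current_City_Years"].rstrip("+"))
    encoded = {
        "Gender": 1 if row["Gender"] == "M" else 0,
        "Age": age_lower_bound,
        "Marital_Status": int(row["Marital_Status"]),
        "Stay_In_Current_City_Years": stay_years,
    }
    # One-hot encoding: one 0/1 column per city category.
    for city in ("A", "B", "C"):
        encoded[f"City_Category_{city}"] = int(row["City_Category"] == city)
    return encoded


raw = {
    "Gender": "F",
    "Age": "55+",
    "Marital_Status": "0",
    "City_Category": "C",
    "Stay_In_Current_City_Years": "4+",
}
encoded = convert_row(raw)
print(encoded)
```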
  • 26. Feature Engineering • Discretization • Polychotomization • Response/Target Transformation • Feature Creation: • Based on Average Feature Purchase • Based on Feature Frequency
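A sketch of the two feature-creation ideas named above, feature frequency and average feature purchase, computed per `Product_ID` on hypothetical rows. The column name `Product_ID_Counts` follows the regression output later in the deck; `Product_ID_Avg_Purchase` and the helper `add_group_features` are names assumed for this illustration.

```python
from collections import defaultdict


def add_group_features(rows, key, target="Purchase"):
    """Return copies of rows with '<key>_Counts' and '<key>_Avg_Purchase' added."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[target]
        counts[row[key]] += 1
    enriched = []
    for row in rows:
        group = row[key]
        new_row = dict(row)
        new_row[f"{key}_Counts"] = counts[group]  # feature frequency
        new_row[f"{key}_Avg_Purchase"] = totals[group] / counts[group]  # average purchase
        enriched.append(new_row)
    return enriched


# Hypothetical purchase records.
rows = [
    {"Product_ID": "P1", "Purchase": 100},
    {"Product_ID": "P1", "Purchase": 300},
    {"Product_ID": "P2", "Purchase": 50},
]
features = add_group_features(rows, "Product_ID")
print(features)
```

Because the average is derived from the target itself, such features are normally computed on the training split only and then mapped onto the validation rows, so that validation targets do not leak into the predictors.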
  • 27. Model Selection: Multiple Linear Regression • Model selection criteria: • Simple • Retains explainability • Easy to understand and Implement • Model that helps in answering important Business related Questions such as: • Is there a relationship between Purchase on Black Friday by a Customer and Predictor variables? • How strong is the relationship? • Which Predictor contributes to the Purchase on the eve of Black Friday? • How large is the effect of each predictor on Purchase? • How accurately can we predict the Purchase? • Is the relationship linear?
  • 28. Model Development • Step 1: Data Transformation • Step 2: Data division using ‘Validation Set Approach’[4] • Step 3: Model Development [4] G. James et al., An Introduction to Statistical Learning: with Applications in R, Springer Texts in Statistics, © Springer Science+Business Media New York 2013
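The validation set approach in Step 2 amounts to a single random split of the observations into a training part and a held-out part. A minimal sketch, assuming a 30% validation fraction and a fixed seed (both are assumptions of this example, not values stated in the slides); `validation_split` is an illustrative helper.

```python
import random


def validation_split(rows, validation_fraction=0.3, seed=0):
    """Randomly split rows into (training, validation) lists."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_validation = int(round(len(rows) * validation_fraction))
    validation_indices = set(indices[:n_validation])
    training = [r for i, r in enumerate(rows) if i not in validation_indices]
    validation = [r for i, r in enumerate(rows) if i in validation_indices]
    return training, validation


observations = list(range(10))  # stand-ins for data rows
training, validation = validation_split(observations)
print(len(training), len(validation))
```

The model is fitted on the training part only, and the validation part is used solely to estimate how well it predicts unseen purchases.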
  • 29. Model Evaluation • Metrics used: • RMSE • R2 • Adjusted R2
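The three metrics can be written out directly from their definitions. The sketch below uses hypothetical true and predicted values; `regression_metrics` is an illustrative helper, and `n_predictors` is the number of predictors in the model (needed only for the adjusted R2).

```python
from math import sqrt


def regression_metrics(y_true, y_pred, n_predictors):
    """Return (RMSE, R2, adjusted R2) for paired true and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    rmse = sqrt(ss_res / n)
    r2 = 1 - ss_res / ss_tot
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return rmse, r2, adjusted_r2


# Hypothetical values for illustration.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
rmse, r2, adjusted_r2 = regression_metrics(y_true, y_pred, n_predictors=1)
print(f"RMSE = {rmse:.4f}, R2 = {r2:.4f}, adjusted R2 = {adjusted_r2:.4f}")
```

The adjustment factor (n - 1) / (n - p - 1) approaches 1 when the number of observations n is much larger than the number of predictors p, which is why R2 and adjusted R2 can agree to two decimals on a large dataset.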
  • 30. Model Evaluation
      Feature engineering techniques: DC = Data Conversion, DB = Data Binning, AFP = Average Feature Purchase, FF = Feature Frequency
                                   Training Set                    Validation Set
      Regression Model             RMSE      R2     Adjusted R2    RMSE      R2     Adjusted R2
      Baseline Model               4707.53   0.11   0.11           4715.49   0.11   0.11
      Model 1 (DB)                 3888.17   0.39   0.39           3895.55   0.39   0.39
      Model 2 (AFP + FF)           4979.67   0      0              4984.44   0      0
      Model 3 (DC + FF)            2903.5    0.66   0.66           2906.65   0.66   0.66
      Model 4 (DC + AFP)           4979.71   0      0              4984.36   0      0
      Ridge Regression (Model 3)   2903.84   0.66   0.66           2906.96   0.66   0.66
      LASSO Regression (Model 3)   2928.48   0.65   0.65           2930.12   0.66   0.66
  • 31. LASSO Regression • Performs variable selection by forcing some of the coefficient estimates to be exactly zero. • Yields a simpler and more interpretable model than Ridge. • Handles multicollinearity. • Model 3 initially contained 52 variables. • After LASSO regularization, 18 variables were left.
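To illustrate how LASSO sets coefficients exactly to zero, here is a small coordinate-descent sketch that minimizes (1/(2n))·||y − Xβ||² + α·||β||₁ on hypothetical, already-centred data with no intercept. This is a teaching sketch, not the solver the authors used; the data, `alpha` and the helper names are assumptions.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the dead zone."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Cyclic coordinate descent for the LASSO objective (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, alpha) / z
    return beta


# Hypothetical centred data: y depends strongly on feature 0, weakly on feature 1.
X = [[-1.0, 0.5], [0.0, -1.0], [1.0, 0.5]]
y = [2 * x0 + 0.05 * x1 for x0, x1 in X]
beta = lasso_coordinate_descent(X, y, alpha=0.1)
print(beta)
```

The weak predictor falls inside the soft-threshold dead zone, so its coefficient is exactly zero, while the strong predictor survives with a slightly shrunken coefficient. This is the mechanism by which the variable count drops under LASSO.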
  • 32. Statistical Learning: OLS Regression Results
      Dep. Variable:      Purchase           R-squared:            0.653
      Model:              OLS                Adj. R-squared:       0.653
      Method:             Least Squares      F-statistic:          3.935e+04
      Date:               Sun, 06 Jan 2019   Prob (F-statistic):   0.00
      Time:               17:40:07           Log-Likelihood:       -3.5381e+06
      No. Observations:   376303             AIC:                  7.076e+06
      Df Residuals:       376284             BIC:                  7.076e+06
      Df Model:           18
      Covariance Type:    nonrobust
  • 33.
      Variable                     coef         std err   t          P>|t|   [0.025       0.975]
      Const                        1.134e+04    20.082    564.769    0.000   1.13e+04     1.14e+04
      Product_Category_1_10        6534.2674    50.434    129.561    0.000   6435.419     6633.116
      Product_Category_1_7         4267.2267    58.652    72.755     0.000   4152.270     4382.183
      Product_Category_1_6         2659.5272    26.192    101.541    0.000   2608.192     2710.862
      Product_Category_1_16        2026.6088    36.722    55.187     0.000   1954.634     2098.583
      Product_Category_1_15        2123.6187    45.426    46.749     0.000   2034.586     2212.651
      City_Category_C              283.9126     10.471    27.114     0.000   263.389      304.436
      Age                          10.0330      0.359     27.939     0.000   9.329        10.737
      Product_ID_Counts            2.5978       0.014     185.461    0.000   2.570        2.625
      Stay_In_Current_City_Years   7.8901       3.708     2.128      0.033   0.622        15.158
      Occupation_1                 -162.6174    17.166    -9.473     0.000   -196.262     -128.973
      Product_Category_1_3         -2811.2377   26.454    -106.270   0.000   -2863.086    -2759.389
      Product_Category_1_8         -5218.7197   13.907    -375.253   0.000   -5245.977    -5191.462
      Product_Category_1_18        -9453.6809   64.223    -147.202   0.000   -9579.555    -9327.806
      Product_Category_1_11        -7742.6858   24.644    -314.179   0.000   -7790.988    -7694.384
      Product_Category_1_5         -6633.2756   12.698    -522.406   0.000   -6658.162    -6608.389
      Product_Category_1_12        -1.122e+04   56.755    -197.758   0.000   -1.13e+04    -1.11e+04
      Product_Category_1_4         -1.045e+04   33.805    -309.155   0.000   -1.05e+04    -1.04e+04
      Product_Category_1_13        -1.191e+04   48.513    -245.426   0.000   -1.2e+04     -1.18e+04
  • 35. Residual Analysis • Non-Linearity of the Response- Predictor Relationship: • No visible pattern in the residuals.
  • 36. Residual Analysis • Heteroskedasticity: • Funnel shape is evident • Response Log-Transformed in order to achieve Homoskedasticity
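A small sketch of the response log-transformation mentioned above, applied to the minimum, median and maximum Purchase values from the univariate summary. The natural logarithm is an assumption here (the slides do not state the base); predictions made on the log scale are mapped back to dollars with the exponential.

```python
import math

# Minimum, median and maximum Purchase from the univariate summary.
purchases = [185, 8062, 23961]

# Transform the response before fitting; natural log assumed.
log_purchases = [math.log(p) for p in purchases]

# Map log-scale values back to the original scale.
recovered = [math.exp(v) for v in log_purchases]

raw_ratio = purchases[-1] / purchases[0]  # spread of the raw response
log_range = log_purchases[-1] - log_purchases[0]  # spread after the transform
print(f"max/min on raw scale: {raw_ratio:.1f}; max-min on log scale: {log_range:.2f}")
```

The transform compresses large purchase values far more than small ones, which is what tends to flatten a funnel-shaped residual plot.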
  • 38. Results • Based on Descriptive Analytics • Based on Behavioural Analytics • Based on Predictive Analytics • Based on Prescriptive Analytics
  • 39. Results • Based on Descriptive Analytics: • Male shoppers are likely to buy more products than female shoppers. • Older (40+) people are likely to spend more, irrespective of their marital status. • Customers who arrived recently in City-B and City-C are likely to shop less frequently than those who have stayed longer (acclimatization can be an issue).
  • 40. Results • Based on Behavioural Analytics: • Keeping products that are more likely to sell at the front of the store will lead to an increase in sales.[6] • Products ‘1’, ‘5’ and ‘8’ of Product_Category_1 are the highest-selling products, so they should be kept at the front of the store. [6] Fließ, Sabine & Hogreve, Jens & Nonnenmacher, Dirk. (2004). Emotional Effects of Shop Window Displays on Consumer Behavior.
  • 41. Results • Based on Predictive Analytics: • Purchase is heavily influenced by Product Category. • People aged 60+ will spend as much as $600 more than teenagers. • People belonging to Occupation-1 are likely to spend less. • Product categories with an average price over $9,000 are likely to influence Purchase positively, and vice versa. • City C customers will spend $283 more than customers from other cities.
  • 42. Results • Based on Prescriptive Analytics: • If the price of ‘Product-5’ is increased by 5%, ‘Product-1’ by 3% and ‘Product-8’ by 4%, revenue will increase by $150 million, which is higher than the combined revenue of the eight lowest-selling products.
  • 43. Future Scope: Model Deployment