2. Problem Statement
• As the Internet population grows rapidly, monitoring web users' traffic has become
important for the future business growth of e-commerce players and other retailers.
• Online shopping platforms generate large amounts of log data alongside users'
transactional data. Deciding which products to offer to target customers, and how to
acquire new customers, requires appropriate data analysis and prescriptive analytics
to turn the results into actionable insights and recommend products accordingly.
3. Objective
Build a recommendation engine for new and existing users. Each user should be
recommended the top five offers based on their likes and preferences.
4. Available Data
• The following data files are available:
– Customer details
– Seller and merchant
– Bank and payment
– Offer and category
• Event data: a text log file of users' interactions with the app, including their
preferences and likes.
16. Hypothesis Statements
• Customers who spend more than the average session time navigating the app are more
likely to buy items.
• Discount offers may have a significant impact on customers' buying patterns.
• People who bought only item1 but have similar characteristics (in terms of
variables) to people who bought both item1 and item2 are more likely to also buy item2.
• Customers whose age is at or below the population's average age are potential buyers.
• Customers who transact frequently are regular buyers.
17. Next Steps
• Feature Engineering
• Hypothesis Testing
• Model Building
• Model Evaluation
18. Feature Engineering
• Offer tenure: the difference between an offer's end date and start date.
• Age of customer
• Vintage: the period for which the customer has been active on the system, measured
from the acquisition date.
• Transaction count: the total number of unique transactions made by the customer.
• Session time: the difference between a user's login time and exit time in the
application.
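The features listed above can be sketched with Python's standard datetime module. This is a minimal illustration that assumes the dates are already parsed into date/datetime objects; the function and argument names are hypothetical, since the actual column names in the data files are not given, and measuring tenure and vintage in days is likewise an assumption.

```python
from datetime import date, datetime

# Hypothetical helpers: argument names and units (days, seconds) are assumptions,
# because the real column names in the data files are not specified.

def offer_tenure_days(start: date, end: date) -> int:
    """Offer tenure: number of days between an offer's start and end dates."""
    return (end - start).days

def age_years(birth: date, today: date) -> int:
    """Customer age in completed years as of `today`."""
    years = today.year - birth.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1
    return years

def vintage_days(acquired: date, today: date) -> int:
    """Vintage: days the customer has been on the system since acquisition."""
    return (today - acquired).days

def transaction_count(transaction_ids: list[str]) -> int:
    """Transaction count: number of unique transaction IDs for one customer."""
    return len(set(transaction_ids))

def session_seconds(login: datetime, exit_time: datetime) -> float:
    """Session time: seconds between a user's login and exit."""
    return (exit_time - login).total_seconds()

print(offer_tenure_days(date(2023, 1, 1), date(2023, 1, 31)))  # 30
print(session_seconds(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 15, 30)))  # 930.0
```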
19. Hypothesis Testing
Alternative hypothesis: customers who spend more than the average session time
navigating the app are more likely to buy items.
Null hypothesis: session time has no significant impact on buying.
Calculated variables:
a. Population mean (mu0)
b. Sample mean (x bar)
c. Population standard deviation (sigma)
d. Level of significance (alpha = 5%)
e. Sample size (n)
f. Test statistic (Z)
g. Critical value (Z critical)
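With these variables the test statistic is z = (x_bar - mu0) / (sigma / sqrt(n)). Because the alternative hypothesis is directional ("more than"), the null hypothesis is rejected when z exceeds the one-tailed critical value. The sketch below assumes that the sample is the session times of customers who made a purchase, that mu0 and sigma describe session time across all customers, and that sigma is known; the function name and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def one_tailed_z_test(sample: list[float], mu0: float, sigma: float, alpha: float = 0.05):
    """Upper-tailed one-sample Z test of H0: mean == mu0 vs H1: mean > mu0 (sigma known)."""
    n = len(sample)
    x_bar = mean(sample)
    z = (x_bar - mu0) / (sigma / sqrt(n))          # test statistic
    z_critical = NormalDist().inv_cdf(1 - alpha)   # about 1.645 when alpha = 0.05
    return z, z_critical, z > z_critical           # last value True means reject H0

# Hypothetical numbers: 36 buyers' session times (seconds), population mean 300 s, sigma 60 s.
z, z_critical, reject = one_tailed_z_test([300.0, 340.0] * 18, mu0=300.0, sigma=60.0)
print(z, round(z_critical, 3), reject)  # 2.0 1.645 True
```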
22. Collaborative Filtering
• The process of identifying similar users and
recommending what similar users like is called
collaborative filtering.
Basic assumptions and idea
• Users express preferences for items, implicitly (for example, by buying them) or
explicitly.
• Customers who had similar tastes in the past will have similar tastes in the future.
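A user-based version of this idea can be sketched in a few lines of Python. It assumes, hypothetically, that each user's history is a mapping from offer ID to transaction count, measures user-to-user similarity with cosine similarity, and scores each offer the target user has not yet used by summing other users' counts weighted by their similarity. Production systems usually restrict this to the most similar neighbours and normalise the scores; this sketch omits both.

```python
from math import sqrt

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend_user_based(history: dict[str, dict[str, float]], user: str, k: int = 5) -> list[str]:
    """Rank offers the user has not used by similarity-weighted counts of other users."""
    target = history[user]
    scores: dict[str, float] = {}
    for other, offers in history.items():
        if other == user:
            continue
        sim = cosine(target, offers)
        if sim <= 0:
            continue  # ignore users with no overlap in past behaviour
        for offer, count in offers.items():
            if offer not in target:
                scores[offer] = scores.get(offer, 0.0) + sim * count
    # Highest score first; ties broken by offer ID so the output is deterministic.
    return sorted(scores, key=lambda o: (-scores[o], o))[:k]

# Hypothetical toy data: user -> {offer_id: transaction_count}
history = {
    "u1": {"o1": 3, "o2": 1},
    "u2": {"o1": 2, "o2": 1, "o3": 4},
    "u3": {"o4": 5},
}
print(recommend_user_based(history, "u1"))  # ['o3']
```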
23. Content-Based Filtering
• A content-based recommender works with data that the user provides, either
explicitly or implicitly (for example, by clicking in the app). From that data a user
profile is built, which is then used to make suggestions to the user. As the user
provides more input or acts on the recommendations, the engine becomes increasingly
accurate.
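One simple way to build such a profile is to sum the feature vectors of the offers a user has interacted with, weighted by how often they did so, and then rank unseen offers by cosine similarity to that profile. This is a minimal sketch under stated assumptions: each offer is described by hypothetical features (here one-hot category IDs), and transaction counts serve as the implicit feedback weights.

```python
from math import sqrt

def build_profile(history: dict[str, int], features: dict[str, dict[str, float]]) -> dict[str, float]:
    """User profile: features of used offers, weighted by transaction count."""
    profile: dict[str, float] = {}
    for offer, count in history.items():
        for feat, weight in features[offer].items():
            profile[feat] = profile.get(feat, 0.0) + count * weight
    return profile

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend_content_based(history: dict[str, int], features: dict[str, dict[str, float]], k: int = 5) -> list[str]:
    """Rank offers the user has not used by similarity to the user's profile."""
    profile = build_profile(history, features)
    candidates = [o for o in features if o not in history]
    # Most similar first; ties broken by offer ID.
    return sorted(candidates, key=lambda o: (-cosine(profile, features[o]), o))[:k]

# Hypothetical offer features (one-hot category) and one user's history.
features = {
    "o1": {"cat:food": 1.0},
    "o2": {"cat:travel": 1.0},
    "o3": {"cat:food": 1.0},
    "o4": {"cat:fashion": 1.0},
}
print(recommend_content_based({"o1": 4}, features))  # ['o3', 'o2', 'o4']
```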
24. Important Terms
• Term frequency (TF): the frequency of a word in a document.
• Inverse document frequency (IDF): the inverse of a word's document frequency (the
fraction of documents that contain it) across the whole corpus, usually taken as a
logarithm so that very common words receive low weight.
• Vector space model: each item is stored as a vector of its attributes in an
n-dimensional space, and the angles between the vectors are calculated to determine
how similar the items are.
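The three terms combine as follows: each document (here, a hypothetical tokenised offer description) becomes a vector of TF-IDF weights, and two documents are compared by the cosine of the angle between their vectors. The sketch uses term count divided by document length for TF and log(N / df) for IDF, which is one common variant among several; the example tokens are made up.

```python
from collections import Counter
from math import log, sqrt

def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """One TF-IDF vector (term -> weight) per tokenised document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # TF = count / document length; IDF = log(N / df), so a term in every document gets 0.
        vectors.append({t: (c / len(doc)) * log(n / df[t]) for t, c in counts.items()})
    return vectors

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors in the vector space model."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [["cashback", "electronics"], ["cashback", "travel"], ["flight", "travel"]]
vecs = tf_idf(docs)
print(round(cosine(vecs[0], vecs[1]), 3))  # 0.245 (they share "cashback")
print(cosine(vecs[0], vecs[2]))            # 0.0 (no shared terms)
```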
25. Dataset
Training data: user ID, offer ID, category ID, transaction count, and demographic
details.
• For example, (125, 13942, 4) means that the user with ID 125 shopped with offer ID
13942 with a transaction count of 4.
• 1430 users and 1430 offers.
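Triples like (125, 13942, 4) map naturally onto a sparse user-offer structure. A dense 1430 x 1430 matrix would hold 2,044,900 cells, most of which are likely to be zero if each user has used only a few offers, so a nested dictionary is a simple alternative. The sketch assumes that repeated (user, offer) pairs should have their counts summed; that choice, and the function name, are hypothetical.

```python
from collections import defaultdict

def to_user_offer_map(triples):
    """Aggregate (user_id, offer_id, transaction_count) rows into user -> {offer -> count}."""
    matrix = defaultdict(dict)
    for user_id, offer_id, count in triples:
        # Assumption: duplicate (user, offer) rows are summed.
        matrix[user_id][offer_id] = matrix[user_id].get(offer_id, 0) + count
    return dict(matrix)

rows = [(125, 13942, 4), (125, 13942, 1), (7, 13942, 2)]
print(to_user_offer_map(rows))  # {125: {13942: 5}, 7: {13942: 2}}
```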
26. Popularity Model Result
A popularity model gives all users the same recommendations, based on the most popular
choices. We'll use the GraphLab recommender function popularity_recommender for this.
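The same idea can be sketched in plain Python for readers without GraphLab. This is not GraphLab's implementation: here popularity is assumed, hypothetically, to be an offer's total transaction count across all users, and every user receives the same top-five list.

```python
from collections import Counter

def popular_offers(triples, k: int = 5) -> list:
    """Top-k offers by total transaction count over all users; identical for every user."""
    totals = Counter()
    for _user_id, offer_id, count in triples:
        totals[offer_id] += count
    # Descending total, then offer ID so ties are broken deterministically.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [offer_id for offer_id, _ in ranked[:k]]

# Hypothetical (user_id, offer_id, transaction_count) rows.
rows = [(1, 13942, 3), (2, 13942, 1), (2, 200, 5), (3, 77, 2)]
print(popular_offers(rows))  # [200, 13942, 77]
```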
27. Item Similarity Model
Three item similarity metrics are considered:
• Jaccard similarity:
– Typically used when there is no numeric rating, only a boolean value such as a
product being bought or an ad being clicked.
• Cosine similarity:
– Similarity is the cosine of the angle between the vectors of items A and B.
– The closer the vectors, the smaller the angle and the larger the cosine.
• Pearson similarity:
– Similarity is the Pearson correlation coefficient between the two item vectors.
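The three metrics can be written directly from their definitions. In this sketch, which illustrates the formulas rather than GraphLab's implementation, an item is represented either as the set of users who bought it (for Jaccard) or as a list of per-user values aligned by user (for cosine and Pearson). Pearson correlation is computed as the cosine similarity of the mean-centred vectors, which is mathematically equivalent.

```python
from math import sqrt

def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B| for the sets of users who bought items A and B."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(x: list[float], y: list[float]) -> float:
    """Cosine of the angle between two item vectors aligned by user."""
    dot = sum(p * q for p, q in zip(x, y))
    nx = sqrt(sum(p * p for p in x))
    ny = sqrt(sum(q * q for q in y))
    return dot / (nx * ny) if nx and ny else 0.0

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation: cosine similarity of the mean-centred vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([p - mx for p in x], [q - my for q in y])

print(jaccard({"u1", "u2", "u3"}, {"u2", "u3", "u4"}))  # 0.5
print(round(cosine([1, 0, 2], [2, 0, 4]), 6))            # 1.0 (same direction)
print(round(pearson([1, 2, 3], [3, 2, 1]), 6))           # -1.0 (perfectly opposite)
```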
29. Model Evaluation
• Recall:
– The fraction of the items a user likes that were actually recommended.
– For example, if a user likes 5 items and the recommender shows 3 of them, recall is
3/5 = 0.6.
• Precision:
– Out of all the recommended items, the fraction the user actually liked.
– For example, if 5 items are recommended and the user likes 4 of them, precision is
4/5 = 0.8.
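Both metrics can be computed per user from the recommended list and the set of items the user actually liked; the sketch below reproduces the two worked examples above. The item names are placeholders, and taking the liked set from held-out interactions is an assumption about the evaluation setup.

```python
def precision_recall(recommended: list[str], liked: set[str]) -> tuple[float, float]:
    """Precision = hits / len(recommended); recall = hits / len(liked)."""
    hits = len(set(recommended) & liked)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(liked) if liked else 0.0
    return precision, recall

# Recall example: the user likes 5 items and 3 of them are recommended.
print(precision_recall(["a", "b", "c"], {"a", "b", "c", "d", "e"}))  # (1.0, 0.6)
# Precision example: 5 items are recommended and the user likes 4 of them.
print(precision_recall(["a", "b", "c", "d", "x"], {"a", "b", "c", "d"}))  # (0.8, 1.0)
```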