2. Dylan Valerio
Software engineer > data scientist > Kaggler > academic
BS CS, ADMU; MS CS, UP Diliman
IT Security, e-commerce, internet technologies
Natural language processing, computer vision, deep learning, recommendation systems
3. (Yes, this is my to-read list)
I like collecting this art for my tabletop games. Ask and answer away on Quora.
“Recommendation is invaluable for companies with content and users of all sizes. It boosts engagement and loyalty to the brand.”
Mendeley is a site for researchers and their references
The Spotify Mix automatically crafts recommendations from your favorite music.
Productivity-buster!
4. There is a deluge of content for users:
• Amazon has more than 500M products in the US and an estimated 65M Amazon Prime users.
• Netflix has 130M subscribers and 8,000 movies and TV shows.
• Spotify has 180M users and 30M songs.
• Pinterest has 70M active users, 50B pins, and 1B boards.
• Quora has some 11M questions and 30M answers.
That’s a lot of content. Recommendation is an absolute must for the user to even begin consuming content.
5. Different Paradigms of Recommendation
Content Filtering: a watched item leads to recommendations with similar tags (e.g., crime, Robert De Niro, dark, mob); a tag-overlap sketch follows the bullets below.
• Pros: Readily explainable; fast
• Cons: Stale and unchanging
Collaborative Filtering: items watched by users similar to me are recommended.
• Pros: More interesting for users
• Cons: Items with no usage (cold-start)
• No free lunch
• It’s a quickly growing field with vast literature and domain-specific nuances
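As a minimal sketch of the content-filtering side, tag overlap (Jaccard similarity) can rank items against one the user watched. All titles and tags below are made-up illustrations, not from any real catalog.

# Minimal content-filtering sketch: rank items by tag overlap (Jaccard).
# Titles and tags are illustrative stand-ins.
tags = {
    "Goodfellas":   {"crime", "robert de niro", "dark", "mob"},
    "Casino":       {"crime", "robert de niro", "mob", "vegas"},
    "Finding Nemo": {"animation", "family", "ocean"},
}

def jaccard(a, b):
    # Size of the intersection over size of the union of two tag sets.
    return len(a & b) / len(a | b)

watched = "Goodfellas"
scores = {title: jaccard(tags[watched], t)
          for title, t in tags.items() if title != watched}
print(sorted(scores.items(), key=lambda kv: -kv[1]))  # Casino ranks first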
18. Memory-Based Models
User-based K-Nearest Neighbor recommendation (similarity between users)
Intuition: find the items most enjoyed by the users closest to me, in terms of what they watch.
Item-based K-Nearest Neighbor recommendation (similarity between items)
Intuition: find the items closest to the items I enjoyed, in terms of the users that enjoyed both.
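A common choice for both similarities is cosine similarity over the ratings matrix: compare rows for user-user, columns for item-item. A minimal numpy sketch with a made-up matrix:

import numpy as np

# Toy ratings matrix: rows are users, columns are items (0 = unrated).
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5]], dtype=float)

def cosine(a, b):
    # Cosine similarity between two rating vectors.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(R[0], R[1]))        # user-user: compare rows
print(cosine(R[:, 0], R[:, 3]))  # item-item: compare columns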
19. K-Nearest Neighbor Recommenders
User-Based
For each user u:
    neighbors <- get closest users to u
    new_items <- items rated by neighbors that u has not rated before
    For each new_item i and neighbor v:
        accumulate weighted_scores[i] <- similarity(u, v) * rating(v, i)
    Normalize and sort

Item-Based
For each user u:
    my_items <- u’s rated items
    close_items <- items close to my_items
    For each close_item i and my_item j:
        accumulate weighted_scores[i] <- similarity(i, j) * rating(u, j)
    Normalize and sort
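A runnable rendering of the user-based procedure, assuming cosine similarity and a toy ratings matrix; k and the data are illustrative choices, not the talk’s exact setup.

import numpy as np

def user_based_recommend(R, u, k=2):
    # Score items u hasn't rated using the k closest users (cosine similarity).
    norms = np.linalg.norm(R, axis=1)
    sims = R @ R[u] / (norms * norms[u] + 1e-9)  # similarity of every user to u
    sims[u] = -np.inf                            # exclude u itself
    neighbors = np.argsort(sims)[-k:]            # k closest users
    scores = np.zeros(R.shape[1])
    weights = np.zeros(R.shape[1])
    for v in neighbors:                          # accumulate weighted scores
        rated = R[v] > 0
        scores[rated] += sims[v] * R[v, rated]
        weights[rated] += sims[v]
    scores = np.divide(scores, weights,          # normalize
                       out=np.zeros_like(scores), where=weights > 0)
    scores[R[u] > 0] = -np.inf                   # keep only unrated items
    return np.argsort(scores)[::-1]              # sort & serve

R = np.array([[5, 3, 0, 1],
              [4, 0, 4, 1],
              [1, 1, 5, 5],
              [0, 1, 5, 4]], dtype=float)
print(user_based_recommend(R, u=0))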
20. MovieLens 20M
[Screenshots: My Top Ratings and My Recommendations]
20M ratings, 138k users, 26k movies, 99.46% zero entries
21. My Own User-Based Recommender
• Get closest users
• Get items I haven’t rated
• For each neighbor and new item, compute the weighted score
• Normalize
• Sort & serve
23. Goals of Recommendation
Machine Learning Metrics
• Minimize the difference of ratings
• Rank the recommendation list
Business Metrics
• Click-through rate
• Customer conversion rates
24. Evaluating a Good Recommender
We take out a fraction of watches from each user, then compare our predicted recommendations against the items actually watched. (A precision/recall sketch follows the metric lists below.)
Error-Based Metrics
• RMSE, MAE
Ranking Metrics
• Precision
• Recall
• Normalized Discounted Cumulative Gain
Other metrics
• Diversity
• Novelty
• Serendipity
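As a sketch of the ranking metrics, precision@k and recall@k can be computed against a user’s held-out items. The recommendation and held-out lists below are made up.

def precision_recall_at_k(recommended, held_out, k=5):
    # Precision: fraction of top-k recs that were held-out watches.
    # Recall: fraction of held-out watches recovered in the top k.
    hits = len(set(recommended[:k]) & set(held_out))
    return hits / k, hits / len(held_out)

recs = ["m1", "m7", "m3", "m9", "m4"]  # predicted ranking
held_out = ["m3", "m4", "m8"]          # watches hidden for evaluation
print(precision_recall_at_k(recs, held_out, k=5))  # (0.4, 0.667)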
25. Surprise Library
Cross-validation: train on 2/3 of the data, test on the remaining 1/3, repeat 3 times.
Root Mean Squared Error (RMSE): the square root of the mean of the squared differences between predicted and actual ratings.
Surprise stands for Simple Python RecommendatIon System Engine.
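A minimal Surprise sketch matching that setup: 3-fold cross-validation of a user-based KNN model, reporting RMSE and MAE. It uses the library’s built-in ML-100k download as a stand-in for the 20M set.

from surprise import Dataset, KNNBasic
from surprise.model_selection import cross_validate

# Built-in MovieLens 100k (downloaded on first use); stands in for 20M.
data = Dataset.load_builtin("ml-100k")

# User-based KNN with cosine similarity, mirroring the recommender above.
algo = KNNBasic(sim_options={"name": "cosine", "user_based": True})

# 3-fold CV: train on ~2/3, test on ~1/3, three times; report error metrics.
cross_validate(algo, data, measures=["RMSE", "MAE"], cv=3, verbose=True)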
26. Extension to User-Based Recommender
• Get closest users
• Get items I haven’t rated
• For each neighbor and new item, compute the weighted score, taking into account the mean of how I and others rate (sketched below)
• Normalize
• Sort & serve
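A sketch of that mean-centered score: predict u’s rating as u’s own mean plus the similarity-weighted deviations of neighbors from their means. The toy matrix and similarity values are illustrative.

import numpy as np

def predict_mean_centered(R, u, i, neighbors, sims):
    # rhat(u,i) = mean_u + sum_v sim(u,v)*(r(v,i) - mean_v) / sum_v |sim(u,v)|
    mean_u = R[u, R[u] > 0].mean()
    num = den = 0.0
    for v in neighbors:
        if R[v, i] > 0:                     # neighbor v actually rated item i
            mean_v = R[v, R[v] > 0].mean()  # v's own rating mean
            num += sims[v] * (R[v, i] - mean_v)
            den += abs(sims[v])
    return mean_u + num / den if den > 0 else mean_u

R = np.array([[5, 3, 0, 1],
              [4, 0, 4, 1],
              [1, 1, 5, 5]], dtype=float)
sims = {1: 0.8, 2: 0.3}  # illustrative precomputed similarities to user 0
print(predict_mean_centered(R, u=0, i=2, neighbors=[1, 2], sims=sims))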
29. Collaborative Filtering : Matrix Factorization
• Latent factors describe the structure of the data beyond the noise
• There are two latent-factor matrices, one for the users and one for the items; together they approximate the ratings matrix (a toy factorization sketch follows this slide).
• Can “recover” the missing values in the ratings matrix
[Diagram: ratings matrix ≈ user latent matrix × item latent matrix]
• Surprise covers SVD, which uses explicit ratings
• Implicit covers Weighted Matrix Factorization (WMF), which uses implicit ratings
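To illustrate the “recovery” of missing entries, a toy matrix-factorization sketch: fit two small latent matrices by stochastic gradient descent on the observed ratings only. This is plain SGD for illustration, not the exact algorithms inside Surprise or Implicit.

import numpy as np

rng = np.random.default_rng(0)
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
observed = R > 0              # fit only where a rating exists

k, lr, reg = 2, 0.01, 0.1     # latent factors, learning rate, regularization
U = rng.normal(scale=0.1, size=(R.shape[0], k))  # user latent matrix
V = rng.normal(scale=0.1, size=(R.shape[1], k))  # item latent matrix

for _ in range(5000):         # SGD over observed entries only
    for u, i in zip(*np.nonzero(observed)):
        err = R[u, i] - U[u] @ V[i]
        u_old = U[u].copy()
        U[u] += lr * (err * V[i] - reg * U[u])
        V[i] += lr * (err * u_old - reg * V[i])

print(np.round(U @ V.T, 1))   # zero entries are now filled-in predictions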
30. Steam Games
200k ratings, 12k users, 5k games, 99.68% zero entries
You Might Like… Why? https://itstherealdyl.wordpress.com/2017/07/30/you-might-like-why/
31. Implicit Library
Data preparation: convert the ratings matrix to sparse format. Sparse format can accommodate BIG datasets.
Model: 100-dimension latent matrix, 0.1 regularization, 20 iterations, 4 threads (sketched below).
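A hedged sketch of that setup with the implicit library, assuming the >= 0.5 API where fit() takes a user-item CSR matrix. The random interactions stand in for the real Steam data.

import scipy.sparse as sp
from implicit.als import AlternatingLeastSquares

# Stand-in for the Steam data: random sparse user-item interactions
# (12k users x 5k games, ~0.32% nonzero, matching the slide's shape).
user_items = sp.random(12_000, 5_000, density=0.0032,
                       format="csr", random_state=42)

# Weighted matrix factorization via ALS with the slide's settings.
model = AlternatingLeastSquares(factors=100, regularization=0.1,
                                iterations=20, num_threads=4)
model.fit(user_items)

# Top-10 recommendations for user 0, excluding items they already have.
ids, scores = model.recommend(0, user_items[0], N=10)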
32. Implicit Library
Bookkeeping: in sparse format the original IDs are lost, so keep mappings between matrix indices and the original user and item IDs.
Model explainability: the similarity of item i to j, weighted by how much the user enjoyed “i”.
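Two fragments continuing the sketch above: explicit index-to-ID mappings for the bookkeeping, and implicit’s explain() for the weighted-similarity breakdown (assumed to follow the >= 0.5 API; the IDs are made up).

# Bookkeeping: sparse matrices index by position, so keep mappings
# between matrix indices and the original IDs (made-up examples here).
user_ids = ["u_1001", "u_1002"]               # row index -> user ID
game_ids = ["dota_2", "portal_2", "stardew"]  # column index -> game ID
user_to_idx = {uid: i for i, uid in enumerate(user_ids)}
game_to_idx = {gid: i for i, gid in enumerate(game_ids)}

# Explainability: decompose user 0's score for item 3 into contributions
# from items they already interacted with, weighted by how much they
# enjoyed each one.
score, contributions, _ = model.explain(0, user_items, itemid=3)
for item_idx, contribution in contributions:
    print(item_idx, contribution)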