Recommender Systems

Usman Sharif

RECOMMENDATION SYSTEMS

Why recommendation systems?

• Provide a better experience to your users.
• Understand the behavior and patterns of users.
• Create an opportunity to re-engage inactive users.
• Boost sales.
• Often more useful than a search feature alone.

How some companies are using
Recommendation Systems - Amazon

How some companies are using
Recommendation Systems - Gmail

A simple recommendation system

• Consider the following scenario:
  • A library has books and members.
  • Members can have books issued to them.
  • The library wants to build a recommender system to recommend books to its members.

Scoring Matrices
Book 1 Book 2 Book 3 Book 4
User 1 X X
User 2 X
User 3 X X
User 4 X X X
User 5 X X

        Book 1  Book 2  Book 3  Book 4
Book 1       4       1       2       1
Book 2       1       2       0       1
Book 3       2       0       2       1
Book 4       1       1       1       2

Using the scoring matrices

• If a user has read Book 1, recommend Book 3 first (score 2 in the Book 1 row), then Books 2 and 4 (score 1 each).
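
A minimal sketch of how the book-by-book matrix can be derived from the user-by-book matrix and then used for the lookup above. The column positions of the X marks are not recoverable from the first table, so the `READ` sets below are an assumed placement, chosen only because it reproduces the book-by-book counts shown. The names `cooccurrence` and `recommend` are illustrative.

```python
# Assumed user -> books-read sets (hypothetical placement of the X marks,
# chosen so that the resulting counts match the book-by-book matrix).
READ = {
    "User 1": {1, 3},
    "User 2": {1},
    "User 3": {1, 2},
    "User 4": {1, 3, 4},
    "User 5": {2, 4},
}
BOOKS = [1, 2, 3, 4]


def cooccurrence(read, books):
    """For each pair of books, count how many users read both."""
    return {
        a: {b: sum(1 for s in read.values() if a in s and b in s) for b in books}
        for a in books
    }


def recommend(matrix, book):
    """Other books, sorted by descending score in the row of `book`."""
    others = [b for b in matrix[book] if b != book]
    # Ties are broken by book number, which is an arbitrary choice.
    return sorted(others, key=lambda b: (-matrix[book][b], b))


matrix = cooccurrence(READ, BOOKS)
print(matrix[1])             # row for Book 1
print(recommend(matrix, 1))  # books to suggest after Book 1
```

The diagonal of the matrix is simply the number of users who read each book, so it is skipped when ranking.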

Advantages

• Very simple to understand and implement.
• Works well when recommendations are driven by a single user activity, such as one book the user has read.

Disadvantages

• Cannot work for a new user with no history.
• In a real-world scenario with thousands of books and thousands of members, there are bound to be too many zeroes (a sparse matrix).
• Considers only one item at a time.

Another Try
• Our Books records might look like this:

BookId  Title                   Genre          Writer              Language
1       The Great Gatsby        Classic        F Scott Fitzgerald  English
2       Nine Stories            Short Stories  J D Salinger        English
3       The Sun Also Rises      Classic        Ernest Hemingway    English
4       The Hunger Games        Action         Suzanne Collins     English
5       The Ambler Warning      Thriller       Robert Ludlum       English
6       The Catcher in the Rye  Classic        J D Salinger        English
7       To Kill a Mockingbird   Classic        Harper Lee          English

Create an Item Similarity Matrix

        Book 1  Book 2  Book 3  Book 4  Book 5  Book 6  Book 7
Book 1       3       1       2       1       1       2       2
Book 2       1       3       1       1       1       2       1
Book 3       2       1       3       1       1       2       2
Book 4       1       1       1       3       1       1       1
Book 5       1       1       1       1       3       1       1
Book 6       2       2       2       1       1       3       2
Book 7       2       1       2       1       1       2       3

• This is always a square (n × n) matrix.
• Each cell holds the number of attributes the two books share (Genre, Writer, Language), excluding unique attributes such as BookId and Title.
• In general, any measure of similarity can be used here.
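
A small sketch of how such a matrix could be computed from the Books records, counting shared values of Genre, Writer and Language. The function names are illustrative, and counting exact matches is only one possible similarity measure.

```python
# Non-unique attributes per book: (Genre, Writer, Language).
BOOKS = {
    1: ("Classic", "F Scott Fitzgerald", "English"),
    2: ("Short Stories", "J D Salinger", "English"),
    3: ("Classic", "Ernest Hemingway", "English"),
    4: ("Action", "Suzanne Collins", "English"),
    5: ("Thriller", "Robert Ludlum", "English"),
    6: ("Classic", "J D Salinger", "English"),
    7: ("Classic", "Harper Lee", "English"),
}


def shared_attributes(a, b):
    """Number of attribute positions where two books have the same value."""
    return sum(x == y for x, y in zip(a, b))


def similarity_matrix(books):
    """Square matrix (as nested dicts) of shared-attribute counts."""
    return {i: {j: shared_attributes(books[i], books[j]) for j in books}
            for i in books}


def most_similar(matrix, book_id, top_n=3):
    """The top_n other books with the highest similarity to book_id."""
    others = [b for b in matrix[book_id] if b != book_id]
    return sorted(others, key=lambda b: (-matrix[book_id][b], b))[:top_n]


SIM = similarity_matrix(BOOKS)
print(SIM[1])                # row for Book 1
print(most_similar(SIM, 3))  # books closest to Book 3
```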

To Recommend

• Look at what a user has previously read.
• Use the values from the similarity matrix and recommend books based on how similar they are to the books the user has already read.
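
One way to turn these two steps into code is sketched below. The slides do not say how to combine scores when a user has read several books, so summing the similarity values is an assumption, as are the function name and the tie-breaking by book number.

```python
# Item similarity matrix copied from the slide
# (row i, column j correspond to Book i+1 and Book j+1).
SIM = [
    [3, 1, 2, 1, 1, 2, 2],
    [1, 3, 1, 1, 1, 2, 1],
    [2, 1, 3, 1, 1, 2, 2],
    [1, 1, 1, 3, 1, 1, 1],
    [1, 1, 1, 1, 3, 1, 1],
    [2, 2, 2, 1, 1, 3, 2],
    [2, 1, 2, 1, 1, 2, 3],
]


def recommend_from_history(sim, read_books, top_n=3):
    """Rank unread books by summed similarity to the books already read."""
    scores = {}
    for candidate in range(1, len(sim) + 1):
        if candidate in read_books:
            continue  # never recommend something the user has already read
        scores[candidate] = sum(sim[r - 1][candidate - 1] for r in read_books)
    return sorted(scores, key=lambda b: (-scores[b], b))[:top_n]


print(recommend_from_history(SIM, {2}))     # history: Book 2 only
print(recommend_from_history(SIM, {3, 7}))  # history: Books 3 and 7
```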

Advantages

• Recommendations can be pre-computed for a very large item base.
• Fast lookups can be built to serve recommendations.
• For example, if a user is viewing the page of Book 3, you may want to recommend Books 1, 6 and 7.
• Works for new or non-registered users.

Disadvantage

• Does not consider the user’s history.
• Instead, looks at a collective trend.

Another Approach - The Users

• Our Users records might look like this:

UserId  Gender  Age  Location
1       Male    34   Pakistan
2       Female  28   Pakistan
3       Male    38   India
4       Male    32   India

The User Borrowing
UserId BookId
1 3
1 7
2 2
3 1
3 5
3 7
4 6
4 7
5 2
6 4
6 6
6 7

Transforming User Borrowing

        User 1  User 2  User 3  User 4  User 5  User 6
Book 1       -       -       X       -       -       -
Book 2       -       X       -       -       X       -
Book 3       X       -       -       -       -       -
Book 4       -       -       -       -       -       X
Book 5       -       -       X       -       -       -
Book 6       -       -       -       X       -       X
Book 7       X       -       X       X       -       X

• Issue with too many zero values (the cells marked -).
• Any solutions?

Transform the Users Records

• Treat Age as a discrete column with ranges like {0-10, 11-20, 21-30, 31-40, …} so that users can be grouped into partitions like these:

PartitionId  Gender  AgeGroup  Location
1            Male    31-40     Pakistan
2            Female  21-30     Pakistan
3            Male    31-40     India
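
The transformation could be sketched as follows. The helper names are illustrative, and numbering partitions in order of first appearance is an assumption that happens to match the PartitionId values above.

```python
# Users records from the earlier slide: (UserId, Gender, Age, Location).
USERS = [
    (1, "Male", 34, "Pakistan"),
    (2, "Female", 28, "Pakistan"),
    (3, "Male", 38, "India"),
    (4, "Male", 32, "India"),
]


def age_group(age):
    """Map an age onto the ranges 0-10, 11-20, 21-30, 31-40, ..."""
    if age <= 10:
        return "0-10"
    low = ((age - 1) // 10) * 10 + 1
    return f"{low}-{low + 9}"


def assign_partitions(users):
    """Give each distinct (gender, age group, location) its own partition id."""
    partitions = {}      # (gender, age group, location) -> partition id
    user_partition = {}  # user id -> partition id
    for user_id, gender, age, location in users:
        key = (gender, age_group(age), location)
        if key not in partitions:
            partitions[key] = len(partitions) + 1
        user_partition[user_id] = partitions[key]
    return partitions, user_partition


PARTITIONS, USER_PARTITION = assign_partitions(USERS)
print(PARTITIONS)
print(USER_PARTITION)
```

Users 3 and 4 differ in age but fall into the same age range, so they share a partition.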

Recreate User Borrowing using Partition Information

• Fewer zero-valued cells (11/21 compared to 30/42 previously).
• Far fewer columns than we previously had!
• The notation has been changed from ‘X’ to a count.

        Partition 1  Partition 2  Partition 3
Book 1            0            0            1
Book 2            0            2            0
Book 3            1            0            0
Book 4            0            1            0
Book 5            0            0            1
Book 6            0            1            1
Book 7            1            1            2

To Recommend

• Determine which partition the user belongs to.
• Look at the column of that partition and sort the books in descending order of their frequency count.
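
A sketch of the aggregation and lookup, using the borrowing records from the earlier slide. Users 5 and 6 do not appear in the Users table; placing them in Partition 2 is an assumption inferred from the partition counts. The function names are illustrative.

```python
from collections import Counter, defaultdict

# (UserId, BookId) pairs from "The User Borrowing".
BORROWED = [(1, 3), (1, 7), (2, 2), (3, 1), (3, 5), (3, 7),
            (4, 6), (4, 7), (5, 2), (6, 4), (6, 6), (6, 7)]

# Users 1-4 follow the Users table. Users 5 and 6 in Partition 2 is an
# assumption inferred from the counts, not something the slides state.
USER_PARTITION = {1: 1, 2: 2, 3: 3, 4: 3, 5: 2, 6: 2}


def partition_counts(borrowed, user_partition):
    """Count borrowings of each book within each partition."""
    counts = defaultdict(Counter)
    for user_id, book_id in borrowed:
        counts[user_partition[user_id]][book_id] += 1
    return counts


def recommend_for_user(user_id, counts, user_partition, top_n=3):
    """Books from the user's partition, most frequently borrowed first."""
    column = counts[user_partition[user_id]]
    return sorted(column, key=lambda b: (-column[b], b))[:top_n]


COUNTS = partition_counts(BORROWED, USER_PARTITION)
print(recommend_for_user(4, COUNTS, USER_PARTITION))
print(recommend_for_user(2, COUNTS, USER_PARTITION))
```

The slides do not say whether books the user has already borrowed should be filtered out, so this sketch leaves them in.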

Advantages

• Continues to improve over time.
• More partitions can be added over time.
• Instead of using a collective scoring, the technique partitions the user base into ‘similar’ users.
• The technique can easily be extended to the item side: rather than having books as rows, we can have book clusters.

Disadvantages

• Needs some seed data to start.
• Requires some transformations.
• Can become very complex as the number of users/items grows.

Evaluating Performance (Metrics)

• Almost any Information Retrieval metric can be used.
• Three interesting ones:
  • Accuracy
  • Coverage
  • Normalized Distance Based Performance Measure (NDPM)

Accuracy
• Takes into account the order in which recommendations are shown to users and how they responded to them.
• For rank position = 1:
  • Acc(1) = (number of positive responses with rank ≤ 1) / (total recommendations with rank ≤ 1)
  • Therefore, Acc(1) = 1 / 3 = 33.33%
• Similarly, Acc(2) = 2 / 6 = 33.33%

UserId  BookId  Rank  Response
1       3       1     Yes
1       2       2     No
2       7       1     No
2       5       2     Yes
3       3       1     No
3       7       2     No
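
The calculation above can be sketched as a short function over the response log. The tuple layout and the name `accuracy_at` are illustrative choices.

```python
# (UserId, BookId, Rank, positive response?) rows from the table above.
LOG = [
    (1, 3, 1, True),
    (1, 2, 2, False),
    (2, 7, 1, False),
    (2, 5, 2, True),
    (3, 3, 1, False),
    (3, 7, 2, False),
]


def accuracy_at(log, k):
    """Share of recommendations with rank <= k that got a positive response."""
    shown = [row for row in log if row[2] <= k]
    if not shown:
        return 0.0  # nothing shown at this rank; returning 0 is an assumption
    return sum(1 for row in shown if row[3]) / len(shown)


print(f"Acc(1) = {accuracy_at(LOG, 1):.2%}")
print(f"Acc(2) = {accuracy_at(LOG, 2):.2%}")
```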

Coverage
• Shows the coverage of items that appear in the recommendations for all users.
• For rank position = 1:
  • Cov(1) = (unique items in recommendations with rank ≤ 1) / (total items)
  • Therefore, Cov(1) = 2 / 7 = 28.57%
• Similarly, Cov(2) = 4 / 7 = 57.14%

UserId  BookId  Rank  Response
1       3       1     Yes
1       2       2     No
2       7       1     No
2       5       2     Yes
3       3       1     No
3       7       2     No
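
A matching sketch for coverage, assuming the catalogue holds the 7 books from the Books records. The name `coverage_at` and the tuple layout are illustrative.

```python
# (UserId, BookId, Rank) rows from the table above; responses are not needed.
SHOWN = [(1, 3, 1), (1, 2, 2), (2, 7, 1), (2, 5, 2), (3, 3, 1), (3, 7, 2)]
TOTAL_ITEMS = 7  # number of books in the catalogue


def coverage_at(shown, k, total_items):
    """Share of the catalogue that appears in recommendations with rank <= k."""
    unique_items = {book_id for _, book_id, rank in shown if rank <= k}
    return len(unique_items) / total_items


print(f"Cov(1) = {coverage_at(SHOWN, 1, TOTAL_ITEMS):.2%}")
print(f"Cov(2) = {coverage_at(SHOWN, 2, TOTAL_ITEMS):.2%}")
```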

Normalized Distance Based Performance Measure (NDPM)
• Assesses the quality of a recommendation system, taking into account the order in which items are shown.
• NDPM = (C- + 0.5 × C+) / Cu
  • C- is the number of recommended item pairs where the user responded (No, Yes), that is, No to the higher-ranked item and Yes to the lower-ranked one.
  • C+ is the number of recommended item pairs where the user responded (Yes, No).
  • Cu is the number of all item pairs where the user’s responses were not the same.
• In our example (the number in parentheses is the UserId):
  • C-(1) = 2, C+(1) = 2 and Cu(1) = 4 => NDPM(1) = (2 + 0.5 × 2) / 4 = 75%
  • C-(2) = 0, C+(2) = 1 and Cu(2) = 1 => NDPM(2) = (0 + 0.5 × 1) / 1 = 50%
  • NDPM = (0.75 + 0.5) / 2 = 62.5%

UserId  BookId  Rank  Response
1       3       1     Yes
1       2       2     No
1       7       3     No
1       5       4     Yes
2       3       1     Yes
2       7       2     No
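
The pair counting can be sketched as below. It follows the formula exactly as given on this slide; published formulations of NDPM may define the terms differently. The names and the handling of a user with no differing pairs are assumptions.

```python
from itertools import combinations

# Responses per user, listed in rank order (True = Yes), from the table above.
RESPONSES = {
    1: [True, False, False, True],
    2: [True, False],
}


def ndpm_for_user(responses):
    """(C- + 0.5 * C+) / Cu over all rank-ordered pairs, as defined on the slide."""
    c_minus = c_plus = 0
    # combinations() keeps list order, so `earlier` always outranks `later`.
    for earlier, later in combinations(responses, 2):
        if not earlier and later:
            c_minus += 1   # (No, Yes) pair
        elif earlier and not later:
            c_plus += 1    # (Yes, No) pair
    c_u = c_minus + c_plus  # pairs whose responses differ
    if c_u == 0:
        return 0.0  # undefined on the slide; returning 0 is an assumption
    return (c_minus + 0.5 * c_plus) / c_u


per_user = {user: ndpm_for_user(r) for user, r in RESPONSES.items()}
overall = sum(per_user.values()) / len(per_user)
print(per_user)
print(f"NDPM = {overall:.1%}")
```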

How to improve results

• Maintain a list of recommendations each user has already seen, and don’t show them again for some time.
• Give users a way to tell you what they’re looking for.
• Infer the above from user searches.

Some standard algorithms
• Item Hierarchy
  • You bought a printer, you will also need ink.
• Attribute-based recommendations
  • You like reading classics, written by Salinger, you might like “Catcher in the Rye”.
• Collaborative Filtering – User-User Similarity
  • People like you who read “The Hunger Games” also read “The Ambler Warning”.
• Collaborative Filtering – Item-Item Similarity
  • You like “Catcher in the Rye” so you will like “Nine Stories”.
• Social + Interest Graph Based
  • Your friends like “The Great Gatsby” so you will like “The Great Gatsby” too.
• Model Based
  • Training SVM, LDA, SVD for implicit features.

Some Tools

• Apache Mahout (Java)
• Crab (Python)
• Easyrec (RESTful API)

Thank you!

www.usman-sharif.com
@sharif_usman
