Slides of the presentation given at IIR 2016 for the following extended abstract:
Daniel Valcarce, Javier Parapar, Álvaro Barreiro: Computing Neighbourhoods with Language Models in a Collaborative Filtering Scenario. IIR 2016, Venice, Italy.
http://dx.doi.org/10.1007/978-3-319-30671-1_45
Computing Neighbourhoods with Language Models in a Collaborative Filtering Scenario [IIR '16 Slides]
1. IIR 2016, VENEZIA, ITALY
COMPUTING NEIGHBOURHOODS WITH LANGUAGE MODELS IN A COLLABORATIVE FILTERING SCENARIO
Daniel Valcarce, Javier Parapar, Álvaro Barreiro
@dvalcarce @jparapar @AlvaroBarreiroG
Information Retrieval Lab
@IRLab_UDC
University of A Coruña
Spain
2. Outline
1. Introduction to Recommender Systems
2. Neighbourhood-based Methods
3. Computing Neighbourhoods
4. Language Models for Neighbourhoods
5. Experiments
6. Conclusions and Future Directions
5. Recommender Systems
Recommender systems provide personalised suggestions for items that may be of interest to the users.
Top-N Recommendation: create a ranking of the N most relevant items for each user.
Different approaches:
Content-based: exploit item descriptions to recommend items similar to those the target user liked in the past.
Collaborative filtering: rely on user feedback, such as ratings or clicks, to generate recommendations.
Hybrid: a combination of content-based and collaborative filtering approaches.
7. Collaborative Filtering
Collaborative Filtering (CF) exploits feedback from users:
Explicit: ratings or reviews.
Implicit: clicks or purchases.
Two main families of CF methods:
Model-based: learn a model from the data and use it for recommendation.
Neighbourhood-based (or memory-based): compute recommendations directly from a subset of the ratings.
9. Neighbourhood-based Methods
Two perspectives:
User-based: recommend items liked by users who share your interests.
Item-based: recommend items similar to those you liked.
Similarity between items is computed from the users the items have in common (not their content!).
11. Weighted Sum Recommender (WSR)
A very simple but effective approach (Valcarce et al., ECIR 2016).
WSR computes a weighted sum of the ratings in the neighbourhood. Weights are calculated using cosine similarity.
Item-based version (WSR-IB), where $J_i$ is the neighbourhood of item $i$:
$$\hat{r}_{u,i} = \sum_{j \in J_i} \mathrm{cosine}(i, j)\, r_{u,j} \qquad (1)$$
User-based version (WSR-UB), where $V_u$ is the neighbourhood of user $u$:
$$\hat{r}_{u,i} = \sum_{v \in V_u} \mathrm{cosine}(u, v)\, r_{v,i} \qquad (2)$$
The computation of neighbourhoods is crucial!
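To make the user-based scorer (Eq. 2) concrete, here is a minimal Python sketch on a toy rating matrix. The matrix, the fixed neighbourhood, and the function names are our illustration, not the authors' code; Eq. (1) is the symmetric item-based case.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two rating vectors (0 = unrated)."""
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / norm if norm > 0 else 0.0

def wsr_ub(R, u, i, neighbourhood):
    """WSR-UB (Eq. 2): score item i for user u as a cosine-weighted
    sum of the neighbours' ratings for i."""
    return sum(cosine(R[u], R[v]) * R[v, i] for v in neighbourhood)

# Toy rating matrix: rows = users, columns = items, 0 = unrated.
R = np.array([[5., 3., 0., 1.],
              [4., 0., 4., 1.],
              [1., 1., 0., 5.],
              [0., 1., 5., 4.]])

# Predicted score of item 2 for user 0 with users 1-3 as neighbourhood V_u.
print(wsr_ub(R, u=0, i=2, neighbourhood=[1, 2, 3]))
```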
14. Computing Neighbourhoods with the k-NN Algorithm
The effectiveness of neighbourhood-based methods relies largely on how neighbours are computed.
The most common approach is to compute the k nearest neighbours (k-NN algorithm) using a pairwise similarity.
The most common similarities are Pearson's correlation coefficient and cosine similarity.
Cosine provides important improvements over Pearson's correlation coefficient (Cremonesi et al., RecSys 2010).
Let's study cosine similarity from the perspective of Information Retrieval.
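For reference, a straightforward sketch of the k-NN step under cosine similarity (names are ours; a production system would add pruning or indexing):

```python
import numpy as np

def k_nearest_neighbours(R, u, k):
    """Return the k users most similar to user u under cosine similarity."""
    norms = np.linalg.norm(R, axis=1) + 1e-12   # avoid division by zero
    sims = (R @ R[u]) / (norms * norms[u])      # cosine of u against all users
    sims[u] = -np.inf                           # never pick u as its own neighbour
    return np.argsort(sims)[-k:][::-1]          # indices of the k most similar users
```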
18. Cosine Similarity and the Vector Space Model

Recommendation    Information Retrieval
Target user       Query
Rest of users     Documents
Items             Terms

Under this scheme, using cosine similarity for finding neighbours is equivalent to searching in the Vector Space Model.
If we swap users and items, we can derive an analogous item-based approach, as the snippet below shows.
We can use sophisticated search techniques for finding neighbours!
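Under this analogy, the k-NN sketch above is literally a Vector Space Model search with the target user's rating vector as the query, and the item-based swap is just a transpose of the rating matrix. Reusing the hypothetical helper and toy matrix from the earlier snippets:

```python
# User-based: users are "documents", the target user's ratings are the "query".
user_neighbours = k_nearest_neighbours(R, u=0, k=2)

# Item-based: transpose R so that items become "documents" and users "terms".
item_neighbours = k_nearest_neighbours(R.T, 2, 2)   # neighbourhood J_i of item 2
```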
22. Language Models
Statistical language models are a state-of-the-art framework for document retrieval.
Documents are ranked according to their posterior probability given the query:
$$p(d|q) = \frac{p(q|d)\, p(d)}{p(q)} \overset{rank}{=} p(q|d)\, p(d)$$
The query likelihood, p(q|d), is based on a unigram model, where $c(t,q)$ is the count of term $t$ in the query:
$$p(q|d) = \prod_{t \in q} p(t|d)^{c(t,q)}$$
The document prior, p(d), is usually considered uniform.
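As a concrete illustration, a minimal query-likelihood scorer with Dirichlet-prior smoothing, one standard smoothing choice in the language modelling framework. This is a sketch: the function names, the tokenised inputs, and the value mu=2000 are our assumptions.

```python
import math
from collections import Counter

def log_query_likelihood(query, doc, collection, mu=2000):
    """log p(q|d) under a unigram model with Dirichlet-prior smoothing:
    p(t|d) = (c(t,d) + mu * p(t|C)) / (|d| + mu)."""
    c_d, c_col = Counter(doc), Counter(collection)
    doc_len, col_len = sum(c_d.values()), sum(c_col.values())
    score = 0.0
    for t, c_tq in Counter(query).items():
        p_tc = c_col[t] / col_len                   # collection model p(t|C)
        p_td = (c_d[t] + mu * p_tc) / (doc_len + mu)
        if p_td > 0:                                # skip terms unseen in the collection
            score += c_tq * math.log(p_td)          # exponent c(t,q) becomes a factor
    return score
```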
25. Language Models for Finding Neighbourhoods (II)
User-based collaborative filtering: rank candidate neighbours $v$ for the target user $u$, where $I_u$ is the set of items rated by $u$:
$$p(v|u) \overset{rank}{=} p(v) \prod_{i \in I_u} p(i|v)^{r_{u,i}}$$
We assume a multinomial distribution over the count of ratings.
The maximum likelihood estimate (MLE) is:
$$p_{mle}(i|v) = \frac{r_{v,i}}{\sum_{j \in I_v} r_{v,j}}$$
However, it suffers from sparsity. We need smoothing!
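A minimal sketch of this neighbour scoring with Jelinek-Mercer smoothing (the idea behind the LM-JM runs in the tables that follow). The uniform prior p(v), the value of lambda, and all names are our assumptions, not the authors' implementation.

```python
import numpy as np

R = np.array([[5., 3., 0., 1.],    # toy rating matrix: rows = users,
              [4., 0., 4., 1.],    # columns = items, 0 = unrated
              [1., 1., 0., 5.],
              [0., 1., 5., 4.]])

def log_p_neighbour(R, u, v, lam=0.5):
    """Rank-equivalent log p(v|u) with a uniform prior p(v) and Jelinek-Mercer
    smoothing: p(i|v) = (1 - lam) * p_mle(i|v) + lam * p(i|C)."""
    p_mle = R[v] / R[v].sum()              # p_mle(i|v) = r_vi / sum_j r_vj
    p_col = R.sum(axis=0) / R.sum()        # collection model p(i|C)
    p = (1 - lam) * p_mle + lam * p_col
    rated = np.nonzero(R[u])[0]            # I_u: items rated by the target user
    return float((R[u, rated] * np.log(p[rated])).sum())

# Score every other user as a candidate neighbour of user 0.
scores = {v: log_p_neighbour(R, 0, v) for v in range(1, 4)}
```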
31. Precision (nDCG@10)

Algorithm     ML 100k      ML 1M        R3-Yahoo!    LibraryThing
NNCosNgbr     0.1427       0.1042       0.0138       0.0550
PureSVD       0.3595a      0.3499ac     0.0198a      0.2245a
Cosine-WSR    0.3899ab     0.3430a      0.0274ab     0.2476ab
LM-DP-WSR     0.4017abc    0.3585abc    0.0271ab     0.2464ab
LM-JM-WSR     0.4013abc    0.3622abcd   0.0276ab     0.2537abcd

Table: Values of precision in terms of normalised discounted cumulative gain at 10. Statistical significance is superscripted (Wilcoxon two-sided p < 0.01). Pink = best algorithm. Blue = not significantly different to the best.
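For readers who want to reproduce the metric, a small nDCG@10 sketch; the paper's exact gain and discount variant is not stated here, so this uses the common linear-gain, log2-discount form:

```python
import math

def ndcg_at_k(gains, k=10):
    """nDCG@k for one user: ranked relevance gains discounted by log2 of the
    rank, normalised by the ideal (descending) ordering of the same gains."""
    def dcg(g):
        return sum(x / math.log2(r + 2) for r, x in enumerate(g[:k]))
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```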
32. Diversity (Gini@10)

Algorithm     ML 100k    ML 1M     R3-Yahoo!    LibraryThing
Cosine-WSR    0.0549     0.0400    0.0902       0.1025
LM-DP-WSR     0.0659     0.0435    0.1557       0.1356
LM-JM-WSR     0.0627     0.0435    0.1034       0.1245

Table: Values of the complement of the Gini index at 10. Pink = best algorithm.
33. Novelty (MSI@10)

Algorithm     ML 100k    ML 1M      R3-Yahoo!    LibraryThing
Cosine-WSR    11.0579    12.4816    21.1968      41.1462
LM-DP-WSR     11.5219    12.8040    25.9647      46.4197
LM-JM-WSR     11.3921    12.8417    21.7935      43.5986

Table: Values of novelty in terms of Mean Self-Information at 10. Pink = best algorithm.
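A sketch of the novelty metric under the usual definition of self-information over item popularity (rarer items are more novel); the paper's exact estimator may differ:

```python
import math

def msi_at_k(recommended, popularity, n_users, k=10):
    """Mean Self-Information at k: average -log2 of each recommended item's
    popularity, taken as the fraction of users who rated it. Assumes every
    recommended item has been rated at least once."""
    return sum(-math.log2(popularity[i] / n_users)
               for i in recommended[:k]) / k
```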
35. Conclusions
Statistical language models are a powerful tool for computing neighbourhoods in a collaborative filtering scenario. Combined with WSR, language models:
Provide highly accurate recommendations.
Improve novelty and diversity figures compared to cosine.
Have low computational complexity.
36. Future work
Explore other probability distributions:
Multivariate Bernoulli.
Multivariate Poisson.
Evaluate the use of inverted indexes to compute
neighbourhoods:
Efficiency.
Scalability.