Slides from my talk at the RecSys Stammtisch at SoundCloud in Berlin. The presentation is split into two parts: one focusing on ranking and relevance, and one on diversity and how to achieve it using genres. We introduce a novel diversity metric called Binomial Diversity.
3. Telefonica Research in Barcelona
• User Modeling: Recommender Systems
• Data Mining, Machine Learning
• Multimedia Indexing and Analysis
• HCI
• Mobile Computing
• Systems and Networking
• http://www.tid.es
11. Publications in Ranking
CIKM 2013: GAPfm: Optimal Top-N Recommendations for Graded Relevance Domains
RecSys 2013: xCLiMF: Optimizing Expected Reciprocal Rank for Data with Multiple Levels of Relevance
RecSys 2012: CLiMF: Learning to Maximize Reciprocal Rank with Collaborative Less-is-More Filtering * Best Paper Award
SIGIR 2012: TFMAP: Optimizing MAP for Top-N Context-aware Recommendation
Machine Learning Journal, 2008: Improving Maximum Margin Matrix Factorization
* Best Machine Learning Paper Award at ECML PKDD 2008
RecSys 2010: List-wise Learning to Rank with Matrix Factorization for Collaborative Filtering
NIPS 2007: CoFiRank - Maximum Margin Matrix Factorization for Collaborative Ranking
13. Popular Ranking Methods
• In order to generate the ranked item list, we need some relative utility score for each item
• Popularity is the obvious baseline
• The score could depend on the user (personalized)
• The score could also depend on the other items in the list (list-wise)
• One popular way to rank items in RS is to sort them according to the rating prediction
• Works for domains with ratings
• Wastes modeling power on irrelevant items
16. Matrix Factorization (for ranking)
• Randomly initialize item vectors
• Randomly initialize user vectors
• While not converged:
• Compute the rating prediction error
• Update user factors
• Update item factors
• Let's say the user vector is [-100, -100]
• Compute the squared error:
• (5 − ⟨[-100, -100], [0.180, 0.19]⟩)² = 1764
• Update the user and item vectors in the direction where the error is reduced (according to the gradient of the loss)
(Figure: 8 items with ratings and random factors)
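The update loop above can be sketched in a few lines of stochastic gradient descent; the toy ratings, learning rate, and factor dimensions below are illustrative assumptions, not values from the talk.

```python
import numpy as np

# Sketch of the slide's SGD loop for matrix factorization; the toy
# ratings, learning rate, and factor count are illustrative assumptions.
rng = np.random.default_rng(0)
n_users, n_items, n_factors = 4, 8, 2
U = rng.normal(scale=0.1, size=(n_users, n_factors))  # user vectors
V = rng.normal(scale=0.1, size=(n_items, n_factors))  # item vectors

ratings = [(0, 1, 5.0), (1, 2, 4.0), (2, 5, 2.0), (3, 0, 1.0)]  # (user, item, rating)
lr = 0.05

for epoch in range(200):              # "while not converged"
    for u, i, r in ratings:
        err = r - U[u] @ V[i]         # rating prediction error
        U[u] += lr * err * V[i]       # update user factors
        V[i] += lr * err * U[u]       # update item factors

print(round(float(U[0] @ V[1]), 2))   # prediction now close to the 5.0 rating
```

Each observed rating pulls both vectors along the gradient of the squared error, exactly the loop the bullets describe.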
19. Learning to Rank for Top-k RecSys
• Usually we care about accurate ranking and not rating prediction
• Squared error loss optimizes to accurately predict 1s and 5s
• RS should get the top items right -> ranking problem
• Why not learn how to rank directly?
• Learning to Rank methods provide up to 30% performance improvements in off-line evaluations
• It is possible, but a more complex task
20. Example: average precision (AP)
• AP: we compute the precision at each relevant position and average them

AP = (1/|S|) · Σ_{k=1}^{|S|} P(k)

Example with relevant items at ranks 1, 2, and 4:
AP = (P@1 + P@2 + P@4) / 3 = (1/1 + 2/2 + 3/4) / 3 = 0.92
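The computation on the slide is easy to check in code; this minimal sketch assumes a list described only by the 1-based ranks of its relevant items.

```python
# Minimal average-precision sketch: precision at each relevant rank,
# averaged over the relevant items (ranks are 1-based).
def average_precision(relevant_ranks):
    precisions = [hits / rank for hits, rank in enumerate(sorted(relevant_ranks), start=1)]
    return sum(precisions) / len(relevant_ranks)

print(round(average_precision([1, 2, 4]), 2))  # (1/1 + 2/2 + 3/4) / 3 = 0.92
```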
21. Why is it hard? Non-Smoothness
Example: AP
(Figure: AP for user vectors u = [-20, -20] and u = [20, 20])
23. The Non-Smoothness of Average Precision

AP = (1/|S|) · Σ_{k=1}^{|S|} P(k)

AP_m = (1 / Σ_{i=1}^{N} y_mi) · Σ_{i=1}^{N} (y_mi / r_mi) · Σ_{j=1}^{N} y_mj · I(r_mj ≤ r_mi)

y_mi is 1 if item i is relevant for user m and 0 otherwise
I(·) is the indicator function (1 if its argument is true, 0 otherwise)
r_mi is the rank of item i for user m
24. How can we get a smooth-AP?
• We replace the non-smooth parts of MAP with a smooth approximation

1/r_mi ≈ g(f_mi) = g(⟨U_m, V_i⟩),  with g(x) = 1/(1 + e^{−x})
25. How can we get a smooth-MAP?
• We replace the non-smooth parts of MAP with a smooth approximation

I(r_mj ≤ r_mi) ≈ g(f_mj − f_mi) = g(⟨U_m, V_j − V_i⟩),  with g(x) = 1/(1 + e^{−x})
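Both replacements reduce to the same sigmoid g applied to model scores; a small numeric sketch, where the user and item vectors are made-up values:

```python
import numpy as np

# Smooth stand-ins from slides 24-25: 1/r_mi ~ g(<U_m, V_i>) and
# I(r_mj <= r_mi) ~ g(<U_m, V_j - V_i>); all vectors here are made up.
def g(x):
    return 1.0 / (1.0 + np.exp(-x))

U_m = np.array([0.5, -0.2])              # assumed user factors
V = np.array([[1.0, 0.3],                # assumed item factors
              [0.2, -0.8]])

f = V @ U_m                              # scores f_mi = <U_m, V_i>
recip_rank_approx = g(f)                 # smooth 1/r_mi, always in (0, 1)
pairwise_approx = g(f[1] - f[0])         # smooth I(r_m1 <= r_m0)

print(recip_rank_approx, float(pairwise_approx))
```

Because g is differentiable everywhere, the approximated AP can be optimized by gradient methods, which the exact rank-based AP cannot.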
28. Ranking Inconsistencies
• Achieving a perfect ranking for all users is not possible
• Two Sources of Inconsistencies:
• 1) Factor Models (all models) have limited expressive
power and cannot learn the perfect ranking for all users
• 2) Ranking function approximations are inconsistent, e.g. A > B and B > C but C > A
43. Binomial Diversity
• We base a new Diversity Metric on the Binomial
Distribution
P(X = k) = C(N, k) · p^k · (1 − p)^{N − k}
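The probability mass function above is a one-liner to sanity-check; here N would be the length of the recommendation list and p a genre's probability (the numbers below are made up).

```python
from math import comb

# P(X = k) = C(N, k) * p^k * (1 - p)^(N - k)
def binom_pmf(k, N, p):
    return comb(N, k) * p**k * (1 - p)**(N - k)

# e.g. probability that a genre with p = 0.1 never appears in a 10-item list
print(round(binom_pmf(0, 10, 0.1), 3))  # 0.9**10, about 0.349
```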
44. User Genre Relevance
• Fraction of items of genre "g" the user interacted with:

p''_g = k_g^{I_u} / |I_u|

• Global fraction of items of genre "g":

p'_g = (Σ_u k_g^{I_u}) / (Σ_u |I_u|)

• Mix of the two:

p_g = (1 − α) · p'_g + α · p''_g
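A sketch of the mix with made-up genre counts; alpha, the counts, and the genre names are all assumptions for illustration.

```python
# Blend of per-user and global genre fractions:
# p_g = (1 - alpha) * p'_g + alpha * p''_g
user_counts = {"rock": 6, "jazz": 2}                    # k_g over this user's items
global_counts = {"rock": 500, "jazz": 300, "pop": 200}  # summed over all users
alpha = 0.5                                             # assumed mixing weight

n_user = sum(user_counts.values())
n_global = sum(global_counts.values())

def p_g(genre):
    p_local = user_counts.get(genre, 0) / n_user   # p''_g, personal taste
    p_glob = global_counts[genre] / n_global       # p'_g, global popularity
    return (1 - alpha) * p_glob + alpha * p_local

print(round(p_g("rock"), 3))  # 0.5 * 0.5 + 0.5 * 0.75 = 0.625
```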
45. Coverage
• Product of the probabilities of the genres not represented in the list not being picked at random:

Coverage(R) = Π_{g ∉ G(R)} P(X_g = 0)^{1/|G|}
46. Non-Redundancy
P(X_g ≥ k | X_g > 0) = 1 − Σ_{l=1}^{k−1} P(X_g = l | X_g > 0)

NonRed(R) = Π_{g ∈ G(R)} P(X_g ≥ k_g^R | X_g > 0)^{1/|G(R)|}
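Putting the two slides together: the overall Binomial Diversity score combines Coverage and Non-Redundancy; this sketch multiplies them, with made-up genre probabilities and an example list.

```python
from math import comb

# Sketch of Coverage and Non-Redundancy under the binomial model;
# the genre probabilities p and the example list are made-up values.
def pmf(k, N, p):
    return comb(N, k) * p**k * (1 - p)**(N - k)

def coverage(genres_in_list, p, all_genres):
    N = len(genres_in_list)
    absent = [g for g in all_genres if g not in set(genres_in_list)]
    score = 1.0
    for g in absent:                   # genres missing from the list
        score *= pmf(0, N, p[g]) ** (1 / len(all_genres))
    return score

def non_redundancy(genres_in_list, p):
    N = len(genres_in_list)
    present = set(genres_in_list)
    score = 1.0
    for g in present:
        k_g = genres_in_list.count(g)  # occurrences of genre g in the list
        p_pos = 1 - pmf(0, N, p[g])    # P(X_g > 0)
        # P(X_g >= k_g | X_g > 0) = 1 - sum_{l=1}^{k_g - 1} P(X_g = l | X_g > 0)
        tail = 1 - sum(pmf(l, N, p[g]) for l in range(1, k_g)) / p_pos
        score *= tail ** (1 / len(present))
    return score

p = {"rock": 0.5, "jazz": 0.3, "pop": 0.2}
rec = ["rock", "rock", "jazz", "rock", "rock"]
diversity = coverage(rec, p, p) * non_redundancy(rec, p)
print(round(diversity, 3))
```

A rock-heavy list is penalized twice: Coverage drops because "pop" is missing even though chance alone would likely include it, and Non-Redundancy drops because four rock items exceed what the binomial model expects.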