Low rank models for recommender systems with limited preference information
1. Low rank models for recommender systems with limited preference information
PhD student: Evgeny Frolov
Supervisor: Ivan Oseledets
PhD Thesis Defense, 19.09.2018
2. High-level structure of the thesis
The thesis consists of 3 major parts divided into chapters:
1. Overview of low rank approximation methods.
Consists of 2 chapters on matrix- and tensor-based methods in recommender systems.
2. Proposed methods.
Consists of 3 chapters devoted to implementation details and evaluation of the new methods.
3. Software.
Describes the open-source recommendation framework developed by the author of the thesis.
3. Recommender systems
Engine:
• content-based: user features, item characteristics
• collaborative filtering: neighborhood-based approach, model-based approach
• hybrid systems: broadly, ensembles and integration; narrowly, content + collaborative data
Outcome:
• a “utility” score
• a ranked list of items (the focus of my research)
4. Collaborative filtering in real applications
Uses collective information about human behavior in order to
predict individual interests.
This requires the ability to operate with millions of users and
items and manage highly dynamic online environments.
Low rank matrix- and tensor-based models are especially
suitable for this task and are widely used in industry.
5. A general view on matrix factorization
Let $A$ be the (incomplete) users × items utility matrix, where known entries are mixed with unknown ones.
Goal: find a utility (or relevance) function $f_R:\ \text{Users} \times \text{Items} \to \text{Relevance Score}$.
As an optimization problem with some loss function $\mathcal{L}$: $\mathcal{L}(A, R) \to \min$, looking for a solution in the MF form $R = PQ^T$, so that $r_{ij} = \mathbf{p}_i^T \mathbf{q}_j$ is the predicted utility of item $j$ for user $i$.
The simplest way to deal with incompleteness:
$\|W \odot (A - R)\|_F^2 \to \min$,
where $\odot$ is the Hadamard product and $W$ masks the unknowns: $w_{ij} = 1$ if $a_{ij}$ is known, $0$ otherwise.
The solution’s quality is evaluated via the top-$n$ recommendations task:
$\mathrm{rec}(i, n) = \operatorname{arg\,max}^{(n)}_{j}\, r_{ij}$,
i.e., the $n$ items with the highest predicted utility for user $i$.
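As a toy sketch (with random placeholder factors, not the thesis code), scoring and top-$n$ selection with $R = PQ^T$ might look like:

```python
import numpy as np

# Toy sketch of scoring with a factorized model R = P Q^T.
# P and Q are random placeholders here, not factors learned from data.
rng = np.random.default_rng(0)
n_users, n_items, rank = 5, 8, 3
P = rng.standard_normal((n_users, rank))  # user latent factors p_i
Q = rng.standard_normal((n_items, rank))  # item latent factors q_j

def recommend(user, n):
    """Return the n items with the highest predicted utility r_ij = p_i^T q_j."""
    scores = Q @ P[user]                   # r_ij for all items j at once
    top = np.argpartition(-scores, n)[:n]  # unordered top-n candidates
    return top[np.argsort(-scores[top])]   # ordered by score, descending

top3 = recommend(0, 3)
```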
6. Matrix factorization in practice
A simple model, called PureSVD, remains one of the top performers in certain cases*:
$\|A_0 - R\|_F^2 \to \min$,
where the unknowns are replaced with zeros in $A_0$, so standard SVD can be used!
[Figure: recommendation quality (higher is better) of the most popular collaborative filtering techniques on the Netflix data.]
*P. Cremonesi, Y. Koren, R. Turrin, “Performance of Recommender Algorithms on Top-N Recommendation Tasks”, Proceedings of the 4th ACM Conference on Recommender Systems, 2010, pp. 39-46.
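A minimal PureSVD sketch, assuming a made-up toy rating matrix, could look like:

```python
import numpy as np

# Minimal PureSVD sketch: unknowns in the rating matrix are replaced with
# zeros (A0), then a rank-r truncated SVD yields the utility matrix R.
# The tiny matrix below is made up for illustration.
A0 = np.array([
    [5., 0., 3., 0.],
    [4., 0., 0., 1.],
    [0., 2., 0., 5.],
    [1., 0., 4., 4.],
])  # zeros mark unknown ratings

r = 2
U, s, Vt = np.linalg.svd(A0, full_matrices=False)
R = (U[:, :r] * s[:r]) @ Vt[:r]   # rank-r utility matrix

# Analytic folding-in for a new user with preference vector p:
V = Vt[:r].T
p = np.array([5., 0., 0., 0.])
p_hat = V @ (V.T @ p)             # predicted utilities for all items
```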
7. PureSVD
• Benefits:
• stable, with global guarantees
• simple hyper-parameter tuning
• analytic solution for folding-in (the same works for known users as well)
• deterministic output
• highly optimized implementations
• Drawbacks:
• a bias towards higher-valued input data (user feedback)
• additional side information about users and items is ignored
The drawbacks are critical when preference information is limited:
• warm start scenarios (common to all recommender systems)
• scarce interaction data (typical in many domains, e.g., retail)
8. The main goal of the thesis work is to develop efficient low rank approximation methods that:
• inherit the key benefits of the PureSVD approach
• handle the problem of limited preference information
9. Novelty
New higher order model for a more accurate representation of user preferences:
• based on the Tucker decomposition, uses SVD as a building block;
• applicable in the warm start scenario, helps improve rating elicitation;
• a new metric and evaluation methodology are also proposed.
New SVD-based hybrid model:
• uses a generalized SVD formulation to incorporate side information about users and items;
• enjoys the benefits of the standard approach;
• learns more efficiently over scarce interaction data.
Combined hybrid tensor-based model:
• combines the previous two methods;
• addresses the weak points of its predecessors.
11. Restating the problem
Standard model (technique: matrix factorization):
$\text{User} \times \text{Item} \to \text{Rating}$
The model: $A \approx U \Sigma V^T$.
Folding-in prediction vector: $\hat{\mathbf{p}} = V V^T \mathbf{p}$, where $\mathbf{p}$ is a vector of user preferences.
Collaborative Full Feedback model, CoFFee (proposed approach; technique: tensor factorization):
$\text{User} \times \text{Item} \times \text{Rating} \to \text{Relevance Score}$, with ratings treated as cardinal values.
The model (Tucker decomposition): $\mathcal{A} \approx \mathcal{G} \times_1 U \times_2 V \times_3 W$.
Folding-in prediction matrix: $\hat{P} = V V^T P\, W W^T$, where $P$ is a matrix of user preferences.
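The CoFFee folding-in rule above can be sketched as follows; the factor matrices here are random orthonormal placeholders, not learned Tucker factors:

```python
import numpy as np

# Sketch of the CoFFee folding-in rule P_hat = V V^T P W W^T for a new user.
# V (items) and W (feedback values) are random orthonormal placeholders.
rng = np.random.default_rng(1)
n_items, n_ratings, r2, r3 = 10, 5, 4, 3
V = np.linalg.qr(rng.standard_normal((n_items, r2)))[0]    # item factors
W = np.linalg.qr(rng.standard_normal((n_ratings, r3)))[0]  # feedback factors

# Binary (item x rating) matrix encoding the new user's known feedback:
P = np.zeros((n_items, n_ratings))
P[2, 4] = 1.0   # highest rating for item 2
P[7, 0] = 1.0   # lowest rating for item 7 (negative feedback is preserved)

P_hat = V @ (V.T @ P @ W) @ W.T   # relevance of every (item, rating) pair
```

Note that the prediction is a full (item × rating) relevance matrix, which is what lets the model distinguish positive from negative feedback instead of collapsing both into a single utility score.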
12. Warm start scenario
[Figure: a new user gives negative feedback on a crime movie; traditional methods still recommend similar movies, while the proposed method predicts “opposite” preferences.]
• Standard recommendations are insensitive to negative feedback.
• Our model predicts “opposite” preferences: the user’s feedback is negative, so probably the user doesn’t like crime movies.
Key observation:
• standard algorithms as well as evaluation metrics are biased.
The necessary requirement:
• evaluation metrics should also measure how often irrelevant items are recommended.
14. Warm start with 1 negative feedback
Data: Movielens 10M.
[Figure: comparison of the models on three metrics (two “higher is better”, one “lower is better”); the proposed model is highlighted in each panel.]
15. CoFFee – summary
• Addresses warm start scenarios by more accurate feedback modeling.
• Supports quick online recommendations.
• Uses SVD as an atomic operation (HOOI algorithm).
• Offers a simple rank tuning procedure (tensor rounding).
• Improves state-of-the-art results.
• Implemented in the new open-source framework Polara.
Key feedback from industry (prudSys, Megafon, МУЛЬТиКУБИК):
• it is not enough to model user feedback only;
• side information (user data, item features) needs to be taken into account.
Moreover, the higher order model may suffer from extreme sparsity, which it amplifies.
17. The problem of scarce interactions
[Figure: an “easy” case vs. a “hard” case of inferring that two Sci-Fi movies are related from interaction data alone.]
18. New approach – HybridSVD
PureSVD is equivalent to an eigenproblem for the scaled cosine similarity matrix*:
$\mathrm{sim}(i, j) \sim a_i^T a_j$,
where $a_i$ is the $i$-th row of the rating matrix $A$.
Key idea: substitute scalar products with a bilinear form that takes side information into account:
$\mathrm{sim}(i, j) \sim a_i^T S\, a_j$.
[Figure: an example similarity matrix with ones on the diagonal and off-diagonal entries 0.5 connecting two Sci-Fi items through side information.]
*Nikolakopoulos A. N., Kalantzis V. G., Garofalakis J. D., “EIGENREC: An Efficient and Scalable Latent Factor Family for Top-N Recommendation”, 2015.
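A tiny made-up example of the key idea: two users with disjoint sets of rated items have a zero scalar product, but the bilinear form can still link them through item side information:

```python
import numpy as np

# Made-up example: rows of A are users' rating vectors over 4 items.
a_i = np.array([1., 1., 0., 0.])   # user i rated items 1-2
a_j = np.array([0., 0., 1., 1.])   # user j rated items 3-4

S = np.eye(4)
S[1, 2] = S[2, 1] = 0.5            # items 2 and 3 are similar via side data

plain = a_i @ a_j                  # no overlap: users look unrelated
hybrid = a_i @ S @ a_j             # similarity revealed through S
```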
19. HybridSVD – formal problem statement
1. Build SPD similarity matrices $K$, $S$ for users and items based on side information (e.g., user profile data, item descriptions);
2. Find the SVD of an auxiliary matrix $\hat{A}$.
From the standard SVD $A = U \Sigma V^T$:
$A A^T = U \Sigma^2 U^T$, $\quad A^T A = V \Sigma^2 V^T$.
In the hybrid formulation these become
$A S A^T = U \Sigma^2 U^T$, $\quad A^T K A = V \Sigma^2 V^T$.
The solution is known (Abdi, 2007)*: take the SVD of the auxiliary matrix
$\hat{A} \equiv K^{\frac{1}{2}} A\, S^{\frac{1}{2}} = \hat{U} \Sigma \hat{V}^T$, with $U = K^{-\frac{1}{2}} \hat{U}$, $\ V = S^{-\frac{1}{2}} \hat{V}$.
*Abdi H., “Singular Value Decomposition (SVD) and Generalized Singular Value Decomposition (GSVD)”, Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage, 2007, pp. 907-912.
20. HybridSVD computation
Proposition 1 (Frolov, 2017). The solution to the eigendecomposition problem can be obtained via the SVD of an auxiliary matrix
$\hat{A} \equiv L_K^T A L_S$,
where $L_K$ and $L_S$ are Cholesky factors of the corresponding similarity matrices, i.e. $K = L_K L_K^T$ and $S = L_S L_S^T$. The connection between the auxiliary and the original latent representations is given by
$U = L_K^{-T} \hat{U}$, $\quad V = L_S^{-T} \hat{V}$.
Proposition 2 (“hybrid” folding-in; Frolov, 2017). Given a vector $\mathbf{p}$ of a new user’s preferences, the vector of predicted preferences $\hat{\mathbf{p}}$ for that user can be estimated as
$\hat{\mathbf{p}} = L_S^{-T} \hat{V} \hat{V}^T L_S^T\, \mathbf{p}$.
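Propositions 1-2 can be sketched on synthetic data; $A$, $K$ and $S$ below are made up for illustration (a trivial user similarity and a diagonally dominant SPD item similarity):

```python
import numpy as np

# Sketch of Propositions 1-2 on synthetic data. K and S are made-up SPD
# similarity matrices; A is a random stand-in for the rating matrix.
rng = np.random.default_rng(2)
n_users, n_items, r = 6, 8, 3
A = rng.random((n_users, n_items))
K = np.eye(n_users)                                  # trivial user similarity
S = np.eye(n_items) + 0.3 * rng.random((n_items, n_items))
S = (S + S.T) / 2 + n_items * np.eye(n_items)        # symmetrize, force SPD

L_K = np.linalg.cholesky(K)          # K = L_K L_K^T
L_S = np.linalg.cholesky(S)          # S = L_S L_S^T

A_hat = L_K.T @ A @ L_S              # auxiliary matrix (Proposition 1)
U_hat, sigma, Vt_hat = np.linalg.svd(A_hat, full_matrices=False)
V_hat = Vt_hat[:r].T

# Hybrid folding-in (Proposition 2): p_hat = L_S^{-T} V_hat V_hat^T L_S^T p
p = rng.random(n_items)
p_hat = np.linalg.solve(L_S.T, V_hat @ (V_hat.T @ (L_S.T @ p)))
```

A quick sanity check of the construction: the recovered item factors $V = L_S^{-T}\hat{V}$ are $S$-orthonormal, i.e. $V^T S V = I$.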
22. HybridSVD – summary
• Addresses the extreme data sparsity problem by incorporating side information.
• Allows staying within the SVD-based computational paradigm.
• Improves state-of-the-art results.
• Generates a meaningful latent feature space based on side information.
• Supports quick online recommendations.
• The added complexity is linear w.r.t. the rank of the decomposition.
• Applicable in other machine learning areas, e.g., NLP (word embeddings).
Issue: in the case of rating data, it can lead to spurious correlations.
24. Unified view
• The proposed models address different pieces of the limited preference information problem and have their own pitfalls.
• The CoFFee model is more susceptible to the sparsity issue due to its higher order formulation.
• HybridSVD may introduce undesired spurious correlations.
Main idea: combine the previous two methods within a unified approach:
CoFFee model + HybridSVD → HybridCoFFee.
25. HybridCoFFee
Higher order generalization of Proposition 1 (Frolov, 2018). Let $K$, $S$ and $R$ be similarity matrices for users, items and feedback values respectively. Then an auxiliary tensor $\hat{\mathcal{A}}$ can be represented in the form
$\hat{\mathcal{A}} \equiv \mathcal{A} \times_1 L_K^T \times_2 L_S^T \times_3 L_R^T$,
where $L_K$, $L_S$ and $L_R$ are the corresponding Cholesky factors of the similarity matrices. The auxiliary low rank approximation problem can be solved via the Tucker decomposition of $\hat{\mathcal{A}}$. The connection between the auxiliary and the original latent representations is given by
$U = L_K^{-T} \hat{U}$, $\quad V = L_S^{-T} \hat{V}$, $\quad W = L_R^{-T} \hat{W}$.
Higher order generalization of Proposition 2 (Frolov, 2018). Given a matrix $P$ of a new user’s preferences, the matrix of predicted preferences $\hat{P}$ for that user can be estimated as
$\hat{P} = V V_S^T P\, W_R W^T$,
where $V_S = L_S \hat{V}$ and $W_R = L_R \hat{W}$.
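The higher order folding-in rule can be sketched as follows, with synthetic placeholder factors and similarity matrices (here $V_S$ is computed as $S V$, which equals $L_S \hat{V}$ for the auxiliary factor $\hat{V} = L_S^T V$, and likewise $W_R = R W$):

```python
import numpy as np

# Sketch of the higher order folding-in rule P_hat = V V_S^T P W_R W^T.
# Factors and similarity matrices below are synthetic placeholders.
rng = np.random.default_rng(3)
n_items, n_ratings, r2, r3 = 12, 5, 4, 2
V = np.linalg.qr(rng.standard_normal((n_items, r2)))[0]    # item factors
W = np.linalg.qr(rng.standard_normal((n_ratings, r3)))[0]  # feedback factors

S = np.eye(n_items)   # item similarity (identity keeps the sketch simple)
# Rating-value similarity: adjacent rating values are considered close.
R = np.eye(n_ratings) + 0.3 * (np.eye(n_ratings, k=1) + np.eye(n_ratings, k=-1))

V_S = S @ V           # V_S = S V
W_R = R @ W           # W_R = R W

P = np.zeros((n_items, n_ratings))
P[0, n_ratings - 1] = 1.0          # one known (item, rating) pair
P_hat = V @ (V_S.T @ P @ W_R) @ W.T
```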
27. HybridCoFFee summary
• Combines the CoFFee model with the HybridSVD approach.
• An efficient computational scheme based on a hybrid modification of the standard HOOI algorithm is proposed.
• Inherits the benefits of its predecessors and at the same time compensates for their shortcomings.
• Potentially applicable to a wider class of problems: context-aware, multi-criteria, etc. Naturally addresses the context vs. content dichotomy.
Directions for future research:
• not feasible when the number of dimensions is greater than 4;
• more appropriate tensor formats (TT/HT) can be used.
29. Polara framework
Open-source project: https://github.com/evfro/polara
• Written in Python, boosted by highly efficient scientific packages.
• Designed for quick prototyping and comprehensive evaluation of models.
• Also used for educational purposes.
[Figure: attention from the community (GitHub forks and stars) growing from Fall 2016 to Fall 2018.]
A software registration patent is under revision.
30. Polara framework
“RecSys for Humans”:
• fast and easy to use;
• customizable evaluation pipeline;
• feature-rich and extensible;
• actively developed;
• supports Implicit, MyMediaLite, GraphLab (Turi).
31. Conclusions
• A new method for proper modelling of user feedback is proposed.
• It better handles both positive and negative user feedback,
• may improve user experience during the rating elicitation phase or in a general warm start scenario,
• and is based on the Tucker decomposition, expanding the PureSVD approach to higher order cases.
• The second new method uses a generalized formulation of SVD to fuse side information with collaborative data.
• It handles cases of extreme data sparsity while maintaining high quality of recommendations,
• and is also suitable for the cold start regime.
• The third proposed method combines the previous two within a unified approach.
• An efficient optimization technique that takes the specific structure of the problem into account is provided.
• The method demonstrates all the advantages of its predecessors and at the same time does not suffer from their shortcomings.
• All three methods use SVD as an atomic operation, which maintains scalability and makes them suitable for online settings.
• The methods have minimal requirements for optimal hyper-parameter search.
• A new feature-rich and easy-to-use recommendation framework is developed.
32. Publications
• “Fifty Shades of Ratings: How to Benefit from Negative Feedback in Top-N Recommendations Tasks”, Evgeny Frolov and Ivan Oseledets. Proceedings of the 10th ACM Conference on Recommender Systems, 2016, pp. 91-98.
• “Tensor Methods and Recommender Systems”, Evgeny Frolov and Ivan Oseledets. WIREs Data Mining and Knowledge Discovery, 2017, vol. 7, issue 3.
• “Matrix Factorization in Collaborative Filtering”, Evgeny Frolov and Ivan Oseledets, chapter in the book Collaborative Recommendations: Algorithms, Practical Challenges and Applications, to be published by World Scientific Publishing Co. Pte. Ltd. in August 2018.
To be published:
• “HybridSVD: When Collaborative Information Is Not Enough”, Evgeny Frolov and Ivan Oseledets.
• “Revealing the Unobserved by Linking Collaborative Behavior and Side Knowledge”, Evgeny Frolov and Ivan Oseledets.
33. Conferences and talks
Conferences
• Tensor Decompositions and Applications, TDA 2016, Leuven, Belgium
• 10th ACM Conference on Recommender Systems, 2016, MIT, Boston, USA
• SIAM Conference on Applied Algebraic Geometry (AG17), 2017, Georgia Tech, Atlanta, USA
Talks
• 3rd Budapest RecSys & Personalization Meetup, November 3, 2016, Budapest, Hungary, organized by Gravity
(http://www.gravityrd.com/)
• INM RAS, regular seminar, November 2016, Moscow
• IITP, seminar on structural learning, November 2016, Moscow
• Matrix Methods and Applications, MMA 2015, Moscow, Skoltech
• Sberbank AI Laboratory, weekly seminar, April 2018, Moscow
• Computer Science seminar at Yandex, September 2016, Moscow
• Technoprom-2017 Forum, Novosibirsk (http://forumtechnoprom.com/)
Special pre-defense seminars
• Young scientists’ seminar of the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences (ФИЦ ИУ РАН)
• Special seminar of the Department of Control and Applied Mathematics, Moscow Institute of Physics and Technology (ФУПМ МФТИ)