Title: Factorization Machines
Abstract:
Developing accurate recommender systems for a specific problem setting seems to be a complicated and time-consuming task: models have to be defined, learning algorithms derived, and implementations written. In this talk, I present the factorization machine (FM) model, a generic factorization approach that can be adapted to new problems by feature engineering. Efficient FM learning algorithms are discussed, among them SGD, ALS/CD, and MCMC inference including automatic hyperparameter selection. I will show on several tasks, including the Netflix prize and KDDCup 2012, that FMs are flexible and achieve highly competitive accuracy. With FMs, these results can be obtained with simple data preprocessing and without any tuning of regularization parameters or learning rates.
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
Steffen Rendle, Research Scientist, Google at MLconf SF
1. Outline: Factorization Models & Polynomial Regression · Factorization Machines · Applications · Summary
Factorization Machines
Steffen Rendle
Current affiliation: Google Inc.
Work was done at University of Konstanz
MLConf, November 14, 2014
Steffen Rendle 1 / 53
3. Matrix Factorization
Example data (user × movie rating matrix):

  User \ Movie   TI   NH   SW   ST   ...
  A               5    3    1    ?   ...
  B               ?    ?    4    5   ...
  C               1    ?    5    ?   ...
  ...            ...  ...  ...  ...  ...

Matrix Factorization:
  Ŷ := W Hᵀ,  W ∈ ℝ^{|U|×k},  H ∈ ℝ^{|I|×k}
  ŷ(u,i) = ŷ_{u,i} = Σ_{f=1}^k w_{u,f} h_{i,f} = ⟨w_u, h_i⟩
k is the rank of the reconstruction.
5. Matrix Factorization Extensions
Example data: the same user × movie rating matrix (with an additional time dimension for the temporal models).
Examples for models:
  ŷ_MF(u,i) := Σ_{f=1}^k v_{u,f} v_{i,f} = ⟨v_u, v_i⟩
  ŷ_SVD++(u,i) := ⟨ v_u + Σ_{j ∈ N(u)} v_j , v_i ⟩
  ŷ_Fact-KNN(u,i) := (1 / |R(u)|) Σ_{j ∈ R(u)} r_{u,j} ⟨v_i, v_j⟩
  ŷ_timeSVD(u,i,t) := ⟨ v_u + v_{u,t} , v_i ⟩
  ŷ_timeTF(u,i,t) := Σ_{f=1}^k v_{u,f} v_{i,f} v_{t,f}
  ...
8. Tensor Factorization
Example data: triples of (subject, predicate, object).
Examples for models:
  ŷ_PARAFAC(s,p,o) := Σ_{f=1}^k v_{s,f} v_{p,f} v_{o,f}
  ŷ_PITF(s,p,o) := ⟨v_s, v_p⟩ + ⟨v_s, v_o⟩ + ⟨v_p, v_o⟩
  ...
[illustration from Drumond et al. 2012]
9. Sequential Factorization Models
Example data: per-user sequences of baskets B_{t-3}, ..., B_{t-1}, B_t (figure: basket histories of four users, with the next item to predict marked "?").
Examples for models:
  ŷ_FMC(u,i,t) := Σ_{l ∈ B_{t-1}} ⟨v_i, v_l⟩
  ŷ_FPMC(u,i,t) := ⟨v_u, v_i⟩ + Σ_{l ∈ B_{t-1}} ⟨v_i, v_l⟩
  ...
10. Factorization Models: Discussion
Advantages:
- Can estimate interactions between two (or more) variables even if the cross is not observed.
- E.g. user × movie, current product × next product, user × query × url, ...
Downsides:
- Factorization models are usually built specifically for each problem.
- Learning algorithms and implementations are tailored to individual models.
14. Data and Variable Representation
Many standard ML approaches work with real-valued feature vectors as input. This representation can express, e.g.:
- any number of variables
- categorical domains, using dummy indicator variables
- numerical domains
- set-categorical domains, using dummy indicator variables
Using this representation allows a wide variety of standard models to be applied (e.g. linear regression, SVM, etc.).
15. Linear Regression
- Let x ∈ ℝ^p be an input vector with p predictor variables.
- Model equation:
    ŷ(x) := w_0 + Σ_{i=1}^p w_i x_i
- Model parameters:
    w_0 ∈ ℝ,  w ∈ ℝ^p
  O(p) model parameters.
16. Polynomial Regression
- Let x ∈ ℝ^p be an input vector with p predictor variables.
- Model equation (degree 2):
    ŷ(x) := w_0 + Σ_{i=1}^p w_i x_i + Σ_{i=1}^p Σ_{j>i} w_{i,j} x_i x_j
- Model parameters:
    w_0 ∈ ℝ,  w ∈ ℝ^p,  W ∈ ℝ^{p×p}
  O(p²) model parameters.
18. Representation: Matrix/Tensor vs. Feature Vectors
Matrix/tensor data can be represented by feature vectors:

  #  User     Movie         Rating
  1  Alice    Titanic       5
  2  Alice    Notting Hill  3
  3  Alice    Star Wars     1
  4  Bob      Star Wars     4
  5  Bob      Star Trek     5
  6  Charlie  Titanic       1
  7  Charlie  Star Wars     5
  ...

Each row becomes a sparse feature vector x with one indicator block per variable (User: A, B, C, ...; Movie: TI, NH, SW, ST, ...) and a target y:

          User           Movie             Target
          A  B  C  ...   TI NH SW ST ...   y
  x(1):   1  0  0  ...   1  0  0  0  ...   y(1) = 5
  x(2):   1  0  0  ...   0  1  0  0  ...   y(2) = 3
  x(3):   1  0  0  ...   0  0  1  0  ...   y(3) = 1
  x(4):   0  1  0  ...   0  0  1  0  ...   y(4) = 4
  x(5):   0  1  0  ...   0  0  0  1  ...   y(5) = 5
  x(6):   0  0  1  ...   1  0  0  0  ...   y(6) = 1
  x(7):   0  0  1  ...   0  0  1  0  ...   y(7) = 5
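The one-hot encoding shown on this slide can be sketched in a few lines of Python; all helper names here are mine, not from the talk:

```python
# Sketch: encode (user, movie, rating) tuples as one-hot feature vectors,
# one indicator block per categorical variable, as on the slide.

ratings = [
    ("Alice", "Titanic", 5), ("Alice", "Notting Hill", 3),
    ("Alice", "Star Wars", 1), ("Bob", "Star Wars", 4),
    ("Bob", "Star Trek", 5), ("Charlie", "Titanic", 1),
    ("Charlie", "Star Wars", 5),
]

# Assign each categorical level its own indicator column.
users = sorted({u for u, _, _ in ratings})
movies = sorted({m for _, m, _ in ratings})
p = len(users) + len(movies)   # total number of predictor variables

def encode(user, movie):
    """Return the dense one-hot feature vector x for one (user, movie) pair."""
    x = [0.0] * p
    x[users.index(user)] = 1.0                 # user indicator block
    x[len(users) + movies.index(movie)] = 1.0  # movie indicator block
    return x

X = [encode(u, m) for u, m, _ in ratings]
y = [float(r) for _, _, r in ratings]

print(X[0])  # the Alice/Titanic row has exactly two non-zero entries
```

In practice such vectors would be stored sparsely (only the non-zero indices), which is what makes the efficient FM computation discussed later possible.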
21. Application to Sparse Feature Vectors
Applying regression models to this one-hot encoded data leads to:
- Linear regression: ŷ(x) = w_0 + w_u + w_i
- Polynomial regression: ŷ(x) = w_0 + w_u + w_i + w_{u,i}
- Matrix factorization: ŷ(u,i) = ⟨w_u, h_i⟩
25. Application to Sparse Feature Vectors
For the data of the example:
- Linear regression has no user-item interaction.
  ⇒ Linear regression is not expressive enough.
- Polynomial regression includes pairwise interactions but cannot estimate them from the data:
  - n ≪ p²: the number of cases is much smaller than the number of model parameters.
  - The maximum-likelihood estimator for a pairwise effect is:
      w_{i,j} = y − w_0 − w_i − w_j,  if (i, j, y) ∈ S;  not defined, else
  ⇒ Polynomial regression cannot generalize to any unobserved pairwise effect.
34. Factorization Machine (FM)
- Let x ∈ ℝ^p be an input vector with p predictor variables.
- Model equation (degree 2):
    ŷ(x) := w_0 + Σ_{i=1}^p w_i x_i + Σ_{i=1}^p Σ_{j>i} ⟨v_i, v_j⟩ x_i x_j
- Model parameters:
    w_0 ∈ ℝ,  w ∈ ℝ^p,  V ∈ ℝ^{p×k}
Compared to polynomial regression (degree 2):
    ŷ(x) := w_0 + Σ_{i=1}^p w_i x_i + Σ_{i=1}^p Σ_{j>i} w_{i,j} x_i x_j
    with w_0 ∈ ℝ,  w ∈ ℝ^p,  W ∈ ℝ^{p×p}
i.e. the FM factorizes the pairwise interaction weights: w_{i,j} ≈ ⟨v_i, v_j⟩.
[Rendle 2010, Rendle 2012]
37. Factorization Machine (FM)
- Let x ∈ ℝ^p be an input vector with p predictor variables.
- Model equation (degree 3):
    ŷ(x) := w_0 + Σ_{i=1}^p w_i x_i + Σ_{i=1}^p Σ_{j>i} ⟨v_i, v_j⟩ x_i x_j
            + Σ_{i=1}^p Σ_{j>i} Σ_{l>j} Σ_{f=1}^k v(3)_{i,f} v(3)_{j,f} v(3)_{l,f} x_i x_j x_l
- Model parameters:
    w_0 ∈ ℝ,  w ∈ ℝ^p,  V ∈ ℝ^{p×k},  V(3) ∈ ℝ^{p×k}
[Rendle 2010, Rendle 2012]
38. Factorization Machines: Discussion
- FMs work with real-valued input.
- FMs include variable interactions, like polynomial regression.
- Model parameters for interactions are factorized.
- The number of model parameters is O(k p) (instead of O(p²) for polynomial regression).
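As a concrete illustration of the degree-2 model equation, here is a direct (trivial, O(p² k)) implementation in Python; the function name and the toy parameter values are illustrative, not from the talk:

```python
# A minimal degree-2 FM predictor, written directly from the model equation:
#   y(x) = w0 + sum_i w_i x_i + sum_i sum_{j>i} <v_i, v_j> x_i x_j

def fm_predict_naive(x, w0, w, V):
    """Evaluate the degree-2 FM with the trivial O(p^2 k) double loop."""
    p = len(x)
    k = len(V[0])
    y = w0
    for i in range(p):
        y += w[i] * x[i]                    # linear part
    for i in range(p):
        for j in range(i + 1, p):
            dot = sum(V[i][f] * V[j][f] for f in range(k))  # <v_i, v_j>
            y += dot * x[i] * x[j]          # factorized pairwise interaction
    return y

# Toy example: p = 3 variables, k = 2 factors.
w0 = 0.1
w = [0.2, -0.1, 0.4]
V = [[0.5, 0.1], [0.3, -0.2], [0.0, 0.6]]
print(fm_predict_naive([1.0, 1.0, 0.0], w0, w, V))
```

Note that only the interaction weights are factorized; the bias w_0 and the linear weights w are ordinary free parameters, exactly as in linear regression.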
47. Computation Complexity
Factorization Machine model equation:
    ŷ(x) := w_0 + Σ_{i=1}^p w_i x_i + Σ_{i=1}^p Σ_{j>i} ⟨v_i, v_j⟩ x_i x_j
- Trivial computation: O(p² k)
- Efficient computation can be done in O(p k).
- Making use of the many zeros in x, even in O(N_z(x) k), where N_z(x) is the number of non-zero elements in vector x.
50. Efficient Computation
The model equation of an FM can be computed in O(p k).
Proof:
    ŷ(x) := w_0 + Σ_{i=1}^p w_i x_i + Σ_{i=1}^p Σ_{j>i} ⟨v_i, v_j⟩ x_i x_j
          = w_0 + Σ_{i=1}^p w_i x_i + (1/2) Σ_{f=1}^k [ ( Σ_{i=1}^p x_i v_{i,f} )² − Σ_{i=1}^p (x_i v_{i,f})² ]
- In the sums over i, only the non-zero x_i elements have to be summed up ⇒ O(N_z(x) k).
- (The complexity of polynomial regression is O(N_z(x)²).)
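The O(N_z(x) k) reformulation above is easy to check in code: the pairwise term becomes 1/2 · Σ_f [(Σ_i x_i v_{i,f})² − Σ_i (x_i v_{i,f})²], and only the non-zero entries of x are visited. A sketch with illustrative names and toy values:

```python
# Sketch of the O(N_z(x) k) FM evaluation from the proof above.
# x is stored sparsely as a list of (index, value) pairs.

def fm_predict_fast(x_sparse, w0, w, V, k):
    """Evaluate the degree-2 FM touching only the non-zero entries of x."""
    y = w0 + sum(w[i] * v for i, v in x_sparse)          # linear part
    for f in range(k):
        s = sum(V[i][f] * v for i, v in x_sparse)        # sum_i x_i v_{i,f}
        s_sq = sum((V[i][f] * v) ** 2 for i, v in x_sparse)
        y += 0.5 * (s * s - s_sq)                        # pairwise term, factor f
    return y

# Same toy parameters as in the naive example; x = (1, 1, 0) stored sparsely.
w0 = 0.1
w = [0.2, -0.1, 0.4]
V = [[0.5, 0.1], [0.3, -0.2], [0.0, 0.6]]
print(fm_predict_fast([(0, 1.0), (1, 1.0)], w0, w, V, k=2))
```

For one-hot encoded (user, movie) data, N_z(x) = 2, so each prediction costs O(2k) regardless of how many users and movies there are.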
53. Multilinearity
FMs are multilinear: for every model parameter
    θ ∈ Θ = {w_0, w_1, ..., w_p, v_{1,1}, ..., v_{p,k}}:
    ŷ(x; θ) = g_θ(x) + θ · h_θ(x)
where g_θ and h_θ do not depend on the value of θ.
E.g. for second-order effects (θ = v_{l,f}):
    ŷ(x; v_{l,f}) = g_{v_{l,f}}(x) + v_{l,f} · h_{v_{l,f}}(x)
with
    g_{v_{l,f}}(x) = w_0 + Σ_{i=1}^p w_i x_i + Σ_{i=1}^p Σ_{j=i+1}^p Σ_{f'=1,...,k : (f'≠f) ∨ (l ∉ {i,j})} v_{i,f'} v_{j,f'} x_i x_j
    h_{v_{l,f}}(x) = x_l Σ_{i=1, i≠l}^p v_{i,f} x_i
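The multilinearity property can be verified numerically: with all other parameters held fixed, ŷ(x) is an affine function of any single parameter θ. A self-contained sketch (names and toy numbers are mine) that checks this for θ = v_{l,f}:

```python
# Numeric check of multilinearity: y(x; theta) = g(x) + theta * h(x),
# so evaluating at theta = 0, 1, 2 must lie on a straight line.
import copy

def fm_predict(x, w0, w, V):
    k = len(V[0])
    y = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            y += sum(V[i][f] * V[j][f] for f in range(k)) * x[i] * x[j]
    return y

x = [1.0, 2.0, 3.0]
w0, w = 0.1, [0.2, -0.1, 0.4]
V = [[0.5, 0.1], [0.3, -0.2], [0.0, 0.6]]

def y_at(theta, l=1, f=0):
    """Evaluate the FM with V[l][f] replaced by theta, all else fixed."""
    V2 = copy.deepcopy(V)
    V2[l][f] = theta
    return fm_predict(x, w0, w, V2)

# Affine in theta: the slope is h_theta(x), and y(2) = y(0) + 2 * slope.
slope = y_at(1.0) - y_at(0.0)
assert abs(y_at(2.0) - (y_at(0.0) + 2 * slope)) < 1e-9
```

This affine structure is exactly what the SGD, CD, and MCMC updates on the following slides exploit: each of them only needs g_θ(x) and h_θ(x).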
56. Learning
Using these properties, learning algorithms can be developed:
- L2-regularized regression and classification:
  - Stochastic gradient descent [Rendle, 2010]
  - Alternating least squares / coordinate descent [Rendle et al., 2011; Rendle 2012]
  - Markov chain Monte Carlo (for Bayesian FMs) [Freudenthaler et al. 2011; Rendle 2012]
- L2-regularized ranking:
  - Stochastic gradient descent [Rendle, 2010]
All the proposed learning algorithms have a runtime of O(k N_z(X) i), where i is the number of iterations and N_z(X) is the number of non-zero elements in the design matrix X.
58. Stochastic Gradient Descent (SGD)
- For each training case (x, y) ∈ S, SGD updates the FM model parameter θ using:
    θ' = θ − η [ (ŷ(x) − y) h_θ(x) + λ_θ θ ]
- η is the learning rate / step size.
- λ_θ is the regularization value of the parameter θ.
- SGD can easily be applied to other loss functions.
[Rendle, 2010]
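A toy end-to-end sketch of this SGD rule (squared loss, L2 regularization) on the one-hot rating data from earlier. This is an illustration of the update equation, not libFM's implementation; all names, the learning rate, and the regularization value are my own choices:

```python
# SGD for a degree-2 FM: theta' = theta - eta * ((y_hat - y) * h_theta(x) + lam * theta).
# The gradients h_theta(x) follow from multilinearity.
import random

random.seed(0)
p, k = 7, 2                      # 3 users + 4 movies, 2 factors
w0, w = 0.0, [0.0] * p
V = [[0.1 * random.gauss(0, 1) for _ in range(k)] for _ in range(p)]

def predict(x):
    y = w0 + sum(w[i] * x[i] for i in range(p))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(p))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(p))
        y += 0.5 * (s * s - s_sq)
    return y

def sgd_step(x, y, eta=0.05, lam=0.001):
    global w0
    err = predict(x) - y
    # Precompute s_f = sum_j v_{j,f} x_j at the old parameter values.
    s = [sum(V[j][f] * x[j] for j in range(p)) for f in range(k)]
    w0 -= eta * err                                  # h_{w0}(x) = 1
    for i in range(p):
        if x[i] == 0.0:
            continue                                 # skip zero features
        w[i] -= eta * (err * x[i] + lam * w[i])      # h_{w_i}(x) = x_i
        for f in range(k):
            h = x[i] * (s[f] - V[i][f] * x[i])       # h_{v_{i,f}}(x)
            V[i][f] -= eta * (err * h + lam * V[i][f])

def onehot(u, m):                                    # user u, movie m indicators
    x = [0.0] * p
    x[u] = 1.0
    x[3 + m] = 1.0
    return x

data = [(onehot(0, 3), 5.0), (onehot(0, 0), 3.0), (onehot(0, 2), 1.0),
        (onehot(1, 2), 4.0), (onehot(1, 1), 5.0), (onehot(2, 3), 1.0),
        (onehot(2, 2), 5.0)]

for _ in range(1000):
    for x, y in data:
        sgd_step(x, y)

mse = sum((predict(x) - y) ** 2 for x, y in data) / len(data)
print("training MSE:", mse)
```

Note the use of the precomputed sums s_f: per training case the update costs O(N_z(x) k), matching the runtime stated on the previous slide.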
59. Coordinate Descent (CD)
- CD updates each FM model parameter θ using:
    θ' = [ Σ_{(x,y) ∈ S} (y − g_θ(x)) h_θ(x) ] / [ Σ_{(x,y) ∈ S} h_θ(x)² + λ_θ ]
- Using caches of intermediate results, the runtime for updating all model parameters is O(k N_z(X)).
- CD can be extended to classification.

61. Gibbs Sampling (MCMC)
- Gibbs sampling with a block for each FM model parameter θ:
    θ | S, Θ ∖ {θ}  ~  N( [ Σ_{(x,y) ∈ S} (y − g_θ(x)) h_θ(x) ] / [ Σ_{(x,y) ∈ S} h_θ(x)² + λ_θ ] ,  1 / [ Σ_{(x,y) ∈ S} h_θ(x)² + λ_θ ] )
- The mean is the same as for CD ⇒ the computational complexity is also O(k N_z(X)).
- MCMC can be extended to classification using link functions.
[Freudenthaler et al. 2011, Rendle 2012]
63. Learning Regularization Values
[Figure: plate diagrams. Left: standard FM with priors on w_0, w_j, and v_j. Right: two-level FM with hyperpriors on the prior parameters.]
Standard FM with priors. Two-level FM with hyperpriors.
[Freudenthaler et al., 2011]
65. libFM Software
libFM is an implementation of FMs:
- Model: second-order FMs
- Learning/inference: SGD, ALS, MCMC
- Classification and regression
- Uses the same data format as LIBSVM, LIBLINEAR [Lin et al.], SVMlight [Joachims].
- Supports variable grouping.
- Open source: GPLv3.
[http://www.libfm.org/]
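Since libFM reads the same sparse text format as LIBSVM ("target index:value ..."), the toy rating data can be serialized with a few lines; the index layout (users 0-2, movies 3-6) is my own choice for illustration:

```python
# Sketch: print the toy rating data in LIBSVM-style lines, one per training
# case: the target followed by the indices of the non-zero (one-hot) features.

rows = [
    (5, [0, 6]), (3, [0, 3]), (1, [0, 5]),   # Alice x {Titanic, Notting Hill, Star Wars}
    (4, [1, 5]), (5, [1, 4]),                # Bob x {Star Wars, Star Trek}
    (1, [2, 6]), (5, [2, 5]),                # Charlie x {Titanic, Star Wars}
]

lines = ["{} {}".format(y, " ".join("{}:1".format(i) for i in idx))
         for y, idx in rows]
print("\n".join(lines))
# First line: "5 0:1 6:1"
```

Because the format only lists non-zero entries, file size grows with N_z(X) rather than with the full dimensionality p.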
67. Outline
Factorization Models & Polynomial Regression
Factorization Machines
Applications
- Recommender Systems
- Link Prediction in Social Networks
- Clickthrough Prediction
- Personalized Ranking
- Student Performance Prediction
- Kaggle Competitions
Summary
68. (Context-aware) Rating Prediction
- Main variables:
  - User ID (categorical)
  - Item ID (categorical)
- Additional variables:
  - time
  - mood
  - user profile
  - item metadata
  - ...
- Examples: Netflix prize, Movielens, KDDCup 2011
[Figure: a rating event composed of User + Song + Time + Mood.]
70. Netflix Prize
[Figure: Netflix Prize prediction error (public leaderboard RMSE, axis 0.86-0.90) for FMs with predictor variables (user, movie), (user, movie, day), (user, movie, impl.), (user, movie, day, impl.), and (user, movie, day, impl., freq, lin. day), compared against SGD matrix factorization and the $1M Prize line.]
- k = 128 factors, 512 MCMC samples (no burn-in phase, initialization from random)
- MCMC inference (no hyperparameters (learning rate, regularization) to specify)
71. Netflix Prize
Method (Name)                                        Ref.      Learning Method    k     Quiz RMSE
-- Models using user ID and item ID --
Probabilistic Matrix Factorization                   [14, 13]  Batch GD           40    *0.9170
Probabilistic Matrix Factorization                   [14, 13]  Batch GD           150   0.9211
Matrix Factorization                                 [6]       Variational Bayes  30    *0.9141
Matchbox                                             [15]      Variational Bayes  50    *0.9100
ALS-MF                                               [7]       ALS                100   0.9079
ALS-MF                                               [7]       ALS                1000  *0.9018
SVD/MF                                               [3]       SGD                100   0.9025
SVD/MF                                               [3]       SGD                200   *0.9009
Bayesian Probabilistic Matrix Factorization (BPMF)   [13]      MCMC               150   0.8965
Bayesian Probabilistic Matrix Factorization (BPMF)   [13]      MCMC               300   *0.8954
FM, pred. var: user ID, movie ID                     -         MCMC               128   0.8937
-- Models using implicit feedback --
Probabilistic Matrix Factorization with Constraints  [14]      Batch GD           30    *0.9016
SVD++                                                [3]       SGD                100   0.8924
SVD++                                                [3]       SGD                200   *0.8911
BSRM/F                                               [18]      MCMC               100   0.8926
BSRM/F                                               [18]      MCMC               400   *0.8874
FM, pred. var: user ID, movie ID, impl.              -         MCMC               128   0.8865
72. Netflix Prize
Method (Name)                                                  Ref.  Learning Method  k     Quiz RMSE
-- Models using time information --
Bayesian Probabilistic Tensor Factorization (BPTF)             [17]  MCMC             30    *0.9044
FM, pred. var: user ID, movie ID, day                          -     MCMC             128   0.8873
-- Models using time and implicit feedback --
timeSVD++                                                      [5]   SGD              100   0.8805
timeSVD++                                                      [5]   SGD              200   *0.8799
FM, pred. var: user ID, movie ID, day, impl.                   -     MCMC             128   0.8809
FM, pred. var: user ID, movie ID, day, impl.                   -     MCMC             256   0.8794
-- Assorted models --
BRISMF/UM NB corrected                                         [16]  SGD              1000  *0.8904
BMFSI plus side information                                    [8]   MCMC             100   *0.8875
timeSVD++ plus frequencies                                     [4]   SGD              200   0.8777
timeSVD++ plus frequencies                                     [4]   SGD              2000  *0.8762
FM, pred. var: user ID, movie ID, day, impl., freq., lin. day  -     MCMC             128   0.8779
FM, pred. var: user ID, movie ID, day, impl., freq., lin. day  -     MCMC             256   0.8771
74. Link Prediction in Social Networks
- Main variables:
  - Actor A ID
  - Actor B ID
- Additional variables:
  - profiles
  - actions
  - ...
76. KDDCup 2012: Track 1
[Figure: KDDCup 2012 Track 1 prediction quality (mean average precision @3, axis 0.32-0.42) on the public and private leaderboards for FMs with additional variables: none / gender, age, ... / keywords / friends / all, compared against the Top 1, Top 5, Top 10, and Top 100 teams.]
- k = 22 factors, 512 MCMC samples (no burn-in phase, initialization from random)
- MCMC inference (no hyperparameters (learning rate, regularization) to specify)
[Awarded 2nd place (out of 658 teams)]
78. Clickthrough Prediction
- Main variables:
  - User ID
  - Query ID
  - Ad/Link ID
- Additional variables:
  - query tokens
  - user profile
  - ...
80. KDDCup 2012: Track 2
Model                          Inference  wAUC (public)  wAUC (private)
ID-based model (k = 0)         SGD        0.78050        0.78086
Attribute-based model (k = 8)  MCMC       0.77409        0.77555
Mixed model (k = 8)            SGD        0.79011        0.79321
Final ensemble                 n/a        0.79857        0.80178
Ensemble:
- Rank positions (not predicted clickthrough rates) are used.
- The MCMC attribute-based model and different variations of the SGD models are included.
[Awarded 3rd place (out of 171 teams)]
82. ECML/PKDD Discovery Challenge 2013
- Problem: recommend given names.
- Main variables:
  - User ID
  - Name ID
- Additional variables:
  - session info
  - string representation of each name
  - ...
- The FM approach won 1st place (online track) and 2nd place (offline track).
84. Student Performance Prediction
- Main variables:
  - Student ID
  - Question ID
- Additional variables:
  - question hierarchy
  - sequence of questions
  - skills required
  - ...
- Examples: KDDCup 2010, Grockit Challenge⁴ (FM placed 1st/241)
⁴http://www.kaggle.com/c/WhatDoYouKnow
86. Kaggle Competitions
FMs have been successfully applied to several Kaggle competitions:
- Criteo Display Advertising Challenge: 1st place (team '3 idiots').
- Blue Book for Bulldozers: 1st place (team 'Leustagos Titericz').
- EMI Music Data Science Hackathon: 2nd place (team 'lns').
87. Summary
- Factorization machines combine linear/polynomial regression with factorization models.
- Feature interactions are learned with a low-rank representation.
- Estimation of unobserved interactions is possible.
- Factorization machines can be computed efficiently and have high prediction quality.
88. References
L. Drumond, S. Rendle, and L. Schmidt-Thieme. Predicting RDF triples in incomplete knowledge bases with tensor factorization. In Proceedings of the 27th Annual ACM Symposium on Applied Computing, SAC '12, pages 326-331, New York, NY, USA, 2012. ACM.
C. Freudenthaler, L. Schmidt-Thieme, and S. Rendle. Bayesian factorization machines. In NIPS Workshop on Sparse Representation and Low-rank Approximation, 2011.
Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In KDD '08: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 426-434, New York, NY, USA, 2008. ACM.
Y. Koren. The BellKor solution to the Netflix grand prize. 2009.
Y. Koren. Collaborative filtering with temporal dynamics. In KDD '09: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 447-456, New York, NY, USA, 2009. ACM.
Y. J. Lim and Y. W. Teh. Variational Bayesian approach to movie rating prediction. In Proceedings of KDD Cup and Workshop, 2007.
I. Pilaszy, D. Zibriczky, and D. Tikk. Fast ALS-based matrix factorization for explicit and implicit feedback datasets. In RecSys '10: Proceedings of the Fourth ACM Conference on Recommender Systems, pages 71-78, New York, NY, USA, 2010. ACM.
I. Porteous, A. Asuncion, and M. Welling. Bayesian matrix factorization with side information and Dirichlet process mixtures. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010.
S. Rendle. Factorization machines. In Proceedings of the 2010 IEEE International Conference on Data Mining, ICDM '10, pages 995-1000, Washington, DC, USA, 2010. IEEE Computer Society.
S. Rendle. Factorization machines with libFM. ACM Trans. Intell. Syst. Technol., 3(3):57:1-57:22, May 2012.
S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme. Factorizing personalized Markov chains for next-basket recommendation. In WWW '10: Proceedings of the 19th International Conference on World Wide Web, pages 811-820, New York, NY, USA, 2010. ACM.
S. Rendle, Z. Gantner, C. Freudenthaler, and L. Schmidt-Thieme. Fast context-aware recommendations with factorization machines. In Proceedings of the 34th ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2011.
R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the 25th International Conference on Machine Learning, ICML '08, pages 880-887, New York, NY, USA, 2008. ACM.
R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1257-1264, Cambridge, MA, 2008. MIT Press.
D. H. Stern, R. Herbrich, and T. Graepel. Matchbox: large scale online Bayesian recommendations. In Proceedings of the 18th International Conference on World Wide Web, WWW '09, pages 111-120, New York, NY, USA, 2009. ACM.
G. Takacs, I. Pilaszy, B. Nemeth, and D. Tikk. Scalable collaborative filtering approaches for large recommender systems. J. Mach. Learn. Res., 10:623-656, June 2009.
L. Xiong, X. Chen, T.-K. Huang, J. Schneider, and J. G. Carbonell. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. In Proceedings of the SIAM International Conference on Data Mining, pages 211-222. SIAM, 2010.
S. Zhu, K. Yu, and Y. Gong. Stochastic relational models for large-scale dyadic data using MCMC. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1993-2000, 2009.