Factorization Machines (FM) - a novel but proven
and promising approach
Evgeniy Marinov
July 2018
Introduction to FM
In 2010, Steffen Rendle (currently a senior research scientist
at Google) introduced a seminal paper [Rendle - FM].
Few algorithms introduced to the world of ML /Recommender
Systems/ since have been as impactful.
FMs are a model class that combines the advantages of
Support Vector Machines (SVMs)/polynomial regression and
factorization models.
Kaggle Competitions
Kaggle competitions won with Factorization Machines:
Criteo's CTR on display ads contest - 2014
https://www.kaggle.com/c/criteo-display-ad-challenge
http://www.csie.ntu.edu.tw/~r01922136/kaggle-2014-criteo.pdf
Avazu's CTR prediction contest
https://www.kaggle.com/c/avazu-ctr-prediction
http://www.csie.ntu.edu.tw/~r01922136/slides/kaggle-avazu.pdf
Both competitions used the LogLoss function to evaluate the
classifiers - the aim is to minimize the Logarithmic Loss.
For binary classification:
LogLoss = -(1/N) * sum_{i=1}^{N} [y_i log(p_i) + (1 - y_i) log(1 - p_i)]
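As a quick illustration, the binary LogLoss above can be computed directly. This is a minimal sketch, not taken from the slides; the clipping constant eps is an implementation detail added to avoid log(0).

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Binary LogLoss: -(1/N) * sum[y*log(p) + (1-y)*log(1-p)]."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip probabilities away from 0 and 1
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Confident correct predictions give low loss, confident wrong ones high loss
print(log_loss([1, 0], [0.9, 0.1]))  # ~0.105
print(log_loss([1, 0], [0.1, 0.9]))  # ~2.3
```

Note the asymmetry in practice: the loss grows without bound as a confident prediction approaches the wrong label, which is why competition metrics reward well-calibrated probabilities.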
Task Description
We are given a (training and testing) dataset /images,
documents, .../
D = {(x^(l), y^(l))}_{l in L} subset of R^n x T
The aim is to find a function y_hat: R^n -> T (the target space)
that generalizes to the testing data /unknown to the trained
model/.
In recommender systems (online advertising) we deal with
sparse input vectors x^(l) in R^n.
T can be {-1, 1} or {0, 1} (classification), R (regression), or
some categorical space (ranks for an item).
Ad Classification
Classification from implicit feedback - the user does not rate
explicitly.
But we can "guess" which items and features he likes or does
not care about.

Country | Day    | Ad Type | Clicked?
USA     | 3/3/15 | MOVIE   | 1
China   | 1/7/14 | GAME    | 0
China   | 3/3/15 | GAME    | 1
Standard (dummy) encoding

USA | China | 3/3/15 | 1/7/14 | MOVIE | GAME | Clicked?
1   | 0     | 1      | 0      | 1     | 0    | 1
0   | 1     | 0      | 1      | 0     | 1    | 0
0   | 1     | 1      | 0      | 0     | 1    | 1

Very large feature space
Very sparse samples (feature values)
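The dummy encoding above can be sketched as a small helper. This is a toy illustration, not from any particular library; the field and value names simply follow the table.

```python
def one_hot_encode(rows, columns):
    """Dummy-encode categorical rows; columns maps field -> list of its values."""
    # One feature slot per (field, value) pair, in a fixed order.
    index = {}
    for field, values in columns.items():
        for v in values:
            index[(field, v)] = len(index)
    encoded = []
    for row in rows:
        x = [0] * len(index)
        for field, value in row.items():
            x[index[(field, value)]] = 1  # exactly one '1' per field
        encoded.append(x)
    return encoded

rows = [
    {"Country": "USA",   "Day": "3/3/15", "AdType": "MOVIE"},
    {"Country": "China", "Day": "1/7/14", "AdType": "GAME"},
    {"Country": "China", "Day": "3/3/15", "AdType": "GAME"},
]
columns = {
    "Country": ["USA", "China"],
    "Day":     ["3/3/15", "1/7/14"],
    "AdType":  ["MOVIE", "GAME"],
}
X = one_hot_encode(rows, columns)
# Matches the table: each sample has exactly 3 nonzeros out of 6 features
```

With real categorical domains (millions of user and item IDs) the same scheme yields the very large, very sparse feature space the slide warns about.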
(De)motivation!
Often features are more important in pairs:
"Country == USA" & "Day == Thanksgiving"
What if we create a new feature (a conjunction) for every pair of
features?
Feature space: insanely large
With n original features -> an additional C(n, 2) = n(n-1)/2 features!
Samples: still sparse
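The quadratic blow-up is easy to check directly; n = 1000 here is an arbitrary illustrative size, small by online-advertising standards.

```python
from itertools import combinations

n = 1000  # e.g. 1000 one-hot features after dummy encoding
pairs = n * (n - 1) // 2
# Sanity check against an explicit enumeration of all unordered pairs
assert pairs == len(list(combinations(range(n), 2)))
print(pairs)  # 499500 conjunction features from only 1000 originals
```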
Factorization Models
SVD with ALS
Singular Value Decomposition (Matrix Factorization) with
Alternating Least Squares.
The most basic matrix factorization model for recommender
systems models the rating r_hat a user u would give to an item i by:
r_hat_ui = x_u^T y_i,
where
x_u^T = (x_u^1, ..., x_u^N) - factor vector associated with the user
y_i^T = (y_i^1, ..., y_i^N) - factor vector associated with the item
The dimension N of the factors x_u, y_i is the rank of the
model (factorization).
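The prediction r_hat_ui = x_u^T y_i is just a dot product of the two rank-N factor vectors; a minimal sketch (the factor values are made up for illustration):

```python
def predict_rating(x_u, y_i):
    """Predicted rating r_hat_ui = x_u^T y_i (dot product of factor vectors)."""
    return sum(a * b for a, b in zip(x_u, y_i))

x_u = [0.5, 1.2, -0.3]  # hypothetical user factors, rank N = 3
y_i = [2.0, 0.5, 1.0]   # hypothetical item factors
print(predict_rating(x_u, y_i))  # 1.3
```

ALS learns the factors by alternately fixing all user vectors and solving least squares for the item vectors, then vice versa.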
Factorization Models
Collaborative Filtering:
- Neighborhood Methods
- Latent Factor Methods
Advantages of Factorization Models
Factorization models provide:
Factorization of higher-order interactions
Efficient parameter estimation and superior performance
Even for sparse data, where just a few or no observations of
those higher-order effects are available
Main applications:
Collaborative Filtering (in Recommender Systems)
Link Prediction (in Social Networks)
Disadvantages of Factorization Models
BUT they are not general prediction models. There are many
factorization models designed for specific tasks.
Standard models:
Matrix Factorization (MF)
Tensor Factorization (TF)
Specific tasks:
SVD++ [Koren, Bell - Advances in CF]
timeSVD++ [Koren, Bell - Advances in CF]
PITF (Pairwise Interaction Tensor Factorization)
FPMC (Factorizing Personalized Markov Chains)
And others...
Advantages of FMs
FMs allow parameter estimation under very sparse data, where
SVMs fail.
FMs have linear (and even "almost constant"!) complexity
and do not rely on support vectors like SVMs.
FMs are a general predictor that can work with any real-valued
feature vector. In contrast, other state-of-the-art
factorization models work only on very restricted input data.
Through suitable feature engineering, FMs can mimic
state-of-the-art models like MF, SVD++, timeSVD++,
PARAFAC, PITF, FPMC and others.
Notations
For any x in R^n let us define:
m(x) - the number of nonzero elements of the feature vector x;
m_bar_D - the average of m(x^(l)) over all l in L.
Since we deal with huge sparsity, we have m_bar_D << n, the
dimensionality of the feature space.
One reason for huge sparsity is that the underlying problem
deals with large categorical variable domains.
Definition of Factorization Machine Model of degree 2
y_hat(x) := w0 + sum_{i=1}^{n} w_i x_i + sum_{i=1}^{n} sum_{j=i+1}^{n} <v_i, v_j> x_i x_j   (1)
where the model parameters that have to be estimated are:
w0 in R - global bias (mean rating over all items/users)
w = (w1, ..., wn) in R^n - weights of the corresponding features
v_i := (v_{i,1}, ..., v_{i,k}), the i-th row of V in R^{n x k}, which describes
the i-th variable (feature) with k factors.
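Equation (1) can be evaluated directly as written, at O(k n^2) cost; a naive sketch (the toy parameter values below are made up for illustration):

```python
def fm_predict_naive(x, w0, w, V):
    """Direct evaluation of equation (1): O(k * n^2) pairwise interactions."""
    n, k = len(x), len(V[0])
    out = w0 + sum(w[i] * x[i] for i in range(n))  # bias + linear terms
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(V[i][f] * V[j][f] for f in range(k))  # <v_i, v_j>
            out += dot * x[i] * x[j]
    return out

x = [1, 0, 2]                      # toy feature vector, n = 3
w0, w = 0.1, [0.2, 0.3, 0.4]       # toy bias and linear weights
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy factors, k = 2
print(fm_predict_naive(x, w0, w, V))  # 3.1
```

The key modelling point: the interaction weight for the pair (i, j) is the inner product <v_i, v_j> rather than an independent parameter, so it can be estimated even when x_i and x_j never co-occur in training.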
Complexity of the model equation
The model equation y_hat(x) can, for every x in D, be computed in
linear time O(kn), and under sparsity in O(k m(x)).
Proof:
sum_{i=1}^{n} sum_{j=i+1}^{n} <v_i, v_j> x_i x_j
= 1/2 sum_{i=1}^{n} sum_{j=1}^{n} <v_i, v_j> x_i x_j - 1/2 sum_{i=1}^{n} <v_i, v_i> x_i x_i
= 1/2 [sum_{i=1}^{n} sum_{j=1}^{n} sum_{f=1}^{k} v_{i,f} v_{j,f} x_i x_j - sum_{i=1}^{n} sum_{f=1}^{k} v_{i,f} v_{i,f} x_i x_i]
= 1/2 sum_{f=1}^{k} [(sum_{i=1}^{n} v_{i,f} x_i)(sum_{j=1}^{n} v_{j,f} x_j) - sum_{i=1}^{n} v_{i,f}^2 x_i^2]
= 1/2 sum_{f=1}^{k} [(sum_{i=1}^{n} v_{i,f} x_i)^2 - sum_{i=1}^{n} v_{i,f}^2 x_i^2]
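The last line of the proof translates directly into a linear-time implementation; a sketch (same made-up toy parameters as before, so the result can be compared against a direct evaluation of equation (1)):

```python
def fm_predict_fast(x, w0, w, V):
    """O(k*n) evaluation via 1/2 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]."""
    n, k = len(x), len(V[0])
    out = w0 + sum(w[i] * x[i] for i in range(n))  # bias + linear terms
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))            # sum_i v_{i,f} x_i
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))  # sum_i v_{i,f}^2 x_i^2
        out += 0.5 * (s * s - s_sq)
    return out

x = [1, 0, 2]
w0, w = 0.1, [0.2, 0.3, 0.4]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(fm_predict_fast(x, w0, w, V))  # 3.1, same as the naive double loop
```

Under sparsity, both inner sums only need to visit the m(x) nonzero entries of x, giving the O(k m(x)) bound.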
Complexity of parameter updates
The computations needed for the partial derivatives of y_hat(x)
are already performed while computing y_hat(x) itself. Therefore,
all parameter updates for a case (x, y) can be done in O(kn),
and under sparsity in O(k m(x)).
d y_hat(x)/d theta =
  1,                                              if theta is w0
  x_i,                                            if theta is w_i
  x_i sum_{j=1}^{n} v_{j,f} x_j - v_{i,f} x_i^2,  if theta is v_{i,f}
(2)
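For concreteness, here is one SGD step for squared loss using the gradients of equation (2). This is a minimal sketch under stated assumptions (squared loss, plain SGD, no regularization - the slides do not fix these choices); the sums s_f are precomputed once, so the step costs O(k n), or O(k m(x)) when zero features are skipped.

```python
def fm_sgd_step(x, y, w0, w, V, lr=0.01):
    """One SGD step for 0.5*(y_hat - y)^2; gradients follow equation (2)."""
    n, k = len(x), len(V[0])
    # Precompute s_f = sum_j v_{j,f} x_j, reused for prediction and gradients.
    s = [sum(V[j][f] * x[j] for j in range(n)) for f in range(k)]
    y_hat = w0 + sum(w[i] * x[i] for i in range(n)) \
        + 0.5 * sum(s[f] ** 2 - sum((V[i][f] * x[i]) ** 2 for i in range(n))
                    for f in range(k))
    err = y_hat - y                    # d loss / d y_hat
    w0 -= lr * err                     # d y_hat / d w0 = 1
    for i in range(n):
        if x[i] == 0:
            continue                   # sparsity: all gradients vanish for x_i = 0
        w[i] -= lr * err * x[i]        # d y_hat / d w_i = x_i
        for f in range(k):
            grad = x[i] * s[f] - V[i][f] * x[i] ** 2   # d y_hat / d v_{i,f}
            V[i][f] -= lr * err * grad
    return w0, w, V
```

Repeating this step over shuffled training cases is the plain SGD inference mentioned on the next slide; ALS and MCMC reuse the same precomputation trick.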
General applications of FM
FMs can be applied to a variety of prediction tasks:
Regression: y_hat(x) can be used directly as a regression
predictor.
Binary classification: the sign of y_hat(x) is used, and the
parameters are optimized for hinge loss or logit loss.
Ranking: input vectors x are ordered by the score y_hat(x),
and optimization is done over pairs of instance vectors with a
pairwise classification loss ([Joachims - Ranking CTR]).
Inference
Inference algorithms:
SGD (Stochastic Gradient Descent)
ALS (Alternating Least Squares)
MCMC (Markov Chain Monte Carlo)
Library: libFM (C++)
Author: Steffen Rendle
Web site: http://www.libfm.org/
(very sparse description of the library)
Source code (C++): https://github.com/srendle/libfm
Very professionally written source code! (Linux and Mac OS X)
libFM features:
stochastic gradient descent (SGD)
alternating least squares (ALS)
Bayesian inference using Markov Chain Monte Carlo (MCMC)
Library: fastFM (Python)
Author: Immanuel Bayer
Web site: http://ibayer.github.io/fastFM/
Very detailed tutorial for the usage of the library!
Source code (Python): https://github.com/ibayer/fastFM
Python (2.7 and 3.5) with the well-known scikit-learn API.
Performance-critical code is written in C and wrapped with Cython.

Task           | Solver         | Loss
Regression     | ALS, SGD, MCMC | Square Loss
Classification | ALS, SGD, MCMC | Probit (MAP), Probit, Sigmoid
Ranking        | SGD            | BPR [Rendle - BPR]
Library: LIBFFM (C++)
A library for Field-aware Factorization Machines from
the Machine Learning group at National Taiwan University.
Authors: Yu-Chin Juan and colleagues
Web site: https://www.csie.ntu.edu.tw/~cjlin/libffm/
(very sparse description of the library)
Source code (C++): https://github.com/guestwalk/libffm
FFM is prone to overfitting and not very stable at the
moment.
Thank you for your attention!
References
Steffen Rendle. Factorization Machines. 2010.
http://www.ismll.uni-hildesheim.de/pub/pdfs/Rendle2010FM.pdf
Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and
Lars Schmidt-Thieme. BPR: Bayesian Personalized Ranking
from Implicit Feedback. UAI '09, 2009.
Yehuda Koren and Robert Bell. Advances in Collaborative Filtering.
Mathieu Blondel. Convex Factorization Machines.
http://www.mblondel.org/publications/mblondel-ecmlpkdd2015.pdf
Thorsten Joachims. Optimizing Search Engines using
Clickthrough Data. 2002.
Contenu connexe

Tendances

Tutorial: Context In Recommender Systems
Tutorial: Context In Recommender SystemsTutorial: Context In Recommender Systems
Tutorial: Context In Recommender SystemsYONG ZHENG
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers Arvind Devaraj
 
Context Aware Recommendations at Netflix
Context Aware Recommendations at NetflixContext Aware Recommendations at Netflix
Context Aware Recommendations at NetflixLinas Baltrunas
 
Introduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga PetrovaIntroduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga PetrovaAlexey Grigorev
 
Recommender Systems from A to Z – Model Training
Recommender Systems from A to Z – Model TrainingRecommender Systems from A to Z – Model Training
Recommender Systems from A to Z – Model TrainingCrossing Minds
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixJustin Basilico
 
Active Learning in Collaborative Filtering Recommender Systems : a Survey
Active Learning in Collaborative Filtering Recommender Systems : a SurveyActive Learning in Collaborative Filtering Recommender Systems : a Survey
Active Learning in Collaborative Filtering Recommender Systems : a SurveyUniversity of Bergen
 
Past, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectiveXavier Amatriain
 
Recent advances in deep recommender systems
Recent advances in deep recommender systemsRecent advances in deep recommender systems
Recent advances in deep recommender systemsNAVER Engineering
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectiveJustin Basilico
 
Deep Learning in Recommender Systems - RecSys Summer School 2017
Deep Learning in Recommender Systems - RecSys Summer School 2017Deep Learning in Recommender Systems - RecSys Summer School 2017
Deep Learning in Recommender Systems - RecSys Summer School 2017Balázs Hidasi
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Xavier Amatriain
 
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Xavier Amatriain
 
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsDéjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsJustin Basilico
 
Recommendation at Netflix Scale
Recommendation at Netflix ScaleRecommendation at Netflix Scale
Recommendation at Netflix ScaleJustin Basilico
 
Recommendation System Explained
Recommendation System ExplainedRecommendation System Explained
Recommendation System ExplainedCrossing Minds
 
Advanced Model Comparison and Automated Deployment Using ML
Advanced Model Comparison and Automated Deployment Using MLAdvanced Model Comparison and Automated Deployment Using ML
Advanced Model Comparison and Automated Deployment Using MLDatabricks
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systemsinovex GmbH
 
Machine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupMachine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupAndy Sloane
 

Tendances (20)

Tutorial: Context In Recommender Systems
Tutorial: Context In Recommender SystemsTutorial: Context In Recommender Systems
Tutorial: Context In Recommender Systems
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers
 
Context Aware Recommendations at Netflix
Context Aware Recommendations at NetflixContext Aware Recommendations at Netflix
Context Aware Recommendations at Netflix
 
Introduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga PetrovaIntroduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga Petrova
 
Recommender Systems from A to Z – Model Training
Recommender Systems from A to Z – Model TrainingRecommender Systems from A to Z – Model Training
Recommender Systems from A to Z – Model Training
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
 
Active Learning in Collaborative Filtering Recommender Systems : a Survey
Active Learning in Collaborative Filtering Recommender Systems : a SurveyActive Learning in Collaborative Filtering Recommender Systems : a Survey
Active Learning in Collaborative Filtering Recommender Systems : a Survey
 
Past, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspective
 
Recent advances in deep recommender systems
Recent advances in deep recommender systemsRecent advances in deep recommender systems
Recent advances in deep recommender systems
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry Perspective
 
Deep Learning in Recommender Systems - RecSys Summer School 2017
Deep Learning in Recommender Systems - RecSys Summer School 2017Deep Learning in Recommender Systems - RecSys Summer School 2017
Deep Learning in Recommender Systems - RecSys Summer School 2017
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
 
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
 
Recommender system
Recommender systemRecommender system
Recommender system
 
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsDéjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender Systems
 
Recommendation at Netflix Scale
Recommendation at Netflix ScaleRecommendation at Netflix Scale
Recommendation at Netflix Scale
 
Recommendation System Explained
Recommendation System ExplainedRecommendation System Explained
Recommendation System Explained
 
Advanced Model Comparison and Automated Deployment Using ML
Advanced Model Comparison and Automated Deployment Using MLAdvanced Model Comparison and Automated Deployment Using ML
Advanced Model Comparison and Automated Deployment Using ML
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Machine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupMachine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data Meetup
 

Similaire à Factorization Machines and Applications in Recommender Systems

Music Recommender Systems
Music Recommender SystemsMusic Recommender Systems
Music Recommender Systemsfuchaoqun
 
Two methods for optimising cognitive model parameters
Two methods for optimising cognitive model parametersTwo methods for optimising cognitive model parameters
Two methods for optimising cognitive model parametersUniversity of Huddersfield
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep LearningOswald Campesato
 
Keynote at IWLS 2017
Keynote at IWLS 2017Keynote at IWLS 2017
Keynote at IWLS 2017Manish Pandey
 
Citython presentation
Citython presentationCitython presentation
Citython presentationAnkit Tewari
 
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018Codemotion
 
Visual diagnostics for more effective machine learning
Visual diagnostics for more effective machine learningVisual diagnostics for more effective machine learning
Visual diagnostics for more effective machine learningBenjamin Bengfort
 
SVD and the Netflix Dataset
SVD and the Netflix DatasetSVD and the Netflix Dataset
SVD and the Netflix DatasetBen Mabey
 
230208 MLOps Getting from Good to Great.pptx
230208 MLOps Getting from Good to Great.pptx230208 MLOps Getting from Good to Great.pptx
230208 MLOps Getting from Good to Great.pptxArthur240715
 
Automatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELAutomatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELJoel Falcou
 
Introduction to Deep Learning and Tensorflow
Introduction to Deep Learning and TensorflowIntroduction to Deep Learning and Tensorflow
Introduction to Deep Learning and TensorflowOswald Campesato
 
nlp dl 1.pdf
nlp dl 1.pdfnlp dl 1.pdf
nlp dl 1.pdfnyomans1
 
Steffen Rendle, Research Scientist, Google at MLconf SF
Steffen Rendle, Research Scientist, Google at MLconf SFSteffen Rendle, Research Scientist, Google at MLconf SF
Steffen Rendle, Research Scientist, Google at MLconf SFMLconf
 
Steffen Rendle, Research Scientist, Google at MLconf SF
Steffen Rendle, Research Scientist, Google at MLconf SFSteffen Rendle, Research Scientist, Google at MLconf SF
Steffen Rendle, Research Scientist, Google at MLconf SFMLconf
 
Safety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdfSafety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdfPolytechnique Montréal
 
Sparse Support Faces - Battista Biggio - Int'l Conf. Biometrics, ICB 2015, Ph...
Sparse Support Faces - Battista Biggio - Int'l Conf. Biometrics, ICB 2015, Ph...Sparse Support Faces - Battista Biggio - Int'l Conf. Biometrics, ICB 2015, Ph...
Sparse Support Faces - Battista Biggio - Int'l Conf. Biometrics, ICB 2015, Ph...Pluribus One
 

Similaire à Factorization Machines and Applications in Recommender Systems (20)

Music Recommender Systems
Music Recommender SystemsMusic Recommender Systems
Music Recommender Systems
 
Two methods for optimising cognitive model parameters
Two methods for optimising cognitive model parametersTwo methods for optimising cognitive model parameters
Two methods for optimising cognitive model parameters
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 
Angular and Deep Learning
Angular and Deep LearningAngular and Deep Learning
Angular and Deep Learning
 
Ml5
Ml5Ml5
Ml5
 
Keynote at IWLS 2017
Keynote at IWLS 2017Keynote at IWLS 2017
Keynote at IWLS 2017
 
Citython presentation
Citython presentationCitython presentation
Citython presentation
 
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018
 
Visual diagnostics for more effective machine learning
Visual diagnostics for more effective machine learningVisual diagnostics for more effective machine learning
Visual diagnostics for more effective machine learning
 
SVD and the Netflix Dataset
SVD and the Netflix DatasetSVD and the Netflix Dataset
SVD and the Netflix Dataset
 
230208 MLOps Getting from Good to Great.pptx
230208 MLOps Getting from Good to Great.pptx230208 MLOps Getting from Good to Great.pptx
230208 MLOps Getting from Good to Great.pptx
 
Automatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELAutomatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSEL
 
Introduction to Deep Learning and Tensorflow
Introduction to Deep Learning and TensorflowIntroduction to Deep Learning and Tensorflow
Introduction to Deep Learning and Tensorflow
 
nlp dl 1.pdf
nlp dl 1.pdfnlp dl 1.pdf
nlp dl 1.pdf
 
Steffen Rendle, Research Scientist, Google at MLconf SF
Steffen Rendle, Research Scientist, Google at MLconf SFSteffen Rendle, Research Scientist, Google at MLconf SF
Steffen Rendle, Research Scientist, Google at MLconf SF
 
Steffen Rendle, Research Scientist, Google at MLconf SF
Steffen Rendle, Research Scientist, Google at MLconf SFSteffen Rendle, Research Scientist, Google at MLconf SF
Steffen Rendle, Research Scientist, Google at MLconf SF
 
Safety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdfSafety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdf
 
Android and Deep Learning
Android and Deep LearningAndroid and Deep Learning
Android and Deep Learning
 
projectreport
projectreportprojectreport
projectreport
 
Sparse Support Faces - Battista Biggio - Int'l Conf. Biometrics, ICB 2015, Ph...
Sparse Support Faces - Battista Biggio - Int'l Conf. Biometrics, ICB 2015, Ph...Sparse Support Faces - Battista Biggio - Int'l Conf. Biometrics, ICB 2015, Ph...
Sparse Support Faces - Battista Biggio - Int'l Conf. Biometrics, ICB 2015, Ph...
 

Dernier

Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxDiariAli
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....muralinath2
 
Introduction of DNA analysis in Forensic's .pptx
Introduction of DNA analysis in Forensic's .pptxIntroduction of DNA analysis in Forensic's .pptx
Introduction of DNA analysis in Forensic's .pptxrohankumarsinghrore1
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformationAreesha Ahmad
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Serviceshivanisharma5244
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsOrtegaSyrineMay
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Silpa
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxseri bangash
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceAlex Henderson
 
Chemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfChemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfSumit Kumar yadav
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Silpa
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptxSilpa
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)AkefAfaneh2
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)Areesha Ahmad
 

Dernier (20)

Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
Introduction of DNA analysis in Forensic's .pptx
Introduction of DNA analysis in Forensic's .pptxIntroduction of DNA analysis in Forensic's .pptx
Introduction of DNA analysis in Forensic's .pptx
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Factorization Machines and Applications in Recommender Systems

Task Description
We are given (training and testing) dataset /images, documents, .../
D = {(x(l), y(l))}l∈L ⊂ Rn × T
The aim is to find a function ˆy : Rn −→ T (target space), which
estimates the testing data /unknown to the trained model/.
In recommender systems (Online advertising) we deal with sparse
input vectors x(l) ∈ Rn.
T can be {−1, 1}, {0, 1} (classification), R (regression) or some
categorical space (ranks for an item).
4 / 29
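The LogLoss criterion used to evaluate the classifiers in the two Kaggle competitions above can be sketched in a few lines (numpy; `log_loss` is an illustrative helper, not a library function; note that in the standard definition the two bracketed terms are added, i.e. y log(p) + (1 − y) log(1 − p)):

```python
import numpy as np

def log_loss(y_true, p_pred, eps=1e-15):
    """Binary LogLoss: -(1/N) * sum_i [y_i*log(p_i) + (1-y_i)*log(1-p_i)]."""
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)  # guard log(0)
    y = np.asarray(y_true, dtype=float)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```

Confident correct predictions drive the loss toward 0, while confidently wrong ones are punished heavily, which is why the metric suits CTR probability estimates.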
Ad Classification
Classification from implicit feedback - the user does not rate
explicitly, but we can "guess" which items and features he likes or
does not care about.

Country   Day      Ad Type   Clicked?
USA       3/3/15   MOVIE     1
China     1/7/14   GAME      0
China     3/3/15   GAME      1

6 / 29
Standard (dummy) encoding

USA   China   3/3/15   1/7/14   MOVIE   GAME   Clicked?
1     0       1        0        1       0      1
0     1       0        1        0       1      0
0     1       1        0        0       1      1

Very large feature space
Very sparse samples (feature values)
7 / 29
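The encoding above can be reproduced with a small sketch (pure Python; the `one_hot` helper and the column order are illustrative):

```python
def one_hot(rows, columns):
    """Encode each categorical sample as a binary indicator vector."""
    return [[1 if value in row else 0 for value in columns] for row in rows]

samples = [("USA", "3/3/15", "MOVIE"),
           ("China", "1/7/14", "GAME"),
           ("China", "3/3/15", "GAME")]
columns = ["USA", "China", "3/3/15", "1/7/14", "MOVIE", "GAME"]

X = one_hot(samples, columns)
# X == [[1, 0, 1, 0, 1, 0],
#       [0, 1, 0, 1, 0, 1],
#       [0, 1, 1, 0, 0, 1]]
```

Each categorical domain (country, day, ad type) contributes one indicator column per distinct value, which is exactly why the feature space explodes while each sample stays sparse.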
(De)motivation!
Often features are more important in pairs:
"Country == USA" & "Day == Thanksgiving"
If we create a new feature (conjunction) for every pair of features..?!?
8 / 29
(De)motivation!
Often features are more important in pairs:
"Country == USA" & "Day == Thanksgiving"
If we create a new feature (conjunction) for every pair of features..?!?
Feature space: insanely large
If originally n features −→ (n choose 2) = n(n−1)/2 additional features
Samples: still sparse
8 / 29
SVD with ALS
Singular Value Decomposition (Matrix Factorization) with
Alternating Least Squares.
The most basic matrix factorization model for recommender systems
models the rating ˆr_ui a user u would give to an item i by:
ˆr_ui = x_u^T y_i, where
x_u^T = (x_u^1, . . . , x_u^N) - the factor vector associated with user u
y_i^T = (y_i^1, . . . , y_i^N) - the factor vector associated with item i
The dimension N of the factors x_u, y_i is the rank of the model
(factorization).
10 / 29
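The prediction ˆr_ui is just the dot product of the two factor vectors; a minimal sketch (numpy; the factor values are made up for illustration):

```python
import numpy as np

# Rank-N factorization: each user and item gets an N-dimensional factor vector.
x_u = np.array([0.5, -1.0, 2.0])    # latent factors of user u (illustrative)
y_i = np.array([1.0, 0.5, 0.25])    # latent factors of item i (illustrative)

r_hat = float(x_u @ y_i)  # predicted rating: r_hat_ui = x_u^T y_i
```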
Factorization Models
Collaborative Filtering:
- Neighborhood Methods
- Latent Factor Methods
11 / 29
Advantages of Factorization Models
Factorization models offer:
Factorization of higher-order interactions
Efficient parameter estimation and superior performance, even for
sparse data where just a few or no observations of those
higher-order effects are available
Main applications:
Collaborative Filtering (in Recommender Systems)
Link Prediction (in Social Networks)
15 / 29
Disadvantages of Factorization Models
BUT they are not general prediction models. There are many
factorization models designed for specific tasks.
Standard models:
Matrix Factorization (MF)
Tensor Factorization (TF)
Specific tasks:
SVD++ [Koren, Bell - Advances in CF]
timeSVD++ [Koren, Bell - Advances in CF]
PITF (Pairwise Interaction Tensor Factorization)
FPMC (Factorizing Personalized Markov Chains)
And others...
16 / 29
Advantages of FMs
FMs allow parameter estimation under very sparse data where SVMs fail.
FMs have linear (and even "almost constant"!) complexity and do
not rely on support vectors like SVMs.
FMs are a general predictor that can work with any real-valued
feature vector. In contrast, other state-of-the-art factorization
models work only on very restricted input data.
Through suitable feature engineering, FMs can mimic state-of-the-art
models like MF, SVD++, timeSVD++, PARAFAC, PITF, FPMC and others.
17 / 29
Notations
For any x ∈ Rn let us define:
m(x) to be the number of nonzero elements of the feature vector;
¯mD to be the average of m(x(l)) over all l ∈ L.
Since we deal with huge sparsity, we have ¯mD ≪ n, the
dimensionality of the feature space. One reason for huge sparsity is
that the underlying problem deals with large categorical variable
domains.
19 / 29
Definition of Factorization Machine
Model of degree 2:
ˆy(x) := w_0 + Σ_{i=1}^n w_i x_i + Σ_{i=1}^n Σ_{j=i+1}^n ⟨v_i, v_j⟩ x_i x_j    (1)
where the model parameters that have to be estimated are:
w_0 ∈ R - the global bias (mean rating over all items/users)
w = (w_1, . . . , w_n) ∈ Rn - the weights of the corresponding features
v_i := (v_{i,1}, . . . , v_{i,k}), the i-th row of V ∈ Rn×k, which
describes the i-th variable (feature) with k factors.
20 / 29
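Equation (1) can be sketched directly (numpy; a naive O(k·n²) implementation with illustrative names, not a library API; the next slide's identity reduces this to O(k·n)):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Degree-2 FM: w0 + <w, x> + sum over i<j of <v_i, v_j> * x_i * x_j."""
    n = len(x)
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            pairwise += np.dot(V[i], V[j]) * x[i] * x[j]
    return w0 + np.dot(w, x) + pairwise
```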
Complexity of the model equation
The model equation ˆy(x) for every x ∈ D can be computed in linear
time O(kn), and under sparsity in O(k m(x)).
Proof:
Σ_{i=1}^n Σ_{j=i+1}^n ⟨v_i, v_j⟩ x_i x_j
= 1/2 Σ_{i=1}^n Σ_{j=1}^n ⟨v_i, v_j⟩ x_i x_j − 1/2 Σ_{i=1}^n ⟨v_i, v_i⟩ x_i x_i
= 1/2 [ Σ_{i=1}^n Σ_{j=1}^n Σ_{f=1}^k v_{i,f} v_{j,f} x_i x_j − Σ_{i=1}^n Σ_{f=1}^k v_{i,f} v_{i,f} x_i x_i ]
= 1/2 Σ_{f=1}^k [ (Σ_{i=1}^n v_{i,f} x_i)(Σ_{j=1}^n v_{j,f} x_j) − Σ_{i=1}^n v_{i,f}² x_i² ]
= 1/2 Σ_{f=1}^k [ (Σ_{i=1}^n v_{i,f} x_i)² − Σ_{i=1}^n v_{i,f}² x_i² ]
21 / 29
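The identity proved above is easy to check numerically; a sketch (numpy, with illustrative values) comparing the naive double sum with the O(kn) reformulation:

```python
import numpy as np

def fm_pairwise_naive(x, V):
    """O(k n^2): sum over i<j of <v_i, v_j> * x_i * x_j."""
    n = len(x)
    return sum(np.dot(V[i], V[j]) * x[i] * x[j]
               for i in range(n) for j in range(i + 1, n))

def fm_pairwise_fast(x, V):
    """O(k n): 1/2 * sum_f [(sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2]."""
    x = np.asarray(x, dtype=float)
    s = V.T @ x                  # per-factor sums  sum_i v_{i,f} x_i
    s2 = (V ** 2).T @ (x ** 2)   # per-factor sums  sum_i v_{i,f}^2 x_i^2
    return 0.5 * float(np.sum(s ** 2 - s2))

V = np.array([[0.1, 0.2],
              [0.3, -0.4],
              [0.5, 0.6]])  # n = 3 features, k = 2 factors (illustrative)
x = [1.0, 2.0, 3.0]
```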
Complexity of parameter updates
The computation needed for the partial derivatives of ˆy(x) is
already done while computing ˆy(x) itself. Therefore, all parameter
updates for a case (x, y) can be done in O(kn), and under sparsity
in O(k m(x)).
∂ˆy(x)/∂θ =
  1,                                          if θ is w_0
  x_i,                                        if θ is w_i
  x_i Σ_{j=1}^n v_{j,f} x_j − v_{i,f} x_i²,   if θ is v_{i,f}    (2)
22 / 29
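Equation (2) can be sketched as follows (numpy; the helper name is illustrative; note that the per-factor sums Σ_j v_{j,f} x_j in the v_{i,f} case are exactly the quantities already computed for ˆy(x), which is why the updates come for free):

```python
import numpy as np

def fm_gradients(x, V):
    """Partial derivatives of y_hat(x): d/dw0 = 1, d/dw_i = x_i, and
    d/dv_{i,f} = x_i * (sum_j v_{j,f} x_j) - v_{i,f} * x_i^2."""
    x = np.asarray(x, dtype=float)
    s = V.T @ x                                  # reused per-factor sums, O(kn)
    dV = np.outer(x, s) - V * (x ** 2)[:, None]
    return 1.0, x, dV                            # grads w.r.t. w0, w, V

dw0, dw, dV = fm_gradients([1.0, 2.0], np.array([[1.0, 0.0],
                                                 [0.0, 1.0]]))
```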
General applications of FM
FMs can be applied to a variety of prediction tasks.
Regression: ˆy(x) can be applied directly as a regression predictor.
Binary classification: the sign of ˆy(x) is used and the parameters
are optimized for hinge loss or logit loss.
Ranking: input vectors x are ordered by the score ˆy(x) and
optimization is done over pairs of instance vectors with a pairwise
classification loss ([Joachims - Ranking CTR]).
23 / 29
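For binary classification, for instance, the FM score is thresholded at zero, or squashed through a sigmoid when a click probability is wanted (a small sketch; `y_hat` stands for the FM score ˆy(x)):

```python
import math

def predict_proba(y_hat):
    """Logistic sigmoid: map an FM score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y_hat))

def predict_class(y_hat):
    """Binary decision from the sign of the FM score."""
    return 1 if y_hat > 0 else -1
```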
Inference
Inference algorithms:
SGD (Stochastic Gradient Descent)
ALS (Alternating Least Squares)
MCMC (Markov Chain Monte Carlo)
24 / 29
Library: libFM (C++)
Author: Steffen Rendle
Web site: http://www.libfm.org/ (very poor description of the library)
Source code (C++): https://github.com/srendle/libfm
Very professionally written source code! (Linux and Mac OS X)
libFM features:
stochastic gradient descent (SGD)
alternating least squares (ALS)
Bayesian inference using Markov Chain Monte Carlo (MCMC)
25 / 29
Library: fastFM (Python)
Author: Immanuel Bayer
Web site: http://ibayer.github.io/fastFM/
Very detailed tutorial for the usage of the library!
Source code (Python): https://github.com/ibayer/fastFM
Python (2.7 and 3.5) with the well-known scikit-learn API.
Performance-critical code is written in C and wrapped with Cython.

Task             Solver            Loss
Regression       ALS, SGD, MCMC    Square Loss
Classification   ALS, SGD, MCMC    Probit(MAP), Probit, Sigmoid
Ranking          SGD               [Rendle - BPR]

26 / 29
Library: LIBFFM (C++)
A library for Field-aware Factorization Machines from the Machine
Learning group at National Taiwan University.
Author: YuChin Juan and colleagues
Web site: https://www.csie.ntu.edu.tw/~cjlin/libffm/ (very poor
description of the library)
Source code (C++): https://github.com/guestwalk/libffm
FFM is prone to overfitting and not very stable for the moment.
27 / 29
Thank you for your attention!
28 / 29
References
Steffen Rendle. Factorization Machines. 2010.
http://www.ismll.uni-hildesheim.de/pub/pdfs/Rendle2010FM.pdf
Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars
Schmidt-Thieme. BPR: Bayesian Personalized Ranking from Implicit
Feedback. UAI '09, pages 452, 2009.
Y. Koren and R. Bell. Advances in Collaborative Filtering.
M. Blondel. Convex Factorization Machines.
http://www.mblondel.org/publications/mblondel-ecmlpkdd2015.pdf
Thorsten Joachims. Optimizing Search Engines using Clickthrough
Data. 2002.
29 / 29