Dictionary Learning for
Massive Matrix Factorization
Arthur Mensch, Julien Mairal,
Gaël Varoquaux, Bertrand Thirion
Inria/CEA Parietal, Inria Thoth
June 20, 2016
Matrix factorization
$X \in \mathbb{R}^{p \times n} = D A$, with $D \in \mathbb{R}^{p \times k}$ and $A \in \mathbb{R}^{k \times n}$
Flexible tool for unsupervised data analysis
Dataset has lower underlying complexity than its apparent size
How to scale it to very large datasets? (Brain imaging, 2 TB)
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 1 / 19
Matrix factorization
Low rank factorization : k < p
...with optional sparse factors
→ interpretable data (fMRI, genetics, topic modeling)
Overcomplete dictionary learning: $k \gg p$, sparse $A$
[Olshausen and Field, 1997]
Formalism and methods
Non-convex formulation
$$\min_{D \in \mathcal{C},\ A \in \mathbb{R}^{k \times n}} \|X - DA\|_2^2 + \lambda \Omega(A)$$
Constraints on $D$
Penalty on $A$ ($\ell_1$, $\ell_2$)
Naive resolution
Alternated minimization: use full X at each iteration
Very slow : single iteration in O(p n)
Online matrix factorization
Stream (xt), update D at each t [Mairal et al., 2010]
Single iteration in O(p), a few epochs
[Diagram: streamed column $x_t$ of $X$ factorized as $D \alpha_t$]
Large n, regular p, e.g. image patches: $p = 256$, $n \approx 10^6$, 1 GB
Both (sparse) low-rank factorization / sparse coding
Scaling-up for massive matrices
Functional MRI (HCP dataset)
Brain “movies” : space × time
Extract k sparse networks
$p = 2 \cdot 10^5$, $n = 2 \cdot 10^6$, 2 TB
Way larger than vision problems
Unusual setting: data is large in both directions
Also useful in collaborative filtering
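The sizes quoted for image patches and for the HCP data can be checked with back-of-the-envelope arithmetic. The sketch below assumes dense storage with 4-byte floats; that element size is an assumption, not something stated in the slides, and the helper name `dense_size_bytes` is purely illustrative.

```python
def dense_size_bytes(p, n, bytes_per_value=4):
    """Storage for a dense p x n matrix (4-byte values are an assumption)."""
    return p * n * bytes_per_value


patches = dense_size_bytes(256, 10**6)           # image patches
fmri = dense_size_bytes(2 * 10**5, 2 * 10**6)    # HCP fMRI

print(f"image patches: {patches / 1e9:.2f} GB")  # 1.02 GB
print(f"fMRI (HCP):    {fmri / 1e12:.2f} TB")    # 1.60 TB
```

Under this assumption the fMRI matrix is about three orders of magnitude larger than the image-patch matrix, and of the same order as the 2 TB quoted.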
[Diagram: $X$ (voxels × time) = $D$ ($k$ spatial maps) × $A$ (time)]
Scaling-up for massive matrices
Out-of-the-box online algorithm ?
Limited time budget?
Need to accommodate large p
235 h run time: 1 full epoch
10 h run time: 1/24 epoch
Scaling-up in both directions
Batch → online: streaming $x_t$, handles large n
Online → double online: streaming and subsampling $M_t x_t$, handles large p
Online learning + partial random access to samples
Scaling-up in both directions
Low-distortion lemma [Johnson and Lindenstrauss, 1984]
Randomized linear algebra [Halko et al., 2009]
Sketching for data reduction [Pilanci and Wainwright, 2014]
Algorithm design
Online dictionary learning [Mairal et al., 2010]
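1. Compute code – O(p)
$$\alpha_t = \operatorname{argmin}_{\alpha \in \mathbb{R}^k} \|x_t - D_{t-1}\alpha\|_2^2 + \lambda \Omega(\alpha)$$
2. Update surrogate – O(p)
$$g_t(D) = \frac{1}{t} \sum_{i=1}^{t} \|x_i - D\alpha_i\|_2^2$$
3. Minimize surrogate – O(p)
$$D_t = \operatorname{argmin}_{D \in \mathcal{C}} g_t(D) = \operatorname{argmin}_{D \in \mathcal{C}} \operatorname{Tr}(D^\top D A_t - 2 D^\top B_t)$$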
xt access → O(p) algorithm (complexity dependency in p)
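The three steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes a ridge penalty $\Omega = \|\cdot\|_2^2$ (so the code has a closed form) and a unit $\ell_2$ ball constraint on each atom, and the function names are hypothetical. Every step reads all p entries of $x_t$ or all p rows of $D$, which is where the O(p) cost per iteration comes from.

```python
import numpy as np


def project_l2_ball(v):
    """Project v onto the unit l2 ball (assumed constraint set for atoms)."""
    return v / max(1.0, np.linalg.norm(v))


def online_dict_learning(X, k, lam=0.1, seed=0):
    """Stream the columns of X (p x n) once; return a dictionary D (p x k)."""
    rng = np.random.default_rng(seed)
    p, n = X.shape
    D = rng.standard_normal((p, k))
    D /= np.maximum(1.0, np.linalg.norm(D, axis=0))
    A = np.zeros((k, k))  # running average of alpha alpha^T
    B = np.zeros((p, k))  # running average of x alpha^T
    for t in range(1, n + 1):
        x = X[:, t - 1]
        # 1. Code computation: ridge regression of x on the current atoms.
        alpha = np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ x)
        # 2. Surrogate update: running means A_t and B_t.
        A += (np.outer(alpha, alpha) - A) / t
        B += (np.outer(x, alpha) - B) / t
        # 3. Surrogate minimization: one pass of block coordinate descent.
        for j in range(k):
            if A[j, j] > 1e-12:
                step = (D @ A[:, j] - B[:, j]) / A[j, j]
                D[:, j] = project_l2_ball(D[:, j] - step)
    return D
```

A call such as `online_dict_learning(X, k=3)` on a `p x n` array makes a single pass over the columns; several epochs would simply repeat the loop.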
Introducing subsampling
Iteration cost in O(p): can we reduce it?
$x_t \to M_t x_t$, $p \to \operatorname{rk} M_t = s$
Use only $M_t x_t$ in the algorithm's computations: complexity in O(s)
Our contribution
Adapt the 3 parts of the algorithm to obtain O(s) complexity:
1. Code computation
2. Surrogate update
3. Surrogate minimization
[Szabó et al., 2011]: dictionary learning with missing values – O(p)
1. Code computation
Linear regression with random sampling
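$$\alpha_t = \operatorname{argmin}_{\alpha \in \mathbb{R}^k} \|M_t(x_t - D_{t-1}\alpha)\|_2^2 + \lambda \frac{\operatorname{rk} M_t}{p} \Omega(\alpha)$$
approximate solution of
$$\alpha_t = \operatorname{argmin}_{\alpha \in \mathbb{R}^k} \|x_t - D_{t-1}\alpha\|_2^2 + \lambda \Omega(\alpha)$$
validity in high dimension, with incoherent features:
$$D^\top M_t D \approx \frac{s}{p} D^\top D, \qquad D^\top M_t x_t \approx \frac{s}{p} D^\top x_t$$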
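The approximation can be checked numerically on synthetic data. The sketch below assumes a ridge penalty $\Omega = \|\cdot\|_2^2$, a random Gaussian dictionary (incoherent by construction) and a uniformly drawn mask; all names and sizes are illustrative choices, not taken from the slides.

```python
import numpy as np


def ridge_code(D, x, lam):
    """argmin_a ||x - D a||_2^2 + lam * ||a||_2^2, in closed form."""
    k = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ x)


def subsampled_ridge_code(D, x, lam, mask):
    """Same problem on the rows in `mask`, with lam rescaled by s / p."""
    s, p = mask.size, D.shape[0]
    return ridge_code(D[mask], x[mask], lam * s / p)


rng = np.random.default_rng(0)
p, k, s = 20000, 5, 2000                        # reduction factor p / s = 10
D = rng.standard_normal((p, k)) / np.sqrt(p)    # incoherent atoms
x = D @ rng.standard_normal(k)
mask = rng.choice(p, size=s, replace=False)     # row indices kept by M_t

full = ridge_code(D, x, lam=0.1)
approx = subsampled_ridge_code(D, x, lam=0.1, mask=mask)
rel_err = np.linalg.norm(full - approx) / np.linalg.norm(full)
print(f"relative error of the subsampled code: {rel_err:.4f}")
```

The subsampled solve only touches s rows of $D$ and $x_t$, so its cost is O(s) rather than O(p) for fixed k.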
2. Surrogate update
Original algorithm: At and Bt used in dictionary update
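$A_t = \frac{1}{t} \sum_{i=1}^{t} \alpha_i \alpha_i^\top$: same as in online algorithm
$B_t = (1 - \frac{1}{t}) B_{t-1} + \frac{1}{t} x_t \alpha_t^\top = \frac{1}{t} \sum_{i=1}^{t} x_i \alpha_i^\top$: forbidden (requires the full $x_t$)
Partial update of $B_t$ at each iteration
$$B_t = \Big(\sum_{i=1}^{t} M_i\Big)^{-1} \sum_{i=1}^{t} M_i x_i \alpha_i^\top$$
Only $M_t B_t$ is updated
Behaves like $\mathbb{E}_x[x \alpha^\top]$ for large t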
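A minimal sketch of this partial update, assuming $M_t$ is a diagonal 0/1 mask represented by an array of distinct row indices. The class name and attributes are hypothetical. Keeping a per-row count of how often each feature has been observed turns the formula for $B_t$ into a running mean that only touches the s observed rows.

```python
import numpy as np


class PartialSurrogate:
    """Running statistics A_t, B_t when only masked rows of x_t are seen."""

    def __init__(self, p, k):
        self.t = 0
        self.A = np.zeros((k, k))
        self.B = np.zeros((p, k))
        self.counts = np.zeros(p)  # diagonal of sum_i M_i

    def update(self, mask, x_masked, alpha):
        """mask: distinct row indices; x_masked: x_t restricted to mask."""
        self.t += 1
        # A_t is updated exactly as in the original online algorithm.
        self.A += (np.outer(alpha, alpha) - self.A) / self.t
        # B_t: per-row running mean, restricted to observed rows (O(s k)).
        self.counts[mask] += 1
        delta = np.outer(x_masked, alpha) - self.B[mask]
        self.B[mask] += delta / self.counts[mask][:, None]
```

Rows that have never been observed keep a zero count and a zero row in `B`, so the inverse in the formula is only ever applied to features seen at least once.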
3. Surrogate minimization
Original algorithm : block coordinate descent with projection on C
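$$\min_{D \in \mathcal{C}} g_t(D), \qquad D_j \leftarrow p^\perp_{\mathcal{C}_j}\Big(D_j - \frac{1}{(A_t)_{j,j}} \big(D (A_t)_j - (B_t)_j\big)\Big)$$
Forbidden update of full D at iteration t
Cautious update
Leave dictionary unchanged for unseen features $(I - M_t)$
$$\min_{D \in \mathcal{C},\ (I - M_t) D = (I - M_t) D_{t-1}} g_t(D)$$
O(s) update in block coordinate descent
$$D_j \leftarrow p^\perp_{\mathcal{C}_j^r}\Big(D_j - \frac{1}{(A_t)_{j,j}} M_t \big(D (A_t)_j - (B_t)_j\big)\Big)$$
$\ell_1$ ball $\mathcal{C}_j^r = \{D \in \mathcal{C},\ \|M_t D\|_1 \le \|M_t D_{t-1}\|_1\}$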
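The cautious update can be sketched as a single pass of block coordinate descent restricted to the observed rows. This is an illustration under assumptions: the mask is an array of distinct row indices, the $\ell_1$ projection uses the standard sort-based method (not necessarily what the authors use), and the function names are hypothetical.

```python
import numpy as np


def project_l1_ball(v, radius):
    """Euclidean projection of v onto {u : ||u||_1 <= radius}."""
    if radius <= 0:
        return np.zeros_like(v)
    a = np.abs(v)
    if a.sum() <= radius:
        return v.copy()
    u = np.sort(a)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, u.size + 1)
    rho = np.nonzero(u * ks > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(a - theta, 0.0)


def masked_bcd_step(D, A, B, mask):
    """One block coordinate descent pass that only modifies rows in `mask`."""
    D = D.copy()
    for j in range(D.shape[1]):
        if A[j, j] < 1e-12:
            continue
        # Radius ||M_t D_{t-1}||_1 for atom j, read before atom j is changed.
        radius = np.abs(D[mask, j]).sum()
        grad = D[mask] @ A[:, j] - B[mask, j]  # O(s k) per atom
        D[mask, j] = project_l1_ball(D[mask, j] - grad / A[j, j], radius)
    return D
```

Because the radius is taken from the previous iterate on the observed rows only, the $\ell_1$ norm of each full atom cannot grow, so an atom that was feasible before the step stays feasible without ever reading the unseen rows.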
Resting-state fMRI
HCP dataset
One brain image per second
200 sessions, $n = 2 \cdot 10^6$, $p = 2 \cdot 10^5$
[Diagram: $X$ (voxels × time) = $D$ ($k$ spatial maps) × $A$ (time)]
Sparse decomposition: $k = 70$, $\mathcal{C} = \mathcal{B}_1^k$, $\Omega = \|\cdot\|_2^2$
Validation
Increase reduction factor $p / s$
Objective function on test set vs CPU time
Resting-state fMRI
Online dictionary learning
235 h run time: 1 full epoch
10 h run time: 1/24 epoch
Proposed method
10 h run time: 1/2 epoch, reduction r = 12
Qualitatively, usable maps are obtained 10× faster
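The epoch fractions quoted above are roughly what one gets if the cost per sample scales as 1/r. The sketch below is only arithmetic under that idealized assumption; the 235 h figure comes from the slide, and the helper name is illustrative.

```python
FULL_EPOCH_HOURS = 235.0  # one epoch of the original online algorithm


def epochs_done(budget_hours, reduction=1):
    """Fraction of an epoch completed, assuming cost scales as 1/reduction."""
    return budget_hours * reduction / FULL_EPOCH_HOURS


print(round(1 / epochs_done(10), 1))             # 23.5, about 1/24 epoch
print(round(epochs_done(10, reduction=12), 2))   # 0.51, about 1/2 epoch
```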
Resting-state fMRI
[Figure: objective value on test set (×10^8) vs CPU time (0.1 h to 100 h), for the original online algorithm (no reduction) and reduction factors r = 4, 8, 12]
Speed-up close to reduction factor $p / s$
Resting-state fMRI
[Figure: objective value on test set (×10^8) vs number of records seen (100 to 4000), $\lambda = 10^{-4}$, for no reduction (original alg.) and r = 4, 8, 12]
Convergence speed / number of seen records
Information is acquired faster
Collaborative filtering
$M_t x_t$: movie ratings from user t
vs. coordinate descent for MMMF loss (no hyperparameters)
[Figure: test RMSE vs CPU time (100 s to 1000 s) on Netflix (140M), comparing coordinate descent, proposed (full projection) and proposed (partial projection)]
Dataset      Test RMSE (CD)   Test RMSE (MODL)   Speed-up
ML 1M        0.872            0.866              ×0.75
ML 10M       0.802            0.799              ×3.7
NF (140M)    0.938            0.934              ×6.8
Outperforms coordinate descent beyond 10M ratings
Same prediction performance
Speed-up 6.8× on Netflix
Conclusion
Take-home message
Loading stochastic subsets of sample streams can drastically accelerate online matrix factorization
Reduce CPU (+IO) load at each iteration
cf. gradient descent vs. SGD
An order of magnitude speed-up on two different problems
Python package http://github.com/arthurmensch/modl
Heuristic at contribution time
A follow-up algorithm has convergence guarantees
Questions ? (Poster # 41 this afternoon)
Bibliography
[Halko et al., 2009] Halko, N., Martinsson, P.-G., and Tropp, J. A. (2009). Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. arXiv:0909.4061 [math].
[Johnson and Lindenstrauss, 1984] Johnson, W. B. and Lindenstrauss, J. (1984). Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189–206.
[Mairal et al., 2010] Mairal, J., Bach, F., Ponce, J., and Sapiro, G. (2010). Online learning for matrix factorization and sparse coding. The Journal of Machine Learning Research, 11:19–60.
[Olshausen and Field, 1997] Olshausen, B. A. and Field, D. J. (1997). Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37(23):3311–3325.
[Pilanci and Wainwright, 2014] Pilanci, M. and Wainwright, M. J. (2014). Iterative Hessian sketch: Fast and accurate solution approximation for constrained least-squares. arXiv:1411.0347 [cs, math, stat].
[Szabó et al., 2011] Szabó, Z., Póczos, B., and Lőrincz, A. (2011). Online group-structured dictionary learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2865–2872. IEEE.
[Yu et al., 2012] Yu, H.-F., Hsieh, C.-J., and Dhillon, I. (2012). Scalable coordinate descent approaches to parallel matrix factorization for recommender systems. In Proceedings of the International Conference on Data Mining, pages 765–774. IEEE.
Appendix
Collaborative filtering
Streaming incomplete data
$M_t$ is imposed by user t
Data stream: $M_t x_t$, movies rated by user t
Proposed by [Szabó et al., 2011], with O(p) complexity
Validation: test RMSE (rating prediction) vs CPU time
Baseline: coordinate descent solver [Yu et al., 2012] solving a related loss
$$\sum_{t=1}^{n} \big(\|M_t(x_t - D\alpha_t)\|_2^2 + \lambda \|\alpha_t\|_2^2\big) + \lambda \|D\|_2^2$$
Fastest solver available apart from SGD – no hyperparameters
Our method is not sensitive to hyperparameters
Algorithm
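Our algorithm
1. Code computation
$$\alpha_t = \operatorname{argmin}_{\alpha \in \mathbb{R}^k} \|M_t(x_t - D_{t-1}\alpha)\|_2^2 + \lambda \frac{\operatorname{rk} M_t}{p} \Omega(\alpha)$$
2. Surrogate aggregation
$$A_t = \frac{1}{t} \sum_{i=1}^{t} \alpha_i \alpha_i^\top$$
$$B_t = B_{t-1} + \Big(\sum_{i=1}^{t} M_i\Big)^{-1} \big(M_t x_t \alpha_t^\top - M_t B_{t-1}\big)$$
3. Surrogate minimization
$$M_t D_j \leftarrow p^\perp_{\mathcal{C}_j^r}\Big(M_t D_j - \frac{1}{(A_t)_{j,j}} M_t \big(D (A_t)_j - (B_t)_j\big)\Big)$$
Original online MF
1. Code computation
$$\alpha_t = \operatorname{argmin}_{\alpha \in \mathbb{R}^k} \|x_t - D_{t-1}\alpha\|_2^2 + \lambda \Omega(\alpha)$$
2. Surrogate aggregation
$$A_t = \frac{1}{t} \sum_{i=1}^{t} \alpha_i \alpha_i^\top$$
$$B_t = B_{t-1} + \frac{1}{t} \big(x_t \alpha_t^\top - B_{t-1}\big)$$
3. Surrogate minimization
$$D_j \leftarrow p^\perp_{\mathcal{C}_j}\Big(D_j - \frac{1}{(A_t)_{j,j}} \big(D (A_t)_j - (B_t)_j\big)\Big)$$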

CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 

Dictionary Learning for Massive Matrix Factorization

  • 1. Dictionary Learning for Massive Matrix Factorization
      Arthur Mensch, Julien Mairal, Gaël Varoquaux, Bertrand Thirion
      Inria/CEA Parietal, Inria Thoth
      June 20, 2016
  • 2. Matrix factorization
      [Figure: X (p × n) = D (p × k) · A (k × n)]
      $X \in \mathbb{R}^{p \times n} = DA$, with $D \in \mathbb{R}^{p \times k}$ and $A \in \mathbb{R}^{k \times n}$
      Flexible tool for unsupervised data analysis
      The dataset has a lower underlying complexity than its apparent size
      How to scale it to very large datasets? (Brain imaging, 2 TB)
  • 3. Matrix factorization
      [Figure, repeated three times: X (p × n) = D (p × k) · A (k × n)]
      Low-rank factorization: $k < p$
      ...with optional sparse factors → interpretable data (fMRI, genetics, topic modeling)
      Overcomplete dictionary learning: $k \gg p$, sparse $A$ [Olshausen and Field, 1997]
  • 4. Formalism and methods
      Non-convex formulation:
      $\min_{D \in \mathcal{C},\, A \in \mathbb{R}^{k \times n}} \|X - DA\|_2^2 + \lambda \Omega(A)$
      Constraints on $D$; penalty on $A$ ($\ell_1$, $\ell_2$)
      Naive resolution: alternating minimization, which uses the full $X$ at each iteration
      Very slow: a single iteration costs $O(pn)$
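The naive alternating scheme on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a ridge penalty $\Omega(A) = \|A\|_2^2$ and unit $\ell_2$-ball constraints on the columns of $D$, and it replaces the exact constrained dictionary step with a least-squares solve followed by a projection (a heuristic). The function name `alternating_minimization` is hypothetical.

```python
import numpy as np

def alternating_minimization(X, k, lam=0.1, n_iter=20, seed=0):
    """Naive batch factorization X ~ D A (illustrative assumptions:
    ridge penalty on A, columns of D kept in the unit l2 ball)."""
    rng = np.random.default_rng(seed)
    p, n = X.shape
    D = rng.standard_normal((p, k))
    D /= np.linalg.norm(D, axis=0)

    def codes(D):
        # Ridge regression for all n columns at once: touches the full X.
        return np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ X)

    for _ in range(n_iter):
        A = codes(D)
        # Dictionary step: unconstrained least squares, again on the full X,
        # then rescale columns whose norm exceeds 1 (heuristic projection).
        D = X @ A.T @ np.linalg.pinv(A @ A.T)
        D /= np.maximum(np.linalg.norm(D, axis=0), 1.0)
    return D, codes(D)
```

Every pass forms products with the whole `X`, so the per-iteration cost grows with both `p` and `n`, which is the bottleneck the following slides address.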
  • 5. Online matrix factorization
      Stream $(x_t)$, update $D$ at each $t$ [Mairal et al., 2010]
      Single iteration in $O(p)$, a few epochs
      [Figure: columns $x_t$ of X are streamed; $x_t = D \alpha_t$]
      Large $n$, regular $p$, e.g. image patches: $p = 256$, $n \approx 10^6$, 1 GB
      Covers both (sparse) low-rank factorization and sparse coding
  • 6. Scaling-up for massive matrices
      Functional MRI (HCP dataset)
      Brain "movies": space × time
      Extract $k$ sparse networks
      $p = 2 \cdot 10^5$, $n = 2 \cdot 10^6$, 2 TB
      Way larger than vision problems
      Unusual setting: data is large in both directions
      Also useful in collaborative filtering
      [Figure: X (voxels × time) = D (k spatial maps) · A (time)]
  • 7. Scaling-up for massive matrices
      Out-of-the-box online algorithm?
      [Figure: columns $x_t$ of X are streamed; $x_t = D \alpha_t$]
      Limited time budget? Need to accommodate large $p$
      235 h run time: 1 full epoch
      10 h run time: 1/24 epoch
  • 8. Scaling-up in both directions
      Batch → online: stream the columns $x_t$ to handle large $n$
      Online → double online: stream and subsample, $x_t \to M_t x_t$, to handle large $p$
      Online learning + partial random access to samples
  • 9. Scaling-up in both directions
      Online → double online: stream and subsample, $x_t \to M_t x_t$, to handle large $p$
      Related ideas:
      Low-distortion lemma [Johnson and Lindenstrauss, 1984]
      Randomized linear algebra [Halko et al., 2009]
      Sketching for data reduction [Pilanci and Wainwright, 2014]
  • 10. Algorithm design
      Online dictionary learning [Mairal et al., 2010]
      1. Compute code, $O(p)$:
         $\alpha_t = \operatorname{argmin}_{\alpha \in \mathbb{R}^k} \|x_t - D_{t-1}\alpha\|_2^2 + \lambda \Omega(\alpha)$
      2. Update surrogate, $O(p)$:
         $g_t(D) = \frac{1}{t} \sum_{i=1}^{t} \|x_i - D\alpha_i\|_2^2$
      3. Minimize surrogate, $O(p)$:
         $D_t = \operatorname{argmin}_{D \in \mathcal{C}} g_t(D) = \operatorname{argmin}_{D \in \mathcal{C}} \operatorname{Tr}\big(\tfrac{1}{2} D^\top D A_t - D^\top B_t\big)$
      Accessing the full $x_t$ → $O(p)$ algorithm (the complexity depends on $p$)
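The three steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the reference implementation of [Mairal et al., 2010]: it takes $\Omega = \|\cdot\|_2^2$ (so the code has a closed form), takes $\mathcal{C}$ to be unit $\ell_2$ balls on the columns of $D$, and runs a single pass of block coordinate descent per sample. The name `online_dictionary_learning` is hypothetical.

```python
import numpy as np

def online_dictionary_learning(X, k, lam=0.1, seed=0):
    """One pass over the columns of X (assumptions: ridge penalty,
    unit l2-ball constraint on each dictionary column)."""
    rng = np.random.default_rng(seed)
    p, n = X.shape
    D = rng.standard_normal((p, k))
    D /= np.linalg.norm(D, axis=0)
    A = np.zeros((k, k))  # running average of alpha alpha^T
    B = np.zeros((p, k))  # running average of x alpha^T
    for t in range(1, n + 1):
        x = X[:, t - 1]
        # 1. Code computation: ridge regression on the full sample, O(p).
        alpha = np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ x)
        # 2. Surrogate update: running averages, O(p).
        A += (np.outer(alpha, alpha) - A) / t
        B += (np.outer(x, alpha) - B) / t
        # 3. Surrogate minimization: one pass of block coordinate descent.
        for j in range(k):
            if A[j, j] < 1e-12:
                continue  # atom not used so far, leave it unchanged
            D[:, j] -= (D @ A[:, j] - B[:, j]) / A[j, j]
            D[:, j] /= max(np.linalg.norm(D[:, j]), 1.0)  # project on ball
    return D, A, B
```

Each step reads or writes full length-`p` vectors, which is where the dependency on `p` comes from.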
  • 11. Introducing subsampling
      Iteration cost in $O(p)$: can we reduce it?
      $x_t \to M_t x_t$, $p \to \operatorname{rk} M_t = s$
      Use only $M_t x_t$ in the algorithm's computations: complexity in $O(s)$
      [Figure: streaming + subsampling of $M_t x_t$]
      Our contribution: adapt the 3 parts of the algorithm to obtain $O(s)$ complexity
      1. Code computation
      2. Surrogate update
      3. Surrogate minimization
      [Szabó et al., 2011]: dictionary learning with missing values, $O(p)$
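A small sketch of the subsampling operator may help here. It assumes that $M_t$ is a diagonal 0/1 matrix selecting $s$ of the $p$ features, drawn uniformly without replacement (the uniform draw and the helper name `sample_mask` are assumptions for illustration). In practice the mask would be stored as an index array so that $M_t x_t$ costs $O(s)$; the dense matrix is built only to check the equivalence.

```python
import numpy as np

def sample_mask(p, s, rng):
    """Return the sorted indices of s features kept by M_t (assumed uniform)."""
    return np.sort(rng.choice(p, size=s, replace=False))

rng = np.random.default_rng(0)
p, s = 200, 20
x_t = rng.standard_normal(p)

idx = sample_mask(p, s, rng)
x_masked = x_t[idx]          # O(s) access: the only part of x_t that is read

# Dense equivalent, for checking only: a diagonal 0/1 matrix of rank s.
M_t = np.zeros((p, p))
M_t[idx, idx] = 1.0
```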
  • 12. 1. Code computation
      Linear regression with random sampling:
      $\alpha_t = \operatorname{argmin}_{\alpha \in \mathbb{R}^k} \|M_t(x_t - D_{t-1}\alpha)\|_2^2 + \lambda \frac{\operatorname{rk} M_t}{p} \Omega(\alpha)$
      is an approximate solution of
      $\alpha_t = \operatorname{argmin}_{\alpha \in \mathbb{R}^k} \|x_t - D_{t-1}\alpha\|_2^2 + \lambda \Omega(\alpha)$
      Valid in high dimension, with incoherent features:
      $D^\top M_t D \approx \frac{s}{p} D^\top D$, $D^\top M_t x_t \approx \frac{s}{p} D^\top x_t$
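As a sketch of the subsampled code computation, the following assumes a ridge penalty $\Omega = \|\cdot\|_2^2$ so that the masked problem has a closed-form solution; the mask is represented by an index array `idx` of the $s$ observed features. The function name `masked_code` is hypothetical.

```python
import numpy as np

def masked_code(x, D, idx, lam):
    """Solve argmin_a ||M (x - D a)||^2 + lam * (s / p) * ||a||^2,
    reading only the s rows of D and entries of x listed in idx."""
    p, k = D.shape
    s = idx.size
    D_s, x_s = D[idx], x[idx]                     # O(s k) memory access
    gram = D_s.T @ D_s + lam * (s / p) * np.eye(k)
    return np.linalg.solve(gram, D_s.T @ x_s)
```

Scaling the penalty by $s/p$ keeps the balance between the data term and the penalty comparable to the full problem, since both $D^\top M_t D$ and $D^\top M_t x_t$ shrink by roughly that factor. With `idx = np.arange(p)` the function reduces to the ordinary ridge code.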
  • 13. 2. Surrogate update
      Original algorithm: $A_t$ and $B_t$ are used in the dictionary update
      $A_t = \frac{1}{t} \sum_{i=1}^{t} \alpha_i \alpha_i^\top$ (same as in the online algorithm)
      $B_t = (1 - \frac{1}{t}) B_{t-1} + \frac{1}{t} x_t \alpha_t^\top = \frac{1}{t} \sum_{i=1}^{t} x_i \alpha_i^\top$ (forbidden: requires the full $x_t$)
      Partial update of $B_t$ at each iteration:
      $B_t = \big(\sum_{i=1}^{t} M_i\big)^{-1} \sum_{i=1}^{t} M_i x_i \alpha_i^\top$
      Only $M_t B$ is updated
      Behaves like $\mathbb{E}_x[x \alpha^\top]$ for large $t$
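The partial update can be sketched as a running average kept separately for each feature. The sketch assumes diagonal 0/1 masks given as index arrays and keeps a per-feature counter `counts` standing for the diagonal of $\sum_i M_i$ (the counter and the function name `update_surrogate` are illustrative choices, not taken from the slides).

```python
import numpy as np

def update_surrogate(A, B, counts, alpha, x_masked, idx, t):
    """In-place update of the surrogate statistics at iteration t.
    Only the rows of B listed in idx (the observed features) are touched."""
    # A_t: plain running average of alpha alpha^T, O(k^2).
    A += (np.outer(alpha, alpha) - A) / t
    # counts[r] = number of iterations in which feature r was observed.
    counts[idx] += 1
    # B_t rows: running average over the iterations where the row was seen.
    B[idx] += (np.outer(x_masked, alpha) - B[idx]) / counts[idx][:, None]
    return A, B, counts
```

Rows of `B` for features never observed stay at their initial value, which matches the idea that nothing can be learned about them yet.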
  • 14. 3. Surrogate minimization
      Original algorithm: block coordinate descent with projection on $\mathcal{C}$
      $\min_{D \in \mathcal{C}} g_t(D)$
      $D_j \leftarrow p^\perp_{\mathcal{C}_j}\big(D_j - \frac{1}{(A_t)_{j,j}} (D (A_t)_j - (B_t)_j)\big)$
      Forbidden: update of the full $D$ at iteration $t$
      Cautious update: leave the dictionary unchanged for unseen features $(I - M_t)$
      $\min_{D \in \mathcal{C},\ (I - M_t) D = (I - M_t) D_{t-1}} g_t(D)$
      $O(s)$ update in block coordinate descent:
      $D_j \leftarrow p^\perp_{\mathcal{C}^r_j}\big(D_j - \frac{1}{(A_t)_{j,j}} M_t (D (A_t)_j - (B_t)_j)\big)$
      $\ell_1$ ball: $\mathcal{C}^r_j = \{D \in \mathcal{C},\ \|M_t D\|_1 \le \|M_t D_{t-1}\|_1\}$
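A possible reading of the cautious update is sketched below. It assumes index-array masks and interprets the restricted constraint column by column: the observed part of each atom is projected onto the $\ell_1$ ball whose radius is its previous $\ell_1$ norm on the observed rows. The projection uses the standard sort-based method; both function names are hypothetical.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto {w : ||w||_1 <= radius} (sort-based)."""
    if radius <= 0:
        return np.zeros_like(v)
    a = np.abs(v)
    if a.sum() <= radius:
        return v.copy()
    u = np.sort(a)[::-1]
    css = np.cumsum(u)
    ranks = np.arange(1, u.size + 1)
    rho = np.nonzero(u * ranks > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1)
    return np.sign(v) * np.maximum(a - theta, 0.0)

def cautious_dictionary_update(D, A, B, idx):
    """One pass of block coordinate descent restricted to the rows in idx.
    Rows outside idx are left untouched."""
    for j in range(D.shape[1]):
        if A[j, j] < 1e-12:
            continue  # atom not used so far
        radius = np.abs(D[idx, j]).sum()       # previous l1 norm on seen rows
        grad = D[idx] @ A[:, j] - B[idx, j]    # O(s k) partial gradient
        D[idx, j] = project_l1_ball(D[idx, j] - grad / A[j, j], radius)
    return D
```

Because the observed part of each atom never grows in $\ell_1$ norm and the unobserved part is frozen, the full atom stays inside any $\ell_1$ ball it started in.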
  • 15. Resting-state fMRI
      HCP dataset: one brain image per second, 200 sessions
      $n = 2 \cdot 10^6$, $p = 2 \cdot 10^5$
      [Figure: X (voxels × time) = D (k spatial maps) · A (time)]
      Sparse decomposition: $k = 70$, $\mathcal{C} = \mathcal{B}_1^k$, $\Omega = \|\cdot\|_2^2$
      Validation: increase the reduction factor $p/s$
      Objective function on test set vs. CPU time
  • 16. Resting-state fMRI
      Online dictionary learning: 235 h run time, 1 full epoch
      Online dictionary learning: 10 h run time, 1/24 epoch
      Proposed method: 10 h run time, 1/2 epoch, reduction $r = 12$
      Qualitatively, usable maps are obtained 10× faster
  • 17. Resting-state fMRI
      [Figure: objective value on test set (×10^8) vs. CPU time (0.1 h to 100 h); curves: original online algorithm (no reduction) and reduction factors r = 4, 8, 12]
      Speed-up close to the reduction factor $p/s$
  • 18. Resting-state fMRI
      [Figure: objective value on test set (×10^8) vs. number of seen records, $\lambda = 10^{-4}$; curves: no reduction (original algorithm), r = 4, r = 8, r = 12]
      Convergence speed / number of seen records
      Information is acquired faster
  • 19. Collaborative filtering
      $M_t x_t$: movie ratings from user $t$
      vs. coordinate descent for the MMMF loss (no hyperparameters)
      [Figure: test RMSE vs. time (100 s to 1000 s) on Netflix (140M); curves: coordinate descent, proposed (full projection), proposed (partial projection)]
      Dataset      Test RMSE (CD)   Test RMSE (MODL)   Speed-up
      ML 1M        0.872            0.866              ×0.75
      ML 10M       0.802            0.799              ×3.7
      NF (140M)    0.938            0.934              ×6.8
      Outperforms coordinate descent beyond 10M ratings
      Same prediction performance
      Speed-up of 6.8× on Netflix
  • 20. Conclusion
      Take-home message: loading stochastic subsets of sample streams can drastically accelerate online matrix factorization
      [Figure: streaming + subsampling of $M_t x_t$]
      Reduces CPU (+ IO) load at each iteration, cf. gradient descent vs. SGD
      An order of magnitude speed-up on two different problems
      Python package: http://github.com/arthurmensch/modl
      Heuristic at contribution time; a follow-up algorithm has convergence guarantees
      Questions? (Poster #41 this afternoon)
  • 21. Bibliography I
      [Halko et al., 2009] Halko, N., Martinsson, P.-G., and Tropp, J. A. (2009). Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. arXiv:0909.4061 [math].
      [Johnson and Lindenstrauss, 1984] Johnson, W. B. and Lindenstrauss, J. (1984). Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189–206.
      [Mairal et al., 2010] Mairal, J., Bach, F., Ponce, J., and Sapiro, G. (2010). Online learning for matrix factorization and sparse coding. The Journal of Machine Learning Research, 11:19–60.
      [Olshausen and Field, 1997] Olshausen, B. A. and Field, D. J. (1997). Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37(23):3311–3325.
  • 22. Bibliography II
      [Pilanci and Wainwright, 2014] Pilanci, M. and Wainwright, M. J. (2014). Iterative Hessian sketch: Fast and accurate solution approximation for constrained least-squares. arXiv:1411.0347 [cs, math, stat].
      [Szabó et al., 2011] Szabó, Z., Póczos, B., and Lőrincz, A. (2011). Online group-structured dictionary learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2865–2872. IEEE.
      [Yu et al., 2012] Yu, H.-F., Hsieh, C.-J., and Dhillon, I. (2012). Scalable coordinate descent approaches to parallel matrix factorization for recommender systems. In Proceedings of the International Conference on Data Mining, pages 765–774. IEEE.
  • 24. Collaborative filtering
      Streaming incomplete data: $M_t$ is imposed by user $t$
      Data stream: $M_t x_t$, the movies rated by user $t$
      Proposed by [Szabó et al., 2011], with $O(p)$ complexity
      Validation: test RMSE (rating prediction) vs. CPU time
      Baseline: coordinate descent solver [Yu et al., 2012] solving the related loss
      $\sum_{t=1}^{n} \big(\|M_t(x_t - D\alpha_t)\|_2^2 + \lambda \|\alpha_t\|_2^2\big) + \lambda \|D\|_2^2$
      Fastest solver available apart from SGD, with no hyperparameters
      Our method is not sensitive to hyperparameters
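  • 25. Algorithm
      Our algorithm
      1. Code computation:
         $\alpha_t = \operatorname{argmin}_{\alpha \in \mathbb{R}^k} \|M_t(x_t - D_{t-1}\alpha)\|_2^2 + \lambda \frac{\operatorname{rk} M_t}{p} \Omega(\alpha)$
      2. Surrogate aggregation:
         $A_t = \frac{1}{t} \sum_{i=1}^{t} \alpha_i \alpha_i^\top$
         $B_t = B_{t-1} + \big(\sum_{i=1}^{t} M_i\big)^{-1} (M_t x_t \alpha_t^\top - M_t B_{t-1})$
      3. Surrogate minimization:
         $M_t D_j \leftarrow p^\perp_{\mathcal{C}^r_j}\big(M_t D_j - \frac{1}{(A_t)_{j,j}} M_t (D (A_t)_j - (B_t)_j)\big)$
      Original online MF
      1. Code computation:
         $\alpha_t = \operatorname{argmin}_{\alpha \in \mathbb{R}^k} \|x_t - D_{t-1}\alpha\|_2^2 + \lambda \Omega(\alpha)$
      2. Surrogate aggregation:
         $A_t = \frac{1}{t} \sum_{i=1}^{t} \alpha_i \alpha_i^\top$
         $B_t = B_{t-1} + \frac{1}{t} (x_t \alpha_t^\top - B_{t-1})$
      3. Surrogate minimization:
         $D_j \leftarrow p^\perp_{\mathcal{C}_j}\big(D_j - \frac{1}{(A_t)_{j,j}} (D (A_t)_j - (B_t)_j)\big)$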