Sparse Matrix Reconstruction
Michael Hankin
University of Southern California
mhankin@usc.edu

December 5, 2013

Overview

1. Initial Problem
2. Algorithm
   - Algorithm Explanation
3. Convergence
4. Extensions
5. Demos
6. Other topics
7. References

Overview of Matrix Completion Problem

Motivation: Say that Netflix has $N_{\text{Movies}}$ movies and $N_{\text{Users}}$ users. Given universal knowledge they could construct an $N_{\text{Movies}} \times N_{\text{Users}}$ matrix of ratings, and thus predict which movies their users would enjoy, and how much so.
However, all they have are the few ratings their users have taken the time to input, and the data on which accounts have watched which movies.
Can the full matrix be reconstructed from this VERY sparse, noisy sample?

Overview of Matrix Completion Problem

Idea: Without some constraint, the values of the missing points could be any real (or even complex!) number. Obviously we have to impose some restrictions, beginning with real numbers only!
Less obvious is the condition that the matrix be of low rank. In the Netflix problem this is natural: there really aren't that many types of people (as far as taste profiles go) or movies (as far as genre/appeal profiles go). However, this condition is relevant in many other scenarios as well.

Notational Interlude

For a matrix $X$ define the nuclear norm to be
$$\|X\|_* = \sum_{i=1}^{r} |\sigma_i|$$
where the $\sigma_i$'s are the singular values of the matrix and $r$ is its rank (and therefore the number of nonzero singular values).
Grievously abusing notation, we might say $\|X\|_* = \|\sigma\|_1$.
If the true matrix is $M$ and we observe only $M_{i,j} \ \forall (i,j) \in \Omega$ for some $\Omega$, then let
$$P_\Omega(X)_{i,j} = \begin{cases} X_{i,j} & (i,j) \in \Omega \\ 0 & (i,j) \notin \Omega \end{cases}$$

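To make $P_\Omega$ concrete, here is a minimal numpy sketch, assuming $\Omega$ is stored as a boolean mask; the names `P_omega` and `mask` are illustrative, not from the slides.

```python
import numpy as np

def P_omega(X, mask):
    """Sample operator P_Omega: keep the entries of X indexed by Omega
    (where mask is True) and zero out everything else."""
    return np.where(mask, X, 0.0)

# Observe roughly 20% of a small matrix
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
mask = rng.random((5, 5)) < 0.2
M_observed = P_omega(M, mask)
```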
Problem statement

Given $P_\Omega(M)$ and the knowledge that $M$ is of low rank, recover $M$.
To do so we work with an approximation $X$. We want to minimize the rank of $X$; however, a direct approach would be NP-hard, in the same way that minimizing $\|\sigma\|_0$ would be, so we relax our conditions in the same vein as LASSO and set up the problem:
$$\min \|X\|_* \;\approx\; \min \|\sigma\|_1 \quad \text{s.t.} \quad P_\Omega(M) = P_\Omega(X) \tag{1}$$
We end up with something slightly resembling the Dantzig selector, which we know gives sparse results, and sparsity in $\sigma$ is equivalent to low rank for $X$.

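Problem (1) can in principle be handed straight to a generic convex solver. The following cvxpy sketch is an illustration under that assumption; it is practical only for small matrices, which is exactly what motivates the iterative algorithm introduced later.

```python
import cvxpy as cp
import numpy as np

def complete_exact(M_observed, mask):
    """Solve min ||X||_* subject to X matching M on the observed entries."""
    X = cp.Variable(M_observed.shape)
    W = mask.astype(float)
    constraints = [cp.multiply(W, X) == cp.multiply(W, M_observed)]
    prob = cp.Problem(cp.Minimize(cp.normNuc(X)), constraints)
    prob.solve()
    return X.value
```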
To expand on this notion, consider the Dantzig case. Level sets of $\|\cdot\|_1$ can be visually represented as the diamond in the image to the right.¹
In the case of LASSO regression, $\|\beta\|_1 \le 1$ can be considered to be the convex hull of the single-parameter euclidean basis vectors of unit length.
In the nuclear norm case, $\|X\|_* \le 1$ is just the convex hull of the set of rank 1 matrices whose spectral norm $\|X\|_2 \le 1$ (keeping in mind that $\|X\|_2 = \|\sigma\|_\infty$).
The solution to the previous minimization problem is the point at which the smallest level set of the nuclear norm to intersect the subspace $\{X : P_\Omega(M) = P_\Omega(X)\}$ does so. Using the spatial intuition gleaned from our study of LASSO, we recognize that this will give a sparse set of singular values, and therefore a low rank matrix, that agrees with $M$ on all of $\Omega$.

¹ Credit to Nicolai Meinshausen: http://www.stats.ox.ac.uk/~meinshau/
Algorithm Background

Cai, Candès, and Shen introduced an algorithm that comes close to solving our problem.
Let $X$ be of low rank $r$ and $U\Sigma V^*$ be its SVD, where $\Sigma = \mathrm{diag}(\{\sigma_i\}_{1 \le i \le r})$ (because it has only $r$ nonzero singular values).
Next they define the soft-thresholding operator:
$$D_\tau(X) = U D_\tau(\Sigma) V^*, \qquad D_\tau(\Sigma) = \mathrm{diag}\left(\{(\sigma_i - \tau)_+\}_{1 \le i \le r}\right)$$
for $\tau > 0$, so that it shrinks all of the singular values of $X$, setting any that were originally $\le \tau$ to 0, thereby reducing its rank.
Note: $D_\tau(X) = \arg\min_Y \frac{1}{2}\|Y - X\|_F^2 + \tau\|Y\|_*$. This will affect the output of the algorithm, as the next slides show.
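As a sketch, $D_\tau$ can be written in a few lines of numpy with a dense SVD (the function name is mine; practical implementations use truncated or sparse SVDs instead, as discussed below):

```python
import numpy as np

def D_tau(X, tau):
    """Singular value soft-thresholding: shrink every singular value
    by tau and zero out those that were <= tau."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```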
Algorithm

Start with some $Y^0$ that vanishes outside of $\Omega$ (an efficient choice for $Y^0$ will be discussed later, but for now just use 0 or even $M$).
Choose values for $\tau > 0$ and a sequence $\delta_k$ corresponding to step sizes.
At step $k$ set $X^k = D_\tau(Y^{k-1})$.
Then set $Y^k = Y^{k-1} + \delta_k P_\Omega(M - X^k)$.
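Putting the steps together, a minimal sketch of the iteration using the `D_tau` helper above; a constant step size is used for simplicity, and the stopping tolerance and iteration cap are illustrative choices, not from the slides. The stopping rule anticipates the criterion discussed later.

```python
import numpy as np

def svt(M_observed, mask, tau, delta, max_iter=500, tol=1e-4):
    """Singular value thresholding iteration for matrix completion."""
    Y = np.zeros_like(M_observed)            # Y^0 = 0 vanishes outside Omega
    norm_obs = np.linalg.norm(M_observed)    # ||P_Omega(M)||_F
    for k in range(max_iter):
        X = D_tau(Y, tau)                    # X^k = D_tau(Y^{k-1})
        residual = np.where(mask, M_observed - X, 0.0)
        if np.linalg.norm(residual) <= tol * norm_obs:
            break                            # observed entries matched
        Y = Y + delta * residual             # Y^k = Y^{k-1} + delta_k P_Omega(M - X^k)
    return X
```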
Algorithm Discussion

Notes on the algorithm:

Low Rank
The $X^k$'s will tend to have low rank, unless too many of the singular values end up growing beyond $\tau$, so that further iterations do not lower the rank. Both the authors and I found (empirically) that the rank of the $X^k$'s tends to start low and grow to a stable point after a few dozen iterations. As long as the original matrix was of low rank, this stable point also tends to be of low rank. Unfortunately the authors have been unable to prove this.
When the dimensions of $X^k$ are high, this low rank property allows us to economize on memory by maintaining only the portion of its SVD corresponding to non-zeroed singular values instead of the entire, dense matrix itself.

Algorithm Discussion

Notes on the algorithm:

Sparsity
The $Y^k$'s will always be sparse, and vanish outside of $\Omega$. This is obvious because we require that $Y^0$ be either equal to 0 or at least vanish outside of $\Omega$. $P_\Omega(M - X^k)$ vanishes outside of $\Omega$ by definition, and if we assume $Y^{k-1}$ does too, then $Y^k = Y^{k-1} + \delta_k P_\Omega(M - X^k)$ must have the same property, and is therefore sparse.
This lessens our storage requirements (though we must still maintain the dense matrices $X^k$) but more importantly it makes computing the SVD of $Y^k$ much faster, as long as clever computational approaches and a sparse solver are used.

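Since $Y^k$ is supported on $\Omega$, it can be stored in a sparse format and fed to a truncated sparse SVD. The sketch below uses scipy's `svds`; the argument `k` (how many singular triplets to compute) is an illustrative assumption, and must cover all singular values exceeding $\tau$ for $D_\tau$ to be exact — in practice it would be grown adaptively between iterations.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

def D_tau_sparse(Y_sparse, tau, k):
    """Apply D_tau using only the top-k singular triplets of a sparse Y."""
    U, s, Vt = svds(Y_sparse, k=k)           # truncated sparse SVD
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt                # low-rank reconstruction

# Example on a ~10%-dense random matrix
rng = np.random.default_rng(1)
Y = csr_matrix(rng.standard_normal((50, 40)) * (rng.random((50, 40)) < 0.1))
X = D_tau_sparse(Y, tau=5.0, k=10)
```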
Proof that the algorithm gives a solution to
$$\min_X \ \tau\|X\|_* + \frac{1}{2}\|X\|_F^2 \quad \text{s.t.} \quad P_\Omega(M) = P_\Omega(X) \tag{2}$$
Convergence Significance

Figure: Convergence towards the true value for different $\tau$ and $\delta$ values

Convergence Significance

Figure: Convergence towards the true value for different $\tau$ and $\delta$ values

Convergence Significance

As seen in the proof, the algorithm converges to the solution of:
$$\min_X \ \tau\|X\|_* + \frac{1}{2}\|X\|_F^2 \quad \text{s.t.} \quad P_\Omega(M) = P_\Omega(X) \tag{3}$$
Convergence Significance

Why is a solution to
$$\min_X \ \tau\|X\|_* + \frac{1}{2}\|X\|_F^2 \quad \text{s.t.} \quad P_\Omega(M) = P_\Omega(X) \tag{4}$$
satisfactory when we're looking for a solution to
$$\min_X \ \|X\|_* \quad \text{s.t.} \quad P_\Omega(M) = P_\Omega(X) \tag{5}$$
(Cai, Candès, and Shen show that as $\tau \to \infty$, the solutions of (4) converge to the minimum Frobenius norm solution of (5).)

Proof of adequacy in a more general case.

Convergence Significance

Figure: Convergence towards the true value for different $\tau$ and $\delta$ values

General Convex Constraints

Cai, Candès, and Shen extend their algorithm to the more general case, addressed in the previous proof:
$$\min f_\tau(X) \quad \text{s.t.} \quad f_i(X) \le 0 \ \forall i \tag{6}$$
where the $f_i(X)$'s are convex, lower semi-continuous functionals.

Generalized Algorithm

In that case, the algorithm is as follows:
Denote $\mathcal{F}(X) = (f_1(X), \ldots, f_n(X))$ and initialize $y^0$.
$$X^k = \arg\min_X \ f_\tau(X) + \langle y^{k-1}, \mathcal{F}(X) \rangle$$
$$y^k = \left(y^{k-1} + \delta_k \mathcal{F}(X^k)\right)_+$$
In the special case where the constraints are linear, i.e. $\mathcal{A}(X) \le b$ for some linear functional $\mathcal{A}$, the iterations are as follows:
$$X^k = D_\tau(\mathcal{A}^*(y^{k-1}))$$
$$y^k = \left(y^{k-1} + \delta_k (b - \mathcal{A}(X^k))\right)_+$$
Consider $b = \{M_{i,j}\}_{(i,j)\in\Omega}$, $\mathcal{A}(X) = \{X_{i,j}\}_{(i,j)\in\Omega}$, and its adjoint $\mathcal{A}^*(y)$ mapping $y$ to a sparse matrix $X$ with entries only on indices in $\Omega$ and values equal to those in $y$.
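For the matrix completion case, here is a small sketch of this $\mathcal{A}$ and its adjoint, assuming $\Omega$ is stored as parallel index arrays (the names are mine):

```python
import numpy as np

def A(X, rows, cols):
    """Sampling operator: read the entries of X on Omega into a vector."""
    return X[rows, cols]

def A_star(y, rows, cols, shape):
    """Adjoint: scatter the vector y into a matrix supported on Omega."""
    X = np.zeros(shape)
    X[rows, cols] = y
    return X
```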
Use Case

Noise!
If our data is noisy we can instead use the constraints $|X_{i,j} - M_{i,j}| < \epsilon \ \forall (i,j) \in \Omega$. These are convex, so they fit the generalized framework above.

Example
Triangulation: if the matrix in question is distances between points, we can fill in the relative locations from just a few entries (see the sketch below).

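To illustrate the triangulation example (this sketch is mine, not from the talk): a matrix of squared pairwise distances between points in $\mathbb{R}^d$ has rank at most $d + 2$, which is why completion applies, and once the matrix is completed, classical multidimensional scaling recovers coordinates up to rotation and translation.

```python
import numpy as np

def classical_mds(D_sq, dim=2):
    """Recover point coordinates (up to rigid motion) from a completed
    matrix of squared pairwise distances via double-centering."""
    n = D_sq.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ D_sq @ J                  # Gram matrix of centered points
    w, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:dim]          # take the dim largest
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))
```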
Noise Free

Figure: Rank 10 matrix

Noisy

Figure: Rank 10 matrix with a little noise, using exact matrix reconstruction

Images

Images

Stopping Criteria

Because we expect $P_\Omega(M - X^k)$ to converge to zero, the authors suggest using
$$\frac{\|P_\Omega(M - X^k)\|_F}{\|P_\Omega(M)\|_F} \le \epsilon$$
as a stopping criterion. Because I generated my own data, I can actually plot
$$\frac{\|M - X^k\|_F}{\|M\|_F}$$

Figure: Rank 10 matrix with no noise
WORK IN PROGRESS

When can a matrix be reconstructed, and how much data is required? The most obvious issues arise when either a row or a column of $P_\Omega(M)$ is all 0. In that case nothing can be done, as that row (or column) could be totally independent of the others.
Along those lines, if any row or column in the unshredded $M$ is all 0, we are out of luck, as $P_\Omega(M)$ must also have a 0 row (or column). Even when there are no such rows or columns in $M$, if any of its singular vectors are too heavily skewed in a euclidean basis direction, the likelihood of one of the rows (or columns) of $P_\Omega(M)$ being 0 is high. Also, note that an $n_1 \times n_2$ matrix of rank $r$ has $(n_1 - r)r + r^2 + (n_2 - r)r$ degrees of freedom.

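A quick worked instance of that count (the numbers are chosen purely for illustration): for a $1000 \times 1000$ matrix of rank $r = 10$,
$$(1000 - 10)\cdot 10 + 10^2 + (1000 - 10)\cdot 10 = 19{,}900,$$
which is under 2% of the $10^6$ entries, so recovery from a small sample is at least information-theoretically plausible.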
References

Cai, J.-F., Candès, E. J. and Shen, Z. (2010). A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20, 1956-1982.

Candès, E. J. and Plan, Y. (2010). Matrix completion with noise. Proceedings of the IEEE 98, 925-936.

The End

