1. Sparse Coding
A Friendly Guide To Sparse Coding
Shao-Chuan Wang
Research Center for Information Technology Innovation
Academia Sinica
E-mail: scwang ASCII(64) ntu.edu.tw
December 3, 2009
Sparse Coding : Shao-Chuan Wang (Academia Sinica) 1 / 18
2. Outline
1 Review of PCA
2 Introducing Sparsity
3 Solving the Optimization Problem
4 Learning Dictionary
5 Applications
3. PCA Review
Given $x \in \mathbb{R}^m$ and $D = [d_1, d_2, d_3, \ldots, d_p] \in \mathbb{R}^{m \times p}$ with $d_j \in \mathbb{R}^m$, suppose $x$ can be approximated by a linear combination of the columns of $D$, i.e.,
$$x \approx \hat{x} = D\alpha, \tag{1}$$
where $\alpha \in \mathbb{R}^p$ is the new coordinate of $x$ in terms of the new basis $D$.
4. PCA Review
We want $\hat{x}$ to be as close as possible to $x$, i.e., we minimize the reconstruction error. If we choose, for instance, the squared $L_2$ norm as the error metric,
$$\text{Error} = \|x - D\alpha\|_2^2 \tag{2}$$
How do we get $D$?
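Here is a minimal NumPy sketch of Eqs. (1) and (2) for a fixed, hypothetical $D$. It reconstructs $\hat{x} = D\alpha$ and measures the squared $L_2$ error. For a fixed $D$, it obtains the error-minimizing $\alpha$ by ordinary least squares, which is an assumption of this sketch because the slides have not yet said how $\alpha$ is chosen. The helper names `reconstruction_error` and `best_alpha` are hypothetical.

```python
import numpy as np

def reconstruction_error(x, D, alpha):
    """Squared L2 reconstruction error ||x - D alpha||_2^2 from Eq. (2)."""
    x_hat = D @ alpha                      # Eq. (1): x_hat = D alpha
    return float(np.sum((x - x_hat) ** 2))

def best_alpha(x, D):
    """For a fixed D, the alpha minimizing Eq. (2) is a least-squares solution."""
    alpha, *_ = np.linalg.lstsq(D, x, rcond=None)
    return alpha

# Hypothetical example: D spans the first two axes of R^3 (m = 3, p = 2)
D = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
x = np.array([1.0, 2.0, 3.0])
alpha = best_alpha(x, D)
print(alpha, reconstruction_error(x, D, alpha))
```

In this example the best $\alpha$ is $(1, 2)$. The remaining error is about $3^2 = 9$, which is the energy of the component of $x$ lying outside the span of $D$.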
5. PCA Review
If our goal is to minimize the total error, then, given a dataset $S = \{x^{(i)}, y^{(i)}\}_{i=0}^{N}$, we solve
$$\min_{D,\alpha} \sum_i \|x^{(i)} - D\alpha^{(i)}\|_2^2 \tag{3}$$
6. PCA Review
Without loss of generality, let's assume $d_i^T d_j = \delta_{ij}$ (in any vector space, a basis can be orthonormalized by the Gram-Schmidt process). For a fixed $D$, the optimal $\alpha$ in Eq. (2) makes the residual $x - D\alpha$ orthogonal to every column of $D$. Hence, from Eq. (1), $D^T$ satisfies $D^T x = D^T \hat{x} = D^T D\alpha = \alpha$. Substituting $\alpha^{(i)} = D^T x^{(i)}$ into Eq. (3) gives
$$\min_D \sum_i \|x^{(i)} - DD^T x^{(i)}\|_2^2 \tag{4}$$
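The following small NumPy check illustrates the claims on this slide using a random, hypothetical basis. `np.linalg.qr` orthonormalizes the columns, which is the same job Gram-Schmidt does. After that, $D^T x$ coincides with the least-squares coefficients, and $D^T \hat{x} = D^T x$ holds.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 5, 2
# Orthonormalize a random basis; QR does the same job as Gram-Schmidt
D, _ = np.linalg.qr(rng.standard_normal((m, p)))
x = rng.standard_normal(m)

alpha = D.T @ x          # coordinates of x in the orthonormal basis
x_hat = D @ alpha        # x_hat = D D^T x, the projection of x onto span(D)
alpha_ls, *_ = np.linalg.lstsq(D, x, rcond=None)

print(np.allclose(D.T @ D, np.eye(p)))    # d_i^T d_j = delta_ij      -> True
print(np.allclose(alpha, alpha_ls))       # D^T x is the LS optimum   -> True
print(np.allclose(D.T @ x_hat, D.T @ x))  # D^T x_hat = D^T x = alpha -> True
```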
7. PCA Review
Using the Pythagorean theorem, (4) becomes
$$\min_D \sum_i \|x^{(i)} - DD^T x^{(i)}\|_2^2 = \min_D \Big( \sum_i \|x^{(i)}\|_2^2 - \sum_i \|DD^T x^{(i)}\|_2^2 \Big)$$
Since the first sum does not depend on $D$,
$$\Rightarrow \hat{D} = \arg\max_D \sum_i \|DD^T x^{(i)}\|_2^2$$
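The Pythagorean step relies on $D^T D = I$, which makes the projection $DD^T x$ orthogonal to the residual $x - DD^T x$. For a single sample, the identity can be verified as follows; summing over $i$ gives the equation above.

```latex
\begin{aligned}
\|x\|_2^2 &= \|DD^T x + (x - DD^T x)\|_2^2 \\
          &= \|DD^T x\|_2^2 + \|x - DD^T x\|_2^2 + 2\,(DD^T x)^T (x - DD^T x), \\
(DD^T x)^T (x - DD^T x) &= x^T D D^T x - x^T D (D^T D) D^T x = 0 \quad (\text{since } D^T D = I), \\
\Rightarrow\ \|x - DD^T x\|_2^2 &= \|x\|_2^2 - \|DD^T x\|_2^2 .
\end{aligned}
```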
8. PCA Review
Because $D^T D = I$, we have $\|DD^T x\|_2^2 = \|D^T x\|_2^2 = \sum_j (d_j^T x)^2$. This optimization problem can therefore be rewritten as
$$\hat{D} = \arg\max_D \sum_i \|DD^T x^{(i)}\|_2^2 = \arg\max_D \sum_j d_j^T \Big( \sum_i x^{(i)} (x^{(i)})^T \Big) d_j,$$
which is solved as an eigenvalue problem of the covariance matrix $\sum_i x^{(i)} (x^{(i)})^T$ (assuming mean-centered data). The columns of $\hat{D}$ are the $p$ eigenvectors with the largest eigenvalues.
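Here is a minimal NumPy sketch of this recipe. It assumes the samples are stored as rows of `X` and mean-centers them inside the function; the name `pca_basis` is hypothetical. `np.linalg.eigh` is used because the scatter matrix is symmetric. It returns eigenvalues in ascending order, so the indices are reversed to keep the $p$ largest.

```python
import numpy as np

def pca_basis(X, p):
    """Columns of the returned D (m x p) are the top-p eigenvectors of
    sum_i x^(i) (x^(i))^T, computed after mean-centering the rows of X."""
    Xc = X - X.mean(axis=0)                  # center so the scatter matrix is a covariance
    C = Xc.T @ Xc                            # sum_i x^(i) (x^(i))^T, shape (m, m)
    eigvals, eigvecs = np.linalg.eigh(C)     # symmetric C: eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:p]      # indices of the p largest eigenvalues
    return eigvecs[:, top]
```

The captured energy $\sum_i \|D^T x^{(i)}\|_2^2$ of the returned basis equals the sum of the $p$ largest eigenvalues. No other orthonormal $D$ with $p$ columns captures more.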
9. Introducing Sparsity
How about regularization?
$$\min_{D,\alpha} \sum_i \|x^{(i)} - D\alpha^{(i)}\|_2^2 + \lambda\psi(\alpha^{(i)}), \quad \lambda \ge 0,$$
where $\lambda\psi(\alpha)$ is called the regularization, sparsity, or prior term, and $\lambda$ is the strength of the regularization. Intuitively, $\psi(\alpha)$ "confines" the "quota" of the $\alpha_i$ and therefore makes $\alpha$ "sparse". In fact, L1-regularized linear regression (the lasso) likewise introduces sparsity into the coefficients $\theta$.
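The slides leave $\psi$ generic. As one illustrative assumption, take $\psi(\alpha) = \|\alpha\|_1$ and a fixed $D$. The coefficients can then be found with iterative soft-thresholding (ISTA), a standard proximal-gradient method that the slides do not cover. The function names `soft_threshold` and `ista` are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink every entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(x, D, lam, n_iter=500):
    """Minimize ||x - D a||_2^2 + lam * ||a||_1 over a, with D held fixed."""
    # Step size 1/L, where L = 2 * sigma_max(D)^2 is the Lipschitz constant
    # of the gradient of the squared-error term
    step = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ a - x)       # gradient of ||x - D a||_2^2
        a = soft_threshold(a - step * grad, step * lam)
    return a
```

For $D = I$ the result is simply $x$ soft-thresholded at $\lambda/2$. A larger $\lambda$ zeroes out more coefficients, which is the "quota" effect described above.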
10. Introducing Sparsity
Hence, we can conclude that sparse coding is a more general form of principal component analysis (PCA + sparsity = Sparse PCA (Zou et al., 2004)). The atoms no longer need to be orthogonal, i.e., $d_i^T d_j \neq 0$ is allowed for $i \neq j$. Also, if $m = p$, there is no dimension "reduction" anymore, and only sparsity shapes the basis. We can even make $p > m$, using an over-complete basis and letting sparsity dominate $D$ and $\alpha$.
11. Solve the Optimization Problem
How do we solve the optimization problem? ⇒ Solving jointly for $D$ and $\alpha$ is too hard!
Hence, we first assume $D$ is known (i.e., a designed $D$). Two greedy algorithms are the most popular:
Matching Pursuit
Orthogonal Matching Pursuit
12. Matching Pursuit
$$\min_{\alpha \in \mathbb{R}^p} \|x - D\alpha\|_2^2 \quad \text{s.t.} \quad \|\alpha\|_0 \le L \tag{5}$$
(the atoms $d_i$ are assumed to have unit norm)
1: $\alpha \leftarrow 0$.
2: $r \leftarrow x$ (residual).
3: while $\|\alpha\|_0 < L$ do
     Pick the element that correlates most with the residual:
     $\hat{i} \leftarrow \arg\max_{i=1,\ldots,p} |d_i^T r|$
     Subtract its contribution and update $\alpha$:
     $\alpha[\hat{i}] \leftarrow \alpha[\hat{i}] + d_{\hat{i}}^T r$
     $r \leftarrow r - (d_{\hat{i}}^T r)\, d_{\hat{i}}$
   end while
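Here is a minimal NumPy sketch of Matching Pursuit as written above, assuming unit-norm atoms. The function name `matching_pursuit` is hypothetical. Two safeguards are additions of this sketch: the `max_iter` cap and the early exit when the residual is orthogonal to every atom. Because MP may re-select an atom it has already used, $\|\alpha\|_0$ can stop growing, so a plain `while` loop needs such a cap.

```python
import numpy as np

def matching_pursuit(x, D, L, max_iter=1000):
    """Greedy MP for min ||x - D a||_2^2 s.t. ||a||_0 <= L (Eq. (5)).

    Assumes the columns (atoms) of D have unit L2 norm.
    """
    alpha = np.zeros(D.shape[1])
    r = np.array(x, dtype=float)              # residual starts as the signal itself
    for _ in range(max_iter):                 # safeguard: MP may re-pick an atom
        if np.count_nonzero(alpha) >= L:
            break
        corr = D.T @ r                        # d_i^T r for every atom
        i_hat = int(np.argmax(np.abs(corr)))  # atom most correlated with r
        if np.isclose(corr[i_hat], 0.0):
            break                             # r is (numerically) orthogonal to all atoms
        alpha[i_hat] += corr[i_hat]           # alpha[i_hat] <- alpha[i_hat] + d^T r
        r -= corr[i_hat] * D[:, i_hat]        # r <- r - (d^T r) d
    return alpha, r
```

Each step moves the same amount from $r$ into $D\alpha$, so $x = D\alpha + r$ holds throughout and $\|r\|_2$ never increases.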
13. Orthogonal Matching Pursuit
$$\min_{\alpha \in \mathbb{R}^p} \|x - D\alpha\|_2^2 \quad \text{s.t.} \quad \|\alpha\|_0 \le L \tag{6}$$
1: $\Gamma \leftarrow \emptyset$.
2: while $\|\alpha\|_0 < L$ do
     Pick the element that most reduces the objective:
     $\hat{i} \leftarrow \arg\min_{i \in \Gamma^C} \{ \min_{\alpha} \|x - D_{\Gamma \cup \{i\}}\alpha\|_2^2 \}$
     Update the active set: $\Gamma \leftarrow \Gamma \cup \{\hat{i}\}$.
     Update $\alpha$ and the residual:
     $\alpha_\Gamma \leftarrow (D_\Gamma^T D_\Gamma)^{-1} D_\Gamma^T x$,
     $r \leftarrow x - D_\Gamma \alpha_\Gamma$.
   end while
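Here is a minimal NumPy sketch of the OMP variant above. At each step it tries every candidate atom and keeps the one whose inclusion yields the smallest least-squares error. The name `omp` is hypothetical. `np.linalg.lstsq` replaces the explicit inverse $(D_\Gamma^T D_\Gamma)^{-1}$ for numerical stability; it gives the same $\alpha_\Gamma$ when $D_\Gamma$ has full column rank.

```python
import numpy as np

def omp(x, D, L):
    """OMP for Eq. (6) with the slide's selection rule: add the atom whose
    inclusion gives the smallest least-squares error, then re-fit alpha_Gamma."""
    x = np.asarray(x, dtype=float)
    p = D.shape[1]
    active = []                                    # the active set Gamma
    alpha = np.zeros(p)
    r = x.copy()
    for _ in range(min(L, p)):                     # each pass adds one new atom
        best_i, best_err = None, np.inf
        for i in range(p):
            if i in active:
                continue                           # only search the complement of Gamma
            cols = D[:, active + [i]]
            coef, *_ = np.linalg.lstsq(cols, x, rcond=None)
            err = np.sum((x - cols @ coef) ** 2)
            if err < best_err:
                best_i, best_err = i, err
        active.append(best_i)
        D_G = D[:, active]
        # Same as (D_G^T D_G)^{-1} D_G^T x when D_G has full column rank
        coef, *_ = np.linalg.lstsq(D_G, x, rcond=None)
        alpha[:] = 0.0
        alpha[active] = coef
        r = x - D_G @ coef                         # residual orthogonal to span(D_G)
    return alpha, r
```

Many implementations instead select $\arg\max_i |d_i^T r|$ and then re-fit, which is cheaper. The exhaustive rule here mirrors the slide. In both cases the final residual is orthogonal to every selected atom.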
14. Learning Dictionary
How do we learn $D$ from the data?
$$\min_{D,\alpha} \sum_i \|x^{(i)} - D\alpha^{(i)}\|_2^2 + \lambda\|\alpha^{(i)}\|_q, \quad q \in \{0, 1, 2\}, \quad \lambda \ge 0 \tag{7}$$
Brute force
K-means-like
FOCUSS (K. Engan et al., 2003)
K-SVD (M. Aharon et al., 2005)
Online Dictionary Learning (J. Mairal et al., 2009)
15. K-SVD (M. Aharon et al., 2005)
1: Initialize $D \in \mathbb{R}^{m \times k}$ with a random normalized dictionary;
2: Repeat until convergence {
     Sparse Coding Stage:
       Use a pursuit algorithm to compute the sparse code $\alpha^{(i)}$ of $x^{(i)}$.
     Codebook Update Stage:
       For $j = 1, 2, \ldots, k$ do {
         Define the cluster of examples that use $d_j$:
         $\omega \leftarrow \{i \mid 1 \le i \le M,\ \alpha^{(i)}[j] \neq 0\}$.
         For each $i \in \omega$ do $r^{(i)} \leftarrow x^{(i)} - D\alpha^{(i)}$.
         $(\hat{d}, \hat{\beta}) \leftarrow \arg\min_{d \in \mathbb{R}^m,\ \beta \in \mathbb{R}^{|\omega|}} \sum_{i \in \omega} \|r^{(i)} + \alpha^{(i)}[j]\, d_j - d\, \beta_i\|_2^2$,
         $d_j \leftarrow \hat{d}$, and replace the nonzero $\alpha^{(i)}[j]$ with $\hat{\beta}_i$.
       }
   }
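Here is a minimal NumPy sketch of the Codebook Update Stage for a single atom. It assumes the data are stored column-wise in `X` ($m \times M$) and the codes in `A` ($k \times M$). The inner argmin is a best rank-1 approximation of the restricted residual matrix, which the SVD provides with $\|\hat{d}\|_2 = 1$. The function name `ksvd_atom_update` is hypothetical, and the sparse coding stage (e.g., one of the pursuit algorithms above) is left out.

```python
import numpy as np

def ksvd_atom_update(X, D, A, j):
    """One K-SVD codebook update for atom j, modifying D and A in place.

    X: (m, M) data, one example x^(i) per column.
    D: (m, k) dictionary with unit-norm columns.
    A: (k, M) sparse codes, column i is alpha^(i).
    """
    omega = np.flatnonzero(A[j, :])          # examples that use atom j
    if omega.size == 0:
        return                               # unused atom: left unchanged in this sketch
    # Columns r^(i) + alpha^(i)[j] d_j for i in omega
    E = X[:, omega] - D @ A[:, omega] + np.outer(D[:, j], A[j, omega])
    # Best rank-1 approximation d beta^T of E, with ||d||_2 = 1
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, j] = U[:, 0]
    A[j, omega] = s[0] * Vt[0, :]            # only the existing nonzeros are replaced
```

Calling it for every $j$ performs one Codebook Update Stage. Each call keeps the sparsity pattern of `A` and cannot increase $\|X - DA\|_F^2$. The old pair $(d_j, \alpha[j])$ is itself a rank-1 candidate, so the optimal rank-1 fit is never worse.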
20. Applications
Image De-noise (Roth and Black, 2009)
Edge Detection (J. Mairal et al., 2008)
Image In-painting (Roth and Black, 2009)
Super-resolution (Yang et al., 2008)
Signal Compression (in place of VQ using K-means)
21. Bibliography I
H. Zou, T. Hastie, and R. Tibshirani, Sparse Principal Component Analysis. Journal of Computational and Graphical Statistics, 2004.
K. Kreutz-Delgado, J. F. Murray, B. D. Rao, K. Engan, T.-W. Lee, and T. J. Sejnowski, Dictionary learning algorithms for sparse representation. Neural Computation, 2003.
M. Aharon, M. Elad, and A. M. Bruckstein, The K-SVD: An algorithm for designing of overcomplete dictionaries for sparse representations. IEEE Transactions on Signal Processing, November 2006.
22. Bibliography II
S. Roth and M. J. Black, Fields of Experts. IJCV, 2009.
J. Mairal, M. Leordeanu, F. Bach, M. Hebert, and J. Ponce, Discriminative Sparse Image Models for Class-Specific Edge Detection and Image Interpretation. ECCV, 2008.
J. Mairal, F. Bach, J. Ponce, and G. Sapiro, Online dictionary learning for sparse coding. ICML, 2009.
J. Yang, J. Wright, T. Huang, and Y. Ma, Image Super-Resolution as Sparse Representation of Raw Image Patches. CVPR, 2008.