METHODS OF MANIFOLD LEARNING FOR 
DIMENSION REDUCTION OF 
LARGE DATA SETS 
Doctoral Candidacy Preliminary Oral Exam 
Ryan Bensussan Harvey 
May 17, 2010 
Committee: 
Wojciech Czaja, Chair 
Kasso Okoudjou 
John Benedetto 
Rama Chellappa 
PREVIEW 
• Motivation 
• Problem 
• Methods 
• Research Ideas 
Image by Stefan Baudy, used under Creative Commons license
MOTIVATION 
• Science and business producing massive quantities of data 
• Computationally difficult to store, process, analyze, visualize 
• Academic focus on compression, dimension-reduced 
processing to address this problem 
• Compression methods widely available, but require 
decompression step to use data 
• Dimension-reduced processing generally not available 
THE PROBLEM 
• What are we trying to do? 
• What is the intuition for this problem? 
• How can we formalize the problem mathematically? 
• On what kinds of data can we solve this problem?
Image by qisur, used under Creative Commons license 
PROBLEM INTUITION 
• Think flattening a 3D surface 
to a 2D image 
• Simple projection 
• Preserving some particular 
quantity of interest locally 
• Preserving some global 
property of the surface 
THE PROBLEM (FORMALIZED) 
• Inputs: $X = [x_1, \ldots, x_n]$, $x_k \in \mathbb{R}^D$ 
• Outputs: $Y = [y_1, \ldots, y_n]$, $y_k \in \mathbb{R}^d$, $d \ll D$ 
• Assumption: the data live on some $d$-dimensional manifold $\mathcal{M}$ embedded in $\mathbb{R}^D$, and the inputs $X$ are samples of the underlying manifold $\mathcal{M}$ taken in $\mathbb{R}^D$. 
• Problem statement: find a reduced representation $Y$ of $X$ which best preserves the manifold structure of the data, as defined by some metric of interest.
EXAMPLES OF DATA SETS 
• Text documents: 1000 Science News articles from 8 different categories; about 10,000 coordinates per document, where the $i$-th coordinate of document $d$ gives the frequency in document $d$ of the $i$-th word in a fixed dictionary. 
• Handwritten digits: a database of about 60,000 $28 \times 28$ gray-scale pictures of handwritten digits collected by USPS, i.e. a point cloud in $\mathbb{R}^{28^2}$. Goal: automatic recognition. 
• Molecular dynamics: the dynamics of a small protein in a bath of water molecules, approximated by a Langevin system of stochastic equations $\dot{x} = -\nabla U(x) + \dot{w}$. The set of states of the protein is a noisy set of points in $\mathbb{R}^{36}$. 
• Video (pose estimation): silhouettes from a walking cycle embedded on a low-dimensional manifold; the learned model recovers body configuration and view point and can interpolate shapes at intermediate configurations never seen in learning. 
• Hyperspectral imagery. 
[Slide assembles example figures from Mauro Maggioni's lectures on high-dimensional data sets and from a manifold-based 3D pose estimation framework.]
METHODS: TAXONOMY 
• Methods considered involve 
convex optimizations solved 
via eigenvalue problems 
• Full-rank: PCA, Kernel PCA 
• Sparse: LLE, Laplacian 
Eigenmaps 
• Taxonomy: Dimension Reduction splits into Convex and Non-convex methods; the convex methods divide into Full-Rank (Linear: PCA; Non-linear: k-PCA) and Sparse (Reconstruction Weights: LLE; Neighborhood Graph Laplacian: LE).
METHODS: FRAMEWORK 
• 3 step algorithm framework: 
• Build the kernel matrix 
• Solve the appropriate 
eigenvalue problem 
associated with that kernel 
• Use eigenvectors to 
compute the embedding 
in the lower dimension 
• Some methods: 
• Principal Components 
Analysis 
• Kernel-based Principal 
Components Analysis 
• Laplacian Eigenmaps 
• Locally Linear Embedding 
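To make the shared structure concrete, here is a minimal NumPy sketch of the three-step framework (an illustration added here, not part of the original slides); the build_kernel callable, the function name, and the largest/smallest eigenvalue switch are assumptions of the sketch, and method-specific details such as the generalized eigenproblem in Laplacian Eigenmaps are glossed over.

```python
import numpy as np

def spectral_embedding(X, build_kernel, d, largest=True):
    """Generic three-step framework: (1) build the kernel matrix,
    (2) solve its eigenvalue problem, (3) use eigenvectors as the
    d-dimensional embedding.  PCA and kernel PCA keep the largest
    eigenvalues; Laplacian Eigenmaps and LLE keep the smallest
    nonzero ones (largest=False)."""
    K = build_kernel(X)                       # step 1: kernel matrix
    eigvals, eigvecs = np.linalg.eigh(K)      # step 2: eigenvalue problem (ascending)
    order = np.argsort(eigvals)
    idx = order[::-1][:d] if largest else order[1:d + 1]
    return eigvecs[:, idx]                    # step 3: embedding coordinates
```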
PRINCIPAL COMPONENTS 
ANALYSIS 
• Linear method: rotation, 
translation, simple scaling 
• Think SNR: maximize signal 
while minimizing noise 
• Rotate and translate axes so 
that signal variances lie on as 
few axes as possible 
• The kernel: $C = \frac{1}{n} \sum_{j=1}^{n} x_j x_j^T$ 
• The eigenvalue problem: $\lambda p = C p$, or in matrix form $P^T \Lambda = C P^T$ 
• The embedding: order the eigenvalues $1 \ge \lambda_1 \ge \cdots \ge \lambda_D \ge 0$, select $\{\lambda_k\}_{k=2}^{d+1}$, and project: $Y = P_{\{k\}} X$
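A minimal sketch of the PCA recipe above, assuming rows of X are the data points and that the data are centered before forming the covariance; the function name is illustrative, not the presenter's.

```python
import numpy as np

def pca(X, d):
    """PCA sketch: covariance kernel C = (1/n) sum_j x_j x_j^T,
    eigenvalue problem C p = lambda p, projection onto the top
    d eigenvectors.  X is (n, D), rows are points."""
    Xc = X - X.mean(axis=0)               # translate so the data are centered
    C = (Xc.T @ Xc) / Xc.shape[0]         # (D, D) covariance matrix
    eigvals, P = np.linalg.eigh(C)        # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:d]   # d largest-variance directions
    return Xc @ P[:, idx]                 # (n, d) embedding Y
```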
PRINCIPAL COMPONENTS ANALYSIS 
[Figure: a 3D point cloud is passed through the PCA eigenvalue problem (EVP) to produce a 2D embedding.]
PCA USING DOT PRODUCTS 
• To move from (linear) PCA to (nonlinear) kernel-based PCA (Schölkopf, Smola & Müller, 1998), we consider a formulation of PCA exclusively using dot products: 
$\lambda p = C p, \qquad C p = \frac{1}{n} \sum_{j=1}^{n} (x_j \cdot p)\, x_j, \qquad \lambda (x_k \cdot p) = (x_k \cdot C p), \ \forall k = 1, \ldots, n$ 
• All solutions $p$ with $\lambda \neq 0$ lie in $\mathrm{span}(x_1, x_2, \ldots, x_n)$.
INTRODUCING 
NONLINEARITY IN PCA 
• We then introduce nonlinearity by mapping from the input space to the feature space $F$: 
$\Phi : \mathbb{R}^D \to F, \qquad x \mapsto \tilde{x} = \Phi(x)$ 
• For now, assume the data $\Phi(x_k)$ in $F$ are centered: $\sum_{k=1}^{n} \Phi(x_k) = 0$ 
• Then, the covariance matrix in $F$ is: 
$\tilde{C} = \frac{1}{n} \sum_{j=1}^{n} \Phi(x_j)\Phi(x_j)^T = \frac{1}{n} \sum_{j=1}^{n} \tilde{x}_j \tilde{x}_j^T$
THE EIGENPROBLEM IN $F$ 
• We now rewrite the eigenvalue problem of PCA in $F$: 
$\lambda \tilde{p} = \tilde{C}\tilde{p}, \qquad \lambda (\Phi(x_k) \cdot \tilde{p}) = (\Phi(x_k) \cdot \tilde{C}\tilde{p}), \ \forall k = 1, \ldots, n$ 
• Again, all $\tilde{p}$ with $\lambda \neq 0$ lie in $\mathrm{span}(\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n)$. 
• In addition, we can write the linear expansion: 
$\tilde{p} = \sum_{j=1}^{n} \alpha_j \Phi(x_j)$
THE EIGENPROBLEM IN $F$ (CONTINUED) 
• Using this expansion, we rewrite the dot product formulation of the PCA eigenproblem in $F$: 
$\lambda \sum_{j=1}^{n} \alpha_j (\tilde{x}_k \cdot \tilde{x}_j) = \frac{1}{n} \sum_{i=1}^{n} \alpha_i \Big( \tilde{x}_k \cdot \sum_{j=1}^{n} \tilde{x}_j \Big)(\tilde{x}_j \cdot \tilde{x}_i), \qquad \forall k = 1, \ldots, n$ 
• We define an $n \times n$ kernel matrix $K$ by 
$K_{ij} = (\Phi(x_i) \cdot \Phi(x_j)) = (\tilde{x}_i \cdot \tilde{x}_j)$ 
• And rewrite the eigenproblem in matrix form: 
$n \lambda K \alpha = K^2 \alpha$
COMPUTING IN FEATURE 
SPACE 
• The feature space F 
is of arbitrarily large and possibly infinite 
dimension. Computing dot products directly is often not 
possible, and computationally impractical when it is. 
• Solution: the “kernel trick” (Aizerman et al, 1964). Construct a 
kernel function 
$k(u, v) = (\Phi(u) \cdot \Phi(v))$ 
• Then replace each $(\Phi(u) \cdot \Phi(v))$ with $k(u, v)$. This construction implicitly defines $\Phi$ and $F$ via $k(u, v)$.
SOME POSSIBLE KERNELS 
• Kernels must be continuous, symmetric, positive semi-definite. 
• Some possible kernels proposed by Schölkopf et al include: 
• Dot product in the space of all $d$th order monomials: $k(u, v) = (u \cdot v)^d$ 
• Radial basis functions: $k(u, v) = \exp\left( \dfrac{-\|u - v\|^2}{2\sigma^2} \right)$ 
• Sigmoid functions: $k(u, v) = \tanh(\kappa (u \cdot v) + \Theta)$
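Written out as code, the three kernels above might look as follows (a sketch; the parameter names d, sigma, kappa, theta simply mirror the formulas).

```python
import numpy as np

def polynomial_kernel(u, v, d=2):
    """k(u, v) = (u . v)^d: dot product in the space of d-th order monomials."""
    return np.dot(u, v) ** d

def rbf_kernel(u, v, sigma=1.0):
    """k(u, v) = exp(-||u - v||^2 / (2 sigma^2)): radial basis function."""
    return np.exp(-np.sum((u - v) ** 2) / (2.0 * sigma ** 2))

def sigmoid_kernel(u, v, kappa=1.0, theta=0.0):
    """k(u, v) = tanh(kappa (u . v) + theta); note this kernel is only
    positive semi-definite for some parameter choices."""
    return np.tanh(kappa * np.dot(u, v) + theta)
```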
SOLVING THE EIGENPROBLEM 
• To solve the eigenproblem $n \lambda K \alpha = K^2 \alpha$ where $K_{ij} = k(x_i, x_j)$, we solve the following: 
$n \lambda \alpha = K \alpha$ 
• Solutions are identical to all relevant solutions of the prior, as can be seen by expanding $\alpha$ in the eigenvector basis of $K$. 
• Let $0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ be the complete set of eigenvalues $n\lambda$ and $\alpha_1, \alpha_2, \ldots, \alpha_n$ the corresponding eigenvectors, with $\lambda_q$ the first nonzero eigenvalue.
COMPUTING THE EMBEDDING 
• We normalize $\alpha_q, \ldots, \alpha_n$ by requiring that the corresponding vectors in $F$ be normalized: 
$(\tilde{p}_k \cdot \tilde{p}_k) = 1, \ \forall k = q, \ldots, n$ 
• This translates to a normalization condition for $\alpha_q, \ldots, \alpha_n$: 
$1 = \sum_{i,j=1}^{n} \alpha_i^k \alpha_j^k (\tilde{x}_i \cdot \tilde{x}_j) = \sum_{i,j=1}^{n} \alpha_i^k \alpha_j^k K_{ij} = (\alpha^k \cdot K \alpha^k) = \lambda_k (\alpha^k \cdot \alpha^k)$ 
• Compute projections of a test point $x$ onto the eigenvectors $\tilde{p}_k$: 
$(\tilde{p}_k \cdot \Phi(x)) = \sum_{j=1}^{n} \alpha_j^k (\Phi(x_j) \cdot \Phi(x)) = \sum_{j=1}^{n} \alpha_j^k k(x_j, x)$
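As a small illustration of the projection formula, a sketch that maps a test point onto the normalized feature-space eigenvectors; it assumes the alpha vectors were already rescaled as required above and, for brevity, skips re-centering the test kernel values against the training kernel.

```python
import numpy as np

def kpca_project(x, X_train, alphas, kernel):
    """(p_k . Phi(x)) = sum_j alpha_j^k k(x_j, x).
    alphas is (n, d): column k holds the normalized alpha^k."""
    kx = np.array([kernel(xj, x) for xj in X_train])  # k(x_j, x) for all j
    return alphas.T @ kx                              # d embedding coordinates
```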
CENTERING THE DATA 
• We assumed centered data in $F$, which is unrealistic. To center our data, we must have: 
$\Phi^c(x_j) = \tilde{x}^c_j = \tilde{x}_j - \frac{1}{n} \sum_{k=1}^{n} \tilde{x}_k$ 
• Then we rewrite everything in terms of $\Phi^c(x_j)$, and thus have a new kernel $K^c$, which we express in terms of $K$: 
$K^c_{ij} = (\tilde{x}^c_i \cdot \tilde{x}^c_j) = \Big( \tilde{x}_i - \frac{1}{n} \sum_{k=1}^{n} \tilde{x}_k \Big) \cdot \Big( \tilde{x}_j - \frac{1}{n} \sum_{\ell=1}^{n} \tilde{x}_\ell \Big) = (K - 1_n K - K 1_n + 1_n K 1_n)_{ij}$ 
where $(1_n)_{ij} = \frac{1}{n}, \ \forall i, j$.
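The double-centering formula translates directly into a few lines of NumPy; the helper name is mine, and the same operation is reused in the kernel PCA sketch below.

```python
import numpy as np

def center_kernel(K):
    """K_c = K - 1_n K - K 1_n + 1_n K 1_n, where (1_n)_ij = 1/n."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n
```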
KERNEL-BASED PCA 
• Extend linear PCA to 
nonlinear space via kernel 
transformation 
• Think SNR where signal lies 
along a curve in space 
• Rotate/translate transformed 
axes so signal variances lie 
on as few axes as possible 
• The kernel: $K_{ij} = (\tilde{x}_i \cdot \tilde{x}_j) = k(x_i, x_j)$, where $\Phi : \mathbb{R}^D \to F$, $x \mapsto \tilde{x}$ 
• The eigenvalue problem: $(n\lambda)\alpha = K\alpha$ 
• The embedding: $(\tilde{p}_k \cdot \tilde{x}) = \sum_{j=1}^{n} \alpha_j^k k(x_j, x)$
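Putting the pieces together, a minimal kernel PCA sketch under the same caveats (training-point projections only, top-d eigenvalues assumed positive); it reuses center_kernel from above and any kernel function of the earlier form.

```python
import numpy as np

def kernel_pca(X, kernel, d):
    """Build K, center it, solve (n lambda) alpha = K alpha, rescale the
    alphas so that lambda_k (alpha^k . alpha^k) = 1, and return the
    projections of the training points onto the top d directions."""
    n = X.shape[0]
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    Kc = center_kernel(K)                     # centered kernel (see above)
    eigvals, alphas = np.linalg.eigh(Kc)      # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:d]       # top d (assumed positive) eigenvalues
    lam, A = eigvals[idx], alphas[:, idx]
    A = A / np.sqrt(lam)                      # normalization condition
    return Kc @ A                             # (n, d) embedding of training data
```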
KERNEL-BASED PCA 
[Figure: input data are mapped by $\Phi$ into the feature space $F$ defined by the kernel $k(u, v)$, where linear PCA (EVP) is performed; the panels show embeddings from kernel PCA on the Swiss roll, with labeled points A, B, C, D tracked through the mapping. Images from (3).]
LAPLACIAN EIGENMAPS 
• While it provides nonlinearity, kernel-based PCA requires computation that scales with the number of points $n$, rather than with the often smaller dimension $D$ of each point. 
• We thus consider Laplacian Eigenmaps (Belkin & Niyogi, 2003), which introduces sparsity in the kernel. 
• This method has been shown to be a special case of kernel-based PCA by Bengio et al (2004).
INTRODUCING SPARSITY 
• To build a sparse kernel, we build a graph from the data which samples the assumed manifold $\mathcal{M}$. The adjacency matrix $A$ is built by taking as a given point's nearest neighbors either a fixed number $m$ of nearest neighbors, or all points within an $\varepsilon$-ball of that point. 
• We denote the set of nearest neighbors of a point $x_j$ by $N_j$. The adjacency matrix is then given by: 
$A_{ij} = \begin{cases} 1, & \text{if } x_i \in N_j \\ 0, & \text{if } x_i \notin N_j \end{cases}$
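A brute-force sketch of the m-nearest-neighbor adjacency construction; the ε-ball variant would simply threshold the distances instead, and the dense O(n²) distance matrix is only acceptable for illustration.

```python
import numpy as np

def knn_adjacency(X, m):
    """Adjacency matrix A with A_ij = 1 if x_i is among the m nearest
    neighbors of x_j, symmetrized so an edge exists if either point
    is a neighbor of the other."""
    n = X.shape[0]
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared distances
    A = np.zeros((n, n))
    for j in range(n):
        neighbors = np.argsort(D2[:, j])[1:m + 1]   # skip x_j itself
        A[neighbors, j] = 1.0                       # x_i in N_j
    return np.maximum(A, A.T)
```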
BUILDING THE KERNEL 
• We then introduce edge weights in the graph. The heat kernel is chosen due to its connection to the Laplace-Beltrami operator on the manifold, and therefore to the graph approximation of the manifold Laplacian: 
$W_{ij} = \begin{cases} \exp\left( \dfrac{-\|x_i - x_j\|^2}{t} \right), & x_i \in N_j \text{ or } x_j \in N_i \\ 0, & \text{otherwise} \end{cases}$ 
• Note that as $t \to \infty$, $W \to A$.
CONSTRUCTING THE EIGENVALUE PROBLEM 
• To understand what eigenvalue problem to solve here, we must consider the optimization problem. 
• First, we think of mapping the graph $G(X, E, W)$ in a simplistic sense to a line (1D $y$) such that connected points stay as close together as possible. 
• This gives the objective function: 
$\frac{1}{2} \sum_i \sum_j (y_i - y_j)^2 W_{ij}$
CONSTRUCTING THE EIGENVALUE PROBLEM 
• Here, we introduce the diagonal matrix $D$, where $D_{ii} = \sum_j W_{ji}$, and the graph Laplacian matrix $L = D - W$, and note that $W$ is symmetric, which allows us to rewrite the objective function in matrix-vector form: 
$\frac{1}{2} \sum_i \sum_j (y_i - y_j)^2 W_{ij} = \frac{1}{2} \sum_i \sum_j (y_i^2 + y_j^2 - 2 y_i y_j) W_{ij} = \frac{1}{2} \sum_i y_i^2 D_{ii} + \frac{1}{2} \sum_j y_j^2 D_{jj} - \sum_i \sum_j y_i y_j W_{ij} = y^T D y - y^T W y = y^T L y$
CONSTRUCTING THE EIGENVALUE PROBLEM 
• So the relevant optimization problem in the 1D case becomes: 
$\operatorname*{argmin}_{y \,:\, y^T D y = 1} \; y^T L y$ 
• This problem can be solved by solving the generalized eigenvalue problem $L y = \lambda D y$ for the minimum eigenvalues. 
• Note that the computation on the previous slide also shows that $L$ is positive semi-definite.
CONSTRUCTING THE EIGENVALUE PROBLEM 
• Extending the same argument to $f^{(i)} = [f^{(i)}_1, \ldots, f^{(i)}_d]^T \in \mathbb{R}^d$, with $F \in \mathbb{R}^{n \times d}$ and $F^T D F = I$, we need to minimize the objective function 
$\sum_i \sum_j \|f^{(i)} - f^{(j)}\|^2 W_{ij} = \operatorname{tr}(F^T L F),$ 
giving the minimization 
$\operatorname*{argmin}_{F^T D F = I} \operatorname{tr}(F^T L F)$ 
• This problem can also be solved by solving the generalized eigenvalue problem $L f = \lambda D f$ for the minimum eigenvalues.
THE EIGENVALUE PROBLEM AND THE EMBEDDING 
• Thus, we solve for the minimum nonzero eigenvalue solution of the generalized eigenvalue problem $L f = \lambda D f$. 
• We then order the eigenvalues $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1}$ and construct the embedding from the first $d$ corresponding eigenvectors (leaving out the zero eigenvector), giving the embedding 
$x_i \to y_i = (f^{(i)}_1, \ldots, f^{(i)}_d)$
LAPLACIAN EIGENMAPS 
• Move away from PCA's full-matrix computations toward a graph sampling of the manifold, which allows for a sparse kernel matrix 
• A point-to-point metric is applied locally to preserve distances on the manifold between points 
• The kernel: $W_{ij} = \exp\left( \dfrac{-\|x_i - x_j\|^2}{t} \right)$ if $x_i \in N_j$ or $x_j \in N_i$ (and $0$ otherwise), with $D_{ii} = \sum_j W_{ji}$ and $L = D - W$ 
• The eigenvalue problem: $L f = \lambda D f$ 
• The embedding: $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1}$, $x_i \to y_i = (f^{(i)}_1, \ldots, f^{(i)}_d)$
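Collecting the steps summarized above into one sketch; it reuses the knn_adjacency helper, relies on SciPy's generalized symmetric eigensolver, and assumes every point has at least one neighbor so that D is positive definite.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(X, m, t, d):
    """m-NN graph, heat-kernel weights W, D_ii = sum_j W_ji, L = D - W,
    generalized eigenproblem L f = lambda D f; keep the d eigenvectors
    with smallest nonzero eigenvalues."""
    A = knn_adjacency(X, m)                                      # sparse graph (see above)
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = A * np.exp(-D2 / t)                                      # heat-kernel edge weights
    D = np.diag(W.sum(axis=0))
    L = D - W                                                    # graph Laplacian
    eigvals, F = eigh(L, D)                                      # ascending eigenvalues
    return F[:, 1:d + 1]                                         # drop the zero eigenvector
```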
LAPLACIAN EIGENMAPS 
[Figure: the pipeline on a Swiss roll sample: nearest-neighbor graph construction $G(X, E, W)$ followed by the eigenvalue problem (NN, W, EVP), with resulting 2D embeddings shown for $N = 5, 10, 15$ neighbors and heat-kernel parameters $t = 5.0$, $t = 25.0$, and $t = \infty$. Images from (1).]
AN ALTERNATIVE VIEW OF 
LAPLACIAN EIGENMAPS 
• Although theoretically sound and exploiting sparsity, Laplacian Eigenmaps is intuitively difficult to understand. 
• Belkin & Niyogi (2003) show that Locally Linear Embedding (Roweis & Saul, 2000), which has a more intuitive geometric construction, is approximately equivalent under certain conditions. 
• We will develop the LLE method, then briefly sketch the argument for approximate equivalence.
LOCALLY LINEAR EMBEDDING 
• We construct the graph sampling the manifold in the same way as before, by finding nearest neighbors of each point. 
• Weights for the matrix $W$ are selected by assuming that local neighborhoods of points are nearly linear, and solving a minimization problem with the cost function 
$\sum_i \Big\| x_i - \sum_j W_{ij} x_{i_j} \Big\|^2$ 
where $x_{i_j} \in N_i$.
FINDING THE WEIGHTS 
• Weights solving this minimization can be found via a closed-form expression as follows: 
(1) Compute neighbor correlation matrices (and inverses): 
$C_{jk} = x_{i_j} \cdot x_{i_k}, \qquad x_{i_j} \in N_i, \ x_{i_k} \in N_i$ 
(2) Compute the Lagrange multiplier (sum-to-one constraint): 
$\lambda = \frac{\alpha}{\beta} = \frac{1 - \sum_j \sum_k C^{-1}_{jk} (x_i \cdot x_{i_k})}{\sum_j \sum_k C^{-1}_{jk}}$ 
(3) Compute reconstruction weights: 
$W_{ij} = \sum_k C^{-1}_{jk} (x_i \cdot x_{i_k} + \lambda)$ 
• A nearly singular $C$ can be preconditioned prior to computing.
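The three-step closed form above, for a single point, as a sketch; reg is an illustrative preconditioning constant for a nearly singular correlation matrix, not a value from the slides.

```python
import numpy as np

def lle_weights_for_point(x_i, neighbors, reg=1e-3):
    """neighbors: (m, D) array holding the points of N_i.
    Returns the m reconstruction weights, which sum to one."""
    m = neighbors.shape[0]
    C = neighbors @ neighbors.T                      # (1) C_jk = x_ij . x_ik
    C = C + reg * np.trace(C) * np.eye(m)            # precondition nearly singular C
    C_inv = np.linalg.inv(C)
    b = neighbors @ x_i                              # the (x_i . x_ik) terms
    lam = (1.0 - np.sum(C_inv @ b)) / np.sum(C_inv)  # (2) Lagrange multiplier
    return C_inv @ (b + lam)                         # (3) W_ij = sum_k C^-1_jk (x_i . x_ik + lam)
```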
THE EMBEDDING 
• To find the embedding, we minimize the same form, this time over the embedding coordinates $y$ with fixed weights: 
$\operatorname*{argmin}_y \sum_i \Big\| y_i - \sum_j W_{ij} y_j \Big\|^2$ 
subject to constraints: 
• Centering: $\sum_i y_i = 0$ 
• Unit covariance (to avoid degenerate solutions): $\frac{1}{n} \sum_i y_i \otimes y_i = I$
COMPUTING THE EMBEDDING 
• To compute solutions to the minimization, we introduce the sparse matrix $E$ defined by 
$E_{ij} = \delta_{ij} - W_{ij} - W_{ji} + \sum_k W_{ki} W_{kj}, \qquad \text{i.e.} \quad E = (I - W)^T (I - W)$ 
• We then solve for eigenpairs of $E$ and take as the embedding the eigenvectors corresponding to the lowest $d$ eigenvalues, excluding the zero eigenvalue. 
• Note that $E$ is symmetric and positive semi-definite.
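And the final eigen-step as a sketch, taking the full n × n weight matrix W whose i-th row holds that point's reconstruction weights (zeros elsewhere); the function name is mine.

```python
import numpy as np

def lle_embedding(W, d):
    """E = (I - W)^T (I - W); the embedding consists of the eigenvectors
    of the d smallest nonzero eigenvalues of E."""
    n = W.shape[0]
    E = (np.eye(n) - W).T @ (np.eye(n) - W)   # symmetric, positive semi-definite
    eigvals, F = np.linalg.eigh(E)            # ascending eigenvalues
    return F[:, 1:d + 1]                      # discard the zero-eigenvalue vector
```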
LOCALLY LINEAR EMBEDDING 
• Another method which considers metrics used to weight a graph sampling the manifold 
• Weights computed by global linear optimization over a local neighborhood around each point 
• The kernel: $W_{ij} = \sum_k C^{-1}_{jk} (x_i \cdot x_{i_k} + \lambda)$, with $E = (I - W)^T (I - W)$ 
• The eigenvalue problem: $E f = \lambda f$ 
• The embedding: $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1}$, $x_i \to y_i = (f^{(i)}_1, \ldots, f^{(i)}_d)$
CONNECTION TO THE GRAPH LAPLACIAN 
• We will show in three steps that for a function $f$ on $\mathcal{M}$ (under appropriate assumptions), $E f \approx \frac{1}{2} \mathcal{L}^2 f$, where $\mathcal{L}$ denotes the Laplace-Beltrami operator on the manifold: 
(1) Fix a point $x_i$ and show that 
$[(I - W) f]_i \approx -\frac{1}{2} \sum_j W_{ij} (x_i - x_{i_j})^T H (x_i - x_{i_j})$ 
where $H$ is the Hessian of $f$ at $x_i$. 
(2) Show that the expectation $E[v^T H v] = r \mathcal{L} f$. 
(3) Put steps (1) and (2) together to achieve the final result.
CONNECTION TO THE GRAPH LAPLACIAN (1) 
• Show: 
$[(I - W) f]_i \approx -\frac{1}{2} \sum_j W_{ij} (x_i - x_{i_j})^T H (x_i - x_{i_j})$ 
• Consider a coordinate system in the tangent plane centered at $o = x_i$ and let $v_j = x_{i_j} - x_i$. This is a vector originating at $o$. 
• Let $\alpha_j = W_{ij}$. Since $x_i$ is in the affine span of its neighbors (and by construction of $W$), we have 
$o = x_i = \sum_j \alpha_j v_j, \qquad \text{where } \sum_j \alpha_j = 1.$
CONNECTION TO THE GRAPH LAPLACIAN (1) 
• Assuming $f$ is sufficiently smooth, we write the second-order Taylor approximation 
$f(v) = f(o) + v^T \nabla f + \frac{1}{2} (v^T H v) + o(\|v\|^2)$ 
where $\nabla f$ is the gradient and $H$ is the Hessian, both evaluated at $o$.
CONNECTION TO THE GRAPH LAPLACIAN (1) 
• We have $[(I - W) f]_i = f(o) - \sum_j \alpha_j f(v_j)$, and using the Taylor approximation for $f(v_j)$, we can write 
$[(I - W) f]_i = f(o) - \sum_j \alpha_j f(v_j) \approx f(o) - \sum_j \alpha_j f(o) - \sum_j \alpha_j v_j^T \nabla f - \frac{1}{2} \sum_j \alpha_j (v_j^T H v_j)$ 
• Since $\sum_j \alpha_j = 1$ and $\sum_j \alpha_j v_j = o$, the first three terms disappear, and 
$[(I - W) f]_i = f(o) - \sum_j \alpha_j f(v_j) \approx -\frac{1}{2} \sum_j \alpha_j v_j^T H v_j$
CONNECTION TO THE GRAPH LAPLACIAN (2) 
• Show: $v^T H v$ is proportional to $\mathcal{L} f$ 
• If $\sqrt{\alpha_j}\, v_j$ form an orthonormal basis (unusual), then 
$\sum_j W_{ij} v_j^T H v_j = \operatorname{tr}(H) = \mathcal{L} f$ 
• If not, we assume $v$ to be a random vector with uniform distribution on every sphere centered at $x_i$, and show proportionality. 
• Let $e_1, \ldots, e_n$ be an orthonormal basis of eigenvectors for $H$, corresponding to eigenvalues $\lambda_1, \ldots, \lambda_n$.
CONNECTION TO THE GRAPH LAPLACIAN (2) 
• Then, using the spectral theorem, we can write 
$E[v^T H v] = E\Big[ \sum_i \lambda_i \langle v, e_i \rangle^2 \Big]$ 
• Since $E\big[ \langle v, e_i \rangle^2 \big]$ is independent of $i$, we can replace $E\big[ \langle v, e_i \rangle^2 \big] = r$ to get 
$E[v^T H v] = r \Big( \sum_i \lambda_i \Big) = r \operatorname{tr}(H) = r \mathcal{L} f$
CONNECTION TO THE GRAPH LAPLACIAN (3) 
• Now, putting these together, we have 
$[(I - W) f]_i \approx -\frac{1}{2} \sum_j W_{ij} v_j^T H v_j \quad \text{and} \quad E[v^T H v] = r \mathcal{L} f, \quad \text{so} \quad (I - W)^T (I - W) f \approx \frac{1}{2} \mathcal{L}^2 f$ 
• LLE minimizes $f^T (I - W)^T (I - W) f$, which reduces to finding eigenfunctions of $(I - W)^T (I - W)$, which can now be interpreted as finding eigenfunctions of the iterated Laplacian $\mathcal{L}^2$. Eigenfunctions of $\mathcal{L}^2$ coincide with those of $\mathcal{L}$.
LOCALLY LINEAR EMBEDDING 
• Another method which considers metrics used to weight a graph sampling the manifold 
• Weights computed by global linear optimization over a local neighborhood around each point 
• The kernel: $W_{ij} = \sum_k C^{-1}_{jk} (x_i \cdot x_{i_k} + \lambda)$, with $E = (I - W)^T (I - W)$ 
• The eigenvalue problem: $E f = \lambda f$ 
• The embedding: $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1}$, $x_i \to y_i = (f^{(i)}_1, \ldots, f^{(i)}_d)$
LOCALLY LINEAR EMBEDDING 
[Figure: the LLE pipeline on the Swiss roll: nearest-neighbor graph $G(X, E, W)$, weight minimization, and eigenvalue problem (NN, MIN, EVP). The accompanying caption text from the source paper notes that reconstruction errors are measured by the cost function $\varepsilon(W) = \sum_i \big| X_i - \sum_j W_{ij} X_j \big|^2$, which adds up the squared distances between all the data points and their reconstructions; the weights $W_{ij}$ summarize the contribution of the $j$th data point to the $i$th reconstruction. Images from (6).]
COMPUTATIONAL COMPLEXITY OF METHODS 

Method   Computational Cost   Memory Usage 
PCA      O(D^3)               O(D^2) 
k-PCA    O(n^3)               O(n^2) 
LLE      O(ξn^2)              O(ξn^2) 
LE       O(ξn^2)              O(ξn^2) 

ξ is the sparsity ratio of the kernel matrix: the number of non-zero elements divided by the total number of elements in the kernel matrix.
BENEFITS AND LIMITATIONS 

Method   Benefits                          Limitations 
PCA      Fast, simple                      Linear 
k-PCA    Kernel choice                     Computation, kernel selection 
LE       Sparse kernel, justification      Nearest neighbor search 
LLE      Sparse kernel, direct solution    Nearest neighbor search
RESEARCH IDEAS 
• Guiding principles 
• Software should be built 
for modular use within a 
framework and library 
• Software should be 
validated with real known 
data associated with 
“ground truth” 
• Research directions 
• Landmarks, out-of-sample 
extensions, low-rank 
update iterative methods 
• Hybridization of methods 
and ideas 
• Extension to higher order 
graph properties 
SOFTWARE LIBRARY 
• A comprehensive testbed software library and experimentation framework is needed to support manifold learning research
• Must be modular, extensible, platform agnostic 
• Interpreted/scriptable languages are a good choice for 
experimentation: Python, MATLAB, Boo, IDL 
• Previous efforts: 
• DRToolbox (MATLAB, 2007-) by van der Maaten 
• scikit.learn (Python, 2009-) by Matthieu Brucher 
REFERENCES 
1) Belkin, M. and P. Niyogi, Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, Neural Comp. 15 (2003), 1373-1396. 
2) Schölkopf, B., A. Smola, and K. Müller, Nonlinear Component Analysis as a Kernel Eigenvalue Problem, Neural Comp. 10 (1998), 1299-1319. 
3) Weinberger, K. Q., B. D. Packer, and L. K. Saul, Nonlinear Dimensionality Reduction by Semidefinite Programming and Kernel Matrix Factorization, Proc. AI and Statistics (Dec 2005), 381-388. 
4) Golub, G. H. and C. F. Van Loan, Matrix Computations, 3rd ed., Johns Hopkins University Press (Baltimore, 1996), 70-75. 
5) Roweis, S. T. and L. K. Saul, Nonlinear Dimensionality Reduction by Locally Linear Embedding, Science 290 (2000), 2323-2326. 
6) Shlens, J., A Tutorial on Principal Component Analysis, Version 2 (Dec 2005). 
7) van der Maaten, L., E. Postma, and J. van den Herik, Dimensionality Reduction: A Comparative Review, TiCC TR 2009-005 (Oct 2009).
QUESTIONS? IDEAS? 
THANK YOU! 

Contenu connexe

Tendances

Tendances (20)

A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction models
 
Zoooooohaib
ZoooooohaibZoooooohaib
Zoooooohaib
 
Pca ankita dubey
Pca ankita dubeyPca ankita dubey
Pca ankita dubey
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
 
Attentive semantic alignment with offset aware correlation kernels
Attentive semantic alignment with offset aware correlation kernelsAttentive semantic alignment with offset aware correlation kernels
Attentive semantic alignment with offset aware correlation kernels
 
Machine learning applications in aerospace domain
Machine learning applications in aerospace domainMachine learning applications in aerospace domain
Machine learning applications in aerospace domain
 
Lecture 6: Convolutional Neural Networks
Lecture 6: Convolutional Neural NetworksLecture 6: Convolutional Neural Networks
Lecture 6: Convolutional Neural Networks
 
Co-clustering of multi-view datasets: a parallelizable approach
Co-clustering of multi-view datasets: a parallelizable approachCo-clustering of multi-view datasets: a parallelizable approach
Co-clustering of multi-view datasets: a parallelizable approach
 
Principal component analysis
Principal component analysisPrincipal component analysis
Principal component analysis
 
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
 
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
 
Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...
Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...
Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...
 
Incremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clusteringIncremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clustering
 
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
 
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
 
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
 
Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)
 
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
 
(DL輪読)Matching Networks for One Shot Learning
(DL輪読)Matching Networks for One Shot Learning(DL輪読)Matching Networks for One Shot Learning
(DL輪読)Matching Networks for One Shot Learning
 
Neural Networks: Radial Bases Functions (RBF)
Neural Networks: Radial Bases Functions (RBF)Neural Networks: Radial Bases Functions (RBF)
Neural Networks: Radial Bases Functions (RBF)
 

En vedette

Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...
Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...
Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...
wl820609
 
関東CV勉強会 Kernel PCA (2011.2.19)
関東CV勉強会 Kernel PCA (2011.2.19)関東CV勉強会 Kernel PCA (2011.2.19)
関東CV勉強会 Kernel PCA (2011.2.19)
Akisato Kimura
 
Numpy scipyで独立成分分析
Numpy scipyで独立成分分析Numpy scipyで独立成分分析
Numpy scipyで独立成分分析
Shintaro Fukushima
 

En vedette (13)

Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...
Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...
Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...
 
Topic Models
Topic ModelsTopic Models
Topic Models
 
関東CV勉強会 Kernel PCA (2011.2.19)
関東CV勉強会 Kernel PCA (2011.2.19)関東CV勉強会 Kernel PCA (2011.2.19)
関東CV勉強会 Kernel PCA (2011.2.19)
 
[Kim+ ICML2012] Dirichlet Process with Mixed Random Measures : A Nonparametri...
[Kim+ ICML2012] Dirichlet Process with Mixed Random Measures : A Nonparametri...[Kim+ ICML2012] Dirichlet Process with Mixed Random Measures : A Nonparametri...
[Kim+ ICML2012] Dirichlet Process with Mixed Random Measures : A Nonparametri...
 
WSDM2016読み会 Collaborative Denoising Auto-Encoders for Top-N Recommender Systems
WSDM2016読み会 Collaborative Denoising Auto-Encoders for Top-N Recommender SystemsWSDM2016読み会 Collaborative Denoising Auto-Encoders for Top-N Recommender Systems
WSDM2016読み会 Collaborative Denoising Auto-Encoders for Top-N Recommender Systems
 
Visualizing Data Using t-SNE
Visualizing Data Using t-SNEVisualizing Data Using t-SNE
Visualizing Data Using t-SNE
 
AutoEncoderで特徴抽出
AutoEncoderで特徴抽出AutoEncoderで特徴抽出
AutoEncoderで特徴抽出
 
LDA入門
LDA入門LDA入門
LDA入門
 
非線形データの次元圧縮 150905 WACODE 2nd
非線形データの次元圧縮 150905 WACODE 2nd非線形データの次元圧縮 150905 WACODE 2nd
非線形データの次元圧縮 150905 WACODE 2nd
 
CVIM#11 3. 最小化のための数値計算
CVIM#11 3. 最小化のための数値計算CVIM#11 3. 最小化のための数値計算
CVIM#11 3. 最小化のための数値計算
 
Numpy scipyで独立成分分析
Numpy scipyで独立成分分析Numpy scipyで独立成分分析
Numpy scipyで独立成分分析
 
基底変換、固有値・固有ベクトル、そしてその先
基底変換、固有値・固有ベクトル、そしてその先基底変換、固有値・固有ベクトル、そしてその先
基底変換、固有値・固有ベクトル、そしてその先
 
Hyperoptとその周辺について
Hyperoptとその周辺についてHyperoptとその周辺について
Hyperoptとその周辺について
 

Similaire à Methods of Manifold Learning for Dimension Reduction of Large Data Sets

Super resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun YooSuper resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun Yoo
JaeJun Yoo
 
Dynamic programming class 16
Dynamic programming class 16Dynamic programming class 16
Dynamic programming class 16
Kumar
 
Optimum Engineering Design - Day 4 - Clasical methods of optimization
Optimum Engineering Design - Day 4 - Clasical methods of optimizationOptimum Engineering Design - Day 4 - Clasical methods of optimization
Optimum Engineering Design - Day 4 - Clasical methods of optimization
SantiagoGarridoBulln
 

Similaire à Methods of Manifold Learning for Dimension Reduction of Large Data Sets (20)

Support Vector Machines is the the the the the the the the the
Support Vector Machines is the the the the the the the the theSupport Vector Machines is the the the the the the the the the
Support Vector Machines is the the the the the the the the the
 
Super resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun YooSuper resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun Yoo
 
Review DRCN
Review DRCNReview DRCN
Review DRCN
 
EPFL_presentation
EPFL_presentationEPFL_presentation
EPFL_presentation
 
CSC446: Pattern Recognition (LN6)
CSC446: Pattern Recognition (LN6)CSC446: Pattern Recognition (LN6)
CSC446: Pattern Recognition (LN6)
 
Dynamic programming class 16
Dynamic programming class 16Dynamic programming class 16
Dynamic programming class 16
 
机器学习Adaboost
机器学习Adaboost机器学习Adaboost
机器学习Adaboost
 
On image intensities, eigenfaces and LDA
On image intensities, eigenfaces and LDAOn image intensities, eigenfaces and LDA
On image intensities, eigenfaces and LDA
 
DimensionalityReduction.pptx
DimensionalityReduction.pptxDimensionalityReduction.pptx
DimensionalityReduction.pptx
 
Hands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in PythonHands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in Python
 
Optimum Engineering Design - Day 4 - Clasical methods of optimization
Optimum Engineering Design - Day 4 - Clasical methods of optimizationOptimum Engineering Design - Day 4 - Clasical methods of optimization
Optimum Engineering Design - Day 4 - Clasical methods of optimization
 
More investment in Research and Development for better Education in the future?
More investment in Research and Development for better Education in the future?More investment in Research and Development for better Education in the future?
More investment in Research and Development for better Education in the future?
 
Design of Engineering Experiments Part 5
Design of Engineering Experiments Part 5Design of Engineering Experiments Part 5
Design of Engineering Experiments Part 5
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithms
 
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
 
NICE Implementations of Variational Inference
NICE Implementations of Variational Inference NICE Implementations of Variational Inference
NICE Implementations of Variational Inference
 
NICE Research -Variational inference project
NICE Research -Variational inference projectNICE Research -Variational inference project
NICE Research -Variational inference project
 
Naive Bayes Presentation
Naive Bayes PresentationNaive Bayes Presentation
Naive Bayes Presentation
 
Understandig PCA and LDA
Understandig PCA and LDAUnderstandig PCA and LDA
Understandig PCA and LDA
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipeline
 

Dernier

Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
shivangimorya083
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 

Dernier (20)

Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 

Methods of Manifold Learning for Dimension Reduction of Large Data Sets

  • 1. METHODS OF MANIFOLD LEARNING FOR DIMENSION REDUCTION OF LARGE DATA SETS Doctoral Candidacy Preliminary Oral Exam Ryan Bensussan Harvey May 17, 2010 Committee: Wojciech Czaja, Chair Kasso Okoudjou John Benedetto Rama Chellappa 1
  • 2. PREVIEW • Motivation • Problem • Methods • Research Ideas Image by Stefan Baudy, used under Creative Commons license Ryan Dimension Reduction - B Harvey - Prelim Oral Exam Preview 2
  • 3. MOTIVATION • Science and business producing massive quantities of data • Computationally difficult to store, process, analyze, visualize • Academic focus on compression, dimension-reduced processing to address this problem • Compression methods widely available, but require decompression step to use data • Dimension-reduced processing generally not available Ryan Dimension Reduction - B Harvey - Prelim Oral Exam Motivation 3
  • 4. THE PROBLEM •What are we trying to do? •What is the intuition for this problem? • How can we formalize the problem mathematically? • On what kinds of data can we solve this problem? Image by qisur, used under Creative Commons license Ryan Dimension Reduction - B Harvey - Prelim Oral Exam Problem Definition 4
  • 5. PROBLEM INTUITION • Think flattening a 3D surface to a 2D image • Simple projection • Preserving some particular quantity of interest locally • Preserving some global property of the surface Ryan Dimension Reduction - B Harvey - Prelim Oral Exam Problem Definition 5
  • 6. THE PROBLEM (FORMALIZED) • Inputs: • Outputs: X = [x1, . . . ,xn], xk ∈ RD Y = [y1, . . . , yn], yk ∈ Rd, d D M⊂ Rd • Assumption: data live on some manifold embedded in RD , and inputs X are samples taken in RD of the underlying manifold . M • Problem statement: Find a reduced representation Y of X which best preserves the manifold structure of the data, as defined by some metric of interest. Ryan Dimension Reduction - B Harvey - Prelim Oral Exam Problem Definition 6
  • 7. An example from Molecular Dynamics, I EXAMPLES Text OF documents DATA SETS 1000 Science News articles, from 8 different categories. We compute about 10000 coordinates, i-th coordinate of document d represents frequency in document d of the i-th word in a fixed dictionary. Data base of about 60, 000 28 × 28 gray-scale pictures of handwritten digits, Text collected by USPS. Point cloud in R282 . Goal: automatic recognition. Mauro Maggioni Geometry of data sets in high dimensions and learning Hyperspectral The dynamics of a small protein in a bath of water molecules is approximated by a Langevin system of stochastic equations x˙ = −∇U(x) + w˙ . Handwritten Digits 10 model in the form of Equation 3, we can synthesize new shapes through the walking cycle. In these examples only 10 samples were used to embed the manifold for half a cycle on a unit circle in 2D and to learn the model. Silhouettes at intermediate body configurations were synthesized (at the middle point between each two centers) using the learned model. The learned model can successfully interpolate shapes at intermedi-ate The set of states of the protein is a noisy set of points in R36. configurations (never seen in the learning) using only two-dimensional embedding. The figure shows results for three different peoples. Mauro Maggioni Analysis of High-dimensional Data Sets and Graphs Learn Mapping from Embedding to 3D Learn Nonlinear Mapping Manifold Embedding Visual input 3D pose from Embedding to visual input Learn Nonlinear Manifold Embedding (a) Learning components Manifold Embedding (view based) Visual input 3D pose Image Closed Form solution for inverse mapping Collections Error Criteria Manifold Selection View Determination 3D pose interpolation (b) pose estimation. (c) Synthesis. Video Fig. 4. (a,b) Block diagram for the learning framework and 3D pose estimation. (c) Shape synthe-sis for three different people. First, third and fifth rows: samples used in learning. Second, fourth, sixth rows: interpolated shapes at intermediate configurations (never seen in the learning) Dimension Reduction - Given a visual input (silhouette), and the learned model, we can recover the intrinsic Ryan B Harvey - Prelim Oral Exam body configuration, recover the view point, and reconstruct the input and detect any spatial or temporal outliers. In other words, we can simultaneously solve for the pose, view point, and reconstruct the input. A block diagram for recovering 3D pose and view point given learned manifold models are shown in Figure 4. The framework [20] Molecular Dynamics Set of 10, 000 picture (28 by 28 pixels) of 10 handwritten digits. Color represents the label (digit) of each point. Problem Definition 7
  • 8. METHODS: TAXONOMY • Methods considered involve convex optimizations solved via eigenvalue problems • Full-rank: PCA, Kernel PCA • Sparse: LLE, Laplacian Eigenmaps Dimension Reduction Convex Non-convex Full-Rank Sparse Linear: PCA Non-linear: k-PCA Reconstruction Weights: LLE Neighborhood Graph Laplacian: LE Methods Introduction 8 Ryan Dimension Reduction - B Harvey - Prelim Oral Exam
  • 9. METHODS: FRAMEWORK • 3 step algorithm framework: • Build the kernel matrix • Solve the appropriate eigenvalue problem associated with that kernel • Use eigenvectors to compute the embedding in the lower dimension • Some methods: • Principal Components Analysis • Kernel-based Principal Components Analysis • Laplacian Eigenmaps • Locally Linear Embedding Ryan Dimension Reduction - B Harvey - Prelim Oral Exam Methods Introduction 9
  • 10. PRINCIPAL COMPONENTS ANALYSIS • Linear method: rotation, translation, simple scaling • Think SNR: maximize signal while minimizing noise • Rotate and translate axes so that signal variances lie on as few axes as possible • The kernel: C = 1n n j=1 xjxTj • The eigenvalue problem: λp = Cp PTΛ = CPT • The embedding: {λk}d+1 k=2, 1 ≥ λ1 ≥ · · · ≥ λD ≥ 0 Y = P{k}X Ryan Dimension Reduction - B Harvey - Prelim Oral Exam Principal Components Analysis 10
  • 11. PRINCIPAL COMPONENTS ANALYSIS EVP −20 −10 0 10 20 −20 50 40 30 20 10 0 15 10 5 0 −5 −10 −15 Ryan Dimension Reduction - B Harvey - Prelim Oral Exam Principal Components Analysis 11 −10 0 10 20 −20 −15 −10 −5 0 5 10 15 20
  • 12. PCA USING DOT PRODUCTS • To move from (linear) PCA to (nonlinear) Kernel-based PCA (Schölkopf, Smola Müller, 1998), we consider a formulation of PCA exclusively using dot products: λp = Cp Cp = 1n n i=1(xj · p)xj λ(xk · p) = (xk · Cp) p λ= 0 span(x1, x2, . . . ,xn) • All solutions with lie in . Ryan Dimension Reduction - B Harvey - Prelim Oral Exam Kernel-based PCA 12
  • 13. INTRODUCING NONLINEARITY IN PCA •We then introduce nonlinearity by mapping from the input space to the feature space : F Φ : RD → F x→ ˜x = Φ(x) Φ(xk) F • For now, assume the data in is centered: n k=1 Φ(xk) = 0 • Then, the covariance matrix in is: F ˜ C = 1n n k=1 Φ(xj)Φ(xj)T = 1n n k=1 xj xj T Ryan Dimension Reduction - B Harvey - Prelim Oral Exam Kernel-based PCA 13
  • 14. THE EIGENPROBLEM IN F •We now rewrite the eigenvalue problem of PCA in : λ˜p = ˜ C˜p λ(Φ(xk) · ˜p) = (Φ(xk) · ˜ C˜p), ∀k = 1, . . . ,n p˜ λ= 0 span(x˜1, x˜2, . . . , x˜n) • Again, all with lie in . • In addition, we can write the linear expansion: F ˜p = n j=1 αjΦ(xi) Ryan Dimension Reduction - B Harvey - Prelim Oral Exam Kernel-based PCA 14
  • 15. THE EIGENPROBLEM IN (CONTINUED) F • Using this expansion, we rewrite the dot product formulation of the PCA eigenproblem in : F λ n j=1 αj(˜xk · ˜xj) = 1n n i=1 αi(˜xk · ∀k = 1, . . . ,n •We define an kernel matrix by n j=1 ˜xj)(˜xj · ˜xi) n × n K Kij = (Φ(xi) · Φ(xj)) = (˜xi · ˜xj) • And rewrite the eigenproblem in matrix form: nλKα = K2α Ryan Dimension Reduction - B Harvey - Prelim Oral Exam Kernel-based PCA 15
  • 16. COMPUTING IN FEATURE SPACE • The feature space F is of arbitrarily large and possibly infinite dimension. Computing dot products directly is often not possible, and computationally impractical when it is. • Solution: the “kernel trick” (Aizerman et al, 1964). Construct a kernel function k(u, v) = (Φ(u) · Φ(v)) (Φ(u) · Φ(v)) k(u, v) • Then replace each with . This construction implicitly defines and via . Φ F k(u, v) Ryan Dimension Reduction - B Harvey - Prelim Oral Exam Kernel-based PCA 16
  • 17. SOME POSSIBLE KERNELS • Kernels must be continuous, symmetric, positive semi-definite. • Some possible kernels proposed by Schölkopf et al include: dth • Dot product in the space of all order monomials: k(u, v) = (u · v)d k(u, v) = exp • Radial basis functions: • Sigmoid functions: −||u−v||2 2σ2 k(u, v) = tanh(κ(u · v) +Θ) Ryan Dimension Reduction - B Harvey - Prelim Oral Exam Kernel-based PCA 17
  • 18. SOLVING THE EIGENPROBLEM nλKα = K2α • To solve the eigenproblem where , we solve the following: Kij = k(xi, xj) nλα = Kα • Solutions are identical to all relevant solutions to the prior, as can be seen by expanding in the eigenvector basis of . α K 0 ≤ λ1 ≤ λ2 ≤ · · · ≤ λn • Let be the complete set of eigenvalues nλ and α1,α2, . . . ,αn the corresponding eigenvectors, with the first nonzero eigenvalue. λq Ryan Dimension Reduction - B Harvey - Prelim Oral Exam Kernel-based PCA 18
  • 19. COMPUTING THE EMBEDDING •We normalize by requiring that the corresponding vectors in be normalized: λq, . . . ,λn (˜pk · ˜pk) = 1, ∀k = q, . . . , n F • This translates to a normalization condition for : j (˜xi · ˜xj) = i αk • Compute projections of a test point onto eigenvectors : j (Φ(xj) · Φ(x)) = j k(xj, x) Ryan Dimension Reduction - B Harvey - Prelim Oral Exam Kernel-based PCA 19 αq, . . . ,αn 1 = n i,j=1 αk i αk n i,j=1 αk jKij = (αk ·Kαk) = λk(αk · αk) x ˜pk (˜pk · Φ(x)) = n j=1 αk n j=1 αk
  • 20. CENTERING THE DATA •We assumed centered data in F , which is unrealistic. To center our data, we must have: Φc(xj) = ˜xcj = ˜xj − 1n • Then we rewrite everything in terms of , and thus have a new kernel , which we will express in terms of : ij = (˜xci · ˜xcj where . Ryan Dimension Reduction - B Harvey - Prelim Oral Exam Kernel-based PCA 20 n k=1 ˜xk Φc(xj) Kc K Kc ) = ˜xi − 1n n k=1 ˜xk · ˜xj − 1n n =1 ˜x = (K − 1nK − K1n + 1nK1n)ij (1n)ij = 1n , ∀i, j
  • 21. KERNEL-BASED PCA • Extend linear PCA to nonlinear space via kernel transformation • Think SNR where signal lies along a curve in space • Rotate/translate transformed axes so signal variances lie on as few axes as possible • The kernel: Kij = (˜xi · ˜xj) = k(xi, xj) Φ : RD → F x→ ˜x • The eigenvalue problem: • The embedding: j k(xi, x) Ryan Dimension Reduction - B Harvey - Prelim Oral Exam Kernel-based PCA 21 (nλ)α = Kα (˜pk · ˜x) = n j=1 αk
  • 22. KERNEL-BASED PCA FEATURE SPACE , DEFINED BY KERNEL PCA (EVP) F Φ k(u, v) B B Ryan Dimension Reduction - B Harvey - Prelim Oral Exam Kernel-based PCA 22 C D A C A A B C B D A C B C D A Images from (3) Figure 3. Embeddings from kernel on the Swiss roll and
  • 23. LAPLACIAN EIGENMAPS •While it provides nonlinearity, kernel-based PCA requires computation dependent on number of points , rather than the often smaller dimension of each point . •We thus consider Laplacian Eigenmaps (Belkin Niyogi, 2003) which introduces sparsity in the kernel. • This method has been shown to be a special case of Kernel-based PCA by Bengio et al (2004). Ryan Dimension Reduction - B Harvey - Prelim Oral Exam Laplacian Eigenmaps 23 n D
  • 24. INTRODUCING SPARSITY • To build a sparse kernel, we build a graph from the data which samples the assumed manifold M . The adjacency matrix is built by taking either a fixed number m of nearest neighbors to, or by selecting all points within an -ball of, a given point as the point’s nearest neighbors. •We denote the set of nearest neighbors of a point by . The adjacency matrix is then given by: Ryan Dimension Reduction - B Harvey - Prelim Oral Exam Laplacian Eigenmaps 24 ε xj Nj A Aij = 1, if xi ∈ Nj 0, if xi /∈ Nj
  • 25. BUILDING THE KERNEL •We then introduce edge weights in the graph. The heat kernel is chosen due to its connection to the Laplace Beltrami operator on the manifold and therefore to the graph approximation of the manifold Laplacian: Wij = exp −||xi−xj ||2 t 0, otherwise t→∞,W → A • Note that as . , xi ∈ Nj or xj ∈ Ni Ryan Dimension Reduction - B Harvey - Prelim Oral Exam Laplacian Eigenmaps 25
• 26. CONSTRUCTING THE EIGENVALUE PROBLEM
• To understand what eigenvalue problem to solve here, we must consider the optimization problem.
• First, we think of mapping the graph $G(X, E, W)$ in a simplistic sense to a line (a 1D embedding $y$) such that connected points stay as close together as possible.
• This gives the objective function
  $\frac{1}{2}\sum_i \sum_j (y_i - y_j)^2 W_{ij}$
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Laplacian Eigenmaps 26
• 27. CONSTRUCTING THE EIGENVALUE PROBLEM
• Here, we introduce the diagonal matrix $D$, where $D_{ii} = \sum_j W_{ji}$, and the graph Laplacian matrix $L = D - W$, and note that $W$ is symmetric, which allows us to rewrite the objective function in matrix-vector form:
  $\frac{1}{2}\sum_i\sum_j (y_i - y_j)^2 W_{ij} = \frac{1}{2}\sum_i\sum_j (y_i^2 + y_j^2 - 2 y_i y_j) W_{ij} = \frac{1}{2}\sum_i y_i^2 D_{ii} + \frac{1}{2}\sum_j y_j^2 D_{jj} - \sum_i\sum_j y_i y_j W_{ij} = y^T D y - y^T W y = y^T L y$
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Laplacian Eigenmaps 27
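A quick numerical check, not in the talk, of the identity just derived, using a random symmetric weight matrix:

```python
import numpy as np

n = 50
W = np.random.rand(n, n)
W = (W + W.T) / 2                 # symmetric weights, as assumed on the slide
D = np.diag(W.sum(axis=0))        # D_ii = sum_j W_ji
L = D - W
y = np.random.randn(n)

lhs = 0.5 * ((y[:, None] - y[None, :]) ** 2 * W).sum()   # (1/2) sum_ij (y_i - y_j)^2 W_ij
rhs = y @ L @ y                                          # y^T L y
assert np.isclose(lhs, rhs)
```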
• 28. CONSTRUCTING THE EIGENVALUE PROBLEM
• So the relevant optimization problem in the 1D case becomes
  $\operatorname{argmin}_{y \,:\, y^T D y = 1}\; y^T L y$
• This problem can be solved by solving the generalized eigenvalue problem $L y = \lambda D y$ for the minimum eigenvalues.
• Note that the computation on the previous slide also shows that $L$ is positive semi-definite.
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Laplacian Eigenmaps 28
• 29. CONSTRUCTING THE EIGENVALUE PROBLEM
• Extending the same argument to $f^{(i)} = [f^{(i)}_1, \ldots, f^{(i)}_d]^T \in \mathbb{R}^d$, with $F \in \mathbb{R}^{n \times d}$, we need to minimize the objective function
  $\sum_i\sum_j \|f^{(i)} - f^{(j)}\|^2 W_{ij} = \operatorname{tr}(F^T L F)$
  giving the minimization
  $\operatorname{argmin}_{F^T D F = I} \operatorname{tr}(F^T L F)$
• This problem can also be solved by solving the generalized eigenvalue problem $L f = \lambda D f$ for the minimum eigenvalues.
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Laplacian Eigenmaps 29
• 30. THE EIGENVALUE PROBLEM AND THE EMBEDDING
• Thus, we solve for the minimum nonzero eigenvalue solutions of the generalized eigenvalue problem
  $L f = \lambda D f$
• We then order the eigenvalues $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1}$ and construct the embedding from the first $d$ corresponding eigenvectors (leaving out the zero eigenvector), giving
  $x_i \to y_i = (f^{(i)}_1, \ldots, f^{(i)}_d)$
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Laplacian Eigenmaps 30
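A sketch of this final step under the same assumptions as the earlier Laplacian Eigenmaps snippets, using scipy's generalized symmetric eigensolver; `laplacian_eigenmaps` is an illustrative name, and the dense solve would be replaced by a sparse eigensolver in practice.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(W, d=2):
    D = np.diag(W.sum(axis=0))          # D_ii = sum_j W_ji
    L = D - W                           # graph Laplacian
    evals, evecs = eigh(L, D)           # generalized symmetric EVP, ascending eigenvalues
    return evecs[:, 1:d + 1]            # skip the trivial lambda_0 = 0 eigenvector

# Usage with heat weights on a symmetrized kNN graph, as in the earlier sketches:
X = np.random.rand(200, 3)
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
idx = np.argsort(d2, axis=1)[:, 1:11]                 # 10 nearest neighbors per point
A = np.zeros_like(d2, dtype=bool)
np.put_along_axis(A, idx, True, axis=1)
W = np.where(A | A.T, np.exp(-d2 / 5.0), 0.0)
Y = laplacian_eigenmaps(W, d=2)                       # n x d embedding of the points
```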
• 31. LAPLACIAN EIGENMAPS
• Move away from PCA's full-matrix computations toward a graph sampling of the manifold, which allows for a sparse kernel matrix
• A point-to-point metric is applied locally to preserve distances on the manifold between points
• The kernel: $W_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{t}\right)$ for $x_i \in N_j$ or $x_j \in N_i$ (0 otherwise)
• The eigenvalue problem: $L f = \lambda D f$, with $D_{ii} = \sum_j W_{ji}$ and $L = D - W$
• The embedding: order $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1}$ and map $x_i \to y_i = (f^{(i)}_1, \ldots, f^{(i)}_d)$
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Laplacian Eigenmaps 31
• 32. LAPLACIAN EIGENMAPS
[Figure: pipeline from the data graph $G(X, E, W)$ via nearest neighbors (NN) and weights $W$ to the eigenvalue problem (EVP); Swiss roll embeddings for $N = 5, 10, 15$ nearest neighbors and heat kernel parameters $t = 5.0$, $25.0$, $\infty$. Images from (1).]
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Laplacian Eigenmaps 32
• 33. AN ALTERNATIVE VIEW OF LAPLACIAN EIGENMAPS
• Although theoretically sound and using sparsity, Laplacian Eigenmaps is intuitively difficult to understand.
• Belkin & Niyogi (2003) show that Locally Linear Embedding (Roweis & Saul, 2000), which has a more intuitive geometric construction, is approximately equivalent under certain conditions.
• We will develop the LLE method, then briefly sketch the argument for approximate equivalence.
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Locally Linear Embedding 33
• 34. LOCALLY LINEAR EMBEDDING
• We construct the graph sampling the manifold in the same way as before, by finding nearest neighbors of each point.
• Weights for the matrix $W$ are selected by assuming that local neighborhoods of points are nearly linear, and solving a minimization problem with the cost function
  $\sum_i \left\| x_i - \sum_j W_{ij}\, x_{i_j} \right\|^2$
  where $x_{i_j} \in N_i$.
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Locally Linear Embedding 34
• 35. FINDING THE WEIGHTS
• Weights solving this minimization can be found via a closed-form expression as follows:
  (1) Compute the neighbor correlation matrices (and their inverses): $C_{jk} = x_{i_j} \cdot x_{i_k}$, with $x_{i_j}, x_{i_k} \in N_i$
  (2) Compute the Lagrange multiplier (sum-to-one constraint): $\lambda = \dfrac{\alpha}{\beta} = \dfrac{1 - \sum_j \sum_k C^{-1}_{jk} (x_i \cdot x_{i_k})}{\sum_j \sum_k C^{-1}_{jk}}$
  (3) Compute the reconstruction weights: $W_{ij} = \sum_k C^{-1}_{jk} (x_i \cdot x_{i_k} + \lambda)$
• A nearly singular $C$ can be preconditioned (e.g., regularized by adding a small multiple of the identity) prior to computing.
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Locally Linear Embedding 35
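A sketch of the closed-form weight computation above; `lle_weights`, `knn_indices`, and the regularization constant are illustrative choices, and the small multiple of the identity implements the preconditioning of a nearly singular C mentioned on the slide.

```python
import numpy as np

def knn_indices(X, m=10):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, 1:m + 1]        # skip the point itself

def lle_weights(X, neighbors, reg=1e-3):
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        idx = neighbors[i]                            # indices of N_i
        Nbrs = X[idx]                                 # rows are the neighbors x_ij
        C = Nbrs @ Nbrs.T                             # C_jk = x_ij . x_ik
        C += reg * np.trace(C) * np.eye(len(idx))     # precondition near-singular C
        Cinv = np.linalg.inv(C)
        q = Nbrs @ X[i]                               # q_k = x_i . x_ik
        lam = (1.0 - (Cinv @ q).sum()) / Cinv.sum()   # sum-to-one Lagrange multiplier
        W[i, idx] = Cinv @ (q + lam)                  # W_ij = sum_k C^-1_jk (x_i.x_ik + lam)
    return W

X = np.random.rand(300, 3)
W = lle_weights(X, knn_indices(X, m=10))
assert np.allclose(W.sum(axis=1), 1.0)                # rows sum to one by construction
```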
• 36. THE EMBEDDING
• To find the embedding, we minimize the same form, this time over the embedding coordinates $y$ with the weights held fixed:
  $\operatorname{argmin}_y \sum_i \left\| y_i - \sum_j W_{ij}\, y_j \right\|^2$
  subject to the constraints:
• Centering: $\sum_i y_i = 0$
• Unit covariance (to avoid degenerate solutions): $\frac{1}{n}\sum_i y_i \otimes y_i = I$
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Locally Linear Embedding 36
• 37. COMPUTING THE EMBEDDING
• To compute solutions to the minimization, we introduce the sparse matrix $E$ defined by
  $E_{ij} = \delta_{ij} - W_{ij} - W_{ji} + \sum_k W_{ki} W_{kj}$, i.e., $E = (I - W)^T (I - W)$
• We then solve for eigenpairs of $E$ and take as the embedding the eigenvectors corresponding to the $d$ lowest eigenvalues, excluding the zero eigenvalue.
• Note that $E$ is symmetric and positive semi-definite.
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Locally Linear Embedding 37
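A sketch of the embedding step, with `lle_embedding` as an illustrative name; the usage example feeds it a random row-stochastic W only to exercise the linear algebra, since the rows of the true LLE weight matrix also sum to one. A sparse eigensolver would be preferred for large n.

```python
import numpy as np

def lle_embedding(W, d=2):
    n = W.shape[0]
    M = np.eye(n) - W
    E = M.T @ M                        # E = (I - W)^T (I - W): symmetric, PSD
    evals, evecs = np.linalg.eigh(E)   # ascending eigenvalues
    return evecs[:, 1:d + 1]           # drop the constant lambda = 0 eigenvector

# Rows summing to one guarantee (1, ..., 1) is in the null space of I - W,
# hence the zero eigenvalue excluded above.
W = np.random.rand(100, 100)
W /= W.sum(axis=1, keepdims=True)
Y = lle_embedding(W, d=2)
```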
• 38. LOCALLY LINEAR EMBEDDING
• Another method which considers metrics used to weight a graph sampling the manifold
• Weights computed by a global linear optimization over a local neighborhood around each point
• The kernel: $W_{ij} = \sum_k C^{-1}_{jk} (x_i \cdot x_{i_k} + \lambda)$
• The eigenvalue problem: $E f = \lambda f$, with $E = (I - W)^T (I - W)$
• The embedding: order $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1}$ and map $x_i \to y_i = (f^{(i)}_1, \ldots, f^{(i)}_d)$
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Locally Linear Embedding 38
• 39. CONNECTION TO THE GRAPH LAPLACIAN
• We will show in three steps that, for a sufficiently smooth function $f$ on $\mathcal{M}$ (under appropriate assumptions), $E f \approx \frac{1}{2} L^2 f$:
  (1) Fix a point $x_i$ and show that $[(I - W)f]_i \approx -\frac{1}{2}\sum_j W_{ij} (x_i - x_{i_j})^T H (x_i - x_{i_j})$, where $H$ is the Hessian of $f$ at $x_i$.
  (2) Show that the expectation $\mathbb{E}[v^T H v] = r\, L f$.
  (3) Put steps (1) and (2) together to achieve the final result.
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Locally Linear Embedding 39
• 40. CONNECTION TO THE GRAPH LAPLACIAN (1)
• Show: $[(I - W)f]_i \approx -\frac{1}{2}\sum_j W_{ij} (x_i - x_{i_j})^T H (x_i - x_{i_j})$
• Consider a coordinate system in the tangent plane centered at $o = x_i$, and let $v_j = x_{i_j} - x_i$. This is a vector originating at $x_i$.
• Let $\alpha_j = W_{ij}$. Since $x_i$ is in the affine span of its neighbors (and, by construction of $W$, $\sum_j \alpha_j = 1$), we have $o = x_i = \sum_j \alpha_j v_j$.
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Locally Linear Embedding 40
• 41. CONNECTION TO THE GRAPH LAPLACIAN (1)
• Assuming $f$ is sufficiently smooth, we write the second-order Taylor approximation
  $f(v) = f(o) + v^T \nabla f + \frac{1}{2} v^T H v + o(\|v\|^2)$
  where $\nabla f$ is the gradient and $H$ the Hessian, both evaluated at $o$.
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Locally Linear Embedding 41
• 42. CONNECTION TO THE GRAPH LAPLACIAN (1)
• We have $[(I - W)f]_i = f(o) - \sum_j \alpha_j f(v_j)$, and using the Taylor approximation for $f(v_j)$, we can write
  $[(I - W)f]_i = f(o) - \sum_j \alpha_j f(v_j) \approx f(o) - \sum_j \alpha_j f(o) - \sum_j \alpha_j v_j^T \nabla f - \frac{1}{2}\sum_j \alpha_j (v_j^T H v_j)$
• Since $\sum_j \alpha_j = 1$ and $\sum_j \alpha_j v_j = o$, the first three terms disappear, and
  $[(I - W)f]_i = f(o) - \sum_j \alpha_j f(v_j) \approx -\frac{1}{2}\sum_j \alpha_j v_j^T H v_j$
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Locally Linear Embedding 42
• 43. CONNECTION TO THE GRAPH LAPLACIAN (2)
• Show: $\sum_j W_{ij}\, v_j^T H v_j$ is proportional to $L f$.
• If $\{\sqrt{\alpha_j}\, v_j\}$ form an orthonormal basis (unusual), then $\sum_j W_{ij}\, v_j^T H v_j = \operatorname{tr}(H) = L f$.
• If not, we assume $v$ to be a random vector with a uniform distribution on every sphere centered at $x_i$, and show proportionality in expectation.
• Let $e_1, \ldots, e_n$ be an orthonormal basis of eigenvectors of $H$ corresponding to eigenvalues $\lambda_1, \ldots, \lambda_n$.
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Locally Linear Embedding 43
• 44. CONNECTION TO THE GRAPH LAPLACIAN (2)
• Then, using the spectral theorem, we can write
  $\mathbb{E}[v^T H v] = \mathbb{E}\left[\sum_i \lambda_i \langle v, e_i\rangle^2\right] = \sum_i \lambda_i\, \mathbb{E}\left[\langle v, e_i\rangle^2\right]$
• Since $\mathbb{E}[\langle v, e_i\rangle^2]$ is independent of $i$, we can replace $\mathbb{E}[\langle v, e_i\rangle^2] = r$ to get
  $\mathbb{E}[v^T H v] = r \left(\sum_i \lambda_i\right) = r\, \operatorname{tr}(H) = r\, L f$
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Locally Linear Embedding 44
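A quick Monte Carlo check, not in the talk, of the claim that the expectation of the squared projection onto each eigenvector is a constant r independent of i for v uniform on a sphere, so that the expectation of v^T H v equals r tr(H); on the unit sphere in R^n, r = 1/n.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
H = rng.standard_normal((n, n))
H = (H + H.T) / 2                              # a symmetric matrix standing in for the Hessian

v = rng.standard_normal((200000, n))
v /= np.linalg.norm(v, axis=1, keepdims=True)  # uniform samples on the unit sphere

lhs = np.mean(np.einsum('ij,jk,ik->i', v, H, v))   # Monte Carlo estimate of E[v^T H v]
rhs = np.trace(H) / n                              # r * tr(H) with r = 1/n
assert abs(lhs - rhs) < 5e-2
```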
• 45. CONNECTION TO THE GRAPH LAPLACIAN (3)
• Now, putting these together, we have
  $[(I - W)f]_i \approx -\frac{1}{2}\sum_j W_{ij}\, v_j^T H v_j$ and $\mathbb{E}[v^T H v] = r\, L f$, so $(I - W)^T (I - W) f \approx \frac{1}{2} L^2 f$
• LLE minimizes $f^T (I - W)^T (I - W) f$, which reduces to finding eigenfunctions of $(I - W)^T (I - W)$, which can now be interpreted as finding eigenfunctions of the iterated Laplacian $L^2$. Eigenfunctions of $L^2$ coincide with those of $L$.
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Locally Linear Embedding 45
• 46. LOCALLY LINEAR EMBEDDING
• Another method which considers metrics used to weight a graph sampling the manifold
• Weights computed by a global linear optimization over a local neighborhood around each point
• The kernel: $W_{ij} = \sum_k C^{-1}_{jk} (x_i \cdot x_{i_k} + \lambda)$
• The eigenvalue problem: $E f = \lambda f$, with $E = (I - W)^T (I - W)$
• The embedding: order $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1}$ and map $x_i \to y_i = (f^{(i)}_1, \ldots, f^{(i)}_d)$
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Locally Linear Embedding 46
• 47. LOCALLY LINEAR EMBEDDING
[Figure: pipeline from the data graph $G(X, E, W)$ via nearest neighbors (NN) and the weight minimization (MIN) to the eigenvalue problem (EVP), shown alongside an excerpt of the LLE reconstruction cost $\varepsilon(W) = \sum_i \left| x_i - \sum_j W_{ij} x_j \right|^2$ and the Swiss roll illustration from the source paper. Images from (6).]
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Locally Linear Embedding 47
• 48. COMPUTATIONAL COMPLEXITY OF METHODS

  Method   Computational Cost   Memory Usage
  PCA      O(D^3)               O(D^2)
  k-PCA    O(n^3)               O(n^2)
  LLE      O(ξn^2)              O(ξn^2)
  LE       O(ξn^2)              O(ξn^2)

• ξ is the sparsity ratio of the kernel matrix: the number of non-zero elements divided by the total number of elements in the kernel matrix
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Comparison 48
• 49. BENEFITS AND LIMITATIONS

  Method   Benefits                         Limitations
  PCA      Fast, simple                     Linear
  k-PCA    Kernel choice                    Computation, kernel selection
  LE       Sparse kernel, justification     Nearest neighbor search
  LLE      Sparse kernel, direct solution   Nearest neighbor search

Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Comparison 49
• 50. RESEARCH IDEAS
• Guiding principles
  • Software should be built for modular use within a framework and library
  • Software should be validated with real known data associated with “ground truth”
• Research directions
  • Landmarks, out-of-sample extensions, low-rank update iterative methods
  • Hybridization of methods and ideas
  • Extension to higher order graph properties
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Research Ideas 50
• 51. SOFTWARE LIBRARY
• A comprehensive testbed software library and experimentation framework is needed to support manifold learning research
• Must be modular, extensible, platform agnostic
• Interpreted/scriptable languages are a good choice for experimentation: Python, MATLAB, Boo, IDL
• Previous efforts:
  • DRToolbox (MATLAB, 2007-) by van der Maaten
  • scikit.learn (Python, 2009-) by Matthieu Brucher
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | Research Ideas 51
• 52. REFERENCES
1) Belkin, M. and P. Niyogi, Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, Neural Comp. 15 (2003), 1373-1396.
2) Schölkopf, B., A. Smola, and K.-R. Müller, Nonlinear Component Analysis as a Kernel Eigenvalue Problem, Neural Comp. 10 (1998), 1299-1319.
3) Weinberger, K. Q., B. D. Packer, and L. K. Saul, Nonlinear Dimensionality Reduction by Semidefinite Programming and Kernel Matrix Factorization, Proc. AI and Statistics (Dec 2005), 381-388.
4) Golub, G. H. and C. F. Van Loan, Matrix Computations, 3rd ed., Johns Hopkins University Press (Baltimore, 1996), 70-75.
5) Roweis, S. T. and L. K. Saul, Nonlinear Dimensionality Reduction by Locally Linear Embedding, Science 290 (2000), 2323-2326.
6) Shlens, J., A Tutorial on Principal Component Analysis, Version 2 (Dec 2005).
7) van der Maaten, L., E. Postma and H. J. van den Herik, Dimensionality Reduction: A Comparative Review, TiCC TR 2009-005 (Oct 2009).
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | References 52
• 53. QUESTIONS? IDEAS? THANK YOU!
Dimension Reduction | Ryan B Harvey - Prelim Oral Exam | 53