2. General Paradigm
Reduce and Conquer
• Large Problem → Small Problem
– Break array into two parts
– Consider odd and even elements
– Sample edges in a graph to obtain a smaller graph
– Represent a graph by a collection of trees
– Take number modulo small prime
– Multiply matrix by a random vector
– Project high dimensional point sets into fewer dimensions
3. The Problem
• Given n points in D dimensional space
• Project them into d << D dimensions
– So that the (Euclidean) distance between every pair of points is (almost) preserved
• How does d compare to n?
5. First Attempt
• Can we make d=n-1?
– X axis through 2 of the points
– Y axis so the 3rd point is in the XY plane
– Z axis so the 4th point is in the XYZ space
– And so on
6. First Attempt
• Time taken
– Each new axis has to be made
orthogonal to all previous axes
– O(n^2 D)
– Too slow
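The axis-by-axis construction above amounts to Gram-Schmidt orthogonalization of the points. A minimal sketch, assuming the points are given as plain Python lists; the names gram_schmidt_embed and dist are illustrative and do not come from the slides:

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def gram_schmidt_embed(points):
    """Map n points in D dims to at most n-1 coordinates, keeping all distances."""
    origin = points[0]
    # Translate so the first point is the origin; distances are unchanged.
    shifted = [[x - o for x, o in zip(p, origin)] for p in points]
    basis = []  # orthonormal axes built so far
    for v in shifted[1:]:
        # Make the new axis orthogonal to every previous axis: O(n D) per point.
        residual = list(v)
        for b in basis:
            c = sum(r * bi for r, bi in zip(residual, b))
            residual = [r - c * bi for r, bi in zip(residual, b)]
        norm = math.sqrt(sum(r * r for r in residual))
        if norm > 1e-12:  # skip points already spanned by earlier axes
            basis.append([r / norm for r in residual])
    # Coordinates of each shifted point along the new axes.
    return [[sum(x * bi for x, bi in zip(v, b)) for b in basis] for v in shifted]

# Illustrative data: 4 points in 6 dimensions become 4 points in 3 dimensions.
points = [[1, 0, 0, 0, 0, 2], [0, 3, 0, 1, 0, 0],
          [2, 2, 2, 0, 1, 0], [0, 0, 5, 0, 0, 1]]
embedded = gram_schmidt_embed(points)
```

The nested loop over points and previous axes, each step costing O(D), is where the O(n^2 D) running time comes from.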
7. Second Attempt
Use Random Projections
• Take d random vectors r1..rd
• For every point p, take the d dimensional point
• [ p.r1 p.r2 .. p.rd ] * scaling-factor
• Do these d-dim points preserve inter-point
distances approximately? How large should d be?
8. Random Projections
Further Simplification
• Take any vector p in D dimensions
• Suppose we show
– [ p.r1 p.r2 .. p.rd ] * scaling-factor has length ~ |p|
– Failure prob < 1/n^3
• Then, by the union bound, the prob that even one of the n^2 difference-vector lengths is not preserved is < n^2/n^3 = 1/n
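Written out, the step from one vector to all pairs is a union bound over the difference vectors p_i - p_j. A short derivation, assuming each fixed vector fails with probability below 1/n^3:

```latex
\Pr[\text{some pair fails}]
  \;\le\; \sum_{i<j} \Pr[\text{pair } (i,j) \text{ fails}]
  \;<\; \binom{n}{2}\cdot\frac{1}{n^{3}}
  \;<\; \frac{n^{2}}{n^{3}}
  \;=\; \frac{1}{n}
```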
11. Generating Random Vectors without Directional Bias
• Take D numbers (X1...XD), each N(0,1), independently
• Distribution of each number X
– Pr of being between a..a+da ~ e^(-a^2/2) da
• Pr X1 in a1..a1+da1, X2 in a2..a2+da2, …, XD in aD..aD+daD
– ~ e^(-a1^2/2) e^(-a2^2/2) … e^(-aD^2/2) da1 da2 … daD
– = e^(-(a1^2+a2^2+…+aD^2)/2) da1 da2 … daD
– = e^(-l^2/2) da1 da2 … daD, where l is the length of (a1, …, aD)
So no dependence on direction, only on length l !
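One consequence that can be checked numerically: for a unit vector p, the dot product p.r should have variance 1 whichever way p points. A minimal simulation sketch; the function name projection_variance, the seed, D = 5, and the trial count are illustrative assumptions:

```python
import math
import random

def projection_variance(p, trials, rng):
    """Empirical variance of p.r, where r has independent N(0,1) entries."""
    total = 0.0
    for _ in range(trials):
        dot = sum(pi * rng.gauss(0.0, 1.0) for pi in p)
        total += dot * dot  # p.r has mean 0, so E[(p.r)^2] is its variance
    return total / trials

rng = random.Random(0)
D = 5
axis = [1.0, 0.0, 0.0, 0.0, 0.0]     # unit vector along a coordinate axis
diagonal = [1.0 / math.sqrt(D)] * D  # unit vector along the main diagonal
var_axis = projection_variance(axis, 20000, rng)
var_diag = projection_variance(diagonal, 20000, rng)
```

Both estimates should come out close to 1, which is what the absence of directional bias predicts.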
12. The Algorithm
• Take d random vectors r1..rd
– Each ri = [Xi1 Xi2 … XiD] where the X’s are chosen from
N(0,1) independently
• For every point p, take the d dimensional point
• [ p.r1 p.r2 .. p.rd ] * sqrt(1/d)
• Time: n*d*D
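The algorithm above can be sketched in a few lines. This is a plain-Python illustration under stated assumptions, not a tuned implementation; the names random_projection and dist, the seed, and the sizes n = 8, D = 400, d = 200 are chosen only for the example:

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two points given as lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def random_projection(points, d, rng):
    """Project each D-dim point to [p.r1, ..., p.rd] * sqrt(1/d)."""
    D = len(points[0])
    # d random vectors r1..rd, each with D independent N(0,1) entries.
    R = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(d)]
    scale = math.sqrt(1.0 / d)
    # One dot product of length D per (point, random vector) pair: n*d*D work.
    return [[scale * sum(pi * ri for pi, ri in zip(p, r)) for r in R]
            for p in points]

rng = random.Random(1)
points = [[rng.uniform(-1.0, 1.0) for _ in range(400)] for _ in range(8)]
projected = random_projection(points, 200, rng)
# Ratio of projected to original distance for one pair; expected to be near 1.
ratio = dist(projected[0], projected[1]) / dist(points[0], points[1])
```

The same d random vectors are reused for every point, so differences between points are projected consistently.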
13. Simplifying Further
• Take any vector p in D dimensions
• We need to show that
• [ p.r1 p.r2 .. p.rd ] * sqrt(1/d) has length ~ |p|
• Failure prob < 1/n^3
• We can assume p to be 1 0 0 0 0 0 …
– because random vectors have no directional bias
– Then [ p.r1 p.r2 .. p.rd ] * sqrt(1/d) = [X11 X21 … Xd1] * sqrt(1/d)
14. Analysis
• We need to show that
• [X1 X2 … Xd] * sqrt(1/d) has length ~ 1
• Failure prob < 1/n^3
• Or (X1^2+…+Xd^2)/d ~ 1, failure prob < 1/n^3
• Or (X1^2+…+Xd^2) ~ d, failure prob < 1/n^3
• Note Xi^2 has mean 1 and s.d. sqrt(2)
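The mean 1 and s.d. sqrt(2) are those of the square of a standard normal variable. A short derivation, using the standard fact that the fourth moment of N(0,1) is 3:

```latex
\mathbb{E}[X^2] = \operatorname{Var}(X) + (\mathbb{E}[X])^2 = 1 + 0 = 1,
\qquad
\operatorname{Var}(X^2) = \mathbb{E}[X^4] - (\mathbb{E}[X^2])^2 = 3 - 1 = 2,
\qquad
\text{s.d.}(X^2) = \sqrt{2}
```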
15. Central Limit Theorem
• Y1..Yd independent, each with any (decent) distribution with mean 1 and s.d. sqrt(2)
• Then Y1+…+Yd tends to a Normal distribution with mean d and s.d. sqrt(2d) (for large d)
• Pr (Y1+…+Yd not in (1-∆)d .. (1+∆)d) <
• e^(-(∆d)^2/(2*2d)) = e^(-∆^2 d/4)
• Choose d = 12 ln n/∆^2: the bound becomes e^(-3 ln n) = 1/n^3, as needed
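To get a feel for the numbers, the choice of d and the resulting bound can be evaluated directly. A small sketch; the helper names target_dimension and tail_bound and the values n = 1000, ∆ = 0.5 are illustrative assumptions:

```python
import math

def target_dimension(n, delta):
    """d = 12 ln n / delta^2, rounded up to an integer."""
    return math.ceil(12.0 * math.log(n) / delta ** 2)

def tail_bound(d, delta):
    """The bound e^(-delta^2 d / 4) on the failure probability for one vector."""
    return math.exp(-delta ** 2 * d / 4.0)

n, delta = 1000, 0.5
d = target_dimension(n, delta)  # → 332
bound = tail_bound(d, delta)    # at most 1/n^3 because d was rounded up
```

Note that d depends only on n and ∆, not on the original dimension D.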
16. Conclusion
• n points in D dimensions
– can be projected to 12 ln n/∆^2 dimensions
– all distances stretch only by (1 ± ∆)
– with prob > 1-1/n