6.897: CSCI8980 Algorithmic Techniques for Big Data September 19, 2013
Lecture 5
Dr. Barna Saha Scribe: Shuo Zhou
Overview
In the previous lecture we saw how, using sketching, the second frequency moment can be approximated well in space logarithmic in the input size. In this lecture, we explore the dimensionality reduction technique that arises as a by-product of the sketching method in the streaming model. We introduce the Johnson-Lindenstrauss (JL) Lemma, a very powerful tool for $\ell_2$-dimensionality reduction. The JL lemma has found many applications in computer science. As a concrete application of the JL lemma, we consider the nearest neighbor problem. We introduce locality sensitive hashing (LSH), a popular technique for the nearest neighbor problem. The discussion on LSH will continue in the next lecture as well.
1 Linear Sketch as Dimensionality Reduction Technique
The algorithm for estimating $F_2$ in the previous lecture maintains a linear sketch $[Z_1, Z_2, \ldots, Z_t] = Rx$, where $R$ is a $t \times n$ random matrix with entries in $\{+1, -1\}$ and $x$ is the frequency vector. We use $Y = \|Rx\|_2^2$ to estimate $t\|x\|_2^2$, with $t = O(\frac{1}{\epsilon^2})$. Then, by scaling by a factor of $t$, we obtain an estimate of $F_2(x) = \|x\|_2^2$. We have
$$\Pr\left[\,\big|Y/t - \|x\|_2^2\big| \le \epsilon \|x\|_2^2\,\right] > 1 - \frac{1}{c},$$
where $c$ is some constant. A streaming algorithm operating in the sketch model can be viewed as a dimensionality reduction technique: from a stream, which can be viewed as an $n$-dimensional vector, we obtain a sketch $Y$ of dimension $t$. However, the distribution of $Y$ is heavy-tailed, and we would like an estimate of $F_2$ that gives a $(1 \pm \epsilon)$ approximation with probability (say) $1 - \frac{1}{n}$. This can be achieved by taking the median of $Y_1, Y_2, \ldots, Y_s$, each an independent estimate of $Y$.
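To make this concrete, here is a minimal Python sketch (an illustration, not part of the notes) of the $\pm 1$ linear sketch combined with the median trick described above; the choices of $t$, $s$, and NumPy are assumptions made for the demo.

```python
import numpy as np

def f2_estimate(x, eps=0.1, s=9, rng=None):
    """Estimate F2(x) = ||x||_2^2 with a +/-1 linear sketch.

    Each of the s independent estimates uses a t x n sign matrix R
    with t = O(1/eps^2); taking the median of the estimates boosts
    the constant success probability of a single estimate.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    t = int(np.ceil(1.0 / eps**2))
    estimates = []
    for _ in range(s):
        R = rng.choice([-1.0, 1.0], size=(t, x.size))  # sketch operator
        Z = R @ x                                       # linear sketch Rx
        estimates.append(np.sum(Z**2) / t)              # Y / t
    return float(np.median(estimates))

# Example: compare against the exact second frequency moment.
x = np.random.default_rng(0).integers(0, 10, size=1000).astype(float)
print(f2_estimate(x, eps=0.1), np.sum(x**2))
```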
The sketch operator $R$ can be viewed as an approximate embedding of $\ell_2^n$ into a sketch space $C$ such that

1. each point in $C$ can be described using only a few numbers, so $C \subset \mathbb{R}^t$ with $t \ll n$, and
2. the value of $\ell_2(S)$ is approximately equal to $F(C(S))$, where $F(C(S)) = F(Y_1, Y_2, \ldots, Y_t) = \mathrm{median}(Y_1, Y_2, \ldots, Y_t)$.

Note that since $F$ is a median operator, $(C, F)$ is not a normed space. So performing any nontrivial operations in the sketch space (e.g. clustering, similarity search, regression, etc.) becomes difficult. Does there exist a dimensionality reduction technique via sketching for which the sketch space is a normed space? Below we see one such example, where sketching achieves dimensionality reduction from $\ell_2^n$ to $\ell_2^t$, for $t = O(\frac{1}{\epsilon^2}\log n)$.
2 A Different Linear Sketch
Instead of having $\pm 1$ entries in the sketch operator $R$, we let each of its entries be an i.i.d. Gaussian random variable drawn from $N(0, 1)$. Recall the properties of the normal distribution.

Normal distribution $N(0, 1)$:

• Range $(-\infty, \infty)$
• Density $f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$
• Mean $= 0$, Variance $= 1$

Basic facts:

• If $X$ and $Y$ are independent random variables with normal distribution, then so is $X + Y$.
• If $X$ and $Y$ are independent with mean $0$, then $E[(X + Y)^2] = E[X^2] + E[Y^2]$.
• $E[cX] = cE[X]$ and $\mathrm{Var}[cX] = c^2 \mathrm{Var}[X]$.
Our estimation procedure is therefore as follows. We have $[Z_1, Z_2, \ldots, Z_t] = Rx$, and let $Y = \sum_{i=1}^{t} Z_i^2$. We return $Y/t$.

Let us consider any $Z_i$. We have $Z_i = \sum_{j=1}^{n} x_j r^i_j$, where $r^i_j$ is the entry in cell $R[i, j]$.
$$\begin{aligned}
E[Z_i^2] &= E\Big[\Big(\sum_j r^i_j x_j\Big)^2\Big] \\
&= E\Big[\sum_j (r^i_j)^2 x_j^2 + 2\sum_{j<k} x_j x_k\, r^i_j r^i_k\Big] \\
&= \sum_j x_j^2\, E[(r^i_j)^2] + 2\sum_{j<k} x_j x_k\, E[r^i_j]\, E[r^i_k] \\
&= \|x\|_2^2
\end{aligned}$$
Here the third equality comes from the fact that $r^i_j$ and $r^i_k$ are independent for $j \neq k$, and the fourth equality comes from the fact that $E[r^i_j] = 0$ and $E[(r^i_j)^2] = \mathrm{Var}[r^i_j] = 1$ for all $i, j$.

Therefore, we have $E[Y/t] = \frac{1}{t}\sum_i E[Z_i^2] = \|x\|_2^2$.
We now want to show that $Y$ concentrates around its expectation. For this we use the JL lemma. From the JL lemma, by setting $t = O(\frac{\log n}{\epsilon^2})$, there exists a constant $C > 0$ such that for small enough $\epsilon > 0$
$$\Pr\big[\,|Y - t\|x\|_2^2| > 2\epsilon\, t\|x\|_2^2\,\big] \le e^{-C\epsilon^2 t}.$$
The above implies
$$\Pr\big[\,|Y/t - \|x\|_2^2| > 2\epsilon\, \|x\|_2^2\,\big] \le e^{-C\epsilon^2 t}.$$
We now prove the above inequality, which also establishes the JL lemma.
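As a quick numerical sanity check (an illustration, not part of the notes), the Python snippet below draws many independent Gaussian sketches with $t = \lceil \log n / \epsilon^2 \rceil$ and reports both the empirical mean of $Y/t$ and the empirical frequency of deviations larger than $2\epsilon\|x\|_2^2$; all constants are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps = 500, 0.25
x = rng.normal(size=n)
t = int(np.ceil(np.log(n) / eps**2))
true = float(x @ x)                      # ||x||_2^2

estimates = []
for _ in range(1000):
    R = rng.normal(size=(t, n))          # entries i.i.d. N(0, 1)
    Z = R @ x
    estimates.append(float(Z @ Z) / t)   # Y / t
estimates = np.array(estimates)

print("mean of Y/t:", estimates.mean(), "vs ||x||^2:", true)
print("fraction deviating by > 2*eps*||x||^2:",
      np.mean(np.abs(estimates - true) > 2 * eps * true))
```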
3 Johnson-Lindenstrauss Lemma
We shall assume without loss of generality that $\|x\|_2^2 = 1$ (this can be ensured by scaling). We defined $Z_i = r^i \cdot x = \sum_{j=1}^{n} r^i_j x_j$, where the $r^i_j$ are i.i.d. random variables drawn from $N(0, 1)$. Let $Z = [Z_1, Z_2, \ldots, Z_t]$. We have $E[Y] = E[\|Z\|_2^2] = t$. We only prove the upper tail here. The proof of the lower tail is similar.

Lemma 1. $\Pr\big[\|Z\|_2^2 \ge t(1 + \epsilon)^2\big] \le e^{-t\epsilon^2 + O(t\epsilon^3)}$.
Proof. Let $Y$ be the random variable $\|Z\|_2^2$ and let $\alpha = t(1 + \epsilon)^2$. For every $s > 0$, we have
$$\Pr[Y > \alpha] = \Pr[e^{sY} > e^{s\alpha}] \le \frac{E[e^{sY}]}{e^{s\alpha}} \quad \text{(by Markov's inequality).}$$
We now calculate the numerator.
$$E[e^{sY}] = E\big[e^{s(Z_1^2 + Z_2^2 + \ldots + Z_t^2)}\big] = E\Big[\prod_{i=1}^{t} e^{sZ_i^2}\Big] = \prod_{i=1}^{t} E\big[e^{sZ_i^2}\big] \quad \text{(by independence of the $Z_i^2$).}$$
We now calculate $E[e^{sZ_i^2}]$. We have $Z_i = \sum_{j=1}^{n} r^i_j x_j$, where each $r^i_j \sim N(0, 1)$. By 2-stability of the normal distribution, $Z_i$ is distributed as $\|x\|_2 G$ where $G \sim N(0, 1)$. Since $\|x\|_2 = 1$, $Z_i \sim N(0, 1)$. Hence, for $s < \frac{1}{2}$ (so that the integral converges),
$$E[e^{sZ_i^2}] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{sa^2}\, e^{-a^2/2}\, da = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{(s - \frac{1}{2})a^2}\, da.$$
We now apply the change of variables $u^2 = (1 - 2s)a^2$. Therefore, by elementary calculations, $da = \frac{1}{\sqrt{1 - 2s}}\, du$ and the above integral becomes
$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{1 - 2s}}\, e^{-u^2/2}\, du = \frac{1}{\sqrt{1 - 2s}}.$$
Therefore
$$E[e^{sY}] = \prod_{i=1}^{t} E[e^{sZ_i^2}] = \frac{1}{(1 - 2s)^{t/2}}.$$
Hence
$$\Pr[Y > \alpha] \le \frac{1}{(1 - 2s)^{t/2}\, e^{s\alpha}}.$$
Since the above inequality holds for all $s > 0$, it holds for $s = \frac{1}{2} - \frac{t}{2\alpha}$. Inserting this value of $s$, we have
$$\Pr[Y > \alpha] \le \left(\frac{t}{\alpha}\right)^{-t/2} e^{-\frac{\alpha}{2}\left(1 - \frac{t}{\alpha}\right)} = e^{\frac{1}{2}(t - \alpha)} \left(\frac{t}{\alpha}\right)^{-t/2}.$$
Recall that $\alpha = t(1 + \epsilon)^2$. Thus we have
$$\begin{aligned}
e^{\frac{1}{2}(t - \alpha)} \left(\frac{t}{\alpha}\right)^{-t/2} &= e^{\frac{t}{2}\left(1 - (1 + \epsilon)^2\right)}\, e^{-\frac{t}{2} \ln\left(\frac{t}{\alpha}\right)} \\
&= e^{\frac{t}{2}\left(-2\epsilon - \epsilon^2 - \ln\frac{1}{(1 + \epsilon)^2}\right)} \\
&= e^{t\left(-\epsilon - \epsilon^2/2 + \ln(1 + \epsilon)\right)} \\
&= e^{t\left(-\epsilon - \epsilon^2/2 + \epsilon - \epsilon^2/2 + \epsilon^3/3 - \ldots\right)} \\
&= e^{-t\epsilon^2 + O(t\epsilon^3)}.
\end{aligned}$$
Lemma 2 (Johnson-Lindenstrauss). For any $0 < \epsilon < 1$ and any integer $N$, let $t$ be a positive integer such that
$$t \ge \frac{4 \ln N}{\epsilon^2/2 + \epsilon^3/3}.$$
Then for any set $V$ of $N$ points in $\mathbb{R}^n$, there is a map $f : \mathbb{R}^n \to \mathbb{R}^t$ such that for all $u, v \in V$,
$$(1 - \epsilon)\|u - v\|_2^2 \le \|f(u) - f(v)\|_2^2 \le (1 + \epsilon)\|u - v\|_2^2.$$
Proof. Set $f(x) = \frac{1}{\sqrt{t}} Rx$, where $R$ is the $t \times n$ sketching matrix with each entry drawn i.i.d. from $N(0, 1)$. Fix a pair $u, v \in V$ and let $Z = R(u - v)$, so that $\|f(u) - f(v)\|_2^2 = \|Z\|_2^2/t$. Then by Lemma 1 (and its lower-tail analogue),
$$\Pr\Big[(1 - \epsilon)\|u - v\|_2^2 \le \|f(u) - f(v)\|_2^2 \le (1 + \epsilon)\|u - v\|_2^2\Big] \ge 1 - e^{-t\epsilon^2 + O(t\epsilon^3)} \ge 1 - \frac{1}{N^3}.$$
Now there are at most $\binom{N}{2}$ pairs, hence by a union bound, for all $u, v \in V$ simultaneously,
$$\Pr\Big[(1 - \epsilon)\|u - v\|_2^2 \le \|f(u) - f(v)\|_2^2 \le (1 + \epsilon)\|u - v\|_2^2\Big] \ge 1 - \binom{N}{2}\, e^{-t\epsilon^2 + O(t\epsilon^3)} \ge 1 - \frac{1}{N}.$$
Therefore, by repeating a few times one can obtain the required map.
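As a small illustration of Lemma 2 (not part of the original notes), the following Python snippet applies the scaled Gaussian map $f(x) = Rx/\sqrt{t}$ to a random point set and reports the worst pairwise distortion of squared distances; the dimensions and $\epsilon$ are arbitrary choices for the demo.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n, N, eps = 500, 50, 0.25
V = rng.normal(size=(N, n))                      # N points in R^n

t = int(np.ceil(4 * np.log(N) / (eps**2 / 2 + eps**3 / 3)))
R = rng.normal(size=(t, n))
f = lambda x: (R @ x) / np.sqrt(t)               # scaled Gaussian map

worst = 0.0
for u, v in combinations(range(N), 2):
    orig = np.sum((V[u] - V[v])**2)
    proj = np.sum((f(V[u]) - f(V[v]))**2)
    worst = max(worst, abs(proj / orig - 1.0))
print(f"t = {t}, worst relative distortion of squared distances = {worst:.3f}")
```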
The JL lemma has found many applications in computer science. We now focus on one particular application of it, namely nearest neighbor search.
4 Nearest Neighbor Problem
Given a set of points $V$ with $|V| = N$, a distance metric $d$, and a query point $q$, we want to find the point $x \in V$ nearest to $q$. We first focus on the related near neighbor problem, where we ask: given a set of points $V$ with $|V| = N$, a radius $R$, and a query point $q$, does there exist a point $x \in V$ such that $d(x, q) \le R$? Clearly, if we can solve this latter question, then with an additional $O(\log N)$ overhead in query time the nearest neighbor problem can be solved within an arbitrary $(1 + \epsilon)$ approximation ($\epsilon > 0$ a constant) by binary search on $R$.

When the dimension is low, both the nearest neighbor problem and the near neighbor problem can be solved efficiently. For example, when the dimension (denoted by $d$) is 2, one requires space $O(N)$ and query time $O(\log N)$ to solve both of these problems. However, with increasing $d$, either the required space or the query time becomes exponential in $d$. This is known as the curse of dimensionality. To tackle this, one resorts to the approximate near neighbor problem, which we formally define below.
Definition 3 ((c, R)-Near Neighbor Problem). Given a set of points $V$, a distance metric $d$, and a query point $q$, the $(c, R)$-Near Neighbor problem, $c \ge 1$, requires that if there exists a point $x$ such that $d(x, q) \le R$, then one must find a point $x'$ such that $d(x', q) \le cR$, with probability $> (1 - \delta)$ for a given $\delta > 0$.

We now introduce locality sensitive hashing (LSH) to solve the $(c, R)$-Near Neighbor problem.
5 Locality Sensitive Hashing
The basic intuition behind LSH is that two points that are close to each other should hash to the
same bucket with high probability, while those which are far apart should hash to different buckets.
During preprocessing, we hash all the points in V to the respective buckets. When a query q comes,
one only searches in the buckets that contain q.
Definition 4 (LSH). A family of hash functions $H$ is said to be $(c, R, p_1, p_2)$-sensitive for a distance metric $d$ when:

1. $\Pr_{h \sim H}[h(x) = h(y)] \ge p_1$ for all $x$ and $y$ such that $d(x, y) \le R$
2. $\Pr_{h \sim H}[h(x) = h(y)] \le p_2$ for all $x$ and $y$ such that $d(x, y) > cR$

For $H$ to be an LSH family one must have $p_1 > p_2$.
Example 5. Let $V \subseteq \{0, 1\}^n$ and let $d(x, y)$ be the Hamming distance between $x$ and $y$. Let $R \ll n$ and $cR \ll n$, and define $H = \{h_1, h_2, \ldots, h_n\}$ such that $h_i(x) = x_i$. Then $p_1 \ge 1 - \frac{R}{n}$ and $p_2 \le 1 - \frac{cR}{n}$.
6 Description of the Algorithm
Algorithm 1 Preprocessing
  for all x ∈ V do
    for all j ∈ [L] do
      add x to bucketj(gj(x))
    end for
  end for

Algorithm 2 Query(q)
  for j = 1 to L do
    for all x ∈ bucketj(gj(q)) do
      if d(x, q) ≤ cR then
        return x
      end if
    end for
  end for
  return none
Let $H$ be an LSH family which is $(c, R, p_1, p_2)$-sensitive. Select $K \times L$ hash functions from this family independently and randomly: $h_{i,j} \sim H$, $i \in [1, K]$, $j \in [1, L]$. Now define $g_j = (h_{1,j}, h_{2,j}, \ldots, h_{K,j})$ for all $j \in [1, L]$. That is,

$g_1 = (h_{1,1}, h_{2,1}, \ldots, h_{K,1})$
$g_2 = (h_{1,2}, h_{2,2}, \ldots, h_{K,2})$
...
$g_L = (h_{1,L}, h_{2,L}, \ldots, h_{K,L})$

The preprocessing time is $O(NLK)$, assuming computation of each $h_{i,j}$ requires $O(1)$ time: computing $g_1, g_2, \ldots, g_L$ requires $O(KL)$ time per point, and each of the $N$ points needs to be hashed, giving the required time complexity.

Hashing the query point $q$ by $g_1, g_2, \ldots, g_L$ requires $O(KL)$ time. Let $F$ be the probability, for any given $j$, that a point $x$ is hashed to the same bucket as $q$ by $g_j$ even though $d(x, q) > cR$. Then the expected number of such points hashed to the same bucket at a particular level is $NF$. There are $L$ levels, so searching through all these points requires $O(NLF)$ time, for a total expected query time of $O(KL + NLF)$. Therefore, in order to obtain a good query time, we want $F$ to be as small as possible.
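To make Algorithms 1 and 2 concrete, here is a minimal Python sketch of the preprocessing and query steps, instantiated with the bit-sampling family of Example 5; the parameters $K$ and $L$ and the helper names are illustrative assumptions, not prescribed by the notes.

```python
import numpy as np
from collections import defaultdict

class HammingLSH:
    """LSH for Hamming distance: g_j concatenates K sampled coordinates."""

    def __init__(self, n, K, L, cR, rng=None):
        self.rng = np.random.default_rng() if rng is None else rng
        self.cR = cR
        # g_j is described by the K coordinates it samples (h_{1,j}, ..., h_{K,j}).
        self.coords = [self.rng.integers(0, n, size=K) for _ in range(L)]
        self.buckets = [defaultdict(list) for _ in range(L)]   # bucket_j
        self.points = []

    def _g(self, j, x):
        return tuple(x[self.coords[j]])                        # g_j(x)

    def preprocess(self, V):                                   # Algorithm 1
        for x in V:
            idx = len(self.points)
            self.points.append(x)
            for j in range(len(self.buckets)):
                self.buckets[j][self._g(j, x)].append(idx)

    def query(self, q):                                        # Algorithm 2
        for j in range(len(self.buckets)):
            for idx in self.buckets[j][self._g(j, q)]:
                x = self.points[idx]
                if np.sum(x != q) <= self.cR:                   # Hamming distance
                    return x
        return None

# Toy usage: the query is within Hamming distance R of a stored point.
rng = np.random.default_rng(4)
n, R, c = 128, 4, 3
V = [rng.integers(0, 2, size=n) for _ in range(200)]
q = V[0].copy(); q[rng.choice(n, size=R, replace=False)] ^= 1
lsh = HammingLSH(n, K=8, L=10, cR=c * R, rng=rng)
lsh.preprocess(V)
print(lsh.query(q) is not None)
```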