MS&E 317/CS 263: Algorithms for Modern Data Models, Aut 2011-12
Course URL: http://www.stanford.edu/~ashishg/amdm.
Instructor: Ashish Goel, Stanford University.
Lecture 8, 10/19/2011. Scribed by Pong Eksombatchai.
In the first part of this lecture, we will continue to improve on our Naive Algorithm to keep track
of the connected components of an undirected graph and modify it into the Union-Find Algorithm.
Then, in the second part, we will introduce Locality Sensitive Hashing (LSH).

Finding Connected Components of an Undirected Graph
We are given the node set V of an undirected graph G up front, and edges (u_t, v_t) arrive one at a
time, at time t. At any given time we must be able to report the number of connected components of
G and to check whether nodes u and v are in the same connected component. The basic idea is to
maintain several arrays: COLOR keeps track of the connected component each node belongs to, and
the others keep track of the nodes in each component. We will suppose that V = {1, 2, ..., N}.
Before any edges arrive, we have N non-empty arrays corresponding to the N components consisting
of individual vertices. After each edge arrival, we update the arrays so that there are exactly as
many non-empty arrays as the number of connected components. This gives the following simple
algorithm.

Naive Algorithm
State
    Store an array COLOR of size N and arrays A_1, A_2, ..., A_N, where A_i = {u | COLOR[u] = i}.
Initialization
    Set COLOR[u] = u and A_u = {u} for all u ∈ V.
Update(t, u_t, v_t)
    If COLOR[u_t] ≠ COLOR[v_t] then
        Set i = COLOR[u_t] and j = COLOR[v_t]
        If |A_j| > |A_i| then
            Swap(i, j)
        Concatenate A_i with A_j
        Set COLOR[v] = i for all v ∈ A_j
        Set A_j = {}
Query: Number of Connected Components
    Count the number of integers i in the range [1, N] such that A_i is not empty.
Query: Is Connected(u, v)
    Return true if COLOR[u] is equal to COLOR[v], otherwise return false.
The state space size is O(N) because we have the array COLOR and the total size of all the
A_i is N at any given time. If M is the total number of edges in G once all of the edges have
arrived, the total running time of this algorithm is O(M + N log N). The O(N log N) term comes
from updating COLOR to the correct values after we concatenate A_i with A_j. It may seem at first
that this process takes O(N^2) time in total, but each COLOR[u] is updated at most O(log N) times,
because the connected component that u is in at least doubles in size every time it is merged into
another component. This makes the total running time of merging O(N log N).
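The Naive Algorithm above can be sketched in Python as follows. This is a minimal illustration rather than the lecture's own code: the class name `NaiveComponents`, the method names, and the `recolorings` counter (added only to observe the O(N log N) bound) are assumptions made for this example.

```python
class NaiveComponents:
    """Connected components via a COLOR array and per-color member lists."""

    def __init__(self, n: int) -> None:
        # Nodes are 1..n as in the notes; index 0 is unused padding.
        self.color = list(range(n + 1))
        self.members = [[u] for u in range(n + 1)]
        self.members[0] = []
        self.recolorings = 0  # instrumentation only (assumption, not in the notes)

    def update(self, u: int, v: int) -> None:
        i, j = self.color[u], self.color[v]
        if i == j:
            return  # already in the same component
        # Always relabel the smaller component.
        if len(self.members[j]) > len(self.members[i]):
            i, j = j, i
        for w in self.members[j]:
            self.color[w] = i
        self.recolorings += len(self.members[j])
        self.members[i].extend(self.members[j])
        self.members[j] = []

    def num_components(self) -> int:
        return sum(1 for a in self.members if a)

    def is_connected(self, u: int, v: int) -> bool:
        return self.color[u] == self.color[v]


cc = NaiveComponents(6)
for u, v in [(1, 2), (3, 4), (2, 3), (5, 6)]:
    cc.update(u, v)
print(cc.num_components(), cc.is_connected(1, 4), cc.is_connected(1, 5))
```

After these four edges the components are {1, 2, 3, 4} and {5, 6}, so the count is 2 and nodes 1 and 5 are not connected.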
Note that when we are dealing with very large graphs, such as a real-world social network graph,
we cannot keep the whole COLOR array in memory and need to store portions of it across different
machines. After we have all the edges of G, we know that we look up COLOR[u] O(deg(u) + log N)
times in total, because we look at COLOR[u] once for every edge (u, v) in the update step and
COLOR[u] can be updated O(log N) times. This may suggest that the load on each machine is quite
balanced, but in real-world networks we often have nodes with very high degree. We will end up
hitting the machines that hold these high-degree nodes much more often than others, which is not
the behavior we want.
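To make the imbalance concrete, the sketch below counts how often the naive algorithm reads or writes COLOR[u] for each node u. The function name `color_touches` and the star-graph input are illustrative assumptions, not part of the lecture.

```python
from collections import Counter


def color_touches(n, edges):
    """Count reads and writes of COLOR[u] per node while running the naive algorithm."""
    color = list(range(n + 1))          # nodes 1..n, index 0 unused
    members = [[u] for u in range(n + 1)]
    touches = Counter()
    for u, v in edges:
        touches[u] += 1                 # read COLOR[u_t]
        touches[v] += 1                 # read COLOR[v_t]
        i, j = color[u], color[v]
        if i == j:
            continue
        if len(members[j]) > len(members[i]):
            i, j = j, i
        for w in members[j]:
            color[w] = i
            touches[w] += 1             # write COLOR[w]
        members[i].extend(members[j])
        members[j] = []
    return touches


# Star graph: node 1 is adjacent to every other node.
n = 1000
star = [(1, k) for k in range(2, n + 1)]
touches = color_touches(n, star)
print(touches[1], touches[2])
```

On this input the hub is touched once per edge, while each leaf is touched only a constant number of times, so the machine holding the hub's entry would receive almost all of the traffic.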

Union-Find Algorithm
The Naive Algorithm works quite well, but we will now develop it into a simple Union-Find
Algorithm. The idea is to represent each set by a tree, where each node keeps a reference
to its parent and the representative of the set is the root of the tree. The simple algorithm, which
excludes the Path Compression improvement, works as follows:
State
    Store arrays parent and size of length N, where parent[u] is the parent of u in the tree and
    size[u] is the size of the set whose representative is u. Note that size[u] is undefined if u
    is not the representative of a set.
Initialization
    Set parent[u] = u and size[u] = 1 for all u ∈ V.
Union(t, u_t, v_t)
    Set r1 = Find(u_t) and r2 = Find(v_t)
    If r1 ≠ r2 then
        If size[r1] < size[r2] then
            Swap(r1, r2)
        parent[r2] = r1
        size[r1] = size[r1] + size[r2]
Find(u)
    while parent[u] ≠ u
        u = parent[u]
    return u
Query: Number of Connected Components
    Count the number of u ∈ V such that parent[u] = u.
Query: Is Connected(u, v)
    Return true if Find(u) and Find(v) return the same value, otherwise return false.
The state space size is O(N) because we have two arrays of length N: parent and size. The
running time of each update step is O(log N), because most of the work comes from the calls to
Find, each of which runs in O(log N) time. To see this, fix a node u and let U be the connected
component that u is in. When another component is merged into U, the path length from u to the
root stays the same; when U is merged into a component of at least the size of U, the path length
increases by one. In the worst case the path length from u to the root increases by one in each
update, but this can only happen O(log N) times because U at least doubles in size each time this
happens. Hence the path length from any node to the root is O(log N), and so is the runtime of
each Find.
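A minimal Python sketch of the Union-Find pseudocode (union by size, no path compression) is given below. The class name `UnionFind` and the `depth` helper, which exists only to check the O(log N) path-length bound empirically, are assumptions made for illustration.

```python
import random


class UnionFind:
    """Union by size without path compression."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n + 1))  # nodes 1..n; index 0 unused
        self.size = [1] * (n + 1)         # meaningful only for roots

    def find(self, u: int) -> int:
        while self.parent[u] != u:
            u = self.parent[u]
        return u

    def union(self, u: int, v: int) -> None:
        r1, r2 = self.find(u), self.find(v)
        if r1 == r2:
            return
        if self.size[r1] < self.size[r2]:
            r1, r2 = r2, r1
        self.parent[r2] = r1              # attach the smaller tree under the larger
        self.size[r1] += self.size[r2]

    def num_components(self) -> int:
        return sum(1 for u in range(1, len(self.parent)) if self.parent[u] == u)

    def is_connected(self, u: int, v: int) -> bool:
        return self.find(u) == self.find(v)

    def depth(self, u: int) -> int:
        # Hypothetical helper: number of parent links from u to its root.
        d = 0
        while self.parent[u] != u:
            u = self.parent[u]
            d += 1
        return d


rng = random.Random(0)
n = 64
uf = UnionFind(n)
for _ in range(200):
    uf.union(rng.randint(1, n), rng.randint(1, n))
max_depth = max(uf.depth(u) for u in range(1, n + 1))
print(max_depth, "<=", n.bit_length() - 1)  # bound is floor(log2 n)
```

The final line compares the deepest node against floor(log2 N), which is the bound argued above: a tree of depth h built by union by size contains at least 2^h nodes.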
In a distributed setting, most of the load comes from looking up parent[u] in Find. A problem
remains: the nodes that are representatives of a set are looked up far more often than the
others. Fortunately, there are techniques that can solve this problem, which will be discussed
later in class.

Locality Sensitive Hashing
Given a set of points V and a distance metric d, we are often interested in finding out whether
there is any point x ∈ V that is close to a query point q. This problem can be solved quite well
in low-dimensional spaces by using space partitioning techniques, but in high-dimensional spaces
these techniques become impractical. Fortunately, we can use LSH to solve some versions of this
problem, one of which is the (c, R)-Near Neighbor problem. The problem is defined as follows:
Input
The input, which is given up front, is a set of points V of size N and a distance metric d.
Query(q)
For each query point q, if there exists some x ∈ V such that d(x, q) ≤ R, then the algorithm must
output some x' ∈ V such that d(x', q) ≤ cR with probability at least 1 − δ.
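As a reference point for this definition, a linear scan satisfies the query contract with δ = 0 (assuming c ≥ 1), at a cost of O(N) distance computations per query. The sketch below is only a baseline for comparison; the function names and the use of Hamming distance are assumptions for illustration.

```python
def hamming(x, y):
    """Hamming distance between two equal-length sequences."""
    return sum(a != b for a, b in zip(x, y))


def linear_scan_near_neighbor(V, q, d, c, R):
    """Return some x' in V with d(x', q) <= c*R, or None if no such point exists."""
    for x in V:
        if d(x, q) <= c * R:
            return x
    return None


V = [(0, 0, 0, 0), (1, 1, 1, 1), (0, 1, 1, 1)]
q = (1, 1, 1, 0)
result = linear_scan_near_neighbor(V, q, hamming, c=2, R=1)
print(result)
```

Here (1, 1, 1, 1) is within distance R = 1 of q, so the scan is obliged to return a point within cR = 2, and it does.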

Having defined the problem, we now define the tool that will be used to solve it.
Locality Sensitive Hashing Family: A family of hash functions $H$ is said to be
$(c, R, P_1, P_2)$-sensitive for a distance metric $d$ when:
(1) $\Pr_{h \sim H}[h(x) = h(y)] \ge P_1$ for all $x$ and $y$ such that $d(x, y) \le R$
(2) $\Pr_{h \sim H}[h(x) = h(y)] \le P_2$ for all $x$ and $y$ such that $d(x, y) > cR$
Example:
Let $V \subseteq \{0, 1\}^J$ and let $d(x, y)$ be the Hamming distance between $x$ and $y$. If
$R \ll J$ and $cR \ll J$, one example LSH family is $H = \{h_i\}$ for all integers $i$ in the
range $[1, J]$, where $h_i(x) = x_i$. The family $H$ is then $(c, R, P_1, P_2)$-sensitive for the
distance metric $d$ with $P_1 = 1 - R/J$ and $P_2 = 1 - cR/J$.
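The values of P1 and P2 in this example follow from the fact that a uniformly chosen coordinate agrees on x and y with probability exactly 1 − d(x, y)/J. The short check below computes that probability exactly for one pair of points; the function names and the sample vectors are assumptions chosen for illustration.

```python
from fractions import Fraction


def hamming(x, y):
    """Hamming distance between two equal-length bit tuples."""
    return sum(a != b for a, b in zip(x, y))


def collision_probability(x, y):
    """Pr over a uniform coordinate i that h_i(x) == h_i(y), where h_i(x) = x[i]."""
    J = len(x)
    agree = sum(1 for i in range(J) if x[i] == y[i])
    return Fraction(agree, J)


x = (0, 1, 1, 0, 1, 0, 0, 1)
y = (0, 1, 0, 0, 1, 1, 0, 1)
p = collision_probability(x, y)
print(hamming(x, y), p)  # the two vectors differ in 2 of J = 8 coordinates
```

Any pair with d(x, y) ≤ R therefore collides with probability at least 1 − R/J, and any pair with d(x, y) > cR collides with probability less than 1 − cR/J.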
Having defined LSH families, we use the following algorithm to solve the (c, R)-Near Neighbor
problem:
Locality-Sensitive Hashing for solving (c, R)-Near Neighbor
Suppose that we have an LSH family $H$ that is $(c, R, P_1, P_2)$-sensitive for the distance metric
$d$. Let $h_{i,j}$ be hash functions drawn independently from $H$ for all integers $i$ and $j$ such
that $1 \le i \le K$ and $1 \le j \le L$. We define
$g_j(x) = \langle h_{1,j}(x), h_{2,j}(x), \ldots, h_{K,j}(x) \rangle$ for all integers $j$ such that
$1 \le j \le L$.
Preprocessing
    For all x ∈ V and for each integer j in the range [1, L], add x to bucket_j(g_j(x)).
Query(q)
    for j = 1 to L
        for all x ∈ bucket_j(g_j(q))
            if d(x, q) ≤ cR then return x
    return none
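The preprocessing and query steps can be sketched in Python as follows, assuming the bit-sampling family from the Hamming example as H. The class name `HammingLSH`, the `seed` parameter, and the dictionary-based buckets are choices made for this illustration and are not prescribed by the notes.

```python
import random
from collections import defaultdict


def hamming(x, y):
    """Hamming distance between two equal-length bit tuples."""
    return sum(a != b for a, b in zip(x, y))


class HammingLSH:
    """L hash tables, each keyed by K randomly sampled coordinates."""

    def __init__(self, points, K, L, seed=0):
        rng = random.Random(seed)
        J = len(points[0])
        # coords[j] holds the K sampled coordinates that define g_j.
        self.coords = [[rng.randrange(J) for _ in range(K)] for _ in range(L)]
        self.buckets = [defaultdict(list) for _ in range(L)]
        for x in points:
            for j in range(L):
                self.buckets[j][self._g(j, x)].append(x)

    def _g(self, j, x):
        return tuple(x[i] for i in self.coords[j])

    def query(self, q, c, R):
        for j in range(len(self.coords)):
            for x in self.buckets[j].get(self._g(j, q), []):
                if hamming(x, q) <= c * R:
                    return x
        return None


rng = random.Random(1)
points = [tuple(rng.randrange(2) for _ in range(32)) for _ in range(50)]
index = HammingLSH(points, K=4, L=5)
print(index.query(points[0], c=2, R=3))
```

Querying with a point that is itself in V always succeeds, because that point shares every bucket with itself; for other queries the result is either None or a point within distance cR.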
It is easy to see that the total runtime to preprocess all of the points is $O(NKL)$ and the space
required is $O(NL)$. The expected runtime of each query is $O(KL + NLF)$, where $F$ is the
probability that a point $x$ with $d(x, q) > cR$ is hashed by $g_j$ into the same bucket as $q$.
To find $F$, we use the definition of an LSH family and the fact that the $h_{i,j}$ are drawn
independently:
$$
\begin{aligned}
\Pr[g_j(x) = g_j(q) \mid d(x, q) > cR]
  &= \Pr\Big[\bigcap_{i=1}^{K} \{h_{i,j}(x) = h_{i,j}(q)\} \,\Big|\, d(x, q) > cR\Big] \\
  &= \prod_{i=1}^{K} \Pr[h_{i,j}(x) = h_{i,j}(q) \mid d(x, q) > cR] \\
  &\le P_2^K
\end{aligned}
$$
Hence the running time is $O(KL + NLP_2^K)$. Next, we analyze the success probability of the
algorithm. We can lower-bound it by the probability that a point $x$ with $d(x, q) \le R$ is
hashed by at least one of the $g_j$ into the same bucket as $q$. First, we calculate the
probability $\Pr[g_j(x) = g_j(q) \mid d(x, q) \le R]$.
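$$
\begin{aligned}
\Pr[g_j(x) = g_j(q) \mid d(x, q) \le R]
  &= \Pr\Big[\bigcap_{i=1}^{K} \{h_{i,j}(x) = h_{i,j}(q)\} \,\Big|\, d(x, q) \le R\Big] \\
  &= \prod_{i=1}^{K} \Pr[h_{i,j}(x) = h_{i,j}(q) \mid d(x, q) \le R] \\
  &\ge P_1^K
\end{aligned}
$$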
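Next, we calculate the probability of success, which is
$\Pr[\bigcup_{j=1}^{L} \{g_j(x) = g_j(q)\} \mid d(x, q) \le R]$.
$$
\begin{aligned}
\Pr\Big[\bigcup_{j=1}^{L} \{g_j(x) = g_j(q)\} \,\Big|\, d(x, q) \le R\Big]
  &= 1 - \Pr\Big[\bigcap_{j=1}^{L} \{g_j(x) \ne g_j(q)\} \,\Big|\, d(x, q) \le R\Big] \\
  &= 1 - \prod_{j=1}^{L} \Pr[g_j(x) \ne g_j(q) \mid d(x, q) \le R] \\
  &\ge 1 - (1 - P_1^K)^L
\end{aligned}
$$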
Last, we set $L = 1/P_1^K$ and $K = \log N / \log(1/P_2)$. With this choice of $K$ we get
$P_1^K = N^{-\log(1/P_1)/\log(1/P_2)}$, so $L = N^{\rho}$ where $\rho = \log P_1 / \log P_2$. From
this, the probability of success is at least $1 - (1 - 1/L)^L$, which is at least $1 - 1/e$ and
approaches it as $L$ grows. Hence, if we let $\delta = 1/e$, we have fulfilled the requirement for
the (c, R)-Near Neighbor Problem.
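The parameter choice can be checked numerically with a short sketch. The function name `lsh_parameters` and the sample values of N, P1, and P2 are assumptions for illustration; rounding K and L up to integers is also an assumption of this sketch, since the notes treat them as real-valued.

```python
import math


def lsh_parameters(N, P1, P2):
    """Return (K, L, rho, success lower bound) for the choice described above."""
    K = math.ceil(math.log(N) / math.log(1 / P2))  # makes P2**K <= 1/N
    L = math.ceil(1 / P1 ** K)                     # makes L * P1**K >= 1
    rho = math.log(P1) / math.log(P2)
    success = 1 - (1 - P1 ** K) ** L               # lower bound on success probability
    return K, L, rho, success


# Example: J = 100 bits, R = 5, c = 2, so P1 = 1 - R/J and P2 = 1 - cR/J.
N, P1, P2 = 10**6, 0.95, 0.90
K, L, rho, success = lsh_parameters(N, P1, P2)
print(K, L, round(rho, 3), round(success, 3))
```

Because L is rounded up to at least 1/P1^K, the computed bound stays at or above 1 − 1/e, matching the choice δ = 1/e. Since ρ < 1 whenever P1 > P2, the number of tables L grows sublinearly in N.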

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Scribed lec8

MS&E 317/CS 263: Algorithms for Modern Data Models, Aut 2011-12
Course URL: http://www.stanford.edu/~ashishg/amdm.
Instructor: Ashish Goel, Stanford University.
Lecture 8, 10/19/2011. Scribed by Pong Eksombatchai.

In the first part of this lecture, we continue to improve on our Naive Algorithm for keeping track of the connected components of an undirected graph and modify it into the Union-Find Algorithm. In the second part, we introduce Locality Sensitive Hashing.

Finding Connected Components of an Undirected Graph

We are given an initial set of nodes V of an undirected graph G, and edges (u_t, v_t) appearing at time t. We have to support two queries at any given time: finding the number of connected components in G, and checking whether nodes u and v are in the same connected component. The basic idea is to maintain multiple arrays: COLOR keeps track of the connected component each node belongs to, and the others keep track of the nodes in each component. We will suppose that V = {1, 2, ..., N}. Before any edges arrive, we have N non-empty arrays corresponding to the N components consisting of individual vertices. After each edge arrival, we update the arrays so that there are exactly as many non-empty arrays as connected components. This gives the following simple algorithm.

Naive Algorithm

State
    Store an array COLOR of size N and arrays A_1, A_2, ..., A_N, where A_i = {u | COLOR[u] = i}.

Initialization
    Set COLOR[u] = u and A_u = {u} for all u ∈ V.

Update(t, u_t, v_t)
    If COLOR[u_t] ≠ COLOR[v_t] then
        Set i = COLOR[u_t] and j = COLOR[v_t]
        If |A_j| > |A_i| then
            Swap(i, j)
        Concatenate A_i with A_j
        Set COLOR[v] = i for all v ∈ A_j
        Set A_j = {}

Query: Number of Connected Components
    Count the integers i in the range [1, N] such that A_i is not empty.
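The state, initialization, update step, and component count described above can be sketched in Python as follows. This is a minimal illustration, not code from the lecture: the class name `NaiveComponents` and its method names are hypothetical, and slot 0 of each array is left unused so that vertex u maps to index u, matching V = {1, ..., N}.

```python
class NaiveComponents:
    """Sketch of the Naive Algorithm; names here are illustrative assumptions."""

    def __init__(self, n):
        self.n = n
        # COLOR[u] = u initially; index 0 is a dummy slot.
        self.color = list(range(n + 1))
        # A_u = {u} initially; members[0] stays empty.
        self.members = [[u] for u in range(n + 1)]
        self.members[0] = []

    def update(self, u, v):
        i, j = self.color[u], self.color[v]
        if i == j:
            return  # edge lies inside one component; nothing to merge
        # Always recolor the smaller array, so each node is recolored
        # at most O(log N) times over the whole edge sequence.
        if len(self.members[j]) > len(self.members[i]):
            i, j = j, i
        for w in self.members[j]:
            self.color[w] = i
        self.members[i].extend(self.members[j])
        self.members[j] = []

    def num_components(self):
        # Count the non-empty arrays A_1..A_N.
        return sum(1 for i in range(1, self.n + 1) if self.members[i])
```

For example, after `update(1, 2)`, `update(3, 4)` and `update(2, 3)` on five vertices, vertices 1 through 4 share one color and vertex 5 keeps its own.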
Query: Is Connected(u, v)
    Return true if COLOR[u] is equal to COLOR[v], otherwise return false.

The state space size is O(N), because we have the array COLOR and the total size of all the A_i is N at any given time. Suppose that M is the total number of edges in G once all of the edges have arrived. Then the total running time of this algorithm is O(M + N log N). The O(N log N) term comes from updating COLOR to the correct values after we concatenate A_i with A_j. It may seem at first that this process takes O(N^2) time in total, but each COLOR[u] is updated at most O(log N) times, because the connected component containing u at least doubles in size every time it is merged into another component. This makes the total running time of merging O(N log N).

Note that when we are dealing with very large graphs, such as a real-world social network graph, we cannot keep the whole COLOR array in memory and need to store portions of it across different machines. After all the edges of G have arrived, we will have accessed COLOR[u] at most O(deg(u) + log N) times in total, because we read COLOR[u] once for every edge (u, v) in the update step, and COLOR[u] can be updated O(log N) times. This may suggest that the load is well balanced across machines, but real-world networks often have nodes with very high degree. We end up hitting the machines that hold these high-degree nodes far more often than the others, which is not the behavior we want.

Union-Find Algorithm

Although the Naive Algorithm works quite well, we will now develop it into a simple Union-Find Algorithm. The idea is to represent each set by a tree, where each node keeps a reference to its parent and the representative of the set is the root of the tree.
The simple algorithm, which excludes the Path Compression improvement, works as follows:

State
    Store an array parent and an array size, each of length N, where parent[u] is the parent of u in the tree and size[u] is the size of the set of which u is the representative. Note that size[u] is undefined if u is not the representative of a set. For initialization, we set parent[u] = u and size[u] = 1 for all u ∈ V.

Union(t, u_t, v_t)
    Set r1 = Find(u_t) and r2 = Find(v_t)
    If r1 ≠ r2 then
        If size[r1] < size[r2] then
            Swap(r1, r2)
        parent[r2] = r1
        size[r1] = size[r1] + size[r2]
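The Union step above, together with the root-following Find routine it calls (stated right after this sketch), can be written in Python roughly as follows. This is an illustrative sketch under assumptions: the class name `UnionFind` and the method names are hypothetical, index 0 is an unused dummy slot so that vertices are 1..N, and Path Compression is deliberately left out, as in the notes.

```python
class UnionFind:
    """Union by size without path compression; names are illustrative."""

    def __init__(self, n):
        # parent[u] = u and size[u] = 1 for every vertex; slot 0 is unused.
        self.parent = list(range(n + 1))
        self.size = [1] * (n + 1)

    def find(self, u):
        # Walk parent pointers until reaching a root (a node that is its own parent).
        while self.parent[u] != u:
            u = self.parent[u]
        return u

    def union(self, u, v):
        r1, r2 = self.find(u), self.find(v)
        if r1 == r2:
            return  # already in the same set
        # Attach the smaller tree under the larger one so depth stays O(log N).
        if self.size[r1] < self.size[r2]:
            r1, r2 = r2, r1
        self.parent[r2] = r1
        self.size[r1] += self.size[r2]
```

Only the size entry of a root is meaningful after a union; the entry of the absorbed root r2 is left stale, which matches the remark that size[u] is undefined for non-representatives.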
Find(u)
    while parent[u] ≠ u
        u = parent[u]
    return u

Query: Number of Connected Components
    Count the nodes u ∈ V such that parent[u] = u.

Query: Is Connected(u, v)
    Call Find(u) and Find(v); if the return values are the same return true, otherwise return false.

The state space size is O(N), because we have two arrays of length N: parent and size. The running time of each update step is O(log N), because most of the work comes from the calls to Find, each of which runs in O(log N) time. To see this, fix a node u and consider an update to the connected component U that contains u. The path length from u to the root either stays the same, when a smaller connected component is merged into U, or increases by one, when U is merged into a connected component at least as large as U. In the worst case the path length from u to the root increases by one in each update, but this can only happen O(log N) times, because U at least doubles in size each time it happens. Hence the path length from any node to the root is O(log N), and so the runtime of each Find is O(log N).

In a distributed setting, most of the load comes from looking up parent[u] in Find. There is still a problem, because the nodes that are representatives of their sets are looked up much more often than the others. Fortunately, there are techniques that can solve this problem, which will be discussed later in class.

Locality Sensitive Hashing

Given a set of points V and a distance metric d, we are often interested in finding out whether there are any points x ∈ V such that x is close to a query point q. This problem can be solved quite well in low-dimensional spaces using space partitioning techniques, but in high-dimensional spaces those techniques become impractical. Fortunately, we can use Locality Sensitive Hashing (LSH) to solve some versions of the problem, one of which is the (c, R)-Near Neighbor problem.
The problem is defined as follows:

Input
    The input, given up front, is a set of points V of size N and a distance metric d.

Query(q)
    For each query point q, if there exists some x ∈ V such that d(x, q) ≤ R, then the algorithm must output some x' ∈ V such that d(x', q) ≤ cR, with probability at least 1 − δ.
Having defined the problem, we now define the tool that will be used to solve it.

Locality Sensitive Hashing Family: A family of hash functions H is said to be (c, R, P_1, P_2)-sensitive for a distance metric d when:
    (1) Pr_{h∼H}[h(x) = h(y)] ≥ P_1 for all x and y such that d(x, y) ≤ R
    (2) Pr_{h∼H}[h(x) = h(y)] ≤ P_2 for all x and y such that d(x, y) > cR

Example: Let V ⊆ {0, 1}^J and let d(x, y) be the Hamming distance between x and y. If R << J and cR << J, one example of an LSH family is H = {h_i} for all integers i in the range [1, J], where h_i(x) = x_i. A uniformly random coordinate differs between x and y with probability d(x, y)/J, so the family H is (c, R, P_1, P_2)-sensitive for the distance metric d with P_1 = 1 − R/J and P_2 = 1 − cR/J.

With LSH families defined, the algorithm we use to solve the (c, R)-Near Neighbor problem is as follows.

Locality-Sensitive Hashing for solving (c, R)-Near Neighbor

Suppose that we have an LSH family H that is (c, R, P_1, P_2)-sensitive for the distance metric d. Let h_{i,j} be hash functions drawn independently with h_{i,j} ∼ H for all integers i and j such that 1 ≤ i ≤ K and 1 ≤ j ≤ L. We define g_j(x) to be <h_{1,j}(x), h_{2,j}(x), ..., h_{K,j}(x)> for all integers j such that 1 ≤ j ≤ L.

Preprocessing
    For all x ∈ V and for each integer j in the range [1, L], add x to bucket_j(g_j(x)).

Query(q)
    for j = 1 to L
        for all x ∈ bucket_j(g_j(q))
            if d(x, q) ≤ cR then
                return x
    return none

It is easy to see that the total runtime to preprocess all of the points is O(NKL) and the space required is O(NL). The runtime for each query is O(KL + NLF), where F is the probability that a point x with d(x, q) > cR is hashed by g_j into the same bucket_j as q. To find F, we use the definition of an LSH family:

    Pr[g_j(x) = g_j(q) | d(x, q) > cR]
        = Pr[∩_{i=1,...,K} h_{i,j}(x) = h_{i,j}(q) | d(x, q) > cR]
        = Π_{i=1,...,K} Pr[h_{i,j}(x) = h_{i,j}(q) | d(x, q) > cR]
        ≤ P_2^K

Hence, the running time of a query is O(KL + N L P_2^K).

Next, we analyze the success probability of the algorithm. We can lower-bound it by the probability that a point x with d(x, q) ≤ R is hashed by at least one of the g_j into the same bucket_j as q. First, we calculate the probability Pr[g_j(x) = g_j(q) | d(x, q) ≤ R]:

    Pr[g_j(x) = g_j(q) | d(x, q) ≤ R]
        = Pr[∩_{i=1,...,K} h_{i,j}(x) = h_{i,j}(q) | d(x, q) ≤ R]
        = Π_{i=1,...,K} Pr[h_{i,j}(x) = h_{i,j}(q) | d(x, q) ≤ R]
        ≥ P_1^K

Next, we calculate the probability of success, which is Pr[∪_{j=1,...,L} g_j(x) = g_j(q) | d(x, q) ≤ R]:

    Pr[∪_{j=1,...,L} g_j(x) = g_j(q) | d(x, q) ≤ R]
        = 1 − Pr[∩_{j=1,...,L} g_j(x) ≠ g_j(q) | d(x, q) ≤ R]
        = 1 − Π_{j=1,...,L} Pr[g_j(x) ≠ g_j(q) | d(x, q) ≤ R]
        ≥ 1 − (1 − P_1^K)^L

Last, we set L = 1/P_1^K and K = log N / log(1/P_2). With this choice of K we have P_2^K = 1/N and P_1^K = N^{−ρ}, so L = N^ρ, where ρ = log P_1 / log P_2. The probability of success is then at least 1 − (1 − 1/L)^L, which is approximately 1 − 1/e. Hence, if we let δ = 1/e, we have fulfilled the requirement of the (c, R)-Near Neighbor problem.
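The preprocessing and query procedure above can be sketched in Python for the Hamming-distance example, where each h_{i,j} reads one uniformly random coordinate. This is a minimal illustration under assumptions, not the lecture's code: the names `BitSamplingLSH` and `hamming` are hypothetical, points are assumed to be equal-length tuples of 0/1 values, and K and L are passed in by the caller rather than derived from N, P_1 and P_2.

```python
import random
from collections import defaultdict


def hamming(x, y):
    """Number of coordinates in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


class BitSamplingLSH:
    """L hash tables, each keyed by K sampled coordinates (illustrative names)."""

    def __init__(self, points, K, L, c, R, seed=0):
        rng = random.Random(seed)
        J = len(points[0])
        self.c, self.R = c, R
        # coords[j] holds the K coordinates that define g_j, drawn with replacement.
        self.coords = [[rng.randrange(J) for _ in range(K)] for _ in range(L)]
        # Preprocessing: add every x to bucket_j(g_j(x)) for each table j.
        self.tables = [defaultdict(list) for _ in range(L)]
        for x in points:
            for j in range(L):
                self.tables[j][self._g(j, x)].append(x)

    def _g(self, j, x):
        # g_j(x) = <h_{1,j}(x), ..., h_{K,j}(x)>
        return tuple(x[i] for i in self.coords[j])

    def query(self, q):
        # Scan the bucket of q in each table; return the first point within cR.
        for j, table in enumerate(self.tables):
            for x in table.get(self._g(j, q), []):
                if hamming(x, q) <= self.c * self.R:
                    return x
        return None
```

A returned point is always within distance cR of q, because the distance is checked explicitly; the randomness only affects whether a sufficiently close point is found at all, which is what the success-probability bound above controls.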