CS 6213 – Advanced Data Structures
DISJOINT SET DATA STRUCTURES
LOGISTICS
- Instructor
  - Prof. Amrinder Arora
  - amrinder@gwu.edu
  - Please copy the TA on emails
  - Please feel free to call as well
- TA
  - Iswarya Parupudi
  - iswarya2291@gwmail.gwu.edu
Set Data Structures CS213 - Advanced Data Structures - Arora - GWU 2
WHERE WE ARE
CS 6213
- Basics
  - Record / Struct
  - Arrays / Linked Lists / Stacks / Queues
  - Graphs / Trees / BSTs
- Advanced
  - Trie, B-Tree
  - Splay Trees
  - R-Trees
  - Heaps and PQs
  - Union Find, Sets
CREDITS
- Robert Tarjan – Princeton University
- Mikko Malinen – University of Eastern Finland
- Pasi Fränti – University of Eastern Finland
- Henry Kautz – University of Washington
- Michael Mitzenmacher – Harvard University
- Eli Upfal – Brown University
APPLICATIONS
- Kruskal's algorithm for minimum spanning trees
- Identifying islands in social networks
- Maze design
- Logical/physical network design
- Identifying equivalence classes
WHAT’S A GOOD MAZE?
1. Connected
2. Just one path between any two rooms
3. Random
THE MAZE CONSTRUCTION PROBLEM
- Given:
  - a collection of rooms: V
  - connections between rooms (initially all closed): E
- Construct a maze:
  - a collection of rooms: V' = V
  - designated rooms in, i ∈ V, and out, o ∈ V
  - a collection of connections to knock down: E' ⊆ E
such that one unique path connects every two rooms
MAZE CONSTRUCTION ALGORITHM
- While edges remain in E
  - Remove a random edge e = (u, v) from E
    (How can we do this efficiently?)
  - If u and v have not yet been connected
    - add e to E'
    - mark u and v as connected
      (How can we check connectedness efficiently?)
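The loop above can be sketched in Python as a randomized Kruskal-style pass. This is a minimal sketch, assuming the rooms form a `rows` × `cols` grid and that every wall between horizontally or vertically adjacent rooms is an edge; the grid layout and the names `generate_maze`, `find`, `rows` and `cols` are illustrative, not from the slides. Connectedness is tracked with a bare parent array (an up-tree without any balancing), which is the structure the following slides develop.

```python
import random


def generate_maze(rows, cols, seed=None):
    """Return the walls (u, v) to knock down; room (r, c) is numbered r * cols + c."""
    rng = random.Random(seed)
    parent = list(range(rows * cols))  # every room starts in its own set

    def find(x):
        # Follow parent pointers up to the root (the set's representative).
        while parent[x] != x:
            x = parent[x]
        return x

    # E: one wall between each pair of horizontally or vertically adjacent rooms.
    walls = []
    for r in range(rows):
        for c in range(cols):
            u = r * cols + c
            if c + 1 < cols:
                walls.append((u, u + 1))
            if r + 1 < rows:
                walls.append((u, u + cols))
    rng.shuffle(walls)  # visiting walls in shuffled order = removing a random edge each time

    knocked_down = []  # E'
    for u, v in walls:
        ru, rv = find(u), find(v)
        if ru != rv:              # u and v have not yet been connected
            knocked_down.append((u, v))
            parent[ru] = rv       # mark u and v as connected
    return knocked_down


print(len(generate_maze(4, 6, seed=7)))  # prints 23: a spanning tree on 24 rooms
```

Because exactly one path must join every two rooms, the knocked-down walls always form a spanning tree, so the result has rows × cols − 1 walls whatever the seed.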
DISJOINT SET PROBLEM
- Operations to support:
  - MakeSet(x): make a new set with a single element x
  - Union(S1, S2): merge the sets S1 and S2
  - Find(x): find the set containing the element x
WHY NOT USE LISTS OR HASH TABLES?
- If using linked lists
  - Find can take O(n) time
  - Union can be done in O(1) time
  - MakeSet can be done in O(1) time
- If using a hash function (hash table)
  - Find can be done in O(1) time
  - Union takes O(n) time
  - MakeSet can be done in O(1) time
- Any other “trivial” ideas?
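To make the hash-table trade-off concrete, here is a minimal Python sketch of the "quick-find" idea, assuming each element maps to a set label in a dict. Find is a single expected-O(1) lookup, but Union must relabel every member of one set, which is O(n) in the worst case. The class and method names are illustrative, not from the slides.

```python
class QuickFind:
    """Disjoint sets stored as a hash table from element to set label."""

    def __init__(self):
        self.label = {}    # element -> set label
        self.members = {}  # set label -> list of elements carrying that label

    def make_set(self, x):
        # O(1): x becomes its own label.
        self.label[x] = x
        self.members[x] = [x]

    def find(self, x):
        # Expected O(1): one dictionary lookup.
        return self.label[x]

    def union(self, x, y):
        # O(n) worst case: every member of y's set is relabeled.
        a, b = self.find(x), self.find(y)
        if a == b:
            return
        for z in self.members[b]:
            self.label[z] = a
        self.members[a].extend(self.members.pop(b))
```

For example, after `make_set` on "a" through "d", `union("a", "b")` relabels only "b", but merging two large sets would touch every element of one of them.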
UP-TREE UNION-FIND DATA STRUCTURE
- Each subset is an up-tree with its root as its representative member
- All members of a given set are nodes in that set’s up-tree
[Figure: a forest of four up-trees with roots a, c, g and h; b and d hang from a, e from b, f from c, and i from h]
Up-trees are not necessarily binary!
FIND
[Figure: the same forest of up-trees; find(f) and find(e) each follow up-pointers to a root]
- Just traverse to the root!
- Time taken is O(height)
UNION
[Figure: union(a,c) on the same forest of up-trees]
- Just hang one root from the other!
- Runtime: O(1), given the two roots
NIFTY STORAGE TRICK
- A forest of up-trees can easily be stored in an array.
[Figure: the forest of up-trees from the previous slides]
up-index array (-1 marks a root):
  element:   0 (a)  1 (b)  2 (c)  3 (d)  4 (e)  5 (f)  6 (g)  7 (h)  8 (i)
  up-index:  -1     0      -1     0      1      2      -1     -1     7
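A minimal Python sketch of the array representation, assuming the up-index values -1, 0, -1, 0, 1, 2, -1, -1, 7 for elements a through i, with -1 marking a root. The names `up`, `NAMES`, `find` and `union` are illustrative, and `union` expects two roots, as in the union(a,c) example.

```python
# up[i] is the index of i's parent; -1 marks a root (the set's representative).
# Indices 0..8 stand for elements a..i.
NAMES = "abcdefghi"
up = [-1, 0, -1, 0, 1, 2, -1, -1, 7]


def find(x: int) -> int:
    """Follow up-pointers to the root; takes O(height) time."""
    while up[x] != -1:
        x = up[x]
    return x


def union(root1: int, root2: int) -> None:
    """Hang root2 from root1; O(1) because both arguments are already roots."""
    assert up[root1] == -1 and up[root2] == -1, "union expects two roots"
    up[root2] = root1


print(NAMES[find(4)])  # prints a: e -> b -> a
union(0, 2)            # union(a, c): hang c from a
print(NAMES[find(5)])  # prints a: f -> c -> a
```

Storing parent indices in a flat list avoids per-node objects entirely; any extra per-root data (size, rank) can live in parallel arrays.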
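WEIGHTED UNION
- Always makes the root of the larger tree the new root
- Often cuts down on the height of the new up-tree
[Figure: a plain union of two up-trees that produces a taller up-tree]
Could we do a better job on this union? Weighted union!
[Figure: the same union done with weighted union, which keeps the new up-tree shorter]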
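A minimal sketch of weighted union, assuming tree sizes are tracked in a separate `size` list (one common choice; some texts instead store the negated size in the root's up-index slot). The class and method names are illustrative.

```python
class WeightedUnionFind:
    """Up-trees with weighted union: the larger tree's root becomes the new root."""

    def __init__(self, n: int):
        self.up = [-1] * n    # -1 marks a root
        self.size = [1] * n   # size[r] is meaningful only when r is a root

    def find(self, x: int) -> int:
        while self.up[x] != -1:
            x = self.up[x]
        return x

    def union(self, x: int, y: int) -> int:
        """Merge the sets containing x and y; return the new root."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx           # make rx the root of the larger tree
        self.up[ry] = rx              # hang the smaller tree from the larger root
        self.size[rx] += self.size[ry]
        return rx
```

With this rule, merging a singleton into a three-node tree always hangs the singleton from the existing root, so the height does not grow.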
WEIGHTED UNION FIND ANALYSIS
- Finds with weighted union are O(max up-tree height)
- An up-tree of height $h$ built with weighted union must have at least $2^h$ nodes (why?)
- $\therefore 2^{\text{max height}} \le n$ and $\text{max height} \le \log_2 n$
- So, find takes O(log n)
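WEIGHTED UNION FIND ANALYSIS (CONT.)
- Base case: $h = 0$; the tree has $2^0 = 1$ node
- Induction hypothesis: assume the claim holds for all heights $h' < h$, and consider the sequence of unions.
- Case 1: The union does not increase the height of the taller input tree. The resulting tree has more nodes than that input, so it still satisfies the bound.
- Case 2: The union produces height $h$, one more than the height $h - 1$ of the tree hung below the new root. By the induction hypothesis that tree has $\ge 2^{h-1}$ nodes, and weighted union hangs it from a tree that is at least as large, so the merged tree has at least $2 \cdot 2^{h-1} = 2^h$ nodes. QED.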
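The bound can also be spot-checked empirically. This sketch, assuming union by size and randomly chosen unions (both illustrative choices; `check_height_bound` is a hypothetical helper), asserts after every union that the merged up-tree of height h has at least 2^h nodes, which is exactly the claim proved above. Only the merged root needs checking, since no other tree changes.

```python
import random


def check_height_bound(n: int, num_unions: int, seed: int = 0) -> int:
    """Run random weighted unions on n elements, checking size >= 2**height.

    Returns the largest tree height observed.
    """
    rng = random.Random(seed)
    up = [-1] * n
    size = [1] * n
    height = [0] * n   # height[r] is meaningful only when r is a root

    def find(x):
        while up[x] != -1:
            x = up[x]
        return x

    max_height = 0
    for _ in range(num_unions):
        rx, ry = find(rng.randrange(n)), find(rng.randrange(n))
        if rx == ry:
            continue
        if size[rx] < size[ry]:
            rx, ry = ry, rx                      # rx is the larger tree's root
        up[ry] = rx
        size[rx] += size[ry]
        height[rx] = max(height[rx], height[ry] + 1)
        assert size[rx] >= 2 ** height[rx], "weighted-union bound violated"
        max_height = max(max_height, height[rx])
    return max_height
```

For n = 1024, the theorem predicts a maximum height of at most log2 1024 = 10.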
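ALTERNATIVES TO WEIGHTED UNION
- Union by height: just use the height of the tree instead of its size
- Union by rank: the same as union by height, except that once path compression flattens trees, the stored rank is only an upper bound on the height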
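A minimal sketch of union by rank, assuming a separate `rank` list next to the up-index array (-1 marks a root). Without path compression, as here, rank equals the tree height exactly. All names are illustrative.

```python
def make_sets(n: int):
    """Return (up, rank) for n singleton sets; -1 in up marks a root."""
    return [-1] * n, [0] * n


def find(up, x):
    while up[x] != -1:
        x = up[x]
    return x


def union_by_rank(up, rank, x, y):
    """Hang the lower-rank root from the higher-rank root; a tie raises the rank by 1."""
    rx, ry = find(up, x), find(up, y)
    if rx == ry:
        return rx
    if rank[rx] < rank[ry]:
        rx, ry = ry, rx
    up[ry] = rx
    if rank[rx] == rank[ry]:
        rank[rx] += 1
    return rx
```

Only a tie between equal ranks can increase a rank, which is what keeps heights logarithmic, just as with weighted union.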
ROOM FOR IMPROVEMENT: PATH COMPRESSION
While we’re finding e, could we do anything else? Path compression!
- Points everything along the path of a find to the root
- Reduces the height of the entire access path to 1
[Figure: an up-tree before and after find(e); afterwards every node on e's access path points directly to the root]
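A minimal two-pass sketch of find with path compression, assuming the up-index array convention in which -1 marks a root. The first pass locates the root; the second repoints every node on the access path directly at it. The function name and the small chain used below are illustrative, not the slide's exact tree.

```python
def find_compress(up: list, x: int) -> int:
    """Return the root of x and make every node on x's path point directly to it."""
    root = x
    while up[root] != -1:      # pass 1: walk up to the root
        root = up[root]
    while x != root:           # pass 2: repoint each node on the path
        nxt = up[x]
        up[x] = root
        x = nxt
    return root


up = [-1, 0, 1, 2, 3]          # a chain: e -> d -> c -> b -> a (indices 4..0)
print(find_compress(up, 4))    # prints 0
print(up)                      # prints [-1, 0, 0, 0, 0]
```

One find on the deepest node flattens the whole chain, so later finds on any of these nodes take a single step.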
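PATH COMPRESSION EXAMPLE
[Figure: an up-tree over a through i before and after find(e); after the find, e and the other nodes on its path hang directly from the root]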
DIGRESSION: INVERSE ACKERMANN’S
- Let $\log^{(k)} n = \log(\log(\log \cdots (\log n)))$, with $k$ logs (base 2)
- Then, let $\log^* n$ = the minimum $k$ such that $\log^{(k)} n \le 1$
- How fast does $\log^* n$ grow?
  - $\log^*(2) = 1$
  - $\log^*(4) = 2$
  - $\log^*(16) = 3$
  - $\log^*(65536) = 4$
  - $\log^*(2^{65536}) = 5$ (a 20,000-digit number!)
  - $\log^*(2^{2^{65536}}) = 6$
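The values above can be reproduced with a short sketch, assuming base-2 logarithms (which the listed values imply); `log_star` is an illustrative name. Python's `math.log2` accepts arbitrarily large integers, so even 2^65536 works without overflowing a float.

```python
import math


def log_star(n) -> int:
    """Minimum k such that applying log2 k times to n gives a value <= 1."""
    k = 0
    x = n
    while x > 1:
        x = math.log2(x)   # math.log2 also accepts huge Python ints
        k += 1
    return k


for n in (2, 4, 16, 65536, 2 ** 65536):
    print(log_star(n))     # prints 1, 2, 3, 4, 5 on successive lines
```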
COMPLEX COMPLEXITY OF WEIGHTED UNION + PATH COMPRESSION
- Hopcroft and Ullman (1973) proved that m weighted union and find operations with path compression on a set of n elements have worst-case complexity $O(m \log^* n)$
- Tarjan (1975) later showed that the time complexity is actually $O(m\,\alpha(m, n))$, where $\alpha$ is the inverse Ackermann function
- For all practical purposes this is amortized constant time per operation
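Putting the pieces together: a minimal sketch combining union by rank with path compression, used here for Kruskal's minimum spanning tree algorithm from the applications list. In this sketch a root points to itself rather than storing -1, a common alternative convention; the edge format `(weight, u, v)` and all names are illustrative assumptions.

```python
class DisjointSet:
    """Union by rank + path compression."""

    def __init__(self, n: int):
        self.parent = list(range(n))   # a root is its own parent
        self.rank = [0] * n

    def find(self, x: int) -> int:
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:          # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x: int, y: int) -> bool:
        """Merge the sets of x and y; return False if they were already together."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True


def kruskal(n: int, edges):
    """Return (total_weight, tree_edges) of a minimum spanning forest; edges are (w, u, v)."""
    ds = DisjointSet(n)
    total, tree = 0, []
    for w, u, v in sorted(edges):
        if ds.union(u, v):      # keep the edge only if it joins two components
            total += w
            tree.append((u, v))
    return total, tree
```

Sorting dominates here at O(E log E); the union-find work is nearly linear in the number of edges.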
HOW DO WE HANDLE “BIG DATA”?
- How can we handle set membership questions in the context of big data?
- Can we hold 10 billion sets and items in main memory?
- Is there an alternative to doing a file/database search?
SLIGHTLY DIFFERENT CONSTRUCT
- Does x belong to a set S?
- A Bloom filter answers:
  - Yes (possibly: with probability p, x may still not be there)
  - No (x is definitely not there)
- Assume the probability p as an input, the Bloom filter execution time as x, and the database search time as y
- Derive the scenario (using the variables p, x and y) in which using a Bloom filter makes sense
APPROXIMATE SET MEMBERSHIP PROBLEM
- Suppose we have a set $S = \{s_1, s_2, \ldots, s_m\} \subseteq U$ (the universe)
- Represent S in such a way that we can quickly answer “Is x an element of S?”
- To take as little space as possible, we allow false positives (i.e. $x \notin S$, but we answer yes)
- If $x \in S$, we must answer yes. (That is, there are no false negatives.)
BLOOM FILTERS
- Consist of an array A[n] of n bits (space) and k independent random hash functions $h_1, \ldots, h_k : U \to \{0, 1, \ldots, n-1\}$
1. Initially set the array to 0.
2. For all $s \in S$, set $A[h_i(s)] = 1$ for $1 \le i \le k$ (an entry can be set to 1 multiple times; only the first time has an effect).
3. To check if $x \in S$, we check whether all locations $A[h_i(x)]$ for $1 \le i \le k$ are set to 1.
   If not, clearly $x \notin S$.
   If all $A[h_i(x)]$ are set to 1, we report $x \in S$.
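A minimal Python sketch of the three steps above. The slides assume k independent random hash functions; as a stand-in assumption, this sketch derives them by hashing the string "i:item" with SHA-256 from the standard `hashlib` module and reducing modulo n. The class and method names are illustrative, and a `bytearray` with one byte per bit keeps the code simple where a real filter would pack bits.

```python
import hashlib


class BloomFilter:
    def __init__(self, n: int, k: int):
        self.n = n                  # number of positions in A
        self.k = k                  # number of hash functions
        self.bits = bytearray(n)    # step 1: every entry starts at 0 (one byte per bit)

    def _positions(self, item: str):
        # Stand-in for h_1..h_k: SHA-256 of "i:item", reduced mod n.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n

    def add(self, item: str) -> None:
        # Step 2: set A[h_i(item)] = 1 for 1 <= i <= k.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # Step 3: any 0 means definitely absent; all 1s means "report present".
        return all(self.bits[pos] for pos in self._positions(item))
```

Every added item always reports `True` (no false negatives); an item never added may still report `True`, which is the false positive the next slides quantify.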
[Figure: a bit array, initially all 0s; the k hash locations of elements x1 and x2 are set to 1]
Each element of S is hashed k times; each hash location is set to 1.
To check if y is in S, check its k hash locations. If a 0 appears, y is not in S.
[Figure: the same bit array; every one of y's k hash locations holds a 1]
If only 1s appear, report that y is in S. This may yield a false positive.
BLOOM FILTERS: PROBABILITY OF A FALSE POSITIVE
- We assume the hash functions are random.
- After all m elements of S are hashed into the Bloom filter array of n bits using all k hash functions, the probability that a specific bit is still 0 is
  $p' = \left(1 - \frac{1}{n}\right)^{km} \approx e^{-km/n} = p$
- The probability of a false positive is given by
  $f = (1 - p')^k \approx (1 - p)^k = \left(1 - e^{-km/n}\right)^k$
DESIGNING A BLOOM FILTER
- Using the desired bound on the probability of a false positive and the expected number of items (m), you can design a Bloom filter by choosing values for n and k
- For example, if m = 1,000,000,000, n = 10,000,000,000 and k = 10, this becomes $f \approx (1 - e^{-1})^{10} \approx 0.01$
- Uses 10 billion bits (approx. 1.2 GB of RAM)
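The example numbers can be checked against the approximation $f \approx (1 - e^{-km/n})^k$. The sketch also computes $k = (n/m)\ln 2$, the standard choice of k that minimizes this approximation (a well-known result not stated on the slide); the function names are illustrative.

```python
import math


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Approximate false-positive probability (1 - e^(-km/n))^k."""
    return (1 - math.exp(-k * m / n)) ** k


def optimal_k(m: int, n: int) -> float:
    """The k that minimizes the approximation above: (n/m) * ln 2."""
    return (n / m) * math.log(2)


m, n = 1_000_000_000, 10_000_000_000
print(round(false_positive_rate(m, n, 10), 4))   # prints 0.0102
print(round(optimal_k(m, n), 2))                 # prints 6.93
print(n / 8 / 1e9)                               # prints 1.25 (GB for the bit array)
```

With the same 10 bits per item, rounding the optimal k to 7 lowers the estimate to roughly 0.008, so k = 10 in the example is close to, but not exactly at, the optimum.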
WHERE WE ARE (PHEW, AT THE END, FINALLY)
CS 6213
- Basics
  - Record / Struct
  - Arrays / Linked Lists / Stacks / Queues
  - Graphs / Trees / BSTs
- Advanced
  - Trie, B-Tree
  - Splay Trees
  - R-Trees
  - Heaps and PQs
  - Union Find, Sets
Set Data Structures CS213 - Advanced Data Structures - Arora - GWU 31

Contenu connexe

Tendances

4.4 external hashing
4.4 external hashing4.4 external hashing
4.4 external hashingKrish_ver2
 
Bca ii dfs u-1 introduction to data structure
Bca ii dfs u-1 introduction to data structureBca ii dfs u-1 introduction to data structure
Bca ii dfs u-1 introduction to data structureRai University
 
17. Trees and Graphs
17. Trees and Graphs17. Trees and Graphs
17. Trees and GraphsIntro C# Book
 
Introduction to data structure by anil dutt
Introduction to data structure by anil duttIntroduction to data structure by anil dutt
Introduction to data structure by anil duttAnil Dutt
 
Data Structure In C#
Data Structure In C#Data Structure In C#
Data Structure In C#Shahzad
 
Trees (data structure)
Trees (data structure)Trees (data structure)
Trees (data structure)Trupti Agrawal
 
Bsc cs ii dfs u-1 introduction to data structure
Bsc cs ii dfs u-1 introduction to data structureBsc cs ii dfs u-1 introduction to data structure
Bsc cs ii dfs u-1 introduction to data structureRai University
 
1.1 binary tree
1.1 binary tree1.1 binary tree
1.1 binary treeKrish_ver2
 
Introduction to Ultra-succinct representation of ordered trees with applications
Introduction to Ultra-succinct representation of ordered trees with applicationsIntroduction to Ultra-succinct representation of ordered trees with applications
Introduction to Ultra-succinct representation of ordered trees with applicationsYu Liu
 
Data structure and algorithm All in One
Data structure and algorithm All in OneData structure and algorithm All in One
Data structure and algorithm All in Onejehan1987
 
2nd puc computer science chapter 3 data structures 1
2nd puc computer science chapter 3 data structures 12nd puc computer science chapter 3 data structures 1
2nd puc computer science chapter 3 data structures 1Aahwini Esware gowda
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructuresKrish_ver2
 
Tree representation in map reduce world
Tree representation  in map reduce worldTree representation  in map reduce world
Tree representation in map reduce worldYu Liu
 
Preparation Data Structures 06 arrays representation
Preparation Data Structures 06 arrays representationPreparation Data Structures 06 arrays representation
Preparation Data Structures 06 arrays representationAndres Mendez-Vazquez
 

Tendances (20)

4.4 external hashing
4.4 external hashing4.4 external hashing
4.4 external hashing
 
Bca ii dfs u-1 introduction to data structure
Bca ii dfs u-1 introduction to data structureBca ii dfs u-1 introduction to data structure
Bca ii dfs u-1 introduction to data structure
 
Data Structure Basics
Data Structure BasicsData Structure Basics
Data Structure Basics
 
Chapter 9 ds
Chapter 9 dsChapter 9 ds
Chapter 9 ds
 
17. Trees and Graphs
17. Trees and Graphs17. Trees and Graphs
17. Trees and Graphs
 
Introduction to data structure by anil dutt
Introduction to data structure by anil duttIntroduction to data structure by anil dutt
Introduction to data structure by anil dutt
 
Chapter 3 ds
Chapter 3 dsChapter 3 ds
Chapter 3 ds
 
Data Structure In C#
Data Structure In C#Data Structure In C#
Data Structure In C#
 
Data structure ppt
Data structure pptData structure ppt
Data structure ppt
 
Trees (data structure)
Trees (data structure)Trees (data structure)
Trees (data structure)
 
Bsc cs ii dfs u-1 introduction to data structure
Bsc cs ii dfs u-1 introduction to data structureBsc cs ii dfs u-1 introduction to data structure
Bsc cs ii dfs u-1 introduction to data structure
 
1.1 binary tree
1.1 binary tree1.1 binary tree
1.1 binary tree
 
Introduction to Ultra-succinct representation of ordered trees with applications
Introduction to Ultra-succinct representation of ordered trees with applicationsIntroduction to Ultra-succinct representation of ordered trees with applications
Introduction to Ultra-succinct representation of ordered trees with applications
 
Data structure and algorithm All in One
Data structure and algorithm All in OneData structure and algorithm All in One
Data structure and algorithm All in One
 
2nd puc computer science chapter 3 data structures 1
2nd puc computer science chapter 3 data structures 12nd puc computer science chapter 3 data structures 1
2nd puc computer science chapter 3 data structures 1
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructures
 
Data structure
Data structureData structure
Data structure
 
Tree representation in map reduce world
Tree representation  in map reduce worldTree representation  in map reduce world
Tree representation in map reduce world
 
Preparation Data Structures 06 arrays representation
Preparation Data Structures 06 arrays representationPreparation Data Structures 06 arrays representation
Preparation Data Structures 06 arrays representation
 
Chapter 5 ds
Chapter 5 dsChapter 5 ds
Chapter 5 ds
 

Similaire à Set Operations - Union Find and Bloom Filters

Graphs, Trees, Paths and Their Representations
Graphs, Trees, Paths and Their RepresentationsGraphs, Trees, Paths and Their Representations
Graphs, Trees, Paths and Their RepresentationsAmrinder Arora
 
Path compression
Path compressionPath compression
Path compressionDEEPIKA T
 
Machine learning (11)
Machine learning (11)Machine learning (11)
Machine learning (11)NYversity
 
Tries - Tree Based Structures for Strings
Tries - Tree Based Structures for StringsTries - Tree Based Structures for Strings
Tries - Tree Based Structures for StringsAmrinder Arora
 
III_Data Structure_Module_1.pptx
III_Data Structure_Module_1.pptxIII_Data Structure_Module_1.pptx
III_Data Structure_Module_1.pptxshashankbhadouria4
 
III_Data Structure_Module_1.ppt
III_Data Structure_Module_1.pptIII_Data Structure_Module_1.ppt
III_Data Structure_Module_1.pptshashankbhadouria4
 
multipleSeqAlignment.ppta4432455344534534
multipleSeqAlignment.ppta4432455344534534multipleSeqAlignment.ppta4432455344534534
multipleSeqAlignment.ppta4432455344534534alizain9604
 
17. Trees and Tree Like Structures
17. Trees and Tree Like Structures17. Trees and Tree Like Structures
17. Trees and Tree Like StructuresIntro C# Book
 
A quick introduction to R
A quick introduction to RA quick introduction to R
A quick introduction to RAngshuman Saha
 
T. Lucas Makinen x Imperial SBI Workshop
T. Lucas Makinen x Imperial SBI WorkshopT. Lucas Makinen x Imperial SBI Workshop
T. Lucas Makinen x Imperial SBI WorkshopLucasMakinen1
 
Suppression of grating lobes
Suppression of grating lobesSuppression of grating lobes
Suppression of grating lobeskavindrakrishna
 
DSA (Data Structure and Algorithm) Questions
DSA (Data Structure and Algorithm) QuestionsDSA (Data Structure and Algorithm) Questions
DSA (Data Structure and Algorithm) QuestionsRESHAN FARAZ
 
Multiple sequence alignment
Multiple sequence alignmentMultiple sequence alignment
Multiple sequence alignmentSanaym
 
Hierarchical clustering
Hierarchical clusteringHierarchical clustering
Hierarchical clusteringishmecse13
 
03-data-structures.pdf
03-data-structures.pdf03-data-structures.pdf
03-data-structures.pdfNash229987
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructuresKrish_ver2
 
Data structures "1" (Lectures 2015-2016)
Data structures "1" (Lectures 2015-2016) Data structures "1" (Lectures 2015-2016)
Data structures "1" (Lectures 2015-2016) Ameer B. Alaasam
 
Asymptotic Notation and Data Structures
Asymptotic Notation and Data StructuresAsymptotic Notation and Data Structures
Asymptotic Notation and Data StructuresAmrinder Arora
 

Similaire à Set Operations - Union Find and Bloom Filters (20)

Graphs, Trees, Paths and Their Representations
Graphs, Trees, Paths and Their RepresentationsGraphs, Trees, Paths and Their Representations
Graphs, Trees, Paths and Their Representations
 
Path compression
Path compressionPath compression
Path compression
 
Machine learning (11)
Machine learning (11)Machine learning (11)
Machine learning (11)
 
Tries - Tree Based Structures for Strings
Tries - Tree Based Structures for StringsTries - Tree Based Structures for Strings
Tries - Tree Based Structures for Strings
 
III_Data Structure_Module_1.pptx
III_Data Structure_Module_1.pptxIII_Data Structure_Module_1.pptx
III_Data Structure_Module_1.pptx
 
III_Data Structure_Module_1.ppt
III_Data Structure_Module_1.pptIII_Data Structure_Module_1.ppt
III_Data Structure_Module_1.ppt
 
multipleSeqAlignment.ppta4432455344534534
multipleSeqAlignment.ppta4432455344534534multipleSeqAlignment.ppta4432455344534534
multipleSeqAlignment.ppta4432455344534534
 
17. Trees and Tree Like Structures
17. Trees and Tree Like Structures17. Trees and Tree Like Structures
17. Trees and Tree Like Structures
 
A quick introduction to R
A quick introduction to RA quick introduction to R
A quick introduction to R
 
T. Lucas Makinen x Imperial SBI Workshop
T. Lucas Makinen x Imperial SBI WorkshopT. Lucas Makinen x Imperial SBI Workshop
T. Lucas Makinen x Imperial SBI Workshop
 
Suppression of grating lobes
Suppression of grating lobesSuppression of grating lobes
Suppression of grating lobes
 
Dynamic programming
Dynamic programmingDynamic programming
Dynamic programming
 
DSA (Data Structure and Algorithm) Questions
DSA (Data Structure and Algorithm) QuestionsDSA (Data Structure and Algorithm) Questions
DSA (Data Structure and Algorithm) Questions
 
Multiple sequence alignment
Multiple sequence alignmentMultiple sequence alignment
Multiple sequence alignment
 
M tree
M treeM tree
M tree
 
Hierarchical clustering
Hierarchical clusteringHierarchical clustering
Hierarchical clustering
 
03-data-structures.pdf
03-data-structures.pdf03-data-structures.pdf
03-data-structures.pdf
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructures
 
Data structures "1" (Lectures 2015-2016)
Data structures "1" (Lectures 2015-2016) Data structures "1" (Lectures 2015-2016)
Data structures "1" (Lectures 2015-2016)
 
Asymptotic Notation and Data Structures
Asymptotic Notation and Data StructuresAsymptotic Notation and Data Structures
Asymptotic Notation and Data Structures
 

Plus de Amrinder Arora

Convex Hull - Chan's Algorithm O(n log h) - Presentation by Yitian Huang and ...
Convex Hull - Chan's Algorithm O(n log h) - Presentation by Yitian Huang and ...Convex Hull - Chan's Algorithm O(n log h) - Presentation by Yitian Huang and ...
Convex Hull - Chan's Algorithm O(n log h) - Presentation by Yitian Huang and ...Amrinder Arora
 
Graph Traversal Algorithms - Breadth First Search
Graph Traversal Algorithms - Breadth First SearchGraph Traversal Algorithms - Breadth First Search
Graph Traversal Algorithms - Breadth First SearchAmrinder Arora
 
Graph Traversal Algorithms - Depth First Search Traversal
Graph Traversal Algorithms - Depth First Search TraversalGraph Traversal Algorithms - Depth First Search Traversal
Graph Traversal Algorithms - Depth First Search TraversalAmrinder Arora
 
Bron Kerbosch Algorithm - Presentation by Jun Zhai, Tianhang Qiang and Yizhen...
Bron Kerbosch Algorithm - Presentation by Jun Zhai, Tianhang Qiang and Yizhen...Bron Kerbosch Algorithm - Presentation by Jun Zhai, Tianhang Qiang and Yizhen...
Bron Kerbosch Algorithm - Presentation by Jun Zhai, Tianhang Qiang and Yizhen...Amrinder Arora
 
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet MahanaArima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet MahanaAmrinder Arora
 
Stopping Rule for Secretory Problem - Presentation by Haoyang Tian, Wesam Als...
Stopping Rule for Secretory Problem - Presentation by Haoyang Tian, Wesam Als...Stopping Rule for Secretory Problem - Presentation by Haoyang Tian, Wesam Als...
Stopping Rule for Secretory Problem - Presentation by Haoyang Tian, Wesam Als...Amrinder Arora
 
Proof of Cook Levin Theorem (Presentation by Xiechuan, Song and Shuo)
Proof of Cook Levin Theorem (Presentation by Xiechuan, Song and Shuo)Proof of Cook Levin Theorem (Presentation by Xiechuan, Song and Shuo)
Proof of Cook Levin Theorem (Presentation by Xiechuan, Song and Shuo)Amrinder Arora
 
Online algorithms in Machine Learning
Online algorithms in Machine LearningOnline algorithms in Machine Learning
Online algorithms in Machine LearningAmrinder Arora
 
Euclid's Algorithm for Greatest Common Divisor - Time Complexity Analysis
Euclid's Algorithm for Greatest Common Divisor - Time Complexity AnalysisEuclid's Algorithm for Greatest Common Divisor - Time Complexity Analysis
Euclid's Algorithm for Greatest Common Divisor - Time Complexity AnalysisAmrinder Arora
 
Dynamic Programming - Part II
Dynamic Programming - Part IIDynamic Programming - Part II
Dynamic Programming - Part IIAmrinder Arora
 
Dynamic Programming - Part 1
Dynamic Programming - Part 1Dynamic Programming - Part 1
Dynamic Programming - Part 1Amrinder Arora
 
Divide and Conquer - Part II - Quickselect and Closest Pair of Points
Divide and Conquer - Part II - Quickselect and Closest Pair of PointsDivide and Conquer - Part II - Quickselect and Closest Pair of Points
Divide and Conquer - Part II - Quickselect and Closest Pair of PointsAmrinder Arora
 
Divide and Conquer - Part 1
Divide and Conquer - Part 1Divide and Conquer - Part 1
Divide and Conquer - Part 1Amrinder Arora
 
Introduction to Algorithms and Asymptotic Notation
Introduction to Algorithms and Asymptotic NotationIntroduction to Algorithms and Asymptotic Notation
Introduction to Algorithms and Asymptotic NotationAmrinder Arora
 
BTrees - Great alternative to Red Black, AVL and other BSTs
BTrees - Great alternative to Red Black, AVL and other BSTsBTrees - Great alternative to Red Black, AVL and other BSTs
BTrees - Great alternative to Red Black, AVL and other BSTsAmrinder Arora
 
Binary Search Trees - AVL and Red Black
Binary Search Trees - AVL and Red BlackBinary Search Trees - AVL and Red Black
Binary Search Trees - AVL and Red BlackAmrinder Arora
 

Plus de Amrinder Arora (20)

Convex Hull - Chan's Algorithm O(n log h) - Presentation by Yitian Huang and ...
Convex Hull - Chan's Algorithm O(n log h) - Presentation by Yitian Huang and ...Convex Hull - Chan's Algorithm O(n log h) - Presentation by Yitian Huang and ...
Convex Hull - Chan's Algorithm O(n log h) - Presentation by Yitian Huang and ...
 
NP-Completeness - II
NP-Completeness - IINP-Completeness - II
NP-Completeness - II
 
Graph Traversal Algorithms - Breadth First Search
Graph Traversal Algorithms - Breadth First SearchGraph Traversal Algorithms - Breadth First Search
Graph Traversal Algorithms - Breadth First Search
 
Graph Traversal Algorithms - Depth First Search Traversal
Graph Traversal Algorithms - Depth First Search TraversalGraph Traversal Algorithms - Depth First Search Traversal
Graph Traversal Algorithms - Depth First Search Traversal
 
Bron Kerbosch Algorithm - Presentation by Jun Zhai, Tianhang Qiang and Yizhen...
Bron Kerbosch Algorithm - Presentation by Jun Zhai, Tianhang Qiang and Yizhen...Bron Kerbosch Algorithm - Presentation by Jun Zhai, Tianhang Qiang and Yizhen...
Bron Kerbosch Algorithm - Presentation by Jun Zhai, Tianhang Qiang and Yizhen...
 
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet MahanaArima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana
 
Stopping Rule for Secretory Problem - Presentation by Haoyang Tian, Wesam Als...
Stopping Rule for Secretory Problem - Presentation by Haoyang Tian, Wesam Als...Stopping Rule for Secretory Problem - Presentation by Haoyang Tian, Wesam Als...
Stopping Rule for Secretory Problem - Presentation by Haoyang Tian, Wesam Als...
 
Proof of Cook Levin Theorem (Presentation by Xiechuan, Song and Shuo)
Proof of Cook Levin Theorem (Presentation by Xiechuan, Song and Shuo)Proof of Cook Levin Theorem (Presentation by Xiechuan, Song and Shuo)
Proof of Cook Levin Theorem (Presentation by Xiechuan, Song and Shuo)
 
Online algorithms in Machine Learning
Online algorithms in Machine LearningOnline algorithms in Machine Learning
Online algorithms in Machine Learning
 
NP completeness
NP completenessNP completeness
NP completeness
 
Algorithmic Puzzles
Algorithmic PuzzlesAlgorithmic Puzzles
Algorithmic Puzzles
 
Euclid's Algorithm for Greatest Common Divisor - Time Complexity Analysis
Euclid's Algorithm for Greatest Common Divisor - Time Complexity AnalysisEuclid's Algorithm for Greatest Common Divisor - Time Complexity Analysis
Euclid's Algorithm for Greatest Common Divisor - Time Complexity Analysis
 
Dynamic Programming - Part II
Dynamic Programming - Part IIDynamic Programming - Part II
Dynamic Programming - Part II
 
Dynamic Programming - Part 1
Dynamic Programming - Part 1Dynamic Programming - Part 1
Dynamic Programming - Part 1
 
Greedy Algorithms
Greedy AlgorithmsGreedy Algorithms
Greedy Algorithms
 
Divide and Conquer - Part II - Quickselect and Closest Pair of Points
Divide and Conquer - Part II - Quickselect and Closest Pair of PointsDivide and Conquer - Part II - Quickselect and Closest Pair of Points
Divide and Conquer - Part II - Quickselect and Closest Pair of Points
 
Divide and Conquer - Part 1
Divide and Conquer - Part 1Divide and Conquer - Part 1
Divide and Conquer - Part 1
 
Introduction to Algorithms and Asymptotic Notation
Introduction to Algorithms and Asymptotic NotationIntroduction to Algorithms and Asymptotic Notation
Introduction to Algorithms and Asymptotic Notation
 
BTrees - Great alternative to Red Black, AVL and other BSTs
BTrees - Great alternative to Red Black, AVL and other BSTsBTrees - Great alternative to Red Black, AVL and other BSTs
BTrees - Great alternative to Red Black, AVL and other BSTs
 
Binary Search Trees - AVL and Red Black
Binary Search Trees - AVL and Red BlackBinary Search Trees - AVL and Red Black
Binary Search Trees - AVL and Red Black
 

Dernier

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Set Operations - Union Find and Bloom Filters

  • 9. DISJOINT SET PROBLEM
    Operations to support:
    - MakeSet(x): make a new set with a single element x
    - Union(S1, S2): merge the sets S1 and S2
    - Find(x): find the set containing the element x
  • 10. WHY NOT USE LISTS OR HASH TABLES?
    - If using linked lists:
      Find can take O(n) time; Union can be done in O(1) time; MakeSet can be done in O(1) time.
    - If using a hash function (hash table):
      Find can be done in O(1) time; Union takes O(n) time; MakeSet can be done in O(1) time.
    - Any other “trivial” ideas?
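To make the hash-table row concrete, here is a minimal Python sketch of that approach (often called quick-find), assuming each element is mapped to a set label. The class name `QuickFind` and its methods are illustrative, and `union` takes two elements rather than two sets. Find is a single lookup, while Union may have to relabel every element.

```python
# A minimal "quick-find" sketch: a dict maps each element to a set label.
# Class and method names are illustrative, not taken from the slides.


class QuickFind:
    def __init__(self):
        self.label = {}  # element -> label of the set containing it

    def make_set(self, x):
        self.label[x] = x  # O(1): x starts alone, labelled by itself

    def find(self, x):
        return self.label[x]  # O(1): a single hash lookup

    def union(self, x, y):
        # O(n): every element carrying y's label must be relabelled.
        old, new = self.find(y), self.find(x)
        if old == new:
            return
        for element in self.label:
            if self.label[element] == old:
                self.label[element] = new


qf = QuickFind()
for v in "abcde":
    qf.make_set(v)
qf.union("a", "b")
qf.union("c", "d")
qf.union("a", "c")
print(qf.find("d"))  # a
```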
  • 11. UP-TREE UNION-FIND DATA STRUCTURE
    - Each subset is an up-tree with its root as its representative member.
    - All members of a given set are nodes in that set’s up-tree.
    - Up-trees are not necessarily binary!
    (Figure: a forest of up-trees over the elements a through i.)
  • 12. FIND
    (Figure: find(f) and find(e) on the same forest of up-trees.)
    - Just traverse from the node up to the root!
    - Runtime: O(height) of the up-tree.
  • 13. UNION
    (Figure: union(a, c) on the same forest of up-trees.)
    - Just hang one root from the other!
    - Runtime: O(1), once the two roots are known.
  • 14. NIFTY STORAGE TRICK
    A forest of up-trees can easily be stored in an array. Each entry holds the index of its parent (the up-index), and -1 marks a root:

    index:       0   1   2   3   4   5   6   7   8
    element:     a   b   c   d   e   f   g   h   i
    up-index:   -1   0  -1   0   1   2  -1  -1   7
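A minimal sketch of Find and Union on the up-index array, using the values reconstructed from the slide's figure (a, c, g and h are roots; -1 marks a root). The names `up`, `find` and `union` are illustrative. Find walks parent links, so it costs O(height); Union rewrites a single entry.

```python
# Sketch of the array ("up-index") representation from the slide:
# up[i] is the index of i's parent, and -1 marks a root.
# Function names are illustrative.

ELEMENTS = "abcdefghi"
IDX = {name: i for i, name in enumerate(ELEMENTS)}

# Up-index array reconstructed from the slide: a, c, g, h are roots.
up = [-1, 0, -1, 0, 1, 2, -1, -1, 7]


def find(up, i):
    """Walk parent pointers until reaching a root: O(height)."""
    while up[i] != -1:
        i = up[i]
    return i


def union(up, root1, root2):
    """Hang root2 beneath root1: O(1), given two distinct roots."""
    assert up[root1] == -1 and up[root2] == -1 and root1 != root2
    up[root2] = root1


print(ELEMENTS[find(up, IDX["f"])])  # c
print(ELEMENTS[find(up, IDX["e"])])  # a
union(up, IDX["a"], IDX["c"])
print(ELEMENTS[find(up, IDX["f"])])  # a
```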
  • 15. WEIGHTED UNION
    - Always makes the root of the larger tree (the one with more nodes) the new root.
    - Often cuts down on the height of the new up-tree.
    (Figure: two ways of uniting the same pair of up-trees. Could we do a better job on this union? Weighted union!)
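One common way to implement weighted union on the same array, sketched here as an assumption rather than the slides' own code: a root stores the negative of its tree's size (so a singleton still stores -1), and union compares sizes before hanging the smaller tree's root under the larger one's. Names are illustrative.

```python
# Weighted union (union by size) on the array representation:
# a root stores -(size of its tree); a non-root stores its parent's index.
# Names are illustrative.


def make_sets(n):
    return [-1] * n  # n singleton trees, each of size 1


def find(up, i):
    while up[i] >= 0:  # negative entries mark roots
        i = up[i]
    return i


def weighted_union(up, r1, r2):
    """Union two distinct roots, making the larger tree's root the new root."""
    if up[r1] > up[r2]:  # sizes are stored negated: r1 is the smaller tree
        r1, r2 = r2, r1
    up[r1] += up[r2]  # r1 now roots the combined tree
    up[r2] = r1
    return r1


def size(up, i):
    return -up[find(up, i)]


up = make_sets(6)
weighted_union(up, 0, 1)         # tie: root 0 keeps the merged tree
root = weighted_union(up, 2, 0)  # singleton 2 is hung under the larger root 0
print(root, size(up, 1))         # 0 3
```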
  • 16. WEIGHTED UNION FIND ANALYSIS
    - Finds with weighted union are O(max up-tree height).
    - An up-tree of height $h$ built with weighted union must have at least $2^h$ nodes (why?).
    - Therefore $2^{\text{max height}} \le n$, so max height $\le \log_2 n$.
    - So, find takes $O(\log n)$.
  • 17. WEIGHTED UNION FIND ANALYSIS (CONT.)
    - Base case: $h = 0$: the tree is a single node, and $2^0 = 1$.
    - Induction hypothesis: assume every tree produced by the earlier unions satisfies the claim, and consider the next union in the sequence.
    - Case 1: The union does not increase the height of the larger tree. The resulting tree has that tree's height $h$ and at least as many nodes, so it still has $\ge 2^h$ nodes.
    - Case 2: The union produces height $h' = 1 + h$, where $h$ is the height of the smaller tree, which is hung below the other root. By the induction hypothesis the smaller tree has $\ge 2^h$ nodes, and the larger tree has at least as many, so the merged tree has at least $2 \cdot 2^h = 2^{h'}$ nodes. QED.
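The lemma can also be checked empirically. The sketch below (illustrative names; roots store their negated size, as in the array trick) asserts size $\ge 2^{\text{height}}$ for every tree. It first builds the worst case, which repeatedly unites equal-sized trees and reaches height exactly $\log_2 n$, and then runs a seeded sequence of random unions.

```python
# Empirical check: with weighted union, a tree of height h has >= 2**h nodes.
import random


def find(up, i):
    while up[i] >= 0:  # negative entries mark roots (-size)
        i = up[i]
    return i


def weighted_union(up, r1, r2):
    """Union two distinct roots; the larger tree's root becomes the new root."""
    if up[r1] > up[r2]:  # sizes are stored negated, so r1 is smaller
        r1, r2 = r2, r1
    up[r1] += up[r2]
    up[r2] = r1
    return r1


def depth(up, i):
    d = 0
    while up[i] >= 0:
        i, d = up[i], d + 1
    return d


def check_lemma(up):
    """Assert size >= 2**height for every tree; return the maximum height."""
    height = {}
    for i in range(len(up)):
        r = find(up, i)
        height[r] = max(height.get(r, 0), depth(up, i))
    for r, h in height.items():
        assert -up[r] >= 2 ** h, (r, h)
    return max(height.values())


def pairwise_worst_case(n):
    """Repeatedly unite equal-sized trees (n must be a power of two)."""
    up = [-1] * n
    roots = list(range(n))
    while len(roots) > 1:
        roots = [weighted_union(up, roots[j], roots[j + 1])
                 for j in range(0, len(roots), 2)]
    return up


def random_unions(n, steps, seed=0):
    rng = random.Random(seed)
    up = [-1] * n
    for _ in range(steps):
        a, b = find(up, rng.randrange(n)), find(up, rng.randrange(n))
        if a != b:
            weighted_union(up, a, b)
    return up


print(check_lemma(pairwise_worst_case(1024)))        # 10, i.e. log2(1024)
print(check_lemma(random_unions(1024, 3072)) <= 10)  # True
```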
  • 18. ALTERNATIVES TO WEIGHTED UNION
    - Union by height: just use the height of the tree (the root of the taller tree becomes the new root).
    - Union by rank: the rank is the same thing as the height, except when it is not. Path compression can shorten a tree without updating its rank, so the rank becomes an upper bound on the height.
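A sketch of union by rank, assuming a separate `rank` array and the convention that a root is its own parent (an equivalent alternative to the -1 encoding). A rank grows only when two trees of equal rank are joined. Without path compression, as here, each root's rank equals its tree's height. `UnionByRank` is an illustrative name.

```python
# Union by rank with separate parent and rank arrays.
# Without path compression, rank[r] equals the height of r's tree.


class UnionByRank:
    def __init__(self, n):
        self.parent = list(range(n))  # a root is its own parent
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx  # rx is now the root of higher (or equal) rank
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1  # only equal ranks make the tree taller
        return rx


u = UnionByRank(4)
u.union(0, 1)
u.union(2, 3)
u.union(0, 2)
print(u.find(3), u.rank[0])  # 0 2
```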
  • 19. ROOM FOR IMPROVEMENT: PATH COMPRESSION
    (Figure: an up-tree before and after find(e).)
    - While we’re finding e, could we do anything else? Path compression!
    - Points everything along the path of a find to the root.
    - Reduces the height of the entire access path to 1.
  • 20. PATH COMPRESSION EXAMPLE
    (Figure: a forest of up-trees over a through i, before and after find(e).)
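Path compression can be added to Find with a second pass over the access path. The two-pass function below is a sketch (`find_compress` is an illustrative name; roots satisfy `parent[x] == x`). A recursive version is equally common but can hit Python's recursion limit on long paths.

```python
# Find with path compression (two-pass version): first locate the root,
# then point every node on the access path directly at it.


def find_compress(parent, x):
    root = x
    while parent[root] != root:  # pass 1: walk up to the root
        root = parent[root]
    while parent[x] != root:  # pass 2: re-point the path at the root
        nxt = parent[x]
        parent[x] = root
        x = nxt
    return root


parent = [0, 0, 1, 2, 3]  # a chain 4 -> 3 -> 2 -> 1 -> 0
root = find_compress(parent, 4)
print(root, parent)  # 0 [0, 0, 0, 0, 0]
```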
  • 21. DIGRESSION: INVERSE ACKERMANN’S
    - Let $\log^{(k)} n = \log(\log(\cdots \log n))$, with $k$ logs.
    - Then let $\log^* n$ be the minimum $k$ such that $\log^{(k)} n \le 1$.
    - How fast does $\log^* n$ grow? (logs are base 2)
      $\log^* 2 = 1$
      $\log^* 4 = 2$
      $\log^* 16 = 3$
      $\log^* 65536 = 4$
      $\log^* 2^{65536} = 5$ (a number with about 20,000 digits!)
      $\log^* 2^{2^{65536}} = 6$
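A direct transcription of the definition of $\log^* n$ with base-2 logs, as a sketch. It relies on CPython's `math.log2` accepting arbitrarily large integers, so $2^{65536}$ can be passed as an exact int; $2^{2^{65536}}$ is far too large to construct.

```python
# Iterated logarithm: how many times log2 must be applied before the
# value drops to at most 1.
import math


def log_star(n):
    k = 0
    while n > 1:
        n = math.log2(n)
        k += 1
    return k


for n in (2, 4, 16, 65536, 2 ** 65536):
    print(log_star(n))  # prints 1, 2, 3, 4, 5 (one per line)
```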
  • 22. COMPLEX COMPLEXITY OF WEIGHTED UNION + PATH COMPRESSION
    - A sequence of m weighted-union and find operations with path compression on a set of n elements has worst-case cost $O(m \log^* n)$ (Hopcroft and Ullman, 1973).
    - Later results (Tarjan, 1975; Tarjan and van Leeuwen, 1984) showed that the cost is actually $O(m\,\alpha(m, n))$, where $\alpha$ is the inverse Ackermann function.
    - For all practical purposes this is amortized constant time per operation.
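Putting the pieces together for the maze construction algorithm from slide 8, here is a sketch that uses union by rank plus path compression on a rows × cols grid. The grid encoding (room id = r * cols + c), the names `DisjointSets` and `build_maze`, and the use of `random.shuffle` to pick edges in random order are assumptions for illustration. A wall is knocked down only when its two rooms are not yet connected, so the result is a spanning tree: exactly rows * cols - 1 walls, with one unique path between any two rooms.

```python
# Maze construction with union by rank plus path compression.
import random


class DisjointSets:
    """Union by rank plus path compression; a root is its own parent."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x, y):
        """Merge the sets of x and y; return False if already connected."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True


def build_maze(rows, cols, seed=None):
    """Return the walls (pairs of room ids) to knock down, in removal order."""
    rng = random.Random(seed)

    def room(r, c):
        return r * cols + c

    walls = [(room(r, c), room(r, c + 1)) for r in range(rows) for c in range(cols - 1)]
    walls += [(room(r, c), room(r + 1, c)) for r in range(rows - 1) for c in range(cols)]
    rng.shuffle(walls)  # consider the closed connections in random order
    sets = DisjointSets(rows * cols)
    knocked = []
    for u, v in walls:
        if sets.union(u, v):  # u and v were not yet connected
            knocked.append((u, v))
    return knocked


walls = build_maze(4, 5, seed=1)
print(len(walls))  # 19 = 4 * 5 - 1
```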
  • 23. HOW DO WE HANDLE “BIG DATA”?
    - How can we handle set membership questions in the context of big data?
    - Can we hold 10 billion sets and items in main memory?
    - Is there an alternative to doing a file/database search?
  • 24. SLIGHTLY DIFFERENT CONSTRUCT
    - Does x belong to a set S?
    - A Bloom filter answers either:
      Yes (possibly: with probability p, x may still not be in S), or
      No (x is definitely not in S).
    - Exercise: take the probability p, the Bloom filter execution time x, and the database search time y as inputs. Derive the scenario (using the variables p, x and y) in which using a Bloom filter makes sense.
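One way to set up the exercise, under an added assumption that is not on the slide: let $r$ be the probability that the filter answers "yes" to a query (true members plus false positives), so only that fraction of queries goes on to the database search. Here $x$ and $y$ denote times, as in the exercise. Comparing expected costs per query:

```latex
\begin{aligned}
\mathbb{E}[T_{\text{no filter}}] &= y \\
\mathbb{E}[T_{\text{filter}}]    &= x + r\,y \\
x + r\,y < y \;&\iff\; \frac{x}{y} < 1 - r
\end{aligned}
```

So the filter pays off when its lookup is cheap relative to the database search and a large share of queries is rejected outright.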
  • 25. APPROXIMATE SET MEMBERSHIP PROBLEM
    - Suppose we have a set $S = \{s_1, s_2, \ldots, s_m\}$ drawn from a universe $U$.
    - Represent S in such a way that we can quickly answer "Is x an element of S?"
    - To take as little space as possible, we allow false positives (i.e., $x \notin S$, but we answer yes).
    - If $x \in S$, we must answer yes (that is, there are no false negatives).
  • 26. BLOOM FILTERS
    - A Bloom filter consists of an array A[n] of n bits (space) and k independent random hash functions $h_1, \ldots, h_k : U \to \{0, 1, \ldots, n-1\}$.
    1. Initially, set every bit of the array to 0.
    2. For every $s \in S$, set $A[h_i(s)] = 1$ for $1 \le i \le k$. (An entry can be set to 1 multiple times; only the first time has an effect.)
    3. To check whether $x \in S$, check whether all locations $A[h_i(x)]$ for $1 \le i \le k$ are set to 1. If not, clearly $x \notin S$. If all $A[h_i(x)]$ are set to 1, we report $x \in S$.
  • 27. (Figure: a bit array, initially all 0s. Elements x1 and x2 of S are each hashed k times, and each hash location is set to 1.)
    - To check if y is in S, check its k hash locations. If a 0 appears, y is not in S.
  • 28. (Figure: the same bit array, where all of y's hash locations hold 1s.)
    - If only 1s appear, report that y is in S. This may yield a false positive.
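A compact Bloom filter sketch following steps 1 to 3. The slides assume k independent random hash functions; as a labelled substitute, this code derives the k positions from one SHA-256 digest by double hashing, $h_i(x) = (h_1(x) + i\,h_2(x)) \bmod n$, which is common in practice but is not the idealized model. Items are converted with `str()` before hashing; `BloomFilter` and `might_contain` are illustrative names.

```python
# Bloom filter sketch: n bits, k hash positions per element.
# Assumption: the k positions come from one SHA-256 digest via double
# hashing instead of k truly independent random hash functions.
import hashlib


class BloomFilter:
    def __init__(self, n, k):
        self.n, self.k = n, k
        self.bits = bytearray((n + 7) // 8)  # n bits, all initially 0

    def _positions(self, item):
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # forced odd, hence nonzero
        return [(h1 + i * h2) % self.n for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)  # setting a bit twice is harmless

    def might_contain(self, item):
        # False means "definitely not in S"; True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


bf = BloomFilter(n=10_000, k=7)
for i in range(1000):
    bf.add(f"item-{i}")
print(bf.might_contain("item-42"))    # True: no false negatives
print(bf.might_contain("not-added"))  # usually False, but may be a false positive
```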
  • 29. BLOOM FILTERS: PROBABILITY OF A FALSE POSITIVE
    - We assume the hash functions are random.
    - After all m elements of S are hashed into the Bloom filter array of n bits using all k hash functions, the probability that a specific bit is still 0 is
      $p = \left(1 - \frac{1}{n}\right)^{km} \approx e^{-km/n}$
    - The probability of a false positive is then
      $f = (1 - p)^k \approx \left(1 - e^{-km/n}\right)^k$
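The formula for p can be sanity-checked by simulating the random-hash assumption with a seeded pseudo-random generator. The parameters n = 10,000, m = 1,000 and k = 5 are illustrative (so km/n = 0.5). The sketch compares the exact expression, the exponential approximation and the simulated fraction of zero bits, then prints the predicted false-positive rate f.

```python
# Checking p = (1 - 1/n)^(km) ~ e^(-km/n) by simulation, modelling random
# hash positions with a seeded pseudo-random generator.
import math
import random


def zero_fraction(n, m, k, seed=0):
    rng = random.Random(seed)
    bits = [0] * n
    for _ in range(m * k):  # m elements, k random positions each
        bits[rng.randrange(n)] = 1
    return bits.count(0) / n


n, m, k = 10_000, 1_000, 5
exact = (1 - 1 / n) ** (k * m)
approx = math.exp(-k * m / n)
print(round(exact, 4), round(approx, 4), round(zero_fraction(n, m, k), 4))
print(round((1 - approx) ** k, 4))  # predicted false-positive rate f
```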
  • 30. DESIGNING A BLOOM FILTER
    - Using the desired bound on the probability of a false positive and the expected number of items (m), you can design a Bloom filter by choosing values for n and k.
    - For example, if m = 1,000,000,000, n = 10,000,000,000 and k = 10, then $km/n = 1$ and $f \approx (1 - e^{-1})^{10} \approx 0.01$.
    - This uses 10 billion bits ($1.25 \times 10^9$ bytes, approx. 1.2 GB of RAM).
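The slide's example can be reproduced numerically. The sketch below evaluates the approximate false-positive rate for m = 10^9, n = 10^10, k = 10, and the memory in decimal gigabytes. The `optimal_k` helper is an addition not on the slide: it uses the standard result $k = (n/m)\ln 2$, which minimizes the approximate f.

```python
# Plugging the slide's design numbers into f ~ (1 - e^(-km/n))^k.
import math


def false_positive_rate(m, n, k):
    return (1 - math.exp(-k * m / n)) ** k


def optimal_k(m, n):
    return (n / m) * math.log(2)  # standard minimiser of the approximate f


m, n, k = 1_000_000_000, 10_000_000_000, 10
print(f"f ≈ {false_positive_rate(m, n, k):.4f}")  # f ≈ 0.0102
print(f"memory = {n / 8 / 1e9:.2f} GB")           # memory = 1.25 GB
print(f"optimal k ≈ {optimal_k(m, n):.2f}")       # optimal k ≈ 6.93
```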
  • 31. WHERE WE ARE (PHEW, AT THE END, FINALLY)
    CS 6213
    - Basics: Record / Struct; Arrays / Linked Lists / Stacks / Queues; Graphs / Trees / BSTs
    - Advanced: Trie, B-Tree; Splay Trees; R-Trees; Heaps and PQs; Union Find, Sets