Time complexity of union find
1. Dec. 09, 2015
Wei Li
Zehao Cai
Ishan Sharma
Time Complexity of Union Find
1Wei/Zehao/Ishan CSCI 6212/Arora/Fall 2015
2. Algorithm Definition
A disjoint-set data structure keeps track of a set of elements
partitioned into a number of disjoint (non-overlapping) subsets.
The union-find algorithm supports three operations on such a set of elements:
• MAKE-SET(x). Create a new set containing only element x.
• FIND(x). Return a canonical element in the set containing x.
• UNION(x, y). Merge the sets containing x and y.
Implementation: linked list or tree (the tree is the usual choice).
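The three operations can be sketched as a tree-based structure. This is a minimal illustration without any optimization; the class name `DisjointSet` and its method names are assumptions chosen for this example, not part of the slides.

```python
class DisjointSet:
    """Naive disjoint-set forest: every element stores a parent, roots point to themselves."""

    def __init__(self):
        self.parent = {}

    def make_set(self, x):
        # MAKE-SET(x): x becomes the root of a new one-element tree.
        self.parent[x] = x

    def find(self, x):
        # FIND(x): follow parent links up to the root, the canonical element.
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        # UNION(x, y): hang the root of x's tree under the root of y's tree.
        root_x, root_y = self.find(x), self.find(y)
        if root_x != root_y:
            self.parent[root_x] = root_y


ds = DisjointSet()
for item in "abcd":
    ds.make_set(item)
ds.union("a", "b")
ds.union("c", "d")
same_ab = ds.find("a") == ds.find("b")  # a and b were merged
same_ac = ds.find("a") == ds.find("c")  # a and c are still in different sets
```

Two elements are in the same set exactly when `find` returns the same canonical element for both.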
3. Before the union: Find(b) = c (path b → h → c) and Find(d) = f (path d → f).
After the two trees are united: Find(b) = f (path b → h → c → f).
Quick-find & Quick-union
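The example with nodes b, h, c, d, f can be replayed in code. The sketch below shows it twice: once as quick-union (parent links) and once as quick-find (one set label per node). The dictionary layout and the helper name `quick_find_union` are assumptions made for this illustration.

```python
# Quick-union: each node stores its parent; a root is its own parent.
parent = {"b": "h", "h": "c", "c": "c", "d": "f", "f": "f"}


def find(x):
    # Walk up the parent links until the root is reached.
    while parent[x] != x:
        x = parent[x]
    return x


before_b, before_d = find("b"), find("d")
parent[find("b")] = find("d")  # union: root c is linked under root f
after_b = find("b")

# Quick-find: each node stores the label of its set, so a find is a single lookup,
# but a union has to relabel every member of one set.
labels = {"b": "c", "h": "c", "c": "c", "d": "f", "f": "f"}


def quick_find_union(x, y):
    old, new = labels[x], labels[y]
    for node in labels:
        if labels[node] == old:
            labels[node] = new


quick_find_union("b", "d")
```

Quick-union pays for a cheap union with a find that may walk a long path; quick-find pays for a constant-time find with a union that touches every element.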
4. Definition: The rank of a node x is an upper bound on the height of x.
When performing the operation Union(x, y), where x and y are the roots of
the two trees, we compare rank(x) and rank(y):
• If rank(x) < rank(y), make y the parent of x.
• If rank(x) > rank(y), make x the parent of y.
• If rank(x) = rank(y), make y the parent of x and increase the rank of
y by one.
First Optimization: Union By Rank Heuristic
Note. With union by rank alone (no path compression), rank = height.
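The three rank cases translate directly into code. This is a minimal sketch; the module-level dictionaries `parent` and `rank` and the choice of rank 0 for a new set are assumptions of the example.

```python
parent = {}
rank = {}


def make_set(x):
    parent[x] = x
    rank[x] = 0  # assumption: a one-element tree starts with rank 0


def find(x):
    while parent[x] != x:
        x = parent[x]
    return x


def union(x, y):
    x, y = find(x), find(y)  # compare the ranks of the two roots
    if x == y:
        return
    if rank[x] < rank[y]:
        parent[x] = y
    elif rank[x] > rank[y]:
        parent[y] = x
    else:
        parent[x] = y  # equal ranks: y becomes the parent ...
        rank[y] += 1   # ... and its rank grows by one


for v in range(5):
    make_set(v)
union(0, 1)  # equal ranks: 1 becomes the root with rank 1
union(2, 3)  # equal ranks: 3 becomes the root with rank 1
union(0, 2)  # roots 1 and 3 have equal ranks: 3 becomes the root with rank 2
union(4, 0)  # rank 0 against rank 2: 4 is attached under 3, no rank change
```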
5. During the execution of Find(e), e and all intermediate vertices on
the path from e to the root are made children of the root x.
Second Optimization: Path Compression
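A two-pass Find is one common way to write path compression: the first pass locates the root, the second re-attaches every visited node to it. The chain e → d → c → x below is a made-up example in the spirit of the slide.

```python
# Chain e -> d -> c -> x, where x is the root.
parent = {"x": "x", "c": "x", "d": "c", "e": "d"}


def find(node):
    # Pass 1: locate the root.
    root = node
    while parent[root] != root:
        root = parent[root]
    # Pass 2: make every node on the path a child of the root.
    while node != root:
        next_node = parent[node]
        parent[node] = root
        node = next_node
    return root


root_of_e = find("e")
```

After the call, e, d and c all point directly at x, so later finds on any of them take a single step.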
7. Algorithm                                Worst-case time
Quick-find                                  $mn$
Quick-union                                 $mn$
QU + Union by rank                          $n + m \log n$
QU + Path compression                       $n + m \log n$
QU + Union by rank + Path compression       $n + m \log^* n$
Worst-case time for m union-find operations on a set of n objects.
Time Complexity
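The last row of the table combines both optimizations. A compact sketch of that variant follows; the class name `UnionFind` and the integer element ids 0..n-1 are assumptions of this example.

```python
class UnionFind:
    """Disjoint-set forest with union by rank and path compression."""

    def __init__(self, n):
        self.parent = list(range(n))  # every element starts as its own root
        self.rank = [0] * n

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while x != root:  # path compression
            next_x = self.parent[x]
            self.parent[x] = root
            x = next_x
        return root

    def union(self, x, y):
        """Merge the sets of x and y; return False if they were already merged."""
        x, y = self.find(x), self.find(y)
        if x == y:
            return False
        if self.rank[x] > self.rank[y]:
            x, y = y, x  # make x the root of smaller (or equal) rank
        self.parent[x] = y
        if self.rank[x] == self.rank[y]:
            self.rank[y] += 1
        return True


uf = UnionFind(6)
uf.union(0, 1)
uf.union(2, 3)
uf.union(1, 3)
connected = uf.find(0) == uf.find(2)
separate = uf.find(4) != uf.find(5)
```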
8. Lemma 1: As Find follows the path toward the root, the ranks of the
nodes it encounters are strictly increasing.
Union: the root with the smaller rank is attached to the root with the greater
rank, never the other way round; on a tie the new root's rank grows by one.
Find: all nodes visited along the path are attached to the root,
which has a larger rank than any of its children.
9. Lemma 2: A tree whose root u has rank r contains
at least $2^r$ nodes.
Proof: Initially each node is the root of its own tree with rank 0, so the claim is trivially true ($2^0 = 1$).
Assume that every tree whose root has rank r contains at least $2^r$ nodes. Then when two
trees of rank r are united by rank and form a tree of rank r + 1, the new
tree has at least $2^r + 2^r = 2^{r+1}$ nodes.
10. Lemma 3: The number of nodes of rank r is
at most $n/2^r$.
Proof: From Lemma 2, we know that a node u which is the root of a subtree
of rank r has at least $2^r$ nodes in that subtree. Subtrees rooted at different nodes of rank r
share no nodes, so the number of nodes of rank r is largest when each of them is
the root of a tree that has exactly $2^r$ nodes. In this case, the number of nodes of rank r is $n/2^r$.
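The three lemmas can be checked empirically on a random sequence of operations. The sketch below is only a sanity check, not a proof; the function name `check_rank_lemmas`, the tracked `size` array, and the random workload are assumptions of this example.

```python
import random


def check_rank_lemmas(n, unions, seed=0):
    rng = random.Random(seed)
    parent = list(range(n))
    rank = [0] * n
    size = [1] * n  # number of nodes in the tree; only meaningful for roots

    def find(x):
        root = x
        while parent[root] != root:
            root = parent[root]
        while x != root:  # path compression
            next_x = parent[x]
            parent[x] = root
            x = next_x
        return root

    for _ in range(unions):
        a, b = find(rng.randrange(n)), find(rng.randrange(n))
        if a == b:
            continue
        if rank[a] > rank[b]:
            a, b = b, a  # a is the root of smaller (or equal) rank
        parent[a] = b
        size[b] += size[a]
        if rank[a] == rank[b]:
            rank[b] += 1

    # Lemma 1: ranks strictly increase along every parent link.
    lemma1 = all(rank[v] < rank[parent[v]] for v in range(n) if parent[v] != v)
    # Lemma 2: a tree whose root has rank r contains at least 2^r nodes.
    lemma2 = all(size[v] >= 2 ** rank[v] for v in range(n) if parent[v] == v)
    # Lemma 3: at most n / 2^r nodes have rank r.
    lemma3 = all(
        sum(1 for v in range(n) if rank[v] == r) <= n / 2 ** r
        for r in range(max(rank) + 1)
    )
    return lemma1, lemma2, lemma3


results = check_rank_lemmas(n=1024, unions=2000)
```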
11. We define a "bucket" here: a bucket is a set that contains the vertices
whose ranks lie in a particular range.
Proof
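12. $\log^* n$
Definition: For every non-negative integer n, $\log^* n$ is defined as
$$\log^* n := \begin{cases} 0 & \text{if } n \le 1 \\ 1 + \log^*(\log n) & \text{if } n > 1 \end{cases}$$
where $\log$ is the base-2 logarithm.
We have $\log^* n \le 5$ unless n exceeds the number of atoms in the universe.
$\log^* 2 = \log^* 2^1 = 1 + \log^* 2^0 = 1$
$\log^* 4 = \log^* 2^2 = 1 + \log^* 2^1 = 2$
$\log^* 16 = \log^* 2^{2^2} = 1 + \log^* 2^2 = 3$
$\log^* 65536 = \log^* 2^{2^{2^2}} = 1 + \log^* 2^{2^2} = 4$
$\log^* 2^{65536} = \log^* 2^{2^{2^{2^2}}} = 1 + \log^* 2^{2^{2^2}} = 5$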
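The definition unrolls into a short loop that counts how many times the base-2 logarithm must be applied before the value drops to 1 or below. The function name `log_star` is an assumption of this sketch.

```python
import math


def log_star(n):
    """Iterated base-2 logarithm: number of log2 applications until n <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count


values = [log_star(k) for k in (1, 2, 4, 16, 65536, 2 ** 65536)]
```

Python integers are unbounded, so the value $2^{65536}$ can be passed in directly.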
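13. We can make two observations about the buckets.
The total number of buckets is $O(\log^* n)$; at most $\log^* n + 1$ of them can contain a rank that actually occurs.
Proof: When we go from one bucket to the next, we add one more two
to the power; that is, the bucket after $[B, 2^B - 1]$ is $[2^B, 2^{2^B} - 1]$.
By Lemma 2 no rank exceeds $\log n$, so the buckets stop once the tower of twos passes $\log n$.
The number of elements in bucket $[B, 2^B - 1]$ is at
most $2n/2^B$.
Proof: By Lemma 3, the number of elements in bucket $[B, 2^B - 1]$ is at most
$$\frac{n}{2^B} + \frac{n}{2^{B+1}} + \frac{n}{2^{B+2}} + \cdots + \frac{n}{2^{2^B - 1}} \le \frac{n}{2^B}\left(1 + \frac{1}{2} + \frac{1}{4} + \cdots\right) = \frac{2n}{2^B}$$
Proof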
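The bucket boundaries can be listed explicitly for a given n. In this sketch the function name `buckets_for` is an assumption, and the largest possible rank is taken to be $\lfloor \log_2 n \rfloor$, which follows from Lemma 2.

```python
import math


def buckets_for(n):
    """List the rank buckets [B, 2^B - 1] that can contain a rank for n elements."""
    max_rank = int(math.log2(n))  # a root of rank r needs at least 2^r nodes
    buckets = []
    low = 0
    while low <= max_rank:
        buckets.append((low, 2 ** low - 1))
        low = 2 ** low  # the next bucket starts one level higher in the tower of twos
    return buckets


small = buckets_for(1000)
large = buckets_for(65536)
```

Each bucket starts at a power tower of twos, which is why so few buckets are needed even for very large n.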
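14. Let F represent the list of "find" operations performed, and let
T1 = the number of links followed in F that lead directly to a root,
T2 = the number of links followed in F between nodes in different buckets,
T3 = the number of links followed in F between nodes in the same bucket.
Then the total cost of m finds is T = T1 + T2 + T3.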
Proof
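16. T1: each find follows at most one link that leads to the root, a constant cost per operation, so T1 = $O(m)$.
T2: each find can cross from one bucket into another at most once per bucket, and there are $O(\log^* n)$ buckets, so T2 = $O(m \log^* n)$.
T3: sum over all buckets and over all nodes in each bucket. A node in bucket $[B, 2^B - 1]$ can be charged at most $2^B - 1 - B$ times before its parent lies in a higher bucket, and by Lemma 3 the bucket holds at most $\sum_{j=B}^{2^B - 1} n/2^j \le 2n/2^B$ nodes. Therefore
$$T_3 \le \sum_{\text{buckets}} \left(2^B - 1 - B\right) \sum_{j=B}^{2^B - 1} \frac{n}{2^j} \le \sum_{\text{buckets}} \left(2^B - 1 - B\right) \frac{2n}{2^B} \le \sum_{\text{buckets}} 2^B \cdot \frac{2n}{2^B} = \sum_{\text{buckets}} 2n = O(n \log^* n)$$
T = T1 + T2 + T3 = $O(m) + O(m \log^* n) + O(n \log^* n)$
Since $m \ge n$, T = $O(m \log^* n)$.
Proof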
17. Algorithm                               Worst-case time
Quick-find                                  $mn$
Quick-union                                 $mn$
QU + Union by rank                          $n + m \log n$
QU + Path compression                       $n + m \log n$
QU + Union by rank + Path compression       $n + m \log^* n$
Worst-case time for m union-find operations on a set of n objects.
Time Complexity
18. Algorithm & Time Complexity
• The data structure is simple and the algorithm is easy to implement.
• The time complexity is hard to prove (Robert Endre Tarjan, 1975).
• The time complexity is nearly linear: $n + m \log^* n$ for m operations on n objects.
Applications
• Keep track of the connected components of an undirected
graph;
• Find minimum spanning tree of a graph.
Conclusions
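Both applications fit in one short sketch. Kruskal's algorithm is used here as one well-known way to build a minimum spanning tree with union-find; the slides do not name a specific algorithm, and the function name `kruskal`, the edge format `(weight, u, v)`, and the sample graph are assumptions of this example.

```python
def kruskal(num_vertices, edges):
    """Return (tree edges, total weight, number of connected components)."""
    parent = list(range(num_vertices))
    rank = [0] * num_vertices

    def find(x):
        root = x
        while parent[root] != root:
            root = parent[root]
        while x != root:  # path compression
            next_x = parent[x]
            parent[x] = root
            x = next_x
        return root

    def union(x, y):
        x, y = find(x), find(y)
        if x == y:
            return False  # same component: the edge would close a cycle
        if rank[x] > rank[y]:
            x, y = y, x
        parent[x] = y
        if rank[x] == rank[y]:
            rank[y] += 1
        return True

    tree, total = [], 0
    for weight, u, v in sorted(edges):  # cheapest edges first
        if union(u, v):
            tree.append((u, v, weight))
            total += weight
    components = sum(1 for v in range(num_vertices) if find(v) == v)
    return tree, total, components


# Vertices 0..3 are connected; vertex 4 has no edges.
edges = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (7, 1, 3), (3, 2, 3)]
tree, total, components = kruskal(5, edges)
```

Because vertex 4 is isolated, the result is a minimum spanning forest, and the component count reports two connected components.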