2. Elementary implementations: summary
implementation     worst case          average case           ordered      operations
                   search   insert     search hit   insert    iteration?   on keys
unordered array    N        N          N/2          N         no           equals()
unordered list     N        N          N/2          N         no           equals()
ordered array      lg N     N          lg N         N/2       yes          compareTo()
ordered list       N        N          N/2          N/2       yes          compareTo()
Challenge. Efficient implementations of search and insert.
4. Binary search trees
Def. A BST is a binary tree in symmetric order.
A binary tree is either:
• Empty.
• A key-value pair and two disjoint binary trees.
Symmetric order. Every node’s key is:
• Larger than all keys in its left subtree.
• Smaller than all keys in its right subtree.
[Figure: Binary search tree. A node (key, val, left, right) whose left link leads to a BST with smaller keys and whose right link leads to a BST with larger keys; example tree with keys it, best, was, the, of, times.]
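The symmetric-order definition can be turned into a small checker. The following is a minimal sketch, assuming String keys; the names SymmetricOrderCheck, Node and isBst are illustrative and do not come from the slides.

```java
// Sketch: check the symmetric-order property of a binary tree with String keys.
// SymmetricOrderCheck, Node and isBst are hypothetical names used for illustration.
class SymmetricOrderCheck {
    static class Node {
        String key;
        Node left, right;
        Node(String key, Node left, Node right) {
            this.key = key; this.left = left; this.right = right;
        }
    }

    // Every key in the subtree rooted at x must lie strictly between lo and hi
    // (null means "no bound on that side").
    static boolean isBst(Node x, String lo, String hi) {
        if (x == null) return true;                                // an empty tree is a BST
        if (lo != null && x.key.compareTo(lo) <= 0) return false;  // too small for this position
        if (hi != null && x.key.compareTo(hi) >= 0) return false;  // too large for this position
        return isBst(x.left, lo, x.key) && isBst(x.right, x.key, hi);
    }

    static boolean isBst(Node root) { return isBst(root, null, null); }

    public static void main(String[] args) {
        // Example tree: "it" at the root, "best" to the left, "was" to the right.
        Node the = new Node("the", new Node("of", null, null), new Node("times", null, null));
        Node good = new Node("it", new Node("best", null, null), new Node("was", the, null));
        // "times" in the left subtree of "it" violates symmetric order.
        Node bad = new Node("it", new Node("times", null, null), null);
        System.out.println(isBst(good));  // true
        System.out.println(isBst(bad));   // false
    }
}
```

Passing bounds down the recursion checks each key against all of its ancestors, not just its parent, which is what "all keys in its left subtree" requires.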
5. BST representation
A BST is a reference to a root node.
A Node consists of four fields:
• A Key and a Value.
• A reference to the left and right subtree.
private class Node
{
    private Key key;
    private Value val;
    private Node left;
    private Node right;
}
Key and Value are generic types; Key is Comparable.
[Figure: a node (key, val, left, right) with links to smaller keys and larger keys, and an example tree reached from root: it 2 at the root, then best 1, was 2, the 1, of 1, times 1, with null links at the bottom.]
6. BST implementation (skeleton)
public class BST<Key extends Comparable<Key>, Value>
{
    private Node root;                     // instance variable
    private class Node
    {
        private Key key;                   // instance variables
        private Value val;
        private Node left, right;
        public Node(Key key, Value val)
        {
            this.key = key;
            this.val = val;
        }
    }
    public void put(Key key, Value val)
    { /* see next slides */ }
    public Value get(Key key)
    { /* see next slides */ }
}
7. BST search
Get. Return value corresponding to given key, or null if no such key.
[Figure: Searching in a BST. Successful search for a node with key "the": the is after it, so go to the right; the is before was, so go to the left; success! Unsuccessful search for a node with key "times": times is after it, so go to the right; times is before was, so go to the left; times is after the, but the right link is null, so the BST has no node having that key.]
8. BST search: Java implementation
Get. Return value corresponding to given key, or null if no such key.
public Value get(Key key)
{
    Node x = root;
    while (x != null)
    {
        int cmp = key.compareTo(x.key);
        if      (cmp  < 0) x = x.left;
        else if (cmp  > 0) x = x.right;
        else if (cmp == 0) return x.val;
    }
    return null;
}
Running time. Proportional to depth of node.
9. BST insert
Put. Associate value with key.
[Figure: Inserting a new node into a BST. Insert times: times is after it, so go to the right; times is before was, so go to the left; times is after the, so it goes on the right.]
10. BST insert: Java implementation
Put. Associate value with key.
public void put(Key key, Value val)
{ root = put(root, key, val); }
// concise, but tricky, recursive code; read carefully!
private Node put(Node x, Key key, Value val)
{
    if (x == null) return new Node(key, val);
    int cmp = key.compareTo(x.key);
    if      (cmp  < 0) x.left  = put(x.left,  key, val);
    else if (cmp  > 0) x.right = put(x.right, key, val);
    else if (cmp == 0) x.val = val;
    return x;
}
Running time. Proportional to depth of node.
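The get and put code from the slides can be combined into one runnable class. This is a minimal sketch rather than the lecture's official code: the class name SimpleBST and the depth helper are assumptions added for illustration, and the client is a small word-frequency count.

```java
// Sketch: get and put from the slides in one self-contained class.
// SimpleBST and depth() are hypothetical names added for illustration.
class SimpleBST<Key extends Comparable<Key>, Value> {
    private Node root;

    private class Node {
        Key key;
        Value val;
        Node left, right;
        Node(Key key, Value val) { this.key = key; this.val = val; }
    }

    // Iterative search: follow left or right links until the key is found or a link is null.
    public Value get(Key key) {
        Node x = root;
        while (x != null) {
            int cmp = key.compareTo(x.key);
            if      (cmp < 0) x = x.left;
            else if (cmp > 0) x = x.right;
            else              return x.val;
        }
        return null;
    }

    public void put(Key key, Value val) { root = put(root, key, val); }

    // Recursive insert: returns the (possibly new) root of the subtree.
    private Node put(Node x, Key key, Value val) {
        if (x == null) return new Node(key, val);
        int cmp = key.compareTo(x.key);
        if      (cmp < 0) x.left  = put(x.left,  key, val);
        else if (cmp > 0) x.right = put(x.right, key, val);
        else              x.val   = val;
        return x;
    }

    // Number of links from the root to the node holding key, or -1 if the key is absent.
    public int depth(Key key) {
        int d = 0;
        for (Node x = root; x != null; d++) {
            int cmp = key.compareTo(x.key);
            if (cmp == 0) return d;
            x = (cmp < 0) ? x.left : x.right;
        }
        return -1;
    }

    public static void main(String[] args) {
        SimpleBST<String, Integer> st = new SimpleBST<>();
        for (String w : "it was the best of times it was the worst of times".split(" ")) {
            Integer count = st.get(w);
            st.put(w, count == null ? 1 : count + 1);
        }
        System.out.println(st.get("it") + " " + st.get("worst") + " " + st.get("age")); // 2 1 null
        System.out.println(st.depth("times"));                                           // 3
    }
}
```

The depth of "times" is 3 because the search passes through it, was and the before reaching it, which is the sense in which running time is proportional to depth.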
11. BST construction example
[Figure: Constructing a BST. Successive trees as keys such as it, was, the, best, of, times and worst are inserted one at a time.]
14. Correspondence between BSTs and quicksort partitioning
[Figure: a BST shown level by level: K / E S / C I Q T / A E M R U / L P U / O.]
Remark. Correspondence is 1-1 if no duplicate keys.
15. Tree shape
• Many BSTs correspond to same input data.
• Cost of search/insert is proportional to depth of node.
[Figure: Typical BSTs constructed from randomly ordered keys. Three panels: best case, typical case, worst case.]
Remark. Tree shape depends on order of insertion.
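The dependence of tree shape on insertion order can be observed with a small experiment. The sketch below assumes int keys and a fixed random seed; TreeShapeDemo and its method names are illustrative. Height here counts the nodes on the longest root-to-leaf path.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch: compare BST height for ascending versus shuffled insertion order.
// TreeShapeDemo and its helpers are hypothetical names used for illustration.
class TreeShapeDemo {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Standard BST insertion, written iteratively; duplicate keys are ignored.
    static Node insert(Node root, int key) {
        Node n = new Node(key);
        if (root == null) return n;
        Node x = root;
        while (true) {
            if (key < x.key) {
                if (x.left == null) { x.left = n; break; }
                x = x.left;
            } else if (key > x.key) {
                if (x.right == null) { x.right = n; break; }
                x = x.right;
            } else {
                break;
            }
        }
        return root;
    }

    // Number of nodes on the longest path from x down to a leaf.
    static int height(Node x) {
        return x == null ? 0 : 1 + Math.max(height(x.left), height(x.right));
    }

    static int heightAfterInserting(List<Integer> keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return height(root);
    }

    public static void main(String[] args) {
        int n = 1000;
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) keys.add(i);
        // Ascending order is the worst case: every node has only a right child.
        System.out.println("ascending order: height " + heightAfterInserting(keys));
        Collections.shuffle(keys, new Random(42));
        System.out.println("shuffled order:  height " + heightAfterInserting(keys));
    }
}
```

Ascending insertion yields a height equal to the number of keys, while the shuffled order typically yields a height that is a small multiple of lg N.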
16. BSTs: mathematical analysis
Proposition. If keys are inserted in random order, the expected number of
compares for a search/insert is ~ 2 ln N.
Pf. 1-1 correspondence with quicksort partitioning.
Proposition. [Reed, 2003] If keys are inserted in random order,
expected height of tree is ~ 4.311 ln N – 1.953 ln ln N.
But… Worst-case for search/insert/height is N (but occurs with
exponentially small chance when keys are inserted in random order).
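The natural-log bounds above can be converted to base-2 logarithms, which is where the 1.38 lg N entries in the summary tables come from. A short worked conversion, using only ln N = (ln 2) lg N and ln 2 ≈ 0.693:

```latex
\begin{align*}
2 \ln N     &= (2 \ln 2)\,\lg N     \approx 1.39 \lg N \\
4.311 \ln N &= (4.311 \ln 2)\,\lg N \approx 2.99 \lg N
\end{align*}
```

The tables quote the first constant as 1.38, and the second explains why the leading term of the expected height is close to 3 lg N.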
17. ST implementations: summary
implementation     guarantee           average case            ordered      operations
                   search   insert     search hit   insert     iteration?   on keys
unordered array    N        N          N/2          N          no           equals()
unordered list     N        N          N/2          N          no           equals()
ordered array      lg N     N          lg N         N/2        yes          compareTo()
ordered list       N        N          N/2          N/2        yes          compareTo()
BST                N        N          1.38 lg N    1.38 lg N  ?            compareTo()
Next challenge. Ordered iteration.
18. Inorder traversal
Traversing the tree inorder yields keys in ascending order.
public void show()
{ show(root); }
private void show(Node x)
{
    if (x == null) return;
    show(x.left);
    StdOut.println(x.key + " " + x.val);
    show(x.right);
}
[Figure: Recursive inorder traversal of a binary search tree. Smaller keys, in order; then the key; then larger keys, in order; together, all keys in order.]
To implement an iterator: need a non-recursive version.
19. Non-recursive inorder traversal
To process a node:
• Follow left links until empty (pushing onto stack).
• Pop and process.
• Follow right link (push onto stack).
recursive calls    output    stack contents
visit(E)                     E
visit(B)                     E B
visit(A)                     E B A
print A            A         E B
print B            B         E
visit(C)                     E C
print C            C         E
print E            E         -
visit(S)                     S
visit(I)                     S I
visit(H)                     S I H
print H            H         S I
print I            I         S
visit(N)                     S N
print N            N         S
print S            S         -
[Figure: the traversed tree, level by level: E / B S / A C I / H N.]
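The three steps above translate into a loop with an explicit stack. The sketch below is one possible rendering, assuming String keys and using java.util.ArrayDeque as the stack; InorderWithStack, leaf and inorder are illustrative names. The example tree matches the trace (E at the root, B and S below it).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch: non-recursive inorder traversal with an explicit stack.
// InorderWithStack, leaf and inorder are hypothetical names used for illustration.
class InorderWithStack {
    static class Node {
        String key;
        Node left, right;
        Node(String key, Node left, Node right) {
            this.key = key; this.left = left; this.right = right;
        }
    }

    static Node leaf(String key) { return new Node(key, null, null); }

    static List<String> inorder(Node root) {
        List<String> out = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        Node x = root;
        while (x != null || !stack.isEmpty()) {
            while (x != null) {          // follow left links until empty, pushing onto the stack
                stack.push(x);
                x = x.left;
            }
            x = stack.pop();             // pop and process
            out.add(x.key);
            x = x.right;                 // follow the right link; its left spine is pushed next
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node("E",
                new Node("B", leaf("A"), leaf("C")),
                new Node("S", new Node("I", leaf("H"), leaf("N")), null));
        System.out.println(inorder(root));  // [A, B, C, E, H, I, N, S]
    }
}
```

The stack holds exactly the nodes whose left subtree has been entered but whose key has not yet been output, which mirrors the pending recursive calls in the trace.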
20. Inorder iterator: Java implementation
public Iterator<Key> iterator()
{ return new Inorder(); }
private class Inorder implements Iterator<Key>
{
    private Stack<Node> stack = new Stack<Node>();
    private void pushLeft(Node x)
    {   // go down left spine and push all keys onto stack
        while (x != null)
        { stack.push(x); x = x.left; }
    }
    Inorder()
    { pushLeft(root); }
    public boolean hasNext()
    { return !stack.isEmpty(); }
    public Key next()
    {
        Node x = stack.pop();
        pushLeft(x.right);
        return x.key;
    }
}
21. ST implementations: summary
implementation     guarantee           average case            ordered      operations
                   search   insert     search hit   insert     iteration?   on keys
unordered array    N        N          N/2          N          no           equals()
unordered list     N        N          N/2          N          no           equals()
ordered array      lg N     N          lg N         N/2        yes          compareTo()
ordered list       N        N          N/2          N/2        yes          compareTo()
BST                N        N          1.38 lg N    1.38 lg N  yes          compareTo()
Next challenge. Guaranteed efficiency for search and insert.
22. Searching challenge 3 (revisited):
Problem. Frequency counts in “Tale of Two Cities”
Assumptions. Book has 135,000+ words; about 10,000 distinct words.
Which searching method to use?
1) Unordered array.
2) Unordered linked list
3) Ordered array with binary search.
4) Need better method, all too slow.
5) Doesn’t matter much, all fast enough.
6) BSTs.
insertion cost < 10000 * 1.38 * lg 10000 < .2 million
lookup cost < 135000 * 1.38 * lg 10000 < 2.5 million
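The two bounds can be checked by hand, taking lg 10000 = 4 lg 10 ≈ 13.29 and the average-case cost of 1.38 lg N compares per operation:

```latex
\begin{align*}
\text{insertion cost} &\approx 10{,}000 \times 1.38 \times 13.29 \approx 1.83 \times 10^{5} < 0.2 \text{ million} \\
\text{lookup cost}    &\approx 135{,}000 \times 1.38 \times 13.29 \approx 2.48 \times 10^{6} < 2.5 \text{ million}
\end{align*}
```

Both estimates use the final tree size N = 10,000 for every operation, so they overstate the cost of operations performed while the tree is still small.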
24. Rotation in BSTs
Two fundamental operations to rearrange nodes in a tree.
• Maintain symmetric order.
• Local transformations (change just 3 pointers).
• Basis for advanced BST algorithms.
Strategy. Use rotations on insert to adjust tree shape to be more balanced.
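[Figure: h = rotL(u) turns a tree rooted at u (left subtree A, right child v with subtrees B and C) into a tree rooted at v (left child u with subtrees A and B, right subtree C); h = rotR(v) is the inverse.]
Key point. No change to BST search code (!)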
25. Rotation in BSTs
Two fundamental operations to rearrange nodes in a tree.
• Easier done than said.
• Raises some nodes, lowers others.
[Figure: example rotations, root = rotL(A) and a.right = rotR(S).]
private Node rotL(Node h)
{
    Node v = h.right;
    h.right = v.left;
    v.left = h;
    return v;
}
private Node rotR(Node h)
{
    Node u = h.left;
    h.left = u.right;
    u.right = h;
    return u;
}
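A small demonstration shows that rotations rearrange nodes without disturbing symmetric order. This sketch assumes String keys and wraps the two rotation methods in a hypothetical RotationDemo class; the inorder helper is added only for checking.

```java
// Sketch: rotL and rotR preserve the inorder sequence of keys.
// RotationDemo and inorder are hypothetical names used for illustration.
class RotationDemo {
    static class Node {
        String key;
        Node left, right;
        Node(String key) { this.key = key; }
    }

    // Left rotation: the right child v becomes the root of the subtree.
    static Node rotL(Node h) {
        Node v = h.right;
        h.right = v.left;
        v.left = h;
        return v;
    }

    // Right rotation: the left child u becomes the root of the subtree.
    static Node rotR(Node h) {
        Node u = h.left;
        h.left = u.right;
        u.right = h;
        return u;
    }

    // Concatenate keys in symmetric order.
    static String inorder(Node x) {
        return x == null ? "" : inorder(x.left) + x.key + inorder(x.right);
    }

    public static void main(String[] args) {
        // B at the root with left child A and right child D; D has children C and E.
        Node b = new Node("B");
        Node d = new Node("D");
        b.left = new Node("A");
        b.right = d;
        d.left = new Node("C");
        d.right = new Node("E");

        Node h = rotL(b);                               // D moves up, B moves down
        System.out.println(h.key + " " + inorder(h));   // D ABCDE
        h = rotR(h);                                    // undo: B is the root again
        System.out.println(h.key + " " + inorder(h));   // B ABCDE
    }
}
```

Because each method returns the new subtree root, the caller is responsible for storing it in the parent's link (or in root), which is the third pointer change mentioned on the slide.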
26. BST root insertion
Insert a node and make it the new root.
• Insert node at bottom, as in standard BST.
• Rotate inserted node to the root.
• Compact recursive implementation.
[Figure: insert G.]
// tricky recursive code; read very carefully!
private Node putRoot(Node x, Key key, Value val)
{
    if (x == null) return new Node(key, val);
    int cmp = key.compareTo(x.key);
    if (cmp < 0)
    {
        x.left = putRoot(x.left, key, val);
        x = rotR(x);
    }
    else if (cmp > 0)
    {
        x.right = putRoot(x.right, key, val);
        x = rotL(x);
    }
    else if (cmp == 0) x.val = val;
    return x;
}
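Root insertion can be tried on a small key sequence. The following sketch assumes char keys without values; RootInsertionDemo and inorder are illustrative names. It inserts the letters A S E R C H I N G X M P L and prints the root after each insertion.

```java
// Sketch: root insertion with char keys (values omitted for brevity).
// RootInsertionDemo and inorder are hypothetical names used for illustration.
class RootInsertionDemo {
    static class Node {
        char key;
        Node left, right;
        Node(char key) { this.key = key; }
    }

    static Node rotL(Node h) { Node v = h.right; h.right = v.left; v.left = h; return v; }
    static Node rotR(Node h) { Node u = h.left;  h.left = u.right; u.right = h; return u; }

    // Insert at the bottom, then rotate the new node up one level per returning call.
    static Node putRoot(Node x, char key) {
        if (x == null) return new Node(key);
        if (key < x.key) {
            x.left = putRoot(x.left, key);   // new node is now the root of the left subtree
            x = rotR(x);                     // lift it above x
        } else if (key > x.key) {
            x.right = putRoot(x.right, key); // new node is now the root of the right subtree
            x = rotL(x);                     // lift it above x
        }
        return x;
    }

    static String inorder(Node x) {
        return x == null ? "" : inorder(x.left) + x.key + inorder(x.right);
    }

    public static void main(String[] args) {
        Node root = null;
        for (char c : "ASERCHINGXMPL".toCharArray()) {
            root = putRoot(root, c);
            System.out.print(root.key);      // the most recently inserted key is always the root
        }
        System.out.println();
        System.out.println(inorder(root));   // ACEGHILMNPRSX
    }
}
```

The first printed line repeats the insertion sequence, since each new key ends up at the root, and the second confirms that symmetric order is preserved.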
27. BST root insertion: construction
Ex. A S E R C H I N G X M P L
Why bother?
• Recently inserted keys are near the top (better for some clients).
• Basis for randomized BST.
28. Randomized BSTs (Roura, 1996)
Intuition. If tree is random, height is logarithmic.
Fact. Each node in a random tree is equally likely to be the root.
Idea. Since new node should be the root with probability 1/(N+1),
make it the root (via root insertion) with probability 1/(N+1),
and apply the idea recursively.
private Node put(Node x, Key key, Value val)
{
    if (x == null) return new Node(key, val);
    int cmp = key.compareTo(x.key);
    if (cmp == 0) { x.val = val; return x; }       // no rotations if key in table
    if (StdRandom.bernoulli(1.0 / (x.N + 1.0)))    // root insert with the right probability
        return putRoot(x, key, val);
    if      (cmp < 0) x.left  = put(x.left,  key, val);
    else if (cmp > 0) x.right = put(x.right, key, val);
    x.N++;                                         // maintain count of nodes in subtree rooted at x
    return x;
}
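A runnable version needs the subtree counts to stay correct through rotations, which the slide code leaves implicit. The sketch below makes several assumptions that are not in the slides: int keys without values, java.util.Random in place of StdRandom.bernoulli, a fixed seed, a fix helper that recomputes counts after each rotation, and a contains check so that only new keys are inserted.

```java
import java.util.Random;

// Sketch: randomized BST insertion with subtree counts kept up to date.
// RandomizedBSTDemo, fix, contains and sizesOk are hypothetical names for illustration.
class RandomizedBSTDemo {
    static class Node {
        int key;
        int n = 1;                 // number of nodes in the subtree rooted here
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static final Random RND = new Random(2024);   // fixed seed (assumption) for repeatable runs

    static int size(Node x) { return x == null ? 0 : x.n; }
    static void fix(Node x) { x.n = 1 + size(x.left) + size(x.right); }

    static Node rotL(Node h) { Node v = h.right; h.right = v.left; v.left = h; fix(h); fix(v); return v; }
    static Node rotR(Node h) { Node u = h.left;  h.left = u.right; u.right = h; fix(h); fix(u); return u; }

    // Root insertion of a key not yet in the tree; rotations repair the counts.
    static Node putRoot(Node x, int key) {
        if (x == null) return new Node(key);
        if (key < x.key) { x.left  = putRoot(x.left,  key); x = rotR(x); }
        else             { x.right = putRoot(x.right, key); x = rotL(x); }
        return x;
    }

    static boolean contains(Node x, int key) {
        while (x != null) {
            if      (key < x.key) x = x.left;
            else if (key > x.key) x = x.right;
            else                  return true;
        }
        return false;
    }

    // Randomized insertion; keys already present are left alone.
    static Node put(Node root, int key) {
        return contains(root, key) ? root : insert(root, key);
    }

    private static Node insert(Node x, int key) {
        if (x == null) return new Node(key);
        if (RND.nextDouble() < 1.0 / (x.n + 1.0)) return putRoot(x, key);  // new key becomes subtree root
        if (key < x.key) x.left  = insert(x.left,  key);
        else             x.right = insert(x.right, key);
        x.n++;
        return x;
    }

    static int height(Node x) {
        return x == null ? 0 : 1 + Math.max(height(x.left), height(x.right));
    }

    static boolean sizesOk(Node x) {
        return x == null
            || (x.n == 1 + size(x.left) + size(x.right) && sizesOk(x.left) && sizesOk(x.right));
    }

    public static void main(String[] args) {
        Node root = null;
        for (int k = 0; k < 1000; k++) root = put(root, k);   // ascending order: worst case for a plain BST
        System.out.println("size " + size(root) + ", height " + height(root));
    }
}
```

Even with keys arriving in ascending order, the resulting height is typically a small multiple of lg N rather than N.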
32. Randomized BST: analysis
Proposition. Randomized BSTs have the same distribution as BSTs under
random insertion order, no matter in what order keys are inserted.
[Figure: Typical BSTs constructed from randomly ordered keys.]
• Expected height is ~ 4.31107 ln N.
• Average search cost is ~ 2 ln N.
• Exponentially small chance of bad balance.
Implementation cost. Need to maintain subtree size in each node.
33. ST implementations: summary
implementation     guarantee           average case            ordered      operations
                   search   insert     search hit   insert     iteration?   on keys
unordered array    N        N          N/2          N          no           equals()
unordered list     N        N          N/2          N          no           equals()
ordered array      lg N     N          lg N         N/2        yes          compareTo()
ordered list       N        N          N/2          N/2        yes          compareTo()
BST                N        N          1.38 lg N    1.38 lg N  yes          compareTo()
randomized BST     3 lg N   3 lg N     1.38 lg N    1.38 lg N  yes          compareTo()
Bottom line. Randomized BSTs provide the desired guarantee
(probabilistic, with exponentially small chance of linear time).
Bonus. Randomized BSTs also support delete (!)
35. BST deletion: lazy approach
To remove a node with a given key:
• Set its value to null.
• Leave key in tree to guide searches (but don't consider it equal to search key).
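[Figure: delete I. The tree (E at the root; A and S below it; then C and I; then H and R; then N) keeps its shape, and the node for I remains as a tombstone.]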
Cost. O(log N') per insert, search, and delete, where N' is the number of
elements ever inserted in the BST.
Unsatisfactory solution. Tombstone overload.
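Lazy deletion needs very little code, which the following sketch illustrates. It assumes String keys and Integer values, treats a null value as the tombstone marker, and lets a later put of the same key revive the tombstone (a design choice made here, not stated on the slide). LazyDeleteBST and nodeCount are illustrative names.

```java
// Sketch: lazy deletion with tombstones; a null value marks a deleted key.
// LazyDeleteBST, find and nodeCount are hypothetical names used for illustration.
class LazyDeleteBST {
    private static class Node {
        final String key;
        Integer val;               // null marks a tombstone
        Node left, right;
        Node(String key, Integer val) { this.key = key; this.val = val; }
    }

    private Node root;
    private int nodes;             // N': nodes ever created, tombstones included

    private Node find(String key) {
        Node x = root;
        while (x != null) {
            int cmp = key.compareTo(x.key);
            if (cmp == 0) return x;
            x = (cmp < 0) ? x.left : x.right;   // tombstones still guide the search
        }
        return null;
    }

    public Integer get(String key) {
        Node x = find(key);
        return x == null ? null : x.val;        // a tombstone reports "no such key"
    }

    public void put(String key, Integer val) { root = put(root, key, val); }

    private Node put(Node x, String key, Integer val) {
        if (x == null) { nodes++; return new Node(key, val); }
        int cmp = key.compareTo(x.key);
        if      (cmp < 0) x.left  = put(x.left,  key, val);
        else if (cmp > 0) x.right = put(x.right, key, val);
        else              x.val   = val;        // overwrites, or revives a tombstone
        return x;
    }

    public void delete(String key) {
        Node x = find(key);
        if (x != null) x.val = null;            // leave the node in place
    }

    public int nodeCount() { return nodes; }

    public static void main(String[] args) {
        LazyDeleteBST st = new LazyDeleteBST();
        for (String k : "E A S C I H R N".split(" ")) st.put(k, 1);
        st.delete("I");
        System.out.println(st.get("I") + " " + st.get("H") + " " + st.nodeCount()); // null 1 8
    }
}
```

The node count never shrinks, which is the "tombstone overload" problem: costs depend on N', the number of keys ever inserted, rather than on the number of live keys.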
36. BST deletion: Hibbard deletion
To remove a node from a BST:
• Zero children: just remove it.
• One child: pass the child up.
• Two children: find the next largest node using right-left*, swap with
next largest, remove as above.
[Figure: three panels showing the cases: zero children, one child, two children.]
37. BST deletion: Hibbard deletion
Unsatisfactory solution. Not symmetric, code is clumsy.
Surprising consequence. Trees not random (!) ⇒ sqrt(N) per op.
Longstanding open problem. Simple and efficient delete for BSTs.
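For reference, the three cases of Hibbard deletion can be sketched as follows. This version assumes char keys and, in the two-children case, moves the successor node into the deleted node's position rather than literally swapping keys; the effect on the key set is the same. HibbardDeleteDemo, min and deleteMin are illustrative names.

```java
// Sketch: Hibbard deletion with char keys.
// HibbardDeleteDemo, min, deleteMin and inorder are hypothetical names for illustration.
class HibbardDeleteDemo {
    static class Node {
        char key;
        Node left, right;
        Node(char key) { this.key = key; }
    }

    static Node put(Node x, char key) {
        if (x == null) return new Node(key);
        if      (key < x.key) x.left  = put(x.left,  key);
        else if (key > x.key) x.right = put(x.right, key);
        return x;
    }

    // Leftmost node of a nonempty subtree.
    static Node min(Node x) {
        while (x.left != null) x = x.left;
        return x;
    }

    // Unlink the leftmost node of a nonempty subtree and return the new subtree root.
    static Node deleteMin(Node x) {
        if (x.left == null) return x.right;
        x.left = deleteMin(x.left);
        return x;
    }

    static Node delete(Node x, char key) {
        if (x == null) return null;
        if      (key < x.key) x.left  = delete(x.left,  key);
        else if (key > x.key) x.right = delete(x.right, key);
        else {
            if (x.right == null) return x.left;   // zero children, or only a left child
            if (x.left  == null) return x.right;  // only a right child: pass the child up
            Node t = x;
            x = min(t.right);                     // next largest: go right, then left as far as possible
            x.right = deleteMin(t.right);         // remove it from the right subtree
            x.left  = t.left;                     // and put it in the deleted node's place
        }
        return x;
    }

    static String inorder(Node x) {
        return x == null ? "" : inorder(x.left) + x.key + inorder(x.right);
    }

    public static void main(String[] args) {
        Node root = null;
        for (char c : "EASCIXHRN".toCharArray()) root = put(root, c);
        root = delete(root, 'I');                 // two children: N takes I's place
        System.out.println(inorder(root));        // ACEHNRSX
    }
}
```

The asymmetry noted on the slide is visible in the code: the replacement always comes from the right subtree.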
38. Randomized BST deletion
To delete a node containing a given key:
• Find the node containing the key.
• Remove the node.
• Join its two subtrees to make a tree.
Ex. Delete S.
[Figure: example tree, level by level: E / A S / C I X / H R / N.]
39. Randomized BST deletion
To delete a node containing a given key:
• Find the node containing the key.
• Remove the node.
• Join its two subtrees to make a tree.
private Node remove(Node x, Key key)
{
    if (x == null) return null;
    int cmp = key.compareTo(x.key);
    if (cmp < 0)
        x.left = remove(x.left, key);
    else if (cmp > 0)
        x.right = remove(x.right, key);
    else if (cmp == 0)
        return join(x.left, x.right);
    return x;
}
Ex. Delete S.
[Figure: with S removed, join these two subtrees: the one rooted at I (with H, R and N) and the one rooted at X.]
40. Randomized BST join
To join two subtrees with all keys in one less than all keys in the other:
• Maintain counts of nodes in subtrees a and b.
• With probability |a|/(|a|+|b|):
- root = root of a
- left subtree = left subtree of a
- right subtree = join b and right subtree of a
• With probability |b|/(|a|+|b|) do the symmetric operations.
[Figure: to join the subtree rooted at I (I, H, R, N) with the subtree rooted at X, make I the root with probability 4/5 and recursively join the subtrees rooted at R and X to form the right subtree of I.]
41. Randomized BST join
private Node join(Node a, Node b)
{
    if (a == null) return b;
    if (b == null) return a;
    if (StdRandom.bernoulli((double) a.N / (a.N + b.N)))
    { a.right = join(a.right, b); return a; }
    else
    { b.left = join(a, b.left); return b; }
}
[Figure: to join the subtree rooted at R (with left child N) and the subtree X, make R the root with probability 2/3; the result is R with children N and X.]
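Deletion by join can be exercised end to end once subtree counts are maintained. The sketch below makes assumptions beyond the slides: char keys, java.util.Random with a fixed seed instead of StdRandom.bernoulli, plain (non-randomized) insertion to build the example tree, and a fix helper that recomputes counts on the way back up. RandomizedJoinDemo is an illustrative name.

```java
import java.util.Random;

// Sketch: randomized join and remove with subtree counts kept up to date.
// RandomizedJoinDemo, fix and inorder are hypothetical names used for illustration.
class RandomizedJoinDemo {
    static class Node {
        char key;
        int n = 1;                 // number of nodes in the subtree rooted here
        Node left, right;
        Node(char key) { this.key = key; }
    }

    static final Random RND = new Random(7);   // fixed seed (assumption) for repeatable runs

    static int size(Node x) { return x == null ? 0 : x.n; }
    static void fix(Node x) { x.n = 1 + size(x.left) + size(x.right); }

    // Plain BST insertion, used only to build the example tree.
    static Node put(Node x, char key) {
        if (x == null) return new Node(key);
        if      (key < x.key) x.left  = put(x.left,  key);
        else if (key > x.key) x.right = put(x.right, key);
        fix(x);
        return x;
    }

    // Join two trees where every key in a is smaller than every key in b.
    static Node join(Node a, Node b) {
        if (a == null) return b;
        if (b == null) return a;
        if (RND.nextDouble() < (double) a.n / (a.n + b.n)) {
            a.right = join(a.right, b);   // a's root wins with probability |a|/(|a|+|b|)
            fix(a);
            return a;
        } else {
            b.left = join(a, b.left);     // otherwise b's root wins
            fix(b);
            return b;
        }
    }

    static Node remove(Node x, char key) {
        if (x == null) return null;
        if      (key < x.key) x.left  = remove(x.left,  key);
        else if (key > x.key) x.right = remove(x.right, key);
        else                  return join(x.left, x.right);
        fix(x);
        return x;
    }

    static String inorder(Node x) {
        return x == null ? "" : inorder(x.left) + x.key + inorder(x.right);
    }

    public static void main(String[] args) {
        Node root = null;
        for (char c : "EASCIXHRN".toCharArray()) root = put(root, c);
        root = remove(root, 'S');
        System.out.println(inorder(root) + " size " + size(root));  // ACEHINRX size 8
        System.out.println("new right child of E: " + root.right.key);
    }
}
```

Whichever root wins each random choice, the inorder sequence is unchanged; only the shape of the joined subtree varies from run to run or seed to seed.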
42. Randomized BST deletion
To delete a node containing a given key:
• Find the node containing the key.
• Remove the node.
• Join its two subtrees to make a tree.
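[Figure: the tree before and after deleting S. Before: E / A S / C I X / H R / N. After the join, I takes the place of S, with children H and R; N and X hang below R.]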
Proposition. Tree still random after delete (!).
Bottom line. Logarithmic guarantee for search/insert/delete.
43. ST implementations: summary
implementation     guarantee                    average case                        ordered      operations
                   search   insert   delete     search hit   insert      delete     iteration?   on keys
unordered array    N        N        N          N/2          N           N/2        no           equals()
unordered list     N        N        N          N/2          N           N/2        no           equals()
ordered array      lg N     N        N          lg N         N/2         N/2        yes          compareTo()
ordered list       N        N        N          N/2          N/2         N/2        yes          compareTo()
BST                N        N        N          1.38 lg N    1.38 lg N   ?          yes          compareTo()
randomized BST     3 lg N   3 lg N   3 lg N     1.38 lg N    1.38 lg N   1.38 lg N  yes          compareTo()
Bottom line. Randomized BSTs provide the desired guarantee
(probabilistic, with exponentially small chance of linear time).
Next lecture. Can we do better?
Editor's notes
Key extends Comparable: not actually inheritance; it means that whatever generic type Key the client specifies must implement the Comparable interface
Use GrowingTree WebStart application to demo
Here depth of node = depth of node containing the key (or bottom of tree where search ends)
Search, then insert
Use GrowingTree WebStart application to demo
iterative version more cumbersome, but not difficult
Here depth of node = depth of node containing the key (or bottom of tree where search ends)
Iterative version possible, but more cumbersome
variance = O(1) for both propositions
worst case = keys inserted in ascending order
Reference: The Height of a Random Binary Search Tree by Bruce Reed, JACM 2003
Approach. Mimic recursive inorder traversal.
The rotate methods return the resulting subtree. The third pointer changed is the pointer of the parent.
exact same code as before, except for rotations
Note: root inserting N keys inorder results in same tree as inserting them in reverse order
The 4.31107 ln N term is actually 4.31107 ln N - 1.95 ln ln N. The second term might have an impact for moderate values of N
Note 4.31107 ln N ~ 2.99 lg N
right-left* means right-left-left-left-left-left until you find the next largest
Hibbard = person credited with deletion strategy
Conjecture (that I think is still unresolved): symmetric version of Hibbard deletion leads to O(log N) performance
Note figure contains duplicate keys...
Two roots have a battle. Which one should win?
Just because X lost the battle with I doesn't mean that X should be at bottom of tree. Now it gets to throwdown with subtree rooted at R to see who becomes the right child
an illustration created by Ellen Kim '10, inspired by randomized BST lecture