2. Space and Time Tradeoffs
Space and time tradeoffs in algorithm design are a
well-known issue.
Example: computing values of a function at many points.
One type of technique is to use extra space to
facilitate faster and/or more flexible access to the
data.
This approach is called prestructuring.
We illustrate this approach with hashing.
3. Hashing
A dictionary is a set that supports operations
of searching, insertion, and deletion.
Each element in the set contains a key and
satellite data (the remainder of the record).
The keys are unique, but the satellite data need
not be.
A hash table is an effective data structure for
implementing dictionaries.
Hashing is based on the idea of distributing
keys among a one-dimensional array.
4. Direct-address Tables
Suppose that an application needs a dynamic set in
which each element has a key drawn from the
universe U = {0, 1, …, m-1}, where m is not too
large. Denote the direct-address table by T[0..m-1], in
which each position, or slot, corresponds to a key in
the universe U.
Operations
DIRECT-ADDRESS-SEARCH(T, k) O(1)
return T[k]
DIRECT-ADDRESS-INSERT(T, x) O(1)
T[key[x]] ← x
DIRECT-ADDRESS-DELETE(T, x) O(1)
T[key[x]] ← NIL
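The three operations above can be sketched in Python as follows. This is only an illustrative sketch: the class name, the method names, and the choice to store (key, value) pairs are assumptions, and None plays the role of NIL.

```python
class DirectAddressTable:
    """Direct-address table over the key universe {0, 1, ..., m-1}."""

    def __init__(self, m):
        # One slot per possible key; None stands in for NIL.
        self.slots = [None] * m

    def search(self, k):
        # DIRECT-ADDRESS-SEARCH: O(1), returns None if key k is absent.
        return self.slots[k]

    def insert(self, key, value):
        # DIRECT-ADDRESS-INSERT: O(1), the key is used directly as the index.
        self.slots[key] = (key, value)

    def delete(self, key):
        # DIRECT-ADDRESS-DELETE: O(1), reset the slot to NIL.
        self.slots[key] = None


table = DirectAddressTable(10)
table.insert(3, "three")
table.insert(7, "seven")
table.delete(7)
```

Every operation is a single array access, which is why the table needs one slot for each key in U, whether or not that key is ever stored.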
5. Hash Tables
A hash table is used when the set K of keys stored in
the dictionary is much smaller than the universe U = {0,
1, …, n-1} of all possible keys.
Example: the key space of character strings.
A hash table requires much less storage than a direct-address
table, while the search cost is still O(1) on average.
An example of a hash table
Direct addressing vs. Hashing
Direct addressing: an element with key k is stored in slot k;
Hashing: an element with key k is stored in slot h(k), where h
is the hash function.
6. Hash Tables
A hash function assigns an integer between 0 and m-1,
called the hash address, to a key.
An example hash function: h(K) = K mod m
Integer keys (example)
Character keys: ord(K), the position of the key in the alphabet.
Character string keys K = c_0 c_1 … c_{s-1}:
h(K) = (Σ_{i=0}^{s-1} ord(c_i)) mod m
or, where C is a constant,
h(K) = (ord(c_{s-1}) C^{s-1} + ord(c_{s-2}) C^{s-2} + … + ord(c_0) C^0) mod m
Let m = 13. Calculate the hash addresses of the following
strings:
A, FOOL, AND, HIS, MONEY, ARE, SOON, PARTED
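A short Python sketch of the two string hash functions may help with this exercise. It assumes uppercase keys with ord(c) taken as the letter's position in the alphabet (A = 1, …, Z = 26) and uses the simple sum-of-positions function for the word list. The function names and the default C = 31 are illustrative assumptions, not values given on the slide.

```python
def ord_alpha(c):
    """Position of an uppercase letter in the alphabet: A -> 1, ..., Z -> 26."""
    return ord(c) - ord("A") + 1


def hash_sum(key, m):
    """h(K) = (sum of ord(c_i)) mod m."""
    return sum(ord_alpha(c) for c in key) % m


def hash_poly(key, m, C=31):
    """h(K) = (ord(c_{s-1}) C^{s-1} + ... + ord(c_0) C^0) mod m.

    Evaluated with Horner's rule from c_{s-1} down to c_0, reducing mod m
    at every step so intermediate values stay small. C = 31 is an assumed
    constant chosen only for illustration.
    """
    h = 0
    for c in reversed(key):
        h = (h * C + ord_alpha(c)) % m
    return h


words = ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]
addresses = {w: hash_sum(w, 13) for w in words}
# addresses == {'A': 1, 'FOOL': 9, 'AND': 6, 'HIS': 10,
#               'MONEY': 7, 'ARE': 11, 'SOON': 11, 'PARTED': 12}
```

Under these assumptions ARE and SOON receive the same hash address, 11.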
7. Hash Function
A hash function needs to satisfy two
requirements:
It needs to distribute keys among the cells of
the hash table as evenly as possible (m is
usually chosen to be prime).
It has to be easy to compute.
8. Collision and Resolution
Collision: two keys hash to the same
slot.
Collision resolution by open hashing
(separate chaining)
Collision resolution by closed hashing
(open addressing)
9. Open Hashing (Separate Chaining)
Put all the elements that hash to the same
slot in a linked list.
Example
Dictionary Operations
CHAINED-HASH-SEARCH(T, k)
search for an element with key k in list T[h(k)]
CHAINED-HASH-INSERT(T, x) O(1)
insert x at the head of list T[h(key[x])]
CHAINED-HASH-DELETE(T, x)
search and delete x from the list T[h(key[x])]
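The chained operations above could look like this in Python. This is a minimal sketch under stated assumptions: a collections.deque stands in for each linked list, the hash function is the hypothetical sum of letter positions mod m, and insert assumes the key is not already in the table.

```python
from collections import deque


def letter_sum_hash(key, m):
    """Hypothetical hash: sum of letter positions (A=1, ..., Z=26) mod m."""
    return sum(ord(c) - ord("A") + 1 for c in key) % m


class ChainedHashTable:
    def __init__(self, m, hash_fn=letter_sum_hash):
        self.m = m
        self.hash_fn = hash_fn
        # One chain per slot; a deque stands in for the linked list.
        self.slots = [deque() for _ in range(m)]

    def _chain(self, key):
        return self.slots[self.hash_fn(key, self.m)]

    def search(self, key):
        # CHAINED-HASH-SEARCH: scan only the chain T[h(k)].
        for k, value in self._chain(key):
            if k == key:
                return value
        return None

    def insert(self, key, value):
        # CHAINED-HASH-INSERT: O(1) insertion at the head of the chain.
        # Assumes the key is not already present.
        self._chain(key).appendleft((key, value))

    def delete(self, key):
        # CHAINED-HASH-DELETE: search the chain, then unlink the element.
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False


table = ChainedHashTable(13)
for word in ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]:
    table.insert(word, len(word))
```

With this hash function ARE and SOON collide in slot 11, so that chain holds both elements, with SOON at the head because it was inserted last.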
Exercise
10. Cost of Search
Load factor of the hash table
α = n/m, where n is the number of keys and m is
the number of slots in the hash table.
Too small: wastes space, but search is fast
Too large: saves space, but search is slow
The worst case is O(n): all keys hash to the same slot
The average case (keys distributed evenly among the slots)
Average cost of a successful search: O(1 + α), about 1 + α/2 comparisons
Average cost of an unsuccessful search: O(1 + α), about α comparisons
If n is about equal to m, α is about 1 and both are O(1)
11. Closed Hashing (Open Address Hashing)
Open address hashing is
a strategy for storing all elements right in the array of the hash
table, rather than using linked lists to accommodate collisions.
Assumption: m ≥ n
The idea is that if the hash slot for a certain key is occupied by a
different element, then a sequence of alternative locations for the
current element is defined.
For every key k, a probe sequence <h(k, 0), h(k, 1), …, h(k, m-1)>
is generated so that when a collision occurs, we successively
examine, or probe, the hash table until we find an empty slot in
which to put the key.
Probing policies
Linear probing
Quadratic probing
Double hashing
12. Linear Probing
Given an ordinary hash function h’, called an auxiliary hash function,
the method of linear probing uses the hash function
h(k, i) = (h’(k) + i) mod m, for i = 0, 1, …, m-1.
Search
Compare the given key with the key in the probed position until
either the key is found or an empty slot is encountered.
An example
The problem with deletion and the solution
Simply emptying a slot would cut every probe sequence that passes
through it, so keys stored beyond it could no longer be found.
Lazy deletion: mark the previously occupied locations as “obsolete”
to distinguish them from locations that have never been occupied.
Advantage & Disadvantage:
Easy to implement,
but as the load factor approaches 1 it suffers from clustering:
long runs of occupied slots build up, increasing the average search
time.
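Linear probing with lazy deletion can be sketched in Python as below. The sketch makes several assumptions: the auxiliary hash h’ is the hypothetical sum of letter positions, the sentinels EMPTY and DELETED (the "obsolete" marker) and all class and method names are illustrative, and insert assumes the key is not already stored.

```python
EMPTY = object()    # slot that has never been occupied
DELETED = object()  # "obsolete" slot left behind by lazy deletion


def letter_sum(key):
    """Hypothetical auxiliary hash h': sum of letter positions (A=1, ..., Z=26)."""
    return sum(ord(c) - ord("A") + 1 for c in key)


class LinearProbingTable:
    def __init__(self, m, aux_hash=letter_sum):
        self.m = m
        self.aux_hash = aux_hash
        self.slots = [EMPTY] * m

    def _probe(self, key):
        # h(k, i) = (h'(k) + i) mod m, for i = 0, 1, ..., m-1
        start = self.aux_hash(key)
        for i in range(self.m):
            yield (start + i) % self.m

    def insert(self, key):
        # Take the first slot that is empty or obsolete.
        # Assumes the key is not already in the table.
        for j in self._probe(key):
            if self.slots[j] is EMPTY or self.slots[j] is DELETED:
                self.slots[j] = key
                return j
        raise OverflowError("hash table is full")

    def search(self, key):
        # Stop at a never-occupied slot; step over obsolete ones.
        for j in self._probe(key):
            if self.slots[j] is EMPTY:
                return None
            if self.slots[j] is not DELETED and self.slots[j] == key:
                return j
        return None

    def delete(self, key):
        # Lazy deletion: mark the slot obsolete instead of emptying it.
        j = self.search(key)
        if j is not None:
            self.slots[j] = DELETED
        return j


table = LinearProbingTable(13)
words = ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]
positions = {w: table.insert(w) for w in words}
```

Under these assumptions SOON collides with ARE in slot 11 and lands in slot 12, which pushes PARTED around to slot 0. After table.delete("SOON"), a search for PARTED still succeeds because slot 12 is marked obsolete rather than empty.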
Exercise
13. Quadratic Probing
Given an ordinary hash function h’, called an auxiliary hash
function, the method of quadratic probing uses the
hash function
h(k, i) = (h’(k) + c1·i + c2·i^2) mod m,
where i = 0, 1, …, m-1, and c1 and c2 ≠ 0 are constants.
Advantage & Disadvantage:
Easy to implement
It suffers from a milder form of clustering (secondary
clustering): if two keys have the same initial probe position,
then their probe sequences are the same.
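The clustering effect described above can be seen by listing probe sequences directly. The following Python sketch assumes h’(k) = k mod m for integer keys and the constants c1 = 1, c2 = 3 with m = 11. All of these values, and the function names, are illustrative choices rather than values from the slide.

```python
def identity_hash(k):
    """Assumed auxiliary hash h' for integer keys: the key itself."""
    return k


def quadratic_probe_sequence(k, m, c1=1, c2=3, aux_hash=identity_hash):
    """Slots probed for key k: h(k, i) = (h'(k) + c1*i + c2*i^2) mod m."""
    start = aux_hash(k) % m
    return [(start + c1 * i + c2 * i * i) % m for i in range(m)]


seq_10 = quadratic_probe_sequence(10, 11)
seq_21 = quadratic_probe_sequence(21, 11)
# seq_10[:4] == [10, 3, 2, 7]
# Keys 10 and 21 both start at slot 10, so seq_10 == seq_21.
```

Because the offset c1·i + c2·i^2 depends only on i, any two keys with the same initial slot follow an identical sequence.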
14. Double Hashing
Given two auxiliary hash functions h1 and h2,
double hashing uses the hash function
h(k, i) = (h1(k) + i·h2(k)) mod m,
where i = 0, 1, …, m-1.
An example
One of the best methods available for open
addressing.
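A small Python sketch shows how the step size in double hashing depends on the key. It assumes m = 13, h1(k) = k mod 13 and h2(k) = 1 + (k mod 11), a common textbook choice that keeps h2(k) nonzero. These functions and names are assumptions made for illustration.

```python
def h1(k):
    """Assumed first auxiliary hash: initial probe position for m = 13."""
    return k % 13


def h2(k):
    """Assumed second auxiliary hash: step size, always between 1 and 11."""
    return 1 + (k % 11)


def double_hash_probe_sequence(k, m, h1=h1, h2=h2):
    """Slots probed for key k: h(k, i) = (h1(k) + i*h2(k)) mod m."""
    return [(h1(k) + i * h2(k)) % m for i in range(m)]


seq_14 = double_hash_probe_sequence(14, 13)
seq_27 = double_hash_probe_sequence(27, 13)
# seq_14[:4] == [1, 5, 9, 0]   (step size h2(14) = 4)
# seq_27[:4] == [1, 7, 0, 6]   (step size h2(27) = 6)
```

Keys 14 and 27 share the initial slot 1 but then diverge, unlike in linear or quadratic probing. Since m = 13 is prime and the step size is nonzero, each sequence here visits all 13 slots.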