2. AGENDA
Dictionaries, symbol tables and their implementation
Series_VIT University
TanmaySinha_Student Seminar
What is hashing, and why hashing?
Components
Comparison of techniques
Time Complexity
Examples
3. DICTIONARIES
Real-world examples of dictionaries
Spelling Checker
Symbol tables generated by assemblers and compilers
Routing tables used in networking components (for DNS lookup)
4. SYMBOL TABLE: A MODIFIED DICTIONARY
Data structure that associates a value with a key
Basic operations allowed
Implemented using
1)Arrays (unordered/ordered): unordered O(n); ordered O(n) insertion, O(lg n) search
2)Linked lists (ordered/unordered): O(n)
3)Binary search trees: O(lg n) on average
4)HASHING
THE “DREADED” TAG of an algorithm: its TIME COMPLEXITY
5. UNDERSTANDING HASHING
Arrays → Hash Table
Example: design an algorithm that prints the 1st repeated character of a string, if the string contains duplicate characters
Possible solutions: from the brute-force approach to a better solution
IF ARRAYS ARE THERE, WHY HASHING?
To map keys to locations
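The "better solution" for the example above can be sketched as follows. This is a minimal sketch, not the presenter's code: it assumes "1st repeated character" means the first character that a left-to-right scan finds to have occurred earlier, and the function name `first_repeated_char` is made up for illustration.

```python
def first_repeated_char(text):
    """Return the first character seen twice in a left-to-right scan, or None."""
    seen = set()  # Python sets are hash based: membership tests are O(1) on average
    for ch in text:
        if ch in seen:
            return ch  # ch already occurred earlier in the string
        seen.add(ch)
    return None  # no character is repeated


print(first_repeated_char("abzddab"))  # d
```

The brute-force version compares every pair of positions and needs O(n^2) comparisons; the hash-based scan touches each character once.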
6. COMPONENTS IN HASHING
Hash Table
1)Generalization of an array
2)Direct addressing
3)Problem: fewer locations than possible keys (analogous to the VIRTUAL MEMORY concept)
Basically, a hash table is a data structure that stores keys and their associated values
7. COMPONENTS IN HASHING…CONTD
Hash Function
1)Transform the key ‘k’ to an index ‘h(k)’, thereby reducing the range of array indices
2)Characteristics of a good hash function
Minimize collisions
Be quick and easy to compute
Distribute key values evenly in the hash table
Use all the information provided in the key
Have a high load factor for a given set of keys
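As a small illustration of transforming ‘k’ to ‘h(k)’, here is a sketch of the division method (key modulo table size). The choice of this particular hash function, the table size 11 and the sample keys are assumptions made for the example; the slides do not prescribe them.

```python
def h(key, table_size):
    """Division-method hash: map an integer key to an index in [0, table_size)."""
    return key % table_size


keys = [15, 27, 1000003, 42]
print([h(k, 11) for k in keys])  # [4, 5, 4, 9]
```

Keys from a huge range land in only 11 indices, and two of them (15 and 1000003) already share index 4, which is exactly why collisions have to be handled.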
8. COMPONENTS IN HASHING…CONTD
DEFINING TERMS
1. Load factor: no. of elements in hash table / hash table size = n/m
2. Collision: 2 records hash to the same memory location
What if the keys are non-integers?
For string keys, a polynomial hash code with multiplier x can be used: a choice of x = 33, 37, 39 or 41 gives at most 6 collisions on a vocabulary of 50,000 English words
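A possible way to hash string keys and to compute the load factor is sketched below. It assumes that x is the multiplier of a polynomial hash code over the character codes; the function names and the table size used in the example are illustrative, and the collision figure quoted above is not reproduced here.

```python
def poly_hash(word, table_size, x=33):
    """Polynomial hash code (Horner's rule), reduced modulo table_size."""
    code = 0
    for ch in word:
        code = code * x + ord(ch)  # every character of the key contributes
    return code % table_size


def load_factor(num_elements, table_size):
    """Load factor n/m of a table holding num_elements keys in table_size slots."""
    return num_elements / table_size


print(poly_hash("ab", 101))  # 67
print(load_factor(5, 10))    # 0.5
```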
9. COLLISION RESOLUTION TECHNIQUES
Process of finding an alternate location
Direct chaining (separate chaining): an array of linked lists
Open addressing (array based): linear probing, quadratic probing, double hashing
10. CHAINING
Slot ‘x’ contains a pointer(reference) to head
of the list of all the stored elements that hash to
‘x’
Analogous to the adjacency list representation of graphs
Doubly linked list preferable: given the node’s address, deletion is quick (delete takes an input element ‘x’ and not its key ‘k’)
Worst-case behaviour is terrible: all ‘n’ keys hash to the same slot, creating a list of length ‘n’
Average-case behaviour is much better if we assume that any given element is equally likely to hash into any of the table slots (SIMPLE UNIFORM HASHING)
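The chaining scheme described on this slide can be sketched as below. This is a simplified illustration under stated assumptions: Python lists stand in for the linked lists, Python's built-in `hash` plays the role of h(k), and the class and method names are hypothetical.

```python
class ChainedHashTable:
    """Hash table with separate chaining; each slot holds a list of (key, value)."""

    def __init__(self, size=11):
        self.slots = [[] for _ in range(size)]  # one chain per slot
        self.count = 0

    def _index(self, key):
        return hash(key) % len(self.slots)

    def put(self, key, value):
        chain = self.slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:  # key already stored: overwrite its value
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self.count += 1

    def get(self, key, default=None):
        for k, v in self.slots[self._index(key)]:  # walk only one chain
            if k == key:
                return v
        return default

    def load_factor(self):
        return self.count / len(self.slots)


table = ChainedHashTable()
table.put(3, "a")
table.put(14, "b")  # 3 and 14 share slot 3 when the table size is 11
print(table.get(3), table.get(14))  # a b
```

A search only walks the chain of one slot, so its cost grows with the average chain length, that is, with the load factor.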
11. LINEAR PROBING
Search sequentially: if a location is occupied, check the next location
Restriction: no. of elements inserted into the table < table size
Function for rehashing:
H(Key) = (n + 1) % tablesize, where n is the location just probed
Problem: clustering
Importance of table size: should be prime, should not be a power of 2
Problem in deletion: use tombstones
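A rough sketch of linear probing with tombstones follows. The names (`LinearProbingSet`, `_EMPTY`, `_TOMBSTONE`) are invented for the illustration, Python's built-in `hash` stands in for the hash function, and the table stores keys only.

```python
_EMPTY = object()      # slot has never held a key
_TOMBSTONE = object()  # slot held a key that was deleted


class LinearProbingSet:
    def __init__(self, size=11):
        self.slots = [_EMPTY] * size

    def _probe(self, key):
        start = hash(key) % len(self.slots)
        for step in range(len(self.slots)):
            yield (start + step) % len(self.slots)  # next location, wrapping around

    def add(self, key):
        first_free = None
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is _EMPTY:
                if first_free is None:
                    first_free = i
                break  # key cannot be stored beyond an empty slot
            if slot is _TOMBSTONE:
                if first_free is None:
                    first_free = i  # reusable, but keep looking for the key
            elif slot == key:
                return  # already present
        if first_free is None:
            raise RuntimeError("table is full")
        self.slots[first_free] = key

    def __contains__(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is _EMPTY:
                return False  # a never-used slot ends the search
            if slot is not _TOMBSTONE and slot == key:
                return True
        return False

    def remove(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is _EMPTY:
                return
            if slot is not _TOMBSTONE and slot == key:
                self.slots[i] = _TOMBSTONE  # mark, do not empty
                return
```

If `remove` simply emptied the slot, a later search for a key stored further along the same cluster would stop too early; the tombstone keeps the probe sequence intact while still letting `add` reuse the slot.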
13. QUADRATIC PROBING
Our main requirement now is to eliminate the CLUSTERING problem
Instead of step size 1, if the location is occupied check locations i + 1², i + 2², ...
Function for rehashing:
H(Key) = (n + k²) % tablesize
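The probe sequence given by the rehashing function can be listed with a few lines of code. In this sketch `home` plays the role of n (the location the key first hashes to) and k is the probe number; the function name and the sample values are assumptions.

```python
def quadratic_probe_sequence(home, table_size, max_probes):
    """Locations (home + k^2) % table_size for k = 0, 1, ..., max_probes - 1."""
    return [(home + k * k) % table_size for k in range(max_probes)]


print(quadratic_probe_sequence(3, 11, 5))             # [3, 4, 7, 1, 8]
print(len(set(quadratic_probe_sequence(3, 11, 11))))  # 6
```

With a table of size 11, eleven probes reach only 6 distinct locations, so quadratic probing does not look at every slot of the table.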
15. DOUBLE HASHING
Reduces Clustering in a better way.
Use of a 2nd hash function h2 (the offset), such that h2 != 0 and h2 != h1
Concept
First probe at location h1
If it’s occupied, probe at location (h1 + k*offset) % tablesize: (h1 + h2), (h1 + 2*h2), ...
Linear probing is the special case where the offset is 1
If the size of the table is prime, the technique ensures we look at all table locations.
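The claim about prime table sizes can be checked with a small experiment. The sketch below is only illustrative: the function name and the values of h1, h2 and the table sizes are chosen for the demonstration.

```python
def double_hash_probes(h1, h2, table_size):
    """Slots visited by probing h1, h1 + h2, h1 + 2*h2, ... modulo table_size."""
    return [(h1 + k * h2) % table_size for k in range(table_size)]


print(sorted(double_hash_probes(3, 7, 11)))       # every slot from 0 to 10
print(sorted(set(double_hash_probes(3, 4, 12))))  # [3, 7, 11]
```

With the prime size 11 every slot is reached; with size 12 and offset 4, which share a common factor, the probes cycle through only three slots.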
16. EXAMPLE
H1(key) = key % 11
H2(key) = 7 - (key % 7)
(key % 7) lies between 0 and 6, so that H2 always lies between 1 and 7

Insert 58: 58 % 11 = 3
Insert 14: 14 % 11 = 3 → 3 + 7 = 10
Insert 91: 91 % 11 = 3 → 3 + 7 = 10 → (3 + 2*7) % 11 = 6
Insert 25: 25 % 11 = 3 → 3 + 3 = 6 → 3 + 2*3 = 9

Slot  Key
0
1
2
3     58
4
5
6     91
7
8
9     25
10    14
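The worked example can be replayed in code. The two hash functions and the keys are the ones from the slide; the `insert` helper and its error handling are additions made for this sketch.

```python
def h1(key):
    return key % 11


def h2(key):
    return 7 - (key % 7)  # always between 1 and 7, never 0


def insert(table, key):
    """Place key at the first free slot of its double-hashing probe sequence."""
    home, offset = h1(key), h2(key)
    for k in range(len(table)):
        slot = (home + k * offset) % len(table)
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free slot found")


table = [None] * 11
for key in (58, 14, 91, 25):
    insert(table, key)
print(table)
# [None, None, None, 58, None, None, 91, None, None, 25, 14]
```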
17. COMPARISON
Linear Probing | Quadratic Probing | Double Hashing
---------------|-------------------|---------------
Fastest amongst three | Easier to implement and deploy | Makes more efficient use of memory
Uses few probes | Uses extra memory for links + does not probe all table locations | Uses few probes but takes more time
Problem of primary clustering | Problem of secondary clustering | More complicated to implement
Interval between probes is fixed, often at 1 | Interval between probes increases with the probe number | Interval between probes is computed by another hash function
18. HOW DOES HASHING GET O(1) COMPLEXITY?
Each block (which may be a linked list) stores, on average, no more elements than the “Load Factor (lf)”
Generally the “Load Factor” is kept constant, so searching time becomes constant
Rehash the elements into a bigger hash table if the avg. no. of elements per block exceeds the Load Factor
Access time of the table depends on the Load Factor, which in turn depends on the hash function
Unsuccessful/successful search for chaining: total time = O(1 + lf), including the time required to compute h(k)
Unsuccessful/successful search for probing: total time = O(1/(1 - lf)) for lf < 1, including the time required to compute h(k)
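To get a feel for these bounds, the two expressions can be evaluated for a few load factors. This is a sketch under assumptions: it uses the standard textbook expected costs for chaining (1 + lf) and for an unsuccessful open-addressing search (at most 1/(1 - lf) probes under uniform hashing), and the function names are illustrative.

```python
def chaining_cost(lf):
    """Expected cost of a search with chaining: O(1 + lf)."""
    return 1 + lf


def probing_bound(lf):
    """Upper bound on expected probes for an unsuccessful open-addressing search."""
    if not 0 <= lf < 1:
        raise ValueError("open addressing requires 0 <= lf < 1")
    return 1 / (1 - lf)


for lf in (0.25, 0.5, 0.75, 0.9):
    print(lf, chaining_cost(lf), round(probing_bound(lf), 2))
```

As long as the load factor is held below a fixed constant (by rehashing into a bigger table when needed), both quantities stay bounded by a constant, which is where the O(1) figure comes from.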
19. EXTRA POINTS
Static hashing: data is static, the set of keys is fixed
Example: set of reserved words in a programming language, set of file names on a CD-ROM
Dynamic hashing: keys can change dynamically
Example: cache design, hash functions in cryptography
20. A ONE-WAY HASH FUNCTION TAKES VARIABLE-LENGTH INPUT (IN THIS CASE, A MESSAGE OF ANY LENGTH, EVEN THOUSANDS OR MILLIONS OF BITS) AND PRODUCES A FIXED-LENGTH OUTPUT, SAY 160 BITS (THE MESSAGE DIGEST)
[Figure: the plaintext is passed through a hash function to produce a message digest; the digest is signed with the private key (the key used for signing); the result sent is plaintext + signature.]
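The fixed-length property is easy to observe with a real one-way hash function. The sketch below uses SHA-1 from Python's standard `hashlib` module purely because its digest is 160 bits long; the slide does not name a specific algorithm, so that choice is an assumption, and the signing step from the figure is not shown.

```python
import hashlib


def digest_bits(message: bytes) -> int:
    """Length in bits of the SHA-1 digest of message."""
    return len(hashlib.sha1(message).digest()) * 8


print(digest_bits(b"abc"))            # 160
print(digest_bits(b"x" * 1_000_000))  # 160
print(hashlib.sha1(b"abc").hexdigest())
# a9993e364706816aba3e25717850c26c9cd0d89d
```

A three-byte message and a million-byte message both produce a 160-bit digest.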
21. PROBLEM 1
Can you give an algorithm for finding the 1st non-repeated character in a string? For e.g., the 1st non-repeated character in the string “abzddab” is ‘z’
Brute force approach
For each character in the string, scan the rest of the string (all other positions). If that character doesn’t appear again, we’re done with the solution, else we move to the next character
O(n²)

Improvement using hash tables
Create a hash table by reading all characters in the i/p string and keeping their counts.
After creating the hash table, scan the string again and look up each character’s count to find the first one with count = 1
O(n)
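The hash-table improvement can be sketched as follows. `collections.Counter` is used as the hash table of counts; the function name is an illustrative choice.

```python
from collections import Counter


def first_non_repeated_char(text):
    """Return the first character that occurs exactly once, or None."""
    counts = Counter(text)  # pass 1: character -> number of occurrences
    for ch in text:         # pass 2: scan in string order
        if counts[ch] == 1:
            return ch
    return None


print(first_non_repeated_char("abzddab"))  # z
```

The second pass goes over the string rather than over the table so that the answer respects the order of the characters.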
22. PROBLEM 2
Given an array of ‘n’ elements, find 2 elements in the array whose sum is equal to a given element ‘K’
Brute force: O(n²)

Improving time complexity: O(n lg n)
Sort the array.
Maintain 2 indices ‘low = 0’ and ‘high = n-1’.
Compute A[low] + A[high]
If sum is < K, increment ‘low’; if sum is > K, decrement ‘high’
If sum = K, that’s the solution

Alternative approach (hashing): O(n)
Objective: A[x] + A[y] = K
For each element A[x], check whether K - A[x] already exists in the hash table.
Existence of such a no. means that we are able to find the indices.
Else, insert A[x] into the hash table and proceed to the next i/p element.
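A possible implementation of the hashing approach is sketched below. The function name and the sample array are assumptions; a Python dict serves as the hash table, mapping each value seen so far to its index.

```python
def find_pair_with_sum(values, target):
    """Return indices (x, y), x < y, with values[x] + values[y] == target, or None."""
    seen = {}  # value -> index of an earlier element with that value
    for y, value in enumerate(values):
        x = seen.get(target - value)  # is the complement already in the table?
        if x is not None:
            return x, y
        seen[value] = y  # insert only after the lookup, so an element never pairs with itself
    return None


print(find_pair_with_sum([1, 4, 45, 6, 10, 8], 16))  # (3, 4)
```

Each element costs one lookup and one insertion, both O(1) on average, which gives the O(n) total.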