ANNA UNIVERSITY TIRUCHIRAPPALLI
Regulations 2008 Syllabus
B.Tech IT / B.E. EEE - SEMESTER III
CS1201 - DATA STRUCTURES
Prepared By:
B.Sundara vadivazhagan HOD i/c / IT
S.Karthik Lect/ IT
G.Mahalakshmi Lect / IT
UNIT I - FUNDAMENTALS OF ALGORITHMS
Algorithm – Analysis of Algorithm – Best Case and Worst Case Complexities – Analysis of
Algorithm using Data Structures – Performance Analysis – Time Complexity – Space
Complexity – Amortized Time Complexity – Asymptotic Notation
UNIT II - FUNDAMENTALS OF DATA STRUCTURES
Arrays – Structures – Stacks – Definition and Examples – Representing Stacks – Queues and Lists
– Queue and its Representation – Applications of Stack – Queue and Linked Lists.
UNIT III - TREES
Binary Trees – Operations on Binary Tree Representations – Node Representation –Internal and
External Nodes – Implicit Array Representation – Binary Tree Traversal – Huffman Algorithm –
Representing Lists as Binary Trees – Sorting and Searching Techniques – Tree Searching –
Hashing
UNIT - IV GRAPHS AND THEIR APPLICATIONS
Graphs – An Application of Graphs – Representation – Transitive Closure – Warshall's
Algorithm – Shortest Path Algorithm – A Flow Problem – Dijkstra's Algorithm – Minimum
Spanning Trees – Kruskal's and Prim's Algorithms – An Application of Scheduling – Linked
Representation of Graphs – Graph Traversals
UNIT V - STORAGE MANAGEMENT
General Lists – Operations – Linked List Representation – Using Lists – Freeing List Nodes –
Automatic List Management : Reference Count Method – Garbage Collection – Collection and
Compaction
TEXT BOOKS
1. Cormen T. H., Leiserson C. E., and Rivest R. L., "Introduction to Algorithms", Prentice Hall of
India, New Delhi, 2007.
2. M. A. Weiss, "Data Structures and Algorithm Analysis in C", Second Edition, Pearson
Education, 2005.
REFERENCES
1. Ellis Horowitz, Sartaj Sahni and Sanguthevar Rajasekaran, "Computer Algorithms/C++",
Universities Press (India) Private Limited, Second Edition, 2007.
2. A. V. Aho, J. E. Hopcroft, and J. D. Ullman, "Data Structures and Algorithms", First Edition,
Pearson Education, 2003.
3. R. F. Gilberg and B. A. Forouzan, "Data Structures", Second Edition, Thomson India Edition,
2005.
4. Robert L. Kruse, Bruce P. Leung and Clovis L. Tondo, "Data Structures and Program Design in
C", Pearson Education, 2004.
5. Tenenbaum A. M., Langsam Y., and Augenstein M. J., "Data Structures Using C", Pearson
Education, 2004.
UNIT – I FUNDAMENTALS OF ALGORITHMS
I. ALGORITHM
An algorithm is any well-defined computational procedure that takes some value, or set
of values, as input and produces some value, or set of values, as output. In other words, an
algorithm is a sequence of computational steps that transform the input into the output.
An algorithm can be viewed as a tool for solving a well-specified computational problem.
The statement of the problem specifies the desired input/ output relationship. The algorithm
describes a specific computational procedure for achieving that input/output relationship.
Study of Algorithm :
Consider the problem of sorting a sequence of numbers into non-decreasing order. The sorting
problem is defined as follows:
Input: A sequence of n numbers (a1, a2, ..., an)
Output: A permutation (reordering) (a1′, a2′, ..., an′) of the input sequence such that
a1′ ≤ a2′ ≤ ... ≤ an′
Given an input sequence such as (31,41,59,26,41,58), a sorting algorithm returns as
output the sequence (26,31,41,41,58,59). Such an input sequence is called an instance of the
sorting problem. In general, an instance of a problem consists of all the inputs needed to
compute a solution to the problem.
An algorithm is said to be correct if, for every instance, it halts with the correct output.
A correct algorithm solves the given computational problem.
An incorrect algorithm might not halt at all on some input instances, or it might halt with
other than the desired answer.
Example
Insertion Sort
Insertion sort is an efficient algorithm for sorting a small number of elements.
Insertion sort works the way many people sort a bridge or gin rummy hand.
We start with an empty left hand and the cards face down on the table. We then
remove one card at a time from the table and insert it into the correct position in
the left hand.
To find the correct position for a card, we compare with each of the cards already
in the hand, from right to left.
Pseudocode for INSERTION-SORT
INSERTION-SORT(A)
1  for j ← 2 to length[A]
2      do key ← A[j]
3         ▷ Insert A[j] into the sorted sequence A[1..j-1]
4         i ← j - 1
5         while i > 0 and A[i] > key
6             do A[i+1] ← A[i]
7                i ← i - 1
8         A[i+1] ← key
The insertion sort is presented as a procedure called INSERTION SORT, which
takes as parameter an array A[1….n] containing a sequence of length n that is to
be sorted.
The input numbers are sorted in place: the numbers are rearranged within the
array A, with at most a constant number of them stored outside the array at any
time.
The input array A contains the sorted output sequence when INSERTION-SORT
is finished.
5 2 4 6 1 3
2 5 4 6 1 3
2 4 5 6 1 3
2 4 5 6 1 3
1 2 4 5 6 3
1 2 3 4 5 6
Fig: The operation of INSERTION-SORT on the array A = (5, 2, 4, 6, 1, 3). Each row shows the
array after one iteration of the outer loop; the element being inserted is A[j].
The fig shows how this algorithm works for A = (5, 2, 4, 6, 1, 3). The index j indicates the
current card being inserted into the hand. Array elements A[1..j-1] constitute the currently
sorted hand, and elements A[j+1..n] correspond to the pile of cards still on the table.
The index j moves left to right through the array. At each iteration of the "outer" for loop,
the element A[j] is picked out of the array.
Then starting in position j-1 elements are successively moved one position to the right
until the proper position for A[j] is found, at which point it is inserted.
Goals for an algorithm
Basic goals for an algorithm
1. Always correct
2. Always terminates
3. Performance- Performance often draws the line between what is possible and what is
impossible.
The notion of ―algorithm‖
Description of a procedure which is
1. Finite (i.e., consists of a finite sequence of Characters)
2. Complete (i.e., describes all computation steps)
3. Unique (i.e., there are no ambiguities)
4. Effective (i.e., each step has a defined effect and Can be executed in finite time)
Properties:
Desired properties of algorithms
Correctness
o For each input, the algorithm calculates the requested value
Termination
o For each input, the algorithm performs only a finite number of steps
Efficiency
o Runtime: The algorithm runs as fast as possible
o Storage space: The algorithm requires as little storage space as possible.
Algorithms-Distinct areas:
Five distinct areas to study of algorithms.
1. Creating or devising algorithms: Various design techniques are created to yield good
algorithms.
2. Expressing the algorithms in a structured representation.
3. Validating algorithms: The algorithms devised should compute the correct answer for
all possible legal inputs. This process is known as algorithm validation.
4. Analyzing algorithms: It refers to the process of determining how much computing time
and storage an algorithm will require, and how well an algorithm performs in the best,
worst, and average cases.
Kinds of Analyses
Worst-case: (usually)
T(n) = maximum time of algorithm on any input of size n.
Average-case: (sometimes)
T(n) = expected time of algorithm over all inputs of size n.
Need assumption of statistical distribution of inputs.
Best-case: (rarely useful)
T(n) = minimum time of algorithm on any input of size n; a slow algorithm can "cheat" by
working fast on some particular input.
5. Testing algorithm
It consists of two phases: Debugging and profiling
Debugging is the process of executing programs on sample data to determine if any
faulty results occur.
Profiling is the process of executing a correct program on data sets and measuring the
time and space it takes to compute the results.
II. ANALYSIS OF ALGORITHM
Analyzing an algorithm has come to mean predicting the resources that the algorithm
requires. Occasionally, resources such as memory, communication bandwidth, or logic gates are
of primary concern, but most often it is computational time that we want to measure.
Generally, by analyzing several candidate algorithms for a problem, a most efficient one
can be easily identified. Such analysis may indicate more than one viable candidate, but several
inferior algorithms are usually discarded in the process.
Analysis predicting the resources that the algorithm requires, resources such as memory,
communication bandwidth or computer hardware are of primary concern, but most often it is
necessary to measure the computational time.
By analyzing several candidate algorithms for a problem, a most efficient one can be
easily identified and others are discarded in the process.
The main reasons for analyzing algorithms are:
It is an intellectual activity.
It is challenging to predict the behaviour of an algorithm before running it.
Computer science attracts many people who enjoy being efficiency experts.
Structural Programming Model
Niklaus Wirth stated that any algorithm could be written with only three programming
constructs: sequence, selection, and loop.
The implementation of these constructs relies on the implementation language, such as C++.
Sequence is a series of statements that do not alter the execution path within an algorithm.
Selection statements evaluate one or more alternatives; if a condition is true, one path is taken,
and if it is false, a different path is taken.
Loop
Iterates a block of code.
Usually the condition is evaluated before the body of the loop is executed.
If the condition is true, the body is executed .
If the condition is false, the loop terminates.
ANALYSIS OF INSERTION SORT :
The time taken by the Insertion Sort procedure depends on the input: sorting a thousand
numbers takes longer than sorting three numbers.
Insertion sort can take different amounts of time to sort two input sequences of the same
size depending on how nearly sorted they already are.
In general, the time taken by an algorithm grows with the size of the input, so it is
traditional to describe the running time of a program as a function of the size of its input.
To do so, we need to define the terms "running time" and "size of input" more carefully.
The best notion for input size depends on the problem being studied.
For many problems, such as sorting or computing discrete Fourier transforms, the most
natural measure is the number of items in the input.
For example, the array size n for sorting. For many other problems, such as multiplying
two integers, the best measure of input size is the total number of bits needed to represent
the input in ordinary binary notation.
The running time of an algorithm on a particular input is the number of primitive
operations or "steps" executed.
We start by presenting the INSERTION-SORT procedure with the time "cost" of each
statement and the number of times each statement is executed. For each j = 2,3, ... , n,
where n = length[A], we let tj be the number of times the while loop test in line 5 is
executed for that value of j.
We assume that comments are not executable statements, and so they take no time.
INSERTION-SORT(A)                                   cost   times
1  for j ← 2 to length[A]                           c1     n
2      do key ← A[j]                                c2     n - 1
3         ▷ Insert A[j] into the sorted
          ▷ sequence A[1..j-1].                     0      n - 1
4         i ← j - 1                                 c4     n - 1
5         while i > 0 and A[i] > key                c5     Σ(j=2 to n) tj
6             do A[i+1] ← A[i]                      c6     Σ(j=2 to n) (tj - 1)
7                i ← i - 1                          c7     Σ(j=2 to n) (tj - 1)
8         A[i+1] ← key                              c8     n - 1
The running time of the algorithm is the sum of running times for each statement
executed; a statement that takes Ci steps to execute and is executed n times will contribute Ci n
to the total running time. To compute T(n), the running time of INSERTION-SORT, we sum the
products of the cost and times columns, obtaining
T(n) = c1 n + c2 (n-1) + c4 (n-1) + c5 Σ(j=2 to n) tj + c6 Σ(j=2 to n) (tj - 1)
       + c7 Σ(j=2 to n) (tj - 1) + c8 (n-1)
Even for inputs of a given size, an algorithm's running time may depend on which input
of that size is given. For example, in INSERTION-SORT, the best case occurs if the array is
already sorted. For each j = 2,3, ... , n, we then find that A[i]≤key in line 5 when i has its initial
value of
j - 1. Thus tj = 1 for j = 2, 3, ..., n, and the best-case running time is
T(n) = c1 n + c2 (n-1) + c4 (n-1) + c5 (n-1) + c8 (n-1)
     = (c1 + c2 + c4 + c5 + c8) n - (c2 + c4 + c5 + c8)
This running time can be expressed as an+b for constants a and b that depend on the
statement costs ci. It is thus a linear function of n.
If the array is in reverse sorted order the worst case results. Compare each element A[j]
with each element in the entire sorted sub array A[1…j-1] and so tj=j for j=2,3,…n
Σ(j=2 to n) j = n(n+1)/2 - 1
and
Σ(j=2 to n) (j - 1) = n(n-1)/2
T(n) = c1 n + c2 (n-1) + c4 (n-1) + c5 (n(n+1)/2 - 1) + c6 (n(n-1)/2)
       + c7 (n(n-1)/2) + c8 (n-1)
     = (c5/2 + c6/2 + c7/2) n² + (c1 + c2 + c4 + c5/2 - c6/2 - c7/2 + c8) n - (c2 + c4 + c5 + c8)
This worst-case running time can be expressed as a n² + b n + c for constants a, b and c that
again depend on the statement costs ci; it is thus a quadratic function of n.
Worst case and Average case Analysis
In the analysis of insertion sort we considered the best case, in which the input array was
already sorted, and the worst case, in which the input array was reverse sorted. Usually we
concentrate on finding only the worst-case running time, that is, the longest running time for any
input of size n. There are three reasons:
The worst case running time of an algorithm is an upper bound on the running time for
any input.
Knowing it gives us the guarantee that the algorithm will never take any longer. For some
algorithms, the worst case occurs fairly often. For example, in searching a database for a
particular piece of information, the searching algorithm worst case will often occur when
the information is not present in the database.
The average case is often roughly as bad as the worst case. Suppose that we randomly
choose n numbers and apply insertion sort. How long does it take to determine where in sub
array A[1..j-1] to insert element A[j]? On average, half the elements in A[1..j-1] are less
than A[j], and half the elements are greater. On average, we check half of the sub array
A[1..j-1], so tj=j/2. Resulting average case running time, it turns out to be a quadratic
function of the input size, just like the worst case running time.
One problem with performing an average case analysis, however is that it may not be
apparent what constitutes an average input for a particular problem.
DESIGNING ALGORITHMS
There are many ways to design algorithms. Insertion sort uses an incremental
approach: having sorted the sub array A[1..j-1], we insert the single element
A[j] into its proper place, yielding the sorted sub array A[1..j].
In this section, we examine an alternative design approach, known as "divide-and-conquer."
We shall use divide-and-conquer to design a sorting algorithm whose worst-case
running time is much less than that of insertion Sort.
One advantage of divide-and-conquer algorithms is that their running times can
often be determined by solving recurrences.
Divide and Conquer approach:
Many useful algorithms are recursive in structure: to solve a given problem, they call
themselves recursively one or more times to deal with closely related sub problems.
These algorithms typically follow a divide-and-conquer approach: they break the problem
into several sub problems that are similar to the original problem but smaller in size, solve the
sub problems recursively, and then combine these solutions to create a solution to the original
problem.
The divide-and-conquer paradigm involves three steps at each level of the recursion:
Divide the problem into a number of sub problems.
Conquer the sub problems by solving them recursively. If the sub problem sizes are small
enough, however, just solve the sub problems in a straightforward manner.
Combine the solutions to the sub problems into the solution for the original problem.
EXAMPLE - MERGE SORT
The merge sort algorithm closely follows the divide-and-conquer paradigm. Intuitively, it
operates as follows.
Divide: Divide the n-element sequence to be sorted into two subsequences of n/2 elements each.
Conquer: Sort the two subsequences recursively using merge sort.
Combine: Merge the two sorted subsequences to produce the sorted answer.
We note that the recursion "bottoms out" when the sequence to be sorted has length 1, in
which case there is no work to be done, since every sequence of length 1 is already in sorted
order.
The key operation of the merge sort algorithm is the merging of two sorted sequences in
the "combine" step. To perform the merging, we use an auxiliary procedure MERGE (A,p, q, r),
where A is an array and p, q, and r are indices numbering elements of the array such that p ≤ q <
r. The procedure assumes that the sub arrays A[p..q] and A[q+1..r] are in sorted order. It
merges them to form a single sorted sub array that replaces the current sub array A[p..r].
Although we leave the pseudo code as an exercise, it is easy to imagine a MERGE procedure that
takes time Θ(n), where n = r - p + 1 is the number of elements being merged. Returning to our
card-playing motif, suppose we have two piles of cards face up on a table. Each pile is sorted,
with the smallest cards on top. We wish to merge the two piles into a single sorted output pile,
which is to be face down on the table. The basic step is to pick the smaller of the two top cards,
remove it from its pile and place it face down onto the output pile.
Computationally, each basic step takes constant time, since we are checking just two top cards.
Since we perform at most n basic steps, merging takes Θ(n) time.
We can now use the MERGE procedure as a subroutine in the merge sort algorithm. The
procedure MERGE-SORT(A, p, r) sorts the elements in the sub array A[p..r]. If p ≥ r, the sub
array has at most one element and is therefore already sorted. Otherwise, the divide step simply
computes an index q that partitions A[p..r] into two sub arrays: A[p..q], containing ⌈n/2⌉
elements, and A[q+1..r], containing ⌊n/2⌋ elements.
MERGE-SORT(A, p, r)
1 if p < r
2    then q ← ⌊(p + r)/2⌋
3         MERGE-SORT(A, p, q)
4         MERGE-SORT(A, q + 1, r)
5         MERGE(A, p, q, r)
To sort the entire sequence A = (A[1], A[2], ..., A[n]), we call MERGE-SORT(A, 1, length[A]),
where once again length[A] = n.
If we look at the operation of the procedure bottom-up when n is a power of two, the
algorithm consists of merging pairs of 1-item sequences to form sorted sequences of length 2,
merging pairs of sequences of length 2 to form sorted sequences of length 4, and so on, until two
sequences of length n/2 are merged to form the final sorted sequence of length n.
When an algorithm contains a recursive call to itself, its running time can often be
described by a recurrence equation or recurrence, which describes the overall running
time on a problem of size n in terms of the running time on smaller inputs. We can then use
mathematical tools to solve the recurrence and provide bounds on the performance of the
algorithm.
A recurrence for the running time of a divide-and-conquer algorithm is based on the three
steps of the basic paradigm. As before, we let T(n) be the running time on a problem of size n.
If the problem size is small enough, say n ≤ c for some constant c, the straightforward
solution takes constant time, which we write as Θ(1). Suppose we divide the problem into a sub
problems, each of which is 1/b the size of the original.
If we take D(n) time to divide the problem into sub problems and C(n) time to combine
the solutions to the sub problems into the solution to the original problem, we get the recurrence
T(n) = Θ(1) if n ≤ c
T(n) = a T(n/b) + D(n) + C(n) otherwise
BEST CASE, WORST CASE AND AVG. CASE EFFICIENCIES
Time efficiency – function in terms of n (input size)
For some algorithm, the running time depends not only on input size n also on the
individual elements.
eg. Linear search, here we go for worst case/ best case and avg. case efficiency.
We will mainly focus on worst-case analysis, but sometimes it is useful to do average one.
Worst- / average- / best-case
Worst-case running time of an algorithm
– The longest running time for any input of size n
– An upper bound on the running time for any input
– Guarantee that the algorithm will never take longer
– Sequential search for an item which is not present / present at the end of list.
– Sort a set of numbers in increasing order; and the data is in decreasing order
– The worst case can occur fairly often
– Provides a guaranteed bound on the running time
Best-case running time
– if the algorithm is executed, the fewest number of instructions are executed
– takes shortest running time for any input of size n
– Sequential search for an item which is present at beginning of the list.
– sort a set of numbers in increasing order; and the data is already in
increasing order
Average-case running time
– May be difficult to define what "average" means, but gives useful information
about an algorithm's behavior on a typical or random input.
EXAMPLE: Sequential Search
A sequential search steps through the data sequentially until a match is found.
A sequential search is useful when the array is not sorted.
The basic operation count for
1. Best case input c(n)=1
2. Worst case input c(n)
Unsuccessful search --- n times
Successful search (worst) ---n times
3. Avg.case input
Here basic operation count is calculated as follows
Assumptions:
a) The probability of a successful search is p (0 ≤ p ≤ 1).
b) The probability of the first match occurring in the i-th position of the list is the same for
every i and equals p/n; the number of comparison operations made by the algorithm in such a
situation is i.
c) In case of an unsuccessful search, the number of comparisons made is n, and the probability
of such a search is (1 - p).
So C(n) = [1·(p/n) + 2·(p/n) + ... + i·(p/n) + ... + n·(p/n)] + n·(1 - p)
        = (p/n)(1 + 2 + ... + i + ... + n) + n(1 - p)
        = (p/n)·n(n+1)/2 + n(1 - p)
C(n) = p(n+1)/2 + n(1 - p)
For a successful search, p = 1 and C(n) = (n+1)/2.
For an unsuccessful search, p = 0 and C(n) = n.
III. ANALYSIS OF ALGORITHM USING DATA STRUCTURES:
The analysis of algorithm is made considering both qualitative and quantitative aspects to
get the solution that is economical in the use of computing and human resources which improves
the performance of an algorithm. A good algorithm usually possesses the following qualities and
capabilities.
They are simple but powerful and general solutions
They are user friendly
They can be easily updated
They are correct
They are able to be understood on a number of levels
They are economical in the use of computer time, storage and peripherals
They are machine independent, not tied to a particular computer
They can be used as subprocedures for other problems
The solution is pleasing and satisfying to its designer
IV. COMPUTATIONAL COMPLEXITY
Space Complexity
The space complexity of an algorithm is the amount of memory it needs to run to
completion
(Core dumps: the most often encountered cause is "memory leaks", where the amount of
memory required is larger than the memory available on a given system.)
Some algorithms may be more efficient if data completely loaded into memory
1. Need to look also at system limitations
2. E.g. Classify 2GB of text in various categories [politics, tourism, sport, natural disasters,
etc.] – can I afford to load the entire collection?
Time Complexity
The time complexity of an algorithm is the amount of time it needs to run to completion
Often more important than space complexity
1. space available (for computer programs!) tends to be larger and larger
2. time is still a problem for all of us
Algorithms running time is an important issue
Space Complexity
The Space needed by each algorithm is the sum of the following components:
1. Instruction space
2. Data space
3. Environment stack space
Instruction space
The space needed to store the compiled version of the program instructions
Data space
The space needed to store all constant and variable values
Environment stack space
The space needed to store information to resume execution of partially completed
functions
The total space needed by an algorithm can be simply divided into two parts from the 3
components of space complexity
1. Fixed part
2. Variable part
Fixed part
A fixed part space is independent of the characteristics of the inputs and outputs. This
part typically includes the instruction space, space for simple variables and fixed size component
variables, space for constants and so on
e.g. name of the data collection
same size for classifying 2GB or 1MB of texts
Variable part
A variable part space needed by component variables whose, size is dependent on the
particular problem instance being solved, the space needed by referenced variables and the
recursion stack space.
e.g. actual text
load 2GB of text VS. load 1MB of text
The space requirement S(P) of any algorithm or program P may be written as:
S(P)=C+Sp (instance characteristics)
C= Constant that denotes the fixed part of the space requirement
Sp= Variable Component depends on the magnitude of the inputs to and outputs from the
algorithm.
Example
float sum(float* a, int n)
{
    float s = 0;
    for (int i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}
Space: one word for n, one for a (the array is passed by reference), one for i, one for s: constant space!
When memory was expensive, we focused on making programs as space efficient as
possible and developed schemes to make memory appear larger than it really was (virtual
memory and memory paging schemes). Space complexity is still important in the field of
embedded computing.
Time Complexity
The time T(P) taken by a program P is T(P) = compile time + run (or execution) time.
Compile time
It does not depend on the instance characteristics. We assume that a compiled program
will be run several times without recompilation
Run time
It depends on the instance characteristics denoted by tp. The tp(n) can be calculated by
the following form of expression
tp(n)= Ca ADD(n) + Cs SUB(n)+ Cm MUL(n)+ Cd DIV(n)+….
n = instance characteristics
Ca, Cs, Cm, Cd = time needed for addition, subtraction, multiplication and division
ADD, SUB, MUL, DIV= number of additions, subtractions, multiplications and divisions
performed for the program p on the instance characteristics n.
To find the value of tp(n) from the above expression is an impossible task, since the time
needed for an addition, subtraction, multiplication or division often depends on the actual
numbers involved in the operation.
The value of tp(n) for any given n can be obtained experimentally: the program is
typed, compiled and run on a particular machine, the execution time is physically clocked, and
tp(n) is obtained.
The value of tp(n) depends on some factors, such as system load, the number of other
programs running on the computer at the time program p is run and so on. To overcome this
disadvantage, count only the program steps, where the time is required by each step is relatively
independent of the instant characteristics.
A program step is defined as syntactically or semantically meaningful segment of a program that
has an execution time that is independent of the instant characteristics.
The program statements are classified into three steps
1. Comments-Zero step
2. Assignment statement-One step
3. Iterative statement-finite number of steps.
V. AMORTIZED ANALYSIS
In an amortized analysis, the time required to perform a sequence of data structure
operations is averaged over all the operations performed.
Amortized analysis can be used to show that the average cost of an operation is
small, if one averages over a sequence of operations, even though a single
operation might be expensive.
Amortized analysis differs from average case analysis in that probability is not
involved; an amortized analysis guarantees the average performance of each
operation in the worst case.
In the aggregate method of amortized analysis, we show that for all n, a sequence
of n operations takes worst-case time T(n) in total. In the worst case, the average
cost, or amortized cost, per operation is therefore T(n)/n. Note that this
amortized cost applies to each operation, even when there are several types of
operations in the sequence.
The other two methods we shall study in this chapter, the accounting method and
the potential method, may assign different amortized costs to different types of
operations.
Stack operations
In our first example of the aggregate method, we analyze stacks that have been
augmented with a new operation. Recall the two fundamental stack operations, each of which
takes O(1) time:
PUSH(S, x) pushes object x onto stack S.
Pop(S) pops the top of stack S and returns the popped object.
Since each of these operations runs in O(1) time, let us consider the cost of each to be 1.
The total cost of a sequence of n PUSH and POP operations is therefore n, and the actual running
time for n operations is therefore Θ(n).
The situation becomes more interesting if we add the stack operation MULTIPOP(S, k),
which removes the k top objects of stack S, or pops the entire stack if it contains fewer than k
objects.
In the following pseudo code, the operation STACK-EMPTY returns TRUE if there are
no objects currently on the stack, and FALSE otherwise.
MULTIPOP(S, k)
1 while not STACK-EMPTY(S) and k ≠ 0
2     do POP(S)
3        k ← k - 1
VI. ASYMPTOTIC NOTATION
Complexity analysis: rate at which storage or time grows as a function of the problem size
Asymptotic analysis: describes the inherent complexity of a program, independent of
machine and compiler
Idea: as problem size grows, the complexity can be described as a simple
proportionality to some known function.
A) Big Oh (O)-Upper Bound
This notation is used to define the worst case running time of an algorithm and concerned
with very large values of n.
f(n) = O(g(n)) iff f(n) ≤ c·g(n)
for some positive constants c and n0, and all n ≥ n0
B) Big Omega (Ω )-Lower Bound
This notation is used to describe the best case running time of algorithms and concerned
with large values of n
f(n) = Ω(g(n)) iff f(n) ≥ c·g(n)
for some positive constants c and n0, and all n ≥ n0
C) Big Theta (Θ)-Two-way Bound
This notation is used to describe the average case running time of algorithms and
concerned with very large values of n
f(n) = Θ(g(n)) iff c1·g(n) ≤ f(n) ≤ c2·g(n)
for some positive constants c1, c2, and n0, and all n ≥ n0
D) Little Oh (o)-Only Upper Bound
This notation describes a strict upper bound: f(n) grows strictly more slowly than g(n).
f(n) = o(g(n)) iff
f(n) = O(g(n)) and f(n) ≠ Ω(g(n))
To compare and rank orders of growth of algorithms, three notations are used: O, Ω and Θ.
Informal definitions:
Let t(n) and g(n) be any non-negative functions defined on the set of natural numbers.
t(n) is the algorithm's running time.
g(n) is a simple function to compare the count with.
i) O(g(n)) is the set of all functions with a smaller or same order of growth as g(n)
Eg.
n ∈ O(n²)            n³ ∉ O(n²)
100n + 5 ∈ O(n²)     n⁴ + n + 1 ∉ O(n²)
100n + 5 ∈ O(n)
(1/2)n(n-1) ∈ O(n²)
ii) Ω(g(n)) stands for the set of all functions with a larger or same order of growth as g(n)
n³ ∈ Ω(n²)
(1/2)n(n-1) ∈ Ω(n²)
100n + 5 ∉ Ω(n²)
iii) Θ(g(n)) stands for the set of all functions with the same order of growth as g(n)
n³ ∉ Θ(n²)
an² + bn + c ∈ Θ(n²)
100n + 5 ∈ Θ(n)
Big Oh
f(N) = O(g(N))
There are positive constants c and n0 such that
o f(N) ≤ c·g(N) when N ≥ n0
The growth rate of f(N) is less than or equal to the growth rate of g(N)
g(N) is an upper bound on f(N)
o We write f(n) = O(g(n)) if there are positive constants n0 and c such that to the
right of n0, the value of f(n) always lies on or below c·g(n).
Meaning: For all data sets big enough (i.e., n > n0), the algorithm always executes in fewer than
c·g(n) steps in the [best, average, worst] case.
The idea is to establish a relative order among functions for large n:
∃ c, n0 > 0 such that f(N) ≤ c·g(N) when N ≥ n0
f(N) grows no faster than g(N) for "large" N
Big O Rules
• If f(n) is a polynomial of degree d, then f(n) is O(n^d), i.e.,
1. Drop lower-order terms
2. Drop constant factors
• Use the smallest possible class of functions
Say "2n is O(n)" instead of "2n is O(n²)"
• Use the simplest expression of the class
Say "3n + 5 is O(n)" instead of "3n + 5 is O(3n)"
Big-Oh: example
• Let f(N) = 2N². Then
– f(N) = O(N⁴)
– f(N) = O(N³)
– f(N) = O(N²) (best answer, asymptotically tight)
• N²/2 - 3N = O(N²)
• 1 + 4N = O(N)
• 7N² + 10N + 3 = O(N²) = O(N³)
Big-Omega
• f(N) = Ω(g(N))
• There are positive constants c and n0 such that
f(N) ≥ c·g(N) when N ≥ n0
• The growth rate of f(N) is greater than or equal to the growth rate of g(N).
• There exist c, n0 > 0 such that f(N) ≥ c·g(N) when N ≥ n0
• f(N) grows no slower than g(N) for "large" N
Big-Omega: example
• Let f(N) = 2N². Then
– f(N) = Ω(N)
– f(N) = Ω(N²) (best answer)
Big-Theta
• f(N) = Θ(g(N)) iff
f(N) = O(g(N)) and f(N) = Ω(g(N))
• The growth rate of f(N) equals the growth rate of g(N)
• f(n) is Θ(g(n)) if there are constants c' > 0 and c'' > 0 and an integer constant n0 ≥ 1
such that c'·g(n) ≤ f(n) ≤ c''·g(n) for n ≥ n0
• Big-Theta means the bound is the tightest possible.
• The growth rate of f(N) is the same as the growth rate of g(N)
Big-Theta rules
• Example: Let f(N) = N², g(N) = 2N²
– Since f(N) = O(g(N)) and f(N) = Ω(g(N)),
thus f(N) = Θ(g(N)).
• If T(N) is a polynomial of degree k, then
T(N) = Θ(N^k).
• For logarithmic functions,
T(log_m N) = Θ(log N)
Mathematical Expression    Relative Rates of Growth
T(n) = O(F(n))             Growth of T(n) <= growth of F(n)
T(n) = Ω(F(n))             Growth of T(n) >= growth of F(n)
T(n) = Θ(F(n))             Growth of T(n) = growth of F(n)
T(n) = o(F(n))             Growth of T(n) < growth of F(n)
Computation of step count using asymptotic notation
Asymptotic complexity can be determined easily without computing the exact step
count. This is done by first determining the asymptotic complexity of each statement in the
algorithm and then adding these complexities to derive the total step count.
Question Bank
UNIT I - PROBLEM SOLVING
PART – A (2 MARKS)
1. Define Modularity.
2. What do you mean by top down design?
3. What is meant by algorithm? What are its measures?
4. Give any four algorithmic techniques.
5. Write an algorithm to find the factorial of a given number
6. List the types of control structures
7. Define the top down design strategy
8. Define the worst case & average case complexities of an algorithm
9. What is meant by modular approach?
10. What is divide & conquer strategy?
11. What is dynamic programming?
12. What is program testing?
13. Define program verification
14. What is input/output assertion?
15. Define symbolic execution
16. Write the steps to verify a program segment with loops
17. What is CPU time?
18. Write at least five qualities & capabilities of a good algorithm
19. Write an algorithm to exchange the values of two variables
20. Write an algorithm to find N factorial (written as n!) where n>=0.
PART- B (16 MARKS)
1. Explain Top down design in detail.
2. (a) Explain in detail the types of analysis that can be performed on an algorithm (8)
(b) Write an algorithm to perform matrix multiplication algorithm and analyze the
same (8)
3. Design an algorithm to evaluate the function sin(x) as defined by the infinite series
expansion sin(x) = x/1! - x³/3! + x⁵/5! - x⁷/7! + ……
4. Write an algorithm to generate and print the first n terms of the Fibonacci series,
where n >= 1; the first few terms are 0, 1, 1, 2, 3, 5, 8, 13.
5. Design an algorithm that accepts a positive integer and reverses the order of its digits.
6. Explain the base conversion algorithm to convert a decimal integer to its corresponding
octal representation
UNIT II - FUNDAMENTALS OF DATA STRUCTURES
Arrays – Structures – Stacks – Definition and examples – Representing Stacks –Queues and Lists
– Queue and its Representation – Applications of Stack – Queue and Linked Lists.
Unit: II - FUNDAMENTALS OF DATA STRUCTURES
I. ARRAYS
An array is a finite, ordered set of homogeneous elements. The size of an array may be large
or small, but it must be fixed. An array contains a collection of elements of the same data type.
Array declaration in C is given below.
int a[100];
Here the array name is 'a' and its size is 100. Each element is represented by its index, which
starts from 0. For example, the 1st element has index 0, the 2nd has index 1, and the 100th has
index 99.
The two basic operations that access an array are extraction and storing. The extraction
operation retrieves an array element given the array name and an index. The storing
operation of a value x at index i is
a[i] = x;
The smallest index of an array is called its lower bound, and in C it is always 0; the highest
index is called the upper bound. The number of elements in an array is called its range.
If the lower bound is represented by "lower" and the upper bound by "upper", then
range = upper - lower + 1. For example, for array a, the lower bound is 0, the upper bound is 99
and the range is 100.
An important feature of a C array is that neither the upper bound nor the lower bound may
be changed during a program`s execution. The lower bound is always fixed at 0, and the upper
bound is fixed at the time the program is written.
One very useful technique is to declare a bound as a constant identifier, so that work required
to modify the size of an array is minimized. For example, consider the following program
int a[100];
for(i = 0; i <100; a[i++] = 0);
To change the array to a larger (or smaller) size, the constant 100 must be changed in two
places, once in the declaration and once in the for statement. Consider the following equivalent
alternative,
#define NUMELTS 100
int a[NUMELTS];
for(i = 0; i < NUMELTS; a[i++] = 0);
Now only a single change in the constant definition is needed to change the upper bound.
One Dimensional Array
A one-dimensional array is used when it is necessary to keep a larger number of items in
memory and reference all the items in a uniform manner. Consider an application to read 100
integers, and find its average.
#define NUMELTS 100
void aver()
{
    int num[NUMELTS];
    int i;
    int total;
    float avg;
    total = 0;
    for(i = 0; i < NUMELTS; i++)
    {
        scanf("%d", &num[i]);
        total += num[i];
    }
    avg = (float) total / NUMELTS;
    printf("Average = %f", avg);
}
The declaration (int num[NUMELTS];) reserves 100 successive memory locations, each large
enough to contain a single integer. The address of the first of these locations is called the base
address of the array num.
Two-Dimensional array
A two-dimensional array is an array of arrays. For example
int a[3][5];
This represents an array containing three elements. Each of these elements is itself an array
containing five elements. This can be represented in the figure given below.
In total 15 (3 x 5) elements can be stored in this array. Each element can be accessed by its
corresponding row index and column index. To access the first cell in the second row,
use a[1][0]. Likewise, to access the second cell in the third row, use a[2][1].
A nested looping statement is used to access each element efficiently; sample code to read
values and fill this array is given below.
for(i = 0; i < 3; i++)
{
    for(j = 0; j < 5; j++)
        scanf("%d", &a[i][j]);
}
Multi-Dimensional array
C also allows developers to declare arrays of more than two dimensions. A three-dimensional
array declaration is given below.
int a[3][2][5];
This can be accessed using three nested looping statements. A developer can also declare
arrays of more than three dimensions.
II. STRUCTURES
A structure is a group of items in which each item is identified by its own identifier. In some
programming languages, a structure is called a "record" and a member is called a "field".
Consider the following structure declaration,
struct
{
    char first[10];
    char midinit;
    char last[20];
} sname, ename;
This declaration creates two structure variables, sname and ename, each of which contains
three members: first, midinit and last. Two of the members are character strings, and one is a
single character. The structure can also be declared in another format, given below
struct nametype
{
    char first[10];
    char midinit;
    char last[20];
};
struct nametype sname, ename;
The above definition creates a structure tag nametype containing three members. Once a
structure tag has been defined, the variables sname and ename can be declared. An alternative
method of assigning a structure tag is the use of a typedef definition in C, which is given below
typedef struct
{
    char first[10];
    char midinit;
    char last[20];
} nametype;
nametype sname, ename;
Structure variable sname contains three members and ename contains a separate set of three
members. Each member of a structure variable can be accessed using the dot (.) operator.
Consider a structure given below.
struct data
{
int a;
float b;
char c;
};
int main()
{
    struct data x, y;
    printf("\nEnter the values for the first variable\n");
    scanf("%d%f%c", &x.a, &x.b, &x.c);
    printf("\nEnter the values for the second variable\n");
    scanf("%d%f%c", &y.a, &y.b, &y.c);
    return 0;
}
A structure variable can also be an array variable. A looping statement is used to get input for
such a structure array. Sample code is given below,
int main()
{
    struct data x[5];
    int i;
    for(i = 0; i < 5; i++)
    {
        printf("\nEnter the values for variable %d\n", (i + 1));
        scanf("%d%f%c", &x[i].a, &x[i].b, &x[i].c);
    }
    return 0;
}
STACK AND QUEUE
Stacks and queues are used to represent sequences of elements which can be modified by
insertion and deletion. Both stacks and queues can be implemented efficiently as arrays or as
linked lists.
III. STACK
A stack is a list with the restriction that inserts and deletes can be performed in only one
position, namely the end of the list called the top. The fundamental operations on a stack
are push, which is equivalent to an insert, and pop, which deletes the most recently inserted
element.
The most recently inserted element can be examined prior to performing a pop by use of
the top routine. Stacks are used often in processing tree-structured objects, in compilers (in
processing nested structures), and in systems to implement recursion. Stacks are also
known as LIFO (last in, first out) lists.
Stack model
(Stack model: only the top element is accessible)
REPRESENTATION OF STACK
A) Implementation of Stack using array
A stack K is most easily represented by an infinite array K[0], K[1], K[2], … and an
index TOP of type integer. The stack K consists of the elements K[0], … K[TOP]. The element
at index TOP (K[TOP]) is the top element of the stack. Insertion of an element is called push and
deletion of an element is called pop. The following code explains the operation push(K, a)
TOP = TOP + 1;
K[TOP] = a;
The following code explains the operation pop(K)
if(TOP < 0) then
    error;
else
    X = K[TOP];
    TOP = TOP - 1;
end if
In practice an infinite array is not available, so a finite array of size n is used. In this case a push
operation must check whether overflow occurs.
B) Linked List Implementation of Stacks
Stack can be implemented using a singly linked list. We perform a push by inserting at
the front of the list. We perform a pop by deleting the element at the front of the list.
A top operation merely examines the element at the front of the list, returning its
value. Sometimes the pop and top operations are combined into one. Structure definition is
given below,
typedef struct node *node_ptr;
struct node
{
    element_type element;
    node_ptr next;
};
typedef node_ptr STACK;
Routine to test whether a stack is empty-linked list implementation is given below,
int is_empty( STACK S )
{
return( S->next == NULL );
}
We merely create a header node; make_null sets the next pointer to NULL. Routines to
create an empty stack (linked list implementation) are given below,
STACK create_stack( void )
{
    STACK S;
    S = (STACK) malloc( sizeof( struct node ) );
    if( S == NULL )
        fatal_error("Out of space!!!");
    return S;
}
void make_null( STACK S )
{
    if( S != NULL )
        S->next = NULL;
    else
        error("Must use create_stack first");
}
The push is implemented as an insertion into the front of a linked list, where the front of the list
serves as the top of the stack. Routine to push onto a stack-linked list implementation is given
below,
void push( element_type x, STACK S )
{
    node_ptr tmp_cell;
    tmp_cell = (node_ptr) malloc( sizeof( struct node ) );
    if( tmp_cell == NULL )
        fatal_error("Out of space!!!");
    else
    {
        tmp_cell->element = x;
        tmp_cell->next = S->next;
        S->next = tmp_cell;
    }
}
The top is performed by examining the element in the first position of the list. Routine to return
top element in a stack--linked list implementation is given below,
element_type top( STACK S )
{
if( is_empty( S ) )
error("Empty stack");
else
return S->next->element;
}
Routine to pop from a stack--linked list implementation is given below,
void pop( STACK S )
{
node_ptr first_cell;
if( is_empty( S ) )
error("Empty stack");
else
{
first_cell = S->next;
S->next = S->next->next;
free( first_cell );
}
}
One problem that affects the efficiency of implementing stacks is error testing. Our
linked list implementation carefully checked for errors. In an unchecked array implementation,
a pop on an empty stack or a push on a full stack would overflow the array bounds and cause a
crash.
APPLICATIONS OF STACKS
A) Balancing Symbols
Every brace, bracket, and parenthesis must correspond to its left counterpart. The
sequence [()] is legal, but [(]) is wrong. It is easy to check these things using a stack. Just check
for balancing of parentheses, brackets, and braces and ignore any other character that appears.
Make an empty stack. Read characters until end of file. If the character is an open anything,
push it onto the stack. If it is a close anything, then if the stack is empty report an error.
Otherwise, pop the stack. If the symbol popped is not the corresponding opening symbol, then
report an error. At end of file, if the stack is not empty report an error.
B) Postfix Expressions
Suppose we have a pocket calculator and would like to compute the cost of a shopping trip.
To do so, we add a list of numbers and multiply the result by 1.06; this computes the purchase
price of some items with local sales tax added. If the items are 4.99, 5.99, and 6.99, then a
natural way to enter this would be the sequence
4.99 + 5.99 + 6.99 * 1.06 =
Depending on the calculator, this produces either the intended answer, 19.05, or the scientific
answer, 18.39. Most simple four-function calculators will give the first answer, but better
calculators know that multiplication has higher precedence than addition.
On the other hand, some items are taxable and some are not, so if only the first and last items
were actually taxable, then the sequence
4.99 * 1.06 + 5.99 + 6.99 * 1.06 =
would give the correct answer (18.69) on a scientific calculator and the wrong answer (19.37)
on a simple calculator. A scientific calculator generally comes with parentheses, so we can
always get the right answer by parenthesizing, but with a simple calculator we need to remember
intermediate results.
A typical evaluation sequence for this example might be to multiply 4.99 and 1.06, saving this
answer as a1. We then add 5.99 and a1, saving the result in a1. We multiply 6.99 and 1.06, saving
the answer in a2, and finish by adding a1 and a2, leaving the final answer in a1. We can write this
sequence of operations as follows:
4.99 1.06 * 5.99 + 6.99 1.06 * +
This notation is known as postfix or reverse Polish notation. For instance, the postfix
expression
6 5 2 3 + 8 * + 3 + *
is evaluated as follows: The first four symbols are placed on the stack. The resulting stack is
Next a '+' is read, so 3 and 2 are popped from the stack and their sum, 5, is pushed.
Next 8 is pushed.
Now a '*' is seen, so 8 and 5 are popped as 8 * 5 = 40 is pushed.
Next a '+' is seen, so 40 and 5 are popped and 40 + 5 = 45 is pushed.
Now, 3 is pushed.
Next '+' pops 3 and 45 and pushes 45 + 3 = 48.
Finally, a '*' is seen and 48 and 6 are popped, the result 6 * 48 = 288 is pushed.
The time to evaluate a postfix expression is O(n), because processing each element in the input
consists of stack operations and thus takes constant time. The algorithm to do so is very simple.
Notice that when an expression is given in postfix notation, there is no need to know any
precedence rules; this is an obvious advantage.
C) Infix to Postfix Conversion
Not only can a stack be used to evaluate a postfix expression, but we can also use a stack to
convert an expression in standard form (otherwise known as infix) into postfix. Suppose we want
to convert the infix expression
a + b * c + ( d * e + f ) * g
into postfix. A correct answer is a b c * + d e * f + g * +.
When an operand is read, it is immediately placed onto the output. Operators are not
immediately output, so they must be saved somewhere. The correct thing to do is to place
operators that have been seen, but not placed on the output, onto the stack. We will also stack left
parentheses when they are encountered. We start with an initially empty stack.
If we see a right parenthesis, then we pop the stack, writing symbols until we encounter a
(corresponding) left parenthesis, which is popped but not output.
If we see any other symbol ('+', '*', '('), then we pop entries from the stack until we find an
entry of lower priority. One exception is that we never remove a '(' from the stack except when
processing a ')'. For the purposes of this operation, '+' has lowest priority and '(' highest. When
the popping is done, we push the operator onto the stack.
Finally, if we read the end of input, we pop the stack until it is empty, writing symbols onto
the output.
To see how this algorithm performs, we will convert the infix expression above into its
postfix form. First, the symbol a is read, so it is passed through to the output. Then '+' is read and
pushed onto the stack. Next b is read and passed through to the output. The state of affairs at this
juncture is as follows:
Next a '*' is read. The top entry on the operator stack has lower precedence than '*', so nothing
is output and '*' is put on the stack. Next, c is read and output. Thus far, we have
The next symbol is a '+'. Checking the stack, we find that we will pop the '*' and place it on the
output, then pop the other '+', which is of equal (not lower) priority, and then push the
new '+'.
The next symbol read is an '(', which, being of highest precedence, is placed on the stack.
Then d is read and output.
We continue by reading a '*'. Since open parentheses do not get removed except when a
closed parenthesis is being processed, there is no output. Next, e is read and output.
The next symbol read is a '+'. We pop and output '*' and then push '+'. Then we read and
output f.
Now we read a ')', so the stack is emptied back to the '('. We output a '+'.
We read a '*' next; it is pushed onto the stack. Then g is read and output.
The input is now empty, so we pop and output symbols from the stack until it is empty.
As before, this conversion requires only O(n) time and works in one pass through the input.
IV) QUEUE
Queues support insertions (called enqueues) at one end (called the tail or rear) and
deletions (called dequeues) from the other end (called the head or front). Queues are used in
operating systems and networking to store lists of items that are waiting for some resource.
Queues are also known as FIFO (first in, first out) lists.
Model of a queue
ARRAY IMPLEMENTATION OF QUEUES
Both the linked list and array implementations give fast O(1) running times for every
operation. Array implementation of queue is given below
For each queue data structure, keep an array, QUEUE[], and the positions q_front and
q_rear, which represent the ends of the queue.
Keep track of the number of elements that are actually in the queue, q_size. The cells
that are blanks have undefined values in them.
In particular, the first two cells have elements that used to be in the queue.
To enqueue an element x, increment q_size and q_rear, then set QUEUE[q_rear] = x. To
dequeue an element, set the return value to QUEUE[q_front], decrement q_size, and then
increment q_front.
There is one potential problem with this implementation. After 10 enqueues, the queue
appears to be full, since q_front is now 10, and the next enqueue would be in a nonexistent
position.
However, there might only be a few elements in the queue, because several elements may
have already been dequeued.
The simple solution is that whenever q_front or q_rear gets to the end of the array, it is
wrapped around to the beginning. The following figure shows the queue during some operations.
This is known as a circular array implementation.
There are two warnings about the circular array implementation of queues. First, it is
important to check the queue for emptiness, because a dequeue when the queue is empty will
return an undefined value, silently.
Secondly, some programmers use different ways of representing the front and rear of a
queue. For instance, some do not use an entry to keep track of the size, because they rely on the
base case that when the queue is empty, q_rear = q_front - 1. The size is computed implicitly by
comparing q_rear and q_front. This is a very tricky way to go, because there are some special
cases, so be very careful if you need to modify code written this way. If the size is not part of the
structure, then if the array size is A_SIZE, the queue is full when there are A_SIZE - 1 elements,
since only A_SIZE different sizes can be differentiated, and one of these is 0.
Type declarations for queue--array implementation is given below.
struct queue_record
{
    unsigned int q_max_size;  /* Maximum # of elements until Q is full */
    unsigned int q_front;
    unsigned int q_rear;
    unsigned int q_size;      /* Current # of elements in Q */
    element_type *q_array;
};
typedef struct queue_record *QUEUE;
Routine to test whether a queue is empty-array implementation, is given below.
int is_empty( QUEUE Q )
{
return( Q->q_size == 0 );
}
Routine to make an empty queue-array implementation, is given below.
void make_null( QUEUE Q )
{
    Q->q_size = 0;
    Q->q_front = 1;
    Q->q_rear = 0;
}
Routine to enqueue-array implementation, is given below
void enqueue( element_type x, QUEUE Q )
{
    if( is_full( Q ) )
        error("Full queue");
    else
    {
        Q->q_size++;
        Q->q_rear = succ( Q->q_rear, Q );
        Q->q_array[ Q->q_rear ] = x;
    }
}
APPLICATION OF QUEUES
There are several algorithms that use queues to give efficient running times.
When jobs are submitted to a printer, they are arranged in order of arrival. Thus,
essentially, jobs sent to a line printer are placed on a queue.
In computer networks, there are many network setups of personal computers in which the
disk is attached to one machine, known as the file server. Users on other machines are
given access to files on a first-come first-served basis, so the data structure is a queue.
Calls to large companies are generally placed on a queue when all operators are busy.
Queues are also used extensively in graph theory.
V) LIST
A list is an abstract data type (ADT). A general list is of the form a1, a2, a3, . . . , an; a1, a2,
… are called the keys or values of the list. The size of this list is n. A list of size 0 is called a null
list. A list can be implemented contiguously (array) or non-contiguously (linked list).
List Operations
A lot of operations are available to perform on the list ADT. Some popular operations are
find – returns the position of the first occurrence of a key (value)
insert – inserts a key at the specified position in the list
delete – deletes the key at the specified position in the list
find_kth – returns the element at some position
print_list – displays all the keys in the list
make_null – makes the list a null list
For example, consider the list 34, 12, 52, 16, 12. Then
find(52) – returns 3
insert(x,4) – makes the list 34, 12, 52, x, 16, 12
delete(3) – makes the list 34, 12, 16, 12.
Simple Array Implementation of Lists
List can be implemented using an array. Even if the array is dynamically allocated, an
estimate of the maximum size of the list is required. Usually this requires a high over-estimate,
which wastes considerable space. This could be a serious limitation, especially if there are many
lists of unknown size.
Merits of List using array
An array implementation allows print_list and find to be carried out in linear time, which
is as good as can be expected, and the find_kth operation takes constant time.
Demerits of List using array
However, insertion and deletion are expensive. For example, inserting at position 0
(which amounts to making a new first element) requires first pushing the entire array down one
spot to make room, whereas deleting the first element requires shifting all the elements in the list
up one, so the worst case of these operations is O(n). On average, half the list needs to be moved
for either operation, so linear time is still required. Merely building a list by n successive inserts
would require quadratic time. Because the running time for insertions and deletions is so slow
and the list size must be known in advance, simple arrays are generally not used to implement
lists.
Linked Lists
The linked list consists of a series of structures, which are not necessarily adjacent in memory.
Each structure contains the element variable and a pointer variable to a structure containing its
successor. The element variable is used to store a key (value). A pointer variable is just a variable
that contains the address where some other data is stored. This pointer variable is called the next
pointer.
A linked list
Thus, if p is declared to be a pointer to a structure, then the value stored in p is interpreted
as the location, in main memory, where a structure can be found. A field of that structure can be
accessed by p->field_name.
Consider a list contains five structures, which happen to reside in memory locations 1000,
800, 712, 992, and 692 respectively. The next pointer in the first structure has the value 800,
which provides the indication of where the second structure is. The other structures each have a
pointer that serves a similar purpose. Of course, in order to access this list, we need to know
where the first cell can be found. A pointer variable can be used for this purpose.
Linked list with actual pointer values
To execute print_list(L) or find(L,key), we merely pass a pointer to the first element in the
list and then traverse the list by following the next pointers. This operation is clearly linear-
time, although the constant is likely to be larger than if an array implementation were used.
The find_kth operation is no longer quite as efficient as an array implementation; find_kth(L,i)
takes O(i) time and works by traversing down the list in the obvious manner.
The delete command can be executed in one pointer change. The result of deleting the third
element in the original list is shown below.
Deletion from a linked list
The insert command requires obtaining a new cell from the system by using a malloc call
(more on this later) and then executing two pointer maneuvers.
Insertion into a linked list
Programming Details
Keep a sentinel node, which is sometimes referred to as a header or dummy node. Our
convention will be that the header is in position 0. Linked list with a header is given below.
Type declarations for linked lists is given below.
typedef struct node *node_ptr;
struct node
{
    element_type element;
    node_ptr next;
};
typedef node_ptr LIST;
typedef node_ptr position;
Function to test whether a linked list is empty
int is_empty( LIST L )
{
return( L->next == NULL );
}
Empty list with header
Function to test whether current position is the last in a linked list
int is_last( position p, LIST L )
{
return( p->next == NULL );
}
Find function returns the position of the element in the list of some element
position find ( element_type x, LIST L )
{
position p;
p = L->next;
while( (p != NULL) && (p->element != x) )
p = p->next;
return p;
}
Our fourth routine will delete some element x in list L. We need to decide what to do if x
occurs more than once or not at all. Our routine deletes the first occurrence of x and does
nothing if x is not in the list. To do this, we find p, which is the cell prior to the one containing
x, via a call to find_previous.
void delete( element_type x, LIST L )
{
    position p, tmp_cell;
    p = find_previous( x, L );
    if( p->next != NULL )    /* Implicit assumption of header use */
    {                        /* x is found: delete it */
        tmp_cell = p->next;
        p->next = tmp_cell->next;  /* bypass the cell to be deleted */
        free( tmp_cell );
    }
}
position find_previous( element_type x, LIST L )
{
position p;
p = L;
while( (p->next != NULL) && (p->next->element != x) )
p = p->next;
return p;
}
Insert routine allows us to pass an element to be inserted along with the list L and a position p.
Our particular insertion routine will insert an element after the position implied by p.
insert( element_type x, LIST L, position p )
{
position tmp_cell;
tmp_cell = (position) malloc( sizeof (struct node) );
if( tmp_cell == NULL )
fatal_error("Out of space!!!");
else
{
tmp_cell->element = x;
tmp_cell->next = p->next;
p->next = tmp_cell;
}
}
To delete a list
void delete_list( LIST L )
{
    position p, tmp;
    p = L->next;    /* header assumed */
    L->next = NULL;
    while( p != NULL )
    {
        tmp = p->next;
        free( p );
        p = tmp;
    }
}
DOUBLY LINKED LISTS
To traverse lists backwards, add an extra field to the data structure containing a pointer to
the previous cell. The cost of this is an extra link, which adds to the space requirement and also
doubles the cost of insertions and deletions because there are more pointers to fix. On the other
hand, it simplifies deletion, because you no longer have to refer to a key by using a pointer to the
previous cell; the cell itself carries that pointer.
A doubly linked list
CIRCULARLY LINKED LISTS
A popular convention is to have the last cell keep a pointer back to the first. This can be
done with or without a header (if the header is present, the last cell points to it), and can also be
done with doubly linked lists (the first cell's previous pointer points to the last cell).
A doubly circularly linked list
Question Bank
Unit II - LISTS, STACKS AND QUEUES
PART – A (2 MARKS)
1. Define ADT.
2. Give the structure of Queue model.
3. What are the basic operations of Queue ADT?
4. What is Enqueue and Dequeue?
5. Give the applications of Queue.
6. What is the use of stack pointer?
7. What is an array?
8. Define ADT (Abstract Data Type).
9. Swap two adjacent elements by adjusting only the pointers (and not the data) using a
singly linked list.
10. Define a queue model.
11. What are the advantages of doubly linked list over singly linked list?
12. Define a graph
13. What is a Queue?
14. What is a circularly linked list?
15. What is linear list?
16. How will you delete a node from a linked list?
17. What is linear pattern search?
18. What is recursive data structure?
19. What is doubly linked list?
PART- B (16 MARKS)
1. Explain the implementation of stack using Linked List.
2. Explain Prefix, Infix and postfix expressions with example.
3. Explain the operations and the implementation of list ADT.
4. Give a procedure to convert an infix expression a+b*c+(d*e+f)*g to postfix notation
5. Design and implement an algorithm to search a linear ordered linked list for a given
alphabetic key or name.
6. (a) What is a stack? Write down the procedure for implementing various stack
operations(8)
(b) Explain the various application of stack? (8)
7. (a) Given two sorted lists L1 and L2, write a procedure to compute L1_L2 using only
the basic operations (8)
(b) Write a routine to insert an element in a linked list (8)
8. What is a queue? Write an algorithm to implement queue with example.
UNIT III – TREES
Binary Trees – Operations on Binary Tree Representations – Node Representation –Internal and
External Nodes – Implicit Array Representation – Binary Tree Traversal – Huffman Algorithm –
Representing Lists as Binary Trees – Sorting and Searching Techniques – Tree Searching –
Hashing
Unit: III TREES
TREES
A tree is a finite set of one or more nodes such that there is a specially designated node
called the root, and zero or more nonempty subtrees T1, T2 … Tk, each of whose roots is
connected by a directed edge from the root R.
Fig: Tree
PRELIMINARIES
Root
A node which doesn‘t have a parent. In the above tree, the root is A.
Node
Item of Information
Leaf
A node which doesn't have children is called a leaf or terminal node. Here B, K, L, G, H,
M, J are leaves.
Siblings
Children of the same parents are said to be siblings, Here B, C, D, E are siblings, F, G are
siblings. Similarly, I, J, K, L are Siblings.
Path
A path from node n1 to nk is defined as a sequence of nodes n1, n2, ..., nk such that ni is
the parent of ni+1 for 1 ≤ i < k. There is exactly one path from the root to each node.
In fig path from A to L is A, C, F, L where A is the parent for C, C is the parent of F and F
is the parent of L.
Length
The length is defined as the number of edges on the path. In fig the length for the path A
to L is 3.
[Figure: tree with root A at level 1; children B, C, D, E at level 2; F, G, H, I, J at level 3; K, L, M at level 4]
Degree
The number of sub trees of a node is called its degree.
Degree of A is 4
Degree of C is 2
Degree of D is 1
Degree of H is 0
The degree of the tree is the maximum degree of any node in the tree
In fig the degree of the tree is 4.
Level
The level of a node is defined by letting the root be at level 1; if a node is at level L, then
its children are at level L+1.
Level of A is 1
Level of B, C, D, E is 2
Level of F, G, H, I, J is 3
Level of K, L, M is 4
Depth
For any node n, the depth of n is the length of the unique path from root to n.
The depth of the root is zero
In fig Depth of node F is 2
Depth of node L is 3
Height
For any node n, the height of n is the length of the longest path from n to a leaf.
The height of a leaf is zero.
In fig, height of node F is 1
Height of L is 0
II. BINARY TREES
A binary tree is a special form of a tree. A binary tree is more important and frequently
used in various applications.
A binary tree T is defined as follows:
T is empty, or
T contains a specially designated node called the root of T, and the remaining nodes of T
form two disjoint binary trees T1 and T2, which are called the left subtree and the right
subtree respectively.
Fig: A sample binary tree with 11 nodes
Two possible situations of a binary tree are (a) Full binary tree (b) Complete Binary tree
Full binary tree
A binary tree is a full binary tree if it contains the maximum possible number of nodes at
every level. The figure below shows a full binary tree of height 4.
Fig: Full Binary tree of height 4
Complete binary tree
A binary tree is said to be a complete binary tree if all its levels, except possibly the last,
have the maximum possible number of nodes, and all the nodes at the last level appear as
far left as possible. The figure below shows a complete binary tree of height 4.
Fig: A complete binary tree of height 4
III.REPRESENTATION OF BINARY TREE
Two common methods used for representing this structure
1. Linear or sequential representation (using an array)
2. Linked representation (using pointers)
A. Linear Representation of a Binary tree
In this representation, the nodes are stored level by level, starting from the zero level
where only root node is present. Root node is stored in the first memory location.
Some rules decide the location of any node of the tree in the array:
The root node is at location 1.
For any node with index i, 1 < i ≤ n:
o PARENT(i) = ⌊i/2⌋; when i = 1, there is no parent.
o LCHILD(i) = 2*i; if 2*i > n, then i has no left child.
o RCHILD(i) = 2*i + 1; if 2*i + 1 > n, then i has no right child.
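The index rules above can be captured as macros. A small sketch, using 1-based indexing as in the rules (macro names follow the text's PARENT/LCHILD/RCHILD):

```c
#include <assert.h>

/* 1-based index arithmetic for the sequential (array) representation.
   Index 1 holds the root; n is the number of nodes stored. */
#define PARENT(i)  ((i) / 2)        /* valid for 1 < i <= n; PARENT(1) has no meaning */
#define LCHILD(i)  (2 * (i))        /* no left child if the result exceeds n  */
#define RCHILD(i)  (2 * (i) + 1)    /* no right child if the result exceeds n */
```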
Consider a binary tree for the following expression (A-B)+C*(D/E)
Fig: Binary Tree
The representation of the same Binary tree using array is shown in fig below
A full Binary tree and the index of its various nodes when stored in an array is shown in Fig
below
B. Linked representation of Binary Tree
When we insert a new node or delete a node in a linear representation, we require data
movement up and down in the array, which takes an excessive amount of processing time.
Linear representation of binary trees thus has a number of overheads, all of which are
taken care of by the linked representation.
Structure of a node in linked representation:
| LC | DATA | RC |
Here LC and RC are two link fields that store the addresses of the left child and right child
of a node, and DATA is the information of the node.
A tree with 9 nodes is represented as:
Fig: Binary Tree
OPERATIONS ON BINARY TREES
There are a number of primitive operations that can be applied to a binary tree. If p is a
pointer to a node nd of a binary tree, the function info(p) returns the contents of nd.
The functions left(p), right(p), father(p), and brother(p) return pointers to the left son of
nd, the right son of nd, the father of nd, and the brother of nd, respectively.
These functions return the null pointer if nd has no left son, right son, father, or brother.
Finally, the logical functions isleft(p) and isright(p) return the value true if nd is a left or right
son, respectively, of some other node in the tree, and false otherwise.
Note that the functions isleft(p), isright(p), and brother(p) can be implemented using the
functions left(p), right(p), and father(p).
For example, isleft may be implemented as

q = father(p);
if (q == null)
    return(false);
if (left(q) == p)
    return(true);
return(false);

or, even more simply, as father(p) && p == left(father(p)). isright may be implemented in a
similar manner, or by calling isleft. brother(p) may be implemented using isleft or isright as

if (father(p) == null)
    return(null);
if (isleft(p))
    return(right(father(p)));
return(left(father(p)));
In constructing a binary tree, the operations maketree, setleft, and setright are useful.
maketree(x) creates a new binary tree consisting of a single node with information field x and
returns a pointer to that node. setleft(p,x) accepts a pointer p to a binary tree node with no left
son and creates a new left son of node(p) with information field x. setright(p,x) is analogous to
setleft except that it creates a right son of node(p).
Make_Empty
This operation is mainly for initialization. Some programmers prefer to initialize the first
element as a one-node tree, but our implementation follows the recursive definition of trees more
closely. It is also a simple routine, as evidenced below
template <class Etype>
void
Binary_Search_Tree<Etype>::
Make_Empty (Tree_Node<Etype> * & T)
{
    if (T != NULL)
    {
        Make_Empty(T->Left);
        Make_Empty(T->Right);
        delete T;
        T = NULL;
    }
}
Find
This operation generally requires returning a pointer to the node in tree T that has key X,
or NULL if there is no such node. The protected Find routine does the work; the public routine
then returns nonzero if the Find succeeded and sets Last_Find. If the Find failed, zero is
returned and Last_Find points to NULL. The recursive structure of the tree makes this simple.

template <class Etype>
Tree_Node<Etype> *
Binary_Search_Tree<Etype>::
Find (const Etype & X, Tree_Node<Etype> * T) const
{
    if (T == NULL)
        return NULL;
    if (X < T->Element)
        return Find(X, T->Left);
    else if (X > T->Element)
        return Find(X, T->Right);
    else
        return T;
}
Find_Min and Find_Max
Internally, these routines return the position of the smallest and largest elements in the
tree, respectively.
Although returning the exact values of these elements might seem more reasonable, this
would be inconsistent with the Find operation. It is important that similar-looking operations
do similar things.
To perform a Find_Min, start at the root and go left as long as there is a left child; the
stopping point is the smallest element. The Find_Max routine is the same, except that
branching is to the right child. The public interface is similar to that of the Find routine.
template <class Etype>
Tree_Node<Etype> *
Binary_Search_Tree<Etype>::
Find_Min (Tree_Node<Etype> * T) const
{
    if (T == NULL)
        return NULL;
    else if (T->Left == NULL)
        return T;
    else
        return Find_Min(T->Left);
}

template <class Etype>
Tree_Node<Etype> *
Binary_Search_Tree<Etype>::
Find_Max (Tree_Node<Etype> * T) const
{
    if (T != NULL)
        while (T->Right != NULL)
            T = T->Right;
    return T;
}
BINARY TREE REPRESENTATIONS
Node Representation of Binary Trees
Tree nodes may be implemented as array elements or as allocations of dynamic variables.
Each node contains info, left, right, and father fields. The left, right, and father fields of a
node point to the node's left son, right son, and father, respectively.
Using the array implementation,
#define NUMNODES 500
struct nodetype {
    int info;
    int left;
    int right;
    int father;
};
struct nodetype node[NUMNODES];
Under this representation, the operations info(p), left(p), right(p), and father(p) are
implemented by references to node[p].info, node[p].left, node[p].right, and node[p].father,
respectively.
To implement isleft and isright more efficiently, we include within each node an
additional flag isleft. The value of this flag is TRUE if the node is a left son and FALSE
otherwise. The root is uniquely identified by a NULL value (0) in its father field.
Alternatively, the sign of the father field could be negative if the node is a left son or
positive if it is a right son. The pointer to a node's father is then given by the absolute value
of the father field. The isleft or isright operations would then need only examine the sign of
the father field.
To implement brother(p) more efficiently, brother field is included in each node. Once
the array of nodes is declared, create an available list by executing the following statements.
int avail, i;
{
    avail = 1;
    for (i = 0; i < NUMNODES; i++)
        node[i].left = i + 1;
    node[NUMNODES-1].left = 0;
}
Note that the available list is not a binary tree but a linear list whose nodes are linked
together by the left field. Each node in a tree is taken from the available pool when needed and
returned to the available pool when no longer in use. This representation is called the linked
array representation of a binary tree.
A node may be defined by

struct nodetype
{
    int info;
    struct nodetype *left;
    struct nodetype *right;
    struct nodetype *father;
};
typedef struct nodetype *NODEPTR;
The operations info(p),left(p),right(p),and father(p) would be implemented by the
references to p->info, p->left, p->right, and p->father respectively. An explicit available list is
not needed. The routines getnode and freenode simply allocate and free nodes using the routines
malloc and free. This representation is called the dynamic node representation of a binary tree.
Both the linked array representation and the dynamic node representation are
implementations of an abstract linked representation (also called node representation) in which
implicit or explicit pointers link together the nodes of a binary tree.
The maketree function, which allocates a node and sets it as the root of a single-node
binary tree, may be written as
NODEPTR maketree(x)
int x;
{
    NODEPTR p;
    p = getnode();
    p->info = x;
    p->left = NULL;
    p->right = NULL;
    return(p);
}
The routine setleft(p, x) sets a node with contents x as the left son of node(p):

setleft(p, x)
NODEPTR p;
int x;
{
    if (p == NULL)
        printf("void insertion\n");
    else if (p->left != NULL)
        printf("invalid insertion\n");
    else
        p->left = maketree(x);
}
The routine setright(p, x) to create a right son of node(p) with contents x is
similar.
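Putting maketree, setleft, and setright together, here is a self-contained ANSI C sketch of the dynamic node representation (getnode is implemented with malloc, as the text suggests; error handling is simplified to printf, and the prototypes are illustrative):

```c
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

/* Dynamic node representation, as described above */
struct nodetype {
    int info;
    struct nodetype *left;
    struct nodetype *right;
};
typedef struct nodetype *NODEPTR;

/* getnode allocates a node dynamically */
NODEPTR getnode(void)
{
    return malloc(sizeof(struct nodetype));
}

/* Create a single-node tree holding x */
NODEPTR maketree(int x)
{
    NODEPTR p = getnode();
    p->info = x;
    p->left = NULL;
    p->right = NULL;
    return p;
}

/* Attach a new left son with contents x */
void setleft(NODEPTR p, int x)
{
    if (p == NULL)
        printf("void insertion\n");
    else if (p->left != NULL)
        printf("invalid insertion\n");
    else
        p->left = maketree(x);
}

/* Attach a new right son with contents x */
void setright(NODEPTR p, int x)
{
    if (p == NULL)
        printf("void insertion\n");
    else if (p->right != NULL)
        printf("invalid insertion\n");
    else
        p->right = maketree(x);
}
```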
INTERNAL AND EXTERNAL NODES
By definition leaf nodes have no sons. Thus in the linked representation of binary trees,
left and right pointers are needed only in non-leaf nodes. Sometimes two separate set of nodes
are used for non-leaves and leaves. Non-leaf nodes contain info, left and right fields and are
allocated as dynamic records or as an array of records managed using an available list. Leaf
nodes do not contain left or right fields and are kept as a single info array that is allocated
sequentially as needed.
sequentially as needed.
Alternatively they can be allocated as dynamic variables containing only an info value.
Each node can also contain a father field, if necessary. When this distinction is made between
non-leaf and leaf nodes, non-leaves are called internal nodes and leaves are called external
nodes.
IMPLICIT ARRAY REPRESENTATION OF BINARY TREES
In general, the n nodes of an almost complete binary tree can be numbered from 1 to n,
so that the number assigned to a left son is twice the number assigned to its father, and the
number assigned to a right son is 1 more than twice the number assigned to its father.
We can extend this implicit array representation of almost complete binary trees to an
implicit array representation of binary trees generally. This can be done by identifying an almost
complete binary tree that contains the binary tree being represented.
The Fig (a) illustrates two binary trees, and Fig (b) illustrates the smallest almost
complete binary trees that contain them. Finally Fig(c) illustrates the array representations of
these almost complete binary trees, and by extension, of the original binary trees.
The implicit array representation is also called the sequential representation, because it
allows a tree to be implemented in a contiguous block of memory rather than via pointers
connecting widely separated nodes.
Under the sequential representation, an array element is allocated whether or not it
serves to contain a node of a tree. Therefore, unused array elements must be flagged as
non-existent (null) tree nodes.
Fig (a): Two binary trees (the first on nodes A-M, the second on nodes A-G)
Fig (b): Their almost complete extensions
Fig (c): The array representations of these almost complete extensions (indices 0-12 and 0-9; unused slots represent null nodes)
Example
The following program finds duplicate numbers in an input list; it includes the routines
maketree and setleft, using the sequential representation of binary trees.
#define NUMNODES 500
#define TRUE 1
#define FALSE 0

struct nodetype
{
    int info;
    int used;
} node[NUMNODES];

main()
{
    int p, q, number;
    scanf("%d", &number);
    maketree(number);
    while (scanf("%d", &number) != EOF)
    {
        p = q = 0;
        while (q < NUMNODES && node[q].used && number != node[p].info)
        {
            p = q;
            if (number < node[p].info)
                q = 2*p + 1;
            else
                q = 2*p + 2;
        }
        if (number == node[p].info)
            printf("%d is a duplicate\n", number);
        else if (number < node[p].info)
            setleft(p, number);
        else
            setright(p, number);
    }
}

maketree(x)
int x;
{
    int p;
    node[0].info = x;
    node[0].used = TRUE;
    for (p = 1; p < NUMNODES; p++)
        node[p].used = FALSE;
}

setleft(p, x)
int p, x;
{
    int q;
    q = 2*p + 1;
    if (q >= NUMNODES)
        error("array overflow");
    else if (node[q].used)
        error("invalid insertion");
    else {
        node[q].info = x;
        node[q].used = TRUE;
    }
}
The routine for setright is similar. Note that the routine maketree initializes the
fields info and used to represent a tree with a single node.
IV. BINARY TREE TRAVERSALS
Traversing means visiting each node only once. Tree traversal is a method for visiting all
the nodes in the tree exactly once. There are three types of tree traversal techniques, namely
Inorder Traversal
Preorder Traversal
Postorder Traversal
Inorder Traversal
The Inorder traversal of a binary tree is performed as
Traverse the left subtree in inorder
Visit the root
Traverse the right subtree in inorder
Example
Fig: a tree with root 20, left child 10, right child 30; inorder traversal gives 10, 20, 30
Fig: Inorder A B C D E G H I J K
Recursive routine for Inorder Traversal

void Inorder (Tree T)
{
    if (T != NULL)
    {
        Inorder(T->left);
        printElement(T->Element);
        Inorder(T->right);
    }
}
Preorder Traversal
The preorder traversal of a binary tree is performed as
Visit the root
Traverse the left subtree in preorder
Traverse the right subtree in preorder
Example
Fig: a tree with root 20, left child 10, right child 30; preorder traversal gives 20, 10, 30
Fig: Preorder traversal of the letter tree gives D C A B I G E H K J
Recursive routine for Preorder Traversal

void Preorder (Tree T)
{
    if (T != NULL)
    {
        printElement(T->Element);
        Preorder(T->left);
        Preorder(T->right);
    }
}
Postorder Traversal
The postorder traversal of a binary tree is performed as
Traverse the left subtree in postorder
Traverse the right subtree in postorder
Visit the root
Example
Fig: a tree with root 20, left child 10, right child 30; postorder traversal gives 10, 30, 20
Fig: Postorder traversal of the letter tree gives B A C E H G J K I D
Recursive routine for Postorder Traversal

void Postorder (Tree T)
{
    if (T != NULL)
    {
        Postorder(T->left);
        Postorder(T->right);
        printElement(T->Element);
    }
}
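The three traversal routines can be exercised on the small 20/10/30 tree used in the examples. In this sketch the visit step appends to a string instead of printing, so the resulting orders can be compared; the structure and helper names are illustrative:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

struct tnode { int elem; struct tnode *left, *right; };

/* Each routine appends the visited element to out, so the order is testable */
void inorder(struct tnode *t, char *out)
{
    if (t != NULL) {
        inorder(t->left, out);
        sprintf(out + strlen(out), "%d ", t->elem);
        inorder(t->right, out);
    }
}

void preorder(struct tnode *t, char *out)
{
    if (t != NULL) {
        sprintf(out + strlen(out), "%d ", t->elem);
        preorder(t->left, out);
        preorder(t->right, out);
    }
}

void postorder(struct tnode *t, char *out)
{
    if (t != NULL) {
        postorder(t->left, out);
        postorder(t->right, out);
        sprintf(out + strlen(out), "%d ", t->elem);
    }
}

/* Illustrative constructor */
struct tnode *mk(int e, struct tnode *l, struct tnode *r)
{
    struct tnode *p = malloc(sizeof *p);
    p->elem = e;
    p->left = l;
    p->right = r;
    return p;
}
```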
V. HUFFMAN ALGORITHM
The inputs to the algorithm are n, the number of symbols in the original alphabet, and
frequency, an array of size at least n such that frequency [i] is the re1ative frequency of the ith
symbol.
The algorithm assigns values to an array code of size at least n, so that code[il contains
the code assigned to the ith symbol.
20
10 30
D
C I
A
B
G K
E H J
67. 67
The algorithm also constructs an array position of size at least n such that position[il
points to the node representing the ith symbol.
This array is necessary to identify the point in the tree from which to start in
constructing the code for a particular symbol in the alphabet. Once the tree has been
constructed, the isleft operation introduced earlier can be used to determine whether 0 or 1
should be placed at the front of the code as we climb the tree.
The info portion of a tree node contains the frequency of occurrence of the symbol
represented by that node.
A set of root nodes is used to keep pointers to the roots of partial binary trees that are not
yet left or right subtrees. Since this set is modified by removing elements with minimum
frequency, combining them, and then reinserting the combined element into the set, it is
implemented as an ascending priority queue of pointers, ordered by the value of the info field
of the pointers' target nodes.
We use the operations pqinsert, to insert a pointer into the priority queue, and
pqmindelete, to remove the pointer to the node with the smallest info value from the priority
queue.
We may outline Huffman's algorithm as follows:
/* initialize the set of root nodes */
rootnodes = the empty ascending priority queue;
/* construct a node for each symbol */
Fig: Huffman trees
The Huffman tree is strictly binary. Thus, if there are n symbols in the alphabet, the Huffman
tree can be represented by an array of nodes of size 2n-1.
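The following C sketch follows the outline above but replaces the ascending priority queue with a simple linear scan for the minimum (standing in for pqmindelete); the father and isleft information stored per node is then used to climb from each leaf and build the codes, as the text describes. Array sizes and names here are illustrative assumptions:

```c
#include <string.h>
#include <assert.h>

#define MAXSYM 32
#define MAXNODES (2 * MAXSYM - 1)   /* the 2n-1 bound noted above */

int info[MAXNODES];        /* frequency stored in each node */
int father[MAXNODES];      /* -1 means no father yet (a root) */
int isleftson[MAXNODES];
int nnodes;

/* Remove and return the index of the smallest-frequency root;
   a linear scan stands in for an ascending priority queue. */
static int takemin(int *roots, int *count)
{
    int i, best = 0, idx;
    for (i = 1; i < *count; i++)
        if (info[roots[i]] < info[roots[best]])
            best = i;
    idx = roots[best];
    roots[best] = roots[--*count];
    return idx;
}

/* Build the Huffman tree; code[i] receives the bit string for symbol i */
void huffman(int n, const int *frequency, char code[][MAXSYM])
{
    int roots[MAXNODES], position[MAXSYM], count = 0, i;
    nnodes = 0;
    for (i = 0; i < n; i++) {          /* construct a node for each symbol */
        info[nnodes] = frequency[i];
        father[nnodes] = -1;
        position[i] = nnodes;
        roots[count++] = nnodes++;
    }
    while (count > 1) {                /* combine the two smallest roots */
        int a = takemin(roots, &count);
        int b = takemin(roots, &count);
        info[nnodes] = info[a] + info[b];
        father[nnodes] = -1;
        father[a] = nnodes; isleftson[a] = 1;
        father[b] = nnodes; isleftson[b] = 0;
        roots[count++] = nnodes++;
    }
    for (i = 0; i < n; i++) {          /* climb from each leaf to the root */
        char buf[MAXSYM];
        int p = position[i], len = 0, j;
        while (father[p] != -1) {
            buf[len++] = isleftson[p] ? '0' : '1';
            p = father[p];
        }
        for (j = 0; j < len; j++)      /* bits were collected leaf-to-root */
            code[i][j] = buf[len - 1 - j];
        code[i][len] = '\0';
    }
}
```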
REPRESENTING LISTS AS BINARY TREES
In this section we introduce a tree representation of a linear list in which the operations of
finding the kth element of a list and deleting a specific element are relatively efficient. It is
also possible to build a list with given elements using this representation. We also briefly
consider the operation of inserting a single new element.
A list may be represented by a binary tree as illustrated in Fig. Fig(a) shows a list in the
usual linked format, while Fig(b) and (c) show two binary tree representations of the list.
Elements of the original list are represented by leaves of the tree (shown as squares in the
figure), whereas nonleaf nodes (shown as circles in the figure) are present as part of the
internal tree structure.
Associated with each leaf node are the contents of the corresponding list element.
Associated with each nonleaf node is a count representing the number of leaves in the node's
left subtree.
The elements of the list in their original sequence are assigned to the leaves of the tree in
the inorder sequence of the leaves. Note from the figure that several binary trees can
represent the same list.
Fig: A list and two corresponding Binary Trees
Finding the kth Element
To justify using so many extra tree nodes to represent a list, we present an algorithm to
find the kth element of a list represented by a tree.
Let tree point to the root of the tree, and let lcount(p) represent the count associated with
the nonleaf node pointed to by p [lcount(p) is the number of leaves in the tree rooted at
node(left(p))].
The following algorithm sets the variable find to point to the leaf containing the kth
element of the list.
o The algorithm maintains a variable r containing the number of list elements
remaining to be counted.
o At the beginning of the algorithm r is initialized to k. At each nonleaf
node(p), the algorithm determines from the values of r and lcount(p)
whether the kth element is located in the left or right subtree.
o If the desired leaf is in the left subtree, the algorithm proceeds directly to that
subtree. If the desired leaf is in the right subtree, the algorithm proceeds to
that subtree after reducing the value of r by the value of lcount(p).
o k is assumed to be less than or equal to the number of elements in the list.
r = k;
p = tree;
while (p is not a leaf node)
    if (r <= lcount(p))
        p = left(p);
    else {
        r -= lcount(p);
        p = right(p);
    }
find = p;
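A direct C rendering of this algorithm, assuming leaves carry the element value and nonleaves carry lcount (the field layout and helper names here are illustrative assumptions):

```c
#include <stdlib.h>
#include <assert.h>

/* Tree representation of a list: leaves hold elements, each nonleaf
   holds lcount, the number of leaves in its left subtree. */
struct lnode {
    int leaf;                 /* 1 if this node is a leaf           */
    int value;                /* element contents (leaves only)     */
    int lcount;               /* leaves in left subtree (nonleaves) */
    struct lnode *left, *right;
};

/* The algorithm from the text: r counts elements still to be skipped */
struct lnode *find_kth(struct lnode *tree, int k)
{
    int r = k;
    struct lnode *p = tree;
    while (!p->leaf) {
        if (r <= p->lcount)
            p = p->left;
        else {
            r -= p->lcount;
            p = p->right;
        }
    }
    return p;
}

/* Illustrative constructors */
struct lnode *mkleaf(int v)
{
    struct lnode *p = calloc(1, sizeof *p);
    p->leaf = 1;
    p->value = v;
    return p;
}

struct lnode *mkinternal(struct lnode *l, struct lnode *r, int lcount)
{
    struct lnode *p = calloc(1, sizeof *p);
    p->left = l;
    p->right = r;
    p->lcount = lcount;
    return p;
}
```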
Fig(a) illustrates finding the fifth element of a list in the tree of Fig(b), and Fig(b)
illustrates finding the eighth element in the tree of Fig(c).
The dashed line represents the path taken by the algorithm down the tree to the
appropriate leaf. We indicate the value of r (the remaining number of elements to
be counted) next to each node encountered by the algorithm.
The number of tree nodes examined in finding the kth list element is less than or equal to
1 more than the depth of the tree (the longest path in the tree from the root to a leaf). Thus four
nodes are examined in Fig (a) in finding the fifth element of the list, and also in Fig(b) in finding
the eighth element. If a list is represented as a linked structure, four nodes are accessed in finding
the fifth element of the list [that is, the operation p = next(p) is performed four times] and seven
nodes are accessed in finding the eighth element.
Although this is not a very impressive saving, consider a list with 1000 elements. A
binary tree of depth 10 is sufficient to represent such a list, since log2 1000 is less than 10.
Thus, finding the kth element using such a binary tree would require examining no more
than 11 nodes. Since the number of leaves of a binary tree increases as 2^d, where d is the depth
of the tree, such a tree represents a relatively efficient data structure for finding the kth element
of a list.
If an almost complete tree is used, the kth element of an n-element list can be found in at
most log2n + 1 node accesses, whereas k accesses would be required if a linear linked list were
used.
Fig: Finding the nth element of a tree-represented list
Deleting an Element
It involves only resetting a left or right pointer in the father of the deleted leaf dl to null.
Fig illustrates the results of this algorithm for a tree in which the nodes C, D, and B are
deleted in that order. Make sure that you follow the actions of the algorithm on these examples.
Note that the algorithm maintains a 0 count in leaf nodes for consistency, although the count is
not required for such nodes. Note also that the algorithm never moves up a nonleaf node even if
this could be done. We could easily modify the algorithm to do this, but have not done so for
reasons that will become apparent shortly.
This deletion algorithm involves inspection of up to two nodes at each level. Thus, the
operation of deleting the kth element of a list represented by a tree requires a number of node
accesses approximately equal to three times the tree depth, whereas deleting the kth element of
a linked list requires approximately k node accesses. For large lists, therefore, the tree
representation is more efficient.
Fig: Deletion Algorithm
TREE SEARCHING
There are several ways of organizing files as trees and some associated searching
algorithms.
Previously, we presented a method of using a binary tree to store a file in order to make
sorting the file more efficient. In that method, all the left descendants of a node with key key
have keys that are less than key, and all right descendants have keys that are greater than or
equal to key.
The inorder traversal of such a binary tree yields the file in ascending key order.
Such a tree may also be used as a binary search tree. Using binary tree notation, the
algorithm for searching for the key key in such a tree is as follows:

p = tree;
while (p != NULL && key != k(p))
    p = (key < k(p)) ? left(p) : right(p);
return(p);
The efficiency of the search process can be improved by using a sentinel, as in sequential
searching.
A sentinel node, with a separate external pointer pointing to it, remains allocated with
the tree. All left or right tree pointers that do not point to another tree node now point to this
sentinel node instead of equalling null. When a search is performed, the argument key is first
inserted into the sentinel node, thus guaranteeing that it will be located in the tree.
A sorted array can be produced from a binary search tree by traversing the tree in inorder
and inserting each element sequentially into the array as it is visited. On the other hand, there
are many binary search trees that correspond to a given sorted array. Viewing the middle
element of the array as the root of a tree and viewing the remaining elements recursively as
left and right subtrees produces a relatively balanced binary search tree, as in Fig(a). Viewing
the first element of the array as the root of a tree and each successive element as the right son
of its predecessor produces a very unbalanced binary tree, as in Fig(b).
The advantage of using a binary search tree over an array is that a tree enables search,
insertion, and deletion operations to be performed efficiently. If an array is used, an insertion
or deletion requires that approximately half of the elements of the array be moved. (Why?)
Insertion or deletion in a search tree, on the other hand, requires that only a few pointers be
adjusted.
Fig (a) A sorted array and two of its binary tree representations
Fig(b) cont..
Inserting into a Binary search Tree
The following algorithm searches a binary search tree and inserts a new record into the
tree if the search is unsuccessful.
q = null;
p = tree;
while (p != null) {
    if (key == k(p))
        return(p);
    q = p;
    if (key < k(p))
        p = left(p);
    else
        p = right(p);
}
v = maketree(rec, key);
if (q == null)
    tree = v;
else if (key < k(q))
    left(q) = v;
else
    right(q) = v;
return(v);
Note that after a new record is inserted, the tree retains the property of being sorted in an inorder
traversal
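A compact C version of search and insertion follows. The insert below is written recursively rather than with the explicit q pointer of the algorithm above, but it preserves the same ordering property; duplicates are ignored in this sketch, and all names are illustrative:

```c
#include <stdlib.h>
#include <assert.h>

struct bstnode {
    int key;
    struct bstnode *left, *right;
};

/* Insert key, returning the (possibly new) root; duplicates are ignored */
struct bstnode *bst_insert(struct bstnode *tree, int key)
{
    if (tree == NULL) {
        struct bstnode *v = malloc(sizeof *v);
        v->key = key;
        v->left = v->right = NULL;
        return v;
    }
    if (key < tree->key)
        tree->left = bst_insert(tree->left, key);
    else if (key > tree->key)
        tree->right = bst_insert(tree->right, key);
    return tree;
}

/* Search, mirroring the while-loop algorithm shown earlier */
struct bstnode *bst_search(struct bstnode *p, int key)
{
    while (p != NULL && key != p->key)
        p = (key < p->key) ? p->left : p->right;
    return p;
}
```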
Deleting from a Binary Search Tree
We now present an algorithm to delete a node with key key from a binary search tree.
There are three cases to consider. If the node to be deleted has no sons, it may be deleted
without further adjustment to the tree. This is illustrated in Fig(a).
If the node to be deleted has only one subtree, its only son can be moved up to take its
place. This is illustrated in Fig(b). If, however, the node p to be deleted has two subtrees, its
inorder successor s (or predecessor) must take its place. The inorder successor cannot have a
left subtree.
Thus the right son of s can be moved up to the place of s. This is illustrated in Fig(c),
where the node with key 12 replaces the node with key 11 and is replaced, in turn, by the node
with key 13. In the algorithm below, if no node with key key exists in the tree, the tree is left
unchanged.
Fig(a) Deleting node with key 15
Fig(b) Deleting node with key 5
if (f != p)
{
    left(f) = right(rp);
    right(rp) = right(p);
}
left(rp) = left(p);
}
if (q == null)
    tree = rp;
else
    (p == left(q)) ? (left(q) = rp) : (right(q) = rp);
freenode(p);
return;
VI. SORTING AND SEARCHING TECHNIQUES
Sorting is the operation of arranging the records of a table according to the key value of
each record. A table or a file is an ordered sequence of records r[1], r[2], ..., r[n], each
containing a key k[1], k[2], ..., k[n]. The table is sorted based on the key.
A sorting algorithm is said to be stable if it preserves the relative order of records with
equal keys. Sorting methods fall into two categories:
Internal Sorting
External Sorting
Internal Sort:
All records to be sorted are kept internally in the main memory.
External Sort:
If there is a large number of records to be sorted, they must be kept in external files on
auxiliary storage.
INTERNAL SORTING
A) INSERTION SORT
Insertion sort works by taking elements from the list one by one and inserting them into
their correct position in a growing sorted portion of the list.
Insertion sort consists of N-1 passes, where N is the number of elements to be sorted. The
ith pass of insertion sort inserts the ith element A[i] into its correct place among
A[1], A[2], ..., A[i-1]. After this insertion, the records occupying A[1]..A[i] are in sorted order.
Procedure
void Insertion_Sort (int a[], int n)
{
    int i, j, temp;
    for (i = 1; i < n; i++)
    {
        temp = a[i];
        for (j = i; j > 0 && a[j-1] > temp; j--)
        {
            a[j] = a[j-1];
        }
        a[j] = temp;
    }
}
Example
Consider an unsorted array: 20 10 60 40 30 15

Passes of insertion sort:
ORIGINAL      20 10 60 40 30 15   POSITIONS MOVED
After i=1     10 20 60 40 30 15   1
After i=2     10 20 60 40 30 15   0
After i=3     10 20 40 60 30 15   1
After i=4     10 20 30 40 60 15   2
After i=5     10 15 20 30 40 60   4
Sorted array  10 15 20 30 40 60

Analysis
Worst case: O(N^2)
Best case: O(N)
Average case: O(N^2)
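The example above can be reproduced in code. The following self-contained insertion sort (note the pass index starts at 1, since a single element is already sorted) sorts the sample array:

```c
#include <assert.h>

/* Insertion sort: the ith pass inserts a[i] into place among a[0..i-1] */
void Insertion_Sort(int a[], int n)
{
    int i, j, temp;
    for (i = 1; i < n; i++)
    {
        temp = a[i];
        /* shift larger elements one slot to the right */
        for (j = i; j > 0 && a[j-1] > temp; j--)
            a[j] = a[j-1];
        a[j] = temp;
    }
}
```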
B) SHELL SORT
Shell Sort was invented by Donald Shell. It improves upon bubble sort and insertion sort
by moving out of order elements more than one position at a time. It works by arranging the data
sequence in a two dimensional array and then sorting the columns of the array using insertion
sort.
In shell sort the whole array is first fragmented into K segments, where K is preferably a
prime number. After the first pass the whole array is partially sorted. In the next pass, the value
of K is reduced, which increases the size of each segment and reduces the number of segments.
The next value of K is chosen so that it is relatively prime to its previous value. The
process is repeated until K=1, at which point the array is fully sorted. The insertion sort is
applied to each segment, so each successive segment is partially sorted.
The shell sort is also called the Diminishing Increment Sort, because the value of K
decreases continuously.
Procedure
void shellsort (int A[], int N)
{
    int i, j, k, temp;
    for (k = N/2; k > 0; k /= 2)        /* gap sequence N/2, N/4, ..., 1 */
        for (i = k; i < N; i++)
        {
            temp = A[i];
            for (j = i; j >= k && A[j-k] > temp; j = j-k)
            {
                A[j] = A[j-k];
            }
            A[j] = temp;
        }
}
Example
Consider an unsorted array
81 94 11 96 12 35 17 95 28 58
Here N=10, the first pass K=5(10/2)
81 94 11 96 12 35 17 95 28 58
After first pass
35 17 11 28 12 81 94 95 96 58
In second pass, K is reduced to 3
After second pass
28 12 11 35 17 81 58 95 96 94
In third pass, K is reduced to 1
The final sorted array is
11 12 17 28 35 58 81 94 95 96
Analysis
Worst case: O(N^2)
Best case: O(N log N)
Average case: O(N^1.5)
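A runnable shell sort sketch, using the simple gap sequence N/2, N/4, ..., 1 (the text prefers gaps that are relatively prime, such as 5, 3, 1, but the idea is the same):

```c
#include <assert.h>

/* Shell sort: insertion sort applied with a shrinking gap k */
void shellsort(int A[], int N)
{
    int i, j, k, temp;
    for (k = N/2; k > 0; k /= 2)
        for (i = k; i < N; i++)
        {
            temp = A[i];
            /* insertion sort within the segment of stride k */
            for (j = i; j >= k && A[j-k] > temp; j -= k)
                A[j] = A[j-k];
            A[j] = temp;
        }
}
```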
C) QUICK SORT
The basic version of the quick sort algorithm was invented by C. A. R. Hoare in 1960 and
formally introduced in 1962.
It is used on the principle of divide-and-conquer. Quick sort is an algorithm of choice in
many situations because it is not difficult to implement, it is a good "general purpose" sort and it
consumes relatively fewer resources during execution.
Good points
It is in-place since it uses only a small auxiliary stack.
It requires only n log(n) time to sort n items.
It has an extremely short inner loop
This algorithm has been subjected to a thorough mathematical analysis, a very precise
statement can be made about performance issues.
Bad Points
It is recursive. Especially if recursion is not available, the implementation is extremely
complicated.
It requires quadratic (i.e., N^2) time in the worst case.
It is fragile, i.e., a simple mistake in the implementation can go unnoticed and cause it to
perform badly.
Quick sort works by partitioning a given array A[p . . r] into two non-empty sub array A[p . .
q] and A[q+1 . . r] such that every key in A[p . . q] is less than or equal to every key in A[q+1 . .
r]. Then the two sub arrays are sorted by recursive calls to Quick sort. The exact position of the
partition depends on the given array and index q is computed as a part of the partitioning
procedure.
QuickSort (A, p, r)
1. if p < r then
2. q = Partition (A, p, r)
3. Recursive call to Quick Sort (A, p, q)
4. Recursive call to Quick Sort (A, q+1, r)
Note that to sort the entire array, the initial call is Quick Sort (A, 1, length[A]).
As a first step, Quick Sort chooses as pivot one of the items in the array to be sorted.
The array is then partitioned on either side of the pivot. Elements that are less than or equal to
the pivot move toward the left, and elements that are greater than or equal to the pivot move
toward the right.
Partitioning the Array
Partitioning procedure rearranges the sub arrays in-place.
PARTITION (A, p, r)
1. x ← A[p]
2. i ← p - 1
3. j ← r + 1
4. while TRUE do
5. repeat j ← j - 1
6. until A[j] ≤ x
7. repeat i ← i + 1
8. until A[i] ≥ x
9. if i < j then
10. exchange A[i] ↔ A[j]
11. else return j
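With a Hoare-style partition such as this, the returned index q belongs to the left part, so the recursion is on (p, q) and (q+1, r). A self-contained C sketch of the scheme (array indices are 0-based here, unlike the pseudocode):

```c
#include <assert.h>

/* Hoare-style partition: pivot x = A[p]; returns index j such that
   every key in A[p..j] is <= x and every key in A[j+1..r] is >= x. */
int partition(int A[], int p, int r)
{
    int x = A[p];
    int i = p - 1, j = r + 1;
    for (;;) {
        do { j--; } while (A[j] > x);
        do { i++; } while (A[i] < x);
        if (i < j) {                  /* exchange A[i] <-> A[j] */
            int t = A[i];
            A[i] = A[j];
            A[j] = t;
        } else
            return j;
    }
}

void quicksort(int A[], int p, int r)
{
    if (p < r) {
        int q = partition(A, p, r);
        quicksort(A, p, q);           /* note: q, not q-1, with Hoare partition */
        quicksort(A, q + 1, r);
    }
}
```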
Partition selects the first key, A[p], as the pivot key about which the array will be partitioned:
Keys ≤ A[p] will be moved towards the left.
Keys ≥ A[p] will be moved towards the right.
The running time of the partition procedure is Θ(n), where n = r - p + 1 is the number of
keys in the array. Another argument for the Θ(n) running time of PARTITION on a subarray
of size n is as follows: