DAA Chapter 5
2.1. Introduction
We have learned that in order to write a computer program that performs
some task, we must construct a suitable algorithm. However, the algorithm we
construct is unlikely to be unique – there are likely many algorithms that perform
the same task. The question then arises as to whether some of these algorithms are
in any sense better than others. Algorithm analysis is the study of this question.
In this chapter we will analyze two algorithms for each of the following tasks:
ordering a list of values, and
finding the position of a value within a sorted list.
Algorithm analysis should begin with a clear statement of the task to be
performed. This allows us to both check that the algorithm is correct and ensure that
the algorithms we are comparing perform the same task.
Although there are in general many ways that algorithms might be compared, we
will focus our attention on the two that are of primary importance to many data
processing algorithms:
time complexity (how the number of steps required depends on the length of the
input)
space complexity (how the amount of extra memory or storage required depends on
the length of the input)
2.1.1. Review of searching techniques
Sequential Search
ALGORITHM SequentialSearch(A[0..n], K)
//Implements sequential search with the search key as a sentinel
//Input: An array A of n elements and a search key K
//Output: The position of the first element in A[0..n-1] whose value is
//equal to K, or -1 if no such element is found
A[n] = K
i = 0
while A[i] ≠ K do
    i = i + 1
if i < n return i
else return -1
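A minimal runnable sketch of the sentinel version in Python (the function name is illustrative; the pseudocode above assumes the array has a spare slot at index n, which the sketch simulates by appending and then removing the sentinel):

    def sequential_search(a, key):
        # Append the key as a sentinel so the loop needs no bounds check.
        a.append(key)
        i = 0
        while a[i] != key:
            i += 1
        a.pop()  # remove the sentinel, restoring the original list
        return i if i < len(a) else -1

    print(sequential_search([25, 31, 42, 71, 105], 42))   # prints 2
    print(sequential_search([25, 31, 42, 71, 105], 110))  # prints -1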
Analysis
Suppose we have five numbers (n = 5): 25, 31, 42, 71, 105, and we want to find a given element in the list.
Best case efficiency
Suppose we have to find 25 in the list 25, 31, 42, 71, 105, i.e.,
K = 25
25 is present at the first position. Since only a single comparison is made to find the element, this is the best case:
Cbest(n) = 1
Worst case efficiency
If we want to search for an element that is at the end of the list, or not present in the list at all, we have the worst case. Suppose we have to find 105 in the list 25, 31, 42, 71, 105, i.e.,
K = 105
We have to make 5 (= n) comparisons to find the element:
Cworst(n) = n
And if we have to find 110, i.e.,
K = 110
the element is not in the list, yet we still have to make 5 (= n) comparisons before the sentinel ends the scan:
Cworst(n) = n
Average case efficiency
Suppose the element is not necessarily at the first or the last position but may be anywhere in the list. Let the probability of a successful search be p, where 0 ≤ p ≤ 1, so the probability of an unsuccessful search is 1 - p. If the element is equally likely to be at any of the n positions, the probability that it is found at position i is p/n, at a cost of i comparisons. Therefore
Cavg(n) = [1·(p/n) + 2·(p/n) + ... + i·(p/n) + ... + n·(p/n)] + n(1 - p)
        = (p/n)[1 + 2 + ... + i + ... + n] + n(1 - p)
        = (p/n)[n(n + 1)/2] + n(1 - p)
        = p(n + 1)/2 + n(1 - p)
Case 1. If the element is always found, then p = 1 (successful search). Substituting p = 1 in the above equation:
Cavg(n) = 1·(n + 1)/2 + n(1 - 1) = (n + 1)/2
Case 2. If the element is never found, then p = 0 (unsuccessful search). Substituting p = 0 in the above equation:
Cavg(n) = 0·(n + 1)/2 + n(1 - 0) = n
Therefore, on average, about half of the list is examined to find an element in the list.
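A quick empirical check of the successful-search result, averaging the comparison counts over every possible key position (a sketch; the helper below counts comparisons rather than timing anything):

    def comparisons_to_find(a, key):
        # One comparison per element visited by sequential search.
        for count, x in enumerate(a, start=1):
            if x == key:
                return count
        return len(a)

    a = [25, 31, 42, 71, 105]
    avg = sum(comparisons_to_find(a, k) for k in a) / len(a)
    print(avg)  # 3.0, which matches (n + 1)/2 = (5 + 1)/2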
Binary Search
Description: Binary search is a dichotomic divide-and-conquer search algorithm. It inspects the middle element of a sorted list. If that element equals the sought value, the position has been found. Otherwise, if the key is less than the middle element, do a binary search on the first half, else on the second half.
Algorithm:
The algorithm can be implemented recursively or non-recursively.
ALGORITHM BinSrch(A[0..n-1], key)
//Implements non-recursive binary search
//Input: An array A sorted in ascending order and a search key
//Output: The position of the key if matched, else -1
l ← 0
r ← n - 1
while l ≤ r do
    m ← ⌊(l + r) / 2⌋
    if key = A[m]
        return m
    else if key < A[m]
        r ← m - 1
    else
        l ← m + 1
return -1
Example: searching for the key 23 in a 10-element sorted array, as sketched below.
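A runnable Python version of the same iterative search (the 10-element array below is illustrative, since the original example array is not reproduced here):

    def binary_search(a, key):
        # a must be sorted in ascending order.
        l, r = 0, len(a) - 1
        while l <= r:
            m = (l + r) // 2  # floor of the midpoint
            if a[m] == key:
                return m
            elif key < a[m]:
                r = m - 1
            else:
                l = m + 1
        return -1

    a = [3, 14, 27, 31, 39, 42, 55, 70, 74, 81]
    print(binary_search(a, 31))  # prints 3
    print(binary_search(a, 23))  # key absent; prints -1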
Analysis
• Input size: the array size, n
• Basic operation: key comparison
• The efficiency depends on the case:
Best – the key matches the middle element on the first comparison
Worst – the key is not found, or is found only after the sub-array has shrunk to a single element
• Let C(n) denote the number of times the basic operation is executed. Then Cworst(n) is the worst-case efficiency. Since after each comparison the algorithm halves the problem size, we have
Cworst(n) = Cworst(n/2) + 1 for n > 1
C(1) = 1
• Solving the recurrence using the master theorem gives the number of times the search key is compared with an element in the array:
C(n) = C(n/2) + 1, so a = 1, b = 2
f(n) = n^0, so d = 0
Since a = b^d (1 = 2^0), case 2 holds:
C(n) = Θ(n^d log n) = Θ(n^0 log n) = Θ(log n)
Applications of binary search:
Number guessing games
Word lists / dictionary search, etc.
Advantages:
Efficient on very large lists
Can be implemented iteratively or recursively
Limitations:
Interacts poorly with the memory hierarchy
Requires the given list to be sorted
Because it accesses list elements at random positions, it needs arrays instead of linked lists.
2.1.2. Selection Sort
Problem: Given a list of n orderable items (e.g., numbers, characters from some alphabet, character strings), rearrange them in nondecreasing order.
Selection Sort
ALGORITHM SelectionSort(A[0..n - 1])
//Sorts a given array by selection sort
//Input: An array A[0..n - 1] of orderable elements
//Output: Array A[0..n - 1] sorted in ascending order
for i = 0 to n - 2 do
    min = i
    for j = i + 1 to n - 1 do
        if A[j] < A[min]
            min = j
    swap A[i] and A[min]
Finally, the performance analysis of the selection sort algorithm:
Average case / worst case / best case: Θ(n^2)
Number of comparisons: Θ(n^2)
Number of swaps: Θ(n) in the worst/average case and 0 in the best case. The algorithm requires at most n swaps: once an element is swapped into place, it is never touched again.
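A direct Python translation of the pseudocode (a sketch; the swap is skipped when the minimum is already in place, which is what makes zero swaps possible in the best case):

    def selection_sort(a):
        n = len(a)
        for i in range(n - 1):
            # Find the index of the smallest element in a[i..n-1].
            min_idx = i
            for j in range(i + 1, n):
                if a[j] < a[min_idx]:
                    min_idx = j
            if min_idx != i:  # swap only when needed
                a[i], a[min_idx] = a[min_idx], a[i]
        return a

    print(selection_sort([89, 45, 68, 90, 29, 34, 17]))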
2.1.3. Bubble Sort
Compare adjacent elements of the list and exchange them if they are out of order. Then we repeat the process. By doing this repeatedly, we end up "bubbling up" the largest element to the last position in the list.
Algorithm
ALGORITHM BubbleSort(A[0..n - 1])
//Sorts array A[0..n - 1] by bubble sort
//Input: An array A[0..n - 1] of orderable elements
//Output: Array A[0..n - 1] sorted in ascending order
for i = 0 to n - 2 do
    for j = 0 to n - 2 - i do
        if A[j + 1] < A[j]
            swap A[j] and A[j + 1]
Example: the first two passes of bubble sort on the list 89, 45, 68, 90, 29, 34, 17. A new line is shown after each swap of two elements; the elements to the right of the vertical bar are in their final positions and are not considered in subsequent iterations of the algorithm.
Bubble sort analysis
Clearly, the outer loop runs n times. The only complexity in this analysis is the inner loop. If we think about a single run of the inner loop, we get a simple bound by noting that it can never iterate more than n times. Since the outer loop makes the inner loop run n times, the comparison cannot happen more than O(n^2) times.
The number of key comparisons for the bubble sort version given above is the same for all arrays of size n.
The number of key swaps, however, depends on the input. For the worst case of a decreasing array, it is the same as the number of key comparisons.
Observation: if a pass through the list makes no exchanges, the list is already sorted and we can stop the algorithm, as in the sketch below. Though this improved version runs faster on some inputs, it is still O(n^2) in the worst and average cases. Bubble sort is not very good for big sets of input; however, it is very simple to code.
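A sketch of the improved version with the early-exit flag (illustrative; the basic version simply omits the swapped check):

    def bubble_sort(a):
        n = len(a)
        for i in range(n - 1):
            swapped = False
            for j in range(n - 1 - i):
                if a[j + 1] < a[j]:
                    a[j], a[j + 1] = a[j + 1], a[j]
                    swapped = True
            if not swapped:  # no exchanges: the list is already sorted
                break
        return a

    print(bubble_sort([89, 45, 68, 90, 29, 34, 17]))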
Finally,
Worst case: Θ(n^2)
Best case: Θ(n), on an already sorted list (with the early-exit version)
Number of comparisons: Θ(n^2) in the worst case and in the best case (basic version)
Number of swaps: Θ(n^2) in the worst case and 0 in the best case
2.1.4. Merge sort
Definition:
Merge sort is a sort algorithm that splits the items to be sorted into two
groups, recursively sorts each group, and merges them into a final sorted sequence.
Features:
It is a comparison-based algorithm
It is a stable algorithm
It is a perfect example of the divide & conquer algorithm design strategy
It was invented by John von Neumann
Algorithm:
ALGORITHM Mergesort(A[0..n-1])
//Sorts array A by recursive mergesort
//Input: An array A of orderable elements
//Output: Array A sorted in ascending order
if n > 1
    copy A[0..⌊n/2⌋ - 1] to B[0..⌊n/2⌋ - 1]
    copy A[⌊n/2⌋..n - 1] to C[0..⌈n/2⌉ - 1]
    Mergesort(B[0..⌊n/2⌋ - 1])
    Mergesort(C[0..⌈n/2⌉ - 1])
    Merge(B, C, A)
ALGORITHM Merge(B[0..p-1], C[0..q-1], A[0..p+q-1])
//Merges two sorted arrays into one sorted array
//Input: Arrays B and C, both sorted
//Output: Sorted array A of the elements from B and C
i ← 0
j ← 0
k ← 0
while i < p and j < q do
    if B[i] ≤ C[j]
        A[k] ← B[i]
        i ← i + 1
    else
        A[k] ← C[j]
        j ← j + 1
    k ← k + 1
if i = p
    copy C[j..q-1] to A[k..p+q-1]
else
    copy B[i..p-1] to A[k..p+q-1]
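A runnable Python rendering of the two routines above (a sketch that returns new lists rather than copying into auxiliary arrays in place; the sample input is illustrative):

    def merge(b, c):
        # Merge two sorted lists into one sorted list.
        a, i, j = [], 0, 0
        while i < len(b) and j < len(c):
            if b[i] <= c[j]:
                a.append(b[i])
                i += 1
            else:
                a.append(c[j])
                j += 1
        a.extend(b[i:])  # at most one of these tails is non-empty
        a.extend(c[j:])
        return a

    def mergesort(a):
        if len(a) <= 1:
            return a
        mid = len(a) // 2
        return merge(mergesort(a[:mid]), mergesort(a[mid:]))

    print(mergesort([38, 27, 43, 3, 9, 82, 10]))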
Worst case: during key comparisons, neither of the two arrays becomes empty before the other one contains just one element.
Let C(n) denote the number of times the basic operation is executed. Then
C(n) = 2C(n/2) + Cmerge(n) for n > 1
C(1) = 0
where Cmerge(n) is the number of key comparisons made during the merging stage. In the worst case:
Cmerge(n) = n - 1
Solving the recurrence using the master theorem:
C(n) = 2C(n/2) + n - 1 for n > 1
C(1) = 0
Here a = 2, b = 2, f(n) = n, so d = 1.
Since a = b^d (2 = 2^1), case 2 holds:
C(n) = Θ(n^d log n) = Θ(n^1 log n) = Θ(n log n)
Advantages:
The number of comparisons performed is nearly optimal.
Mergesort never degrades to O(n^2).
It can be applied to files of any size.
Limitations:
It uses O(n) additional memory.
2.1.5. Quick Sort
Definition: Quick sort is a well-known sorting algorithm, based on the divide & conquer approach.
The steps are:
Pick an element, called the pivot, from the list.
Reorder the list so that all elements less than the pivot come before it and all elements greater than the pivot come after it. After this partitioning, the pivot is in its final position. This is called the partition operation.
Recursively sort the sub-list of lesser elements and the sub-list of greater elements.
Features:
Developed by C. A. R. Hoare
An efficient algorithm
NOT a stable sort
Often significantly faster in practice than other comparison-based algorithms
Algorithm
ALGORITHM Quicksort(A[l..r])
//Sorts a sub-array by quick sort
//Input: A sub-array A[l..r] of A[0..n-1], defined by its left and right indices l and r
//Output: The sub-array A[l..r], sorted in ascending order
if l < r
    s ← Partition(A[l..r]) // s is a split position
    Quicksort(A[l..s-1])
    Quicksort(A[s+1..r])

ALGORITHM Partition(A[l..r])
//Partitions a sub-array by using its first element as a pivot
//Input: A sub-array A[l..r] of A[0..n-1], defined by its left and right indices l and r (l < r)
//Output: A partition of A[l..r], with the split position returned as this function's value
p ← A[l]
i ← l
j ← r + 1
repeat
    repeat i ← i + 1 until A[i] ≥ p // left-to-right scan
    repeat j ← j - 1 until A[j] ≤ p // right-to-left scan
    if i < j // need to continue the scan
        swap(A[i], A[j])
until i ≥ j // the scans have crossed
swap(A[l], A[j])
return j
Example: sort the following list by quick sort: 4, 2, 6, 5, 3, 9.
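A Python sketch of the same scheme, using the first element as the pivot as in the pseudocode (the bounds check on i guards against the case where the pivot is the largest element):

    def partition(a, l, r):
        p = a[l]  # first element as pivot
        i, j = l, r + 1
        while True:
            i += 1
            while i <= r and a[i] < p:  # left-to-right scan
                i += 1
            j -= 1
            while a[j] > p:             # right-to-left scan; a[l] = p stops it
                j -= 1
            if i >= j:                  # the scans have crossed
                break
            a[i], a[j] = a[j], a[i]
        a[l], a[j] = a[j], a[l]         # put the pivot in its final position
        return j

    def quicksort(a, l=0, r=None):
        if r is None:
            r = len(a) - 1
        if l < r:
            s = partition(a, l, r)
            quicksort(a, l, s - 1)
            quicksort(a, s + 1, r)
        return a

    print(quicksort([4, 2, 6, 5, 3, 9]))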
Best case: the partition splits in the middle of the array each time.
Worst case: the input is already sorted. During key comparisons, one half is empty while the remaining n - 1 elements all fall in the other partition.
Let C(n) denote the number of times the basic operation is executed. In the worst case:
C(n) = C(n-1) + (n+1) for n > 1 (two sub-problems of size 0 and n-1, respectively)
C(1) = 1
In the best case:
C(n) = 2C(n/2) + Θ(n) (two sub-problems of size n/2 each)
Solving the recurrences using backward substitution / the master theorem:
C(n) = C(n-1) + (n+1) for n > 1, C(1) = 1, gives C(n) = Θ(n^2)
C(n) = 2C(n/2) + Θ(n) gives C(n) = Θ(n^1 log n) = Θ(n log n)
Note:
Quick sort's average-case efficiency on random input is Θ(n log n).
2.1.6. Insertion sort
Insertion sort is a very simple method to sort numbers in ascending or descending order. It follows the incremental approach, and it can be compared with the way playing cards are sorted in one's hand during a game.
The numbers to be sorted are known as keys. Here is the algorithm of the insertion sort method.
Algorithm: Insertion-Sort(A)
for j = 2 to A.length
    key = A[j]
    i = j - 1
    while i > 0 and A[i] > key
        A[i + 1] = A[i]
        i = i - 1
    A[i + 1] = key
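The pseudocode above is 1-indexed; a 0-indexed Python sketch of the same shifting logic:

    def insertion_sort(a):
        for j in range(1, len(a)):
            key = a[j]
            i = j - 1
            # Shift elements greater than key one slot to the right.
            while i >= 0 and a[i] > key:
                a[i + 1] = a[i]
                i -= 1
            a[i + 1] = key
        return a

    print(insertion_sort([2, 13, 5, 18, 14]))  # the list traced below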
Analysis
The running time of this algorithm depends heavily on the given input.
If the given numbers are already sorted, the algorithm runs in O(n) time.
If the given numbers are in reverse order, the algorithm runs in O(n^2) time.
Take the unsorted list 2, 13, 5, 18, 14.
1st iteration:
key = a[2] = 13; a[1] = 2 < 13, so no shift is needed. List: 2, 13, 5, 18, 14.
2nd iteration:
key = a[3] = 5; a[2] = 13 > 5, so 13 shifts right. Next, a[1] = 2 < 5, so we stop and place 5. List: 2, 5, 13, 18, 14.
3rd iteration:
key = a[4] = 18; a[3] = 13 < 18, so no shift is needed. List: 2, 5, 13, 18, 14.
4th iteration:
key = a[5] = 14; a[4] = 18 > 14, so 18 shifts right. Next, a[3] = 13 < 14, so we stop and place 14. List: 2, 5, 13, 14, 18.
Finally, the sorted list is 2, 5, 13, 14, 18.
Expected final analysis
Average case / worst case: Θ(n^2); the worst case happens when the input is sorted in descending order
Best case: Θ(n); when the input is already sorted
Number of comparisons: Θ(n^2) in the worst case and Θ(n) in the best case
Number of swaps: Θ(n^2) in the worst/average case and 0 in the best case
Algorithmic comparisons
Sorting algorithms give one the ability to impress another computer scientist with his or her algorithmic understanding. First of all, Ω(n log n) is a lower bound on how quickly we can sort with a comparison-based sorting algorithm (non-comparison sorts can beat it in certain special cases).
Among algorithms that share the same order class, a second consideration is the value of the big-O constant factor. Even though it can have quadratic worst-case time, quicksort is often considered the fastest algorithm on random arrays; the implication is that it has a smaller constant factor than, say, heapsort.
2.1.7. Shellsort
Shellsort, named after its inventor, Donald Shell, relies upon the fact that insertion sort does very well if the array is nearly sorted. Another way of saying this is that insertion sort does well if it does not have to move each item "too far". The idea is to repeatedly do insertion sort on all elements at fixed, decreasing distances apart: hk, hk-1, ..., h1 = 1. The choice of increments turns out to be crucial, and a good choice is:
h1 = 1, h2 = 3, h3 = 7, ..., hk = 2^k - 1
These increments are termed the Hibbard increments. The original increments suggested by the algorithm's inventor were simple powers of 2, but the Hibbard increments do provably much better. To be able to use the increment hk, you need an array of size at least h(k+1) = 2^(k+1) - 1.
Pseudo-code
The pseudo-code for shellsort using the Hibbard increments is as follows:
find k0 so that 2^k0 - 1 < size
for (k = k0; k > 0; --k) { // from larger increments to smaller
    inc = 2^k - 1
    for (i = 0; i < inc; ++i) {
        insertionSort( [ a[i], a[i+inc], a[i+2*inc], ... ] )
    }
}
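A compact Python sketch with the Hibbard increments (it performs the gapped insertion sort in place rather than extracting the subarrays; the sample array of size 11 is illustrative):

    def shellsort(a):
        n = len(a)
        # Largest Hibbard increment 2^k - 1 usable for this size.
        k = 1
        while (2 ** (k + 1)) - 1 < n:
            k += 1
        while k > 0:
            inc = (2 ** k) - 1
            # Gapped insertion sort: each element looks back in steps of inc.
            for j in range(inc, n):
                key = a[j]
                i = j - inc
                while i >= 0 and a[i] > key:
                    a[i + inc] = a[i]
                    i -= inc
                a[i + inc] = key
            k -= 1
        return a

    print(shellsort([35, 33, 42, 10, 14, 19, 27, 44, 26, 31, 17]))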
The fact that the last increment in the sequence is 1 means that regular insertion sort is done at the last step, and therefore the array is guaranteed to be sorted by this procedure. The point is that when the increments are larger, there are fewer elements per subarray, and they are moved further than simply interchanging adjacent elements. At the last step we do regular insertion sort, and hopefully the array is by then "nearly sorted", which brings insertion sort close to its best-case behavior of running in linear time.
The notion that this is a speed improvement seems initially far-fetched: there are two enclosing for loops around an insertion sort, so this algorithm has four nested loops in total.
Demo
Consider the sorting process of an array of size 11, where only 4 subarrays of elements 7 apart fit.
Shellsort analysis
Stability
Shellsort is not stable. This can be readily demonstrated with an array of size 4 (the smallest size possible). Instability is to be expected, because the increment-based sorts move elements across large distances without examining the elements in between.
Shellsort has O(n log n) best-case time
The best case, as for insertion sort, is when the array is already sorted. Then the number of comparisons for each of the increment-based insertion sorts is the length of the array. Therefore we can determine:
comparisons = n (for 1 sort with elements 1 apart, the last step)
            + 3 · (n/3) (for 3 sorts with elements 3 apart, the next-to-last step)
            + 7 · (n/7) (for 7 sorts with elements 7 apart)
            + 15 · (n/15) (for 15 sorts with elements 15 apart)
            + ...
Each term is n. The question is how many terms there are. The number of terms is the value k such that 2^k - 1 < n, so k < log(n + 1), meaning that the sorting time in the best case is less than n · log(n + 1) = O(n log n).
Shellsort's worst-case time is no worse than quadratic
The argument is similar to the previous one, but with a different overall computation.
comparisons ≤ n^2 (for 1 sort with elements 1 apart, the last step)
            + 3 · (n/3)^2 (for 3 sorts with elements 3 apart, the next-to-last step)
            + 7 · (n/7)^2 (for 7 sorts with elements 7 apart)
            + 15 · (n/15)^2 (for 15 sorts with elements 15 apart)
            + ...
And so, with a bit of arithmetic, we can see that the number of comparisons is bounded by:
n^2 · (1 + 1/3 + 1/7 + 1/15 + 1/31 + ...)
< n^2 · (1 + 1/2 + 1/4 + 1/8 + 1/16 + ...)
= 2 · n^2
The last step uses the sum of the geometric series.
Shellsort's worst and average times
The point about this algorithm is that the initial sorts, acting on elements at larger increments apart, involve fewer elements, even in a worst-case scenario. At these larger increments, "far-away" elements are swapped so that the number of inversions is dramatically reduced. The later sorts at smaller increments then behave closer to the optimal case.
It can be proved that the worst-case time is sub-quadratic at O(n^(3/2)) = O(n^1.5). As can be expected, the proof is quite difficult. The textbook remarks that the average-case time is unknown, although conjectured to be O(n^(5/4)) = O(n^1.25). The textbook also mentions other increment sequences which have been studied and shown to produce even better performance.
2.1.8. Heap sort
The heap sort algorithm looks very much like a selection sort, but through the use of a data structure called a "heap" the process is sped up considerably.
Heap sort works by first organizing the data to be sorted into a special type of binary tree called a heap. The heap itself has, by definition, the largest value at the top of the tree, so the heap sort algorithm must also reverse the order. It does this with the following steps:
1. Remove the topmost item (the largest) and replace it with the rightmost leaf. The removed item is stored in an array.
2. Re-establish the heap.
3. Repeat steps 1 and 2 until there are no more items left in the heap.
4. The sorted elements are now stored in the array.
A heap sort is especially efficient for data that is already stored in a binary tree. In many cases, however, the quick sort algorithm is more efficient.
Analysis
The time complexity of heapify is O(log n), the time complexity of createAndBuildHeap() is O(n), and the overall time complexity of heap sort is O(n log n).
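A self-contained Python sketch of an array-based heap sort (here heapify is the sift-down step; the function names mirror the analysis above but are illustrative):

    def heapify(a, n, i):
        # Sift a[i] down so the subtree rooted at i is a max-heap of size n.
        largest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest != i:
            a[i], a[largest] = a[largest], a[i]
            heapify(a, n, largest)

    def heap_sort(a):
        n = len(a)
        # Build a max-heap: O(n) overall.
        for i in range(n // 2 - 1, -1, -1):
            heapify(a, n, i)
        # Repeatedly move the max to the end and re-heapify: O(n log n).
        for end in range(n - 1, 0, -1):
            a[0], a[end] = a[end], a[0]
            heapify(a, end, 0)
        return a

    print(heap_sort([12, 11, 13, 5, 6, 7]))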
2.1.9. External sorting
External sorting is required when the data being sorted do not fit into the
main memory of a computing device (usually RAM) and instead they must reside in
the slower external memory (usually a hard drive).
Example of Two-Way Sorting:
N = 14, M = 3 (14 records on tape Ta1, memory capacity: 3 records.)
Ta1: 17, 3, 29, 56, 24, 18, 4, 9, 10, 6, 45, 36, 11, 43
Sorting of runs:
Step 1: Read 3 records into main memory, sort them and store them on Tb1:
17, 3, 29 → 3, 17, 29
Tb1: 3, 17, 29
Step 2: Read the next 3 records into main memory, sort them and store them on Tb2:
56, 24, 18 → 18, 24, 56
Tb2: 18, 24, 56
Step 3: Read the next 3 records into main memory, sort them and store them on Tb1:
4, 9, 10 → 4, 9, 10
Tb1: 3, 17, 29, 4, 9, 10
Step 4: Read the next 3 records into main memory, sort them and store them on Tb2:
6, 45, 36 → 6, 36, 45
Tb2: 18, 24, 56, 6, 36, 45
Step 5: Read the next 3 records into main memory, sort them and store them on Tb1 (there are only two records left):
11, 43 → 11, 43
Tb1: 3, 17, 29, 4, 9, 10, 11, 43
At the end of this process we will have three runs on Tb1 and two runs on Tb2:
Tb1: 3, 17, 29 | 4, 9, 10 | 11, 43
Tb2: 18, 24, 56 | 6, 36, 45 |
Merging of runs
B1: Merging runs of length 3 to obtain runs of length 6.
Source tapes: Tb1 and Tb2; result on Ta1 and Ta2.
Merge the first two runs (on Tb1 and Tb2) and store the result on Ta1; merge the second two runs and store the result on Ta2.
Tb1: 3, 17, 29 | 4, 9, 10 | 11, 43
Tb2: 18, 24, 56 | 6, 36, 45 |
Thus we have the first two runs on Ta1 and Ta2, each twice the size of the original runs:
Ta1: 3, 17, 18, 24, 29, 56
Ta2: 4, 6, 9, 10, 36, 45
Next we merge the third runs on Tb1 and Tb2 and store the result on Ta1. Since only Tb1 contains a third run, it is copied onto Ta1:
Ta1: 3, 17, 18, 24, 29, 56 | 11, 43
B2: Merging runs of length 6 to obtain runs of length 12.
Source tapes: Ta1 and Ta2; result on Tb1 and Tb2.
After merging the first two runs from Ta1 and Ta2, we get a run of length 12, stored on Tb1:
Tb1: 3, 4, 6, 9, 10, 17, 18, 24, 29, 36, 45, 56
The second set of runs is only one run, copied to Tb2:
Tb2: 11, 43
Now each tape holds only one run. The last step is to merge these two runs and get the entire file sorted.
B3: Merging the last two runs. The result is:
3, 4, 6, 9, 10, 11, 17, 18, 24, 29, 36, 43, 45, 56
Analysis
Number of merge passes: ⌈log2(N/M)⌉. Here ⌈log2(14/3)⌉ = 3, matching passes B1, B2 and B3.
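A small Python simulation of the two-way scheme above, using lists in place of tapes (run formation with room for M records in memory, then repeated pairwise merging; heapq.merge from the standard library performs each two-way merge):

    from heapq import merge

    def external_sort(records, M):
        # Phase 1: sort runs of M records each (the in-memory step).
        runs = [sorted(records[i:i + M]) for i in range(0, len(records), M)]
        # Phase 2: merge pairs of runs until one run remains,
        # doubling the run length on each pass (B1, B2, ...).
        while len(runs) > 1:
            runs = [list(merge(runs[i], runs[i + 1])) if i + 1 < len(runs)
                    else runs[i]  # an odd run out is copied forward
                    for i in range(0, len(runs), 2)]
        return runs[0]

    ta1 = [17, 3, 29, 56, 24, 18, 4, 9, 10, 6, 45, 36, 11, 43]
    print(external_sort(ta1, 3))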