International Journal of Information Sciences and Techniques (IJIST) Vol.3, No.4, July 2013
DOI: 10.5121/ijist.2013.3402
Space Efficient Suffix Array Construction using
Induced Sorting LMS Substrings
Rajesh. Yelchuri¹, Nagamalleswara Rao.N²
Department of Computer Science and Engineering, R.V.R & J.C College of Engg.
Chowdavaram, Guntur, Andhra Pradesh -522119,India
¹ rajesh.yelchuri@gmail.com
² nnmr_m@yahoo.com
ABSTRACT
This paper presents a space-efficient algorithm for linear-time suffix array construction. The algorithm
uses the techniques of divide-and-conquer and recursion. What differentiates the proposed algorithm from
the existing induced sorting algorithm for variable-length leftmost S-type (LMS) substrings is its more
efficient use of memory when constructing the suffix array: the modified induced sorting algorithm uses
less memory than the existing variable-length LMS-substring algorithm.
KEYWORDS
Divide and Conquer, Suffix Array.
1. INTRODUCTION
The concept of suffix arrays was introduced by Manber and Myers in SODA’90 [4] and SICOMP’93 [3]
as a space-efficient alternative to suffix trees. The suffix array is well recognized as a
fundamental data structure, useful for a broad range of applications such as string search, data
indexing, searching for patterns in DNA or protein sequences, data compression, and the
Burrows-Wheeler transform. For an n-character string, denoted by STR, its suffix array, denoted by
SAR(STR), is an array of indices pointing to all the suffixes of STR, sorted in ascending (or
descending) lexicographical order. The suffix array of STR itself requires only n⌈log n⌉ bits of
space. However, different suffix array construction algorithms have different space and time
requirements. During the past decade, many researchers have developed suffix array construction
algorithms that are both time and space efficient; for a detailed survey we refer to Puglisi et
al. [5]. Such algorithms have become popular because of their wide usage. Suffix arrays must be
constructed for large-scale applications, e.g., biological genome databases and web search, where
the size of a data set is measured in billions of characters [6], [7], [8], [9], [10]. Time- and
space-efficient linear-time algorithms are crucial for large-scale applications to have predictable
worst-case performance. The three known linear-time algorithms are KSP [1], KA [12], [13], and
KS [11], [2], all reported in 2003.
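To make the definition of SAR(STR) concrete, the following minimal Python sketch builds a suffix array by sorting the suffixes directly. It is only a reference for what the array contains, not one of the linear-time algorithms discussed in this paper; the function name naive_suffix_array is a hypothetical helper, and the input is assumed to end with the sentinel $.

```python
def naive_suffix_array(text):
    """Return the start indices of all suffixes of text in ascending order."""
    # Direct sorting compares whole suffixes, so it is far from linear time.
    # It only illustrates the content of SAR(STR).
    return sorted(range(len(text)), key=lambda i: text[i:])


sar = naive_suffix_array("banana$")
print(sar)  # [6, 5, 3, 1, 0, 4, 2]
for index in sar:
    print(index, "banana$"[index:])
```

The printed order is $, a$, ana$, anana$, banana$, na$, nana$, which is the ascending lexicographical order of the suffixes.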
2. BASIC NOTATIONS
In this section we introduce the basic terminology used in the presentation of the algorithm. Let
STR be a string of n characters stored in an array [0..n-1], and let ∑(STR) be the alphabet of STR.
A substring of STR running from position i to position j, where 0 ≤ i < j ≤ n-1, is denoted by
STR[i..j]. For simplicity, we assume that STR is terminated by a character called the sentinel and
represented by $, which is the unique lexicographically smallest character in STR. Let
suffix(STR, i) be the suffix of STR starting at STR[i] and running to the end of the character
array, i.e., to the sentinel.
A suffix suffix(STR, i) is called S-type or L-type if suffix(STR, i) < suffix(STR, i+1) or
suffix(STR, i) > suffix(STR, i+1), respectively. The last suffix suffix(STR, n-1), consisting of only
the single character $ (the sentinel), is defined to be S-type. Correspondingly, a character
STR[i] is classified as S-type or L-type according to the type of suffix(STR, i). To store the type
of every character/suffix, we introduce an n-bit Boolean array b, where b[i] records the type of
character STR[i] as well as of suffix(STR, i): 1 for S-type and 0 for L-type. From the definitions
of S-type and L-type, we observe the following properties:
Property 1: STR[i] is S-type if STR[i] < STR[i+1], or if
STR[i] = STR[i+1] and suffix(STR, i+1) is S-type.
Property 2: STR[i] is L-type if STR[i] > STR[i+1], or if
STR[i] = STR[i+1] and suffix(STR, i+1) is L-type.
By reading STR once from right to left, we can store the type of each character/suffix in the type
array b in O(n) time.
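Properties 1 and 2 translate directly into a single right-to-left scan. The sketch below is a minimal Python illustration under the assumption that the input already ends with the sentinel $; the function name classify_types is hypothetical.

```python
def classify_types(text):
    """Return b, where b[i] is 1 if text[i] is S-type and 0 if it is L-type.

    Assumes text ends with a unique, lexicographically smallest sentinel.
    """
    n = len(text)
    b = [0] * n
    b[n - 1] = 1  # the sentinel is S-type by definition
    for i in range(n - 2, -1, -1):
        if text[i] < text[i + 1]:
            b[i] = 1          # Property 1, first case
        elif text[i] == text[i + 1]:
            b[i] = b[i + 1]   # equal characters inherit the type on the right
        # otherwise text[i] > text[i + 1]: L-type, b[i] stays 0 (Property 2)
    return b


print(classify_types("banana$"))  # [0, 1, 0, 1, 0, 0, 1]
```

Each character is examined once, so the scan runs in O(n) time, as stated above.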
As defined earlier, SAR(STR) (written simply SAR when the context is unambiguous), i.e., the
suffix array of STR, stores the indices of all the suffixes of STR in lexicographical order. We
observe that the pointers to all the suffixes beginning with the same character occupy a
contiguous range of SAR. We call the sub-array of SAR holding all the suffixes with the same first
character a bucket, where the head and the tail of a bucket refer to its first and last items.
There is no tie between two suffixes that share the same first character but have different types:
within a bucket, all the suffixes of the same type are grouped together, and the S-type suffixes
are to the right of the L-type suffixes [12], [13]. Therefore, each bucket can be divided into two
sub-buckets according to the types of the suffixes inside, i.e., the L-type and S-type buckets,
where the S-type bucket is on the right of the L-type bucket.
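The bucket structure can be illustrated with a short Python sketch. The helper bucket_bounds is a hypothetical name, and the suffix array is built by naive sorting purely for demonstration; the input is assumed to end with the sentinel $.

```python
from collections import Counter


def bucket_bounds(text):
    """Map each character to the (head, tail) positions of its bucket in SAR."""
    counts = Counter(text)
    bounds, start = {}, 0
    for ch in sorted(counts):
        bounds[ch] = (start, start + counts[ch] - 1)
        start += counts[ch]
    return bounds


text = "banana$"
sar = sorted(range(len(text)), key=lambda i: text[i:])  # naive, for illustration
print(bucket_bounds(text))  # {'$': (0, 0), 'a': (1, 3), 'b': (4, 4), 'n': (5, 6)}

head, tail = bucket_bounds(text)["a"]
print(sar[head:tail + 1])   # [5, 3, 1]
```

In the bucket for a, the suffix at index 5 (a$) is L-type and comes first, followed by the S-type suffixes at indices 3 and 1, which matches the rule that the S-type sub-bucket lies to the right of the L-type sub-bucket.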
3. EXISTING ALGORITHM: INDUCED SORTING VARIABLE-LENGTH LMS SUBSTRINGS
A. Algorithm Framework
The framework of the existing linear-time suffix array sorting algorithm SAR-IS [15], which samples
and sorts the variable-length LMS-substrings, is given in Section 3.C. Steps 1 to 4 produce the
reduced problem, which is then solved recursively by Steps 5 to 8; finally, from the solution of
the reduced problem, Step 9 induces the final solution for the original problem.
B. Basic Definitions
We start by introducing the terms of leftmost S-type (LMS) character, suffix, and substring as
follows:
Definition 1: (LMS Character/Suffix) A character STR[i], i ∈ [1, n-1], is called LMS if STR[i] is
S-type and STR[i-1] is L-type. A suffix suffix(STR, i) is called LMS if STR[i] is an LMS character.
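Definition 1 can be checked mechanically once the types are known. The following Python sketch is illustrative only; lms_positions and classify_types are hypothetical names, and the input is assumed to end with the sentinel $.

```python
def classify_types(text):
    """b[i] = 1 for S-type, 0 for L-type; text must end with the sentinel."""
    n = len(text)
    b = [0] * n
    b[n - 1] = 1
    for i in range(n - 2, -1, -1):
        if text[i] < text[i + 1] or (text[i] == text[i + 1] and b[i + 1] == 1):
            b[i] = 1
    return b


def lms_positions(text):
    """Return every i in [1, n-1] where text[i] is S-type and text[i-1] is L-type."""
    b = classify_types(text)
    return [i for i in range(1, len(text)) if b[i] == 1 and b[i - 1] == 0]


print(lms_positions("mmiissiissiippii$"))  # [2, 6, 10, 16]
```

For this string the types are LLSSLLSSLLSSLLLLS, so the LMS characters are the first S after each run of L characters, including the sentinel at position 16.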
Definition 2: (LMS-Substring) An LMS-substring is (i) a substring STR[i..j], i ≠ j, in which both
STR[i] and STR[j] are LMS characters and there is no other LMS character; or (ii) the sentinel
itself. If we treat the LMS-substrings as the elementary blocks of the string and can efficiently
sort all of them, then we can use the order index of each LMS-substring as its name and replace
all the LMS-substrings in STR by their names. The string STR can thus be represented by a shorter
string, denoted by R1, which reduces the problem size and speeds up solving the problem in a
divide-and-conquer manner.
Definition 3: (Order of Substring) To determine the order of any two LMS-substrings, we compare
their corresponding characters from left to right. For each pair of characters, we compare their
lexicographical values first and, if the two characters have the same lexicographical value, then
their types, where S-type has higher priority than L-type. From this definition, we see that two
LMS-substrings have the same order index, i.e., the same name, only if they are equal in length,
characters, and types. Assigning the S-type character a higher priority is based on a property that
follows directly from the definitions of L-type and S-type suffixes in [12]: suffix(STR, i) >
suffix(STR, j) if (1) STR[i] > STR[j], or (2) STR[i] = STR[j] and suffix(STR, i) and suffix(STR, j)
are S-type and L-type, respectively. Sorting all the LMS-substrings requires no extra physical
space to store them. We simply maintain a pointer array, denoted by P1, which contains the pointers
to all the LMS-substrings in STR and can be built by scanning STR or the Boolean array b once from
right to left in O(n) time.
Definition 4: (Pointer Array P1) P1 is an array that holds the pointers to all the LMS-substrings
in STR, with their original positional order preserved. Suppose all the LMS-substrings have been
sorted into buckets in lexicographical order, such that all the LMS-substrings in a bucket are
identical. We then name each item of the pointer array P1 by the index of its bucket, which yields
a new string R1. We say that two equal-size substrings STR[i..j] and STR[i′..j′] are identical if
and only if STR[i+k] = STR[i′+k] and b[i+k] = b[i′+k] for k ∈ [0, j-i].
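Definitions 2 to 4 can be illustrated by building P1 and R1 for a small input. The Python sketch below sorts the LMS-substrings with an ordinary comparison sort on (character, type) pairs, which stands in for induced sorting and is therefore not linear time; reduce_string is a hypothetical name, and the input is assumed to end with the sentinel $.

```python
def reduce_string(text):
    """Return (P1, R1) for text, naming LMS-substrings by their sorted rank."""
    n = len(text)
    b = [0] * n
    b[n - 1] = 1
    for i in range(n - 2, -1, -1):
        if text[i] < text[i + 1] or (text[i] == text[i + 1] and b[i + 1] == 1):
            b[i] = 1
    # P1: start positions of the LMS-substrings in positional order.
    p1 = [i for i in range(1, n) if b[i] == 1 and b[i - 1] == 0]

    def key(k):
        # The k-th LMS-substring runs from p1[k] to the next LMS character;
        # the last one is the sentinel itself.
        start = p1[k]
        end = p1[k + 1] if k + 1 < len(p1) else start
        # Compare characters first, then types (S-type = 1 ranks above L-type = 0).
        return tuple((text[j], b[j]) for j in range(start, end + 1))

    keys = [key(k) for k in range(len(p1))]
    # Identical substrings share one bucket, hence one name.
    name_of = {sub: rank for rank, sub in enumerate(sorted(set(keys)))}
    r1 = [name_of[sub] for sub in keys]
    return p1, r1


print(reduce_string("mmiissiissiippii$"))  # ([2, 6, 10, 16], [2, 2, 1, 0])
```

Here the LMS-substrings are iissi, iissi, iippii$ and $, which sort as $ < iippii$ < iissi, so the 17-character input is reduced to the 4-character string R1 = 2 2 1 0.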
C. Algorithm
SAR-IS(STR, SAR)
STR: input string;
SAR: output suffix array of STR;
b: array [0..n-1] of Boolean;
P1, R1: array [0..n1-1] of integer; n1 = ||R1||
BKT: array [0..||∑(STR)||-1] of integer;
Step 1. Scan STR once to classify all the characters as L-type or S-type into b;
Step 2. Scan b once to find all LMS-substrings in STR into P1;
Step 3. Induced sort all the LMS-substrings using P1 and BKT;
Step 4. Name each LMS-substring in STR by its bucket index to get a new shortened string R1;
Step 5. if each character in R1 is unique then
Step 6.     Directly compute SAR1 from R1;
Step 7. else
Step 8.     SAR-IS(R1, SAR1); // recursive call
Step 9. Induce SAR from SAR1;
Step 10. return
The algorithm above is the existing one.
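The steps of the existing algorithm can be sketched compactly in Python. This is an illustrative sketch that favors readability over the memory layout discussed in this paper: it keeps a separate type list, allocates fresh lists at every level, and works on integer strings; the names sais, induce, same_lms_substring and suffix_array are hypothetical.

```python
def sais(s, k):
    """Suffix array of s, a list of ints in [0, k) ending in a unique smallest 0."""
    n = len(s)
    if n == 1:
        return [0]
    # Step 1: classify every character (True = S-type, False = L-type).
    t = [False] * n
    t[n - 1] = True
    for i in range(n - 2, -1, -1):
        t[i] = s[i] < s[i + 1] or (s[i] == s[i + 1] and t[i + 1])

    def is_lms(i):
        return i > 0 and t[i] and not t[i - 1]

    counts = [0] * k
    for c in s:
        counts[c] += 1

    def bucket_heads():
        heads, total = [], 0
        for c in counts:
            heads.append(total)
            total += c
        return heads

    def bucket_tails():
        tails, total = [], 0
        for c in counts:
            total += c
            tails.append(total - 1)
        return tails

    def induce(lms_order):
        sa = [-1] * n
        tails = bucket_tails()
        for i in reversed(lms_order):   # seed the LMS suffixes at bucket tails
            sa[tails[s[i]]] = i
            tails[s[i]] -= 1
        heads = bucket_heads()
        for r in range(n):              # left-to-right scan induces L-type
            j = sa[r] - 1
            if j >= 0 and not t[j]:
                sa[heads[s[j]]] = j
                heads[s[j]] += 1
        tails = bucket_tails()
        for r in range(n - 1, -1, -1):  # right-to-left scan induces S-type
            j = sa[r] - 1
            if j >= 0 and t[j]:
                sa[tails[s[j]]] = j
                tails[s[j]] -= 1
        return sa

    def same_lms_substring(a, b):
        d = 0
        while True:
            a_lms, b_lms = is_lms(a + d), is_lms(b + d)
            if a_lms != b_lms or s[a + d] != s[b + d]:
                return False
            if d > 0 and a_lms:
                return True
            d += 1

    p1 = [i for i in range(n) if is_lms(i)]   # Step 2: pointer array P1
    sa = induce(p1)                           # Step 3: sort the LMS-substrings
    names, name, prev = [-1] * n, -1, -1      # Step 4: name them to build R1
    for p in sa:
        if is_lms(p):
            if prev < 0 or not same_lms_substring(prev, p):
                name += 1
            names[p], prev = name, p
    r1 = [names[p] for p in p1]
    if name + 1 == len(p1):                   # Steps 5-6: names are unique
        sa1 = [0] * len(p1)
        for i, c in enumerate(r1):
            sa1[c] = i
    else:                                     # Steps 7-8: recurse
        sa1 = sais(r1, name + 1)
    return induce([p1[i] for i in sa1])       # Step 9: induce SAR from SAR1


def suffix_array(text):
    """Append a sentinel to text and return the suffix array of the result."""
    ranks = {c: r + 1 for r, c in enumerate(sorted(set(text)))}
    return sais([ranks[c] for c in text] + [0], len(ranks) + 1)


print(suffix_array("banana"))  # [6, 5, 3, 1, 0, 4, 2]
```

The wrapper suffix_array maps characters to ranks starting at 1 and appends 0 as the sentinel, so the first entry of the result is always the position of the sentinel.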
4. PROPOSED ALGORITHM
In SAR-IS, the additional working space consists mainly of the bucket counter array BKT and the
type array b at each recursion level. Our proposed algorithm differs from the existing one in two
respects:
1. We use the most significant bit (MSB) of each suffix array entry to store the type of the
character (S-type or L-type), thereby avoiding the space needed for the type array b in the
existing algorithm.
2. We reuse the unused space in SAR for the bucket array BKT.
We have observed that, for the standard suffix array datasets, the input STR is reduced to about
n/3 or less at the initial level (level 0). So, in deeper levels, we can use the unused space of
SAR for the bucket array BKT rather than allocating memory with malloc. The existing algorithm uses
-1 as the initialization (default) value for the suffix array SAR. Because the MSB is used to
record whether a character is S-type or L-type, the proposed algorithm instead uses 0x7FFFFFFF as
the initialization value for SAR. Here we assume a 32-bit machine on which an integer occupies
4 bytes.
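The bit manipulation behind the first change can be sketched as follows. This Python illustration models each 32-bit SAR word as an integer and assumes that the MSB of SAR[i] holds the type of STR[i] while the low 31 bits hold the suffix array entry stored in that slot; this layout and the helper names (set_type, get_type, set_value, get_value) are assumptions made for the example, not details confirmed by the text.

```python
TYPE_BIT = 0x80000000    # MSB of a 32-bit word: 1 = S-type, 0 = L-type
VALUE_MASK = 0x7FFFFFFF  # low 31 bits hold the suffix array entry
EMPTY = 0x7FFFFFFF       # initialization value, leaves the MSB free


def set_type(sar, i, is_s_type):
    """Record the type of character i in the MSB without touching the value."""
    if is_s_type:
        sar[i] |= TYPE_BIT
    else:
        sar[i] &= VALUE_MASK


def get_type(sar, i):
    return 1 if sar[i] & TYPE_BIT else 0


def set_value(sar, i, value):
    """Store a suffix index in the low 31 bits, preserving the type bit."""
    sar[i] = (sar[i] & TYPE_BIT) | (value & VALUE_MASK)


def get_value(sar, i):
    return sar[i] & VALUE_MASK


text = "banana$"
sar = [EMPTY] * len(text)
set_type(sar, len(text) - 1, True)  # the sentinel is S-type
for i in range(len(text) - 2, -1, -1):
    s_type = text[i] < text[i + 1] or (
        text[i] == text[i + 1] and get_type(sar, i + 1) == 1
    )
    set_type(sar, i, s_type)

print([get_type(sar, i) for i in range(len(text))])  # [0, 1, 0, 1, 0, 0, 1]
set_value(sar, 0, 6)
print(get_value(sar, 0), get_type(sar, 0))           # 6 0
```

With -1 as the empty marker, every bit of the word would be set, so the MSB could not carry type information; 0x7FFFFFFF keeps the MSB clear and cannot collide with a valid index as long as the input has fewer than 2^31 - 1 characters.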
The variable Buf_ptr records the start address of the unused space of SAR at the initial level
(i.e., level 0), so that we can reuse this space in the subsequent levels (i.e., from level 1
onward) for the bucket array (see Fig 1). We can also use this space for the L-type or S-type
arrays if space is still available.
As Fig 1 shows, the space of SAR0 is reused at level 1, because the size of the problem decreases
as the level increases.
4.1 Algorithm
SAR-IS(STR, SAR)
STR: input string;
SAR: output suffix array of STR;
P1, R1: array [0..n1-1] of integer; n1 = ||R1||
BKT: array [0..||∑(STR)||-1] of integer; // uses unused space of SAR in subsequent levels
Buf_ptr: pointer to unused space in SAR
Step 1. Scan STR once to classify all the characters as L-type or S-type into the MSBs of SAR;
Step 2. Scan the MSBs of SAR once to find all LMS-substrings in STR into P1;
Step 3. if level is not equal to 0 then
            BKT = Buf_ptr; // assign the start address of the unused buffer
Step 4. Induced sort all the LMS-substrings using P1 and BKT;
Step 5. Name each LMS-substring in STR by its bucket index to get a new shortened string R1;
Step 6. if level is equal to 0 then
            assign the start address of the unused space of SAR to Buf_ptr;
Step 7. if each character in R1 is unique then
Step 8.     Directly compute SAR1 from R1;
Step 9. else SAR-IS(R1, SAR1); // recursive call
Step 10. Scan STR once again to classify all the characters as L-type or S-type into the MSBs
         of SAR;
Step 11. Induce SAR from SAR1;
Step 12. return
Fig 1. Example of the reuse of the buffer SAR
The reuse of the buffer is illustrated in Fig 1. The notations L 0, L 1, and L 2 stand for
Level-0, Level-1, and Level-2.
4.2 Experimental Results
The algorithm was implemented in VC++ using Microsoft Visual Studio on the Windows XP platform.
Table II and Fig 2 give an overview of the space consumed by the existing and the proposed
algorithms. The datasets used in our experiment, listed in Table I, were downloaded from the
Canterbury corpus [14] and the Manzini-Ferragina corpus [16].
Dataset            ||∑||    Characters
bible.txt            63       4047392
chr22.dna             4      34553758
e.coli                4       4638690
howto               197      39422105
world192.txt         94       2473400
sprot34.dat          66     109617186
etext99             146     105277340
rfc                 120     116421901
rctail196            93     114711151
linux-2.4.5.tar     256      21508430
w3c2                256     104201579
alphabet             26        100000
random               26        100000
TABLE I Datasets used in the Experiment
[Fig 1 block labels: L 0, L 1, L 2; B0, SAR0; R1, SAR1, BKT1; R2, SAR2, BKT2, R1]
Dataset            Space (in megabytes)
                   Existing Algorithm    Proposed Algorithm
bible.txt                21.81                 20.10
chr22.dna               179.10                165.85
e.coli                   25.25                 22.9
howto                   204.47                189.11
world192.txt             13.61                 12.58
sprot34.dat             556.57                524.48
etext99                 544.14                503.74
rfc                     590.53                556.99
rctail196               577.29                548.81
linux-2.4.5.tar         130.82                103.53
w3c2                    521.11                498.60
alphabet                  1.35                  1.23
random                    1.48                  1.23
TABLE II Space Consumed by the Existing and Proposed Algorithms
Fig 2. Logarithmic graph (base 2) showing the comparison between Existing and Proposed Algorithm
The datasets in Table I were downloaded from the benchmark repositories for suffix array
construction algorithms (SACAs), which include the Canterbury corpus [14] and the
Manzini-Ferragina corpus [16]. These datasets have constant alphabets of size at most 256, and
one byte is used for each character.
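The relative saving for each dataset follows from Table II by simple arithmetic. The Python sketch below recomputes it from the published figures; the helper name saving_percent is hypothetical, and the numbers are copied from Table II.

```python
# (existing, proposed) space in megabytes, copied from Table II.
table_ii = {
    "bible.txt": (21.81, 20.10),
    "chr22.dna": (179.10, 165.85),
    "e.coli": (25.25, 22.9),
    "howto": (204.47, 189.11),
    "world192.txt": (13.61, 12.58),
    "sprot34.dat": (556.57, 524.48),
    "etext99": (544.14, 503.74),
    "rfc": (590.53, 556.99),
    "rctail196": (577.29, 548.81),
    "linux-2.4.5.tar": (130.82, 103.53),
    "w3c2": (521.11, 498.60),
    "alphabet": (1.35, 1.23),
    "random": (1.48, 1.23),
}


def saving_percent(existing, proposed):
    """Space saved by the proposed algorithm, as a percentage of the existing one."""
    return 100.0 * (existing - proposed) / existing


savings = {name: saving_percent(e, p) for name, (e, p) in table_ii.items()}
for name, value in savings.items():
    print(f"{name:16s} {value:6.2f}%")
print("smallest saving:", min(savings, key=savings.get))
print("largest saving: ", max(savings, key=savings.get))
```

On these figures the saving is smallest for w3c2 and largest for linux-2.4.5.tar.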
4.3 Conclusions
The proposed algorithm saves space by using the MSB of each SAR entry to record whether a
character is L-type or S-type, and by reusing the space of SAR for the bucket array at each
level, thereby reducing the space needed by roughly 4% to 21% compared with the existing
algorithm on the tested datasets. The results for the various datasets are shown in Table II.
REFERENCES
[1] D.K. Kim, J.S. Sim, H. Park, and K. Park, “Linear-Time Construction of Suffix Arrays,” Proc.
Ann. Symp. Combinatorial Pattern Matching (CPM ’03), pp. 186-199, 2003.
[2] J. Kärkkäinen, P. Sanders, and S. Burkhardt, “Linear Work Suffix Array Construction,” J. ACM,
vol. 53, no. 6, pp. 918-936, Nov. 2006.
[3] U. Manber and G. Myers, “Suffix Arrays: A New Method for On-Line String Searches,” SIAM J.
Computing, vol. 22, no. 5, pp. 935-948, 1993.
[4] U. Manber and G. Myers, “Suffix Arrays: A New Method for On-Line String Searches,” Proc.
First Ann. ACM-SIAM Symp. Discrete Algorithms (SODA ’90), pp. 319-327, 1990.
[5] S.J. Puglisi, W.F. Smyth, and A.H. Turpin, “A Taxonomy of Suffix Array Construction
Algorithms,” ACM Computing Surveys, vol. 39, no. 2, pp. 1-31, 2007.
[6] R. Grossi and J.S. Vitter, “Compressed Suffix Arrays and Suffix Trees with Applications to Text
Indexing and String Matching,” Proc. Symp. Theory of Computing (STOC ’00), pp. 397-406,
2000.
[7] T.W. Lam, K. Sadakane, W.K. Sung, and S.M. Yiu, “A Space and Time Efficient Algorithm for
Constructing Compressed Suffix Arrays,” Proc. Int’l Conf. Computing and Combinatorics, pp.
401-410, 2002.
[8] G. Manzini and P. Ferragina, “Engineering a Lightweight Suffix Array Construction Algorithm,”
Algorithmica, vol. 40, no. 1, pp. 33-50, Sept. 2004.
[9] S. Kurtz, “Reducing the Space Requirement of Suffix Trees,” Software Practice and Experience,
vol. 29, pp. 1149-1171, 1999.
[10] W.K. Hon, K. Sadakane, and W.K. Sung, “Breaking a Time-and-Space Barrier for Constructing
Full-Text Indices,” Proc. 44th Ann. IEEE Symp. Foundations of Computer Science (FOCS ’03),
pp. 251-260, 2003.
[11] J. Kärkkäinen and P. Sanders, “Simple Linear Work Suffix Array Construction,” Proc. 30th Int’l
Conf. Automata, Languages, and Programming (ICALP ’03), pp. 943-955, 2003.
[12] P. Ko and S. Aluru, “Space Efficient Linear Time Construction of Suffix Arrays,” Proc. Ann.
Symp. Combinatorial Pattern Matching (CPM ’03), pp. 200-210, 2003.
[13] P. Ko and S. Aluru, “Space-Efficient Linear Time Construction of Suffix Arrays,” J. Discrete
Algorithms, vol. 3, nos. 2-4, pp. 143-156, 2005.
[14] The Canterbury Corpus website. [Online]. Available: http://corpus.canterbury.ac.nz/.
[15] Ge Nong, Sen Zhang, and Wai Hong Chan, “Two Efficient Algorithms for Linear Time Suffix Array
Construction,” IEEE Transactions on Computers, vol. 60, pp. 1471-1484, Oct. 2011.
[16] Lightweight corpus datasets. [Online]. Available:
http://people.unipmn.it/manzini/lightweight/corpus

ANALYSIS OF IMAGE WATERMARKING USING LEAST SIGNIFICANT BIT ALGORITHM
 
Call for Research Articles - 6th International Conference on Machine Learning...
Call for Research Articles - 6th International Conference on Machine Learning...Call for Research Articles - 6th International Conference on Machine Learning...
Call for Research Articles - 6th International Conference on Machine Learning...
 
Online Paper Submission - International Journal of Information Sciences and T...
Online Paper Submission - International Journal of Information Sciences and T...Online Paper Submission - International Journal of Information Sciences and T...
Online Paper Submission - International Journal of Information Sciences and T...
 
Colour Image Steganography Based on Pixel Value Differencing in Spatial Domain
Colour Image Steganography Based on Pixel Value Differencing in Spatial DomainColour Image Steganography Based on Pixel Value Differencing in Spatial Domain
Colour Image Steganography Based on Pixel Value Differencing in Spatial Domain
 
Call for Research Articles - 5th International conference on Advanced Natural...
Call for Research Articles - 5th International conference on Advanced Natural...Call for Research Articles - 5th International conference on Advanced Natural...
Call for Research Articles - 5th International conference on Advanced Natural...
 
International Journal of Information Sciences and Techniques (IJIST)
International Journal of Information Sciences and Techniques (IJIST)International Journal of Information Sciences and Techniques (IJIST)
International Journal of Information Sciences and Techniques (IJIST)
 
Research Article Submission - International Journal of Information Sciences a...
Research Article Submission - International Journal of Information Sciences a...Research Article Submission - International Journal of Information Sciences a...
Research Article Submission - International Journal of Information Sciences a...
 
Design and Implementation of LZW Data Compression Algorithm
Design and Implementation of LZW Data Compression AlgorithmDesign and Implementation of LZW Data Compression Algorithm
Design and Implementation of LZW Data Compression Algorithm
 
Online Paper Submission - 5th International Conference on Soft Computing, Art...
Online Paper Submission - 5th International Conference on Soft Computing, Art...Online Paper Submission - 5th International Conference on Soft Computing, Art...
Online Paper Submission - 5th International Conference on Soft Computing, Art...
 
Submit Your Research Articles - International Journal of Information Sciences...
Submit Your Research Articles - International Journal of Information Sciences...Submit Your Research Articles - International Journal of Information Sciences...
Submit Your Research Articles - International Journal of Information Sciences...
 
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTKPUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
 
Call for Papers - 13th International Conference on Cloud Computing: Services ...
Call for Papers - 13th International Conference on Cloud Computing: Services ...Call for Papers - 13th International Conference on Cloud Computing: Services ...
Call for Papers - 13th International Conference on Cloud Computing: Services ...
 

Dernier

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

Dernier (20)

DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

Space Efficient Suffix Array Construction using Induced Sorting LMS Substrings

International Journal of Information Sciences and Techniques (IJIST) Vol.3, No.4, July 2013
DOI: 10.5121/ijist.2013.3402

Rajesh. Yelchuri (1), Nagamalleswara Rao.N (2)
Department of Computer Science and Engineering, R.V.R & J.C College of Engg., Chowdavaram, Guntur, Andhra Pradesh - 522119, India
(1) rajesh.yelchuri@gmail.com
(2) nnmr_m@yahoo.com

ABSTRACT
This paper presents a space-efficient algorithm for linear-time suffix array construction. The algorithm uses the techniques of divide-and-conquer and recursion. What differentiates the proposed algorithm from the existing algorithm based on variable-length leftmost S-type (LMS) substrings is its more efficient use of memory while constructing the suffix array: the modified induced sorting algorithm for variable-length LMS substrings uses less memory than the existing variable-length LMS substrings algorithm.

KEYWORDS
Divide and Conquer, Suffix Array.

1. INTRODUCTION
The concept of suffix arrays was introduced by Manber and Myers in SODA’90 [4] and SICOMP’93 [3] as a space-efficient alternative to suffix trees. The suffix array is well recognized as a fundamental data structure, useful for a broad range of applications, e.g., string search, data indexing, searching for patterns in DNA or protein sequences, data compression, and the Burrows-Wheeler transform. For an n-character string STR, its suffix array, denoted SAR(STR), is an array of indices pointing to all the suffixes of STR, sorted in ascending (or descending) lexicographical order. The suffix array of STR itself requires only n⌈log n⌉ bits of space. However, different suffix array construction algorithms may have different space and time complexities.
During the past decade, many researchers have been developing suffix array construction algorithms that are both time and space efficient; for a detailed survey we suggest Puglisi et al. [5]. Such algorithms have become popular because suffix arrays are so widely used. Suffix arrays must be constructed for large-scale applications, e.g., biological genome databases and web searching, where the size of a data set is measured in billions of characters [6], [7], [8], [9], [10]. Time- and space-efficient linear-time algorithms are crucial for large-scale applications to have predictable worst-case performance. The three known linear-time algorithms are KSP [1], KA [12], [13], and KS [11], [2], all reported in 2003.

2. BASIC NOTATIONS
This section introduces the basic terminology used in the presentation of the algorithm. Let STR be a string of n characters stored in an array [0..n-1], and let ∑(STR) be the alphabet of STR. A substring of STR running from index i to index j, where i and j range from 0 to n-1 and i < j, is denoted STR[i..j]. For simplicity, STR is assumed to be terminated by a character called the sentinel, represented by $, which is the unique lexicographically smallest character in STR. Let suffix(STR, i) be the suffix of STR starting at STR[i] and running to the end of the character array, i.e., to the sentinel.

A suffix suffix(STR, i) is called S-type or L-type if suffix(STR, i) < suffix(STR, i+1) or suffix(STR, i) > suffix(STR, i+1), respectively. The last suffix, suffix(STR, n-1), consists of only the sentinel $ and is predefined as S-type. A character STR[i] is classified as S-type or L-type according to the type of suffix(STR, i). To store the type of every character/suffix, we introduce an n-bit Boolean array b, where b[i] records the type of character STR[i] as well as of suffix(STR, i): 1 for S-type and 0 for L-type. From the definitions of S-type and L-type, we observe the following properties:

Property 1: STR[i] is S-type if STR[i] < STR[i+1], or if STR[i] = STR[i+1] and suffix(STR, i+1) is S-type.
Property 2: STR[i] is L-type if STR[i] > STR[i+1], or if STR[i] = STR[i+1] and suffix(STR, i+1) is L-type.

By reading STR once from right to left, we can store the type of each character/suffix into the type array b in O(n) time.

As defined earlier, SAR(STR) (abbreviated SAR when there is no confusion in the context), i.e., the suffix array of STR, stores the indices of all the suffixes of STR in lexicographical order. The pointers for all the suffixes beginning with the same character must occupy a contiguous range of SAR. We call the subarray of SAR holding all the suffixes with the same first character a bucket, where the head and the tail of a bucket refer to its first and last items.
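As an illustration of the right-to-left type classification given by Properties 1 and 2, the following is a minimal Python sketch. The function name classify_types and the use of a Python list for the type array b are assumptions made for this example and do not come from the paper; the input is assumed to end with a unique sentinel that is smaller than every other character.

```python
def classify_types(s: str) -> list[int]:
    """Return the type array b: b[i] = 1 if s[i] is S-type, 0 if L-type.

    Assumption: s ends with a unique sentinel (here '$') that compares
    smaller than every other character in s.
    """
    n = len(s)
    b = [0] * n
    b[n - 1] = 1  # the sentinel is predefined as S-type
    # Single right-to-left scan, O(n) time.
    for i in range(n - 2, -1, -1):
        if s[i] < s[i + 1]:
            b[i] = 1          # Property 1, first case
        elif s[i] == s[i + 1]:
            b[i] = b[i + 1]   # equal characters inherit the type to the right
        # otherwise s[i] > s[i + 1]: L-type, b[i] stays 0 (Property 2)
    return b


if __name__ == "__main__":
    print(classify_types("banana$"))
```

Each position is decided by looking only at the character and type immediately to its right, which is why one pass suffices.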
There can be no tie between two suffixes that share the same first character but have different types: within a bucket, all suffixes of the same type are grouped together, and the S-type suffixes lie to the right of the L-type suffixes [12], [13]. Therefore, each bucket can be divided into two sub-buckets according to the types of the suffixes inside, i.e., the L-type and S-type buckets, where the S-type bucket is to the right of the L-type bucket.

3. EXISTING ALGORITHM: INDUCED SORTING VARIABLE-LENGTH LMS SUBSTRINGS

A. Algorithm Framework
The framework of the existing linear-time suffix array construction algorithm SAR-IS [15], which samples and sorts the variable-length LMS-substrings, is given in Section 3-C. Steps 1 to 4 produce the reduced problem, which is then solved recursively by Steps 5 to 8; finally, Step 9 induces the solution of the original problem from the solution of the reduced problem.

B. Basic Definitions
We start by introducing the terms leftmost S-type (LMS) character, suffix, and substring.

Definition 1 (LMS Character/Suffix): A character STR[i], i ∈ [1, n-1], is called LMS if STR[i] is S-type and STR[i-1] is L-type. A suffix suffix(STR, i) is called LMS if STR[i] is an LMS character.
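Definition 1 can be checked directly against a type array. The sketch below is illustrative only: the function name lms_positions is hypothetical, and the type array is passed in as a plain list using the 1 = S-type, 0 = L-type convention of Section 2.

```python
def lms_positions(b: list[int]) -> list[int]:
    """Return the indices i in [1, n-1] where b[i] is S-type (1)
    and b[i-1] is L-type (0), i.e. the LMS characters of Definition 1."""
    return [i for i in range(1, len(b)) if b[i] == 1 and b[i - 1] == 0]


if __name__ == "__main__":
    # Type array of "banana$" under the convention 1 = S-type, 0 = L-type.
    b = [0, 1, 0, 1, 0, 0, 1]
    print(lms_positions(b))
```

Position 0 is never LMS because it has no left neighbour, which matches the range i ∈ [1, n-1] in the definition.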
Definition 2 (LMS-Substring): An LMS-substring is (i) a substring STR[i..j], i ≠ j, with both STR[i] and STR[j] being LMS characters and no other LMS character in the substring; or (ii) the sentinel itself.

If we treat the LMS-substrings as the elementary blocks of the string, we can sort all the LMS-substrings, use the order index of each LMS-substring as its name, and replace every LMS-substring in STR by its name. The string STR can then be represented by a shortened string, denoted R1, so the problem size is reduced and the problem can be solved faster in a divide-and-conquer manner.

Definition 3 (Order of Substring): To determine the order of any two LMS-substrings, compare their corresponding characters from left to right. For each pair of characters, compare their lexicographical values first; if the two characters have the same lexicographical value, compare their types, where S-type has higher priority than L-type.

From this definition, two LMS-substrings have the same order index, i.e., the same name, if they are the same in length, characters, and types. Assigning the S-type character a higher priority follows directly from the definitions of L-type and S-type suffixes in [12]: suffix(STR, i) > suffix(STR, j) if (1) STR[i] > STR[j], or (2) STR[i] = STR[j], and suffix(STR, i) and suffix(STR, j) are S-type and L-type, respectively.

Sorting the LMS-substrings requires no extra physical space for storing them. We simply maintain a pointer array, denoted P1, which contains the pointers to all the LMS-substrings in STR and can be built by scanning STR, or by reading the Boolean array b, once from right to left in O(n) time.
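To make Definitions 2 and 3 concrete, the sketch below names the LMS-substrings of a small string and builds the shortened string R1. It is a deliberately naive illustration: it uses an ordinary comparison sort on (character, type) pairs in place of the induced sorting used by the algorithm, so it is not linear time. The function name name_lms_substrings and the list-based representation of P1 and R1 are assumptions made for this example.

```python
def name_lms_substrings(s: str) -> tuple[list[int], list[int]]:
    """Return (p1, r1): the LMS positions in text order and their names.

    Assumption: s ends with a unique, smallest sentinel character.
    Naive stand-in for induced sorting: keys are compared with sorted().
    """
    n = len(s)
    # Type array: 1 = S-type, 0 = L-type (one right-to-left scan).
    b = [0] * n
    b[n - 1] = 1
    for i in range(n - 2, -1, -1):
        if s[i] < s[i + 1] or (s[i] == s[i + 1] and b[i + 1] == 1):
            b[i] = 1
    # Pointer array P1: starting positions of the LMS-substrings.
    p1 = [i for i in range(1, n) if b[i] == 1 and b[i - 1] == 0]

    def key(k: int) -> tuple:
        start = p1[k]
        # The last LMS-substring is the sentinel itself.
        end = p1[k + 1] if k + 1 < len(p1) else start
        # Definition 3: compare characters first, then types (S above L).
        return tuple((s[i], b[i]) for i in range(start, end + 1))

    keys = [key(k) for k in range(len(p1))]
    rank = {kk: r for r, kk in enumerate(sorted(set(keys)))}
    r1 = [rank[kk] for kk in keys]  # identical substrings share a name
    return p1, r1


if __name__ == "__main__":
    print(name_lms_substrings("mmiissiissiippii$"))
```

Because the pair (character, type) is compared as a tuple, two substrings receive the same name only when they agree in length, characters, and types.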
Definition 4 (Pointer Array P1): P1 is an array containing the pointers to all the LMS-substrings in STR, with their original positional order preserved.

Suppose all the LMS-substrings are sorted into buckets in lexicographical order, where all the LMS-substrings in a bucket are identical. We then name each item of the pointer array P1 by the index of its bucket, which yields a new string R1. Two equal-size substrings STR[i..j] and STR[i′..j′] are said to be identical if and only if STR[i+k] = STR[i′+k] and b[i+k] = b[i′+k] for all k ∈ [0, j-i].

C. Algorithm SAR-IS(STR, SAR)
STR: input string; SAR: output suffix array of STR;
b: array [0..n-1] of Boolean;
P1, R1: array [0..n1] of integer; n1 = ||R1||
BKT: array [0..||∑(STR)||-1] of integer;
Step 1. Scan STR once to classify all the characters as L-type or S-type into b;
Step 2. Scan b once to find all LMS-substrings in STR into P1;
Step 3. Induced sort all the LMS-substrings using P1 and BKT;
Step 4. Name each LMS-substring in STR by its bucket index to get a new shortened string R1;
Step 5. if each character in R1 is unique then
Step 6.     Directly compute SAR1 from R1;
Step 7. else
Step 8.     SAR-IS(R1, SAR1); // recursive call
Step 9. Induce SAR from SAR1;
Step 10. return

The algorithm above is the existing one.

4. PROPOSED ALGORITHM
In SA-IS, the additional working space at each recursion level consists mainly of the bucket counter array BKT and the type array t (called b above). The proposed algorithm differs from the existing one in two respects:

1. We use the most significant bit (MSB) of each suffix array entry to store the type of a character (S-type or L-type), thereby avoiding the space needed for the type array t in the existing algorithm.
2. We reuse the unused space in SAR for the bucket array BKT.

We have observed that, for the standard suffix array datasets, the input STR is reduced to roughly n/3 at the initial level (level 0). We can therefore use the unused space of SAR for BKT at deeper levels rather than allocating new memory with malloc.

The existing algorithm uses -1 as the initialization (default) value for the entries of the suffix array SAR. Because the MSB is now used to mark S-type or L-type characters, the proposed algorithm instead uses 0x7FFFFFFF as the initialization value. Here we assume a 32-bit machine on which an integer occupies 4 bytes. The variable Buf_ptr records the start address of the unused space of SAR at the initial level (level 0), so that this space can be reused at the following levels (from level 1 onward) for the bucket array (see Fig 1). This space can also be used for the L-type or S-type arrays if enough of it remains. The space of SAR0 can be reused at level 1 because the problem size decreases as the level increases.

4.1 Algorithm SAR-IS(STR, SAR)
STR: input string; SAR: output suffix array of STR;
P1, R1: array [0..n1] of integer; n1 = ||R1||
BKT: array [0..||∑(STR)||-1] of integer; // uses the unused space of SAR at levels after level 0
Buf_ptr: pointer to unused space in SAR
Step 1. Scan STR once to classify all the characters as L-type or S-type into the MSBs of SAR;
Step 2. Scan the MSBs of SAR once to find all LMS-substrings in STR into P1;
Step 3. if level is not equal to 0 then BKT = Buf_ptr; // assign the start address of the unused buffer
Step 4. Induced sort all the LMS-substrings using P1 and BKT;
Step 5. Name each LMS-substring in STR by its bucket index to get a new shortened string R1;
Step 6. if level is equal to 0 then assign the start address of the unused space of SAR to Buf_ptr;
Step 7. if each character in R1 is unique then
Step 8.     Directly compute SAR1 from R1;
Step 9. else SAR-IS(R1, SAR1); // recursive call
Step 10. Scan STR once again to classify all the characters as L-type or S-type into the MSBs of SAR;
Step 11. Induce SAR from SAR1;
Step 12. return

Fig 1. Example of the reuse of the buffer SAR [figure not reproduced; labels in the figure: L 0, L 1, L 2, B0, SAR0, R1, SAR1, BKT1, R2, SAR2, BKT2]

The reuse of the buffer is illustrated in Fig 1. The notation L 0, L 1, L 2 stands for Level-0, Level-1, Level-2.

4.2 Experimental Results
The algorithm was implemented in VC++ using Microsoft Visual Studio on the Windows XP platform. Table II and Fig 2 give an overview of the space consumed by the existing and the proposed algorithms. The datasets in Table I used in our experiment were downloaded from Canterbury [14] and Manzini-Ferragina [16].

TABLE I. Datasets used in the experiment

Dataset            ||∑||   Characters
bible.txt          63      4047392
chr22.dna          4       34553758
e.coli             4       4638690
howto              197     39422105
world192.txt       94      2473400
sprot34.dat        66      109617186
etext99            146     105277340
rfc                120     116421901
rctail196          93      114711151
linux-2.4.5.tar    256     21508430
w3c2               256     104201579
alphabet           26      100000
random             26      100000
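Returning to the MSB tagging introduced in Section 4, the bit manipulation involved can be sketched as follows. This is an illustration under stated assumptions, not the authors' VC++ implementation: the helper names are hypothetical, 32-bit words are simulated with Python integers held in a list, and the sketch assumes that the MSB of SAR[i] holds the type of STR[i], which the paper does not spell out. Only the constant 0x7FFFFFFF as the empty value is taken from the text.

```python
MSB_MASK = 0x80000000    # bit 31: type flag (1 = S-type, 0 = L-type)
VALUE_MASK = 0x7FFFFFFF  # bits 0..30: the stored suffix index
EMPTY = 0x7FFFFFFF       # initialization value from the paper (replaces -1)


def new_sar(n: int) -> list[int]:
    """Create n simulated 32-bit entries, all empty and tagged L-type."""
    return [EMPTY] * n


def set_type(sar: list[int], i: int, is_s_type: bool) -> None:
    """Write the type flag into the MSB without touching the low 31 bits."""
    if is_s_type:
        sar[i] |= MSB_MASK
    else:
        sar[i] &= VALUE_MASK


def get_type(sar: list[int], i: int) -> int:
    return (sar[i] >> 31) & 1


def set_value(sar: list[int], i: int, value: int) -> None:
    """Store a suffix index in the low 31 bits, preserving the type flag."""
    sar[i] = (sar[i] & MSB_MASK) | (value & VALUE_MASK)


def get_value(sar: list[int], i: int) -> int:
    return sar[i] & VALUE_MASK


def is_empty(sar: list[int], i: int) -> bool:
    return get_value(sar, i) == EMPTY


if __name__ == "__main__":
    sar = new_sar(7)
    for i, t in enumerate([0, 1, 0, 1, 0, 0, 1]):  # types of "banana$"
        set_type(sar, i, bool(t))
    set_value(sar, 0, 6)
    print([hex(w) for w in sar])
```

With one bit reserved for the flag, an entry can hold indices only below 0x7FFFFFFF, which is also why -1 (all bits set in two's complement) can no longer serve as the empty marker.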
TABLE II. Space consumed by the existing and proposed algorithms (in megabytes)

Dataset            Existing Algorithm   Proposed Algorithm
bible.txt          21.81                20.10
chr22.dna          179.10               165.85
e.coli             25.25                22.9
howto              204.47               189.11
world192.txt       13.61                12.58
sprot34.dat        556.57               524.48
etext99            544.14               503.74
rfc                590.53               556.99
rctail196          577.29               548.81
linux-2.4.5.tar    130.82               103.53
w3c2               521.11               498.60
alphabet           1.35                 1.23
random             1.48                 1.23

Fig 2. Logarithmic graph (base 2) showing the comparison between the existing and proposed algorithms [figure not reproduced; axis ticks run from 1 to 1024 in powers of 2; series: Existing Algorithm, Proposed Algorithm]

The datasets in Table I were downloaded from the benchmark repositories for suffix array construction algorithms (SACAs), which include Canterbury [14] and Manzini-Ferragina [16]. These datasets have constant alphabets of size at most 256, and one byte is used for each character.
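The per-dataset saving implied by Table II can be computed with a few lines of arithmetic. The sketch below copies the figures from Table II; the dictionary name TABLE_II and the function name reduction_percent are hypothetical names chosen for this example.

```python
# Space in megabytes, copied from Table II: (existing, proposed).
TABLE_II = {
    "bible.txt": (21.81, 20.10),
    "chr22.dna": (179.10, 165.85),
    "e.coli": (25.25, 22.9),
    "howto": (204.47, 189.11),
    "world192.txt": (13.61, 12.58),
    "sprot34.dat": (556.57, 524.48),
    "etext99": (544.14, 503.74),
    "rfc": (590.53, 556.99),
    "rctail196": (577.29, 548.81),
    "linux-2.4.5.tar": (130.82, 103.53),
    "w3c2": (521.11, 498.60),
    "alphabet": (1.35, 1.23),
    "random": (1.48, 1.23),
}


def reduction_percent(existing: float, proposed: float) -> float:
    """Relative space saving of the proposed algorithm, in percent."""
    return 100.0 * (existing - proposed) / existing


if __name__ == "__main__":
    for name, (existing, proposed) in TABLE_II.items():
        print(f"{name:16s} {reduction_percent(existing, proposed):6.2f}%")
```

Reporting the saving relative to the existing algorithm's footprint makes datasets of very different sizes directly comparable.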
4.3 Conclusions
The proposed algorithm improves space efficiency by using the MSB of each SAR entry to record L-type and S-type characters and by reusing the space of SAR for the bucket array at each level, thereby reducing the space needed by nearly 25% compared with the existing algorithm. The results for the various datasets are shown in Table II.

REFERENCES
[1] D.K. Kim, J.S. Sim, H. Park, and K. Park, “Linear-Time Construction of Suffix Arrays,” Proc. Ann. Symp. Combinatorial Pattern Matching (CPM ’03), pp. 186-199, 2003.
[2] J. Karkkainen, P. Sanders, and S. Burkhardt, “Linear Work Suffix Array Construction,” J. ACM, no. 6, pp. 918-936, Nov. 2006.
[3] U. Manber and G. Myers, “Suffix Arrays: A New Method for On-Line String Searches,” SIAM J. Computing, vol. 22, no. 5, pp. 935-948, 1993.
[4] U. Manber and G. Myers, “Suffix Arrays: A New Method for On-Line String Searches,” Proc. First Ann. ACM-SIAM Symp. Discrete Algorithms (SODA ’90), pp. 319-327, 1990.
[5] S.J. Puglisi, W.F. Smyth, and A.H. Turpin, “A Taxonomy of Suffix Array Construction Algorithms,” ACM Computing Surveys, vol. 39, no. 2, pp. 1-31, 2007.
[6] R. Grossi and J.S. Vitter, “Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching,” Proc. Symp. Theory of Computing (STOC ’00), pp. 397-406, 2000.
[7] T.W. Lam, K. Sadakane, W.K. Sung, and S.M. Yiu, “A Space and Time Efficient Algorithm for Constructing Compressed Suffix Arrays,” Proc. Int’l Conf. Computing and Combinatorics, pp. 401-410, 2002.
[8] G. Manzini and P. Ferragina, “Engineering a Lightweight Suffix Array Construction Algorithm,” Algorithmica, vol. 40, no. 1, pp. 33-50, Sept. 2004.
[9] S. Kurtz, “Reducing the Space Requirement of Suffix Trees,” Software Practice and Experience, vol. 29, pp. 1149-1171, 1999.
[10] W.K. Hon, K. Sadakane, and W.K. Sung, “Breaking a Time-and-Space Barrier for Constructing Full-Text Indices,” Proc. 44th Ann. IEEE Symp. Foundations of Computer Science (FOCS ’03), pp. 251-260, 2003.
[11] J. Karkkainen and P. Sanders, “Simple Linear Work Suffix Array Construction,” Proc. 30th Int’l Conf. Automata, Languages, and Programming (ICALP ’03), pp. 943-955, 2003.
[12] P. Ko and S. Aluru, “Space Efficient Linear Time Construction of Suffix Arrays,” Proc. Ann. Symp. Combinatorial Pattern Matching (CPM ’03), pp. 200-210, 2003.
[13] P. Ko and S. Aluru, “Space-Efficient Linear Time Construction of Suffix Arrays,” J. Discrete Algorithms, vol. 3, nos. 2-4, pp. 143-156, 2005.
[14] The Canterbury Corpus website. [Online]. Available: http://corpus.canterbury.ac.nz/.
[15] Ge Nong, Sen Zhang, and Wai Hong Chan, “Two Efficient Algorithms for Linear Time Suffix Array Construction,” IEEE Transactions on Computers, vol. 60, pp. 1471-1484, Oct. 2011.
[16] Light weight corpus datasets. [Online]. Available: http://people.unipmn.it/manzini/lightweight/corpus