SlideShare une entreprise Scribd logo
1  sur  5
Télécharger pour lire hors ligne
International Journal of Research in Computer Science
eISSN 2249-8265 Volume 2 Issue 6 (2012) pp. 33-37
www.ijorcs.org, A Unit of White Globe Publications
doi: 10.7815/ijorcs. 26.2012.053



COMMENTZ-WALTER: ANY BETTER THAN AHO-
  CORASICK FOR PEPTIDE IDENTIFICATION?
                     S.M. Vidanagamachchi1, S.D. Dewasurendra 2, R.G. Ragel 2, M.Niranjan3
          1
            Department of Computer Science, Faculty of Science, University of Ruhuna, Matara, SRI LANKA
                                              Email: smv@dcs.ruh.ac.lk
        2
         Department of Computer Engineering, Faculty of Engineering, University of Peradeniya, SRI LANKA
                                  Email: devapriyad@pdn.ac.lk, roshanr@pdn.ac.lk
             3
               Department of Electronics and Computer Science, Faculty of Physical and Applied Sciences,
                                    University of Southampton, UNITED KINGDOM
                                              Email: mn@ecs.soton.ac.uk

Abstract: An algorithm for locating all occurrences of      used in computational biology for matching biological
a finite number of keywords in an arbitrary string, also    sequences [1]. Multiple Sequence Alignment (MSA) is
known as multiple strings matching, is commonly             an alignment of three or more sequences that contain
required in information retrieval (such as sequence         thousands of residues (nucleotides or amino acids).
analysis, evolutionary biological studies, gene/protein     Multiple pattern matching plays a vital role in
identification and network intrusion detection) and text    identifying thousands of patterns simultaneously
editing applications. Although Aho-Corasick was one         within a given text. The results of MSA can be used to
of the commonly used exact multiple strings matching        infer sequence homology as well as conduct
algorithm, Commentz-Walter has been introduced as a         phylogenetic analysis and multiple sequence matching
better alternative in the recent past. Comments-Walter      is used for identifying proteins/ DNA/ Peptides/
algorithm combines ideas from both Aho-Corasick and         motifs/ domains, managing security (for example,
Boyer Moore. Large scale rapid and accurate peptide         intrusion detection), editing texts, etc. In multiple
identification is critical in computational proteomics.     sequence alignment we use a database of large sizes of
In this paper, we have critically analyzed the time         sequence data over a query sequence and it can be
complexity of Aho-Corasick and Commentz-Walter for          identified as one form of multiple string/pattern
their suitability in large scale peptide identification.    matching. In this paper, we have used a set of
According to the results we obtained for our dataset,       predefined peptides and located them over a given
we conclude that Aho-Corasick is performing better          arbitrary protein (the protein can be known/ unknown/
than Commentz-Walter as opposed to the common               predicted).
beliefs.
Keywords: Aho-Corasick, Commentz-Walter, Peptide               Exact (as opposed to approximate) string matching
Identification                                              is commonly required in many fields, as it provides
                                                            practical solutions to the real world problems. For
                                                            example, exact string matching can be used for
                 I. INTRODUCTION
                                                            sequence analysis, gene finding, evolutionary biology
                     21B




   String matching has been extensively studied in the      studies and protein expression analysis in the fields of
past three decades and is a classical problem               genetics and computational biology [8][15][16]. Other
fundamental to many applications that need processing       areas are network intrusion detection [17][18][19] and
of either text data or some sequence data. Many string      text processing [20].
matching algorithms are used in locating one or
several string patterns that are found within a larger         In this paper we basically compare time
text. For a given text T and a pattern set P, string        complexities for two multiple pattern matching
matching can be defined as finding all occurrences of       algorithms: Aho- Corasick and Commentz-Walter for
pattern P in text T.                                        the identification of large scale protein data.

   With the remarkable increase of data in biological          We start by giving brief introduction of string
databases (such as what is available in GenBank and         matching algorithms in section II. Section III includes
UniProt) there exists a bottleneck in analyzing             the objectives of the experiment. Related work,
biological sequences. Pair wise sequence alignment,         experimental set up and methods, results of the
multiple sequence alignment, global alignment and           experiments, conclusion and future scope are discussed
local alignment are types of string matching techniques

                                                                          www.ijorcs.org
34                                                  S.M. Vidanagamachchi, S.D. Dewasurendra, R.G. Ragel, M.Niranjan

in section IV, section V, section VI, section VII and          Boyer-Moore algorithm [5] is an efficient, single
sections VIII respectively.                                 and exact pattern matching algorithm that preprocesses
                                                            the pattern being searched for, but not the text being
      II. STRING MATCHING ALGORITHMS
         2B
                                                            searched in. It searches for occurrences of pattern P in
                                                            text T by applying shift rules: bad character rule and
    String matching (either exact or approximate)           good suffix rule. Bad character rule deals with the
algorithms can be basically classified into three           amount of shifting along the pattern P to the left side
groups: single pattern matching algorithms (search for      when there is a mismatch character with the text T and
a single pattern within the text), algorithms that uses     good suffix rule deals with the amount of shift when
finite number of patterns (search finite number of          there is a mismatch with a reoccurring suffix of the
patterns within the text) and algorithms that uses          pattern. Shifting rules make this algorithm more
infinite number of patterns (search infinite number of      efficient than other single pattern matching algorithms.
patterns such as regular expressions within the text).
Single pattern matching algorithms include Rabin-              Commentz-Walter        multi    pattern    matching
Karp [21], finite state automation based search [22],       algorithm combines the shifting methods of Boyer
Knuth-Morris-Pratt [23] and Boyer-Moore [5]. Rabin-         Moore with Aho-Corasick and therefore theoretically
Karp algorithm can be used for both single pattern          faster in locating a string than Aho-Corasick. Like
matching and for matching a finite number of patterns.      Aho-Corasick, Commentz-Walter also consists of two
For Rabin-Karp, in a string of length n, if p patterns of   phases: FSM construction and shift calculation phase
combined length m are to be matched, the best and           (preprocessing) and matching phase. Here FSM is
worst case running time are O(n+m) and O(nm)                constructed reverse (reversed trie) (Figure 2) in order
respectively. Finite state automata based search is         to use the shifting methods of Boyer-Moore algorithm.
expensive to construct (power-set construction), but        We have used 3000 sets of peptides to construct the
very easy to use. Knuth-Morris-Pratt (KMP) algorithm        state machine and then matched them against the
turns the search string into a finite state machine, and    known/unknown set of proteins and calculated and
then runs the machine with the string to be searched as     compared the time consumed by both Aho-Corasick
the input string. Running time of KMP algorithm is          and Commentz-Walter algorithms.
O(n + m). Algorithms used for finite number of
patterns (multi patterns) include Aho-Corasick,
Commentz-Water, Rabin-Karp and Wu-Manber and
bit parallel algorithms [2]. They are common multiple
pattern matching algorithms and in this paper we
mainly focus on analyzing the time complexity among
two algorithms: Aho-Corasick [3] and Commentz-
Walter [4].
   Aho-Corasick algorithm is a classic and scalable
solution for exact string matching and is a widely used
multi pattern matching algorithm. It has the running
time of O(n+m+z) where n represents the length of the       Figure 1: Aho-Corasick trie implementation for patterns he,
text, m represents the length of patterns and z                                 she, his and hers
represents the number of pattern occurrences in the
text T (when the search tree is built off-line and
compiled, it has the worst-run-time complexity of O(n           For example if we have four patterns he, hers, his
+ z) in space O(m)). Therefore, the time complexity of      and she, in both algorithms we start FSM creation with
this algorithm is linear to the length of the patterns      state 0(root node). If we add pattern ‘he’ first, in Aho-
plus the length of the searched text plus the number of     Corasick we create a new node (state 1), which
matches in the text. This is the major advantage of this    represents character ‘h’, then we create another node
algorithm. Aho-Corasick algorithm consists of two           (state 2), which represents character ‘e’ (Figure 1).
main phases: Finite State Machine (FSM) construction        Then we add pattern ‘she’. Since there is no node
phase (Figure 1) and matching phase. In the FSM             representing character ‘s’, we add a new node (state 3),
construction phase this algorithm uses the pattern set      then add two more nodes representing character ‘h’
and first constructs the state machine and then             and ‘e’. Likewise we add other two patterns to the state
considers failure links to avoid backtracking to root       machine and finally we create failure links. In
node when there is a failure. In the second phase it        Commentz-Walter we add first pattern ‘he’ after
locates the pattern set within the given text if they       reversing the keyword (Figure 2). We use the same
occur in the string.                                        method in the Commentz-Walter but finally we don’t
                                                            create failure links (instead we use shifting whenever a
                                                            failure occurs).


                                                                           www.ijorcs.org
Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification?                                          35

                                                             Algorithm 1 and 2 are used in the implementations of
                                                             the two algorithms concerned.


                                                             Algorithm 1: Aho-Corasick Pseudocode
                                                                1. for each pattern P=1 to N
                                                                2.   addToTheTrie();
                                                                3. end for
                                                                4. for each Protein R= 1 to M
                                                                5.   searchAhoCorasickMultiple();
                                                                6. end for
    Figure 2: Commentz-Walter trie implementation for
               patterns he, she, his and hers                Algorithm 2: Commentz-Walter Pseudocode
                                                                1. createDepthShiftVector();
                  III. OBJECTIVES
                           23B                                  2. createSubStringShiftVector();
                                                                3. for each pattern P=1 to N
   As mentioned earlier Aho-Corasick is one of the
                                                                4.   reversePattern();
commonly used exact multiple string matching
                                                                5.   addToTheTrie();
algorithm and Commentz-Walter was introduced as a
                                                                6. end for
better alternative with a combination of ideas from
                                                                7. for each Protein R= 1 to M
both Aho-Corasick and Boyer-Moore. The main
                                                                8.   searchCommentzWalterMultiple();
objective of this paper is to compare the time
                                                                9. end for
complexity of the two algorithms with a large set of
peptide data for peptide identification.
                                                                 Both algorithms were implemented in Visual Studio
                IV. RELATED WORK
                     24B
                                                             2008 environment and run in Intel Core i7 2.2 GHz
                                                             machine with 4GB RAM and 32bit OS. In this
   A comparative analysis has been done by Zeeshan
                                                             implementation, Aho Corasick algorithm uses
et al. [8] where they have compared time complexity,
                                                             predefined, different sets (sizes from 500 to 3000) of
search type and approach of five string multi-pattern
                                                             peptides to create the FSM (trie) first. For this FSM we
matching algorithms: Aho-Corasick, Rabin-Karp, Bit
                                                             have used a trie data structure which has several nodes
Parallel,   Commentz-Walter      and    Wu-Manber.
                                                             representing states and each node consists of a failure
However, they have not used real data for their
                                                             node, a character, status of the node (whether a match
comparison. They have mentioned that because of the
                                                             can be found/not) and the depth from the root as
difficulty in understanding the Commentz-Walter
                                                             parameters. Then a set of 100 Drosophila proteins
algorithm, it has not emerged as a popular string
                                                             extracted from UniProt database [7] was used as the
matching algorithm.
                                                             input and this programme performed the multiple
   Experimental results of the Aho-Corasick, Wu-             patterns matching and found all the patterns occurring
Manber, SBOM and Salmela’s algorithms [19] were              within a protein in a single pass. When we have an
presented by Charalampos et al. where they run these         input protein to be processed, first our matching is
algorithms against SWISS PROT amino acid database,           started from the state 0 (root node) and it is matched
amino acid and nucleotides sequences of the A-               against the first amino acid in the protein. If it is
thaliana genome for the sets of size between 100 –           matched we proceed to the next state and else we
100000 patterns with the length of 8 and 32. Result          backtrack to the 0 state since 0 is the state of the
showed that the algorithms are performing differently        failure node. Wherever we find any matching state,
for different databases. According to that experiment,       algorithm indicates the match and proceeds if it is not
Commentz-Walter had the best performance on the              a leaf node. If it is a leaf node it indicates the match
FASTA nucleic acid database for more than 10000              and stops.
patterns. Wu-Manber has performed the best for the
                                                                In the Commentz-Walter implementation, we first
FASTA Amino acid database for more than 10000 to
                                                             created a reversed trie with predefined, different sizes
50000 patterns [14].
                                                             of sets (from 500 to 3000) of peptides to create the
                                                             reversed trie (FSM) where each node consists of state
   V. EXPERIMENTAL SETUP AND METHODS
      25B



                                                             of the node, depth from the root, status of the node
   We modified Komodia Aho-Corasick [6] for our              (whether a match can be found/not), and a character as
purpose and developed our own implementation of              parameters. Then we calculated the amount of shifts
Commentz-Walter algorithm in C++ programming                 for all suffixes of patterns when there is a mismatch of
language as we could not find any implementation in          a suffix (shift vector) and calculated the amount of
the literature. The pseudo-codes presented in                shift when there is a mismatched character (depth


                                                                            www.ijorcs.org
36                                                      S.M. Vidanagamachchi, S.D. Dewasurendra, R.G. Ragel, M.Niranjan

vector). As mentioned earlier the same set of 100                                   VI. RESULTS  26B




proteins were used as the input and our program
                                                                   Results show that the average times of 85, 186, 234,
performed multiple pattern matching. In this algorithm
                                                               327, 359 and 542 milliseconds were needed for 500,
first we find the minimum size of a peptide (Min) and
                                                               1000, 1500, 2000, 2500 and 3000 peptides respectively
then we align Min+1th position of the protein
                                                               (Table 1) in machine construction/ preprocessing
sequence with the root node and then start the
                                                               phase of Aho-Corasick algorithm. In Commentz-
matching process from the root. If a child of the root
                                                               Walter algorithm, preprocessing phase consists of
node gets matched with the amino acid aligned (Min th
                                                               three steps: constructing shift vector, constructing
position) then we could proceed until mismatch or
                                                               depths vector in case of mismatching and constructing
match is found (if a match is found we can go to the
                                                               the finite state machine. To construct the state
root and start, if mismatch is found we have to find the
                                                               machine, this algorithm requires average times of 795,
amount of suffix shift required), else we shift by the
                                                               1156, 1954, 2579, 2474 and 3722 milliseconds (Table
amount that is included in the shift vector.
                                                               2). These times required for state machine construction
    In our experiments we used a maximum of 58                 is higher than the time required for Aho-Corasick
amino acids (a.a), a minimum of 4 a.a and an average           algorithm with the similar set of peptides. However the
of 14 a.a in length in the set of 500 peptides, a              trie constructed in Commentz-Walter algorithm is
maximum of 82 amino acids a.a, a minimum of 4 a.a              different than the trie developed in Aho-Corasick
and an average of 14 a.a in length in the set of 1000          algorithm because it is using reversed keywords. Shift
peptides, a maximum of 103 amino acids a.a, a                  vector construction is the critical step in Commentz-
minimum of 3 a.a and an average of 14 a.a in length in         Walter algorithm since it consumes considerably
the set of 1500 peptides, a maximum of 125 amino               higher amount of time (Table 2 column 3). It includes
acids a.a, a minimum of 3 a.a and an average of 14 a.a         all the suffixes of patterns and amount of shifts when
in length in the set of 2000 peptides, a maximum of 60         there is a mismatch of a suffix. Depth vector consists
amino acids a.a, a minimum of 4 a.a and an average of          of the amount of shift when there is a mismatched
12 a.a in length in the set of 2500 peptides and used a        character (Table 2 column 4). However, if we consider
maximum of 159 a.a, a minimum of 4 a.a and an                  the matching time (Table 1 column 3 and Table 2
average of 14 a.a in length in the set of 3000 peptides.       column 5) it also takes more time in Commentz-Walter
We test with the 100 input proteins which have an              algorithm than Aho-Corasick algorithm.
average length of 531 a.a. We performed the
experiments 10 times each in order to take the average.                         VII. CONCLUSIONS
                                                                                     27B




     Table 1: Time in milliseconds: Aho-Corasick Algorithm     According to the results presented, we could conclude
                                                               that the Aho-Corasick algorithm is performing better
Number       of         Average Time consumed (ms)             in time consumed than Commentz-Walter for large
patterns in the    Machine construction Matching process       sets of peptide data since Commentz-Walter algorithm
FSM                                                            needs more time for preprocessing (to calculate shifts,
500                                  85                  85    depths in case of mismatching and to construct the
1000                                186                 186    state machine).
1500                                234                 234
2000                                327                 327
                                                                               VIII. FUTURE SCOPE
2500                                359                 359
                                                                                           28B




3000                                542                 542       Parallelization of Aho-Corasick algorithm on an
                                                               FPGA has already been performed [9][10][11][2].
Table 2: Time in milliseconds: Commentz-Walter Algorithm       Parallelizing string matching using multi-core
                                                               architecture has been done by several research groups
Number                 Average Time consumed (ms)              [12][13]. As future work, we plan to reduce the
of            Machine            Shift   Depth Matching
                                                               number of states in the trie (Aho-Corasick and
patterns      construction      Vector   vector process
in     the                                                     Commentz-Walter) using optimization algorithms and
FSM                                                            parallelize these two algorithms in a multi-core
500                    795     112326        462        831    architecture to improve their performance.
1000                  1156     526018        147       2059
1500                  1954    1230238        238       4265                      IX. REFERENCES
                                                                                           29B




2000                  2579    2361684        331       5434
                                                               [1] Kal S., “Bioinformatics: Sequence alignment and
2500                  2474    3325337        325      14056
                                                                   Markov models”. McGraw-Hill Professional, 2008.
3000                  3722    6842905        543      15466
                                                               [2] Vidanagamachchi, S. M. Dewasurendra, S. D. Ragel R.
                                                                   G. Niranjan, M. "Tile optimization for area in FPGA
                                                                   based hardware acceleration of peptide identification".
                                                                   Industrial and Information Systems (ICIIS) 2011, 6th


                                                                              www.ijorcs.org
Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification?                                                    37

       IEEE International Conference, 16-19 Aug, pp: 140-                Matching Algorithms for Biological Sequences”.
       145. doi: 10.1109/ICIINFS.2011.6038056                            Bioinformatics'11, 2011, pp. 274-277.
[3]    A.V. AHO, M.J. CORASICK, “Efficient string                 [15]   Dan Gusfield, “Algorithms on Strings, Trees and
       matching: an aid to bibliographic search”.                        Sequences: Computer Science and Computational
       Communications of the ACM, Vol. 18, No. 6, 1979, pp:              Biology”. Cambridge University Press, 1997.
       333-340. doi: 10.1145/360825.360855                        [16]   B. RAJU, S. DYLN, “Exact Multiple Pattern Matching
[4]    Commentz-Walter, B. "A String Matching Algorithm                  Algorithm using DNA Sequence and Pattern Pair”.
       Fast on the Average". Automata, Languages and                     International Journal of Computer Applications, Vol.
       Programming, Springer Berlin / Heidelberg, 1979, pp:              17, No. 8, 2011, pp. 32-38
       118-132                                                    [17]   Fisk, M. and Varghese, G. (2004) “Applying Fast String
[5]    Boyer Robert S., Moore J. STROTHER, "A fast string                Matching to Intrusion Detection”. University of
       searching algorithm". Communications of the ACM,                  California San Diego, 2004
       Vol. 20, No. 10, 1977, pp: 762-772. doi:                   [18]   S. BENFANO, M. ATU, W. NING and W. HAIBO,
       10.1145/359842.359859                                             "High-speed string matching for network intrusion
[6]    Komodia (2012), "Aho-Corasick source code".                       detection". International Journal of Communication
       Available: http://www.komodia.com/aho-corasick                    Networks and Distributed Systems, Vol. 3, No. 4, 2009,
[7]    Uniprot        (2012),     "Uniprot",        Available:           pp. 319-339
       http://www.uniprot.org/                                    [19]   Leena Salmela, "Improved Algorithms for String
[8]    Zeeshan A. KHAN, R.K. PATERIYA, " Multiple                        Searching Problems", Helsinki University of
       Pattern String Matching Methodologies: A Comparative              Technology, 2009
       Analysis". International Journal of Scientific and         [20]   Textolution (2012), “FuzzySearch Library”. Available:
       Research Publications (IJSRP), Vol. 2, No. 7, 2012, pp:           http://www.textolution.com/fuzzysearch.asp
       498-504                                                    [21]   K. RICHARD and R. MICHAEL, "Efficient
[9]    Yoginder S. DANDASS, Shane C. BURGESS, Mark                       randomized pattern-matching algorithms". IBM Journal
       LAWRENCE and Susan M. BRIDGES, “Accelerating                      of Research and Development, Vol. 31, No. 2, 1987, pp.
       String Set Matching in FPGA Hardware for                          249-260
       Bioinformatics Research”. BMC Bioinformatics 2008,         [22]   String searching algorithm, "String searching
       Vol. 9, No. 197, 2008                                             algorithm". Available: http://en.wikipedia.org/wiki
[10]   Jung, H. J. Baker, Z. K and Prasanna, V. K.                       /String_searching_algorithm#Finite_state_automaton_b
       “Performance of FPGA Implementation of Bit-split                  ased_search
       Architecture for Intrusion Detection Systems”.             [23]   D. KNUTH, J. MORRIS and V. PRATT, "Fast pattern
       Proceedings of the 20th international conference on               matching in strings". SIAM Journal on Computing, Vol.
       Parallel and distributed processing, ser. IPDPS’06,               6, No. 2, 1977, pp.323–350. doi: 10.1137/0206024
       2006, Washington, DC, USA: IEEE Computer Society,
       pp. 189–189.
[11]   Liu, Y. Yang, Y. Liu, P. and Tan, J. “A table
       compression method for extended aho-corasick
       automaton”. Proceedings of the 14th International
       Conference on Implementation and Application of
       Automata, ser. CIAA ’09. 2009, Berlin, Heidelberg:
       Springer-Verlag, pp. 84–93. doi: 10.1007/978-3-642-
       02979-0_12
[12]   Soewito, B. and Weng, N. "Methodology for
       Evaluating DNA Pattern Searching Algorithms on
       Multiprocessor". Proceedings of the 7th IEEE
       International Conference on Bioinformatics and
       Bioengineering, 2007, pp. 570 – 577. doi:
       10.1109/BIBE.2007.4375618
[13]   Tumeo, A. Villa, O. Chavarrı´a-Miranda, D.G. "Aho-
       Corasick String Matching on Shared and Distributed-
       Memory Parallel Architectures". IEEE Transactions on
       Parallel and Distributed Systems, 2012, pp. 436-443.
       doi: 10.1109/TPDS.2011.181
[14]   Kouzinopoulos, C. S., Michailidis, P. D. and Margaritis,
       K. G. “Experimental Results on Multiple Pattern


                                                          How to cite
         S.M. Vidanagamachchi, S.D. Dewasurendra, R.G. Ragel, M.Niranjan, " Commentz-Walter: Any Better than
         Aho-Corasick for Peptide Identification?". International Journal of Research in Computer Science, 2 (6): pp.
         33-37, November 2012. doi:10.7815/ijorcs.26.2012.053



                                                                                    www.ijorcs.org

Contenu connexe

Tendances

Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...
Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...
Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...DBOnto
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
A survey of sparse representation algorithms and applications
A survey of sparse representation algorithms and applications A survey of sparse representation algorithms and applications
A survey of sparse representation algorithms and applications redpel dot com
 
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis...
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis...A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis...
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis...ijsrd.com
 
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextUniversity of Bari (Italy)
 
Ontology Matching Based on hypernym, hyponym, holonym, and meronym Sets in Wo...
Ontology Matching Based on hypernym, hyponym, holonym, and meronym Sets in Wo...Ontology Matching Based on hypernym, hyponym, holonym, and meronym Sets in Wo...
Ontology Matching Based on hypernym, hyponym, holonym, and meronym Sets in Wo...dannyijwest
 
International Journal of Computer Science and Security Volume (4) Issue (2)
International Journal of Computer Science and Security Volume (4) Issue (2)International Journal of Computer Science and Security Volume (4) Issue (2)
International Journal of Computer Science and Security Volume (4) Issue (2)CSCJournals
 
Improving Robustness and Flexibility of Concept Taxonomy Learning from Text
Improving Robustness and Flexibility of Concept Taxonomy Learning from Text Improving Robustness and Flexibility of Concept Taxonomy Learning from Text
Improving Robustness and Flexibility of Concept Taxonomy Learning from Text University of Bari (Italy)
 
Sentence Validation by Statistical Language Modeling and Semantic Relations
Sentence Validation by Statistical Language Modeling and Semantic RelationsSentence Validation by Statistical Language Modeling and Semantic Relations
Sentence Validation by Statistical Language Modeling and Semantic RelationsEditor IJCATR
 
A survey on phrase structure learning methods for text classification
A survey on phrase structure learning methods for text classificationA survey on phrase structure learning methods for text classification
A survey on phrase structure learning methods for text classificationijnlc
 
Similarity Features, and their Role in Concept Alignment Learning
Similarity Features, and their Role in Concept Alignment Learning Similarity Features, and their Role in Concept Alignment Learning
Similarity Features, and their Role in Concept Alignment Learning Shenghui Wang
 
Konstantin Vorontsov - BigARTM: Open Source Library for Regularized Multimoda...
Konstantin Vorontsov - BigARTM: Open Source Library for Regularized Multimoda...Konstantin Vorontsov - BigARTM: Open Source Library for Regularized Multimoda...
Konstantin Vorontsov - BigARTM: Open Source Library for Regularized Multimoda...AIST
 
XML Considered Harmful
XML Considered HarmfulXML Considered Harmful
XML Considered HarmfulPrateek Singh
 
A Comparative Result Analysis of Text Based Steganographic Approaches
A Comparative Result Analysis of Text Based Steganographic Approaches A Comparative Result Analysis of Text Based Steganographic Approaches
A Comparative Result Analysis of Text Based Steganographic Approaches iosrjce
 
Simultaneous Estimation of Self-position and Word from Noisy Utterances and S...
Simultaneous Estimation of Self-position and Word from Noisy Utterances and S...Simultaneous Estimation of Self-position and Word from Noisy Utterances and S...
Simultaneous Estimation of Self-position and Word from Noisy Utterances and S...Akira Taniguchi
 

Tendances (18)

Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...
Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...
Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
A survey of sparse representation algorithms and applications
A survey of sparse representation algorithms and applications A survey of sparse representation algorithms and applications
A survey of sparse representation algorithms and applications
 
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis...
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis...A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis...
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis...
 
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
 
B046021319
B046021319B046021319
B046021319
 
Ontology Matching Based on hypernym, hyponym, holonym, and meronym Sets in Wo...
Ontology Matching Based on hypernym, hyponym, holonym, and meronym Sets in Wo...Ontology Matching Based on hypernym, hyponym, holonym, and meronym Sets in Wo...
Ontology Matching Based on hypernym, hyponym, holonym, and meronym Sets in Wo...
 
International Journal of Computer Science and Security Volume (4) Issue (2)
International Journal of Computer Science and Security Volume (4) Issue (2)International Journal of Computer Science and Security Volume (4) Issue (2)
International Journal of Computer Science and Security Volume (4) Issue (2)
 
Improving Robustness and Flexibility of Concept Taxonomy Learning from Text
Improving Robustness and Flexibility of Concept Taxonomy Learning from Text Improving Robustness and Flexibility of Concept Taxonomy Learning from Text
Improving Robustness and Flexibility of Concept Taxonomy Learning from Text
 
I0343047049
I0343047049I0343047049
I0343047049
 
Sentence Validation by Statistical Language Modeling and Semantic Relations
Sentence Validation by Statistical Language Modeling and Semantic RelationsSentence Validation by Statistical Language Modeling and Semantic Relations
Sentence Validation by Statistical Language Modeling and Semantic Relations
 
String Match | Computer Science
String Match | Computer ScienceString Match | Computer Science
String Match | Computer Science
 
A survey on phrase structure learning methods for text classification
A survey on phrase structure learning methods for text classificationA survey on phrase structure learning methods for text classification
A survey on phrase structure learning methods for text classification
 
Similarity Features, and their Role in Concept Alignment Learning
Similarity Features, and their Role in Concept Alignment Learning Similarity Features, and their Role in Concept Alignment Learning
Similarity Features, and their Role in Concept Alignment Learning
 
Konstantin Vorontsov - BigARTM: Open Source Library for Regularized Multimoda...
Konstantin Vorontsov - BigARTM: Open Source Library for Regularized Multimoda...Konstantin Vorontsov - BigARTM: Open Source Library for Regularized Multimoda...
Konstantin Vorontsov - BigARTM: Open Source Library for Regularized Multimoda...
 
XML Considered Harmful
XML Considered HarmfulXML Considered Harmful
XML Considered Harmful
 
A Comparative Result Analysis of Text Based Steganographic Approaches
A Comparative Result Analysis of Text Based Steganographic Approaches A Comparative Result Analysis of Text Based Steganographic Approaches
A Comparative Result Analysis of Text Based Steganographic Approaches
 
Simultaneous Estimation of Self-position and Word from Noisy Utterances and S...
Simultaneous Estimation of Self-position and Word from Noisy Utterances and S...Simultaneous Estimation of Self-position and Word from Noisy Utterances and S...
Simultaneous Estimation of Self-position and Word from Noisy Utterances and S...
 

En vedette

IJORCS, Volume 2 - Issue 6, Call for Papers
IJORCS, Volume 2 - Issue 6, Call for PapersIJORCS, Volume 2 - Issue 6, Call for Papers
IJORCS, Volume 2 - Issue 6, Call for PapersIJORCS
 
IJORCS,Call For Papers- Volume 3, Issue 1,December 2012
IJORCS,Call For Papers- Volume 3, Issue 1,December 2012IJORCS,Call For Papers- Volume 3, Issue 1,December 2012
IJORCS,Call For Papers- Volume 3, Issue 1,December 2012IJORCS
 
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed SystemAn Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed SystemIJORCS
 
An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...
An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...
An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...IJORCS
 
Introductory Approach on Ad-hoc Networks and its Paradigms
Introductory Approach on Ad-hoc Networks and its Paradigms Introductory Approach on Ad-hoc Networks and its Paradigms
Introductory Approach on Ad-hoc Networks and its Paradigms IJORCS
 
Intrusion Detection Techniques In Mobile Networks
Intrusion Detection Techniques In Mobile NetworksIntrusion Detection Techniques In Mobile Networks
Intrusion Detection Techniques In Mobile NetworksIOSR Journals
 
A Robust Cryptographic System using Neighborhood-Generated Keys
A Robust Cryptographic System using Neighborhood-Generated KeysA Robust Cryptographic System using Neighborhood-Generated Keys
A Robust Cryptographic System using Neighborhood-Generated KeysIJORCS
 
Enhancement of DES Algorithm with Multi State Logic
Enhancement of DES Algorithm with Multi State LogicEnhancement of DES Algorithm with Multi State Logic
Enhancement of DES Algorithm with Multi State LogicIJORCS
 
Call for Papers - IJORCS, Volume 4 Issue 4
Call for Papers - IJORCS, Volume 4 Issue 4Call for Papers - IJORCS, Volume 4 Issue 4
Call for Papers - IJORCS, Volume 4 Issue 4IJORCS
 

En vedette (9)

IJORCS, Volume 2 - Issue 6, Call for Papers
IJORCS, Volume 2 - Issue 6, Call for PapersIJORCS, Volume 2 - Issue 6, Call for Papers
IJORCS, Volume 2 - Issue 6, Call for Papers
 
IJORCS,Call For Papers- Volume 3, Issue 1,December 2012
IJORCS,Call For Papers- Volume 3, Issue 1,December 2012IJORCS,Call For Papers- Volume 3, Issue 1,December 2012
IJORCS,Call For Papers- Volume 3, Issue 1,December 2012
 
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed SystemAn Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
 
An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...
An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...
An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...
 
Introductory Approach on Ad-hoc Networks and its Paradigms
Introductory Approach on Ad-hoc Networks and its Paradigms Introductory Approach on Ad-hoc Networks and its Paradigms
Introductory Approach on Ad-hoc Networks and its Paradigms
 
Intrusion Detection Techniques In Mobile Networks
Intrusion Detection Techniques In Mobile NetworksIntrusion Detection Techniques In Mobile Networks
Intrusion Detection Techniques In Mobile Networks
 
A Robust Cryptographic System using Neighborhood-Generated Keys
A Robust Cryptographic System using Neighborhood-Generated KeysA Robust Cryptographic System using Neighborhood-Generated Keys
A Robust Cryptographic System using Neighborhood-Generated Keys
 
Enhancement of DES Algorithm with Multi State Logic
Enhancement of DES Algorithm with Multi State LogicEnhancement of DES Algorithm with Multi State Logic
Enhancement of DES Algorithm with Multi State Logic
 
Call for Papers - IJORCS, Volume 4 Issue 4
Call for Papers - IJORCS, Volume 4 Issue 4Call for Papers - IJORCS, Volume 4 Issue 4
Call for Papers - IJORCS, Volume 4 Issue 4
 

Similaire à Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification?

Combining text and pattern preprocessing in an adaptive dna pattern matcher
Combining text and pattern preprocessing in an adaptive dna pattern matcherCombining text and pattern preprocessing in an adaptive dna pattern matcher
Combining text and pattern preprocessing in an adaptive dna pattern matcherIAEME Publication
 
Genomic repeats detection using Boyer-Moore algorithm on Apache Spark Streaming
Genomic repeats detection using Boyer-Moore algorithm on Apache Spark Streaming Genomic repeats detection using Boyer-Moore algorithm on Apache Spark Streaming
Genomic repeats detection using Boyer-Moore algorithm on Apache Spark Streaming TELKOMNIKA JOURNAL
 
Stock markets and_human_genomics
Stock markets and_human_genomicsStock markets and_human_genomics
Stock markets and_human_genomicsShyam Sarkar
 
A Comparison of Serial and Parallel Substring Matching Algorithms
A Comparison of Serial and Parallel Substring Matching AlgorithmsA Comparison of Serial and Parallel Substring Matching Algorithms
A Comparison of Serial and Parallel Substring Matching Algorithmszexin wan
 
A SURVEY ON SIMILARITY MEASURES IN TEXT MINING
A SURVEY ON SIMILARITY MEASURES IN TEXT MINING A SURVEY ON SIMILARITY MEASURES IN TEXT MINING
A SURVEY ON SIMILARITY MEASURES IN TEXT MINING mlaij
 
A Biological Sequence Compression Based on cross chromosomal similarities usi...
A Biological Sequence Compression Based on cross chromosomal similarities usi...A Biological Sequence Compression Based on cross chromosomal similarities usi...
A Biological Sequence Compression Based on cross chromosomal similarities usi...CSCJournals
 
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...IJCSEA Journal
 
Design of ternary sequence using msaa
Design of ternary sequence using msaaDesign of ternary sequence using msaa
Design of ternary sequence using msaaEditor Jacotech
 
International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...IJCSEIT Journal
 
Algorithm of Dynamic Programming for Paper-Reviewer Assignment Problem
Algorithm of Dynamic Programming for Paper-Reviewer Assignment ProblemAlgorithm of Dynamic Programming for Paper-Reviewer Assignment Problem
Algorithm of Dynamic Programming for Paper-Reviewer Assignment ProblemIRJET Journal
 
Bat Algorithm: Literature Review and Applications
Bat Algorithm: Literature Review and ApplicationsBat Algorithm: Literature Review and Applications
Bat Algorithm: Literature Review and ApplicationsXin-She Yang
 
International Journal of Computer Science and Security Volume (2) Issue (5)
International Journal of Computer Science and Security Volume (2) Issue (5)International Journal of Computer Science and Security Volume (2) Issue (5)
International Journal of Computer Science and Security Volume (2) Issue (5)CSCJournals
 
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSgerogepatton
 
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSgerogepatton
 
Sequence Similarity between Genetic Codes using Improved Longest Common Subse...
Sequence Similarity between Genetic Codes using Improved Longest Common Subse...Sequence Similarity between Genetic Codes using Improved Longest Common Subse...
Sequence Similarity between Genetic Codes using Improved Longest Common Subse...rahulmonikasharma
 

Similaire à Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification? (20)

Ijetcas14 624
Ijetcas14 624Ijetcas14 624
Ijetcas14 624
 
Combining text and pattern preprocessing in an adaptive dna pattern matcher
Combining text and pattern preprocessing in an adaptive dna pattern matcherCombining text and pattern preprocessing in an adaptive dna pattern matcher
Combining text and pattern preprocessing in an adaptive dna pattern matcher
 
50620130101006
5062013010100650620130101006
50620130101006
 
Genomic repeats detection using Boyer-Moore algorithm on Apache Spark Streaming
Genomic repeats detection using Boyer-Moore algorithm on Apache Spark Streaming Genomic repeats detection using Boyer-Moore algorithm on Apache Spark Streaming
Genomic repeats detection using Boyer-Moore algorithm on Apache Spark Streaming
 
Stock markets and_human_genomics
Stock markets and_human_genomicsStock markets and_human_genomics
Stock markets and_human_genomics
 
String Searching and Matching
String Searching and MatchingString Searching and Matching
String Searching and Matching
 
A Comparison of Serial and Parallel Substring Matching Algorithms
A Comparison of Serial and Parallel Substring Matching AlgorithmsA Comparison of Serial and Parallel Substring Matching Algorithms
A Comparison of Serial and Parallel Substring Matching Algorithms
 
A SURVEY ON SIMILARITY MEASURES IN TEXT MINING
A SURVEY ON SIMILARITY MEASURES IN TEXT MINING A SURVEY ON SIMILARITY MEASURES IN TEXT MINING
A SURVEY ON SIMILARITY MEASURES IN TEXT MINING
 
A Biological Sequence Compression Based on cross chromosomal similarities usi...
A Biological Sequence Compression Based on cross chromosomal similarities usi...A Biological Sequence Compression Based on cross chromosomal similarities usi...
A Biological Sequence Compression Based on cross chromosomal similarities usi...
 
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
 
Design of ternary sequence using msaa
Design of ternary sequence using msaaDesign of ternary sequence using msaa
Design of ternary sequence using msaa
 
4 report format
4 report format4 report format
4 report format
 
4 report format
4 report format4 report format
4 report format
 
International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...
 
Algorithm of Dynamic Programming for Paper-Reviewer Assignment Problem
Algorithm of Dynamic Programming for Paper-Reviewer Assignment ProblemAlgorithm of Dynamic Programming for Paper-Reviewer Assignment Problem
Algorithm of Dynamic Programming for Paper-Reviewer Assignment Problem
 
Bat Algorithm: Literature Review and Applications
Bat Algorithm: Literature Review and ApplicationsBat Algorithm: Literature Review and Applications
Bat Algorithm: Literature Review and Applications
 
International Journal of Computer Science and Security Volume (2) Issue (5)
International Journal of Computer Science and Security Volume (2) Issue (5)International Journal of Computer Science and Security Volume (2) Issue (5)
International Journal of Computer Science and Security Volume (2) Issue (5)
 
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
 
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
 
Sequence Similarity between Genetic Codes using Improved Longest Common Subse...
Sequence Similarity between Genetic Codes using Improved Longest Common Subse...Sequence Similarity between Genetic Codes using Improved Longest Common Subse...
Sequence Similarity between Genetic Codes using Improved Longest Common Subse...
 

Plus de IJORCS

Help the Genetic Algorithm to Minimize the Urban Traffic on Intersections
Help the Genetic Algorithm to Minimize the Urban Traffic on IntersectionsHelp the Genetic Algorithm to Minimize the Urban Traffic on Intersections
Help the Genetic Algorithm to Minimize the Urban Traffic on IntersectionsIJORCS
 
Real-Time Multiple License Plate Recognition System
Real-Time Multiple License Plate Recognition SystemReal-Time Multiple License Plate Recognition System
Real-Time Multiple License Plate Recognition SystemIJORCS
 
FPGA Implementation of FIR Filter using Various Algorithms: A Retrospective
FPGA Implementation of FIR Filter using Various Algorithms: A RetrospectiveFPGA Implementation of FIR Filter using Various Algorithms: A Retrospective
FPGA Implementation of FIR Filter using Various Algorithms: A RetrospectiveIJORCS
 
Using Virtualization Technique to Increase Security and Reduce Energy Consump...
Using Virtualization Technique to Increase Security and Reduce Energy Consump...Using Virtualization Technique to Increase Security and Reduce Energy Consump...
Using Virtualization Technique to Increase Security and Reduce Energy Consump...IJORCS
 
Algebraic Fault Attack on the SHA-256 Compression Function
Algebraic Fault Attack on the SHA-256 Compression FunctionAlgebraic Fault Attack on the SHA-256 Compression Function
Algebraic Fault Attack on the SHA-256 Compression FunctionIJORCS
 
Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...
Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...
Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...IJORCS
 
CFP. IJORCS, Volume 4 - Issue2
CFP. IJORCS, Volume 4 - Issue2CFP. IJORCS, Volume 4 - Issue2
CFP. IJORCS, Volume 4 - Issue2IJORCS
 
Call for Papers - IJORCS - Vol 4, Issue 1
Call for Papers - IJORCS - Vol 4, Issue 1Call for Papers - IJORCS - Vol 4, Issue 1
Call for Papers - IJORCS - Vol 4, Issue 1IJORCS
 
Voice Recognition System using Template Matching
Voice Recognition System using Template MatchingVoice Recognition System using Template Matching
Voice Recognition System using Template MatchingIJORCS
 
Channel Aware Mac Protocol for Maximizing Throughput and Fairness
Channel Aware Mac Protocol for Maximizing Throughput and FairnessChannel Aware Mac Protocol for Maximizing Throughput and Fairness
Channel Aware Mac Protocol for Maximizing Throughput and FairnessIJORCS
 
A Review and Analysis on Mobile Application Development Processes using Agile...
A Review and Analysis on Mobile Application Development Processes using Agile...A Review and Analysis on Mobile Application Development Processes using Agile...
A Review and Analysis on Mobile Application Development Processes using Agile...IJORCS
 
Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...
Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...
Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...IJORCS
 
A Study of Routing Techniques in Intermittently Connected MANETs
A Study of Routing Techniques in Intermittently Connected MANETsA Study of Routing Techniques in Intermittently Connected MANETs
A Study of Routing Techniques in Intermittently Connected MANETsIJORCS
 
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...Improving the Efficiency of Spectral Subtraction Method by Combining it with ...
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...IJORCS
 
The Design of Cognitive Social Simulation Framework using Statistical Methodo...
The Design of Cognitive Social Simulation Framework using Statistical Methodo...The Design of Cognitive Social Simulation Framework using Statistical Methodo...
The Design of Cognitive Social Simulation Framework using Statistical Methodo...IJORCS
 
A PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmA PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmIJORCS
 
Call for papers, IJORCS, Volume 3 - Issue 3
Call for papers, IJORCS, Volume 3 - Issue 3Call for papers, IJORCS, Volume 3 - Issue 3
Call for papers, IJORCS, Volume 3 - Issue 3IJORCS
 
Dynamic Map and Diffserv Based AR Selection for Handoff in HMIPv6 Networks
Dynamic Map and Diffserv Based AR Selection for Handoff in HMIPv6 Networks Dynamic Map and Diffserv Based AR Selection for Handoff in HMIPv6 Networks
Dynamic Map and Diffserv Based AR Selection for Handoff in HMIPv6 Networks IJORCS
 
From Physical to Virtual Wireless Sensor Networks using Cloud Computing
From Physical to Virtual Wireless Sensor Networks using Cloud Computing From Physical to Virtual Wireless Sensor Networks using Cloud Computing
From Physical to Virtual Wireless Sensor Networks using Cloud Computing IJORCS
 
Prediction of Atmospheric Pressure at Ground Level using Artificial Neural Ne...
Prediction of Atmospheric Pressure at Ground Level using Artificial Neural Ne...Prediction of Atmospheric Pressure at Ground Level using Artificial Neural Ne...
Prediction of Atmospheric Pressure at Ground Level using Artificial Neural Ne...IJORCS
 

Plus de IJORCS (20)

Help the Genetic Algorithm to Minimize the Urban Traffic on Intersections
Help the Genetic Algorithm to Minimize the Urban Traffic on IntersectionsHelp the Genetic Algorithm to Minimize the Urban Traffic on Intersections
Help the Genetic Algorithm to Minimize the Urban Traffic on Intersections
 
Real-Time Multiple License Plate Recognition System
Real-Time Multiple License Plate Recognition SystemReal-Time Multiple License Plate Recognition System
Real-Time Multiple License Plate Recognition System
 
FPGA Implementation of FIR Filter using Various Algorithms: A Retrospective
FPGA Implementation of FIR Filter using Various Algorithms: A RetrospectiveFPGA Implementation of FIR Filter using Various Algorithms: A Retrospective
FPGA Implementation of FIR Filter using Various Algorithms: A Retrospective
 
Using Virtualization Technique to Increase Security and Reduce Energy Consump...
Using Virtualization Technique to Increase Security and Reduce Energy Consump...Using Virtualization Technique to Increase Security and Reduce Energy Consump...
Using Virtualization Technique to Increase Security and Reduce Energy Consump...
 
Algebraic Fault Attack on the SHA-256 Compression Function
Algebraic Fault Attack on the SHA-256 Compression FunctionAlgebraic Fault Attack on the SHA-256 Compression Function
Algebraic Fault Attack on the SHA-256 Compression Function
 
Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...
Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...
Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...
 
CFP. IJORCS, Volume 4 - Issue2
CFP. IJORCS, Volume 4 - Issue2CFP. IJORCS, Volume 4 - Issue2
CFP. IJORCS, Volume 4 - Issue2
 
Call for Papers - IJORCS - Vol 4, Issue 1
Call for Papers - IJORCS - Vol 4, Issue 1Call for Papers - IJORCS - Vol 4, Issue 1
Call for Papers - IJORCS - Vol 4, Issue 1
 
Voice Recognition System using Template Matching
Voice Recognition System using Template MatchingVoice Recognition System using Template Matching
Voice Recognition System using Template Matching
 
Channel Aware Mac Protocol for Maximizing Throughput and Fairness
Channel Aware Mac Protocol for Maximizing Throughput and FairnessChannel Aware Mac Protocol for Maximizing Throughput and Fairness
Channel Aware Mac Protocol for Maximizing Throughput and Fairness
 
A Review and Analysis on Mobile Application Development Processes using Agile...
A Review and Analysis on Mobile Application Development Processes using Agile...A Review and Analysis on Mobile Application Development Processes using Agile...
A Review and Analysis on Mobile Application Development Processes using Agile...
 
Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...
Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...
Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...
 
A Study of Routing Techniques in Intermittently Connected MANETs
A Study of Routing Techniques in Intermittently Connected MANETsA Study of Routing Techniques in Intermittently Connected MANETs
A Study of Routing Techniques in Intermittently Connected MANETs
 
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...Improving the Efficiency of Spectral Subtraction Method by Combining it with ...
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...
 
The Design of Cognitive Social Simulation Framework using Statistical Methodo...
The Design of Cognitive Social Simulation Framework using Statistical Methodo...The Design of Cognitive Social Simulation Framework using Statistical Methodo...
The Design of Cognitive Social Simulation Framework using Statistical Methodo...
 
A PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmA PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering Algorithm
 
Call for papers, IJORCS, Volume 3 - Issue 3
Call for papers, IJORCS, Volume 3 - Issue 3Call for papers, IJORCS, Volume 3 - Issue 3
Call for papers, IJORCS, Volume 3 - Issue 3
 
Dynamic Map and Diffserv Based AR Selection for Handoff in HMIPv6 Networks
Dynamic Map and Diffserv Based AR Selection for Handoff in HMIPv6 Networks Dynamic Map and Diffserv Based AR Selection for Handoff in HMIPv6 Networks
Dynamic Map and Diffserv Based AR Selection for Handoff in HMIPv6 Networks
 
From Physical to Virtual Wireless Sensor Networks using Cloud Computing
From Physical to Virtual Wireless Sensor Networks using Cloud Computing From Physical to Virtual Wireless Sensor Networks using Cloud Computing
From Physical to Virtual Wireless Sensor Networks using Cloud Computing
 
Prediction of Atmospheric Pressure at Ground Level using Artificial Neural Ne...
Prediction of Atmospheric Pressure at Ground Level using Artificial Neural Ne...Prediction of Atmospheric Pressure at Ground Level using Artificial Neural Ne...
Prediction of Atmospheric Pressure at Ground Level using Artificial Neural Ne...
 

Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification?

  • 1. International Journal of Research in Computer Science eISSN 2249-8265 Volume 2 Issue 6 (2012) pp. 33-37 www.ijorcs.org, A Unit of White Globe Publications doi: 10.7815/ijorcs. 26.2012.053 COMMENTZ-WALTER: ANY BETTER THAN AHO- CORASICK FOR PEPTIDE IDENTIFICATION? S.M. Vidanagamachchi1, S.D. Dewasurendra 2, R.G. Ragel 2, M.Niranjan3 1 Department of Computer Science, Faculty of Science, University of Ruhuna, Matara, SRI LANKA Email: smv@dcs.ruh.ac.lk 2 Department of Computer Engineering, Faculty of Engineering, University of Peradeniya, SRI LANKA Email: devapriyad@pdn.ac.lk, roshanr@pdn.ac.lk 3 Department of Electronics and Computer Science, Faculty of Physical and Applied Sciences, University of Southampton, UNITED KINGDOM Email: mn@ecs.soton.ac.uk Abstract: An algorithm for locating all occurrences of used in computational biology for matching biological a finite number of keywords in an arbitrary string, also sequences [1]. Multiple Sequence Alignment (MSA) is known as multiple strings matching, is commonly an alignment of three or more sequences that contain required in information retrieval (such as sequence thousands of residues (nucleotides or amino acids). analysis, evolutionary biological studies, gene/protein Multiple pattern matching plays a vital role in identification and network intrusion detection) and text identifying thousands of patterns simultaneously editing applications. Although Aho-Corasick was one within a given text. The results of MSA can be used to of the commonly used exact multiple strings matching infer sequence homology as well as conduct algorithm, Commentz-Walter has been introduced as a phylogenetic analysis and multiple sequence matching better alternative in the recent past. Comments-Walter is used for identifying proteins/ DNA/ Peptides/ algorithm combines ideas from both Aho-Corasick and motifs/ domains, managing security (for example, Boyer Moore. Large scale rapid and accurate peptide intrusion detection), editing texts, etc. In multiple identification is critical in computational proteomics. sequence alignment we use a database of large sizes of In this paper, we have critically analyzed the time sequence data over a query sequence and it can be complexity of Aho-Corasick and Commentz-Walter for identified as one form of multiple string/pattern their suitability in large scale peptide identification. matching. In this paper, we have used a set of According to the results we obtained for our dataset, predefined peptides and located them over a given we conclude that Aho-Corasick is performing better arbitrary protein (the protein can be known/ unknown/ than Commentz-Walter as opposed to the common predicted). beliefs. Keywords: Aho-Corasick, Commentz-Walter, Peptide Exact (as opposed to approximate) string matching Identification is commonly required in many fields, as it provides practical solutions to the real world problems. For example, exact string matching can be used for I. INTRODUCTION sequence analysis, gene finding, evolutionary biology 21B String matching has been extensively studied in the studies and protein expression analysis in the fields of past three decades and is a classical problem genetics and computational biology [8][15][16]. Other fundamental to many applications that need processing areas are network intrusion detection [17][18][19] and of either text data or some sequence data. Many string text processing [20]. matching algorithms are used in locating one or several string patterns that are found within a larger In this paper we basically compare time text. For a given text T and a pattern set P, string complexities for two multiple pattern matching matching can be defined as finding all occurrences of algorithms: Aho- Corasick and Commentz-Walter for pattern P in text T. the identification of large scale protein data. With the remarkable increase of data in biological We start by giving brief introduction of string databases (such as what is available in GenBank and matching algorithms in section II. Section III includes UniProt) there exists a bottleneck in analyzing the objectives of the experiment. Related work, biological sequences. Pair wise sequence alignment, experimental set up and methods, results of the multiple sequence alignment, global alignment and experiments, conclusion and future scope are discussed local alignment are types of string matching techniques www.ijorcs.org
  • 2. 34 S.M. Vidanagamachchi, S.D. Dewasurendra, R.G. Ragel, M.Niranjan in section IV, section V, section VI, section VII and Boyer-Moore algorithm [5] is an efficient, single sections VIII respectively. and exact pattern matching algorithm that preprocesses the pattern being searched for, but not the text being II. STRING MATCHING ALGORITHMS 2B searched in. It searches for occurrences of pattern P in text T by applying shift rules: bad character rule and String matching (either exact or approximate) good suffix rule. Bad character rule deals with the algorithms can be basically classified into three amount of shifting along the pattern P to the left side groups: single pattern matching algorithms (search for when there is a mismatch character with the text T and a single pattern within the text), algorithms that uses good suffix rule deals with the amount of shift when finite number of patterns (search finite number of there is a mismatch with a reoccurring suffix of the patterns within the text) and algorithms that uses pattern. Shifting rules make this algorithm more infinite number of patterns (search infinite number of efficient than other single pattern matching algorithms. patterns such as regular expressions within the text). Single pattern matching algorithms include Rabin- Commentz-Walter multi pattern matching Karp [21], finite state automation based search [22], algorithm combines the shifting methods of Boyer Knuth-Morris-Pratt [23] and Boyer-Moore [5]. Rabin- Moore with Aho-Corasick and therefore theoretically Karp algorithm can be used for both single pattern faster in locating a string than Aho-Corasick. Like matching and for matching a finite number of patterns. Aho-Corasick, Commentz-Walter also consists of two For Rabin-Karp, in a string of length n, if p patterns of phases: FSM construction and shift calculation phase combined length m are to be matched, the best and (preprocessing) and matching phase. Here FSM is worst case running time are O(n+m) and O(nm) constructed reverse (reversed trie) (Figure 2) in order respectively. Finite state automata based search is to use the shifting methods of Boyer-Moore algorithm. expensive to construct (power-set construction), but We have used 3000 sets of peptides to construct the very easy to use. Knuth-Morris-Pratt (KMP) algorithm state machine and then matched them against the turns the search string into a finite state machine, and known/unknown set of proteins and calculated and then runs the machine with the string to be searched as compared the time consumed by both Aho-Corasick the input string. Running time of KMP algorithm is and Commentz-Walter algorithms. O(n + m). Algorithms used for finite number of patterns (multi patterns) include Aho-Corasick, Commentz-Water, Rabin-Karp and Wu-Manber and bit parallel algorithms [2]. They are common multiple pattern matching algorithms and in this paper we mainly focus on analyzing the time complexity among two algorithms: Aho-Corasick [3] and Commentz- Walter [4]. Aho-Corasick algorithm is a classic and scalable solution for exact string matching and is a widely used multi pattern matching algorithm. It has the running time of O(n+m+z) where n represents the length of the Figure 1: Aho-Corasick trie implementation for patterns he, text, m represents the length of patterns and z she, his and hers represents the number of pattern occurrences in the text T (when the search tree is built off-line and compiled, it has the worst-run-time complexity of O(n For example if we have four patterns he, hers, his + z) in space O(m)). Therefore, the time complexity of and she, in both algorithms we start FSM creation with this algorithm is linear to the length of the patterns state 0(root node). If we add pattern ‘he’ first, in Aho- plus the length of the searched text plus the number of Corasick we create a new node (state 1), which matches in the text. This is the major advantage of this represents character ‘h’, then we create another node algorithm. Aho-Corasick algorithm consists of two (state 2), which represents character ‘e’ (Figure 1). main phases: Finite State Machine (FSM) construction Then we add pattern ‘she’. Since there is no node phase (Figure 1) and matching phase. In the FSM representing character ‘s’, we add a new node (state 3), construction phase this algorithm uses the pattern set then add two more nodes representing character ‘h’ and first constructs the state machine and then and ‘e’. Likewise we add other two patterns to the state considers failure links to avoid backtracking to root machine and finally we create failure links. In node when there is a failure. In the second phase it Commentz-Walter we add first pattern ‘he’ after locates the pattern set within the given text if they reversing the keyword (Figure 2). We use the same occur in the string. method in the Commentz-Walter but finally we don’t create failure links (instead we use shifting whenever a failure occurs). www.ijorcs.org
  • 3. Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification? 35 Algorithm 1 and 2 are used in the implementations of the two algorithms concerned. Algorithm 1: Aho-Corasick Pseudocode 1. for each pattern P=1 to N 2. addToTheTrie(); 3. end for 4. for each Protein R= 1 to M 5. searchAhoCorasickMultiple(); 6. end for Figure 2: Commentz-Walter trie implementation for patterns he, she, his and hers Algorithm 2: Commentz-Walter Pseudocode 1. createDepthShiftVector(); III. OBJECTIVES 23B 2. createSubStringShiftVector(); 3. for each pattern P=1 to N As mentioned earlier Aho-Corasick is one of the 4. reversePattern(); commonly used exact multiple string matching 5. addToTheTrie(); algorithm and Commentz-Walter was introduced as a 6. end for better alternative with a combination of ideas from 7. for each Protein R= 1 to M both Aho-Corasick and Boyer-Moore. The main 8. searchCommentzWalterMultiple(); objective of this paper is to compare the time 9. end for complexity of the two algorithms with a large set of peptide data for peptide identification. Both algorithms were implemented in Visual Studio IV. RELATED WORK 24B 2008 environment and run in Intel Core i7 2.2 GHz machine with 4GB RAM and 32bit OS. In this A comparative analysis has been done by Zeeshan implementation, Aho Corasick algorithm uses et al. [8] where they have compared time complexity, predefined, different sets (sizes from 500 to 3000) of search type and approach of five string multi-pattern peptides to create the FSM (trie) first. For this FSM we matching algorithms: Aho-Corasick, Rabin-Karp, Bit have used a trie data structure which has several nodes Parallel, Commentz-Walter and Wu-Manber. representing states and each node consists of a failure However, they have not used real data for their node, a character, status of the node (whether a match comparison. They have mentioned that because of the can be found/not) and the depth from the root as difficulty in understanding the Commentz-Walter parameters. Then a set of 100 Drosophila proteins algorithm, it has not emerged as a popular string extracted from UniProt database [7] was used as the matching algorithm. input and this programme performed the multiple Experimental results of the Aho-Corasick, Wu- patterns matching and found all the patterns occurring Manber, SBOM and Salmela’s algorithms [19] were within a protein in a single pass. When we have an presented by Charalampos et al. where they run these input protein to be processed, first our matching is algorithms against SWISS PROT amino acid database, started from the state 0 (root node) and it is matched amino acid and nucleotides sequences of the A- against the first amino acid in the protein. If it is thaliana genome for the sets of size between 100 – matched we proceed to the next state and else we 100000 patterns with the length of 8 and 32. Result backtrack to the 0 state since 0 is the state of the showed that the algorithms are performing differently failure node. Wherever we find any matching state, for different databases. According to that experiment, algorithm indicates the match and proceeds if it is not Commentz-Walter had the best performance on the a leaf node. If it is a leaf node it indicates the match FASTA nucleic acid database for more than 10000 and stops. patterns. Wu-Manber has performed the best for the In the Commentz-Walter implementation, we first FASTA Amino acid database for more than 10000 to created a reversed trie with predefined, different sizes 50000 patterns [14]. of sets (from 500 to 3000) of peptides to create the reversed trie (FSM) where each node consists of state V. EXPERIMENTAL SETUP AND METHODS 25B of the node, depth from the root, status of the node We modified Komodia Aho-Corasick [6] for our (whether a match can be found/not), and a character as purpose and developed our own implementation of parameters. Then we calculated the amount of shifts Commentz-Walter algorithm in C++ programming for all suffixes of patterns when there is a mismatch of language as we could not find any implementation in a suffix (shift vector) and calculated the amount of the literature. The pseudo-codes presented in shift when there is a mismatched character (depth www.ijorcs.org
  • 4. 36 S.M. Vidanagamachchi, S.D. Dewasurendra, R.G. Ragel, M.Niranjan vector). As mentioned earlier the same set of 100 VI. RESULTS 26B proteins were used as the input and our program Results show that the average times of 85, 186, 234, performed multiple pattern matching. In this algorithm 327, 359 and 542 milliseconds were needed for 500, first we find the minimum size of a peptide (Min) and 1000, 1500, 2000, 2500 and 3000 peptides respectively then we align Min+1th position of the protein (Table 1) in machine construction/ preprocessing sequence with the root node and then start the phase of Aho-Corasick algorithm. In Commentz- matching process from the root. If a child of the root Walter algorithm, preprocessing phase consists of node gets matched with the amino acid aligned (Min th three steps: constructing shift vector, constructing position) then we could proceed until mismatch or depths vector in case of mismatching and constructing match is found (if a match is found we can go to the the finite state machine. To construct the state root and start, if mismatch is found we have to find the machine, this algorithm requires average times of 795, amount of suffix shift required), else we shift by the 1156, 1954, 2579, 2474 and 3722 milliseconds (Table amount that is included in the shift vector. 2). These times required for state machine construction In our experiments we used a maximum of 58 is higher than the time required for Aho-Corasick amino acids (a.a), a minimum of 4 a.a and an average algorithm with the similar set of peptides. However the of 14 a.a in length in the set of 500 peptides, a trie constructed in Commentz-Walter algorithm is maximum of 82 amino acids a.a, a minimum of 4 a.a different than the trie developed in Aho-Corasick and an average of 14 a.a in length in the set of 1000 algorithm because it is using reversed keywords. Shift peptides, a maximum of 103 amino acids a.a, a vector construction is the critical step in Commentz- minimum of 3 a.a and an average of 14 a.a in length in Walter algorithm since it consumes considerably the set of 1500 peptides, a maximum of 125 amino higher amount of time (Table 2 column 3). It includes acids a.a, a minimum of 3 a.a and an average of 14 a.a all the suffixes of patterns and amount of shifts when in length in the set of 2000 peptides, a maximum of 60 there is a mismatch of a suffix. Depth vector consists amino acids a.a, a minimum of 4 a.a and an average of of the amount of shift when there is a mismatched 12 a.a in length in the set of 2500 peptides and used a character (Table 2 column 4). However, if we consider maximum of 159 a.a, a minimum of 4 a.a and an the matching time (Table 1 column 3 and Table 2 average of 14 a.a in length in the set of 3000 peptides. column 5) it also takes more time in Commentz-Walter We test with the 100 input proteins which have an algorithm than Aho-Corasick algorithm. average length of 531 a.a. We performed the experiments 10 times each in order to take the average. VII. CONCLUSIONS 27B Table 1: Time in milliseconds: Aho-Corasick Algorithm According to the results presented, we could conclude that the Aho-Corasick algorithm is performing better Number of Average Time consumed (ms) in time consumed than Commentz-Walter for large patterns in the Machine construction Matching process sets of peptide data since Commentz-Walter algorithm FSM needs more time for preprocessing (to calculate shifts, 500 85 85 depths in case of mismatching and to construct the 1000 186 186 state machine). 1500 234 234 2000 327 327 VIII. FUTURE SCOPE 2500 359 359 28B 3000 542 542 Parallelization of Aho-Corasick algorithm on an FPGA has already been performed [9][10][11][2]. Table 2: Time in milliseconds: Commentz-Walter Algorithm Parallelizing string matching using multi-core architecture has been done by several research groups Number Average Time consumed (ms) [12][13]. As future work, we plan to reduce the of Machine Shift Depth Matching number of states in the trie (Aho-Corasick and patterns construction Vector vector process in the Commentz-Walter) using optimization algorithms and FSM parallelize these two algorithms in a multi-core 500 795 112326 462 831 architecture to improve their performance. 1000 1156 526018 147 2059 1500 1954 1230238 238 4265 IX. REFERENCES 29B 2000 2579 2361684 331 5434 [1] Kal S., “Bioinformatics: Sequence alignment and 2500 2474 3325337 325 14056 Markov models”. McGraw-Hill Professional, 2008. 3000 3722 6842905 543 15466 [2] Vidanagamachchi, S. M. Dewasurendra, S. D. Ragel R. G. Niranjan, M. "Tile optimization for area in FPGA based hardware acceleration of peptide identification". Industrial and Information Systems (ICIIS) 2011, 6th www.ijorcs.org
  • 5. Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification? 37 IEEE International Conference, 16-19 Aug, pp: 140- Matching Algorithms for Biological Sequences”. 145. doi: 10.1109/ICIINFS.2011.6038056 Bioinformatics'11, 2011, pp. 274-277. [3] A.V. AHO, M.J. CORASICK, “Efficient string [15] Dan Gusfield, “Algorithms on Strings, Trees and matching: an aid to bibliographic search”. Sequences: Computer Science and Computational Communications of the ACM, Vol. 18, No. 6, 1979, pp: Biology”. Cambridge University Press, 1997. 333-340. doi: 10.1145/360825.360855 [16] B. RAJU, S. DYLN, “Exact Multiple Pattern Matching [4] Commentz-Walter, B. "A String Matching Algorithm Algorithm using DNA Sequence and Pattern Pair”. Fast on the Average". Automata, Languages and International Journal of Computer Applications, Vol. Programming, Springer Berlin / Heidelberg, 1979, pp: 17, No. 8, 2011, pp. 32-38 118-132 [17] Fisk, M. and Varghese, G. (2004) “Applying Fast String [5] Boyer Robert S., Moore J. STROTHER, "A fast string Matching to Intrusion Detection”. University of searching algorithm". Communications of the ACM, California San Diego, 2004 Vol. 20, No. 10, 1977, pp: 762-772. doi: [18] S. BENFANO, M. ATU, W. NING and W. HAIBO, 10.1145/359842.359859 "High-speed string matching for network intrusion [6] Komodia (2012), "Aho-Corasick source code". detection". International Journal of Communication Available: http://www.komodia.com/aho-corasick Networks and Distributed Systems, Vol. 3, No. 4, 2009, [7] Uniprot (2012), "Uniprot", Available: pp. 319-339 http://www.uniprot.org/ [19] Leena Salmela, "Improved Algorithms for String [8] Zeeshan A. KHAN, R.K. PATERIYA, " Multiple Searching Problems", Helsinki University of Pattern String Matching Methodologies: A Comparative Technology, 2009 Analysis". International Journal of Scientific and [20] Textolution (2012), “FuzzySearch Library”. Available: Research Publications (IJSRP), Vol. 2, No. 7, 2012, pp: http://www.textolution.com/fuzzysearch.asp 498-504 [21] K. RICHARD and R. MICHAEL, "Efficient [9] Yoginder S. DANDASS, Shane C. BURGESS, Mark randomized pattern-matching algorithms". IBM Journal LAWRENCE and Susan M. BRIDGES, “Accelerating of Research and Development, Vol. 31, No. 2, 1987, pp. String Set Matching in FPGA Hardware for 249-260 Bioinformatics Research”. BMC Bioinformatics 2008, [22] String searching algorithm, "String searching Vol. 9, No. 197, 2008 algorithm". Available: http://en.wikipedia.org/wiki [10] Jung, H. J. Baker, Z. K and Prasanna, V. K. /String_searching_algorithm#Finite_state_automaton_b “Performance of FPGA Implementation of Bit-split ased_search Architecture for Intrusion Detection Systems”. [23] D. KNUTH, J. MORRIS and V. PRATT, "Fast pattern Proceedings of the 20th international conference on matching in strings". SIAM Journal on Computing, Vol. Parallel and distributed processing, ser. IPDPS’06, 6, No. 2, 1977, pp.323–350. doi: 10.1137/0206024 2006, Washington, DC, USA: IEEE Computer Society, pp. 189–189. [11] Liu, Y. Yang, Y. Liu, P. and Tan, J. “A table compression method for extended aho-corasick automaton”. Proceedings of the 14th International Conference on Implementation and Application of Automata, ser. CIAA ’09. 2009, Berlin, Heidelberg: Springer-Verlag, pp. 84–93. doi: 10.1007/978-3-642- 02979-0_12 [12] Soewito, B. and Weng, N. "Methodology for Evaluating DNA Pattern Searching Algorithms on Multiprocessor". Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering, 2007, pp. 570 – 577. doi: 10.1109/BIBE.2007.4375618 [13] Tumeo, A. Villa, O. Chavarrı´a-Miranda, D.G. "Aho- Corasick String Matching on Shared and Distributed- Memory Parallel Architectures". IEEE Transactions on Parallel and Distributed Systems, 2012, pp. 436-443. doi: 10.1109/TPDS.2011.181 [14] Kouzinopoulos, C. S., Michailidis, P. D. and Margaritis, K. G. “Experimental Results on Multiple Pattern How to cite S.M. Vidanagamachchi, S.D. Dewasurendra, R.G. Ragel, M.Niranjan, " Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification?". International Journal of Research in Computer Science, 2 (6): pp. 33-37, November 2012. doi:10.7815/ijorcs.26.2012.053 www.ijorcs.org