SlideShare a Scribd company logo
1 of 6
Download to read offline
International Journal of Research in Computer Science
eISSN 2249-8265 Volume 2 Issue 3 (2012) pp. 1-6
© White Globe Publications
www.ijorcs.org


    A COMPARISON OF COMPUTATION TECHNIQUES
         FOR DNA SEQUENCE COMPARISON
                                         Harshita G. Patil1, Manish Narnaware2
                 *M-Tech (CSE) G.H.Raisoni College of Engineering, Hingana Road, Nagpur, India
                                         1
                                          Email: patil.harshita27@rediffmail.com,
                                          2
                                          Email: manish.narnaware@yahoo.com

  Abstract: This Project shows a comparison survey           or bases known as adenine, thymine, cytosine and
done on DNA sequence comparison techniques. The              guanine. Since the bases are paired, they are referred to
various techniques implemented are sequential                as base pairs. All the DNA of a living organism is
comparison, multithreading on a single computer and          called its genome. The size of a genome can vary from
multithreading using parallel processing. This Project       millions of base pairs for bacteria to several billions
shows the issues involved in implementing a dynamic          base pairs in the case of mammals.
programming algorithm for biological sequence
                                                                The nearly exponential growth rate of biological
comparison on a general purpose parallel computing
                                                             sequence database threatens to overwhelm existing
platform Tiling is an important technique for
                                                             computational methods for biological data analysis and
extraction of parallelism. Informally, tiling consists of
                                                             searching. The computational demand needed to
partitioning the iteration space into several chunks of
                                                             explore and analyze the data contained in these
computation called tiles (blocks) such that sequential
                                                             databases is quickly becoming a great concern. To
traversal of the tiles covers the entire iteration space.
                                                             meet these demands, we must use high performance
The idea behind tiling is to increase the granularity of
                                                             computing systems, such as parallel computers.
computation and decrease the amount of
                                                             Biological sequences can be treated as strings over a
communication incurred between processors. This
                                                             fixed alphabet of characters, a, c, t and g. An
makes tiling more suitable for distributed memory
                                                             alignment is a way of stacking one sequence above
architectures where communication startup costs are
                                                             the other and matching characters from the two
very high and hence frequent communication is
                                                             sequences that lie in the same position. From the
undesirable. Our work to develop sequence-
                                                             alignment, we can find two subsequences contained
comparison mechanism and software supports the
identification of sequences of DNA.                          respectively in two sequences that have the most
                                                             similarity. The problem of the biological sequence
Keywords: Dynamic Programming                 Algorithms,    alignment is the most faced in the exploring and
FASTA, Sequences Alignment, Tiling.                          analyzing the data, no matter in sequence assembly,
                                                             comparison of homology, finding of gene coding
                 I. INTRODUCTION                             region and prediction of protein’s structure and
                                                             function.
   Comparing DNA sequences is one of the basic
tasks in computational biology. In bioinformatics, a            In this paper we consider the parallelization of this
sequence alignment is a way of arranging the                 implementation, since parallelization of an iterative
sequences of DNA, RNA, or protein to identify                implementation of the algorithm would not be feasible.
regions of similarity that may be a consequence of           There has been significant recent work on the
functional, structural, or evolutionary relationships        parallelization of dynamic programming algorithms in
between the sequences. Aligned sequences of                  computational biology including implementations
nucleotide or amino acid residues are typically              suitable for computational grids. What distinguishes
represented as rows within a matrix. Gaps are inserted       this work is the data-driven recursive implementation,
between the residues so that identical or similar            with resulting dynamically allocated tasks. The rest of
characters are aligned in successive columns.                this paper is organized as follows. In section 2, we
                                                             provide Needleman-Wunsch and Smith Waterman
    DNA (deoxyribonucleic acid) is the chemical              algorithm. Section 3 provides parallel processing
material in a cell that carries the genetic codes for        techniques. The parallel computation technique using
living organisms. Its structure is a double helix            multi-core architecture for DNA sequence comparison
consisting of two sequences of letters from a four-          is shown in section 4, and conclusions about the
letter alphabet (A, T, C, G), such that A is paired with     completed work are discussed in section 5.
T, and C with G. The letters represent the nucleotides


                                                                               www.ijorcs.org
2                                                                                    Harshita G. Patil, Manish Narnaware

      II. NEEDLEMAN-WUNSCH AND SMITH
             WATERMAN ALGORITHM
                                                               Input Sequence File1                             Final Score
   There are several methods for alignment of two                                           Sequential
biological sequences. The dynamic programming is               Input Sequence File2         Processing
probably the most popular programming method in
sequences alignment. The Needleman-Wunsch
                                                                              Figure 1: Sequential Processing
algorithm, the first algorithm applying the dynamic
programming to comparing biological sequences, was
proposed by Needleman and Wunsch in 1970. Later,              A. Sequence comparison using similarity matrix
Smith and Waterman improved the Needleman-                        This consists of two parts: the calculation of the
Wunsch algorithm and proposed the well-known                  total score indicating the similarity between the two
Smith-Waterman algorithm. The time complexity of              given sequences, and the identification of the
these algorithms is O(mn), where m, n are the lengths         alignment(s) that lead to the score. In this paper we
of the two sequences respectively. Because the cores          will concentrate on the calculation of the score, since
of these algorithms are dynamic programming, all              this is the most computationally expensive part. The
algorithms need to manipulate an (n+1) (m+1) matrix,          idea behind using dynamic programming is to build up
named dynamic programming matrix. The most time               the solution by using previous solutions for smaller
spent in these algorithms is calculating the dynamic          subsequences. The comparison of the two sequences X
programming matrix, so research work on                       and Y, using the dynamic programming mechanism, is
parallelization of two sequences alignment focuses            illustrated in Figure 2. This finds global alignments by
mostly on the calculation of the matrix. As the growth        comparing entire sequences. The sequences are placed
of biological sequence database, the length of                along the left margin (X) and on the top (Y). A
sequences often becomes very long, and the size of the        similarity matrix is initialized with decreasing values
matrix becomes very large. Thus, not only the                 (0,-1,-2,-3,…) along the first row and first column to
execution time of these algorithms needs to be very           penalize for consecutive gaps (insertions or deletions).
long, the memory space needed in the algorithm
becomes very large. Even in some cases the size of the           The other elements of the matrix are calculated by
matrix is bigger than the size of memory space in one         finding the maximum value among the following three
processor.                                                    values: the left element plus gap penalty, the upper-left
                                                              element plus the score of substituting the horizontal
    Two input sequence files are compared sequentially and
final score is computed.
                                                              symbol for the vertical symbol, and the upper element
                                                              plus the gap penalty.




                                                Figure 2: Similarity Matrix
     For the general case where X = x1,…, xi and Y =                  In our example, gp is -1, and ss is 1 if the
 y1,…, yj, for i = 1,.., n and j = 1,…,m, the similarity       elements match and 0 otherwise. However, other
 matrix SM[n;m] is built by applying the following             general values can be used instead. Following this
 recurrence equation, where gp is the gap penalty and ss       recurrence equation, the matrix is filled from top left to
 is the substitution score:                                    bottom right with entry [i; j] requiring the entries [i, j -
                                                               1], [i – 1, j -1], and [i-1, j]. Notice that SM[i; j]
                                                               corresponds to the best score of the subsequences
                                                        (1)
                                                               x1,…, xi and y1,…, yj. Since global alignment takes


                                                                                 www.ijorcs.org
A Comparison of Computation Techniques for DNA Sequence Comparison                                                                 3

into account the entire sequences, the final score will      all processors busy through most of the computation.
always be found in the bottom right hand corner of the       For example, processor 1 initially works with the first
matrix. In our example, the final score 4 gives us a         strip, then simultaneously with the first and fifth strip,
measure of how similar the two sequences is. Figure 2        then finally only with the fifth strip. The processor
shows the similarity matrix and the two possible             utilization rises to 75%.
alignments (arrows going up and left).

            III. PARALLEL COMPUTATION

   A parallel version of the sequence comparison
algorithm using dynamic programming must handle
the data dependences presented by this method, yet it
should perform as many operations as possible
independently. This may present a serious challenge
for efficient parallel execution on current general
purpose parallel computers, i.e., MIMD (Multiple                      Figure 4: Partition of the similarity matrix
Instruction stream, Multiple Data stream computers).

                                                              IV. PARALLEL COMPUTATION TECHNIQUE
                            Multi-                                 USING MULTI-CORE ARCHITECTURE
Input Sequence File1                     Final Score
                           Threading                           The input DNA sequences are collected in FASTA
Input Sequence File2          on
                                                             format. The FASTA files are converted to sequential
                            Single
                                                             sequence file using the convertor. The converted files
Tile Size                  Computer     Time T1              are then compared on the distributed environment to
                                                             compute the final score. This process is shown in
   Figure 3: Multi -Threading on Single Computer             Figure 5.
   Input is two sequence files and tile size. Two input                                  Convertor
sequence files are compared. The final score is
computed using the concept of multithreading on a
                                                                 Sequence file
single computer.                                                                                        Sequential sequence file

    One solution to this problem is to divide the
similarity matrix into rectangular blocks, as shown in
                                                                    >Seq1                                          AGCTCC
Figure 4(a). In this example, the program would                                                                    CCTAAT
compute block 1 first, followed by 2 and 5, etc. If each            AGCTCC                                         AGGGCT
block has q rows and r columns, then the computation                CCTAAT                                         TTTGCC
of a given block requires only the row segment                      AGGGCT
                                                                    TTTGCC
immediately above the block, the column segment to
its immediate left, and the element above and to the
left … a total of q +r +1 elements. For instance, if each     Final Score           Parallel Sequence Comparison
block has 4 rows and 4 columns, then each block has
                                                                      Output
to compute 16 maxima after receiving 9 input values.
                                                                                                              Two Input Files
The communication-to-computation ratio drops from
3:1 to 9:16, an 81% reduction!
                                                                 Figure 5: Parallel Computation Technique for DNA
   Note that this blocking will decrease the maximum                           Sequence Comparison
achievable parallelism somewhat, by introducing some
sequential dependence in the code. However, given the        A. FASTA format
sizes of the current problems and the parallel machines         In bioinformatics, FASTA format is a text-based
currently used, this potential loss will not be a limiting   format for representing either nucleotide sequences or
factor.                                                      peptide sequences, in which base pairs or amino acids
   The load-balancing problem can be addressed by            are represented using single-letter codes. The format
putting several rows of blocks (or “strips”) on the same     also allows for sequence names and comments to
processor. Figure 4(b) illustrates this approach when        precede the sequences. The format originates from the
four processors are used. The first and fifth strips are     FASTA software package.
assigned to processor 1, the second and sixth strips are
assigned to processor 2 and so on. This helps to keep


                                                                                 www.ijorcs.org
4                                                                                                      Harshita G. Patil, Manish Narnaware

Example of a simple FASTA file                                                       Sequence File
                                                                                     (Sequence Y file)
                                                                                     3. Input file size i.e. tile
> seq1 This is the description of my first sequence.                                 width and tile height.
AGTACGTAGTAGCTGCTGCTACGTGCGCTAGCT                                                    4. Calculate the number
AGTACGACGTAGATGCTAGCTGACTCGATGC                                                      of tiles required to
> seq2 This is a description of my second sequence.                                  calculate the final
CGATCGATCGTACGTCGACTGATCGTAGCTAC                                                     score.
                                                                                     5. Calculate the number
GTCGTCATCGTCAGTTACTGCATGCTCG                                                         of diagonal rows
                                                                                     required to calculate.
B. Parallel Algorithm Models                                                         6. Read sequence X
                                                                                     data for tile(0,0) and
                                                                                     send request to slave 1.
                                                                                                                    1. Calculate the matrix
                                      Parallel                                                                      of required tile in
Input Sequence File1                                     Final Score
                                    Processing                                                                      different thread.
                                       Using                                         7. Repeat step 6 for n
Input Sequence File2                Multi-core                                       number of slave and
 Tile Size                                               Time T2                     wait for the tile
                                    Architecture
                                                                                     response.
                                                                                                                    2. Return the file to the
                                                                                                                    master.
Figure 6: Parallel processing using multi-core architecture                          8. Accept tile response
       Complete project will take the input as two                                   from the slave.
                                                                                     9. Collect all tiles of
sequence file and tile size from the user and system                                 each diagonal row.
will calculate the final score by comparing sequences                                10. Repeat step 6 to 9
of given file by using multiple machine connected in                                 for next diagonal row.
network.
                                                                               Figure 8: Interaction between Master and Slave
      Master-Slave Model: One or more processes
generate work and allocate it to worker processes                              C. Similarity Matrix
shown in Figure 7.                                                               The similarity matrix used for sequential
                                                                               comparison is used in this approach also for DNA
                                    *Master/Slave              *Master/Slave
              *Master/Slave                                                    comparison but it is performed parallel on different
                                        Application             Application
               Application                                                     machines using tiling technique.
                                                                               D. Tiling Technique
                                                                                  Tiling is an important technique for extraction of
                              Tile                Tile
                                                                               parallelism. Informally, tiling consists of partitioning
   Sequence                   Request
   File                                           Response                     the iteration space into several chunks of computation
                                                                               called tiles (blocks) such that sequential traversal of
                                                                               the tiles covers the entire iteration space. The idea
      *Master/Slave            *Master/Slave                 *Master/Slave
                                                                               behind tiling is to increase the granularity of
       Application               Application                  Application      computation and decrease the amount of
                                                                               communication incurred between processors. This
                                               USER                            makes tiling more suitable for distributed memory
       Figure 7: Distributed System used for comparison                        architectures where communication startup costs are
       When user operates the application, the                                 very high and hence frequent communication is
application acts as master to perform the task of                              undesirable.
sequence file comparison and other machines in which                             Tiling is a well-established technique to enhance
the application is installed will act as slave- returns the                    data locality or coarsen the grain of parallelism in loop
result of tile matrix requested by master. Any machine                         programs. The iteration space of the program is
connected in network can act as master or slave. The                           covered with (usually congruent) tiles and the
interaction between master and slave is shown in                               enumeration of the iteration points is changed so as to
Figure 8.                                                                      enumerate the tiles in the outer dimensions (i.e. loops)
                 Master                               Slave
                                                                               and the points within each such tile in the inner
         1. Load Sequential
                                                                               dimensions. The shape and size of the tiles is usually
         Sequence File                                                         chosen dependent on the dependence vectors to
         (Sequence X file)
         2. Load next Sequential



                                                                                                   www.ijorcs.org
A Comparison of Computation Techniques for DNA Sequence Comparison                                                        5

minimize communication startups and the volume of
the data communicated, especially in the context of
distributed-memory architectures. For shared-memory
systems, the number of startups and the volume are
less of a concern, as long as the transfer time of the
data between cores stays small compared to the
computation time for each tile.
    The Sequential Sequence File will be arranged in
similarity matrix and the matrix will be divided into
tiles. One tile will be allocated to one machine for       Figure10 (a): Computation on Single machine using Parallel
computation. The computations will be done                                        Processing
diagonally. Number of tiles coming under one
diagonal line will be allocated to machine in network
depending upon the number of machines available in
the network. When computation of one diagonal row is
complete then only the computation for the next
diagonal line will start. In this way the complete
matrix will be filled and the last cell of the matrix is
the final score which will be equal to the size of
sequential sequence file in case of match and if final
score is half of the size we can say 50% of the DNA is
matched and if final score is negative it indicates
mismatch.                                                        Figure 10 (b): Computation on distributed system using
                                                                                 mullti-core achitecture
    B(0,0)    B(0,1)                B(0,m-1)


                                                                               VI. REFERENCES
                                                           Conferences
    B(1,0)                                     B(1,m-1)
                                                           [1] Sudha Gunturu*, Xiaolin Li*, and Laurence Tianruo
                                                                 Yang** “Load Scheduling Strategies for Parallel DNA
                                                                 Sequencing Applications” 11th IEEE International
                                                                 Conference on High Performance Computing and
                                                                 Communications 2009
                       B(m-1,m-1)                          [2]   Armin Gr¨oßlinger “Some Experiments on Tiling Loop
                                                                 Programs for Shared-Memory Multicore Architectures”
                                                                 Dagstuhl Seminar Proceedings 07361 Programming
               Figure 9: Tiling Technique                        Models for Ubiquitous Parallelism 2008
                                                           [3]   Nasreddine Hireche, J.M. Pierre Langlois and Gabriela
                 V. CONCLUSION                                   Nicolescu Département de Génie Informatique, École
   The aim of this study is to show the difference in            Polytechnique        de       Montréal ‘‘Survey        of
computation to communication ratio using three                   Biological High Performance Computing: Algorithms,
                                                                 Implementations and Outlook Research’’ IEEE
different techniques. This study shows the
                                                                 CCECE/CCGEI, Ottawa, May 2006.
computational power of parallel computers to speed up
the process of comparing sequences. We looked at the       [4]   Friman S´anchez, Esther Salam´ı, Alex Ramirez and
                                                                 Mateo Valero HiPEAC European Network of Excellence
dynamic programming mechanism and presented a                    Universitat Polit`ecnica de Catalunya (UPC), Barcelona,
multithreaded     parallel    implementation.     The            Spain “Parallel Processing in Biological Sequence
implementation uses similarity matrix method but                 Comparison Using General Purpose Processors” 2005
takes advantage of the tiling technique under                    IEEE
multithreading model. The result of implementation of      [5]   Matteo Canella - Filippo Miglioli Universit`a di Ferrara
parallel processing on a single machine is shown in              (Italy) Alessandro Bogliolo Universit`a di Urbino (Italy)
Figure 10 (a) and using parallel processing in multi-            Enrico Petraglio - Eduardo Sanchez Ecole Polytechnique
core multithreading architecture is shown in Figure 10           F´ed´erale     de     Lausanne     EPFL-LSL,Lausanne
(b) which shows computation performed per sec.                   (Switzerland)” Performing DNA Comparison on a Bio-
                                                                 Inspired Tissue of FPGAs” Proceedings of the
                                                                 International Parallel and Distributed Processing
                                                                 Symposium (IPDPS’03) 2003 IEEE.




                                                                                www.ijorcs.org
6                                                              Harshita G. Patil, Manish Narnaware

[6] N. F. Almeida Jr ,C. E. R. Alves, E. N. Caceres, S.
    W.Song ”Comparison of Genomes using High-
    Performance Parallel Computing” Proceedings of the
    15th Symposium on Computer Architecture and High
    Performance Computing (SBAC-PAD’03) 2003 IEEE.
[7] Fa Zhang, Xiang-Zhen Qiao and Zhi-Yong Liu Institute
    of Computing Technology, Chinese Academy of
    Sciences, Beijing 100080, National Natural Science
    Foundation of China, Beijing, 100083”A Parallel Smith-
    Waterman Algorithm Based on Divide and Conquer”
    Proceedings of the Fifth International Conference on
    Algorithms and Architectures for Parallel Processing
    (ICA3PP.02) 2002 IEEE
[8] W.S Martins, J.B Del Cuvillo, F.J.Useche, K.B
    Theobald, G.R.Gao Department of Electrical and
    Computer Engineering University of Delaware, Newark
    DE19716,      USA”     A      Multithreaded    Parallel
    Implementation of a Dynamic Programming Algorithm
    for Sequence Comparison” Pacific Symposium on
    Biocomputing 6:311-322 (2001)
[9] Subhra Sundar Bandyopadhyay, Somnath Paul and Amit
    Konar Electronics and Telecommunication Department
    Jadavpur University,       Kolkata, India “Improved
    Algorithms for DNA Sequence Alignment and Revision
    of Scoring




                                                              www.ijorcs.org

More Related Content

What's hot

Convolutional Neural Network and Feature Transformation for Distant Speech Re...
Convolutional Neural Network and Feature Transformation for Distant Speech Re...Convolutional Neural Network and Feature Transformation for Distant Speech Re...
Convolutional Neural Network and Feature Transformation for Distant Speech Re...IJECEIAES
 
Data clustering using kernel based
Data clustering using kernel basedData clustering using kernel based
Data clustering using kernel basedIJITCA Journal
 
Ba2419551957
Ba2419551957Ba2419551957
Ba2419551957IJMER
 
ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
 ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATANexgen Technology
 
K means clustering in the cloud - a mahout test
K means clustering in the cloud - a mahout testK means clustering in the cloud - a mahout test
K means clustering in the cloud - a mahout testJoão Gabriel Lima
 
Packet Classification using Support Vector Machines with String Kernels
Packet Classification using Support Vector Machines with String KernelsPacket Classification using Support Vector Machines with String Kernels
Packet Classification using Support Vector Machines with String KernelsIJERA Editor
 
Ensemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringEnsemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringIJERD Editor
 
Dimensionality Reduction Techniques for Document Clustering- A Survey
Dimensionality Reduction Techniques for Document Clustering- A SurveyDimensionality Reduction Techniques for Document Clustering- A Survey
Dimensionality Reduction Techniques for Document Clustering- A SurveyIJTET Journal
 
International Journal of Computer Science and Security Volume (2) Issue (5)
International Journal of Computer Science and Security Volume (2) Issue (5)International Journal of Computer Science and Security Volume (2) Issue (5)
International Journal of Computer Science and Security Volume (2) Issue (5)CSCJournals
 
Comparison Between Levenberg-Marquardt And Scaled Conjugate Gradient Training...
Comparison Between Levenberg-Marquardt And Scaled Conjugate Gradient Training...Comparison Between Levenberg-Marquardt And Scaled Conjugate Gradient Training...
Comparison Between Levenberg-Marquardt And Scaled Conjugate Gradient Training...CSCJournals
 
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...IRJET Journal
 
A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...
A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...
A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...ijcsit
 
A survey research summary on neural networks
A survey research summary on neural networksA survey research summary on neural networks
A survey research summary on neural networkseSAT Publishing House
 
Data reduction techniques for high dimensional biological data
Data reduction techniques for high dimensional biological dataData reduction techniques for high dimensional biological data
Data reduction techniques for high dimensional biological dataeSAT Journals
 
Paper id 252014146
Paper id 252014146Paper id 252014146
Paper id 252014146IJRAT
 

What's hot (20)

Convolutional Neural Network and Feature Transformation for Distant Speech Re...
Convolutional Neural Network and Feature Transformation for Distant Speech Re...Convolutional Neural Network and Feature Transformation for Distant Speech Re...
Convolutional Neural Network and Feature Transformation for Distant Speech Re...
 
Cr25555560
Cr25555560Cr25555560
Cr25555560
 
Data clustering using kernel based
Data clustering using kernel basedData clustering using kernel based
Data clustering using kernel based
 
Et25897899
Et25897899Et25897899
Et25897899
 
Ba2419551957
Ba2419551957Ba2419551957
Ba2419551957
 
ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
 ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
 
E04423133
E04423133E04423133
E04423133
 
K means clustering in the cloud - a mahout test
K means clustering in the cloud - a mahout testK means clustering in the cloud - a mahout test
K means clustering in the cloud - a mahout test
 
Jj2416341637
Jj2416341637Jj2416341637
Jj2416341637
 
Packet Classification using Support Vector Machines with String Kernels
Packet Classification using Support Vector Machines with String KernelsPacket Classification using Support Vector Machines with String Kernels
Packet Classification using Support Vector Machines with String Kernels
 
Ensemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringEnsemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes Clustering
 
Dimensionality Reduction Techniques for Document Clustering- A Survey
Dimensionality Reduction Techniques for Document Clustering- A SurveyDimensionality Reduction Techniques for Document Clustering- A Survey
Dimensionality Reduction Techniques for Document Clustering- A Survey
 
International Journal of Computer Science and Security Volume (2) Issue (5)
International Journal of Computer Science and Security Volume (2) Issue (5)International Journal of Computer Science and Security Volume (2) Issue (5)
International Journal of Computer Science and Security Volume (2) Issue (5)
 
Comparison Between Levenberg-Marquardt And Scaled Conjugate Gradient Training...
Comparison Between Levenberg-Marquardt And Scaled Conjugate Gradient Training...Comparison Between Levenberg-Marquardt And Scaled Conjugate Gradient Training...
Comparison Between Levenberg-Marquardt And Scaled Conjugate Gradient Training...
 
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
 
A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...
A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...
A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...
 
A survey research summary on neural networks
A survey research summary on neural networksA survey research summary on neural networks
A survey research summary on neural networks
 
Three classes of deep learning networks
Three classes of deep learning networksThree classes of deep learning networks
Three classes of deep learning networks
 
Data reduction techniques for high dimensional biological data
Data reduction techniques for high dimensional biological dataData reduction techniques for high dimensional biological data
Data reduction techniques for high dimensional biological data
 
Paper id 252014146
Paper id 252014146Paper id 252014146
Paper id 252014146
 

Similar to A Comparison of Computation Techniques for DNA Sequence Comparison

International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...IJCSEIT Journal
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...journal ijrtem
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...IJRTEMJOURNAL
 
Bioinformatics_Sequence Analysis
Bioinformatics_Sequence AnalysisBioinformatics_Sequence Analysis
Bioinformatics_Sequence AnalysisSangeeta Das
 
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...IRJET Journal
 
Stock markets and_human_genomics
Stock markets and_human_genomicsStock markets and_human_genomics
Stock markets and_human_genomicsShyam Sarkar
 
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLER
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLERHPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLER
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLERcscpconf
 
sequence alignment
sequence alignmentsequence alignment
sequence alignmentammar kareem
 
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...Sarvesh Kumar
 
Artificial Intelligence Applications in Petroleum Engineering - Part I
Artificial Intelligence Applications in Petroleum Engineering - Part IArtificial Intelligence Applications in Petroleum Engineering - Part I
Artificial Intelligence Applications in Petroleum Engineering - Part IRamez Abdalla, M.Sc
 
A Novel Framework for Short Tandem Repeats (STRs) Using Parallel String Matching
A Novel Framework for Short Tandem Repeats (STRs) Using Parallel String MatchingA Novel Framework for Short Tandem Repeats (STRs) Using Parallel String Matching
A Novel Framework for Short Tandem Repeats (STRs) Using Parallel String MatchingIJERA Editor
 
AI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfAI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfH K Yoon
 
An Index Based K-Partitions Multiple Pattern Matching Algorithm
An Index Based K-Partitions Multiple Pattern Matching AlgorithmAn Index Based K-Partitions Multiple Pattern Matching Algorithm
An Index Based K-Partitions Multiple Pattern Matching AlgorithmIDES Editor
 
Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification?
Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification? Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification?
Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification? IJORCS
 
Report-de Bruijn Graph
Report-de Bruijn GraphReport-de Bruijn Graph
Report-de Bruijn GraphAshwani kumar
 

Similar to A Comparison of Computation Techniques for DNA Sequence Comparison (20)

Sequence Alignment
Sequence AlignmentSequence Alignment
Sequence Alignment
 
Automatic Parallelization for Parallel Architectures Using Smith Waterman Alg...
Automatic Parallelization for Parallel Architectures Using Smith Waterman Alg...Automatic Parallelization for Parallel Architectures Using Smith Waterman Alg...
Automatic Parallelization for Parallel Architectures Using Smith Waterman Alg...
 
International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...
 
Sequence alignment
Sequence alignmentSequence alignment
Sequence alignment
 
Sequence alignment
Sequence alignmentSequence alignment
Sequence alignment
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
 
Bioinformatics_Sequence Analysis
Bioinformatics_Sequence AnalysisBioinformatics_Sequence Analysis
Bioinformatics_Sequence Analysis
 
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
 
Stock markets and_human_genomics
Stock markets and_human_genomicsStock markets and_human_genomics
Stock markets and_human_genomics
 
Sequence Analysis
Sequence AnalysisSequence Analysis
Sequence Analysis
 
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLER
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLERHPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLER
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLER
 
sequence alignment
sequence alignmentsequence alignment
sequence alignment
 
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
 
Artificial Intelligence Applications in Petroleum Engineering - Part I
Artificial Intelligence Applications in Petroleum Engineering - Part IArtificial Intelligence Applications in Petroleum Engineering - Part I
Artificial Intelligence Applications in Petroleum Engineering - Part I
 
A Novel Framework for Short Tandem Repeats (STRs) Using Parallel String Matching
A Novel Framework for Short Tandem Repeats (STRs) Using Parallel String MatchingA Novel Framework for Short Tandem Repeats (STRs) Using Parallel String Matching
A Novel Framework for Short Tandem Repeats (STRs) Using Parallel String Matching
 
AI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfAI 바이오 (4일차).pdf
AI 바이오 (4일차).pdf
 
An Index Based K-Partitions Multiple Pattern Matching Algorithm
An Index Based K-Partitions Multiple Pattern Matching AlgorithmAn Index Based K-Partitions Multiple Pattern Matching Algorithm
An Index Based K-Partitions Multiple Pattern Matching Algorithm
 
Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification?
Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification? Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification?
Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification?
 
Report-de Bruijn Graph
Report-de Bruijn GraphReport-de Bruijn Graph
Report-de Bruijn Graph
 

More from IJORCS

Help the Genetic Algorithm to Minimize the Urban Traffic on Intersections
Help the Genetic Algorithm to Minimize the Urban Traffic on IntersectionsHelp the Genetic Algorithm to Minimize the Urban Traffic on Intersections
Help the Genetic Algorithm to Minimize the Urban Traffic on IntersectionsIJORCS
 
Call for Papers - IJORCS, Volume 4 Issue 4
Call for Papers - IJORCS, Volume 4 Issue 4Call for Papers - IJORCS, Volume 4 Issue 4
Call for Papers - IJORCS, Volume 4 Issue 4IJORCS
 
Real-Time Multiple License Plate Recognition System
Real-Time Multiple License Plate Recognition SystemReal-Time Multiple License Plate Recognition System
Real-Time Multiple License Plate Recognition SystemIJORCS
 
FPGA Implementation of FIR Filter using Various Algorithms: A Retrospective
FPGA Implementation of FIR Filter using Various Algorithms: A RetrospectiveFPGA Implementation of FIR Filter using Various Algorithms: A Retrospective
FPGA Implementation of FIR Filter using Various Algorithms: A RetrospectiveIJORCS
 
Using Virtualization Technique to Increase Security and Reduce Energy Consump...
Using Virtualization Technique to Increase Security and Reduce Energy Consump...Using Virtualization Technique to Increase Security and Reduce Energy Consump...
Using Virtualization Technique to Increase Security and Reduce Energy Consump...IJORCS
 
Algebraic Fault Attack on the SHA-256 Compression Function
Algebraic Fault Attack on the SHA-256 Compression FunctionAlgebraic Fault Attack on the SHA-256 Compression Function
Algebraic Fault Attack on the SHA-256 Compression FunctionIJORCS
 
Enhancement of DES Algorithm with Multi State Logic
Enhancement of DES Algorithm with Multi State LogicEnhancement of DES Algorithm with Multi State Logic
Enhancement of DES Algorithm with Multi State LogicIJORCS
 
Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...
Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...
Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...IJORCS
 
CFP. IJORCS, Volume 4 - Issue2
CFP. IJORCS, Volume 4 - Issue2CFP. IJORCS, Volume 4 - Issue2
CFP. IJORCS, Volume 4 - Issue2IJORCS
 
Call for Papers - IJORCS - Vol 4, Issue 1
Call for Papers - IJORCS - Vol 4, Issue 1Call for Papers - IJORCS - Vol 4, Issue 1
Call for Papers - IJORCS - Vol 4, Issue 1IJORCS
 
Voice Recognition System using Template Matching
Voice Recognition System using Template MatchingVoice Recognition System using Template Matching
Voice Recognition System using Template MatchingIJORCS
 
Channel Aware Mac Protocol for Maximizing Throughput and Fairness
Channel Aware Mac Protocol for Maximizing Throughput and FairnessChannel Aware Mac Protocol for Maximizing Throughput and Fairness
Channel Aware Mac Protocol for Maximizing Throughput and FairnessIJORCS
 
A Review and Analysis on Mobile Application Development Processes using Agile...
A Review and Analysis on Mobile Application Development Processes using Agile...A Review and Analysis on Mobile Application Development Processes using Agile...
A Review and Analysis on Mobile Application Development Processes using Agile...IJORCS
 
Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...
Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...
Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...IJORCS
 
A Study of Routing Techniques in Intermittently Connected MANETs
A Study of Routing Techniques in Intermittently Connected MANETsA Study of Routing Techniques in Intermittently Connected MANETs
A Study of Routing Techniques in Intermittently Connected MANETsIJORCS
 
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...Improving the Efficiency of Spectral Subtraction Method by Combining it with ...
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...IJORCS
 
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed SystemAn Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed SystemIJORCS
 
The Design of Cognitive Social Simulation Framework using Statistical Methodo...
The Design of Cognitive Social Simulation Framework using Statistical Methodo...The Design of Cognitive Social Simulation Framework using Statistical Methodo...
The Design of Cognitive Social Simulation Framework using Statistical Methodo...IJORCS
 
An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...
An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...
An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...IJORCS
 
A PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmA PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmIJORCS
 

More from IJORCS (20)

Help the Genetic Algorithm to Minimize the Urban Traffic on Intersections
Help the Genetic Algorithm to Minimize the Urban Traffic on IntersectionsHelp the Genetic Algorithm to Minimize the Urban Traffic on Intersections
Help the Genetic Algorithm to Minimize the Urban Traffic on Intersections
 
Call for Papers - IJORCS, Volume 4 Issue 4
Call for Papers - IJORCS, Volume 4 Issue 4Call for Papers - IJORCS, Volume 4 Issue 4
Call for Papers - IJORCS, Volume 4 Issue 4
 
Real-Time Multiple License Plate Recognition System
Real-Time Multiple License Plate Recognition SystemReal-Time Multiple License Plate Recognition System
Real-Time Multiple License Plate Recognition System
 
FPGA Implementation of FIR Filter using Various Algorithms: A Retrospective
FPGA Implementation of FIR Filter using Various Algorithms: A RetrospectiveFPGA Implementation of FIR Filter using Various Algorithms: A Retrospective
FPGA Implementation of FIR Filter using Various Algorithms: A Retrospective
 
Using Virtualization Technique to Increase Security and Reduce Energy Consump...
Using Virtualization Technique to Increase Security and Reduce Energy Consump...Using Virtualization Technique to Increase Security and Reduce Energy Consump...
Using Virtualization Technique to Increase Security and Reduce Energy Consump...
 
Algebraic Fault Attack on the SHA-256 Compression Function
Algebraic Fault Attack on the SHA-256 Compression FunctionAlgebraic Fault Attack on the SHA-256 Compression Function
Algebraic Fault Attack on the SHA-256 Compression Function
 
Enhancement of DES Algorithm with Multi State Logic
Enhancement of DES Algorithm with Multi State LogicEnhancement of DES Algorithm with Multi State Logic
Enhancement of DES Algorithm with Multi State Logic
 
Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...
Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...
Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...
 
CFP. IJORCS, Volume 4 - Issue2
CFP. IJORCS, Volume 4 - Issue2CFP. IJORCS, Volume 4 - Issue2
CFP. IJORCS, Volume 4 - Issue2
 
Call for Papers - IJORCS - Vol 4, Issue 1
Call for Papers - IJORCS - Vol 4, Issue 1Call for Papers - IJORCS - Vol 4, Issue 1
Call for Papers - IJORCS - Vol 4, Issue 1
 
Voice Recognition System using Template Matching
Voice Recognition System using Template MatchingVoice Recognition System using Template Matching
Voice Recognition System using Template Matching
 
Channel Aware Mac Protocol for Maximizing Throughput and Fairness
Channel Aware Mac Protocol for Maximizing Throughput and FairnessChannel Aware Mac Protocol for Maximizing Throughput and Fairness
Channel Aware Mac Protocol for Maximizing Throughput and Fairness
 
A Review and Analysis on Mobile Application Development Processes using Agile...
A Review and Analysis on Mobile Application Development Processes using Agile...A Review and Analysis on Mobile Application Development Processes using Agile...
A Review and Analysis on Mobile Application Development Processes using Agile...
 
Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...
Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...
Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...
 
A Study of Routing Techniques in Intermittently Connected MANETs
A Study of Routing Techniques in Intermittently Connected MANETsA Study of Routing Techniques in Intermittently Connected MANETs
A Study of Routing Techniques in Intermittently Connected MANETs
 
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...Improving the Efficiency of Spectral Subtraction Method by Combining it with ...
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...
 
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed SystemAn Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
 
The Design of Cognitive Social Simulation Framework using Statistical Methodo...
The Design of Cognitive Social Simulation Framework using Statistical Methodo...The Design of Cognitive Social Simulation Framework using Statistical Methodo...
The Design of Cognitive Social Simulation Framework using Statistical Methodo...
 
An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...
An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...
An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...
 
A PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmA PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering Algorithm
 

Recently uploaded

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 

Recently uploaded (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 

A Comparison of Computation Techniques for DNA Sequence Comparison

  • 1. International Journal of Research in Computer Science eISSN 2249-8265 Volume 2 Issue 3 (2012) pp. 1-6 © White Globe Publications www.ijorcs.org A COMPARISON OF COMPUTATION TECHNIQUES FOR DNA SEQUENCE COMPARISON Harshita G. Patil1, Manish Narnaware2 *M-Tech (CSE) G.H.Raisoni College of Engineering, Hingana Road, Nagpur, India 1 Email: patil.harshita27@rediffmail.com, 2 Email: manish.narnaware@yahoo.com Abstract: This Project shows a comparison survey or bases known as adenine, thymine, cytosine and done on DNA sequence comparison techniques. The guanine. Since the bases are paired, they are referred to various techniques implemented are sequential as base pairs. All the DNA of a living organism is comparison, multithreading on a single computer and called its genome. The size of a genome can vary from multithreading using parallel processing. This Project millions of base pairs for bacteria to several billions shows the issues involved in implementing a dynamic base pairs in the case of mammals. programming algorithm for biological sequence The nearly exponential growth rate of biological comparison on a general purpose parallel computing sequence database threatens to overwhelm existing platform Tiling is an important technique for computational methods for biological data analysis and extraction of parallelism. Informally, tiling consists of searching. The computational demand needed to partitioning the iteration space into several chunks of explore and analyze the data contained in these computation called tiles (blocks) such that sequential databases is quickly becoming a great concern. To traversal of the tiles covers the entire iteration space. meet these demands, we must use high performance The idea behind tiling is to increase the granularity of computing systems, such as parallel computers. computation and decrease the amount of Biological sequences can be treated as strings over a communication incurred between processors. This fixed alphabet of characters, a, c, t and g. An makes tiling more suitable for distributed memory alignment is a way of stacking one sequence above architectures where communication startup costs are the other and matching characters from the two very high and hence frequent communication is sequences that lie in the same position. From the undesirable. Our work to develop sequence- alignment, we can find two subsequences contained comparison mechanism and software supports the identification of sequences of DNA. respectively in two sequences that have the most similarity. The problem of the biological sequence Keywords: Dynamic Programming Algorithms, alignment is the most faced in the exploring and FASTA, Sequences Alignment, Tiling. analyzing the data, no matter in sequence assembly, comparison of homology, finding of gene coding I. INTRODUCTION region and prediction of protein’s structure and function. Comparing DNA sequences is one of the basic tasks in computational biology. In bioinformatics, a In this paper we consider the parallelization of this sequence alignment is a way of arranging the implementation, since parallelization of an iterative sequences of DNA, RNA, or protein to identify implementation of the algorithm would not be feasible. regions of similarity that may be a consequence of There has been significant recent work on the functional, structural, or evolutionary relationships parallelization of dynamic programming algorithms in between the sequences. Aligned sequences of computational biology including implementations nucleotide or amino acid residues are typically suitable for computational grids. What distinguishes represented as rows within a matrix. Gaps are inserted this work is the data-driven recursive implementation, between the residues so that identical or similar with resulting dynamically allocated tasks. The rest of characters are aligned in successive columns. this paper is organized as follows. In section 2, we provide Needleman-Wunsch and Smith Waterman DNA (deoxyribonucleic acid) is the chemical algorithm. Section 3 provides parallel processing material in a cell that carries the genetic codes for techniques. The parallel computation technique using living organisms. Its structure is a double helix multi-core architecture for DNA sequence comparison consisting of two sequences of letters from a four- is shown in section 4, and conclusions about the letter alphabet (A, T, C, G), such that A is paired with completed work are discussed in section 5. T, and C with G. The letters represent the nucleotides www.ijorcs.org
  • 2. 2 Harshita G. Patil, Manish Narnaware II. NEEDLEMAN-WUNSCH AND SMITH WATERMAN ALGORITHM Input Sequence File1 Final Score There are several methods for alignment of two Sequential biological sequences. The dynamic programming is Input Sequence File2 Processing probably the most popular programming method in sequences alignment. The Needleman-Wunsch Figure 1: Sequential Processing algorithm, the first algorithm applying the dynamic programming to comparing biological sequences, was proposed by Needleman and Wunsch in 1970. Later, A. Sequence comparison using similarity matrix Smith and Waterman improved the Needleman- This consists of two parts: the calculation of the Wunsch algorithm and proposed the well-known total score indicating the similarity between the two Smith-Waterman algorithm. The time complexity of given sequences, and the identification of the these algorithms is O(mn), where m, n are the lengths alignment(s) that lead to the score. In this paper we of the two sequences respectively. Because the cores will concentrate on the calculation of the score, since of these algorithms are dynamic programming, all this is the most computationally expensive part. The algorithms need to manipulate an (n+1) (m+1) matrix, idea behind using dynamic programming is to build up named dynamic programming matrix. The most time the solution by using previous solutions for smaller spent in these algorithms is calculating the dynamic subsequences. The comparison of the two sequences X programming matrix, so research work on and Y, using the dynamic programming mechanism, is parallelization of two sequences alignment focuses illustrated in Figure 2. This finds global alignments by mostly on the calculation of the matrix. As the growth comparing entire sequences. The sequences are placed of biological sequence database, the length of along the left margin (X) and on the top (Y). A sequences often becomes very long, and the size of the similarity matrix is initialized with decreasing values matrix becomes very large. Thus, not only the (0,-1,-2,-3,…) along the first row and first column to execution time of these algorithms needs to be very penalize for consecutive gaps (insertions or deletions). long, the memory space needed in the algorithm becomes very large. Even in some cases the size of the The other elements of the matrix are calculated by matrix is bigger than the size of memory space in one finding the maximum value among the following three processor. values: the left element plus gap penalty, the upper-left element plus the score of substituting the horizontal Two input sequence files are compared sequentially and final score is computed. symbol for the vertical symbol, and the upper element plus the gap penalty. Figure 2: Similarity Matrix For the general case where X = x1,…, xi and Y = In our example, gp is -1, and ss is 1 if the y1,…, yj, for i = 1,.., n and j = 1,…,m, the similarity elements match and 0 otherwise. However, other matrix SM[n;m] is built by applying the following general values can be used instead. Following this recurrence equation, where gp is the gap penalty and ss recurrence equation, the matrix is filled from top left to is the substitution score: bottom right with entry [i; j] requiring the entries [i, j - 1], [i – 1, j -1], and [i-1, j]. Notice that SM[i; j] corresponds to the best score of the subsequences (1) x1,…, xi and y1,…, yj. Since global alignment takes www.ijorcs.org
  • 3. A Comparison of Computation Techniques for DNA Sequence Comparison 3 into account the entire sequences, the final score will all processors busy through most of the computation. always be found in the bottom right hand corner of the For example, processor 1 initially works with the first matrix. In our example, the final score 4 gives us a strip, then simultaneously with the first and fifth strip, measure of how similar the two sequences is. Figure 2 then finally only with the fifth strip. The processor shows the similarity matrix and the two possible utilization rises to 75%. alignments (arrows going up and left). III. PARALLEL COMPUTATION A parallel version of the sequence comparison algorithm using dynamic programming must handle the data dependences presented by this method, yet it should perform as many operations as possible independently. This may present a serious challenge for efficient parallel execution on current general purpose parallel computers, i.e., MIMD (Multiple Figure 4: Partition of the similarity matrix Instruction stream, Multiple Data stream computers). IV. PARALLEL COMPUTATION TECHNIQUE Multi- USING MULTI-CORE ARCHITECTURE Input Sequence File1 Final Score Threading The input DNA sequences are collected in FASTA Input Sequence File2 on format. The FASTA files are converted to sequential Single sequence file using the convertor. The converted files Tile Size Computer Time T1 are then compared on the distributed environment to compute the final score. This process is shown in Figure 3: Multi -Threading on Single Computer Figure 5. Input is two sequence files and tile size. Two input Convertor sequence files are compared. The final score is computed using the concept of multithreading on a Sequence file single computer. Sequential sequence file One solution to this problem is to divide the similarity matrix into rectangular blocks, as shown in >Seq1 AGCTCC Figure 4(a). In this example, the program would CCTAAT compute block 1 first, followed by 2 and 5, etc. If each AGCTCC AGGGCT block has q rows and r columns, then the computation CCTAAT TTTGCC of a given block requires only the row segment AGGGCT TTTGCC immediately above the block, the column segment to its immediate left, and the element above and to the left … a total of q +r +1 elements. For instance, if each Final Score Parallel Sequence Comparison block has 4 rows and 4 columns, then each block has Output to compute 16 maxima after receiving 9 input values. Two Input Files The communication-to-computation ratio drops from 3:1 to 9:16, an 81% reduction! Figure 5: Parallel Computation Technique for DNA Note that this blocking will decrease the maximum Sequence Comparison achievable parallelism somewhat, by introducing some sequential dependence in the code. However, given the A. FASTA format sizes of the current problems and the parallel machines In bioinformatics, FASTA format is a text-based currently used, this potential loss will not be a limiting format for representing either nucleotide sequences or factor. peptide sequences, in which base pairs or amino acids The load-balancing problem can be addressed by are represented using single-letter codes. The format putting several rows of blocks (or “strips”) on the same also allows for sequence names and comments to processor. Figure 4(b) illustrates this approach when precede the sequences. The format originates from the four processors are used. The first and fifth strips are FASTA software package. assigned to processor 1, the second and sixth strips are assigned to processor 2 and so on. This helps to keep www.ijorcs.org
  • 4. 4 Harshita G. Patil, Manish Narnaware Example of a simple FASTA file Sequence File (Sequence Y file) 3. Input file size i.e. tile > seq1 This is the description of my first sequence. width and tile height. AGTACGTAGTAGCTGCTGCTACGTGCGCTAGCT 4. Calculate the number AGTACGACGTAGATGCTAGCTGACTCGATGC of tiles required to > seq2 This is a description of my second sequence. calculate the final CGATCGATCGTACGTCGACTGATCGTAGCTAC score. 5. Calculate the number GTCGTCATCGTCAGTTACTGCATGCTCG of diagonal rows required to calculate. B. Parallel Algorithm Models 6. Read sequence X data for tile(0,0) and send request to slave 1. 1. Calculate the matrix Parallel of required tile in Input Sequence File1 Final Score Processing different thread. Using 7. Repeat step 6 for n Input Sequence File2 Multi-core number of slave and Tile Size Time T2 wait for the tile Architecture response. 2. Return the file to the master. Figure 6: Parallel processing using multi-core architecture 8. Accept tile response Complete project will take the input as two from the slave. 9. Collect all tiles of sequence file and tile size from the user and system each diagonal row. will calculate the final score by comparing sequences 10. Repeat step 6 to 9 of given file by using multiple machine connected in for next diagonal row. network. Figure 8: Interaction between Master and Slave Master-Slave Model: One or more processes generate work and allocate it to worker processes C. Similarity Matrix shown in Figure 7. The similarity matrix used for sequential comparison is used in this approach also for DNA *Master/Slave *Master/Slave *Master/Slave comparison but it is performed parallel on different Application Application Application machines using tiling technique. D. Tiling Technique Tiling is an important technique for extraction of Tile Tile parallelism. Informally, tiling consists of partitioning Sequence Request File Response the iteration space into several chunks of computation called tiles (blocks) such that sequential traversal of the tiles covers the entire iteration space. The idea *Master/Slave *Master/Slave *Master/Slave behind tiling is to increase the granularity of Application Application Application computation and decrease the amount of communication incurred between processors. This USER makes tiling more suitable for distributed memory Figure 7: Distributed System used for comparison architectures where communication startup costs are When user operates the application, the very high and hence frequent communication is application acts as master to perform the task of undesirable. sequence file comparison and other machines in which Tiling is a well-established technique to enhance the application is installed will act as slave- returns the data locality or coarsen the grain of parallelism in loop result of tile matrix requested by master. Any machine programs. The iteration space of the program is connected in network can act as master or slave. The covered with (usually congruent) tiles and the interaction between master and slave is shown in enumeration of the iteration points is changed so as to Figure 8. enumerate the tiles in the outer dimensions (i.e. loops) Master Slave and the points within each such tile in the inner 1. Load Sequential dimensions. The shape and size of the tiles is usually Sequence File chosen dependent on the dependence vectors to (Sequence X file) 2. Load next Sequential www.ijorcs.org
  • 5. A Comparison of Computation Techniques for DNA Sequence Comparison 5 minimize communication startups and the volume of the data communicated, especially in the context of distributed-memory architectures. For shared-memory systems, the number of startups and the volume are less of a concern, as long as the transfer time of the data between cores stays small compared to the computation time for each tile. The Sequential Sequence File will be arranged in similarity matrix and the matrix will be divided into tiles. One tile will be allocated to one machine for Figure10 (a): Computation on Single machine using Parallel computation. The computations will be done Processing diagonally. Number of tiles coming under one diagonal line will be allocated to machine in network depending upon the number of machines available in the network. When computation of one diagonal row is complete then only the computation for the next diagonal line will start. In this way the complete matrix will be filled and the last cell of the matrix is the final score which will be equal to the size of sequential sequence file in case of match and if final score is half of the size we can say 50% of the DNA is matched and if final score is negative it indicates mismatch. Figure 10 (b): Computation on distributed system using mullti-core achitecture B(0,0) B(0,1) B(0,m-1) VI. REFERENCES Conferences B(1,0) B(1,m-1) [1] Sudha Gunturu*, Xiaolin Li*, and Laurence Tianruo Yang** “Load Scheduling Strategies for Parallel DNA Sequencing Applications” 11th IEEE International Conference on High Performance Computing and Communications 2009 B(m-1,m-1) [2] Armin Gr¨oßlinger “Some Experiments on Tiling Loop Programs for Shared-Memory Multicore Architectures” Dagstuhl Seminar Proceedings 07361 Programming Figure 9: Tiling Technique Models for Ubiquitous Parallelism 2008 [3] Nasreddine Hireche, J.M. Pierre Langlois and Gabriela V. CONCLUSION Nicolescu Département de Génie Informatique, École The aim of this study is to show the difference in Polytechnique de Montréal ‘‘Survey of computation to communication ratio using three Biological High Performance Computing: Algorithms, Implementations and Outlook Research’’ IEEE different techniques. This study shows the CCECE/CCGEI, Ottawa, May 2006. computational power of parallel computers to speed up the process of comparing sequences. We looked at the [4] Friman S´anchez, Esther Salam´ı, Alex Ramirez and Mateo Valero HiPEAC European Network of Excellence dynamic programming mechanism and presented a Universitat Polit`ecnica de Catalunya (UPC), Barcelona, multithreaded parallel implementation. The Spain “Parallel Processing in Biological Sequence implementation uses similarity matrix method but Comparison Using General Purpose Processors” 2005 takes advantage of the tiling technique under IEEE multithreading model. The result of implementation of [5] Matteo Canella - Filippo Miglioli Universit`a di Ferrara parallel processing on a single machine is shown in (Italy) Alessandro Bogliolo Universit`a di Urbino (Italy) Figure 10 (a) and using parallel processing in multi- Enrico Petraglio - Eduardo Sanchez Ecole Polytechnique core multithreading architecture is shown in Figure 10 F´ed´erale de Lausanne EPFL-LSL,Lausanne (b) which shows computation performed per sec. (Switzerland)” Performing DNA Comparison on a Bio- Inspired Tissue of FPGAs” Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS’03) 2003 IEEE. www.ijorcs.org
  • 6. 6 Harshita G. Patil, Manish Narnaware [6] N. F. Almeida Jr ,C. E. R. Alves, E. N. Caceres, S. W.Song ”Comparison of Genomes using High- Performance Parallel Computing” Proceedings of the 15th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD’03) 2003 IEEE. [7] Fa Zhang, Xiang-Zhen Qiao and Zhi-Yong Liu Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, National Natural Science Foundation of China, Beijing, 100083”A Parallel Smith- Waterman Algorithm Based on Divide and Conquer” Proceedings of the Fifth International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP.02) 2002 IEEE [8] W.S Martins, J.B Del Cuvillo, F.J.Useche, K.B Theobald, G.R.Gao Department of Electrical and Computer Engineering University of Delaware, Newark DE19716, USA” A Multithreaded Parallel Implementation of a Dynamic Programming Algorithm for Sequence Comparison” Pacific Symposium on Biocomputing 6:311-322 (2001) [9] Subhra Sundar Bandyopadhyay, Somnath Paul and Amit Konar Electronics and Telecommunication Department Jadavpur University, Kolkata, India “Improved Algorithms for DNA Sequence Alignment and Revision of Scoring www.ijorcs.org