Approximate matching in grammar-compressed strings


                                      Alexander Tiskin

                                 Department of Computer Science
                                      University of Warwick
                             http://www.dcs.warwick.ac.uk/~tiskin




Alexander Tiskin (Warwick)         Approximate matching in GC-strings   1 / 52
1   Introduction

2   Semi-local string comparison

3   Matrix distance multiplication

4   Compressed string comparison

5   Conclusions and future work




Introduction

String matching: finding an exact pattern in a string
String comparison: finding similar patterns in two strings
Applications: computational biology, image recognition, . . .
Standard types of string comparison:
     global: whole string vs whole string
     local: substrings vs substrings
Main focus of this work:
     semi-local: whole string vs substrings; prefixes vs suffixes
Closely related to approximate string matching (no relation to
approximation algorithms!)
Main tool: implicit unit-Monge matrices (a.k.a. seaweed matrices)

Introduction
Terminology and notation


Integers: ..., −2, −1, 0, 1, 2, ...
Odd half-integers: ..., −5/2, −3/2, −1/2, 1/2, 3/2, 5/2, ...
(i, j) ≪ (i′, j′) iff i < i′ and j < j′          (i, j) ≷ (i′, j′) iff i′ < i and j′ > j
We consider finite and infinite integer matrices over integer and odd
half-integer indices. For simplicity, index range will usually be ignored.
A permutation matrix is a 0/1 matrix with exactly one nonzero per row
and per column, e.g.
$\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$




Introduction
Terminology and notation


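Given matrix D, its distribution matrix is $D^\Sigma(i, j) = \sum_{\hat\imath > i,\ \hat\jmath < j} D(\hat\imath, \hat\jmath)$
In other words, $D^\Sigma(i, j) = \sum D(\hat\imath, \hat\jmath)$, where $(\hat\imath, \hat\jmath)$ is ≷-dominated by $(i, j)$
Given matrix E, its density matrix is
$E^\square(\hat\imath, \hat\jmath) = E(\hat\imath - \tfrac12, \hat\jmath + \tfrac12) - E(\hat\imath - \tfrac12, \hat\jmath - \tfrac12) - E(\hat\imath + \tfrac12, \hat\jmath + \tfrac12) + E(\hat\imath + \tfrac12, \hat\jmath - \tfrac12)$
where $D^\Sigma$, $E$ over integers; $D$, $E^\square$ over odd half-integers
$\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}^\Sigma = \begin{pmatrix} 0 & 1 & 2 & 3 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$
$\begin{pmatrix} 0 & 1 & 2 & 3 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}^\square = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$
$(D^\Sigma)^\square = D$ for all D
Matrix E is simple, if $(E^\square)^\Sigma = E$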
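
The two operators can be checked on the 3 × 3 example. A minimal sketch, assuming finite matrices stored as Python lists, where cell [r][c] of a half-integer-indexed matrix stands for index (r + 1/2, c + 1/2); the function names `distribution` and `density` are illustrative, not from the talk.

```python
from typing import List

Matrix = List[List[int]]


def distribution(D: Matrix) -> Matrix:
    """D^Sigma(i, j) = sum of D over rows > i and columns < j.

    D has m x n cells (half-integer indices); the result has
    (m + 1) x (n + 1) entries (integer indices).
    """
    m, n = len(D), len(D[0])
    S = [[0] * (n + 1) for _ in range(m + 1)]
    # Bottom row and left column stay zero: nothing is dominated there.
    for i in range(m - 1, -1, -1):
        for j in range(1, n + 1):
            S[i][j] = S[i + 1][j] + S[i][j - 1] - S[i + 1][j - 1] + D[i][j - 1]
    return S


def density(E: Matrix) -> Matrix:
    """E^square: four-point difference of E around each half-integer cell."""
    m, n = len(E) - 1, len(E[0]) - 1
    return [[E[r][c + 1] - E[r][c] - E[r + 1][c + 1] + E[r + 1][c]
             for c in range(n)]
            for r in range(m)]


P = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 1]]
P_sigma = distribution(P)   # the 4 x 4 matrix from the example
P_back = density(P_sigma)   # recovers P, illustrating (D^Sigma)^square = D
```

Here `P_sigma` is simple by construction: its bottom row and left column are zero, so applying `distribution` to `density(P_sigma)` returns `P_sigma` again.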


Introduction
Terminology and notation


Matrix E is Monge, if $E^\square$ is nonnegative
Intuition: border-to-border distances in a (weighted) planar graph
Matrix E is unit-Monge, if $E^\square$ is a permutation matrix
Intuition: border-to-border distances in a grid-like graph
Simple unit-Monge matrix: $P^\Sigma$, where P is a permutation matrix
Seaweed matrix: $P^\Sigma$, represented implicitly by P
$\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}^\Sigma = \begin{pmatrix} 0 & 1 & 2 & 3 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$




Introduction
Implicit unit-Monge matrices


Efficient $P^\Sigma$ queries: range tree on nonzeros of P                    [Bentley: 1980]
         binary search tree by i-coordinate
         under every node, binary search tree by j-coordinate
[Figure: range tree levels for a four-point example]

Introduction
Implicit unit-Monge matrices


Efficient $P^\Sigma$ queries: (contd.)
Every node of the range tree represents a canonical range (rectangular
region), and stores its nonzero count
Overall, ≤ n log n canonical ranges are non-empty
A $P^\Sigma$ query means dominance counting: how many nonzeros are
dominated by query point? Answered by decomposing query range into
≤ log² n disjoint canonical ranges.
Total size O(n log n), query time O(log² n)
There are asymptotically more efficient (but less practical) data structures
Total size O(n), query time O(log n / log log n)          [JáJá+: 2004]
                                                          [Chan, Pătraşcu: 2010]
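
Dominance counting in O(log² n) per query can be sketched as follows. This is an assumption-laden simplification, not the structure from the cited papers: the outer tree is over rows, and each node keeps a sorted list of columns searched with `bisect_left` in place of an inner binary search tree. The class name `DominanceCounter` and the encoding `perm[r] = c` (nonzero in cell row r, column c) are illustrative.

```python
from bisect import bisect_left
from typing import List


class DominanceCounter:
    """Answers P^Sigma(i, j) for a permutation matrix given as perm[r] = c."""

    def __init__(self, perm: List[int]) -> None:
        self.n = len(perm)
        # tree[node] = sorted columns of the nonzeros in the node's row range
        self.tree: List[List[int]] = [[] for _ in range(4 * max(1, self.n))]
        if self.n:
            self._build(1, 0, self.n, perm)

    def _build(self, node: int, lo: int, hi: int, perm: List[int]) -> None:
        if hi - lo == 1:
            self.tree[node] = [perm[lo]]
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid, perm)
        self._build(2 * node + 1, mid, hi, perm)
        self.tree[node] = sorted(self.tree[2 * node] + self.tree[2 * node + 1])

    def query(self, i: int, j: int) -> int:
        """Number of nonzeros in cell rows >= i and cell columns < j."""
        return self._count(1, 0, self.n, i, j) if self.n else 0

    def _count(self, node: int, lo: int, hi: int, i: int, j: int) -> int:
        if hi <= i:                 # row range lies entirely above the query
            return 0
        if lo >= i:                 # canonical range: one binary search
            return bisect_left(self.tree[node], j)
        mid = (lo + hi) // 2
        return (self._count(2 * node, lo, mid, i, j)
                + self._count(2 * node + 1, mid, hi, i, j))


counter = DominanceCounter([1, 0, 2])   # the 3 x 3 example permutation
table = [[counter.query(i, j) for j in range(4)] for i in range(4)]
```

Each query touches O(log n) canonical row ranges and spends one O(log n) binary search in each; the sorted lists take O(n log n) space in total.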


Semi-local string comparison
Semi-local LCS and edit distance


Consider strings (= sequences) over an alphabet of size σ
Distinguish contiguous substrings and not necessarily contiguous
subsequences
Special cases of substring: prefix, suffix
Notation: strings a, b of length m, n respectively
Assume where necessary: m ≤ n; m, n reasonably close
The longest common subsequence (LCS) score:
      length of longest string that is a subsequence of both a and b
      equivalently, alignment score, where score(match) = 1 and
      score(mismatch) = 0
In biological terms, “loss-free alignment” (unlike “lossy” BLAST)

Semi-local string comparison
Semi-local LCS and edit distance


The LCS problem
Give the LCS score for a vs b

LCS: running time
O(mn)                                          [Wagner, Fischer: 1974]
O(mn / log² n)                 σ = O(1)        [Masek, Paterson: 1980]
                                               [Crochemore+: 2003]
O(mn (log log n)² / log² n)                    [Paterson, Dančík: 1994]
                                               [Bille, Farach-Colton: 2008]

Running time varies depending on the RAM model
We assume word-RAM with word size log n


Semi-local string comparison
Semi-local LCS and edit distance


LCS on the alignment graph (directed, acyclic)
[Figure: alignment graph with b = “baabcabcabaca” along the top and a = “baabcbca” down the side; blue = 0, red = 1]
LCS(“baabcbca”, “baabcabcabaca”) = “baabcbca”
LCS = highest-score corner-to-corner path


Semi-local string comparison
Semi-local LCS and edit distance


LCS: dynamic programming                                             [WF: 1974]
Sweep alignment graph by cells
Cell update: time O(1)
Overall time O(mn)
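
The cell-by-cell sweep can be sketched as below. A minimal version that keeps only two rows of the table; the function name `lcs_score` is illustrative.

```python
def lcs_score(a: str, b: str) -> int:
    """LCS score of a and b by sweeping the alignment graph row by row."""
    n = len(b)
    prev = [0] * (n + 1)              # scores for the previous prefix of a
    for ch in a:
        cur = [0] * (n + 1)
        for j in range(1, n + 1):
            if ch == b[j - 1]:
                cur[j] = prev[j - 1] + 1          # diagonal edge, score 1
            else:
                cur[j] = max(prev[j], cur[j - 1])  # vertical or horizontal edge
        prev = cur
    return prev[n]


score = lcs_score("baabcbca", "baabcabcabaca")   # a is a subsequence of b
```

Each cell update is O(1), so the sweep takes O(mn) time and O(n) memory.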




Semi-local string comparison
Semi-local LCS and edit distance


LCS: micro-block dynamic programming                                      [MP: 1980; BF: 2008]
Sweep alignment graph by micro-blocks
Micro-block size:
      t = O(log n) when σ = O(1)
      t = O(log n / log log n) otherwise
Micro-block interface:
      O(t) characters, each O(log σ) bits, can be reduced to O(log t) bits
      O(t) small integers, each O(1) bits
Micro-block update: time O(1), via table of all possible interfaces
Overall time O(mn / log² n) when σ = O(1), O(mn (log log n)² / log² n) otherwise

Semi-local string comparison
Semi-local LCS and edit distance


The semi-local LCS problem
Give the (implicit) matrix of O(m² + n²) LCS scores:
      string-substring LCS: string a vs every substring of b
      prefix-suffix LCS: every prefix of a vs every suffix of b
      symmetrically, substring-string and suffix-prefix LCS

The three-way semi-local LCS problem
Give the (implicit) matrix of O(n²) LCS scores:
      string-substring, prefix-suffix, suffix-prefix LCS
      no substring-string LCS
Suitable for m ≪ n

Cf.: dynamic programming gives prefix-prefix LCS
Semi-local string comparison
Semi-local LCS and edit distance


Semi-local LCS on the alignment graph
[Figure: alignment graph with b = “baabcabcabaca” along the top and a = “baabcbca” down the side; blue = 0, red = 1]
score(“baabcbca”, “cabcaba”) = 5 (“abcba”)
Semi-local LCS = all highest-score border-to-border paths
(string-substring = top-to-bottom, etc.)

Semi-local string comparison
Score matrices and seaweed matrices


The score matrix H
a = “baabcbca”
b = “baabcabcabaca”
b′ = b⟨4 : 11⟩ = “cabcaba”
H(4, 11) = LCS(a, b′) = 5
H(i, j) = j − i if i > j

   0    1    2    3    4    5    6    6    7    8    8    8    8    8
  −1    0    1    2    3    4    5    5    6    7    7    7    7    7
  −2   −1    0    1    2    3    4    4    5    6    6    6    6    7
  −3   −2   −1    0    1    2    3    3    4    5    5    6    6    7
  −4   −3   −2   −1    0    1    2    2    3    4    4    5    5    6
  −5   −4   −3   −2   −1    0    1    2    3    4    4    5    5    6
  −6   −5   −4   −3   −2   −1    0    1    2    3    3    4    4    5
  −7   −6   −5   −4   −3   −2   −1    0    1    2    2    3    3    4
  −8   −7   −6   −5   −4   −3   −2   −1    0    1    2    3    3    4
  −9   −8   −7   −6   −5   −4   −3   −2   −1    0    1    2    3    4
 −10   −9   −8   −7   −6   −5   −4   −3   −2   −1    0    1    2    3
 −11  −10   −9   −8   −7   −6   −5   −4   −3   −2   −1    0    1    2
 −12  −11  −10   −9   −8   −7   −6   −5   −4   −3   −2   −1    0    1
 −13  −12  −11  −10   −9   −8   −7   −6   −5   −4   −3   −2   −1    0


Semi-local string comparison
Score matrices and seaweed matrices


Semi-local LCS: output representation and running time
size               query time
O(n²)              O(1)                                        trivial
O(m^(1/2) n)       O(log n)     string-substring               [Alves+: 2003]
O(n)               O(n)         string-substring               [Alves+: 2005]
O(n log n)         O(log² n)                                   [T: 2006]
  . . . or any 2D orthogonal range counting data structure
running time
O(mn²)                                                         naive
O(mn)                           string-substring               [Schmidt: 1998; Alves+: 2005]
O(mn)                                                          [T: 2006]
O(mn / log^0.5 n)                                              [T: 2006]
O(mn (log log n)² / log² n)                                    [T: 2007]

Semi-local string comparison
Score matrices and seaweed matrices


The score matrix H and the seaweed matrix P
H(i, j): the number of matched characters for a vs substring b⟨i : j⟩
j − i − H(i, j): the number of unmatched characters
Properties of matrix j − i − H(i, j):
      simple unit-Monge
      therefore, = $P^\Sigma$, where $P = -H^\square$ is a permutation matrix
P is the seaweed matrix, giving an implicit representation of H
Range tree for P: memory O(n log n), query time O(log² n)
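
The relation between H and P can be checked by brute force on the running example. A sketch under stated assumptions: it covers only the string-substring part of the matrix (indices 0 ≤ i, j ≤ n), where P restricts to a 0/1 matrix with at most one nonzero per row and column; H is filled by O(n²) separate LCS computations rather than by any of the fast algorithms cited in the talk; all function names are illustrative.

```python
from typing import List


def lcs_score(a: str, b: str) -> int:
    """Plain O(|a||b|) dynamic programming, used here as a black box."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cur[j] = prev[j - 1] + 1 if ch == b[j - 1] else max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def score_matrix(a: str, b: str) -> List[List[int]]:
    """H(i, j) = LCS(a, b[i:j]) for i <= j, and j - i for i > j."""
    n = len(b)
    H = [[j - i for j in range(n + 1)] for i in range(n + 1)]
    for i in range(n + 1):
        for j in range(i, n + 1):
            H[i][j] = lcs_score(a, b[i:j])
    return H


def seaweed_cells(H: List[List[int]]) -> List[List[int]]:
    """P = -H^square; cell [r][c] stands for index (r + 1/2, c + 1/2)."""
    n = len(H) - 1
    return [[-(H[r][c + 1] - H[r][c] - H[r + 1][c + 1] + H[r + 1][c])
             for c in range(n)]
            for r in range(n)]


def dominated(P: List[List[int]], i: int, j: int) -> int:
    """P^Sigma(i, j): nonzeros in cell rows >= i and cell columns < j."""
    return sum(P[r][c] for r in range(i, len(P)) for c in range(j))


a, b = "baabcbca", "baabcabcabaca"
H = score_matrix(a, b)
P = seaweed_cells(H)
h_4_11 = 11 - 4 - dominated(P, 4, 11)   # recovers H(4, 11) from P alone
```

In practice `dominated` would be replaced by a range counting structure over the nonzeros of P, so that H never has to be stored explicitly.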




Semi-local string comparison
Score matrices and seaweed matrices


The score matrix H and the seaweed matrix P
[Figure: the score matrix H for a = “baabcbca”, b = “baabcabcabaca”, with the nonzeros of P marked between its entries]
b′ = b⟨4 : 11⟩ = “cabcaba”
H(4, 11) = LCS(a, b′) = 5
H(i, j) = j − i if i > j
blue: difference in H is 0
red: difference in H is 1
green: P(i, j) = 1
H(i, j) = j − i − $P^\Sigma$(i, j)


Semi-local string comparison
Score matrices and seaweed matrices


The score matrix H and the seaweed matrix P
[Figure: the nonzeros of the seaweed matrix P]
a = “baabcbca”
b = “baabcabcabaca”
b′ = b⟨4 : 11⟩ = “cabcaba”
H(4, 11) = LCS(a, b′) = 11 − 4 − $P^\Sigma$(4, 11) = 11 − 4 − 2 = 5




Semi-local string comparison
Score matrices and seaweed matrices


The seaweeds in the alignment graph
[Figure: alignment graph with b = “baabcabcabaca” along the top and a = “baabcbca” down the side, with seaweeds drawn through it]
a = “baabcbca”
b = “baabcabcabaca”
b′ = b⟨4 : 11⟩ = “cabcaba”
H(4, 11) = LCS(a, b′) = 11 − 4 − $P^\Sigma$(4, 11) = 11 − 4 − 2 = 5
P(i, j) = 1 corresponds to seaweed (top, i) ⇝ (bottom, j)
Also define top ⇝ right, left ⇝ right, left ⇝ bottom seaweeds
Gives bijection between top-left and bottom-right borders

Matrix distance multiplication
Seaweed braids


Distance algebra (a.k.a. (min, +) or tropical algebra): ⊕ is min, ⊙ is +
Matrix ⊙-multiplication
$A \odot B = C$          $C(i, k) = \bigoplus_j A(i, j) \odot B(j, k) = \min_j \bigl(A(i, j) + B(j, k)\bigr)$
Matrix classes closed under ⊙-multiplication (for given n):
       general numerical (integer, real) matrices
       Monge matrices
       simple unit-Monge matrices
$P_A^\Sigma \odot P_B^\Sigma = P_C^\Sigma$ written as $P_A \boxdot P_B = P_C$
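
The implicit product can be sketched by going through the explicit distribution matrices. This is the naive cubic-time route, shown only to make the definition concrete; it is not a fast multiplication algorithm. The function names and the three sample matrices are illustrative.

```python
from typing import List

Matrix = List[List[int]]


def distribution(P: Matrix) -> Matrix:
    """P^Sigma(i, j): sum of P over cell rows >= i and cell columns < j."""
    n = len(P)
    return [[sum(P[r][c] for r in range(i, n) for c in range(j))
             for j in range(n + 1)]
            for i in range(n + 1)]


def density(E: Matrix) -> Matrix:
    """E^square: four-point difference around each half-integer cell."""
    n = len(E) - 1
    return [[E[r][c + 1] - E[r][c] - E[r + 1][c + 1] + E[r + 1][c]
             for c in range(n)]
            for r in range(n)]


def min_plus(A: Matrix, B: Matrix) -> Matrix:
    """Distance product: C(i, k) = min over j of A(i, j) + B(j, k)."""
    return [[min(A[i][j] + B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))]
            for i in range(len(A))]


def implicit_product(PA: Matrix, PB: Matrix) -> Matrix:
    """P_C with P_C^Sigma equal to the distance product of P_A^Sigma and P_B^Sigma."""
    return density(min_plus(distribution(PA), distribution(PB)))


x = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
one = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]    # identity permutation matrix
zero = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]   # reversal permutation matrix
xx = implicit_product(x, x)                # a single crossing applied twice
```

On these inputs the result of `implicit_product` is again a permutation matrix, in line with the closure of simple unit-Monge matrices under distance multiplication.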




Matrix distance multiplication
Seaweed braids


The seaweed monoid T_n:
      simple unit-Monge matrices under ⊙-multiplication
      permutation matrices under ⊡-multiplication
Identity: 1 ⊡ x = x                  Zero: 0 ⊡ x = 0
      [ • · · · ]                          [ · · · • ]
  1 = [ · • · · ]                      0 = [ · · • · ]
      [ · · • · ]                          [ · • · · ]
      [ · · · • ]                          [ • · · · ]




Matrix distance multiplication
Seaweed braids


PA           PB = PC can be seen as                       -multiplication of seaweed braids

     •                             •                           •
         •                •                               •
                 •            •                                        •
 •                                     •              •
                     •                     •                       •
             •                    •                           •
         PA                       PB                          PC


PA

                                                                                              PC

PB



     Alexander Tiskin (Warwick)                Approximate matching in GC-strings              27 / 52
Matrix distance multiplication
Seaweed braids


Seaweed braids: similar to standard braids, generated by crossings
Unlike in standard braids, all seaweed crossings are
       transversal, i.e. on one level (not underpass/overpass)
       idempotent, i.e. two seaweeds can cross at most once
Seaweed braid $\boxdot$-multiplication: associative, no inverse (a crossing cannot
be undone)
Identity: $1 \boxdot x = x$                  Zero: $0 \boxdot x = 0$

[Figure: the identity braid 1 and the zero braid 0]

Matrix distance multiplication
Seaweed braids


The seaweed monoid $T_n$:
      $n!$ elements (permutations of size $n$)
      $n - 1$ generators $g_1, g_2, \ldots, g_{n-1}$ (elementary crossings)
idempotence: $g_i^2 = g_i$ for all $i$
far commutativity: $g_i g_j = g_j g_i$ for $j - i > 1$
braid relations: $g_i g_j g_i = g_j g_i g_j$ for $j - i = 1$

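The defining relations can be checked mechanically. The following is a minimal Python sketch, assuming a braid on n seaweeds is stored as a tuple whose k-th entry is the start index of the seaweed currently at position k. The names `apply_crossing`, `braid_from_word` and `monoid_elements` are illustrative and do not come from the talk; generators are numbered from 0 here.

```python
def apply_crossing(braid, i):
    """Append generator g_i (0-based) to a seaweed braid.

    braid[k] is the start index of the seaweed now at position k.
    Two adjacent seaweeds have already crossed iff they are out of order,
    so the crossing is applied only when braid[i] < braid[i + 1].
    """
    b = list(braid)
    if b[i] < b[i + 1]:
        b[i], b[i + 1] = b[i + 1], b[i]
    return tuple(b)


def braid_from_word(word, n):
    """Evaluate a word in the generators, given as a sequence of indices."""
    braid = tuple(range(n))  # identity braid: no crossings
    for i in word:
        braid = apply_crossing(braid, i)
    return braid


def monoid_elements(n):
    """Enumerate all braids reachable from the identity."""
    seen = {tuple(range(n))}
    frontier = list(seen)
    while frontier:
        nxt = []
        for b in frontier:
            for i in range(n - 1):
                c = apply_crossing(b, i)
                if c not in seen:
                    seen.add(c)
                    nxt.append(c)
        frontier = nxt
    return seen
```

For n = 3, `braid_from_word((0, 1, 0), 3)` and `braid_from_word((1, 0, 1), 3)` both give `(2, 1, 0)`, the zero braid in which every pair has crossed, and `len(monoid_elements(4))` is 24 = 4!.
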
Matrix distance multiplication
Seaweed braids


The seaweed monoid $T_n$
Also known as the 0-Hecke monoid of the symmetric group, $H_0(S_n)$
Generalisations:
      general 0-Hecke monoids                    [Fomin, Greene: 1998; Buch+: 2008]
      Coxeter monoids              [Tsaranov: 1990; Richardson, Springer: 1990]

Matrix distance multiplication
Seaweed braids


Computation in the seaweed monoid: a confluent rewriting system can be
obtained by software (Semigroupe, GAP)

$T_3$: 1, a = $g_1$, b = $g_2$; ab, ba, aba = 0
      aa → a      bb → b      bab → 0      aba → 0

$T_4$: 1, a = $g_1$, b = $g_2$, c = $g_3$; ab, ac, ba, bc, cb, aba, abc, acb, bac,
bcb, cba, abac, abcb, acba, bacb, bcba, abacb, abcba, bacba, abacba = 0
      aa → a      ca → ac      bab → aba      cbac → bcba
      bb → b      cc → c       cbc → bcb      abacba → 0

Easy to use, but not an efficient algorithm

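The $T_4$ rules can be applied by plain string rewriting. The sketch below is one possible Python rendering, under these assumptions: the empty string stands for the identity 1, the zero element is kept as the word `abacba` (so the naming rule abacba → 0 is not applied), and the function names are hypothetical. It does not use Semigroupe or GAP.

```python
from itertools import product

# Rewriting rules for T4 as listed on the slide; zero is written "abacba".
T4_RULES = [
    ("aa", "a"), ("bb", "b"), ("cc", "c"),
    ("ca", "ac"),
    ("bab", "aba"), ("cbc", "bcb"),
    ("cbac", "bcba"),
]


def normal_form(word, rules=T4_RULES):
    """Rewrite `word` until no rule applies.

    Every rule either shortens the word or makes it lexicographically
    smaller at the same length, so the loop terminates.
    """
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs in word:
                word = word.replace(lhs, rhs, 1)  # rewrite leftmost occurrence
                changed = True
                break
    return word


def all_normal_forms(max_len=6, alphabet="abc"):
    """Normal forms of all words up to the given length."""
    return {
        normal_form("".join(w))
        for k in range(max_len + 1)
        for w in product(alphabet, repeat=k)
    }
```

`len(all_normal_forms())` is 24, matching the 4! elements listed for $T_4$, and `normal_form("cbca")` gives `"bcba"`.
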
Matrix distance multiplication
Implicit unit-Monge $\boxdot$-multiplication

The implicit unit-Monge matrix $\boxdot$-multiplication problem
Given permutation matrices $P_A$, $P_B$, compute $P_C$, such that
$P_A^\Sigma \odot P_B^\Sigma = P_C^\Sigma$ (equivalently, $P_A \boxdot P_B = P_C$)

Matrix $\odot$-multiplication: running time
type                    time                                   reference
general                 $O(n^3)$                               standard
                        $O(n^3 (\log\log n)^3 / \log^2 n)$     [Chan: 2007]
Monge                   $O(n^2)$                               via [Aggarwal+: 1987]
implicit unit-Monge     $O(n^{1.5})$                           [T: 2006]
                        $O(n \log n)$                          [T: 2010]

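As a reference point for the faster algorithms in the table, the problem can be solved naively in cubic time. The sketch below is a brute-force Python version, not the $O(n \log n)$ method. It assumes a permutation matrix is given as a list `p` with the nonzero of row `r` in column `p[r]`, and it assumes the distribution matrix convention $P^\Sigma(i, j)$ = number of nonzeros in rows $\ge i$ and columns $< j$; both the representation and the function names are assumptions of this sketch.

```python
def distribution(p):
    """D[i][j] = number of nonzeros (r, p[r]) with r >= i and p[r] < j."""
    n = len(p)
    d = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(n + 1):
            d[i][j] = d[i + 1][j] + (1 if p[i] < j else 0)
    return d


def distance_product(a, b):
    """Min-plus product: C[i][k] = min over j of A[i][j] + B[j][k]."""
    size = len(a)
    return [
        [min(a[i][j] + b[j][k] for j in range(size)) for k in range(size)]
        for i in range(size)
    ]


def permutation_from_distribution(d):
    """Recover p from D by taking cross-differences."""
    n = len(d) - 1
    p = []
    for r in range(n):
        for c in range(n):
            if d[r][c + 1] - d[r][c] - d[r + 1][c + 1] + d[r + 1][c] == 1:
                p.append(c)
    return p


def implicit_product(pa, pb):
    """Naive O(n^3) reference for the implicit unit-Monge product."""
    return permutation_from_distribution(
        distance_product(distribution(pa), distribution(pb))
    )
```

With this convention `implicit_product([1, 0, 2], [0, 2, 1])` gives `[2, 0, 1]`, the identity permutation `[0, 1, 2]` acts as 1, and the reversal `[2, 1, 0]` acts as 0.
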
Matrix distance multiplication
Implicit unit-Monge $\boxdot$-multiplication

[Figure: permutation matrices $P_A$ and $P_B$ are given; the nonzeros of $P_C$ are to be found]

Matrix distance multiplication
Implicit unit-Monge $\boxdot$-multiplication

[Figure: $P_A$ split into $P_{A,lo}$, $P_{A,hi}$ and $P_B$ split into $P_{B,lo}$, $P_{B,hi}$; the two subproblems give $P_{C,lo}$ and $P_{C,hi}$, shown together as $P_{C,lo} + P_{C,hi}$]

Matrix distance multiplication
Implicit unit-Monge $\boxdot$-multiplication

[Figure: from $P_{C,lo} + P_{C,hi}$ to the final $P_C$]

Matrix distance multiplication
Implicit unit-Monge $\boxdot$-multiplication

Implicit unit-Monge matrix $\boxdot$-multiplication: the algorithm
$P_C^\Sigma(i, k) = \min_j \bigl( P_A^\Sigma(i, j) + P_B^\Sigma(j, k) \bigr)$
Divide-and-conquer on the range of $j$
Divide $P_A$ horizontally, $P_B$ vertically; two subproblems of effective size $n/2$:
$P_{A,lo}^\Sigma \odot P_{B,lo}^\Sigma = P_{C,lo}^\Sigma$        $P_{A,hi}^\Sigma \odot P_{B,hi}^\Sigma = P_{C,hi}^\Sigma$
Conquer: most (but not all!) nonzeros of $P_{C,lo}$, $P_{C,hi}$ appear in $P_C$
Missing nonzeros can be obtained in time $O(n)$ using the Monge property
Overall time $O(n \log n)$

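The overall bound follows from the usual divide-and-conquer recurrence. As a worked step, assuming $T(n)$ denotes the time to multiply two $n \times n$ permutation matrices and the conquer phase costs $O(n)$ as stated:

```latex
T(n) = 2\,T(n/2) + O(n), \qquad T(1) = O(1)
\quad\Longrightarrow\quad
T(n) = \sum_{l=0}^{\log_2 n} 2^{l} \cdot O\!\left(\frac{n}{2^{l}}\right) = O(n \log n).
```

Each of the $\log_2 n + 1$ recursion levels contributes $O(n)$ work in total.
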
Compressed string comparison
Grammar compression


Notation: text $t$ of length $n$; pattern $p$ of length $m$
A GC-string (grammar-compressed string) $t$ is a straight-line program
(context-free grammar) generating $t = t_{\bar n}$ by $\bar n$ assignments of the form
      $t_k = \alpha$, where $\alpha$ is an alphabet character
      $t_k = t_i t_j$, where $i, j < k$
In general, $n = O(2^{\bar n})$

Example: Fibonacci string “abaababaabaab”
$t_1$ = ‘b’      $t_2$ = ‘a’
$t_3 = t_2 t_1$      $t_4 = t_3 t_2$      $t_5 = t_4 t_3$      $t_6 = t_5 t_4$      $t_7 = t_6 t_5$

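The Fibonacci example can be written down directly as a straight-line program. The Python sketch below assumes a dictionary representation in which rule `k` is either a single character or a pair `(i, j)` meaning $t_k = t_i t_j$; this representation and the function names are illustrative choices, not part of the talk.

```python
# Straight-line program for the Fibonacci string from the slide.
FIBONACCI = {
    1: "b",
    2: "a",
    3: (2, 1),
    4: (3, 2),
    5: (4, 3),
    6: (5, 4),
    7: (6, 5),
}


def slp_lengths(rules):
    """Length of every t_k, computed without expanding the strings."""
    length = {}
    for k in sorted(rules):  # rules only refer to smaller indices
        rhs = rules[k]
        if isinstance(rhs, str):
            length[k] = 1
        else:
            length[k] = length[rhs[0]] + length[rhs[1]]
    return length


def slp_expand(rules, k):
    """Expand t_k into a plain string (can be exponentially long)."""
    rhs = rules[k]
    if isinstance(rhs, str):
        return rhs
    i, j = rhs
    return slp_expand(rules, i) + slp_expand(rules, j)
```

`slp_expand(FIBONACCI, 7)` gives `"abaababaabaab"`, and `slp_lengths(FIBONACCI)[7]` is 13, obtained from 7 rules without building the string.
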
Compressed string comparison
Grammar compression


Grammar compression covers various compression types, e.g. LZ78, LZW
(not LZ77 directly)
Simplifying assumption: arithmetic on integers up to $n$ runs in $O(1)$
This assumption can be removed by careful index remapping

Compressed string comparison
Three-way semi-local LCS on GC-strings


LCS: running time
t         p         time                             notes           reference
plain     plain     $O(mn)$                                          [Wagner, Fischer: 1974]
                    $O(mn / \log^2 m)$                               [Masek, Paterson: 1980]
                                                                     [Crochemore+: 2003]
GC        plain     $O(m^3 \bar n + \ldots)$         general CFG     [Myers: 1995]
                    $O(m^{1.5} \bar n)$              3-way semi      [T: 2008]
                    $O(m \log m \cdot \bar n)$       3-way semi      [T: NEW]
GC        GC        NP-hard                                          [Lifshits: 2005]
                    $O(r^{1.2} \bar r^{1.4})$                        [Hermelin+: 2009]
                    $O(r \log r \cdot \bar r)$                       [T: NEW]

$r = m + n$                $\bar r = \bar m + \bar n$

Compressed string comparison
Three-way semi-local LCS on GC-strings


Three-way semi-local LCS (GC text, plain pattern): the algorithm
For every $k$, compute by recursion the three-way seaweed matrix for $p$ vs
$t_k$, using seaweed matrix $\boxdot$-multiplication: time $O(m \log m \cdot \bar n)$
Overall time $O(m \log m \cdot \bar n)$

Compressed string comparison
Subsequence recognition on GC-strings


The global subsequence recognition problem
Does text t contain pattern p as a subsequence?

Global subsequence recognition: running time
t         p         time              reference
plain     plain     $O(n)$            greedy
GC        plain     $O(m \bar n)$     greedy
GC        GC        NP-hard           [Lifshits: 2005]

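One way to realise the greedy $O(m \bar n)$ bound for a GC text and a plain pattern is to tabulate, for every rule, how far the greedy scan advances in the pattern from each starting position. The Python sketch below assumes the same dictionary representation of a straight-line program as a hypothetical convention (rule `k` is a character or a pair `(i, j)`), and takes the highest-numbered rule as the text; it is an illustration, not necessarily the formulation used in the talk.

```python
# Straight-line program for the Fibonacci string "abaababaabaab".
FIBONACCI = {1: "b", 2: "a", 3: (2, 1), 4: (3, 2), 5: (4, 3), 6: (5, 4), 7: (6, 5)}


def gc_is_subsequence(rules, pattern):
    """Decide whether `pattern` is a subsequence of the text of the SLP.

    advance[k][i] is the pattern position reached after greedily matching
    t_k against pattern[i:]. Each rule costs O(m), so the total is O(m * n_bar).
    """
    m = len(pattern)
    advance = {}
    for k in sorted(rules):  # rules only refer to smaller indices
        rhs = rules[k]
        if isinstance(rhs, str):
            advance[k] = [
                i + 1 if i < m and pattern[i] == rhs else i
                for i in range(m + 1)
            ]
        else:
            left, right = advance[rhs[0]], advance[rhs[1]]
            # Scan the left half first, then continue in the right half.
            advance[k] = [right[left[i]] for i in range(m + 1)]
    return advance[max(rules)][0] == m
```

The Fibonacci text contains five occurrences of `b`, so `gc_is_subsequence(FIBONACCI, "bbbbb")` is `True` and `gc_is_subsequence(FIBONACCI, "bbbbbb")` is `False`.
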
Compressed string comparison
Subsequence recognition on GC-strings


The local subsequence recognition problem
Find all minimally matching substrings of t with respect to p

A substring of $t$ is matching if $p$ is a subsequence of that substring
A matching substring of $t$ is minimally matching if none of its proper
substrings is matching

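The definition can be made concrete on plain strings. The Python sketch below is a naive quadratic reference, not one of the algorithms cited in the talk; it assumes a non-empty pattern and reports substrings as half-open index pairs `(i, j)` meaning `text[i:j]`. The function name is hypothetical.

```python
def minimally_matching(text, pattern):
    """Return all (i, j) such that text[i:j] is minimally matching."""
    n = len(text)

    def earliest_end(start):
        # Greedy scan: smallest j with pattern a subsequence of text[start:j].
        k = 0
        for j in range(start, n):
            if text[j] == pattern[k]:
                k += 1
                if k == len(pattern):
                    return j + 1
        return None

    # ends[i] is nondecreasing in i wherever it is defined.
    ends = [earliest_end(i) for i in range(n)] + [None]
    result = []
    for i in range(n):
        # text[i:ends[i]] is minimal iff dropping its first character
        # forces a later end (or loses the match altogether).
        if ends[i] is not None and ends[i] != ends[i + 1]:
            result.append((i, ends[i]))
    return result
```

For example, `minimally_matching("abaab", "ab")` gives `[(0, 2), (3, 5)]`: the substring `"aab"` at `(2, 5)` is matching but not minimal, since it contains `"ab"`.
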
Compressed string comparison
Subsequence recognition on GC-strings


Local subsequence recognition: running time (+ output)
t         p         time                             reference
plain     plain     $O(mn)$                          [Mannila+: 1995]
                    $O(mn / \log m)$                 [Das+: 1997]
                    $O(c^m + n)$                     [Boasson+: 2001]
                    $O(m + n\sigma)$                 [Troníček: 2001]
GC        plain     $O(m^2 \log m \cdot \bar n)$     [Cégielski+: 2006]
                    $O(m^{1.5} \bar n)$              [T: 2008]
                    $O(m \log m \cdot \bar n)$       [T: NEW]
GC        GC        NP-hard                          [Lifshits: 2005]

Compressed string comparison
Subsequence recognition on GC-strings


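[Figure: seaweeds with start points $\hat\imath_{1/2}, \hat\imath_{3/2}, \hat\imath_{5/2}$ and end points $\hat\jmath_{1/2}, \hat\jmath_{3/2}, \hat\jmath_{5/2}$ over the text range $0 \ldots n$]

$b\langle i : j \rangle$ matching iff box $[i : j]$ not pierced left-to-right
$\gtrless$-maximal seaweeds: $\ll$-chain
$(\hat\imath_{1/2}, \hat\jmath_{1/2}) \ll (\hat\imath_{3/2}, \hat\jmath_{3/2}) \ll \cdots \ll (\hat\imath_{s-1/2}, \hat\jmath_{s-1/2})$
$b\langle i : j \rangle$ minimally matching iff $(i, j)$ is in the interleaved $\ll$-chain
$(\hat\imath_{3/2}, \hat\jmath_{1/2}) \ll (\hat\imath_{5/2}, \hat\jmath_{3/2}) \ll \cdots \ll (\hat\imath_{s-1/2}, \hat\jmath_{s-3/2})$
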
Compressed string comparison
Subsequence recognition on GC-strings


Local subsequence recognition (GC text, plain pattern): the algorithm
For every $k$, compute by recursion the three-way seaweed matrix for $p$ vs
$t_k$, using seaweed matrix $\boxdot$-multiplication: time $O(m \log m \cdot \bar n)$
Given an assignment $t = t' t''$, count by recursion
      minimally matching substrings in $t'$
      minimally matching substrings in $t''$
Then, find $\ll$-chain of $\gtrless$-maximal seaweeds in time $\bar n \cdot O(m) = O(m \bar n)$
The interleaved $\ll$-chain defines minimally matching substrings in $t$
overlapping both $t'$ and $t''$
Overall time $O(m \log m \cdot \bar n) + O(m \bar n) = O(m \log m \cdot \bar n)$

Compressed string comparison
Subsequence recognition on GC-strings


The threshold approximate matching problem
Find all matching substrings of t with respect to p, according to a
threshold k

A substring of $t$ is matching if the edit distance between $p$ and that substring is at most $k$

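For plain text and plain pattern, the classical $O(mn)$ dynamic programming approach to this problem keeps one column of edit distances and lets a match start anywhere in the text. The Python sketch below follows that standard scheme with unit costs for insertion, deletion and substitution; the function name and the choice to report end positions only are assumptions made for illustration.

```python
def threshold_matches(text, pattern, k):
    """End positions j such that some text[i:j] is within edit distance k of pattern."""
    m = len(pattern)
    # column[i] = min edit distance between pattern[:i] and a suffix
    # of the text read so far.
    column = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = column[0]  # entry for pattern[:0] in the previous column
        column[0] = 0          # a match may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            current = min(
                prev_diag + cost,   # match or substitution
                column[i] + 1,      # skip a text character
                column[i - 1] + 1,  # skip a pattern character
            )
            prev_diag = column[i]
            column[i] = current
        if column[m] <= k:
            ends.append(j)
    return ends
```

For example, `threshold_matches("abcdefg", "cxe", 1)` gives `[5]`: the substring `"cde"` ending at position 5 is one substitution away from the pattern.
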
Compressed string comparison
Subsequence recognition on GC-strings


Threshold approximate matching: running time (+ output)
t         p         time                                           reference
plain     plain     $O(mn)$                                        [Sellers: 1980]
                    $O(nk)$                                        [Landau, Vishkin: 1989]
                    $O(m + n + nk^4/m)$                            [Cole, Hariharan: 2002]
GC        plain     $O(m \bar n k^2)$                              [Kärkkäinen+: 2003]
                    $O(m \bar n k + \bar n \log n)$                [LV: 1989] via [Bille+: 2010]
                    $O(m \bar n + \bar n k^4 + \bar n \log n)$     [CH: 2002] via [Bille+: 2010]
                    $O(m \log m \cdot \bar n)$                     [T: NEW]
GC        GC        NP-hard                                        [Lifshits: 2005]

(Also many specialised variants for LZ compression)

Compressed string comparison
Subsequence recognition on GC-strings


Threshold approximate matching (GC text, plain pattern): the
algorithm
Algorithm structure similar to local subsequence recognition by seaweed
matrix $\boxdot$-multiplication and seaweed $\ll$-chains
Extra ingredients:
      the blow-up technique: reduction of edit distances to LCS scores
      the “implicit SMAWK” technique: row minima in an implicit Monge
      matrix by an extension of the classical “SMAWK” algorithm; replaces
      $\ll$-chain interleaving
Overall time $O(m \log m \cdot \bar n) + O(m \bar n) = O(m \log m \cdot \bar n)$

Conclusions and future work

A powerful alternative to dynamic programming
Implicit unit-Monge matrices:
     the seaweed monoid
     distance multiplication in time $O(n \log n)$
     next: lower bound?
Semi-local LCS problem:
     representation by implicit unit-Monge matrices
     generalisation to rational alignment scores
     next: real alignment scores?
Approximate matching in GC-text in time $O(m \log m \cdot \bar n)$
Other applications:
     maximum clique in a circle graph in time $O(n \log^2 n)$
     parallel LCS in time $O(mn/p)$, comm $O((m+n)/p^{1/2})$ per processor
     identification of evolutionary-conserved regions in DNA

Csr2011 june17 17_00_likhomanovCSR2011
 
Csr2011 june14 11_30_winzen_rotated
Csr2011 june14 11_30_winzen_rotatedCsr2011 june14 11_30_winzen_rotated
Csr2011 june14 11_30_winzen_rotatedCSR2011
 
Nadia y Coraima violence against women
Nadia y Coraima violence against womenNadia y Coraima violence against women
Nadia y Coraima violence against womenGloria Osuna Velasco
 
Csr2011 june17 11_30_vyalyi
Csr2011 june17 11_30_vyalyiCsr2011 june17 11_30_vyalyi
Csr2011 june17 11_30_vyalyiCSR2011
 
Layla的故事(下)
Layla的故事(下)Layla的故事(下)
Layla的故事(下)shihfang Ma
 

En vedette (8)

Csr2011 june14 16_30_ibsen-jensen
Csr2011 june14 16_30_ibsen-jensenCsr2011 june14 16_30_ibsen-jensen
Csr2011 june14 16_30_ibsen-jensen
 
Y7 Module 1 Vocabulary
Y7 Module 1 VocabularyY7 Module 1 Vocabulary
Y7 Module 1 Vocabulary
 
Csr2011 june16 12_00_wagner
Csr2011 june16 12_00_wagnerCsr2011 june16 12_00_wagner
Csr2011 june16 12_00_wagner
 
Csr2011 june17 17_00_likhomanov
Csr2011 june17 17_00_likhomanovCsr2011 june17 17_00_likhomanov
Csr2011 june17 17_00_likhomanov
 
Csr2011 june14 11_30_winzen_rotated
Csr2011 june14 11_30_winzen_rotatedCsr2011 june14 11_30_winzen_rotated
Csr2011 june14 11_30_winzen_rotated
 
Nadia y Coraima violence against women
Nadia y Coraima violence against womenNadia y Coraima violence against women
Nadia y Coraima violence against women
 
Csr2011 june17 11_30_vyalyi
Csr2011 june17 11_30_vyalyiCsr2011 june17 11_30_vyalyi
Csr2011 june17 11_30_vyalyi
 
Layla的故事(下)
Layla的故事(下)Layla的故事(下)
Layla的故事(下)
 

Similaire à Csr2011 june18 11_00_tiskin

IIT JAM Math 2022 Question Paper | Sourav Sir's Classes
IIT JAM Math 2022 Question Paper | Sourav Sir's ClassesIIT JAM Math 2022 Question Paper | Sourav Sir's Classes
IIT JAM Math 2022 Question Paper | Sourav Sir's ClassesSOURAV DAS
 
Application of Cylindrical and Spherical coordinate system in double-triple i...
Application of Cylindrical and Spherical coordinate system in double-triple i...Application of Cylindrical and Spherical coordinate system in double-triple i...
Application of Cylindrical and Spherical coordinate system in double-triple i...Sonendra Kumar Gupta
 
IIT Jam math 2016 solutions BY Trajectoryeducation
IIT Jam math 2016 solutions BY TrajectoryeducationIIT Jam math 2016 solutions BY Trajectoryeducation
IIT Jam math 2016 solutions BY TrajectoryeducationDev Singh
 
Cosmin Crucean: Perturbative QED on de Sitter Universe.
Cosmin Crucean: Perturbative QED on de Sitter Universe.Cosmin Crucean: Perturbative QED on de Sitter Universe.
Cosmin Crucean: Perturbative QED on de Sitter Universe.SEENET-MTP
 
Last+minute+revision(+Final)+(1) (1).pptx
Last+minute+revision(+Final)+(1) (1).pptxLast+minute+revision(+Final)+(1) (1).pptx
Last+minute+revision(+Final)+(1) (1).pptxAryanMishra860130
 
WE4.L09 - MEAN-SHIFT AND HIERARCHICAL CLUSTERING FOR TEXTURED POLARIMETRIC SA...
WE4.L09 - MEAN-SHIFT AND HIERARCHICAL CLUSTERING FOR TEXTURED POLARIMETRIC SA...WE4.L09 - MEAN-SHIFT AND HIERARCHICAL CLUSTERING FOR TEXTURED POLARIMETRIC SA...
WE4.L09 - MEAN-SHIFT AND HIERARCHICAL CLUSTERING FOR TEXTURED POLARIMETRIC SA...grssieee
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster AnalysisSSA KPI
 
09 p.t (straight line + circle) solution
09 p.t (straight line + circle) solution09 p.t (straight line + circle) solution
09 p.t (straight line + circle) solutionAnamikaRoy39
 
11.0001www.iiste.org call for paper.differential approach to cardioid distrib...
11.0001www.iiste.org call for paper.differential approach to cardioid distrib...11.0001www.iiste.org call for paper.differential approach to cardioid distrib...
11.0001www.iiste.org call for paper.differential approach to cardioid distrib...Alexander Decker
 
1.differential approach to cardioid distribution -1-6
1.differential approach to cardioid distribution -1-61.differential approach to cardioid distribution -1-6
1.differential approach to cardioid distribution -1-6Alexander Decker
 
Applied Mathematics Multiple Integration by Mrs. Geetanjali P.Kale.pdf
Applied Mathematics Multiple Integration by Mrs. Geetanjali P.Kale.pdfApplied Mathematics Multiple Integration by Mrs. Geetanjali P.Kale.pdf
Applied Mathematics Multiple Integration by Mrs. Geetanjali P.Kale.pdfGeetanjaliRao6
 
3 4 character tables
3 4 character tables3 4 character tables
3 4 character tablesAlfiya Ali
 
Convergence of ABC methods
Convergence of ABC methodsConvergence of ABC methods
Convergence of ABC methodsChristian Robert
 
Diffusion kernels on SNP data embedded in a non-Euclidean metric
Diffusion kernels on SNP data embedded in a non-Euclidean metricDiffusion kernels on SNP data embedded in a non-Euclidean metric
Diffusion kernels on SNP data embedded in a non-Euclidean metricGota Morota
 

Similaire à Csr2011 june18 11_00_tiskin (20)

IIT JAM Math 2022 Question Paper | Sourav Sir's Classes
IIT JAM Math 2022 Question Paper | Sourav Sir's ClassesIIT JAM Math 2022 Question Paper | Sourav Sir's Classes
IIT JAM Math 2022 Question Paper | Sourav Sir's Classes
 
CLIM Fall 2017 Course: Statistics for Climate Research, Nonstationary Covaria...
CLIM Fall 2017 Course: Statistics for Climate Research, Nonstationary Covaria...CLIM Fall 2017 Course: Statistics for Climate Research, Nonstationary Covaria...
CLIM Fall 2017 Course: Statistics for Climate Research, Nonstationary Covaria...
 
Application of Cylindrical and Spherical coordinate system in double-triple i...
Application of Cylindrical and Spherical coordinate system in double-triple i...Application of Cylindrical and Spherical coordinate system in double-triple i...
Application of Cylindrical and Spherical coordinate system in double-triple i...
 
Graph
GraphGraph
Graph
 
IIT Jam math 2016 solutions BY Trajectoryeducation
IIT Jam math 2016 solutions BY TrajectoryeducationIIT Jam math 2016 solutions BY Trajectoryeducation
IIT Jam math 2016 solutions BY Trajectoryeducation
 
Cosmin Crucean: Perturbative QED on de Sitter Universe.
Cosmin Crucean: Perturbative QED on de Sitter Universe.Cosmin Crucean: Perturbative QED on de Sitter Universe.
Cosmin Crucean: Perturbative QED on de Sitter Universe.
 
Last+minute+revision(+Final)+(1) (1).pptx
Last+minute+revision(+Final)+(1) (1).pptxLast+minute+revision(+Final)+(1) (1).pptx
Last+minute+revision(+Final)+(1) (1).pptx
 
WE4.L09 - MEAN-SHIFT AND HIERARCHICAL CLUSTERING FOR TEXTURED POLARIMETRIC SA...
WE4.L09 - MEAN-SHIFT AND HIERARCHICAL CLUSTERING FOR TEXTURED POLARIMETRIC SA...WE4.L09 - MEAN-SHIFT AND HIERARCHICAL CLUSTERING FOR TEXTURED POLARIMETRIC SA...
WE4.L09 - MEAN-SHIFT AND HIERARCHICAL CLUSTERING FOR TEXTURED POLARIMETRIC SA...
 
Expo
ExpoExpo
Expo
 
A superglue for string comparison
A superglue for string comparisonA superglue for string comparison
A superglue for string comparison
 
Holographic Cotton Tensor
Holographic Cotton TensorHolographic Cotton Tensor
Holographic Cotton Tensor
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster Analysis
 
09 p.t (straight line + circle) solution
09 p.t (straight line + circle) solution09 p.t (straight line + circle) solution
09 p.t (straight line + circle) solution
 
11.0001www.iiste.org call for paper.differential approach to cardioid distrib...
11.0001www.iiste.org call for paper.differential approach to cardioid distrib...11.0001www.iiste.org call for paper.differential approach to cardioid distrib...
11.0001www.iiste.org call for paper.differential approach to cardioid distrib...
 
1.differential approach to cardioid distribution -1-6
1.differential approach to cardioid distribution -1-61.differential approach to cardioid distribution -1-6
1.differential approach to cardioid distribution -1-6
 
Applied Mathematics Multiple Integration by Mrs. Geetanjali P.Kale.pdf
Applied Mathematics Multiple Integration by Mrs. Geetanjali P.Kale.pdfApplied Mathematics Multiple Integration by Mrs. Geetanjali P.Kale.pdf
Applied Mathematics Multiple Integration by Mrs. Geetanjali P.Kale.pdf
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
3 4 character tables
3 4 character tables3 4 character tables
3 4 character tables
 
Convergence of ABC methods
Convergence of ABC methodsConvergence of ABC methods
Convergence of ABC methods
 
Diffusion kernels on SNP data embedded in a non-Euclidean metric
Diffusion kernels on SNP data embedded in a non-Euclidean metricDiffusion kernels on SNP data embedded in a non-Euclidean metric
Diffusion kernels on SNP data embedded in a non-Euclidean metric
 

Plus de CSR2011

Csr2011 june14 09_30_grigoriev
Csr2011 june14 09_30_grigorievCsr2011 june14 09_30_grigoriev
Csr2011 june14 09_30_grigorievCSR2011
 
Csr2011 june18 15_15_bomhoff
Csr2011 june18 15_15_bomhoffCsr2011 june18 15_15_bomhoff
Csr2011 june18 15_15_bomhoffCSR2011
 
Csr2011 june18 15_15_bomhoff
Csr2011 june18 15_15_bomhoffCsr2011 june18 15_15_bomhoff
Csr2011 june18 15_15_bomhoffCSR2011
 
Csr2011 june18 14_00_sudan
Csr2011 june18 14_00_sudanCsr2011 june18 14_00_sudan
Csr2011 june18 14_00_sudanCSR2011
 
Csr2011 june18 15_45_avron
Csr2011 june18 15_45_avronCsr2011 june18 15_45_avron
Csr2011 june18 15_45_avronCSR2011
 
Csr2011 june18 09_30_shpilka
Csr2011 june18 09_30_shpilkaCsr2011 june18 09_30_shpilka
Csr2011 june18 09_30_shpilkaCSR2011
 
Csr2011 june18 12_00_nguyen
Csr2011 june18 12_00_nguyenCsr2011 june18 12_00_nguyen
Csr2011 june18 12_00_nguyenCSR2011
 
Csr2011 june18 11_30_remila
Csr2011 june18 11_30_remilaCsr2011 june18 11_30_remila
Csr2011 june18 11_30_remilaCSR2011
 
Csr2011 june17 16_30_blin
Csr2011 june17 16_30_blinCsr2011 june17 16_30_blin
Csr2011 june17 16_30_blinCSR2011
 
Csr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhaninCsr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhaninCSR2011
 
Csr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhaninCsr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhaninCSR2011
 
Csr2011 june17 12_00_morin
Csr2011 june17 12_00_morinCsr2011 june17 12_00_morin
Csr2011 june17 12_00_morinCSR2011
 
Csr2011 june17 11_30_vyalyi
Csr2011 june17 11_30_vyalyiCsr2011 june17 11_30_vyalyi
Csr2011 june17 11_30_vyalyiCSR2011
 
Csr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonatiCsr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonatiCSR2011
 
Csr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatovCsr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatovCSR2011
 
Csr2011 june17 15_15_kaminski
Csr2011 june17 15_15_kaminskiCsr2011 june17 15_15_kaminski
Csr2011 june17 15_15_kaminskiCSR2011
 
Csr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatovCsr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatovCSR2011
 
Csr2011 june17 12_00_morin
Csr2011 june17 12_00_morinCsr2011 june17 12_00_morin
Csr2011 june17 12_00_morinCSR2011
 
Csr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonatiCsr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonatiCSR2011
 
Csr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhaninCsr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhaninCSR2011
 

Plus de CSR2011 (20)

Csr2011 june14 09_30_grigoriev
Csr2011 june14 09_30_grigorievCsr2011 june14 09_30_grigoriev
Csr2011 june14 09_30_grigoriev
 
Csr2011 june18 15_15_bomhoff
Csr2011 june18 15_15_bomhoffCsr2011 june18 15_15_bomhoff
Csr2011 june18 15_15_bomhoff
 
Csr2011 june18 15_15_bomhoff
Csr2011 june18 15_15_bomhoffCsr2011 june18 15_15_bomhoff
Csr2011 june18 15_15_bomhoff
 
Csr2011 june18 14_00_sudan
Csr2011 june18 14_00_sudanCsr2011 june18 14_00_sudan
Csr2011 june18 14_00_sudan
 
Csr2011 june18 15_45_avron
Csr2011 june18 15_45_avronCsr2011 june18 15_45_avron
Csr2011 june18 15_45_avron
 
Csr2011 june18 09_30_shpilka
Csr2011 june18 09_30_shpilkaCsr2011 june18 09_30_shpilka
Csr2011 june18 09_30_shpilka
 
Csr2011 june18 12_00_nguyen
Csr2011 june18 12_00_nguyenCsr2011 june18 12_00_nguyen
Csr2011 june18 12_00_nguyen
 
Csr2011 june18 11_30_remila
Csr2011 june18 11_30_remilaCsr2011 june18 11_30_remila
Csr2011 june18 11_30_remila
 
Csr2011 june17 16_30_blin
Csr2011 june17 16_30_blinCsr2011 june17 16_30_blin
Csr2011 june17 16_30_blin
 
Csr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhaninCsr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhanin
 
Csr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhaninCsr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhanin
 
Csr2011 june17 12_00_morin
Csr2011 june17 12_00_morinCsr2011 june17 12_00_morin
Csr2011 june17 12_00_morin
 
Csr2011 june17 11_30_vyalyi
Csr2011 june17 11_30_vyalyiCsr2011 june17 11_30_vyalyi
Csr2011 june17 11_30_vyalyi
 
Csr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonatiCsr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonati
 
Csr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatovCsr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatov
 
Csr2011 june17 15_15_kaminski
Csr2011 june17 15_15_kaminskiCsr2011 june17 15_15_kaminski
Csr2011 june17 15_15_kaminski
 
Csr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatovCsr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatov
 
Csr2011 june17 12_00_morin
Csr2011 june17 12_00_morinCsr2011 june17 12_00_morin
Csr2011 june17 12_00_morin
 
Csr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonatiCsr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonati
 
Csr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhaninCsr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhanin
 

Csr2011 june18 11_00_tiskin

  • 1. Approximate matching in grammar-compressed strings Alexander Tiskin Department of Computer Science University of Warwick http://www.dcs.warwick.ac.uk/~tiskin Alexander Tiskin (Warwick) Approximate matching in GC-strings 1 / 52
  • 2. 1 Introduction; 2 Semi-local string comparison; 3 Matrix distance multiplication; 4 Compressed string comparison; 5 Conclusions and future work
  • 5. Introduction. String matching: finding an exact pattern in a string. String comparison: finding similar patterns in two strings. Applications: computational biology, image recognition, ... Standard types of string comparison: global (whole string vs whole string); local (substrings vs substrings). Main focus of this work: semi-local (whole string vs substrings; prefixes vs suffixes). Closely related to approximate string matching (no relation to approximation algorithms!). Main tool: implicit unit-Monge matrices (a.k.a. seaweed matrices).
  • 6. Introduction: Terminology and notation. Integers: $\ldots, -2, -1, 0, 1, 2, \ldots$ Odd half-integers: $\ldots, -\tfrac52, -\tfrac32, -\tfrac12, \tfrac12, \tfrac32, \tfrac52, \ldots$ $(i, j) \ll (i', j')$ iff $i < i'$ and $j < j'$; $(i, j) \gtrless (i', j')$ iff $i < i'$ and $j > j'$. We consider finite and infinite integer matrices over integer and odd half-integer indices. For simplicity, index range will usually be ignored. A permutation matrix is a 0/1 matrix with exactly one nonzero per row and per column, e.g. $\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.
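  • 10. Introduction: Terminology and notation.
    Given matrix $D$, its distribution matrix is $D^\Sigma(i, j) = \sum_{\hat\imath > i,\ \hat\jmath < j} D(\hat\imath, \hat\jmath)$. In other words, $D^\Sigma(i, j) = \sum D(\hat\imath, \hat\jmath)$, where $(\hat\imath, \hat\jmath)$ is $\gtrless$-dominated by $(i, j)$.
    Given matrix $E$, its density matrix is $E^\square(\hat\imath, \hat\jmath) = E(\hat\imath - \tfrac12, \hat\jmath + \tfrac12) - E(\hat\imath - \tfrac12, \hat\jmath - \tfrac12) - E(\hat\imath + \tfrac12, \hat\jmath + \tfrac12) + E(\hat\imath + \tfrac12, \hat\jmath - \tfrac12)$, where $D^\Sigma$, $E$ are over integers and $D$, $E^\square$ are over odd half-integers.
    Example: $\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}^\Sigma = \begin{bmatrix} 0 & 1 & 2 & 3 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$ and $\begin{bmatrix} 0 & 1 & 2 & 3 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}^\square = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.
    $(D^\Sigma)^\square = D$ for all $D$. Matrix $E$ is simple, if $(E^\square)^\Sigma = E$.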
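The two operators can be sketched in a few lines of Python. This is only an illustration under stated assumptions: matrices are finite, list index `r` of an $n \times n$ matrix `D` stands for the odd half-integer $r + \tfrac12$, and an $(n+1) \times (n+1)$ matrix `E` is indexed by the integers $0, \ldots, n$. The function names `distribution` and `density` are hypothetical and do not come from the slides.

```python
def distribution(D):
    """Return D^Sigma: S[i][j] = sum of D[r][c] over r >= i and c < j.

    Index r of D stands for the half-integer r + 1/2, so "r + 1/2 > i"
    becomes "r >= i" and "c + 1/2 < j" becomes "c < j".
    """
    n = len(D)
    S = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(1, n + 1):
            # Inclusion-exclusion over the one new cell D[i][j-1].
            S[i][j] = S[i + 1][j] + S[i][j - 1] - S[i + 1][j - 1] + D[i][j - 1]
    return S


def density(E):
    """Return the density matrix of E (the four-point difference from the slide)."""
    n = len(E) - 1
    return [
        [E[r][c + 1] - E[r][c] - E[r + 1][c + 1] + E[r + 1][c] for c in range(n)]
        for r in range(n)
    ]


# The 3 x 3 permutation matrix used as the example on the slide.
D = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 1]]
S = distribution(D)
for row in S:
    print(row)
print(density(S) == D)
```

On the slide's example this reproduces the $4 \times 4$ distribution matrix, and taking the density of the result gives back $D$.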
  • 12. Introduction: Terminology and notation. Matrix $E$ is Monge, if $E^\square$ is nonnegative. Intuition: border-to-border distances in a (weighted) planar graph. Matrix $E$ is unit-Monge, if $E^\square$ is a permutation matrix. Intuition: border-to-border distances in a grid-like graph. Simple unit-Monge matrix: $P^\Sigma$, where $P$ is a permutation matrix. Seaweed matrix: $P^\Sigma$, represented implicitly by $P$. Example: $\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}^\Sigma = \begin{bmatrix} 0 & 1 & 2 & 3 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$.
  • 13. Introduction: Implicit unit-Monge matrices. Efficient $P^\Sigma$ queries: range tree on nonzeros of $P$ [Bentley: 1980]. Binary search tree by $i$-coordinate; under every node, binary search tree by $j$-coordinate. [Figure: range tree on the nonzeros of $P$]
  • 15. Introduction: Implicit unit-Monge matrices. Efficient $P^\Sigma$ queries (contd.): Every node of the range tree represents a canonical range (rectangular region), and stores its nonzero count. Overall, $\le n \log n$ canonical ranges are non-empty. A $P^\Sigma$ query means dominance counting: how many nonzeros are dominated by the query point? Answered by decomposing the query range into $\le \log^2 n$ disjoint canonical ranges. Total size $O(n \log n)$, query time $O(\log^2 n)$. There are asymptotically more efficient (but less practical) data structures: total size $O(n)$, query time $O\bigl(\frac{\log n}{\log\log n}\bigr)$ [JáJá+: 2004] [Chan, Pătraşcu: 2010].
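A dominance-counting query can be sketched as follows. This is a simplified stand-in for the range tree on the slide, not the structure itself: the primary tree is a segment tree over row indices, and each node keeps a sorted list of column indices where the slide uses a secondary binary search tree. The class name `DominanceCounter` and the input convention (`perm[r] = c` means $P(r + \tfrac12, c + \tfrac12) = 1$) are assumptions made for the example. Space is $O(n \log n)$ and each query performs $O(\log n)$ binary searches, so $O(\log^2 n)$ time overall.

```python
from bisect import bisect_left


class DominanceCounter:
    """Answers P^Sigma(i, j) = #{r >= i : perm[r] < j} for a permutation `perm`."""

    def __init__(self, perm):
        self.n = len(perm)
        size = 1
        while size < max(self.n, 1):
            size *= 2
        self.size = size
        # tree[node] holds the sorted column indices of the rows covered by node.
        self.tree = [[] for _ in range(2 * size)]
        for r, c in enumerate(perm):
            self.tree[size + r] = [c]
        for node in range(size - 1, 0, -1):
            # Re-sorting keeps the sketch short; a linear merge would be faster.
            self.tree[node] = sorted(self.tree[2 * node] + self.tree[2 * node + 1])

    def query(self, i, j):
        """Count nonzeros in rows i..n-1 with column index below j."""
        lo, hi = i + self.size, self.n + self.size
        total = 0
        while lo < hi:  # standard bottom-up walk over the half-open row range
            if lo & 1:
                total += bisect_left(self.tree[lo], j)
                lo += 1
            if hi & 1:
                hi -= 1
                total += bisect_left(self.tree[hi], j)
            lo //= 2
            hi //= 2
        return total


# The 3 x 3 example permutation matrix from the earlier slides.
counter = DominanceCounter([1, 0, 2])
print([[counter.query(i, j) for j in range(4)] for i in range(4)])
```

For the example permutation the printed table coincides with the distribution matrix shown on the earlier slides.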
  • 16. 1 Introduction; 2 Semi-local string comparison; 3 Matrix distance multiplication; 4 Compressed string comparison; 5 Conclusions and future work
  • 18. Semi-local string comparison: Semi-local LCS and edit distance. Consider strings (= sequences) over an alphabet of size $\sigma$. Distinguish contiguous substrings and not necessarily contiguous subsequences. Special cases of substring: prefix, suffix. Notation: strings $a$, $b$ of length $m$, $n$ respectively. Assume where necessary: $m \le n$; $m$, $n$ reasonably close. The longest common subsequence (LCS) score: length of the longest string that is a subsequence of both $a$ and $b$; equivalently, alignment score, where score(match) = 1 and score(mismatch) = 0. In biological terms, “loss-free alignment” (unlike “lossy” BLAST).
  • 20. Semi-local string comparison: Semi-local LCS and edit distance. The LCS problem: give the LCS score for $a$ vs $b$. LCS running time:
      $O(mn)$ [Wagner, Fischer: 1974]
      $O\bigl(\frac{mn}{\log^2 n}\bigr)$ for $\sigma = O(1)$ [Masek, Paterson: 1980] [Crochemore+: 2003]
      $O\bigl(\frac{mn (\log\log n)^2}{\log^2 n}\bigr)$ [Paterson, Dančík: 1994] [Bille, Farach-Colton: 2008]
    Running time varies depending on the RAM model. We assume word-RAM with word size $\log n$.
  • 21. Semi-local string comparison: Semi-local LCS and edit distance. LCS on the alignment graph (directed, acyclic). [Figure: alignment graph for “baabcbca” against “baabcabcabaca”; blue = 0, red = 1] LCS(“baabcbca”, “baabcabcabaca”) = “baabcbca”. LCS = highest-score corner-to-corner path.
  • 22. Semi-local string comparison: Semi-local LCS and edit distance. LCS: dynamic programming [WF: 1974]. Sweep alignment graph by cells. Cell update: time $O(1)$. Overall time $O(mn)$.
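The cell-by-cell sweep can be sketched as the textbook LCS recurrence, keeping only one row of the table at a time. The function name `lcs_score` is an assumption made for this example; the two test strings are the ones used on the slides.

```python
def lcs_score(a, b):
    """Length of a longest common subsequence of a and b, in O(len(a) * len(b)) time."""
    prev = [0] * (len(b) + 1)  # scores for the previous prefix of a
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)          # diagonal edge of score 1
            else:
                cur.append(max(prev[j], cur[j - 1]))  # vertical or horizontal edge of score 0
        prev = cur
    return prev[-1]


print(lcs_score("baabcbca", "baabcabcabaca"))
print(lcs_score("baabcbca", "cabcaba"))
```

Each inner-loop iteration is one $O(1)$ cell update, so the sweep takes $O(mn)$ time in total.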
  • 23. Semi-local string comparison: Semi-local LCS and edit distance. LCS: micro-block dynamic programming [MP: 1980; BF: 2008]. Sweep alignment graph by micro-blocks. Micro-block size: $t = O(\log n)$ when $\sigma = O(1)$; $t = O\bigl(\frac{\log n}{\log\log n}\bigr)$ otherwise. Micro-block interface: $O(t)$ characters, each $O(\log \sigma)$ bits, can be reduced to $O(\log t)$ bits; $O(t)$ small integers, each $O(1)$ bits. Micro-block update: time $O(1)$, via table of all possible interfaces. Overall time $O\bigl(\frac{mn}{\log^2 n}\bigr)$ when $\sigma = O(1)$, $O\bigl(\frac{mn (\log\log n)^2}{\log^2 n}\bigr)$ otherwise.
  • 26. Semi-local string comparison: Semi-local LCS and edit distance.
    The semi-local LCS problem: give the (implicit) matrix of $O(m^2 + n^2)$ LCS scores: string-substring LCS (string $a$ vs every substring of $b$); prefix-suffix LCS (every prefix of $a$ vs every suffix of $b$); symmetrically, substring-string and suffix-prefix LCS.
    The three-way semi-local LCS problem: give the (implicit) matrix of $O(n^2)$ LCS scores: string-substring, prefix-suffix, suffix-prefix LCS; no substring-string LCS. Suitable for $m \ll n$.
    Cf.: dynamic programming gives prefix-prefix LCS.
  • 27. Semi-local string comparison: Semi-local LCS and edit distance. Semi-local LCS on the alignment graph. [Figure: alignment graph for “baabcbca” against “baabcabcabaca”; blue = 0, red = 1] score(“baabcbca”, “cabcaba”) = 5 (“abcba”). Semi-local LCS = all highest-score border-to-border paths (string-substring = top-to-bottom, etc.).
  • 28. Semi-local string comparison: Score matrices and seaweed matrices. The score matrix $H$ for a = “baabcbca”, b = “baabcabcabaca” (rows $i = 0, \ldots, 13$; columns $j = 0, \ldots, 13$):
        0   1   2   3   4   5   6   6   7   8   8   8   8   8
       −1   0   1   2   3   4   5   5   6   7   7   7   7   7
       −2  −1   0   1   2   3   4   4   5   6   6   6   6   7
       −3  −2  −1   0   1   2   3   3   4   5   5   6   6   7
       −4  −3  −2  −1   0   1   2   2   3   4   4   5   5   6
       −5  −4  −3  −2  −1   0   1   2   3   4   4   5   5   6
       −6  −5  −4  −3  −2  −1   0   1   2   3   3   4   4   5
       −7  −6  −5  −4  −3  −2  −1   0   1   2   2   3   3   4
       −8  −7  −6  −5  −4  −3  −2  −1   0   1   2   3   3   4
       −9  −8  −7  −6  −5  −4  −3  −2  −1   0   1   2   3   4
      −10  −9  −8  −7  −6  −5  −4  −3  −2  −1   0   1   2   3
      −11 −10  −9  −8  −7  −6  −5  −4  −3  −2  −1   0   1   2
      −12 −11 −10  −9  −8  −7  −6  −5  −4  −3  −2  −1   0   1
      −13 −12 −11 −10  −9  −8  −7  −6  −5  −4  −3  −2  −1   0
    $b' = b\langle 4 : 11 \rangle$ = “cabcaba”; $H(4, 11) = \mathrm{LCS}(a, b') = 5$. $H(i, j) = j - i$ if $i > j$.
  • 29. Semi-local string comparison: Score matrices and seaweed matrices. Semi-local LCS: output representation and running time.
      size              query time      notes
      $O(n^2)$          $O(1)$          trivial
      $O(m^{1/2} n)$    $O(\log n)$     string-substring [Alves+: 2003]
      $O(n)$            $O(n)$          string-substring [Alves+: 2005]
      $O(n \log n)$     $O(\log^2 n)$   [T: 2006]
      ... or any 2D orthogonal range counting data structure

      running time                                        notes
      $O(mn^2)$                                           naive
      $O(mn)$                                             string-substring [Schmidt: 1998; Alves+: 2005]
      $O(mn)$                                             [T: 2006]
      $O\bigl(\frac{mn}{\log^{0.5} n}\bigr)$              [T: 2006]
      $O\bigl(\frac{mn (\log\log n)^2}{\log^2 n}\bigr)$   [T: 2007]
  • 30. Semi-local string comparison: Score matrices and seaweed matrices. The score matrix $H$ and the seaweed matrix $P$. $H(i, j)$: the number of matched characters for $a$ vs substring $b\langle i : j \rangle$. $j - i - H(i, j)$: the number of unmatched characters. Properties of matrix $j - i - H(i, j)$: simple unit-Monge; therefore, $= P^\Sigma$, where $P = -H^\square$ is a permutation matrix. $P$ is the seaweed matrix, giving an implicit representation of $H$. Range tree for $P$: memory $O(n \log n)$, query time $O(\log^2 n)$.
  • 33. Semi-local string comparison: Score matrices and seaweed matrices. The score matrix $H$ and the seaweed matrix $P$. a = “baabcbca”, b = “baabcabcabaca”, $b' = b\langle 4 : 11 \rangle$ = “cabcaba”; $H(4, 11) = \mathrm{LCS}(a, b') = 5$; $H(i, j) = j - i$ if $i > j$. [Figure: the score matrix $H$ of slide 28 with the nonzeros of $P$ marked between its entries] blue: difference in $H$ is 0; red: difference in $H$ is 1; green: $P(i, j) = 1$. $H(i, j) = j - i - P^\Sigma(i, j)$.
  • 34. Semi-local string comparison: Score matrices and seaweed matrices. The score matrix $H$ and the seaweed matrix $P$. a = “baabcbca”, b = “baabcabcabaca”, $b' = b\langle 4 : 11 \rangle$ = “cabcaba”. [Figure: the nonzeros of $P$] $H(4, 11) = \mathrm{LCS}(a, b') = 11 - 4 - P^\Sigma(4, 11) = 11 - 4 - 2 = 5$.
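The relation $H(i, j) = j - i - P^\Sigma(i, j)$ can be checked on the slide's strings with a deliberately naive sketch. It recomputes an LCS for every substring of $b$, so it is not one of the efficient algorithms listed in the talk, and it covers only the string-substring part of $H$ (indices $0, \ldots, n$), so the recovered $P$ has at most one nonzero per row and column rather than exactly one. All function names are hypothetical.

```python
def lcs_score(a, b):
    """Plain O(len(a) * len(b)) LCS length."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def string_substring_scores(a, b):
    """H[i][j] = LCS(a, b[i:j]) for i <= j, and j - i for i > j."""
    n = len(b)
    H = [[j - i for j in range(n + 1)] for i in range(n + 1)]
    for i in range(n + 1):
        for j in range(i, n + 1):
            H[i][j] = lcs_score(a, b[i:j])
    return H


def seaweed_core(H):
    """P = -H^square, computed as the density of E(i, j) = j - i - H(i, j)."""
    n = len(H) - 1
    E = [[j - i - H[i][j] for j in range(n + 1)] for i in range(n + 1)]
    return [
        [E[r][c + 1] - E[r][c] - E[r + 1][c + 1] + E[r + 1][c] for c in range(n)]
        for r in range(n)
    ]


def dominance_sum(P, i, j):
    """P^Sigma(i, j): nonzeros in rows >= i and columns < j (brute force)."""
    return sum(P[r][c] for r in range(i, len(P)) for c in range(j))


a, b = "baabcbca", "baabcabcabaca"
H = string_substring_scores(a, b)
P = seaweed_core(H)
print(H[4][11], dominance_sum(P, 4, 11), 11 - 4 - dominance_sum(P, 4, 11))
```

In practice the brute-force `dominance_sum` would be replaced by a range-counting structure on the nonzeros of $P$, as described on the earlier slides.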
  • 36. Semi-local string comparison Score matrices and seaweed matrices The seaweeds in the alignment graph b a a b c a b c a b a c a a = “baabcbca” • b a b = “baabcabcabaca” a b = b 4 : 11 = “cabcaba” b H(4, 11) = LCS(a, b ) = c 11 − 4 − P Σ (i, j) = b 11 − 4 − 2 = 5 c a • P(i, j) = 1 corresponds to seaweed (top, i) (bottom, j) Also define top right, left right, left bottom seaweeds Gives bijection between top-left and bottom-right borders Alexander Tiskin (Warwick) Approximate matching in GC-strings 23 / 52
  • 37.
      1 Introduction
      2 Semi-local string comparison
      3 Matrix distance multiplication
      4 Compressed string comparison
      5 Conclusions and future work
  • 38-39. Matrix distance multiplication: Seaweed braids
      Distance algebra (a.k.a. (min, +) or tropical algebra): $\oplus$ is min, $\odot$ is +
      Matrix $\odot$-multiplication: $A \odot B = C$, where
        $C(i, k) = \bigoplus_j A(i, j) \odot B(j, k) = \min_j \bigl(A(i, j) + B(j, k)\bigr)$
      Matrix classes closed under $\odot$-multiplication (for given n):
        - general numerical (integer, real) matrices
        - Monge matrices
        - simple unit-Monge matrices: $P_A^\Sigma \odot P_B^\Sigma = P_C^\Sigma$, written as $P_A \boxdot P_B = P_C$
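As a small worked instance of the (min, +) product defined above, the sketch below multiplies two explicit matrices in the obvious cubic-time way. The function name `distance_product` and the sample matrices are assumptions chosen for illustration.

```python
def distance_product(A, B):
    """(min, +) product: C[i][k] = min over j of A[i][j] + B[j][k]."""
    inner = len(B)
    if any(len(row) != inner for row in A):
        raise ValueError("inner dimensions do not agree")
    return [[min(A[i][j] + B[j][k] for j in range(inner))
             for k in range(len(B[0]))]
            for i in range(len(A))]


A = [[0, 3],
     [2, 0]]
B = [[0, 1],
     [4, 0]]
C = distance_product(A, B)
print(C)  # [[0, 1], [2, 0]]
```

For example, C[0][1] = min(0 + 1, 3 + 0) = 1: ordinary multiplication is replaced by +, and ordinary summation by min.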
  • 40. Matrix distance multiplication: Seaweed braids
      The seaweed monoid $T_n$:
        - simple unit-Monge matrices under $\odot$-multiplication
        - permutation matrices under $\boxdot$-multiplication
      Identity: $1 \boxdot x = x$. Zero: $0 \boxdot x = 0$
      [Figure: the 4 × 4 permutation matrices of 1 and 0]
  • 41-44. Matrix distance multiplication: Seaweed braids
      $P_A \boxdot P_B = P_C$ can be seen as $\boxdot$-multiplication of seaweed braids
      [Figure: permutation matrices $P_A$, $P_B$, $P_C$ with the corresponding seaweed braids]
  • 45. Matrix distance multiplication: Seaweed braids
      Seaweed braids: similar to standard braids, generated by crossings
      Unlike in standard braids, all seaweed crossings are
        - transversal, i.e. on one level (not underpass/overpass)
        - idempotent, i.e. two seaweeds can cross at most once
      Seaweed braid $\boxdot$-multiplication: associative, no inverse (a crossing cannot be undone)
      Identity: $1 \boxdot x = x$. Zero: $0 \boxdot x = 0$
      [Figure: the seaweed braids 1 and 0]
  • 46. Matrix distance multiplication: Seaweed braids
      The seaweed monoid $T_n$:
        - n! elements (permutations of size n)
        - n − 1 generators $g_1, g_2, \ldots, g_{n-1}$ (elementary crossings)
        - idempotence: $g_i^2 = g_i$ for all i
        - far commutativity: $g_i g_j = g_j g_i$ for $j - i > 1$
        - braid relations: $g_i g_j g_i = g_j g_i g_j$ for $j - i = 1$
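The generators and relations above can be explored with a short sketch. It assumes one particular encoding that is not taken from the slides: a seaweed braid is stored as a tuple `q`, where `q[pos]` is the starting position of the seaweed that ends at position `pos`, and generators are numbered from 0 rather than from 1. An elementary crossing swaps two neighbouring seaweeds only if they have not crossed already.

```python
def cross(q, i):
    """Apply the elementary crossing g_i (0-based) to the braid q.

    The seaweeds ending at positions i and i + 1 are swapped only if they
    have not crossed yet, i.e. if they are still in their starting order.
    """
    if q[i] < q[i + 1]:
        q = list(q)
        q[i], q[i + 1] = q[i + 1], q[i]
        return tuple(q)
    return q


def word(n, generators):
    """Braid on n seaweeds obtained by applying a sequence of generators to 1."""
    q = tuple(range(n))
    for i in generators:
        q = cross(q, i)
    return q


def monoid(n):
    """All braids reachable from the identity by elementary crossings."""
    seen = {tuple(range(n))}
    frontier = list(seen)
    while frontier:
        q = frontier.pop()
        for i in range(n - 1):
            r = cross(q, i)
            if r not in seen:
                seen.add(r)
                frontier.append(r)
    return seen


print(len(monoid(3)), len(monoid(4)))               # 6 24
print(word(3, [0, 0]) == word(3, [0]))              # idempotence: True
print(word(4, [0, 2]) == word(4, [2, 0]))           # far commutativity: True
print(word(3, [0, 1, 0]) == word(3, [1, 0, 1]))     # braid relation: True
```

In this encoding the zero element is the fully reversed tuple, in which every pair of seaweeds has crossed, so no further crossing changes it.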
  • 47. Matrix distance multiplication: Seaweed braids
      The seaweed monoid $T_n$
      Also known as the 0-Hecke monoid of the symmetric group, $H_0(S_n)$
      Generalisations:
        - general 0-Hecke monoids [Fomin, Greene: 1998; Buch+: 2008]
        - Coxeter monoids [Tsaranov: 1990; Richardson, Springer: 1990]
  • 48-51. Matrix distance multiplication: Seaweed braids
      Computation in the seaweed monoid: a confluent rewriting system can be obtained by software (Semigroupe, GAP)
      $T_3$: 1, $a = g_1$, $b = g_2$; ab, ba, aba = 0
        aa → a     bb → b     bab → 0     aba → 0
      $T_4$: 1, $a = g_1$, $b = g_2$, $c = g_3$; ab, ac, ba, bc, cb, aba, abc, acb, bac, bcb, cba, abac, abcb, acba, bacb, bcba, abacb, abcba, bacba, abacba = 0
        aa → a     ca → ac    bab → aba    cbac → bcba
        bb → b     cc → c     cbc → bcb    abacba → 0
      Easy to use, but not an efficient algorithm
  • 52-53. Matrix distance multiplication: Implicit unit-Monge $\odot$-multiplication
      The implicit unit-Monge matrix $\odot$-multiplication problem
      Given permutation matrices $P_A$, $P_B$, compute $P_C$ such that $P_A^\Sigma \odot P_B^\Sigma = P_C^\Sigma$ (equivalently, $P_A \boxdot P_B = P_C$)
      Matrix $\odot$-multiplication: running time
        type                   time
        general                $O(n^3)$                               standard
                               $O(n^3 (\log\log n)^3 / \log^2 n)$     [Chan: 2007]
        Monge                  $O(n^2)$                               via [Aggarwal+: 1987]
        implicit unit-Monge    $O(n^{1.5})$                           [T: 2006]
                               $O(n \log n)$                          [T: 2010]
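To make the problem statement concrete, here is a naive cubic-time reference sketch; it is not the $O(n \log n)$ algorithm of the talk. It assumes a permutation matrix is given as a tuple `perm` with a nonzero in row `r`, column `perm[r]`, and that the dominance sum counts nonzeros below and to the left: S[i][j] = #{r ≥ i : perm[r] < j}. All function names are illustrative.

```python
def dominance_sum(perm):
    """S[i][j] = number of nonzeros (r, perm[r]) with r >= i and perm[r] < j."""
    n = len(perm)
    S = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(1, n + 1):
            S[i][j] = S[i + 1][j] + (1 if perm[i] < j else 0)
    return S


def distance_product(A, B):
    """Explicit (min, +) product of two square matrices."""
    size = len(A)
    return [[min(A[i][j] + B[j][k] for j in range(size)) for k in range(size)]
            for i in range(size)]


def cross_difference(S):
    """Recover the permutation from its dominance-sum matrix."""
    n = len(S) - 1
    perm = []
    for r in range(n):
        for c in range(n):
            if S[r][c + 1] - S[r][c] - S[r + 1][c + 1] + S[r + 1][c] == 1:
                perm.append(c)
    return tuple(perm)


def implicit_multiply_naive(pa, pb):
    """Compute P_C from P_A, P_B by expanding to explicit dominance sums."""
    return cross_difference(
        distance_product(dominance_sum(pa), dominance_sum(pb)))


a = (1, 0, 2)   # one elementary crossing
b = (0, 2, 1)   # the other elementary crossing on three seaweeds
print(implicit_multiply_naive(a, a))                               # (1, 0, 2)
print(implicit_multiply_naive(implicit_multiply_naive(a, b), a))   # (2, 1, 0)
```

The first result illustrates idempotence of a single crossing; the second is the zero element of the monoid on three seaweeds. A sketch like this is mainly useful as a test oracle for a faster implementation.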
  • 54. Matrix distance multiplication: Implicit unit-Monge $\odot$-multiplication
      [Figure: permutation matrices $P_A$ and $P_B$, with the product $P_C$ still unknown]
  • 55-58. Matrix distance multiplication: Implicit unit-Monge $\odot$-multiplication
      [Figure: $P_A$ split into $P_{A,lo}$, $P_{A,hi}$ and $P_B$ split into $P_{B,lo}$, $P_{B,hi}$; the two subproblem results are overlaid as $P_{C,lo} + P_{C,hi}$]
  • 59. Matrix distance multiplication: Implicit unit-Monge $\odot$-multiplication
      [Figure: two pictures set against each other, labelled “vs”]
  • 60-61. Matrix distance multiplication: Implicit unit-Monge $\odot$-multiplication
      [Figure: $P_{A,lo}$, $P_{A,hi}$, $P_{B,lo}$, $P_{B,hi}$ and the overlay $P_{C,lo} + P_{C,hi}$, which is then turned into $P_C$]
  • 62. Matrix distance multiplication: Implicit unit-Monge $\odot$-multiplication
      Implicit unit-Monge matrix $\odot$-multiplication: the algorithm
      $P_C^\Sigma(i, k) = \min_j \bigl(P_A^\Sigma(i, j) + P_B^\Sigma(j, k)\bigr)$
      Divide-and-conquer on the range of j
      Divide $P_A$ horizontally, $P_B$ vertically; two subproblems of effective size n/2:
        $P_{A,lo}^\Sigma \odot P_{B,lo}^\Sigma = P_{C,lo}^\Sigma$
        $P_{A,hi}^\Sigma \odot P_{B,hi}^\Sigma = P_{C,hi}^\Sigma$
      Conquer: most (but not all!) nonzeros of $P_{C,lo}$, $P_{C,hi}$ appear in $P_C$
      Missing nonzeros can be obtained in time $O(n)$ using the Monge property
      Overall time $O(n \log n)$
  • 63.
      1 Introduction
      2 Semi-local string comparison
      3 Matrix distance multiplication
      4 Compressed string comparison
      5 Conclusions and future work
  • 64. Compressed string comparison: Grammar compression
      Notation: text t of length n; pattern p of length m
      A GC-string (grammar-compressed string) t is a straight-line program (context-free grammar) generating $t = t_{\bar n}$ by $\bar n$ assignments of the form
        - $t_k = \alpha$, where $\alpha$ is an alphabet character
        - $t_k = t_i t_j$, where $i, j < k$
      In general, $n = O(2^{\bar n})$
      Example: Fibonacci string “abaababaabaab”
        $t_1$ = ‘b’, $t_2$ = ‘a’, $t_3 = t_2 t_1$, $t_4 = t_3 t_2$, $t_5 = t_4 t_3$, $t_6 = t_5 t_4$, $t_7 = t_6 t_5$
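The Fibonacci example can be reproduced with a few lines of code. This sketch assumes a hypothetical encoding of a straight-line program as a Python list in which a one-character string stands for $t_k = \alpha$ and a pair of earlier 0-based indices stands for $t_k = t_i t_j$; the names `FIB`, `expand` and `lengths` are illustrative.

```python
# t1 = 'b', t2 = 'a', t3 = t2 t1, ..., t7 = t6 t5, with 0-based rule indices.
FIB = ["b", "a", (1, 0), (2, 1), (3, 2), (4, 3), (5, 4)]


def expand(rules):
    """Decompress the grammar; the output may be exponentially long."""
    texts = []
    for rule in rules:
        if isinstance(rule, str):
            texts.append(rule)
        else:
            left, right = rule
            texts.append(texts[left] + texts[right])
    return texts[-1]


def lengths(rules):
    """Length of every t_k, computed without decompressing anything."""
    out = []
    for rule in rules:
        out.append(1 if isinstance(rule, str) else out[rule[0]] + out[rule[1]])
    return out


print(expand(FIB))   # abaababaabaab
print(lengths(FIB))  # [1, 1, 2, 3, 5, 8, 13]
```

The second function hints at why algorithms on GC-strings avoid `expand`: quantities such as lengths can be propagated rule by rule in time proportional to the grammar size.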
  • 65. Compressed string comparison: Grammar compression
      Grammar compression covers various compression types, e.g. LZ78, LZW (not LZ77 directly)
      Simplifying assumption: arithmetic up to n runs in $O(1)$
      This assumption can be removed by careful index remapping
  • 66. Compressed string comparison: Three-way semi-local LCS on GC-strings
      LCS: running time
        t      p      time
        plain  plain  $O(mn)$                        [Wagner, Fischer: 1974]
                      $O(mn / \log^2 m)$             [Masek, Paterson: 1980] [Crochemore+: 2003]
        GC     plain  $O(m^3 \bar n + \ldots)$       general CFG    [Myers: 1995]
                      $O(m^{1.5} \bar n)$            3-way semi     [T: 2008]
                      $O(m \log m \cdot \bar n)$     3-way semi     [T: NEW]
        GC     GC     NP-hard                        [Lifshits: 2005]
                      $O(r^{1.2} \bar r^{1.4})$      [Hermelin+: 2009]
                      $O(r \log r \cdot \bar r)$     [T: NEW]
      Here $r = m + n$ and $\bar r = \bar m + \bar n$
  • 67. Compressed string comparison: Three-way semi-local LCS on GC-strings
      Three-way semi-local LCS (GC text, plain pattern): the algorithm
      For every k, compute by recursion the three-way seaweed matrix for p vs $t_k$, using seaweed matrix $\boxdot$-multiplication: time $O(m \log m \cdot \bar n)$
      Overall time $O(m \log m \cdot \bar n)$
  • 68. Compressed string comparison: Subsequence recognition on GC-strings
      The global subsequence recognition problem
      Does text t contain pattern p as a subsequence?
      Global subsequence recognition: running time
        t      p      time
        plain  plain  $O(n)$           greedy
        GC     plain  $O(m \bar n)$    greedy
        GC     GC     NP-hard          [Lifshits: 2005]
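One way to realise a greedy $O(m \bar n)$ bound on a GC text is sketched below; the slide does not spell out the method, so this is an assumption about how it can be done rather than the talk's own procedure. For every grammar symbol $t_k$ and every pattern position i, the table entry `reach[k][i]` records how far into the pattern a greedy left-to-right scan of $t_k$ gets when it starts at position i. A concatenation rule simply composes two such tables. The grammar encoding (one-character strings and pairs of 0-based indices) is hypothetical.

```python
def is_subsequence_gc(rules, pattern):
    """Decide whether the pattern is a subsequence of the text generated by rules."""
    m = len(pattern)
    reach = []
    for rule in rules:
        if isinstance(rule, str):
            # A single character advances the scan by one position if it matches.
            table = [i + 1 if i < m and pattern[i] == rule else i
                     for i in range(m + 1)]
        else:
            left, right = rule
            # Scan the left part first, then continue through the right part.
            table = [reach[right][reach[left][i]] for i in range(m + 1)]
        reach.append(table)
    return reach[-1][0] == m


# Fibonacci string "abaababaabaab": it contains five 'b' and eight 'a'.
FIB = ["b", "a", (1, 0), (2, 1), (3, 2), (4, 3), (5, 4)]
print(is_subsequence_gc(FIB, "bbbbb"))   # True
print(is_subsequence_gc(FIB, "bbbbbb"))  # False
```

Each rule costs O(m) work, which gives O(m) times the number of rules overall, without ever decompressing the text.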
  • 69. Compressed string comparison: Subsequence recognition on GC-strings
      The local subsequence recognition problem
      Find all minimally matching substrings of t with respect to p
      A substring of t is matching if p is a subsequence of that substring
      A matching substring of t is minimally matching if none of its proper substrings are matching
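The definition of a minimally matching substring can be illustrated on an uncompressed text with a simple quadratic-time sketch (this is a plain-text baseline, not the compressed algorithm of the talk). It relies on the observation that the window starting at i and ending at the earliest possible end e(i) is minimal exactly when starting one position later forces a different end. The function names are illustrative, and windows are reported as half-open index pairs.

```python
def earliest_end(text, pattern, start):
    """Smallest end such that pattern is a subsequence of text[start:end], or None."""
    k = 0
    for pos in range(start, len(text)):
        if text[pos] == pattern[k]:
            k += 1
            if k == len(pattern):
                return pos + 1
    return None


def minimally_matching(text, pattern):
    """All (start, end) with text[start:end] minimally matching the pattern."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    ends = [earliest_end(text, pattern, i) for i in range(len(text) + 1)]
    return [(i, ends[i]) for i in range(len(text))
            if ends[i] is not None and ends[i + 1] != ends[i]]


print(minimally_matching("abcab", "ab"))  # [(0, 2), (3, 5)]
```

In the example, the window (1, 5) = “bcab” is matching but not minimal, because its proper substring (3, 5) = “ab” already matches.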
  • 70. Compressed string comparison: Subsequence recognition on GC-strings
      Local subsequence recognition: running time (+ output)
        t      p      time
        plain  plain  $O(mn)$                          [Mannila+: 1995]
                      $O(mn / \log m)$                 [Das+: 1997]
                      $O(c^m + n)$                     [Boasson+: 2001]
                      $O(m + n\sigma)$                 [Troníček: 2001]
        GC     plain  $O(m^2 \log m \cdot \bar n)$     [Cégielski+: 2006]
                      $O(m^{1.5} \bar n)$              [T: 2008]
                      $O(m \log m \cdot \bar n)$       [T: NEW]
        GC     GC     NP-hard                          [Lifshits: 2005]
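  • 71. Compressed string comparison: Subsequence recognition on GC-strings
      [Figure: seaweeds with endpoints $\hat\imath_{1/2}, \hat\imath_{3/2}, \hat\imath_{5/2}, \ldots$ and $\hat\jmath_{1/2}, \hat\jmath_{3/2}, \hat\jmath_{5/2}, \ldots$ marked on axes running from 0 to n]
      $b\langle i : j \rangle$ is matching iff the box $[i : j]$ is not pierced left-to-right
      [The order-relation symbols attached to “-maximal” and “-chain” on this slide did not survive extraction]
      Maximal seaweeds form a chain: $(\hat\imath_{1/2}, \hat\jmath_{1/2}),\ (\hat\imath_{3/2}, \hat\jmath_{3/2}),\ \ldots,\ (\hat\imath_{s-1/2}, \hat\jmath_{s-1/2})$
      $b\langle i : j \rangle$ is minimally matching iff $(i, j)$ is in the interleaved chain $(\hat\imath_{3/2}, \hat\jmath_{1/2}),\ (\hat\imath_{5/2}, \hat\jmath_{3/2}),\ \ldots,\ (\hat\imath_{s-1/2}, \hat\jmath_{s-3/2})$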
  • 72-74. Compressed string comparison: Subsequence recognition on GC-strings
      Local subsequence recognition (GC text, plain pattern): the algorithm
      For every k, compute by recursion the three-way seaweed matrix for p vs $t_k$, using seaweed matrix $\boxdot$-multiplication: time $O(m \log m \cdot \bar n)$
      Given an assignment $t = t' t''$, count by recursion
        - minimally matching substrings in $t'$
        - minimally matching substrings in $t''$
      Then find the chain of maximal seaweeds in time $\bar n \cdot O(m) = O(m \bar n)$
      The interleaved chain defines the minimally matching substrings in t that overlap both $t'$ and $t''$
      Overall time $O(m \log m \cdot \bar n) + O(m \bar n) = O(m \log m \cdot \bar n)$
  • 75. Compressed string comparison: Subsequence recognition on GC-strings
      The threshold approximate matching problem
      Find all matching substrings of t with respect to p, according to a threshold k
      A substring of t is matching if the edit distance between p and that substring is at most k
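For an uncompressed text, the classical $O(mn)$ dynamic programme in the style of Sellers gives a compact baseline for this problem. The sketch below is a simplification: it reports only the end positions j (counted as the number of text characters consumed) at which some substring ending there is within edit distance k of the pattern, not the substrings themselves, and it ignores the empty match at position 0. The function name is an assumption.

```python
def approximate_match_ends(text, pattern, k):
    """End positions j such that some substring text[i:j] has edit distance <= k to pattern."""
    m = len(pattern)
    # col[i] = least edit distance between pattern[:i] and a substring ending here.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        new = [0]  # a match may start anywhere in the text at no cost
        for i in range(1, m + 1):
            new.append(min(col[i - 1] + (pattern[i - 1] != ch),  # match or substitute
                           col[i] + 1,                           # extra text character
                           new[i - 1] + 1))                      # missing pattern character
        col = new
        if col[m] <= k:
            ends.append(j)
    return ends


print(approximate_match_ends("abcdefg", "cde", 0))  # [5]
print(approximate_match_ends("abcdefg", "cde", 1))  # [4, 5, 6]
```

With k = 1 the reported ends correspond to “cd”, “cde” and “cdef”, each within one edit of the pattern.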
  • 76. Compressed string comparison: Subsequence recognition on GC-strings
      Threshold approximate matching: running time (+ output)
        t      p      time
        plain  plain  $O(mn)$                                        [Sellers: 1980]
                      $O(mk)$                                        [Landau, Vishkin: 1989]
                      $O(m + n + nk^4/m)$                            [Cole, Hariharan: 2002]
        GC     plain  $O(m \bar n k^2)$                              [Kärkkäinen+: 2003]
                      $O(m \bar n k + \bar n \log n)$                [LV: 1989] via [Bille+: 2010]
                      $O(m \bar n + \bar n k^4 + \bar n \log n)$     [CH: 2002] via [Bille+: 2010]
                      $O(m \log m \cdot \bar n)$                     [T: NEW]
        GC     GC     NP-hard                                        [Lifshits: 2005]
      (Also many specialised variants for LZ compression)
  • 77. Compressed string comparison: Subsequence recognition on GC-strings
      Threshold approximate matching (GC text, plain pattern): the algorithm
      Algorithm structure similar to local subsequence recognition by seaweed matrix $\boxdot$-multiplication and seaweed chains
      Extra ingredients:
        - the blow-up technique: reduction of edit distances to LCS scores
        - the “implicit SMAWK” technique: row minima in an implicit Monge matrix by an extension of the classical “SMAWK” algorithm; replaces chain interleaving
      Overall time $O(m \log m \cdot \bar n) + O(m \bar n) = O(m \log m \cdot \bar n)$
  • 78.
      1 Introduction
      2 Semi-local string comparison
      3 Matrix distance multiplication
      4 Compressed string comparison
      5 Conclusions and future work
  • 79-81. Conclusions and future work
      A powerful alternative to dynamic programming
      Implicit unit-Monge matrices:
        - the seaweed monoid
        - distance multiplication in time $O(n \log n)$
        - next: lower bound?
      Semi-local LCS problem:
        - representation by implicit unit-Monge matrices
        - generalisation to rational alignment scores
        - next: real alignment scores?
      Approximate matching in GC-text in time $O(m \log m \cdot \bar n)$
      Other applications:
        - maximum clique in a circle graph in time $O(n \log^2 n)$
        - parallel LCS in time $O(mn/p)$, comm $O((m + n)/p^{1/2})$ per processor
        - identification of evolutionarily conserved regions in DNA