SlideShare une entreprise Scribd logo
1  sur  46
Télécharger pour lire hors ligne
Compressed Membership in Automata with
                            Compressed Labels

                                              Markus Lohrey
                                            Christian Mathissen

                                             University of Leipzig


                                              June 13, 2011




Lohrey, Mathissen (University of Leipzig)     Compressed Membership   June 13, 2011   1 / 12
Motivation

  Try to develop algorithms that directly work on compressed data.




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   2 / 12
Motivation

  Try to develop algorithms that directly work on compressed data.
  Goal: Beat straightforward decompress and manipulate strategy.




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   2 / 12
Motivation

  Try to develop algorithms that directly work on compressed data.
  Goal: Beat straightforward decompress and manipulate strategy.
  In this talk: focus on compressed strings




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   2 / 12
Motivation

  Try to develop algorithms that directly work on compressed data.
  Goal: Beat straightforward decompress and manipulate strategy.
  In this talk: focus on compressed strings
  Applications:
     ◮    all domains, where massive string data arise and have to be
          processed, e.g. bioinformatics




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   2 / 12
Motivation

  Try to develop algorithms that directly work on compressed data.
  Goal: Beat straightforward decompress and manipulate strategy.
  In this talk: focus on compressed strings
  Applications:
     ◮    all domains, where massive string data arise and have to be
          processed, e.g. bioinformatics
     ◮    large (and highly compressible) strings often occur as intermediate
          data structures (e.g. in computational group theory, program analysis,
          verification).




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   2 / 12
Straight-line programs
  Dictionary-based compression algorithms (LZ77, LZ78) exploit text
  repetition.




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   3 / 12
Straight-line programs
  Dictionary-based compression algorithms (LZ77, LZ78) exploit text
  repetition.
  Straight-line programs are a general representation for compressed strings,
  which covers most dictionary-based algorithms.




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   3 / 12
Straight-line programs
  Dictionary-based compression algorithms (LZ77, LZ78) exploit text
  repetition.
  Straight-line programs are a general representation for compressed strings,
  which covers most dictionary-based algorithms.
  A straight-line program (SLP) over the alphabet Γ is a sequence of
  definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for
  some j, k < i .




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   3 / 12
Straight-line programs
  Dictionary-based compression algorithms (LZ77, LZ78) exploit text
  repetition.
  Straight-line programs are a general representation for compressed strings,
  which covers most dictionary-based algorithms.
  A straight-line program (SLP) over the alphabet Γ is a sequence of
  definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for
  some j, k < i .
  Alternatively: a context-free grammar that generates exactly one string.




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   3 / 12
Straight-line programs
  Dictionary-based compression algorithms (LZ77, LZ78) exploit text
  repetition.
  Straight-line programs are a general representation for compressed strings,
  which covers most dictionary-based algorithms.
  A straight-line program (SLP) over the alphabet Γ is a sequence of
  definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for
  some j, k < i .
  Alternatively: a context-free grammar that generates exactly one string.
  We write val(A) for the unique word generated by A.




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   3 / 12
Straight-line programs
  Dictionary-based compression algorithms (LZ77, LZ78) exploit text
  repetition.
  Straight-line programs are a general representation for compressed strings,
  which covers most dictionary-based algorithms.
  A straight-line program (SLP) over the alphabet Γ is a sequence of
  definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for
  some j, k < i .
  Alternatively: a context-free grammar that generates exactly one string.
  We write val(A) for the unique word generated by A.
  The size of an SLP A = (Ai := αi )1≤i ≤n is |A| = n.




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   3 / 12
Straight-line programs
  Dictionary-based compression algorithms (LZ77, LZ78) exploit text
  repetition.
  Straight-line programs are a general representation for compressed strings,
  which covers most dictionary-based algorithms.
  A straight-line program (SLP) over the alphabet Γ is a sequence of
  definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for
  some j, k < i .
  Alternatively: a context-free grammar that generates exactly one string.
  We write val(A) for the unique word generated by A.
  The size of an SLP A = (Ai := αi )1≤i ≤n is |A| = n.
  One may have |val(A)| = 2n .



Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   3 / 12
Straight-line programs
  Dictionary-based compression algorithms (LZ77, LZ78) exploit text
  repetition.
  Straight-line programs are a general representation for compressed strings,
  which covers most dictionary-based algorithms.
  A straight-line program (SLP) over the alphabet Γ is a sequence of
  definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for
  some j, k < i .
  Alternatively: a context-free grammar that generates exactly one string.
  We write val(A) for the unique word generated by A.
  The size of an SLP A = (Ai := αi )1≤i ≤n is |A| = n.
  One may have |val(A)| = 2n .
  Thus, an SLP A can be seen as a compressed representation of val(A).

Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   3 / 12
Straight-line programs
  Example: A = (A1 := b,                    A2 := a,      Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7).




Lohrey, Mathissen (University of Leipzig)    Compressed Membership               June 13, 2011   4 / 12
Straight-line programs
  Example: A = (A1 := b,                    A2 := a,      Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7).
  Then val(A) = abaababaabaab:




Lohrey, Mathissen (University of Leipzig)    Compressed Membership               June 13, 2011   4 / 12
Straight-line programs
  Example: A = (A1 := b,                    A2 := a,      Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7).
  Then val(A) = abaababaabaab:

                                     A3 = A2 A1 = ab
                                     A4 = A3 A2 = aba
                                     A5 = A4 A3 = abaab
                                     A6 = A5 A4 = abaababa
                                     A7 = A6 A5 = abaababaabaab




Lohrey, Mathissen (University of Leipzig)    Compressed Membership               June 13, 2011   4 / 12
Straight-line programs
  Example: A = (A1 := b,                    A2 := a,      Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7).
  Then val(A) = abaababaabaab:

                                     A3 = A2 A1 = ab
                                     A4 = A3 A2 = aba
                                     A5 = A4 A3 = abaab
                                     A6 = A5 A4 = abaababa
                                     A7 = A6 A5 = abaababaabaab

  Relationship to dictionary-based compression (Rytter):




Lohrey, Mathissen (University of Leipzig)    Compressed Membership               June 13, 2011   4 / 12
Straight-line programs
  Example: A = (A1 := b,                    A2 := a,      Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7).
  Then val(A) = abaababaabaab:

                                     A3 = A2 A1 = ab
                                     A4 = A3 A2 = aba
                                     A5 = A4 A3 = abaab
                                     A6 = A5 A4 = abaababa
                                     A7 = A6 A5 = abaababaabaab

  Relationship to dictionary-based compression (Rytter):
     ◮    From an SLP A one can compute in polynomial time LZ77(val(A)).




Lohrey, Mathissen (University of Leipzig)    Compressed Membership               June 13, 2011   4 / 12
Straight-line programs
  Example: A = (A1 := b,                    A2 := a,      Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7).
  Then val(A) = abaababaabaab:

                                     A3 = A2 A1 = ab
                                     A4 = A3 A2 = aba
                                     A5 = A4 A3 = abaab
                                     A6 = A5 A4 = abaababa
                                     A7 = A6 A5 = abaababaabaab

  Relationship to dictionary-based compression (Rytter):
     ◮    From an SLP A one can compute in polynomial time LZ77(val(A)).
     ◮    From LZ77(w ) one can compute in polynomial time an SLP A with
          val(A) = w .

Lohrey, Mathissen (University of Leipzig)    Compressed Membership               June 13, 2011   4 / 12
Algorithms on SLP-compressed strings

  The mother of all algorithms on SLP -compressed strings:




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   5 / 12
Algorithms on SLP-compressed strings

  The mother of all algorithms on SLP -compressed strings:


  Plandowski 1994
  The following problem can be solved in polynomial time:
  INPUT: SLPs A, B
  QUESTION: val(A) = val(B)?




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   5 / 12
Algorithms on SLP-compressed strings

  The mother of all algorithms on SLP -compressed strings:


  Plandowski 1994
  The following problem can be solved in polynomial time:
  INPUT: SLPs A, B
  QUESTION: val(A) = val(B)?


  Note: The decompress-and-compare strategy does not work here.
  We cannot compute val(A) and val(B) in polynomial time.




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   5 / 12
Algorithms on SLP-compressed strings

  The mother of all algorithms on SLP -compressed strings:


  Plandowski 1994
  The following problem can be solved in polynomial time:
  INPUT: SLPs A, B
  QUESTION: val(A) = val(B)?


  Note: The decompress-and-compare strategy does not work here.
  We cannot compute val(A) and val(B) in polynomial time.

  Plandowski’s algorithm uses combinatorics on words, in particular the
  Periodicity-Lemma of Fine and Wilf.


Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   5 / 12
Improvements of Plandowski’s result


  Gasieniec, Karpinski, Miyazaki, Plandowski, Rytter, Shinohara,
  Takeda (mid 90’s)
  The following problem can be solved in polynomial time
  (fully compressed pattern matching):
  INPUT: SLPs P, T
  QUESTION: Is val(P) a factor of val(T), i.e., ∃x, y : val(T) = x val(P) y ?




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   6 / 12
Improvements of Plandowski’s result


  Gasieniec, Karpinski, Miyazaki, Plandowski, Rytter, Shinohara,
  Takeda (mid 90’s)
  The following problem can be solved in polynomial time
  (fully compressed pattern matching):
  INPUT: SLPs P, T
  QUESTION: Is val(P) a factor of val(T), i.e., ∃x, y : val(T) = x val(P) y ?


  The best known algorithm has a running time of O(|P| · |T|2 )
  (Lifshits 2006).




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   6 / 12
Generalization: Compressed automata
  A compressed automata is an ordinary finite automaton, where every
  transition is labelled with an SLP.




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   7 / 12
Generalization: Compressed automata
  A compressed automata is an ordinary finite automaton, where every
  transition is labelled with an SLP.

  Example: The compressed automaton
           Σ                Σ

                      q              A      q

  accepts all words that have val(A) as a factor.




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   7 / 12
Generalization: Compressed automata
  A compressed automata is an ordinary finite automaton, where every
  transition is labelled with an SLP.

  Example: The compressed automaton
           Σ                Σ

                      q              A        q

  accepts all words that have val(A) as a factor.

  The size |A| of the compressed automaton A
  (with the set ∆ of transition triples) is

                                            |A| =                |A|.
                                                    (p,A,q)∈∆


Lohrey, Mathissen (University of Leipzig)     Compressed Membership     June 13, 2011   7 / 12
Compressed membership for compressed automata
  Compressed membership for compressed automata:
  INPUT: A compressed automaton A and an SLP B.
  QUESTION: val(B) ∈ L(A)?




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   8 / 12
Compressed membership for compressed automata
  Compressed membership for compressed automata:
  INPUT: A compressed automaton A and an SLP B.
  QUESTION: val(B) ∈ L(A)?


  Plandowski, Rytter 1999

     ◮    Compressed membership for compressed automata belongs to
          PSPACE.
     ◮    Compressed membership for compressed automata over a unary
          alphabet is NP-complete.




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   8 / 12
Compressed membership for compressed automata
  Compressed membership for compressed automata:
  INPUT: A compressed automaton A and an SLP B.
  QUESTION: val(B) ∈ L(A)?


  Plandowski, Rytter 1999

     ◮    Compressed membership for compressed automata belongs to
          PSPACE.
     ◮    Compressed membership for compressed automata over a unary
          alphabet is NP-complete.


  Plandowski and Rytter conjectured that compressed membership for
  compressed automata is NP-complete (for every alphabet size).
Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   8 / 12
Some combinatorics on words

  The period of the word w ∈ Σ∗ is the smallest number p such that
  w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p
  (period(w ) = |w | if such a p does not exist).




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   9 / 12
Some combinatorics on words

  The period of the word w ∈ Σ∗ is the smallest number p such that
  w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p
  (period(w ) = |w | if such a p does not exist).
                       |w |
  Let order(w ) = ⌊ period(w ) ⌋.




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   9 / 12
Some combinatorics on words

  The period of the word w ∈ Σ∗ is the smallest number p such that
  w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p
  (period(w ) = |w | if such a p does not exist).
                       |w |
  Let order(w ) = ⌊ period(w ) ⌋.

  Example: Let w = abbabbab = (abb)2 ab.




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   9 / 12
Some combinatorics on words

  The period of the word w ∈ Σ∗ is the smallest number p such that
  w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p
  (period(w ) = |w | if such a p does not exist).
                       |w |
  Let order(w ) = ⌊ period(w ) ⌋.

  Example: Let w = abbabbab = (abb)2 ab.
  Then period(w ) = 3 and order(w ) = 2.




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   9 / 12
Some combinatorics on words

  The period of the word w ∈ Σ∗ is the smallest number p such that
  w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p
  (period(w ) = |w | if such a p does not exist).
                       |w |
  Let order(w ) = ⌊ period(w ) ⌋.

  Example: Let w = abbabbab = (abb)2 ab.
  Then period(w ) = 3 and order(w ) = 2.

  For a compressed automaton A, let:

             order(A) = max{order(val(A)) | A occurs as a label in A}
           period(A) = max{period(val(A)) | A occurs as a label in A}



Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   9 / 12
First main result


  Theorem 1
  For a compressed automaton A and an SLP B, we can check
  val(B) ∈ L(A) in time polynomial in (i) |A|, (ii) |B|, and (iii) order(A).




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   10 / 12
First main result


  Theorem 1
  For a compressed automaton A and an SLP B, we can check
  val(B) ∈ L(A) in time polynomial in (i) |A|, (ii) |B|, and (iii) order(A).


  We use the following combinatorial fact:


     Let u, v ∈ Σ∗ and let p be a position in v . Then there exist at most
     period(u) many occurrences of u in v that “touch” position p.




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   10 / 12
Second main result


  Theorem 2
  For a compressed automaton A and an SLP B, we can check
  val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|,
  and (iii) period(A).




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   11 / 12
Second main result


  Theorem 2
  For a compressed automaton A and an SLP B, we can check
  val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|,
  and (iii) period(A).

  Generalizes the result of Plandowski & Rytter for a unary alphabet
  (NP-completeness of compressed membership for compressed automata).




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   11 / 12
Second main result


  Theorem 2
  For a compressed automaton A and an SLP B, we can check
  val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|,
  and (iii) period(A).

  Generalizes the result of Plandowski & Rytter for a unary alphabet
  (NP-completeness of compressed membership for compressed automata).
  A word w = an has period 1!




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   11 / 12
Second main result


  Theorem 2
  For a compressed automaton A and an SLP B, we can check
  val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|,
  and (iii) period(A).

  Generalizes the result of Plandowski & Rytter for a unary alphabet
  (NP-completeness of compressed membership for compressed automata).
  A word w = an has period 1!
  The theorem is proven by a reduction to the case of a unary alphabet.




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   11 / 12
Open problems and a conjecture
  Conjecture 1
  Compressed membership for compressed automata is NP-complete.




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   12 / 12
Open problems and a conjecture
  Conjecture 1
  Compressed membership for compressed automata is NP-complete.


  Conjecture 2
  If val(B) ∈ L(A) for an SLP B and a compressed automaton A, then there
  exists an accepting run of A on val(B) (viewed as a word over the set of
  transition triples of A), which can be generated by an SLP of size
  poly(|B|, |A|).




Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   12 / 12
Open problems and a conjecture
  Conjecture 1
  Compressed membership for compressed automata is NP-complete.


  Conjecture 2
  If val(B) ∈ L(A) for an SLP B and a compressed automaton A, then there
  exists an accepting run of A on val(B) (viewed as a word over the set of
  transition triples of A), which can be generated by an SLP of size
  poly(|B|, |A|).

  Conjecture 2 implies Conjecture 1:
     ◮    Guess nondeterministically an SLP C of polynomial size.
     ◮    Check (in deterministic polynomial time), whether val(C) is an
          accepting run of A on val(B).

Lohrey, Mathissen (University of Leipzig)   Compressed Membership   June 13, 2011   12 / 12

Contenu connexe

Plus de CSR2011

Csr2011 june18 15_15_bomhoff
Csr2011 june18 15_15_bomhoffCsr2011 june18 15_15_bomhoff
Csr2011 june18 15_15_bomhoffCSR2011
 
Csr2011 june18 14_00_sudan
Csr2011 june18 14_00_sudanCsr2011 june18 14_00_sudan
Csr2011 june18 14_00_sudanCSR2011
 
Csr2011 june18 15_45_avron
Csr2011 june18 15_45_avronCsr2011 june18 15_45_avron
Csr2011 june18 15_45_avronCSR2011
 
Csr2011 june18 09_30_shpilka
Csr2011 june18 09_30_shpilkaCsr2011 june18 09_30_shpilka
Csr2011 june18 09_30_shpilkaCSR2011
 
Csr2011 june18 12_00_nguyen
Csr2011 june18 12_00_nguyenCsr2011 june18 12_00_nguyen
Csr2011 june18 12_00_nguyenCSR2011
 
Csr2011 june18 11_00_tiskin
Csr2011 june18 11_00_tiskinCsr2011 june18 11_00_tiskin
Csr2011 june18 11_00_tiskinCSR2011
 
Csr2011 june18 11_30_remila
Csr2011 june18 11_30_remilaCsr2011 june18 11_30_remila
Csr2011 june18 11_30_remilaCSR2011
 
Csr2011 june17 17_00_likhomanov
Csr2011 june17 17_00_likhomanovCsr2011 june17 17_00_likhomanov
Csr2011 june17 17_00_likhomanovCSR2011
 
Csr2011 june17 16_30_blin
Csr2011 june17 16_30_blinCsr2011 june17 16_30_blin
Csr2011 june17 16_30_blinCSR2011
 
Csr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhaninCsr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhaninCSR2011
 
Csr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhaninCsr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhaninCSR2011
 
Csr2011 june17 12_00_morin
Csr2011 june17 12_00_morinCsr2011 june17 12_00_morin
Csr2011 june17 12_00_morinCSR2011
 
Csr2011 june17 11_30_vyalyi
Csr2011 june17 11_30_vyalyiCsr2011 june17 11_30_vyalyi
Csr2011 june17 11_30_vyalyiCSR2011
 
Csr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonatiCsr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonatiCSR2011
 
Csr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatovCsr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatovCSR2011
 
Csr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatovCsr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatovCSR2011
 
Csr2011 june17 12_00_morin
Csr2011 june17 12_00_morinCsr2011 june17 12_00_morin
Csr2011 june17 12_00_morinCSR2011
 
Csr2011 june17 11_30_vyalyi
Csr2011 june17 11_30_vyalyiCsr2011 june17 11_30_vyalyi
Csr2011 june17 11_30_vyalyiCSR2011
 
Csr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonatiCsr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonatiCSR2011
 
Csr2011 june17 17_00_likhomanov
Csr2011 june17 17_00_likhomanovCsr2011 june17 17_00_likhomanov
Csr2011 june17 17_00_likhomanovCSR2011
 

Plus de CSR2011 (20)

Csr2011 june18 15_15_bomhoff
Csr2011 june18 15_15_bomhoffCsr2011 june18 15_15_bomhoff
Csr2011 june18 15_15_bomhoff
 
Csr2011 june18 14_00_sudan
Csr2011 june18 14_00_sudanCsr2011 june18 14_00_sudan
Csr2011 june18 14_00_sudan
 
Csr2011 june18 15_45_avron
Csr2011 june18 15_45_avronCsr2011 june18 15_45_avron
Csr2011 june18 15_45_avron
 
Csr2011 june18 09_30_shpilka
Csr2011 june18 09_30_shpilkaCsr2011 june18 09_30_shpilka
Csr2011 june18 09_30_shpilka
 
Csr2011 june18 12_00_nguyen
Csr2011 june18 12_00_nguyenCsr2011 june18 12_00_nguyen
Csr2011 june18 12_00_nguyen
 
Csr2011 june18 11_00_tiskin
Csr2011 june18 11_00_tiskinCsr2011 june18 11_00_tiskin
Csr2011 june18 11_00_tiskin
 
Csr2011 june18 11_30_remila
Csr2011 june18 11_30_remilaCsr2011 june18 11_30_remila
Csr2011 june18 11_30_remila
 
Csr2011 june17 17_00_likhomanov
Csr2011 june17 17_00_likhomanovCsr2011 june17 17_00_likhomanov
Csr2011 june17 17_00_likhomanov
 
Csr2011 june17 16_30_blin
Csr2011 june17 16_30_blinCsr2011 june17 16_30_blin
Csr2011 june17 16_30_blin
 
Csr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhaninCsr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhanin
 
Csr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhaninCsr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhanin
 
Csr2011 june17 12_00_morin
Csr2011 june17 12_00_morinCsr2011 june17 12_00_morin
Csr2011 june17 12_00_morin
 
Csr2011 june17 11_30_vyalyi
Csr2011 june17 11_30_vyalyiCsr2011 june17 11_30_vyalyi
Csr2011 june17 11_30_vyalyi
 
Csr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonatiCsr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonati
 
Csr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatovCsr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatov
 
Csr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatovCsr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatov
 
Csr2011 june17 12_00_morin
Csr2011 june17 12_00_morinCsr2011 june17 12_00_morin
Csr2011 june17 12_00_morin
 
Csr2011 june17 11_30_vyalyi
Csr2011 june17 11_30_vyalyiCsr2011 june17 11_30_vyalyi
Csr2011 june17 11_30_vyalyi
 
Csr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonatiCsr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonati
 
Csr2011 june17 17_00_likhomanov
Csr2011 june17 17_00_likhomanovCsr2011 june17 17_00_likhomanov
Csr2011 june17 17_00_likhomanov
 

Csr2011 june16 17_00_lohrey

  • 1. Compressed Membership in Automata with Compressed Labels Markus Lohrey Christian Mathissen University of Leipzig June 13, 2011 Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 1 / 12
  • 2. Motivation Try to develop algorithms that directly work on compressed data. Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 2 / 12
  • 3. Motivation Try to develop algorithms that directly work on compressed data. Goal: Beat straightforward decompress and manipulate strategy. Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 2 / 12
  • 4. Motivation Try to develop algorithms that directly work on compressed data. Goal: Beat straightforward decompress and manipulate strategy. In this talk: focus on compressed strings Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 2 / 12
  • 5. Motivation Try to develop algorithms that directly work on compressed data. Goal: Beat straightforward decompress and manipulate strategy. In this talk: focus on compressed strings Applications: ◮ all domains, where massive string data arise and have to be processed, e.g. bioinformatics Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 2 / 12
  • 6. Motivation Try to develop algorithms that directly work on compressed data. Goal: Beat straightforward decompress and manipulate strategy. In this talk: focus on compressed strings Applications: ◮ all domains, where massive string data arise and have to be processed, e.g. bioinformatics ◮ large (and highly compressible) strings often occur as intermediate data structures (e.g. in computational group theory, program analysis, verification). Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 2 / 12
  • 7. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
  • 8. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
  • 9. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i . Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
  • 10. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i . Alternatively: a context-free grammar that generates exactly one string. Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
  • 11. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i . Alternatively: a context-free grammar that generates exactly one string. We write val(A) for the unique word generated by A. Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
  • 12. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i . Alternatively: a context-free grammar that generates exactly one string. We write val(A) for the unique word generated by A. The size of an SLP A = (Ai := αi )1≤i ≤n is |A| = n. Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
  • 13. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i . Alternatively: a context-free grammar that generates exactly one string. We write val(A) for the unique word generated by A. The size of an SLP A = (Ai := αi )1≤i ≤n is |A| = n. One may have |val(A)| = 2n . Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
  • 14. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i . Alternatively: a context-free grammar that generates exactly one string. We write val(A) for the unique word generated by A. The size of an SLP A = (Ai := αi )1≤i ≤n is |A| = n. One may have |val(A)| = 2n . Thus, an SLP A can be seen as a compressed representation of val(A). Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
  • 15. Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7). Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 4 / 12
  • 16. Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7). Then val(A) = abaababaabaab: Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 4 / 12
  • 17. Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7). Then val(A) = abaababaabaab: A3 = A2 A1 = ab A4 = A3 A2 = aba A5 = A4 A3 = abaab A6 = A5 A4 = abaababa A7 = A6 A5 = abaababaabaab Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 4 / 12
  • 18. Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7). Then val(A) = abaababaabaab: A3 = A2 A1 = ab A4 = A3 A2 = aba A5 = A4 A3 = abaab A6 = A5 A4 = abaababa A7 = A6 A5 = abaababaabaab Relationship to dictionary-based compression (Rytter): Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 4 / 12
  • 19. Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7). Then val(A) = abaababaabaab: A3 = A2 A1 = ab A4 = A3 A2 = aba A5 = A4 A3 = abaab A6 = A5 A4 = abaababa A7 = A6 A5 = abaababaabaab Relationship to dictionary-based compression (Rytter): ◮ From an SLP A one can compute in polynomial time LZ77(val(A)). Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 4 / 12
  • 20. Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7). Then val(A) = abaababaabaab: A3 = A2 A1 = ab A4 = A3 A2 = aba A5 = A4 A3 = abaab A6 = A5 A4 = abaababa A7 = A6 A5 = abaababaabaab Relationship to dictionary-based compression (Rytter): ◮ From an SLP A one can compute in polynomial time LZ77(val(A)). ◮ From LZ77(w ) one can compute in polynomial time an SLP A with val(A) = w . Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 4 / 12
  • 21. Algorithms on SLP-compressed strings The mother of all algorithms on SLP -compressed strings: Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 5 / 12
  • 22. Algorithms on SLP-compressed strings The mother of all algorithms on SLP -compressed strings: Plandowski 1994 The following problem can be solved in polynomial time: INPUT: SLPs A, B QUESTION: val(A) = val(B)? Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 5 / 12
  • 23. Algorithms on SLP-compressed strings The mother of all algorithms on SLP -compressed strings: Plandowski 1994 The following problem can be solved in polynomial time: INPUT: SLPs A, B QUESTION: val(A) = val(B)? Note: The decompress-and-compare strategy does not work here. We cannot compute val(A) and val(B) in polynomial time. Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 5 / 12
  • 24. Algorithms on SLP-compressed strings The mother of all algorithms on SLP -compressed strings: Plandowski 1994 The following problem can be solved in polynomial time: INPUT: SLPs A, B QUESTION: val(A) = val(B)? Note: The decompress-and-compare strategy does not work here. We cannot compute val(A) and val(B) in polynomial time. Plandowski’s algorithm uses combinatorics on words, in particular the Periodicity-Lemma of Fine and Wilf. Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 5 / 12
  • 25. Improvements of Plandowski’s result Gasieniec, Karpinski, Miyazaki, Plandowski, Rytter, Shinohara, Takeda (mid 90’s) The following problem can be solved in polynomial time (fully compressed pattern matching): INPUT: SLPs P, T QUESTION: Is val(P) a factor of val(T), i.e., ∃x, y : val(T) = x val(P) y ? Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 6 / 12
  • 26. Improvements of Plandowski’s result Gasieniec, Karpinski, Miyazaki, Plandowski, Rytter, Shinohara, Takeda (mid 90’s) The following problem can be solved in polynomial time (fully compressed pattern matching): INPUT: SLPs P, T QUESTION: Is val(P) a factor of val(T), i.e., ∃x, y : val(T) = x val(P) y ? The best known algorithm has a running time of O(|P| · |T|2 ) (Lifshits 2006). Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 6 / 12
  • 27. Generalization: Compressed automata A compressed automata is an ordinary finite automaton, where every transition is labelled with an SLP. Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 7 / 12
  • 28. Generalization: Compressed automata A compressed automata is an ordinary finite automaton, where every transition is labelled with an SLP. Example: The compressed automaton Σ Σ q A q accepts all words that have val(A) as a factor. Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 7 / 12
  • 29. Generalization: Compressed automata A compressed automata is an ordinary finite automaton, where every transition is labelled with an SLP. Example: The compressed automaton Σ Σ q A q accepts all words that have val(A) as a factor. The size |A| of the compressed automaton A (with the set ∆ of transition triples) is |A| = |A|. (p,A,q)∈∆ Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 7 / 12
  • 30. Compressed membership for compressed automata Compressed membership for compressed automata: INPUT: A compressed automaton A and an SLP B. QUESTION: val(B) ∈ L(A)? Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 8 / 12
  • 31. Compressed membership for compressed automata Compressed membership for compressed automata: INPUT: A compressed automaton A and an SLP B. QUESTION: val(B) ∈ L(A)? Plandowski, Rytter 1999 ◮ Compressed membership for compressed automata belongs to PSPACE. ◮ Compressed membership for compressed automata over a unary alphabet is NP-complete. Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 8 / 12
  • 32. Compressed membership for compressed automata Compressed membership for compressed automata: INPUT: A compressed automaton A and an SLP B. QUESTION: val(B) ∈ L(A)? Plandowski, Rytter 1999 ◮ Compressed membership for compressed automata belongs to PSPACE. ◮ Compressed membership for compressed automata over a unary alphabet is NP-complete. Plandowski and Rytter conjectured that compressed membership for compressed automata is NP-complete (for every alphabet size). Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 8 / 12
  • 33. Some combinatorics on words The period of the word w ∈ Σ∗ is the smallest number p such that w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p (period(w ) = |w | if such a p does not exist). Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 9 / 12
  • 34. Some combinatorics on words The period of the word w ∈ Σ∗ is the smallest number p such that w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p (period(w ) = |w | if such a p does not exist). |w | Let order(w ) = ⌊ period(w ) ⌋. Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 9 / 12
  • 35. Some combinatorics on words The period of the word w ∈ Σ∗ is the smallest number p such that w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p (period(w ) = |w | if such a p does not exist). |w | Let order(w ) = ⌊ period(w ) ⌋. Example: Let w = abbabbab = (abb)2 ab. Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 9 / 12
  • 36. Some combinatorics on words The period of the word w ∈ Σ∗ is the smallest number p such that w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p (period(w ) = |w | if such a p does not exist). |w | Let order(w ) = ⌊ period(w ) ⌋. Example: Let w = abbabbab = (abb)2 ab. Then period(w ) = 3 and order(w ) = 2. Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 9 / 12
  • 37. Some combinatorics on words The period of the word w ∈ Σ∗ is the smallest number p such that w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p (period(w ) = |w | if such a p does not exist). |w | Let order(w ) = ⌊ period(w ) ⌋. Example: Let w = abbabbab = (abb)2 ab. Then period(w ) = 3 and order(w ) = 2. For a compressed automaton A, let: order(A) = max{order(val(A)) | A occurs as a label in A} period(A) = max{period(val(A)) | A occurs as a label in A} Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 9 / 12
  • 38. First main result Theorem 1 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) in time polynomial in (i) |A|, (ii) |B|, and (iii) order(A). Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 10 / 12
  • 39. First main result Theorem 1 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) in time polynomial in (i) |A|, (ii) |B|, and (iii) order(A). We use the following combinatorial fact: Let u, v ∈ Σ∗ and let p be a position in v . Then there exist at most period(u) many occurrences of u in v that “touch” position p. Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 10 / 12
  • 40. Second main result Theorem 2 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|, and (iii) period(A). Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 11 / 12
  • 41. Second main result Theorem 2 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|, and (iii) period(A). Generalizes the result of Plandowski & Rytter for a unary alphabet (NP-completeness of compressed membership for compressed automata). Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 11 / 12
  • 42. Second main result Theorem 2 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|, and (iii) period(A). Generalizes the result of Plandowski & Rytter for a unary alphabet (NP-completeness of compressed membership for compressed automata). A word w = an has period 1! Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 11 / 12
  • 43. Second main result Theorem 2 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|, and (iii) period(A). Generalizes the result of Plandowski & Rytter for a unary alphabet (NP-completeness of compressed membership for compressed automata). A word w = an has period 1! The theorem is proven by a reduction to the case of a unary alphabet. Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 11 / 12
  • 44. Open problems and a conjecture Conjecture 1 Compressed membership for compressed automata is NP-complete. Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 12 / 12
  • 45. Open problems and a conjecture Conjecture 1 Compressed membership for compressed automata is NP-complete. Conjecture 2 If val(B) ∈ L(A) for an SLP B and a compressed automaton A, then there exists an accepting run of A on val(B) (viewed as a word over the set of transition triples of A), which can be generated by an SLP of size poly(|B|, |A|). Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 12 / 12
  • 46. Open problems and a conjecture Conjecture 1 Compressed membership for compressed automata is NP-complete. Conjecture 2 If val(B) ∈ L(A) for an SLP B and a compressed automaton A, then there exists an accepting run of A on val(B) (viewed as a word over the set of transition triples of A), which can be generated by an SLP of size poly(|B|, |A|). Conjecture 2 implies Conjecture 1: ◮ Guess nondeterministically an SLP C of polynomial size. ◮ Check (in deterministic polynomial time), whether val(C) is an accepting run of A on val(B). Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 12 / 12