1. Compressed Membership in Automata with
Compressed Labels
Markus Lohrey
Christian Mathissen
University of Leipzig
June 13, 2011
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 1 / 12
2. Motivation
Try to develop algorithms that directly work on compressed data.
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 2 / 12
3. Motivation
Try to develop algorithms that directly work on compressed data.
Goal: Beat straightforward decompress and manipulate strategy.
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 2 / 12
4. Motivation
Try to develop algorithms that directly work on compressed data.
Goal: Beat straightforward decompress and manipulate strategy.
In this talk: focus on compressed strings
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 2 / 12
5. Motivation
Try to develop algorithms that directly work on compressed data.
Goal: Beat straightforward decompress and manipulate strategy.
In this talk: focus on compressed strings
Applications:
◮ all domains, where massive string data arise and have to be
processed, e.g. bioinformatics
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 2 / 12
6. Motivation
Try to develop algorithms that directly work on compressed data.
Goal: Beat straightforward decompress and manipulate strategy.
In this talk: focus on compressed strings
Applications:
◮ all domains, where massive string data arise and have to be
processed, e.g. bioinformatics
◮ large (and highly compressible) strings often occur as intermediate
data structures (e.g. in computational group theory, program analysis,
verification).
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 2 / 12
7. Straight-line programs
Dictionary-based compression algorithms (LZ77, LZ78) exploit text
repetition.
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
8. Straight-line programs
Dictionary-based compression algorithms (LZ77, LZ78) exploit text
repetition.
Straight-line programs are a general representation for compressed strings,
which covers most dictionary-based algorithms.
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
9. Straight-line programs
Dictionary-based compression algorithms (LZ77, LZ78) exploit text
repetition.
Straight-line programs are a general representation for compressed strings,
which covers most dictionary-based algorithms.
A straight-line program (SLP) over the alphabet Γ is a sequence of
definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for
some j, k < i .
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
10. Straight-line programs
Dictionary-based compression algorithms (LZ77, LZ78) exploit text
repetition.
Straight-line programs are a general representation for compressed strings,
which covers most dictionary-based algorithms.
A straight-line program (SLP) over the alphabet Γ is a sequence of
definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for
some j, k < i .
Alternatively: a context-free grammar that generates exactly one string.
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
11. Straight-line programs
Dictionary-based compression algorithms (LZ77, LZ78) exploit text
repetition.
Straight-line programs are a general representation for compressed strings,
which covers most dictionary-based algorithms.
A straight-line program (SLP) over the alphabet Γ is a sequence of
definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for
some j, k < i .
Alternatively: a context-free grammar that generates exactly one string.
We write val(A) for the unique word generated by A.
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
12. Straight-line programs
Dictionary-based compression algorithms (LZ77, LZ78) exploit text
repetition.
Straight-line programs are a general representation for compressed strings,
which covers most dictionary-based algorithms.
A straight-line program (SLP) over the alphabet Γ is a sequence of
definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for
some j, k < i .
Alternatively: a context-free grammar that generates exactly one string.
We write val(A) for the unique word generated by A.
The size of an SLP A = (Ai := αi )1≤i ≤n is |A| = n.
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
13. Straight-line programs
Dictionary-based compression algorithms (LZ77, LZ78) exploit text
repetition.
Straight-line programs are a general representation for compressed strings,
which covers most dictionary-based algorithms.
A straight-line program (SLP) over the alphabet Γ is a sequence of
definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for
some j, k < i .
Alternatively: a context-free grammar that generates exactly one string.
We write val(A) for the unique word generated by A.
The size of an SLP A = (Ai := αi )1≤i ≤n is |A| = n.
One may have |val(A)| = 2n .
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
14. Straight-line programs
Dictionary-based compression algorithms (LZ77, LZ78) exploit text
repetition.
Straight-line programs are a general representation for compressed strings,
which covers most dictionary-based algorithms.
A straight-line program (SLP) over the alphabet Γ is a sequence of
definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for
some j, k < i .
Alternatively: a context-free grammar that generates exactly one string.
We write val(A) for the unique word generated by A.
The size of an SLP A = (Ai := αi )1≤i ≤n is |A| = n.
One may have |val(A)| = 2n .
Thus, an SLP A can be seen as a compressed representation of val(A).
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
15. Straight-line programs
Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7).
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 4 / 12
16. Straight-line programs
Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7).
Then val(A) = abaababaabaab:
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 4 / 12
17. Straight-line programs
Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7).
Then val(A) = abaababaabaab:
A3 = A2 A1 = ab
A4 = A3 A2 = aba
A5 = A4 A3 = abaab
A6 = A5 A4 = abaababa
A7 = A6 A5 = abaababaabaab
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 4 / 12
18. Straight-line programs
Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7).
Then val(A) = abaababaabaab:
A3 = A2 A1 = ab
A4 = A3 A2 = aba
A5 = A4 A3 = abaab
A6 = A5 A4 = abaababa
A7 = A6 A5 = abaababaabaab
Relationship to dictionary-based compression (Rytter):
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 4 / 12
19. Straight-line programs
Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7).
Then val(A) = abaababaabaab:
A3 = A2 A1 = ab
A4 = A3 A2 = aba
A5 = A4 A3 = abaab
A6 = A5 A4 = abaababa
A7 = A6 A5 = abaababaabaab
Relationship to dictionary-based compression (Rytter):
◮ From an SLP A one can compute in polynomial time LZ77(val(A)).
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 4 / 12
20. Straight-line programs
Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7).
Then val(A) = abaababaabaab:
A3 = A2 A1 = ab
A4 = A3 A2 = aba
A5 = A4 A3 = abaab
A6 = A5 A4 = abaababa
A7 = A6 A5 = abaababaabaab
Relationship to dictionary-based compression (Rytter):
◮ From an SLP A one can compute in polynomial time LZ77(val(A)).
◮ From LZ77(w ) one can compute in polynomial time an SLP A with
val(A) = w .
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 4 / 12
21. Algorithms on SLP-compressed strings
The mother of all algorithms on SLP -compressed strings:
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 5 / 12
22. Algorithms on SLP-compressed strings
The mother of all algorithms on SLP -compressed strings:
Plandowski 1994
The following problem can be solved in polynomial time:
INPUT: SLPs A, B
QUESTION: val(A) = val(B)?
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 5 / 12
23. Algorithms on SLP-compressed strings
The mother of all algorithms on SLP -compressed strings:
Plandowski 1994
The following problem can be solved in polynomial time:
INPUT: SLPs A, B
QUESTION: val(A) = val(B)?
Note: The decompress-and-compare strategy does not work here.
We cannot compute val(A) and val(B) in polynomial time.
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 5 / 12
24. Algorithms on SLP-compressed strings
The mother of all algorithms on SLP -compressed strings:
Plandowski 1994
The following problem can be solved in polynomial time:
INPUT: SLPs A, B
QUESTION: val(A) = val(B)?
Note: The decompress-and-compare strategy does not work here.
We cannot compute val(A) and val(B) in polynomial time.
Plandowski’s algorithm uses combinatorics on words, in particular the
Periodicity-Lemma of Fine and Wilf.
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 5 / 12
25. Improvements of Plandowski’s result
Gasieniec, Karpinski, Miyazaki, Plandowski, Rytter, Shinohara,
Takeda (mid 90’s)
The following problem can be solved in polynomial time
(fully compressed pattern matching):
INPUT: SLPs P, T
QUESTION: Is val(P) a factor of val(T), i.e., ∃x, y : val(T) = x val(P) y ?
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 6 / 12
26. Improvements of Plandowski’s result
Gasieniec, Karpinski, Miyazaki, Plandowski, Rytter, Shinohara,
Takeda (mid 90’s)
The following problem can be solved in polynomial time
(fully compressed pattern matching):
INPUT: SLPs P, T
QUESTION: Is val(P) a factor of val(T), i.e., ∃x, y : val(T) = x val(P) y ?
The best known algorithm has a running time of O(|P| · |T|2 )
(Lifshits 2006).
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 6 / 12
27. Generalization: Compressed automata
A compressed automata is an ordinary finite automaton, where every
transition is labelled with an SLP.
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 7 / 12
28. Generalization: Compressed automata
A compressed automata is an ordinary finite automaton, where every
transition is labelled with an SLP.
Example: The compressed automaton
Σ Σ
q A q
accepts all words that have val(A) as a factor.
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 7 / 12
29. Generalization: Compressed automata
A compressed automata is an ordinary finite automaton, where every
transition is labelled with an SLP.
Example: The compressed automaton
Σ Σ
q A q
accepts all words that have val(A) as a factor.
The size |A| of the compressed automaton A
(with the set ∆ of transition triples) is
|A| = |A|.
(p,A,q)∈∆
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 7 / 12
30. Compressed membership for compressed automata
Compressed membership for compressed automata:
INPUT: A compressed automaton A and an SLP B.
QUESTION: val(B) ∈ L(A)?
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 8 / 12
31. Compressed membership for compressed automata
Compressed membership for compressed automata:
INPUT: A compressed automaton A and an SLP B.
QUESTION: val(B) ∈ L(A)?
Plandowski, Rytter 1999
◮ Compressed membership for compressed automata belongs to
PSPACE.
◮ Compressed membership for compressed automata over a unary
alphabet is NP-complete.
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 8 / 12
32. Compressed membership for compressed automata
Compressed membership for compressed automata:
INPUT: A compressed automaton A and an SLP B.
QUESTION: val(B) ∈ L(A)?
Plandowski, Rytter 1999
◮ Compressed membership for compressed automata belongs to
PSPACE.
◮ Compressed membership for compressed automata over a unary
alphabet is NP-complete.
Plandowski and Rytter conjectured that compressed membership for
compressed automata is NP-complete (for every alphabet size).
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 8 / 12
33. Some combinatorics on words
The period of the word w ∈ Σ∗ is the smallest number p such that
w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p
(period(w ) = |w | if such a p does not exist).
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 9 / 12
34. Some combinatorics on words
The period of the word w ∈ Σ∗ is the smallest number p such that
w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p
(period(w ) = |w | if such a p does not exist).
|w |
Let order(w ) = ⌊ period(w ) ⌋.
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 9 / 12
35. Some combinatorics on words
The period of the word w ∈ Σ∗ is the smallest number p such that
w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p
(period(w ) = |w | if such a p does not exist).
|w |
Let order(w ) = ⌊ period(w ) ⌋.
Example: Let w = abbabbab = (abb)2 ab.
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 9 / 12
36. Some combinatorics on words
The period of the word w ∈ Σ∗ is the smallest number p such that
w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p
(period(w ) = |w | if such a p does not exist).
|w |
Let order(w ) = ⌊ period(w ) ⌋.
Example: Let w = abbabbab = (abb)2 ab.
Then period(w ) = 3 and order(w ) = 2.
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 9 / 12
37. Some combinatorics on words
The period of the word w ∈ Σ∗ is the smallest number p such that
w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p
(period(w ) = |w | if such a p does not exist).
|w |
Let order(w ) = ⌊ period(w ) ⌋.
Example: Let w = abbabbab = (abb)2 ab.
Then period(w ) = 3 and order(w ) = 2.
For a compressed automaton A, let:
order(A) = max{order(val(A)) | A occurs as a label in A}
period(A) = max{period(val(A)) | A occurs as a label in A}
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 9 / 12
38. First main result
Theorem 1
For a compressed automaton A and an SLP B, we can check
val(B) ∈ L(A) in time polynomial in (i) |A|, (ii) |B|, and (iii) order(A).
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 10 / 12
39. First main result
Theorem 1
For a compressed automaton A and an SLP B, we can check
val(B) ∈ L(A) in time polynomial in (i) |A|, (ii) |B|, and (iii) order(A).
We use the following combinatorial fact:
Let u, v ∈ Σ∗ and let p be a position in v . Then there exist at most
period(u) many occurrences of u in v that “touch” position p.
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 10 / 12
40. Second main result
Theorem 2
For a compressed automaton A and an SLP B, we can check
val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|,
and (iii) period(A).
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 11 / 12
41. Second main result
Theorem 2
For a compressed automaton A and an SLP B, we can check
val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|,
and (iii) period(A).
Generalizes the result of Plandowski & Rytter for a unary alphabet
(NP-completeness of compressed membership for compressed automata).
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 11 / 12
42. Second main result
Theorem 2
For a compressed automaton A and an SLP B, we can check
val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|,
and (iii) period(A).
Generalizes the result of Plandowski & Rytter for a unary alphabet
(NP-completeness of compressed membership for compressed automata).
A word w = an has period 1!
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 11 / 12
43. Second main result
Theorem 2
For a compressed automaton A and an SLP B, we can check
val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|,
and (iii) period(A).
Generalizes the result of Plandowski & Rytter for a unary alphabet
(NP-completeness of compressed membership for compressed automata).
A word w = an has period 1!
The theorem is proven by a reduction to the case of a unary alphabet.
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 11 / 12
44. Open problems and a conjecture
Conjecture 1
Compressed membership for compressed automata is NP-complete.
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 12 / 12
45. Open problems and a conjecture
Conjecture 1
Compressed membership for compressed automata is NP-complete.
Conjecture 2
If val(B) ∈ L(A) for an SLP B and a compressed automaton A, then there
exists an accepting run of A on val(B) (viewed as a word over the set of
transition triples of A), which can be generated by an SLP of size
poly(|B|, |A|).
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 12 / 12
46. Open problems and a conjecture
Conjecture 1
Compressed membership for compressed automata is NP-complete.
Conjecture 2
If val(B) ∈ L(A) for an SLP B and a compressed automaton A, then there
exists an accepting run of A on val(B) (viewed as a word over the set of
transition triples of A), which can be generated by an SLP of size
poly(|B|, |A|).
Conjecture 2 implies Conjecture 1:
◮ Guess nondeterministically an SLP C of polynomial size.
◮ Check (in deterministic polynomial time), whether val(C) is an
accepting run of A on val(B).
Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 12 / 12