A new technique for proving non regularity based on the measure of a language

  1. A New Technique for Proving Non-Regularity Based on the Measure of a Language. Ryoma Sin’ya, Tokyo Institute of Technology, Department of Mathematics and Computer Science.
  2. Infinite Monkey Theorem (a.k.a. Borges's theorem) http://en.wikipedia.org/wiki/Infinite_monkey_theorem
  3. The main topic of this talk is the converse of the Infinite Monkey Theorem. For regular languages, the Infinite Monkey Theorem gives a condition that is both necessary and sufficient for "almost sureness".
  4. Notation. $A$: an alphabet (a finite set of letters). $A^n$: the set of all words over $A$ of length $n$. $A^*$: the set of all words over $A$, i.e. $A^* = \bigcup_{n \in \mathbb{N}} A^n$. A language over $A$ is a subset of $A^*$.
  5. For languages $L$ and $K$, we define the following three operations: union: $L \cup K$; concatenation: $LK = \{vw \mid v \in L, w \in K\}$; Kleene star: $L^* = \bigcup_{n \in \mathbb{N}} L^n = \{\varepsilon\} \cup L \cup LL \cup LLL \cup \cdots$. The class of regular languages is the smallest class that includes all finite languages and is closed under the above three operations.
  6. Language Hierarchy
  7. Language Hierarchy: weak?
  8. Language Hierarchy: weak? Elegant!
  9. Measure & Zero-One Theorem
  10. For a language $L$ over $A$, its probability function is the fraction defined by: $\mu_n(L) = \frac{\text{number of all words of length } n \text{ in } L}{\text{number of all words of length } n} = \frac{|L \cap A^n|}{|A^n|}$.
  11. This is exactly the probability that a randomly chosen word of length $n$ belongs to $L$.
  12. The measure $\mu(L)$ of a language $L$ is the limit of its probability function: $\mu(L) = \lim_{n \to \infty} \mu_n(L)$.
  13. Example: The full language is almost full, and the empty language is almost empty. That is, the set of all words $A^*$ over $A$ satisfies $\mu(A^*) = 1$, and its complement $\emptyset$ satisfies $\mu(\emptyset) = 0$.
  14. Example: Consider $aA^*$, the set of all words that start with the letter $a \in A$. Then, for $n \ge 1$: $\mu_n(aA^*) = \frac{|aA^{n-1}|}{|A^n|} = \frac{1}{|A|}$. Hence $\mu(aA^*) = 1/|A|$ holds, and $aA^*$ is not zero-one if $|A| \ge 2$.
  15. Example: Consider $(AA)^*$, the set of all words of even length. Then $\mu_n((AA)^*) = 1$ if $n$ is even and $\mu_n((AA)^*) = 0$ if $n$ is odd. Hence its limit $\mu((AA)^*)$ does not exist.
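
The three examples above can be checked by brute force for small $n$. The following is a minimal sketch, not part of the original slides: the helper name `mu_n`, the binary alphabet `"ab"` and the membership predicates are all illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def mu_n(in_language, alphabet, n):
    """Fraction of the words of length n over `alphabet` that lie in the language.

    `in_language` is a predicate on strings (a hypothetical stand-in for L).
    """
    words = ("".join(letters) for letters in product(alphabet, repeat=n))
    hits = sum(1 for w in words if in_language(w))
    return Fraction(hits, len(alphabet) ** n)


A = "ab"                                      # assumed two-letter alphabet
full = lambda w: True                         # A*
starts_with_a = lambda w: w.startswith("a")   # aA*
even_length = lambda w: len(w) % 2 == 0       # (AA)*

for n in range(1, 7):
    # columns: n, mu_n(A*), mu_n(aA*), mu_n((AA)*)
    print(n, mu_n(full, A, n), mu_n(starts_with_a, A, n), mu_n(even_length, A, n))
```

The second column stays at 1, the third at 1/2, and the last alternates between 0 and 1, which is why $(AA)^*$ has no measure.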
  16. Forbidden Word: A word $w$ is forbidden for a language $L$ over $A$ if $A^* w A^* \cap L = \emptyset$ holds (equivalently, $A^* w A^* \subseteq \overline{L}$). More intuitively, $w$ is a forbidden word of $L$ if and only if no word in $L$ contains $w$ as a factor.
  17. Infinite Monkey Theorem (a.k.a. Borges's theorem) http://en.wikipedia.org/wiki/Infinite_monkey_theorem
  18. Infinite Monkey Theorem (formal statement): Let $L$ be a language over $A$. If $L$ contains a language of the form $A^* w A^*$, then $L$ is almost full (i.e., $A^* w A^* \subseteq L \Rightarrow \mu(L) = 1$).
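
To see the formal statement at work, one can count how many words of length $n$ contain a fixed factor $w$. This is only an illustrative sketch: the function name `fraction_containing`, the alphabet `"ab"` and the choice $w$ = `"ab"` are assumptions, and brute-force enumeration is practical only for small $n$.

```python
from itertools import product


def fraction_containing(w, alphabet, n):
    """mu_n(A* w A*): share of the words of length n that contain w as a factor."""
    total = len(alphabet) ** n
    hits = sum(1 for letters in product(alphabet, repeat=n) if w in "".join(letters))
    return hits / total


for n in (2, 4, 8, 12, 16):
    print(n, fraction_containing("ab", "ab", n))
```

For this particular choice the words avoiding `"ab"` are exactly those of the form `b...ba...a`, so the printed value is $1 - (n+1)/2^n$, which tends to 1.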
  19. Zero-One Theorem [S. 2015]: Let $L$ be a regular language. Then the following are equivalent: (1) $L$ is almost empty (i.e., $\mu(L) = 0$). (2) $L$ has a forbidden word.
  20. The implication (2) → (1) is nothing but the well-known Infinite Monkey Theorem.
  21. (2) $L$ has a forbidden word
  22. $\Rightarrow \exists w \in A^* \, (A^* w A^* \cap L = \emptyset)$
  23. $\Leftrightarrow \exists w \in A^* \, (A^* w A^* \subseteq \overline{L})$
  24. $\Rightarrow \mu(\overline{L}) = 1$
  25. $\Rightarrow \mu(L) = 1 - \mu(\overline{L}) = 0$
  26. $\Leftrightarrow$ $L$ is almost empty (1).
  27. The remarkable fact about this theorem is that its converse (1) → (2) is also true!
  28. Zero-One Theorem [S. 2015] (complete version): Let $L$ be a regular language. Then the following are equivalent: (1) $L$ is almost empty ($\mu(L) = 0$) or almost full ($\mu(L) = 1$). (2) $L$ or its complement has a forbidden word. (3) The syntactic monoid of $L$ has a zero element. (4) The minimal automaton of $L$ is zero. (5) $L$ is recognised by a quasi-zero automaton.
  29. An Automata Theoretic Approach to the Zero-One Law for Regular Languages: Algorithmic and Logical Aspects. Ryoma Sin’ya, Tokyo Institute of Technology (shinya.r.aa@m.titech.ac.jp) and École Nationale Supérieure des Télécommunications (rshinya@enst.fr). Abstract: A zero-one language L is a regular language whose asymptotic probability converges to either zero or one. In this case, we say that L obeys the zero-one law. We prove that a regular language obeys the zero-one law if and only if its syntactic monoid has a zero element, by means of Eilenberg’s variety theoretic approach. Our proof gives an effective automata characterisation of the zero-one law for regular languages, and it leads to a linear time algorithm for testing whether a given regular language is zero-one. In addition, we discuss the logical aspects of the zero-one law for regular languages. For more details, see arxiv:1509.07209
  30. Technique for Proving Non-Regularity. Zero Lemma (corollary of Zero-One Theorem): Let $L$ be an almost empty language over $A$. If $L$ does not have a forbidden word, then $L$ is not regular.
  31. This is a new necessary condition for regularity.
  32. Palindromes: Recall that the set of all palindromes $P$ over $A$ is defined as follows: $P = \{w \in A^* \mid w = w^r\}$. Note that, if $A$ is a singleton ($|A| = 1$), then $P = A^*$ and hence $P$ is regular.
  33. $\mu_n(P) = \begin{cases} \frac{|A|^{n/2}}{|A|^n} = \frac{1}{|A|^{n/2}} & \text{if } n \text{ is even,} \\ \frac{|A| \times |A|^{(n-1)/2}}{|A|^n} = \frac{1}{|A|^{(n-1)/2}} & \text{if } n \text{ is odd.} \end{cases}$
  35. $\forall w \in A^* \, (w w^r \in P)$.
  36. (i.e., $P$ does not have a forbidden word)
  37. $P$ is not regular by Zero Lemma!
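
The closed form for $\mu_n(P)$ can be compared against a direct count. The sketch below is an illustration only; the names `mu_n_palindromes` and `closed_form` are made up here, and the alphabets are arbitrary small examples.

```python
from fractions import Fraction
from itertools import product


def mu_n_palindromes(alphabet, n):
    """mu_n(P) by enumeration: share of length-n words equal to their reversal."""
    hits = sum(1 for t in product(alphabet, repeat=n) if t == t[::-1])
    return Fraction(hits, len(alphabet) ** n)


def closed_form(k, n):
    """The formula from the slide with k = |A|: 1 / k^(n/2) or 1 / k^((n-1)/2)."""
    exponent = n // 2  # equals n/2 for even n and (n-1)/2 for odd n
    return Fraction(1, k ** exponent)


for n in range(0, 9):
    assert mu_n_palindromes("ab", n) == closed_form(2, n)
    print(n, closed_form(2, n))
```

Since the exponent grows with $n$, the values tend to 0 whenever $|A| \ge 2$, which is the almost-emptiness that Zero Lemma needs.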
  38. Dyck Language: Recall that the Dyck language $D$ over $A = \{[, ]\}$ is the set of all balanced square brackets: $D = \{\varepsilon, [], [[]], [][], [[[]]], [[][]], [[]][], [][[]], [][][], \ldots\}$. Its probability function satisfies $\mu_n(D) = \Theta(1/n^{3/2})$ if $n$ is even and $\mu_n(D) = 0$ if $n$ is odd. Moreover, $\forall w \in A^* \, \exists n, m \in \mathbb{N} \, ([^n w ]^m \in D)$ (i.e., $D$ does not have a forbidden word). Hence $D$ is not regular by Zero Lemma!
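
The $\Theta(1/n^{3/2})$ rate can be observed numerically. This sketch relies on a standard fact that the slide does not state explicitly: the number of balanced bracket words of length $2k$ is the Catalan number $\binom{2k}{k}/(k+1)$. The function name `mu_n_dyck` is hypothetical.

```python
from math import comb, pi, sqrt


def mu_n_dyck(n):
    """mu_n(D) for the Dyck language over the two brackets '[' and ']'."""
    if n % 2 == 1:
        return 0.0                      # no balanced word has odd length
    k = n // 2
    catalan = comb(2 * k, k) // (k + 1)  # balanced words of length 2k
    return catalan / 4 ** k              # |A^n| = 2^n = 4^k


for n in (10, 100, 1000):
    # If mu_n(D) = Theta(n^(-3/2)), this product should stay bounded.
    print(n, mu_n_dyck(n) * n ** 1.5)

print("limit from Stirling's formula:", 2 * sqrt(2 / pi))
```

The rescaled values settle near $2\sqrt{2/\pi} \approx 1.6$, consistent with the stated growth rate.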
  39. Primes: Consider the set of all prime numbers. It is almost empty by the Prime Number Theorem, and it has no forbidden word by Dirichlet's theorem. Hence it is not regular by Zero Lemma!
  40. Zero Lemma: • states a necessary condition for regular languages. • can only be applied to almost empty languages. • is useful, since the assumption “L is almost empty” is often intuitively clear.
  41. However, even though “L is almost empty” is often intuitively clear, proving it requires extra work.
  42. Proving “L is almost empty” requires knowing the asymptotic behaviour of the probability function of L.
  43. Motivation: Can we find a simple sufficient condition for almost emptiness?
  44. Sufficient Condition for the Almost Emptiness
  45. Dense (image source: http://www.newscientist.com/article/dn10521-forest-growth-is-encouraging-say-researchers/)
  46. Sparse (image source: http://www.evs.anl.gov/news/2014/03-31-mapping-ephemeral-streams.cfm)
  47. Idea: If no element has a neighbouring element, the set looks sparse, i.e., it is of measure zero.
  48. In order to formalise this idea, we have to introduce some distance between words!
  49. Hamming Distance: The Hamming distance is a distance between words of the same length.
  50. The Hamming distance $d(u, v)$ between two words $u, v \in A^n$ is the number of positions at which the corresponding letters differ: $d(u, v) = |\{i \in [0, n-1] \mid u_i \neq v_i\}|$, where $w_i$ is the $i$-th letter of $w$.
  51. Examples: $d(0001, 0000) = 1$, $d(1111, 0000) = 4$, $d(1101, 0110) = 3$, $d(1001, 1111) = 2$.
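
The definition translates directly into code. A minimal sketch, with the function name `hamming` chosen here for illustration, reproducing the four example values from the slides:

```python
def hamming(u, v):
    """Number of positions at which two words of the same length differ."""
    if len(u) != len(v):
        raise ValueError("Hamming distance is only defined for words of equal length")
    return sum(1 for a, b in zip(u, v) if a != b)


print(hamming("0001", "0000"))  # 1
print(hamming("1111", "0000"))  # 4
print(hamming("1101", "0110"))  # 3
print(hamming("1001", "1111"))  # 2
```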
  54. For a word $w \in A^n$, its set of distance-one neighbours $B(w)$ is defined by: $B(w) = \{u \in A^n \mid d(w, u) \le 1\}$.
  56. Example: $B(000) =$
  58. Note: the size of $B(w)$ satisfies $|B(w)| = n(|A| - 1) + 1$ for every word $w \in A^n$.
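
The size formula is easy to confirm by enumerating $B(w)$. In this sketch the function name `neighbours` is hypothetical, and the example for `"000"` assumes the binary alphabet $\{0, 1\}$, which the slides do not state.

```python
def neighbours(w, alphabet):
    """B(w): w itself plus every word that differs from w in exactly one position."""
    result = {w}
    for i, letter in enumerate(w):
        for other in alphabet:
            if other != letter:
                result.add(w[:i] + other + w[i + 1:])
    return result


print(sorted(neighbours("000", "01")))  # assumed binary alphabet

# |B(w)| = n(|A| - 1) + 1: each of the n positions can take |A| - 1 other letters.
w, A = "abca", "abc"
assert len(neighbours(w, A)) == len(w) * (len(A) - 1) + 1
```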
  59. Sufficient Condition for the Almost Emptiness. Lemma 1: Let $L$ be a language over $A$. If the number of distance-one neighbours that are in $L$ is bounded by some constant for every sufficiently long word, then $L$ is almost empty.
  60. Namely, if $L$ satisfies the following condition, then $L$ is almost empty: $\exists C, N \in \mathbb{N} \; \forall n \in \mathbb{N} \; \forall w \in A^n \; (n > N \Rightarrow |B(w) \cap L| \le C)$.
  61. Palindromes (Revisited): Recall that the set of all palindromes $P$ over $A$ is defined as follows: $P = \{w \in A^* \mid w = w^r\}$. Note that, if $A$ is a singleton ($|A| = 1$), then $P = A^*$ and hence $P$ is regular. $\mu_n(P) = \begin{cases} \frac{|A|^{n/2}}{|A|^n} = \frac{1}{|A|^{n/2}} & \text{if } n \text{ is even,} \\ \frac{|A| \times |A|^{(n-1)/2}}{|A|^n} = \frac{1}{|A|^{(n-1)/2}} & \text{if } n \text{ is odd.} \end{cases}$
  63. $\mu(P) = 0$, since the number of distance-one neighbours that are in $P$ is bounded by $|A|$ for any word.
  64. Example: madamimadam $\in P$ (Madam, I’m Adam)
  65. badamimadam $\notin P$ (distance one from madamimadam)
  66. madambmadam $\in P$ (distance one from madamimadam)
  67. We can obtain another palindrome only if we change the central letter “i” to another letter.
  68. Hence $\mu(P) = 0$ by Lemma 1: the number of distance-one neighbours that are in $P$ is bounded by $|A|$ for any word.
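
The bound $|B(w) \cap P| \le |A|$ can be checked on the example word. This is a sketch under assumptions: the helper names are hypothetical, and the alphabet is taken to be the 26 lowercase Latin letters, which the slides do not specify.

```python
import string


def neighbours(w, alphabet):
    """B(w): w itself plus every word that differs from w in exactly one position."""
    result = {w}
    for i, letter in enumerate(w):
        for other in alphabet:
            if other != letter:
                result.add(w[:i] + other + w[i + 1:])
    return result


def palindromic_neighbours(w, alphabet):
    """The members of B(w) that are palindromes, i.e. the set B(w) ∩ P."""
    return sorted(u for u in neighbours(w, alphabet) if u == u[::-1])


A = string.ascii_lowercase  # assumed alphabet, |A| = 26
print(len(palindromic_neighbours("madamimadam", A)))  # 26: only the centre may change
print(palindromic_neighbours("badamimadam", A))       # ['badamimadab', 'madamimadam']
```

An odd-length palindrome reaches the bound $|A|$ exactly; a non-palindrome with a single mismatched pair has two palindromic neighbours, one for each way of repairing that pair.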
  69. The proof of Lemma 1 is not so difficult. It uses a result from coding theory (Cohen’s theorem); please see my paper for the details.
  70. Theorem [Cohen et al. 1986]: A language $L$ over $A$ is said to be a covering of $A^n$ if $\bigcup_{w \in L} B(w) = A^n$ holds. We denote the minimal size of a covering of $A^n$ by $K_A(n) = \min\{|L| \mid L \text{ is a covering of } A^n\}$. For any alphabet $A$, there exists some constant $C$ such that: $\limsup_{n \to \infty} \frac{K_A(n) \times (n(|A| - 1) + 1)}{|A^n|} < C$.
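
As a concrete instance of a covering, the sketch below uses the binary Hamming code of length 7. This code is not mentioned in the slides; it is brought in here only as a well-known example in which the balls $B(w)$ around 16 codewords tile all of $\{0,1\}^7$ without overlap. The names `syndrome` and `ball` are illustrative.

```python
from itertools import product


def syndrome(word):
    """XOR of the 1-based positions holding a 1; it is 0 exactly on Hamming codewords."""
    s = 0
    for position, bit in enumerate(word, start=1):
        if bit:
            s ^= position
    return s


def ball(w):
    """B(w) over the binary alphabet: w and every word with one bit flipped."""
    yield w
    for i in range(len(w)):
        yield w[:i] + (1 - w[i],) + w[i + 1:]


n = 7
words = list(product((0, 1), repeat=n))
code = [w for w in words if syndrome(w) == 0]
covered = {u for w in code for u in ball(w)}

print(len(code), len(covered) == len(words))

# The quantity from the theorem, evaluated for this covering (|A| = 2):
ratio = len(code) * (n * (2 - 1) + 1) / 2 ** n
print(ratio)
```

Each ball holds $n(|A|-1)+1 = 8$ words, so no covering of the 128 words can use fewer than 16 centres; the ratio for this covering is therefore the smallest possible value.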
  71. Future Work
  72. Our Lemma 1 is: • a sufficient condition for almost emptiness. • general: it can be applied to any language.
  73. • but not strong enough: we cannot prove the almost emptiness of the Dyck language by Lemma 1. • We want to improve Lemma 1. Some conjectures are written in my paper.
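  74. Conjecture. Problem 1: Let $L$ be a language over an alphabet $A$ containing at least two letters. Define two functions $f, g : \mathbb{N} \to \mathbb{N}$ determined by $L$ as $f(n) = \max\{|L \cap B_A(w, n)| \mid w \in L \cap A^n\}$ and $g(n) = \min\{|L \cap B_A(w, n)| \mid w \notin L \cap A^n\}$. Do the following hold? (a) If $f(n)$ is bounded above by a constant ($f(n) \in O(1)$) and $g(n)$ is bounded below by a linear function ($g(n) \in \Omega(n)$), then $L$ is almost empty. (b) If $\lim_{n \to \infty} f(n)/g(n) = 0$, then $L$ is almost empty. The Dyck language is a concrete instance of (a) in Problem 1. Intuitively, (b) describes the situation where "the majority (elements of $\overline{L}$) are surrounded by many members of the majority, and the minority (elements of $L$) are surrounded by few members of the minority".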
  75. Thank you♪ (Tokyo Tech Official Mascot: “Tech-chan”)
  76. Any questions or comments?