1. Security Proofs in Cryptography: Research Trends in Program Obfuscation
Part of the slides are copied from the web pages of Yury Lifshits & Boaz Barak
http://logic.pdmi.ras.ru/~yura/obfuscation.html
http://www.cs.princeton.edu/~boaz/
Satoshi Hada (IBM Research - Tokyo)
mailto: satoshih at jp ibm com
20. * “TASTE” OF PROOF
Thm [BGI+01]: ∀O ∃P s.t. O totally fails on P (assuming OWFs exist).
Pf: Show a function family {P_{α,β}} s.t. O totally fails (code recovery + hard to learn) on a random member.
Define P_{α,β}(b,x) =
  β      if b=0, x=α
  (α,β)  if b=1, x(0,α)=β
  0      otherwise
Claim: ∀O, for random α,β, w.h.p. O totally fails on P_{α,β}.
22. Claim: ∀O, for random α,β, w.h.p. O totally fails on P_{α,β}.
Define P_{α,β}(b,x) = β if b=0, x=α; (α,β) if b=1, x(0,α)=β; 0 otherwise.
Black-box access is useless: for random α,β, one cannot distinguish P_{α,β} from the all-zero function using black-box access.
Can recover source from obfuscated code: to recover α,β from P' = O(P_{α,β}), output P'(1,P').
Note: The paper also rules out obfuscators for programs with bounded input length.
28. Positive Result (1): This is in fact a technique widely used in password authentication, and its security can be analyzed from the viewpoint of obfuscation.
Password table: (user, password) → Valid / Invalid
  Alice  α
  Bob    β
  ・・・
After obfuscation:
  Alice  f(α)
  Bob    f(β)
  ・・・
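The slide's idea is that storing f(password) for a one-way f obfuscates a point function: the server can still check passwords, but the table no longer reveals them. A minimal sketch using only Python's standard library (function names and parameter choices are illustrative, not from the slides):

```python
import hashlib
import hmac
import os

def obfuscate(password: str) -> tuple[bytes, bytes]:
    """Store f(password) instead of the password: a salted, slow hash."""
    salt = os.urandom(16)
    return salt, hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def verify(entry: tuple[bytes, bytes], attempt: str) -> bool:
    """Re-derive f(attempt) with the stored salt and compare in constant time."""
    salt, digest = entry
    probe = hashlib.pbkdf2_hmac("sha256", attempt.encode(), salt, 100_000)
    return hmac.compare_digest(digest, probe)

# The slide's table: the server keeps f(alpha), f(beta), never alpha, beta.
table = {"Alice": obfuscate("alpha-secret"), "Bob": obfuscate("beta-secret")}
assert verify(table["Alice"], "alpha-secret")      # Valid
assert not verify(table["Alice"], "beta-secret")   # Invalid
```

Recovering a password from its entry requires inverting the hash, which is exactly the sense in which the table is an obfuscation of the "accept this one password" program.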
33. In recent SaaS environments (e.g., Lotus Live, Gmail, Hotmail), the signing key is kept on the server, and the Sign-then-Encrypt (StE) processing must be performed on the server, because browsers have no sign-then-encrypt capability. Leakage of the signing key is therefore a concern.
Sign-then-Encrypt @ Server: Alice's Web Mail (Browser ↔ Server) → Bob's Web Mail (Server ↔ Browser)
Key leakage is a potential security issue!!
The main result of this part is that general-purpose software obfuscators do *not* exist, even if we use a very weak definition of security. By “general purpose” I mean an obfuscator that takes any program and outputs a scrambled version of it. This result is joint work with Goldreich, Impagliazzo, Rudich, Sahai, Vadhan, and Yang.
Before proving such a result we need to formally define what it means for an obfuscator to be secure, or at least what it means for one to be insecure. We use the following definition. Let O be a candidate obfuscator; that is, O is an algorithm that takes a program as input and outputs a different program. We say that O totally fails on some program P if P can be efficiently recovered from O(P). Note that this means it is possible to invert the obfuscator and recover the entire original source code from the supposedly obfuscated version; if this happens, the obfuscator has definitely failed. One may complain that this definition does not capture all possible ways in which an obfuscator can fail. Indeed, if P contained comments, then even the trivial obfuscator that just throws away the comments does not fail totally on P. However, because we are shooting for a negative result, the stronger the notion of failure we require, the better: it just makes our impossibility result stronger. Condition #2 is meant to avoid trivialities. For example, if P is a program that prints its own source code, then no matter how you obfuscate P it will always be possible to recover the source from the obfuscated version by simply executing it. Thus, we require that it be impossible to recover the source code of P just by executing it. The main theorem of this part is that for every candidate obfuscator O there exists a program P on which O totally fails. I will now give a “taste” of the proof of this theorem.
To prove the theorem we will give a family of functions, parametrized by a pair of strings α, β of length, say, 1000, such that for every O, if we choose α and β at random, then O will totally fail on P_{α,β}. For every α, β we define the program P_{α,β} as follows. It takes as input a bit b and a string x. If b=0 and x=α, then P_{α,β} outputs β. If b=1, then P_{α,β} interprets the string x as a description of a program; it runs this program on the input (0,α), and if the output equals β, then P_{α,β} outputs the pair (α,β). In all other cases P_{α,β} outputs 0. As I said above, the claim is that every obfuscator fails w.h.p. on P_{α,β} if α, β are chosen at random. I will now turn to proving this claim.
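As a concrete rendering of this definition, here is a hedged Python sketch of P_{α,β}. Python functions stand in for the paper's programs, 128-bit secrets stand in for the 1000-bit α, β, and the convention that a "program description" is source text defining a two-argument function named `prog` is my own assumption, not part of the original construction:

```python
import secrets

def make_p(alpha: str, beta: str):
    """Build the counterexample function P_{alpha,beta} (illustrative sketch)."""
    def p(b, x):
        if b == 0:
            # Point-function branch: only the secret input alpha unlocks beta.
            return beta if x == alpha else 0
        if b == 1:
            # Self-referential branch: x is read as source code assumed to
            # define a two-argument function named `prog`; run it on (0, alpha).
            try:
                env = {}
                exec(x, env)
                if env["prog"](0, alpha) == beta:
                    return (alpha, beta)
            except Exception:
                pass
        return 0
    return p

p = make_p(secrets.token_hex(16), secrets.token_hex(16))
assert p(0, "wrong guess") == 0        # looks like the all-zero function
assert p(1, "not even a program") == 0 # invalid program descriptions map to 0
```

Without α, essentially every query returns 0, which is exactly why black-box access to a random member of the family is useless.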
I will first show the second condition, i.e., that a random P_{α,β} is hard to learn. I claim that black-box access to P_{α,β} is completely useless, since there is only negligible probability of obtaining a non-zero answer from the black box. Indeed, suppose someone gives you a black box that computes P_{α,β}, but you don't know α or β. If you feed the box a query of the form (0,x), you get a zero answer unless you happened to guess α, which we can assume never happens, since it occurs only with negligible probability. Now, if you only get zero answers on queries of the form (0,x), then you have no information about β, and so you have no way of coming up with a program that outputs β on input (0,α); this means you also get zero answers on queries of the form (1,x). Thus, with very high probability, you only ever get zero answers from the black box. I will now show the first condition, namely that it is possible to efficiently recover the source of P_{α,β} from any P' which is an obfuscated version of P_{α,β}. First, note that once we know α and β it is easy to recover the code of P_{α,β}, so it suffices to show that we can recover α and β from P'. However, this is quite easy: to get α and β, run P' on input 1 and *its own code*. Since P' is a program that on input (0,α) outputs β, it follows that P'(1,P') = (α,β). This finishes the proof. I just want to comment that this argument crucially used the fact that we could feed P' as input to itself. This is no longer true if we want to rule out obfuscators for programs with bounded input length. I don't have time to even hint at how we overcome this obstacle, but in some sense that part is the technical heart of the paper.
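The recovery attack described in these notes can be demonstrated end to end. Any functionally equivalent source for P_{α,β}, however an obfuscator rewrites it, yields (α,β) when fed to itself on the b=1 branch. A self-contained sketch (the convention that a program description is source text defining `prog` is an assumption I adopt for illustration):

```python
import secrets

alpha = secrets.token_hex(8)
beta = secrets.token_hex(8)

# Source text of P_{alpha,beta}. An obfuscator may rewrite this code
# arbitrarily; the attack relies only on its input/output behavior.
obf_src = f'''
def prog(b, x):
    if b == 0:
        return {beta!r} if x == {alpha!r} else 0
    if b == 1:
        env = {{}}
        try:
            exec(x, env)                      # interpret x as a program
            if env["prog"](0, {alpha!r}) == {beta!r}:
                return ({alpha!r}, {beta!r})
        except Exception:
            pass
    return 0
'''

env = {}
exec(obf_src, env)
p_prime = env["prog"]  # P' = the (supposedly) obfuscated program

# Black-box queries reveal nothing: almost every input maps to 0 ...
assert p_prime(0, "some guess") == 0
# ... but running P' on (1, <its own source>) recovers both secrets at once.
assert p_prime(1, obf_src) == (alpha, beta)
```

This is exactly the self-application step the notes highlight, and it is the step that breaks down for programs with bounded input length, since their source may not fit into their own input.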
I now want to discuss the meaning of this impossibility result a bit. What we proved is that there is no general-purpose obfuscator that takes any program and scrambles it securely. However, a manufacturer of a commercial obfuscator might claim that we still have not ruled out the possibility that their obfuscator is “virtually general purpose”. By this I mean that it is secure for all the programs that come up in practice, although it of course fails on our somewhat contrived counterexample. Let's visualize this hypothetical (or not so hypothetical, see slashdot) obfuscator manufacturer's argument as follows. We have the set of “useful” programs that come up in practice (don't worry if you don't know all these acronyms). Then there is our counterexample, which admittedly is not in this set. The manufacturer claims that the set of programs for which their obfuscator is secure contains all the useful programs, and hence that their obfuscator is “virtually general purpose”. This claim is similar to a common criticism of NP-completeness results. When we prove that the TSP problem is NP-complete, we prove that it is hard in the worst case, and that no algorithm can solve it on the contrived instances that come from our reduction; it may very well be that a simple heuristic solves TSP on all the instances that actually come up in practice. I want to comment that there is in fact a difference between the case of obfuscation and the case of TSP, and that in our case the negative result means more than it may seem at first sight.