cover_image

Obfuscating Compilers and Their Potential Applications

Kurt Pan XPTY
2021年01月04日 12:46

An obfuscating compiler is an algorithm that takes as input a program  and produces a functionally equivalent program  So far, this does not rule out the compiler that simply outputs its input unchanged, but we will want the program  to be "inscrutable" or "obfuscated." Defining this requirement formally takes some care, and as we will see, there is more than one way to do so.

Our guiding principle is that the output program  will not reveal any more information about the input  than is necessarily revealed by the fact the two programs are functionally equivalent. To take two extreme examples, if you wrote a program  that outputs your credit card number when given the number 0 as input, then no matter how obfuscated  is, it will still reveal the same information. In contrast, if the program  contained the credit number as a comment, with no effect on its functionality, then the obfuscated version  should not reveal it. Of course, we want an obfuscator compiler to do much more than strip all comments.

Specifically, we say a compiler transforming  to  is virtual black-box secure if for any attacker  that learns some piece of information  from  could have been learned by simply treating  (or ) as a black box and querying it on various inputs and observing the outputs. More formally, the definition requires that for every (polynomial-time) algorithm  there is another polynomial time algorithm  (known as a simulator in crypto parlance) such that the random variables  and  are computationally indistinguishable, where denotes the output of  given the code of the obfuscated program  while  denotes the output of  given black-box (i.e., input/ output) access to  (or to the functionally equivalent  ).

Let us imagine that we had a practically efficient obfuscating compiler that satisfied this notion of virtual black-box security. What would we use it for? The most obvious application is software protection- publishing an obfuscated program  would be the equivalent of providing a physical "black box" that can be executed but not opened up and understood. But there are many other applications. For example, we could use an obfuscator to design a "selective decryption scheme"( The technical name for this notion is functional encryption.) - a program that would contain inside it a decryption key  but would only decrypt messages satisfying very particular criteria. For example, suppose that all my email was encrypted, and I had an urgent project that needed attention while I was on vacation. I could write a program  that given an encrypted email message as input, uses my secret decryption key to decrypt it, checks if it is related to this project and if so outputs the plaintext message. Then, I could give my colleague an obfuscated version P′ of P, without fearing she could reverse engineer the program, learn my secret key and manage to decrypt my other email messages as well.

def DecryptEmail(EncryptedMsg):  
 SecretKey = "58ff29d6ad1c33a00d0574fe67e53998" 
 m = Decrypt(EncryptedMsg,SecretKey)  
 if m.find("Congratulations!")>=0return m 
 return "Sorry, this email is out of your access right."

There are many more applications for obfuscation. The example of functional encryption could be vastly generalized. In fact, almost any cryptographic primitive you can think of can be fairly directly derived from obfuscation, starting from basic primitives such as publickey encryption and digital signatures to fancier notions such as multiparty secure computation, fully homomorphic encryption, zero knowledge proofs, and their many variants. There are also applications to obfuscation that a priori seem to have nothing to do with cryptography; for example, one can use it to design autonomous agents that would participate on your behalf in digital transactions such as electronic auctions, or to publish patches for software vulnerabilities without worrying that attackers could learn the vulnerabilities by reverse engineering the patch.

So, virtual black-box obfuscation is wonderful, but does it exist? This is what [BGI+] set to find out in 2001, the answer was negative. Specifically, it showed the existence of inherently unobfuscatable functions—this is a program P whose source code can be recovered from any functionally equivalent program P′ though curiously it cannot be efficiently recovered using only black-box access to P.

In the intervening years, cryptography has seen many advances, in particular achieving constructions of some of the cryptographic primitives that were envisioned as potential applications of obfuscation, most notably fully homomorphic encryption . In particular, in 2012 Garg, Gentry and Halevi put forward a candidate construction for an object they called "cryptographic multilinear maps", and which in this article I will somewhat loosely refer to as a "homomorphic quasi encryption" scheme. Using this object, Garg et al. showed a candidate construction of a general-purpose indistinguishablity obfuscators.