Obfuscating Compilers and Their Potential Applications

Kurt Pan XPTY

2021年01月04日 12:46

An obfuscating compiler is an algorithm that takes as input a program and produces a functionally equivalent program So far, this does not rule out the compiler that simply outputs its input unchanged, but we will want the program to be "inscrutable" or "obfuscated." Defining this requirement formally takes some care, and as we will see, there is more than one way to do so.

Our guiding principle is that the output program will not reveal any more information about the input than is necessarily revealed by the fact the two programs are functionally equivalent. To take two extreme examples, if you wrote a program that outputs your credit card number when given the number 0 as input, then no matter how obfuscated is, it will still reveal the same information. In contrast, if the program contained the credit number as a comment, with no effect on its functionality, then the obfuscated version should not reveal it. Of course, we want an obfuscator compiler to do much more than strip all comments.

Specifically, we say a compiler transforming to is virtual black-box secure if for any attacker that learns some piece of information from could have been learned by simply treating (or ) as a black box and querying it on various inputs and observing the outputs. More formally, the definition requires that for every (polynomial-time) algorithm there is another polynomial time algorithm (known as a simulator in crypto parlance) such that the random variables and are computationally indistinguishable, where denotes the output of given the code of the obfuscated program while denotes the output of given black-box (i.e., input/ output) access to (or to the functionally equivalent ).

Let us imagine that we had a practically efficient obfuscating compiler that satisfied this notion of virtual black-box security. What would we use it for? The most obvious application is software protection- publishing an obfuscated program would be the equivalent of providing a physical "black box" that can be executed but not opened up and understood. But there are many other applications. For example, we could use an obfuscator to design a "selective decryption scheme"( The technical name for this notion is functional encryption.) - a program that would contain inside it a decryption key but would only decrypt messages satisfying very particular criteria. For example, suppose that all my email was encrypted, and I had an urgent project that needed attention while I was on vacation. I could write a program that given an encrypted email message as input, uses my secret decryption key to decrypt it, checks if it is related to this project and if so outputs the plaintext message. Then, I could give my colleague an obfuscated version P′ of P, without fearing she could reverse engineer the program, learn my secret key and manage to decrypt my other email messages as well.

def DecryptEmail(EncryptedMsg):  
 SecretKey = "58ff29d6ad1c33a00d0574fe67e53998" 
 m = Decrypt(EncryptedMsg,SecretKey)  
 if m.find("Congratulations!")>=0: return m 
 return "Sorry, this email is out of your access right."

There are many more applications for obfuscation. The example of functional encryption could be vastly generalized. In fact, almost any cryptographic primitive you can think of can be fairly directly derived from obfuscation, starting from basic primitives such as publickey encryption and digital signatures to fancier notions such as multiparty secure computation, fully homomorphic encryption, zero knowledge proofs, and their many variants. There are also applications to obfuscation that a priori seem to have nothing to do with cryptography; for example, one can use it to design autonomous agents that would participate on your behalf in digital transactions such as electronic auctions, or to publish patches for software vulnerabilities without worrying that attackers could learn the vulnerabilities by reverse engineering the patch.

So, virtual black-box obfuscation is wonderful, but does it exist? This is what [BGI+] set to find out in 2001, the answer was negative. Specifically, it showed the existence of inherently unobfuscatable functions—this is a program P whose source code can be recovered from any functionally equivalent program P′ though curiously it cannot be efficiently recovered using only black-box access to P.

In the intervening years, cryptography has seen many advances, in particular achieving constructions of some of the cryptographic primitives that were envisioned as potential applications of obfuscation, most notably fully homomorphic encryption . In particular, in 2012 Garg, Gentry and Halevi put forward a candidate construction for an object they called "cryptographic multilinear maps", and which in this article I will somewhat loosely refer to as a "homomorphic quasi encryption" scheme. Using this object, Garg et al. showed a candidate construction of a general-purpose indistinguishablity obfuscators.