That is a fascinating result. A remarkable mathematical achievement. And a nightmare.
This thesis says that it's possible to obfuscate code in such a way that there is a lower bound on the level of effort needed to de-obfuscate it. That lower bound can apparently be comparable to the level of effort required to break a cryptosystem.
So, coming soon, viruses and worms nobody can figure out. Code where no one can tell if it has a backdoor. ML classifiers where no one can be sure what they really do.
What you are worried about is VBB (roughly can i take a program a and obfuscate it in a way that you get no information about the original from the obfuscated version), which has been proven impossible.
This is about indistinguishability, which is a form of obfuscation, but not the kind you are thinking about.
This is really something like "If you have two programs a and b that compute the same function, it's possible to obfuscate them in a way that the resulting programs are indistinguishable from each other"
So if program a is "bubble sort" and program b is "selection sort" (function here is the mathematical sense, so these compute the same function), obfuscation can make it so you can't tell them from each other, and you can't tell if the original program of what you are holding was bubble or selection .
Roughly: VBB is the ability to take any program and make it indistinguishable from random.
IO is the ability to take two programs that compute the same function and make them indistinguishable from each other.
"compute the same function" is actually what makes it feasible :)
The original use case was really trying to turn secret key systems into public key systems by hardwiring the secret key into the program and then obfuscating it.
So an example use case today would be "every DVD decoding program has a different key but computes the same function, can you make them all indinguishtable and thus hide the key" or something like that.
> The original use case was really trying to turn secret key systems into public key systems by hardwiring the secret key into the program and then obfuscating it.
Wow, that definitely qualifies as a "crown jewel." I, for one, would not look forward to a future in which software running on my hardware is able to hide secrets, like one-off media encryption keys, from me!
In practice, as the thesis proves, you can use iO to generate a lot of things people care about in cryptography.
I will go out on a limb and say you can use it to build almost everything interesting :)
Now, it happens for a lot of those we don't need iO and iO may be impractical, but as a theoretical building block, it's quite nice.
As for your concern, yes, this is now possible, at least in the sense that you do as good as is possible to do in making them computationally indistinguishable. This does not prevent you from attacking it in other ways :)
Two things, good or bad depending on how you look at it:
1. You can prove iO is as least as good as the best possible obfuscation scheme that can ever exist. So whatever that enables you to do or not, it's the limit.
2. It also means you can get away with handing over less secrets, ensure better isolation, etc.
The following things are possible with iO (trivially so), due to this paper:
Adaptively secure succinct garbled RAM - which would let you hand secure databases to untrusted providers and not worry about it.
Sender deniable encryption where you can't prove what the original plaintext was and various options are equally likely.
Fully deniable interactive encryption where secrecy of communication is protected even if the parties are later coerced (or willingly bribed) to expose their entire internal states: plaintexts, keys and randomness.
etc
These are just some examples.
Now, some of this, as I said, we know how to do already, some we don't. But this paper gives you iO as a building block that can do them without having to separately prove that it is as sound and secure as existing crypto systems are.
Again, iO only provides computational indistinguishability, not other things, but it is a nice primitive.
> Wow, that definitely qualifies as a "crown jewel." I, for one, would not look forward to a future in which software running on my hardware is able to hide secrets, like one-off media encryption keys, from me!
That future is already here for most people. See things like Secure Enclave or Intel's SGX.
We already have code that obfuscates itself. And runs differently based on whether it is running in a debugger or not. This has been the cat & mouse game played by:
virus writers vs anti-virus code.
game hackers vs anti-cheat software.
license checking code vs cheapskates.
Stuxnet.
> Code where no one can tell if it has a backdoor.
Back in 1984, Ken Thompson wrote Reflections on Trusting Trust [0]. The short answer is "no, you can never ever tell if it has a backdoor. Ever."
> The moral is obvious. You can't trust code that you did not totally create yourself. (Especially code from companies that employ people like me.) No amount of source-level verification or scrutiny will protect you from using untrusted code. In demonstrating the possibility of this kind of attack, I picked on the C compiler. I could have picked on any program-handling program such as an assembler, a loader, or even hardware microcode.
Back in the 90s, I was interested in hacking. The sort of hacking that starts with "this is the disassembler. Step one, hack the trial version of IDA Pro." It was fascinating at the time (and bores me now, my punishment for growing up, I guess) so I read & did lots of stuff like that.
You should require positive proof that code does what you want.
Similar to how eg Haskell's type systems doesn't have to solve the halting problem [0]: it just rejects some programs that would be ok, but don't conform to the type system.
First, for already closed source programs the obfuscation results don't make a difference at all. They remain just as obscure before as they are now. (Unless you routinely try to de-compile all closed source programs?)
The obfuscation result in the submitted article really only makes a difference to open source programs.
But: thanks to advances in mathematics and cryptography, a vendor can prove to you that their program does what it's supposed to, without having to reveal the source code or anything else about the program.
Among other things, tons of desktop software and most mobile apps and even javascript you run from most websites is obfuscated. How is it possible to avoid that? Do you run 100% open source for everything with javascript disabled?
I'm not sure. It seems to be applied to "circuits", which in this context seems to mean a one-way set of logic gates (such as AND, OR, NOT, NAND) which map a set of boolean inputs to a set of boolean outputs. In theory you can construct any finite digital function that way. It's a useful abstraction, like a Turing machine.
Unlike a Turing machine, a circuit is finite. All circuits are "solveable" (given a outputs, compute an input which yields them) by trying all the input patterns..
There's no undecidability and no halting problem. There's just difficulty.
"Difficult" here means there's no way easier than trying all the patterns.
This is the same property sound cryptosystems are supposed to have - there's no easier way than trying all the keys.
Whether this result can be extended to programs with iteration I'm not sure. The paper doesn't seem to mention iteration or storage.
A (polynomial-sized) circuit in this context is a sequence of DAGs whose nodes are binary logic gates where the nth DAG infers 1 output bit from n input bits, such that some polynomial bounds the size of the graphs of the sequence. See https://en.wikipedia.org/wiki/Circuit_complexity
The class of circuits in question is P/poly which includes BPP (which includes P, the class of polynomial-time programs). So the result is quite general, since most practical programs are in P.
For an introduction to complexity theory, see Arora, Barak (2009) (Draft available here https://theory.cs.princeton.edu/complexity/book.pdf ); same Barak that did the presentation "On the (Im)Possibility of Obfuscating Programs" (2001) mentioned in the abstract.
IMO: Paranoid about the implications, slightly less worried about it applying in practice.
As shown by the DRM schemes used in modern games, this type of obfuscation comes at the cost of performance: Unless you want to compute a sensitive function at the user's end in an obfuscated manner, it'd be much simpler to just run that function on your end and optimize it in terms of running costs & performance.
Such a design would also runs counter to the everything-as-a-service model that companies are trending towards, as it places more power back at the user's end, even if the user can't decipher the obfuscated function's inner mechanisms. Such a design would reduce the need to phone home, and thus the need for EaaS.
Would it be possible to use this to obfuscate other cryptographic primitives (e.g. a hash function or a simple program that checks whether the hash of some input equals a hardcoded value)?
The next question is why didn’t the high end shops spending a ton of money and recruiting effort on crypto and state malware like NSA .. or Russia(?) figure this out already or did they
Because figuring out what a worm does doesn't help if it has already done it, and for that matter they're supposed to remain undetected. Furthermore people already cannot tell whether code has backdoors, or what ML classifiers do for that matter.
Unfortunately, the biggest industrial use case is making wi-fi routers that are absolutely impossible to install OpenWRT on, etc...
> The next question is why didn’t the high end shops spending a ton of money and recruiting effort on crypto and state malware like NSA .. or Russia(?) figure this out already or did they
The fun open secret about working as a researcher for the NSA is that even after you leave the NSA, any and all research must be approved by them before publishing.
There are more efficient ways to hide malware from current scanners, and on the flip-side, the scanners that use heuristics/ML to classify the sequence of system calls made by the binary wouldn't be affected by this sort of obfuscation.
In theory? Absolutely; unrolling a loop is exactly what it sounds like. Recursion is just a fancy loop. But there's a reason no one does that, and why no compilers emit that as code.
It is not efficient enough that you need to worry about exciting developments using this scheme. But it’s a step towards more reasonable mathematical assumptions than previous constructions, assumptions that may actually be true. Give it a few more years and we might be able to use this stuff.
> I think Saberhagen's Berserkers actually had this feature.
Yes. Code decoded other code, etc. So if the code wasn't running normally, it couldn't be analyzed.
It was handwaving back then, but it may become real.
It was standard coding practices if you were selling games - to prevent people from hacking the license key stuff. Or by cheating software to prevent detection by anti-cheat software. One of the hooks built into Windows is "is a debugger connected" helps the virus/cheat code determine if it is "safe" to run or not.
You can be sure that it matches if the vendor (1) doesn't use an obfuscated binary, or (2) provides exact details of how the obfuscated binary was constructed (presumably including some kind of seed for the obfuscator).
This was the point behind Reflections on Trusting Trust - you can't trust it. Unless you build the entire compiler/linker tool chain from source that you've already inspected.
I mean, the same argument has been applied to other cryptographic tools. Why encrypt your messages, unless you're sending something sketchy? Why obfuscate programs, unless you're hiding something?
Yet I think the utility of encryption is well-demonstrated, and not just theoretical. It can be used by good people to defend against adversarial intent. And it may well be that obfuscation (in its most direct application) has the same effect. (e.g. companies sharing proprietary algorithms for you to run at home, without revealing the secret sauce; or me delegating computation to AWS without revealing secrets)
Note that this is not why indistinguishability obfuscation (iO) is a crown jewel, here. Practically, iO is nowhere close to obfuscating anything larger than a tiny circuit. But it can still be useful to do things like obfuscate secret keys when designing cryptographic protocols. Theoretically, iO allows us to derive essentially every cryptographic primitive, which is why this paper is interesting, and why iO is called a crown jewel. And now, we can build iO for the first time from well-studied hardness assumptions.
> I mean, the same argument has been applied to other cryptographic tools. Why encrypt your messages, unless you're sending something sketchy? Why obfuscate programs, unless you're hiding something?
There is a huge difference: Encryption is used to hide something from a third party, while obfuscation is used to hide something from the intended recipient.
Or, if you'd like to argue that the intended recipient is the computer, not the user: To turn that computer into a deputy of the sender, while still formally belonging to the recipient.
> while obfuscation is used to hide something from the intended recipient.
No. Most obfuscation is done to increase the time needed to reverse the source of publicly available programs. Not the user (99% of the recipients) but the adversary is the intended target. The users are just cought in the crossfire.
Unless you consider every user of apps as someone who has both the skill and the need to reverse engineer your app.
> No. Most obfuscation is done to increase the time needed to reverse the source of publicly available programs.
So to stop the intended recipient from deciphering the program.
"Adversaries" are the strawman used to hurt legitimate customers who are left with black boxes for device drivers and no functioning hardware since the manufacturers make it impossible to open source firmware updates.
In case of DRM sure, but there are cases like online games where "adversaries" are players (intended users of the app) trying to use cheat programs on game binary to get an advantage over other players.
I share the sentiment that in most cases it just serves DRM vendors and is disservice to users but that's not all the cases.
> I mean, the same argument has been applied to other cryptographic tools. Why encrypt your messages, unless you're sending something sketchy? Why obfuscate programs, unless you're hiding something?
It's easy to argue that everyone has legitimate interest in hiding some things in data, but what would be a legitimate case for hiding things in code?
Except we know(and have example from cryptocurrency) that the problems with ownership in real life have little to do with cryptography.
The reason coca cola doesn't use Pepsi's recipe isnt because mixing the ingredients under heat creates an irreversible mixture that you can seperate to find the recipe. They don't use pepsis recipe because its illegal.
That's true for a recipe that was developed in a past millennium. Trade secrets are still very much relevant for companies that work with more modern technologies than soda from before the Great War, and a useful protection against competition. Even if you do believe that companies within the US or EU wouldn't touch other's IP with a 10 foot pole due to strong laws, you'd have to be very naive to think this is valid in all parts of the globe.
Edit: it's also trivially not true for trade secrets in the legal sense -- if you upload your trade secret willingly to AWS, it's no longer a trade secret, unless you have extra contracts in place. With obfuscation, you could skip a lot of red tape to keep it secret.
You can't copyright a recipe, the thing reason it doesn't make sense for Pepsi to make an exact duplicate of coke is that Coke is better at selling Coke that Pepsi is, and it's better for them to be a niche alternative some people prefer than to be an undifferentiated competititor. Reverse engineering trade secrets is perfectly legal if you haven't agreed otherwise.
You say the intent is different but don’t show why. What about obfuscation implies malicious intent? How come it doesn’t apply to encryption?
It can’t be about applications, because I can come up with a thousand applications for obfuscation with benign intent. Such as delegating private computation to AWS without Amazon learning my personal information.
If it's a total black box, wouldn't the NOBUS thing to do would be to have some large key that it's watching input for that flips it into a malicious mode?
If BB(6) took years to execute, how long would you have to spend feeding random input to a suspected-hostile 10000 symbol Turing machine (whose source code and state you can't examine) in a sandbox before you decided it was safe?
Wouldn't this also be a field day for encryption maximalists?
I'd imagine if all code were to be perfectly obscured then to the public it wouldn't matter, but many would demand for more open source and that'd be the only way to trust.
Autonomous, polymorphic malware utilizing secure computation to perform operations that aren't revealed to systems they infect.
Seems like there is an opportunity for bot nets to perform computations, say BTC mining, in a completely sealed and zero-knowledge environment, which now may not even need a backdoor or are able to hide it perfectly.
Seems some of those cases (backdoors and classifiers) could be prevented by regulation and/or social convention (eg- don’t use programs/services obfuscated in this way).
The malware seems trickier. Maybe systems will need to require proof of unobfuscated source to run code?
This thesis says that it's possible to obfuscate code in such a way that there is a lower bound on the level of effort needed to de-obfuscate it. That lower bound can apparently be comparable to the level of effort required to break a cryptosystem.
So, coming soon, viruses and worms nobody can figure out. Code where no one can tell if it has a backdoor. ML classifiers where no one can be sure what they really do.