That would require the tool to prove the equivalence of the two programs, which is generally undecidable. Maybe this could be weakened to preserving some properties of the program.
No, it would not. It would require the LLM to provide a proof for the program that it outputs, which seems reasonable in the same way that a human decompiling a program would be able to provide a record of his/her reasoning.
The formal verifier would then merely check the provided proof, which is a simple mechanical process.
This is analogous to a mathematician providing a detailed proof and a computer checking it.
What is impossible due to undecidability is for two arbitrary programs, to either prove or disprove their equivalence. However, the two programs we are talking about are highly correlated, and thus not arbitrary at all with respect to each other. If an LLM is able to provide a correct decompilation, then in principle it should also be able to provide a proof of the correctness of that decompilation.
Yes, and then someone needs to check that proof. It’s not particularly clear if that decompilation proof would be any more helpful than just doing the lifting by hand.
That doesn’t mean that it’s impossible, right? Just that no tool is guaranteed to give an answer in any case. And those cases might be 90%, 10% or it-doesn’t-matter-in-practice %