by @chokermaxx
https://hackmd.io/@chokermaxx/SkbGw4fk0
While it is easy to understand why Groth16 satisfies completeness, understanding why it satisfies knowledge soundness is relatively not straightforward. In this article, we take a deep dive into the proof that Groth16 satisfies knowledge soundness.
Separation of concerns is useful to break down a big complex problem into a bunch of small problems, each of which is easy to understand. Groth16 is a proof system that uses elliptic curve cryptography, but surprisingly, to understand the meat of its knowledge soundness proof, you can put away the detail of elliptic curve operations. This is due to an abstraction called NILP, which Groth et al uses to argue for Groth16's knowledge soundness. This separation of concerns lets us use less brain to get the proof.
NILP is a type of non-interactive proof, where a proof can computed in this formula:
Each of those letters is a matrix. is a proof. We'll look at the definition of later. is a common reference string. Common reference string, or CRS, is an academic jargon for a list of values that we generate during a trusted setup ceremony.
If you recall the definiton of Groth16, the proof consists of 3 values . and the trusted setup ceremony generates these values.
Which means we can rewrite the equation in this way:
If we expand and assign a letter to each element, (to make it easy for me to write an explainer) the equation becomes:
Even a malicious prover of Groth16 has to follow this formula to calculate . Why? because each elements of the CRS of Groth16 is hidden under elliptic curve points. Notice that , not .

NILP conviniently captures this property of Groth16 -- The impossibility to multiply a value generated during trusted setup, by another value generated during trusted setup. We later use this fact to prove that the prover can't generate valid without knowing the valid witnesses.
In Groth16, a proof is valid if and only if it satisfies this equation.

What we wanna show is that, except a negligible probability, the prover can't come up with that satisfies the above equation unless the prover knows that makes divide
To see this, we have to check the terms of the lhs one by one. There are many terms that has to be constant. We'll see what those constants are.First, we check the term involving . This term arises because both and contains , and we multiply the two. When we go back to the NILP matrix equation, we can see that must be multiplied by both and . When we see the Groth16 verification equation, the rhs contains no term involving . This proves that, if the verification equation holds, then and/or must be 0.
But when we look at the rhs again, it has a term . Thus, there's no such case that both and is 0, because in that case does not contain . The equation will not satisfy.
We can confidently say that either one of or is 0.
Let's think about the term involving . This term arises because both and contains and , and we multiply the two. When we look at the NILP matrix equation, we can see that the term involving must look like . When we look at the rhs of the Groth16 verification equation, the rhs contains . This proves that, if the Groth16 verification equation holds, . But by the assumption , so we can say that . This means and .
Next up, we look at the term involving . The lhs in the Groth16 verification equation contains a term involving , because both and contains a term involving . The term involving must look like . This must equal 0 because the rhs does not contain any term involving . Considering we have already shown that , .
Let's think about the term involving . This term arises because both and contains and , and we multiply the two. When we look at the NILP matrix equation, we can see that the term involving must look like . When we look at the rhs of the Groth16 verification equation, the rhs contains . This proves that, if the Groth16 verification equation holds, . But by the assumption , so we can say that . This means and .
Next up, we look at the term involving . The lhs in the Groth16 verification equation contains a term involving , because both and contains a term involving . The term involving must look like . This must equal 0 because the rhs does not contain any term involving . Considering we have already shown that , .
We have shown that
This shows that cannot contain both and . has to contain either or . also cannot contain both and . has to contain either or that is not in .Considering both and are randomly sampled from the same distibution, without a loss of generality, I will call the random challenge in , and the random challenge in from now on.
The lhs contains terms
We know this equals 0 because the rhs does not contain any term involving . We already know and . Thus we can say the rightmost factor is 0.
I first thought, doesn't the multiplication below also produce terms involving ?
But surprisingly this is not the case. I'll show the reasoning in section of .
The lhs contains terms
We know this equals 0 because the rhs does not contain any term involving . We already know and . Thus we can say the rightmost factor is 0.
I first thought, doesn't the multiplication below also produce terms involving ?
But surprisingly this is not the case. I'll show the reasoning in section of .
The term involving in the lhs looks like this.
We know this equals 0 because the rhs does not have any term involving . This shows that either the left factor or the right factor is 0. But surprisingly, from here we can show that both the left factor and the right factor has to be 0.If we see this as a polynomial of , we can use schwartz zippel lemma to say that the below holds except negligible probability.
This is exactly the coefficient of the terms that had to be 0 in the section of . Thus we can say that, the left factor being 0 implies that the right factor is 0.If we see this as a polynomial of , we can use schwartz zippel lemma to say that the below holds except negligible probability.
This is exactly the coefficient of the terms that had to be 0 in the section of . Thus we can say that, the right factor being 0 implies that the left factor is 0.The lhs contains terms
We know this equals 0 because the rhs does not contain any term involving . We already know and . Thus we can say the rightmost factor is 0.
I first thought, doesn't the multiplication below also produce terms involving ?
But surprisingly this is not the case. I'll show the reasoning in section of .
The lhs contains terms
We know this equals 0 because the rhs does not contain any term involving . We already know and . Thus we can say the rightmost factor is 0.
I first thought, doesn't the multiplication below also produce terms involving ?
But surprisingly this is not the case. I'll show the reasoning in section of .
The term involving in the lhs looks like this.
We know this equals 0 because the rhs does not have any term involving . This shows that either the left factor or the right factor is 0. But surprisingly, from here we can show that both the left factor and the right factor has to be 0.If we see this as a polynomial of , we can use schwartz zippel lemma to say that the below holds except negligible probability.
This is exactly the coefficient of the terms that had to be 0 in the section of . Thus we can say that, the left factor being 0 implies that the right factor is 0.If we see this as a polynomial of , we can use schwartz zippel lemma to say that the below holds except negligible probability.
This is exactly the coefficient of the terms that had to be 0 in the section of . Thus we can say that, the right factor being 0 implies that the left factor is 0.Removing all the terms we have shown to be 0, we can say
By transposing the rhs, we get
This equation determines how the prover must generate C. To see this, we have to see the lhs of the above equation as a polynomial of . By Schwartz-Zippel lemma, we can say that the below 3 holds except negligible proability.
The last term arises from . It must be multiplied by by definition, and since the Groth16 verification equation multiplies by , it also must be multiplied by to cancel it out. Out of the CRS, only the below is multiplied by both and . Thus we know that the above equation must hold.
The last term arises from . It must be multiplied by by definition, and since the Groth16 verification equation multiplies by , it also must be multiplied by to cancel it out. Out of the CRS, only the below is multiplied by both and . Thus we know that the above equation must hold.
The last term arises from . Out of the CRS, only the below is multiplied by , and multiplied by neither nor . Thus we know that the above equation must hold.
We have shown that the below 3 equations hold.

determines and . The product of and determines . The witnesses within the rhs constrains the lhs. In turn the product in the lhs constrains the rhs back.
To make the 3 equations hold at the same time, we have to assign in a way that the below holds
This is exactly how the honest prover behaves!

Now we can see why even a malicious prover has to follow the proof generation formula stated in the original Groth16 paper!
Now we know that, what the lhs of the Groth16 verification equation is doing, is to multiply the left wire of each multiplication gate and the right wire of each multiplication gate, just like in the QAP verification equation!
All we have left to do is to include the output of each multiplication gate in the rhs , and which shows .
Thus, we can say, unless the prover knows valid that satisfies QAP, Groth16 verification does not pass.
a prover can't come up with a valid proof unless a valid witness exist in this universe
a prover can't come up with a valid proof unless he knows a valid witness. (Assumption here is that the prover does not know any other proof)
a prover can't come up with a valid proof unless he knows a valid witness. (Even if he has seen other valid proofs)