Groth16 satisfies Knowledge Soundness

Kurt Pan XPTY

2024年05月09日 11:52

by @chokermaxx
https://hackmd.io/@chokermaxx/SkbGw4fk0

While it is easy to understand why Groth16 satisfies completeness, understanding why it satisfies knowledge soundness is relatively not straightforward. In this article, we take a deep dive into the proof that Groth16 satisfies knowledge soundness.

Citation

The original Groth16 paper
A paper that proves that Groth16 satisfies weak simulation extractability In this paper the author proves the same knowledge soundness as the original paper with a different method.

NILP: Non-interactive linear proof

Separation of concerns is useful to break down a big complex problem into a bunch of small problems, each of which is easy to understand. Groth16 is a proof system that uses elliptic curve cryptography, but surprisingly, to understand the meat of its knowledge soundness proof, you can put away the detail of elliptic curve operations. This is due to an abstraction called NILP, which Groth et al uses to argue for Groth16's knowledge soundness. This separation of concerns lets us use less brain to get the proof.

NILP is a type of non-interactive proof, where a proof can computed in this formula:

Each of those letters is a matrix. is a proof. We'll look at the definition of later. is a common reference string. Common reference string, or CRS, is an academic jargon for a list of values that we generate during a trusted setup ceremony.

If you recall the definiton of Groth16, the proof consists of 3 values . and the trusted setup ceremony generates these values.

Which means we can rewrite the equation in this way:

If we expand and assign a letter to each element, (to make it easy for me to write an explainer) the equation becomes:

Even a malicious prover of Groth16 has to follow this formula to calculate . Why? because each elements of the CRS of Groth16 is hidden under elliptic curve points. Notice that , not .

In an elliptic curve group we can't multiply a point and a point -- because it's not a field, no multiplication operation is defined. Whatever the prover submits, the verifier can be sure that none of those contains terms like , because the prover can't multiply and .

NILP conviniently captures this property of Groth16 -- The impossibility to multiply a value generated during trusted setup, by another value generated during trusted setup. We later use this fact to prove that the prover can't generate valid without knowing the valid witnesses.

A, B

In Groth16, a proof is valid if and only if it satisfies this equation.

What we wanna show is that, except a negligible probability, the prover can't come up with that satisfies the above equation unless the prover knows that makes divide

To see this, we have to check the terms of the lhs one by one. There are many terms that has to be constant. We'll see what those constants are.

and

First, we check the term involving . This term arises because both and contains , and we multiply the two. When we go back to the NILP matrix equation, we can see that must be multiplied by both and . When we see the Groth16 verification equation, the rhs contains no term involving . This proves that, if the verification equation holds, then and/or must be 0.

But when we look at the rhs again, it has a term . Thus, there's no such case that both and is 0, because in that case does not contain . The equation will not satisfy.

We can confidently say that either one of or is 0.

Assuming

Let's think about the term involving . This term arises because both and contains and , and we multiply the two. When we look at the NILP matrix equation, we can see that the term involving must look like . When we look at the rhs of the Groth16 verification equation, the rhs contains . This proves that, if the Groth16 verification equation holds, . But by the assumption , so we can say that . This means and .

Next up, we look at the term involving . The lhs in the Groth16 verification equation contains a term involving , because both and contains a term involving . The term involving must look like . This must equal 0 because the rhs does not contain any term involving . Considering we have already shown that , .

Assuming

Summing it up

We have shown that

This shows that cannot contain both and . has to contain either or . also cannot contain both and . has to contain either or that is not in .

Considering both and are randomly sampled from the same distibution, without a loss of generality, I will call the random challenge in , and the random challenge in from now on.

The lhs contains terms

We know this equals 0 because the rhs does not contain any term involving . We already know and . Thus we can say the rightmost factor is 0.

I first thought, doesn't the multiplication below also produce terms involving ?

But surprisingly this is not the case. I'll show the reasoning in section of .

The lhs contains terms

We know this equals 0 because the rhs does not contain any term involving . We already know and . Thus we can say the rightmost factor is 0.

I first thought, doesn't the multiplication below also produce terms involving ?

But surprisingly this is not the case. I'll show the reasoning in section of .

The term involving in the lhs looks like this.

We know this equals 0 because the rhs does not have any term involving . This shows that either the left factor or the right factor is 0. But surprisingly, from here we can show that both the left factor and the right factor has to be 0.

Assuming that the left factor is 0

If we see this as a polynomial of , we can use schwartz zippel lemma to say that the below holds except negligible probability.

This is exactly the coefficient of the terms that had to be 0 in the section of . Thus we can say that, the left factor being 0 implies that the right factor is 0.

Assuming that the right factor is 0

If we see this as a polynomial of , we can use schwartz zippel lemma to say that the below holds except negligible probability.

This is exactly the coefficient of the terms that had to be 0 in the section of . Thus we can say that, the right factor being 0 implies that the left factor is 0.

Summing it upWe have shown that - the left factor = 0 => the right factor = 0 - the right factor = 0 => the left factor = 0 - the left factor = 0 the right factor = 0 This means that both the left factor and the right factor must be 0.

The lhs contains terms

We know this equals 0 because the rhs does not contain any term involving . We already know and . Thus we can say the rightmost factor is 0.

I first thought, doesn't the multiplication below also produce terms involving ?

But surprisingly this is not the case. I'll show the reasoning in section of .

The lhs contains terms

We know this equals 0 because the rhs does not contain any term involving . We already know and . Thus we can say the rightmost factor is 0.

I first thought, doesn't the multiplication below also produce terms involving ?

But surprisingly this is not the case. I'll show the reasoning in section of .

The term involving in the lhs looks like this.

Assuming that the left factor is 0

If we see this as a polynomial of , we can use schwartz zippel lemma to say that the below holds except negligible probability.

This is exactly the coefficient of the terms that had to be 0 in the section of . Thus we can say that, the left factor being 0 implies that the right factor is 0.

Assuming that the right factor is 0

If we see this as a polynomial of , we can use schwartz zippel lemma to say that the below holds except negligible probability.

This is exactly the coefficient of the terms that had to be 0 in the section of . Thus we can say that, the right factor being 0 implies that the left factor is 0.

AB Conclusion

Removing all the terms we have shown to be 0, we can say

C

By transposing the rhs, we get

This equation determines how the prover must generate C. To see this, we have to see the lhs of the above equation as a polynomial of . By Schwartz-Zippel lemma, we can say that the below 3 holds except negligible proability.

The term involving only out of the 3 random challenges = 0
The term involving only out of the 3 random challenges = 0
The term involving none of the 3 random challenges = 0

The term involving only

The last term arises from . It must be multiplied by by definition, and since the Groth16 verification equation multiplies by , it also must be multiplied by to cancel it out. Out of the CRS, only the below is multiplied by both and . Thus we know that the above equation must hold.

The term involving only

The term involving no random challenge

The last term arises from . Out of the CRS, only the below is multiplied by , and multiplied by neither nor . Thus we know that the above equation must hold.

C Conclusion

We have shown that the below 3 equations hold.

determines and . The product of and determines . The witnesses within the rhs constrains the lhs. In turn the product in the lhs constrains the rhs back.

To make the 3 equations hold at the same time, we have to assign in a way that the below holds

This is exactly how the honest prover behaves!

Now we can see why even a malicious prover has to follow the proof generation formula stated in the original Groth16 paper!

Now we know that, what the lhs of the Groth16 verification equation is doing, is to multiply the left wire of each multiplication gate and the right wire of each multiplication gate, just like in the QAP verification equation!

All we have left to do is to include the output of each multiplication gate in the rhs , and which shows .

Thus, we can say, unless the prover knows valid that satisfies QAP, Groth16 verification does not pass.

Memo

Soundness

a prover can't come up with a valid proof unless a valid witness exist in this universe

Knowledge Soundness

a prover can't come up with a valid proof unless he knows a valid witness. (Assumption here is that the prover does not know any other proof)

Simulation Extractable

a prover can't come up with a valid proof unless he knows a valid witness. (Even if he has seen other valid proofs)