David Wong

cryptologie.net

cryptography, security, and random thoughts

Hey! I'm David, cofounder of zkSecurity, research advisor at Archetype, and author of the Real-World Cryptography book. I was previously a cryptography architect of Mina at O(1) Labs, the security lead for Libra/Diem at Facebook, and a security engineer at the Cryptography Services of NCC Group. Welcome to my blog about cryptography, security, and other related topics.

As I explained here a while back, checking polynomial identities (some left-hand side is equal to some right-hand side) when polynomials are hidden using polynomial commitment schemes, gets harder and harder with multiplications. This is why we use pairings, and this is why sometimes we “linearize” our identities. If you didn’t get what I just said, great! Because this is exactly what I’ll explain in this post.

Using Schwartz-Zippel with no multiplication

First, let me say that there are typically two types of “nice” polynomial commitment schemes that people use with elliptic curves: Pedersen commitments and KZG commitments.

Pedersen commitments are basically hidden random linear combinations of the coefficients of a polynomial. That is, if your polynomial is $f(x) = \sum_i c_i \cdot x^i$ your commitment will look like $[\sum_i r_i \cdot c_i]G$ for some base point $G$ and unknown random values $r_i$. This is both good and bad: since we have access to the coefficients we can try to use them to evaluate a polynomial from its commitment, but since it’s a random linear combination of them things can get ugly.

On the other hand, KZG commitments can be seen as hidden evaluations of your polynomials. For the same polynomial $f$ as above, a KZG commitment of $f$ would look like $[f(s)]G$ for some unknown random point $s$. Not knowing $s$ here is much harder than not knowing the values $r_i$ in Pedersen commitments, and this is why KZG usually requires a trusted setup whereas Pedersen doesn’t.

In the rest of this post we’ll use KZG commitments to prove identities.

Let’s use $[a]$ to mean “commitment of the polynomial $a(x)$”, then you can easily check that $a(x) = b(x)$ knowing only the commitments to $a(x)$ and $b(x)$ by checking that $[a] = [b]$ or $[a] - [b] = [0]$. This is because of the Schwartz-Zippel (S-Z) lemma which tells us that checking this identity at a random point is convincing with high-enough probability.

When multiplication with scalars is required, things are still fine. As you can do $i \cdot [a]$ to obtain $[i \cdot a]$, checking that $i \cdot a = j \cdot b$ is as simple as checking that $i \cdot [a] - j \cdot [b] = [0]$.

This post is about explaining how pairing helps us when we want to check an identity that involves multiplying a and b together.

Using elliptic curve pairings for a single multiplication

It turns out that elliptic curve pairings allow us to perform a single multiplication. Meaning that once things get multiplied, they move to a different planet where things can only get added together and compared. No more multiplications.

Pairings give you this function $e$ which allows you to move things in the exponent like this: $e([a],[b]) = e([1],[1])^{ab}$. Where, remember, $ab$ is the multiplication of the two polynomials evaluated at a random point: $a(s) \cdot b(s)$.

As such, if you wanted to check something like this for example: a·b=c+3 with commitments only, you could check the following pairings:

e([a],[b])=e([c]+3[1],[1])

By the way, the left argument and the right argument of a pairing are often in different groups for “reasons”. So we usually write things like this:

$$e([a]_1, [b]_2) = e([c]_1 + 3[1]_1, [1]_2)$$

And so it is important to have commitments in the right groups if you want to be able to construct your polynomial identity check.
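To make the "one multiplication" idea concrete, here is a minimal sketch of the check $a \cdot b = c + 3$. It is a toy model and not real cryptography: I assume a "commitment" is simply the evaluation at a known point `S` in a prime field, and that the "pairing" just multiplies its two arguments. The names `P`, `S`, `commit` and `pairing` are made up for the illustration.

```python
# Toy model only: a "commitment" [a] is just a(S) mod P, and the "pairing"
# multiplies its two arguments. Real KZG hides a(s) inside a curve point.
P = 2**61 - 1   # a prime, standing in for the scalar field
S = 123456789   # the "unknown" point s (known here only because this is a toy)

def evaluate(coeffs, x):
    """Evaluate a polynomial given by its coefficients, lowest degree first."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % P
    return result

def commit(coeffs):
    return evaluate(coeffs, S)

def pairing(left, right):
    return left * right % P

def poly_mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            out[i + j] = (out[i + j] + x * y) % P
    return out

a = [1, 2]              # a(x) = 1 + 2x
b = [3, 0, 1]           # b(x) = 3 + x^2
c = poly_mul(a, b)
c[0] = (c[0] - 3) % P   # c = a*b - 3, so that a*b = c + 3

one = commit([1])
lhs = pairing(commit(a), commit(b))            # e([a], [b])
rhs = pairing((commit(c) + 3 * one) % P, one)  # e([c] + 3[1], [1])
check_holds = lhs == rhs
```

Note that the only multiplication of committed values happens inside `pairing`. Everything else is an addition or a multiplication by a public scalar.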

Evaluations can help with more than one multiplication

But what if you want to check something like a·b·c=d+4? Are we doomed?

We’re not! One insight that plonk brought to me (which potentially came from older papers, I don’t know, I’m not an academic, leave me alone), is that you can reduce the number of multiplications with “this one simple trick”. Let me explain…

A typical scenario includes you wanting to check an identity like this one:

a(x)·b(x)·c(x)=d(x)

and you have KZG commitments to all three polynomials [a],[b],[c]. (So in other words, hidden evaluations of these polynomials at the same unknown random point s)

You can’t compute the commitment of the left-hand side because you can’t perform the multiplication of the three commitments.

The trick is to evaluate (using KZG) the previous identity at a different point, let’s say ζ, and pre-evaluate (using KZG as well) as many polynomials as you can to ζ to reduce the number of multiplications down to 0.

Note: that is, if we want to check that a(x)b(x)=0 is true, and we want to use S-Z to do that at some point ζ, then we can pre-evaluate a (or b) and check the following identity a(ζ)b(x)=0 at some point ζ instead.

More precisely, we’ll choose to pre-evaluate $b(\zeta) = \bar{b}$ and $c(\zeta) = \bar{c}$, for example. This means that we’ll have to produce quotient polynomials $q_b$ and $q_c$ such that:

  1. $b(s) - \bar{b} = (s - \zeta) \cdot q_b(s)$
  2. $c(s) - \bar{c} = (s - \zeta) \cdot q_c(s)$

which means that the verifier will have to perform the following two pairings (after having been sent the evaluations $\bar{b}$ and $\bar{c}$ in the clear):

  1. $e([b]_1 - \bar{b} \cdot [1]_1, [1]_2) = e([x]_1 - \zeta \cdot [1]_1, [q_b]_2)$
  2. $e([c]_1 - \bar{c} \cdot [1]_1, [1]_2) = e([x]_1 - \zeta \cdot [1]_1, [q_c]_2)$

Then, they’ll be able to check the first identity at $\zeta$ and use $\bar{b}$ and $\bar{c}$ in place of the commitments $[b]$ and $[c]$. The verifier check will look like the following pairing (after receiving a commitment $[q]$ from the prover):

$$e(\bar{b} \cdot \bar{c} \cdot [a]_1 - [d]_1 - 0, [1]_2) = e([x]_1 - \zeta \cdot [1]_1, [q]_2)$$

which proves using KZG that $a(\zeta)b(\zeta)c(\zeta) - d(\zeta) = 0$ (which proves that the identity checks out with high probability thanks to S-Z).
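Here is a minimal sketch of that linearization flow, again in a toy model: I assume commitments are plain evaluations at a known point `S`, so each pairing check collapses into the field equation $f(s) - y = (s - \zeta) \cdot q(s)$. The helper names (`quotient`, `opening_check`) and the example polynomials are mine, picked only for illustration.

```python
# Toy model: nothing is hidden, each "pairing check" is a field equation.
P = 2**61 - 1
S = 987654321   # stands in for the trusted-setup secret s
ZETA = 424242   # the evaluation point zeta (sampled at random in practice)

def evaluate(coeffs, x):
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % P
    return result

def poly_mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            out[i + j] = (out[i + j] + x * y) % P
    return out

def quotient(coeffs, point):
    """Return q with f(x) - f(point) = (x - point) * q(x), by synthetic division."""
    q = [0] * (len(coeffs) - 1)
    carry = 0
    for i in range(len(coeffs) - 1, 0, -1):
        carry = (coeffs[i] + carry * point) % P
        q[i - 1] = carry
    return q

def opening_check(coeffs, point, value, q):
    """Toy version of e([f] - value[1], [1]) = e([x] - point[1], [q])."""
    return (evaluate(coeffs, S) - value) % P == (S - point) * evaluate(q, S) % P

a, b, c = [1, 2], [3, 1], [5, 0, 1]
d = poly_mul(poly_mul(a, b), c)   # so that a*b*c = d holds

# Step 1: pre-evaluate b and c at zeta, with one opening proof each.
b_bar, c_bar = evaluate(b, ZETA), evaluate(c, ZETA)
pre_eval_ok = (opening_check(b, ZETA, b_bar, quotient(b, ZETA))
               and opening_check(c, ZETA, c_bar, quotient(c, ZETA)))

# Step 2: the linearized polynomial b_bar * c_bar * a(x) - d(x) must open to 0.
lin = [(-coeff) % P for coeff in d]
for i, coeff in enumerate(a):
    lin[i] = (lin[i] + b_bar * c_bar * coeff) % P
main_ok = opening_check(lin, ZETA, 0, quotient(lin, ZETA))
```

The linearized polynomial only involves scalar multiples of $a$ and $d$, which is why the verifier can build its commitment from $[a]$ and $[d]$ without multiplying any commitments together.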

Aggregating all the KZG evaluation proofs

In the previous explanation, we actually perform 3 KZG evaluation proofs instead of one:

  • 2 pairings that are KZG evaluation proofs that pre-evaluate different polynomials from the main check at some random point ζ.
  • 1 pairing that evaluates the main identity at ζ, after it was linearized to get rid of any multiplication of commitments.

Pairings can be aggregated by simply creating a random linear combination of the pairings. That is, with some random values $r_i$ we can aggregate the checks where the left-hand side is:

$$b(s) - \bar{b} + r_1 (c(s) - \bar{c}) + r_2 (\bar{b} \cdot \bar{c} \cdot a(s) - d(s) - 0)$$

and the right-hand side is:

$$(s - \zeta) \cdot q_b(s) + r_1 ((s - \zeta) \cdot q_c(s)) + r_2 ((s - \zeta) \cdot q(s))$$
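The aggregation itself is easy to sketch. In the snippet below each check is abstracted as a pair `(lhs, rhs)` of field elements (what the two sides of a pairing check would be "in the exponent"), and `aggregate` is a hypothetical helper that folds them with weights $1, r_1, r_2$. This is only an illustration of the random linear combination, not of how a real verifier batches pairings.

```python
import secrets

P = 2**61 - 1

def aggregate(pairs, challenges):
    """Fold (lhs, rhs) pairs into a single pair using weights 1, r1, r2, ..."""
    weights = [1] + list(challenges)
    lhs = sum(w * l for w, (l, _) in zip(weights, pairs)) % P
    rhs = sum(w * r for w, (_, r) in zip(weights, pairs)) % P
    return lhs, rhs

# Non-zero random challenges r1 and r2.
r = [secrets.randbelow(P - 1) + 1 for _ in range(2)]

honest = [(10, 10), (20, 20), (30, 30)]     # all three checks hold
cheating = [(10, 10), (20, 21), (30, 30)]   # the second check is false

honest_lhs, honest_rhs = aggregate(honest, r)
cheating_lhs, cheating_rhs = aggregate(cheating, r)
```

With a single false check the two aggregated sides differ by $r_1$ times the error, so the combined check fails. With several false checks the errors could in principle cancel, but only for an unlucky choice of the $r_i$.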

I’ve recorded a video on how the plonk permutation works here, but I thought I would write a more incremental explanation about it for those who want MOAR! If things don’t make sense in this explanation, I’m happy to dig into specifics in more detail, just ask in the comments! Don’t forget your companion eprint paper.

Multiset equality check

Suppose that you have two ordered sets of values D={d1,d2,d3,d4} and E={e1,e2,e3,e4}, and that you want to check that they contain the same values. That is, you want to check that there exists a permutation of the elements of D (or E) such that the multisets (sets where some values can repeat) are the same, but you don’t care about which permutation exactly gets you there. You’re willing to accept ANY permutation.

{d1,d2,d3,d4}=some_permutation({e1,e2,e3,e4})

For example, it could be that re-ordering E as {e2,e3,e1,e4} gives us exactly D.

Trick 1: multiply things to reduce to a single value

One way to perform our multiset equality check is to compare the product of elements on both sides:

d1·d2·d3·d4=e1·e2·e3·e4

If the two sets contain the same values then our identity checks out. But the reverse is not true, and thus this scheme is not secure.

Can you see why?

For example, D=(1,1,1,15) and E=(3,5,1,1) are obviously different multisets, yet the product of their elements will match!
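You can check this counter-example in a couple of lines:

```python
from math import prod

D = [1, 1, 1, 15]
E = [3, 5, 1, 1]

same_product = prod(D) == prod(E)        # both products are 15
same_multiset = sorted(D) == sorted(E)   # but the multisets differ
```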

Trick 2: use polynomials, because maybe it will help…

What we can do to fix this issue is to encode the values of each list as roots of two polynomials:

  • $d(x) = (x - d_1)(x - d_2)(x - d_3)(x - d_4)$
  • $e(x) = (x - e_1)(x - e_2)(x - e_3)(x - e_4)$

These two polynomials are equal if they have the same roots with the same multiplicities (meaning that if a root repeats, it must repeat the same number of times).

Trick 3: optimize polynomial identities with Schwartz-Zippel

Now is time to use the Schwartz-Zippel lemma to optimize the comparison of polynomials! Our lemma tells us that if two polynomials are equal, then they are equal on all points, but if two polynomials are not equal, then they differ on MOST points.

So one easy way to check that they match with high probability is to sample a random evaluation point, let’s say some random γ. Then evaluate both polynomials at that random point γ to see if their evaluations match:

$$(\gamma - d_1)(\gamma - d_2)(\gamma - d_3)(\gamma - d_4) = (\gamma - e_1)(\gamma - e_2)(\gamma - e_3)(\gamma - e_4)$$
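As a minimal sketch, assuming we work in some prime field (the modulus `P` and the function names below are my own choices), the randomized multiset check looks like this:

```python
import secrets

P = 2**61 - 1  # a prime, standing in for the field

def fingerprint(values, gamma):
    """Evaluate prod (gamma - v), the polynomial whose roots are the values."""
    acc = 1
    for v in values:
        acc = acc * (gamma - v) % P
    return acc

def same_multiset(left, right, gamma=None):
    """Compare both polynomials at one point (random unless gamma is given)."""
    if gamma is None:
        gamma = secrets.randbelow(P)
    return fingerprint(left, gamma) == fingerprint(right, gamma)
```

For two different lists of 4 values, the two degree-4 polynomials can agree on at most 4 points, so a random $\gamma$ gives a false positive with probability at most $4/P$.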

Permutation check

The previous check is not useful for wiring different cells within some execution trace. There is no specific “permutation” being enforced. So we can’t use it as is in plonk to implement our copy constraints.

Trick 4: random linear combinations to encode tuples

To enforce a permutation, we can compare tuples of elements instead! For example, let’s say we want to enforce that E must be re-ordered using the permutation (132)(4) in cycle notation. Then we would try to do the following identity check:

((1,d1),(2,d2),(3,d3),(4,d4))=((2,e1),(3,e2),(1,e3),(4,e4))

Here, we are enforcing that d1 is equal to e3, and that d2 is equal to e1, etc. This allows us to re-order the elements of E:

((1,d1),(2,d2),(3,d3),(4,d4))=((1,e3),(2,e1),(3,e2),(4,e4))

But how can we encode our tuples into the polynomials we’ve seen previously? The trick is to use a random linear combination! (And that is often the answer in a bunch of ZK protocols.)

So if we want to encode (2,d2) in an equation, for example, we write 2+β·d2 for some random value β.

Note: The rationale behind this idea is still due to Schwartz-Zippel: if you have two tuples $(a, b)$ and $(a', b')$ you know that the polynomial $a + x \cdot b$ is the same as the polynomial $a' + x \cdot b'$ if $a = a'$ and $b = b'$, or if you have $x = \frac{a' - a}{b - b'}$. If $x$ is chosen at random, the probability that it is exactly that value is $\frac{1}{N}$ with $N$ the size of your sampling domain (i.e. the size of your field) which is highly unlikely.

So now we can encode the previous lists of tuples as these polynomials:

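  • $d(x, y) = (1 + y \cdot d_1 - x)(2 + y \cdot d_2 - x)(3 + y \cdot d_3 - x)(4 + y \cdot d_4 - x)$
  • $e(x, y) = (2 + y \cdot e_1 - x)(3 + y \cdot e_2 - x)(1 + y \cdot e_3 - x)(4 + y \cdot e_4 - x)$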

And then reduce both polynomials to a single value by sampling random values for x and y. Which gives us:

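  • $(1 + \beta \cdot d_1 - \gamma)(2 + \beta \cdot d_2 - \gamma)(3 + \beta \cdot d_3 - \gamma)(4 + \beta \cdot d_4 - \gamma)$
  • $(2 + \beta \cdot e_1 - \gamma)(3 + \beta \cdot e_2 - \gamma)(1 + \beta \cdot e_3 - \gamma)(4 + \beta \cdot e_4 - \gamma)$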

If these two values match, with overwhelming probability we have that the two polynomials match and thus our permutation of E matches D.
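Here is a small sketch of that check for the example permutation above. The field modulus `P`, the function names and the concrete values are assumptions for the illustration. The list `SIGMA` holds the index attached to each element of E, i.e. (2, 3, 1, 4).

```python
import secrets

P = 2**61 - 1

def tuple_product(indices, values, beta, gamma):
    """prod (index + beta * value - gamma) over the list of (index, value) tuples."""
    acc = 1
    for index, value in zip(indices, values):
        acc = acc * (index + beta * value - gamma) % P
    return acc

def permutation_holds(d, e, sigma, beta=None, gamma=None):
    """Check that attaching index i to d[i] and sigma[i] to e[i] gives the same tuples."""
    beta = secrets.randbelow(P) if beta is None else beta
    gamma = secrets.randbelow(P) if gamma is None else gamma
    identity = range(1, len(d) + 1)
    return (tuple_product(identity, d, beta, gamma)
            == tuple_product(sigma, e, beta, gamma))

SIGMA = [2, 3, 1, 4]        # index attached to e1, e2, e3, e4
D = [10, 20, 30, 40]
E = [20, 30, 10, 40]        # d1 = e3, d2 = e1, d3 = e2, d4 = e4
```

Unlike the plain multiset check, `permutation_holds(D, D, SIGMA)` fails: the values are the same, but they are not placed where the permutation says they should be.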

Wiring within a single execution trace column

Let’s now see how we can use the (optimized) checks we’ve learned previously in plonk. We will first learn how to wire cells of a single execution trace column, and in the next section we will expand this to three columns (as vanilla Plonk uses three columns).

Take a moment to think about how we can use the previous stuff.

The answer is to see the execution trace as your list E, and then see if it is equal to a fixed permutation of it (D). Note that this permutation is decided when you write your circuit, and precomputed into the verifier key in Plonk.

Remember that the formula we’re trying to check is the following for some random β and γ, and for some permutation function σ that we defined:

$$\prod_{i=1} (i + \beta \cdot d[i] - \gamma) = \prod_{i=1} (\sigma(i) + \beta \cdot e[i] - \gamma)$$

Trick 5: write a circuit for the permutation check

To enforce the previous check, we will write a mini-circuit (yes an actual circuit!) which will progressively accumulate the result of dividing the left-hand side with the right-hand side. This circuit only requires one variable/register we’ll call z (and so it will add a new column z in our execution trace) which will start with the initial value 1 and will end with the following value:

$$\prod_{i=1} \frac{i + \beta \cdot d[i] - \gamma}{\sigma(i) + \beta \cdot e[i] - \gamma} = 1$$

Let’s rewrite it using only the first wire/column a of Plonk, and using our generator ω as index in our tuples (because this is how we handily index things in Plonk):

$$\prod_{i=1} \frac{\omega^i + \beta \cdot a[i] - \gamma}{\sigma(\omega^i) + \beta \cdot a[i] - \gamma} = 1$$

We can then constrain the last value to be equal to 1, which will enforce that the two polynomials encoding our list of value and its permutation are equal (with overwhelming probability).

In plonk, a gate can only access variables/registers from the same row. So we will use the following extra gate (reordering the previous equation, as we can’t divide in a circuit) throughout the circuit:

$$z[i+1] \cdot (\sigma(i) + \beta \cdot a[i] - \gamma) = z[i] \cdot (i + \beta \cdot a[i] - \gamma)$$
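As a sketch of what the prover does with this gate, the snippet below fills the `z` column row by row and then re-checks the gate. I use plain integers `1..n` as row indices (as in the equation just above) rather than powers of $\omega$, and the wiring `SIGMA`, the column values and the field `P` are made up for the example.

```python
P = 2**61 - 1

def accumulator(column, sigma, beta, gamma):
    """Prover side: z starts at 1 and is multiplied by one ratio per row."""
    z = [1]
    for i, value in enumerate(column, start=1):
        num = (i + beta * value - gamma) % P
        den = (sigma[i] + beta * value - gamma) % P
        z.append(z[-1] * num * pow(den, -1, P) % P)  # division via modular inverse
    return z

def gate_holds(z, column, sigma, beta, gamma):
    """Verifier-style check, written without any division."""
    for i, value in enumerate(column, start=1):
        z_cur, z_next = z[i - 1], z[i]
        lhs = z_next * (sigma[i] + beta * value - gamma) % P
        rhs = z_cur * (i + beta * value - gamma) % P
        if lhs != rhs:
            return False
    return True

SIGMA = {1: 3, 3: 1, 2: 4, 4: 2}   # cell 1 wired to cell 3, cell 2 wired to cell 4
BETA, GAMMA = 5, 11                # random challenges in a real protocol

good_column = [7, 9, 7, 9]         # respects the wiring
bad_column = [7, 9, 8, 9]          # cell 3 differs from cell 1

z_good = accumulator(good_column, SIGMA, BETA, GAMMA)
z_bad = accumulator(bad_column, SIGMA, BETA, GAMMA)
```

The gate holds on every row by construction in both cases. What catches the bad column is the final value: `z_good[-1]` is 1, while `z_bad[-1]` is not.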

Now, how do we encode this gate in the circuit? The astute eye will have noticed that we are using a cell of the next row (z[i+1]) which we haven’t done in Plonk so far.

Trick 6: you’re in a multiplicative subgroup, remember?

Enforcing things across rows is actually possible in plonk because we encode our polynomials in a multiplicative subgroup of our field! Due to this, we can reach for the next value(s) by multiplying an evaluation point with the subgroup’s generator.

That is, values are encoded in our polynomials at evaluation points $\omega, \omega^2, \omega^3, \ldots$, and so multiplying an evaluation point by $\omega$ (the generator) brings you to the next cell in an execution trace.

As such, the verifier will later try to enforce that the following identity checks out in the multiplicative subgroup:

$$z(x \cdot \omega) \cdot (\sigma(x) + \beta \cdot a(x) - \gamma) = z(x) \cdot (x + \beta \cdot a(x) - \gamma)$$
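To see the "next row" trick in action, here is a tiny example in the field with 17 elements, where $\omega = 4$ generates a multiplicative subgroup of size 4. The field, the column values and the naive Lagrange interpolation are all choices made for this sketch, not something Plonk prescribes.

```python
P = 17       # tiny prime field, for illustration only
OMEGA = 4    # generates the subgroup {4, 16, 13, 1} of size 4
N = 4

def interpolate_eval(xs, ys, x):
    """Evaluate at x the unique polynomial of degree < len(xs) through (xs, ys)."""
    total = 0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        term = yj
        for m, xm in enumerate(xs):
            if m != j:
                term = term * (x - xm) * pow((xj - xm) % P, -1, P) % P
        total = (total + term) % P
    return total

# Row i of the column lives at the evaluation point omega^i, for i = 1..N.
points = [pow(OMEGA, i, P) for i in range(1, N + 1)]
column = [3, 1, 4, 1]

def cell(x):
    return interpolate_eval(points, column, x)

def next_cell(x):
    return cell(x * OMEGA % P)   # multiplying by omega moves one row down
```

Note the wrap-around: the last row sits at $\omega^4 = 1$, and multiplying that point by $\omega$ lands back on the first row.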

Note: This concept was generalized in turboplonk, and is used extensively in the AIR arithmetization (used by STARKs). This is also the reason why in Plonk we have to evaluate the z polynomial at ζω.

There will also be two additional gates: one that checks that the initial value is 1, and one that checks that the last value is 1, both applied only to their respective rows. One trick that Plonk uses is that the last value is actually obtained in the last row. As last_value + 1 = 0 in our multiplicative subgroup, we have that z[last_value+1]=z[0] is constrained automatically. As such, checking that z[0]=1 is enough.

You can see these two gates added to the vanilla plonk gate in the computation of the quotient polynomial t in plonk. Take a look at this screenshot of the round 3 of the protocol, and squint really hard to ignore the division by ZH(X), the powers of α being used to aggregate the different gate checks, and the fact that b and c (the other wires/columns) are used:

[Screenshot: round 3 of the Plonk protocol]

The first line in the computation of t is the vanilla plonk gate (that allows you to do multiplication and addition); the last line constrains that the first value of z is 1; and the other lines encode the permutation gate as I described (again, if you ignore the terms involving b and c).

Trick 7: create your execution trace in steps

There’s something worthy of note: the extra execution trace column z contains values that use other execution trace columns. For this reason, the other execution trace columns must be fixed BEFORE anything is done with the permutation column z.

In Plonk, this is done by waiting for the prover to send commitments of a, b, and c to the verifier, before producing the random challenges β and γ that will be used by the prover to produce the values of z.

Wiring multiple execution trace columns

The previous check only works within the cells of a single execution trace column. How does Plonk generalize this to several execution trace columns?

Remember: we indexed our first execution trace column with the values of our circuit domain (that multiplicative subgroup), we simply have to find a way to index the other columns with distinct values.

Trick 8: use cosets

A coset is simply a set that is the same size as another set, but that is completely disjoint from that set. Handily, a coset is also defined as something that’s very easy to compute if you know a subgroup: just multiply it with some element k.

Since we want a similar-but-different set from the elements of our multiplicative subgroup, we can use cosets!

Plonk produces the values $k_1$ and $k_2$ (which can be the values 2 and 3, for example), which when multiplied with the values of our multiplicative subgroup ($\{\omega, \omega^2, \omega^3, \ldots\}$) produces a different set of the same size. It’s not a subgroup anymore, but who cares!

We now have to create three different permutations, one for each set, and each permutation can point to the index of any of the sets.
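Here is what the cosets look like in a tiny example. I assume the field with 17 elements and the subgroup of size 4 generated by 4, with the shifts 2 and 3 mentioned above. These parameters are only there to make the sets easy to eyeball.

```python
P = 17
OMEGA = 4
H = {pow(OMEGA, i, P) for i in range(4)}   # the subgroup {1, 4, 13, 16}

def coset(k):
    """Shift every element of the subgroup by the same factor k."""
    return {k * h % P for h in H}

K1, K2 = 2, 3
first_column = H            # indexes for the first column
second_column = coset(K1)   # {2, 8, 9, 15}
third_column = coset(K2)    # {3, 5, 12, 14}

all_disjoint = len(first_column | second_column | third_column) == 3 * len(H)
```

One caveat worth seeing for yourself: the shift has to land outside the sets you already use. For example `coset(4)` is just the subgroup itself, since 4 is an element of it.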

Spent some time to write a challenge focused on GKR (the proof system) on top of the gnark framework (which is used to write ZK circuits in Golang).

It was a lot of fun and I hope that some people are inspired to try to break it :)

We’re using the challenge to hire people who are interested in doing security work in the ZK space, so if that interests you, or if you purely want a new challenge, try it out here: https://github.com/zksecurity/zkBank

And of course, since this is an active wargame please do not release your own solution or write up!

My previous article on zkBitcoin blew up, and we got 100 Github stars, a number of contributions on the repository, and some interest from a number of projects, and all of that in a single day!

So as requested, I made a number of videos to explain what zkBitcoin is.

Something that might not be immediately obvious if you’re not used to zero-knowledgifying your applications, is that the provable circuits you end up using are pure functions. They do not have access to long-lasting memory and cannot have side effects. They just take some input, and produce some output.

Note: circuits are actually not strictly pure, as they are non-deterministic. For example, you might be able to use out-of-circuit randomness in your circuit.

So when mutation of persistent state is needed, you need to provide the previous state as input, and return the new state as output. This not only produces a constraint on the previous state (time of read VS time of write issues), but it also limits the size of your state.

I’ve talked about the first issue here:

The problem of update conflicts comes when one designs a protocol in which multiple participants decide to update the same value, and do so using local execution. That is, instead of having a central service that executes some update logic sequentially, participants can submit the result of their updates in parallel. In this situation, each participant locally executes the logic on the current state assuming that it will not have changed. But this doesn’t work as soon as someone else updates the shared value. In practice, someone’s update will invalidate someone else’s.

The second issue of state size is usually solved with Merkle trees, which allow you to compress your state in a verifiable way, and allow you to access or update the state without having to decompress the ENTIRE state.
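As a rough sketch of that idea, here is a minimal binary Merkle tree using SHA-256. The layout (a power-of-two number of leaves, sibling order decided by the index parity) and the function names are assumptions of this example. Real systems usually use a circuit-friendly hash and add domain separation.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Return every level of the tree, leaves first (len(leaves) a power of two)."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def prove(levels, index):
    """Collect the sibling hashes on the path from a leaf to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path

def root_from_path(leaf, index, path):
    """Recompute the root from one leaf and its sibling path."""
    node = h(leaf)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node

state = [b"alice:10", b"bob:20", b"carol:30", b"dave:40"]
levels = build_levels(state)
root = levels[-1][0]
path = prove(levels, 2)

# Read: check carol's entry against the root without the rest of the state.
read_ok = root_from_path(state[2], 2, path) == root
# Update: the same path gives the new root after changing carol's entry.
new_root = root_from_path(b"carol:25", 2, path)
```

A circuit would take `root` as input, the leaf and `path` as private inputs, and return `new_root` as the new state.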

That’s all.

zkbitcoin

A few months ago Ivan told me “how cool would it be if we could verify zero-knowledge proofs on Bitcoin?” A week later, we had a prototype of the best solution we could come up with: a multi-party computation to manage a Bitcoin wallet, and a committee willing to unlock funds only in the presence of valid zero-knowledge proofs. A few iterations later and we had something a bit cooler: stateful apps with states that can be tracked on-chain, and committee members that don’t need to know anything about Bitcoin. Someone might put it this way: a Bitcoin L2 with minimal trust assumption of a “canonical” Bitcoin blockchain.

From what we understand, a better way to verify zero-knowledge proofs on Bitcoin is not going to happen, and this is the best we can have. And we built it! And we’re running it in testnet. Try it here!


In the realm of multi-party computation (MPC) protocols, threshold signing is the protocol that addresses how multiple participants can sign something under a “shared” private key. In other words, instead of one guy signing something with a private key, we want N guys doing the same thing and obtaining the same result without any of them actually knowing the private key (each of them holds a share of the private key, revealing nothing about the private key itself).

The threshold part means that not every participant who has a share has to participate. If there are N participants, then only t<N have to participate for the protocol to succeed. The t and N depend on the protocol you want to design, on the overhead you’re willing to eat, the security you want to attain, etc.

Threshold protocols are not just for signing, they’re everywhere. The NIST has a Multi-Party Threshold Cryptography competition, in which you can see proposals for threshold signing, but also threshold decryption, threshold key exchanges, and others.

This post is about threshold signatures for ECDSA specifically, as it is the most commonly used signature scheme and so has attracted a number of researchers. In addition, I’m only going to talk about the history of it, because I haven’t written an actual explainer on how these work, and because the history of threshold signing for ECDSA is really messy and confusing and understanding what constructions exist out there is near impossible due to the naming collisions and the number of papers released without proper nicknames (unlike FROST, which is the leading threshold signing algorithm for Schnorr signatures).

So here we are, the main line of work for ECDSA threshold signatures goes something like this, and seems to mainly involve two Gs (Gennaro and Goldfeder):

  1. GG18. This paper is more officially called “Fast Multiparty Threshold ECDSA with Fast Trustless Setup” and improves on BGG: Using level-1 homomorphic encryption to improve threshold DSA signatures for bitcoin wallet security (2017) and GGN: Threshold-optimal dsa/ecdsa signatures and an application to bitcoin wallet security (2016).
  2. GG19. This has the same name as GG18, but fixes some of the issues in GG18. I think this is because GG18 was published in a journal, so they couldn’t update it. But GG18 on eprint is the updated GG19 one. (Yet few people refer to it as GG19.) It fixes a number of bugs, including the ones brought by the Alpha-Rays attack, and A note about the security of GG18.
  3. GG20. This paper is officially called “One Round Threshold ECDSA with Identifiable Abort” and builds on top of GG18/GG19 to introduce the ability to identify who caused the abort. (In other words, who messed up if something was messed up during the multi-party computation.) Note that there are still some bugs in this paper.
  4. CGGMP21. This one combines GG20 with CMP20 (another work on threshold signatures). This is supposed to be the latest work in this line of work and is probably the only version that has no known issues.

Note that there’s also another line of work that happened in parallel from another team, and which is similar to GG18 except that they have different bugs: Lindell-Nof: Fast secure multiparty ecdsa with practical distributed key generation and applications to cryptocurrency custody (2018).

PS: thanks to Rosario Gennaro for help figuring this out :)

I haven’t seen much ink being spewed on the ZK update conflict issue so I’ll write a short note here.

Let’s take a step back. Zero-knowledge proofs allow you to prove the result of the execution of some logic. Like signatures attached to data you receive, ZK proofs can be attached to a computation result. This means that with ZK, internet protocols can be rethought and redesigned. If execution of the protocol logic had to happen somewhere trusted, now some of it can be moved around and delegated to untrusted places, or for privacy-reasons some of it can be moved to places where private data should remain.

How do we design protocols using ZK? It’s easy, assume that when a participant of your protocol computes something, they will do it honestly. Then, when you implement the protocol, use ZK proofs to enforce that they behave as intended.

The problem of update conflicts comes when one designs a protocol in which multiple participants decide to update the same value, and do so using local execution. That is, instead of having a central service that executes some update logic sequentially, participants can submit the result of their updates in parallel. In this situation, each participant locally executes the logic on the current state assuming that it will not have changed. But this doesn’t work as soon as someone else updates the shared value. In practice, someone’s update will invalidate someone else’s.

This issue is not just a ZK issue, if you know anything about databases then how to perform conflict resolution has been an issue for a very long time. For example, in distributed databases with more than one writer, conflicts could happen as two nodes attempt to update the same value at the same time. Conflict can also happen in the same way in applications where multiple users want to update the same data, think Google Docs.

The solutions as far as I know fall into the following categories:

  1. Resolve conflicts automatically. The simplest example is the Thomas write rule which discards any outdated update. In situations where discarding updates is unacceptable, more involved algorithms can take over. For example, Google Docs uses an algorithm called Operational Transformation to figure out how to merge two independent updates.
  2. Ask the user for help if needed. For example, the git merge command that can sometimes ask for your help to resolve conflicts.
  3. Refuse to accept any conflicts. This often means that the application is written in such a way that conflicts can’t arise, and in distributed databases this always means that there can only be a single node that can write (with all other nodes being read-only). Although applications can also decide to simply deny updates that lead to conflicts, which would lead to poor performance in concurrency-heavy scenarios, as well as poor user experience.
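To illustrate the first category, here is a deliberately simplified sketch of the Thomas write rule on a single shared value: a write carrying an older timestamp than the one already applied is silently dropped. The `Register` class is hypothetical, and real databases apply the rule per item as part of a larger timestamp-ordering scheme.

```python
class Register:
    """A single shared value guarded by a simplified Thomas write rule."""

    def __init__(self):
        self.value = None
        self.timestamp = -1

    def write(self, value, timestamp):
        """Apply the write unless a newer one was already applied."""
        if timestamp < self.timestamp:
            return False   # outdated update: discarded, no error raised
        self.value, self.timestamp = value, timestamp
        return True

register = Register()
first = register.write("from node A", timestamp=2)
second = register.write("from node B", timestamp=1)   # arrives late, is dropped
```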

As one can see, the barrier between application and database doesn’t matter too much, besides the fact that a database has poor ways of prompting a user: when conflict resolution must be done by a user it is generally the role of the application to reach out.

What about ZK though? From what I’ve seen, the last “avoid conflicts” solution is always chosen. Perhaps this is because my skewed view has only been within the blockchain world, which can’t afford to play conflict resolution with $$$.

For example, simpler ZK protocols like Zcash will often massage their protocol such that proofs are only computed on immutable data. For example, arguments of a function cannot be the latest root of a merkle tree (as it might get updated before we can publish the result of running the function) but it can easily be the root of a merkle tree that was seen previously (we’re using a previous state, not the latest state, that’s fine).

Another technique is to extract the parts of updates that occur on a shared data structure, and sequence them before running them. For example, the set of nullifiers in Zcash is updated outside of a ZK execution by the network, according to some logic that only gets executed sequentially. More complicated ZK platforms like Aleo and Mina do that as well. In Aleo’s case, the user can split the logic of their smart contracts by choosing what can be executed locally (provided a proof) and what has to be executed serially by the network (Ethereum-style). In Mina’s case, updates that have the potential to lead to conflicts are queued up and later on a single user can decide (if authorized) to process the queued updates serially but in ZK.

Cairo's public memory


Here are some notes on how the Cairo zkVM encodes its (public) memory in the AIR (arithmetization) of the STARK.

If you’d rather watch a 25min video of the article, here it is:

The AIR arithmetization is limited in how it can handle public inputs and outputs, as it only offers boundary constraints. These boundary constraints can only be used on a few rows, otherwise they’re expensive to compute for the verifier. (A verifier would have to compute $\prod_{i \in S}(x - g^i)$ for some given $x$, so we want to keep $|S|$ small.)

For this reason Cairo introduces another way to get the program and its public inputs/outputs in: public memory. This public memory is strongly related to the memory vector of Cairo which a program can read and write to.

In this article we’ll talk about both. This is to accompany this video and section 9.7 of the Cairo paper.

Cairo’s memory

Cairo’s memory layout is a single vector that is indexed (each row/entry is assigned an address starting from 1) and is segmented. For example, the first $l$ rows are reserved for the program itself, some other rows are reserved for the program to write and read cells, etc.

Cairo uses a very natural “constraint-led” approach to memory, by making it write-once instead of read-write. That is, all accesses to the same address should yield the same value. Thus, at some point we will need a constraint that enforces that for any two accesses $(a_1, v_1)$ and $(a_2, v_2)$ such that $a_1 = a_2$, we have $v_1 = v_2$.

Accesses are part of the execution trace

At the beginning of our STARK, we saw in How STARKs work if you don’t care about FRI that the prover encodes, commits, and sends the columns of the execution trace to the verifier.

The memory, or memory accesses rather (as we will see), are columns of the execution trace as well.

The first two columns introduced in the paper are called L1.a and L1.v. Each row in these columns represents an access made to the address a in memory, with value v. As said previously, we don’t care if that access is a write or a read as the difference between them is blurred (any read for a specific address could be the write).

These columns can be used as part of the Cairo CPU, but they don’t really prevent the prover from lying about the memory accesses:

  1. First, we haven’t proven that all accesses to the same addresses ai always return the same value vi.
  2. Second, we haven’t proven that the memory contains fixed values in specific addresses. For example, it should contain the program itself in the first l cells.

Let’s tackle the first question first, and we will address the second one later.

Another list to help

In order to prove that the two columns in the L1 part of the execution trace are consistent (the first point above), Cairo adds two columns to the execution trace: L2.a and L2.v. These two columns contain essentially the same things as the L1 columns, except that this time the accesses are sorted by address.

One might wonder at this point, why can’t L1 memory accesses be sorted? Because these accesses represent the actual memory accesses of the program during runtime, row by row (or step by step). The program might read the next instruction in some address, then jump and read the next instruction at some other address, etc. We can’t force the accesses to be sorted at this point.

We will have to prove (later) that L1 and L2 represent the same accesses (up to some permutation we don’t care about).

So let’s assume for now that L2 correctly contains the same accesses as L1 but sorted, what can we check on L2?

The first thing we want to check is that it is indeed sorted. Or in other words:

  • each access is on the same address as previous: $a_{i+1} = a_i$
  • or on the next address: $a_{i+1} = a_i + 1$

For this, Cairo adds a continuity constraint to its AIR:

[Screenshot: the continuity constraint from the Cairo paper]

The second thing we want to check is that accesses to the same addresses yield the same values. Now that things are sorted it’s easy to check this! We just need to check that:

  • either the values are the same: $v_{i+1} = v_i$
  • or the address being accessed was bumped so it’s fine to have different values: $a_{i+1} = a_i + 1$

For this, Cairo adds a single-valued constraint to its AIR:

[Screenshot: the single-valued constraint from the Cairo paper]

And that’s it! We now have proven that the L2 columns represent correct memory accesses through the whole memory (although we didn’t check that the first access was at address 1, not sure if Cairo checks that somewhere), and that the accesses are correct.

That is, as long as L2 contains the same list of accesses as L1.
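To make the two checks concrete, here is a minimal sketch in Python. It assumes each “either/or” condition above is encoded as a product that must be zero on every pair of consecutive rows, which is the usual way to express a disjunction in an AIR. The function name and the sample lists are made up for illustration, and the arithmetic is done over plain integers rather than a finite field.

```python
def check_sorted_accesses(accesses):
    """accesses: list of (address, value) pairs, playing the role of the L2 columns."""
    for (a, v), (a_next, v_next) in zip(accesses, accesses[1:]):
        # continuity: same address, or address bumped by exactly one
        continuity = (a_next - a) * (a_next - a - 1)
        # single-valued: same value, or address bumped by exactly one
        single_valued = (v_next - v) * (a_next - a - 1)
        if continuity != 0 or single_valued != 0:
            return False
    return True


good = [(1, 10), (1, 10), (2, 7), (3, 7), (3, 7)]
bad_gap = [(1, 10), (3, 7)]      # skips address 2
bad_value = [(1, 10), (1, 11)]   # same address, two different values

print(check_sorted_accesses(good))       # True
print(check_sorted_accesses(bad_gap))    # False
print(check_sorted_accesses(bad_value))  # False
```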

A multiset check between L1 and L2

To ensure that two lists of elements match, up to some permutation (meaning we don’t care how they were reordered), we can use the same permutation argument that Plonk uses (except that Plonk fixes the permutation).

The check we want to perform is the following (writing $(a_i, v_i)$ for the accesses in L1 and $(a'_i, v'_i)$ for the accesses in L2):

$$\{(a_i, v_i)\}_i = \{(a'_i, v'_i)\}_i$$

But we can’t check tuples like that, so let’s get a random value $\alpha$ from the verifier and encode tuples as linear combinations:

$$\{a_i + \alpha \cdot v_i\}_i = \{a'_i + \alpha \cdot v'_i\}_i$$

Now, let’s observe that instead of checking that these two sets match, we can just check that two polynomials have the same roots (where the roots have been encoded to be the elements in our lists):

$$\prod_i [X - (a_i + \alpha \cdot v_i)] = \prod_i [X - (a'_i + \alpha \cdot v'_i)]$$

Which is the same as checking that

$$\frac{\prod_i [X - (a_i + \alpha \cdot v_i)]}{\prod_i [X - (a'_i + \alpha \cdot v'_i)]} = 1$$

Finally, we observe that we can use Schwartz-Zippel to reduce this claim to evaluating the LHS at a random verifier point $z$. If the following is true at the random point $z$ then with high probability it is true in general:

$$\frac{\prod_i [z - (a_i + \alpha \cdot v_i)]}{\prod_i [z - (a'_i + \alpha \cdot v'_i)]} = 1$$
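Here is a small sketch of that randomized check in Python. The modulus (a Mersenne prime), the helper names and the sample lists are assumptions chosen for illustration, and the two products are compared directly instead of dividing one by the other, which is equivalent as long as the denominator is non-zero.

```python
import random

P = 2**61 - 1  # toy field modulus (assumption: any large prime would do)


def encode(access, alpha):
    """Encode a tuple (a, v) as the single field element a + alpha * v."""
    a, v = access
    return (a + alpha * v) % P


def product_at(accesses, alpha, z):
    """Evaluate prod_i [z - (a_i + alpha * v_i)] in the field."""
    acc = 1
    for access in accesses:
        acc = acc * ((z - encode(access, alpha)) % P) % P
    return acc


def multiset_equal(l1, l2, rng):
    """Probabilistic multiset equality using two random challenges."""
    alpha = rng.randrange(P)
    z = rng.randrange(P)
    return product_at(l1, alpha, z) == product_at(l2, alpha, z)


l1 = [(3, 7), (1, 10), (2, 7), (1, 10)]          # accesses in execution order
l2 = sorted(l1)                                  # same accesses, sorted by address
l2_bad = [(1, 10), (1, 10), (2, 7), (3, 8)]      # one value was changed

rng = random.Random(0)
same = multiset_equal(l1, l2, rng)
different = multiset_equal(l1, l2_bad, rng)
```

With a field this large, a false positive on `l2_bad` only happens for a negligible fraction of the challenges.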

The next question to answer is, how do we check this thing in our STARK?

Creating a circuit for the multiset check

Recall that our AIR allows us to write a circuit using successive pairs of rows in the columns of our execution trace.

That is, while we can’t access all the $a_i$ and $v_i$ (from L1) and $a'_i$ and $v'_i$ (from L2) in one shot, we can access them row by row.

So the idea is to write a circuit that produces the previous section’s ratio row by row. To do that, we introduce a new column p in our execution trace which will help us keep track of the ratio as we produce it.

$$p_i = p_{i-1} \cdot \frac{z - (a_i + \alpha \cdot v_i)}{z - (a'_i + \alpha \cdot v'_i)}$$

This is how you compute that p column of the execution trace as the prover.

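Note that on the verifier side, as we can’t divide, we will have to create the circuit constraint by multiplying both sides by the denominator (here $a, v$ encode the L1 columns and $a', v'$ the L2 columns):

$$p(g \cdot x) \cdot [z - (a'(g \cdot x) + \alpha \cdot v'(g \cdot x))] = p(x) \cdot [z - (a(g \cdot x) + \alpha \cdot v(g \cdot x))]$$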

There are two additional (boundary) constraints that the verifier needs to impose to ensure that the multiset check is coherent:

  • the initial value $p_0$ should be computed correctly ($p_0 = \frac{z - (a_0 + \alpha \cdot v_0)}{z - (a'_0 + \alpha \cdot v'_0)}$)
  • the final value $p_{-1}$ (the last entry of the column) should be 1
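The prover’s computation of the p column and the verifier’s division-free checks can be sketched as follows. This is only an illustration under assumptions: the field, the fixed stand-ins for the challenges `ALPHA` and `Z`, and the function names are all made up, and the checks are done on raw lists rather than on committed polynomials.

```python
P = 2**61 - 1                       # toy field modulus (assumption)
ALPHA, Z = 123456789, 987654321     # stand-ins for the verifier challenges


def inv(x):
    """Field inverse via Fermat's little theorem (x must be non-zero)."""
    return pow(x, P - 2, P)


def term(a, v, alpha, z):
    return (z - (a + alpha * v)) % P


def build_p_column(l1, l2, alpha, z):
    """Prover side: running ratio, one entry per row."""
    column, acc = [], 1
    for (a, v), (a2, v2) in zip(l1, l2):
        acc = acc * term(a, v, alpha, z) % P * inv(term(a2, v2, alpha, z)) % P
        column.append(acc)
    return column


def verify_p_column(p, l1, l2, alpha, z):
    """Verifier side: same relations, with the denominator multiplied out."""
    # boundary constraint on the first row
    if p[0] * term(*l2[0], alpha, z) % P != term(*l1[0], alpha, z):
        return False
    # transition constraint on every pair of consecutive rows
    for i in range(1, len(p)):
        lhs = p[i] * term(*l2[i], alpha, z) % P
        rhs = p[i - 1] * term(*l1[i], alpha, z) % P
        if lhs != rhs:
            return False
    # boundary constraint on the last row
    return p[-1] == 1


l1 = [(3, 7), (1, 10), (2, 7), (1, 10)]
l2 = sorted(l1)
l2_bad = [(1, 10), (1, 10), (2, 7), (3, 8)]

p = build_p_column(l1, l2, ALPHA, Z)
p_bad = build_p_column(l1, l2_bad, ALPHA, Z)
```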

Importantly, let me note that this new column p of the execution trace cannot be created, encoded to a polynomial, committed, and sent to the verifier in the same round as other columns of the execution trace. This is because it makes use of two verifier challenges $z$ and $\alpha$ which have to be revealed after the other columns of the execution trace have been sent to the verifier.

Note: a way to understand this article is that the prover is now building the execution trace interactively with the help of the verifier, and parts of the circuits (here a permutation circuit) will need to use these columns of the execution trace that are built at different stages of the proof.

Inserting the public memory in the memory

Now it is time to address the second half of the problem we stated earlier:

Second, we haven’t proven that the memory contains fixed values in specific addresses. For example, it should contain the program itself in the first l cells.

To do this, the first l accesses are replaced with accesses to (0,0) in L1. L2, on the other hand, uses accesses to the first parts of the memory and retrieves values from the public memory $m^*$ (e.g. $(1, m^*[0]), (2, m^*[1]), \dots$).

This means two things:

  1. the numerator of p will contain $z - (0 + \alpha \cdot 0) = z$ in the first l iterations (so $z^l$). Furthermore, these will not be cancelled by any values in the denominator (as L2 is supposedly using actual accesses to the public memory)
  2. the denominator of p will contain $\prod_{i \in [[0,l]]} [z - (a'_i + \alpha \cdot m^*[i])]$, and these values won’t be cancelled by values in the numerator either

As such, the final value of the accumulator should look like this if the prover followed our directions:

$$\frac{z^l}{\prod_{i \in [[0,l]]} [z - (a'_i + \alpha \cdot m^*[i])]}$$

which we can enforce (as the verifier) with a boundary constraint.

Section 9.8 of the Cairo paper writes exactly that:

[Screenshot: excerpt from Section 9.8 of the Cairo paper]
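As a sanity check of that final value, here is a toy computation in Python. Everything concrete in it is an assumption made for illustration: the field, the fixed challenges, a public memory of two cells at addresses 1 and 2, and three runtime accesses at addresses 3 and 4.

```python
P = 2**61 - 1                       # toy field modulus (assumption)
ALPHA, Z = 123456789, 987654321     # stand-ins for the verifier challenges


def inv(x):
    return pow(x, P - 2, P)


def term(a, v):
    return (Z - (a + ALPHA * v)) % P


def final_ratio(l1, l2):
    """Final accumulator value: product over L1 divided by product over L2."""
    num = den = 1
    for a, v in l1:
        num = num * term(a, v) % P
    for a, v in l2:
        den = den * term(a, v) % P
    return num * inv(den) % P


def expected_final(public_memory):
    """z^l divided by the product over the public memory accesses."""
    den = 1
    for addr, value in public_memory:
        den = den * term(addr, value) % P
    return pow(Z, len(public_memory), P) * inv(den) % P


public_memory = [(1, 5), (2, 6)]            # (address, m*[i]) pairs
runtime = [(4, 9), (3, 8), (3, 8)]          # accesses made by the program

l1 = [(0, 0)] * len(public_memory) + runtime    # dummy (0,0) accesses first
l2 = sorted(public_memory + runtime)            # real accesses, sorted

actual = final_ratio(l1, l2)
expected = expected_final(public_memory)
```

The runtime accesses cancel out between numerator and denominator, which leaves only the dummy `(0, 0)` terms on top and the public memory terms at the bottom.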

Here’s some notes on how STARK works, following my read of the ethSTARK Documentation (thanks Bobbin for the pointer!).

Warning: the following explanation should look surprisingly close to PlonK or SNARKs in general, to anyone familiar with these other schemes. If you know PlonK, maybe you can think of STARKs as turboplonk without preprocessing and without copy constraints/permutation. Just turboplonk with a single custom gate that updates the next row, also the commitment scheme makes everything complicated.

The execution trace table

Imagine a table with W columns representing registers, which can be used as temporary values in our program/circuit. The table has N rows, which represent the temporary values of each of these registers in each “step” of the program/circuit.

For example, a table of 3 registers and 3 steps:

step  r0  r1  r2
1     0   1   534
2     4   1   235
3     3   4   5

The constraints

There are two types of constraints which we want to enforce on this execution trace table to simulate our program:

  • boundary constraints: if I understand correctly this is for initializing the inputs of your program in the first rows of the table (e.g. the second register must be set to 1 initially) as well as the outputs (e.g. the registers in the last two rows must contain 3, 4, and 5).
  • state transitions: these are constraints that apply to ALL contiguous pairs of rows (e.g. the first two registers added together in a row equal the value of the third register in the next row). The particularity of STARKs (and what makes them “scalable” and fast in practice) is that the same constraint is applied repeatedly. This is also why people like to use STARKs to implement zkVMs, as VMs do the same thing over and over.

This way of encoding a circuit as constraints is called AIR (for Algebraic Intermediate Representation).

Straw man 1: Doing things in the clear coz YOLO

Let’s see an example of how a STARK could work as a naive interactive protocol between a prover and verifier:

  1. the prover constructs the execution trace table and sends it to the verifier
  2. the verifier checks the constraints on the execution trace table by themselves

This protocol works if we don’t care about zero-knowledge, but it is obviously not very efficient: the prover sends a huge table to the verifier, and the verifier has to check that the table makes sense (with respect to the constraints) by checking every row involved in the boundary constraints, and checking every contiguous pair of rows involved in the state transition constraints.
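The verifier’s work in this naive protocol can be sketched in a few lines of Python. The two constraints are the examples given earlier (second register initially set to 1, and the first two registers of a row adding up to the third register of the next row). The trace is a variation of the example table with one value changed so that the transition constraint holds, and the function name is made up.

```python
def check_trace(trace):
    """trace: list of rows, each row being [r0, r1, r2]."""
    # boundary constraint: the second register starts at 1
    if trace[0][1] != 1:
        return False
    # state transition: r0 + r1 in a row equals r2 in the next row
    for row, next_row in zip(trace, trace[1:]):
        if row[0] + row[1] != next_row[2]:
            return False
    return True


valid_trace = [
    [0, 1, 534],
    [4, 1, 1],    # 0 + 1 == 1
    [3, 4, 5],    # 4 + 1 == 5
]
example_table = [
    [0, 1, 534],
    [4, 1, 235],  # 0 + 1 != 235, so this table violates the example constraint
    [3, 4, 5],
]

print(check_trace(valid_trace))    # True
print(check_trace(example_table))  # False
```

Note how the verifier touches every row, which is exactly the cost the next steps try to remove.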

Straw man 2: Encoding things as polynomials for future profit

Let’s try to improve on the previous protocol by using polynomials. This step will not immediately improve anything, but will set the stage for the step afterwards. Before we talk about the change to the protocol let’s see two different observations:

First, let’s note that one can encode a list of values as a polynomial by applying a low-degree extension (LDE). That is, if your list of values looks like this: $(y_0, y_1, y_2, \dots)$, then interpolate these values into a polynomial $f$ such that $f(0) = y_0, f(1) = y_1, f(2) = y_2, \dots$

Usually, as we’re acting in a field, a subgroup of large-enough size is chosen in place of $0, 1, 2, \dots$ as domain. You can read why’s that here. (This domain is called the “trace evaluation domain” by ethSTARK.)

Second, let’s see how to represent a constraint like “the first two registers added together in a row equal the value of the third register in the next row” as a polynomial. If the three registers in our examples are encoded as the polynomials $f_1, f_2, f_3$ then we need a way to encode “the next row”. If our domain is simply $(0, 1, 2, \dots)$ then the next row for a polynomial $f_1(x)$ is simply $f_1(x+1)$. Similarly, if we’re using a subgroup generated by $g$ as domain, we can write the next row as $f_1(x \cdot g)$. So the previous example constraint can be written as the constraint polynomial $c_0(x) = f_1(x) + f_2(x) - f_3(x \cdot g)$.
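To see both observations at work, here is a toy example in Python over the field of 17 elements, using the subgroup of order 4 generated by 4 as domain. The field, the trace values and the helper names are all assumptions picked to keep the numbers small. Interpolation is done with plain Lagrange formulas and returns an evaluation function rather than coefficients.

```python
P = 17
G = 4                                        # 4^4 = 256 = 1 mod 17, so G has order 4
DOMAIN = [pow(G, i, P) for i in range(4)]    # [1, 4, 16, 13]


def interpolate(values):
    """Return x -> f(x), where f has degree < 4 and f(G^i) = values[i]."""
    def f(x):
        total = 0
        for i, xi in enumerate(DOMAIN):
            num, den = 1, 1
            for j, xj in enumerate(DOMAIN):
                if i != j:
                    num = num * (x - xj) % P
                    den = den * (xi - xj) % P
            total = (total + values[i] * num * pow(den, P - 2, P)) % P
        return total
    return f


# three columns of a 4-row trace where r0 + r1 equals r2 of the next row
col1 = [0, 4, 3, 2]
col2 = [1, 1, 4, 5]
col3 = [9, 1, 5, 7]
f1, f2, f3 = interpolate(col1), interpolate(col2), interpolate(col3)


def c0(x):
    """Constraint polynomial: f1(x) + f2(x) - f3(x * g)."""
    return (f1(x) + f2(x) - f3(x * G % P)) % P


on_rows = [c0(x) for x in DOMAIN[:3]]   # rows that have a "next row"
wrap_around = c0(DOMAIN[3])             # last row wraps to the first one
```

The constraint polynomial vanishes on the first three rows, but not on the last one, since multiplying the last domain element by `G` wraps around to the first row.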

If a constraint polynomial $c_0(x)$ is correctly satisfied by a given execution trace, then it should be zero on the entire domain (for state transition constraints) or on some values of the domain (for boundary constraints). This means we can write it as $c_0(x) = t(x) \cdot \prod_i (x - g^i)$ for some “quotient” polynomial $t$ and the evaluation points $g^i$ (that encode the rows) where the constraint should apply. (In other words, you can factor $c_0$ using its roots $g^i$.)

Note: for STARKs to be efficient, you shouldn’t have too many roots. Hence why boundary constraints should be limited to a few rows. But how does it work for state transition constraints that need to be applied to all the rows? The answer is that since we are in a subgroup there’s a very efficient way to compute $\prod_i (x - g^i)$. You can read more about that in Efficient computation of the vanishing polynomial of the Mina book.
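As a quick illustration, when the product runs over a whole multiplicative subgroup of order $n$ it collapses to $x^n - 1$, which I believe is the identity the linked section relies on. The following Python sketch checks this exhaustively on a toy field (the field of 17 elements and the subgroup of order 4 are assumptions chosen for the example).

```python
P, G, N = 17, 4, 4   # toy field, and a generator G of a subgroup of order N


def vanishing_naive(x):
    """prod_i (x - G^i) over the whole subgroup: N multiplications."""
    acc = 1
    for i in range(N):
        acc = acc * (x - pow(G, i, P)) % P
    return acc


def vanishing_fast(x):
    """Same polynomial, computed as x^N - 1."""
    return (pow(x, N, P) - 1) % P


agree_everywhere = all(vanishing_naive(x) == vanishing_fast(x) for x in range(P))
print(agree_everywhere)  # True
```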

At this point, you should understand that a prover that wants to convince you that a constraint $c_1$ applies to an execution trace table can do so by showing you that $t$ exists. The prover can do so by sending the verifier the $t$ polynomial and the verifier computes $c_1$ from the register polynomials and verifies that it is indeed equal to $t$ multiplied by $\prod_i (x - g^i)$. This is what is done both in Plonk and in STARK.

Note: if the execution trace doesn’t satisfy a constraint, then you won’t be able to factor the constraint polynomial with $\prod_i (x - g^i)$ as not all of the $g^i$ will be roots. For this reason you’ll get something like $c_1(x) = t(x) \cdot \prod_i (x - g^i) + r(x)$ for $r$ some “remainder” polynomial. TODO: at this point can we still get a t but it will have a high degree? If not then why do we have to do a low-degree test later?
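The quotient and the remainder can be observed with a small polynomial long division in Python. This is a sketch under assumptions: the field of 17 elements, coefficient lists stored lowest degree first, and two hand-built constraint polynomials (one that is an exact multiple of $x^4 - 1$, one that is off by a constant).

```python
P = 17


def poly_divmod(num, den):
    """Long division of polynomials over F_P (coefficients lowest degree first).

    Returns (quotient, remainder) with len(remainder) == len(den) - 1.
    """
    num = num[:]
    quot = [0] * max(len(num) - len(den) + 1, 1)
    inv_lead = pow(den[-1], P - 2, P)
    for shift in range(len(num) - len(den), -1, -1):
        coeff = num[shift + len(den) - 1] * inv_lead % P
        quot[shift] = coeff
        for i, d in enumerate(den):
            num[shift + i] = (num[shift + i] - coeff * d) % P
    return quot, num[:len(den) - 1]


VANISHING = [16, 0, 0, 0, 1]           # x^4 - 1, since -1 = 16 mod 17
satisfied = [14, 15, 0, 0, 3, 2]       # (3 + 2x) * (x^4 - 1)
violated = [15, 15, 0, 0, 3, 2]        # the same polynomial plus 1

print(poly_divmod(satisfied, VANISHING))  # ([3, 2], [0, 0, 0, 0])
print(poly_divmod(violated, VANISHING))   # ([3, 2], [1, 0, 0, 0])
```

In the first case the remainder is zero and the quotient is the $t$ polynomial; in the second case a non-zero remainder shows up.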

Now let’s see our modification to the previous protocol:

  1. Instead of sending the execution trace table, the prover encodes each column of the execution trace table (of height N) as polynomials, and sends the polynomials to the verifier.
  2. The prover then creates the constraint polynomials $c_0, c_1, \dots$ (as described above) for each constraint involved in the AIR.
  3. The prover then computes the associated quotient polynomials $t_0, t_1, \dots$ (as described above) and sends them to the verifier. Note that the ethSTARK paper calls these quotient polynomials the constraint polynomials (sorry for the confusion).
  4. The verifier now has to check two things:
    • low-degree check: that these quotient polynomials are indeed low-degree. This is easy as we’re doing everything in the clear for now (TODO: why do we need to check that though?)
    • correctness check: that these quotient polynomials were correctly constructed. For example, the verifier can check that for $t_0$ by computing $c_0$ themselves using the execution trace polynomials and then checking that it equals $t_0 \cdot (x - 1)$. That is, assuming that the first constraint $c_0$ only applies to the first row $g^0 = 1$.

Straw man 3: Let’s make use of the polynomials with the Schwartz-Zippel optimization!

The verifier doesn’t actually have to compute and compare large polynomials in the correctness check. Using the Schwartz-Zippel lemma one can check that two polynomials are equal by evaluating both of them at a random value and checking that the evaluations match. This is because Schwartz-Zippel tells us that two polynomials that are equal will be equal on all their evaluations, but if they differ they will differ on most of their evaluations.
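A tiny experiment makes this tangible. The sketch below (in Python, over an arbitrarily chosen field of 97 elements, with made-up polynomials) counts on how many points two polynomials agree: everywhere if they are equal, and on at most as many points as their degree if they differ.

```python
P = 97  # toy field (assumption)


def evaluate(coeffs, x):
    """Horner evaluation, coefficients stored lowest degree first."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def count_agreements(f, g):
    return sum(evaluate(f, x) == evaluate(g, x) for x in range(P))


f = [1, 2, 3]   # 1 + 2x + 3x^2
g = [1, 2, 4]   # 1 + 2x + 4x^2

print(count_agreements(f, f))  # 97: equal polynomials agree everywhere
print(count_agreements(f, g))  # 1: they only agree at x = 0
```

So a random point exposes the difference with probability 96/97 here, and with overwhelming probability in a cryptographically sized field.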

So the previous protocol can be modified to:

  1. The prover sends the columns of the execution trace as polynomials $f_0, f_1, \dots$ to the verifier.
  2. The prover produces the quotient polynomials $t_0, t_1, \dots$ and sends them to the verifier.
  3. The verifier produces a random evaluation point $z$.
  4. The verifier checks that each quotient polynomial has been computed correctly. For example, for the first constraint, they evaluate $c_0$ at $z$, then evaluate $t_0(z) \cdot (z - 1)$, then check that both evaluations match.

Straw man 4: Using commitments to hide stuff and reduce proof size!

As the title indicates, we eventually want to use commitments in our scheme so that we can add zero-knowledge (by hiding the polynomials we’re sending) and reduce the proof size (our commitments will be much smaller than what they commit).

The commitments used in STARKs are Merkle trees, where the leaves contain evaluations of a polynomial. Unlike the commitments used in SNARKs (like IPA and KZG), Merkle trees don’t have an algebraic structure and thus are quite limited in what they allow us to do. Most of the complexity in STARKs comes from the commitments. In this section we will not open that Pandora’s box, and assume that the commitments we’re using are normal polynomial commitment schemes which allow us to not only commit to polynomials, but also evaluate them and provide proofs that the evaluations are correct.

Now our protocol looks like this:

  1. The prover commits to the execution trace columns polynomials, then sends the commitments to the verifier.
  2. The prover commits to the quotient polynomials, then sends the commitments to the verifier.
  3. The verifier sends a random value z.
  4. The prover evaluates the execution trace column polynomials at $z$ and $z \cdot g$ (remember the verifier might want to evaluate a constraint that looks like this $c_0(x) = f_1(x) + f_2(x) - f_3(x \cdot g)$ as it also uses the next row) and sends the evaluations to the verifier.
  5. The prover evaluates the quotient polynomials at z and sends the evaluations to the verifier (these evaluations are called “masks” in the paper).
  6. For each evaluation, the prover also sends evaluation proofs.
  7. The verifier verifies all evaluation proofs.
  8. The verifier then checks that each constraint is satisfied, by checking the $c = t \cdot \prod_i (x - g^i)$ equation at $x = z$ in the clear (using the evaluations provided by the prover).

Straw man 5: a random linear combination to reduce all the checks to a single check

If you’ve been reading STARK papers you’re probably wondering where the heck is the composition polynomial. That final polynomial is simply a way to aggregate a number of checks in order to optimize the protocol.

The idea is that instead of checking a property on a list of polynomials, you can check that property on a random linear combination. For example, instead of checking that $f_1(z) = 3$, $f_2(z) = 4$, and $f_3(z) = 8$, you can check that for random values $r_1, r_2, r_3$ you have:

$$r_1 \cdot f_1(z) + r_2 \cdot f_2(z) + r_3 \cdot f_3(z) = 3 r_1 + 4 r_2 + 8 r_3$$

Often we avoid generating multiple random values and instead use powers of a single random value, which is a tiny bit less secure but much more practical for a number of reasons I won’t touch here. So things often look like this instead, with a random value $r$:

$$f_1(z) + r \cdot f_2(z) + r^2 \cdot f_3(z) = 3 + 4r + 8r^2$$
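The aggregation with powers of a single value can be sketched like this in Python. The field, the fixed stand-in for the random value `R` and the function name are assumptions made for the example; in a real protocol `R` would be sampled by the verifier after the prover has committed.

```python
P = 2**61 - 1     # toy field modulus (assumption)
R = 1234567       # stand-in for the verifier's random value r


def combine(values, r):
    """values[0] + r * values[1] + r^2 * values[2] + ... in the field."""
    acc, power = 0, 1
    for value in values:
        acc = (acc + power * value) % P
        power = power * r % P
    return acc


claimed = [3, 4, 8]         # expected evaluations
honest = [3, 4, 8]          # f1(z), f2(z), f3(z) from an honest prover
cheating = [3, 5, 8]        # one evaluation is off by one

print(combine(honest, R) == combine(claimed, R))    # True
print(combine(cheating, R) == combine(claimed, R))  # False
```

A single equality check now stands in for three, and a wrong evaluation only slips through if the combination happens to cancel out, which is unlikely for a random `R` in a large field.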

Now our protocol should look like this:

  1. The prover commits to the execution trace columns polynomials, then sends the commitments to the verifier.
  2. The verifier sends a random value r.
  3. The prover produces a random linear combination of the constraint polynomials.
  4. The prover produces the quotient polynomial for that random linear combination, which ethSTARK calls the composition polynomial.
  5. The prover commits to the composition polynomial, then sends the commitment to the verifier.
  6. The protocol continues pretty much like the previous one

Note: in the ethSTARK paper they note that the composition polynomial is likely of higher degree than the polynomials encoding the execution trace columns. (The degree of the composition polynomial is equal to the degree of the highest-degree constraint.) For this reason, there’s a further optimization that splits the composition polynomial into several polynomials, but we will avoid talking about it here.

We now have a protocol that looks somewhat clean, which seems contradictory with the level of complexity introduced by the various papers. Let’s fix that in the next blogpost on FRI…
