Loading...
Loading...

Welcome to Zero Knowledge. I'm your host Anna Rose. In this podcast, we will be exploring
the latest in Zero Knowledge Research and the Decentralized Web, as well as new paradigms
that promise to change the way we interact and transact online.
This week, in our third installment of the six-part mini-series on Lean Ethereum, Niko
is joined by Jackamo Fenzie, PhD student at EPFL, and Antonio Sanso from the Ethereum
Foundation. They discuss the theory and security of post-quantum snarks, the hash-based
proof system that will underpin Lean VM and signature aggregation in a post-quantum
Ethereum. For these episodes, we have Niko as our host, we are also releasing this in
video form over on our YouTube channel. Links are in the show notes, in case you want
to check it out over there, and be sure to subscribe to the channel while you're at it.
In this conversation, Niko and Antonio and Jackamo discuss the recent wave of research on proximity
gap conjectures and the $1 million in proximity prizes offered by the EF, an initiative which
has prompted a flurry of new papers on this topic. It's a deep dive into the mathematical
foundations that determine how much confidence we can place in post-quantum proof systems. Hope
you will watch or listen along to this and other episodes in our mini-series. And yeah,
let us know what you think. Now before we kick off, three quick points. First, ZK Summit
is coming back with its 14th edition. Applications to attend and or speak are now open. It will
be on May 7th in Rome, so hope to see there. Second, ZK Mesh Plus, our paid subscription
tier is rolling. We have already released a few bonus clips from the show, as well as
some of our exclusive research reports, like the State of ZK Report and its addendums.
If you want access, please join the paid tier on ZK Mesh Plus. Third, if you're looking
for other ways to support the show, we have donation accounts visible in the footer of
our website. So if you choose to do so, thank you in advance. That's all for now. Links
are in the show notes. And here is Niko's interview with Jackamo and Antonio.
Today, I'm here with Jackamo Fancy, PhD student at EPFL and Antonio Sanso, cryptography researcher
at The Theorem Foundation. This is episode three of our mini series on the linear theorem
upgrade. And this one is dedicated to the theory and security of post-content snarks.
Hello, Jackamo. Welcome back to the show. Hi. How's it going? Good. Thanks. And hello,
Antonio. Also, welcome back to the show. Yeah. Thank you. Actually, before we get started,
I'm curious. You both have already been on the show. What's been happening with you guys
since your last year? That's a good question. So I think it was the last time, one year or
two years ago, we were with Gal. We were talking about stir, which was like a time like a new
and fancy protocol that had just gotten out. Since then, we've had a few other cute and
fancier protocols. Maybe I would say so. We had like where, we had the warp tensor switch,
next to other things. And I'm still in switch running at EPFL, enjoying the mountains,
let's say, with a bit of cryptography on the side. Nice. How about you, Antonio? Actually,
last time you came on, you were talking about kind of the opposite of what we're doing here,
like very non-post-content snarks, right? Exactly. Yeah. Well, we talk about some elliptical
stuff. And I don't know, someone decided to kill my area of research. So there used to be
elliptical curves and that not are gone. And I saw Janice. But lately, I've actually tried to
people from from elliptical and I saw Janice. And since then, I've been mostly working on Poseidon
and now like on Starric and more original somehow, proof system. And yeah, that's life.
Like we adapt and we learn. So far on the series, we've covered the fact that Ethereum is moving to
post-content signatures, like post-content everything, essentially. We've learned that we also need
multi-signatures for consensus and that to achieve those were going to be using a snark.
I've also been told that there is a custom VM being built, lean VM. And that's actually the topic
of next episode. But what I was curious about is what snark are we using to prove this VM? Is this
girl-classic growth 16? Are we using off-the-shelf 2, funky 2? What's happening?
Yeah, so things are changing a little bit. So from the old tie-way that we're doing cryptography,
like the main focus of some of these upgrade of Ethereum is about trying to get to post-content
resistance. So a lot of the techniques that we were previously using cannot really work.
So both for like conservatism and for performance, like the choice has fallen towards
aspect signatures. And Linvian is currently like being built on top of like a custom stack of like
different components, like trying to be built for the signal aggregation stack. It would
choose a VM advertising and then were as the, it's called polynomial commitment plus underlying
the construction. Okay, and so is this a sum check based snark? Is this unit varied based with
our good old like quotients like we like to do? So we're doing, in modern, we're in the 2026
multiliner is the way to go. And Linvian is also following his multiliner approach using some checks
as much as possible. So the older communication is multiliner oriented and it is using at the same time
other like look up techniques like log up star, which I think Toma and Emil will be telling us much
more in the next episode. But multiliner is the name of the game is the recently I think a lot of
people have been realizing that this is the way to actually do us based the persistence in general.
And it has a lot of benefits. So currently like there's been a big push from industry and like
academia like to move towards multiliner persistence and some check of course. Yeah. And so is our name?
For this snark. So not really like as far as I know it's just called the super spartan plus log up
slash log of star plus were. Okay. So maybe they need to a bit of a bandit. Yeah. So let's just
call it the Linvian stack for now. Okay. So we have this new snark. It's using components that we are
somewhat familiar with or that have been around in different papers. There was still I guess some
security questions. I remember Justin Drake announcing actually on the show a one million dollar
prize for something about hash based proofs. Can you guys tell us a bit more about what that was
about? How that connects to the snark and what the hell happened with proximity apps? Yeah.
I can actually just like give a bit of history here. Right. I mean, you know, like there is always
this this this like dichotomy between security and speed and and then like hash based snark. I
mean, there is like you can actually play really safe and and goes low or being aggressive on on
on on mathematics and on conjectures and and and then being fast. And you know, the
the tendency in the last years has been like to go on where I have a crypto and blockchain. So
the tendency was speed and at the cost of potential security in the in the last years. So what
what was like happening like we want to have some bit of more confidence on the conjecture we
were using on some of this proof system deployment. And there was this this proximity app prize of
one million dollars that was announced by Justin Drake and sponsored by the Trim Foundation.
Like one million in price if someone was actually able to to break. So I'll give you a time for
example, or prove these conjectures that are associated to like to this stark fry. And maybe
Jack one wants to since Jack is one of the judges of the of Dulce. This prize maybe won't
spend some more words, but this was giving a bit of context. Yeah, absolutely. Yeah. So I'm
helping out with the with judging submissions. And together with Dumbonegan that are known we're
writing like a summary because as Nico saying a lot of things have happened in the last month,
especially like in November. A lot of people come out. There's been a lot of things going on.
But maybe let me take a step back on like what do you surprise about what are this conjecture now
they kind of like fit in in the larger like ash-based landscape. Yeah. Most ash-based
narks they rely on like very similar techniques. They're similar to like batching in a certain sense
but with some slide like differences. So for example, batching when we were talking about like in
pro system is this concept of like I have like many claims which I of which many some of them
may be valid some of them might be on valid. And I want to make sure that I can take many claims
compressing into a single one. And if any of the original claim was invalid then I would want
the final claim to also invalid. So in a certain sense I'm kind of like preserving like the validity
of the entire like system into a single claim while performing some side of compression. And these
kind of primitives are the main building block of like pro systems in general but ash-based in
particular. So in ash-based pro system this is as to do a lot with error correcting codes.
And the condition of when some error correcting codes that have been corrupted overall by a
large fraction of their code word end up like being passed into something which ends up like
being corrupted in a smaller fraction of the code word. So in a sense the corruptions are
easy to detect so then the verifier will figure them out. But if I'm kind of like reducing the
amount of corruptions in the code word then this makes the verifier order to detect things.
And the opportunity price about trying to like figure out effectively how this distance
preservation kind of claims end up like behaving in the content of an ash-based narc.
And we know that for certain levels and fractional corruptions we can prove security probably.
Right. And once the noise instead becomes too large we know that the security cannot be guaranteed.
And we know there's like an area in the middle where we don't our current mathematical techniques
do not tell us enough to really be able to prove security protocols. We don't need or know
ways to break security. Neither we know how to prove security. And traditionally people have been
conjecturing that these area were secure. And the opportunity price main objective is to try to like
really put the foot down on formalizing what these conjectures are and try to spur like research
into really figuring out what the behavior will be in these areas.
Right. So just to be clear it doesn't mean our snarks are insecure. It's just like the way we
analyze them and the way we compute parameters changes depending on what the mathematics tell us
is true or not, right? Exactly. Exactly. Okay. So maybe there's a lot of luxury that we have in
ash-based cryptography. In that we don't really have to rely on cryptographic assumptions
other than the collision resistance of an ash-function. What basically means is that the
proof system can be fully proven secure in the random oracle model. What this means is that if
we believe the random oracle we can give like concrete information to radical guarantees
on the soundness of the protocol. This is different for example from a liptic curve in lattice-based
cryptography where a better algorithm to like solve certain problems would make progress in breaking
the cryptographic system. Yeah. So we know like that we know that our analysis guarantees
security up to some point. And we do suspect that security is also guaranteed after this point,
up to some point after which actually there are there would be attacks. Right. And are like maybe
there are information to radical attacks, maybe there are concrete attacks that one could mount,
but basically yes this is yeah. Just wanted to add that actually from my perspective at least
this millennium price kind of met the objectives. It was like in a spare of a one-half month,
we got more progress on the topic, more morning one half month than in years of research,
and actually a little trivia I guess like me and you like the people that were in the room in
Cambridge seen live how this idea was one of the the first idea that actually like the first like
track on the on the conjecture actually was born and that was actually pretty exciting that I'm
talking about the scores about the retreating Cambridge that was amazing. Yeah. And we've actually
mentioned that in the first episode with Justin. So this problem of can I batch things and still
have the right distances was sort of this proximity gaps problem. There were five or six papers that
came up pretty much within like two, three weeks of each other. All talking about this problem,
showing negative positive results. What what did they do? Like what does it mean? So yeah,
there's been a lot of progress. There's been I think five or six papers. I actually have a list of
them in front of me. So the first one was this paper by Ben Diamond and Angus Gruen,
which was closely followed by one by Elizabeth Crive and Alistair Stewart. Then there was one by
Elements Asson, Dan Carmon, Ulrich Abok, Sassico Party and Schubanjus Raff. Then there was one
by Ran Goyal and Venkat Guzwami. Then there was one by Sarah, Alessandro, Z and Ignacio
from our group at EPFL. And all of them had slightly different like results, which are kind of
hard to serve it together, but let us try to do that. So okay, before this explosion of papers,
we basically knew the following. So remember before we was talking about this fraction.
So there's one value of this fraction, which is called Johnson-bound, which is a very
classical measure in coding theory. If the corruption is less than a Johnson-bound,
then we know the security holds unconditionally. So we know for the code that we care about for
LIMVM and for like Starks in general, we know that the construction is secure.
Beyond a certain other fraction, which is called the capacity, we knew the security cannot hold.
When the fraction of corruption is more than this capacity bound, then we know that there's
no security can be granted. And nothing was really known in the middle.
Okay. So this is the status quo before all this paper came out. Then there were the
the diamond and ruin and the crisis toward papers. What this paper showed roughly was that
the correlated agreement error, which is the name for this property of batching things,
actually cannot hold somewhere like, which is at a fraction of corruption, which is
lightly smaller than the capacity value. Okay, so it's inside the range where it was unknown
until then. Exactly, exactly. So this is a quantity that corresponds
roughly to the alias bound, which is another like classic information theoretically coding
quantity. This is like close to like the close to the capacity bound, but far enough that actually
this attack would the grade, well, this attack, this analysis would degrade the security of like
some of the deployments in practice by I think from correctly was the three to five bits.
So if you deploy like a system with 128 bits of security, a conjecture security,
then this would reduce it to 125 bits, for example. Yeah, opening a parenthesis here,
in general, like what surprised me a lot about, especially like the second paper by
two or the end crisis, that actually, surprising is from my perspective, the result that they
managed to achieve, they use a really old paper that is like from 1964. So you will expect this
answer will have a answer like that will use the modern technique. It's not like they really use
a really classical bounce from the 60s, and that was this amazed me at the beginning. Right.
I guess at the end of the day, these are very like fundamental problems of like counting
fractions of things, right. And when you start talking about like counting things and like looking
at polynomials, then there's a whole depth of literature like in mathematics that you can go and
take from. Yeah, and it's really really beautiful as well. Like in a sense, it was not
unexpected. Like we knew there would be like a face transition at capacity. So after capacity,
everything would be broken. So you know, one of the questions that we had that this
retribution came, but it was like, look, I cannot believe that one little bit below capacity
actually, I think will magically be safe. There should be like a somewhat smooth face transition
between when we're secure and not secure. And this particular like consideration was kind of like
what guided some of these papers in the attacks to look at this boundary region and see what
what issues were to arise. Right. This was like some of the negative results, and then there's
like some of the positive results. Okay. So negative again, we mean like within the unknown area,
some part of it is actually unusable for security analysis. Because of results, I guess you mean
on the other side, like pushing beyond what we knew. Exactly. Exactly. And on that, so there was
the paper by Eliben Sasson, Dan Carmon, Ulrich Abok, Fastico Party, and Shubangis Raf. So they
showed a lot of things. This paper is a kaleidoscopic paper. It has like both negative and
positive results. But among the positive, they showed, for example, that for like Rich Solomon
Code, which are called the we like, particularly appreciating systems. Not only the bound to
also the Johnson bound that I was saying before, but they showed an improderer. So that the bound
up to the Johnson bound is actually like the error of this batching step is lower than what
than what was previously thought. Okay. So the result is positive in the sense that we get
better errors than what we thought we had before. But it's not positive in the sense that we can
use our analysis beyond the Johnson bound. We're still limited by the Johnson bound. Well, yeah,
forgetting beyond the Johnson bound, actually, there was the first positive result. It was a
breakthrough by Goliath and Groswami. So for people who are like Nidera, Groswami is
incredibly gifted coding theorist, one of the like most celebrated that I know, like from the
Groswami to Dan Algrim, among many, many other things. I cannot understand out in their state how
many of the contributions in coding theory come from this camp. And they what they showed is that
actually for some specific families of code, and in fact for random linear codes,
mutual correlated agreements or a strengthening of this batching property that was talking about
actually all almost up to capacity for this specific kind of codes. Okay. So these codes are also
very related to Rit Solomon codes. They are either like folded Rit Solomon codes,
univariate multiplicity codes, or random linear Rit Solomon codes, or random linear codes,
even these are like the more extravagant ones. Yeah. Like especially this random linear code,
Rit Solomon code is like, I'm going to pick a Rit Solomon code as random according to some
procedure. And with that probability, we'll have good properties for us to use. So for this random
codes, I would be able to actually prove security up to very close capacity.
Okay. So did you still, concretely, these are asymptotic results. They're still missing like a
lot of things. We cannot use these random codes. Right. Like we wouldn't have an efficient snark
from these random codes. Exactly. And even the folded codes, which are explicit codes,
we've run the numbers. It doesn't really seem like they end up like giving a practical benefits
over just using the probable analysis for Rit Solomon codes. But did you like the first
analysis to my knowledge, which actually manages to get beyond Johnson bound in any capacity
for like, uh, for a created agreement. Okay. So I guess there's still a lot of open questions in
this area, right? There's still what kind of codes. We know that some codes go beyond the Johnson
bound, but not the ones we're using right now. But also the ones that go beyond so far, not like
as efficient as the ones we have. So there's still like an open little area there, right?
Right. Just if I can quickly open a parenthesis about the paper, I was mentioning before Jack
went off, I had been just on the towel. If someone is looking for a small project, even for
students at our couriers, this paper is full of Easter eggs for negative results. So there are
like a lot of failure of a correlation agreement for special primes or for it's like a lot of
little examples that in my opinion, it can be extended a lot if someone can again like send that
for special families or primes and stuff. There are like some weird corner cases and there are many
little corner cases. And my opinion like if someone is looking for a nice little project to spend a
weekend, it's not. Listen up people. Is this just like something you would do pen and paper,
or is there something you can do like numerical examples of and like I can code something up and
try to find? It's a mixture of that. It's like it really like with that bit of help of
of course like sage or magma will be better. So you guys also haven't mentioned so far,
but you've also co-authored a paper on the topic. It's not specifically about proximity
gaps as far as I understood, but like a special case that is used in strikes.
Yeah, basically like what we did, like more focus on instances of start to actually deploy it.
Now if you go back quickly on how start was using the past, that in mind that they shared the
fact that they were working over bigger primes. So the field was like, I don't know, at the
beginning starts was using the same prime of the sides of like a nifty curve. So I was really happy
with that. The primes are recognized. If you look like now the stocks that are used in production,
they are like either 301 bits or 64 bits and you know for me at the beginning was like a bit
mind blowing that actually you could work on this prime field and still be saved. But what
not is with Jaco and after we learned that we were not the first one that noticed this property,
is that the alias bound that that before was mentioned by Jaco,
basically like could be seen in a multi-faceted way. Namely like usually you work over this
small fields, but you actually work over extension field because you know you need to work over
for example if you are working with the 31 bits prime you want to work on a six extension
field, fifth extension field and so on because we want our randomness to be at least 128 bits.
Exactly. We did more than one field element. Exactly. So what the first thing to
be noticed is like that actually in this case there are multiple alias bound because basically
the domain that you're working on is over the best field. But you work over the extension field.
And basically then you will have the multiple alias bound. You have the alias bound over the best
field that is actually smaller than alias bound over the extension field. Okay. So you're still
working on the code over extension field but with the domain you sample over the best field.
And basically what happens are having this multiple alias bound while weird things can happen
and Jacomo starts to to to apply a bit of a more theoretical framework there and yeah this
interesting results came out maybe I spent a few more words but basically we've seen that
there was about 10 bits degradation of security for a project and we're using for example
stark over 31 bits. Okay. And one thing which is interesting to talk about is attack and which
is also similar to the crisis to where the paper is that we're not really attacking the property
I was talking about before and this is the property of like you know batching claims.
If you do the security analysis for like we're like protocols there's another term that shows up
in the sound analysis which is related to a very classical problem in coding theory which is
the least size of a code. So basically the question is like I have like a corrupted code word so
I have some fraction of the code which is corrupted. In how many ways can I complete this code word
by replacing the corrupted part to get like to to get a valid code word. Right so if I were to
replace the errors with real symbols how many how many valid code words would be possible essentially.
Exactly. Exactly. And depending on how many fractions there are like if the fraction is very small
they're just not much I can do to get a valid code word so maybe there's only one single
closed code word. But once my fraction increases this is where I become in what's called the
least the coding range then there's many ways I could complete this this code word.
And so what our like paper shows apart is that for a lot of these
snarks this error term which depends on the least size is inherent. And in fact like as part of
the proximity prize we're going to have two problems. One of them is about bounding this mutual
created agreement property that I was talking about before and the other one is about the you
know by now six years old problem of like least the coding of its Solomon codes beyond Johnson
Bound. So we're going to have these two parts which are important and yeah and then attack as
Antonio was saying we were using the the fact that the least size of the extension field code can
be related to the least size of the base code. And the alias bound is like more effective
in the grading security when the when the alphabet size in this case the field is small.
And so for like practically the closed system designs I'm giving you like a better attack
than before. Okay so paper describes a concrete attack? No concrete. So this is that information
theoretical like they say like that information theoretical way in which you can show that this
protocol is less bit of security like it seems still plausible that like computationally it is
unfeasible to find like satisfying an attack to again the protocol. And I think it's a very
interesting problem to look at directly but this definitely like degrades the security of the protocol
from like being information theoretical to actually being a computation. Just to connect to the
the first attack then to check on the scope at the beginning for example they grew in a attack.
Yeah. You know there you had like three bits of security and the the gradation here you have like
additional 10 basically right. So then also that one I was not constructing they would say okay
there's this degradation but there was not there's not there's not there was not a constraint
destructive attacks okay you follow the steps. I see. I have a very stupid question which is
if we had our system we thought it was 128 bits now you have a three-bit attack a 10-bit attack
we're still way above a hundred bits of security why should I care. So there is this
psychologically threshold right I mean we in cryptography we are someone last time came
from the this came from but basically like we have this 128 psychologically threshold that we
don't want to go under because if you start to say go under say okay why 128 you know 120 180
when do you stop so we really chose to have like 120 bits I think the cryptography word I
agreed that 120 bits is the number we want to stay on right I agree that if we are 125 is not
the end of the world there's not really real attack but if we really want to stay on this we want to
call it psychological threshold let's call it like this and the after all the series of attack
is clear that we want to stay on the 120 bits probable security so I think that at the moment
people are gave up on the conjectural security because you know of all these things would happen
it right another like metapoint when you're deploying like a very large system like for example
Ethereum and you might need to choose a set of parameters that you hope you will not have to change
for like the next like you know maybe say 100 years then you you like things do not
we probably cannot give you surprises in these aspects security in the provable regime is like a very
like attractive kind of like landscaping that you I can put my foot down and say like look we
understand these things well enough that there will not be an attack if I use it in the
provable regime so the these small attacks they kind of like shaken our confidence because before
we also thought that in the conjecture regimes we understood things well enough but this turns out
not to to be the case okay so I guess that's the important part right is we thought we knew things
and then with anyone a lot of like security degradation was shown and so we can't be confident
that we know things anymore is that mostly what it is right indeed let me take the time to shield
a tool that has been developed by the cryptography by the yeah and basically like we have this tool
that we try to to compute the the soundness of the GKVM projects that it's called this tool is called
sound hulk and basically at the beginning we were supporting a conjectural cases so we were
actually the tool per se you could actually plug some configuration and and having and I mean like
the bits of security that are in the in the conjectural case or at capacity or for john's bound
or for unique recording but now like after all these tears of attack we actually don't support
anymore the the conjectural case we stay either in the unique bound or on the john's and bound
so and you know we don't we decide to give up completely on the conjectures at least like
clockwise so guys I think we're reaching kind of time I didn't want to I didn't want to end the
episode with a question about what comes next you know what questions are left open for the security
of post-comptum snarks and even maybe for the proximity price so this is a wonderful area I could
talk about this for a long time because I think there's a lot of super exciting questions regarding
this security so I think we can like kind of split them into very different buckets so there's
kind of like the overall security of edge-based snarks and snark in general yeah so this encompasses
this question like for example you know fiasco mere security like if I compile something in the
random article what kind of grantees can I get about the overall snark so there's been a lot of
progress on this there's the security of recursion this is a incredibly interesting and important
topic there's this is like kind of security of like the compilation process of like taking like a
an information theoretical tool like an IOP and then compiling it with medical trees
then there's the coding thready questions which I think there's also a lot to do like we
in the proximity price write up we're calling these things the grand challenge so there's the grand
challenges like really understanding for reach Solomon codes where does the face transition
happen where do we pass from somewhere where we can show security to someone where we have no
showing security so this is like the grand challenge but there's a lot of other interesting things
so for example there's you know which Solomon codes have been great but there's another beautiful
families of code like stillman codes or repeat accumulator accumulate codes what kind of properties
do they have for mutual credit agreement and at least the coding or what is that do they have even
this is also not particularly well understood by the way those are the codes using breakdown and
plays right exactly I have questions about the relation between primities like at least the coding
and curated agreement does one imply the other or not is there like we know from for example
crates in Stuart and like the benzazone carbon scoparty Schu-Raff and abok paper that there are
some relations between the two but it is so for example like proving mca seems to be as are for
reach Solomon codes as proving is the coding which is a very well studied question up to capacity of
course and but we don't really have like nice clean understanding of what the relation within the
two so these are also interesting questions there's a question of like more pertinent generators
this is something like Alexi Ignacio and Sara have been working on there is this very nice paper
by Cresciune and Cazanine on trying to exploit some properties of like twadic fields to try to get
like better attacks on mca there's a lot of very interesting like work which I think can be done
like in a sense there was a psychological barrier as Antonio was saying in another context that we
couldn't we didn't have the tools to do anything beyond capacity beyond johnson bounds sorry
and now with this flow of paper I showed that actually there was like a very like interesting
and like deep area of research beyond johnson bound and close to capacity where I think people
can do progress so and these are very beautiful they're very cute questions about polynomials
and I find it's extremely fascinating that this is just like really just polynomials so you
have plenty of work for the years to come and from a set as well as well like this is the field
of a hash based system probably you're already talking to this episode but I mean we have a new
new hash function we are using everywhere this like Poseidon 2 and this is a really central piece
of the security of all the system and yeah we just put another million on actually the
the security of Poseidon that I think is going to be exciting well guys thank you very much
for coming all the show it was a pleasure hosting you both thanks a lot Nico thank you for
adding us always pleasure I want to say thank you to the podcast team Anna Rachel Henrik Tanya
and Hector to those of you catching us in video thanks for watching enter our listeners thanks for
listening
