
We begin the episode with the absolutely ingenious and surprising way in which Kepler discovered the laws of planetary motion.
People sometimes say that AI will make especially fast progress at scientific discovery because of tight verification loops.
But the story of how we discovered the shape of our solar system shows how the verification loop for correct ideas can be decades (or even millennia) long.
During this time, what we know today as the better theory can actually make worse predictions.
And the reasons it survives this epistemic hell is some mixture of judgment and heuristics that we don’t even understand well enough to actually articulate, much less codify into an RL loop. Hope you enjoy!
Watch on YouTube; read the transcript.
Sponsors
- Jane Street loves challenging my audience with different creative puzzles. One of my listeners, Shawn, solved Jane Street’s ResNet challenge and posted a great walk-through on X. If you want to try one of these puzzles yourself, there’s one live now at janestreet.com/dwarkesh.
- Labelbox can get you rubric-based evals, no matter your domain. These rubrics allow you to give your model feedback on all the dimensions you care about, so you can train how it thinks, not just what it thinks. Whatever you’re focused on—math, physics, finance, psychology or something else—Labelbox can help. Learn more at labelbox.com/dwarkesh.
- Mercury just released a new feature called Insights. Insights summarizes your money in and out, showing you your biggest transactions and calling out anything worth paying attention to. It’s a super low-friction way to stay on top of your business. Learn more at mercury.com/insights.
Timestamps
(00:00:00) – Kepler was a high temperature LLM
(00:11:44) – How would we know if there’s a new unifying concept within heaps of AI slop?
(00:26:10) – The deductive overhang
(00:30:31) – Selection bias in reported AI discoveries
(00:46:43) – AI makes papers richer and broader, but not deeper
(00:53:00) – If AI solves a problem, can humans get understanding out of it?
(00:59:20) – We need a semi-formal language for the way that scientists actually talk to each other
(01:09:48) – How Terry uses his time
(01:17:05) – Human-AI hybrids will dominate math for a lot longer
Okay, today I'm chatting with Terence Tao, who needs no introduction.
Terence, I want to begin by having you retell the story of how Kepler discovered the laws of planetary motion,
because I think this will be a great jumping-off point to talk about AI for math.
Okay, yeah, so I've always had an amateur interest in astronomy, and so I've loved stories of how the earliest astronomers worked out the nature of the universe.
So Kepler was building on the work of Copernicus, who was himself building on the work of Aristarchus.
So Copernicus very famously proposed the heliocentric model: that instead of the planets and the Sun going around
the Earth, the Sun was at the center of the solar system, and the other planets were going around the Sun.
And Copernicus proposed that the orbits of the planets were perfect circles, and his theory kind of fit the observations that the Greeks and the Arabs and the Indians
had worked out over all the centuries.
I think Kepler got interested, like he learned about these theories in his studies, and he made this observation that the ratios of the size of the orbits that Copernicus predicted seemed to have some geometric meaning.
I think he started proposing that, you know, if you take, say, the orbit of the Earth, and you enclose it in, I think, a cube,
then the sphere that encloses that cube almost perfectly matches the orbit of Mars, and so forth.
And there were six planets known at the time, five gaps between them, and there were five Platonic solids:
the cube, the tetrahedron, the icosahedron, the octahedron, and the dodecahedron.
And so he had this theory, which he thought was absolutely beautiful, that he could inscribe these Platonic solids between the spheres of the planets,
and it seemed to fit, and it seemed to him, like, you know, God's design of the planets was matching this mathematical perfection of the Platonic solids.
So he needed data to confirm this theory.
And at the time, there was only one really high-quality data set in existence.
So Tycho Brahe, the Danish astronomer, a very wealthy, eccentric astronomer, had managed to convince the Danish government to fund this extremely expensive observatory.
It was, in fact, an entire island, where he had taken decades of observations of all the planets, Mars, Jupiter, every night,
or at least every night for which the weather was clear.
With the naked eye, actually; he was the last of the naked-eye astronomers.
And so he had all this data, which Kepler could use to confirm his theory.
And so Kepler started working with Tycho Brahe.
Tycho was very jealous of the data.
He only gave out a little bit of it at a time, and I think Kepler eventually just stole the data:
he copied it, and had to have a fight with Tycho's descendants.
But he did work out... he'd taken the data, and then he worked out, to kind of his disappointment, that his beautiful theory didn't quite work.
The data was off from his Platonic solid theory by about 10% or something.
And he tried all kinds of fudges, moving the circles around and things, and it didn't quite work.
But he worked on this problem for years and years, and eventually he figured out how to use the data to work out the actual orbits
of the planets.
And that took an incredibly clever, genius amount of data analysis, exactly.
And then he eventually worked out that the orbits were ellipses, not circles, which was shocking to him.
And then he worked out the first two laws of planetary motion: orbits are ellipses, and they also sweep out equal areas in equal times.
And then 10 years later, after collecting a lot of data (the furthest planets, like Saturn and Jupiter, were the hardest for him to work out),
he finally worked out this third law also: that the time it takes for a planet to go around its orbit was proportional to some power of its distance to the Sun.
And these are the three famous Kepler laws of planetary motion, and he had no explanation for them.
It was just all driven by experiment.
And it took Newton a century later to give a theory that explained all three laws at once.
The take I want to try on you is that Kepler was a high-temperature LLM, where Newton comes up with this explanation of why the three laws of planetary motion must be true.
And of course, the way that Kepler discovers the laws of planetary motion, or figures out the relative orbits of the different planets is, as you say, a work of genius.
But then, you know, for most of his career, he's just trying random relationships.
And in fact, the book in which he writes down the third law of planetary motion, sort of on the side, is the Harmony of the World, which is this book about, you know, how all these different planets have these different harmonies, and the reason there's so much famine and misery on Earth is because the Earth's note is mi-fa-mi, and so on with all this random astrology.
But in there is the cube-square law, which tells you what relationship the period has to a planet's distance from the Sun, which, as you were detailing, if you add that to Newton's F = ma, and then the equation for centripetal acceleration, you get the inverse square law.
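Spelled out, the chain of reasoning gestured at here goes roughly as follows (a sketch assuming circular orbits for simplicity):

```latex
% From Kepler's third law to the inverse-square law (circular-orbit sketch)
\begin{align*}
  T^2 &\propto r^3 &&\text{(Kepler's third law)}\\
  a &= \frac{v^2}{r} = \frac{(2\pi r / T)^2}{r} = \frac{4\pi^2 r}{T^2}
    &&\text{(centripetal acceleration, } v = 2\pi r / T\text{)}\\
  \Rightarrow\quad a &\propto \frac{r}{r^3} = \frac{1}{r^2},
    \qquad F = ma \propto \frac{m}{r^2}.
\end{align*}
```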
And so Newton works that out. But the reason I think this is an interesting story is, I feel like LLMs are going to do this kind of thing: 20 years of, let's try random relationships,
some of which make no sense, as long as there's a verifiable data bank, like Brahe's data set. Okay, I'm going to try out random things about musical notes, I'm going to try out random things about Platonic solids; all these different geometries share this bias that there's some important thing about the geometry of these orbits. And then one thing works, and as long as you can verify it, these empirical regularities can then drive actual deep scientific progress.
Traditionally, when we talk about the history of science, idea generation has always been kind of the prestige part of science. I mean, solving a scientific problem has many steps: you have to identify a problem, and then you have to identify a good problem, a fruitful problem, to work on.
And then you need to collect data, you need to figure out a strategy to analyze the data, and at this point you need to propose a good hypothesis, and then you need to validate it.
And then you need to write things up and explain it. There are a dozen different components, but yeah, the ones we celebrate are these eureka genius moments of idea generation.
And yeah, so Kepler certainly had to, as I say, cycle through many ideas, several of which didn't work, and I bet many that he didn't even publish at all, because, yeah, they just didn't fit, and that's an important part of the process.
Trying all kinds of random things and seeing if they worked. But as you say, they have to be matched by an equal amount of verification; otherwise it's slow.
I mean, we celebrate Kepler, but we should also celebrate Brahe for his assiduous data collection, which was 10 times more precise than any previous observation, and it was that extra decimal point of accuracy that was actually essential for Kepler to get his results.
And, you know, he was using Euclidean geometry and the most advanced mathematics he could use at the time to match his models with the data.
So all aspects had to be in play, you know: the data and the theory and the hypothesis generation.
I'm not sure nowadays that hypothesis generation is the bottleneck anymore.
Science has changed in the centuries since. So classically, the two big paradigms for science were theory and experiment.
Then in the 20th century, numerical simulation came along, and so you can also do computer simulations to test theories.
But then finally, in the late 20th century, we had big data. Now we're in the era of data analysis.
And so a lot of new progress is actually driven now by analyzing massive data sets: first collecting large data sets, and then drawing the patterns from them to do science. Which is a little bit different from how science used to work, where you make a few observations, or you just have one
out-of-the-blue idea, and then you collect data to test your idea; that's the classic scientific method.
Now it's almost reversed. You collect big data first, and then you try to get hypotheses from it.
I mean, Kepler was maybe one of the first early data scientists, but even he didn't start with Tycho's data set and then analyze it.
He had some preconceived theories first. But it seems that this is less and less the way we make progress, just because the data is so much more massive, so much more useful.
Oh, interesting. I actually feel like the 20th-century, data-first science that you're describing actually describes Kepler pretty well, in that he did have these ideas first.
1595 and '96 is when he comes up with first the polygon and then the Platonic solid theory, but they were wrong. And then a few years later, he gets Brahe's data.
And it's only after 20 years of just trying random things that he gets this empirical regularity. So it actually feels like Brahe's data is analogous to some massive data bank of simulations.
Now that you've got the data, you can keep trying random things. But without it, Kepler would be out there just writing books about harmonics and the Platonic solids, and there would be nothing to actually verify against.
Yeah, yeah, yeah. So the data was extremely important. But the distinction I was trying to make was that traditionally you make a hypothesis and then you test it against data.
But now, with machine learning and data analysis and statistics and so forth, you can start with data and, through statistics, work out laws that were not apparent before.
And Kepler's third law is a little bit like this, except that for the third law, instead of the thousands of data points that Brahe had, Kepler had like six data points, one for every planet.
You knew the length of the orbit and the distance to the Sun, and there were like five or six data points, and he did what we would now call regression.
You know, he could fit a curve to the six data points, and he got a three-halves power law, which was amazing. But actually he was quite lucky that these six data points gave him the right conclusion.
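Kepler's curve-fitting can be replayed as a modern log-log regression. Here is a minimal sketch, using modern values for the six planets he knew (not Tycho's actual numbers): if the period T scales like (distance)^k, the slope of a log-log fit recovers k, which comes out close to 3/2.

```python
# Kepler's third law as a modern regression: fit log(period) against
# log(distance) for the six planets known to Kepler. If T ~ a**k,
# the slope of the log-log fit recovers k (close to 3/2).
import math

# (semi-major axis in AU, orbital period in years), modern values
planets = {
    "Mercury": (0.387, 0.241),
    "Venus":   (0.723, 0.615),
    "Earth":   (1.000, 1.000),
    "Mars":    (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn":  (9.537, 29.457),
}

xs = [math.log(a) for a, _ in planets.values()]
ys = [math.log(T) for _, T in planets.values()]

# ordinary least-squares slope for y = k*x + b
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
k = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
    sum((x - mx) ** 2 for x in xs)

print(f"fitted exponent k = {k:.3f}")  # close to 1.5
```

With only six points the fit is striking but statistically fragile, which is exactly the point made next about Bode.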
You know, that's not enough data to be really reliable. There was a later astronomer, Johann Bode, who took the same data, actually, the distances to the planets, and, inspired by Kepler,
he had a prediction that the distances to the planets form basically a shifted geometric progression; he also fit a curve.
Except there was one point missing: there's a big gap between Mars and Jupiter. His law predicted that there's a missing planet there. So it was kind of a crank theory, except when Uranus was discovered by Herschel, the distance to Uranus fit exactly this pattern.
And then Ceres was discovered, this asteroid in, I think, the asteroid belt, and it also fit the pattern. So people got really excited that Bode had discovered this amazing new law of nature.
But then Neptune was discovered, and it was completely, like, way off. And basically it was just a numerical fluke, you know; there were six data points.
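The rule in question (usually called the Titius-Bode rule) is simple enough to check directly. A minimal sketch, using modern distance values, showing why Ceres and Uranus looked like confirmations and why Neptune broke it:

```python
# Titius-Bode rule: predicted distance = 0.4 + 0.3 * 2**n AU,
# with the innermost planet assigned the bare 0.4 term.
# It roughly fits Mercury through Uranus (and Ceres in the Mars-Jupiter
# gap), then fails badly for Neptune.
def bode(n: int) -> float:
    """Predicted distance in AU; n = -1 means the leading 0.4 term."""
    return 0.4 if n == -1 else 0.4 + 0.3 * 2 ** n

actual = {  # body: (rule index n, actual semi-major axis in AU)
    "Mercury": (-1, 0.39), "Venus": (0, 0.72), "Earth": (1, 1.00),
    "Mars": (2, 1.52), "Ceres": (3, 2.77), "Jupiter": (4, 5.20),
    "Saturn": (5, 9.54), "Uranus": (6, 19.19), "Neptune": (7, 30.07),
}

for name, (n, a) in actual.items():
    pred = bode(n)
    print(f"{name:8s} predicted {pred:6.2f} AU, actual {a:6.2f} AU "
          f"({100 * abs(pred - a) / a:.0f}% off)")
```

Every row through Uranus lands within a few percent; Neptune's prediction of 38.8 AU misses the true 30.1 AU by nearly 30%.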
Yeah, so maybe one reason why Kepler didn't highlight his third law as much as the first two laws is that maybe instinctively, even though he didn't have modern statistics, he kind of knew that with six data points, he had to be somewhat tentative with his conclusions.
But maybe to ask the question about the analogy more explicitly.
Does this analogy make sense? In the future, we'll have smarter and smarter AIs, and we'll have millions of them, and they can go out and hunt for all these empirical regularities.
It sounds like you don't think the bottleneck in science is finding more of these things: for each given field, their equivalent of the third law of planetary motion, so that later on somebody can say, oh, we need a way to explain this.
Let's work out the math. Here is the inverse square law of gravity.
Right. So I think AI has basically driven the cost of idea generation down to almost zero.
In a very similar way to how the internet drove the cost of communication down to almost zero, which is an amazing thing, but it doesn't create abundance by itself.
So now the bottleneck is different. So we're now in a situation where suddenly people can generate thousands of theories for a given scientific problem.
And now we have to verify them, evaluate them. And this is something which we have to change our structures of science to actually sort this out.
So in fact, traditionally, we built walls. In the past, before we had AI slop, we had amateur scientists creating their own theories of the universe, many of which were basically of very little value.
And so we built these peer-review publication systems and things to kind of filter out the noise and try to isolate the high-signal ideas to test.
But now we can generate these possible explanations at massive scale, and some of them are good and a lot are terrible.
I mean, human reviewers are already being overwhelmed, actually. Many, many journals are reporting that AI-generated submissions are just flooding their inboxes.
So it's great that we can generate all kinds of things now with AI, but it means that the rest of the aspects of science have to catch up:
verification, validation, and assessing which ideas actually move the science forward and which ones are dead ends or red herrings.
And that's not something we know how to do at scale.
You know, for each individual paper, we can discuss it, have a debate among scientists, and get to consensus in a few years.
But when we're generating, you know, a thousand of these every day, yeah, this doesn't work.
So I think there is this interesting question of, if you have billions of AI scientists,
not only how do you gauge which ones are making real progress, but, I mean, this is actually a question that human sciences have had to face, and we've solved it somehow.
I actually am not sure how we solved it. But take any given field, let's say in the 1940s: if you're at Bell Labs, or you're just generally working on these new technologies coming out of pulse-code modulation. Basically, how do you transfer signals, how do you digitize signals, how do you transmit them over analog wires?
There are all these papers about the engineering constraints and the details, and then there's one which comes up with the idea of the bit, which has implications across many different fields.
And you need some system which can then look at that and say, okay, we need to apply this to probability, we need to apply this to computer science, et cetera.
In the future, the AIs are coming up with the next version of this kind of unifying concept.
How would you identify it among millions of papers, which might actually constitute progress, but which have much less general unifying ideas?
A lot of it is the test of time. Many great ideas didn't actually get a great reception at the time that they were first proposed.
It was only after some other scientists realized that they could take them further and apply them to their own work. Deep learning itself was actually a niche area of AI for a long time.
The idea of getting intelligence entirely through training on data, and not through first-principles reasoning, was very controversial, and it just took a long time before it actually started bearing fruit.
You know, you mentioned the bit. I mean, there were other proposals for computer architectures than the zero-one that is universal today.
I think there were trits, you know, zero-one-two, three-valued logic, and in an alternate universe, maybe a different paradigm would have shown up.
People have argued that, you know, the transformer, for example, is the foundation of all modern large language models.
And it was the first deep learning architecture that really was sophisticated enough to capture language, but it didn't have to be that way.
There could have been some other architecture that was the first to do it. And once that was adopted, it would become the standard.
So I think one reason why it's hard to assess whether a given idea is going to be fruitful is that it depends on the future.
It depends on the future, and it depends also on the culture and society, like which ones get adopted and which ones don't. You know, the base-10 numeral system in mathematics is extremely useful, much better than the Roman numeral system, for instance.
But again, there's nothing special about 10. It's a system that's useful for us because everyone else uses it.
And we've standardized it and we've built our calculation and number-representation systems around it.
And so we're stuck with it now. Actually, you know, some people occasionally push for other systems than decimal, but there's too much inertia.
So you can't look at any given scientific achievement, purely in isolation and give it an objective grade.
Without being aware of the context, both in the past and the future.
And so it may never be something that you can just reinforcement learn the same way that you can for much sort of more localized problems.
Yeah.
It seems often in the history of science, when a new theory comes up that in retrospect we realize is correct,
it seems to have implications that either make no sense because they're wrong, and we realize later on that they really are wrong,
or they're correct but seem wildly implausible at the time.
So as you talked about, Aristarchus had heliocentrism in the third century BC.
And then the ancient Athenians were like, this can't be, because if the Earth is going around the Sun,
we should see the relative positions of the stars change as we're going around the Sun.
And the only way that wouldn't be the case is if they're so far away that you don't notice any parallax, which is actually the correct implication.
But there's times when actually the implication isn't correct and we just need to graduate to a better level of understanding.
So Leibniz would, you know, chide Newton and disagree with his theory of gravity on the basis that it implied action at a distance,
and we don't know the mechanism. And Newton himself was sort of stunned that inertial mass and gravitational mass were the same quantity.
So all these things were only resolved by Einstein.
Yes, yes.
But it was still progress.
And so the question for a system of pure verification would be: even if you can falsify a theory, how would you notice that it still constitutes progress relative to the thing before?
Often actually the ultimately correct theory initially is worse in many ways.
So Copernicus' theory of the planets was less accurate than Ptolemy's theory.
Geocentrism had been developed for, you know, a millennium at that point.
And it had many, many tweaks, epicycles and complicated ad hoc fixes, to make it more and more accurate.
And Copernicus' theory was a lot simpler, but not as accurate.
It was only Kepler's work that made it more accurate than Ptolemy's theory.
I mean science is always a work in progress.
So when you only have part of the solution, it can look worse than a theory which is incorrect but has somehow been completed to a point where it kind of answers all the questions.
As you say, you know, Newton's theory had big mysteries, the coincidence of the two masses and action at a distance, which were only resolved with a very conceptually different approach centuries afterwards.
Often progress has to be made not by adding more theories but by deleting some assumptions that you have in your mind.
One reason why geocentrism held on for so long is we had this idea that objects naturally want to stay at rest.
This is the Aristotelian notion of physics.
And so if the Earth were moving, how come we don't all fall over?
You know, once you have Newton's laws of motion, an object in motion remains in motion and so forth, then it makes sense.
But it's a very big conceptual leap to realize that the Earth is in motion.
It doesn't feel like it's in motion.
And like the biggest advances, you know, Darwin's theory of evolution is the idea that species are not static.
But, you know, it's not obvious, because you don't see evolution in your lifetime.
Well, now we actually can, but, you know, it seems permanent and static.
You know, right now we're going through a version of the Copernican revolution, where we thought that human intelligence was the center of the universe.
And now we're actually seeing that there are very different types of intelligence out there, with very different strengths and weaknesses.
And so our assessment of which tasks require intelligence and which ones don't has to be reordered quite a bit.
And so, you know, in trying to fit AI into our theories of scientific progress, and what is hard and what is easy,
we're struggling quite a lot. We have to ask questions that we've never asked before.
Or maybe the philosophers had, but now we all have to deal with them.
This actually brings up a topic I've been very curious about.
So you mentioned Darwin's revolution.
There's this book, The Clockwork Universe by Edward Dolnick, which covers a lot of this era of history we're talking about.
And he has this interesting observation in there that the origin of species is published in 1859.
The Principia Mathematica is published in 1687.
So the origin of species comes out basically two centuries after the Principia.
And conceptually it seems like Darwin's theory is simpler.
There's a biologist contemporaneous to Darwin, Thomas Huxley, who reads the Origin of Species and says,
how stupid not to have thought of that.
And nobody ever says that about the Principia.
Nobody's chiding themselves for not having intuited gravity.
And so there's a question of, well, why did it take longer?
It seems like a big part of the reason is that the evidence for natural selection is cumulative and retrospective,
whereas Newton can just say: here are my equations.
Let me see the Moon's orbital period and its distance,
and if it lines up, then we've made progress.
And Lucretius actually had this idea, that species adapted to their environment, in the first century BC.
But nobody really talks about it until Darwin, because Lucretius couldn't run some experiment
that forced people to pay attention.
And so I wonder if we'll, in retrospect, end up seeing much more progress in domains
which have this kind of tight data loop, where you can verify theories quite easily,
even though they're conceptually much more difficult.
I think one aspect of science is it's not just creating a new theory and validating it,
but communicating it to others.
So Darwin was actually an amazing science communicator.
He wrote in English, in natural language, and people liked it.
You know, you have to sort of get out of your technical mindset.
He spoke in plain English, you know, didn't use equations.
And he synthesized a lot of, you know, disparate facts.
Little pieces of evolution had been worked out in the past,
but he had this very compelling vision.
And again, he was still missing things: like, he didn't know the mechanism for heredity.
He didn't have DNA.
But his writing style was persuasive, and that helped a lot.
Newton wrote in Latin, he invented, you know, entire new areas of mathematics just to explain what he was doing.
He was also from an era where scientists were much more secretive and competitive.
So, you know, academia is still competitive, but it was even worse back in Newton's day.
So he held back some of his best insights because he didn't want his rivals to get any advantage.
He was also, actually, a somewhat unpleasant person, from what I gather.
So it was actually only a couple of decades after Newton, when other scientists explained his work in much simpler terms,
that it became widespread.
So, yeah, the art of exposition and making a case and creating a narrative is also a very important part of science.
And if you have the data, it helps, but people need to be convinced; otherwise they will not push it further,
or they won't make the initial investment to learn your theory and really explore it.
And that's another thing which is really hard to reinforcement-learn on.
How can you score how persuasive you are?
Well, that's what entire marketing departments are trying to do.
So maybe it's good that AIs are not yet optimized to be persuasive.
So, yeah, there's a social aspect to science.
Even though we pride ourselves on having an objective side to it, where there's data and there's experiment and validation,
we still have to tell stories and convince our fellow scientists.
And that's a soft, squishy thing.
It's a combination of data and painting a narrative and papering over the gaps.
So even Darwin said there were pieces of the theory he could not explain.
But he could still make a case that in the future people would find transitional forms, that they would find the mechanism of inheritance, and they did.
Yeah, I don't know how you can quantify that in such a precise way that you can start to reinforce something.
Maybe that will forever be the human side of science.
A takeaway I had from reading and watching your stuff on the cosmic distance ladder...
By the way, I highly, highly, highly recommend people watch your series with 3Blue1Brown on the cosmic distance ladder.
But one takeaway was that the deductive overhang in many fields could be so much bigger than people realize, where if you just had the right insight about how to study a problem,
you might be surprised at how much more you could learn about the world.
And I wonder if you think that's a product of astronomy at the particular times in history that you're studying, or is it just that, based on the data that is incident on the Earth right now, we could actually divine a lot more than we happen to know?
Right. So astronomy was one of the first sciences to really embrace data analysis and squeezing every last possible drop of information out of the data it had, because data was the bottleneck.
I mean, it still is the bottleneck; it's really hard to collect astronomical data.
Astronomers are the best, or almost world class, at extracting, almost like Sherlock Holmes,
all kinds of conclusions from the little traces of data.
I hear that at a lot of quant hedge funds, their preferred hire is an astronomy PhD.
They're also very interested, for other reasons, in extracting signals from various random bits of data.
Okay, speaking of clever ideas, one of my listeners, Sean, solved the puzzle that Jane Street made for my audience and posted a great walkthrough on X.
For context, Jane Street trained a ResNet, shuffled all 96 layers, and then challenged people to put them back in the right order using only the model's outputs and training data.
You can't brute force this. There's more possible orderings than atoms in the universe.
So Sean broke the problem into two parts: first, pair the layers into 48 blocks, and second, put those blocks in the right order.
For pairing, Sean realized that in a well-trained ResNet, the product of the two weight matrices in a residual block should have a distinctive negative diagonal pattern.
And this arises as a way for the model to keep the residual stream from growing out of control.
From this insight, he was able to recover the right pairings.
For ordering, Sean noticed that the model seemed to improve if he sorted the blocks by the size of the residual contributions.
Starting with that rough approximation, he combined a clever ranking heuristic with local swaps to recover the exact right order.
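The pairing idea can be illustrated with a toy sketch. This is hypothetical and is not the actual puzzle or Sean's code: tiny synthetic orthogonal matrices stand in for trained ResNet weights, constructed so that each true block product W2 @ W1 has a negative diagonal (the "damping" pattern described above). The true partner then minimizes the mean diagonal of the product.

```python
# Toy sketch of the pairing heuristic (hypothetical, synthetic weights):
# if W2 @ W1 has a distinctly negative diagonal for true residual-block
# pairs, the true partner minimizes mean(diag(W2 @ W1)).
import math
import random

random.seed(0)

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def rotation(theta):
    # 2x2 rotation matrix: orthogonal, so its inverse is its transpose
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

n_blocks = 5
firsts = [rotation(random.uniform(0, 2 * math.pi)) for _ in range(n_blocks)]
# Build each second-half matrix so that W2 @ W1 = -0.5 * I exactly
seconds = [[[-0.5 * x for x in row] for row in transpose(W1)]
           for W1 in firsts]

perm = list(range(n_blocks))
random.shuffle(perm)                      # shuffle the second halves
shuffled = [seconds[j] for j in perm]

def mean_diag(M):
    return sum(M[i][i] for i in range(len(M))) / len(M)

# Pair each first half with the shuffled second half whose product has
# the most negative mean diagonal.
recovered = [min(range(n_blocks),
                 key=lambda k: mean_diag(matmul(shuffled[k], W1)))
             for W1 in firsts]

print(recovered)  # matches the inverse of the shuffle
```

For a wrong pairing here, the product is a scaled rotation whose mean diagonal is strictly greater than the true pair's -0.5, so the greedy match recovers the shuffle exactly.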
His full walkthrough is linked in the description. Don't worry if you didn't get to this puzzle in time though.
There's still one up about factored LLMs that even Jane Street doesn't know how to solve.
You can find it at janestreet.com/dwarkesh. All right, back to Terence.
We do under-explore how to extract extra information from various signals.
To pick one random study: I remember reading once about people who were trying to measure how often scientists actually read the papers they cite.
So how do you measure this? You could try to survey scientists, but they had a clever trick.
Many citations have little typos: a page number is wrong or an attribution is wrong.
And they measured how often a typo got copied from one reference to the next, and from that they could infer whether an author was actually just copying and pasting a reference without checking it.
And so they were able to infer some measure of how much attention people were paying.
So there are also clever tricks to extract information.
These questions you posed earlier, of how we can assess whether a scientific development is fruitful or interesting or represents progress:
you know, maybe there are really useful metrics, or footprints of this phenomenon, in data.
So we can examine citations, and how often something is mentioned at a conference, and so on.
And maybe there's a lot of sociology-of-science research to be done there that could actually detect these things.
Yeah, but we do forget the most problems on the case that you.
Okay, so I think this brings us nicely to the progress that, from the outside, it seems like AI for math is making.
I think you had a post recently where you pointed out that over the last few months, AI programs have solved 50 of the 1,100-odd Erdős problems.
But then, I don't know if it's still correct, but as of a month ago you said that there had been a pause because the low-hanging fruit had been picked.
First of all, I'm curious whether that is actually still the case: that we have picked the low-hanging fruit and we're now at a plateau.
It does seem so. I mean, there's still activity on the others.
So 50-odd problems have been solved with AI systems, which is great, but there are like 600 to go.
And people are still chipping away at one or two of these right now.
But we're seeing a lot fewer pure AI solutions now, where the AI just one-shots the problem.
There was a month where that happened, and that has stopped.
Not for lack of trying. I know of three separate attempts to get a frontier-model AI to just attack every single one of the problems.
They picked up some minor observations, or maybe found that some problems were already solved in the literature, but there hasn't been any further purely AI-powered solution yet.
People are using AI a lot, though. Someone might use AI to generate a possible proof strategy,
and then another person will use a separate AI tool to critique it,
or rewrite it, or generate some numerical data for it, or do a literature survey.
And some problems have been solved by an ongoing conversation between lots of humans and lots of AI tools.
But it does seem like that was a one-off thing.
So maybe one analogy for these problems: imagine you're in some mountain range, full of all kinds of cliffs and walls.
Maybe there's a little wall which is three feet high, and one that's six feet high, and one that's 15 feet high, and then there are some mile-high cliffs.
And you're trying to climb as many of these cliffs as possible.
But it's in the dark. We don't know which ones are tall and which ones are short.
So we try to light some candles and make some maps, and slowly we figure out that some of them are climbable.
For some of them, we can identify some partial handhold on the wall that you can reach first.
And these AI tools are kind of like jumping machines that can jump two meters in the air, higher than any human.
Sometimes they jump in the wrong direction, and sometimes they crash, but sometimes they can reach
the tops of the lowest walls that we couldn't reach before.
So we basically set them loose in this mountain range, hopping around, and there was this exciting period where they could find all the low ones and reach them.
But then there's been a pause. Maybe the next time there's a big advance in the models, they'll try again and a few more will be breached.
But it's a different style of doing mathematics. Normally we hill-climb: we make little markers and try to identify partial results, whereas these tools either succeed or they fail.
They've been really bad at creating partial progress, or at identifying intermediate stages that you should focus on first.
Again, going back to our previous discussion, we don't have a way of evaluating partial progress.
Yeah.
We can only evaluate one-shot success or failure at solving a problem.
So there are two different ways to think through what you've just said; one of them is more bearish on the progress and one is more bullish.
The bearish one is: they're only getting to a certain height of wall, which is not as high as humans are reaching.
The second is that they have this powerful property that once they reach a certain waterline, they can fell every single problem that sits at that waterline. We simply can't do that with humans: we can't make a million copies of you, give each of them a million dollars of inference compute, and have you do a hundred years of subjective-time research on a hundred different problems at the same time, or a million different problems at the same time.
But once AIs reach Terence Tao level, they could do that.
And once they reach intermediate levels, they can do the intermediate version of that.
So the same reason we should be bearish now is the reason we should be especially bullish, not even when they achieve superhuman intelligence, but just when they achieve human-level intelligence, because their human-level intelligence is qualitatively wider and more powerful than our human-level intelligence.
I agree. Yeah. So they excel at breadth, and humans, at least human experts, excel at depth. So I think they're very complementary.
But our current way of doing math and science focuses on depth, because that's where the human expertise is; humans can't do breadth.
So we have to redesign the way we do science to take full advantage of this breadth capability that we now have.
As I said, we should put a lot more effort into creating very broad classes of problems to work on, rather than one or two really deep, important problems.
We should still have the deep, important problems, and humans should be working on them.
But now we have this other way of doing science.
We can explore entire new fields of science by first getting these broadly competent AI models to map them out and make all the easy observations.
Then we identify certain islands of difficulty, which human experts can come and work on. So I see very much a future of complementary science.
Eventually, you would hope to get both breadth and depth, and somehow get the best of both worlds.
But I think we need practice with the breadth side. This is all too new; we don't even have the paradigms yet to take full advantage of it, but we will.
And then science will be unrecognizable after that.
To this point about complementarity: programmers have noticed that they're way more productive as a result of these AI tools.
I don't know if you as a mathematician feel the same way, but it does seem like one big difference between vibe coding and vibe researching is that with software, the whole point of the thing is to have some effect on the world through your work.
If it leads to you better understanding a problem, or coming up with some clean abstraction to embody in your code, that is instrumental to the end goal.
Whereas maybe with research,
the reason we care about solving the Millennium Prize problems is presumably that in the process of solving them, we discover new mathematical objects or new techniques, and those advance our civilization's understanding of mathematics.
The proof itself is sort of instrumental to that intermediate work.
I don't know if you agree with that dichotomy, or whether it in any way explains the relative uplift we'll see in software versus research.
Right. So certainly in math, the process is often more important than the problem itself. The problem is kind of a proxy for measuring progress.
I think even in software there are different types of tasks. If you just create a web page that does the same thing 1,000 other web pages do,
there's no skill to be learned. Well, maybe there's some skill that the infrastructure programmer could pick up, but for boilerplate-type code, definitely,
it's something you should offload to the AI.
But sometimes, once you make the code, you still have to maintain it, and there are issues with upgrading it and making it compatible with other things.
And I feel that programmers are reporting that even if an AI can create the first prototype of a tool, making it mesh with everything else, and making it interact with the real world the way they want,
is an ongoing process. And if you didn't pick up the skills from writing the code yourself, that may impact your ability to maintain it down the road.
So certainly, mathematicians have used problems to build intuition, and to train people to have a good sense of what's true, what to expect, what is difficult.
And just getting the answers right away may actually inhibit that process.
I mean, just think of the division between theory and experiment. In most sciences, there's an equal division between a theoretical side and an experimental side.
But math has been almost unique in that it's almost entirely theoretical. We place a premium on trying to have coherent, clean theories of why things are true or false.
And we haven't done much experiment, like: maybe we have two different ways to solve a problem; which one is more effective?
We have some intuition, but we haven't done large-scale studies where we take a thousand problems and just test them.
But we can do that now. So I think AI-type tools really will revolutionize the experimental side of math, where you don't care so much about individual problems and the process of solving them,
but you want to gather large-scale data about what works and what doesn't.
It's the same way that if you're a software company and you want to roll out a thousand pieces of software, you don't really want to handcraft each one and learn lessons from each.
You just want to find the workflows that scale.
The idea of doing mathematics at scale is in its infancy. But that's where AI is really going to revolutionize the subject.
Interesting. I feel like a big crux in these conversations about how good AI will be for science is, I think you said this, that they're using existing techniques and modifying them.
It would be interesting to understand how much progress one can make simply from using existing techniques. If I looked at the top math journals, how many of the papers are coming up with a new technique, whatever "coming up with a technique" means, versus applying existing techniques to new problems?
And what is the overhang: if you just applied every known technique to every open problem, would that constitute a humongous uplift in our civilization's knowledge, or would it not be that impressive and useful?
This is a great question, and we don't have the data to fully answer it yet.
Certainly, a lot of the work human mathematicians do is like this: when you take on a new problem, the first thing we do is look at all the standard things that worked on similar problems in the past, and try them one by one.
Sometimes that works, and that's still sometimes worth publishing, because the question was important. Sometimes they almost work and you have to add one more wrinkle, and that's also interesting.
But the papers that go into the top journals are usually ones where the existing methods can solve 80% of the problem, but there's a 20% which is resistant,
and a new technique has to be invented to fill in the gaps.
It's very rare now that a problem gets solved with no reliance on past literature, where all the ideas come out of nowhere.
That was more common in the past, but math is so mature now that it's just too much of a handicap not to use the literature first.
So AI tools are getting really good at the first part: just trying all the standard techniques on a problem.
Often they now actually make fewer mistakes in implementing them than humans do. They still make mistakes, but I've tested these tools on little tasks that I can do myself.
Sometimes they pick up errors that I make; sometimes I pick up errors that they make. It's about a tie right now.
But I haven't yet seen them take the next step: when there are holes in the argument and none of the standard things are working, then what do you do?
They can suggest random things, but often I find that chasing those down to make them work, and finding they don't, wastes more time than it saves.
So I think some fraction of problems that we currently think are hard will fall to this method,
especially the ones that haven't received enough attention.
With the Erdős problems, almost all of the 50 problems that were solved by AIs were ones for which there was basically no literature.
The problem was posed, and maybe some people tried it casually and couldn't do it, but they never wrote anything up.
But it turned out there was a solution, and it was just, maybe, combining one obscure technique that not many people know about with some other result in the literature.
And that's kind of the median level of what AIs can accomplish. That's really great; it clears out 50 of these problems.
So I think you will see some isolated successes.
But what we've found, now that people have done large-scale sweeps of these other problems, is that
if you only focus on the success stories, the ones that get broadcast on social media, it looks amazing.
All these problems that hadn't been solved for decades, now they're falling.
But whenever we do a systematic study, on any given problem an AI tool has a success rate of maybe one or two percent.
It's just that they can operate at scale. And if you just pick the winners, it looks great.
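The selection effect here is easy to see with back-of-the-envelope numbers (the roughly 600 open problems and the 1 to 2% rate come from the conversation above; the rest is illustrative):

```python
import math

n_problems = 600   # open problems swept, per the discussion above
p_success = 0.02   # roughly the per-problem success rate Tao cites

# Expected wins and their spread, treating each attempt as an
# independent coin flip (a simplifying assumption).
expected_wins = n_problems * p_success
std_wins = math.sqrt(n_problems * p_success * (1 - p_success))

print(expected_wins)       # 12.0 headline-worthy successes...
print(round(std_wins, 1))  # ...give or take about 3.4
# The headlines report the ~12 wins; the ~588 failures rarely appear.
```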
So I think there will be a similar thing happening with the hundreds of really prestigious, difficult math problems out there.
For a couple of them, some AI may get lucky and solve them, because there was some backdoor to the problem that everyone else missed.
And that will get a lot of publicity.
But then people will try these fancy tools on their own favorite problems,
and they will again experience the one-to-two-percent success rate.
Right.
So there will be a lot of noise amongst the signal of when they're working and when they're not.
It will be increasingly important to collect really standardized data sets.
There are efforts now to create standard challenge problems for AI to solve,
and not to just rely on the AI companies publishing only their wins and not disclosing the negative results.
So that will maybe give more clarity as to where we actually are.
Well, I think it's worth mentioning how much progress in AI it already constitutes to have models that are capable of applying some technique that nobody had written down as applicable to a particular problem.
The progress is simultaneously amazing and disappointing.
It is a very strange feeling to see these tools in action.
But we also acclimatize really quickly.
I remember when Google's web search showed up 20 years ago,
and it just blew all the others out of the water.
You were just getting relevant hits on the front page,
almost exactly what you wanted.
And it was amazing.
And then after a few years, you just took it for granted that you could Google anything.
A lot of 2026-level AI would have been stunning in 2021.
And a lot of it, face recognition, natural speech,
doing college-level math problems,
we just take for granted now.
Right.
Yeah.
Okay, so speaking of 2026: you made a prediction in 2023.
I think it was that by 2026, AI would be like a colleague in mathematics?
Yeah, a trustworthy co-author, if used correctly.
Yeah.
Which is looking pretty good in retrospect.
Yeah.
I'm pretty pleased.
Yeah.
So let's continue the streak.
In what year would you say that you personally are 2x more productive as a result of AI?
Yeah.
So productivity, I think, is not quite a one-dimensional quantity.
I'm definitely noticing that the style in which I do mathematics is changing quite a bit,
and the type of things I do.
For example, my papers now have a lot more code, a lot more pictures,
because it's so easy to generate these things now.
Some plot which would have taken me hours to make, I can now do in minutes.
But in the past, I just wouldn't have put the plot in my paper in the first place.
I would just talk about it in words.
So it's hard to measure what 2x means.
On the one hand, the type of papers I write today,
if I had to do them without AI assistance, would definitely take five times longer.
Interesting.
But I would not write my papers that way.
Five x.
So, yeah.
But that's because these are sort of auxiliary things,
like doing a much deeper literature search,
or supplying a lot more numerics.
Yeah.
I mean, they enrich the paper.
The core of what I do, actually solving the most difficult part of a math problem,
that hasn't changed too much.
I still use pen and paper for that.
But there are lots of silly things.
I use an AI agent now to reformat. For example, sometimes
all my parentheses are not quite the right size.
I used to change them manually by hand.
Now I can get an AI agent to do all that quite nicely in the background.
So they've really sped up lots of secondary tasks.
They haven't yet sped up the core thing that I do.
But they've allowed me to add more things to my papers.
Yeah.
But by the same token, if I were to write a paper I wrote in 2020 again,
and not add all these extra features, just something at the same level of functionality,
then it actually doesn't save that much time, to be honest.
So it's made the papers richer and broader,
but not necessarily deeper.
You made this distinction between artificial cleverness and artificial intelligence.
And I would like to better understand those concepts.
What is an example of intelligence that is not just cleverness?
Yeah, so intelligence is famously hard to define.
It's one of these things that you know when you see it.
But when I talk to someone and we try to collaboratively solve a math problem together,
there's this conversation where neither of us knows how to solve the problem initially,
but one of us has some idea and it looks promising.
So then we have some prototype strategy, and then we test it,
and it doesn't work, but then we modify it.
And there's some adaptivity and continuous improvement of the idea over time.
And eventually, we've mapped out what doesn't work
and what does work, and we can kind of see a path forward,
but it's evolving with our discussion.
And this is not quite what the AIs do; the AIs mimic this a little bit.
To go back to this analogy of the jumping robots:
they can jump and fail, and jump and fail, and jump and fail,
but what they can't do is jump a little bit,
reach some handhold, stay there,
pull other people up, and then try to jump from there.
There isn't this cumulative process that gets built up interactively.
It seems to be a lot more trial and error, just repetition and
brute force, which scales,
and can work amazingly well in certain contexts.
But this idea of building up
cumulatively from partial progress is what's still not quite there yet.
Interesting. So if a Gemini or a Claude
4.5, whatever, solves a problem,
Yeah.
it is not the case that its own understanding of math has progressed.
And even if it works on a problem without solving it, it's not that its own understanding of math has progressed.
Yeah. When you open a new session, it's forgotten what it just did.
Right.
It has no new skills to build on for related problems.
Maybe what you just did becomes some tiny fraction of a percent
of the training data for the next generation.
So maybe eventually some of it gets absorbed, yeah.
So Terence talks about the importance of decomposing problems, particularly the Erdős problems,
into a series of easier chunks.
Even if this doesn't result in a full solution, approaching problems in this way helps you
build up the intuitions and practice the techniques that you'll need to keep making progress.
But models today tend to struggle with these kinds of problem-solving techniques.
That's where Labelbox comes in.
Labelbox helps you train models not just to get the right answer,
but to think the right way.
They've operationalized these reasoning behaviors into rubrics,
giving you the ability to evaluate every important dimension of a model's output.
These rubrics go beyond simple correctness.
Did the model reach for the right tools?
Did it check its own work and explore alternative paths?
How clear was its response?
These skills are useful across domains:
math, physics, finance, psychology, and more.
And they're becoming increasingly important as models take on harder, open-ended problems,
some of which have multiple solutions and some of which we don't even know the solutions to.
Labelbox can get you rubrics tailored to your domain,
helping you systematically measure and shape how your models think.
Learn more at labelbox.com/dwarkesh.
One big question I have is: how plausible is it that if we just keep training AIs
to get better and better at solving problems in Lean,
they will continue to solve more and more impressive problems,
and then we will, in retrospect, be surprised at how little insight
we got from some Lean solution to proving the Riemann hypothesis or something?
Or do you think it is a necessary condition, for a proof of the Riemann hypothesis,
even by an AI that is doing it totally in Lean,
that the constructions which are made, the definitions which are created,
even in the Lean program, have to advance our understanding of mathematics?
Or do you think it could just be assembly-code gobbledygook?
Oh, yeah, we don't know.
I mean, some problems have been basically solved by pure brute force;
the four-color theorem is a famous example.
We have still not found a conceptually elegant proof of that theorem.
And maybe we never will.
Some problems may only be solvable by splitting into
some enormous number of cases and doing a brute-force,
uninsightful computer analysis on each case.
Part of the reason that we prize problems like the Riemann hypothesis
is that we're pretty sure that something amazing
has to happen: a new type of mathematics has to be created,
or a new connection between two previously unconnected areas of mathematics
has to be discovered, to make it work.
We don't even know what the shape of the solution is,
but it doesn't feel like a problem that will be solved just by
exhaustively checking cases or something.
I mean, it could be false, actually.
So there is an unlikely scenario
that the hypothesis fails, and it's just that
the computer says,
"Oh, here's a zero off the line,"
and a massive computer calculation verifies it.
That would be very disappointing.
I don't know.
I do feel that, you know,
fully autonomous, one-shot approaches are not the right approach
for these problems.
I think you will get a lot more mileage out of the interplay
between humans collaborating with these tools.
And I can see one of these problems being solved by some smart humans
assisted by some extremely powerful AI tools.
But the exact dynamic may be very different from what we envision right now.
It could be a collaboration of a type that just doesn't exist yet.
I mean, there may be a way to generate
a million variants of the Riemann zeta function
and do some data analysis, with AI as the data analyst.
And we would discover some pattern connecting them
which we didn't know about before,
and this lets you transform the problem into a different area of mathematics.
There could be all kinds of scenarios.
So suppose the AI figures it out, and latent in the Lean proof
is some brand-new construction
which, if you realized its significance,
you would be able to apply in all these different situations.
How do you recognize it, right?
Again, a very naive question,
but suppose it comes up with the equivalent of
Descartes coming up with this idea:
oh, you can have this coordinate system where you can unify algebra and geometry.
But in Lean code, it would just look like R × R,
and it wouldn't look that significant or something.
Or similarly, I'm sure there are other constructions
which have this kind of property.
Well, the beauty of formalizing a proof in something like Lean is that
you can take any piece of it and study it atomically.
So when I read a paper by my fellow humans
which solves some difficult problem,
there's often some big sequence of lemmas and theorems and things.
Ideally, the author will talk their way through
what's important and what's not.
But sometimes they don't reveal which steps were the important ones
and which ones are just boilerplate, standard steps.
But you can study each lemma in isolation.
And for some of them, I can say: oh, this looks very standard.
This resembles something I'm familiar with.
I'm pretty sure there's nothing interesting going on here.
But this lemma, oh, that's something I haven't seen before.
And I can see why, if you had this result,
it would really help prove the main result.
So you can assess whether certain steps
are really key to your argument or not.
And Lean really facilitates that.
You can examine
the individual steps really precisely.
I think in the future, there will be
entire professions of mathematicians who might take a giant
Lean-generated proof and maybe do some ablation on it
or something: try to remove steps or parts of it
and try to find more elegant versions.
Maybe they'll use some other AIs to do some reinforcement learning:
how can you make the proof more elegant?
And maybe other AIs will grade
whether this proof looks better or not.
One thing that will change quite a bit in the near future is
that, until recently, writing papers was the most time-consuming
and expensive part of the job.
And so you did it very rarely.
You only wrote up your results once
all the other parts of your argument had checked out,
because rewriting it again, refactoring, was just a total pain.
But that's one thing that's become a lot easier now with modern AI tools.
So you don't have to have just one version of your paper.
Once you have one,
people can generate hundreds more.
So, yeah, one giant, messy Lean proof may not be very meaningful
or understandable on its own.
But other people can refactor it
and do all kinds of things with it.
We have seen, with the Erdős problems website,
that an AI will generate a proof,
and there will be 3,000 lines of code that verify the proof.
But then people call on other AIs to summarize the proof, and then people
write their own proofs.
There's actually a lot of post-processing. Once you have one proof,
we have a lot of tools now to deconstruct it and interpret it.
It's a very nascent area of science, or of mathematics.
But I'm not as worried about the scenario some people raise:
what if the Riemann hypothesis is proven with a completely incomprehensible proof?
I think once you have the artifact of a proof, we can do a lot of analysis on it.
You posted recently that it would be helpful to have a formal or semi-formal language for
mathematical strategies, as opposed to just mathematical proofs, which is what Lean
specializes in.
I would love to learn more about what that would involve or what it would look like.
We don't really know. I mean, we've been very lucky in mathematics that we have worked
out the laws of logic and mathematics, but this is actually a fairly recent accomplishment.
It was started by Euclid millennia ago, but only in the early
20th century did we finally discover: here are the axioms of mathematics, the standard
axioms of what's called ZFC, and the axioms of first-order logic, and this is what a proof
is. And this we've managed to automate and have a formal language for.
There could be some way to assess the plausibility of certain things. So you have a conjecture
that something is true, you test a few examples, and it works out; how does this increase
your confidence that the conjecture is true?
We have a few mathematical ways to model this, like Bayesian probability, for
example, but you often have to set certain
base assumptions, and there's a lot of subjectivity still in these tasks. So it's not clear.
I mean, this is more of a wish than a plan to develop these languages, but we've seen how
successful having a formal framework in place like Lean has been at making deductive proofs so much
easier to automate and train AI on. Right now, the bottleneck
for using AI to create strategies and make conjectures is that we have to rely on human experts,
and the test of time, to validate whether something is plausible or not.
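The Bayesian modeling Tao mentions, and the subjectivity it hides, can be made concrete with a small sketch (the prior and the likelihood-if-false below are arbitrary choices, which is exactly where the subjectivity lives):

```python
def posterior_after_checks(prior, n_checks, p_pass_if_false):
    """Bayesian update for a conjecture after n_checks examples all pass.
    If the conjecture is true, every check passes (likelihood 1).
    If it is false, we *assume* each check still passes independently
    with probability p_pass_if_false -- a subjective modeling choice."""
    like_true = 1.0
    like_false = p_pass_if_false ** n_checks
    return prior * like_true / (prior * like_true + (1 - prior) * like_false)

# Ten passing examples take a 50/50 prior to near certainty...
print(posterior_after_checks(0.5, 10, 0.5))   # about 0.999
# ...but under a more forgiving assumption, confidence barely moves.
print(posterior_after_checks(0.5, 10, 0.99))  # about 0.53
```

The two runs differ only in the assumed base rate of false conjectures passing a check, which is the kind of base assumption Tao says you have to set by hand.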
If there were some semi-formal framework where this could be done semi-automatically, in a way
that isn't easily hackable... Of course, it's really important
with these formal proof assistants that there are no backdoors or exploits
you can use to somehow get your certified proof without actually proving it,
because reinforcement learning is just so good at finding these backdoors.
But yeah, we'd want a framework that mimics how scientists talk to each other in a
semi-formal way, using data and argument, but also contrasting
narratives. There's some subjective aspect of science that we don't know how to capture
in a way that lets us insert AI into it usefully. So yeah, this is a
future problem. I mean, there are research efforts to try to create automated
conjecturers, and maybe there are ways to benchmark these and get some
way to simulate this, but this is all very new science. Can you help me
get some intuition? I have two sub-questions. One, it would be helpful to have a specific
example of what something like this would look like:
what is a way scientists communicate that we can't yet formalize? And two, it seems
almost definitionally paradoxical to be building up some narrative, some natural-language
explanation, and at the same time having something you could have formalized.
I'm sure there's some intuition behind where that overlap is, and I'd love to understand it better.
Alright, so an example of a conjecture. Gauss was interested in the prime numbers,
and he created one of the first mathematical data sets: he computed the
first 100,000 primes or so, hoping to find patterns. He did find a pattern, but maybe
not the pattern he was expecting. He found a statistical pattern in the primes: you count
how many primes there are up to 100, up to 1,000, up to 1 million, and so forth. They get sparser and sparser,
and the drop-off in density was inversely proportional to the
natural logarithm of the range of numbers. So he conjectured what we now call the prime number theorem: the number of primes up to x is roughly x divided by the natural
logarithm of x. He had no way to prove it; it was data-driven. So this
was a conjecture. It was revolutionary for its time, because it was maybe the first
really important conjecture in math that was statistical in nature. Normally you'd
talk about a pattern like the spacing between the primes having a certain regularity or
something. But this was really something different: it didn't tell you exactly how many
primes there were in any given range. It just gave you an approximation that got
better and better as you went further and further out.
It started the field of analytic number theory. It was the first of many
conjectures like this, many of which got proved, which started consolidating the
idea that the prime numbers don't really have a pattern: they behave like random
sets of numbers with a certain density. I mean, they have some patterns,
like being almost all odd. So they're not
actually random; they're what's called pseudorandom. There's no random number generation
involved in creating the prime numbers. But over time, it became more and more productive to
think of the primes as if they were generated by some god rolling dice all
the time, creating this random set. And this allowed us to make all these other
predictions. So there's a still-open conjecture in number theory, the twin
prime conjecture, that there should be infinitely many pairs of primes that are twins, that is, two
apart, like 11 and 13. We can't prove that, and there are actually good reasons why we can't prove it.
But because of this statistical random model of the primes, we are absolutely
convinced it's true. We know that if the primes were generated by flipping
coins or something, then just by random chance, just like infinite monkeys at
typewriters, we would see twin pairs over and over again. And we have over time
developed this very accurate conceptual model of what the primes should behave like, based on statistics
and probability. It's mostly heuristic and non-rigorous, but extremely accurate.
The few times we actually can prove things about the primes, it has matched up
with the predictions of what we call the random model of the primes. So we
have this conjectural framework for understanding the primes that
everyone believes in. It's the same reason why we believe the Riemann hypothesis
is true, and why we believe that cryptography based on the primes is mathematically
secure; it's all part of this belief.
In fact, one reason why we care about the Riemann hypothesis is that if the Riemann hypothesis failed,
if we knew it was false, it would be a serious blow to this model. It
would mean there's a secret pattern in the primes that we were not aware of.
And I think we would very rapidly abandon any cryptography based on the primes, because
if there was one pattern that we didn't know about, there are probably more. And these patterns can lead to
exploits in cryptography. It would be a big shock, so we really
want to make sure that doesn't happen. So we've become convinced of
things like the Riemann hypothesis over time. Some of it is experimental evidence,
and the few times we've been able to prove theoretical results, they've always aligned.
It is possible that the consensus is wrong and we've all just missed something
very basic; there have been paradigm shifts in the past in scientific history.
But we don't really have a way of measuring this, I think partly because
we don't have enough data on how math and science develop. We have one timeline of
history, and maybe a hundred stories of turning points in that history.
If we had access to a million alien civilizations, each with a different
development of history and science, in different orders, then maybe we would
actually have a decent shot at understanding how to measure
what progress is and what a good strategy is, and we could maybe start formalizing it
and actually having a framework. Maybe what we need to do is start
creating lots of mini-universes, simulations of AIs solving very basic problems,
in arithmetic or whatever, but coming up with their own strategies for doing these
things, and have these little laboratories to test. I mean, there are people who
investigate things like: what is the smallest neural network that
can do ten-digit multiplication? I think we could learn
a lot just from evolving small AIs on simple problems. We could learn a lot.
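A toy version of one of those "little laboratories": a minimal neural network, written from scratch, trained on XOR, the classic smallest task a linear model cannot solve. The architecture (a 2-4-1 perceptron) and the task are illustrative choices of mine, not anything specified above.

```python
import math
import random

random.seed(0)

# XOR dataset: inputs and target outputs.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
HIDDEN = 4

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hidden layer: HIDDEN neurons x (2 input weights + bias); output: HIDDEN weights + bias.
w_h = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(HIDDEN)]
w_o = [random.uniform(-1.0, 1.0) for _ in range(HIDDEN + 1)]

def forward(x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
    y = sigmoid(sum(w_o[j] * h[j] for j in range(HIDDEN)) + w_o[HIDDEN])
    return h, y

def total_loss():
    return sum((forward(x)[1] - t) ** 2 for x, t in DATA)

loss_before = total_loss()
for _ in range(20000):                      # plain stochastic gradient descent
    for x, t in DATA:
        h, y = forward(x)
        d_y = (y - t) * y * (1.0 - y)       # backprop through squared error + sigmoid
        for j in range(HIDDEN):
            d_h = d_y * w_o[j] * h[j] * (1.0 - h[j])
            w_h[j][0] -= 0.5 * d_h * x[0]
            w_h[j][1] -= 0.5 * d_h * x[1]
            w_h[j][2] -= 0.5 * d_h
        for j in range(HIDDEN):
            w_o[j] -= 0.5 * d_y * h[j]
        w_o[HIDDEN] -= 0.5 * d_y
loss_after = total_loss()

print(f"loss before: {loss_before:.3f}  after: {loss_after:.4f}")
print({x: round(forward(x)[1]) for x, _ in DATA})
```

Shrinking `HIDDEN` and seeing where learning breaks down is exactly the kind of miniature experiment described above.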
I was super excited when Mercury reached out about sponsoring the archives, because I've been
banking with them for years. I think I opened my first account with them in 2023. Something I've
come to appreciate over the last three years is that Mercury is constantly updating things and
adding new features. Take their newest feature, Insights. Insights summarizes your money in and out,
showing you your biggest transactions and calling out anything that deserves extra attention.
Like, maybe your revenue from a particular partner has gone down, or you've got a big
uncategorized purchase that needs to be investigated. It's a super low-friction way for me to keep
tabs on my business and make quick decisions. For example, I try to invest any cash that I don't
need on hand to keep running the business. With Insights, with just a couple of clicks, I was
able to see exactly how much money I spent in each month of 2025. That lets me know exactly
how much cash I'll need for the next year or so of operations, and then I can go invest the rest.
Mercury just keeps adding new features like this. Go to mercury.com to check it out. Mercury is
a fintech company, not an FDIC-insured bank. Banking services provided through Choice Financial
Group and Column N.A., Members FDIC.

You have to learn about new fields not only very rapidly, but deeply enough to contribute
to the frontier. So in some sense, you're also one of the world's greatest autodidacts.
What is your process for learning about a new subfield of math? What does that look like?

Yeah. So, we talked
about depth and breadth before. It's not a purely human-versus-AI distinction; humans also
split. I think it was Isaiah Berlin who split people into hedgehogs and foxes: the hedgehog
knows one thing very, very well, and the fox knows a little bit about everything. I definitely
think of myself as a fox. I work with hedgehogs a lot, and
sometimes I can be a hedgehog if need be. But I've always had a little bit of an
obsessive streak. If there's something I read about which I feel I should understand,
which I have the capability to understand, but I don't understand why it works, there's a magic
to it. Say someone was able to use a type of mathematics
I'm not familiar with to get a result I would like to prove, and I can't do it by myself,
but they could do it by their method. Then I want to find out what their trick was.
It bugs me that someone else can do something which I think I should be able to do,
but can't. So I've always had that kind of obsessive, completionist streak.
I've had to wean myself off computer games, because if I start a game,
I want to play it to completion, through all the levels. So that's one way in which I
learn new fields. I collaborate with a lot of people who have taught me other types of
mathematics. I just make friends with a mathematician who's working in another
area of mathematics, and I find their problems interesting, but they have to teach me
some of the basic tricks, and what's known and not known, and I learn a lot from that.
I've also found that writing about what I've learned helps. I have a blog where I sometimes
record things that I've learned, because in the past, when I was younger, I would learn
something, discover some cool trick, and think: okay, I'm going to remember this.
And then six months later, I'd forgotten it. I remember remembering it,
but I can't reconstruct the argument. The first few times, it was so frustrating to have
just done something and then lost it. So I resolved that I should always write down
anything cool that I've learned, and that's part of how this blog came about.
Writing a post is something I often do when I don't want to do
other work, you know. Like, there's some referee report or something,
something that feels slightly unpleasant for me to do at the time. Whereas
writing a blog post feels creative and fun; it's something that I do for myself.
Depending on the topic, it could be a quick half hour or several hours,
but because it's something I do voluntarily, time flies when I write these things,
as opposed to doing something that I have to do for administrative reasons, which is
just drudgery. Okay, those are tasks the AIs are really helping with nowadays, actually.
If civilization could decide, from first principles, how to use
Terry Tao's time, since it's a limited resource: what is the biggest
difference between how the veil of ignorance would allocate Terry Tao's time
and how it's used now?

Okay, so podcasts wouldn't be happening. I do complain about certain tasks that I don't want to do but have to do.
As you get more senior in academia, you get more and more responsibilities: more
committees and whatever. But I have also found that a lot of events that I
reluctantly went to, because I was obliged to for one reason or another, were
outside my comfort zone, and there I often had interactions with people I wouldn't normally talk to,
like you, for instance. I would learn interesting things and have interesting
experiences, and I would have opportunities to network with people
I never would have met otherwise. So I do believe a lot in serendipity. I mean,
I do optimize my time, and there are some portions of my
day that I do schedule very carefully. But I have been willing to leave
some portions open: okay, I'm going to do something which is not my usual thing.
Maybe it'll be a waste of my time, but maybe I will learn something. And more often
than not, I've gotten a positive experience
that I couldn't have planned for. So, yeah, I believe a lot in
serendipity. And maybe there's a danger, actually, that in modern society,
and it's not just AI, we've become really good at optimizing everything. And maybe
we are over-optimizing. With
COVID, for example, we switched a lot to remote meetings,
and so everything was scheduled. We kept busy; at least in academia,
we met almost the same number of people that we used to meet in person, but everything had to be planned,
and you had to schedule things in advance. And what we lost out on was
the casual side: knocking on a door down the hallway, just meeting someone
while getting a coffee. These serendipitous interactions,
which you may think are not optimal, are actually really important.
You know, when I was a graduate student, I would go down to the library to look
for a journal article. I had to physically go to the library, check out the
journal, and read the article. And sometimes you could just
browse through, and the next article was also interesting. Sometimes it wasn't,
but you could accidentally find interesting things, which is something that has basically
been lost now, because if you want to access an article today,
you just type it into a search engine or even an AI and you instantly get what you want.
But you don't get the accidental finds that you might have
gotten by doing it more inefficiently. There have been times...
I spent a year once at the Institute for Advanced Study, which is
a great place: there are no distractions, there's just
your research. And for the first few weeks you're there, it's great. You're getting all these
papers written up that you've been wanting to do for a long time. You're thinking about problems
for blocks of hours at a time. But I found that if I stayed there for more than several months,
I somehow ran out of inspiration. I got bored, actually, and
surfed the internet a lot more. You actually do need a certain level of distraction in your life.
It somehow provides enough randomness, enough temperature; high temperature.
So, yeah, I don't know the optimal way to schedule my life. It just
seems to work.

I'm very curious when you expect AIs that can actually do frontier math
better than, or at least as well as, the best human mathematicians.

In some ways, they're already doing frontier math that humans
can't do, but it's a different frontier from the one we're used to. You could argue
that calculators were doing frontier math that humans could not accomplish,
but it was number crunching. But replacing Terry Tao
completely? I mean, the question is: what do you want me for? You could just go do another five podcasts.
I'm not sure it's the right question to ask. I think within a decade,
a lot of the things that mathematicians currently do, things we spend the bulk of our time
doing, a lot of the stuff we put in our papers today, can be done by AI. But we will find
that that actually wasn't the most important part of what we do. A hundred years ago,
a lot of mathematicians were just solving differential equations:
physicists needed some exact solution to some system, and they
hired a mathematician to go through the calculus and work out the
solution to this fluid equation or whatever. A lot of what a 19th-century mathematician
would do, you could now hand to Mathematica or Wolfram Alpha, or a
computer algebra package, or, more recently, an AI, and it would just solve the problem
in a few minutes. But we moved on; we worked on different,
more abstract problems after that. Once computers came along... computers used
to be human, right? We used to laboriously compile log tables and work out
primes, as Gauss did, and that has all been outsourced to computers. But we moved on.
In genetics, sequencing the genome of a single organism
used to be the entire PhD of a geneticist: carefully separating all
the chromosomes and whatever. And now you can just spend a thousand dollars, send it to
a sequencer, and get it done. But genetics is not dead as a subject. You
move to a different scale: maybe you study whole ecosystems rather than individual organisms.
I take your point, but on the question of when most, almost all,
mathematical progress is happening via AI: a year such that, if you find out a
Millennium Prize problem has been solved, you'd put, say, 95 percent odds that
an AI did it autonomously. Surely there will eventually be such a year?

I guess... I do believe that hybrid human-plus-AI work will dominate mathematics for a lot
longer. It will depend; it will require some additional breakthroughs
beyond what we already have, so it's going to be stochastic. You know, I think
AIs currently are very good at certain things but really terrible at others,
and while you can add more and more frameworks on top to reduce the error
rates and make them work together a bit better and so forth, it feels
like we don't yet have all the ingredients for a truly satisfactory
replacement for all intellectual tasks. It's complementary currently;
it is not a replacement. But
because current-level AIs will accelerate science in so many ways, hopefully
new discoveries and new breakthroughs will happen more quickly.
It's also possible that by somewhat disrupting stability, we actually inhibit certain
natural progress. Anything is possible, really, at this point. I think that
the world is very, very unpredictable at this point in time.

What is your advice to
somebody who is considering a career in math, or is early in a career in math,
especially in light of AI progress? How should they be thinking about their career
differently, if at all, as a result of AI progress?

Yeah. So, we live in a time of change.
As I said, we live in a particularly unpredictable era, and I think
things that we've taken for granted for centuries may not hold anymore. So, yeah,
the way we do everything, not just mathematics, will change.
You know, in many ways, I would prefer
a much more boring, quiet era where things are much the same as they were 10 or 20 years
ago. But I think one just has to embrace that there is going to be a lot of change,
and that of the things you study, some may become obsolete or
revolutionized, but some will be retained. So you somehow
always have to keep an eye out, because there will be a lot of opportunities for things
that you weren't able to do before. I mean, in math, you previously
had to basically go through years and years of education, getting a math PhD, before you could
contribute to the frontier of math research. But now it's quite possible, at the high-school
level or whatever, to get involved in a math project and actually make a real contribution,
because of all these AI tools and Lean and everything else. So there will be a lot of
non-traditional opportunities to learn. So you need a very adaptable mindset:
pursuing things out of curiosity, playing around. I mean, you still need to get your credentials.
For a while, it will still be important to go through traditional
education and learn math and science the old-fashioned way.
But you should also be open to very different ways of doing
science, some of which don't exist yet. So it's a scary time, but also a very
exciting one.

Awesome. That's a great note to close on. Terence, thanks so much.

Yeah, thank you.

Dwarkesh Podcast
