
The March 11, 2026 episode opens with a discussion about public skepticism toward AI, using polling data to frame how AI is being perceived politically and socially. The hosts then move through several major stories, including Yann LeCun's new venture Advanced Machine Intelligence, a humorous token-cost comparison clip, and Andrej Karpathy's open-source auto research project for AI-driven model improvement. Later segments focus on self-improving agents, multi-model workflows and skills, and an AI-in-science feature on Zephyrus, a system that lets researchers query weather and climate data in plain English. The episode closes with a broader reflection on conversational access to complex scientific data and how that could reshape research workflows.
Key Points Discussed
00:00:44 AI Popularity and Public Perception
00:05:00 Yann LeCun’s Advanced Machine Intelligence
00:08:03 Karl Yeh Joins with the Token Cost Clip
00:12:08 Andrej Karpathy's Auto Research
00:21:12 Self-Improving Agents and Anthropic Institute
00:38:04 Multi-Model Workflows and AI Consensus
00:43:30 Turning Repeated AI Work into Skills
00:49:15 AI and Science: Zephyrus for Weather Data
The Daily AI Show Co-Hosts: Andy Halliday, Beth Lyons, Jyunmi Hatcher, Karl Yeh
Aloha, it is March 11th, 2026, and I am joined by Andy and Beth. I'm Jyunmi, and this
is the Daily AI Show.
Today, we're going to cover a whole bunch of AI news, and then I've got a little bit
of an AI in science story for you.
So let's dive in really quickly, and Andy, what story do you have for us today?
Well, I want to lead off with the question of just how popular is AI?
We see the huge numbers of people who are using it, but a new poll this month by NBC News
of 1,000 registered voters showed that there are only two items on their list with worse
net ratings than AI, and I'm sure you're just wanting to know what those two things
are. They are Iran and the Democratic Party.
Those two have the biggest gap between the number of people who voted positive
on that particular subject and the people who voted negative, but there are some others
around that same cluster at the bottom of the list that I thought were interesting.
One of them is ICE, but that doesn't surprise you.
It's way down there, but AI is below ICE.
ICE has a negative 18 difference, and AI has a negative 20.
So wow.
Okay, but there was a big chunk of neutral feeling about AI, where there wasn't so much
neutrality about the war or ICE, I thought. But yes, the absolute no.
Yes, well, lies, damned lies, and statistics, here we go. So the poll does require further
interpretation.
Frankly, I like AI.
I don't know what they're talking about. But a lot of it has to do with a lack of knowledge
about the positive benefits of using the full range of frankly inscrutable tools
that are available to us, and with the threats to society and otherwise.
So I can see why overall for voters, AI is a negative issue.
What I can also see, from my experience of people who are younger than we are, is that the
dislike, the absolute no, feels very wrapped up in climate and what
it's doing to the environment, right, which is also a runaway story.
So the idea of what it's doing to the environment has a seed of truth, but is interpreted,
I think, as more than it actually turns out to be. That doesn't mean it won't turn out
to be that, right, but it feels a little more like they think they're pretty clear on
what the harms are.
If you're not using it every day, you're not clear on what the benefits are, but the pushback
isn't about the benefits. That pushback is about the environment.
I think just stepping back, it's interesting in my lifetime that if we looked at a snapshot
of what the important voter issues were, both positive and negative, across the decades,
I didn't foresee that artificial intelligence was going to be so omnipresent in the news
and in the dialogue around what's happening in the world.
It's really front and center, whether it's in respect of war and the application of AI
to war, or in the context of the significant income disparity that's emerged since the 1970s
and been accelerated by AI. AI is touching on every one
of those things now. It's just a centerpiece in my life, frankly,
and I didn't see that coming.
No.
And you saw all kinds of things coming, Andy?
I thought I did, but like,
certainly wasn't able to take advantage of them.
Right.
Right, right.
Beth, do you have a story that's of particular interest?
I do.
So we have shared Yann LeCun's vision, or pushback on AI and large language models,
many times on this show, right?
Like, he's fairly famous for saying that a large language model isn't as smart as your cat,
because your cat has a sense of the world. And he also, this was recent, left
Meta, right?
He confirmed leaving after the Scale person came in and took over AI at Meta. Not Brian.
The Scale guy.
Alexandr Wang.
Yeah.
You?
Yeah.
So Yann LeCun's Advanced Machine Intelligence just emerged.
So that's his company, with a $1.03 billion seed round.
He's a Turing Award winner.
I'm reading this from The Rundown.
So he left in November, after 12 years with FAIR, telling Mark Zuckerberg he could build
world models faster, cheaper, and better on his own.
So the seed round is significant at one billion, basically, but the other significance
is that it's the biggest seed round investment in Europe at this point.
Yeah.
He's over in France.
He moved from Meta's headquarters, I think, or maybe he was already working from France, but
he's a French guy.
Yes.
Yeah.
And by the way, the valuation that the investors gave that company, which hasn't done anything
yet, although behind the scenes I'm sure they've got a lot:
it's three and a half billion.
So it's already a unicorn before it ever starts. It's a unicorn in the money provided
as the seed funding.
How do you spend a billion dollars on developing world models well?
Nvidia can tell you, because they're building world models, and they spend way more than
that.
And Nvidia.
And Nvidia.
Nvidia is among the backers. Nvidia, Samsung, Bezos Expeditions, Eric Schmidt, and Mark
Cuban are among the backers.
The headquarters is in Paris.
And that's interesting, because we had a little tour, when Brian was in Paris, of an incubator
that was in a converted railway station, or
something like that. That's right.
I'm not saying that he's there, but I'm not saying he's not there.
And that would be very cool.
And we're starting to see posts of people saying, Hey, I'm pleased to announce that I'm
going to be working with Yann LeCun at Advanced Machine Intelligence.
So very cool.
Yeah.
Karl's joined us.
So Karl, I'll throw you into the deep end.
Do you have a story that you want to talk about today?
Oh, yes, I do.
Hold on.
Give me one second.
There was something that I posted yesterday to the chat and I thought it was very funny.
I'm just going to let's see if I can play it.
Yeah.
Okay.
So I want to play this, and you let me know what you all think of it.
Okay.
How many tokens a year is my employer getting from me in terms of productivity?
Let me run the math real quick, assuming you're a full-time human: 40 hours a week, 50
weeks a year.
That's about 2000 hours.
If we figure your brain's doing 20,000 tokens a day, like we talked about, that's roughly
100,000 tokens per work week. Multiply by 50: 5 million tokens a year.
So yeah, your boss is basically getting 5 million little thoughts, decisions, emails,
and "why is this spreadsheet broken again" moments for whatever they're paying you.
Kind of makes you wonder who the real bargain is, huh?
So 5 million tokens a year they get from me.
What is the cost of getting 5 million tokens with AI right now?
Right now, March 2026, prices have dropped hard since last year.
For 5 million tokens, depends on the model, but let's keep it real.
With something like OpenAI's GPT 5.2, their current top dog, you're looking at about 9 bucks
for input-heavy work.
That's assuming the 5 million is mostly input; output runs more, call it 14. If it's half and half, call it
12 to 15 dollars total.
Cheaper options?
Grok's fast tier from xAI is stupid cheap, like a dollar or less for 5 million if you're mostly
prompting and not generating much.
Claude's still around 3 to 15 per million, so maybe 20 bucks tops for a balanced run.
Bottom line: your 5 million human tokens a year? An AI could replicate the raw output
for under 20 bucks, maybe 5 if you go budget. Makes you think, are we the expensive part
now?
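For anyone who wants to check the clip's arithmetic, here is a minimal Python sketch. The 20,000-tokens-a-day figure and the per-model prices are the clip's own claims, not verified numbers:

```python
# Sanity check of the clip's token math. All inputs are the clip's
# claims (20k tokens/day, the March 2026 price quotes), not verified.
TOKENS_PER_DAY = 20_000        # clip's estimate of a human brain's output
WORKDAYS_PER_WEEK = 5
WEEKS_PER_YEAR = 50

tokens_per_week = TOKENS_PER_DAY * WORKDAYS_PER_WEEK   # 100,000
tokens_per_year = tokens_per_week * WEEKS_PER_YEAR     # 5,000,000

# Hypothetical per-million prices, back-derived from the clip's quotes.
prices_per_million = {
    "GPT 5.2 (input-heavy)": 1.80,   # ~$9 for 5M input tokens
    "Grok fast tier": 0.20,          # ~$1 or less for 5M tokens
    "Claude (balanced)": 4.00,       # $3-15/M blended, ~$20 tops
}

print(f"Human output: {tokens_per_year:,} tokens/year")
for model, price in prices_per_million.items():
    cost = tokens_per_year / 1_000_000 * price
    print(f"{model}: ~${cost:.2f} for the year")
```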
Anyway, so yeah. That's a good one, thanks for finding that, that's
excellent.
And then I thought, hold on a second, the Big Arch costs more than 9 bucks,
so the new McDonald's sandwich is worth more than my entire output for the year.
I don't know, I think there's something wrong with the token math there. You know, a token
is a smaller atom, if you will, than, you know, a completed thought.
I think there are a lot more tokens being generated than 5 million a year by a human, right?
Well, I mean, there's a reason that everybody's trying to figure out how to power AI at the
efficiency of the human brain, with selective recall, with auto-pruning recall, right, like our
memories.
And yeah, well, that was fun. I was very entertained by that, especially
if you were just listening to us and you couldn't see the facial expressions
of the guy who's asking the questions to the AI model. I don't know which AI model
it was, I've never heard that particular voice before. Anyway, his expression was really
wonderful.
You acted that one out beautifully.
Yeah.
It was also clearly created by AI and not edited, because 5.2 has not been the premier model
since the training data stopped; many models have come out since then,
right?
Right, right, right. Or it's an old one.
Well, speaking of how AI can do more things more efficiently and with fewer tokens, I wanted
to talk a little bit about Andrej Karpathy's auto research.
Have you guys covered that at all?
No? Okay, well, this came out, I think he released it on the
eighth.
But essentially, so what he did was he open sourced a project called auto research.
His X post got 8.6 million views, and the GitHub repo has crossed
8,000 stars in the last couple of days.
And so the implications are, you know, pretty big for a weekend project.
So what he did was he created a 630-line Python script, and you give an
AI agent a language model training setup, but it's for a single GPU, and you just point
it at this training script and go on with your day.
So the agent modifies the code, runs a five minute training experiment, checks if the
model improved, keeps or discards the change and repeats.
So by morning, he had roughly 100 completed experiments and he didn't have to touch anything.
So the way he puts it is, the goal is to engineer your agents to make the fastest research
progress indefinitely and without any of your own involvement.
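Taken at face value, the loop he describes is simple enough to sketch. This is not Karpathy's actual script; propose_change() below stands in for the LLM agent that edits the training code, and the train.log format is an assumption:

```python
# Minimal sketch of the keep-or-discard loop described above, not
# Karpathy's actual code. propose_change() is a hypothetical stand-in
# for the coding agent; the train.log format is an assumption.
import shutil
import subprocess

def propose_change(src: str, dst: str) -> None:
    """Ask a coding agent to write a modified copy of src to dst.
    Placeholder: in the real loop this would be an LLM call."""
    shutil.copy(src, dst)  # no-op stand-in so the sketch runs end to end

def validation_loss(log_file: str = "train.log") -> float:
    """Parse the final validation loss from a training log (assumed format)."""
    with open(log_file) as f:
        return float(f.readlines()[-1].rsplit("val_loss=", 1)[-1])

def auto_research(train_script: str, n_experiments: int = 100) -> None:
    shutil.copy(train_script, "best.py")
    subprocess.run(["python", "best.py"], check=True)   # baseline (~5 min run)
    best = validation_loss()
    for i in range(n_experiments):
        propose_change("best.py", "candidate.py")       # agent edits the code
        subprocess.run(["python", "candidate.py"], check=True)
        loss = validation_loss()
        if loss < best:                                 # keep only improvements,
            best = loss                                 # ...discard the rest
            shutil.copy("candidate.py", "best.py")
        print(f"experiment {i}: val_loss={loss:.4f} best={best:.4f}")
```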
Now what happens next is pretty interesting.
So a company called Hyperspace AI distributed the loop across a peer-to-peer network.
And on the eighth, 35 autonomous agents ran 333 experiments unsupervised.
The interesting finding: the agents on powerful
H100 GPUs used brute force to find aggressive learning rates,
while agents on weaker laptop hardware got more clever; they focused on initialization
strategies and normalization choices.
So different hardware, different optimizations both seem to work well.
At the same time, Shopify CEO Tobi Lütke adapted the framework overnight and reported
a 19% improvement, with the agent-optimized smaller model outperforming a larger model
that had been configured manually.
So why does this matter?
This is AI doing AI research.
It closes the loop researchers have been theorizing about for years: AI improving the very training
code that produces better AI.
And because it's 630 lines, open source under the MIT license, and can run on a single
GPU, anyone can run it.
So not just big labs, anyone with a graphics card and a markdown file.
The human's role shifts from experimenter to experimental designer: you define the problem
and what "better" means, and the agent grinds through the search space while you sleep.
So in terms of ML research, developing better AI, you can have your AI developing better
versions of itself.
And you can do it on a single GPU.
That was the thing that shocked me the most when I read about it is that this is single
GPU ML research and anyone could do it.
Anyone. I can run it right now on my single GPU.
So I have some questions about what the actual setup is.
There's 630 lines of Python code.
That's a little package.
And then there's a GPU.
We've talked about those two components.
But we haven't talked about the agentic LLM that's working through that set of 630 lines of code.
What's actually being improved?
Is it the 630 lines of code that self-improve, or is it an LLM that's getting improved?
Like a local model that you're basically giving the task of running the 630 lines
to kind of move forward.
Right.
So the script is running the loop of ML experiments.
So whatever your experiment stack is for developing your LLM, or your other ML-based
AI model, is what this is running.
So it's doing that improvement loop, or that improvement stack, for all of
your experiments.
And so if you've done ML research, you're just going to have this huge list of experiments
that you're trying to run on whatever model you're trying to build, improving
it from there.
I think the key takeaway is that it's an AI self-improvement loop.
It's tiny, and it runs with relatively insignificant hardware requirements.
All right.
It's like a new fine-tuning approach for a model that you may have running locally.
Yeah, you'd just have a locally running model, because you can't change the parameters on,
you know, a cloud-based model unless you have an isolated instance somewhere in the cloud.
You know, am I right about that?
Yeah, you're not, like, you know, improving ChatGPT or a GPT 5.4 or whatever.
This is more for building your own model.
Yeah, so it's a tiny model, because, when I talked about it with Grok on the show
the other day, Grok referred to it as a toy lab, right?
So you're not even necessarily able to do something like a general language model,
although it may be that daisy-chaining and doing other things, like Jyunmi just said, makes
that more possible.
So in a follow-up post on Monday, March 9th, Andrej said: three days ago, I left auto research
tuning NanoChat for about two days on a depth=12 model,
and it found 20 changes that improved the validation loss.
He tested the changes, and all of them were additive and transferred to larger
depth=24 models, right?
So he's doing the research on a tiny model.
It's generalizable, so it can move to the larger model,
stacking up all of the changes.
Today, I measured that the leaderboard's time-to-GPT-2, which is an internal measure, I'm sure,
drops from 2.02 hours to 1.8 hours.
That's an 11% improvement, and the arithmetic holds: (2.02 − 1.8) / 2.02 ≈ 10.9%.
So that's the kind of result this could give you.
And then you can make the inferences, not AI inference,
you can make the assumptions, or carry the tests from the small model to the larger one,
and then see improvement being stacked.
And the real key for this is that he says: I'm very used to doing the iterative optimization
of neural network training manually.
You come up with ideas, you implement them, you check if they work (better validation loss),
you come up with new ideas based on that, and you repeat.
In leaving it alone, he was very surprised that, on top of the tuning
he'd already done over a good amount of time, this was so successful.
Cool. Yeah, Prometheus unbound.
Yeah, those first steps anyway, right?
We're on the slope, right?
I mean, when we talk about Prometheus, we're like, okay,
so it was this, and then, boom, it was super fast.
But there was a slow ramp-up to the curve.
You just didn't notice that it was the curve while the ramp-up was happening.
It's like, oh, this is interesting, why does that matter?
Oh, there's a lot of talk about self-improving agents, agents with persistent memory, right?
Those are the things that we're talking about.
That is the starting of the ascent before the boom, I think.
Do you agree, Andy?
I see that there's probably a tipping point coming, where the rapid acceleration of
context retention, in the form of persistent memory, and also self-improvement
reinforcement learning techniques, like the 630 lines of code for tuning a model,
combine to really rapidly advance the capabilities toward AGI.
And Anthropic, just four hours ago, introduced the Anthropic Institute on that note,
to facilitate and seed conversations about powerful AI. Quote: because we believe
being forewarned is being forearmed, the Anthropic Institute will tell the world what we are
seeing and expecting for the technology we build. It will lead new research into the challenges
posed by more powerful AI and partner with others to address them.
This is a little bit, you know, like a follow-up to their
relationship breakup with the Department of War, but also, I think that all the labs are kind of
saying, hey, everybody wake up, we just need to have a little bit more of a conversation about
what's possible now and what we see coming in the relative near future.
And as a weave off of that story, did you talk, have you talked about the OpenAI robotics
leader leaving there? No, we had it teed up, but yes, it's a relevant story.
So on Saturday, Caitlin Kalinowski, the executive leading OpenAI's hardware and robotics
team, publicly resigned. Her statement was direct: AI has an important role in
national security, but surveillance of Americans without judicial oversight and lethal autonomy
without human authorization are lines that deserve more deliberation than they got.
She clarified in a follow-up: my issue is that the announcement was rushed without the guardrails
defined. It's a governance concern first and foremost. So, a little bit of context.
In the days before, the Pentagon had been negotiating with Anthropic
about deploying AI on classified networks. Anthropic pushed for strict limits:
no mass domestic surveillance, no fully autonomous weapons. Those negotiations collapsed,
and then the Pentagon did something unprecedented. It designated Anthropic a supply chain risk.
That's not symbolic. It could force companies like Nvidia to sever commercial ties with Anthropic.
Anthropic says it will fight the designation in court, which it has now done. Then, very quickly,
OpenAI announced its own agreement allowing its technology in classified environments;
even CEO Sam Altman reportedly acknowledged the rollout appeared opportunistic and sloppy.
Kalinowski's departure is especially significant because she wasn't a policy person;
she led robotics and hardware. When the person running your robotics division resigns over
concerns about lethal autonomy, that's someone with a direct line of sight into what these technologies
can do saying the process wasn't right. An OpenAI spokesperson said the agreement
creates a workable path for responsible national security uses of AI,
with red lines against domestic surveillance and autonomous weapons.
The bigger picture: Anthropic's supply chain designation sends a chilling signal to any AI
company that might want to negotiate limits on military use, and Kalinowski's resignation
sends a different signal: the people inside these companies believe the pace of
deal-making is outrunning the pace of governance.
So, a significant move. As we've seen over the last months or year,
and this might be due to the limited talent pool, or because of the, you know,
enormous sums Meta has been throwing around, the leaders and talent of these programs
do have a spotlight on them. So anytime one of them makes a move, that becomes a
significant bit of news, and why they made the move becomes more significant as well.
So to have the head of OpenAI's robotics team say, hey, this is not done correctly,
and I'm out of here, I think is a big step in saying, okay, well, it's time to re-evaluate,
because what the people working on the stuff believe doesn't seem to align with what's
happening outside of their work, their direct purview, I guess.
So, Gwen asked a question here in the thread, saying: do you think there's something
going on with the type of boots on the ground there will be? And I think you're talking about
the possibility of autonomous killing weapons that are robotic, and we have those in a sense in
the form of autonomous drones that are pre-programmed to, you know, take action in, you know,
battle space, but it's a very fearsome concept of the idea of having, you know, really fast moving,
humanoid or dog type or other format robots being able to rapidly dominate the space that's
beyond the line of advance in a military confrontation, right? Or let's just say, what if China unleashed
autonomous killing robots that, you know, didn't have a whole lot of discrimination, but were just
designed to kill anything that looked like it was wearing a military uniform in Taiwan, right?
And just release them and let them go. That's really scary because these machines can move
much faster. You know, the things we've seen in Star Wars, you know, the drone armies marching
very slowly, that's not the way it's going to look, right? These things will be very fast. You've seen
the humanoid robot capabilities already, the dancers on stage; they can outperform
human dancers. You know, they're not going to be dumb droids. They're going to be very, very fast
and accurate, and it's going to be scary, and it's a whole new level of human confrontation that's
possible with this. So yeah, I think that someone working in robotics inside OpenAI saying,
no, I really can't sign up for this, that's an important demonstration of an ethical
red line that, you know, that person has against, you know, involving themselves in the
forward development of these kinds of capabilities. Right. I think also the boots on the ground
is referencing the talk that's coming from the Department of Defense and the president
about looking at boots on the ground in the war in Iran. Yeah.
Well, moving on from the doom and gloom, I wanted to just mention to everyone who's been waiting
for Google to come forward with additional tools that will improve their abilities to hopefully
catch up with and match the abilities of Claude in Excel. Google has just released what they
call context-aware generation across the entire Google workspace. So in docs and slides,
Gemini can create full drafts or design, you know, aligned presentations using your own files,
your calendar, and web data. It'll go out and search for you and then build these documents.
In Sheets, it will populate missing fields for you. It'll use web data to get that information
if necessary. It also has automation tools that build entire dashboards from a single prompt.
So you can now talk to Gemini. I've not tried this yet, but I'm grateful that Google is finally
coming forward with some additional tools in Sheets because I've tried to move away from Excel
and work only in Sheets in order to sort of centralize both my storage and my
context, as Google would say, around, you know, Sheets that are actually operating models,
you know, for various tasks that I do. So it'll do cross-file analysis, meaning it'll provide you
with answers and summaries with direct citations across your full set of documents, whether Sheets
or Google Docs or Slides. So you can start to ask Gemini things about, and ask it to act on
and change, those files in your Google app suite, now called Workspace.
Hmm, all right. And they also announced yesterday the embeddings
model, multimodal embeddings, which can process video, audio, text, and image,
and it specifically called out the ability to process PDFs, right?
So not just text and image, but text with image. The multimodal model is supposed to have enough
understanding of the various ways that information is communicated in those modalities
that it can do a comprehensive analysis. Yeah, so this is a capability that was developed at
DeepMind, Google's division for really advanced research. And what that enables is embedding, into a
retrieval database, all the different modalities. So you have video and audio, and anything that
you put into it is semantically clustered, so that retrieval of data works
across all of those different modes. And what is very interesting, and
Brian will be very interested in hearing this is that this allows developers to build search tools
that could find, for example, find the moment in video that matches this text description.
And so now you've got a single vectorized data store that has all the information semantically
coded in very high-dimensional space. I saw something about this new embedding model: it has
a clever way of reducing a 3,000-plus-dimension vector space down to about 1,500, which is the typical
maximum dimension for vector databases. So anyway, that's really an important
development, especially with respect to our current project, Bruno, which has a video search capability
built into it. We're currently using Gemini as a way of doing that, but I'm not certain,
you know, what the vector database is that Brian's implemented. So we'll have to
talk to him about that. But this is important to that project.
Yeah, man, it's the atomization process. The slight caveat to that is that it is $12 per one million
tokens of input. And I'm not entirely sure what one million tokens means here; we'll find out, because we're
totally going to do it for the show. But, like, what does a one-hour show like ours
cost in tokens to embed? Right. Because right now,
using the levels of Gemini that we're using, with the atomization that I do
afterwards, that is like under a dollar. And in fact, the atomization itself is, I don't know,
15 cents, maybe. But the other thing that I learned was that it looks like basically
you're paying the highest rate that you need. So if you are processing a video that has audio and a
screen that has an image with text on it, right, like a PowerPoint or something like that,
you're paying the $12-per-million cost once, not also paying separately for processing the
audio, not also paying for processing these other things. And I'm sure the semantic understanding of
what's happening on the screen comes with that as well. So more experiments to come.
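If the dimension reduction Andy mentions works the way Matryoshka-style embeddings usually do, shortening a vector is just truncation plus renormalization. A minimal sketch under that assumption (Google's exact method isn't specified in the episode):

```python
# Matryoshka-style shortening of an embedding: keep the leading
# dimensions and re-normalize. Assumes the model front-loads the most
# informative dimensions; Google's exact method is not confirmed here.
import numpy as np

def shorten_embedding(vec: np.ndarray, target_dim: int = 1536) -> np.ndarray:
    """Truncate a high-dimensional embedding and re-normalize to unit length."""
    short = vec[:target_dim]
    return short / np.linalg.norm(short)

full = np.random.randn(3072)            # stand-in for a 3,000+ dim embedding
full /= np.linalg.norm(full)
short = shorten_embedding(full, 1536)   # fits typical vector-DB dimension caps
print(short.shape)                      # (1536,)
```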
Yeah, always something new. And it really does bring up, Andy, what was that term about just waiting for the
technology? Oh, yeah. It's called the waiting equation, the waiting question: do you get
an overall more economical result, and a faster result, by waiting for the technology to catch up rather
than investing early in the cycle? But you should still be using the technology, right?
Rachel Woods had a post, I think on LinkedIn the other day, probably
a week or two ago, saying: I cannot imagine being the company or the person now
who's looking at what's happening and saying, you know, I'm still going to wait till it stabilizes,
right? I'm going to wait till everything's decided, and then I'm going to jump in.
And what a learning curve they would have. Oh, absolutely. Just getting started now,
as opposed to having a year in or more. Yeah, I think conceptually just understanding those
foundations is probably the biggest hurdle. I wonder, and I guess this would be a question for
the chat or any of our viewers later: if you are new to AI, what has been the largest hump, the
most difficult, you know, learning curve that you've had to experience? Because I still
remember, when I was first getting into it, it was just trying to deluge myself
in information, and then constantly learning more and more, and spending large portions of the
day just absorbing enough to then start making forward steps. So I'm wondering if that's still
the difficult part of getting into AI: that, okay, I need to do a lot of reading,
I need to do a lot of watching of YouTube videos and you know, I need to go back and watch
all 678 episodes of the daily AI show to catch up. Which you don't have to, but we'd like it if you did.
So, right, before I move on to AI in science, I just wanted to check if there were any other
stories that either of you wanted to cover. Yeah, I have something I can probably knock out in a
couple of minutes. This is around the idea that you get better results if you, as a
marshaler of all of your AI resources, involve multiple models in the approach to whatever it is
that you're doing. And I've talked about this on the show previously, about vibe coding and using
Gemini as a coach in working with Lovable's, you know, Claude-based coding agent. And the combination
of those two is better. And I've actually had a recent experience where Lovable actually did
something that Claude Code didn't quite fully understand. And I couldn't quite figure out, because
it's not transparent on Lovable, you know, what it was that Lovable's multi-agent approach was
doing that got to a better result than what Claude Code had proposed. And, you know, Claude acknowledged,
oh, that's a much better idea, right? That kind of thing. And they hadn't been talking to each other.
Well, there's a Boston-based startup called Collective IQ. Collective,
meaning multi-model. It started as a platform for enterprises in the procurement space.
And what the CEO of that procurement company felt, and he's very technically proficient,
obviously, because he started this startup that has a really incredible capability available now
to do this collective-IQ approach, was that it was really constraining to have to decide
on a single model: I'm going to be a ChatGPT person and I'm going to use that model. It was much
better. They got better results when they had this sort of collaboration happening across models.
So they unveiled this new product that they're offering to enterprises, called an AI consensus
platform. It's designed to simultaneously query ChatGPT, Claude, Gemini, Grok, and up to 10
additional large language models before synthesizing the result into a single annotated answer.
And it doesn't just go and get individual answers; it actually
coordinates a dialogue across those models. So one model can be acting in a role that
challenges another one, and also offers its own result. It then highlights for the user
where the models agree, flags the disagreements, and aims to reduce the hallucinations and bias
that plague any single-model approach to working with AI.
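The fan-out-and-synthesize pattern Andy describes is easy to sketch in outline. Collective IQ's actual API isn't shown in the episode, so ask() below is a hypothetical stand-in for whatever SDK each model uses:

```python
# Generic sketch of the fan-out-and-synthesize pattern. Not Collective
# IQ's real API; ask() is a hypothetical stand-in for per-vendor SDKs.
from concurrent.futures import ThreadPoolExecutor

MODELS = ["chatgpt", "claude", "gemini", "grok"]

def ask(model: str, prompt: str) -> str:
    """Hypothetical single-model call; returns a canned answer here.
    Swap in each vendor's real SDK call to make this do real work."""
    return f"({model}'s answer to: {prompt[:40]}...)"

def consensus(prompt: str) -> str:
    # 1) Fan out: query every model in parallel.
    with ThreadPoolExecutor() as pool:
        answers = dict(zip(MODELS, pool.map(lambda m: ask(m, prompt), MODELS)))
    # 2) Synthesize: one model merges the answers, flagging disagreements.
    digest = "\n\n".join(f"[{m}]\n{a}" for m, a in answers.items())
    return ask("claude", "Synthesize these answers into one annotated answer, "
                         "flagging where the models disagree:\n" + digest)

print(consensus("What is the best way to reduce hallucinations?"))
```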
So, I just wanted to say, this is why I have so many subscriptions to different AI models.
I don't want to give any of them up, because I feel like each one of them has something to offer.
Right, right, because each model has its own sort of expertise, or seems to work best in a given situation.
And we also know that, from giving one model what a different model has said,
you get a different level of conversation, right? The same as if you put five people in a room
to have a conversation about a topic, it's different than going out and doing five separate
conversations, right? There is an additive, maybe a little competitive, process of, like, no,
So, Andy, how would we best implement this? Would it be just an extra step in our questions
that we're asking these models? Well, I think what Claude Cowork has made possible for me is
eliminating the necessity of copying and pasting things from Gemini back into, you know,
Claude or vice versa. Instead, now Claude Cowork can look at all of my transaction and interaction
with lovable, for example. And I just say to what, when I want to get advice or perspective or
new plans from Claude Cowork, I just ask, using whisper flow, I ask Claude Cowork to look at,
you know, my most recent conversation and it will then scroll through the entire thing.
And it digests that very rapidly within 10 or 15 seconds and then says, okay, here's what I think
you should do. Or, you know, this is what I would do as an approach. And so, I'm, you know,
co-work to TLDRs, co-works automating the conversation with anything that's open in the browser for you.
Yeah. I just want to drop one more thing. One of the things that's very cool about
what's happening with social media is that I can read tips from people who posted
in a different language. And this comes from Deep Dive, who posted
originally in Korean. And basically what they're doing is telling Claude Code: scrape
all my Claude sessions stored on this computer, organize the tasks I do frequently, and
classify which ones would be good to put into skills, right, a repeated prompt collection, plug-ins,
external integrations, agents, autonomous agents, or CLAUDE.md. And then make those, right?
If it's a repetitive task, it can be automated. And it's auditing your own use, Andy, right? Like,
that's very much in line with what you're talking about. And how cool, right?
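A rough version of that audit can be done by hand. In the sketch below, the ~/.claude/projects location and the JSONL session format are assumptions about where Claude Code keeps local history; point it at wherever your logs actually live:

```python
# Minimal sketch of auditing your own local agent sessions for repeated
# tasks worth turning into skills. The ~/.claude/projects location and
# the JSONL layout are assumptions; adjust paths for your setup.
import json
from collections import Counter
from pathlib import Path

def first_user_prompts(root: Path):
    for session in root.rglob("*.jsonl"):
        for line in session.read_text().splitlines():
            try:
                msg = json.loads(line)
            except json.JSONDecodeError:
                continue
            if msg.get("type") == "user":
                yield str(msg.get("message", ""))[:80]
                break  # one prompt per session is enough for a rough audit

counts = Counter(first_user_prompts(Path.home() / ".claude" / "projects"))
for prompt, n in counts.most_common(10):
    print(f"{n:3d}x  {prompt}")  # frequent prompts = candidates for skills
```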
Yeah. No, that's super interesting. I'm always interested in the ways that I can
make my own day-to-day easier, or automated, I guess, in a sense. Just as a complete side note,
Andy, I know you were saying you use Wispr Flow. So I found an open-source project which I
really like, called Whisper Ring. It's not as polished as Wispr Flow, but it gets the job done,
and I've been using it this weekend. That's been an overall improvement on
my interaction. Though I am kind of a little irritated with Anthropic and Claude, because I keep
hitting my limits. What's going on? Shouldn't I be able to work all day without hitting the,
you know, limit? By the way, that's another one of the advantages of using multiple models.
So I'll often just pass it over to another model if I'm
fearing the approach of either a compaction or, you know, a strict limit.
It's been a while now, but I'm not spending hours at the terminal, my
workspace. I'm not spending hours at it. I get it in bits and pieces here; you
know, I get maybe a total of two hours a day, outside of all the other responsibilities I have.
But I can get a lot done in two hours, especially when I'm using multiple models and spreading
the token limits across multiple models. It's true. It's true. It's probably my own
fault for wanting to use Opus 4.6 thinking for every single question. And thinking high,
like, yeah, versus thinking medium. I was skating along on thinking medium and
had several conversations like: what? Why would you tell me that March 10th is a Tuesday?
Do you tell me that March 3rd is a Monday? Let's just talk about that, right? Like, let's just
have some perspective conversation. And the answer is training data, more than
doing date math. So, Jyunmi, you'll be interested to know, and perhaps you missed it because
we only mentioned it in passing on the show, but in benchmark performances
in agentic coding, Sonnet 4.6 matches or slightly exceeds Opus 4.6. So, at least in my use case,
I use Sonnet. I don't use Opus at all. Well, that's probably the comparison I need
to do. I've been doing a lot of replicating of systems or projects that I've created
in ChatGPT, and I've replicated them over into Claude to see what kind of
differences there are, and if I can get better answers, you know, through that process as well. Setting
that stuff up always takes more time, because you're asking these more in-depth questions,
like, oh, I have 300 source files to upload, and you can only take 20 at a time. Okay,
well, let's start absorbing. Yeah. Also, by the way, if you haven't already:
if you try to export your data from OpenAI, it took me three days to get it. So, just as a note.
What was it? It took you three days? Three days. Three days. For what? To get an
export of all my data from OpenAI. Oh, yeah. Okay. So just because there was so much data
there? Yeah. Yeah. I'm sure the process is also a low priority on their, you know,
list of things for compute. But just as a note, it could take you three days, or more,
because they don't make a promise on that. So if you're planning to move data over,
or do a backup of your data, know that the request may take three days
to process. But that's a side note. Okay. Unless there's anything else that you all want to cover,
I think it's time for AI and science... science... science. Okay. So today's AI in science story is
about something that sounds deceptively simple, but could change how an entire scientific discipline
works. Researchers at UC San Diego have built an AI agent that they're calling Zephyrus. It
can answer plain-English questions about weather and climate data. That might not sound
revolutionary at first, but to understand why it matters, you have to understand the problem
it's solving. So right now, if a climate scientist or meteorology student wants
to ask a question like, what was the average wind speed over the Gulf of Mexico last Tuesday,
they can't just, you know, ask. They have to write code, they have to know which data set to pull
from, how to query it, how to format the output, and how to interpret the results.
The barrier to entry isn't the science, it's the data engineering. So weather and climate
science sits on top of some of the most complex high volume data sets on the planet.
We're talking about constantly fluctuating streams of temperature, atmospheric pressure,
humidity, wind speed, and precipitation, all collected by surface stations, balloons, satellites,
radars, ocean buoys, ships, and aircraft. Making sense of all that data has always required
specialized technical skills that have nothing to do with understanding the atmosphere.
So what the UC San Diego team did is build an AI agent that acts as a bridge.
Zephyrus takes a natural language question, a question in plain English, translates it into code,
queries the relevant AI weather forecasting models, retrieves the data, runs the analysis,
and gives you back a plain-language answer. The researchers, including Duncan Watson-Parris from
the Scripps Institution of Oceanography and Rose Yu from the Department of Computer Science,
describe their goal as lowering the barrier to entry to analyzing critical data.
They specifically want students and early-career researchers to be able to interact with
these data sets without needing years of coding experience first. As Yu put it, Zephyrus is a
crucial step towards creating AI co-scientists that dramatically lower the barrier to entry,
allowing students and researchers everywhere to access and reason with critical weather
and climate data at unprecedented speeds.
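The bridge pattern described here, question to code to data to answer, can be sketched generically. This is not Zephyrus's actual implementation; generate_code() stands in for an LLM call, and the dataset's variable names are hypothetical:

```python
# Generic sketch of a natural-language-to-analysis bridge, not
# Zephyrus's real code. generate_code() stands in for an LLM call;
# the 'wind_speed' variable and coordinate names are hypothetical.
import xarray as xr

def generate_code(question: str) -> str:
    """Hypothetical LLM call turning a plain-English question into
    xarray code. Canned here for one wind-speed question."""
    return (
        "result = ds['wind_speed']"
        ".sel(time='2026-03-03')"
        ".sel(lat=slice(18, 30), lon=slice(-98, -80))"  # Gulf of Mexico box
        ".mean().item()"
    )

def answer(question: str, ds: xr.Dataset) -> str:
    code = generate_code(question)   # 1) question -> analysis code
    scope = {"ds": ds}
    exec(code, scope)                # 2) run the code against the data
    return f"Average: {scope['result']:.1f} m/s"  # 3) plain-language answer

# Usage: answer("Average wind speed over the Gulf last Tuesday?", ds)
```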
Now, it's important to be honest about what Zephyrus is right now. The team reports it performs
well on basic and intermediate tasks, but it still struggles with complex, multi-step queries.
This is a first-generation tool, not a finished product. They're presenting it at the
International Conference on Learning Representations, ICLR, in Rio de Janeiro this April.
But here's why this story is bigger than one research paper.
Zephyrus arrives at a moment when AI weather forecasting itself is going through a revolution.
Just last month, NOAA, the National Oceanic and Atmospheric Administration,
deployed a new suite of AI-driven global weather prediction models into operational use.
Their system, called AIGFS, can produce a 16-day weather forecast using just 0.3%
of the computing resources of the traditional Global Forecast System. And it finishes in about
40 minutes. To put that into perspective, the traditional system requires
vastly more supercomputer time to produce essentially the same forecast.
All right. The system was built on Google DeepMind's GraphCast model, then fine-tuned with NOAA's
own Global Data Assimilation System data. That additional training with NOAA's data
improved performance, especially when using GFS-based initial conditions. And it has already
shown results: significantly better tropical cyclone track predictions at longer lead times.
The trade-off is that intensity forecasting still needs improvement, which future versions
will address. But for track accuracy, which tells you where the hurricane is going, the
AI system is already outperforming the traditional model.
Yeah. We're with you, bro. Wow.
Let's see if I can get through this. Excuse me. Separately, researchers at the University of
Washington published a study showing something even more ambitious. They built an AI model
called DLESyM that can simulate a thousand years of Earth's current climate in just 12 hours,
running on a single processor. The same simulation on a state-of-the-art
supercomputer would take approximately 90 days. And the model didn't just run fast; it ran well.
The researchers compared its output to the four leading models from the IPCC's
Coupled Model Intercomparison Project, excuse me, I'm so sorry, which is the gold standard:
physics-based climate models that run on supercomputers and inform the Intergovernmental Panel on Climate
Change. DLESyM simulated tropical cyclones and the Indian summer monsoon better than those
leading models. In the mid-latitudes, it captured month-to-month and year-to-year weather variability
at least as well, including atmospheric blocking events. Those are the ridges that keep regions stuck
in heatwaves or cold snaps by deflecting incoming weather systems. The lead researcher,
Dale Durran, frames the tool's purpose this way: helping scientists answer whether a given
extreme weather event is the kind of thing that happens naturally within our current climate
or something that defies the odds. When we talk about 100-year storms that seem to happen
every few years now, this is the kind of tool that helps determine whether the climate itself
has shifted. Meanwhile, at Boston University, climate scientist Elizabeth Barnes, the university's
inaugural Dalton Family Chair in Environmental Data Science and Sustainability,
is working on a complementary piece of the puzzle. Her focus isn't just on making AI
predictions better, it's on making them trustworthy. So Barnes works on explainable AI,
cracking open the black box to understand how the models learn from the data and quantifying how
much certainty we should attach to any given prediction. As she puts it: uncertainty
quantification is huge in earth sciences. We do a lot of predictions. You can think of it as
weather forecasting, but we go out years to decades into the future. We don't know exactly what's
going to happen, so uncertainty has to be a part of what we produce. This matters because the AI
model that gives you a confident wrong answer about next week's hurricane track is worse than
no answer at all. The ability to say, here's our prediction, here's how confident we are,
and here's why, is what separates a useful tool from a dangerous one.
Okay. So, with all of these examples, what you have is a convergence across multiple fronts.
AI weather models are getting dramatically more accurate, outperforming
physics-based models that took decades to develop. They're getting dramatically faster,
from 90 days on a supercomputer to 12 hours on a single processor. They're getting dramatically
cheaper; NOAA's system uses 0.3 percent of the compute of the system it's supplementing.
And researchers are working on making the models explainable and trustworthy, not just fast.
But all of those capabilities are only useful if scientists can actually access and
interrogate the data they produce. That's the gap Zephyrus is trying to fill. Instead of writing code
to query a data set, you ask a question in English. Instead of spending hours on data wrangling,
you spend your time on the science. So the researchers see meteorology as a test case for
something much broader. Weather prediction, they note, is a perfect proving ground because it combines
large complex data sets that change over time with the need to reason about those data in plain
language. If it works here, the same approach could work in genetics, material science,
epidemiology, any field drowning in data that's hard to access.
As Watson-Parris puts it: we want to increase the speed at which we can reason about multimodal
data and learn about the Earth by making it easier for students and young scientists to interact
with different data sets. The tool isn't there yet. It handles the basic and intermediate queries,
but stumbles on complex ones. But the vision, that you should be able to talk to your scientific
data and have it talk back in a language you can understand, is one that could
reshape how research gets done across the sciences. And the timing is right. The AI models are generating
the data already; the computing infrastructure is ready, for the most part. What's been missing is
the interface between the scientists and the data and Zephyrus is the first step toward filling
that gap. So that is your AI and science story. AI in weather prediction and weather science
seems to be moving along at a steady clip. Zephyrus's original paper, I think, came out
in October of last year, and now they've moved on to presenting their actual model and
program in April. So not a lot of time has passed from the initial release of the paper to
putting it into circulation, if you'll pardon the pun.
I like the general conclusion that this is the development of the ability to talk naturally
to a data set that's very complex and changing continuously. And it's a more natural interface,
obviously, for humans to have an ongoing conversation about something in order to develop
understanding and action. And I think broadly that this beautiful example of how we're at the
edge of the application of AI to really complex data in weather forecasting is really an analog
for what has happened in the AI era since the emergence of ChatGPT, which is that now you can talk
to the computer. And it's a conversational dialogue. And now, with the latest frontier models,
you're talking to a computer that is much, much bigger and more knowledgeable than anything we
imagined before. The training that can be condensed into a single frontier model of only a trillion-
plus parameters is astounding. The level of knowledge, and now reasoning capability, that's
packed into that, that you can speak to naturally, and it can speak back to you in whatever
style or language you ask it to. That's really amazing to me, and I'm really happy that I
got to live to experience it. Yeah, strong. Well, all great topics that we've covered today.
And if you want to continue that conversation, please join our community at dailyaishowcommunity.com.
There you can continue that conversation and share all of your ideas. We're going to wrap it up
for today. I want to thank everybody who joined us in chat, and everyone who's watching
a little later. You can always catch us Monday through Friday, same bat time, same bat channel.
If you've enjoyed any of this, please make sure to click the
things that make the things. And remember to keep your minds and your hearts open. Aloha.

