
Welcome to the podcast.
I'm your host, Jaden Schaefer.
Today on the show, I want to talk about OpenAI fighting back
against the giant onslaught of features Anthropic has been pushing,
some negative PR Anthropic has been getting,
and, at the same time, an incredible new tool called Claude Design
that just came out.
I also want to talk about where some VC dollars are going
in the AI space, some surprisingly interesting things there.
And a new term called token maxxing.
In addition, OpenAI just massively beefed up
Codex with desktop control, memory, an in-app browser,
and a new set of plug-in integrations.
Basically, this is them swinging directly
at Anthropic's Claude Code and Claude Cowork.
And I think it matters a lot for where coding agents
are going to be going in the future.
So let's get into it.
Before we do, I wanted to mention AI Box.
The thing I keep hearing from people
is that they're paying for ChatGPT,
Claude, Gemini, Perplexity, even Midjourney.
And by the time you add all of that up,
you're likely at $70 or $80 a month across a bunch
of different logins.
AI Box gives you access to over 80 different AI models
in one interface.
All of this is just $8.99 a month.
If you want to get access to it,
there's a link in the description to AIbox.ai.
And in addition, we have something
called the AI Box builder, where you can essentially
link together multiple AI models.
It builds the entire workflow for you,
and you can vibe-build tools without needing to know any code at all.
I'm not a developer.
I built this for other people that are not developers.
If you want to check it out, there's a link in the description
to AIbox.ai.
Okay, the first thing I want to talk about
is a company called Factory.
So this is an AI coding startup
that focuses specifically on enterprise engineering teams.
They just closed $150 million series A
at a $1.5 billion valuation.
Khosla Ventures led the round; Sequoia,
Insight Partners, and Blackstone
all participated.
The founder is named Matan Grinberg.
He was a physics PhD student at Berkeley.
He basically cold emailed Sequoia partner
Shaun Maguire in 2023.
And apparently they became good friends;
they bonded over physics research.
McGuire convinced him to drop out
and Sequoia seeded the company.
Their customer list already included Morgan Stanley,
Ernst & Young, and Palo Alto Networks.
So obviously this is very enterprise-focused;
you know, they're not targeting
individual developers in any way.
They're focused on the enterprise.
Grinberg's pitch, and why I think Factory
is kind of differentiated, is their model flexibility.
They basically let you switch
between Claude, DeepSeek, whatever makes sense.
Although honestly, Cursor does that too,
as do I think most of the serious players at this point.
What I think this does tell us is that even
with Anthropic and OpenAI,
and Cursor already in the market,
enterprise AI coding still has room
for some category specific players.
Morgan Stanley isn't gonna let some, you know,
random developer tool run inside their network
unless it's built with their compliance
and security posture in mind.
And I think that's basically the gap
that Factory is filling.
And I think their current $1.5 billion valuation
says that VCs believe there is a real gap here.
Of course, Claude Code is trying to get in there as well.
And you can look at things like Cognizant
just getting, you know, all of their employees,
350,000 of them, onto Claude and all of the Anthropic tools.
So I think there's probably competition
from a lot of players, but it's interesting
that they're carving out a niche there.
Okay, the next thing I wanna talk about
is a brand new tool from Anthropic called Claude Design.
It's a research preview right now.
It's available to Pro, Max, Team, and Enterprise subscribers.
And it's powered by Claude Opus 4.7,
the model that just came out a day or two ago.
So this is what Anthropic just shipped.
And basically you can describe what you want,
a pitch deck, a one-pager, a landing-page prototype,
and Claude generates a first draft.
You've probably seen it make web pages before.
So it's interesting because you can actually use Claude Design
to come up with the mockups ahead of time.
And then you can refine it by either directly editing it
or just talking to it.
I actually appreciate both of these options.
I've used a lot of tools like this,
and I don't wanna throw too much shade at Lovable,
because I know they do have some direct editing features.
It just never worked super well for me in the past.
Maybe they're better now.
But in the past,
Lovable would have something where you could
describe the website or whatever you're trying to build,
it would generate the design,
and then you were supposed to be able to click on it
and edit directly.
It never worked.
And sometimes when I went back to the chat after doing that,
it would undo my edits.
It was just kind of bad.
So I think Claude has cracked this a little bit better.
Maybe Lovable is there as well.
But you're able to export as PDFs, URLs, or PPTX files,
and you can send the outputs straight to Canva.
So Canva has a big integration with them,
and you can keep all of your collaboration there.
It can also read your company's code base
and design files to apply a consistent design system
across all of your outputs,
which is actually I think the more interesting piece
if you kind of look at this technically.
Anthropic is positioning this
as complementary to Canva.
They're saying,
we're not going to compete with them.
This is a complement.
The target audience is specifically people
who aren't designers.
So founders, PMs, startup operators
who need to make something look presentable really fast.
What I take from this is that Anthropic
is continuing to move up the stack. Earlier this year,
they launched Claude Cowork,
then agentic plugins for specific departments.
And now this.
I think where they're going with this
is that they're not just trying to be an API company.
They want to own actual workflows and surface area.
It's the same play that OpenAI has been making.
And I think you're going to hear more why this matters
when we go into the deep dive later on in the episode.
But also, just a shout-out to Google,
who's been doing this in basically every vertical.
Google has Stitch,
which is a very similar design tool as well.
So yeah, I think we're going to see a lot of these players
get more into the software itself
beyond just the models, which is pretty interesting.
Okay, the next thing we want to talk about is token maxxing.
So there's a funny story
that's been going around recently
about token maxxing.
Basically, it's the pattern of companies and developers
bragging about how many tokens
their AI coding tools burned through,
as if using more tokens
means that they're more productive.
I think the actual data that people have been crunching
tells a very different story.
So tools like Claude Code, Cursor, or Codex,
they're all generating way more accepted code
on the first pass, 80 to 90% initial acceptance rates.
But when you look at the same code two weeks later,
the effective acceptance drops to 10 to 30%
because engineers are constantly rewriting it, right?
So basically what's going on here is a lot of companies are like,
look, 50% of our code is written by Claude,
and it sounds amazing,
like it's all perfect and great.
And maybe that's true.
And I'm not saying that that's necessarily bad.
I use Claude Code heavily as well with my startup.
But what's interesting is that people treat it
as a marker of productivity,
and GitClear was looking at this specifically.
And they found that AI users have 9.4 times higher code churn
than non-AI users.
And Faros AI found code churn increased 861%
under high AI adoption.
And then you also had Jellyfish,
which was looking at 7,500 engineers
and found that the teams with the biggest token budgets
got two times the throughput at ten times the token cost,
which works out to roughly five times the cost per unit of throughput.
And I think basically it's just a reality check, right?
The productivity gains from AI coding are real,
but they're also a fraction of what the output number suggests,
right?
If you're like, I can write a million lines of code a day now
and I used to only be able to write a fraction of that,
OK, well, you definitely are getting real productivity gains.
I'm not arguing with that.
But I just think it's important to check ourselves:
the million lines of code are not all good,
because if you look at it three weeks later,
a big chunk of it has to be rewritten or fixed, which is fine.
I mean, a normal developer writes code and then rereads it, works
on it, optimizes it, and fixes it.
I think senior engineers, interestingly,
are less accepting of AI code than juniors,
probably because they know which parts are subtly wrong,
right?
So when there's a code push,
they're less likely to just accept it.
If you're a manager thinking about how to measure AI ROI,
I think counting what actually gets merged and shipped is important,
not just how much is generated.
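To make that concrete, here's a minimal sketch of the kind of measurement
I mean. To be clear, this is my own illustration, not the methodology
GitClear, Faros, or Jellyfish actually used: it just reads git's numstat
output and compares lines added a few weeks ago against lines deleted since
then, as a very rough churn proxy. The repo path and the two-week window
are assumptions you would tune.

```python
# Rough churn proxy (illustrative sketch only, not any vendor's method):
# compare lines added in an earlier window to lines deleted in a later one.
import subprocess

def numstat_totals(repo: str, since: str, until: str) -> tuple[int, int]:
    """Sum added/deleted line counts across all commits in the given window."""
    out = subprocess.run(
        ["git", "-C", repo, "log", f"--since={since}", f"--until={until}",
         "--numstat", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    added = deleted = 0
    for line in out.splitlines():
        parts = line.split("\t")
        # Binary files report "-" instead of counts; skip those lines.
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            added += int(parts[0])
            deleted += int(parts[1])
    return added, deleted

if __name__ == "__main__":
    repo = "."  # hypothetical: point this at the repo you want to measure
    added_earlier, _ = numstat_totals(repo, "4 weeks ago", "2 weeks ago")
    _, deleted_lately = numstat_totals(repo, "2 weeks ago", "now")
    churn = deleted_lately / added_earlier if added_earlier else 0.0
    print(f"lines added 2-4 weeks ago: {added_earlier}")
    print(f"lines deleted since then:  {deleted_lately}")
    print(f"rough churn ratio:         {churn:.2f}")
```

Deletions obviously aren't a perfect stand-in for AI code getting rewritten,
but even a crude number like this gets you closer to what actually survived
than a raw count of generated lines or tokens burned.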
OK.
Let's get into Physical Intelligence.
This is a robotics foundation model startup.
They just published research on a new model called pi 0.7.
And I think this might be the most novel kind
of technical story of the day.
But basically, what they're claiming right now
is that pi 0.7 can perform tasks
it was never specifically trained on by composing skills
it learned in other contexts.
So the example that they're highlighting for all of this
is an air fryer.
The robot had only briefly seen this air fryer
in training, I think in only two short clips,
so it wasn't trained to operate the air fryer.
Then they gave it a step-by-step verbal instruction.
And it figured out how to operate it.
And in some broader testing, the generalist model
actually matched specialized models on jobs
like making coffee, folding laundry, and assembling boxes.
Researchers said that the generalization ability
was really surprising to them.
So basically, if you train a robot
specifically to fold laundry, yeah, it does well.
And then they have a new model that's
just kind of a generalist at everything.
It's not trained to fold laundry,
but you explain step by step how to fold laundry,
and it does it just as well,
or almost just as well, as the robot
that was specifically trained on this.
That is fascinating.
And I think especially when you look at, you know,
the quote-unquote general models
that OpenAI and Anthropic are building,
which can do a lot of different things generally well,
it's kind of good news for them,
because you may not have to have models specifically trained
on just one specific task when you're talking about,
you know, physical robotics and stuff.
So on the business side, physical intelligence
has already raised over a billion dollars.
They were last valued at 5.6 billion.
They're reportedly in talks to nearly double that
to 11 billion.
Their co-founder Lachy Groom has a track record of backing
Figma, Notion, and Ramp.
So I think obviously this is why VC dollars
are going to Lachy.
There is a caveat that I'll put on this
if I'm trying to be honest:
basically, pi 0.7 still can't handle
a lot of multi-step tasks, right?
And it's not doing this autonomously without any coaching.
I think the robotics field doesn't really have
a lot of clean benchmarks like LLMs do.
You know, we have Humanity's Last Exam
and all these different benchmarks that we give AI models
on, you know, engineering and math and other areas,
and we can tell exactly how good they are at those tasks.
There's not a lot of that with robotics.
I'm sure there's going to be more as we get deeper into it.
So for robotics,
you kind of have to just trust whatever their demo shows.
But I think if this kind of generalized behavior
is going to be something that we're looking at,
it's a pretty significant step towards robots
that can actually work in really messy real-world environments.
So this is something I'll be closely watching
over the next six months or so.
Okay, OpenAI has just released
a whole bunch of new features for their desktop app,
which is called Codex, something I've tried in the past
but have opted for Anthropic's Claude Code over
in recent weeks, in the last month.
And I think OpenAI sees that, and they really want to make
a big push to win people back,
or to have people try OpenAI Codex for the first time.
So, huge upgrade to Codex.
I'll walk through a couple of things that are new,
because I think OpenAI is basically swinging directly
at Anthropic's Claude Code,
which honestly has been really crushing it, right?
So this is what they added.
First, Codex can now run in the background on your Mac,
which is phenomenal, right?
It can open up applications, it can click around,
it can type into your desktop
while you keep working on something else.
This is actually something I like.
Claude sort of does this,
but I'm going to be honest,
even with Claude Cowork, a lot of times
I have automated tasks running,
like I've got a bunch of things where I'm just like,
you know, every day at 9am do this,
every day at noon do this,
and it grabs analytics or grabs data
or goes and gets me a report on something.
Actually, the thing that I love using it for
is when there's no API for a service,
I'll just have it go log into the account,
grab the data I need, and bring it back to me.
Hopefully those companies offer APIs in the future,
but for now, that's what I do.
In any case, it is annoying with Claude Cowork
that lots of times, when those automated tasks
start happening,
this Chrome browser pops up on my screen.
If it can't do it in the background,
all of a sudden it's clicking on things
right in front of me, and I'm like swatting flies,
trying to get this thing out of the way
while I keep working on something different.
Should it have its own computer? Possibly,
but a lot of the time I just have it running on the side.
In any case,
OpenAI is trying to combat that
and have it work on things in the background.
So it's not just writing code in an editor,
it's actually operating your entire machine.
So this is what I'm excited about:
computer use from OpenAI,
which they have sort of done for a long time.
They were the OGs, way before Anthropic
was shipping things in this space,
but their agents really just felt stale and bad.
I've tried them a lot in the past.
And trust me, if OpenAI's agents back in the day,
like six months to a year ago,
were as good as what Anthropic is doing now,
I'd be shouting them from the rooftops,
but it seems like now they're making a comeback.
Codex can also run multiple agents in parallel
without interfering with your desktop,
which means that you can have one fixing a bug,
one running tests,
and one writing docs, all at the same time.
They also have a new in-app browser,
so it can hit web applications directly.
They have 11 plug-in integrations,
so CodeRabbit, GitHub, or GitLab issues.
They have a bunch of new exciting things
with their memory feature,
so it can remember previous sessions.
They have image generation that is now inside of Codex,
which, to be fair,
Claude does not have any sort of image generation.
They also rolled out pay-as-you-go pricing
specifically for enterprise and business customers.
So while I think Anthropic is definitely ahead right now,
I would say that the plug-in ecosystem
is probably one of the most underrated pieces
of this entire announcement,
because they have 11 different plug-ins at launch,
and they're gonna be adding more.
And with Claude Cowork, I mean, it's awesome,
but I have maybe four things connected,
you know, my Google Calendar and Chrome,
synced apps, a bunch of Google tools and GitHub.
But beyond that,
there are so many different tools I use
that don't integrate very well with it,
so it's gotta go and use my Chrome browser to access them.
So anyways, I think a lot of these integrations
that OpenAI is pulling in are going to be very useful.
All right, that is the show.
If you're getting value from these episodes,
please drop a comment over on Apple Podcasts
or leave a couple of stars over on Spotify.
You can hit the About tab on Spotify to drop a review.
Basically the reviews help the show reach way more people.
It boosts it in the algorithm.
It helps it out a ton.
If you haven't done it already,
I would be eternally grateful.
Also, if you wanna consolidate the AI subscriptions
you're already paying for,
go check out aibox.ai.
There is a link in the description,
80 plus models, plain English automations,
$8.99 a month. I'll catch you guys all in the next episode.
