
Welcome to the podcast.
I'm your host, Jaden Schaefer.
Today on the show, I want to talk about OpenAI fighting back
against the giant onslaught of features Anthropic has been pushing,
some negative PR Anthropic has been getting,
and, at the same time, an incredible new tool called Claude Design
that just came out.
I also want to talk about where some VC dollars are going
in the AI space, some surprisingly interesting things there.
And a new term called token maxxing.
In addition, OpenAI just massively beefed up
Codex with desktop control, memory, an in-app browser,
and a new set of plug-in integrations.
Basically, this is them swinging directly
at Anthropic's Claude Code and Claude Cowork.
And I think it matters a lot for where coding agents
are going to be going in the future.
So let's get into it.
Before we do, I wanted to mention AI Box.
The thing I keep hearing from people
is that they're paying for ChatGPT,
Claude, Gemini, Perplexity, even Midjourney.
And by the time you add all of that up,
you're likely at $70 or $80 a month across a bunch
of different logins.
AI Box gives you access to over 80 different AI models
in one interface.
All of this is just $8.99 a month.
If you want to get access to it,
there's a link in the description to AIbox.ai.
And in addition, we have something
called the AI Box builder, where you can essentially
link together multiple AI models.
It builds the entire workflow for you,
and you can vibe-build tools without needing to know any code at all.
I'm not a developer.
I built this for other people that are not developers.
If you want to check it out, there's a link in the description
to AIbox.ai.
Okay, the first thing I want to talk about
is a company called Factory.
So this is an AI coding startup
that focuses specifically on enterprise engineering teams.
They just closed $150 million series A
at a $1.5 billion valuation.
Khosla Ventures led the round; Sequoia,
Insight Partners, and Blackstone
all participated.
The founder is named Matan Grinberg.
He was a physics PhD student at Berkeley.
He basically cold emailed Sequoia partner
Shaun Maguire in 2023.
And apparently they became good friends;
they bonded over physics research.
McGuire convinced him to drop out
and Sequoia seeded the company.
Their customer list already included Morgan Stanley,
Ernst & Young, and Palo Alto Networks.
So obviously this is very enterprise-focused;
you know, they're not targeting
individual developers in any way.
They're focused on the enterprise.
Grinberg's pitch, and why I think Factory
is kind of differentiated, is their model flexibility.
They basically let you switch
between Claude, DeepSeek, whatever makes sense.
Although honestly, Cursor does that too,
as do I think most of the serious players at this point.
What I think this does tell us is that even
with Anthropic and OpenAI,
and Cursor already in the market,
enterprise AI coding still has room
for some category specific players.
Morgan Stanley isn't gonna let some, you know,
random developer tool run inside their network
unless it's built with their compliance
and security posture in mind.
And I think that's basically the gap
that Factory is filling.
And I think their current $1.5 billion valuation
says that VCs believe there is a real gap here.
Of course, Claude Code is trying to get in there as well.
And you can look at things like Cognizant
just getting, you know, all of their employees,
350,000 of them, onto Claude and all of the Anthropic tools.
So I think there's probably competition
from a lot of players, but it's interesting
that they're carving out a niche there.
Okay, the next thing I wanna talk about
is a brand new tool from Anthropic called Claude Design.
It's a research preview right now.
It's available to Pro, Max, Team, and Enterprise subscribers.
And it's powered by Claude Opus 4.7,
the model that just came out a day or two ago.
So this is what Anthropic just shipped.
And basically you can describe what you want,
a pitch deck, a one-pager, a landing-page prototype,
and Claude generates a first draft.
You've probably seen it make web pages before.
So it's interesting because you can actually use Claude Design
to come up with the mockups ahead of time.
And then you can refine it by either directly editing it
or just talking to it.
I actually appreciate both of these options.
I've used a lot of tools like this,
and I don't wanna throw too much shade at Lovable,
because I know they do have some direct editing features.
It just never worked super well for me in the past.
Maybe they're better now.
But in the past,
Lovable would have something where you could
describe the website or whatever you're trying to build,
it would generate the design,
and then you were supposed to be able to click on it
and edit directly.
It never worked.
And sometimes when I went back to the chat after doing that,
it would undo my edits.
It was just kind of bad.
So I think Claude has cracked this a little bit better.
Maybe Lovable is there as well.
But you're able to export as PDFs, URLs, or PPTX files,
and you can send the outputs straight to Canva.
So Canva has a big integration with them,
and you can keep all of your collaboration there.
It can also read your company's code base
and design files to apply a consistent design system
across all of your outputs,
which is actually I think the more interesting piece
if you kind of look at this technically.
Anthropic is positioning this
as complementary to Canva.
They're saying,
we're not going to compete with them.
This is a complement.
The target audience is specifically people
who aren't designers.
So founders, PMs, startup operators
who need to make something look presentable really fast.
What I take from this is that Anthropic
is continuing to move up the stack. Earlier this year,
they launched Claude Cowork,
then agentic plugins for specific departments.
And now this.
I think where they're going with this
is that they're not just trying to be an API company.
They want to own actual workflows and surface area.
It's the same play that OpenAI has been making.
And I think you're going to hear more why this matters
when we go into the deep dive later on in the episode.
But also, just a shout-out to Google,
who's been doing this in basically every vertical.
Google has Stitch,
which is a very similar design tool as well.
So yeah, I think we're going to see a lot of these players
get more into the software itself
beyond just the models, which is pretty interesting.
Okay, the next thing we want to talk about is token maxxing.
So there's a funny story
that's been going around recently
about token maxxing.
Basically, it's the pattern of companies and developers
bragging about how many tokens
their AI coding tools burned through,
as if using more tokens
means that they're more productive.
I think the actual data that people have been crunching
tells a very different story.
So tools like Claude Code, Cursor, or Codex,
they're all generating way more accepted code
on the first pass, 80 to 90% initial acceptance rates.
But when you look at the same code two weeks later,
the effective acceptance drops to 10 to 30%
because engineers are constantly rewriting it, right?
So basically what's going on here is a lot of companies are like,
look, 50% of our code is written by Claude,
and it sounds amazing,
like it's all perfect and great.
And maybe that's true.
And I'm not saying that that's necessarily bad.
I use Claude Code heavily as well with my startup.
But what's interesting is that people treat it
as a marker of productivity,
and GitClear was looking at this specifically.
And they found that AI users have 9.4 times higher code churn
than non-AI users.
And Faros AI found code churn increased 861%
under high AI adoption.
And then you also had Jellyfish,
which was looking at 7,500 engineers
and found that the teams with the biggest token budgets
got two times the throughput at ten times the token cost,
which works out to roughly five times the cost per unit of throughput.
And I think basically it's just a reality check, right?
The productivity gains from AI coding are real,
but they're also a fraction of what the output number suggests,
right?
If you're like, I can write a million lines of code a day now
and I used to only be able to write a fraction of that,
OK, well, you definitely are getting real productivity gains.
I'm not arguing with that.
But I just think it's important to check ourselves:
the million lines of code are not all good,
because if you look at it three weeks later,
a big chunk of it has to be rewritten or fixed, which is fine.
I mean, a normal developer writes code and then rereads it, works
on it, optimizes it, and fixes it.
I think senior engineers, interestingly,
are less accepting of AI code than juniors,
probably because they know which parts are subtly wrong,
right?
So when there's a code push,
they're less likely to just accept it.
If you're a manager thinking about how to measure AI ROI,
I think counting what actually gets merged and shipped is important,
not just how much is generated.
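To make that concrete, here's a minimal sketch of the kind of measurement
I mean. To be clear, this is my own illustration, not the methodology
GitClear, Faros, or Jellyfish actually used: it just reads git's numstat
output and compares lines added a few weeks ago against lines deleted since
then, as a very rough churn proxy. The repo path and the two-week window
are assumptions you would tune.

```python
# Rough churn proxy (illustrative sketch only, not any vendor's method):
# compare lines added in an earlier window to lines deleted in a later one.
import subprocess

def numstat_totals(repo: str, since: str, until: str) -> tuple[int, int]:
    """Sum added/deleted line counts across all commits in the given window."""
    out = subprocess.run(
        ["git", "-C", repo, "log", f"--since={since}", f"--until={until}",
         "--numstat", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    added = deleted = 0
    for line in out.splitlines():
        parts = line.split("\t")
        # Binary files report "-" instead of counts; skip those lines.
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            added += int(parts[0])
            deleted += int(parts[1])
    return added, deleted

if __name__ == "__main__":
    repo = "."  # hypothetical: point this at the repo you want to measure
    added_earlier, _ = numstat_totals(repo, "4 weeks ago", "2 weeks ago")
    _, deleted_lately = numstat_totals(repo, "2 weeks ago", "now")
    churn = deleted_lately / added_earlier if added_earlier else 0.0
    print(f"lines added 2-4 weeks ago: {added_earlier}")
    print(f"lines deleted since then:  {deleted_lately}")
    print(f"rough churn ratio:         {churn:.2f}")
```

Deletions obviously aren't a perfect stand-in for AI code getting rewritten,
but even a crude number like this gets you closer to what actually survived
than a raw count of generated lines or tokens burned.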
OK.
Let's get into Physical Intelligence.
This is a robotics foundation model startup.
They just published research on a new model called pi 0.7.
And I think this might be the most novel kind
of technical story of the day.
But basically, what they're claiming right now
is that pi 0.7 can perform tasks
it was never specifically trained on by composing skills
it learned in other contexts.
So the example that they're highlighting for all of this
is an air fryer.
The robot had only briefly seen this air fryer
in training, I think in only two short clips,
so it wasn't trained to operate the air fryer.
Then they gave it a step-by-step verbal instruction.
And it figured out how to operate it.
And in some broader testing, the generalist model
actually matched specialized models on jobs
like making coffee, folding laundry, and assembling boxes.
Researchers said that the generalization ability
was really surprising to them.
So basically, if you train a robot
specifically to fold laundry, yeah, it does well.
And then they have a new model that's
just kind of a generalist at everything.
It's not trained to fold laundry,
but you explain step by step how to fold laundry,
and it does it just as well,
or almost just as well, as the robot
that was specifically trained on this.
That is fascinating.
And I think especially when you look at, you know,
the quote-unquote general models
that OpenAI and Anthropic are building,
which can do a lot of different things generally well,
it's kind of good news for them,
because you may not have to have models specifically trained
on just one specific task when you're talking about,
you know, physical robotics and stuff.
So on the business side, physical intelligence
has already raised over a billion dollars.
They were last valued at 5.6 billion.
They're reportedly in talks to nearly double that
to 11 billion.
Their co-founder Lachy Groom has a track record of backing
Figma, Notion, and Ramp.
So I think obviously this is why VC dollars
are going to Lachy.
There is a caveat that I'll put on this
if I'm trying to be honest:
basically, pi 0.7 still can't handle
a lot of multi-step tasks, right?
And it's not doing this autonomously without any coaching.
I think the robotics field doesn't really have
a lot of clean benchmarks like LLMs do.
You know, we have Humanity's Last Exam
and all these different benchmarks that we give AI models
on, you know, engineering and math and other areas,
and we can tell exactly how good they are at those tasks.
There's not a lot of that with robotics.
I'm sure there's going to be more as we get deeper into it.
So for robotics,
you kind of have to just trust whatever their demo shows.
But I think if this kind of generalized behavior
is going to be something that we're looking at,
it's a pretty significant step towards robots
that can actually work in really messy real-world environments.
So this is something I'll be closely watching
over the next six months or so.
Okay, OpenAI has just released
a whole bunch of new features for their desktop app,
which is called Codex, something I've tried in the past
but have opted for Anthropic's Claude Code over
in recent weeks, in the last month.
And I think OpenAI sees that, and they really want to make
a big push to win people back,
or to have people try OpenAI Codex for the first time.
So, huge upgrade to Codex.
I'll walk through a couple of things that are new,
because I think OpenAI is basically swinging directly
at Anthropic's Claude Code,
which honestly has been really crushing it, right?
So this is what they added.
First, Codex can now run in the background on your Mac,
which is phenomenal, right?
It can open up applications, it can click around,
it can type into your desktop
while you keep working on something else.
This is actually something I like.
Claude sort of does this,
but I'm going to be honest,
even with Claude Cowork, a lot of times
I have automated tasks running,
like I've got a bunch of things where I'm just like,
you know, every day at 9am do this,
every day at noon do this,
and it grabs analytics or grabs data
or goes and gets me a report on something.
Actually, the thing that I love using it for
is when there's no API for a service,
I'll just have it go log into the account,
grab the data I need, and bring it back to me.
Hopefully those companies offer APIs in the future,
but for now, that's what I do.
In any case, it is annoying with Claude Cowork
that lots of times, when those automated tasks
start happening,
this Chrome browser pops up on my screen.
If it can't do it in the background,
all of a sudden it's clicking on things
right in front of me, and I'm like swatting flies,
trying to get this thing out of the way
while I keep working on something different.
Should it have its own computer? Possibly,
but a lot of the time I just have it running on the side.
In any case,
OpenAI is trying to combat that
and have it work on things in the background.
So it's not just writing code in an editor,
it's actually operating your entire machine.
So this is what I'm excited about:
computer use from OpenAI,
which they have sort of done for a long time.
They were the OGs, way before Anthropic
was shipping things in this space,
but their agents really just felt stale and bad.
I've tried them a lot in the past.
And trust me, if OpenAI's agents back in the day,
like six months to a year ago,
were as good as what Anthropic is doing now,
I'd be shouting them from the rooftops,
but it seems like now they're making a comeback.
Codex can also run multiple agents in parallel
without interfering with your desktop,
which means that you can have one fixing a bug,
one running tests,
and one writing docs, all at the same time.
They also have a new in-app browser,
so it can hit web applications directly.
They have 11 plug-in integrations,
so CodeRabbit, GitHub, or GitLab issues.
They have a bunch of new exciting things
with their memory feature,
so it can remember previous sessions.
They have image generation that is now inside of Codex,
which, to be fair,
Claude does not have any sort of image generation.
They also rolled out pay-as-you-go pricing
specifically for enterprise and business customers.
So while I think Anthropic is definitely ahead right now,
I would say that the plug-in ecosystem
is probably one of the most underrated pieces
of this entire announcement,
because they have 11 different plug-ins at launch,
and they're gonna be adding more.
And with Claude Cowork, I mean, it's awesome,
but I have maybe four things connected,
you know, my Google Calendar and Chrome,
synced apps, a bunch of Google tools and GitHub.
But beyond that,
there are so many different tools I use
that don't integrate very well with it,
so it's gotta go and use my Chrome browser to access them.
So anyways, I think a lot of these integrations
that OpenAI is pulling in are going to be very useful.
All right, that is the show.
If you're getting value from these episodes,
please drop a comment over on Apple Podcasts
or leave a couple of stars over on Spotify.
You can hit the About tab on Spotify to drop a review.
Basically the reviews help the show reach way more people.
It boosts it in the algorithm.
It helps it out a ton.
If you haven't done it already,
I would be eternally grateful.
Also, if you wanna consolidate the AI subscriptions
you're already paying for,
go check out aibox.ai.
There is a link in the description,
80 plus models, plain English automations,
$8.99 a month. I'll catch you guys all in the next episode.
