
In this episode, we explore the new features and improvements in OpenAI's latest ChatGPT 5.4 model, highlighting its enhanced capabilities in coding, knowledge work, and professional applications. We also discuss practical use cases such as mid-response prompting and advanced online research, and compare it to previous versions and competing models.
Chapters
00:00 Introducing ChatGPT 5.4
01:16 Professional Capabilities and Scale
03:12 Benchmarks and Computer Use
08:04 Cool Features: Steerability & Research
10:25 Limitations and Regulatory Concerns
Links
OpenAI has just rolled out ChatGPT 5.4.
There are actually a couple of cool features in here that I'm really excited about, things I've been wishing ChatGPT could do, and they've finally launched them.
And of course, if you look at all of their marketing, it's basically them saying this is our most capable model yet.
Of course it's the most capable model; if it wasn't, what would they even be making an update for?
So I'm going to get past all of the hype and buzz from the launch and tell you some really interesting use cases and some ways I actually think GPT 5.4 is useful.
Before we get into all of that, if you want to try all of the latest models, go check out my startup, aibox.ai.
We have the latest models from the top 15 AI companies, everything from Grok to Gemini to Anthropic to OpenAI, ElevenLabs for audio, and tons of cool image generation models.
We have 50 models on the platform in total.
You can try all of them side by side, and it's only $8.99 a month, so much cheaper than ChatGPT, but you get way more models.
You can also use it to automatically create AI workflows that complete tasks for you.
There's a ton of cool stuff going on, so go check out aibox.ai if you want access to all of the top models for only $8.99 a month, and it's 20% off if you get an annual plan.
So there's a lot of cool stuff there.
All right, let's get into what's going on.
The first thing I want to mention here is that this is called GPT 5.4 Thinking.
There's a higher-performance variant known as GPT 5.4 Pro, but both of these together are designed to handle everything from complex analysis to a lot of coding and long-running workflows across different professional software tools.
They're dubbing this their professional work tool; they're trying to get it into the hands of more working professionals.
This comes right on the back of them signing a whole bunch of deals with different consulting firms that are allegedly going to get ChatGPT into more businesses and professional environments.
At the same time, they're locked in a battle over coding tools: Google is in this too, but it's really their Codex tool against Anthropic's Claude Code.
They're really trying to push forward in how software is built with AI models and how computer use is evolving.
So this is where they're really focusing.
One of the biggest changes here is the scale.
In the API, GPT 5.4 has a context window of up to a million tokens, which basically lets it work with huge documents, really long conversations, and big data sets.
If you think about it, a huge benefit is going to be coding, where it can look at bigger code bases to actually work with.
This was something Anthropic was really crushing, and OpenAI is trying to catch up.
OpenAI also says the model is specifically more token efficient, which is one thing I'm actually excited about: it can solve the same problems using a lot fewer tokens than GPT 5.2.
So your costs are going to come down, which is actually kind of cool if you already had 5.2 running in a piece of software; even if you don't, a lot of the software you use will see costs come down a lot.
It also gets a lot faster.
So costs come down and speed goes up, and for me, that's something I'm genuinely excited about.
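To make the token-efficiency point concrete, here's a tiny back-of-the-envelope sketch. The prices and token counts below are numbers I made up purely for illustration, not OpenAI's published pricing or measured usage:

```python
# Hypothetical illustration of token efficiency: same task, fewer tokens.
# The price and token counts below are invented for the example, not
# OpenAI's actual numbers.

PRICE_PER_1K_OUTPUT = 0.01  # hypothetical $ per 1,000 output tokens

def task_cost(output_tokens: int, price_per_1k: float = PRICE_PER_1K_OUTPUT) -> float:
    """Cost of a single response, given its output token count."""
    return output_tokens / 1000 * price_per_1k

# Suppose the old model needs 4,000 output tokens to solve a task,
# and the more token-efficient model needs 2,500 for the same answer.
old_cost = task_cost(4000)
new_cost = task_cost(2500)
savings = (old_cost - new_cost) / old_cost

print(f"old: ${old_cost:.3f}, new: ${new_cost:.3f}, saved {savings:.0%}")
```

Same answer, fewer tokens billed, so the per-task cost drops even if the per-token price stays the same.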
As far as the benchmarks go, I'm not going to sit here and nitpick the percentages, but I did want to talk about some interesting use cases and why these results matter.
Specifically, it's leading on a bunch of the better-known benchmarks.
One of those is coding, and of course we know why that's important right now, but another is computer use, and that's something I'm excited about.
I feel like Anthropic is really crushing computer use: the model can look at everything on your computer, go click on stuff, and get things done for you.
This is a use case I've been using a lot with Anthropic's Claude for Chrome browser extension.
Basically, it's a button you click that opens a side chat bar.
I go to really complex UIs or complex websites; I'm not a developer, but recently, for example, I had to do some stuff on Google Cloud to set up a tool I was vibe coding on Lovable, and I needed to beef up my back end so it could do some extra fancy stuff.
I didn't really understand anything Lovable was telling me I needed to do.
So I opened up the Claude sidebar, told it, look, I'm on my Google Cloud account, here are the instructions from Lovable, and it clicked around and set things up for me.
Now, should I have a real developer look over this?
We're going to throw caution to the wind for the time being, and I can hear all the developers screaming into their headphones right now.
But at the end of the day, it got it done, my software is now functioning, and I didn't have to watch a whole bunch of long YouTube tutorials on how to set up something complex (complex for me, anyway, because I have no idea how to do Google Cloud stuff).
So this is a really incredible use case for a lot of reasons, and I think OpenAI beefing up its computer-use capabilities is really exciting, because they're going to start competing more directly with Anthropic.
It's not like Anthropic is the only one working on this; OpenAI has been doing this for a long time with agents, but it feels like it's getting a lot better.
The other thing I'm excited about is that it's getting a lot better at knowledge work.
These are the kinds of things I think everybody uses it for, so we're going to see some meaningful incremental improvements here.
OpenAI's GDPVal benchmark checks tasks across 44 different occupations, so it's kind of showing you how different professionals can use this.
The model is beating or matching what an industry professional might give you in 83% of comparisons.
That's a really big jump from the roughly 71% GPT 5.2 was getting.
So going from GPT 5.2 to GPT 5.4, we jump from 71% to 83%, which basically means it's going to be a lot better for knowledge work; a 12-percentage-point jump is pretty significant.
On the coding benchmarks, specifically SWE-bench Pro (a software engineering benchmark), the model is slightly better than the last version.
That's good, but beyond getting slightly better, it's actually quite a bit faster.
If anybody has used these software tools (specifically, we use Claude Code at AI Box), my developer sends me screenshots of these really long, elaborate tasks it's doing on our back end, our code base.
I swear it's like a goal for him to see how long he can get Claude Code to run continuously without stopping on a project he gives it.
It's funny, because I'm vibe coding stuff on Lovable and usually get a response back in a minute or two, while he has it going for three and a half hours on a task.
So if this model is faster, I'm excited, because hopefully that three and a half hours gets cut down on some of the stuff we're working on.
I also think it's getting very good at real computer interaction.
There's a benchmark called OSWorld-Verified that basically evaluates how well an AI can operate a desktop environment: it pretty much takes a screenshot and then uses keyboard and mouse commands to go click on stuff.
Right now GPT 5.4 has about a 75% success rate.
I've used ChatGPT's agents; they're not perfect, they're not my go-to, and I don't use them that much, though I wish I could use them more.
I think Anthropic is doing better at this, but a 75% success rate means they are improving; it's better than GPT 5.2, even if I still don't think it's the best.
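That screenshot-in, keyboard-and-mouse-out loop the benchmark measures can be sketched as a simple cycle. Everything below is a stubbed toy of my own (all function and class names invented for illustration), not OpenAI's or Anthropic's actual agent code:

```python
# Minimal sketch of a screenshot -> decide -> act computer-use loop.
# All names here are placeholders made up for illustration; a real agent
# would wire these to an actual model and OS automation APIs.

from dataclasses import dataclass

@dataclass
class Action:
    kind: str       # "click", "type", or "done"
    payload: tuple  # coordinates for clicks, text for typing

def take_screenshot() -> bytes:
    # Stub: a real agent would capture the desktop here.
    return b"<screenshot>"

def decide_next_action(screenshot: bytes, goal: str, step: int) -> Action:
    # Stub: a real agent would send the screenshot plus the goal to the
    # model and parse its reply into an action. We fake a short script.
    script = [Action("click", (120, 340)), Action("type", ("hello",)), Action("done", ())]
    return script[min(step, len(script) - 1)]

def run_agent(goal: str, max_steps: int = 10) -> list[Action]:
    """Loop until the model says it's done or we hit the step budget."""
    history = []
    for step in range(max_steps):
        action = decide_next_action(take_screenshot(), goal, step)
        history.append(action)
        if action.kind == "done":
            break
        # A real agent would execute the click or keystroke here.
    return history

actions = run_agent("open the settings page")
print([a.kind for a in actions])  # -> ['click', 'type', 'done']
```

The 75% success rate is about how often a loop like this reaches "done" with the task actually accomplished, which is why the step budget and the quality of each decision both matter.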
There's also a major focus on how it's being used professionally.
OpenAI says the model is now significantly better at producing the kinds of deliverables people use in real work: spreadsheets, presentations, financial models, legal analysis, all of those.
They ran a bunch of different tasks, and on one performed by a junior investment banking analyst, it scored 87% compared to the 68% GPT 5.2 got.
Human evaluators also preferred its output about 68% of the time, saying it had better visuals and better structure.
So there's some cool stuff there.
Okay, cool features that you might actually use today.
This is the one I'm very excited about: what they're calling steerability.
It's available in the API too, which I think is crazy, but it's in ChatGPT.
When you're talking to ChatGPT, you can kind of see its reasoning, right?
It's thinking through some stuff, it puts a couple of steps down, and you realize it's going in the wrong direction.
Maybe you said, hey, I'm trying to find the best beach for surfing, and it goes, okay, looking at beaches in Kauai, and you think, oh crap, I'm in California, I don't want Kauai.
Then you can type a message, specifically "in California," mid-response, and it actually takes into account what you just said.
That's steerability: it incorporates your note into what it's looking at and into its reasoning, and gives you an updated response.
Basically, you can do mid-response prompts, and it will take them into account and adjust, giving you a better answer mid-response.
It's interesting because I think they did a couple of clever things here.
Normally when you ask a question, you have to wait for it to think and wait for it to reply; you sit there and you wait.
We all hate waiting, and if in the middle of waiting we're reading its line of reasoning and giving it more input and feedback, it feels like we did a lot less waiting.
We're really just reading and throwing in something, and it can get things done faster and better, rather than waiting for it to spit out the whole thing and then saying, okay, this is wrong, here's why, and here's what you should do instead.
You can do that in the middle of the response, which is really cool in my opinion.
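The surfing-beach example above can be simulated in a few lines: reasoning steps come out one at a time, and a correction injected mid-stream changes every step after it. This is purely a toy model of the behavior described in the episode, not OpenAI's actual API:

```python
# Toy simulation of mid-response steering: the "model" emits reasoning
# steps one at a time, and a user correction arriving mid-stream is
# folded into all remaining steps. Illustrative only, not a real API.

def reason(query: str, corrections: dict[int, str]) -> list[str]:
    """Produce reasoning steps; apply any correction queued for a step index."""
    context = query
    steps = []
    for i in range(3):
        if i in corrections:             # user typed something mid-response
            context += " | " + corrections[i]
        steps.append(f"step {i}: considering '{context}'")
    return steps

# Without steering, the model keeps heading toward Kauai.
baseline = reason("best surfing beach", corrections={})

# With steering, a note arriving before step 1 redirects it to California.
steered = reason("best surfing beach", corrections={1: "in California"})

print(steered[-1])  # the later steps now reflect the correction
```

The point is that the correction lands before the answer is finished, so the remaining "thinking" absorbs it instead of you waiting for a wrong answer and re-prompting from scratch.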
Something else I've focused on is online research.
Apparently it can search across a greater number of sources on the web, so instead of looking at one website, getting some data, and then looking at the next website, it searches a ton of sources at the same time and then follows leads across different pages.
It might get an idea from something it reads in one article, follow that to another article, and bounce around a lot more.
I know we've had deep research for a while, but it's doing that kind of deep research and combining all the information it gets into one coherent answer.
Basically, this is going to be more useful for the more complex questions where the information is scattered across a lot of different sites instead of sitting in one place.
It won't be relevant for every question you ask, but when you have a complex question, it's going to get you a more coherent answer quicker.
So this is great.
They also have all this, I don't know, fluff in their launch about how it hallucinates less and makes fewer factual errors and all that kind of stuff.
I don't think that's super important.
One thing we also heard is that it's going to turn you down less: if you ask a question, it's allegedly, according to them, less likely to refuse to answer.
However, our good friend Conor Grennan, who hosts the AI Applied podcast with me, tested it, and I saw a post he made on LinkedIn where he asked, is it true that an air bubble inside an IV could kill me?
Apparently it typed out the whole response to him, and then, just like we saw with DeepSeek, the Chinese censored model, where if you ask anything about Tiananmen Square it types out an answer and then it disappears with a "sorry, can't answer this," ChatGPT did the exact same thing.
This comes at a tricky moment, because New York right now is trying to pass legislation saying AI models can't answer questions about medical, health, legal, and a bunch of other areas; I think they're even trying to put hairstylists in there.
It's basically all of the different industries with regulatory capture; they just don't want people to be able to get answers for free.
So I'm, I don't know, kind of bummed about that legislation and that people are seriously considering it.
Anyway, it doesn't seem like the model is that much better on refusals, but maybe it's moving in a good direction; I'm not 100% sure.
It still feels like there are other models that are more the adult in the room, but you get pros and cons with those too.
Grok famously will answer any question you have about basically any of those topics, but there might be some other cons with Grok.
So, pros and cons to all of the models.
Thank you so much for tuning into the podcast today, guys.
If you enjoyed the episode, it would really help the show a ton if you left a rating and review wherever you listen to your podcasts.
Just drop me a note: say if you enjoyed it, say where you're from, say what topics are interesting to you.
I read all the reviews and all the comments, and it helps a ton.
Also, make sure you go check out aibox.ai if you want access to all of these latest models in one place, so you don't have to pay a $20 subscription to 10 different platforms.
It's $8.99 a month, and you get access to over 40 different AI models.
So go check it out, link in the description, aibox.ai, and I'll catch you guys all in the next episode.



