
Get the top 40+ AI Models for $8.99 at AI Box: https://aibox.ai
AI Chat YouTube Channel: https://www.youtube.com/@JaedenSchafer
Join my AI Hustle Community: https://www.skool.com/aihustle
OpenAI has just rolled out GPT 5.4.
There's actually a couple of cool features in here
that I'm really excited about,
features I've been wishing ChatGPT
could do in the past,
and they finally launched them.
And of course, if you look at all of their marketing,
it's gonna just basically be them saying,
this is our most capable model yet.
And of course it's the most capable model;
if it wasn't, I mean,
what would they even be making an update for?
So I'm just gonna get past all of the hype
and all of the buzz from the launch,
and I'm gonna tell you some really interesting use cases
and some ways that I actually think GPT 5.4 is useful.
Before we get into all of that,
if you wanna try all of the latest models,
go check out my startup AIbox.ai.
We have the latest models from the top 15 different AI
companies, everything from Grok to Gemini,
to Anthropic to OpenAI, ElevenLabs for audio,
tons of cool image generation models.
I think there's over 50 models on the platform total.
You can try all of them side by side
and it's only $8.99 a month.
So much cheaper than ChatGPT,
but you get way more models.
And of course, you can also use it
to automatically create AI workflows
that complete tasks for you.
So there's a ton of cool stuff going on,
but go check out AIbox.ai if you wanna get access
to all of the top models for only $8.99 a month.
And it's 21% off if you get an annual plan as well.
So there's a lot of cool stuff there.
All right, let's get into what's going on.
The first thing I wanna mention here
is that this is called GPT 5.4 Thinking.
They have a higher performance variant
that is known as GPT 5.4 Pro,
but both of these together are designed
to handle everything from complex analysis
to a lot of coding
and long-running workflows
across a lot of different professional software tools.
And they're kind of dubbing this
as their professional work tool.
They're trying to get it, you know,
into the hands of more working professionals.
And this is coming right on the back of them
signing a whole bunch of deals
with a bunch of different consulting firms
that are gonna allegedly get ChatGPT
into more businesses and into the professional environment.
At the same time, they're locked in a battle,
you know, even Google's in this right now,
but really with Anthropic's Claude Code.
With their Codex tool, they're really trying to push forward
how software is built with AI models
and how computer use is going.
So this is where they're really focusing.
One of the biggest changes here
is basically the scale.
So in the API, GPT 5.4 has a context window
of up to a million tokens,
which basically lets it work with huge documents,
really big conversations, big data sets.
And really, I mean, if you think about this,
a huge benefit is going to be coding,
where you get bigger code bases
to actually work with.
So this was something that Anthropic was really crushing at,
and now OpenAI is trying to get into this.
OpenAI also says that the model is specifically more,
what they're saying is, token efficient,
and this is actually one thing that I'm excited about.
It can basically solve the same problems
using a lot fewer tokens than GPT 5.2,
so your costs are going to come down.
It's actually kind of cool.
If you already had 5.2 running in some software,
and even if you don't, a lot of the software you use will,
the costs come down a lot.
And it also gets a lot faster.
So the costs come down and the speed goes up.
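To make that token-efficiency point concrete, here's a rough back-of-the-envelope sketch in Python. All the prices and token counts in it are made-up placeholder numbers, not OpenAI's actual pricing; the mechanics are the point: fewer tokens per solved task means a lower bill.

```python
# Back-of-the-envelope look at what "token efficient" means for
# API costs. All prices and token counts below are made-up
# placeholder numbers, not OpenAI's actual pricing.

def task_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Dollar cost of one task at given per-million-token prices."""
    return (input_tokens / 1e6) * in_price_per_m \
         + (output_tokens / 1e6) * out_price_per_m

# Same task, same (hypothetical) prices, but the newer model
# reasons its way to the answer with fewer output tokens.
old_cost = task_cost(20_000, 12_000, in_price_per_m=2.0, out_price_per_m=8.0)
new_cost = task_cost(20_000, 7_000, in_price_per_m=2.0, out_price_per_m=8.0)

savings = 1 - new_cost / old_cost
print(f"old: ${old_cost:.3f}  new: ${new_cost:.3f}  saved: {savings:.0%}")
```

With these invented numbers, trimming the output tokens alone cuts the per-task cost by roughly a third, which is why efficiency gains matter even when per-token prices stay put.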
And so yeah, for me, this is something
I'm actually excited about.
So as far as how the benchmarks look,
I'm not trying to sit here
and nitpick the benchmark percentages,
but I did want to talk about some interesting use cases
and some reasons why they're good.
Specifically, it's kind of leading on a bunch
of the better-known benchmarks.
One of those is for coding.
Of course, we know why that's important right now,
but also computer use.
And this is something that I'm excited about right now.
I feel like Anthropic is really crushing with computer use.
Basically, you know, it can look at everything
on your computer and go click on stuff
and get stuff done for you.
This is a use case that I've been using a lot
with Anthropic's Claude for Chrome browser extension.
Basically, it's a button you click.
It opens a side chat bar,
and I use it on really complex UIs or complex websites.
I'm not a developer, but for example,
recently I had to do some stuff
on Google Cloud to set up a tool
that I was vibe coding on Lovable
and then needed to beef up my back end
so it could, you know, do some extra fancy stuff.
I didn't really understand anything that Lovable
was telling me I needed to do.
So I opened up the Claude sidebar and told it,
look, I'm in my, you know, my Google Cloud account,
here are the instructions from Lovable,
and it clicked around and set up some stuff for me.
Now, should I have a real developer look over this?
I mean, we're going to throw caution to the wind
for the time being and I hear all the developers screaming
into their headphones right now.
But at the end of the day, it got it done
and my software is now functioning
and I did not have to watch a whole bunch of long YouTube
tutorials on how to set up some complex,
I mean, for me, complex because I have no idea
how to code Google Cloud stuff.
So this is a really incredible use case for a lot of reasons.
And I think OpenAI beefing up
their capabilities in computer use is really exciting,
because they're going to start competing more directly
with Anthropic.
I mean, it's not like Anthropic is
the only one working on this.
OpenAI has been doing this for a long time with agents,
but it feels like it's getting a lot better.
Okay, the other one that I'm excited for
is they're getting a lot better at knowledge work.
And so I mean, these are kind of things
that I think everybody uses it for.
So this is something we're just going to see
some incremental improvements on.
On OpenAI's GDPVAL benchmark,
which basically checks tasks
across 44 different occupations,
so it kind of shows you how you can use this
as different professionals,
it is exceeding industry professionals
in 83% of comparisons.
So they're like, look, these are all the tasks
that people in all of these different professional industries
are doing,
and it's beating what an industry professional
might give you in 83% of these cases,
specifically, I think, for knowledge work.
And that's a really big jump from
the roughly 71% that GPT 5.2 was getting.
So upgrading from GPT 5.2 to GPT 5.4,
we're going from 71% to 83%.
It's basically just going to be a lot better
for knowledge work.
And by a lot better,
I mean we're seeing roughly a 12-point jump here,
which is pretty significant.
On some of the coding benchmarks,
like SWE-Bench Pro,
the software engineering benchmark,
the model is getting slightly better than the last version.
So I mean, this is good, but, you know,
beyond just getting slightly better,
it is actually quite a bit faster.
And if anybody has used a lot of these software tools,
well, we specifically use Claude Code at AI Box.
My developer sends me screenshots
of these really long, elaborate tasks
that it's doing on our back end, our code base.
And I swear it's like a goal for him
to see how long he can get Claude Code to run continuously
without stopping on a project.
It's funny because I'm, you know,
vibe coding stuff on Lovable
and I usually get a response back from Lovable
in like, you know, a minute or two.
He has it go for like three and a half hours doing a task.
So if this model gets faster,
I'm excited, because hopefully that three and a half hours
gets cut down on, you know,
some of the stuff that we're working on.
I think one of the things that it's also very good at
is real computer interaction.
There's a benchmark called OSWorld-Verified
that basically evaluates how well an AI can operate
a desktop environment.
It, you know, pretty much just
takes a screenshot and then uses keyboard
and mouse commands to go and click stuff.
Right now it has about a 75% success rate.
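For a rough picture of what a benchmark like that is exercising, here's a toy Python sketch of the screenshot-then-act loop. Everything in it is mocked up by me for illustration: the "screen" is a dict and the "model" is an if-statement; a real agent would send an actual screenshot to a model and execute real mouse and keyboard events.

```python
# A toy sketch of the screenshot -> action loop that benchmarks
# like OSWorld measure. Everything here is mocked: the "screen"
# is a dict and the "model" is an if-statement. A real agent
# would send an actual screenshot to a model and execute real
# mouse and keyboard events.

def take_screenshot(state):
    # Stand-in for grabbing the screen: report what's visible.
    return {"visible_buttons": list(state["buttons"]), "done": state["done"]}

def choose_action(screenshot, goal):
    # Stand-in for the model: click the goal button if on screen,
    # otherwise scroll to look for it.
    if goal in screenshot["visible_buttons"]:
        return ("click", goal)
    return ("scroll", None)

def run_agent(goal, state, max_steps=10):
    for _ in range(max_steps):
        shot = take_screenshot(state)
        if shot["done"]:
            return True
        action, target = choose_action(shot, goal)
        if action == "click" and target == goal:
            state["done"] = True           # clicking the goal finishes the task
        elif action == "scroll":
            state["buttons"].append(goal)  # scrolling reveals the button
    return state["done"]

success = run_agent("Submit", {"buttons": ["Cancel"], "done": False})
print("task completed:", success)
```

The observe-decide-act loop is the whole trick; that ~75% success rate is just how often a real model's version of `choose_action` picks well enough, over enough steps, to finish the task.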
I've used ChatGPT's agent.
It's not perfect.
It's actually not my go-to.
I don't use it that much.
I wish I could use it more.
I think Anthropic is doing better at this.
But at a 75% success rate, they are improving;
their success rate is up a bit
and it's better than GPT 5.2.
I still don't think it's the best.
There's a major focus on kind of how it is being used
professionally.
OpenAI says their model right now is significantly better
at basically giving the kind of deliverables
that people use in real work.
So things like spreadsheets, presentations,
financial models, legal analysis.
For all of those, they've done a bunch of different tasks,
including one set performed the way a junior investment
banking analyst would.
It got 87% compared to the 68% that GPT 5.2 got.
Human evaluators also preferred it about 68% of the time.
They said it had better visuals and better structure.
So there's some cool stuff.
Okay, cool features that you might actually use today.
This is the one I'm very excited about.
It has what they're calling steerability.
Basically, when you're talking to ChatGPT,
and it's available in the API too, which I think is crazy,
but it's in ChatGPT,
you can kind of see its reasoning, right?
Like, it's thinking through some stuff
and it puts a couple steps down
and you realize it's going in the wrong direction.
You know, maybe you're like,
hey, I'm trying to find the best beach for surfing.
And it's like, okay, looking at beaches in Kauai,
and you're like, oh crap, I'm in California,
I don't wanna see Kauai.
Then you can type a message like, specifically in California,
mid-response, and it actually takes into account
what you just said.
That's steerability: it's going to go and incorporate
that into what it's looking at and into its reasoning
and give you an updated response.
So basically you can do mid-response prompts
and it's gonna take those into account
and give you a better response.
So it's kind of interesting,
because I think they did a couple clever things here.
One of them is: when you ask a question,
you have to wait for it to think,
you have to wait for it to reply,
you sit there and you wait.
We all hate waiting, and so if in the middle of waiting
we're reading its line of reasoning
and we're giving it more input and more feedback,
it feels like we did a lot less waiting.
We're really just kind of reading
and throwing something in,
and it can get it done faster and better,
rather than having to wait for it to spit out the whole thing
and then be like, okay, this is wrong,
and here's why it's wrong,
and here's what you should do instead.
You can do that in the middle
of the response,
which is really cool in my opinion.
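Here's a toy Python sketch of that steer-mid-response pattern as I understand it. To be clear, this is not OpenAI's actual API or implementation; it's a mock I've invented of the idea: between reasoning steps, check whether the user typed a new note, and if so, re-plan the remaining steps on the fly.

```python
# A toy sketch of the steer-mid-response pattern. This is NOT
# OpenAI's actual API or implementation, just a mock of the idea:
# between reasoning steps, check whether the user typed a new note
# and, if so, re-plan the remaining steps.

def plan_steps(constraints):
    region = constraints.get("region", "Hawaii")  # made-up default
    return [f"Looking at beaches in {region}",
            "Checking surf reports",
            f"Ranking the top spots in {region}"]

def stream_with_steering(get_user_note):
    """get_user_note(i) returns a note typed during step i, or None."""
    constraints = {}
    emitted = []
    steps = plan_steps(constraints)
    i = 0
    while i < len(steps):
        emitted.append(steps[i])             # "stream" this reasoning step
        note = get_user_note(i)              # anything typed mid-response?
        if note:
            constraints["region"] = note
            steps = plan_steps(constraints)  # re-plan the rest on the fly
        i += 1
    return emitted

# The user sees "beaches in Hawaii" go by and types "California".
trace = stream_with_steering(lambda i: "California" if i == 0 else None)
for line in trace:
    print(line)
```

The steps already emitted stay as they were; only the remaining ones get re-planned, which matches the feel of correcting the model while it's still thinking instead of after the full answer lands.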
Something else they've kind of focused on right now
is online research.
Apparently it can search across
a greater number of sources on the web.
So instead of just, like,
okay, we're looking at this website,
getting some data, and now we're gonna look at that website,
it's gonna go and search a ton of pages
at the same time across the web,
and then it's gonna follow leads across different pages.
So it might get an idea from something it read in one article,
and it's gonna follow that
to another article and bounce around a lot more.
I know we've had Deep Research for a while,
but it's doing that deep-research kind of thing,
and it's gonna combine all the information
that it gets into one coherent answer.
So basically this is gonna be more useful
for some of the more complex questions
where the information is kind of scattered
across a lot of different sites
instead of sitting in one place.
Now, this isn't gonna be relevant
for every question you ask,
but sometimes when you have a complex question,
it's gonna be able to get you
a more coherent answer quicker.
So this is great.
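That fan-out-then-follow-leads idea can be sketched in Python like this. The tiny in-memory "web" is invented by me for illustration; a real system would fetch live pages, but the shape is the same: hit many pages in parallel, collect what you learn, then follow the links into the next hop.

```python
# A toy sketch of fan-out search plus following leads. The tiny
# in-memory "web" below is invented for illustration; a real
# system would fetch live pages.
from concurrent.futures import ThreadPoolExecutor

WEB = {
    "article-a": {"facts": ["surfing is big in CA"], "links": ["article-c"]},
    "article-b": {"facts": ["wetsuits help in cold water"], "links": []},
    "article-c": {"facts": ["Mavericks has huge waves"], "links": []},
}

def fetch(page):
    # Stand-in for an HTTP fetch.
    return WEB[page]

def research(start_pages, max_hops=2):
    seen, facts = set(), []
    frontier = list(start_pages)
    for _ in range(max_hops):
        new = [p for p in frontier if p not in seen]
        seen.update(new)
        with ThreadPoolExecutor() as pool:      # hit many pages at once
            results = list(pool.map(fetch, new))
        frontier = []
        for page in results:
            facts.extend(page["facts"])
            frontier.extend(page["links"])      # follow leads next hop
    return facts

print(research(["article-a", "article-b"]))
```

The second hop only exists because the first hop surfaced a lead, which is exactly the scattered-across-many-sites case where this kind of search beats reading one page at a time.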
They have all this, I don't know,
kind of fluff in their launch
about how it hallucinates less
and has fewer factual errors
and all this kind of stuff.
I don't think that's super important.
One thing that we also heard about it
is that it's gonna turn you down less.
So if you ask a question,
it's less likely, allegedly, according to them,
to refuse to answer.
However, our good friend Conor Grennan,
who hosts the AI Applied podcast with me,
tested it, and I saw a post he made on LinkedIn
where he asked it, is it true that an air bubble
inside of an IV could kill me?
And apparently it typed out
the whole response to him,
and then, just like we saw with DeepSeek,
the Chinese censored model,
where if you ask anything about Tiananmen Square,
it types it out and then it disappears
and it's like, sorry, can't answer this,
apparently ChatGPT did the exact same thing.
And this is kind of at a tricky moment,
because we're seeing New York right now
trying to pass legislation
saying AI models can't answer any questions
about medical, health, or legal topics.
Like, they have all of these different areas.
I think they're even trying to put hairstylists in there.
It's basically all of the
different industries with regulatory capture;
they just don't want people to be able to get the answers
for free.
So I'm, I don't know, kind of bummed about that legislation
and that people are seriously considering it.
Anyway, it doesn't seem like it's that much better,
but maybe it's moving in a good direction.
I'm not 100% sure.
It still feels like there are other models
that are more of the adult in the room,
but you also get pros and cons
with those models.
Grok famously is gonna answer any question you have
about basically any of those topics,
but there might be some other cons with Grok.
So, pros and cons to all of the models.
Thank you so much for tuning into the podcast today guys.
If you enjoyed the episode,
it would really help the show a ton
if you left it a rating and review
wherever you listen to your podcasts.
Just drop me a note, say if you enjoyed it,
say where you're from, say what topics are interesting to you.
I read all the reviews and all the comments.
It helps a ton.
Also, make sure you go check out AIbox.ai
if you want to get access to all of these latest models
in one place, so you don't have to pay a $20 subscription
to 10 different platforms.
It's $8.99 a month and you get access
to over 40 different AI models.
So go check it out, link in the description,
AIbox.ai.
I'll catch you guys all in the next episode.

ChatGPT: News on Open AI, MidJourney, NVIDIA, Anthropic, Open Source LLMs, Machine Learning
