
In this episode, we explore the new features and improvements in OpenAI's latest ChatGPT 5.4 model, highlighting its enhanced capabilities in coding, knowledge work, and professional applications. We also discuss practical use cases such as mid-response prompting and advanced online research, and compare it to previous versions and competing models.
Chapters
00:00 Introducing ChatGPT 5.4
01:16 Professional Capabilities and Scale
03:12 Benchmarks and Computer Use
08:04 Cool Features: Steerability & Research
10:25 Limitations and Regulatory Concerns
Hello, it is Ryan, and we could all use an extra bright spot in our day, couldn't we? Just to make up for things like sitting in traffic, doing the dishes, counting your steps, you know, all the mundane stuff. That is why I'm such a big fan of Chumba Casino. Chumba Casino has all your favorite social casino-style games that you can play for free anytime, anywhere, with daily bonuses. So sign up now at ChumbaCasino.com. That's ChumbaCasino.com.
OpenAI has just rolled out ChatGPT 5.4. There are actually a couple of cool features in here that I'm really excited about, things I've been wishing ChatGPT could do in the past, and they finally launched them. And of course, if you look at all of their marketing, it's basically them saying, this is our most capable model yet. Of course it's the most capable model; if it wasn't, what would they even be making an update for? So I'm going to get past all of the hype and all of the buzz from the launch, and I'm going to tell you some really interesting use cases and some ways that I actually think GPT 5.4 is useful.

Before we get into all of that, if you want to try all of the latest models, go check out my startup AIbox.ai. We have the latest models from the top 15 different AI companies, everything from Grok to Gemini to Anthropic to OpenAI, ElevenLabs for audio, and tons of cool image generation models. I think there are over 50 models on the platform total. You can try all of them side by side, and it's only $8.99 a month. So much cheaper than ChatGPT, but you get way more models. And of course, you can also use it to automatically create AI workflows that complete tasks for you. So there's a ton of cool stuff going on. Go check out AIbox.ai if you want access to all of the top models for only $8.99 a month, and it's 20% off if you get an annual plan. So there's a lot of cool stuff there.
All right, let's get into what's going on. The first thing I want to mention is that this model is called GPT 5.4 Thinking, and there's a higher-performance variant known as GPT 5.4 Pro. Together, these are designed to handle everything from complex analysis to heavy coding to long-running workflows across different professional software tools. They're dubbing this their professional work model; they're trying to get it into the hands of more working professionals. And this is coming right on the back of them signing a whole bunch of deals with different consulting firms that are allegedly going to get ChatGPT into more businesses and into professional environments. At the same time, they're locked in a battle over coding tools; even Google is in this right now, but it's really OpenAI's Codex going up against Anthropic's Claude Code. They're pushing hard on how software is built with AI models and on computer use. So this is where they're really focusing.

One of the biggest changes here is scale. In the API, GPT 5.4 has a context window of up to a million tokens, which lets it work with huge documents, really long conversations, and big data sets. If you think about it, a huge benefit is going to be coding, where you can pull in bigger code bases to actually work with. This was something Anthropic was really crushing at, and now OpenAI is trying to get into it. OpenAI also says the model is specifically more token efficient, which is one thing I'm actually excited about. Basically, it can solve the same problems using a lot fewer tokens than GPT 5.2, so your costs are going to come down. If you already had 5.2 running in a piece of software, and even if you don't, a lot of the software you use will, the costs come down a lot for that, and it also gets a lot faster. So the costs come down and the speed goes up. And so yeah, for me, this is something
I'm actually excited about. So, as far as how the benchmarks look, I'm not trying to sit here and nitpick benchmark percentages, but I did want to talk about some interesting use cases and why these numbers matter. Specifically, it's leading on a bunch of the better-known benchmarks. One of those is coding, and of course we know why that's important right now. Another is computer use, which is something I'm excited about. I feel like Anthropic is really crushing it with computer use right now. Basically, the model can look at everything on your screen, go click on stuff, and get things done for you. This is a use case I've been leaning on a lot with Anthropic's Claude for Chrome browser extension. It's a button you click that opens a side chat bar, and I use it on really complex UIs and websites. I'm not a developer, but recently, for example, I had to do some stuff in Google Cloud to set up a tool I was building on Lovable, and I needed to beef up my back end so it could do some extra fancy stuff. I didn't really understand anything Lovable was telling me I needed to do. So I opened the Claude sidebar and said, look, I'm in my Google Cloud account, here are the instructions from Lovable. And it clicked around and set things up for me. Now, should I have a real developer look this over? We're going to throw caution to the wind for the time being, and I can hear all the developers screaming into their headphones right now. But at the end of the day, it got it done, my software is now functioning, and I did not have to watch a bunch of long YouTube tutorials on how to set up something complex. Complex for me, anyway, because I have no idea how to do Google Cloud stuff. So this is a really incredible use case for a lot of reasons, and I think OpenAI beefing up their computer use capabilities is exciting because they're going to start competing more directly with Anthropic. It's not like Anthropic is the only one working on this; OpenAI has been doing it for a long time with agents, but it feels like it's getting a lot better.

Okay, the other thing I'm excited for is that they're getting a lot better at knowledge work. These are the kinds of tasks I think everybody uses it for, so this is somewhere we're going to see incremental improvements. On OpenAI's GDPval benchmark, which checks tasks across 44 different occupations, so it kind of shows you how different professionals can use this, the model is matching or exceeding industry professionals in 83% of comparisons. They're saying, look, these are the tasks people across these professional industries are doing, and it's beating what an industry professional might give you in 83% of these cases, specifically for knowledge work. That's a really big jump from the roughly 71% that GPT 5.2 was getting. So going from GPT 5.2 to GPT 5.4 takes you from 71% to 83%; it's just going to be a lot better for knowledge work. And by a lot better, I mean a 12-point jump, which is pretty significant. On some of the coding benchmarks, like
SWE-Bench Pro, the software engineering benchmark, the model is getting slightly better than the last version. That's good, but beyond being slightly better, it's actually quite a bit faster. If you've used a lot of these coding tools (specifically, we use Claude Code at AIbox), you'll know why that matters. My developer sends me screenshots of the really long, elaborate tasks it runs on our back end, on our code base. I swear it's like a goal for him to see how long he can get Claude Code to run continuously, without stopping, on a project he gives it. It's funny, because I'm vibe coding stuff on Lovable and usually get a response back in a minute or two, and he has it going for three and a half hours on a task. So if this model gets faster, I'm excited, because hopefully that three and a half hours gets cut down on some of the stuff we're working on.
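Since both the cost angle and the speed angle come down to token counts, here's a minimal sketch of how "more token efficient" translates into cheaper per-request costs. Every number below (token counts and per-million-token prices) is made up purely for illustration; this is not OpenAI's actual GPT 5.4 pricing.

```python
# Hypothetical illustration: if a newer model solves the same task with
# fewer output tokens, the per-request cost drops even at identical
# per-token prices. All figures are invented for this example.

def task_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Dollar cost of one request, given prices per million tokens."""
    return (input_tokens / 1e6) * in_price_per_m + (output_tokens / 1e6) * out_price_per_m

# Same prompt, same (made-up) prices; the newer model emits fewer tokens.
old = task_cost(20_000, 8_000, 2.00, 8.00)   # older, chattier model
new = task_cost(20_000, 5_000, 2.00, 8.00)   # more token-efficient model
print(f"old: ${old:.3f}, new: ${new:.3f}")   # prints: old: $0.104, new: $0.080
```

The same logic explains the speed claim: fewer generated tokens means less time spent decoding, so a more token-efficient model tends to be both cheaper and faster on the same task.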
I think one of the things it's also very good at is real computer interaction. There's a benchmark called OSWorld-Verified that evaluates how well an AI can operate a desktop environment: it pretty much takes a screenshot, then uses keyboard and mouse commands to go click on stuff. Right now the model has about a 75% success rate. I've used ChatGPT's agents; they're not perfect, and honestly not my go-to. I don't use them that much, though I wish I could use them more; I think Anthropic is doing better here. But a 75% success rate shows they are improving. The success rate is up a bit, and it's better than GPT 5.2, though I still don't think it's the best.

There's also a major focus on how it's being used professionally. OpenAI says the model is now significantly better at producing the kinds of deliverables people use in real work: spreadsheets, presentations, financial models, legal analysis, all of those. They ran a bunch of different tasks, and on one performed by junior investment banking analysts, it got 87% compared to the 68% that GPT 5.2 got. Human evaluators also preferred it about 68% of the time, saying it had better visuals and better structure. So there's some cool stuff.

Okay, on to cool features you might actually use today. This is the one I'm
very excited about: what they're calling steerability. It's available in the API too, which I think is crazy, but here's how it works on ChatGPT. When you're talking to ChatGPT, you can see its reasoning as it thinks through the problem. It puts a couple of steps down, and you realize it's going in the wrong direction. Maybe you asked for the best beach for surfing, and you see it's looking at quiet beaches, and you think, oh no, I'm in California, that's not what I want. You can then type another message mid-response, like "specifically in California," and it actually takes what you just said into account. That's the steerability: it incorporates your note into what it's looking at and into its reasoning, and gives you an updated response. Basically, you can do mid-response prompts, and it folds them in and gives you a better answer without starting over.

It's interesting, because I think they did a couple of clever things here. One of them is that when you ask a question, you have to wait for it to think, then wait for it to reply. You sit there and wait, and we all hate waiting. But if, in the middle of waiting, we're reading its line of reasoning and giving more input and more feedback, it feels like we did a lot less waiting. We're really just reading and throwing in suggestions, and it can get things done faster and better, rather than waiting for it to spit out the whole thing and then saying, okay, this is wrong, here's why it's wrong, and here's what you should do instead. You can do all of that in the middle of the response, which is really cool, in my opinion. Something else I've been focused on is
online research. Apparently the model can search across a much greater number of sources on the web. Instead of, okay, we're looking at this website, getting some data, now we're looking at that website, it searches a ton of sources at the same time and then follows leads across different pages. It might get an idea from something it reads in one article, follow that to another article, and bounce around a lot more. I know we've had deep research for a while, but this is doing deep-research-style searching on the fly, and it combines all the information it gets into one coherent answer. So this is going to be more useful for complex questions where the information is scattered across a lot of different sites instead of sitting in one place. Not every question you ask will need this, but sometimes, when you have a complex question, it's going to get you a more coherent answer, quicker. So this is great. They also have all this, I don't know, fluff in the launch about how it hallucinates less and makes fewer factual errors and all that kind of stuff. I don't think that's super important. One thing we also heard about
is that it's going to turn you down less. So if you ask a question, it's allegedly, according to them, less likely to refuse to answer. However, our good friend Conor Grennan, who hosts the AI Applied podcast with me, was testing this. I saw a post he made on LinkedIn where he asked it, is it true that an air bubble inside an IV could kill me? Apparently it typed out the whole response to him, and then, just like we saw with DeepSeek, the Chinese censored model, where if you asked anything about Tiananmen Square it would type out an answer and then delete it with a "sorry, can't answer this," apparently ChatGPT did the exact same thing. This is also a tricky moment, because New York right now is trying to pass legislation basically saying AI models can't answer questions in certain areas: medical, health, legal, all of these different categories. I think they're even trying to put hairstylists in there. It's basically all the industries with regulatory capture; they just don't want people to be able to get answers for free. So I'm, I don't know, kind of bummed about that legislation, and people are seriously considering it. Anyway, on refusals, it doesn't seem like the model is that much better, but maybe it's moving in a good direction; I'm not 100% sure. It still feels like there are other models that are more the adult in the room, but those come with their own trade-offs. Grok, famously, will answer any question you have about basically any of those topics, but there might be some other cons with Grok. So, pros and cons to all of the models.
Thank you so much for tuning into the podcast today, guys. If you enjoyed the episode, it would really help the show a ton if you left a rating or review wherever you listen to your podcasts. Just drop me a note, say if you enjoyed it, say where you're from, say what topics are interesting to you. I read all the reviews and all the comments; it helps a ton. Also, make sure you go check out AIbox.ai if you want access to all of these latest models in one place, so you don't have to pay a $20 subscription to 10 different platforms. It's $8.99 a month and you get access to over 40 different AI models. So go check it out, link in the description: AIbox.ai. I'll catch you all in the next episode. Hello, it is Ryan, and I was on a flight
the other day playing one of my favorite social spin slot games on ChumbaCasino.com. I looked over at the person sitting next to me, and you know what they were doing? They were also playing Chumba Casino. Everybody's loving having fun with it. Chumba Casino is home to hundreds of casino-style games that you can play for free anytime, anywhere. So sign up now at ChumbaCasino.com to claim your free welcome bonus. That's ChumbaCasino.com, and live the Chumba life. Sponsored by Chumba Casino.



