
Get the top 40+ AI Models for $8.99 at AI Box: https://aibox.ai
AI Chat YouTube Channel: https://www.youtube.com/@JaedenSchafer
Join my AI Hustle Community: https://www.skool.com/aihustle
OpenAI has just rolled out GPT 5.4.
There's actually a couple of cool features in here
that I'm really excited about,
features I've been wishing ChatGPT
could do in the past,
and they finally launched them.
And of course, if you look at all of their marketing,
it's gonna just basically be them saying,
this is our most capable model yet.
And of course it's the most capable model;
if it wasn't, I mean,
what would they even be making an update for?
So I'm just gonna get past all of the hype
and all of the buzz from the launch,
and I'm gonna tell you some really interesting use cases
and some ways that I actually think GPT 5.4 is useful.
Before we get into all of that,
if you wanna try all of the latest models,
go check out my startup AIbox.ai.
We have the latest models from the top 15 different AI
companies, everything from Grok to Gemini,
to Anthropic to OpenAI, ElevenLabs for audio,
tons of cool image generation models.
I think there's over 50 models on the platform total.
You can try all of them side by side
and it's only $8.99 a month.
So much cheaper than ChatGPT,
but you get way more models.
And of course, you can also use it
to automatically create AI workflows
that complete tasks for you.
So there's a ton of cool stuff going on,
but go check out AIbox.ai if you wanna get access
to all of the top models for only $8.99 a month.
And it's 21% off if you get an annual plan as well.
So there's a lot of cool stuff there.
All right, let's get into what's going on.
The first thing I wanna mention here
is that this is called GPT 5.4 Thinking.
They have a higher performance variant
that is known as GPT 5.4 Pro,
but both of these together are designed
to handle everything from complex analysis
to a lot of coding
and long-running workflows
across a lot of different professional software tools.
And they're kind of dubbing this
as their professional work tool.
They're trying to get it, you know,
into the hands of more working professionals.
And this is coming right on the back of them
signing a whole bunch of deals
with a bunch of different consulting firms
that are gonna allegedly get ChatGPT
into more businesses and into the professional environment.
At the same time, they're locked in a battle,
you know, even Google's in this right now,
but really with Anthropic's Claude Code.
With their Codex tool, they're really trying to push forward
how software is built with AI models
and how computer use is going.
So this is where they're really focusing.
One of the biggest changes here
is basically the scale.
So in the API, GPT 5.4 has a context window
of up to a million tokens,
which basically lets it work with huge documents,
really big conversations, big data sets.
And really, I mean, if you think about this,
a huge benefit is going to be coding,
where you get bigger code bases
to actually work with.
So this was something that Anthropic was really crushing at,
and now OpenAI is trying to get into this.
OpenAI also says that the model is specifically more,
what they're saying is, token efficient,
and this is actually one thing that I'm excited about.
It can basically solve the same problems
using a lot fewer tokens than GPT 5.2,
so your costs are going to come down.
It's actually kind of cool.
If you already had 5.2 running in some software,
and even if you don't, a lot of the software you use will,
the costs come down a lot.
And it also gets a lot faster.
So the costs come down and the speed goes up.
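To make that token-efficiency point concrete, here's a rough back-of-the-envelope sketch in Python. All the prices and token counts in it are made-up placeholder numbers, not OpenAI's actual pricing; the mechanics are the point: fewer tokens per solved task means a lower bill.

```python
# Back-of-the-envelope look at what "token efficient" means for
# API costs. All prices and token counts below are made-up
# placeholder numbers, not OpenAI's actual pricing.

def task_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Dollar cost of one task at given per-million-token prices."""
    return (input_tokens / 1e6) * in_price_per_m \
         + (output_tokens / 1e6) * out_price_per_m

# Same task, same (hypothetical) prices, but the newer model
# reasons its way to the answer with fewer output tokens.
old_cost = task_cost(20_000, 12_000, in_price_per_m=2.0, out_price_per_m=8.0)
new_cost = task_cost(20_000, 7_000, in_price_per_m=2.0, out_price_per_m=8.0)

savings = 1 - new_cost / old_cost
print(f"old: ${old_cost:.3f}  new: ${new_cost:.3f}  saved: {savings:.0%}")
```

With these invented numbers, trimming the output tokens alone cuts the per-task cost by roughly a third, which is why efficiency gains matter even when per-token prices stay put.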
And so yeah, for me, this is something
I'm actually excited about.
So as far as how the benchmarks look,
I'm not trying to sit here
and nitpick the benchmark percentages,
but I did want to talk about some interesting use cases
and some reasons why they're good.
Specifically, it's kind of leading on a bunch
of the better-known benchmarks.
One of those is for coding.
Of course, we know why that's important right now,
but also computer use.
And this is something that I'm excited about right now.
I feel like Anthropic is really crushing with computer use.
Basically, you know, it can look at everything
on your computer and go click on stuff
and get stuff done for you.
This is a use case that I've been using a lot
with Anthropic's Claude for Chrome browser extension.
Basically, it's a button you click.
It opens a side chat bar,
and I use it on really complex UIs or complex websites.
I'm not a developer, but for example,
recently I had to do some stuff
on Google Cloud to set up a tool
that I was vibe coding on Lovable
and then needed to beef up my back end
so it could, you know, do some extra fancy stuff.
I didn't really understand anything that Lovable
was telling me I needed to do.
So I opened up the Claude sidebar and told it,
look, I'm in my, you know, my Google Cloud account,
here are the instructions from Lovable,
and it clicked around and set up some stuff for me.
Now, should I have a real developer look over this?
I mean, we're going to throw caution to the wind
for the time being and I hear all the developers screaming
into their headphones right now.
But at the end of the day, it got it done
and my software is now functioning
and I did not have to watch a whole bunch of long YouTube
tutorials on how to set up some complex,
I mean, for me, complex because I have no idea
how to code Google Cloud stuff.
So this is a really incredible use case for a lot of reasons.
And I think OpenAI beefing up
their capabilities in computer use is really exciting,
because they're going to start competing more directly
with Anthropic.
I mean, it's not like Anthropic is
the only one working on this.
OpenAI has been doing this for a long time with agents,
but it feels like it's getting a lot better.
Okay, the other one that I'm excited for
is they're getting a lot better at knowledge work.
And so I mean, these are kind of things
that I think everybody uses it for.
So this is something we're just going to see
some incremental improvements on.
On OpenAI's GDPVAL benchmark,
which basically checks tasks
across 44 different occupations,
so it kind of shows you how you can use this
as different professionals,
it is exceeding industry professionals
in 83% of comparisons.
So they're like, look, these are all the tasks
that people in all of these different professional industries
are doing,
and it's beating what an industry professional
might give you in 83% of these cases,
specifically, I think, for knowledge work.
And that's a really big jump from
the roughly 71% that GPT 5.2 was getting.
So upgrading from GPT 5.2 to GPT 5.4,
we're going from 71% to 83%.
It's basically just going to be a lot better
for knowledge work.
And by a lot better,
I mean we're seeing roughly a 12-point jump here,
which is pretty significant.
On some of the coding benchmarks,
like SWE-Bench Pro,
the software engineering benchmark,
the model is getting slightly better than the last version.
So I mean, this is good, but, you know,
beyond just getting slightly better,
it is actually quite a bit faster.
And if anybody has used a lot of these software tools,
well, we specifically use Claude Code at AI Box.
My developer sends me screenshots
of these really long, elaborate tasks
that it's doing on our back end, our code base.
And I swear it's like a goal for him
to see how long he can get Claude Code to run continuously
without stopping on a project.
It's funny because I'm, you know,
vibe coding stuff on Lovable
and I usually get a response back from Lovable
in like, you know, a minute or two.
He has it go for like three and a half hours doing a task.
So if this model gets faster,
I'm excited, because hopefully that three and a half hours
gets cut down on, you know,
some of the stuff that we're working on.
I think one of the things that it's also very good at
is real computer interaction.
There's a benchmark called OSWorld-Verified
that basically evaluates how well an AI can operate
a desktop environment.
It, you know, pretty much just
takes a screenshot and then uses keyboard
and mouse commands to go and click stuff.
Right now it has about a 75% success rate.
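For a rough picture of what a benchmark like that is exercising, here's a toy Python sketch of the screenshot-then-act loop. Everything in it is mocked up by me for illustration: the "screen" is a dict and the "model" is an if-statement; a real agent would send an actual screenshot to a model and execute real mouse and keyboard events.

```python
# A toy sketch of the screenshot -> action loop that benchmarks
# like OSWorld measure. Everything here is mocked: the "screen"
# is a dict and the "model" is an if-statement. A real agent
# would send an actual screenshot to a model and execute real
# mouse and keyboard events.

def take_screenshot(state):
    # Stand-in for grabbing the screen: report what's visible.
    return {"visible_buttons": list(state["buttons"]), "done": state["done"]}

def choose_action(screenshot, goal):
    # Stand-in for the model: click the goal button if on screen,
    # otherwise scroll to look for it.
    if goal in screenshot["visible_buttons"]:
        return ("click", goal)
    return ("scroll", None)

def run_agent(goal, state, max_steps=10):
    for _ in range(max_steps):
        shot = take_screenshot(state)
        if shot["done"]:
            return True
        action, target = choose_action(shot, goal)
        if action == "click" and target == goal:
            state["done"] = True           # clicking the goal finishes the task
        elif action == "scroll":
            state["buttons"].append(goal)  # scrolling reveals the button
    return state["done"]

success = run_agent("Submit", {"buttons": ["Cancel"], "done": False})
print("task completed:", success)
```

The observe-decide-act loop is the whole trick; that ~75% success rate is just how often a real model's version of `choose_action` picks well enough, over enough steps, to finish the task.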
I've used ChatGPT's agent.
It's not perfect.
It's actually not my go-to.
I don't use it that much.
I wish I could use it more.
I think Anthropic is doing better at this.
But at a 75% success rate, they are improving;
their success rate is up a bit
and it's better than GPT 5.2.
I still don't think it's the best.
There's a major focus on kind of how it is being used
professionally.
OpenAI says their model right now is significantly better
at basically giving the kind of deliverables
that people use in real work.
So things like spreadsheets, presentations,
financial models, legal analysis.
For all of those, they've done a bunch of different tasks,
including one set performed the way a junior investment
banking analyst would.
It got 87% compared to the 68% that GPT 5.2 got.
Human evaluators also preferred it about 68% of the time.
They said it had better visuals and better structure.
So there's some cool stuff.
Okay, cool features that you might actually use today.
This is the one I'm very excited about.
It has what they're calling steerability.
Basically, when you're talking to ChatGPT,
and it's available in the API too, which I think is crazy,
but it's in ChatGPT,
you can kind of see its reasoning, right?
Like, it's thinking through some stuff
and it puts a couple steps down
and you realize it's going in the wrong direction.
You know, maybe you're like,
hey, I'm trying to find the best beach for surfing.
And it's like, okay, looking at beaches in Kauai,
and you're like, oh crap, I'm in California,
I don't wanna see Kauai.
Then you can type a message like, specifically in California,
mid-response, and it actually takes into account
what you just said.
That's steerability: it's going to go and incorporate
that into what it's looking at and into its reasoning
and give you an updated response.
So basically you can do mid-response prompts
and it's gonna take those into account
and give you a better response.
So it's kind of interesting,
because I think they did a couple clever things here.
One of them is: when you ask a question,
you have to wait for it to think,
you have to wait for it to reply,
you sit there and you wait.
We all hate waiting, and so if in the middle of waiting
we're reading its line of reasoning
and we're giving it more input and more feedback,
it feels like we did a lot less waiting.
We're really just kind of reading
and throwing something in,
and it can get it done faster and better,
rather than having to wait for it to spit out the whole thing
and then be like, okay, this is wrong,
and here's why it's wrong,
and here's what you should do instead.
You can do that in the middle
of the response,
which is really cool in my opinion.
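Here's a toy Python sketch of that steer-mid-response pattern as I understand it. To be clear, this is not OpenAI's actual API or implementation; it's a mock I've invented of the idea: between reasoning steps, check whether the user typed a new note, and if so, re-plan the remaining steps on the fly.

```python
# A toy sketch of the steer-mid-response pattern. This is NOT
# OpenAI's actual API or implementation, just a mock of the idea:
# between reasoning steps, check whether the user typed a new note
# and, if so, re-plan the remaining steps.

def plan_steps(constraints):
    region = constraints.get("region", "Hawaii")  # made-up default
    return [f"Looking at beaches in {region}",
            "Checking surf reports",
            f"Ranking the top spots in {region}"]

def stream_with_steering(get_user_note):
    """get_user_note(i) returns a note typed during step i, or None."""
    constraints = {}
    emitted = []
    steps = plan_steps(constraints)
    i = 0
    while i < len(steps):
        emitted.append(steps[i])             # "stream" this reasoning step
        note = get_user_note(i)              # anything typed mid-response?
        if note:
            constraints["region"] = note
            steps = plan_steps(constraints)  # re-plan the rest on the fly
        i += 1
    return emitted

# The user sees "beaches in Hawaii" go by and types "California".
trace = stream_with_steering(lambda i: "California" if i == 0 else None)
for line in trace:
    print(line)
```

The steps already emitted stay as they were; only the remaining ones get re-planned, which matches the feel of correcting the model while it's still thinking instead of after the full answer lands.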
Something else they've kind of focused on right now
is online research.
Apparently it can search across
a greater number of sources on the web.
So instead of just, like,
okay, we're looking at this website,
getting some data, and now we're gonna look at that website,
it's gonna go and search a ton of pages
at the same time across the web,
and then it's gonna follow leads across different pages.
So it might get an idea from something it read in one article,
and it's gonna follow that
to another article and bounce around a lot more.
I know we've had Deep Research for a while,
but it's doing that deep-research kind of thing,
and it's gonna combine all the information
that it gets into one coherent answer.
So basically this is gonna be more useful
for some of the more complex questions
where the information is kind of scattered
across a lot of different sites
instead of sitting in one place.
Now, this isn't gonna be relevant
for every question you ask,
but sometimes when you have a complex question,
it's gonna be able to get you
a more coherent answer quicker.
So this is great.
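That fan-out-then-follow-leads idea can be sketched in Python like this. The tiny in-memory "web" is invented by me for illustration; a real system would fetch live pages, but the shape is the same: hit many pages in parallel, collect what you learn, then follow the links into the next hop.

```python
# A toy sketch of fan-out search plus following leads. The tiny
# in-memory "web" below is invented for illustration; a real
# system would fetch live pages.
from concurrent.futures import ThreadPoolExecutor

WEB = {
    "article-a": {"facts": ["surfing is big in CA"], "links": ["article-c"]},
    "article-b": {"facts": ["wetsuits help in cold water"], "links": []},
    "article-c": {"facts": ["Mavericks has huge waves"], "links": []},
}

def fetch(page):
    # Stand-in for an HTTP fetch.
    return WEB[page]

def research(start_pages, max_hops=2):
    seen, facts = set(), []
    frontier = list(start_pages)
    for _ in range(max_hops):
        new = [p for p in frontier if p not in seen]
        seen.update(new)
        with ThreadPoolExecutor() as pool:      # hit many pages at once
            results = list(pool.map(fetch, new))
        frontier = []
        for page in results:
            facts.extend(page["facts"])
            frontier.extend(page["links"])      # follow leads next hop
    return facts

print(research(["article-a", "article-b"]))
```

The second hop only exists because the first hop surfaced a lead, which is exactly the scattered-across-many-sites case where this kind of search beats reading one page at a time.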
They have all this, I don't know,
kind of fluff in their launch
about how it hallucinates less
and has fewer factual errors
and all this kind of stuff.
I don't think that's super important.
One thing that we also heard about it
is that it's gonna turn you down less.
So if you ask a question,
it's less likely, allegedly, according to them,
to refuse to answer.
However, our good friend Conor Grennan,
who hosts the AI Applied podcast with me,
tested it, and I saw a post he made on LinkedIn
where he asked it, is it true that an air bubble
inside of an IV could kill me?
And apparently it typed out
the whole response to him,
and then, just like we saw with DeepSeek,
the Chinese censored model,
where if you ask anything about Tiananmen Square,
it types it out and then it disappears
and it's like, sorry, can't answer this,
apparently ChatGPT did the exact same thing.
And this is kind of at a tricky moment,
because we're seeing New York right now
trying to pass legislation
saying AI models can't answer any questions
about medical, health, or legal topics.
Like, they have all of these different areas.
I think they're even trying to put hairstylists in there.
It's basically all of the
different industries with regulatory capture;
they just don't want people to be able to get the answers
for free.
So I'm, I don't know, kind of bummed about that legislation
and that people are seriously considering it.
Anyway, it doesn't seem like it's that much better,
but maybe it's moving in a good direction.
I'm not 100% sure.
It still feels like there are other models
that are more of the adult in the room,
but you also get pros and cons
with those models.
Grok famously is gonna answer any question you have
about basically any of those topics,
but there might be some other cons with Grok.
So, pros and cons to all of the models.
Thank you so much for tuning into the podcast today guys.
If you enjoyed the episode,
it would really help the show a ton
if you left it a rating and review
wherever you listen to your podcasts.
Just drop me a note, say if you enjoyed it,
say where you're from, say what topics are interesting to you.
I read all the reviews and all the comments.
It helps a ton.
Also, make sure you go check out AIbox.ai
if you want to get access to all of these latest models
in one place, so you don't have to pay a $20 subscription
to 10 different platforms.
It's $8.99 a month and you get access
to over 40 different AI models.
So go check it out, link in the description,
AIbox.ai.
I'll catch you guys all in the next episode.

ChatGPT: News on Open AI, MidJourney, NVIDIA, Anthropic, Open Source LLMs, Machine Learning
