0:00
Capital One's tech team isn't just talking about multi-agentic AI. They already deployed one.
0:07
It's called chat-concierge, and it's simplifying car shopping using self-reflection and layered
0:12
reasoning with live API checks. It doesn't just help buyers find a car they love, it helps
0:17
schedule a test drive, get pre-approved for financing, and estimate trade-in value. Advanced,
0:23
intuitive, and deployed.
0:25
That's how they stack.
0:26
That's technology at Capital One.
0:30
Fiscally responsible, financial geniuses, monetary magicians.
0:35
These are things people say about drivers who switch their car insurance to progressive
0:39
and save hundreds, because Progressive offers discounts for paying in full, owning a home, and more.
0:46
Plus, you can count on their great customer service to help when you need it, so your
0:50
dollar goes a long way.
0:53
Go to Progressive.com to see if you could save on car insurance.
0:57
Progressive Casualty Insurance Company and affiliates. Potential savings will vary, not
1:01
available in all states or situations.
1:03
Welcome to AI Unraveled, your daily strategic briefing.
1:06
It is Thursday, March 26th, 2026.
1:09
I'm your co-host, Anna.
1:11
Today's episode is brought to you by RIA and JamgaMind.
1:14
If you need high-fidelity intelligence built for the C-suite, JamgaMind delivers
1:18
human-verified strategic audio forensics across healthcare, energy, and finance.
1:23
Check out JamgaMind.com for technical-grade analysis.
1:27
And a quick reminder for our listeners: you can now enjoy all episodes of AI Unraveled
1:31
completely ad-free by subscribing directly on Apple Podcasts.
1:35
Today, the hype machine broke down.
1:38
François Chollet has released the ARC-AGI-3 reasoning benchmark, and the top models from
1:43
OpenAI and Google are failing spectacularly, scoring under 1% on puzzles humans solve easily.
1:50
We're going to discuss why next token prediction isn't true reasoning.
1:54
But a lack of AGI isn't stopping the execution layer.
1:57
The enterprise is realizing that large models are too expensive, turning to Google's new
2:02
TurboQuant memory compression and small language models to handle their daily workloads.
2:07
Meanwhile, Meta is on a billion-dollar shopping spree, buying up autonomous agent startups.
2:12
Reddit is deploying ID scanners to fight the dead internet theory, and the Pentagon is
2:16
officially elevating Palantir's AI to the core of US military infrastructure.
2:22
Let's get into the news.
2:24
So this morning, an artificial intelligence model that cost roughly $100 million to train
2:31
was given a logic puzzle, a puzzle that a human child can solve in under three minutes.
2:37
And I mean, you can probably guess where this is going.
2:42
Oh, it didn't just struggle.
2:43
The multi-billion-dollar AI scored exactly 0%.
2:50
Welcome to the deep dive.
2:52
We are doing a forensic autopsy today because the Silicon Valley hype cycle has just collided
2:58
at terminal velocity with the brick wall of raw inference economics.
3:02
Yeah, the wreckage is pretty spectacular if you know where to look.
3:06
I mean, if you read the corporate press releases flooding out today, March 26, 2026, you'd
3:10
think we're living in this pristine post-scarcity utopia powered by AGI.
3:15
The whole "AGI is already here" marketing fluff.
3:17
But when you strip that away and look at the bare metal, the actual architecture inside enterprise
3:21
data centers, it is a much more cynical reality.
3:26
We're tracking a systemic shift today driven by the failure of scale.
3:30
We've got a massive stack of sources on the table.
3:33
We're looking at François Chollet's ARC-AGI-3 benchmark, Google's latest memory
3:39
compression research, and Meta's quiet talent monopoly, which is massive, by the way.
3:45
Plus, the Pentagon's staggering new AI budget and Reddit's desperate crackdown on non-human traffic.
3:52
So let's unpack this.
3:53
Well, to understand the architectural panic sweeping the industry today, you really have
3:57
to ignore the consumer product announcements.
4:00
You have to follow the engineering constraints.
4:03
The physical limits.
4:05
We're witnessing this profound collision between the physical limits of current transformer
4:08
architectures and the staggering financial cost of running them.
4:11
The inference costs.
4:13
The inference economics.
4:14
The entire industry is executing a highly coordinated pivot to hide a very uncomfortable
4:18
truth, which is that simply throwing more compute at a trillion-parameter model is
4:23
no longer making it functionally smarter.
4:26
I mean, if you're an enterprise CIO looking at the benchmark data drop this morning, you
4:30
are suddenly wondering why you just signed a multi-million dollar contract for essentially
4:35
a glorified auto-complete.
4:37
Very expensive auto-complete.
4:38
So let's start with the forensic report from the ARC Prize Foundation.
4:42
They just released ARC-AGI-3. For anyone tracking the space,
4:46
you know this is the premier interactive reasoning benchmark.
4:50
And the failure rate of the frontier models is just it's clinical.
4:55
We really need to define what this test actually measures because it exposes the core vulnerability
5:00
in modern AI design.
5:03
So the tasks in ARC-AGI-3 are novel, game-like scenarios.
5:08
There are zero instructions provided.
5:10
The system is dropped into a visual environment and forced to deduce the underlying physical
5:15
rules, formulate an objective, and execute a strategy entirely from scratch.
5:19
No prompts, no handholding.
5:21
And human beings, even kids, can solve 100% of these tasks on their very first try.
5:27
The AI models, however? Well, they cannot.
5:30
I have the raw numbers here for the frontier models and it's brutal.
5:33
Google's Gemini Pro scored the highest, clocking in at 0.37%.
5:38
Just to be clear for everyone.
5:41
Less than half a percent.
5:44
GPT-5.4 High achieved 0.26%.
5:49
Anthropic's Opus 4.6 sits at 0.25%, and Grok 4.20 scored a flat, absolute 0%.
5:59
They failed to break a single percentage point on a test designed to measure basic deductive reasoning.
6:05
Well, this brings us to the structural limitation we call the reasoning bottleneck.
6:08
You have to look at the historical context here.
6:11
When a previous version, ARC-AGI-2, came out, the initial scores were around 3%.
6:17
And then they shot up, didn't they?
6:19
Within 12 months, the major labs pumped their scores up to nearly 50%.
6:22
But they achieved that by contaminating the training sets.
6:26
The labs scraped every permutation of the ARC-AGI-2 test and fed it into the pre-training data.
6:32
It wasn't a cognitive leap.
6:33
It was an open book test where the models simply swallowed the answer key.
6:36
So, it's the difference between memorizing every single turn on a highly detailed city
6:40
map versus being dropped in the middle of a dense forest with just a compass.
6:46
That's a great way to put it.
6:47
The frontier models are incredible at the city map because they've memorized the entire map.
6:52
The architecture is purely token prediction, guessing the next statistical word based
6:56
on trillions of parameters.
6:58
But the second you drop them in the forest, which is what ARC-AGI-3 does by procedurally generating
7:04
novel environments that do not exist in the training data, they fail catastrophically.
7:09
Because memorization at scale is completely useless when you actually need navigation.
7:15
And why should you listening to this care?
7:17
Because this completely shatters the immediate AGI is here marketing narrative that these
7:22
companies are selling you.
7:25
The reasoning bottleneck proves that pattern matching against an immense data set does
7:30
not magically spark deductive reasoning at a certain scale.
7:33
It just doesn't happen.
7:35
And because these frontier models are failing to navigate novel problems, enterprise decision
7:41
makers are refusing to foot the bill for their exorbitant compute costs.
7:47
And that exact financial pressure is what forced Google's hand this morning with their
7:52
new TurboQuant algorithm.
7:55
Because if the model can't reason its way out of a novel problem, it becomes mathematically
7:59
absurd to pay premium token costs to run basic repetitive corporate tasks through it.
8:04
It's a massive waste of money.
8:05
So the industry is pivoting aggressively toward inference economics.
8:09
They're shrinking the operating costs.
8:11
Let's unpack TurboQuant.
8:13
Because the mechanics of how Google achieved this are fascinating.
8:17
TurboQuant is basically a surgical strike on operating costs,
8:20
specifically targeting the key-value cache, or KV cache, memory bottleneck. Exactly.
8:26
When you interact with a large language model, the hardware has to retain the mathematical
8:30
representation of every previous token in the conversation to maintain context.
8:36
To remember what you just said.
8:38
And as the context window grows, that memory requirement balloons with every token, which
8:42
bottlenecks the GPU and destroys profit margins.
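To make that bottleneck concrete, here's a back-of-envelope sketch of how the KV cache grows with context length. The layer, head, and dimension counts below are illustrative assumptions, not any specific model's configuration.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_val=2):
    """Bytes needed to cache keys AND values (the leading 2) for every past
    token, assuming fp16 precision (2 bytes per value)."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_val

# Illustrative 70B-class config; these numbers are assumptions for the demo:
gib = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                     seq_len=128_000, batch=1) / 2**30
print(f"{gib:.1f} GiB of cache for a single 128k-token conversation")  # 39.1 GiB
```

Note the growth is linear in `seq_len`, but at long contexts even linear growth eats a GPU's memory bandwidth alive.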
8:45
So how are they shrinking it without destroying the model's memory entirely?
8:50
So TurboQuant utilizes a non-linear pruning metric.
8:53
It analyzes the attention heads dynamically, identifying the specific mathematical weights
8:58
that are merely processing syntactic structure rather than core semantic meaning.
9:03
And it simply discards them from active memory.
9:06
By dropping that redundant data, they compress the memory footprint by over six times with
9:10
zero retraining required.
9:15
On NVIDIA H100 chips, throughput spikes by 8x because the system is no longer bottlenecked
9:21
by the hardware's memory bandwidth.
9:23
And they are achieving this with almost zero accuracy loss on complex retrieval tests.
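The exact pruning metric isn't detailed here, so this is a deliberately toy sketch of the general idea being described: score cached tokens by how much attention mass they receive, keep the important ones, and evict the rest from active memory.

```python
import numpy as np

def compress_kv_cache(keys, values, attn_mass, keep_ratio=1/6):
    """Toy importance-based eviction: keep only the cached tokens that receive
    the most attention mass, discard the rest from active memory.
    keys/values: (seq, dim) arrays; attn_mass: (seq,) attention each token got."""
    k = max(1, int(len(keys) * keep_ratio))
    keep = np.sort(np.argsort(attn_mass)[-k:])   # top-k tokens, original order kept
    return keys[keep], values[keep]

rng = np.random.default_rng(0)
seq, dim = 600, 64
keys, values = rng.normal(size=(seq, dim)), rng.normal(size=(seq, dim))
ck, cv = compress_kv_cache(keys, values, rng.random(seq))
print(ck.shape)  # (100, 64): a 6x smaller footprint, no retraining involved
```

The point of the sketch: compression happens at inference time, on the cache, so no weights change and no retraining is required.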
9:28
And Wall Street understood the technical implications immediately.
9:31
I mean, the stock of top AI memory providers dropped by three to five percent this morning.
9:35
They see the writing on the wall.
9:37
The market is pricing in a reality where smarter compression algorithms destroy the premium
9:42
demand for raw hardware memory.
9:44
But this efficiency pivot goes much deeper than just compression, right?
9:49
It is fundamentally changing model selection, which brings us to Rob May and Neurometrics'
9:53
new SLM marketplace.
9:55
SLM being small language model.
9:59
This transition from large language models to SLMs is driven by what we call the 25/75 split.
10:04
Neurometrics forensically analyzed typical enterprise AI workflows.
10:08
And they found that only about 25% of tasks actually require frontier level logic.
10:14
Meaning the other 75% are purely administrative.
10:17
Basic data extraction, routing, classification, summarization.
10:21
So using a trillion parameter frontier model to extract an invoice date from a customer
10:25
service email is, I mean, it's like using an advanced quantum supercomputer to balance a checkbook.
10:32
The latency and the compute overhead of firing up those logic gates are entirely asymmetrical to the task.
10:36
So Neurometrics' marketplace just launched with 115 task-specific models, all under 20
10:41
billion parameters.
10:43
And Rob May stated that running 100 million tokens on one of their hosted SLMs cost
10:49
them roughly 40 cents.
10:50
The cost drops through the floor.
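The economics of that 25/75 split can be sketched in a few lines. The SLM price is the figure from the episode, roughly 40 cents per 100 million tokens; the frontier price and the model names are placeholder assumptions for illustration.

```python
# Hypothetical prices and model names, for illustration only. The SLM figure
# follows the episode: ~40 cents per 100 million tokens.
FRONTIER_COST_PER_M_TOKENS = 10.00    # assumed frontier price, USD per 1M tokens
SLM_COST_PER_M_TOKENS      = 0.004    # 40 cents / 100M tokens

ADMIN_TASKS = {"extract", "route", "classify", "summarize"}

def pick_model(task_type: str) -> str:
    """Route the ~75% administrative share to a cheap SLM; reserve the frontier."""
    return "slm-20b" if task_type in ADMIN_TASKS else "frontier-xl"

def blended_cost(tokens_m: float, admin_share: float = 0.75) -> float:
    """Blended USD cost for `tokens_m` million tokens under the 25/75 split."""
    return (tokens_m * admin_share * SLM_COST_PER_M_TOKENS
            + tokens_m * (1 - admin_share) * FRONTIER_COST_PER_M_TOKENS)

print(pick_model("classify"))        # slm-20b
print(round(blended_cost(100), 2))   # 250.3: almost all of it is the frontier 25%
```

Even in this toy version, the remaining frontier-model share dominates the bill, which is exactly why CIOs are routing everything they can away from it.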
10:52
However, deploying a heavily lobotomized parameter count introduces a severe architectural risk.
11:00
Because Rob May is pitching these 20 billion parameter models as the ultimate enterprise solution.
11:05
But structurally, when you compress the parameter count that heavily, you destroy the
11:09
model's ability to hold complex multi-step context.
11:13
So if these smaller models cannot self-correct, aren't enterprise companies just buying highly
11:17
efficient hallucination engines?
11:19
They absolutely are, unless they utilize harness engineering.
11:22
OK, harness engineering, explain that.
11:25
So an SLM lacks the latent space to catch its own deviations.
11:28
You cannot trust the model to think.
11:30
Instead, you wrap the model in a programmatic framework composed of micro-harnesses, like so.
11:35
If the task is routing an invoice, the harness is a rigid state machine loop that forces
11:41
the model to evaluate its own output against a strict regular expression pattern before committing it.
11:47
So it's constantly checking itself against the harness.
11:50
Yeah, you're constantly injecting state-check prompts at every node.
11:54
Here's what you are doing.
11:55
Here is the exact format required, execute.
11:59
You confine the model to an engineered track.
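A minimal harness of the kind described might look like this in Python. The prompt wording and the stub model are illustrative assumptions; the pattern that matters is the rigid validate-and-retry loop around an untrusted small model.

```python
import re

# The SLM's raw output never leaves the loop until it matches the exact
# format the task demands. The stub model below is a stand-in for a hosted SLM.
INVOICE_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

def harnessed_extract(model, email_text: str, max_retries: int = 3) -> str:
    """Rigid validate-and-retry micro-harness around an untrusted small model."""
    prompt = ("Here is what you are doing: extract the invoice date.\n"
              "Here is the exact format required: YYYY-MM-DD. Execute.\n" + email_text)
    for _ in range(max_retries):
        answer = model(prompt).strip()
        if INVOICE_DATE.fullmatch(answer):        # state check at this node
            return answer
        prompt += f"\nYour answer {answer!r} broke the format. Try again."
    raise ValueError("SLM failed validation; escalate to a human or larger model")

# Stub standing in for an SLM that gets it right on the second try:
flaky = iter(["March 3rd", "2026-03-03"])
print(harnessed_extract(lambda p: next(flaky), "Invoice dated March 3rd, 2026"))
# 2026-03-03
```

The model never "decides" its output is done; the harness does, and anything that fails validation is retried or escalated rather than shipped.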
12:02
And because you can run thousands of cheap, highly efficient harness engineered SLMs for
12:06
fractions of a cent, the inevitable next step is chaining them together to execute
12:11
autonomous routines.
12:12
Now, we'll get into the scary part.
12:15
And that logic dictates Mark Zuckerberg's current trajectory at Meta.
12:19
He just projected $135 billion in capital expenditures for 2026.
12:24
Which is just a staggering number.
12:27
But Zuckerberg is not merely hoarding GPUs.
12:29
He is executing a ruthless agentic acqui-hire strategy.
12:31
Yeah, let's look at the timeline.
12:33
Over the last few months, Meta acquired Manus, an autonomous web agent for $2 billion.
12:38
They absorbed Moltbuch, an AI social network.
12:42
They recruited Alexandr Wang from Scale AI as chief AI officer.
12:46
And this morning, they acquired the entire engineering team behind an agentic platform
12:52
and folded them directly into Meta Superintelligence Labs to build an internal model codenamed Avocado.
12:59
I mean, don't let the consumer branding fool you.
13:01
Avocado is not an upgraded chatbot for Instagram.
13:06
Functionally, Meta is architecting a decentralized decision-support layer.
13:10
A decision support layer.
13:13
An agentic architecture utilizes a scratch pad system.
13:15
It can spin up a headless browser, scrape a target, parse the document object model, and
13:20
execute a script entirely without human intervention.
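The scratch-pad loop just described can be sketched generically. The `fetch` and `parse` tools below are toy stand-ins, not a real headless browser or any Meta API; the shape to notice is plan, act, observe, with no human in the loop.

```python
# Generic scratch-pad agent loop: plan -> act -> observe, no human in the loop.
# `plan(pad)` returns ("done", answer) or (tool_name, argument).
def run_agent(plan, tools, max_steps=10):
    scratch_pad = []                                 # working memory of observations
    for _ in range(max_steps):
        action, arg = plan(scratch_pad)
        if action == "done":
            return arg
        scratch_pad.append((action, tools[action](arg)))  # record the observation
    raise RuntimeError("step budget exhausted")

# Toy stand-ins for a headless browser and a DOM parser (assumptions, not real APIs):
tools = {
    "fetch": lambda url: "<html><title>Quarterly Report</title></html>",
    "parse": lambda html: html.split("<title>")[1].split("</title>")[0],
}

def plan(pad):
    if not pad:                return ("fetch", "https://example.com/report")
    if pad[-1][0] == "fetch":  return ("parse", pad[-1][1])
    return ("done", pad[-1][1])

print(run_agent(plan, tools))  # Quarterly Report
```

In production the `plan` function is itself a model call, which is exactly what makes the loop autonomous, and exactly what makes it hard to audit.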
13:23
So they are building an orchestration layer that operates autonomously across their entire
13:29
And Meta is not operating in a vacuum here.
13:31
Bret Taylor's company Sierra just introduced Ghostwriter.
13:34
Oh, this one is wild.
13:36
This is a system whose singular dedicated function is to build other AI agents across 30 different
13:41
languages for enterprise deployment.
13:42
OK, let me push back on the sheer operational chaos of this.
13:46
We are talking about a recursive deployment loop.
13:50
If we have agents spinning up other agents to handle enterprise routing, who is actually managing them?
13:55
With Meta pushing internal agents and Sierra launching Ghostwriter to have agents build
14:00
other agents, the enterprise attack surface is spiraling.
14:04
RIA provides the necessary micro VM sandboxing and zero-trust policy engine to ensure these
14:10
autonomous systems don't become rogue liabilities.
14:14
Because sandboxing becomes the only viable defense mechanism, right?
14:18
The sheer velocity of agentic deployment entirely outpaces traditional security parameters.
14:24
You cannot rely on human oversight when an agent can execute a thousand API calls in the
14:28
time it takes an administrator to just open a dashboard.
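RIA's actual micro-VM and policy internals aren't public, so as a sketch of the zero-trust idea only: every agent tool call passes through a default-deny gate and lands in an audit log, whether it's allowed or not. All names and policies below are hypothetical.

```python
# Toy default-deny policy gate for agent tool calls. A sketch of the zero-trust
# idea only; agent IDs and action strings are invented for the demo.
POLICY = {
    "invoice-bot": {"fetch:billing.internal", "write:ledger"},
    "research-bot": {"fetch:web"},
}

def gate(agent_id: str, action: str, audit_log: list) -> bool:
    """Every tool call is checked against an allowlist and logged, pass or fail."""
    ok = action in POLICY.get(agent_id, set())
    audit_log.append((agent_id, action, "allow" if ok else "deny"))
    return ok

log = []
print(gate("invoice-bot", "write:ledger", log))   # True
print(gate("research-bot", "write:ledger", log))  # False: outside its allowlist
```

Default-deny is the crucial design choice: a thousand API calls per second can't outrun a policy check that sits in the call path itself.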
14:31
It's moving too fast.
14:32
But look, if we are terrified of corporate sandboxes breaking, the geopolitical attack
14:37
surface represents an entirely different class of catastrophic risks.
14:40
Oh, without a doubt.
14:41
The technologies we just discussed, these autonomous agentic loops and harness-engineered decision
14:46
layers, they are actively being integrated into systems making literal lethal decisions.
14:52
The US Defense Department is spending $13.4 billion on AI this year alone.
14:58
And the keystone of that budget is the Pentagon officially formalizing Palantir's Maven AI.
15:05
They are expanding its investment umbrella from $480 million in 2024 to a staggering $13
15:14
That's a massive jump.
15:16
It is a fundamental classification shift.
15:18
Maven has moved from being a defense contractor experiment to being officially designated as
15:22
core military infrastructure.
15:24
And civilian tech watchdogs are paralyzed by the structural implications of that exact
15:28
phrase, core military infrastructure.
15:30
Because it operates by different rules.
15:33
When consumer software hallucinates, you shut down the server and you roll back the deployment.
15:37
Infrastructure does not get shut down.
15:39
When core military infrastructure exhibits erratic behavior, it gets patched live in the
15:43
field, often while actively integrated into a kill chain.
15:48
And the divide between the civilian and military sectors really comes down to optimization metrics.
15:52
In the civilian sector, regulators and enterprise compliance boards demand explainable AI.
15:58
They want to know exactly how the AI thought.
16:02
They require a forensic audit trail of exactly how the neural network arrived at its conclusion.
16:07
But the military operates on an entirely different philosophy.
16:11
They prioritize executable, verifiable targeting.
16:15
So explainability just becomes a secondary metric to latency.
16:19
If an autonomous system can fuse synthetic aperture radar data with drone telemetry to
16:24
paint a target 400 milliseconds faster than a human analyst, the Pentagon will accept
16:28
a fractional hallucination rate.
16:30
Because the tactical advantage outweighs the error margin.
16:34
And for decision makers who need to understand the second order effects of these geopolitical
16:37
shifts, JamgaMind provides human-verified, technical-grade audio forensics in
16:42
healthcare, energy, and finance. Visit JamgaMind.com.
16:46
And while the defense sector is systematically stripping out explainability to maximize
16:50
targeting speed, the consumer web is currently drowning in the exhaust of these exact same systems.
16:59
We have to look at the structural collapse happening right now on platforms like Reddit.
17:03
CEO Steve Huffman is being forced to implement a draconian bot crackdown simply to keep
17:09
the servers online.
17:11
The dead internet theory is no longer just a dystopian thought experiment.
17:14
It is the baseline operational reality for every platform architect today.
17:21
Huffman's crackdown really reveals the desperation of the situation.
17:24
Reddit is instituting mandatory bracketed app labels for approved automated accounts.
17:31
And more importantly, they're deploying passkeys and integrating Sam Altman's World
17:35
ID scanner just to verify suspicious accounts.
17:38
They are actively trying to avoid implementing mass government ID checks because the privacy
17:42
blowback would be catastrophic.
17:44
It would destroy them, but they are cornered.
17:46
We just watched rival platform Digg completely fold, a total structural collapse, after
17:52
its architecture was overrun by synthetic traffic. And look at Cloudflare's latest data.
17:58
They show that automated traffic will surpass human traffic entirely by 2027, next year.
18:04
We are less than 12 months away from humans becoming the minority presence on the internet.
18:09
And notice the cynical pragmatism in Reddit's approach here.
18:12
Huffman is not banning AI generated content.
18:15
He publicly labeled it annoying, sure, but stated that individual communities can police it themselves.
18:22
Because platform architects aren't trying to stop AI generation, that battle is mathematically lost.
18:29
They are just desperately trying to build human verification lifeboats to prevent their
18:32
core advertising metrics from degrading into useless bot-on-bot engagement loops.
18:38
It is a complete band-aid approach applied to a systemic hemorrhage, and you see the exact
18:42
same panic mirroring this in the regulatory sector.
18:44
Oh, the EU is panicking.
18:46
The European Union Parliament just adopted their omnibus proposal on the AI Act.
18:51
But here's the catch.
18:52
They are delaying the core high-risk AI system rules to 2027 or 2028, claiming they need
18:58
to give companies breathing room.
19:01
Yet in the exact same session, they slapped an immediate, strict ban on AI "nudify" deepfakes.
19:08
Which perfectly encapsulates the geopolitical chaos right now.
19:12
Regulators are completely overwhelmed by the architectural complexity of small language
19:16
models and decentralized agentic networks.
19:18
They don't know what to do with them.
19:20
They do not know how to govern a localized, harness-engineered AI running on a consumer device.
19:27
So they delay the hard governance.
19:28
And just ban the obvious stuff.
19:31
They immediately ban the visceral surface-level symptoms, like deepfakes, because visual
19:34
manipulation is literally the only element of the technology the voting public intuitively understands.
19:40
So when we stack all these architectural shifts together, what is the actual state of play?
19:45
Well, synthesizing the data we just forensically unpacked, the architecture of 2026 is defined
19:50
by a deep structural contradiction.
19:52
We have an ecosystem where the most expensive, compute-heavy frontier models are fundamentally
19:58
failing to truly reason, as we saw with the near-zero scores on the ARC-AGI-3 benchmark.
20:03
Memorization at scale is failing.
20:06
Because the brute-force scaling laws are collapsing, the enterprise sector is pivoting
20:10
hard to inference economics.
20:12
They're shrinking models via TurboQuant compression and SLM marketplaces, just to achieve
20:17
cheap, frictionless execution.
20:19
While letting them run wild.
20:21
Simultaneously, titans like Meta and Sierra are wiring these cheap, highly constrained models
20:26
together, empowering them to build each other and operate autonomously across the web.
20:31
It's all connecting.
20:32
And we are actively integrating these exact types of opaque, agentic systems into our core
20:36
military infrastructure to accelerate executable targeting.
20:40
All while platform architects at Reddit desperately deploy biometric scanners just
20:44
to verify who is still human on the consumer web.
20:46
We are accelerating a highly efficient machine running on rails we barely comprehend, let
20:51
alone control, which leaves you with this final thought to mull over today.
20:56
If the frontier models have undeniably hit a reasoning bottleneck, but we are simultaneously
21:01
deploying billions of agentic micro-harnesses into our military kill chains and our social platforms:
21:07
What happens when this decentralized decision support layer starts making highly efficient
21:10
consequential decisions that human engineers can no longer explain, audit, or reverse?
21:15
Subscribe to AI Unraveled on Apple Podcasts to get this daily intelligence completely ad-free.
21:21
That concludes our rundown for March 26th.
21:24
The signal for today is inference economics.
21:27
The ARC benchmark proves we cannot just scale our way to AGI.
21:31
We are going to need a fundamental breakthrough in how machines actually think.
21:35
Until then, the winners won't be the labs with the biggest models, but the companies that
21:40
figure out how to run small, specialized agents at the lowest possible cost.
21:45
This episode was made possible by RIA and JamgaMind.
21:49
For human-verified, technical-grade forensics, visit JamgaMind.com, and don't forget to hit
21:54
subscribe on Apple Podcasts to get your daily news completely ad free.
21:59
Until tomorrow, keep unraveling the future.
22:02
And before you go, if your company is building the tools that power the workflows we talked
22:07
about today, I'd love to showcase them to this audience.
22:11
We don't just run ads, we build technical simulations that prove your value.
22:16
Let's build something together.
22:18
Visit JamgaMind.com/partners to get started.
22:22
Until next time, keep building.
23:54
Tyler Reddick here from 23XI Racing. Game night's fun, until someone spends five minutes lining up one shot.
24:00
Chalk, breathe, re-chalk, still aiming.
24:04
While they figure it out, I fire up Chumba Casino.
24:06
I can spin anywhere, anytime, and there's always a new social casino game every week.
24:11
Spins happen way faster than that shot.
24:14
Play now at ChumbaCasino.com.
24:17
Let's Chumba. Sponsored by Chumba Casino.