
This episode argues that the most important AGI threshold has already been crossed. As coding agents learn to reason, iterate, and operate autonomously over long horizons, they unlock a form of functional general intelligence that matters for real work. Coding isn’t just another domain—it’s a universal lever that collapses the distance between idea and execution, reshaping how companies build, decide, and compete. The result isn’t a gradual improvement, but a structural shift in how work gets done.
Readings from:
https://x.com/gradypb/status/2011491957730918510
https://x.com/danshipper/status/2011617055636705718
Brought to you by:
KPMG – Discover how AI is transforming possibility into reality. Tune into the new KPMG 'You Can with AI' podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. https://www.kpmg.us/AIpodcasts
Zencoder - From vibe coding to AI-first engineering - http://zencoder.ai/zenflow
Optimizely Opal - The agent orchestration platform built for marketers - https://www.optimizely.com/theaidailybrief
AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief
LandfallIP - AI to Navigate the Patent Process - https://landfallip.com/
Robots & Pencils - Cloud-native AI solutions that power results https://robotsandpencils.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.
The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Interested in sponsoring the show? [email protected]
Today on the AI Daily Brief, why Code AGI is Functional AGI and why Functional AGI is here.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
All right friends, quick announcements before we dive in.
First of all, thank you to today's sponsors: Zencoder, Robots & Pencils, Section, and Superintelligent. To get an ad-free version of the show, go to patreon.com/AIDailyBrief. And if you are interested in sponsoring the show, send us a note at [email protected]. So we are back now with another long-read slash big-think episode. And this week
we're getting into a topic that I have been kind of obsessing about for the last several weeks.
It feels to me quite clear that something dramatic has shifted. Obviously I don't mean
some new model that changes everything, but more it feels as though we've digested what the latest
round of models is actually capable of. We've had enough time with them for them to start to shift
our behaviors. And the implication of all of that is fundamentally speaking some different new era
in the story of AI and more broadly in the story of work. It is a shift which I am still trying
to figure out how to put words around, but one that I am convinced has profound implications
for how companies do what they do. To some extent, the shift is starting to come home to roost
in a concerted conversation around whether we are finally at AGI. I will argue that we are with
some nuance. But what I'm going to do first is read some excerpts from a recent piece by Sequoia's
Pat Grady called 2026: This Is AGI, followed by a more skeptical piece by Every's Dan Shipper,
called Toward a Definition of AGI. And then I'm going to add my own thoughts,
steel-manning both perspectives, and trying to end with what I think is the most useful place to be.
Let's start with Pat's piece. It's actually by Pat Grady and Sonya Huang, and begins: Years ago, some leading researchers told us that their objective was AGI. Eager to hear a coherent definition, we naively asked, how do you define AGI? They paused, looked at each other tentatively,
and then offered up what's become something of a mantra in the field of AI.
Well, we each kind of have our own definitions, but we'll know it when we see it. The vignette
typifies our quest for a concrete definition of AGI. It has proven elusive. While the definition
is elusive, the reality is not. AGI is here now. Coding agents are the first example. There are more
on the way. Long horizon agents are functionally AGI, and 2026 will be their year.
Now, in the next section, Pat and Sonya make sure to qualify that they do not have any
sort of scientific authority to propose this definition. And yet, with that said, they offer
what they call a functional definition of AGI. AGI, they write, is the ability to figure things out.
That's it. A human who can figure things out has some baseline knowledge, the ability to reason
over that knowledge, and the ability to iterate their way to the answer. An AI that can figure things out has some baseline knowledge (pre-training), the ability to reason over that knowledge (inference-time compute), and the ability to iterate its way to the answer (long-horizon agents).
The first ingredient, knowledge and pre-training, is what fueled the original ChatGPT moment in 2022. The second, reasoning and inference-time compute, came with the release of o1 in late 2024. The third, iteration and long-horizon agents, came in the last few weeks, with Claude Code and other coding agents crossing a capability threshold. Generally intelligent people can work
autonomously for hours at a time, making and fixing their mistakes and figuring out what to do
next without being told. Generally intelligent agents can do the same thing. This is new.
So what's an example of this new capability that they're talking about? They provided
an example of a founder telling his agent that he needs a developer relations lead. He gives
a set of qualifications, including the fact that this person needs to enjoy being on Twitter.
The agent starts in an obvious place: LinkedIn searches for "developer advocate," for example. Unfortunately, it finds hundreds of examples, so it has to iterate. It pivots, they write, to
signal over credentials. It searches YouTube for conference talks. From there, it finds 50 plus
speakers and filters for those with talks that have strong engagement. Next, because of that
Twitter qualification, it cross references those speakers with Twitter. The total number is now
whittled down to a dozen with real followings and posting real opinions. Homing in even further on who's been most engaged in the last few months, that total list, which was hundreds and then 50 and then a dozen, is now down to three. Now it can home in on those three. One just announced
the new role. One is the founder of a company that just raised funding. The third was a senior
devrel at a Series D company that just did layoffs in marketing. The agent, they write,
drafts an email acknowledging her recent talk, the overlap with the startup's ICP,
and a specific note about the creative freedom a smaller team offers. It suggests a casual
conversation, not a pitch. Total time, 31 minutes. The founder has a short list of one,
instead of a JD posted to a job board. This, Pat and Sonya write, is what it means to figure things out: navigating ambiguity to accomplish a goal, forming hypotheses, testing them, hitting dead ends, and pivoting until something clicks. The agent didn't follow a script. It ran the same loop a great recruiter runs in their head, except it did it tirelessly, in 31 minutes, without being told how. To be clear, agents still fail. They hallucinate, lose context, and sometimes charge confidently down exactly the wrong path. But the trajectory is unmistakable, and the failures are increasingly fixable. So what? Well, soon, they say, you'll be able to hire an agent, which, with a hat tip to Sarah Guo, they call one litmus test for AGI. You can hire GPT-5.2 or Claude or Grok or Gemini today. More examples are on the way. In medicine, OpenEvidence's DeepConsult functions as a specialist. In law, Harvey's agents function as an associate. They go through examples in cybersecurity, DevOps, go-to-market, recruiting, math, semiconductor
design, and AI research. All of this, they say, has profound implications for founders.
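Before we get to those implications, just to make the shape of that loop concrete, here is a minimal, self-contained Python sketch of the hypothesize-test-pivot pattern. To be clear, this is my own toy illustration, not anything from Pat and Sonya's piece: the candidate pool and the filters are invented stand-ins, and a real agent would be making live tool calls and reasoning with an LLM rather than applying fixed predicates. The point is the control flow: apply a signal, count what's left, and keep tightening until the list is small enough to act on.

    import random

    random.seed(0)

    # A fake candidate pool standing in for LinkedIn, YouTube, and Twitter results.
    POOL = [{"gave_talk": random.random() < 0.2,        # spoke at a conference
             "talk_engagement": random.random(),        # how well the talk landed
             "followers": random.randint(0, 20000),     # real Twitter following?
             "active_recently": random.random() < 0.3}  # engaged in recent months
            for _ in range(400)]

    def narrow_down(pool, target=3):
        # Ordered pivots, from coarse credentials toward fine-grained signal.
        strategies = [
            lambda c: c["gave_talk"],
            lambda c: c["talk_engagement"] > 0.5,
            lambda c: c["followers"] > 5000,
            lambda c: c["active_recently"],
        ]
        active, candidates = [], pool
        for strategy in strategies:
            active.append(strategy)
            candidates = [c for c in pool if all(f(c) for f in active)]
            print(f"after {len(active)} filter(s): {len(candidates)} candidates left")
            if len(candidates) <= target:
                break   # small enough to research individually
        return candidates

    shortlist = narrow_down(POOL)

Run it and you watch the funnel collapse from 400 to a handful, which is the whole trick: the intelligence is in choosing the next filter, and that's the part the LLM supplies.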
The AI applications of '23 and '24 were talkers. Some were very sophisticated conversationalists, but their impact was limited. The AI applications of '26 and '27 will be doers. They will feel like
colleagues. Usage will go from a few times a day to all day every day, with multiple instances
running in parallel. Users won't save a few hours here and there. They'll go from working as an
IC to managing a team of agents. Remember all that talk of selling work? Now it's possible.
What work can you accomplish? The capabilities of a long-horizon agent are drastically different from those of a single forward pass of a model. What new capabilities do long-horizon agents unlock in your domain? What tasks require persistence? Where is sustained attention the bottleneck?
Saddle up, they say. It's time to ride the long-horizon agent exponential. Today your agents can probably work reliably for around 30 minutes, but they'll be able to perform a day's worth of work very soon, and a century's worth of work eventually. Ultimately, they write, the ambitious version of your roadmap just became the realistic one. Let's move over to Dan Shipper's Toward a Definition of AGI. Dan writes: When an infant is born, they are completely dependent on their
caregivers to survive. They can't eat, move, or play on their own. As they grow, they learn
to tolerate increasingly longer separations. Gradually the caregiver occasionally and intentionally
fails to meet their needs. The baby cries in their crib at night, but the parent waits to see if
they'll self-soothe. The toddler wants attention, but the parent is on the phone. These small, manageable disappointments, which the psychologist D.W. Winnicott called good-enough parenting, teach the child
that they can survive brief periods of independence. Over months and years, these periods extend from
seconds to minutes to hours until eventually the child is able to function independently.
AI is following the same pattern. Today we treat AI like a static tool we pick up when needed
and set aside when done. We turn it on for specific tasks, writing an email, analyzing data,
answering questions, then close the tab. But as these systems become more capable,
we'll find ourselves returning to them more frequently, keeping sessions open longer and
trusting them with more continuous workflows. We already are. So here's my definition of AGI.
Artificial general intelligence is achieved when it makes economic sense to keep your agent running
continuously. In other words, we'll have AGI when we have persistent agents that continue thinking,
learning, and acting autonomously between your interactions with them, like a human being does.
I like this definition because it's empirically observable. Either people decide it's better to
never turn off their agents or they don't. It avoids the philosophical rigmarole inherent in trying to define what true general intelligence is. And it avoids the problems of the Turing test and OpenAI's definition of AGI. In the Turing test, a system is AGI when it can
fool a human judge into thinking it's human. The problem with the Turing test is that it sets
up movable goal posts. If I interacted with GPT-4 10 years ago, I would have thought it was human.
Today, I'd simply ask it to build a website for me from scratch, and I'd instantly know it was not human. OpenAI's definition of AGI, which is AI that can outperform humans at most
economically valuable work, suffers from the same problem. What constitutes economically valuable work
constantly changes. We will invent new economically valuable work that we can perform in conjunction
with AI. These hybrid roles then become the new benchmark that AI will need to learn to do before it counts as AGI. So the definition is an ever-receding target. By contrast, the definition I proposed, that AGI is achieved when it makes economic sense to keep your agent running continuously, is a binary, irreversible, and immovable threshold. I like this definition because in order to
meet it, we will need to develop a lot of necessary but hard-to-define components of AGI.
One, continuous learning. The agent must learn from experience without explicit user prompting.
Two, memory management. The agent needs sophisticated ways to store, retrieve, and forget
information efficiently over extended periods. Three, generating, exploring, and achieving goals.
The agent requires the open-ended ability to define new, useful goals and maintain them across days,
weeks or months while adapting to changing circumstances. Four, proactive communication.
The agent should reach out when it has updates, questions, or requires input,
rather than only responding when summoned. It must also be able to be interrupted and redirected
by the user. Five, trust and reliability. The agent must be safe and reliable. Users will not
keep agents running unless they are confident the system will not cause harm or make costly errors
autonomously. While I've described these capabilities, I'm deliberately avoiding the
trap of trying to specify exact technical criteria for each one. What precisely constitutes
continuous learning or trust is difficult to pin down. Instead, my AGI definition entails that all of these capabilities are present to some extent, and these capabilities already are present in limited ways. ChatGPT, for example, has rudimentary forms of memory and proactive communication. The length of time during which an AI can run on its own is increasing, gradually and consistently. When GPT-3 first came out, the primary use case for AI was GitHub Copilot. The best it could do was complete the line of code you were already writing.
ChatGPT lengthened the amount of time the AI could run from the amount required for you to press tab to complete a line of code to the time required to deliver a full response in a chat conversation. Now, agentic tools like Claude Code, Deep Research, and Codex can run
for between five and twenty minutes at a stretch. The trajectory is clear, from seconds to minutes
to hours and days and beyond. Eventually, the cognitive and economic costs of starting fresh
each time will outweigh the benefits of turning AI off.
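Now, Dan deliberately avoids exact technical criteria, and I'll respect that. But just to give a feel for how his five ingredients fit together, here is a toy Python skeleton of an always-on agent. Every name in it is a hypothetical stand-in of mine, not Dan's; a real system would put an LLM, real memory infrastructure, and real tools behind each of these stubs.

    from collections import deque

    SAFE_ACTIONS = {"draft_update"}                 # 5. trust: a whitelist of low-risk actions

    class PersistentAgent:
        def __init__(self):
            self.memory = deque(maxlen=100)         # 2. memory: bounded store, forgets the oldest
            self.goals = ["keep the inbox triaged"] # 3. goals held across sessions
            self.outbox = []                        # 4. proactive messages to the user

        def observe(self, event):
            self.memory.append(event)               # 1. learning, crudely: accumulate experience

        def act(self, action, payload):
            if action not in SAFE_ACTIONS:          # refuse anything off the whitelist
                return
            self.outbox.append(payload)

        def step(self):
            for goal in self.goals:
                note = f"{goal}: {len(self.memory)} events seen so far"
                self.act("draft_update", note)      # reach out without being summoned

    agent = PersistentAgent()
    for tick in range(3):                           # stands in for an agent that is never turned off
        agent.observe(f"event {tick}")
        agent.step()
    print(agent.outbox)

The interesting engineering, of course, is everything these stubs wave away, which is exactly Dan's point about why the threshold hasn't been crossed yet.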
If you're using AI to code, ask yourself: are you building software, or are you just playing prompt roulette? We know that unstructured prompting works at first, but eventually it leads to AI slop and technical debt. Enter ZenFlow. ZenFlow takes you from vibe coding to AI-first
engineering. It's the first AI orchestration layer that brings discipline to the chaos. It transforms freeform prompting into spec-driven workflows and multi-agent verification, where agents actually cross-check each other to prevent drift. You can even command a fleet of parallel agents to implement features and fix bugs simultaneously. We've seen teams accelerate delivery 2x to 10x. Stop gambling with prompts. Start orchestrating your AI. Turn raw speed into reliable production-grade output at zencoder.ai/zenflow. Today's episode is brought to you by Robots & Pencils, a company that is growing fast. Their work as a high-growth AWS and Databricks
partner means that they're looking for elite talent ready to create real impact at velocity.
Their teams are made up of AI native engineers, strategists, and designers who love solving hard
problems and pushing how AI shows up in real products. They move quickly using RoboWorks,
their agentic acceleration platform, so teams can deliver meaningful outcomes in weeks, not months.
They don't build big teams. They build high-impact nimble ones. The people there are
wicked smart with patents, published research, and work that's helped shape entire categories.
They work in velocity pods and studios that stay focused and move with intent.
If you're ready for career-defining work with peers who challenge you and have your back,
Robots & Pencils is the place. Explore open roles at robotsandpencils.com/careers. That's robotsandpencils.com/careers.
Here's a harsh truth. Your company is probably spending thousands or millions of
dollars on AI tools that are being massively underutilized. Half of companies have AI tools,
but only 12% use them for business value. Most employees are still using AI to summarize
meeting notes. If you're the one responsible for AI adoption at your company, you need Section.
Section is a platform that helps you manage AI transformation across your entire organization.
It coaches employees on real use cases, tracks who's using AI for business impact,
and shows you exactly where AI is and isn't creating value.
The result? You go from rolling out tools to driving measurable AI value.
Your employees move from meeting summaries to solving actual business problems,
and you can prove the ROI. Stop guessing if your AI investment is working.
Check out Section at sectionai.com. That's S-E-C-T-I-O-N-A-I dot com.
Today's episode is brought to you by Superintelligent. Superintelligent is a platform that, very simply put, is all about helping your company figure out
how to use AI better. We deploy voice agents to interview people across your company,
combine that with proprietary intelligence about what's working for other companies,
and give you a set of recommendations around use cases and change management initiatives that add up to an AI roadmap that can help you get value out of AI for your company.
But now we want to empower the folks inside your team who are responsible for that transformation
with an even more direct platform. Our forthcoming AI Strategy Compass tool is ready to start being tested. This is a power tool for anyone who is responsible for AI adoption or AI transformation inside their companies. It's going to allow you to do a lot of the things that we do at Superintelligent,
but in a much more automated, self-managed way, and with a totally different cost structure.
If you are interested in checking it out, go to aidailybrief.ai/compass, fill out the form, and we will be in touch soon.
So, both good entries into the canon of what is AGI. But as I indicated at the beginning, what I think is actually most relevant about them right now is the fact that we are having this conversation right now. We are having this conversation because people have a sense that something big has shifted, but something big does not necessarily mean AGI. Indeed, one of the best steelman arguments against what we have now being AGI is the need to separate wow-level competence from general autonomy. The argument would go along these lines:
AGI isn't just about being able to generate impressive outputs across domains. It's about robust,
self-directed competence under real-world constraints. AGI could be dropped into novel situations, define success criteria, manage long-horizon execution, and reliably converge, without a human acting as the external executive function. And as much as things have changed,
what both of the pieces we just read have in common is that more than anything else,
they're disagreeing about which point we're at on an agreed-upon trajectory.
The funny thing, in fact, about that This Is AGI piece is that when you actually read it closely, it's not so much saying that this is AGI. It's saying that we're really, really close; that what is AGI, i.e., these long-horizon agents, is available-ish now and just getting better; and that because we're now within, call it, months rather than years of AGI, you better start preparing. Dan isn't really disagreeing with that, although he doesn't get into
timelines. Instead, he's pointing out all of these things that need to happen to get to a certain
point of indispensability, which he is arguing is the key thing. But what about what we've seen over
the last couple of weeks? The sense among some of the most enfranchised and powerful users of AI,
that we really are in a fundamentally different moment. To take one example of a type of testimony
we've seen lots of, Midjourney founder David Holz tweeted on January 3rd:
I've done more personal coding projects over Christmas break than I have in the last 10 years.
It's crazy. I can sense the limitations, but I know nothing is going to be the same anymore.
And honestly, this brings up a more interesting and nuanced take on "it's not AGI yet."
That argument would go something like, yes, Claude code and similar tools have crossed the
threshold for coding specifically, but generality is the whole point of general intelligence.
There's still so much that current AI fails at, like novel reasoning and multi-step planning in unfamiliar domains. These new big breakthroughs that everyone is sensing happened in a domain that's really well suited to LLMs: well-documented, pattern-rich, with verifiable outputs. That's not, the argument would go, evidence of general intelligence; it's evidence of domain fit. This would in some
ways be an argument about the jaggedness of AI, the idea that it can be superhuman in one area
and infantile in another. And indeed it is the case that this sense of what has shifted is about
AI's capacity to code. But I keep coming back to the essay from Shawn Wang, aka swyx, written when he decided to join Cognition. It contains the line which absolutely wins the award for the couple of sentences that have lived most rent-free in my head since they were written. Shawn wrote: The central realization I had was this: code AGI will be achieved in 20% of the time of full AGI and capture 80% of the value of AGI. Now, for him, this is an argument to simply do code AGI now rather than later, but I
think what I would argue is that code AGI doesn't quote unquote capture 80% of the value of AGI.
I think code AGI is more or less just functional AGI. The argument here is that coding is
effectively a universal lever in the modern world. Most economically valuable work, to reference OpenAI's terminology, has been computer-shaped for a long time. If your job touches a screen, an API, a database, a spreadsheet, a ticketing system, a CRM, a repo, a dashboard, or a docs tool,
then in principle it's addressable by software. So if an AI can understand intent, translate
intent into procedures, write and modify code, run tools, inspect outputs, and iterate until
it meets acceptance criteria, then it has a meta skill that can simulate competence in many
domains by building the missing tool. And in that framing, coding isn't one domain, it instead
is closer to instrumental generality. Want data analysis? Write SQL or Python notebooks,
run them, interpret the results, generate charts, and build pipelines. Want operations?
Automate workflows across systems, tickets, approvals, audits, alerts. Want finance?
Pull data, reconcile, generate variance analysis, draft narratives. Want product? Spin up prototypes, instrumentation, A/B analysis, telemetry pipelines. Basically, the idea is that if you can
program you can create capabilities. And if you can create capabilities on demand, you're not
narrow, you're general in a way that matters for real work. You could take this argument even
farther. Coding doesn't just help you build general capabilities. It is also, in some ways, a test of general reasoning. Non-trivial coding forces abstraction, decomposition, causal reasoning, adversarial thinking, and iterative debugging. Those are, ultimately, indicators of general intelligence. And I would argue that a lot of what feels different about
building with AI coding tools now as opposed to six months ago is in that set of general
reasoning capabilities rather than just how good it is at knowing a bunch of different coding
languages. We recently had an issue come up where a company that we were producing an AI and agent readiness audit for found, contained in one of the recommendations, a tool or platform that was at odds with the tech stack that they currently have. And this is a problem that we are extremely conscious of. It would be very easy to recommend a bunch of platforms that an
enterprise is never going to use. It's much more difficult and much more valuable to let them know
how to work with the band of tools they have or the things that they would consider to actually
solve the problems that are clear and present for them. And so we spent a lot of time making sure
that that type of recommendation doesn't make it through, but it did. And so as the team was talking
about new processes and procedures for making sure this didn't happen anymore, I was following
the conversation on Slack from a haircut. It struck me that this might not actually be all that
difficult, at least if you removed it from the domain of the human. And so as I was sitting there,
I fired up the mobile web version of Lovable, and by the time the haircut was done,
I had a checker to run final reports through that would make sure to compare the tech stack
of the company to all the recommendations in the report. And in literally a matter of seconds,
make sure before the final deliverable was sent that that sort of thing didn't happen.
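For the curious, the core of that checker is genuinely simple. Here is a hedged sketch of the idea in Python; the tool names, the report format, and the string-matching approach are all hypothetical simplifications of mine, not what the Lovable app actually generated.

    # Tools the client has told us they actually use or would consider.
    APPROVED_STACK = {"salesforce", "slack", "snowflake"}

    # Tools our recommendations might mention, approved or not.
    KNOWN_TOOLS = {"salesforce", "slack", "snowflake", "hubspot", "asana"}

    def flag_off_stack(recommendations):
        """Return (recommendation, offending tools) pairs for anything off-stack."""
        flagged = []
        for rec in recommendations:
            mentioned = {tool for tool in KNOWN_TOOLS if tool in rec.lower()}
            off_stack = mentioned - APPROVED_STACK
            if off_stack:
                flagged.append((rec, sorted(off_stack)))
        return flagged

    report = ["Automate lead routing in Salesforce with Slack alerts",
              "Adopt HubSpot sequences for outbound follow-up"]
    for rec, tools in flag_off_stack(report):
        print(f"FLAG: {rec!r} recommends off-stack tools: {tools}")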
Now, of course, there are even more sophisticated ways to do this with code.
Basically, we could just build that capability that I built as a standalone into the overall
processing pipeline. But the point that I'm trying to make is that this wasn't coding to solve
a technical problem. It was coding to solve a business problem. Increasingly, for the people who are most adept at working with AI, that's what they're doing. The question is decreasingly, what's the best AI tool for this? And increasingly,
can I build some custom software to solve or enable this? So what's happening right now at all these different startups is not some cute little example of non-technical folks being able to
vibe code and prototype features. What's happening is a complete collapse in the distance between idea
and execution for everyone. And that's amazing for those startups. We are going to see complete
and utter re-evaluations of how to do everything. And startups and small companies are going to be
the incubators of that change. The thing that would worry me right now, if I were an enterprise leader,
is that this Rubicon that we've crossed starts to feel more like a shift in kind than a shift in
scale. What I mean by that is that for the three years since ChatGPT was launched, obviously
enterprises were behind more nimble companies and AI users relatively speaking. There's all
sorts of systems inertia, there's compliance issues, governance issues, etc. But the patterns of
what they were doing were still similar to the patterns of what other people were doing,
just maybe with a little bit slower adoption and a little bit more process along the way.
Still, they were running on a parallel track. I now believe that the tracks have diverged.
The frontier of what's possible and the median of what's deployed in enterprises have decoupled in a way in which I believe they are increasingly pointed in different directions.
The standard enterprise invocations at this point to audit and automate your workflows,
experiment with AI, will ultimately contain the transformation possible within existing power
structures, more or less keeping the org chart intact. The reality is, in a world of code AGI,
a world of functional AGI, the org chart is broken. Bottlenecks shift from who can code to who has good ideas, the role of management shifts from resource allocation to taste and judgment, competitive advantage shifts from execution capability to speed of iteration, and the gap increasingly isn't linear, it's compounding. Every month someone is building in the new paradigm, they are getting comparatively farther ahead of those who aren't.
So is the answer as simple as letting everyone on your team vibe code?
Honestly, I think you could do worse. I think we are at a moment where, increasingly, the modality by which things are produced in this world looks different to the way that it did just a few years ago, even just a few months ago. The message is less upskill your workforce and
audit your AI use cases, and much closer to your entire organizational model is built for a world
where execution was the bottleneck, and that world is over. I worry for enterprises because this
new set of shifts involves accepting a loss of control, a restructuring of incentives,
and a total transformation of process that is even harder than the AI transformation that has come
so far. And to the extent that you are in one of those enterprises and looking for a bright spot in
this, it's that at least most of you are in this together, and that it is unlikely that many
enterprises are going to get comfortable fast with the types of change they really should be making.
But the change, I believe, has happened. And I think that the rewards for the companies,
not just startups, but enterprises too, who can lean into this new capability set and can live
on the other side of this inflection will be immense. So like I said at the beginning,
I think code AGI is functional AGI, and I think it's here. This is something that I will be
exploring a lot more in the weeks to come. For now, though, that is going to do it for today's
AI daily brief. Appreciate you listening or watching as always. And until next time, peace!
