
Agent swarms are quickly moving from theory to practice, with early 2026 model releases making coordinated, multi-agent work feel like a real shift rather than a niche experiment. This episode focuses on Moonshot’s Kimi K2.5, what its agent swarm design reveals about the future of AI work, and why this may mark a transition from single assistants to teams of AI operating in parallel. In the headlines: Anthropic’s huge new funding round and revised revenue forecasts, Nvidia chip sales reopening in China, a UK-wide AI upskilling initiative, and new agentic features from Google and Chinese labs.
Brought to you by:
KPMG – Discover how AI is transforming possibility into reality. Tune into the new KPMG 'You Can with AI' podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. https://www.kpmg.us/AIpodcasts
Zencoder - From vibe coding to AI-first engineering - http://zencoder.ai/zenflow
Optimizely Opal - The agent orchestration platform built for marketers - https://www.optimizely.com/theaidailybrief
AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief
Section - Build an AI workforce at scale - https://www.sectionai.com/
LandfallIP - AI to Navigate the Patent Process - https://landfallip.com/
Robots & Pencils - Cloud-native AI solutions that power results - https://robotsandpencils.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.
The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Interested in sponsoring the show? [email protected]
Today on the AI Daily Brief, is 2026 going to be the year of AI agent
swarms? Before that in the headlines, some big jumps in Anthropic's fundraising and
revenue. The AI Daily Brief is a daily podcast and video about the most
important news and discussions in AI.
Alright friends, quick announcements before we dive in. First of all, thank you to
today's sponsors, KPMG, Zencoder, and Superintelligent. To get an ad-free
version of the show, go to patreon.com slash AI Daily Brief, or you can
subscribe on Apple Podcasts. To learn more about sponsoring the show, send us a
note at sponsors at aidailybrief.ai. Also, if you were interested in the
research that we did at the end of last year, we have our next research
kicking off soon. To keep track of all that, as well as to hear about future
products we have coming, AI maturity maps, AI opportunity radars, and much more,
go to aidebintel.com where you can sign up to get that information as soon as it comes out.
Now with that out of the way, let's dive in. Welcome back to the AI Daily Brief
headlines edition, all the daily AI news you need in around five minutes.
We kick off today with some fundraising and business news out of Anthropic.
The company is close to finalizing their latest funding round, which could raise more than 20
billion dollars. Reports state that Anthropic has between 10 and 15 billion dollars in
firm commitments that could be finalized early next week, including large investments
from Singapore's sovereign wealth fund and Sequoia. Anthropic has also recently
doubled the size of the round from 10 to 20 billion in response to excess investor interest.
One investor told the Financial Times that the round was five to six times oversubscribed
before the size increase. In addition to venture capital and sovereign wealth,
Microsoft and Nvidia have also committed to invest a total of 15 billion in the company,
which is on top of the 20 billion from investment firms. The round would reportedly value Anthropic
at $350 billion, almost double the valuation from their Series F, which closed in September.
The fundraising frenzy firmly cements Anthropic's momentum. Last year, remember,
OpenAI raised 40 billion anchored by 30 billion from SoftBank, meaning that Anthropic is now
neck and neck with those figures. In addition to fundraising news, The Information has an update
on Anthropic's revenue growth forecasts. They report that Anthropic updated investors in December
and hiked forecasts across the board. 2026 revenue is now expected to come in at 18 billion,
around a 4x increase from last year's numbers, and up 20% from estimates made last summer.
In 2027, Anthropic expects to generate 55 billion in revenue. For 2029, their most optimistic
forecast calls for 148 billion. That forecast is particularly notable as it's 3 billion more than
OpenAI's last forecast, which was made during the summer. OpenAI, of course, may have hiked
expectations since then, but still very notable that Anthropic believes they could overtake OpenAI
within three years. The other big number from the financial update was Anthropic's increasing
training costs. They expect to spend 12 billion on training this year, which is a 50% increase
from summer projections. Their forecasts also project training costs to exceed 100 billion by
2029. These increased costs push back Anthropic's timeline for profitability by a year,
with the company now expecting to flip cash flow positive by 2028.
Now, one of the things that Dario and Anthropic have, of course, been weighing in on a lot,
is chip exports to China, with Anthropic being firmly in the camp that we should not be
exporting chips to China. An update on that front, as Beijing has approved
the first batch of NVIDIA chip imports. Reuters reports that Chinese officials have
approved the import of several hundred thousand H200s, allowing access to the advanced chips for
the first time. Sources said the first batch of approvals were primarily allocated to three
unnamed tech giants. The Wall Street Journal later named Alibaba and ByteDance as two of the
three receiving approval. Other enterprises are still in the queue awaiting a subsequent round of
approvals, presumably including high-flying startups like DeepSeek, who may have to wait in line
to set up their H200s. Reports stated that Chinese AI firms will be required to support local
chip makers as well, using their chips for some training tasks and most AI inference.
Basically, it seems like officials are trying to strike a balance, allowing Chinese companies
to train advanced models while also protecting domestic chip makers.
Now, this could be a huge boon to NVIDIA's first quarter financials. Several hundred thousand
H200s is in the ballpark of 10 billion in sales, and that's only the first round of approvals.
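For a rough sense of the math, and this is just a back-of-the-envelope check rather than a figure from any report: if the first batch is something like 300,000 units, and H200s sell for somewhere in the low tens of thousands of dollars each, say roughly $33,000, then 300,000 times $33,000 comes out to about $10 billion, which is how you get to that ballpark.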
In Q2 of last year, when chip exports to China were shut down by the US government,
NVIDIA reported a $5.5 billion write-down associated with losing Chinese sales.
That implies NVIDIA could see record Chinese sales this quarter simply based on this first round
of approvals. NVIDIA CEO Jensen Huang is currently visiting China to meet with local employees,
but reports suggest that he hasn't met with any senior officials. That said, his next stop is
Taiwan, where people familiar with the trip say he plans to ask suppliers to bump up H200 production
to meet Chinese demand. Moving over to the training side of the house, the UK government
has expanded their AI training initiative with an ambitious new goal to upskill every worker
in the country. The Department for Science, Innovation, and Technology announced on Tuesday that
free AI training will be made available to every adult worker. The training will come in the form
of 20-minute online courses with modules covering use cases like drafting text, content creation,
and automation of administrative tasks. Technology Secretary Liz Kendall said,
We want AI to work for Britain, and that means ensuring Britons can work with AI.
Change is inevitable, but the consequences of change are not. We will protect people from the risks
of AI while ensuring everyone can share its benefits. New partners including Cisco,
Cognizant, and the National Health Service will join existing partners including Amazon,
Google, Microsoft, and Salesforce in the upskilling initiative. The Department claimed this would be
the largest targeted training program since the establishment of the Open University in the late 1960s,
which delivers distance learning for higher education. They said the program had already
delivered a million courses and the government would aim to retrain 10 million workers by the
end of the decade. Workers that complete the training will be certified with an AI foundations
badge to give employers confidence they have basic AI skills. Now, there is a lot that we could
say about this. The cynic in me of course sees all of the potential challenges with this program,
most of which sort of amount to a question of whether this is too little to move the needle,
but we've got to start somewhere. Governments need to get involved in a way that is actually
helpful to people adapting to a new world rather than just trying to pretend that they have control
over whether that new world exists. And so for that reason, I think this is a good thing,
and I'm excited to see it hopefully go even farther than they're thinking right now.
Now, our main episode today is about a new model out of China and its agent swarm capabilities,
but Alibaba's Qwen team also released a new model earlier this week, specifically called
Qwen 3 Max Thinking. Now, as you can probably tell from the naming convention, this is the
big flagship model from the Qwen team, their equivalent of GPT-5.2 Pro, Gemini 3 Pro, or Opus 4.5.
The model makes use of an inference technique that the Qwen team is calling heavy mode.
Qwen is doing things slightly differently from existing approaches to test-time scaling,
generating a response, then feeding it back into the model for improvements in a recursive loop.
It appears to be generating some pretty significant gains. Qwen said that this method
improved benchmark scores on GPQA, which is a PhD-level science test, from 90.3% to 92.8%,
while LiveCodeBench scores jumped from 88% to 91.4%. Overall, the benchmarking looks pretty strong.
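For a concrete sense of what that kind of recursive refinement loop can look like, here is a minimal sketch in Python. This is purely illustrative: the generate function is a stand-in for any chat-completion call, the round count is arbitrary, and Qwen has not published the actual heavy mode implementation, so treat the shape of the loop, not the details, as the point.

```python
# Illustrative sketch of a generate-then-refine test-time loop.
# `generate` is a placeholder for a single chat-completion call; it is
# not a real Qwen API, and the details of "heavy mode" are not public.

def heavy_mode(generate, question: str, rounds: int = 3) -> str:
    answer = generate(question)
    for _ in range(rounds):
        # Feed the previous answer back in and ask the model to improve it.
        answer = generate(
            f"Question:\n{question}\n\n"
            f"Current answer:\n{answer}\n\n"
            "Point out any mistakes or gaps, then write an improved answer."
        )
    return answer
```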
Now, the cost is a little beefy for a Chinese open source model.
Qwen 3 Max Thinking comes in at around the same cost as Claude Haiku 4.5,
meaning that it's still much cheaper than models like Gemini 3 Pro or GPT-5.2,
but about 10 times more expensive than DeepSeek V3.2.
Now, Qwen 3 is already being used by many American companies.
Airbnb CEO Brian Chesky, for example, recently said that his company was relying on Qwen 3 as a
more affordable alternative to US models, meaning that you've got to think that they will be watching
this model release closely, although again, how it stacks up compared to Kimi K2.5,
which we will talk about in our main episode, remains to be seen.
Lastly today, it's not just the Chinese labs with some interesting new products to show off.
Google has released a new feature for Gemini 3 Flash called Agentic Vision.
The feature, writes Google, leverages Gemini's state-of-the-art multimodal reasoning with code to
execute unique capabilities. Agentic Vision introduces an agentic think, act, observe loop into
image understanding tasks. Think: the model analyzes the user query and the initial image,
formulating a multi-step plan. Act: the model generates and executes Python code to actively
manipulate images, such as cropping, rotating, or annotating them, or to analyze them, such as
running calculations, counting bounding boxes, and so on. Observe: the transformed image is
appended to the model's context window, which allows the model to inspect the new data with
better context before generating a final response. Overall, this promises to improve
Gemini's ability to annotate images, perform data visualization tasks, and help with basic image
analysis. Google said that the loop improves model performance by between 5% and 10% across
most vision benchmarks.
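To make the think, act, observe pattern a bit more concrete, here is a rough sketch of what such a loop could look like. Everything here is an assumption for illustration: the model object and its plan, write_code, and answer methods are hypothetical stand-ins, and the image handling uses the Pillow library rather than anything Google has published about its internals.

```python
# Hypothetical sketch of a think / act / observe loop for image tasks.
# The `model` object and its plan / write_code / answer methods are
# invented for illustration; image handling uses Pillow (PIL).

from PIL import Image

def agentic_vision(model, image_path: str, query: str) -> str:
    image = Image.open(image_path)
    context = [image]

    # Think: analyze the query and the initial image, form a multi-step plan.
    plan = model.plan(query, image)

    for step in plan:
        # Act: the model writes Python that crops, rotates, annotates,
        # or measures the current image, leaving its output in `result`.
        code = model.write_code(step, context)
        scope = {"Image": Image, "image": context[-1]}
        exec(code, scope)

        # Observe: append the transformed image to the context so the
        # model can inspect it before the next step or the final answer.
        context.append(scope.get("result", context[-1]))

    return model.answer(query, context)
```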
Still, developer experience lead Omar Sanseviero hinted at the most exciting unlock from the new
feature. He showed an output of an annotated image of a table containing a spill.
Gemini had identified the spill, a piece of cloth, and several other items. The annotations appear
to be instructions for a robot to clean up the spill by first clearing away the items in the way,
then using the cloth to wipe up the spill. The implication, of course, is that this
feature could be used to give robots on-the-fly analysis and reasoning ability, allowing them
to tackle tasks that they've never seen before. Ultimately, as I said, when it comes to new models,
the big conversation is around Kimi K2.5, and so with that, we will wrap up the headlines and move
on to the main episode. Hello friends, if you've been enjoying what we've been discussing on the show,
you'll want to check out another podcast that I've had the privilege to host, which is called
You Can With AI from KPMG. Season 1 was designed to be a set of real stories from real leaders,
making AI work in their organizations, and now season 2 is coming and we're back with even bigger
conversations. This show is entirely focused on what it's like to actually drive AI change
inside your enterprise, and has case studies, expert panels, and a lot more practical goodness that
I hope will be extremely valuable for you as the listener. Search You Can With AI on Apple,
Spotify, or YouTube, and subscribe today. If you're using AI to code, ask yourself,
are you building software, or are you just playing prompt roulette? We know that unstructured
prompting works at first, but eventually it leads to AI slop and technical debt. Enter ZenFlow.
ZenFlow takes you from vibe coding to AI first engineering. It's the first AI orchestration
layer that brings discipline to the chaos. It transforms freeform prompting into spec-driven workflows
and multi-agent verification, where agents actually cross-check each other to prevent drift.
You can even command a fleet of parallel agents to implement features and fix bugs simultaneously.
We've seen teams accelerate delivery 2x to 10x. Stop gambling with prompts. Start orchestrating
your AI. Turn raw speed into reliable, production-grade output at zencoder.ai/zenflow.
Today's episode is brought to you by my company Superintelligent. In 2026, one of the key themes
in Enterprise AI, if not the key theme, is going to be how good is the infrastructure into which you
are putting AI and agents. Superintelligent's agent readiness audits are specifically designed to
help you figure out, one, where and how AI and agents can maximize business impact for you,
and two, what you need to do to set up your organization to be best able to leverage those
new gains. If you want to truly take advantage of how AI and agents can not only enhance
productivity, but actually fundamentally change outcomes in measurable ways in your business this
year, go to besuper.ai. Welcome back to the AI Daily Brief. Today we're talking about something
that has been of interest to people for quite some time. When I first started this show, all the way
back in April of 2023, already there were people who were extremely interested in the way
that LLMs could generate code. Now it would take a couple of years and some significant advances
in the models to actually unleash vibe coding in the way that had happened over the course of 2025,
but the idea was there very early. We've similarly had interest in teams of agents
that can coordinate amongst themselves to accomplish more things,
even if the capability set hasn't fully been there. Which isn't to say that people haven't
been experimenting. Lindy released their agent swarm tool back in April of 2025, and the concept
is related to something that I've talked about on this show, the Doctor Strange Theory of AI Agent
Work. Now the specific point that I've made is actually about the difference in how enterprises
think agents will play out versus how I think they will play out, with the difference being that
I don't think that agents are going to be one to one replacements for existing human work.
I think that we're going to be able to deploy lots and lots of agents to scenario plan and war game
different types of work, which while not exactly the same as agent swarms, which are more about
breaking down complex tasks into specific subtasks, is in some ways still part of the same larger
conversation about how agents will actually work in the future. Over the last couple of days,
we have started to get the first big model releases of 2026, and maybe the most significant so far
is Moonshot's Kimi K2.5. While it is the agent swarm feature of K2.5 which has generated the most
chatter, it's worth checking out the broader model as a whole. Artificial Analysis sums up the shift
when they write, Moonshot's Kimi K2.5 is the new leading open weights model, now closer than
ever to the frontier, with only OpenAI, Anthropic, and Google models ahead. And indeed the benchmarks
are impressive. K2.5, for example, claims 50.2 on Humanity's Last Exam, which would put them ahead
of GPT-5.2 running on high settings, Opus 4.5, and Gemini 3. On a variety of other benchmarks as well,
they claim performance that matches or exceeds these premier western models.
On the overall independent Artificial Analysis index, Kimi jumps from
11th place overall with their K2 Thinking model into 5th, only behind two iterations of GPT-5.2,
Opus 4.5, and Gemini 3 Pro, and of course the cost is cheaper than any of those models.
In Artificial Analysis' tests, Kimi K2.5 was about four times cheaper than Opus 4.5 or GPT-5.2, but was still
much more expensive than, for example, DeepSeek V3.2. One of the things that Moonshot
emphasized in their launch is the model's native multimodality. Artificial Analysis again writes,
Kimi K2.5 is the first flagship model from Moonshot to support image and video inputs.
This is the first time that the leading open weights model has supported image input,
removing a critical barrier to the adoption of open weights models compared to proprietary models
from the frontier labs. They point out that this makes a significant difference as compared to
other open weights leaders like DeepSeek's V3.2. Now anytime we get a model out of China,
of course one aspect of the discourse is what it says for the state of the AI race.
On that front, there were a number of people who took to Twitter slash X to share examples of
Kimi K2.5 claiming that it was Claude. Enrico from Big-AGI asks, identity crisis or training set?
Still overall, even with some of the suspicion of distillation of Western models,
the release of K2.5 certainly validates the recent arguments from people like Demis Hassabis
that Chinese models are very, very close to the US when it comes to performance,
if not yet having had an example of actually pushing the frontier. As Balazs Nemethi points out,
however, the real value in K2.5 is not, as he puts it, pure IQ dominance. It's about how it does
in an actual work environment. He calls it less chatbot and more employee. And indeed there are
a couple things that stood out to me about the K2.5 announcement that are really impressive.
One is the way that they're using this multimodal input capability in the context of coding.
They show an example of taking a screen recording of a website, dumping it into Kimi and asking
it to clone it with Kimi shipping that code, including UX and interactions. If this actually works
like that, it opens up a significant new frontier in AI coding that you have to imagine everyone
will race to copy very quickly. Another thing that Moonshot emphasized is how good K2.5 is at
office skills, things like financial modeling in Excel or creating high-quality PowerPoints.
Now again, this could be incredibly valuable when it comes to work, although I haven't really
been able to find a ton of examples yet of people testing this out that don't just feel like
paid influencer posts. One that I found that did seem to positively test out these features came
from Shafi. He wrote, this new AI model Kimi from China created a full slide deck from my journal
article in one single shot prompt. I just gave it the keyword and journal name not even the link
or PDF to the article. It searched the article and found the correct one, developed the contents
after reading the paper, created contents for 12 slides including searching images from internet,
asked for suggestions to make edits which I declined and asked it to go ahead and generated slides
in a PowerPoint format. Everything happened inside my phone in 5 to 6 minutes. Since it's my own
article, I know it got most of the things right. And yet, as I said at the beginning, probably the
feature that people are most excited about is this agent swarm parallelization. An example that
Kimi gave was adapting O. Henry's short story, The Gift of the Magi, into a 10-minute short film.
They asked it to generate a highly consistent storyboard script and embed it into an Excel file,
which they said from a single prompt created a 100 megabyte Excel file generated with images
with a total of 55 scenes. Simon Willison writes, the self-directed agent swarm paradigm claim
there means improved long-sequence tool calling and training on how to break down tasks for
multiple agents to work on at once. He gave it the prompt, I want to build a Datasette plugin
that offers a UI to upload files to an S3 bucket and stores information about them in a SQLite table.
Break this down into 10 tasks suitable for execution by parallel coding agents. He said the
response was pretty good. It produced 10 realistic tasks and reasoned through the dependencies between
them. Global Soul writes, tried Kimi Moonshot agent swarms and it is quite magical. Basically,
they gave Kimi a list of stocks and asked it to create a report that analyzes each from a variety
of different factors. They said it created individual files for each company and an overall summary
and finished the output for all companies in 10 minutes. Swix also had an interesting experience
in his testing. He writes, little detail from exploring the K2.5 agent swarm preview today.
I asked it to make a custom website for the Latent Space podcast and despite it being trained
to parallelize eagerly and having full permission to do so, it recognized that this was a newb task
and did a highly competent job with one agent and refunded my credits. This thing might be AGI,
I've never expected a parallel agent lab to use less than what it was trained or opted into use.
In other words, just because it could use a parallel agent structure, it recognized that
for certain tasks it doesn't need that. Cline founder Saoud Rizwan explains a little bit about
what's going on in the background. He writes, LLMs are trained on sequential reasoning,
breaking tasks down step by step, one to-do after another. When you ask them to orchestrate parallel
work, they don't know how to split tasks without conflicts. Moonshot calls this serial collapse and
solved it with reinforcement learning. They used PARL, parallel agent reinforcement learning,
where they gave an orchestrator a compute and time budget that made it impossible to complete
tasks sequentially. It was forced to learn how to break tasks down into parallel work for sub-agents
to succeed in the environment.
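To illustrate that budget idea in the most stripped-down way possible, here is a toy sketch. The numbers, the run_subagent stand-in, and the threading are all invented for illustration and are not Moonshot's training environment; the point is simply that a tight wall-clock budget makes a sequential plan fail and a parallel one succeed, which is the pressure the training reportedly applies.

```python
# Toy illustration of why a time/compute budget forces parallelism.
# All numbers and the run_subagent stand-in are invented for this sketch.

import time
from concurrent.futures import ThreadPoolExecutor

TIME_BUDGET_S = 10.0          # episode budget
SUBTASK_COST_S = 4.0          # each subtask "costs" this much agent time

def run_subagent(task: str) -> str:
    time.sleep(SUBTASK_COST_S)
    return f"done: {task}"

def orchestrate(tasks: list[str]) -> list[str]:
    start = time.monotonic()
    # Three 4-second subtasks run one after another would take ~12s and
    # bust the 10s budget; fanned out in parallel they finish in ~4s.
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        results = list(pool.map(run_subagent, tasks))
    if time.monotonic() - start > TIME_BUDGET_S:
        raise TimeoutError("episode failed: budget exceeded")
    return results

if __name__ == "__main__":
    print(orchestrate(["research", "draft", "review"]))
```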
Simon Smith from Klick Health did a full test as well and came
away pretty impressed. He writes, I've been thinking about the best way to organize agents in
step-by-step workflows where each agent has skills defined by an agent skills file and to then
scale this across an enterprise. Today, Kimi dropped its K2.5 model along with agent swarms, and
I thought, could this be it? The answer? Mostly. He then walks through how you do this. First,
using Kimi, you actually use the model selector to select agent swarm in the same way that you would
select between, for example, instant or thinking mode. For Simon's task, he gave agent swarm,
the task of responding to an RFP, which included in his words, research, strategy, creative,
brainstorming, and concept development, media planning, analytics planning, high-level project
planning, and consolidating everything into a final written response and a Word document.
He continues, as would be familiar to users of agent coding tools like Claude Code and Codex,
Kimi turns your request into a step-by-step plan and then proceeds to work through it.
Where things get interesting, however, is how it executes the plan with multiple agents.
For each step in the plan, he writes, Kimi creates a set of relevant agents. And importantly,
these aren't generic agents. Agents each have roles and names. Each agent, he writes, plays a
specific role, defined for it in a prompt, and even gets a name and avatar. The role description
ensures the agent focuses on a specific job to be done, and the name and avatar make this extremely
user friendly. The model is then smart enough to figure out which agents can work in parallel,
or, in the case that an agent requires the output of a different agent, how to run them sequentially.
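As a sketch of what that kind of dependency-aware scheduling looks like under the hood, here is a minimal example. The agent names, roles, and dependency graph are made up for illustration, loosely echoing the RFP example, and are not Kimi's actual internals; the point is just that agents whose inputs are ready run in parallel, while an agent that needs another agent's output waits for it.

```python
# Minimal sketch of dependency-aware agent scheduling: agents with no
# unmet dependencies run in parallel; dependent agents wait their turn.
# All names, roles, and outputs here are invented for illustration.

from concurrent.futures import ThreadPoolExecutor

# agent name -> (agents it depends on, work function taking their outputs)
AGENTS = {
    "researcher": (set(), lambda deps: "market research notes"),
    "strategist": ({"researcher"}, lambda deps: f"strategy built on {deps['researcher']}"),
    "copywriter": ({"researcher"}, lambda deps: f"creative concepts from {deps['researcher']}"),
    "editor": ({"strategist", "copywriter"}, lambda deps: "consolidated final response"),
}

def run_swarm(agents):
    done = {}
    with ThreadPoolExecutor() as pool:
        while len(done) < len(agents):
            # Every agent whose dependencies are all finished is ready to run now.
            ready = [name for name, (deps, _) in agents.items()
                     if name not in done and deps <= done.keys()]
            futures = {
                name: pool.submit(agents[name][1],
                                  {d: done[d] for d in agents[name][0]})
                for name in ready
            }
            for name, future in futures.items():
                done[name] = future.result()
    return done

if __name__ == "__main__":
    for name, output in run_swarm(AGENTS).items():
        print(f"{name}: {output}")
```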
Simon writes that you can monitor agents overall via a dashboard with progress indicators,
and also select individual agents to monitor their work. One of the important things that Simon
points out is that part of the big upgrade here is not just the performance, but the user experience.
He writes, when I think about something that would scale up to an enterprise which will include a lot
of users who won't be comfortable in something like Claude Code in the terminal, this feels like
it would be easily adopted. It's extremely clear and intuitive. The model gave Simon not only
the final output but also all of the intermediate outputs from each of the distinct agents.
Now, Simon's big request, and his caveat, is that he wants access to connectors or MCPs as well
as agent skills, to be able to fully sync this with the larger ecosystem of data that people work in.
Overall, though, he says, I'm impressed. I've been waiting for something like this that makes it
easy for anyone, regardless of technical expertise, to ask AI to do something and have it complete
the task with multiple agents playing different roles and working collaboratively. This feels like
the emerging future of humans managing teams of AI agents the way they currently manage teams of
other humans. I honestly don't understand how Kimi got here first. There are other solutions out
there for agents to work together on tasks, but everything I've seen is too technical for the
average user, requiring you to use the terminal or too rigid, requiring you to pre-build workflows.
How did Kimi create such a great model with such excellent agentic capabilities and build such an
intuitive interface? Now, this is the interesting question, and why it makes me feel like we are very
much seeing the beginning of a broader phenomenon around these agent swarms. In addition to K2.5,
I've seen a couple of people talking about Claude Code's new task system in the same context,
and so it seems like something that's probably on the minds of those folks as well.
LangChain developer Sydney Runkle is also talking about this sub-agent architecture,
all of which makes me feel like 2026 might be the year of the agent swarm.
Indeed, there's enough chatter that Ethan Mollick is making one last, perhaps vainglorious, attempt
to steer us away from using the swarm terminology. On Monday, he tweeted,
let's not call groups of agents swarms. Swarms are both terrifying and not a useful analogy.
Groups of agents should be called teams or organizations. It both describes how to structure them
and also how to use them. Don't let the weird AI folk naming win again.
I'm not sure where we'll land when it comes to terminology, but it really does feel like this is
something new happening, and I'm excited to see how it develops. I will be testing out K2.5,
maybe we'll do a special bonus operator's episode about that. For now, however,
that is going to do it for today's AI Daily Brief. Appreciate you listening or watching.
As always, and until next time, peace!
