
In this episode of The New Stack Makers, Nimisha Asthagiri of ThoughtWorks explores why many AI initiatives stall between proof of concept and production. A key issue is that organizations focus on speed—asking how to move faster—rather than rethinking what new capabilities AI actually enables. Successful companies take a systems-thinking approach, investing in organizational literacy and aligning teams around meaningful use cases instead of retrofitting AI into existing workflows.
Asthagiri highlights that core engineering practices are returning to prominence. As AI-generated code increases, so does the risk of "cognitive debt," where developers lose understanding of their own systems. To counter this, teams are reviving fundamentals like test-driven development, mutation testing, observability, and zero-trust security, especially as autonomous agents contribute to production code.
She also introduces the concept of “dark code”—AI-generated code that may never be used—and argues for more intentional lifecycle management, including ephemeral code. Ultimately, the focus shifts from code itself to specifications, context management, and disciplined engineering practices.
Learn more from The New Stack about the latest in systems-thinking approaches:
System Two AI: The Dawn of Reasoning Agents in Business
A practical systems engineering guide: Architecting AI-ready infrastructure for the agentic era
Join our community of newsletter subscribers to stay on top of the news and at the top of your game.
ThoughtWorks is a leading global technology consultancy that delivers extraordinary impact
by blending design, engineering, and AI expertise.
For over three decades, we've led in technology innovation, and today we're at the forefront
of AI-powered software and data engineering.
Welcome back to another episode of The New Stack Makers.
I'm Friedrich Klauding-Wyme, the senior editor for AI at The New Stack.
And I'm Nimisha Asthagiri, data and AI advisor at ThoughtWorks.
Well, Nimisha, thank you for being here.
Thank you, Friedrich, for having me.
Absolutely.
It's a pleasure.
Now, one thing I was thinking, ThoughtWorks, I've heard the name, but I'm thinking quite
a few folks in our audience may not have heard it yet, they may not be familiar with what
you're doing.
So, maybe give us a short background on what it is that ThoughtWorks does.
Yes, of course.
So ThoughtWorks, we're a global digital consulting firm.
We help our clients around the world with various things, from strategy, design, engineering,
and these days, really AI, and our phrase is: not just AI, but AI that works.
And why that is: we take our decades of experience, bring thought leadership to the industry, and bring the latest and greatest ways of thinking about building software, building it right, and building the right things.
So, everything from our product thinking capabilities and skills, as well as agile, engineering, and platform engineering practices.
What is it in your experience right now that these companies are looking for specifically?
Like, what are they struggling with still?
Well, I mean, since this whole wave of generative AI started in November 2022, and then the next year of doing a lot of proof of concepts and really exercising it, and now trying to upskill their own workforce, companies are looking to understand how to actually bring a lot of their proofs of concept to production.
They're also looking at how to make the development work they're doing not just accumulate code, but produce viable code, code that can really execute and run successfully.
So, I think Gartner was saying that over 40% of agentic AI projects will be canceled by the end of 2027.
So these are staggering numbers, and just imagine the billions of dollars being put into the industry and therefore wasted in some ways.
Not a waste in terms of learnings, there have been a lot of learnings, and rapid learning at the pace the industry is moving, but definitely in terms of business ROI, one may not actually achieve that.
No, absolutely.
And I keep hearing the same numbers; I don't think the needle has really moved all that much. But what is it that these companies are getting wrong?
Why are 40% of these projects getting canceled?
Why aren't we better at getting value out of these tools, which are really powerful?
I think it's the question that is being asked.
The question that we're hearing a lot from executives and others is, how do we go faster?
How do we go faster?
How do we keep relevant?
Where I think the right question or another alternative, better question here might be,
what do we build, given the latest technology that we couldn't build before?
So reimagining and reinventing what we couldn't do before, and focusing our energy there rather than on everything.
And then secondly, how do we build differently?
Now that we do have AI, and it's not just an assistant, but actually even a part of the team, with human and machine agents working together, what does that look like?
That is not, therefore, just a tool change, but a systemic, systems-level change, rethinking your models of what to build and how to build.
And that also then implies how do you measure?
And the measurements of the past might have just measured output and production: how quickly are people bringing pull requests or code into production.
Instead of that, it could be more about, for instance, iteration cycles and interactions with your AI and with others, collaboration and interaction metrics, as well as your first-pass acceptance rate, right?
For the code that's generated, how quickly can it be accepted, and with minimal rework?
You mentioned systems thinking; talk to me a little bit more about that. What needs to change there, both the practical changes and the change in the culture, I think, of how these projects develop over time?
Exactly, exactly.
So I think it is the typical people, process, and tech coming together. But when we're rethinking this, I think there's a lot of thinking about the organization at large, and about the technical platforms that you have in your organization, and what platform capabilities you may need to bring in that become strategic assets.
And that then supports the system at large, and you're using your platform as paved roads that people can use to accelerate their work, but it also becomes the conduit for the governance you need to apply and for the efficiency gains.
So all of that comes together, and for ThoughtWorks, we've been a progenitor of a lot of platform and engineering excellence capabilities, and that is now, once again, coming to the forefront, where, yes, there are a lot of fundamentals from the past that need to be reinforced at this point, if anything.
And so then you start thinking: if you have that harness and the technical system in place, that's where you're elevating the people to say, hey, use more of your human judgment.
Delegate the repetitive work we may have been jaded by in the past, the "that's just what I do when I come to work" tasks, and kind of rethink that.
So there's a little bit of unlearning so that you can learn this new way, where your human judgment and the higher-order value of your human effort can come into play.
Yeah, yeah.
In practical terms, as we're thinking about some of these standards that we've had for many years, is it that we need to bring back a focus on them? What are some examples that you've seen?
Yes.
So for instance, it might be everything from like mutation testing, right?
And so test-driven development, right?
So we're really thinking about creating those feedback sensors for AI.
So that the AI can use them, now that we have autonomous coding agents. And there's been a drastic change, right, since December, with the latest models and so forth. So there's a lot of keen interest in generating and designing these autonomous coding agents.
But what do you want to bring back?
I think there are things like mutation testing, and a lot of our testing principles, and there might be things with test-driven development, for instance: designing that tight feedback loop for the AI so that it can continuously learn and evolve. That's a key component. I think the other thing would be the metrics, such as DORA metrics.
So I think those types of things with deployment frequency, lead time, change failure rate,
these are our lagging metrics that will then ensure that, okay, yes, we are moving the
needle in the right place.
But I think there are also other things like zero-trust security architecture, right?
And really thinking about ensuring that we have proper identity management and security as well.
And we're seeing the propagation of agents; they're now coexisting along with you when you're working on your laptop and your desktop, and a lot of changes are happening.
Zero-trust architecture is critical, and being able to know who did what, as well as
the authentication and the authorization of the work that is happening.
A lot of traditional, fundamental ways of thinking about engineering discipline, but really coming back into the forefront now.
Yeah, talk me through it a little bit: what's the best-case scenario here that you've seen?
Like, the company that has done this really well, what does that look like?
Yeah, I think this is where we go back, right, to the systems thinking.
The companies that aren't thinking about their overall strategy and designing that, the ones that jump ahead with top-down mandates instead of thinking it through first, I think those are not as successful; we're finding those are anti-patterns.
The ones that are successful are doing the due diligence. It's hard work, but they provide the literacy and the enablement within the organization for the people, and then really leverage the ROI of their work. There's a lot of strategic thinking as well, about where to invest as much as how, and with what tools.
Yeah.
Yeah.
It feels to me like, in this time of AI FOMO, strategic thinking isn't always at the forefront.
Is that something that really needs to change? Would you say a lot of these companies need to slow down a little bit, potentially, and just think over what they're doing, and whether AI is even the right tool for them at this time?
Yeah.
I mean, definitely.
Yes.
And I think there is a responsible AI perspective here, as well as a responsible leadership and technology aspect to it. But why now more than ever: because you can end up generating AI slop. The AI is going to produce a lot of what you tell it to produce, and also what you don't tell it not to produce, if there aren't proper feedback loops in place.
So, I think that is why, once again, alongside the engineering discipline we talked about, there are strategic disciplines as well.
And, you know, bringing back the disciplines that might be in place about what is your competitive advantage, right?
I mean, you don't need to keep up with the Joneses just because that's what they're doing, right?
So, how do you want to differentiate, and where do you invest your money?
So, yes, definitely that's also part of it.
And that requires getting the, this is where the human judgment comes into play.
For sure.
For sure.
Now, we haven't talked about agents specifically yet, but we should talk about that a little bit, because that's basically becoming the default now, at least for development teams.
How's that changing how you're talking to your customers and the problems they're facing?
And you mean coding agents, right?
Yeah, coding agents.
Sorry, coding agents.
Yes.
Specifically coding agents.
Yeah.
So, I'm seeing that in two different ways: you can think about it in terms of the topology shifts that we're seeing, as well as architectural shifts.
I'm part of the global team that puts together the ThoughtWorks Technology Radar, where we're finding trends and learning from the on-the-ground, actual experiences of our ThoughtWorkers globally, 10,000-plus employees.
And one of the things that we're finding is that we're putting together techniques to ensure that the code base is not shifting or drifting away from architectural principles.
Right?
So, those types of things are very important when we're also thinking about coding agents.
And so, I think there is the latest technologies with agent skills as well as agents at MD was
from before, right?
So, a lot of these things that help provide the context to the agents.
We're thinking about that as essentially forward engineering, right?
Like, forward constraints for them in terms of the guiding principles.
What are those architecture decision records that we would ask the agent itself to document
so that the humans could review that and could reference that?
Because at this point, like humans reading all the code that's going to become more just,
yeah, unsurmoned.
Yeah.
Yeah.
So, having those ways to be able to validate that.
And on the other hand, the team topologies are also changing because now we have humans
and machine agents working together and collaborating.
And so, there we're seeing these teams of coding agents, where it might be a lot more strategic and intentional, deliberate design, you know, where someone might be orchestrating very role-specific agents: okay, this machine agent for backend versus frontend and whatnot.
So, that we are finding and we put in our radar as a technique that people can assess.
But the thing we flagged as a technique for people to maybe be cautious about, and watch out for, and not jump into without some additional experimentation and testing, is the coding agent swarms that are coming out, where it's like hundreds of agents being tasked to do the same thing, potentially.
And then you have to think about collaboration, and, you know, they might run into conflicts that they have to resolve.
And so, right now, it's still maturing.
For organizations at the forefront of trying these things out, okay, fine, go ahead and continue to evolve it and provide more best practices for the industry.
But for others, who maybe have more regulatory compliance requirements and other things, there might be more cautionary steps toward it.
So, overall, I'll just say that really this is still an evolving field
and so it's dramatically changing.
And one of the great things about the Tech Radar is that it gives you a glimpse of the things that we're all testing and experimenting with, versus the things that have evolved much further, which you can, you know, choose to adopt.
Yeah.
As you've looked at that radar lately, has anything else stood out for you? Any surprises there?
Um, surprises there.
Yeah, that's interesting.
So this was my third time being part of the team that put this together.
One of the biggest things I'll say: we used to coin this term "too complex to blip," which is still the case sometimes, because our blips are very short and sweet, a very quick glimpse of a new technology or new technique, or older techniques and technologies that we want to reinforce.
Um, but this time there were a lot of "too young to blip."
Okay.
Everything is moving so quickly, and a ThoughtWorker may have put something out there, because everyone's also experimenting, in our organization and others.
So they might put out a blip where we're like, oh, wow, this is an interesting open source project that tackles this white space in the industry, but it just came out two weeks ago.
And we're like, okay, well, we publish our radar maybe two months after we have this meeting.
So is it the right time to anticipate the maturity of this? Or is it, um, you know, one of those things that comes out and then doesn't last?
So I think there's definitely, um, yes, something about that as well.
Yeah.
Two weeks is a long hype cycle right now for a lot of these projects.
Yes.
Yes.
Um, a project can get 50,000 stars in that time.
Um, one thing I just wanted to go back to: I think another fundamental issue we're all dealing with is that a lot of code is being generated now, a lot more than before, with all of these tools.
And one thing that keeps coming up in my discussions with people is how that creates new bottlenecks all across the life cycle, from code reviews onward, really, after the code has been written. What are you seeing there?
How are people dealing with that, and what's your advice?
Yes.
Yes.
Yes.
And we blipped this actually in the past radar, and I think we reinforced it here as well.
First of all, I think a perspective of cognitive load: cognitive load for the human agents as well as for the machine agents.
And this is where, once again, good architectural principles come in, like, um, thinking about the boundaries of your code. We've talked about modularity before, you know, from Parnas's paper in the 1970s; once again, it becomes very evident here as well.
Why that technique is important is that, for the machine agents themselves, it matters for what we feed into their context windows, but also for the human cognitive load, to be able to review and understand the code. A lot of that modularity helps and comes to bear.
Um, and once you have that, then you can start thinking about the harness that you are developing, and that harness includes, you know, those guardrails, or architecture guardrails, and the feedback sensors, right?
So, in addition to the feed-forward of the context you provide your agents, there's the feedback from the sensors: the tests and linters and a lot of those common practices.
Sure.
Sure.
Those are not going away.
They're not going away.
And if anything, it's like, yes, how might we do better?
Hmm.
I do want to share, though, that I think the other thing is to maybe rethink a little bit, and this is maybe a little bit more advanced in the industry right now, how we even think about code itself. Because, yes, the quantity of code is just going to dramatically increase with how quickly AI is able to produce it, and then humans become the bottleneck in that case.
But I think there's an opportunity to rethink where code matters, and the volatility and durability of that code.
So, what I mean to say is, first of all, this is why strategic thinking is important: what's actually valuable for the organization? Where do you want to build versus buy?
How do you want to actually differentiate your organization, right?
Should we even bother building it?
So that is very important, because it's so easy to build these days, and it's going to get even easier, because code is going to become a commodity to generate. You don't necessarily need to keep it. There's a lot of dark code, right?
Like we have a lot of dark data; we've been collecting a lot of data through our big data initiatives.
But now there's a lot more dark code than even before.
Um, not that there wasn't dark code before.
The second thing is, beyond asking whether this code should even exist, what is the volatility of the code?
Because it's so quick to create POCs, how might you, architecturally or from your system standpoint, think about documenting that code as having a life cycle, such that it would eventually get deleted? Being very explicit about the retention of that code.
But secondly, thinking about code that could be dynamically and ephemerally generated, right?
Because, say a user wants to be able to access a particular interface, let's say, or an API. Well, if I don't have the agent skill for it, or if I don't have that built already, and it's not necessarily a reusable feature, then why not go ahead and just dynamically generate it for that particular single, fit-for-purpose use, and then you're done.
So I think it's going to be that perspective of how we really think about code differently.
Yeah.
Yeah.
Yeah.
It's interesting.
It came up, I think, first in discussions a year ago or so, and the models weren't quite there yet.
But I think at the time people were thinking about it already: maybe the spec is more important than the code, because I'll just regenerate the code as I need it, and as the model gets better, I'll just have better code at the end, and not worry about all of that.
Yes.
But then we're going to shift the needle to the spec, and the spec carries its own cognitive load, which we're also seeing as well.
Um, and for that, I think we do have a technique, by the way, that our ThoughtWorkers proposed, which we're calling progressive context disclosure.
Okay.
Um, where, you know, progressively, because remember, like I said, the machine's cognitive load is as important as our human one, with progressive context disclosure we are being explicit and intentional, I'd say, about what matters for this particular request.
Right.
Right.
Because a super-spec is also going to become difficult, and then you want to think about that modularly as well.
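(A rough sketch of the progressive-context-disclosure idea as described: keep the spec modular and expose to the agent only the modules relevant to the current request. The tagging scheme and spec contents here are hypothetical.)

```python
# Hypothetical modular spec store; each module is tagged by topic so an
# agent's context window only receives what the current request needs.
SPEC_MODULES = {
    "auth": "Users authenticate with OAuth2; tokens expire after 1 hour.",
    "billing": "Invoices are generated monthly in the customer's currency.",
    "ui": "All pages must meet WCAG 2.1 AA contrast requirements.",
}

def disclose(request_tags, modules=SPEC_MODULES):
    """Progressively disclose only the spec modules matching the request."""
    return {tag: text for tag, text in modules.items() if tag in request_tags}

# A request touching login UI gets auth and ui context, but not billing.
context = disclose({"auth", "ui"})
print(sorted(context))  # ['auth', 'ui']
```

The point is the selection step: the full "super-spec" never enters the context window, only the modules the request actually touches.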
I can see it now: a monorepo of specs.
Yeah.
Yeah, they'll keep us busy for a while.
Um, for those who want to learn more about the radar, is there a place they can go? Is that public, or is it only for your clients?
Oh, no, no, no, no, definitely, it's, it's public.
I mean, our goal is to take a lot of the learnings that we're having and share them with the industry.
Um, so if you go to thoughtworks.com/radar, you'll be able to see our latest one.
Uh, we do have the next edition coming out next week, actually, April 15th.
So volume 34 will be out.
So, um, yeah, if you just wait a few days and see that one, you'll get the latest and greatest.
Awesome.
Perfect.
Well, Nimisha, it has been a pleasure.
Yes.
Thank you so much, Friedrich.
It was great to, to be here.
Same here.
Same here.
Thank you so much.
Okay.
The New Stack Podcast
