
At KubeCon + CloudNativeCon Europe 2026 in Amsterdam, Alex Kestner, principal product manager for Amazon Elastic Kubernetes Service (EKS), discussed how Amazon EKS Auto Mode aims to reduce the operational burden of running Kubernetes at scale. While Kubernetes delivers significant power, it also introduces complexity—particularly through repetitive, day-to-day tasks like managing node lifecycles, ensuring security updates, and selecting optimal infrastructure.
Kestner emphasized that much of this “undifferentiated heavy lifting” distracts platform teams from delivering business value. Amazon EKS Auto Mode addresses this by automating infrastructure operations across the full node lifecycle, shifting responsibility for key operational components outside the cluster and into AWS-managed services.
Built in collaboration with the EC2 team and leveraging technologies like Karpenter, Auto Mode dynamically provisions right-sized compute resources based on workload requirements. While it doesn’t eliminate all challenges—such as unpredictable workloads or diverse deployment needs—it provides a more application-focused approach to scaling and cost optimization. Ultimately, Auto Mode represents a meaningful step toward simplifying Kubernetes operations in increasingly complex cloud-native environments.
Learn more from The New Stack about the latest developments around Amazon Elastic Kubernetes Service (EKS):
2026 Will Be the Year of Agentic Workloads in Production on Amazon EKS
How Amazon EKS Auto Mode Simplifies Kubernetes Cluster Management (Part 1)
A Deep Dive Into Amazon EKS Auto (Part 2)
Since its inception, Amazon Web Services has been the best place for customers to build
and run open source software in the cloud. AWS is proud to support open source projects,
foundations, and partners.
Hi there, I'm Adrian Bridgewater. I'm here with The New Stack at KubeCon + CloudNativeCon
Europe 2026; we're in Amsterdam, here in the Netherlands, and I'm with Alex
Kestner. Let me make sure I get your job title right, Alex: you're principal product manager for
Amazon Elastic Kubernetes Service, otherwise known as EKS, and you're here throughout the
whole convention. Specifically, we're going to look at EKS Auto Mode, I know, but there's
a sort of intro into that. I think it's probably worth covering the fact that
Kubernetes and complexity have always gone together, haven't they? As it's become
the de facto orchestration standard within cloud native environments, we've been talking about all
of the self-service and automation layers that arrive, incrementally, every
year to make things easier. Is Auto Mode part of the answer to making things that much better?
Certainly for part of it. I mean, I think the challenge with Kubernetes is that because it
is so powerful, there's a certain amount of complexity that just comes with that space and
with solving that problem. One of the ways that we think Auto Mode can help with
that complexity is at the infrastructure layer. While there are all kinds of complexity
in Kubernetes that we can't directly address through this kind of feature for Amazon EKS,
there is a very healthy portion of undifferentiated heavy lifting that we can take on for customers.
Where are the difficulty points? Is it scaling that's hard? Is it interconnections to new services?
Is it preparing for quantum? I'm probably jumping ahead of the game. Is it going the other way?
Is it trying to make firm connections to legacy systems? Where are all the real difficulties
in operational terms? So these difficulties come from sort of the day-to-day tasks that take
platform teams time away from delivering true value for their business, like unique and
differentiated value, helping them ship their applications faster, serve their own users better,
and these often take the form of repeated and ongoing operational tasks. So handling the life
cycle of the nodes in the cluster, making sure that they're secure, up-to-date, the right
instance types are selected for performance and cost, making sure that all of the software
in the cluster that helps it operate is consistent, is up-to-date, the right fit for the workloads
in that cluster, and all of this adds up to a very real amount of work for platform teams.
And this kind of infrastructure toil is where we really focused for Auto Mode: to help
alleviate some of that, mainly by giving it to us to take on. Let's get into Auto Mode in a
second, but before we do, you just said, you know, node lifecycle, and without being too simplistic:
for most of the last 20 years we've been talking about the software development lifecycle,
but, even though cloud and cloud native have been around since before the millennium,
we haven't been talking about node lifecycle. Well, all of that software has to run somewhere,
right? And that's infrastructure that needs to be managed by the teams that are running these
Kubernetes-based platforms. And so node life cycle is just one of those examples of things that
are critical for these Kubernetes platform teams to manage, but not necessarily unique to their
business or the problems that they're facing. And so this was a kind of an obvious place for us to
start. When we were talking with customers about the things that they really didn't like doing,
that didn't provide a lot of value for their businesses, that was one example that we just heard
repeatedly: making sure that the instances where their software runs are secure
and up-to-date is just not something that's differentiating for them. Got you. So,
how long has Auto Mode been around? And how would you do the elevator sell,
which probably everyone asks for? I've done it a couple of times, yeah. You might find people getting out
early. No, it's complex technology. How long has it been around, and how do you encapsulate it
for, you know, someone that has no idea? So, like a lot of services and features at AWS,
we launched it at re:Invent in 2024. So it celebrated its first birthday last year; it's roughly, you know,
15 months old at this point. Fundamentally, Auto Mode is meant to take on a lot of this
undifferentiated, heavy lifting that we're seeing platform teams do just to get the benefits of
this incredible ecosystem that we see here through Kubernetes and the Cloud Native Computing Foundation.
It's almost a tax that you have to pay to get all of these benefits. And it struck us as something
that we could take on for our customers, letting them focus on things that are particularly valuable
for their businesses. So, what's been happening over the 18 months, or almost
that time period? Yeah, just about. Well, you've said that you've seen commonalities in node execution and behavior,
and for everything, I presume, from spinning up to retiring a node, there are key activities
that you can codify. Can you put specific names to those? What are those?
Yeah, so I think there are two main things that Auto Mode takes care of for customers. One is,
for a Kubernetes cluster to be truly useful in, like, a production environment, there are key
sorts of operational software that need to run on that cluster, that help it interact with all kinds
of other infrastructure primitives, or in the case of EKS, other AWS services. And then, for that
part, we take those over for customers: we run them outside of the cluster and basically take
the maintenance of that software off of their plates. These are key things for enabling
compute, storage, and networking in the cluster: table stakes for any kind of Kubernetes environment.
The other side of this is a sort of unique innovation that we worked on with the EC2 team in AWS
called EC2 Managed Instances. So one of the ways that we're able to let customers offload all of
this work to us is that every instance that auto mode launches into a Kubernetes cluster is an EC2
Managed Instance. And this is an instance that looks like any other kind of Amazon EC2 instance.
You get the rich sort of portfolio of 850 or more EC2 instance types that you could leverage,
all of the different ways of purchasing this capacity: on-demand, Spot, Reserved Instances.
But it is ours operationally to manage. So we're trying to thread this needle of letting customers
have their cake, access to all of these amazing capabilities from the AWS portfolio, while
eating it too, by handing a lot of that kind of heavy lifting to us. And are there
enough managed services within these offerings to really overcome the major difficulties?
It seems that because modern workloads are so dynamic and unpredictable, almost sounds like
I'm reading a marketing brochure, but that's what everyone always says. And because
you've got agentic services, some of which will enjoy rapid uptake and some of which will be
prototyped and fall off and skew and die, you've still got this kind of workload provisioning
difficulty. Yep. And there's a lot of wastage, and you know, that's something
you want to avoid. I know that at AWS, every re:Invent, you talk about, I can't remember whether it's called
the shared responsibility model or something like that. It's like the customers have to take some of
this responsibility for themselves to be efficient. And at the same time at the back end, you're
the hyperscaler. You're going to be doing this much. You'll provide enough tooling to make it
as manageable as possible. But we're still going to have unpredictable workloads.
Is it something we'll never get around? I think this is certainly not something that
will likely change in the near future, just by virtue of the diversity of use cases
that customers bring to Kubernetes and EKS. But it is a place where Auto Mode really excels.
So, you know, just as we are taking on the operational responsibility for common
sorts of like maintenance tasks for this infrastructure, we also bring a very application oriented
perspective to scaling and cost optimization. So one of the things that we want to get customers out
of the need to do is capacity planning. Auto Mode is built on a series of open source
standards and products, one of which is the Karpenter project. This is a project that we launched in 2021
as an open source project. Hasn't that become part of the CNCF now? That is right,
it's also part of the CNCF here, which is such a nice way that we've seen that
kind of fledgling project emerge into something else. It's not only yours anymore.
Yeah, of course. Sorry, it's an open source project that we also build and operate as part of
EKS Auto Mode. So, as a result, what this means is that customers can have their workloads
specify the kinds of infrastructure they need. Their compute requirements, essentially.
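In Kubernetes terms, "workloads specifying the kinds of infrastructure they need" boils down to standard resource requests and scheduling constraints on the pod spec. The sketch below is purely illustrative: the aggregation helper is hypothetical, not Auto Mode's or Karpenter's actual code, but it shows the signals a provisioner reads when choosing an instance type.

```python
# Hypothetical sketch of how a workload "specifies" its infrastructure needs.
# The pod spec uses standard Kubernetes fields; required_capacity() is an
# illustration of the aggregation a provisioner performs, not real product code.

def required_capacity(pod_spec):
    """Sum CPU (millicores) and memory (MiB) requests across containers,
    the core signal used to pick a right-sized instance type."""
    cpu_m, mem_mi = 0, 0
    for c in pod_spec["containers"]:
        req = c.get("resources", {}).get("requests", {})
        cpu = req.get("cpu", "0")
        cpu_m += int(cpu[:-1]) if cpu.endswith("m") else int(float(cpu) * 1000)
        mem = req.get("memory", "0Mi")
        # Handles only Mi and Gi quantities for brevity.
        mem_mi += int(mem[:-2]) if mem.endswith("Mi") else int(mem[:-2]) * 1024
    return {"cpu_m": cpu_m, "memory_mi": mem_mi}

pod_spec = {
    "containers": [
        {"name": "web", "resources": {"requests": {"cpu": "500m", "memory": "512Mi"}}},
        {"name": "sidecar", "resources": {"requests": {"cpu": "0.25", "memory": "1Gi"}}},
    ],
    # Standard scheduling constraints a provisioner also honors:
    "nodeSelector": {"kubernetes.io/arch": "arm64"},
}

print(required_capacity(pod_spec))  # {'cpu_m': 750, 'memory_mi': 1536}
```

A provisioner such as Karpenter aggregates these requests across pending pods, then bin-packs them onto the cheapest instance types that satisfy the constraints.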
And behind the scenes, Auto Mode will go and look for the optimal and most cost-effective
infrastructure to meet those requirements. So when I say you don't have to think about capacity
planning anymore, it means that we'll let a system like EKS Auto Mode figure out what the workloads
need and then deliver that infrastructure at the right time to allow them to perform their job
as they need to. And there's so much misconfiguration in cloud. And not just, I mean, that's probably
the preserve of the cybersecurity companies; they're in there talking about where things break,
often down to simple misconfigurations. But there are so many other reasons that you can have instances
corrupt, authentication issues. Or just inconsistencies across clusters in general. Like,
one of the things that we see a lot is that it's not that a customer is running one singular
Kubernetes cluster. They have a fleet of clusters, maybe hundreds. For all kinds of different
reasons, they maybe want to segment things by the way their organization is structured, or they want
to keep things separate for technical reasons. But suffice it to say that keeping a fleet of clusters
consistent and operating them with a level hand is really a challenging task. And so
this is one of the things I think we're also able to help customers with: by having all of
these best practices built into the product in Auto Mode, and this includes security, for what
it's worth, you don't have to try to achieve that kind of
consistency through effort; you get it by default through the system. And this is just one
of those things that, you know, in aggregate adds up to less and less work, less and less sort of
maintenance effort that teams need to take on themselves that they get just as part of the offer.
And, I know we were talking about trying to codify commonalities: are they common
across industries? Are there patterns, or is that too broad a question? I mean, certainly there are
unique things, industry to industry. There are, you know, even kinds of uniqueness within individual
organizations, particularly the larger ones. And, you know, one of the
balances that we tried to strike with Auto Mode, and I think EC2 Managed
Instances are part of this story, is letting customers have the abstraction that takes away this
kind of effort that is not valuable for them to be doing and spending, you know,
valuable engineering time on, while also giving them the configurability and customization they need
to meet their use cases' requirements. And it seems to be a key feature. I'm sure I'm going to
say it wrong. Is it a 21-day maximum node runtime? Yeah, that's right. And so it seems to make a lot of
sense for hygiene, system hygiene. That's right. And are we coming back to configuration drift and
all of the reasons that you want to enforce that? That's exactly right. I mean, certainly one
of the benefits that the 21-day maximum lifetime of Auto Mode instances brings
is that you can effectively rest assured that within 21 days, all the instances in your cluster
will have been updated with whatever the latest, you know, Amazon machine image or configuration you
have. What if nothing needs to be changed? Well, there's always something going on, you know,
whether that's a CVE that gets patched or just the latest performance improvements,
all the way down through to, like, the Linux kernel. And so typically with Auto Mode, we see that
about once a week there's a good opportunity for us to release a new version of the Amazon machine
image that powers all of the Auto Mode instances. And, you know, to the point of not letting the
abstraction get in the way of customers' real-world use cases, there are all kinds of ways that they
can configure how those new instances, those new images, are rolled out across their cluster.
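The node lifetime and rollout behavior described here maps onto configuration of the sort Karpenter exposes. The fragment below is a shape sketch only: the field names follow Karpenter's NodePool API, the values are assumptions for illustration, and Auto Mode's exact defaults and supported fields may differ.

```python
# Illustrative sketch only: a Karpenter-style NodePool fragment showing the
# kinds of knobs discussed above -- a maximum node lifetime, after which nodes
# are recycled onto the latest machine image, and a disruption budget that
# limits how many nodes may be replaced at once during a rollout.
# Verify field names against your Karpenter / EKS Auto Mode version.

MAX_NODE_LIFETIME_DAYS = 21  # Auto Mode's stated ceiling

node_pool = {
    "apiVersion": "karpenter.sh/v1",
    "kind": "NodePool",
    "metadata": {"name": "general-purpose"},
    "spec": {
        "template": {
            "spec": {
                # Nodes live at most this long, expressed in hours.
                "expireAfter": f"{MAX_NODE_LIFETIME_DAYS * 24}h",
            }
        },
        "disruption": {
            # Replace at most 10% of nodes at a time during rollouts.
            "budgets": [{"nodes": "10%"}],
        },
    },
}

print(node_pool["spec"]["template"]["spec"]["expireAfter"])  # 504h
```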
Right. Yeah. So, over the last almost 18 months of its existence, have there been
significant, I mean, there must be tiers where you've seen the whole service graduate and step
up. Have there been significant moments? I think, you know, with the kind of beginning of any new
product, you certainly see adoption come in fits and spurts. We're really excited about,
you know, the amount of use that we've seen customers bring to Auto Mode. I think, you know,
the thing that we've found really encourages those kinds of, you know, big
upticks in adoption is often that we've delivered some critical functionality that, you know,
for whatever reason we didn't have the feedback on, or we just had to be, you know, diligent about what
the launch scope was going to be. When we delivered those kinds of features, for example,
a recent one: one of the things that we knew customers would need is the ability to not only
trust us that we were going to be, like, running things as effectively as possible,
but to be able to verify that. And so just recently we launched the ability for customers to get
logs from all of the managed components that we're running behind the scenes on our side of the
fence. A really popular ask from customers that, you know, obviously then results in, you know,
a big surge of adoption now that that sort of gap has been closed.
And is that for them to then take and analyze for vulnerabilities, or mainly for troubleshooting?
So, customers are so used, especially in Kubernetes, with as transparent
an ecosystem as it is, you can go dig in as deeply as you'd want, right?
Customers are used to being pretty hands-on to understand why things maybe aren't
working the way they'd expect, whether there's a misconfiguration or, you know, some
kind of update they could make that would make things work better.
And when we took on the responsibility for running a lot of that software,
you know, they wanted to see whether they could still have the same kind of visibility that they
used to have. And so that was feedback that we heard pretty early on. It was actually something
we wanted to include when we launched the service, but, you know, we just couldn't get time for it.
It's called Auto Mode. You know, you're almost imagining clicking a button.
How long does it take to get up and running, for that matter? Well, I think one of the things
that is most exciting about when we launched Auto Mode is that we also took a pretty hard look
at what it took to get an EKS cluster up and running.
Right. And so as we were bringing Auto Mode into the final stages of development,
we decided it'd be really great if we could use this as a way
to simplify what it took to get an EKS cluster. And so with Auto Mode, whether that's through the EKS APIs
or through the AWS Management Console, you can get started with an Auto Mode cluster with effectively
a single click: a single-click, production-ready cluster up and running, you know, ready to integrate
with all of the various Kubernetes ecosystem tooling that you use in all
environments where you run Kubernetes. And we think that's one of the things that
helps remove some of that complexity, or at least the burden of getting started.
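The "single click" also exists as a single API call. The sketch below assembles a CreateCluster request with the Auto Mode capabilities switched on; the parameter shapes reflect the EKS API as best understood here (verify against current AWS documentation before use), and all names and ARNs are placeholders.

```python
# Sketch of the "single call" idea: an EKS CreateCluster request with the
# Auto Mode capabilities enabled. Field names are believed to match the EKS
# API but should be verified; the ARNs, subnet IDs, and names are placeholders.

request = {
    "name": "demo-auto-cluster",
    "roleArn": "arn:aws:iam::111122223333:role/eks-cluster-role",  # placeholder
    "resourcesVpcConfig": {
        "subnetIds": ["subnet-aaa", "subnet-bbb"],  # placeholders
    },
    # The Auto Mode switches: managed compute, block storage, and load
    # balancing, all operated on the AWS side of the fence.
    "computeConfig": {
        "enabled": True,
        "nodePools": ["general-purpose", "system"],
        "nodeRoleArn": "arn:aws:iam::111122223333:role/eks-node-role",  # placeholder
    },
    "storageConfig": {"blockStorage": {"enabled": True}},
    "kubernetesNetworkConfig": {"elasticLoadBalancing": {"enabled": True}},
}

# With boto3, this would be submitted as:
#   boto3.client("eks").create_cluster(**request)
print(request["computeConfig"]["enabled"])  # True
```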
But when you provide this type of service or product or technology, you know,
a naysayer might say, that's another level of abstraction; you know, my BMW is over-engineered
(I don't drive a BMW), you're not allowing me to look into the engine room enough. And
as a hyperscaler, as much as we love you guys, you're providing an encapsulated service
at so many levels that you're going to get engineers
who will want to see a few more of the guts and gears working. How do you
respond to that? So, we've been trying to create a really intentional balance between the amount of
configurability and the amount of hands-offness that Auto Mode has. So one of the
other ways that you can get started using Auto Mode is to enable it in an existing cluster and
decide which workloads in that cluster are a good fit, or are ready to be migrated over,
ready to be handed off to us to operate, so that you can gradually adopt this new
model for infrastructure and Kubernetes clusters. And are you purely
talking to developers, or do you find you're reaching systems and
general operations people? Yeah, platform engineers are primarily who we think will get the most
benefit out of this. These are platform engineers who weren't even a label until about two years ago.
I know. And they were, you know, DevOps engineers before, perhaps. Yeah. The reality is that
these are folks who are deeply devoted to building golden paths at their companies,
helping their companies increase the velocity with which they're able to deploy
applications to their end users. And the ratio of that kind of work that they want to
be doing to this operational, day-to-day maintenance
of Kubernetes clusters isn't right, in my view. And so what is the actual user feedback?
"Great, you have automated something," or, you know, almost, "why weren't we doing
this two years ago?" Has it been really welcomed in practical use cases? Yeah, absolutely.
I think that, you know, one of the things that's been really exciting is to see the
variety of ways that folks have been using Auto Mode and achieving not only its
operational benefits but also its cost benefits. You know, I think the easiest way to
get someone to take a really hard look at an offering like this is to be able to say,
concretely, there are true cost benefits to using this. And one of the things that we do
see with customers using Auto Mode is that they're able to achieve pretty meaningful
reductions in their infrastructure costs. You speak to the actual users. There's, in fact,
a company called StormForge, who is notably itself a Kubernetes cost optimization company,
who was able to achieve 30% infrastructure cost savings with Auto Mode.
Right. So it's a kind of pays-for-itself message.
Correct, in multiple ways. Yeah. And so, I know we're not here to sort out
the forward progression of the roadmap, but
do you see this one day becoming subsumed as a utility service
into the wider EKS offering? Or is that being too
suggestive? Our vision for Auto Mode is that it becomes the way that the vast majority of EKS
customers use Kubernetes: they're able to get away from having to do all of this
infrastructure maintenance and management and delegate that to us at AWS. And only for the
most specialized and unique use cases will they have to drop down a layer in the stack
and dig into, you know, that kind of level of infrastructure configuration and management.
Of course, to do that means that we'll have to continue to build in the same
ways that we always have for Auto Mode to date, you know, continuing to strike that balance
between configurability and abstraction, looking for ways to make it an even better fit for the kinds
of emerging use cases that we're seeing, particularly in AI/ML. Yeah. And bringing that same sort of
application-centricity or orientation to infrastructure, so that customers can think about
the applications that they want to run, not how they're going to run them on the infrastructure
that's available through Kubernetes. It's almost like we should classify use cases,
like we should have a nomenclature for them. You've got an embryonic, you know, edge use case
(not as in edge computing) that's potentially very dynamic in terms of
the way it executes. So, you know, your provisioning should be at the most abstracted layers possible,
because you just don't know what's going on. But customer B, you know, you sell
shoes in a big city, so you can pretty much tell what you're doing, or, you know, you sell
tickets for a concert or something. There must be almost a spectrum of use cases.
That's right. I think we don't talk about that much, do we? No, and to be honest, you can see all
kinds of companies and all kinds of use cases on Kubernetes. That's one of the great things about
it as a platform technology. And I think, you know, in order to meet all of our customers' needs,
we're going to have to have this kind of spectrum of offerings that lets them sort of place
themselves according to how much they want to be involved and how much they want to be able to hand off
to someone like AWS. Great. Okay. Listen, thanks so much. Yeah. Thanks for giving us the insight,
and thank you for being sort of, you know, colorful and illustrative with your definitions.
So I'm going to wrap up and say, thanks very much. This is us at KubeCon + CloudNativeCon
Europe, Amsterdam. I'm Adrian Bridgewater with The New Stack. And if you want to see more of
this kind of content, please go to thenewstack.io.
The New Stack Podcast
