
HN823: Defining A Modern Network Service

The Everything Feed - All Packet Pushers Pods·Apr 17, 2026·51:10

About this Episode

On today’s episode Ethan is joined by Mark Prosser, a self-described Network Operator Advocate and Network Automation Dreamer, to embark on a thought exercise about network services. Together they grapple with questions such as: What is a network service, exactly? How is it defined? Is it even possible to define it when considered in the...

Hosts & Guests

Packet Pushers

Host

Transcript

Today's episode is sponsored by Meter, delivering a complete network-as-a-service offering: wired,

wireless, and cellular in a unified solution. Find out more at meter.com slash heavy networking.

Welcome to Heavy Networking. The voice in your head is mine, and I am Ethan Banks, sharing deep

packet thoughts with you since 2010. My usual co-host Drew Conry-Murray is not here today,

but he will be back very soon indeed. You can follow us on LinkedIn to help build our fragile

egos. Now, Drew is not here because we are in San Francisco for NANOG 96 here in early

2026 and I got a chance to meet up with some people in person and I'll introduce our guest today

in just a moment, but did you know that you can watch this show on YouTube now? You can, and on Spotify.

Yeah, if you would like to watch our talking heads, instead of just listening to us, you can.

And I apologize in advance, I do wish we were better looking. Joining me today is Mark Prosser.

Mark self-describes as a network operator advocate and a network automation dreamer, employed

by the nice people at Nokia these days, although Mark's not here to talk Nokia stuff specifically.

Instead, Mark and I are embarking on a thought exercise about network services.

What is a network service exactly? How do we define it? And if we can define it,

when considered in the context of a larger network, does that change what a network service is?

If Mark and I come to some sort of a framework, what's that mean for a network operations

and network automation practice? Hey, Mark, you've been on the Packet Pushers network

a number of times, so welcome back, and for the folks that don't know you,

tell them who you are and what you're doing and about TORNOG. It's on your shirt there.

Very happy to talk about TORNOG. I'm one of the founders of TORNOG, and it's the Toronto

Network Operators Group, and I'm really an advocate, not just for operators, but for the idea that Canada needs

NOGs, in fact, coast to coast to coast. I'm surprised you don't already have NOGs. There's only four

people in Canada total, right? I know all four of them and so we can, you know, one in Vancouver,

one in Nunavut and one in East Coast, like up north of Boston and then here in Toronto at the

end of the universe, right? So all four of us are going to get together and make it happen.

Yeah. Toronto is actually a big tech city, and so I imagine, from what I've heard about TORNOG, and

you started it, what, about a year ago? We're almost a year to the day. Yeah. Yeah. We're at the point.

Six right now. Yeah. So, but it's taken off is basically what I'm hearing, and you've had really good

attendance and good sponsor support and so on. Yeah. Absolutely. So like when I when I put it out there

and that was the real difference, I just made the call saying, hey, everybody, I'm going to do this.

And what I expected at the beginning was I would have operators come, maybe five people, you know,

four of the five coming in from all of Canada to come to my apartment, and we have two

pizza boxes and we say, yeah, let's go ahead and do this. Instead, unfortunately, we broke the

fire code in my building very quickly. We had to find a new venue. So really, and a lot of people

came to me and said, like, I was looking to do this as well. And somebody inspired me. I think

it was Vincent Lindrow from CHI-NOG. He said, the hardest part is sending that invite out, a call to

action for everyone to come. So now we've had two meetups under our belt and we're moving to our

first full day event, April 13th, 2026. We have quite a bit of capacity for anyone who wants to come

from near and far. And we're going to be talking about regional NOGs. Okay. So if you're doing a

full day event, something has changed, because now that means you've got enough interest,

enough momentum and enough content and enough commitment from people who think they're willing to

take a day out of work to spend the day at TORNOG. Absolutely. People are saying that, like, they wish

they had more commitment from us to make that platform available to them. Because doing a meetup

in the evening, you know, there are a lot of other things in life. But if you can allot a day,

an actual work day, to come in and talk about the content or get that hallway track, people

are really interested in that. As well, when we came here to NANOG 96,

my colleague James Henderson and I, we've been realizing we're kind of the belle of the ball here.

And people were saying we always wanted this to happen in Canada. We're just waiting for someone to

do it. Now, to be said, we're not the only NOG in Canada. There's also Montreal, to be clear. Okay.

Yeah. Yeah. Okay. Well, well done. Good for you. I'm glad you got a bunch of people in Toronto

that are interested in that and then a bunch of people to draw from. So I am director of the New Hampshire

NUG, or director, I don't know what my title is. Anyway, organizer. I think that's the word.

And the struggle is that's a more rural community. It's not a big city. There's not

hundreds of thousands of people to potentially draw from and lots of businesses and so on.

And so it's been a struggle to get that going and to get the communications out there. There's

lots of network operators and lots of interesting networks even in a rural community. But just to

find the people and bring them together and all that has been a bit of a challenge.

Yeah. Yeah. I mean, like Toronto is like pretty much like the biggest city in Canada. We have

we call it... so we're a big city, but we are small-town in terms of our community. We all kind of

know each other. We work together many times. But I'm actually from really close to New Hampshire,

in New Brunswick, a province that is right above Maine. So the idea, the theme of TORNOG

is about, like, you know, considerations of regional NOGs, like things about pulling fiber in

places that don't look exactly like San Francisco in the Bay area. So I think like,

if somebody wanted to do a New Brunswick NOG or Nova Scotia, coast to coast to coast, or up in the

Arctic, we're talking about like some of the concerns with, you know, the current headlines.

If you are looking at the Arctic right now, so I'm willing to be someone's majordomo if they

want to reach out to me and get some support on that. And but we do have people coming from

the US as well, Cleveland. So you never know, if somebody did do a New Brunswick NOG, maybe even

the New Hampshire people can come up a little bit. All right, Mark. So let's get into our

discussion here about about what a network service is. So you brought this topic up to me.

You Slacked me and said, hey, let's record at NANOG about this. So what's on your mind?

Well, I noticed that it's something that we kind of once upon a time, everyone just kind of

assumed that they knew what a service was. Like, you know, you had maybe a Cisco certification.

You say, this is how I configure your SDP. This is how I get a service. Like, you know,

and how do you like connect a VLAN from this port to that port? Or I get my BGP peering,

I give someone internet access. Things were kind of very well understood 10, 15 years ago.

Well, from an engineering perspective, sure, in that we know technically the thing we need

to provision to deliver whatever the result is. Oh, they're looking for VPN access. I gotta do

this. I gotta set this. I gotta tweak this policy. I gotta update this access list. Boom. We're

there. And that's that would be a service in our minds. Is that what you're getting at? Yeah.

And I would find that the thing that has really changed over the past 15 years is now, for example,

in the Network Automation Forum, we're talking about delivering services everywhere, all at once.

And people are talking about different things as to what a service is. For some people,

they're saying, I'm just pushing configuration to an endpoint of the network. And I consider

that deployed. But in my perspective, a real service is end-to-end. It's many layers of

abstraction. And if you build a road, the cars have a lot of congestion. Is the service really

servicing the applications or the business outcomes? That's what I'm seeing. People aren't really

thinking about that. They deploy it. You know, wash their hands of it and walk away. So I'm thinking

more, we need to come back to some of those fundamentals and think about what the impact of the service is

at any time of the day, across the many layers of abstraction. And as optical networks, coherent optics,

get more complex, you know, what a service is, with firewalls now more complex too, is really

changing. And you go to different parallels or different verticals within our industry and

how we define services is very different. A hyperscaler coming to the conference, talking about

the way they do things is very different than an edge DC out in rural Canada.

So we need to define the service because... and you brought up an interesting context here,

so we're talking about the context of automation. Okay. So let's take a step back and go back

15 years again. If we're provisioning a service, there would be a lot of little services being

provisioned online. I don't want to call them microservices. That's going to complicate things.

But mini services, perhaps, or maybe a better analogy is Lego bricks that make up a larger

bit of infrastructure. There's this provisioning of a VLAN and there's tweaking, I don't know,

spanning tree. And then there's maybe a tunnel that's got to be stood up and some encryption.

All of those things are little mini services along the way that have to get stood up to deliver

a bigger service. Are you making distinctions between that? Do you only care about the beginning-to-end

services as far as this conversation goes? No, I think that even for small organizations,

the game has changed a lot. I think the biggest difference from way back then is we kind

of took like in the data center, we kind of said that we have our border, we have distribution,

we have our access, we set the core fundamentals, and then we're just adding compartments on,

we're adding a customer on, we're taking a customer off, maybe we were like cycling one rack

versus another rack. But now when you're talking about an organization like maybe a retail organization,

they have SD-WAN to consider, they have like, you know, layer seven firewalls, or NGFWs,

like services can really change based on the requirements. And now VXLAN is a thing. We were,

I noticed on Packet Pushers for quite a while, you were talking about overlays, and now we've achieved

them. Like they're almost taken for granted. Yeah, I mean, we start with like cloud connects and all

these things that are happening. And how something gets from a cloud resource to you know, a private cloud

in an organization to the branch is maybe not really understood because the documentation that we had

couldn't keep up. And when you bring up overlays, part of the implication there is that overlay could

be running over the top, across infrastructure we don't own. So how do we define services

in that context? Because we can't guarantee the service delivery of what's going on under

that over-the-top service that we're providing. Yeah, I mean, you think of things like

NSX, which might run over your own EVPN, which runs over someone else's EVPN kind of scenario.

So it could be many layers of overlays and that they, all these layers can be abstractions

that leak upwards. And so we don't think about those elements, what could happen

until it does happen? So when we talk about automation, how do we validate these many layers, right?
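
A minimal sketch of the layered validation question raised here, assuming each layer of an overlay stack can be reduced to a pass/fail check. The `Layer` class, the `lowest_failing_layer` function, and the layer names are hypothetical and do not come from any tool mentioned in the episode; real checks would query devices or probes rather than return fixed values.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Layer:
    name: str                  # illustrative label for one layer of the stack
    check: Callable[[], bool]  # returns True when the layer looks healthy


def lowest_failing_layer(stack: List[Layer]) -> Optional[str]:
    """Walk the stack from underlay to top overlay.

    Returns the name of the first (lowest) layer whose check fails,
    or None if every layer passes. Failures above that layer are
    treated as symptoms of the leak from below.
    """
    for layer in stack:
        if not layer.check():
            return layer.name
    return None


# Hypothetical stack, ordered bottom-up. The lambdas stand in for real probes.
stack = [
    Layer("someone else's EVPN", lambda: True),
    Layer("our EVPN", lambda: False),
    Layer("host overlay", lambda: False),
]

print(lowest_failing_layer(stack))
```

Ordering the checks bottom-up is a design choice for this sketch: when a lower layer fails, the layers above it usually fail too, so reporting the lowest failure points at the likeliest place to start looking.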

So many great tools are coming out, but I find we're just starting to consider those elements as

we look at the operation side of automation. Okay, so what do you think a network service

should be? You've brought up that things are changing. And maybe we're not taking,

taking them into account properly. So can you, can you give me an example of what you're thinking

of as a network service that has got more complexity to it now, that we should be considering as we automate

it? Yeah, I mean, like I think you shouldn't really look at your whole network as just one

whole, like boil the ocean: these are all the vendors and speeds and feeds and protocols I have.

I think what you really have to do is look at a service and kind of walk the assembly line of

that service, see all the things as to how does this portion of the communication get to its

end zone? And is it point to point? Is it point to multi point? You kind of have to understand

that because sometimes in as they're kind of alluding to in the compute, it could go across multiple

teams, and do they really collaborate? So defining a service is saying like, if I could create that

high level diagram and that low level diagram, what is really the technical debt that's within

this particular service as I life cycle it, rather than my data center fabric, which maybe has

many services on top of it. But I, so as a network engineer, I would know all of that instinctively

in my head, there's a switching layer, there's a routing layer, it may or may not go through a

middle box like a firewall or a load balancer. And I could tell you every step along the way of what

was happening to that packet of data as it left the client, got to the server, and came back. I might

have even known going up the stack, what was happening once it hit the server and then what happened

to it before it, the response came and it found its way back. How granular are you talking about

wanting to get? Well, I mean, it's either a collective knowledge among multiple teams or if

depending on the size of your organization, it'd be the one team. But what I'm finding when I go around

and talk to operators, they really are fine-grained on their particular CLI commands, their

particular single pane of glass, which usually remains many panes of glass. And they're kind of

worried about their one segment of the network. There are many architects that say I'm aware of the

entire shell, the entire warehouse, the really the materials moving from end to end. But not everyone

is quite like that. So for example, like when you say a service, if there was a problem and you're

looking for, say, port errors at the edge of the network and there are no port errors, but there are

definitely errors occurring, does everybody know that, or does everybody know who to contact?

You're arguing network segments are that siloed? I think so in the modern era, definitely they are.

I mean, like as people also outsource a lot more of those things to hyperscalers to carriers,

right? People are more connecting things up and outsourcing that portion. But I can't, I mean,

I don't own that network. So I mean, I'm not going to be able to tell you about port errors

that are happening in an Azure cloud that I'm housing a service in. And I know you're not arguing

for that. So are you saying I'm using, I don't know, ThousandEyes or someone like that who can

make inferences about the infrastructure I don't own? Yeah, there are many, like, observability

options out there that could help with those portions, as long as you consider that part of the

end-to-end of the service. There are many ways to skin that cat. Well, again, just to take

the vagueness away. You're arguing we should care about those things. Absolutely. Yeah. Like the

people, the carriers and the hyperscalers, won't really have those granular open APIs

available to us, nor will they reveal those secrets, like on this particular port we

reconverged at this time, unless you maybe have enough power to ask for that. But there are ways to say

that if I couldn't transmit this packet across that service at this time, then yes, this service

was impacted. And I feel like when things were a lot simpler, we had fewer outsourced, you know,

transit paths. It was easier to determine that. But now it's getting harder, right? The service

really is a collection of sub-services that is greater than the sum of its parts, right?
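
As a rough illustration of a service as a collection of sub-services, here is a small sketch that assumes each segment has an owner and a single reachability result from some probe. The `SubService` class, the `service_status` function, and all names and owners are made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubService:
    name: str
    owner: str       # team or provider to contact (hypothetical values below)
    reachable: bool  # outcome of a probe across this segment


def service_status(segments: List[SubService]) -> Dict[str, object]:
    """The end-to-end service is impacted if any one segment failed its probe."""
    failed = [s for s in segments if not s.reachable]
    return {
        "impacted": bool(failed),
        "contact": [f"{s.name}: {s.owner}" for s in failed],
    }


# Hypothetical data center interconnect built from three sub-services.
dci = [
    SubService("optical transport", "carrier", True),
    SubService("IP DCI", "WAN team", False),
    SubService("colo fabric", "colocation provider", True),
]

status = service_status(dci)
print(status)
```

Keeping the owner next to each segment is one way to answer the earlier "does everybody know who to contact?" question in the same record that says the service was impacted.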

And in the case of the VPN, it's, can someone connect in? In the case of, you know, connecting all

my data centers together, it's the optical. It's maybe the IP DCI portions. It's maybe like the

people that are giving me this data center fabric in a colocation. So these things all come

together and we have to be accountable for them, even though the rate of change is faster than ever

in the AI era. In the AI era, yeah, you had to say it. Our sponsor Meter is a network-as-a-service

company. And that means a Meter network is a tightly integrated stack of network hardware and

operating system software delivering wired, wireless, and cellular service. Now Meter delivers the

gear to your premises as a service, right? And what that means is the tedious parts of network

operations are handled by Meter for you. So you don't have to deal with NOS upgrades. When gear

hits end of life, they're going to send you shiny new gear, all of that, and they monitor your

network for you remotely. Meter can handle sites as small as a branch office, as large as a

campus or data center. Meter doesn't just ship gear to you though. They handle other aspects of

bringing a site online. They can deal with internet circuit procurement and site surveys and cabling

all of that stuff and more is part of what Meter offers. The network stack that they're delivering

gives you a full, full function, all the things, security, routing, firewalling, switching,

wireless, cellular, power, man, that is intelligent power, DNS security, VPN, SD-WAN, and even multi-site

workflows. In fall 2025, I did a video on the Packet Pushers YouTube channel where I worked with a Meter

stack to show you some of those features. And I'm going to do an updated video in 2026.

I was at their headquarters recently, and I got a sneak peek at their newest hardware and a preview

of the new software features. There's a lot of stuff coming. These guys are iterating fast. And

I would put it this way: there's a vibe. It's been a while since I visited a startup where

I had that genuine thrill of enthusiasm from everybody that I was talking to there, but I got

that at Meter 100%. The Meter team is excited about what they are building. So thanks to

Meter for sponsoring, and go to meter.com slash heavy networking to book a demo. That's

m-e-t-e-r.com slash heavy networking to book a demo now.

To bring those services all together, that means there's a bunch of problems here. The least

of which is technical. I think that there are organizational problems here. That is, if I'm

very focused on the edge network that does the thing, or one specific data center, and

when it hits the fiber that leaves the data center I don't care anymore, you have to cross

organizational boundaries to be able to bring all of those services and have them be managed

under a common management plane so that we can all see what the other one is doing. Typically,

if the responsibilities are siloed, probably the monitoring and management are siloed as well.

Absolutely. You agree? All right. I absolutely agree.

Okay. So how do we get by that in your mind? That's a great problem.

You're supposed to have the answer. No, no. I'm posing the questions. The organization can

ask themselves this. So there's this concept out there called socio-technical systems and you

kind of pushed us towards that by asking that golden question. Yes, you did. So the concept of

a socio-technical system essentially is this concept where there's a balance between the socio,

the humans and all of our faults and delicacies in life, and the technical parts. These

are the parts that we really love, like the nerd knobs. I could build this on top of this and

on top of this, but then one other organization or part of the organization does the other thing

and these two organizations have to communicate or collaborate. That's really the hard stuff.

And so this concept of socio-technical systems, which actually comes from the 1950s,

British coal mining, is saying that we need to see that between the humans and the machines

there's a joint optimization. The human stuff is usually the tricky stuff. These pesky humans in

the loop, in the machine, along the assembly line, sometimes don't want to be so collaborative

and consider how can we work together to deliver this service for the business. The book,

the 1984 and 1986 book, The Goal, also talks about this, and The Phoenix Project is basically

a modern version of that, right? So this is the million dollar question is when we talk about

building like the controller, the automation platform, build up platform, which I did a talk

about at AutoCon 4. People don't realize that it's hard to determine what are all of the elements

and who has access to the database? Can I get access to the database and know how it gets from here

to here within the data center? That's really hard to do. And until you start asking those questions,

you don't know what it takes to get those people on board. It used to be a lot easier to do,

just to speak to your point. 20 years ago, it was possible as a network engineer,

generally speaking, to have the entirety of the network in your head. Even if it was a sprawling

one, you kind of could do it, and it doesn't feel like you can anymore. Well, I mean, like all my

mentors, they were network engineers, but they were also kind of sysadmins, and all of them

kind of understood Perl and they all worked together, right? So maybe you're a DBA and

network engineer and a sysadmin, but maybe you're called a network engineer at the time. And the

barriers between those roles weren't the same as now, because now you're talking to someone

talking about Kubernetes. And you're really on like the WAN routing side of network engineering.

You don't even speak the same language, right? I locked myself in a hotel for three days to learn

Kubernetes because I couldn't, I just wasn't picking it up. I had to just devote hours and hours

and hours of time and isolation just to get a handle on it. It wasn't translating. So yeah,

your point is well taken. It's harder to pick up some of these skills. Whereas, like, you know,

I mean, somebody like a sysadmin back then could probably pick up like, you know,

Cisco IOS language pretty quickly and have an idea of how they could automate it with Perl.

Yeah. Yeah. Maybe no one can read each other's Perl, but they still all wrote Perl. Yeah. Yeah.

Yeah. The knowledge domains have become more complex. They've built one upon another all the legacy

stuff that we had to know. We still have to know. And now there's been all the years of iteration,

corner cases, and new technologies that have been brought to bear to deal with those corner cases

and how they're applicable. And there's a lot, there's a lot more to know. It really is a more

challenging job to be in IT than it was because of technology building upon technology.

I would have hoped we'd gotten over the siloing thing though. It just feels like we've been talking

about technology silos and how a lot of organizations are aligned upon technology silos for forever.

And the winning organizations to me have always been the ones where we're not drawing sharp lines,

like you're the security people and you're the network people and you're the this one,

you're the that. And we don't talk. You know, we it's been most successful in my experience

when those groups work together and your organization is built more along service delivery and all the

people that have a role to play in delivering that service are speaking together regularly and doing

whiteboard sessions together and figuring out how to best align their knowledge domain specialties

and the technologies that they're bringing to bear with one another. But it doesn't seem to work

that way. Yeah, I mean, sometimes you'd think it would seem simple. Actually,

let me take a step back. You touched on it. Very interesting is that like when humans, we're all

responsible for the same business outcomes if we work for a business, right? So we should be working

together and see ourselves as part of that process when it comes to like employee surveys coming out

in organizations, they ask, like, do you see yourself as part of the outcomes of the business? If you're

answering no, that's probably a problem, not just in your organization, but also in the delivery

of you delivering a service. But when you think about people working together and you say like,

oh, I love EVPN. And I love EVPN, too. And then we talk a little bit deeper, like, I'm a VXLAN person.

No, I'm an EVPN-MPLS person. Oh, suddenly the conversation breaks down. We're completely different

sects of the same problem, but delivering it end to end from that DC to another DC, that's where that

stitching really happens. And then someone starts talking about segment routing and the TLVs, you know,

and like, sorry, maybe you lost me, we can go to Geneve in like the VXLAN world, but I'm not

talking about TLVs and I, so yes, right? So we thought we could collaborate, but if we try

a little bit harder, we probably have very similar abstractions in how VXLAN works and how

MPLS works, if we just took the time to understand each other. So even when it seems simple, it's hard.

Well, when you go between like platform engineers or site reliability engineers

or whatever they call themselves these days, to like network architects, like the gap was pretty

wide recently, but I noticed at NANOG and at the Network Automation Forum, it's starting to come back

together somehow, some way, somehow we're going to get there. Maybe that's because of the stuff

that Wim Henderickx is doing with Kubenet. Maybe that's Containerlab. People are starting to get

onto the mindset where we're bringing network back to whatever the infrastructure is today.

This, some of them, I think there's also been a number of talks about you can go and talk

to someone just because they put in a help desk ticket request and they work in a different group

doesn't mean you shouldn't go talk to them. So go talk to the developer who's requested the thing and go,

what are you trying to do? How come? And then building a dialogue with them and building that

rapport translates to a better outcome because they asked for a technical thing thinking they

knew what they needed, but maybe they actually didn't know what they're asking for. There's a different

way, a better way that you can deliver that service. You're not going to figure that out if you just

give them what they asked for. Sometimes you need to just go and talk to them to deliver that

outcome. Go back to your example of the two network engineers who you think this would be a simple

conversation, but because their specialties are different and someone doesn't have experience

with IS-IS and delivering communications and metadata via TLVs, all of a sudden things break

down, you know, you can have those educational discussions within a network engineering team and

then across IT groups to help foster systems level thinking. Because this is another thing

of what you're talking about. We're really talking about systems level thinking across groups, domains,

organizational boundaries that deliver the business outcome that we all commonly are intending to

deliver or should be. Absolutely. I mean, like systems level thinking exactly where we need to go

and us participating and collaborating to what is this system is really the key answer there.

I mean, it shouldn't be as simple as like we both love the CLI, but you love IOS XR and

I love Junos or SR OS. These shouldn't break down. We all agree we love the CLI and we can work

together to say what is the common abstraction to understand the service as it goes across.

With the system level thing, the early, at the outset of this podcast, I did say don't think

of your network as a whole, but yes, do think of it as a system. Yeah, these are very clear

distinctions. Don't think about, like, all my speeds and feeds; rather, like, everything I have is

an opportunity to deliver a service and I'm going to understand that. Yeah, going back to don't

think of your network as a whole. I don't think your argument is you shouldn't think of your

network as a whole as much as you probably can't because of segmentation and silos and so on.

Yeah, you shouldn't just simply say like I've got some of these and I've got some of these

and this is like my network and I have data center and I have you know optical. Don't think of it

that way. Think about like what are the actual services, what are the outcomes that are attached

to this mission critical network that we have. All right, I want to go back to something you said

near the top of the show we were talking about defining a service and you were getting into the

notion of testing and validation as being a critical part of the service and helping to define it.

Dive into that. What do you mean by that? Well, I mean like you think about like you could say at the

outset of I'm going to start validating my network and maybe I'm going to do some automation around

it or maybe it's more manual testing that we'll do. Some people just have it in their their

tribal knowledge within NOCs, but in regards to taking a look at this you say, okay, well it should

be easy to test my network because all problems are understood and all problems are bounded, I heard

someone say at the Network Automation Forum recently. When you go through the actual steps of starting from

like, okay, first of all, I need to look at the light levels. Then I need to look at the actual, like,

maybe the buffers coming into the port. What kind of service is this? Where does this go to? You

actually start realizing there are thousands and thousands and thousands of steps as you go along the

service and just taking the moment to look at that and build a graph as to how we're going to

develop this service is way harder than it seems. But the sooner you get started the easier it's

going to be, because particularly in a modern network service, with overlays and components of

the network that are not yours, there are a lot of layers to it, and then the dependency tree to

deliver the service becomes a lot more complex than say a very simple ethernet delivery that's got

one VLAN, a routed port, and another VLAN, and even that's got a lot of steps to it, honestly. Absolutely

it does. Sometimes we think about it just as like the packet flow, like walking that packet

from end to end, but sometimes it's also about compliance. It's security domains or micro segmentation.

These things come in as well and how can we confirm that at this exact moment we are compliant

with what the service is supposed to do? Are you positing a question or do you have thoughts on

how to answer that question? I've been working on it for a little while with some of the tools

and also the recognition for them. I find that there's some amazing frameworks that make it easier for

us like PiATS, Nuts is another one, and ETS, you hadn't created an episode on that. Yeah,

yeah. So Erzen and Marco talked with us about Nuts at length, yeah. But I don't think that there's

a one-size-fits-all. I think it's really each individual organization needs to look at this

and say, like, if we broke down all these layers under the service, you kind of have to get started

and try to use a tool. NUTS and pyATS can help you pull things out from that. Well,

pyATS also has testing elements as well. I mean, John Capobianco wrote a whole book on it

with his colleague Daniel Wade, I think it was, yeah. The challenge of testing as part of service

delivery, oh man, this is tough, because you don't know what tests you need until your service,

well, until you've passed all your tests but the service still isn't working, and then you go back

and add a test. So there is a cycle that you have to go through, so you're asking the right questions

in your tests to understand the service truly is being delivered. I mean, but we could take a step

back and say you're arguing that you can't call a service a service until it has passed some

number of validation tests. You can't say that the service is working as intended until it's

passed those tests, right? Well, do I have to have the application running across it? I mean,

that's definitely, that's probably the best way. If you start at the high level and say the service

is going end to end, that's great. But if something went wrong, do I have

an idea as to where it's going wrong? See, I, okay, I'm not, I'm not sure how I feel about this.

So I can, I can pave the road, and if I paved the road, job done, do I have to have a car go

across the road to prove that the road is truly there and is able to carry an automobile?

I'm not sure, you know, what I think about that. In theory, I can test and validate the road would

carry the application if it bothered to show up. So do I have to actually get right down to the

application itself running across it? I'm thinking out loud here. I don't actually know how I

feel one way or the other, but do I have to have the app running across the network service that

I provisioned to validate that the service is indeed running? There are probably some

elements where you need to be reasonable about what you can actually do, right? Sometimes some

applications only really work best in production. In the case of a voice call, right? Somebody has to

pick up the phone and make that call, right? So that's a hard one. You can definitely simulate

voice traffic, but it's not the same as this is the time of day when they went all the calls.

The call was made and all the teams. Yeah. Yeah. She talked about, like, egress peer engineering and

some like, you know, like basically traffic engineering elements saying, I'm going to make sure that

if this happens going to Microsoft at this, in an exchange point, I'm ready for that to happen.

But when the, you know, the all hands happens in your organization, they have like a

global company. That's when you really feel it. So yes, maybe it's hard to preemptively test

that one scenario, but doing something is definitely better than nothing. Yeah. There's a bunch of

challenges here because when, when you begin putting application traffic across the link as I'm

again, going through my own thought exercise live on a podcast, one of the things that happens is

unless your tests are very thoughtful, you are not going to correctly simulate application

conditions. And so you will run into things like, Oh, in testing, we never hit MTU. Oh,

turns out as soon as we did something breaks, you know, when we send really large packets through,

you know, for example. Or QoS: you know, we never had a congested situation where our QoS policy

was tested. Didn't think about it. Oops. And then that comes to light when the application

goes across. So yeah, okay. But the thinking there is that my testing must be quite robust

and mature to prove that this network service has indeed been delivered. Now, in the context

of network automation, you're, you would advocate, I assume that when I provision that service that

presumably is, is largely automated. Let's assume, assume I'm at that level of maturity with my

network operations, I would be testing as part of that service provisioning within reason.
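
The MTU gap described above is one of the easier checks to fold into automated turn-up. As a minimal sketch, the helper below sizes a do-not-fragment ping for a given MTU. It assumes IPv4 without options and the Linux iputils `ping` flags (`-M do`, `-s`, `-c`); the function names and the target address (from the documentation range) are hypothetical and not taken from any tool mentioned in this episode.

```python
import shlex

IPV4_HEADER_BYTES = 20  # assumption: no IP options
ICMP_HEADER_BYTES = 8


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in a single IPv4 packet of size `mtu`."""
    overhead = IPV4_HEADER_BYTES + ICMP_HEADER_BYTES
    if mtu < overhead:
        raise ValueError("MTU is too small to carry an IPv4 ICMP echo")
    return mtu - overhead


def mtu_probe_command(target: str, mtu: int, count: int = 3) -> list[str]:
    """Build (but do not run) a Linux ping command with Don't Fragment set."""
    payload = icmp_payload_for_mtu(mtu)
    return ["ping", "-M", "do", "-s", str(payload), "-c", str(count), target]


if __name__ == "__main__":
    # Print the command so it can be reviewed before anything is executed.
    print(shlex.join(mtu_probe_command("192.0.2.10", 1500)))
```

A provisioning workflow could run the resulting command from a host at each end of the service and record pass or fail as structured data rather than relying on someone to read the output.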

Yeah, definitely. I think it's, it's a good thing. Of course, it's not all at once. You don't boil

the ocean. It's an iterative process. Maybe you realize you missed something, like, how can I

reasonably test MTU when the service is turned up? You add that on later, iteratively. And I remember

you did a podcast episode about SNMP versus gNMI. That's a whole

conundrum in itself. It's a tough topic to have that type of observability in the modern

era of platforms. So it is really a hard process. But until we get started, we don't really get

anywhere with this. Yeah, it's funny. It is. Yeah. That's, that's also true. It's not merely running

some sort of a test. It's being able to measure the results of the test without you staring at

a packet capture, for example, but being able to get a structured data sort of answer back

into a system that can validate that the test did or did not pass. Huh, I hadn't considered that,

because you know, back in the day, the way you would run tests is just like, did it work? No,

I wonder what happened. And you start digging around and, you know, looking at different

counters and whatever and maybe looking at a packet capture and figure it out, which is not,

how, which is not a good way to do automated testing. Yeah. Yeah. And you made an interesting

point. We were talking about, like, looking at QoS and MTU. I remember there was

this one talk, and I feel like it was at NANOG once upon a time. There was an Amazon engineer

talking about how they had this problem that kept plaguing them once a week or once a month,

every once in a while. And every time they saw the issue, they're like, well, of course,

we're dropping packets because it's elephant flows. It kept coming back. So they tried to,

you know, test and consider elephant flows, and it eventually came to the point where this one

Amazon engineer said, or, sorry, maybe he was even at Meta, that said this. He says,

if you think it's elephant flows, get out of the room. I need something with a different thought

pattern. And so when he chased down the application developers, he found out it was something that

the application was doing itself that was really ruining the network, because they were trying to

get around the network in a way, but that was breaking it, right? So I don't exactly know what

the root cause was. And maybe we could link that talk if we could find it. But it's just

fascinating. Sometimes the paradigms we have as network engineers actually get in our own way

because we look at it as one simple set of tools. Maybe there's other problems out there,

like, if you think about the wireless space, like the Fresnel zone, and maybe

tides come up into that zone, it causes, like, degradation of the radio signal. We don't think

about that as data center engineers. But that's a different paradigm that people in the radio

space know about. The first time I ran into application-level acknowledgments was troubleshooting

a problem that turned out to be packet loss in an EtherChannel. There were four pairs of

fibers that were bundled together to give us some bandwidth between a couple of cores,

in which any traffic that happened to cross one of the optics that was in that bundle would

experience some packet loss. But the symptom was nothing like that. And the symptom was,

well, occasionally a transaction will come through but not get acknowledged, and the transaction

runs again, and then it hangs up in the queue and just everything falls down, or we've got to

re-queue. This is terrible. What the heck is going on? And it's like, I don't know, you've got an

application problem, buddy. You tell me. Yeah. You know, it was what it felt like. Well, and then come

to find out, talking to the developers: no, we do application-level acknowledgments. We do not

rely on TCP for delivery. When a transaction is received by the remote side, an acknowledgment

is sent back. And if the acknowledgment is not sent back, the way the thing's structured, it

gets hung up, or whatever all the details were. I don't even remember anymore.

Long story short, we tracked that down to: this is a packet loss problem. Sometimes the

ACKs are lost. And then eventually we were able to determine, you know, bad optic in an ECMP pair

that was a very gray failure. It wasn't consistent at all. It drove us nuts

until we got there. How do you write a test to solve for that problem? I don't know. You don't.
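
The intermittent symptom in this story follows from per-flow hashing: each flow is pinned to one bundle member, so a single degraded optic hurts only the flows that happen to hash onto it. The simulation below is a sketch of that effect, not a way to detect the failure. The hash is a stand-in (real devices use vendor-specific hash inputs and algorithms), and the addresses, ports, and member numbering are hypothetical.

```python
import hashlib


def member_for_flow(src_ip, dst_ip, src_port, dst_port, n_members):
    """Pin a flow to one bundle member using a stand-in hash of its addresses and ports."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_members


def flows_on_member(flows, n_members, member):
    """Return only the flows whose hash lands on the given member."""
    return [f for f in flows if member_for_flow(*f, n_members) == member]


# 1000 hypothetical client flows that differ only in source port.
flows = [("10.0.0.5", "10.1.0.9", 40000 + i, 443) for i in range(1000)]

# Pretend member 2 of a four-member bundle has the bad optic.
affected = flows_on_member(flows, n_members=4, member=2)
print(f"{len(affected)} of {len(flows)} flows cross the degraded member")
```

Roughly a quarter of the flows land on the bad member while the rest look healthy, which is why a single synthetic probe with fixed ports can pass indefinitely. Sweeping source ports across probes is one way to exercise every member, with the caveat that the real hash inputs depend on the platform.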

Right. I think the idea of testing as much as you can is to relieve your team from the

mundane stuff, the day to day. It's really to show where they have the value: to look for those

corner cases and collaborate, have time to collaborate with the other teams and look for those unicorn

problems. And service validation is a different animal than troubleshooting, to be fair.

Absolutely. Yeah. Sent me down this, you know, down this way of thinking.

So it's okay. So we've been chatting for about a half an hour now a little longer. How do you

define a service then? Do you have an example in mind? I mean, well, it's exactly that. It's like

once upon a time, I mean, you could probably say that, like, if you did a set amount of BGP commands

or show commands and you maybe put them in a script and ran them all and it looked okay in the end,

the service is great. But today, I think a service is really, the vague way of saying it's great

listening with part, but the real answer is that every component or every portion that this

application goes through needs to be considered, validated, and documented, and the humans that run

this business need to understand it at some level. That's the way I would describe it. Let's

walk through a scenario then. Let's walk it through and and tell me how you think about these

things. So I've got a data center and I am needing to stand up a service that lives in AWS.

Is this a plausible scenario? Does this work? Yeah. Okay. Okay. So let's walk through it.

I'm beginning the transaction happening in my data center. Walk through how you would think

about this process. Yeah. I mean, think about it. It's like, maybe it originates at the

NIC of your computer and it goes into perhaps your local branch router, if there's not a switch in

between. From the router, it could be many things that are getting it across. It is an SD-WAN

connection. Yeah. Right. And then the SD-WAN portion is essentially a modern, complex SD-WAN

or IPsec tunnel that goes over that. There's all the implications there as that packet goes over

which carrier. So, just so far: now I've got to do some kind of validation to get me

from, do I run a test in your mind? Do I run something at the host level? Or am I going to begin at

layer 2, whatever the first network device is that the packet touches within our domain? We're

probably starting at the network devices. We're going to look there saying I know that based on this

moment and this test I can do at this time, it's probably coming into this interface. Some tightly

integrated shops will include the host. Yeah. But by and large, that's probably not realistic for

most of us. That's too separate of a domain. Yeah. You're probably in like high frequency

trading. You're looking at everything all the time and it really matters, every single microsecond or

nanosecond. But in this scenario, let's just assume that we the network team are doing this, right?

What are we testing to get to that network edge so far, where I've gotten

to the SD-WAN router? I've already transited at least a switch and then the router, if we keep it

really, really simple; it was probably more. But let's just say, what would you be testing at that level?

Well, yeah, assuming that maybe there's no, like, campus scenario and all the implications

there. If it's just, like, making a simple puzzle, there's the host, the switch, the router.

And then at the router, the implication of SD-WAN or transit peers, HTTP; it could be, like, which

actual link is it going over to get to the quote-unquote internet, or my carrier, or maybe my own WAN?

Oh, this is interesting, because in this scenario of SD-WAN, that can change.

It could change on a dime, right? Based on the scenario, what type of traffic it is. So we're

going to say what traffic is going across and what link would that take? And how did that link,

well, how can we validate that link is good at that time? What does the path look like from here?
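
The questions above (what traffic is it, which link should it take, and is that link good right now) can be written down as a small check. This is a sketch under stated assumptions: the policy table, thresholds, link names, and health fields are all hypothetical, and a real SD-WAN controller would expose its own policy and telemetry in its own format.

```python
from dataclasses import dataclass


@dataclass
class LinkHealth:
    name: str
    up: bool
    loss_pct: float
    latency_ms: float


# Hypothetical policy: per traffic class, link preference order and health limits.
POLICY = {
    "voice": {"prefer": ["mpls", "inet-a", "inet-b"], "max_loss": 1.0, "max_latency": 150.0},
    "bulk": {"prefer": ["inet-b", "inet-a", "mpls"], "max_loss": 5.0, "max_latency": 400.0},
}


def expected_link(traffic_class, links):
    """Return the first preferred link that is up and within limits, or None."""
    rule = POLICY[traffic_class]
    by_name = {link.name: link for link in links}
    for name in rule["prefer"]:
        link = by_name.get(name)
        if (
            link is not None
            and link.up
            and link.loss_pct <= rule["max_loss"]
            and link.latency_ms <= rule["max_latency"]
        ):
            return name
    return None


def path_matches_policy(traffic_class, observed_link, links):
    """True when the link the traffic actually took is the one policy expects."""
    return observed_link == expected_link(traffic_class, links)
```

Given current measurements for each link and the link a flow was observed on, `path_matches_policy` answers whether the service is following its intended path at this moment, which is a different question from whether every link is up.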

So wait, so wait. There's testing of a service that we provisioned and we need to

demonstrate that the service is viable. We have successfully provisioned the service

once the service is provisioned at the front end of its lifecycle. But now that the service is alive,

you're also talking about ongoing validation. Yeah, it never ends. I mean, yeah, unfortunately,

I mean, like there's there's no sleep for the wicked as network engineers like the network is

24/7. But this is not just red light, green light. You're talking about ongoing testing of the

service for its entire life cycle. Yeah, absolutely. I would say so. I mean, if that's that's

the business in terms of what does that give you in terms of value? But if you wanted to know a lot

of test data to store somewhere, if it's an office and nobody's there at 8 p.m., what are you testing?

Right. But when people are there, it definitely matters. But if perhaps let's say if you

back in the day, we used to have like little closets in our offices and they'd be running maybe

backup jobs over that WAN. That definitely matters at 8 p.m. and onwards. Right. So if that matters to

your business outcome. You're reframing, I think, how a lot of us have thought about testing, in that

as a network engineer, I would be testing. I would basically, I'd be testing network transport.

I wouldn't care what the application is going across it. I am delivering a connectivity fabric

just to speak about things in a generic way. Don't read into it. He don't have fabric or anything

too specific about that. I'm delivering this connectivity across this network and I'm validating

that. All my nodes are up and my circuits are up, and maybe I'm, I'm certainly monitoring bandwidth

utilization and routing convergence events and other sorts of things. But it is abstracted

from the services. It is not married to the service. If I was doing any sort of service-level

monitoring, it was probably application performance monitoring with synthetic transactions,

validating that I can log in, and if I request this page, I get this page back and not a 500 error,

and so on. Not interconnected tests. You're talking about something that's a substantial step

beyond that. That's quite a bit more intense. Well, it depends on the scenario. You

brought up an interesting point. We think about this. I mean, for a long time, when we think about

networks, as we start to understand them, we are very non-discriminatory in terms of

the packets that go across our network or the frames that go across our network. We don't really

care about it so much. But when you start talking about business outcomes and you start saying,

what is this service versus that service, then we start coming back to the things you were talking

about on the NUTS episode. We're talking about, like, there's integration testing, there's end-to-end

testing, and there's synthetic testing, right? The synthetic testing is a lot closer to

the application, the end-to-end portion of it, right? The integration testing is more about: when

this thing comes online in the network, does it converge in the way that I expected, right? So it's,

when you start to ask those questions, it's like, when does this matter and for who at what time?

So I noticed a lot of people in their automation are looking at the automation testing saying,

when I turn it on, it looks good for five seconds after it was turned on and then I just walk away

from it. People aren't really thinking about the day two, the day three, the day four of things.
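
One lightweight way to carry the turn-up check into day two and beyond is to keep the measurements from provisioning as a baseline and compare later samples against it. The sketch below assumes hypothetical latency and loss measurements and arbitrary drift thresholds; how the samples are collected and how often is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    latency_ms: float
    loss_pct: float


def drift_findings(baseline, current, latency_factor=1.5, loss_margin_pct=0.5):
    """List the ways `current` has drifted from the turn-up `baseline`."""
    findings = []
    if current.latency_ms > baseline.latency_ms * latency_factor:
        findings.append(
            f"latency {current.latency_ms} ms exceeds "
            f"{latency_factor}x baseline of {baseline.latency_ms} ms"
        )
    if current.loss_pct > baseline.loss_pct + loss_margin_pct:
        findings.append(
            f"loss {current.loss_pct}% exceeds baseline "
            f"{baseline.loss_pct}% by more than {loss_margin_pct} points"
        )
    return findings


# Hypothetical values: what was measured at turn-up versus a later sample.
turn_up = Sample(latency_ms=20.0, loss_pct=0.0)
day_two = Sample(latency_ms=45.0, loss_pct=2.0)

for finding in drift_findings(turn_up, day_two):
    print(finding)
```

An empty list means the service still looks like it did when it was provisioned; anything else is a prompt to investigate before a user reports it.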

Once upon a time, there was a service called the Internet Weather Forecast or something like that.

I don't know if it's still around anymore. We used to watch it all the time because it would give us,

since we operated a North American continental WAN, and what was happening across the internet

mattered to us because it could impact latency of an operation in a transaction world where

latency mattered. We'd want to know what was going on. So if someone was having a bad day in this AS,

any traffic that transits that AS, it's going to suck for you. You're talking about

that on some level. Yeah, when you turned it on, it was a great network, meaning everything is

configured correctly. That does not mean that everything stays the same. The network is not static in

the demand of traffic that it's carrying, whether or not a circuit is up or down, whether or not

the internet, if it's used as part of your transport link, and it probably is in 2026, is

behaving like it did the day you turned up that service. So you're advocating for a robust set of

tests that I'm doing constantly, on the presumption that the network is constantly changing and I

must know at any given point in time whether or not I am able to deliver the service that I am

for lack of a better word, contracted to deliver. Is that true? I would say so. Yeah, I mean,

like there's definitely, when you start out with this journey and you start asking these questions,

you kind of say like you were alluding to as well. You look at some basic commonalities. I

understand these things in my box. Is it looking good? Is it cooled properly in the closet or in the

data center? These things are obvious commonalities. But in the context of remote work and people

going all over, it's harder and harder to find what's actually happening somewhere else that is

impacting me, seemingly right here in my network. Some companies really care about when a transatlantic

fiber goes down; they feel the impact as maybe another one goes down, another one goes down.

Maybe some of those are more regional or for example, maybe a company that operates entirely in

Ohio. It's more about what their carriers are experiencing and they can work with them collaboratively.

They can't do anything about it, but they can have that relationship and maybe build that knowledge

graph to say, if this sort of happened, who can I communicate with to get that better insight

into it? Yeah, but there are services like this. I mean, your ThousandEyes. That's a great

example. Yeah, ThousandEyes will tell you those kinds of things. So what does that mean? You

bring in that sort of a service to give you the data required to be able to understand that my

service at this time of day is in fact able to deliver. Yeah, absolutely. I mean, you would take

the information, the data there and then apply some human insight to it and then say,

this is my service. If ThousandEyes plus these, like, local homegrown tests together conclude that

everything is A-OK, then this service is ostensibly A-OK. I will argue, I think I will

argue that we don't have enough human capacity to make those inferences that probably this is an AI

problem. Is that reasonable? AI can help us in many ways. Sometimes people are talking about maybe

displacing certain things, but I think AI is maybe something that could level set some of these

problems where we can communicate. We're talking about communication, for example. Communication

is hard between humans. It's way harder than it seems. But AI has been around for a long time

translating for us, in Google Translate or whatever else. We can use these things to say, like,

maybe we have technical debt in terms of documentation. AI can kind of get rid of

that for us. Not to depart too far from the ThousandEyes portion, but, like, I know there's this

concept of vibe coding and such for AI, where you use essentially the AI

agent to write all your code for you. But they've kind of moved on to something different now called

spec-driven development, where you work with the AI to basically build these sets of constitutions

and specs and task lists, and it's documenting the application as it builds it, and then it's

referring back to that. So this is the constitution of how this application will work, and humans can

read this documentation. And when I see something like that, I say, why can't we just write documentation

for humans on the outside because we're too busy usually. So AI takes away those mundane

portions and works with us. The same thing is true when you know I talked a lot on this episode

about mentors that really took the time to work with me and show me to learn the hard way. AI is

making that so much easier of saying that like I'm going to work with the agent to create this

knowledge graph for organization or knowledge graph for the AI to make it easier for front-level

engineers to work on my network to understand that service. Maybe even to speak better between

the application folks and us, because AI maybe can understand when they say, like, CRDs and Kubernetes.

It's going to translate that for me without me going to Stack Overflow and asking a question

and getting destroyed for asking it. The challenge I see with all of this, speaking of being

understaffed and not having the time, is that it's one thing to build the service. It's

another to build and maintain an infrastructure that is capable of ongoing testing with this sort

of rigor that is required to be able to handle and integrate all the bits of data for all the

different network segments and the kind of data that's flowing in from them and be able to come

to some kind of a conclusion about that. This is, dude, this is not easy. So, like, not everybody should

be like SQLite, which is written in C and they have a hundred percent test coverage.

I think that's infeasible but if you could at least start the journey saying can I put in my mental

model what the service is or all the components and relations to the service so that when it breaks

I can manually test these things one by one instead of sitting around for 10 hours saying I'm

really not sure where this is broken or why it's broken. If you start to say like maybe it's an

exercise in documentation to understand the service and that's still better than you know doing

nothing. The problem, though, is if you spend so much time trying to just document and document and document,

sometimes the rate of change in an organization outpaces the documentation. And that's why trying to

write maybe that 50% test coverage helps you, in that it becomes the documentation and you're

iteratively building on that automation, as opposed to writing really long Word documents

or Markdown files or building yet another wiki in your internal organization.
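
The idea of writing down the components of a service and their relations, so they can be checked one by one when something breaks, can be captured in a few lines that double as documentation. The sketch below uses Python's standard `graphlib` module; the component names and the pass or fail results are hypothetical, and a real service would have a far wider graph than this single chain.

```python
from graphlib import TopologicalSorter

# Hypothetical service model: each component maps to the components it depends on.
DEPENDS_ON = {
    "optics": set(),
    "access-switch-port": {"optics"},
    "branch-router": {"access-switch-port"},
    "sdwan-tunnel": {"branch-router"},
    "cloud-vpc-routing": {"sdwan-tunnel"},
    "application-reachability": {"cloud-vpc-routing"},
}


def check_order(graph):
    """Order components so every dependency is checked before what relies on it."""
    return list(TopologicalSorter(graph).static_order())


def first_failure(graph, results):
    """Return the lowest-level failing component, or None if all checks pass.

    A component missing from `results` is treated as failing.
    """
    for component in check_order(graph):
        if not results.get(component, False):
            return component
    return None


if __name__ == "__main__":
    for step, component in enumerate(check_order(DEPENDS_ON), start=1):
        print(step, component)
```

When the application is unreachable, walking the checks in this order points at the deepest broken dependency first, instead of starting from the symptom and guessing.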

But yeah, wikis are easy to understand, and building a

Visio diagram is easy to understand, whereas all these other more iterative and programmatic processes

are a bigger technical hurdle to overcome to be able to make use of them I think would be for

some folks is going to be part of it. Yeah definitely I mean the business skill every network you might

have a bigger team and if you have maybe more of a small network maybe it's something you can kind

of get started with today and look at frameworks like NUTS and pyATS and just say, what's the

20% that's going to give me 80% of the value right and maybe that's like this common knowledge we're

talking about earlier the obvious like this box shouldn't be that hot at this time or any time

of the day, right? Let's not leave it outside in the middle of Arizona. Well, Mark Prosser, how do people

get in touch with you if they have ideas on whether or not testing, validation, or even what a

network service is, what words mean, Mark? You know, if they want to reach out to you and have a

longer conversation, how do they do that? Well, probably the best way to reach out to me today would be

mark at tornaug.ca, and go to our website. And if you want to talk about these things in person, you

can come to tornaug on April 13th. But I'm also in the Packet Pushers Slack all the time, and I'm

really, really in the Network Automation Forum Slack, as many people in there know. So it's a good way

to find me. Excellent, thanks for raising your hand. And again, we're at NANOG 96 in San Francisco,

speaking to each other live in an actual room across the table, sharing microphones. And actually,

Mark has his own microphone and I have mine. We're not sharing microphones. I mean, come on, we all

remember COVID. It was terrible. We're not, yeah, we're being very careful here. I have been

Ethan Banks. You can follow me on LinkedIn or the Packet Pushers community Slack, which you can

join at packetpushers.net slash community. And thanks for listening to Heavy Networking this week.

Like, share, and subscribe. Better yet, tell your friends, your colleagues, your peers, and your

parents they should listen to this show. And how do they do that? Well, have them search for Packet

Pushers on YouTube, Spotify, Apple Podcasts, or anywhere that they listen to podcasts to find

our entire lineup of over a dozen shows. I think 13; that's over a dozen, right? And we are

sharing thoughtful technology education for hands-on professionals by super nerds: deep-in-the-weeds

IT engineers, instructors, practitioners, and industry analysts who share their knowledge to make you

better at your job. Thanks for everything you do out there. We do appreciate you. And hey, don't

rack that router by yourself. It costs a lot of money, it's really heavy, and you're going to cut

yourself on a cage nut. So get someone to help you. You don't need to do it alone. And until next week,

just remember: too much networking would never be enough.
