Today's episode is sponsored by Meter, delivering a complete network as a service offering, wired,
wireless, and cellular in a unified solution. Find out more at meter.com slash heavy networking.
Welcome to Heavy Networking. The voice in your head is mine, and I am Ethan Banks, sharing deep
packet thoughts with you since 2010. My usual co-host Drew Conry-Murray is not here today,
but he will be back very soon indeed. You can follow us on LinkedIn to help build our fragile
egos. Now, Drew is not here because we are in San Francisco for NANOG 96 here in early
2026 and I got a chance to meet up with some people in person and I'll introduce our guest today
in just a moment, but did you know that you can watch this show on YouTube now? You can and Spotify.
Yeah, if you would like to watch our talking heads, instead of just listening to us, you can.
And I apologize in advance, I do wish we were better looking. Joining me today is Mark Prosser.
Mark self-describes as a network operator advocate and a network automation dreamer employed
by the nice people at Nokia these days, although Mark's not here to talk Nokia stuff specifically.
Instead, Mark and I are embarking on a thought exercise about network services.
What is a network service exactly? How do we define it? And if we can define it,
when considered in the context of a larger network, does that change what a network service is?
If Mark and I come to some sort of a framework, what does that mean for a network operations
and network automation practice? Hey, Mark, you've been on the Packet Pushers network
a number of times, so welcome back, and for the folks that don't know you,
tell them who you are and what you're doing and about TORNOG. It's on your shirt there.
Very happy to talk about TORNOG. I'm one of the founders of TORNOG, and it's the Toronto
Network Operators Group, and I'm really an advocate, not just for operators, but for the NOGs Canada
needs, in fact, coast to coast to coast. I'm surprised you don't already have NOGs. There's only four
people in Canada total, right? I know all four of them and so we can, you know, one in Vancouver,
one in Nunavut and one in East Coast, like up north of Boston and then here in Toronto at the
end of the universe, right? So all four of us are going to get together and make it happen.
Yeah. Toronto is actually a big tech city, and so I imagine, from what I've heard about TORNOG, and
you started it what, about a year ago? We're almost a year to the day. Yeah. Yeah. We're at the point.
Six right now. Yeah. So and but it's taken off is basically what I'm hearing, and you've had really good
attendance and good sponsor support and so on. Yeah. Absolutely. So like when I put it out there
and that was the real difference, I just made the call saying, hey, everybody, I'm going to do this.
And what I expected at the beginning is I would have operators come, maybe five people, you know,
four or five coming in from all of Canada to come to my apartment, and we have two
pizza boxes and we say, yeah, let's go ahead and do this. Instead, unfortunately, we broke the
fire code in my building very quickly. We had to find a new venue. So really, and a lot of people
came to me and said, like, I was looking to do this as well. And somebody inspired me. I think
it was Vincent Lindrow from CHI-NOG. He said, the hardest part is sending that invite, a call to
action for everyone to come. So now we've had two meetups under our belt and we're moving to our
first full day event, April 13th, 2026. We have quite a bit of capacity for anyone who wants to come
from near and far. And we're going to be talking about regional NOGs. Okay. So if you're doing a
full day event, something has changed, because now that means you've got enough interest,
enough momentum and enough content and enough commitment from people who think they're willing to
take a day out of work to spend the day at TORNOG. Absolutely. People are saying that like they wish
they had more commitment from us to make that platform available to them. Because doing a meetup
in the evening, you know, there's a lot of competing things in life. But if you can allot a day,
an actual work day, come in and talk about the content or get that hallway track, people
are really interested in that. As well, when we came here to NANOG 96,
my colleague James Henderson and I, we've been realizing we're kind of the belle of the ball here.
And people were saying we always wanted this to happen in Canada. We're just waiting for someone to
do it. Now, to be said, we're not the only NOG in Canada. There's also Montreal NOG, to be clear. Okay.
Yeah. Yeah. Okay. Well, well done. Good for you. I'm glad you got a bunch of people in Toronto
that are interested in that and then a bunch of people to draw from. So I am director of the New Hampshire
NUG, or director, I don't know what my title is. Anyway, organizer. I think that's the word.
And the struggle is that's a more rural community. It's not a big city. There's not
hundreds of thousands of people to potentially draw from and lots of businesses and so on.
And so it's been a struggle to get that going and to get the communications out there. There's
lots of network operators and lots of interesting networks even in a rural community. But just to
find the people and bring them together and all that has been a bit of a challenge.
Yeah. Yeah. I mean, like Toronto is pretty much the biggest city in Canada. We have,
we call it, so we're a big city, but we are small town in terms of our community. We all kind of
know each other. We work together many times. But I'm actually from really close to New Hampshire,
in New Brunswick, a province that's right above Maine. So the idea, the theme of TORNOG
is about, like, you know, considerations of regional NOGs, like things about pulling fiber in
places that don't look exactly like San Francisco in the Bay Area. So I think like,
if somebody wanted to do a New Brunswick NOG or Nova Scotia, coast to coast to coast, or up in the
Arctic, we're talking about like some of the concerns with, you know, the current headlines.
If you are looking at the Arctic right now. So I'm willing to be someone's major-domo if they
want to reach out to me and get some support on that. But we do have people coming from the
US as well, Cleveland. So you never know, if somebody did do a New Brunswick NOG, maybe even
the New Hampshire people can come up a little bit. All right, Mark. So let's get into our
discussion here about what a network service is. So you brought this topic up to me.
You Slacked me and said, hey, let's record at NANOG about this. So what's on your mind?
Well, I noticed that once upon a time, everyone just kind of
assumed that they knew what a service was. Like, you know, you had maybe a Cisco certification.
You say, this is how I configure your SDP. This is how I get a service. Like, you know,
how do you connect a VLAN from this port to that port? Or you get BGP peering,
I give someone internet access. Things were kind of very well understood 10, 15 years ago.
Well, from an engineering perspective, sure, in that we know technically the thing we need
to provision to deliver whatever the result is. Oh, they're looking for VPN access. I gotta do
this. I gotta set this. I gotta tweak this policy. I gotta update this access list. Boom. We're
there. And that would be a service in our minds. Is that what you're getting at? Yeah.
And I would find that the thing that has really changed over the past 15 years is now, for example,
in the Network Automation Forum, we're talking about delivering services everywhere, all at once.
And people are talking about different things as to what a service is. For some people,
they're saying, I'm just pushing configuration to an endpoint of the network. And I consider
that deployed. But in my perspective, a real service is end-to-end. It's many layers of
abstraction. And if you build a road and the cars hit a lot of congestion, is the service really
servicing the applications or the business outcomes? That's what I'm seeing. People aren't really
thinking about that. They deploy it, you know, wash their hands of it and walk away. So I'm thinking
more, we need to come back to some of those fundamentals and think about what's the impact of the
service at any time of the day, the many layers of abstraction. And as optical networks, coherent optics,
get more complex, you know, what a service is, now with firewalls and more complexity, is really
changing. And you go to different verticals within our industry and
how we define services is very different. How a hyperscaler coming to the conference talks about
the way they do things is very different than an edge DC out in rural Canada.
So we need to define the service, because you brought up an interesting context here,
so we're talking about the context of automation. Okay. So let's take a step back and go back
15 years again. If we're provisioning a service, there would be a lot of little services being
provisioned online. I don't want to call them microservices. That's going to complicate things.
But mini services, perhaps, or maybe a better analogy is Lego bricks that make up a larger
bit of infrastructure. There's this provisioning of a VLAN and there's tweaking, I don't know,
spanning tree. And then there's maybe a tunnel that's got to be stood up and some encryption.
All of those things are little mini services along the way that have to get stood up to deliver
a bigger service. Are you making distinctions between that? Do you only care about the beginning-to-end
services as far as this conversation goes? No, I think that even for small organizations,
the game has changed a lot. I think the biggest difference from way back then is we kind
of took, like in the data center, we kind of said that we have our border, we have distribution,
we have our access, we set the core fundamentals, and then we're just adding components on,
we're adding a customer on, we're taking a customer off, maybe we were like cycling one rack
versus another rack. But now when you're talking about an organization like maybe a retail organization,
they have SD-WAN to consider, they have, like, you know, layer 7 firewalls or NGFWs,
like services can really change based on the requirements. And now the next thing,
I noticed on Packet Pushers for quite a while we were talking about overlays, and now we've achieved
them. Like they're almost taken for granted. Yeah, I mean, we start with like cloud connects and all
these things that are happening. And how something gets from a cloud resource to, you know, a private cloud
in an organization to the branch is maybe not really understood because the documentation that we had
couldn't keep up. And when you bring up overlays, part of the implication there is that overlay could
be running over the top, across infrastructure we don't own. So how do we define services
in that context? Because we can't guarantee the service delivery of
that over-the-top service that we're providing. Yeah, I mean, you think of things like
NSX, which might run over your own EVPN, which runs over someone else's EVPN kind of scenario.
So it could be many layers of overlays, and all these layers can be abstractions
that leak upwards. And so we don't think about those elements, what could happen,
until it does happen. So when we talk about automation, how do we validate these many layers, right?
So many great tools are coming out, but I find we're just starting to come to those elements as
we look at the operations side of automation. Okay, so what do you think a network service
should be? You've brought up that things are changing, and maybe we're not
taking them into account properly. So can you give me an example of what you're thinking
of as a network service that's got more complexity to it now that we should be considering as we automate
it? Yeah, I mean, like I think you shouldn't really look at your whole network as just one
whole, like boil the ocean: this is all the vendors and speeds and feeds and protocols I have.
I think what you really should do is look at a service and kind of walk the assembly line of
that service, see all the things as to how does this portion of the communication get to its
end zone? And is it point to point? Is it point to multipoint? You kind of have to understand
that because sometimes, as you were kind of alluding to with the compute, it could go across multiple
teams, and do they really collaborate? So defining a service is saying like, if I could create that
high level diagram and that low level diagram, what is really the technical debt that's within
this particular service as I lifecycle it, rather than my data center fabric, which maybe has
many services on top of it. But as a network engineer, I would know all of that instinctively
in my head: there's a switching layer, there's a routing layer, it may or may not go through a
middlebox like a firewall or a load balancer. And I could tell you every step along the way of what
was happening to that packet of data as it left the client, got to the server, and came back. I might
have even known going up the stack, what was happening once it hit the server and then what happened
to it before the response came and found its way back. How granular are you talking about
wanting to get? Well, I mean, it's either a collective knowledge among multiple teams or,
depending on the size of your organization, it'd be the one team. But what I'm finding when I go around
and talk to operators, they really are finely grained on their particular CLI commands, their
particular single pane of glass, which usually means many panes of glass. And they're kind of
worried about their one segment of the network. There are many architects that say I'm aware of the
entire shell, the entire warehouse, really the materials moving from end to end. But not everyone
is quite like that. So for example, like when you say a service, if there was a problem and you're
looking for, maybe, port errors at the edge of the network and there are no port errors, but there's
definitely errors occurring and does everybody know that or does everybody know who to contact?
You're arguing network segments are that siloed? I think so in the modern era, definitely they are.
I mean, like as people also outsource a lot more of those things to hyperscalers, to carriers,
right? People are more connecting things up and outsourcing that portion. But I can't, I mean,
I don't own that network. So I mean, I'm not going to be able to tell you about port errors
happening in an Azure cloud that I'm housing a service in. And I know you're not arguing
for that. So are you saying I'm using, I don't know, ThousandEyes or someone like that who can
make inferences about the infrastructure I don't own? Yeah, there are many observability
options out there that could help with those portions, as long as you consider that part of the
end-to-end of the service, there's many ways to skin that cat. Well, again, just to take
the vagueness away: you're arguing we should care about those things. Absolutely. Yeah. Like
the carriers and the hyperscalers won't really have those granular open APIs
available to us, nor will they reveal those secrets, that this particular port really
reconverged at this time, unless you maybe have enough power to ask for that. But there are ways to say
that if I couldn't transmit this packet across that service at this time, then yes, this service
was impacted. And I feel like when things were a lot simpler, we had less outsourced, you know,
transit paths, it was easier to determine that. But now it's getting harder, right? The service
really is a collection of sub-services that are greater than the sum of their parts, right?
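
The point that an end-to-end probe can declare a service impacted, even when some of its parts are
opaque, can be sketched roughly as follows. This is an illustrative assumption, not a method
described in the episode: SubService, service_verdict, and the example part names are all
hypothetical, and the probe result is passed in as a plain boolean rather than measured.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubService:
    name: str
    healthy: Optional[bool]  # None means we have no visibility into this part


def service_verdict(probe_ok: bool, parts: List[SubService]) -> str:
    """The end-to-end probe decides impact; the parts only help localize it."""
    if probe_ok:
        return "delivering"
    known_bad = [p.name for p in parts if p.healthy is False]
    if known_bad:
        return "impacted: " + ", ".join(known_bad)
    opaque = [p.name for p in parts if p.healthy is None]
    if opaque:
        return "impacted: suspect " + ", ".join(opaque)
    return "impacted: cause unknown"


# Hypothetical sub-services; the last one is outsourced and exposes no status to us.
parts = [
    SubService("optical", True),
    SubService("IP DCI", True),
    SubService("colocation fabric", None),
]
print(service_verdict(False, parts))
```

Note that every visible part reporting healthy does not make the verdict "delivering"; only the
probe across the whole service does.
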
And in the case of the VPN, it's can someone connect in. In the case of, you know, connecting all
my data centers together, it's the optical. It's maybe the IP DCI portions. It's maybe like the
people that are giving me this data center fabric in a colocation. So these things all come
together and we have to be accountable for them, even though the rate of change is faster than ever
in the AI era. In the AI era, yeah, you had to say it. Our sponsor Meter is a network as a service
company. And that means a Meter network is a tightly integrated stack of network hardware and
operating system software delivering wired, wireless, and cellular service. Now Meter delivers the
gear to your premises as a service, right? And what that means is the tedious parts of network
operations are handled by Meter for you. So you don't have to deal with NOS upgrades. When gear
hits end of life, they're going to send you shiny new gear, all of that, and they monitor your
network for you remotely. Meter can handle sites as small as a branch office, as large as a
campus or data center. Meter doesn't just ship gear to you though. They handle other aspects of
bringing a site online. They can deal with internet circuit procurement and site surveys and cabling,
all of that stuff and more is part of what Meter offers. The network stack that they're delivering
gives you a full, full function, all the things: security, routing, firewalling, switching,
wireless, cellular, power management, that is, intelligent power, DNS security, VPN, SD-WAN, and even
multi-site workflows. In fall 2025, I did a video on the Packet Pushers YouTube channel where I worked
with a Meter stack to show you some of those features. And I'm going to do an updated video in 2026.
I was at their headquarters recently, and I got a sneak peek at their newest hardware and a preview
of the new software features. There's a lot of stuff coming. These guys are iterating fast. And
I would put this: there's a vibe. It's been a while since I visited a startup where
I got that genuine thrill of enthusiasm from everybody that I was talking to there, but I got
that at Meter 100%. The Meter team is excited about what they are building. So thanks to
Meter for sponsoring, and go to meter.com slash heavy networking to book a demo. That's
m-e-t-e-r.com slash heavy networking to book a demo now.
To bring those services all together, that means there's a bunch of problems here, the least
of which is technical. I think that there's organizational problems here. That is, if I'm
very focused on the edge network that does the thing or one specific data center, and
when it hits the fiber that leaves the data center I don't care anymore, you have to cross
organizational boundaries to be able to bring all of those services and have them be managed
under a common management plane so that we can all see what the other one is doing. Typically,
if the responsibilities are siloed, probably the monitoring and management are siloed as well.
Absolutely. You agree? All right. I absolutely agree.
Okay. So how do we get by that in your mind? That's a great problem.
You're supposed to have the answer. No, no. I'm posing the questions. The organization can
ask themselves this. So there's this concept out there called socio-technical systems and you
kind of pushed us towards that by asking that golden question. Yes, you did. So the concept of
a socio-technical system essentially is this concept where there's a balance between the socio,
the humans and all of our faults and delicacies in life, and the technical parts. These
are the parts that we really love, like the nerd knobs. I could build this on top of this and
on top of this, but then one other organization or part of the organization does the other thing
and these two organizations have to communicate or collaborate. That's really the hard stuff.
And so this concept of socio-technical systems, which actually comes from 1950s,
British coal mining, is saying that we need to see that between the humans and the machines
there's a joint optimization. The human stuff is usually the tricky stuff. These pesky humans in
the loop, in the machine, along the assembly line, sometimes don't want to be so collaborative
and consider how can we work together to deliver this service for the business. The book,
the 1984 and 1986 book, The Goal, also talks about this, and The Phoenix Project is basically
a modern version of that, right? So this is the million dollar question: when we talk about
building like the controller, the automation platform, build a platform, which I did a talk
about at AutoCon 4. People don't realize that it's hard to determine what are all of the elements
and who has access to the database? Can I get access to the database and know how it gets from here
to here within the data center? That's really hard to do. And until you start asking those questions,
you don't know what it takes to get those people on board. It used to be a lot easier to do,
just to speak to your point. 20 years ago, it was possible as a network engineer,
generally speaking, to have the entirety of the network in your head. Even if it was a sprawling
one, you kind of could do it, and it doesn't feel like you can anymore. Well, I mean, like all my
mentors, they were network engineers, but they were also kind of sysadmins, and all of them
kind of understood Perl and they all worked together, right? So maybe you're a DBA and a
network engineer and a sysadmin, but maybe you're just called a network engineer at the time. And the
barriers between those aren't the same as now, because now you're talking to someone
talking about Kubernetes. And you're really on like the WAN routing side of network engineering.
You don't even speak the same language, right? I locked myself in a hotel for three days to learn
Kubernetes because I just wasn't picking it up. I had to devote hours and hours
and hours of time in isolation just to get a handle on it. It wasn't translating. So yeah,
your point is well taken. It's harder to pick up some of these skills. Whereas, like, you know,
I mean, somebody like a sysadmin back then could probably pick up, like, you know,
Cisco IOS language pretty quickly and have an idea of how they could automate it with Perl.
Yeah. Yeah. Maybe no one can read each other's Perl, but they still all wrote Perl. Yeah. Yeah.
Yeah. The knowledge domains have become more complex. They've built one upon another all the legacy
stuff that we had to know. We still have to know. And now there's been all the years of iteration,
corner cases, and new technologies that have been brought to bear to deal with those corner cases
and how they're applicable. And there's a lot, there's a lot more to know. It really is a more
challenging job to be in IT than it was because of technology building upon technology.
I would have hoped we'd gotten over the siloing thing though. It just feels like we've been talking
about technology silos and how a lot of organizations are aligned upon technology silos forever.
And the winning organizations to me have always been the ones where we're not drawing sharp lines,
like you're the security people and you're the network people and you're the this and
you're the that, and we don't talk. You know, it's been most successful in my experience
when those groups work together and your organization is built more along service delivery and all the
people that have a role to play in delivering that service are speaking together regularly and doing
whiteboard sessions together and figuring out how to best align their knowledge domain specialties
and the technologies that they're bringing to bear with one another. But it doesn't seem to work
that way. Yeah, I mean, sometimes you'd think it would seem simple. Actually,
let me take a step back. You touched on it. Very interesting is that like when humans, we're all
responsible for the same business outcomes if we work for a business, right? So we should be working
together and see ourselves as part of that process when it comes to like employee surveys coming out
in organizations, they ask, like, do you see yourself as part of the outcomes of the business? If you're
answering no, that's probably a problem, not just in your organization, but also in the delivery
of a service. But when you think about people working together and you say like,
oh, I love EVPN. And I love EVPN, too. And then we talk a little bit deeper, like I'm a VXLAN person.
No, I'm an EVPN MPLS person. Oh, suddenly the conversation breaks down. We're completely different
sects of the same problem, but delivering it end to end from that DC to another DC, that's where that
stitching really happens. And then someone starts talking about segment routing and the TLVs, you know,
and like, sorry, maybe you lost me, we can go to Geneve in like the VXLAN world, but I'm not
talking about TLVs, so yes, right? So we thought we could collaborate, but if we try
a little bit harder, we probably have very similar abstractions in how VXLAN works and how
MPLS works, if we just took the time to understand each other. So even when it seems simple, it's hard.
Well, when you go between like platform engineers or site reliability engineers
or whatever they call themselves these days, to like network architects, like the gap was pretty
wide recently, but I noticed at NANOG and at the Network Automation Forum, it's starting to come back
together. Somehow, some way, we're going to get there. Maybe that's because of the stuff
that Wim Henderickx is doing with Kubenet. Maybe that's Containerlab. People are starting to get
into the mindset where we're bringing network back to whatever the infrastructure is today.
That's some of it, I think. There's also been a number of talks about how you can go and talk
to someone: just because they put in a help desk ticket request and they work in a different group
doesn't mean you shouldn't go talk to them. So go talk to the developer who's requested the thing and go,
what are you trying to do? How come? And then building a dialogue with them and building that
rapport translates to a better outcome because they asked for a technical thing thinking they
knew what they needed, but maybe they actually didn't know what they're asking for. There's a different
way, a better way that you can deliver that service. You're not going to figure that out if you just
give them what they asked for. Sometimes you need to just go and talk to them to deliver that
outcome. Go back to your example of the two network engineers who you think this would be a simple
conversation, but because their specialties are different and someone doesn't have experience
with IS-IS and delivering communications and metadata via TLVs, all of a sudden things break
down. You know, you can have those educational discussions within a network engineering team and
then across IT groups to help foster systems level thinking. Because this is another thing
of what you're talking about. We're really talking about systems level thinking across groups, domains,
organizational boundaries that deliver the business outcome that we all commonly are intending to
deliver or should be. Absolutely. I mean, like systems level thinking is exactly where we need to go,
and us participating and collaborating on what this system is, is really the key answer there.
I mean, it shouldn't be as simple as like we both love the CLI, but you love IOS XR and
I love Junos or SR OS. These shouldn't break down. We all agree we love the CLI and we can work
together to say what is the common abstraction to understand the service as it goes across.
With the system level thing, at the outset of this podcast, I did say don't think
of your network as a whole, but yes, do think of it as a system. Yeah, these are very clear
distinctions. Do you think about, like, all my speeds and feeds, rather than, like, everything I have is
an opportunity to deliver a service and I'm going to understand that. Yeah, going back to don't
think of your network as a whole. I don't think your argument is you shouldn't think of your
network as a whole as much as you probably can't because of segmentation and silos and so on.
Yeah, you shouldn't just simply say like I've got some of these and I've got some of these
and this is like my network and I have data center and I have you know optical. Don't think of it
that way. Think about like what are the actual services, what are the outcomes that are attached
to this mission critical network that we have. All right, I want to go back to something you said
near the top of the show. We were talking about defining a service and you were getting into the
notion of testing and validation as being a critical part of the service and helping to define the
service. Dive into that. What do you mean by that? Well, I mean like you think about like you could say at the
outset, I'm going to start validating my network and maybe I'm going to do some automation around
it or maybe it's more manual testing that we'll do. Some people just have it in their
tribal knowledge within NOCs, but in regards to taking a look at this you say, okay, well it should
be easy to test my network because all problems are understood and all problems are bounded, I heard
someone say at the Network Automation Forum recently. When you go through the actual steps of starting from
like, okay, first of all, I need to look at the light levels. Then I need to look at the actual, like,
maybe the buffers coming into the port. What kind of service is it? Where does this go to? You
actually start realizing there's thousands and thousands and thousands of steps as you go along the
service and just taking the moment to look at that and build a graph as to how we're going to
develop this service is way harder than it seems. But the sooner you get started the easier it's
going to be, because in particular a modern network service with overlays and components of
the network that are not yours, there's a lot of layers to it, and then the dependency tree to
deliver the service becomes a lot more complex than say a very simple Ethernet delivery that's got
one VLAN, a routed port and another VLAN, and even that's got a lot of steps to it honestly. Absolutely
it does. Sometimes we think about it just as like the packet flow, like walking that packet
from end to end, but sometimes it's also about compliance. It's security domains or microsegmentation.
These things come in as well and how can we confirm that at this exact moment we are compliant
with what the service is supposed to do? Are you positing a question or do you have thoughts on
how to answer that question? I've been working on it for a little while with some of the tools
and also the Network Automation Forum. I find that there's some amazing frameworks that make it easier
for us, like pyATS. NUTS is another one, and ETS, you had a great episode on that. Yeah,
yeah. So Urs and Marco talked with us about NUTS at length, yeah. But I don't think that there's
a one-size-fits-all. I think it's really each individual organization needs to look at this
and say, like, if we broke down all these layers under the service, you kind of have to get started
and try, and a tool like NUTS or pyATS can help you pull things out from that. Well,
pyATS also has testing elements as well. I mean, John Capobianco wrote a whole book on it
with his colleague Daniel Wade, I think it was, yeah. The challenge of testing as part of the service
delivery, oh man, this is tough because you don't know what tests you need until
you've passed all your tests but the service still isn't working, and then you go back
and add a test. So there is a cycle that you have to go through, so you're asking the right questions
in your tests to understand the service truly is being delivered. I mean, but we could take a step
back and say you're arguing that you can't call a service a service until it has passed some
number of validation tests. You can't say that the service is working as intended until it's
passed those tests, right? Well, do I have to have the application running across it? I mean,
that's definitely, that's probably the best way. If you start at the high level and say the service
is going end to end, that's great, but if something went wrong, how can I make sure I have
an idea as to where it's going wrong? See, I, okay, I'm not sure how I feel about this.
So I can pave the road, and if I paved the road, job done? Do I have to have a car go
across the road to prove that the road is truly there and is able to carry an automobile?
I'm not sure, you know, what I think about that. In theory, I can test and validate that the road would
carry the application if it bothered to show up. So do I have to actually get right down to the
application itself running across it? I'm thinking out loud here. I don't actually know how I
feel one way or the other, but do I have to have the app running across the network service that
I provisioned to validate that the service is indeed running? There are probably
some elements of needing to be reasonable about what you can actually do, right? Sometimes some
applications only really work best in production. In the case of a voice call, right? Somebody has to
pick up the phone and make that call, right? So that's a hard one. You can definitely simulate
voice traffic, but it's not the same as the time of day when everyone makes their calls,
all the calls on Teams. Yeah. Yeah. You talked about egress peer engineering and
some, you know, basically traffic engineering elements, saying, I'm going to make sure that
if this happens going to Microsoft at this internet exchange point, I'm ready for that to happen.
But when, you know, the all-hands happens in your organization, and you have like a
global company, that's when you really feel it. So yes, maybe it's hard to preemptively test
that one scenario, but doing something is definitely better than nothing. Yeah. There's a bunch of
challenges here, because when you begin putting application traffic across the link, as I'm
again going through my own thought exercise live on a podcast, one of the things that happens is,
unless your tests are very thoughtful, you are not going to correctly simulate application
conditions. And so you will run into things like, oh, in testing, we never hit MTU. Oh,
turns out as soon as we did, something breaks, you know, when we send really large packets through,
for example. Or QoS: you know, we never had a congested situation where our QoS policy was
tested. Didn't think about it. Oops. And then that comes to light when the application
goes across. So yeah, okay. But I'm thinking that my testing must be quite robust
and mature to prove that this network service has indeed been delivered. Now, in the context
of network automation, you would advocate, I assume, that when I provision that service, which
presumably is largely automated, let's assume I'm at that level of maturity with my
network operations, I would be testing as part of that service provisioning, within reason.
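
One way to picture the MTU gap raised here is as a turn-up check that hands back structured data a system can evaluate. The following is a minimal Python sketch under stated assumptions, not something described on the show: `probe_ok` is a hypothetical callable standing in for a don't-fragment ping of a given payload size, and the function names and result fields are made up for illustration. The only fixed facts used are the 20-byte IPv4 header (without options) and the 8-byte ICMP echo header.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def icmp_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def largest_passing_payload(probe_ok, high):
    """Binary-search the largest payload in [0, high] that probe_ok accepts.

    Assumes probe_ok is monotonic: if a size passes, every smaller size passes.
    Returns None when even an empty payload fails.
    """
    low = 0
    if not probe_ok(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe_ok(mid):
            low = mid
        else:
            high = mid - 1
    return low


def mtu_check(expected_mtu, probe_ok):
    """Return a structured result; measured_mtu is capped at expected_mtu."""
    payload = largest_passing_payload(probe_ok, icmp_payload_for_mtu(expected_mtu))
    measured = None if payload is None else payload + IPV4_HEADER + ICMP_HEADER
    return {
        "check": "mtu",
        "expected_mtu": expected_mtu,
        "measured_mtu": measured,
        "passed": measured == expected_mtu,
    }


# Hypothetical stand-in for a path that drops anything above a 1400-byte MTU.
fake_path = lambda size: size <= icmp_payload_for_mtu(1400)
result = mtu_check(1500, fake_path)
```

In a real pipeline the stand-in would be replaced by whatever actually sends the probe; the point of the sketch is only that the result is a pass or fail record with the measured value attached.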
Yeah, definitely. I think it's a good thing. Of course, it's not all at once. You don't boil
the ocean. It's an iterative process. Maybe you realize you missed something, like how can I
reasonably test MTU when the service is turned up? You add that on later, iteratively. And I remember
you did a podcast episode about SNMP versus gNMI. That's a whole
conundrum in itself. It's a tough topic, having that type of observability in the modern
era of platforms. So it is really a hard process. But until we get started, we don't really get
anywhere with this. Yeah, it's funny. It is. Yeah. That's also true. It's not merely running
some sort of a test. It's being able to measure the results of the test without you staring at
a packet capture, for example, but being able to get a structured data sort of answer back
into a system that can validate that the test did or did not pass. Huh, I hadn't considered that,
because, you know, back in the day, the way you would run tests is just like, did it work? No?
I wonder what happened. And you start digging around and, you know, looking at different
counters and whatever, and maybe looking at a packet capture and figuring it out, which is not
a good way to do automated testing. Yeah. Yeah. And you made an interesting
point. We were talking about QoS and MTU. I remember there was
this one talk, and I feel like it was at NANOG once upon a time. There was an Amazon engineer
talking about how they had this problem that kept plaguing them once a week or once a month,
every once in a while. And every time they saw the issue, they're like, well, of course,
we're dropping packets because it's elephant flows. It kept coming back. So they tried to,
you know, test and account for elephant flows, and eventually it came to the point where this one
Amazon engineer (or, sorry, maybe he was even at Meta) said this. He says,
if you think it's elephant flows, get out of the room. I need someone with a different thought
pattern. And so when he chased down the application developers, he found out it was something that
the application was doing itself that was really ruining the network, because they were trying to
get around the network in a way, but that was breaking it, right? So I don't exactly know what
the root cause was, and maybe we could link that talk if we can find it. But it's just
fascinating. Sometimes the paradigms we have as network engineers actually get in our own way,
because we look at it with one simple set of tools. Maybe there's other problems out there.
Like, if you think about the wireless space, there's the Fresnel zone, and maybe
tides come up into that zone and it causes degradation of the radio signal. We don't think
about that as data center engineers. But that's a different paradigm that people in the radio
space know about. The first time I ran into application-level acknowledgments was troubleshooting
a problem that turned out to be packet loss in an EtherChannel. There were four pairs of
fibers that were bundled together to give us some bandwidth between a couple of core switches,
and any traffic that happened to cross one of the optics in that bundle would
experience some packet loss. But the symptom was nothing like that. The symptom was,
well, occasionally a transaction will come through but not get acknowledged, and then the transaction
runs again and then it hangs up in the queue and just everything falls down, or we've got to
re-queue. This is terrible. What the heck is going on? And it's like, I don't know, you've got an
application problem, buddy. You tell me. Yeah. You know, that was what it felt like. Well, and then come
to find out, talking to the developers: no, we do application-level acknowledgments. We do not
rely on TCP for delivery. When a transaction is received by the remote side, an acknowledgment
is sent back. And if the acknowledgment is not sent back, the way the thing is structured, it
gets hung up, or whatever all the details were. I don't even remember anymore.
Long story short, we eventually tracked that down to: this is a packet loss problem. Sometimes the
ACKs are lost. And then eventually we were able to determine, you know, bad optic in an ECMP pair,
and that was a very gray failure. It wasn't consistent at all. It drove us nuts
until we got there. How do you write a test to solve for that problem? I don't know. You don't.
Right. I think the idea of testing as much as you can is to relieve your team from the
mundane stuff, the day to day. It's really so that they have the time to look for those
corner cases and have time to collaborate with the other teams and look for those unicorn
problems. And service validation is a different animal than troubleshooting, to be fair.
Absolutely. Yeah. You sent me down this, you know, down this way of thinking.
So, okay. We've been chatting for about a half an hour now, a little longer. How do you
define a service then? Do you have an example in mind? I mean, well, it's exactly that. It's like
once upon a time, I mean, you could probably say that if you ran a set number of BGP commands
or show commands, and you maybe put them in a script and ran them all and it looked okay in the end,
the service is great. But today, I think a service is really, the vague way of saying it is that it's greater
than the sum of its parts, but the real answer is that every component or every portion that this
application goes through needs to be considered, validated, and documented, and the humans that run
this business need to understand it at some level. That's the way I would describe it. Let's
walk through a scenario then. Let's walk it through and tell me how you think about these
things. So I've got a data center and I need to stand up a service that lives in AWS.
Is this a plausible scenario? Does this work? Yeah. Okay. Okay. So let's walk through it.
I'm beginning with the transaction happening in my data center. Walk through how you would think
about this process. Yeah. I mean, think about it. Maybe it originates at the
NIC of your computer and it goes into perhaps your local branch router, if there's not a switch in
between. From the router, it could be many things that are getting it across. It is an SD-WAN
connection. Yeah. Right. And then the SD-WAN portion is essentially a modern, complex SD-WAN
or IPsec tunnel that goes over that. There's all the implications there as that packet goes over
whichever carrier. So, just so far: now I've got to do some kind of validation to get me
from, well, do I run a test, in your mind? Do I run something at the host level? Or am I going to begin at
layer 2, whatever the first network device is that the packet touches within our domain? We're
probably starting at the network devices. We're going to look there saying I know that based on this
moment and this test I can do at this time, it's probably coming into this interface. Some tightly
integrated shops will include the host. Yeah. But by and large, that's probably not realistic for
most of us. That's too separate of a domain. Yeah. You're probably in like high frequency
trading. You're looking at everything all the time, and it really matters, every single microsecond or
nanosecond. But in this scenario, let's just assume that we, the network team, are doing this, right?
What are we testing to get to that network edge so far, where I've gotten
to the SD-WAN router? I've already transited at least a switch and then the router, if we keep it
really, really simple. It was probably more. But let's just say, what would you be testing at that level?
Well, yeah, assuming that maybe there's no campus scenario and all the implications
there. If it's just like making a simple puzzle, there's the host, the switch, the router.
And then at the router, the implication of SD-WAN or transit or peers, et cetera, is which
actual link is it going over to get to the quote-unquote internet, or my carrier, or maybe my own WAN?
Oh, this is interesting, because in this scenario of SD-WAN, that can change.
It could change on a dime, right? Based on the scenario, what type of traffic it is. So we're
going to say what traffic is going across and what link would that take? And how did that link,
well, how can we validate that link is good at that time? What does the path look like from here?
So wait, so wait. There's testing of a service that we provisioned and we need to
demonstrate that the service is viable. We have successfully provisioned the service
once the service is provisioned at the front end of its lifecycle. But now that the service is alive,
you're also talking about ongoing validation. Yeah, it never ends. I mean, yeah, unfortunately,
I mean, there's no sleep for the wicked as network engineers. The network is
24/7. But this is not just red light, green light. You're talking about ongoing testing of the
service for its entire life cycle. Yeah, absolutely. I would say so. I mean, it depends on
the business, in terms of what that gives you in terms of value. It's a lot
of test data to store somewhere. If it's an office and nobody's there at 8 p.m., what are you testing?
Right. But when people are there, it definitely matters. But perhaps, let's say,
back in the day, we used to have little closets in our offices and they'd be running maybe
backup jobs over that WAN, and that definitely matters at 8 p.m. and onwards. Right. So it's whatever matters to
your business outcome. You're reframing, I think, how a lot of us have thought about testing, in that
as a network engineer, I would basically be testing network transport.
I wouldn't care what the application going across it is. I am delivering a connectivity fabric,
just to speak about things in a generic way. Don't read into it. I don't mean fabric as anything
too specific. I'm delivering this connectivity across this network and I'm validating
that. All my nodes are up and my circuits are up, and I'm certainly monitoring bandwidth
utilization and routing convergence events and other sorts of things. But it is abstracted
from the services. It is not married to the service. If I was doing any sort of service-level
monitoring, it was probably application performance monitoring with synthetic transactions,
validating that I can log in, and if I request this page, I get this page back and not a 500 error,
and so on. Not interconnected tests. You're talking about something that's a substantial step
beyond that. That's quite a bit more intense. Well, it depends on the scenario. You did
bring up an interesting point. We think about this. I mean, for a long time, when we think about
networks, as we start to understand them, we are very non-discriminating in terms of
the packets that go across our network or the frames that go across our network. We don't really
care about it so much. But when you start talking about business outcomes and you start saying,
what is this service versus that service, then we start coming back to the things you were talking
about on the NUTS episode. We're talking about, like, there's integration testing, there's end-to-end
testing, and there's synthetic testing, right? The synthetic testing is a lot closer to
the application, the end-to-end portion of it, right? The integration testing is more about when
this thing comes online in the network, does it converge in the way that I expected, right? So
when you start to ask those questions, it's like, when does this matter, and for whom, at what time?
So I notice a lot of people in their automation are looking at the automation testing saying,
when I turn it on, it looks good for five seconds after it was turned on and then I just walk away
from it. People aren't really thinking about the day two, the day three, the day four of things.
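
The "when does this matter, and at what time" question, together with the earlier point about an empty office at 8 p.m. versus backup jobs that only run after 8 p.m., can be sketched as a small table of checks with time windows. This is a hedged Python illustration only: the check names, the hours, and the `ScheduledCheck` type are all hypothetical, and a real shop would derive them from its own business outcomes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledCheck:
    name: str
    start_hour: int  # inclusive, 0-23 local time
    end_hour: int    # exclusive; a window may wrap past midnight

    def active_at(self, hour):
        if self.start_hour == self.end_hour:
            return True  # equal bounds mean "all day"
        if self.start_hour < self.end_hour:
            return self.start_hour <= hour < self.end_hour
        # Window wraps midnight, e.g. 20:00 to 04:00.
        return hour >= self.start_hour or hour < self.end_hour


# Hypothetical day-two checks for one service.
CHECKS = [
    ScheduledCheck("wan-path-up", 0, 0),            # always relevant
    ScheduledCheck("voice-synthetic-call", 8, 18),  # only when people are in
    ScheduledCheck("backup-throughput", 20, 4),     # overnight backup window
]


def checks_due(hour, checks=CHECKS):
    """Names of the checks that matter at this hour of the day."""
    return [c.name for c in checks if c.active_at(hour)]
```

Calling `checks_due(21)` in this sketch selects the path check and the backup check and skips the voice check, which is the "what are you testing at 8 p.m." trade-off expressed as data.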
Once upon a time, there was a service called the Internet Weather Forecast or something like that.
I don't know if it's still around anymore. We used to watch it all the time because,
since we operated a North American continental WAN, what was happening across the internet
mattered to us, because it could impact the latency of an operation in a transaction world where
latency mattered. We'd want to know what was going on. So if somebody was having a bad day in this AS,
any traffic that transits that AS, it's going to suck for you. You're talking about
that on some level. Yeah, when you turned it on, it was a great network, meaning everything is
configured correctly. That does not mean that everything stays the same. The network is not static in
the demand of traffic that it's carrying, whether or not a circuit is up or down, or whether or not
the internet, if it's used as part of your transport link, and it probably is in 2026, is
behaving like it did the day you turned up that service. So you're advocating for a robust set of
tests that I'm doing constantly, on the presumption that the network is constantly changing and I
must know at any given point in time whether or not I am able to deliver the service that I am,
for lack of a better word, contracted to deliver. Is that true? I would say so. Yeah, I mean,
there's definitely, when you start out with this journey and you start asking these questions,
you kind of, like you were alluding to as well, look at some basic commonalities. I
understand these things in my box. Is it looking good? Is it cooled properly in the closet or in the
data center? These things are obvious commonalities. But in the context of remote work and people
going all over, it's harder and harder to find how what's actually happening somewhere else is
impacting what's seemingly right here in my network. Some companies really care about when a transatlantic
fiber goes down. They feel the impact as maybe another one goes down, and another one goes down.
Maybe some of those are more regional, or for example, maybe a company that operates entirely in
Ohio. It's more about what their carriers are experiencing, and they can work with them collaboratively.
They can't do anything about it, but they can have that relationship and maybe build that knowledge
graph to say, if this sort of thing happened, who can I communicate with to get that better insight
into it? Yeah, but there are services like this. I mean, your ThousandEyes. That's a great
example. Yeah, ThousandEyes will tell you those kinds of things. So what does that mean? You
bring in that sort of a service to give you the data required to be able to understand that my
service at this time of day is in fact able to deliver. Yeah, absolutely. I mean, you would take
the information, the data there, and then apply some human insight to it and then say,
this is my service. If ThousandEyes plus these local homegrown tests together conclude that
everything is A-OK, then this service is ostensibly A-OK. I will argue, I think I will
argue, that we don't have enough human capacity to make those inferences, that probably this is an AI
problem. Is that reasonable? AI can help us in many ways. Sometimes people are talking about maybe
displacing certain things, but I think AI is maybe something that could level-set some of these
problems where we can communicate. We're talking about communication, for example. Communication
is hard between humans. It's way harder than it seems, but AI has been around for a long time
translating for us in Google Translate or whatever else. We can use these things to say, like,
maybe we have technical debt in terms of documentation. AI can kind of get rid of
that for us. Not to depart too far from the ThousandEyes portion, but I know there's this
concept of vibe coding and such for AI, where you essentially use the AI
agent to write all your code for you, but they've kind of moved on to something different now called
spec-driven development, where you work with the AI to basically build these sets of constitutions
and specs and task lists, and it's documenting the application as it builds it, and then it's
referring back to that. So this is the constitution of how this application will work, and humans can
read this documentation. And when I see something like that, I say, why can't we just write documentation
for humans from the outset? Because we're too busy, usually. So AI takes away those mundane
portions and works with us. The same thing is true when, you know, I talked a lot on this episode
about mentors that really took the time to work with me and show me how to learn the hard way. AI is
making that so much easier, in that I'm going to work with the agent to create this
knowledge graph for the organization, or knowledge graph for the AI, to make it easier for front-line
engineers to work on my network, to understand that service. Maybe even to speak better between
the application folks and us, because AI maybe can understand when they say, like, CRDs in Kubernetes,
and it's going to translate that for me without me going to Stack Overflow and asking a question
and getting destroyed for asking it. The challenge I see with all of this, speaking of being
understaffed and not having the time, is that it's one thing to build the service. It's
another to build and maintain an infrastructure that is capable of ongoing testing with this sort
of rigor that is required to be able to handle and integrate all the bits of data for all the
different network segments and the kind of data that's flowing in from them and be able to come
to some kind of a conclusion about that. This is, dude, this is not easy. So, not everybody should
be like SQLite, which is written in C and has a hundred percent test coverage.
I think that's infeasible, but if you could at least start the journey saying, can I put in my mental
model what the service is, or all the components and relations to the service, so that when it breaks
I can manually test these things one by one instead of sitting around for 10 hours saying I'm
really not sure where this is broken or why it's broken. If you start to say, maybe it's an
exercise in documentation to understand the service, that's still better than, you know, doing
nothing. The problem, though, is if you spend so much time trying to just document and document and document,
sometimes the rate of change in an organization outpaces the documentation, and that's why trying to
write maybe that 50% test coverage helps you, in that it becomes the documentation and you're
iteratively building on that automation as opposed to writing really long Word documents
or Markdown files or building yet another wiki in your internal organization.
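
The idea of writing down the components of a service and their relations, then testing them one by one when something breaks, can be sketched as a small dependency map that doubles as documentation. This is an illustrative Python sketch under assumptions: the component names loosely follow the data-center-to-AWS walkthrough earlier in the conversation but are hypothetical, `check` is a stand-in for whatever per-component test exists, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical model: each component maps to the components it depends on.
DEPENDS_ON = {
    "access-switch": {"host-nic"},
    "branch-router": {"access-switch"},
    "sdwan-tunnel": {"branch-router", "carrier-link"},
    "aws-service": {"sdwan-tunnel"},
}


def localize(depends_on, check):
    """Test components in dependency order.

    Returns {component: "pass" | "fail" | "blocked"}, where "blocked" means
    the component was not tested because something it depends on did not pass.
    """
    status = {}
    # static_order() yields every node after all of its dependencies.
    for name in TopologicalSorter(depends_on).static_order():
        deps = depends_on.get(name, set())
        if any(status[dep] != "pass" for dep in deps):
            status[name] = "blocked"
        else:
            status[name] = "pass" if check(name) else "fail"
    return status


# Stand-in check: pretend only the carrier link is unhealthy.
report = localize(DEPENDS_ON, lambda name: name != "carrier-link")
```

In this sketch the report marks `carrier-link` as failed and everything that rides on it as blocked, which narrows the search instead of leaving someone guessing for hours. The map itself stays useful as a description of the service even before any automated checks exist.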
But yeah, wikis are easy to understand, and building a Visio diagram
is easy to understand, whereas all these other more iterative and programmatic processes
are a bigger technical hurdle to overcome to be able to make use of them, I think, for
some folks, and that's going to be part of it. Yeah, definitely. I mean, at business scale, with a bigger network you might
have a bigger team, and if you have maybe more of a small network, maybe it's something you can kind
of get started with today, and look at frameworks like NUTS and pyATS and just say, what's the
20% that's going to give me 80% of the value, right? And maybe that's like this common knowledge we were
talking about earlier, the obvious, like this box shouldn't be that hot at this time, or any time
of the day, right? Let's not leave it outside in the middle of Arizona. Well, Mark Prosser, how do people
get in touch with you if they have ideas on testing, validation, or even what a
network service is, what words mean, Mark? You know, if they want to reach out to you and have a
longer conversation, how do they do that? Well, probably the best way to reach out to me today is
mark at tornog.ca, and go to our website. And if you want to talk about these things in person, you
can come to Tornog on April 13th. But I'm also in the Packet Pushers Slack all the time, and I'm
really in the Network Automation Forum Slack, as many people in there know, so it's a good way
to find me. Excellent. Thanks for being here, man. Again, we're at NANOG 96 in San Francisco,
speaking to each other live in an actual room across the table, sharing microphones. Actually,
Mark has his own microphone and I have mine. We're not sharing microphones. I mean, come on, we all
remember COVID. It was terrible. We're not, yeah, we're being very careful here. I am
Ethan Banks. You can follow me on LinkedIn or in the Packet Pushers community Slack, which you can
join at packetpushers.net slash community. And thanks for listening to Heavy Networking this week.
Like, share, and subscribe. Better yet, tell your friends, your colleagues, your peers, and your
parents they should listen to this show. And how do they do that? Well, have them search for Packet
Pushers on YouTube, Spotify, Apple Podcasts, or anywhere that they listen to podcasts to find
our entire lineup of over a dozen shows. I think 13, and that's over a dozen, right? And we are
sharing thoughtful technology education for hands-on professionals by super nerds deep in the weeds:
IT engineers, instructors, practitioners, and industry analysts who share their knowledge to make you
better at your job. Thanks for everything you do out there. We do appreciate you. And hey, don't
rack that router by yourself. It costs a lot of money, it's really heavy, and you're going to cut
yourself on a cage nut. So get someone to help you. You don't need to do it alone. And until next week,
just remember, too much networking would never be enough.

The Everything Feed - All Packet Pushers Pods
