EPISODE 1922 [INTRODUCTION] [0:00:00] ANNOUNCER: Observability emerged from the need to understand complex software systems and involves tracking metrics, logs, and traces, so engineers can detect and diagnose problems before they affect users. However, modern applications often encompass hundreds of services, containers, and dependencies, generating more observability data than dashboards and alerts alone can effectively surface. New Relic is a leading observability platform with a history that spans the full arc of modern software operations. Today, they are working to apply AI to move observability beyond passive monitoring toward active intelligence, where systems can surface what matters, reduce alert noise, and ultimately take autonomous action before problems reach engineers or users. Nic Benders is the Chief Technology Strategist at New Relic, where he has worked for 16 years. In this episode, Nic joins Lee Atchison to discuss the evolution of observability from dashboards and alerts to AI-driven intelligence, how LLMs and statistical tools work together to surface meaningful signals from massive data sets, the emerging challenge of observing AI systems themselves, and what the rise of AI means for the future of software engineering as a profession. This episode is hosted by Lee Atchison. Lee Atchison is a software architect, author, and thought leader on cloud computing and application modernization. His best-selling book, Architecting for Scale, is an essential resource for technical teams looking to maintain high availability and manage risk in their cloud environments. Lee is the host of his podcast, Modern Digital Business, produced for people looking to build and grow their digital business. Listen at mdb.fm. Follow Lee at softwarearchitectureinsights.com and see all his content at leeatchison.com. [INTERVIEW] [0:02:09] LA: AI and observability, how exactly do they work together? My guest today is Nic Benders. Nic is the Chief Technology Strategist for New Relic, one of the major observability platforms that is now focused on AI. Nic is also a personal friend of mine. So, Nic, welcome to Software Engineering Daily. [0:02:28] NB: Thanks, Lee. It's great to be here. [0:02:30] LA: Now, you and I go back a long time from New Relic days, early in the New Relic days. As a matter of fact, we've just spent a little bit of time talking about some of those earlier days, but can you catch listeners up on what you've been doing since those early days in New Relic and what your role as Chief Technology Strategist really involves? [0:02:48] NB: Absolutely. When I think about New Relic's journey, and really, in some ways, the industry's journey, back when we started at New Relic, we were very much solidly in this instrumentation era. The thing that we sat down at our desks and tried to figure out every day was, how can we instrument more of the systems that matter to people? Oh, well, we started with Ruby. How do we get Java? How do we get .NET? How do we get Python? How do we get into the browser, or onto mobile apps, or add a new library? Pretty soon, we were instrumenting so many things that there was more data than we could deal with. It moved from this instrumentation era into this data platform era. For New Relic, that shift was around 2013, 2014, when we introduced NRDB. NRDB gave people a way to ask questions of a system that you didn't know you needed to ask. All the data goes into it. Then after the fact, you're like, "Oh, where are my slow queries coming from?" 
Oh, well, that's mostly a test system. Exclude that test system. Where are the rest of the slow queries coming from? Can you break that out by country? This type of interactive questioning system that powers dashboards, it powers an interactive data explorer, it powers alerts. There's all these things you can do with the data platform. That was over 10 years ago now. What we've seen is the ability to have all the data and to put it all in a place so you can ask any question is no longer enough, because you have so much data, you don't even know what to ask of it. Instead of just being about the ability to ask something, or to make a dashboard out of anything, you need a tool that tells you what are the questions you want to ask. It tells you what are the things you want to look at. That's that shift from the data platform era into this intelligence era. Intelligence, everybody jumps on immediately, you're like, "Oh, it's AI." I'm like, yes, AI is a piece of intelligence, but it's also about product design. It's about the way that we use something, having those built-in opinions, those flows. Because when somebody sits down at a tool, they don't want to just see a prompt and say, "Oh, I can dashboard anything." Great. They want answers. They want to know what's important in their system. That's that intelligence shift. That's the era we're in now. I'm going to talk about this a little bit later, but also, the intelligence era won't last forever. It may already be drawing to a close, as we need to move into an action era as an industry and for New Relic. New Relic's journey since those early days has been getting into each of these, pioneering it, figuring out what has to be done, and then asking the question of what has to be done next. As the Chief Technology Strategist, that's where I come in: I've been with the company now for 16 years, just working with every piece of the system. I've been using observability tools for 30 years now, back from when we used to call it monitoring and monitoring was just ping. [0:06:03] LA: Right. [0:06:05] NB: The question isn't really "where are we" as much as "where are we going?" That's my job. [0:06:10] LA: Got it. Got it. That makes sense. I was going to ask you, what's the one biggest change that occurred in your mission? But it sounds like there's really two big changes. There's a change from instrumentation to data and then from data to intelligence. Are those really the major changes in New Relic that have occurred over time? I mean, in the New Relic product - [0:06:32] NB: Yeah. Well, obviously, in the - There's a million things that go on in any tech company. I think, yeah. I think that realization, originally back a decade ago, that the secret was to have a strong data platform. [0:06:46] LA: Right. [0:06:47] NB: And that you had to be able to ask anything of your data platform at any scale. Then, more recently, just a few years ago, that realization that we've built this industry - like I said, I've been using these tools for decades - that the dashboards that we build today are fancier, they're easier to build, they monitor different things. But fundamentally, they're not that different than the dashboards I built in the 90s, where you sat down, and you're like, "I don't know, I'm going to put a dashboard together. It's going to monitor here's how much free memory we have. Here's my CPU usage. Here's my network usage." I mean, I did it in Tcl/Tk then, instead of just doing it in NRQL in a browser. But it's the same crap.
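To make that after-the-fact questioning concrete, here is a minimal sketch in Python. The in-memory event list and field names are illustrative stand-ins for a telemetry store, not New Relic's schema; in the product itself these would be NRQL queries against NRDB.

```python
# Toy stand-in for "ask questions of the data after the fact": every event is
# already stored, so each new question is just another slice of the same data.
from collections import Counter

events = [
    {"query_time_ms": 950,  "host": "db-test-1", "country": "US"},
    {"query_time_ms": 1200, "host": "db-prod-3", "country": "DE"},
    {"query_time_ms": 1800, "host": "db-prod-7", "country": "US"},
    {"query_time_ms": 40,   "host": "db-prod-3", "country": "US"},
]

# "Where are my slow queries coming from?"
slow = [e for e in events if e["query_time_ms"] > 500]

# "Oh, that's mostly a test system. Exclude that test system."
slow = [e for e in slow if not e["host"].startswith("db-test")]

# "Can you break that out by country?"
print(Counter(e["country"] for e in slow))  # one slow production query each from DE and US
```

The point of the data platform era is that none of these questions had to be decided before the data was collected.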
[0:07:29] LA: The difference, though, is instead of having three or four graphs, we have 300 or 400 graphs. [0:07:33] NB: Yeah. Oh, yeah. Right? People want to build these charts. But there's no widget I can make small enough to actually watch everything in the system. When that struck us as a company, we realized dashboarding and alerting as the way you do observability has reached its conclusion. You can't go forwards anymore with a, I'm going to make fancier dashboards, or tinier widgets, or I'm going to give people more tools to set alerts. Nobody wants to set alerts. Nobody wants to build dashboards. They want answers. They want to sit down and know what's going on in their system. Dashboards and alerts are a tool, and you can use that tool, but it's not really people's objective. That, to me, is the change that occurred for us a couple of years ago. I think you're going to see it all across the industry, especially with these AI-powered capabilities. [0:08:23] LA: I remember back in the early days at New Relic, in the early - in the NRDB days in particular, we talked a lot about machine learning and the value of alerts generated from some magic machine learning algorithm that did magic things and said, "Oh, this is a problem, because such and such pattern exists." We were very clear, right? I remember standing up at conference after conference, saying, this isn't AI. This is not AI. This is something else. I forgot exactly the term I used. He had a term that he used for it. But basically, it was machine learning - that's really all it was. What's changed? Obviously, machine learning isn't the core of what you do anymore from an intelligence standpoint. What really changed? Why did this change occur? [0:09:16] NB: Yeah. It's a good one. I like to think of those techniques as falling into three buckets. Obviously, it's all computers. It's all just software. There's no magic here. When we look at simple things - like you're going to do some type of component analysis on a signal, or you're looking for baseline deviations - you've got mathematical formulas; you apply them. The parameters are static, even if the math is fancy. As I said, the first category of this is like, this isn't AI. It's just math. This is just statistics. You can look it up. You can execute it. And you'll see a lot of functionality that's like that across every product, that tells you when there's a significant deviation, or things like that. Then there's machine learning, where we're taking those parameters and instead, we're defining hyperparameters. We're saying, "Oh, hey, could you tune these baseline alerts so that there's only one alert per period?" - make these hyperparameters work out with some simple algorithms there. That made up a lot of what people thought of as MLOps, or this AI functionality, up until about two, three years ago. Everything just got clobbered by the return of neural nets. Neural nets obviously are not a new technology. We've been kicking them around the CS department since the 1960s. But suddenly, they started to work with that transformer architecture and some of these other optimizations, and infinite money poured into it. Now we have neural nets that work. I think of the neural net-based systems as being what we currently call AI, although maybe five years from now we'll feel differently. Those are more complicated and less predictable than machine learning-based systems, which have a more constrained set but tend to not actually use any neural nets.
Those are simpler, but a little bit less predictable than static statistical systems, just using math. A good product is going to have all three. I think that if your view of AI is, I send everything to OpenAI and they send me back answers, you can get a lot out of that. That's cool. But it's not the right tool for every job. There's a place also for the more traditional machine learning approaches and even for just good old-fashioned statistics. I think that as a company and as an industry, we're exploring where those boundaries exist and how to tie them together. Today, the neural net, and especially the foundation model APIs - so calling OpenAI, calling Gemini, calling Anthropic - that's the boss. That's the piece of the system that's in charge. But the tools that it has access to in an intelligent system should include lots of traditional ML and statistical tools. [0:12:08] LA: It's easy to imagine this transformation from the static to the ML to the AI when you think about an individual stat. Like, I have a server, and it's got memory usage, right? The simple analysis, 80%, right? Above 80%, send me an alert. Fine. You hit 80%, you get 50 alerts an hour, and you try and deal with the noise. What ML did, then, is say, well, 80% is not the right number. Let's make it 86.2, and we'll leave it there for five hours, and then we'll make it 82.1, and we'll make those adjustments as we go and get rid of more of the noise. That's great as well, too. What the AI, the LLMs of the world, are now trying to do is say, well, this looks odd. What does it mean for this pattern to occur? Maybe I should alert on that. These are the growth patterns that occur. I think one of the big values - I'm not sure we're seeing it yet, but I think it could be coming, and I'd like your opinion on this - is the value of LLMs across the breadth of data. Versus, tell me everything about this piece of data, instead say, here's a whole ton of data. In my mind, observability, not monitoring, but observability, is really about making a complex system understandable. That's really what you're trying to do. What's better at making a large, complex system easier to understand than an LLM, right? That's one of the things they're very, very good at: summarizing and explaining what's going on in a large system, without a human having to go through and analyze it bit by bit and understand everything that's going on. I don't see that yet, but I see it coming. What do you see as the role of LLMs in this large-scale system understanding model, versus just the, yeah, this is anomalous data model? [0:14:03] NB: Yeah. I mean, I completely agree with you. I think that's the key step to move from that data era to the insights era: as I said, you can't make dashboard widgets small enough to watch everything that's important in your system today. We have these Kubernetes clusters that have hundreds of nodes. We're running thousands of pods. We have many, many clusters. There's so much data in the system today. A human can't possibly search through it all, nor can you pre-write alert rules or build dashboards ahead of time that will show you everything that matters. The thing that matters might be a signal you've never looked at before, but you could see a major excursion on one particular metric that correlates with this failure on the user side, and you say, "Ah, that's what I was looking for." That's the place where you want to combine those statistical tools and the LLM tools.
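Here is a minimal sketch of that combination, the one Nic goes on to describe below. The fixed threshold and the z-score baseline stand in for the "just statistics" bucket (an ML layer would be the thing tuning the window and cutoff automatically), and ask_llm is a hypothetical placeholder for whichever foundation model API you call. It is illustrative only, not New Relic's implementation.

```python
import json
import statistics

def static_threshold_alert(value, threshold=80.0):
    # The fixed-line rule Lee describes: fires on every sample above 80%.
    return value > threshold

def baseline_anomalies(series, window=30, z_cutoff=3.0):
    # The "just statistics" bucket: flag points that deviate strongly from
    # their own recent baseline instead of from a hard-coded number.
    flagged = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mean = statistics.mean(baseline)
        stdev = statistics.pstdev(baseline) or 1e-9  # guard against flat signals
        z = (series[i] - mean) / stdev
        if abs(z) > z_cutoff:
            flagged.append({"index": i, "value": series[i], "z": round(z, 1)})
    return flagged

def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in for the "boss": OpenAI, Gemini, Anthropic, etc.
    raise NotImplementedError("wire up your model provider here")

def explain(metric_name, series, topology_notes):
    # Only the handful of statistically unusual points - not the raw series -
    # ever reaches the model's context window.
    anomalies = baseline_anomalies(series)
    prompt = (
        "You are helping diagnose a production issue.\n"
        f"Service topology: {topology_notes}\n"
        f"Unusual points on {metric_name}: {json.dumps(anomalies)}\n"
        "Are any of these interesting, and what would you check first?"
    )
    return ask_llm(prompt)
```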
Searching across huge amounts of data is tricky for LLMs. I'm not going to tokenize a petabyte of data and then feed it into an LLM. I mean, certainly not without making Anthropic even richer than they are. But what I might do is perform a statistical analysis, look for anomalies using well-understood methods. Then take those anomalies, take those time ranges, and send that into my reasoning system and say, hey, are any of these interesting? The missing pieces to do this - if you wanted to sit down and build this this afternoon, vibe code up, whatever, our future system - what do I need? I need lots of data in, but I need structure to the data. I need to understand, well, what metrics are about what systems, how do the systems relate to each other? It has to be all temporally organized, so that I know when something happens, but I also need it to be spatially organized. The user is on this browser. It's calling this service. This service relies on these other services. They rely on this infrastructure. They rely on those. Because you want to draw across that graph when you are performing, whether it's a root cause analysis of a failure, or even a predictive fault, where you say, what's the linkage between response time and database? How do I piece this together? I think that there's a ton of work that can be done there with general-purpose LLMs. I think that you need a lot of tooling. When I say tools in this case, I mean MCP tools, to give it that statistical capability. It's where you need that statistical understanding and the ability to feed relevant context into an LLM. LLMs are fabulous at summarizing. They have huge context limits on some of these systems now - 100,000 tokens, a million tokens - but the data world we live in is billions. [0:17:00] LA: Million tokens is nothing. Right. [0:17:02] NB: Right. We've got to go from the billions down to the hundreds of thousands, or thousands, in order to make it processable. [0:17:10] LA: One of the things that this sounds like to me is back in the early days of monitoring, the problem was alert fatigue, right? We could get all sorts of alerts and all sorts of things. We could have all sorts of notifications of all sorts of anomalies, but we just got so many of them that we ignored them. Essentially, the type of thing you're talking about is now a system that can take all of those alerts and not be fatigued by them, and find the patterns that exist and find what's really going on. Is it that simple, or is there a lot more to it than that? [0:17:44] NB: I think it is occurring at two levels. It's funny you said "in the early days." This is still one of the top problems I hear from every customer and from every team I talk to. They say, when something goes wrong, we do a retro, and what happens at the end of the retro? Every retro ends the same way. It says, oh, well, one of the things that we should do to not repeat this incident is we should establish alerting and monitoring for X, Y, and Z, so that we can spot it earlier next time. By the time you run that forwards for a few years, there's an alert for everything in the universe. I remember Aaron Bento, who we used to work with, gave a fabulous talk one year on how additional alerts do not improve responsiveness. Because what it does is it trains the users - instead, you get an alert, and you're like, "I'm going to give this a minute and see if it resolves." What you've done there is you've delayed your time to response.
Noisy alerts corrode that response time, even though teams think they're going to add to it. One place that we can use intelligence is we can be consuming those alerts and trying to determine if they're serious or not before signaling a human, right? Great. You can look at that and say, well, what are the things that every on-call engineer knows? If I just see the one blip, I'm like, "Oh, that's funny." If I get six alerts from six different systems, I better run to the keyboard. [0:19:11] LA: Right. [0:19:12] NB: So, okay. AI can do that. An ML system can do that. It's not rocket science. Another thing we can do is we can actually look at the setup of the alerts. Are a lot of your alerts useless? Do they always go off at the same time as something else, or are they never actionable, where they always come and go? They're not associated with real downtime? Let's look across the whole system. Let's tune some of those down, so we can improve the human-configured alerts. Both of those are after the fact. The real root of this is, why do we even have these damn alerts? What are alerts for? We create the alerts because we are afraid that something is broken in the system and we don't know about it. Can we get to the root of that issue instead using these techniques and say, well, when I see an anomaly, I'm going to evaluate it and figure out, is this interesting? If so, is it actionable automatically? Can I just roll something back and notify the user? Is it actionable, but it needs a human? I should grab a human now. Or is it interesting, but not yet over that threshold, and I should feed it to my accumulator and then wait for a few other signals to see what's going on? I believe that in the near future, when a user sets up their observability solution - which, as you correctly point out, should be called an understandability solution, because nobody actually wants to observe. You want to understand. When I'm setting up those observability tools, I don't want to go through and have to configure a lot of alerts. I want to say, here's my signals that matter the most. These are people using the system. These are the source of truth for whether it's working. Walk through it yourself and tell me when something is going to break before it breaks. Sometimes this sounds really pie in the sky, "Oh, you're going to have the HAL 9000 telling you that this sensor array is going to fail in two days." A lot of this is actually really straightforward. It's like the same stuff that we do today as humans. We say, oh, well, here's a CPU threshold that we're going to set before it's a problem. Here's a memory threshold that we're going to set before it's a problem. But then, we are responding to those on human scale time and not on machine scale time. If we can detect it, respond at machine scale, and correct it, it stops looking like an on-call rotation and starts looking like other systems that we consider so boring that they're barely even technology. When programs crash, and they get restarted automatically - when did we start doing that, in the 90s? When Kubernetes says, "Oh, well, that node went away, put a pod back up." That's self-healing. It's just so nuts and bolts. We don't even give it credit anymore. We're like, oh, well, Kubernetes just took care of that. All of our systems should be like that. So many things that require a runbook today are going to be bridged over into just something that you find out about. When I lose nodes and my pods get rescheduled, I don't get paged.
I don't even get emailed. It's just a thing that happens. I think that a lot of incidents that exist today belong in that category. They should be things that I don't get paged about. Maybe I get emailed about it. Maybe it's just a thing that shows up in the log. It's like, oh, yeah. Then there was a problem, and we had to deal with it. [0:22:27] LA: Like customer support of the olden days. Did you try turning it off and back on again? I mean, that's really what we are getting to, is the model where literally doing a runbook equivalent for a problem is, what things should I turn off and restart? This one, this one, and this one. Did that solve the problem? Nope. Let's try this one. Okay, that solved the problem. Done. The cloud did a lot of that. Kubernetes did a lot of that. Both of them working together really was the core of that change. That's really the way most of our infrastructure works nowadays, right? Even our networks, right? The network's causing problems? We'll reboot our network. You couldn't imagine doing that in the past, but you can imagine doing that now. You're basically saying that all of this can be automated, so that it just happens naturally, with AI obviously being the central part of that. [0:23:20] NB: I think you mentioned cloud, which is a great example of this. With cloud, especially with these container orchestrators, like Kubernetes, even system management software, there's so many things that we expect will just take care of themselves. We were able to do so because those problems became really well defined. We said, okay, these are all the things that can happen. If this happens, then this. If this happens, then this. What we're left with, our operations engineers, our SREs say, well, it's the parts of the system that can't be automated - it's like, I've got to pay attention, because if we ran out of IPs, I got called in, because there was an IP blockage, or there was a certificate issue upstream and I had to go and jostle it. There's a bunch of stuff that's fuzzy. You would never put it in a runbook. If we had to sit down and build a piece of software that looked at every possible case, we would finish out our lives covering a tiny percentage of all the things that can go wrong. If you can have a reasoning system that has a fuzzy interface to the structured data - and that's what LLMs do, that's what these neural net systems do, they say, well, this looks pretty similar to this - then we can start grouping problems into things we know how to fix, where we just take action; things that we believe are interesting, but not critical yet, so we'll just put them on a list for humans - you say, "Oh, I've created a ticket for you. That certificate's going to expire." - and things that are an emergency and do require human intervention, because we don't know what to do, and then we're paging people. We can take, I think, a lot of what people believe today is just unavoidable work and move it into that automatable bucket. We've redefined, because of this, what qualifies as toil and said, oh, a lot of things that we thought of as uniquely human are actually just automatable toil. Let's go automate it. [0:25:15] LA: That's an interesting take, because one of the first things that I come to is that means we'll need less human toil in order to maintain our systems. On the other hand, our systems become more and more complex.
And so, ultimately, the amount of toil that we go through ends up being the same, or maybe increasing still, but we're now more dependent on these things to make things happen, which is the normal way of things. That's the way things work. That's the way things grow and the way things expand. That's really the case, right? AI is going to make this easier for us, so we can do harder things - really, that's the way it works. [0:25:48] NB: Exactly. Now, I remember talking to an engineer. We've implemented all this fabulous platform work - so many things, to the point that you would barely recognize the way deploys work at New Relic now. Everything is so much better. We're like, if everything's so much better, why are we still working so hard? Everyone's like, "Oh, well, because we just do more." The constraint on what we can achieve is that human bandwidth. If we let our humans accomplish more, then we can just get more done. We've never backed out and said, "Oh, well, we're just going to work less now." Somehow, that never pans out. [0:26:25] LA: I wish I could get the mainstream media to start reporting on AI and jobs in those sorts of terms, right? It's like, AI doesn't remove jobs. It makes it so each person does more in their job. Therefore, we do more. Yeah, it's not that we do less. We do more. Anyway, that's a whole different discussion than this discussion. [0:26:46] NB: Yeah. But it touches on a good subject, which is, I do think - I don't want to be overly rosy on the economy at large. I do think AI will change jobs. While it won't remove jobs, it may move jobs. People have to change the way they do their jobs and how they do them. I think back to - the scary side of this would be the Industrial Revolution 200 years ago, where obviously, the economy today is massively larger than it was before. Industrialization created so many jobs. For a lot of people whose jobs were directly impacted - [0:27:22] LA: It wasn't supposed to. People were afraid. People are afraid their jobs are going to go away. [0:27:27] NB: They did. Their jobs did go away, and it didn't turn out well for them, even though it turned out well for society. I think that we have to be always a little bit wary on that one. You and I were talking a bit before we started, as to, like, how does that change the software engineer's job? I think that's an important question. Thankfully, I'm not writing production code for New Relic anymore. I'm safely away from the production keyboard. I still keep my hand in, and I've been working with AI tools and things like that. I feel it shifts the level at which you need to think, and none of this should be a surprise. The march of higher and higher-level languages, of virtual machines, of cloud, in just our careers, a few short decades, has been radical. The amount to which I tend to have to think about the infrastructure I run my software on has already changed entirely. I couldn't tell you when the last time I inspected assembly language was. Whereas, when I started, it was a routine part. [0:28:25] LA: On your TRS-80 Model I. [0:28:28] NB: Yeah. Please, I was a TI-99/4A guy. [0:28:32] LA: Oh, no. Okay. [0:28:36] NB: I couldn't afford a TRS-80. [0:28:38] LA: Well, I worked at RadioShack, and so I could use the computer at the store. [0:28:41] NB: Oh, there you go.
But you'll remember that there was a very long time where I would feed something into a compiler and it would misbehave, and I would pop open the hex editor, or I would just be stepping through it like, "Okay, what code did you generate, you dummy?" We don't do that anymore. When's the last time you looked for a compiler bug? It's super rare. That's great. That's been freeing. I let those parts of my brain be filled in with bigger architecture patterns. How do we apply these data structures? How do we move data between systems at scale? How do we work with petabytes? That, I think, is the shift that we have to be ready for. Also, looking back at New Relic, the purpose of our business is to make life easier for developers and software operators. As their jobs are changing, it's not just that we ourselves change what we build, it changes who we build for and what problems they're facing. If people are in a world where it's easier to create software, well, did we also make it easier to operate software? Because all that software that everybody you know is banging out with vibe coding, it's running somewhere, and it's going to break. Now you can't just ask the person who wrote it, is it supposed to do that? Because maybe no person wrote it. It changes the field, because it changes the problems that our users face. [0:30:04] LA: Right. Exactly. I was talking about that in terms of the role of growing people from developers into software architects, and how that's going to be critical. It's the same thing. You move up the value curve because your tools are now smarter and more involved. I feel that's the way we're going to go. I worry that I see so many people talk about, this is just going to replace all of that. It's like, no, no, no, no, no, no. There's always still going to be humans involved in developing software. What we do to develop software is going to change drastically, but we're still going to be involved in writing software. It's just the way it works. When I think about AI and observability, those two things together - this is a shift, by the way, we're changing subjects a little bit here. When I think about AI and observability, I actually think about two different roles. We've talked about one of those roles. That is using AI to help improve your visibility into an application. Great. That's wonderful. That's obviously, from a product offering standpoint, something New Relic is very interested in, etc. Let's talk about the reverse of that, and that is, AI itself is still an application that needs to be observed. How do you monitor AI systems? [0:31:20] NB: Yeah, we somewhat confusingly try to - because people love to lump this stuff together - we've tried to distinguish between, there's AI for observability, which is, what are all the ways that we can use AI to make a better observability product? That's important because the job of the software developer, or the job of the software operator, gets harder every single year, and AI gives us a chance to make that life a little bit better for them and try to claw back some of that complexity in today's super complex systems. Then the other side of this is, as you pointed out, observability for AI. For people who are building these non-deterministic, weird AI systems, often just on the other side of an API, it's a whole new tool chain. And so, how do we help them? We look at that.
There's a couple of different levels, similar to how we were just talking about the shifts AI is bringing to the industry: although they're unprecedented in speed and the types of things we're doing, the pattern smells awfully familiar - if history doesn't repeat, at least it rhymes. You can see that also on the observability for AI side. We've introduced disruptive new technologies in the past - cloud, the web, some of these NoSQL databases. Each of these has required a different approach to what matters. What are the real golden signals of an AI system? They're going to be different than those of your static web system, just the same way that static web systems are different than Kubernetes, where the way we deploy today has changed what we consider to be those golden signals. We're working to develop those AI golden signals. We're working this time in concert with the OpenTelemetry groups. There's so much enthusiasm, so much breadth in this, and so much stuff that's just on the other side of an API, where you won't necessarily even be running the software systems yourself, but you're relying on them. OpenTelemetry gives us a way to work together with the other members of the industry and to have something so that each new framework that comes out can be instrumented from day one. It can be born observable, whether it's by New Relic, or by another open-source system, or another commercial system. Our initial offerings for AI monitoring, that observability for AI, we started with our own agents, which are open source, but are proprietary to New Relic. Now we are looking to evolve that as we've been working with the OpenTelemetry group on generative AI. [0:34:06] LA: I'm not sure that answers one of my base questions yet, though, which is, there are fundamentally different signals you need to monitor AI. Are there fundamentally different observability models in general, or is it the same model with just different triggers? [0:34:23] NB: I believe it's the same model. It's just different triggers - an AI system is no more different from a web system than a database is. They have their own language. They have their own metrics: we need to track tokens, we need to track sentiment, be set up for running judges against them - like, I want to know, am I getting the right answers? Are my users seeing the things that I expect them to see? I also know I'm going to be paying costs. I know that there's parameters to this that are a little bit different. A lot of people, when they sit down and they build AI into their platform for the first time, it is their first time. They're not experts. They don't necessarily know what they're going to be looking for, and they're probably, and very reasonably, a little bit worried, like, is this going to do something weird? Is it going to eat all my money, or anger my users? [0:35:20] LA: Yes, yes, and yes, by the way. [0:35:21] NB: I've got to keep it in a - that's right. This mode of, we're doing this for the first time, that's a place where, as a tool vendor, you have to come in with a set of opinions. You can't just say, "Oh, users, you can do whatever you want." Yeah, of course, you can observe any aspect of this. That's not helpful to somebody who's trying to bring a weird new technology into their production app for the first time. You have to say, "Hey, this is what you look for. These are the most important things. This is where a reasonable limit is. Structure it like this. Watch this data.
Here's the type of thing you should be worried about." Walk them through that process. I think that's a tremendous opportunity for everybody in this space: not just, are you providing that observability of the data, but are you structuring the understanding, so that people know what they're even supposed to be worried about. [0:36:13] LA: When I build an application that relies on AI to do some things for me, one of the things that I buy into when I do that is that the AI system is non-deterministic. That's both a benefit and it's also a curse, both at the same time, right? This non-deterministic nature is critical to how AI systems function. It creates the ability for it to come up with unique ideas and to hallucinate, both at the same time. Those both become viable outcomes. Is there a role in observability in monitoring, or observing, excuse me, or understanding, there we go, that aspect? Like, my system is hallucinating more today. Is that a problem? Or the data I'm getting in is causing more variation in my responses than is typical? I mean, are these the sorts of signals we're talking about here? Or is that a level higher than what we're really thinking about? [0:37:12] NB: No, I think that's exactly correct. That's one of the key signals: you should be able to take - let's say, I'm feeding all of my questions to a third-party API. I'm using their fast and affordable model. Maybe I want to sample one out of a thousand of these and run it against somebody else's model, or a more expensive model, to say like, "Hey, this question, did it get answered well?" And so, to judge it. That action of continuously supervising, right - you've created, essentially, a call center with a bunch of pretty unreliable actors here with all these AI agents. You need to have a different model, a supervisor of some form that's walking down the aisles of this virtual answering system and making sure that those agents are giving the answers you want. I think that this is one of the new signals that's different than a database, or a piece of cloud infrastructure, in that you have to evaluate the quality of answers. We can think of it in some ways as being equivalent to looking at response times or error rates. It is just a signal. In the same way that AI is capable of turning these unstructured questions into unstructured answers, it also gives us a tool that we can use to take those unstructured answers and map them back and evaluate them - something that, in the past, you'd say, "Well, I don't know. How am I going to tell whether these answers are any good? Go feed it to a human and see what they think of it." We want to flag them so that a human could review them. But if we're going to do this at scale, it has to also be done by AI. [0:38:56] LA: Do you see companies like New Relic - do you see this part of the observability space doing that level of analysis, or simply reporting on that level of analysis being done? [0:39:11] NB: This is a dynamic question, because it's such a fast-moving industry. At this point, we think of it as being, we want to guide users towards doing this and give them reporting tools and a structure, so that you say, "Hey, you should be doing this type of sampling and evaluation, or judging of your answers, and we'll give you a way to fold that in." It's not something that we are today doing for users. Some of this is just because things are moving so quickly, and some of it is around data privacy. It's just this question of, like, what do you want to send upwards?
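As a concrete illustration of the sampled judging Nic describes above, here is a minimal sketch. call_primary_model, call_judge_model, and record_event are hypothetical stand-ins for your model provider and telemetry pipeline, the sampling rate and rubric are arbitrary, and none of this is New Relic's product.

```python
import random
import time

SAMPLE_RATE = 1 / 1000  # judge roughly one exchange in a thousand

def call_primary_model(question: str) -> str:
    # Hypothetical stand-in for the fast, affordable production model.
    raise NotImplementedError

def call_judge_model(prompt: str) -> str:
    # Hypothetical stand-in for a slower, more expensive model used as a judge.
    raise NotImplementedError

def record_event(name: str, **attributes):
    # Hypothetical stand-in for whatever telemetry you emit (custom events,
    # OpenTelemetry spans, etc.); quality becomes just another signal.
    print(name, attributes)

def answer_with_supervision(question: str) -> str:
    start = time.monotonic()
    answer = call_primary_model(question)
    record_event("llm_exchange",
                 latency_s=round(time.monotonic() - start, 3),
                 question_chars=len(question),
                 answer_chars=len(answer))

    # The "supervisor walking the aisles": occasionally grade an answer with a
    # second model and emit the grade alongside latency, tokens, and cost.
    if random.random() < SAMPLE_RATE:
        verdict = call_judge_model(
            "Rate the answer from 1 (wrong or unsafe) to 5 (fully correct). "
            "Reply with the number only.\n"
            f"Question: {question}\nAnswer: {answer}"
        )
        record_event("llm_judgement", score=verdict.strip())
    return answer
```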
I do think that this is something that we have to consider very closely as more and more people go into this field and need it to be easy to get started, that they want that five minutes to joy of, I set this up, I started running it, and now it told me that when I moved from Sonnet 4.5 to 4.6, all of the questions that got asked of my system about this particular purchase flow started to get weird. Oh, okay, that's really important to me. I need to go back and look at my prompts, or to look at maybe I've done a fine-tuning or something that doesn't work anymore. I need to pay attention to that quality. I wouldn't know that other than by getting angry customer tickets, right? The goal of being an understanding, or observability, company is you should know what your system is doing. You should understand it before your customers complain. I think that extends not just into technical elements like, how much CPU are you using, or traditional web elements like, how is your page flow, but also into quality of responses. [0:40:55] LA: You could expand that well beyond just, is your AI giving your customers answers that are not nonsensical? You could expand that into, where are my customers focused on my web page, and why are my customers going from here to there? What I'm really trying to say is that once you expand into how your application works with your customer, you open up a whole realm of areas beyond just the AI communication piece. Do you see observability moving into all of that as well? [0:41:27] NB: Absolutely. I see observability as being in all of that already - it's not moving into anything. I think, if I go back to my start - as I said, an increasingly long while ago - I remember working at a startup in the mid-90s, or early 2000s. That's foggy. Probably early 2000s for this one. We had so many different alerts, and all of these things that would tell us when something broke. There was one TV up in the corner, and it had the one chart that mattered, which showed our sales per minute. We knew that if you broke something, that number was going to go down. The business impact would be there. It didn't tell you what you broke. That was the job of all of the CPU, memory, and database alerts. The job of that chart was to tell you whether you were achieving your business goals. I think that's still fundamentally the most important question, more important than any of the stuff, whether it's AI, or cloud, or data - it's, are you achieving your business goal, and why? Do I understand what people are doing and whether they're successfully accomplishing it? If you're doing that, then everything else is just diagnostic. It's just helping you understand how to improve it, or if something has hindered it. The source of truth is, if you're an e-commerce company, are you getting sales done? If it's a social network, are people clicking on things? Whatever it is your business exists to do, the reason you have this software, that's the most important thing. [0:42:57] LA: That makes sense. Thank you. This has been a great conversation. I like to end with one completely off-topic question. [0:43:05] NB: Uh-oh. [0:43:06] LA: For new developers, people just out of college, just out of trade school, who are just about to enter into their career and worried about whether AI is taking away their jobs, all that sort of stuff, all those normal things that are going on, what's the one thing you want to tell them? [0:43:24] NB: If I can distill it down to one, I would say the first thing is, I really do feel for them.
I have friends who are just now getting started in the industry. It's a really rough time to get started, because companies feel so much uncertainty. That uncertainty - whether it's macroeconomic, global trade, AI - it's all these things stacking up, and it makes companies really reluctant to take risks and to take on people. They don't want to hire people who they feel they're going to have to let go soon. Every company right now is really slow on hiring. I think that's dispiriting. It's tough. It's just radically different from the stories I'm sure everybody heard: "Oh, it's great in tech. There's always a job for you." It's got to feel really disappointing. My advice to people is going to be, it's like, first off, there will be a job. It will be there. You probably, though, are going to have to pound the pavement and work your network and do all the things that it took to find a job 30 years ago, before tech was hot. I think that's the first one. When it comes to skills building, yes, spend time, spend time with Claude Code, spend time building software with this new tool set. As you do it, take advantage of the ways in which those tools can be set to explain what they're doing, to be not just a magic machine that prints software. A little bit more of The Diamond Age, right? The Young Lady's Illustrated Primer. You want it to work with you as a customized teacher and to walk you through some of the things it's doing. And take advantage of the ability of these LLM systems to give feedback and to explain. This is something that, even today, I tell lots of people when they're writing. I say, don't use your AI tool to write prose. I don't want to read an AI-written document. Do use your AI tools to read what you've written before you send it, and say, what questions are unanswered? As a reader, what's confusing to you? Use those tools to hone yourself and to get sharp. You'll probably never end up building software at the same foundational levels that Lee and I started at when we started our careers, and I wouldn't expect you to. I would expect you to start today and to go to places that we can't even think of. It is a bumpy time to start. And so, I do feel for you. [0:45:51] LA: Great. Thank you. I appreciate that. My guest today has been Nic Benders. Nic is the Chief Technology Strategist for New Relic. Nic, thanks again. Thank you so much for joining me on Software Engineering Daily. [0:46:03] NB: Thanks, Lee. It was great to talk. [END]