EPISODE 1799 [INTRODUCTION] [00:00:00] ANNOUNCER: Beeps is a startup focused on building an on-call platform for Next.js. The company is grounded in the key insight that Next.js has become a dominant framework for modern development. A key motivation in leveraging Next.js is to create a developer-first experience for on-call. Joey Parsons is the Founder and CEO of beeps, and he previously founded effx, which was acquired by Figma in 2021. Joey joins the show to talk about the platform, starting a company without an explicit AI focus, the limitations of current on-call systems, building on Next.js, and more. This episode is hosted by Sean Falconer. Check the show notes for more information on Sean's work and where to find him. [INTERVIEW] [00:00:54] SF: Joey, welcome to the show. [00:00:55] JP: Hey. Thanks, Sean. Great to be here. [00:00:57] SF: Yes, absolutely. You're the Founder and CEO of beeps. Tell me about the company. What's the vision, and where are you guys at today? [00:01:05] JP: Yes. Beeps is an on-call platform built for modern developers. When I say modern developers right now, the area that we're focusing on is Next.js developers, right? What we've seen, if you go to Product Hunt or Hacker News on a day-to-day basis, probably 60 to 70 percent of the companies that are being launched today are actually being launched on this platform. They're Next.js developers building on top of Vercel, and they actually have very unique challenges when it comes to infrastructure. Beeps is an on-call platform that's being rethought from the ground up, focused for now on the needs of the developer. It's been pretty fun to build. We're about a year into it right now. We just launched our platform, and it's a pretty exciting time. [00:01:52] SF: Yes. I mean, I guess, given that you're seeing that kind of trend with Next.js and Vercel, I guess that's good job, Vercel and the team over there. I mean, that's a good sign that they're doing well. This is not your first company, right? 
You've founded companies in the past, and you were on here actually talking to Jeff in the past about a prior venture. I guess why put yourself through this pain over and over again? [00:02:15] JP: Yes, yes. It's an interesting origin story. I guess to rewind a little bit, I've been in tech, but largely on the infrastructure and reliability side of things, for a little over 25 years now. A lot of people these days might not know Rackspace, but they were one of the early hosting companies, one of the precursors to the Amazons and the whole cloud of today. I was actually probably in the first 100 employees there, so way back in the early aughts. I got to see at that point how companies were building globally-performing, globally-reliable platforms at scale, and was lucky enough to be able to really build a career and an expertise around this. I guess most notably, I ran reliability and a large part of the infrastructure team at Airbnb for a handful of years and got to see the evolution of infrastructure from that perspective. Then as you mentioned, after that, I left to start a company called effx, which was more of a microservices-tracking sort of company. We were acquired by Figma in 2021, and I spent a year and a half there working on developer productivity, again thinking about reliability. When I left Figma, I started thinking about the next thing. A lot of people were like, "Yes. Why would you put yourself through the pain of starting another company and going down this journey?" It's just that on-call has been such a big part of my life. Reliability has been such a big part of what I've built my career around. If you look at maybe the last 10, 15 years of software development, everything that an engineer does on a day-to-day basis has gotten exponentially better, especially in the last two to three years. 
Whether it's writing code, testing code, deploying code, observing code in production, all of it's just worlds better than it was at the advent of mobile and cloud. The one thing that literally hasn't changed is on-call, right? There's been a handful of companies that have dominated in the space, and they've built great brands and great trust out there. But nothing has really evolved in a way that actually is looking out for the engineers that are on-call or just making this process better. Having been through this for such a long time, having had the weight of a decacorn on my shoulders in the middle of the night, and sometimes being on-call for early-stage startups where I'm literally the only person on-call for months on end, I'm pretty sure it's taken its toll on me. I think that I feel obligated to make this better for the next generation and at least try to drastically improve this experience. Yes, I think it's a worthwhile endeavor. The fact that you're on-call for other people's on-call is a little bit crazy. It makes it a little bit different than what a traditional startup would be. But I think if we can really crack this, then I think that it's going to be worth it in the end. [00:04:53] SF: Yes. I always say - I founded a company years ago, and I would say I was on-call for seven years, basically, during that time. It's a lot. It's stressful. I want to get into some of the downsides of existing on-call today. But before I get there, one other question I had regarding starting a company, especially right now: have you found this time around, given that we're in this AI hype boom, that it's hard to start a company that's not focused on AI right now? When I was doing my company back in 2009, 2010, everything was about social. People would always ask like, "What's your viral strategy, or what's your social strategy?" 
There was all this pressure to do stuff that wasn't necessarily core to the product, just to have a story around it to tell investors and stuff like that. I guess when these hype cycles are coming on and you're doing something different, it's hard to bust through the noise when everybody's paying attention to this one thing. I'm curious about your experience there. [00:05:52] JP: Yes. I think it is a little bit funny. I think when I was sharing the vision for beeps with one of our investors, halfway through the conversation, someone interjected. It's like, "Oh, this is the longest we've gone without ever hearing AI in a pitch." This was last year, so I think it's an interesting time. I think that one of the challenges I see these days is folks are so much more fixated on technology than they are on the actual problems, right? We actually are heavy users of LLMs at beeps. Both for our own development and in beeps itself, there's a bunch of stuff that's powered by AI. But we actually don't talk about it, right? It's because we're much more focused on the problems, and our users care about the applicability of it much more so than they do the actual technology that drives them. That's always been the case, right? But that's not to say that it doesn't get folks excited. It gets folks excited to be able to tell people in their peer group or their company that these AI products are making them better. Yes, it's an interesting space out there. I think some companies that are relatively in the same space as us are raising seed rounds that are pretty astronomical, in the tens of millions and things like that. But, ultimately, it's going to come down to who builds the most value for their users, regardless of the amount of money that they raised. We'll see how that shakes out over the long run. To me, it's actually a really exciting time to build a company. 
It's a really exciting time to be an engineer, a developer building products and getting to try a bunch of these tools and seeing how quickly they're evolving and how quickly they're getting better. It's like every few weeks, my team's asking me about some new IDE that they want to try out. They love it more than the last one. It's just like, "Okay, when is this going to stop?" But at the same time, if it keeps unlocking a lot of potential there, then that's a really great thing. A pretty long-winded answer there, but I think that's how I've been thinking about it and how it's played out for us. [00:07:44] SF: Yes. I think we'll know that we're out of the hype cycle when companies go back to focusing more on the value of what they provide versus the technology that's behind the value, right? It's like when a company can essentially talk about like, "Hey, we can solve this problem for you," and that's really what the buyer cares about. But maybe there is AI that's powering that, plus a lot of other stuff, right? But it becomes less of, essentially, the title or the company's name. [00:08:14] JP: Yes. Yes. So many people are launching on .AI domains. I don't blame them. But it does kind of - I think it pigeonholes you a little bit in terms of the solution that you're providing. But it's a tricky thing. I think it makes sense in this moment, and I guess we'll see how it plays out over time. Yes. [00:08:31] SF: Back to on-call, what does that on-call process look like for most companies? [00:08:37] JP: Yes. It's kind of growing, right? Or it's kind of evolving, right? Traditionally, most companies have a very simple rotation, whether that is across one team or a bunch of different teams, where you have basically your primary on-call. It's like the first person that's getting notified. You have secondary. Then sometimes, you might have a manager. It's like your tertiary on-call. You hook up your observability system to this sort of platform. 
It ends up being like a routing layer for who gets notified when something breaks. For most companies, it's just that, right? I think that it's a pretty standard thing at most companies once you've reached some semblance of product market fit, or you have users that actually matter. One of the things is that the companies that have been around for a long time do this really well, like PagerDuty, for example, right? Their CTO, Tim, says all the time like, "Nobody ever gets fired for choosing PagerDuty," right? I think that it's a powerful thing, but they've dominated the market simply based off that brand and that trust and that reliability. Even as someone that has run reliability at a company like Airbnb or Figma, I know I can't easily just walk in there and say like, "Hey, I helped build this team, use my new product." We just don't have that sort of credibility, and I definitely understand that. So there's definitely the technology piece of this, but there's also the brand building over time that's going to really matter to be able to win over the hearts of developers and be that recognizable name that, when they begin to feel that fit, when they have those users that matter, they ultimately have an option to choose. But the most interesting thing about the Next.js space and the Vercel space is that folks are willing to try and actually want to try modern tools that are tailor-made for them, right? They're choosing to host on Vercel and Fly.io instead of running their own infrastructure on top of AWS. Instead of rolling their own authentication systems or using older solutions like an Auth0, a lot of them are choosing to use Clerk, or to use Supabase instead of running their own databases. There's this passionate community that has all these different new problems, these new challenges. They're willing to make bets on earlier companies and ride the wave with them. 
They have unique challenges, unique things that we need to think about in how we deliver a platform that helps them with their on-call. That's the path that we're going down. I can get into that a little more. [00:10:59] SF: Yes. How did you come to recognize that this subset, perhaps the Next.js community, was the ideal customer profile for you guys? They're open to using these various managed services. They're on the forefront of the technology adoption curve. How did you, I guess, stumble into recognizing that this is probably a good spot for us to actually go to market with? [00:11:23] JP: Yes. I guess a pretty simple answer. Just as a developer myself, when I started hacking on things, I was using the Next.js toolchain. It was just really simple for me to get going, right? Then you look at the credibility of the companies that have grown up on those platforms over time. It really reminded me of the movement of the late 2000s, around 2008, where people were choosing Ruby on Rails or running Django, right? Then you have companies like Airbnb and GitHub that were built on top of Ruby on Rails. I think that same sort of movement has been happening over the last few years in this space. If you look at tutorials to get started on building any app, right? Back then, it was like how to get started with Rails. It's like how Mongo really got going, really targeting the Node.js space. But now, everything out there in terms of how to get going, what people are live streaming on Twitch, and what a bunch of YouTube videos are for getting started with building a web app, it's all Next.js. A lot of it was personal experience. A lot of it was just what you're seeing in the community and the excitement about it. If you go to Twitter, it's really popular to have that list of like, "Oh, what's my tech stack in 2024." You look at those things, and one of the leading things on there is probably Next.js. It just became really obvious to us. 
[00:12:41] SF: In terms of the existing on-call systems, what are some of the problems with the current approach? Has it evolved? Sure. But why is that a problem, I guess, for companies? [00:12:53] JP: Yes. I guess I'll take it back to some of the companies where I built reliability and things like that. A lot of the time, these companies do a great job of notifying you, whether it's through a push notification or SMS. But it would just kind of drop off there. What engineers actually need are the rituals around on-call, as well as the things that actually help them during an incident. If you break up incident management, there's all the things that you do before an incident happens to make yourself resilient to them. There's everything that happens during an incident, like getting the context to understand what's happening. Then after an incident, there's all the stuff that you do to follow up on that, whether it's incident reviews or post-mortems. Every company at some level of scale that has things that matter ends up building a lot of the tooling around the existing on-call solution. A lot of it's very similar. A lot of people ended up following the Google SRE handbook, that book that was released, what was it, 10 years ago, I think, at this point, and were really focused on building those processes. It was strange that some of these incumbents didn't take the ball in front of them and didn't build anything to actually improve the full on-call life cycle there. I think that with beeps, one of the things that we're really focused on is helping engineers build context when something bad happens. One of the key features in our free product is that we work with a handful of the modern observability tools that folks are using in this space, so your Highlights, your Axioms, your Sentrys. Sentry's been around for a while, but they're the preeminent error tracking tool in this space. 
What happens is most folks, especially at the early stage and as they begin to grow, just have these alerts piped into Slack. Let's say you have an exception being triggered or a notification coming from Axiom about some trend that's happening in logs. You'll get this nicely printed out message there. Our assistant in Slack will actually listen for these messages and knows how to pull context out of them. Let's say you get something about your users not being able to log in, right? Something's triggered in Axiom where you've set up an alert for that. We'll automatically listen for it and immediately begin a thread to help you understand context. If I rewind a little bit, what usually happens - and you've probably experienced this, having been on-call for years on end - is that the first thing you do is you probably open up a handful of tabs in your browser, right? You're looking at, okay, what got deployed recently? What were some of the commits in those deploys? Are there any services that I use that are down, or any servers down, right? What are the most recent logs for my application? There's this playbook that you're running through every single time, just to get context, to eliminate potential factors, to help you understand what you're going to do next. Depending on the experience level of the engineer that's responding to this, you may or may not do these things. You may miss some information that should have just been an obvious thing to look at. That's really hard when you're woken up at three o'clock in the morning, to remember to do all these things and remember which places to look. It's a time-consuming process, right? You're navigating all of these different user interfaces that aren't consistent and may change. The runbook is usually outdated when it comes to this sort of stuff. 
Instead of taking the 10 minutes to do that, we'll actually get all that information for you and print it for you there nicely in Slack in less than 20 seconds. Imagine being woken up at three o'clock in the morning by a notification. Instead of the X amount of minutes it would take you to get to that context, you could, in your bed, understand what you're going to do next when you wake up and get to that context really quickly. We hooked into all the different systems in this Vercel Next.js ecosystem really well and have been able to build some really tight integrations to be able to build that context really quickly. We think that that is going to be a really great lever for people to reduce the amount of time that it takes them to understand the problem, which then in turn helps them understand how to fix it faster. [00:16:57] SF: Can you walk me through what's actually happening in the software behind the scenes when an alert comes in, in order to build that context? How are you actually going and essentially pulling in the right data from all these different places? How does the integration work and so on? [00:17:11] JP: Yes, yes. One of the things that we really aim for with beeps is that you can get started in literally like minutes, right? We just have a simple integration with Vercel, where most people are hosting their apps, a simple integration with GitHub, and then a simple integration with Slack, right? You actually don't need to configure beeps to talk to Sentry or Axiom or Highlight. We've done all that hard work for you. Through the Vercel API, we're able to grab a lot of really great, enriched information about your app and about its deployments. We have read-only access to your code base to understand the different providers that you use. We'll look at things like your package.json to see the packages that are related to maybe a Clerk or a Supabase and things like that. 
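The fan-out Joey describes, opening all of those "browser tabs" programmatically and assembling one Slack summary, can be sketched in a few lines of TypeScript. This is purely illustrative: the source names and shapes here are assumptions, not beeps' actual API, and each `fetch` stands in for a real call to something like the Vercel or GitHub API.

```typescript
// Illustrative sketch only: fan out to several context sources at once,
// tolerate individual failures, and build the Slack-ready summary lines.
interface ContextSource {
  name: string;                 // e.g. "Recent deploys", "Provider status"
  fetch: () => Promise<string>; // would wrap a Vercel/GitHub/log-provider call
}

async function gatherContext(sources: ContextSource[]): Promise<string[]> {
  // Query every source concurrently; a failing source degrades into a
  // placeholder line rather than blocking the whole summary.
  const settled = await Promise.allSettled(sources.map((s) => s.fetch()));
  return settled.map((result, i) =>
    result.status === "fulfilled"
      ? `${sources[i].name}: ${result.value}`
      : `${sources[i].name}: unavailable`,
  );
}
```

Because the sources are queried concurrently, wall-clock time is roughly that of the slowest single call, which is what makes a sub-20-second summary plausible.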
Then our assistant listens for messages in a Slack channel that you direct us to, and we'll actually work with any sort of observability-based tool that's setting alerts there. It doesn't have to be Sentry, Axiom, or Highlight. It could even work with a Datadog, where we are looking at the messages and using an LLM to basically decide whether or not this is actually an alert coming in from an observability provider. Then at that point, we kick off the process across a handful of agents in the background. We have a different tool for every integration, and they will basically go out and fetch this information and send it back up to the primary agent that's communicating through Slack. Then it uses that information to decide what it wants to share, what's the most important context based on the alert that came in, to help the user understand what to do next. Again, a lot of the background is AI and LLM-based. When it comes to the prompt moments, it's much more streamlined than that. But it gives us some flexibility to not just give a static list of information back but actually have context based on [inaudible 0:19:00]. [00:19:01] SF: Yes. Then you can take advantage of some of the things that these AI tools are really good at, like summarizing information, essentially. [00:19:08] JP: Exactly, exactly. [00:19:10] SF: In terms of the context, how do you test that the context that you're providing is actually valuable in the right context? [00:19:18] JP: It's a lot. I guess this is probably one of the hardest things to do right now. Obviously, we've written a bunch of tests ourselves that provide inputs and, based on the prompts that we have and the context that we pass to those prompts, check whether we're getting the right information back. Then at the same time, we have a bunch of test apps running wild in production that actually get real traffic and actually have real issues. 
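The listen-classify-dispatch loop Joey describes might look roughly like the sketch below. Note the hedge: beeps uses an LLM for the "is this actually an alert?" decision, and the keyword heuristic here is a deliberately simple stand-in for that call so the example stays self-contained. All of the names and shapes are assumptions for illustration.

```typescript
interface SlackMessage {
  botName?: string; // set when the message came from an app or webhook
  text: string;
}

// Stand-in for the LLM classification step: known observability bots,
// or alert-ish language in the message body.
const ALERT_BOTS = new Set(["Sentry", "Axiom", "Highlight", "Datadog"]);
const ALERT_WORDS = /\b(error|exception|alert|triggered|threshold|down)\b/i;

function looksLikeAlert(msg: SlackMessage): boolean {
  if (msg.botName && ALERT_BOTS.has(msg.botName)) return true;
  return ALERT_WORDS.test(msg.text);
}

// One "agent" per integration; each fetches its slice of context, and a
// primary agent would assemble the results into a thread reply.
type IntegrationAgent = (alertText: string) => Promise<string>;

async function handleMessage(
  msg: SlackMessage,
  agents: IntegrationAgent[],
): Promise<string[]> {
  if (!looksLikeAlert(msg)) return []; // ordinary chatter: do nothing
  return Promise.all(agents.map((agent) => agent(msg.text)));
}
```

The design point is that classification gates the expensive work: only messages judged to be alerts trigger the fan-out to per-integration agents.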
That's probably the best way to test that. Like, okay, was this actually valuable to me? And one of the things that we do with beeps is that we actually ask you afterwards, "Was the context that we provided useful to you?" And we're using that to power a lot more stuff in the future. I think that there's definitely a world where, as we're able to provide better answers that might give you a hint about what to do next, being able to push that in a direction where eventually a user will have enough trust in the solution we're providing to where maybe they would hand the keys over to us and say, "Go fix this automatically," right? And while I don't think we're there yet, and there's a lot of scar tissue from the whole AIOps promise over the last 10 years, I think there's a world where this could get to the point where there's a large share of issues that are relatively common across a code base that could be solved without human interaction. And hopefully we get there, because that's one of the ways you make on-call drastically better. [0:20:45] SF: Yeah. Even if you get to a place where you can provide, through an assistant, a pretty accurate depiction of what the solution should be, that's massively more useful than, "Hey, here's an alert. There's a problem. Go figure out the problem and the solution essentially," right? [0:21:02] JP: Yeah. And we're very keen on being suggestive and not sort of promising the world on this sort of stuff. I think that there's other companies in the space that kind of promise the world when it comes to that sort of stuff. And it's a different game than sort of your Copilots of the world that are helping you code, right? Incident response is very high stakes. And again, trust really matters here. We'll see if they're able to have success by doing that. 
But I think it's one of those things where if you give someone the wrong answer and you give it to them with a lot of confidence, then you can lead someone down a bad path that could have detrimental effects for their users and their company. And that's a big bet to make at this stage. [0:21:40] SF: Yeah. I mean, I think one of the biggest challenges with LLMs is that they're inherently in some ways like the worst form of an employee where they're overconfident and incompetent at the same time. Very sure of themselves, "Here's the wrong answer." [0:21:56] JP: Yeah. And obviously, that'll improve over time, right? I think we'll get to where they're not as incompetent as the worst employee. But yeah, we're definitely not there yet, especially when it comes to solving some of these tricky novel problems. [0:22:10] SF: How do we get to a place with on-call where you're not bombarded by the noise of everything and you only have to deal with more novel issues? [0:22:19] JP: Yeah. At least my opinion is that this is sort of the path to get there. What you don't want to have, you're exactly right, is so much of on-call, at least from what I've seen, is just kind of you're getting through the muck of the annoying, sort of repetitive things. The things that happen like once a week that aren't actionable that almost really ruin the experience for you. And then when something novel does happen, you're too tired or you've been wrecked for like the last few days by these sort of pestering alerts. And I think that the process that happens outside of on-call is really important there. And I think the on-call actually has a lot of influence on that. And the idea of sort of what happens in larger organizations, even smaller companies, is that you have sort of these handoffs that may lead to action items of things to go look at and things to fix as a result of that. But it's a very human process, right? It's a ritual that happens maybe not very consistently. 
And the great thing is that with LLMs and some of these tools, you can power a lot of that, right? You know the history of alerts that happen. You know that they've happened at some level of frequency. You know that one alert has come in every Wednesday night at midnight UTC, and you're able to sort of track that and understand what to do about it, and what's noisy and what's not. There's an obligation, I think, as an on-call provider to provide that information and make it easy to digest, so it doesn't have to be a human process on the outside. And I think that's how you build continuous improvement into this process. Another thing that we kind of have a hot take on is the concept of incident reviews and post-mortems. At a lot of companies, this ends up being a lot of theater, I think. And there's probably a whole group of learning-from-incidents people that are going to come after me for this. But I think that that's one of the things that we definitely want to improve upon. Because what an engineer really needs is not a document that they're never going to read again, that has a lot of really great information in it but is surrounded by prose that they have to read through in the thick of the moment. What they really need is that context when it happens again, right? What is the evidence of this incident happening in the past? What is the historical context here? How was it resolved last time? What are the different things to try? And I think that a lot of that could actually be auto-generated and provided the moment something critical happens, to give that context. Instead of, again, this other document that was discussed and learned from, which is obviously great. A lot of folks probably aren't going to remember that when the world's crashing around them and they've got to solve this issue. It's a tricky thing, and we really want to make that better. 
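The "one alert every Wednesday night at midnight UTC" pattern Joey mentions is the kind of thing that's detectable mechanically from alert history. A toy sketch of that noise tracking, with field names that are assumptions for illustration rather than anything beeps actually exposes:

```typescript
// Illustrative noise tracking: group alert history by fingerprint and flag
// the fingerprints that keep firing without anyone ever acting on them.
interface AlertEvent {
  fingerprint: string; // stable hash of source + alert identity
  firedAt: number;     // epoch millis
  actioned: boolean;   // did a human do anything in response?
}

function noisyFingerprints(history: AlertEvent[], minCount = 3): string[] {
  const byFingerprint = new Map<string, AlertEvent[]>();
  for (const event of history) {
    const group = byFingerprint.get(event.fingerprint) ?? [];
    group.push(event);
    byFingerprint.set(event.fingerprint, group);
  }
  const noisy: string[] = [];
  for (const [fingerprint, events] of byFingerprint) {
    // Recurs repeatedly and never leads to action: a candidate for tuning.
    if (events.length >= minCount && events.every((e) => !e.actioned)) {
      noisy.push(fingerprint);
    }
  }
  return noisy;
}
```

A fuller version might also look at the `firedAt` gaps to call out strictly periodic alerts specifically, and surface the list at handoff time rather than leaving it to the humans to notice.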
And I think a lot of these rituals that happen outside of actually being on-call are one of the best places for helping an engineer have confidence, have context, and be able to walk into a very novel situation, understand history, and make it better. But I think you're right. The first thing is just really getting rid of the noise, eliminating the non-novel things from having to be resolved by a human, and then sort of taking it from there. But we are a ways away from that, just to be honest, right? That's something that we want to build toward. I'm sure there's other folks building toward this as well. But I think that there's a big problem to be solved there, but it's an evolution. Yeah. [0:25:41] SF: For where the product is today, do you see the main value proposition for an organization using Beeps as being around just saving time? Essentially, engineers are probably one of your highest-cost employee resources within a company. This is a way to essentially reduce the time where they're maybe not building the core product. [0:26:01] JP: Yeah. As a CEO myself and having been a leader of large engineering teams, there are three things that I've always wanted, right? You want your engineers to have all the tools that they need to be productive, but you want them building products, right? A lot of times, you don't want them to focus on infrastructure and reliability. I'm sure every CEO dreams that they can have an engineering team that, instead of dealing with tech debt and incidents and things like that, is all building product and building value for users. The more that you can eliminate there, the better. And then obviously, number two is you want to have a fast, reliable, secure product. Making sure that you meet the expectations of your users and terrible things don't happen is vastly important. And then you want to have a really happy, engaged team that cares about its users, that cares about the product, right? 
Those are usually the top three things that you want as an engineering leader or even a CEO when it comes to thinking about engineering. On-call touches all those things. It really impacts people on the weeks that they're on-call if they've had bad experiences. And every minute matters for your users when it comes to building something that's actually meaningful. And we're past the days where you can have a 9-to-5 on-call because your business is predominantly in the United States or regional. Everybody's got global customers that care and matter, and having a team that's able to support that is really important. And I guess to take another step back, one of the other things that we're seeing - I even saw this at Airbnb and Figma - is that we're moving past a world where people like me were the ones on-call, the ones that grew up from sysadmin, to ops, to SRE. I guess people are calling it platform now. You have these highly trained engineers that have a compendium of shit in their brain that they've been through, right? They know how to deal with these things. They recognize what database instability looks like really quickly. They know how to sift through a bunch of graphs and build context. They can see a wall of logs and immediately recognize things. Companies are relying a little bit less on those folks going forward, right? Especially in this sort of Next.js space people are growing up in, there's just not a lot of need for that. And even at companies that are built on top of the cloud and on top of a bunch of these other providers, you're moving away from having these highly trained folks that have been in this for a while, that recognize these patterns really well, as the folks who are on-call. Instead, you have just software engineers on every team, regardless of experience, sort of taking this burden on. 
You have new grads six months out of college, and they're being woken up in the middle of the night to solve these really hairy issues. And they're just not super confident about it. One of the things that we're very deliberate about is that we're building for these developers, right? We're focusing specifically on their needs. How do we help build that confidence? How do we help build that context, and how do we build a tool and a platform that does that? [0:29:12] SF: In my experience, during my time at Google, when you had super junior people on-call, it was a learning experience for them. But it ends up actually creating more work, essentially, for everybody else, because there's not that much stuff that they can actually handle on their own. They end up being sort of a proxy or relay to the more informed people on the team, who need to actually jump in there and solve the problem. [0:29:34] JP: Yeah. Yeah. Imagine if we could just make that better, right? Imagine if those engineers - again, going back to opening up the different tabs in your browser, right? They might just not be getting all of that context because they just don't have it ingrained that these are the different places to look and these are the different tools they need to look at. Instead, if we can just give you that context, and immediately you know it's a recent code change and you can just roll back really quickly, that's a huge win for most companies. And going back to the other big sort of value of beeps right now and what we're really providing, one of the things that's very distinct in this space that we're building in is that a lot of times you're not thinking about servers anymore. You're not thinking about servers, but you're thinking about services, right? You're thinking about these different APIs. 
Again, going back and using the example of Clerk and Superface for auth, Neon, PlanetScale, and other places for your database, and using tools like Resend for email. And a lot of these companies that are building on top of LLMs, they're talking to Anthropic, they're talking to OpenAI, Groq, all of these providers. And one of the challenges they have is, sometimes when your app breaks, you don't know if it's you or if it's them. You don't know if they're down or if they're actually having issues. One of the big values that we provide is that we keep track of the different providers that you use. Anytime your package.json changes, or some of the other signals that we look for in your application code base, we update the list of providers that you use. And we have a pretty comprehensive list of all the different tools that folks use in this space. And anytime they go down, we let you know, a lot of times within a handful of seconds. You immediately have that context about why they're down and can make decisions appropriately based on that. It's a pretty simple thing, but it actually provides an absolute ton of value. Because, again, it's not something you have to go look at. You don't have to keep track of the status of the 16 different providers that you use; we do that for you. And we not only tell you when they're down - when you're down, we'll actually tell you if it's you or it's them. It eliminates a big source of anxiety that engineers in the space have been dealing with for a while. We did a cheeky thing. Vercel had "Are we turbo yet?" and some of these things to track the status of their different big initiatives. I was sitting at Next.js Conf last year, and they were talking about "Are we turbo yet?" And I went out and bought "Are we down yet?" And we have a portal where you can see the last time an issue was reported on any of these big providers.
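The package.json-based provider detection Joey describes could look something like this minimal TypeScript sketch. The provider map and function name here are hypothetical, illustrating the idea rather than beeps' actual implementation:

```typescript
// Hypothetical sketch: infer which third-party providers an app depends on
// by scanning its package.json dependencies. The mapping from npm package
// names to provider names is illustrative and far from comprehensive.
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

const PROVIDER_PACKAGES: Record<string, string> = {
  "@clerk/nextjs": "Clerk",
  "@neondatabase/serverless": "Neon",
  resend: "Resend",
  openai: "OpenAI",
  "@anthropic-ai/sdk": "Anthropic",
};

function detectProviders(pkg: PackageJson): string[] {
  // Merge runtime and dev dependencies; object spread tolerates undefined.
  const deps = { ...pkg.dependencies, ...pkg.devDependencies };
  const found = new Set<string>();
  for (const dep of Object.keys(deps)) {
    const provider = PROVIDER_PACKAGES[dep];
    if (provider) found.add(provider);
  }
  return [...found].sort();
}
```

Re-running a check like this whenever package.json changes keeps the provider list current, which is what lets a monitor say "it's them, not you" when one of those providers has an incident.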
It's pretty interesting to see who's down all the time and who's not. It's kind of an honest look at this industry and some of the tools that people use. And hopefully, we can help promote better reliability in the space and help this ecosystem grow in a more solid way with a lot of this data. We're going to be revamping that a little bit more to maybe show some comparisons between different tools and things like that. But it ends up being a big value to our users, and one of the things we're really excited about is just knowing that simple information. [0:32:42] SF: In terms of the engineering of beeps, what does the stack look like? [0:32:47] JP: We're 100% TypeScript. We want to feel the pain of our users. And predominantly, our web properties are all Next.js. Again, we're completely dogfooding the system that we're building for. And then we have a bunch of TypeScript services running on the back end using Node.js, basically running on a handful of platforms that make it really easy to run containers. Fly is one example there. It's pretty simple stuff. But I think in order to really build and understand the challenges that our users have, it was important for us to build in this space as well. Obviously, when we're measuring the health of all these providers, we can't use all of them, right? Because beeps needs to be more reliable than even the systems that we're monitoring or the systems that our users are managing. It's a pretty fun challenge. Again, just pure TypeScript at this point. [0:33:42] SF: In terms of deployment then, this is run essentially as SaaS. [0:33:48] JP: It is run as a SaaS. Yeah. We have our web app that is used for setting up integrations, getting information about the different apps and their history. We have a whole status page product where, as part of our free product, you get a status page where you can communicate the status of your application to your users, right?
And we have a bunch of different themes that are fun for the community, and it's a good way of expressing your brand through our themes. And yeah, those are all just web properties. [0:34:22] SF: In terms of what you're doing with some of the LLM work, how do you manage orchestration? I think you mentioned that you have a number of agents that are going off and performing independent work. How does that workflow work? Is that something that you built in-house, or are you using some sort of framework for that? [0:34:38] JP: Pretty much we've built that all in-house. A lot of the agentic frameworks out there are written in Python and just not matched up to our stack. Whether those agentic frameworks have a shared consciousness or each agent has its own was one of the considerations. We went out and built our own internal package in TypeScript that we're using. I think it might be something that we eventually open source, because from a TypeScript perspective, there's just not a lot out there to be able to build these tools, and I think it's something that we could contribute back. We probably need to do a lot of cleanup and not be so beeps-specific. But I think there's a world where we hope to be able to share that with the world and help that community grow. Yeah. [0:35:21] SF: Yeah. I've heard that's a consistent issue with a lot of these - not just agent frameworks, but essentially all of these sorts of libraries. If they're not solely Python, they're Python-heavy, where the support for other languages is not as well documented and there's less of a community behind them. It's kind of hard to invest company resources in a framework that may not be there six months from now. [0:35:47] JP: Yeah.
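As a rough illustration of what a minimal in-house TypeScript agent-orchestration package like the one described might look like: all names below are hypothetical, and a real system would add the actual LLM calls, retries, and tool schemas. This sketch takes the "shared consciousness" side of the design question, merging every agent's findings into one shared context:

```typescript
// Hypothetical sketch of a tiny agent-orchestration package in TypeScript.
// Each agent investigates independently; findings are merged into a shared
// context afterward so later steps (summarization, paging) can use them.
interface AgentResult {
  agent: string;
  findings: string[];
}

interface Agent {
  name: string;
  // An agent reads the shared context and returns its own findings.
  investigate(ctx: SharedContext): Promise<string[]>;
}

class SharedContext {
  private facts: string[] = [];
  add(fact: string): void {
    this.facts.push(fact);
  }
  all(): string[] {
    return [...this.facts];
  }
}

// Run all agents concurrently, then fold their findings back into the
// shared context.
async function orchestrate(agents: Agent[], ctx: SharedContext): Promise<AgentResult[]> {
  const results = await Promise.all(
    agents.map(async (a) => ({ agent: a.name, findings: await a.investigate(ctx) }))
  );
  for (const r of results) {
    for (const f of r.findings) ctx.add(f);
  }
  return results;
}
```

The alternative design, giving each agent fully private state, would simply drop the merge step; which is better depends on whether agents need to build on each other's conclusions.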
We originally built some of our own SDK to interact with the LLMs, to basically do failover and racing between Anthropic and OpenAI so that we could be resilient to any failures there. The Vercel AI SDK has started to catch up a little bit and is actually pretty solid when it comes to this stuff, and we recently switched over to that to power our connectivity to LLMs. But from an agentic framework perspective, it hasn't reached that level yet. [0:36:24] SF: Are you also using v0 from Vercel? [0:36:27] JP: I do a little bit. Yeah, we haven't used it for any of our web interfaces because they're a little complex, but I think we probably could at some point. But our admin tools - I was telling you about my background being in infrastructure and reliability. I'm not a front-end engineer by any means, but I've been able to really build our admin interfaces really easily with v0. They're very simple problems, and I can get going and be dangerous enough to build something to help us do better customer service and understand where our users are based on our data. Actually, going through that process, it was pretty shocking what I was able to do in a very quick amount of time. I'm pretty excited about that continued evolution there. And some of the stuff that I've seen demos of, where it's using Three.js to do a spinning world, is kind of mind-blowing, right? It's really interesting to see what the power is there, and I'm kind of curious to see how that continues to grow. [0:37:27] SF: And as a startup founder, where resources are always limited, using some of these tools, are you feeling essentially that you're going to be able to go further with fewer people because they're presumably operating a little bit more efficiently? [0:37:42] JP: Oh, 100%.
I don't know if it's a step change or exponential, but there's quite a big lift in what we're able to do. I'll give you an example. We use a kind of off-the-shelf background job scheduling system through pg-boss. Our database is Postgres. And instead of going with a higher-level scheduling system, it was, "Let's use pg-boss and go down that path." We were investigating an issue with pg-boss and seeing some jobs being backed up. They have a very particular schema, and writing a query to find what we were looking for was going to take a little bit of time. One of my engineers came back with the query really quickly, and it had all of these different SQL functions that I'd never seen before. I forget exactly what they were, but it was like, "Wait, what is that? What does that thing do?" And it ran, and it worked, and it gave us exactly what we wanted. I asked him, "Did you write that? Do you know all this stuff?" He's like, "No." I think he popped it into Claude 3.5, and it came back with the right answer really quickly. This is probably something that would have taken an hour, if not multiple hours, to really get right, and it was something that we had in a matter of seconds that gave us the exact answer. That wasn't anybody's particular expertise in terms of how to extract this information. And just those little moments like that, even beyond code generation and the tab-completion stuff, end up being things that really save you a ton of time and give you the superpowers that I think AI intends. It's pretty cool to see those specific examples play out very regularly for us. [0:39:26] SF: Yeah, I mean, it adds up, right?
Especially when you're dealing with technology that you're not necessarily an expert in or you're not working in day-to-day. And yes, you could figure it out, but how many hours or days is it going to take to figure out, versus being able to leverage some of these tools where you know what you want well enough to vet the output, but you might not remember all the obscure functions that have kind of been purged from your memory because it's been a while since you touched that thing. [0:39:54] JP: It's pretty wild. One way to think about this from a beeps perspective: I definitely see a world where you can build a much leaner engineering team to solve really valuable problems for users. You're even seeing it today with some of these early-stage startups that have a ton of traction and have built amazing products. You look at Winton and see how many engineers they have. It seems like a huge company, and it's 10 engineers, right? That's a pretty common thing that you see. The tricky thing, though, is that you need a decent number of engineers to have a pretty solid on-call rotation. You wouldn't want your engineers to be on-call every other week. It's a limiting factor in what you're able to do there. I think a company like beeps can help solve that. Instead of having these really rough on-call rotations, you could tighten the loop a little bit by not making them so difficult, implement strategies that may be different than the pure week-to-week rotation, or eliminate a whole class of issues so that maybe a human's not on-call for something. But yeah, we're not there yet. But as we were talking about before, there's a lot to be improved upon.
But there's a lot of potential there, and I think it's one of those ways that on-call needs to catch up with how the rest of the software industry has evolved. Yeah. [0:41:16] SF: In terms of the engineering, what has been one of the hardest things to build so far in order to bring this product to market? [0:41:23] JP: The easy answer is that the engineering isn't the hardest part. It's building the right thing. That's the easiest answer. But I think we have unique challenges in that, again, we're on-call for your on-call. We have to be incredibly resilient, and we have to build a brand of trust that could be quickly eliminated. Having all of those best practices in terms of resiliency, and our own monitoring of our own systems, and how we handle those failures has easily been the hardest part. It's not one specific thing; it's more making sure that we're redundant across providers, redundant across the systems that we use. Thinking that way from the start is one of the challenges of building a system like this, right? It's easy to launch an MVP of a product that doesn't have those sorts of requirements. But if you're trying to go after this market, a space where that really matters, it's time-consuming, it's challenging, but it's a very important thing to do. Yeah. [0:42:24] SF: Yeah. I mean, I don't think you can overstate the engineering effort that goes into building a really reliable tool. And you can't be in the space of on-call and reliability where the product is not reliable itself, right? [0:42:38] JP: Yeah, exactly. It just won't work, right? It's one of those things. I don't know. Everybody thinks it's easier than it actually is, right? Even myself - every time I dreamt about building a product comparable to PagerDuty, I always thought, "Ah, that's something I can build in a couple of weeks." I mean, it's a lot harder than that, especially to build that resiliency.
And even just working at Airbnb for a long time, everybody was like, "Oh, Airbnb is the simplest product. What do all those engineers do?" It's so much more complex than what you see on the front-end, and so much goes into building a product like that. And here, you take the consumer part out of it, and you're building this enterprise product that people are relying on. Yeah. [0:43:22] SF: Look at Twitter. Twitter is the simplest product in the world. The first version of that was probably built in a matter of days or something like that. But if anybody remembers the Fail Whale era of Twitter, they were falling over because they basically couldn't meet the scale and reliability challenges. Luckily, they had a talented enough engineering team to navigate their way out of that and build a resilient system. But the product itself is simple. It's the backend infrastructure to meet the scale needs that's really, really hard, especially back then when you couldn't just go and stand up a super elastic system on the public cloud. [0:44:00] JP: Yeah, I remember 10 years ago, when people would talk about how simple Twitter was, you'd kind of ask them, "Okay, how do you handle the notifications, though, when -" I guess back then, it was Justin Bieber, right? If Justin Bieber tweeted, how would you send all the notifications to the tens of millions of people that followed him, right? You could really see somebody's engineering chops by how they would think through that problem, because that was a serious thing to think about back then. Yeah. [0:44:28] SF: Absolutely. Well, Joey, thank you so much for being here. This was really great. [0:44:33] JP: I appreciate it. Yeah, on-call is a personal thing to me that I really want to get right. Because, again, it's a whole different set of developers that are taking on the on-call challenges for these companies.
And I would love for them to not have the experience that I had and the pain that I went through. I'll share one last story that exemplifies this the most. It was about 10 years ago now, October 2014. My now wife, then girlfriend, and I were in San Francisco. We were at a bar in Pac Heights having a few drinks and decided to walk back to our place in Mount Hill after having a fun little night. And we're in Alta Plaza in San Francisco, overlooking the Golden Gate Bridge, just kind of admiring the view. All of a sudden, my phone buzzes in my pocket. "Oh, crap." I was working at Flipboard at the time. It was like, "Oh, Flipboard, HBase. We're having some issues here." And I could see the sadness wash over her face. Because, to that point, we'd been dating for a few years. We'd had family in town, and I had to disappear to go solve issues. We'd been on vacations where I had to run back to the hotel room in a panic. I remember one time we were in LA, in Manhattan Beach, and we were at a restaurant, and I literally had to leave her at the restaurant and go sit in the car to help revive a startup that doesn't even exist anymore and deal with these sorts of issues. So I see that wave of sadness wash over her face. And instead of reaching into my backpack to pull out my laptop, I pull out an engagement ring and propose to her. It's a great origin story for beeps. But if you really think about it, it's pretty sad, right? I involved on-call in one of my life's biggest moments and used her sadness about all the times that we'd been impacted as a way to shock her with the proposal.
And it kind of hits home with how much this has been a big part of my life and how important it is for me to really make sure that we fix this and do it the right way, and make sure that 10 years from now, I don't have engineers telling me that they replicated that story and had relationships ruined because of on-call. If I can just make this incrementally better, or inspire somebody else to join us and help us achieve this mission of making on-call better for this next wave of modern software developers, I'll be really happy. [0:47:22] SF: Absolutely. That's awesome. And that's an awesome story. A great way to end the recording today. Thanks for being here. [0:47:27] JP: Thanks, Sean. I appreciate it. [END]