EPISODE 1675

[INTRODUCTION]

[00:00:00] ANNOUNCER: Event-driven architecture is a software design pattern where system components communicate through events that are generated by producers and pushed to consumers. This design is often contrasted with a request-driven architecture where components communicate with each other by sending requests and receiving responses. Hookdeck is an event gateway for receiving, processing, and delivering asynchronous messages. It centralizes and streamlines communication between services like a third-party API such as Shopify or Stripe and internal endpoints or other APIs.

Alex Bouchard is the co-founder of Hookdeck. He joins the podcast to talk about event-driven architecture, building event bridges, expanding Hookdeck beyond webhooks, and much more.

This episode of Software Engineering Daily is hosted by Sean Falconer. Check the show notes for more information on Sean's work and where to find him.

[INTERVIEW]

[00:01:05] SF: Alex, welcome back to the show.

[00:01:07] AB: Thanks for having me, Sean.

[00:01:08] SF: Yeah. We were chatting before the show and it sounds like you are in the Yukon right now. I think this is probably – you're setting a record for the most northern latitude individual that I've ever recorded an interview with. Are you in Whitehorse?

[00:01:22] AB: Yeah. That's right. I'm in Whitehorse. Well, that's funny. If that's the first, I'll take it. And maybe next time I can try to beat that.

[00:01:29] SF: Yeah. You'd have to be basically at Santa's Village to go more north next time. But that's awesome.

[00:01:35] AB: Yeah, something like that.

[00:01:36] SF: Yeah. You were on the show probably a little over a year ago. And at the time, we were talking about webhooks at scale and how messy webhooks can get when you start to manage a lot of them. Hookdeck has shifted a bit away from focusing purely on webhook management.
Can you start off by setting some context around like where is Hookdeck now and how has it evolved over the last year and a half or so?

[00:01:58] AB: Yeah. For sure. You might remember during our conversation, I'm pretty sure I called webhooks the gateway drug to event-driven architecture. And I think that was already setting the stage for how we were thinking about what Hookdeck is and what it does.

And over the course of kind of like the last three years, receiving webhooks was built up as this primary use case for Hookdeck. But we always saw people solving other event-driven architecture problems with the same product, with the same platform. And that was really because webhooks are events, and they're just kind of one specific way of receiving events, or like one convention. But what became really, really clear to us is that the value of the product, and kind of like the common denominator of how people were using it, really wasn't specific to webhooks per se.

Hookdeck is an event gateway. And that's how we think about it. The event gateway is kind of like this bridge between the outside world sending events into your system or your system sending events to the outside world. And webhooks are just like the biggest manifestation of that. But there's plenty of other use cases, which I'm sure you'll want to touch on.

[00:03:05] SF: Yeah. And then you started off by saying that like webhooks are this sort of gateway drug to event-driven architecture. For those that maybe are less familiar with that term, how would you describe event-driven architecture? What does that like really mean in terms of building systems? Software systems?

[00:03:21] AB: Yeah. It's a great question. When we think about event-driven architecture, I think we're really talking about asynchronicity. And those are kind of almost like the two programming or infrastructure paradigms when it comes to building web services now. We have synchronous workflows.
We're very API-driven. And then you have the adoption of event-driven architecture that's in large part being driven by webhooks adoption.

The asynchronous nature of it means you're now building on a model that's eventually consistent. And you rely on those events that are being produced by systems to kind of perform your own operations in your system. Let's start with a really straightforward example. If you're building on top of Shopify, Shopify will emit events whenever things happen within Shopify. Orders are created. A product is updated. A checkout session was abandoned. That kind of stuff. Right?

And then you can imagine your system will want to react to this in so many ways. And what you can do is subscribe to those Shopify updates through webhooks and then perform your own kind of business logic when those events occur. And that happens the other way around too. You might be sending events to your own users or to other platforms and so on.

And the true beauty in the asynchronous processing is that you kind of decouple those two different operations. You decouple Shopify's dependency from your own dependency. And this event that's processed kind of like in due time, right? Asynchronously. That lets you build those systems in a way that's sufficiently decoupled and sufficiently scalable that you can have the integrity that you need and make sure that you perform the actions that you were supposed to perform and so on.

And I think as software is more and more about integrating software, this kind of like new dependency on those asynchronous workflows is just growing more and more over time. Shopify is one of those examples where people have been receiving Shopify webhooks for like a decade, right? There's nothing new to that. But we see more and more adoption through like the long tail of platforms, and SaaS providers, and tooling in general that will go down that direction.
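The pattern described here – subscribing to Shopify topics and running your own business logic when each event arrives – can be sketched roughly like this. The topic names mirror Shopify's "resource/action" webhook convention, but the handlers and their behavior are invented purely for illustration:

```python
# Rough sketch of dispatching incoming webhook events by topic.
# Topic names follow Shopify's "resource/action" convention; the
# handler logic is hypothetical.

def on_order_created(payload):
    # e.g. kick off fulfillment in your own system
    return f"fulfilling order {payload['id']}"

def on_product_updated(payload):
    # e.g. sync your local catalog
    return f"syncing product {payload['id']}"

HANDLERS = {
    "orders/create": on_order_created,
    "products/update": on_product_updated,
}

def dispatch(topic, payload):
    handler = HANDLERS.get(topic)
    if handler is None:
        # Acknowledge unknown topics so the producer stops retrying them.
        return "ignored"
    return handler(payload)

print(dispatch("orders/create", {"id": 1001}))  # fulfilling order 1001
```

In a real receiver, this dispatch would sit behind an HTTP endpoint that returns a 2xx quickly and defers the actual work, which is exactly the decoupling being described.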
And it's not just like the Shopify, the Twilio, and the GitHub of this world that are now like big producers of events.

[00:05:25] SF: And then this also applies internally if I'm using something like Kafka, or Pub/Sub, or maybe even just like a message queue where I'm going to have a bunch of these like events sort of queued up. And then I have people, basically internal services, listening to those and reacting to them. Essentially, that's an internal – basically you're getting the same thing as you get from a webhook, but you're just doing it internally.

[00:05:47] AB: Yeah. Yeah. Totally. I think that's where most of the expertise in like event-driven architecture has kind of been built in the last 10 years. There's very good kind of message queues that have been developed and commercialized, like Kafka and so on, to build that internally.

I think the part that's really interesting about the event gateway is how you deal with the outside world. It's when this event is coming from a different platform where now you have throughput concerns because you're not the producer. You have security concerns because you're not the one generating the event. You have standardization issues. You have management issues and so on that you don't really tend to see internally.

I think that category of problem hasn't totally yet broached kind of like outside of your own infrastructure. But we're more and more kind of being forced to. And now we improvise, right? We build those kind of like ad-hoc solutions to receive those Shopify webhooks and so on. But they're not first-class citizens the same way as the events that you produce between your own internal systems. That's how we think about the gateway. The gateway is like the interoperability layer for your event-driven architecture. It's what happens when you're not either the consumer or the producer of that event.
And either the producer or the consumer is a third party, or some device, or something you don't control. You really have no bearing over it, right?

[00:07:00] SF: I see. When I hear gateway, a lot of times – and I think this is probably true of many people – I think like API gateway. With an API gateway, it's like a server that acts as an intermediary to manage your requests from like the client side to the various backend services. And with an event gateway, is the right interpretation that it's essentially kind of like a server that acts as an intermediary to manage and route events from external to internal and sort of vice versa? Is that what's happening there?

[00:07:27] AB: Yeah. I'm glad you latched on to that. Because, really, that's intentional. We're trying to like coin this term in many ways. We do see a lot of competitors already doing this and so on. But I think there's like no verbiage around it. Yeah. The reason why we go with event gateway and why we call it that way is because of this like proxy to API gateway and what we already know it does. And the API gateway and the event gateway solve very similar problems in terms of the value proposition. But how they do it and the nuance is completely different.

Specifically, the event gateway is stateful. When you think of an API gateway, yes, it's like this middleman. But it only acts at a specific point in time. This request is accepted or it's denied. This request goes to destination A or goes to destination B. In the case of an event gateway, all the data is retained and queued. What that means is when an event is received, you can send it to multiple places where it can be kind of duplicated. It can be paused. It can be retried later. It can be delayed. And all those kind of like fundamentals mean that it serves a different kind of problem space. It serves a problem where the nature of the data that you're receiving is asynchronous and you don't need to specifically reply.
But now that means you can send it to multiple places. You can filter it. You can do kind of like all those things I was referring to earlier.

And I think like one of the clearest examples of that is one of the value propositions of the API gateway, which is throughput control. In the API gateway kind of synchronous-first world, with throughput control, you're going to say, for instance, "Okay, I don't want this endpoint to receive more than five requests per second. And above that limit, I'm going to return HTTP 429." You're sending too many requests and now the data is going to get denied. It's going to get refused.

In an event gateway model, data is never refused. Because you don't control the producing of that data. And therefore, if Shopify decides to send you 2 million events, it's your own damn problem to deal with those 2 million events. It's not Shopify's problem, right?

[00:09:23] SF: Right. Yeah.

[00:09:22] AB: An event gateway will receive any data that you send to it. But the throughput control will happen at the consumption rate. Instead of saying after five requests per second, don't accept requests anymore, you'll say, "Okay, my server can process up to five requests per second." The event gateway deals with all the traffic, queues it, and then sends it to me at this rate.

But you can look at all of the different value props in the API gateway and kind of do a similar proxy. Security, observability, visibility, monitoring, error recovery. Kind of all those value propositions exist in a similar context but are applied very differently.

[00:09:59] SF: And then you mentioned being able to push the event to multiple places. Is that essentially something kind of like a subscriber model where I could have multiple services subscribing to the events? That's very different than what you would sort of expect the behavior of like an API gateway to be.

[00:10:16] AB: That's exactly right. Specifically, Hookdeck. But, I mean, that's true of all event gateways.
Really, they're message brokers. That means there's a Pub/Sub pattern where you can have multiple publishers, multiple subscribers. In the specific context of Hookdeck, publishers and subscribers are in a many-to-many relationship. That means you can fan out, where a single publisher will go to multiple subscribers. And you can fan in, where multiple publishers will go to a single subscriber.

There might be like small nuances in kind of like how different event gateways are implemented. In our case, we don't call it publisher-subscriber. We call them sources and destinations. But that's really just a terminology thing. For all intents and purposes, you can think about it the same way. Yeah, that's completely right.

[00:10:57] SF: Yeah. That's my previous life as a Googler working on Pub/Sub coming in there. What are some of the things that I can actually do with an event gateway? We've been talking about some of these things fairly high-level, sending these events to different places. But what's like a specific use case that I'd want to use this kind of technology to help me solve?

[00:11:16] AB: Yeah. I'll call out kind of like four main use cases. But the reality is that like the lines get blurry at times. People have use cases that will kind of very much like overlap with multiple of those use case categories, so to speak. But I can give you kind of a brief overview and then like dig into whichever ones you find most interesting. Or maybe applies to your previous life.

The kind of bread and butter is receiving webhooks, which is what we've been kind of spelling out for the last three or four years. And that is the scenario that I was describing earlier with Shopify webhooks coming in and your system needing to process those.

To kind of like go on to the next one, the second one that we see often is sending webhooks, or really just publishing event streams. Those are examples of kind of like the other use case, where it's on the outbound.
It's you as a system now needing to publish events to subscribers that you don't control. That could be your customers. That could be another platform. That could be a public event stream that anyone can subscribe to. Things like that.

The third one is asynchronous APIs. And that's maybe the one where we're having the hardest time kind of like finding a firm label for. But when you really think about it, there's two types of API endpoints that you're going to set up. You're going to have the API endpoints that are kind of like classically the ones you think of for your API, where someone can create a new entity and you return the entity that's created and so on.

But we also see more and more people building what we call asynchronous API endpoints. Those endpoints are publishing endpoints that you send data to. But you don't expect any specific response beyond kind of validation. We see a lot of adoption for that in IoT device contexts, or customers kind of sending data to your own API.

One example of that is a customer of ours that is a metered billing solution. And their customers will publish metered event data to that endpoint. For instance, this customer has processed this amount of gigs for this hour. And then they'll kind of continuously just like report on that usage, right?

Another thing that we see a lot is SDKs. Being embedded in SDKs, and libraries, and so on. For instance, Sentry is open-source software. And you can go check how their ingestion pipeline works for all the errors that people publish to Sentry. And it's more or less the same as what we've built at Hookdeck. It's this decoupled asynchronous system that will receive all the error data from all the distributed SDKs that people deploy on their website, on their services, and so on. That's the asynchronous API use case.

And then the fourth one is connecting third-party APIs. We see a lot of people kind of like almost using it as a very technical version of Zapier.
Kind of like the Zapier for engineers, where they'll stitch together different tools. For instance, GitHub webhooks with their ChatOps software. Or marketing tools like Marketo with Iterable and so on. And that's really like the use case that we see a lot of.

[00:14:07] SF: Yeah. You could almost use it – the Zapier use case, you're kind of almost using it like the transformation part of like ETL, where you're sort of transforming probably the format of the object, or whatever the call is, into another structure in order to call the other API.

[00:14:22] AB: Right. Specifically, in our case. But all event gateways have this functionality. In our case, you can run a JavaScript post-processing function on all the data that comes in. You might format it into valid data for another API. You might transform XML into JSON. You might decode JWT tokens. Kind of like all that stuff. Right? That's something that we definitely see a lot of.

And I think something to note is the event gateway, really, it's a cloud-computing primitive. And what that means is that it's building blocks, it's capabilities. But then you stitch them together to build the product experience that you want. The event gateway is not a SaaS product where we dictate a specific product experience that people want to buy into. It's much more a set of capabilities, and then it's up to you as engineers and developers to kind of get clever.

And for us, kind of like at Hookdeck, I think our role is to say like, "Hey, by the way, here's kind of the most common quickstarts that we see." Those are like starting points. But then we have people building all sorts of product experiences on top of it. It's very unopinionated in the way that you can use it.

[00:15:23] SF: Going back to the first use case that you touched on, which was like consuming events from webhooks.
If I'm not using something like this, are there actually cases where, if I was consuming Shopify, I could basically accidentally denial-of-service attack myself with the number of events that I could potentially be getting?

[00:15:41] AB: Yeah. I hope I'm not going to repeat myself now. I probably should have listened to last year's episode. But it's funny. Because I did that once at my first employer. The first time I took down a production API was exactly how I did it. At that time, I was working for this basically kind of Expedia for buses. And we had those email campaigns where we would send you other trajectories that you could take.

If you took a bus to Boston, then we might tell you, "Hey, you might want to go to New York, or Washington, and so on." Right? And I triggered this like first campaign after I was building the service that would generate those emails and like figure out the itineraries and all that, and completely forgot about it. And then proceeded to receive something like a million webhooks in the next 15 minutes. It was Friday night. The lead engineer on the backend team really did not like me that night. But that's like very common.

What you'll see is like things like Black Friday or like flash sales and so on causing like fairly significant issues where the [inaudible 00:16:36] very irregular. And we see that in our own kind of like users' data. In a way, I think of Hookdeck as like arbitraging that capacity. Because we have so many customers now that any single customer spiking out, it's like whatever. But for that individual customer, they got 10x, 20x, 30x the volume over a minute, two-minute time span. And there's just like really no way that we would have been able to deal with this without kind of like the engineering work associated with the capability of doing that, right?

[00:17:04] SF: Yeah. It helps with like sort of aggressively spiky traffic. You don't have to have necessarily all the –

[00:17:10] AB: Yeah.
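The spike-absorbing behavior being discussed – ingestion never refuses an event, and delivery happens at whatever rate the consumer can handle – can be illustrated with a toy buffer. Everything here (the class, its names, the rate semantics) is an invented sketch, not Hookdeck's actual API:

```python
from collections import deque

# Toy event buffer: ingestion always succeeds; a drain pass delivers
# at most `rate` events, so a traffic spike raises delivery latency
# instead of producing 429s or dropped events.

class EventBuffer:
    def __init__(self, rate):
        self.rate = rate        # max deliveries per drain pass
        self.queue = deque()

    def ingest(self, event):
        self.queue.append(event)    # never refused: everything is accepted
        return "accepted"

    def drain(self, deliver):
        delivered = 0
        while self.queue and delivered < self.rate:
            deliver(self.queue.popleft())
            delivered += 1
        return delivered

buf = EventBuffer(rate=5)
for i in range(12):             # a 12-event burst arrives at once
    buf.ingest({"n": i})

received = []
print(buf.drain(received.append))   # 5 delivered this pass
print(len(buf.queue))               # 7 still queued, waiting their turn
```

The point of the design is visible in the last two lines: the burst is fully accepted up front, and the backlog simply takes a few more drain passes to clear.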
One kind of like more of an anecdote, I guess, on this is one of our lead investors was Twilio's VP of product. And it was a common issue at Twilio that they would DDoS their customers. It's one of the things that led them to invest, right? It was like, "Whoa. We used to see this all the time."

And there's a lot of other things where you realize that like best practices are not necessarily well-implemented. Twilio, for instance, knew that most people weren't verifying the webhooks that were being sent out by Twilio and so on. I think our responsibility goes beyond scale. It's also like best practices around processing, and security, and all that.

[00:17:41] SF: Yeah. It's hard to like necessarily assume that someone who's consuming an API necessarily – even if it's unintentional, they might not have like the engineering resources to even think through some of those problems. Or they think like, "Okay. Well, that's a high-quality problem if we get to that state." And they kind of maybe forget about it at some point. And maybe they encounter this problem like a year down the road after they've already done sort of the MVP version of it and launched it to production.

[00:18:07] AB: Yeah. Similar story with verification. You're just trying to ship things. And usually your priority is on like the value add that you're doing. It's not on whether or not like this is safe for production and has all the edge cases covered and all this. What you then do is you kind of do the minimum viable version of that, and then eventually it bites you back. Hopefully, we can build something that kind of avoids this downside. You can just build your application logic, put it out, and then know that you're safe for your production workload.

[00:18:35] SF: Yeah. I want to also talk about one of the other use cases you talked about, which was essentially being able to handle asynchronous APIs. Because that's something that I've encountered in a prior product that I worked on.
It was really hard for the people who were consuming the APIs to understand how that works. It's a little bit different than what a lot of times people are used to interacting with. But for certain types of use cases, it essentially has to be that way. Because there's like processing steps that need to take place that are long-running. And the job might not be done immediately to be able to be returned and expect a result. You might just have to essentially execute an API endpoint, and then some processing step happens. And then, asynchronously, you're going to consume something to let you know. Or you have to hit another API endpoint. Is that like a growing use case that you're seeing as APIs transform to handle more of these kind of long-running jobs and more complex scenarios?

[00:19:28] AB: Yeah. Long-running jobs are a great example. Actually, long-running jobs are really interesting because they're a combination of two use cases. The first use case is that you receive the request to trigger that job. And then the second use case is that you need to then notify that that job was completed, right? In which case, then, that becomes kind of like an outbound webhook.

Yeah. And that is also an example of blurring those lines that I was talking about, where it's kind of bidirectional. It's not necessarily unidirectional. And now you're blending different ways of kind of using those capabilities to build that specific product experience that you're talking about. And I think that's a great example.

A lot of what we see too has to do with contexts or use cases where the answer really doesn't matter. For instance, you have IoT devices in the field that are tracking packages, that are sending status updates on where that package is at. Now that kind of becomes fire-and-forget, so to speak. And you're really not going to rely on the tracking device for all the retries and kind of all that. You want to have whatever is set up in the cloud be able to accept those requests no matter what.
Yeah. I think that's definitely one of the top examples that we see a lot of. And we love those kinds of use cases too because they make use of all the capabilities. And it's not just solely focused on one specific way of using it.

[00:20:43] SF: And if I'm not using something like an event gateway for handling this now, what are people typically doing? How are they sort of navigating these problems currently?

[00:20:55] AB: Yeah. I think the status quo right now is basically stitching together a bunch of different solutions and kind of like gluing, or building the glue, in between. And that tends to lead to much more complicated infrastructure in terms of what you actually need to spin up to be able to do this. And, also, something that's a lot less maintainable.

And one of the things that we see a lot of is that it's especially difficult to recover from errors. One thing we see a lot, for instance, is people in an AWS environment, they'll be using kind of the full suite of services to get similar capabilities. In the AWS environment, you'll probably be using API Gateway. You'll be using S3 buckets. You'll be using Lambda. You'll be using CloudWatch. You'll be using SQS and kind of all those different services. And then there's all the code that you need to write to have those kind of like interoperate together nicely.

The other thing too is I think there's kind of like fundamental challenges in event-driven architectures that have maybe been solved from more of a technical standpoint. But not really from a developer experience standpoint. And that's the other set of things that, like, technically you don't really have to build. But if you have a big outage, now you're probably going to spend 24 hours dealing with it rather than the 10 minutes that it could take, because you haven't really built the tooling around it.
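The kind of error-recovery tooling being described – capturing failed deliveries with their error context so an operator can inspect them and replay them in bulk – might look something like this minimal sketch. The function names and the failure scenario are made up for illustration:

```python
# Minimal dead-letter sketch: failed deliveries are kept together
# with their error context instead of being lost, and can be
# replayed in bulk once the consumer is fixed.

def deliver_all(events, deliver):
    dead_letter = []
    for event in events:
        try:
            deliver(event)
        except Exception as exc:
            # Retain the payload AND the failure reason, so you can
            # answer "how many events, and what went wrong?"
            dead_letter.append({"event": event, "error": str(exc)})
    return dead_letter

def replay(dead_letter, deliver):
    return deliver_all([entry["event"] for entry in dead_letter], deliver)

def flaky_consumer(event):
    if event["id"] % 2:             # pretend odd ids fail on the first pass
        raise ValueError("odd id rejected")

failures = deliver_all([{"id": n} for n in range(4)], flaky_consumer)
print(len(failures))                       # 2 events captured with context
print(replay(failures, lambda e: None))    # [] once the consumer is fixed
```

This is the 10-minutes-versus-24-hours difference in miniature: with the error context retained, recovery is a single bulk replay rather than an archaeology exercise.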
The more sophisticated kind of teams that we talk to, they also have like a whole set of custom tools that they built around dealing with their dead letter queues, and processing errors, and polling to make sure that the data integrity is correct for the events that they were supposed to receive.

There's kind of like no real limit to how complex you decide to make this. And it depends on your requirements. Do you need to be able to search the payloads of the events you've received? Yes? Okay. Well, now you're probably going to need something like Elasticsearch or ClickHouse to be able to do that. Right?

The problem kind of compounds based on the requirements of what you want to be able to do. And I think there's a need for this cloud infrastructure component that has like the right set of those functionalities together for those specific use cases.

One of the tools that we consider kind of the main competitor to Hookdeck right now is AWS EventBridge. But all the big clouds have kind of an EventBridge equivalent. And I think, in many ways, it's a set of those functionalities. But it isn't like completely fully actualized into a developer experience that makes sense. And that's really what we're trying to build. We're trying to build kind of like the fundamental ideas that those are trying to address. But building it in such a way that is compatible with everything that you work with. That is easy to understand. That is easy to adopt. And making sure that when issues occur, you're going to be able to recover from them and know that you have full data integrity.

[00:23:27] SF: What is the challenge that people run into using EventBridge? And then what is it that Hookdeck is essentially doing better or different?

[00:23:36] AB: Yeah. The main issues revolve around the expertise that's needed to be able to do that. There's a lot of background knowledge that you need to have to be able to like spin that up.
And if you've used EventBridge, you know that at least the first couple of days that you're going to be on it, you're going to be like always in the documentation and trying to like figure it out. And a lot of things are not necessarily going to be very intuitive. And if you don't have that like AWS background knowledge, then it becomes very difficult to put together.

I think there's a concern around how hard it is to spin up. But, especially, that compounds as you have more and more event sources. Because then that becomes a management problem. Where are events coming from? Where are they going? Under which conditions? It becomes very difficult for an engineering organization to truly understand what this event flow is like. And I think tools like EventBridge are not necessarily making that easier. They might make it possible. But they don't necessarily make it easier. That's the first point.

I think the second point has a lot to do with errors. And that's a little bit like what I was talking about before. There's kind of like this approach right now in event-driven architectures that like those errors are relegated to dead letter queues. And they're kind of like, "Oh, you'll figure out a way to deal with them manually." Or so on. And there's very little around helping you understand what the context of the errors is. How many events are impacted? What were those events supposed to be? Are you sure you processed all those events? And all that.

And I think like Hookdeck plays a big role in this. It plays a big role in making you very confident that all the events that you received went to the right place under the right conditions and actually got processed the way they were supposed to be processed. That's what I tend to think of as the biggest challenge. But then there's also kind of like the general DX from AWS. And then, also, you're locked into the AWS environment when you use EventBridge. Something like Hookdeck is cloud-agnostic and so on.
There's kind of a lot of details to that. But I'll spare you some.

[00:25:34] SF: Yeah. Can you comment on the challenge around figuring out like where the events are going? How does Hookdeck essentially help me solve that problem? And are there situations where I find out that maybe the events aren't going to the right place and I can make an adjustment? Or how do I sort of handle those error scenarios?

[00:25:55] AB: Yeah. For sure. One of the core concepts of Hookdeck is connections. And a connection is this relationship from sources, event producers, to destinations, event consumers. And in the case of Hookdeck, it works as a kind of HTTP proxy. All the events come in through HTTP requests. And all the events go out through HTTP as well. This HTTP endpoint will be calling your server – what you would usually have used as your URL to receive those events without Hookdeck.

Those connections are kind of like mapped out for you. We have those kind of relationships clearly mapped out for you kind of like in the dashboard and so on. You can see like, "Okay, those sources are connected to those destinations." And we let you express that. And we let you express those relationships whichever way you want. You could say, "Okay. I want to see which destinations all the sources are going to." Or the other way around. What are all the sources that are feeding into my specific destination?

And then from there, another thing that's fairly novel when it comes to event-driven architecture is that we give you a full list of what those actual events are. In that list of events, you can see, "Okay, what are the events? What's the data? What's the headers? What's the body? What's the request? Where did it go? To which HTTP URL? With what data? What's the response from that URL?" And so on. Nothing is kind of like obscured from you. It's not just a message in your queue that you really don't know what it is until you remove it from your queue. It's clearly just like shown to you.
At any given point in time, you can also change who that consumer should be. For a specific destination, the URL is whatever you've set it to. If you change that URL, then the change applies kind of like automatically. And any new delivery will now go to that new URL. That means you have like a single pane where you can literally see every single piece of data that went through it. What happened to it? Where did it go? And what are all the other connections that it would have received the event from? And I think that's something very powerful that you don't really get elsewhere.

[00:27:49] SF: What can you tell me about what's actually happening behind the scenes at Hookdeck? Let's say a webhook fires an event from a third-party service. Hookdeck receives the event. What happens next, essentially?

[00:28:02] AB: Yeah. We've built the system in kind of like a very decoupled way. What that means is data comes in, and that's how we can kind of guarantee that we're going to have very high uptime and very high ability to process those events as they come in. And then there's no relationship whatsoever to the event life cycle itself. What that means is that there's no shared dependency between kind of receiving events and processing events.

The event comes in. It goes through Hookdeck's routing logic. We determine, for this specific event, what are all the eligible destinations of where this could go? And that is based on the conditions that you set. That's based on your filtering rules, basically. Let's take the Shopify example again. If there's a product update and the inventory count in that product update is equal to zero, then this product just went out of stock. And then, therefore, that might go to destination A or destination B based on those conditions.

The routing rules are applied. The transformations are applied. The delays are applied. We check whether or not this connection is paused or not. And then the first delivery attempt happens.
One of our kind of big objectives is now to make sure that this first delivery happens really fast. And then based on the result of that, those events are now scheduled for future deliveries. If there's a failure, it's going to get scheduled, let's say, in five minutes, depending on what your retry logic is configured to. And at that point, Hookdeck basically becomes a queue.  What that means is that all those events are scheduled for delivery either because you've already received too many or because they're being bulk retried or because you just un-paused and so on. Now it gets stacked in that queue. And Hookdeck works as an HTTP push queue instead of a pull queue.  You might be familiar with that. Google Pub/Sub for instance does have a push-based mode and so on. In a push-based model, what's super interesting is that the producer, the queue itself, kind of becomes responsible for the rate. The queue determines the rate. And what that means is that you don't need long-running services to consume that data. That could be like serverless functions. It could be any number of workloads that you're running. But you don't need processes that will be pulling from that queue.  And what that allows you to do is also have like a very good understanding of what the consumption rate for any given queue is. Because it's no longer kind of like a factor of how many workers you have. In kind of like a standard EDA model, you're thinking about, for instance, each one of my workers can maybe process like 100 a second. I have five of them. And then, therefore, I'm probably consuming at about 500 a second.  We flipped that a little bit where now it's Hookdeck's responsibility, the gateway's responsibility, to make sure that the speed of delivery is set to whatever you have it configured to.  [00:30:50] SF: Given that you are building up these queues, I can pause them indefinitely. 
How are things essentially built so that you can securely deliver those events whenever someone needs them and make sure that nothing happens on your end in terms of an outage that leads to like loss of events? How are you sort of managing the guarantees around event delivery?  [00:31:13] AB: Yeah. To come back a little bit to what I was saying around us deciding to kind of like decouple the ingestion from the event life cycle, we made an engineering tradeoff here. And the tradeoff is basically to say, "We'll always want to optimize for data integrity over data latency." And I think it's all well and good if you could get the highest integrity, the lowest latency, and all that. But you kind of got to start picking and choosing somewhere.  The system is engineered in such a way that if things were to go wrong, the consequence is that the latency will go up instead of the data being lost. And that's really kind of like the philosophy with which we build things.  When it comes to kind of like the security – ultimately, Hookdeck will retain the data for a certain duration. Kind of like a retention window. And we retain data regardless of whether or not it's acknowledged yet. Kind of success and failure. And because that data is like strongly persisted, that means two things.  First of all, you can audit what happened and what didn't happen. And also, you can always replay it. Even if your system for some reason said it processed the data but didn't actually process the data, you can go back to it and either fetch that data directly, because we have it kind of strongly persisted, or you can have it redelivered. And I think the combination of those things makes it – you can be like very confident of what you received and what you haven't received. And you can check whether or not that's the case.  
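The pull-versus-push rate reasoning from a moment ago can be made concrete with a small sketch. This is purely illustrative arithmetic, not any real API:

```python
# Worked version of the pull-vs-push throughput reasoning. In a pull
# model, consumption rate is an emergent property of your worker fleet;
# in a push model, the gateway delivers at whatever rate is configured.

def pull_rate(workers: int, per_worker_per_sec: int) -> int:
    """Pull model: throughput is a factor of how many workers you run."""
    return workers * per_worker_per_sec

def push_rate(configured_per_sec: int) -> int:
    """Push model: the queue/gateway itself determines the delivery rate."""
    return configured_per_sec

# Five workers, each processing about 100 events a second, consume at
# roughly 500 a second.
print(pull_rate(workers=5, per_worker_per_sec=100))  # 500

# With push delivery, the rate is simply whatever the gateway is set to,
# independent of how many consumer processes exist.
print(push_rate(configured_per_sec=500))  # 500
```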
I don't know if you – when you talked about security, I don't know if you also kind of referred to the transmission security itself and the signature verification and all that. Happy to get into this too. But –  [00:32:46] SF: Yeah. I would assume that your – basically, customers are able to offload some of the guarantees around making sure that the person sending the event is the person that actually should be sending the event versus someone who is doing it for like a nefarious reason. [00:33:00] AB: Yeah. Absolutely. And I think like security in this context is almost a bit of an odd topic. Because webhooks are not really like a standard. Or events in general are not really standardized. You're going to have a lot of different mechanisms that people are using, or kind of schemas and so on, for doing that verification.  Our approach is – like I think we feel like a big sense of responsibility in trying to make that simpler. Hookdeck itself has built-in verification for a lot of different sources. That might be kind of the generic ones that you expect, like API keys, basic auth, and so on. But we also implement custom signature verification for almost any big kind of provider you can think of out there.  And the way this works is then Hookdeck will re-sign any of the data that we get with a Hookdeck signature itself. What we do is we standardize it. Where data comes in with any different security mechanism, we do that verification. And then we send it as Hookdeck. And now the only thing your system needs to care about is that Hookdeck sent it. It doesn't need to care about was it Shopify, or BigCommerce, or WooCommerce? And is it valid or invalid? And so on. You can just solely rely on the Hookdeck signature.  And I think like what's great about this is like as you add new use cases and now suddenly you need to receive webhooks from a different platform, you don't need to like take this burden of making sure that now it's secured. 
Specifically, Hookdeck uses Hookdeck to receive webhooks, right? No surprises there. And we have this like generic handler in our server code where basically it verifies that any given request that's coming from Hookdeck has a valid Hookdeck signature.  That means now I can add a new provider. I just added a new payment provider. And for that new payment provider, you don't really need to start paying attention to what that security schema is and so on. And you can just like go live knowing that it's going to get verified because of the Hookdeck signature.  There's also things like static IPs and so on that kick in there too. We can deliver all requests from expected IPs and those sorts of stuff. But that is kind of like extra security so to speak. [00:35:04] SF: I think that's a really nice value add to standardize that. Because I have seen people really struggle with – if you're using some sort of like HMAC signature, getting that right can be like a real area of frustration when you're just trying to get up and running with this API to make sure that it works.  [00:35:21] AB: Yeah. And at this point, we've seen it all. There's people coming to us like with all sorts of crazy verification. And the vast majority of them are not that bad or like fairly standard. Especially if you're already familiar with HMAC and like the semantics of that and so on.  But every once in a while, we're going to get one where it's like what the hell were they thinking? I think Zoom webhooks come to mind. AWS webhooks come to mind. Google Cloud comes to mind. There's definitely a handful out there that want to make sure that you're going to have a very difficult time verifying the authenticity. [00:36:02] SF: Yeah. Even within the cloud providers, their individual APIs use different methods for essentially verifying the webhook event. 
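The generic handler described here — verify one gateway signature instead of every provider's custom scheme — can be sketched with a standard HMAC check. This is an illustrative Python sketch; the header name ("x-signature") and the HMAC-SHA256/base64 encoding are assumptions for illustration, not Hookdeck's documented scheme:

```python
import base64
import hashlib
import hmac

# Illustrative sketch of a generic webhook signature handler. The header
# name ("x-signature") and HMAC-SHA256/base64 encoding are assumptions,
# not Hookdeck's documented scheme.

SIGNING_SECRET = b"my-signing-secret"  # hypothetical shared secret

def sign(body: bytes, secret: bytes = SIGNING_SECRET) -> str:
    """Compute the base64-encoded HMAC-SHA256 of the raw request body."""
    digest = hmac.new(secret, body, hashlib.sha256).digest()
    return base64.b64encode(digest).decode()

def verify(body: bytes, headers: dict) -> bool:
    """Verify the signature header against the raw body.

    compare_digest avoids leaking timing information to an attacker."""
    expected = sign(body)
    received = headers.get("x-signature", "")
    return hmac.compare_digest(expected, received)

body = b'{"event": "payment.created"}'
headers = {"x-signature": sign(body)}
print(verify(body, headers))         # a matching signature verifies
print(verify(b"tampered", headers))  # a tampered body fails
```

The key property is that every inbound request, regardless of which upstream provider produced it, is checked by this one function against the gateway's signature.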
Even if you're consuming only AWS APIs or only Google Cloud APIs, you still have to end up writing like a couple different verification methods, which is really kind of annoying and frustrating. [00:36:22] AB: Yeah. And security is kind of like a fine line too. We've had like a long conversation actually on this that we published on YouTube with Hookdeck's CTO and me. But it's a fine line. Because if you make it too complicated, especially for something that's optional, like verifying the authenticity of the event coming in, suddenly you run the risk of people not verifying. Now you got to ask yourself the question, "What's better?" A more straightforward verification that maybe is marginally less safe than like whatever other approach but that like half the people won't implement? Or the one that's super straightforward, everybody understands, and everyone will implement? Yeah. You got to ask yourself that question. But hopefully, with someone kind of like abstracting that complexity away from you now, you don't really have to think about it anymore. [00:37:06] SF: Can you walk me through sort of your process where you pivoted towards an event gateway away from sort of the pure webhook management original vision of the product? I guess how did you think through that? How did you make that choice? And how has that impacted maybe your go-to-market?  [00:37:25] AB: Yeah. It's funny because I take issue a bit with the word pivot. And the way that I think about this is, really, we repositioned. And now that might sound pedantic. But let me explain myself. The product really didn't change. The product remained the same. And I think really what we did is we looked at the way people were using the product and we decided to spell it out.  Because before then, users would already use it as an event gateway. Obviously, no one would really call it that. Although, actually, a couple of users did call it that, with similar terminology, a few times. 
The users would kind of like use it in different ways that weren't specifically receiving webhooks. Engineers are clever. They figure things out.  If I can be lazy and kind of use this tool to solve my problem despite their marketing not saying that I can use it to solve that problem, why do I care about the marketing? I'm just going to do what makes sense to me as an engineer. And like I think a lot of people kind of picked up on that and it got to such a point – there's one specific week that I remember pretty well. Because we had multiple users ask us to support like custom URLs to receive their webhooks. White-labeling their URLs. Hookdeck will generate like an hkdk.events URL, kind of like a custom URL for each one of your sources. But now people want their brand on there. And it's like, why do you care what your brand is? You're going to give that URL to GitHub. It's not that GitHub really cares what your URL is.  And when we started probing, those people were embedding those URLs in their SDKs. And it would like show up in the network tab and so on. Or they would give those URLs to their users. Or make them user-facing in some way. And that like made it really clear. But we got something like 25 requests for a custom domain in the span of like one or two weeks where it's just like, "Okay, I think we got to start listening to them."  Those kinds of use cases kept coming up and creeping up. And more and more of the event volume they were dealing with just didn't neatly fit the definition of webhooks. It took some time to reconcile like what's really the common denominator of all those use cases that people are building on.  And I think when we landed on event gateway, it was just very clear to us but also to the users that we talked to that that was the right framing for how they were using it. In that sense, we changed a lot of our marketing. It influenced our product roadmap for sure. 
But at the end of the day, it's the same product where we're like now, "Hey, not only can you receive webhooks. But you can do all those things. And, really, you should be thinking of this as like a new cloud infrastructure primitive whose capabilities you can build your product experience on top of. And we want to make it clear that this is not necessarily like single purpose. It's really more this like set of tools for you to use."  [00:40:11] SF: Mm-hmm. Yeah. Speaking of the white-labeling domain, we actually had a very similar situation at my day job, Skyflow. Completely different product. But, essentially, our SDKs – we have frontend SDKs that can show up in that network tab. And you can see essentially that some company that you might be using is using Skyflow to, I don't know, accept a credit card or something like that.  And then a lot of companies want that to look like their brand. I completely get that. But that wasn't something that we started out with as a feature. It essentially came to us from the ask of the customer. And then we had to figure that out and build it for them.  But going back to like this idea of the event gateway being this new primitive for cloud infrastructure, does that then create like a new challenge in terms of like you have to teach the market essentially that this thing exists, what it's used for, and what problems it helps address? And that's kind of the standard problem of creating a new category. There's pros and cons where sometimes it's necessary. And if you can win the category, it's fantastic. But you have that sort of uphill battle at the beginning to essentially teach people and teach the world that this new category of product makes sense and is something that can be helpful for them. [00:41:21] AB: And I think like in the prior positioning that we had, ultimately, we were still trying to create a new category. I think we learned a lot doing this. Creating new categories is inevitable. 
Otherwise, we'd have nothing new, right? At some point, someone has to build a new category. It just has to happen. Yeah, sure. Are you taking on kind of like an additional challenge? You probably are. But you kind of need to do it. Well, not necessarily everyone. But some definitely need to do it.  In that sense, yes. And I think like the burden of education kind of falls upon you. But one of the big differences when you choose to go kind of like establishing a new category, and I think maybe a mistake that we made in the first place and now try to correct for, is that you want like strong proxies. You don't want to just like be coming out of the blue with something completely new that has no competition and no proxies or like things to compare it to and so on.  And I think it's really important for us to do this like proxy to the API gateway, to the event gateway that we were talking about earlier. Because now that helps like anchor, in engineers' and software developers' minds, the set of responsibilities and kind of like where in the infrastructure spectrum this is supposed to live.  And now I also kind of welcome competition. I think as founders, there's this natural tendency to not like competition. Because competition seems to be in your way. But like really if you think about the big picture, it's like, well, anyone that helps establish this as a concept is more than like kind of welcome. Because I think the big challenge is, yes, build a good product, and build a good offering, and build a good marketing motion and all that. But I think also the bigger mission is to like say, "No. Wait. Hold on. This is a big problem area. There's no specific tools to solve that problem. Let's establish that as a new set of tools. And then if we do a good job, have a competitive product and a good product, we'll naturally be winning in that category."  And there are existing products kind of already being an event gateway. I mean, EventBridge. Right? 
EventBridge, event gateway. Come on. More or less the same thing. I think there just really wasn't any real kind of direction for where this goes next.  And I think that's a little bit where we see our responsibility of like taking inspiration from what was already done. Doing that proxy but then saying like, "Okay, where does this go in the future? And what's lacking from the current ideas and the current implementation for this vision to be fully realized?"  The teaching part I think for us is like the burden to make a credible point. And, obviously, talking with you is part of this. There's kind of no hiding that. But then also encouraging the other people also competing in that market to say, "Listen. We're all in this together. If we want people thinking of the event gateway as cloud infrastructure they have to go and buy, some of them will go and buy Hookdeck. And some will go and buy yours."  I think it's important for us to like do those kinds of partnerships, and those collaborations, and so on with everyone that kind of stands to benefit from that. And it's not just the other event gateways. It's also all the event producers and the event consumers.  For instance, we're building now a middleware for Vercel and things like that, where there's like all those different players whose users themselves are struggling with this. For instance, now we're referenced in GitHub's documentation, and in Okta's documentation, and so on. And it's because like they know their users are also having that problem. In that sense, it almost like becomes kind of like a market responsibility to say, "Well, okay, there's this set of problems. There's now solutions for it. Let's spell it out."  [00:44:59] SF: Yeah. I mean, it sounds like those are great sorts of ways of validating that it makes sense to have essentially this new category. I mean, also, you said that some of your existing customers were already kind of referring to it as an event gateway. 
Going back to the idea of having a bridge or sort of proxy to another type of technology so that people can assist in creating those mental models for what you're trying to do.  [00:45:24] AB: I think too on like the teaching side of things, we also have another kind of huge benefit to lean on, which is people have been building some form of this for the last decade. And so, that means most engineers that we talk to, they're actually pretty familiar with the problems. They lived through it. There's a moment in their past where, kind of like my story earlier with the bus tickets, an API went down or they spent way too much time building this and so on. And then, therefore, it's more like doing the translation of, "Hey, remember that problem that you had? Well, now there's like kind of solutions to this." Right?  But we have something very tangible to lean on and like a real pain that people have lived through. And I think that's definitely really important for establishing the event gateway as this primitive. [00:46:09] SF: Yeah. Makes sense. Awesome. Well, as we start to wrap up, is there anything else you'd like to share?  [00:46:15] AB: Well, I'd love for everyone to like go check out Hookdeck.com and get in touch at alex@Hookdeck.com with any feedback or questions. We're more than happy to help you with like solution engineering and kind of like thinking through where that could live in your infrastructure and how it would help you solve your problem. Please reach out. I'd love to hear from you. [00:46:33] SF: Awesome. Well, Alex, thanks for coming back. This was really, really interesting. And I'm interested to see where it goes. And, hopefully, we'll have you back another year from now.  [00:46:41] AB: I'd love to. Yeah. The yearly check-ins.  [00:46:44] SF: Yeah. Exactly.  [00:46:45] AB: Thank you.  [00:46:45] SF: Yeah. Cheers. [00:46:46] AB: All right. Take care. [END]