[0:00:00] L: How do you cache your application? There are hundreds of techniques to satisfy dozens of different types of caching scenarios. Database caching, caching the response to queries to your database, is an essential part of any caching strategy. PolyScale offers an innovative approach to database caching, leveraging AI and automated configuration to simplify the process of determining what should and should not be cached. Ben Hagan is the Founder and CEO of PolyScale, and he is my guest today. Ben, welcome to Software Engineering Daily.

[0:00:39] BH: Thanks, Lee. Thanks for having me. Good to be here.

[0:00:42] L: Great. I'm glad you are here. Full disclosure, you and I have worked together on a couple of projects in the past. I remember when I first started working with you, I was thinking PolyScale was just another caching mechanism, and you had asked me to write a white paper for you that compared PolyScale to Redis. I didn't really know anything about it until I started doing some research. As I did the research on PolyScale, I became more and more impressed with what you guys do. It really is a lot more than just a simple cache, like a Redis cache. There's a lot more to it, but very focused on databases. I'm wondering if you could give me an overview of what PolyScale actually does for you?

[0:01:28] BH: Yeah. No, very much so. I think if you were to summarize PolyScale in a single sentence, it's a fully autonomous distributed database cache. There's a lot in there. The fully autonomous part is, like you say, we take a different approach to caching, where, I think, most people and certainly lots of engineers have used traditional caching solutions in the past. It's easy to become overwhelmed with complexity. You're typically starting with a blank canvas and working out what to cache, how long to cache it for, and all of the other complexities that come with building a caching solution.
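For contrast, the hand-rolled approach Ben describes here – a developer working out what to cache and guessing TTLs from a blank canvas – might look like this minimal cache-aside sketch. All names are hypothetical illustration, not PolyScale code:

```python
import time

class CacheAside:
    """Minimal manual cache-aside store: the developer must guess the TTLs."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def get(self, key, fetch, ttl_seconds):
        entry = self._store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]  # hit: entry is still fresh
        value = fetch()  # miss or expired: go to the database
        self._store[key] = (value, time.monotonic() + ttl_seconds)
        return value

cache = CacheAside()
# The 300-second TTL is a pure guess -- exactly the tuning burden
# Ben says PolyScale automates away.
row = cache.get("user:5", fetch=lambda: {"id": 5, "name": "Ada"}, ttl_seconds=300)
```

Every key needs its own hand-picked TTL, and nothing in this pattern adapts when the underlying data starts changing faster or slower.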
That fully autonomous piece is really what we're focused on here: making the platform decide for you what it can cache and how long to cache that data for, in its simplest terms. The whole focus is being completely plug and play. That's really what we wanted to get to. The second part of that is that PolyScale is, as you mentioned, fully focused on just being a database cache. With solutions like Redis, you can obviously cache pretty much any type of data that you want to. PolyScale is fully focused on being just a database cache. We use a sidecar architecture to plug into infrastructure. Then the final component of that is we're fully distributed. You can run PolyScale in a single environment, a single node, or you can have it in a fully distributed environment. At PolyScale, we run our own edge network to support different workloads. Or if you're self-hosting, you can build your own multi-region network. Yeah, really the focus is to make a fully autonomous, plug and play database cache that you can deploy literally in minutes, rather than implementing from scratch and writing code.

[0:03:26] L: Cool. Yeah. Let's dig into the term sidecar architecture, since different people have different definitions of what a sidecar means in different contexts. A sidecar in Kubernetes is different than in other places, with different meanings. Why don't you tell us exactly what you mean when you say the cache is inserted as a sidecar?

[0:03:48] BH: Yeah. If you consider a basic architecture of maybe a traditional application server with a database back-end, and there's traffic passing back and forth between those, typically TCP-based traffic, PolyScale is a completely standalone component that sits between those two. It's effectively a transparent proxy that inspects all of the data, all of that TCP data that passes back and forth between the application and the database.
From an architectural perspective, that sits alongside the application. PolyScale is a completely separate component. It's external to the application and it's external to the database.

[0:04:30] L: You're reading and interpreting the SQL statements here, is essentially what you're doing?

[0:04:33] BH: Exactly. That's right. What we actually do is, PolyScale operates down at the TCP layer – layer three, layer four – and we are wire protocol compatible with various databases. For example, we support PostgreSQL and MySQL, MariaDB, MS SQL Server. We've got MongoDB coming soon. What that means is that you can plug this in and it will inspect that traffic, and you don't have to write any code. There are no libraries to install. You don't have to change your queries, or anything like that. Literally, the traffic passes through PolyScale transparently. Then what we do, under the covers, is take a look at what that traffic is that's passing through. If you're using a SQL database, like some of the ones I just mentioned, we inspect those actual SQL statements and we work out what they're doing. At the highest level, what are these SQL statements doing? Are they reads or writes? That's really the first thing we look at. Is this a read query? Is it a select, or a show in the SQL world? Or is it a manipulation query? Are we inserting, updating, or deleting? Depending on the answer to that first question, we then do different things with the data. In the read scenario, a whole bunch of metrics get extracted from that SQL data, including how often the queries are arriving at the platform, and how frequently the payloads are changing from the database. All of those data points get fed into our AI engine, which builds statistical models on every single unique SQL query. Using that data, we can then determine, is it a good candidate for caching? Is there a use case here where we're seeing repeat queries?
Can we confidently cache those, knowing that the database is unlikely to change in the time period that we're setting the cache for? That's the read side of things. What that means, in its simplest form, is that you can just plug in – if I send a brand-new query to PolyScale that it's never seen before, I'll get a hit typically on the third request. Then if I send a query to PolyScale that's similar to another query, so let's say it's got different parameters, for example, I'll get a hit on the second request. It takes those properties into consideration and then builds and learns based on what that traffic looks like.

[0:07:16] L: You say learn, but what is it learning from the queries? You're not just caching a specific query. A basic dumb cache would take the query, read the result, cache the results for that specific query only, and only return that result if the exact same query came again, assuming it was never flushed, or whatever. Writes to that table probably flush the entire cache. I mean, that's a real simple caching approach. But what you do is not only are you intelligent about deciding when to flush the cache, but you also are anticipatory about related queries, and figure out what possible queries might be coming and cache those ahead of time. Is that correct?

[0:08:06] BH: Yeah, that's right. There's a few bits in there. Yeah, when a query comes in, we share the knowledge that we learn about queries. We extract their parameters. If we're parsing the SQL query, we'll extract the SQL parameters from that and we will call that a SQL template. That gives us the core of a specific SQL query. We build intelligence at the template level, which is the query without the parameters, as I mentioned, and also down at the semantically unique query level as well. We have these two levels of intelligence around how those queries are behaving.
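The SQL template idea Ben describes can be sketched roughly like this: strip the literal parameters out of a query so that queries differing only in their values map to the same template. This is an illustrative regex-based approximation; a production system like PolyScale would presumably use a real SQL parser rather than regexes:

```python
import re

def to_template(sql: str) -> str:
    """Replace literal parameters with placeholders so that queries that
    differ only in their values share one template."""
    sql = re.sub(r"'(?:[^']|'')*'", "?", sql)      # string literals -> ?
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)   # numeric literals -> ?
    return re.sub(r"\s+", " ", sql).strip()        # normalize whitespace

# Two semantically different queries...
a = to_template("SELECT salary FROM employees WHERE user_id = 5")
b = to_template("SELECT salary FROM employees WHERE user_id = 10")
# ...collapse to the same template, so cacheability learned from one
# can be shared with the other.
```

This is what lets behavior learned at the template level carry over to a brand-new query that has never been seen with those exact parameters before.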
What that allows us to do is – we parse the SQL query and we also understand what tables and columns and rows and fields are actually affected by those queries. On the write side of things, when a write comes in, firstly, it just passes straight through to the database. Obviously, we don't cache writes. They pass through to the database. What we do do is inspect those, understand what data is actually changed by those update statements, or inserts, or deletes, and automatically evict that data from the cache – just the changed data. We keep this relationship between the queries that are doing reads and what data they're affecting, and the queries that are doing writes and what data they're affecting. We're intelligently able to invalidate the cache for just the changed data that's coming through the platform.

[0:09:43] L: Okay, now you've got large language models that do this AI selection criteria for you, to determine how to read this. Is that correct? Talk a little bit about the AI behind how this works.

[0:10:00] BH: Yeah. We don't use LLMs. We actually build quite complex statistical models around all of the queries and how the queries are behaving. Like I've mentioned, there are those different tiers of introspection. We'll look at things at the SQL level, the raw SQL query. We look at those at the template level. Then there's all of the behavior around the queries as well. That all feeds into the models to determine how cacheable they are, and if we do cache them, when do we invalidate the data? Let's just say, there's a whole bunch of inputs. The core ones are things like, as I mentioned before, how often are the queries actually arriving at PolyScale? Are we seeing a hundred a second, or one an hour, right? That is a major influencer as to what's happening. Then secondly, we inspect how often the payloads that are coming back are changing. We can go and look at the results of those queries.
When they come back, as we store them in the cache, we determine, has that payload changed? Is that now a different result set for the same query? Again, the frequency of change feeds into the AI models.

[0:11:16] L: Okay, so they're more statistical than large language models. That makes perfect sense. Talk a little bit about the reliability of these statistical models. There's a couple of angles to look at. One is the likelihood of getting a hit, and hit versus miss ratios and things like that. The other one is accuracy and correctness of a hit, right? False hits can be just as much of a problem as too many misses. In fact, it could be more of a problem from an application standpoint. Using predictive analysis, how do you keep incorrect hits down?

[0:12:04] BH: Yeah. We use two specific methods. One, as I've mentioned before – we know if a response to a query has actually changed since the last time we served it. We know that that's now changing. That's a statistics-based invalidation model, a time-based thing, and it always errs on the side of correctness over performance. We'll always err on the side of, let's invalidate if we're not sure.

[0:12:48] L: You'd rather have a miss than a bad hit.

[0:12:51] BH: Exactly. Yeah. That fits most use cases. Then you couple that with what we call smart invalidation, which was the piece I mentioned before: if we can actually see those writes coming into the platform, then that feeds even more accurate data into the platform. The nice thing about that is we can effectively set an unlimited TTL. We can cache that forever, as long as we can reliably see the invalidations. There you can get very, very high hit rates if we're seeing those invalidations. Now, there are use cases whereby things can be updating and manipulating the database that PolyScale can't see.
Take, for example, maybe a travel use case, where you've got lots of bulk imports coming in from multiple different data sources. They may be going through channels that PolyScale can't see. It's not connected through a typical web application, or direct TCP connections to the database, for example. In those situations, we support importing a CDC stream. We use Debezium to bring in a change data capture stream. That can pipe straight into PolyScale and feed the same invalidation pipeline that we use when we observe the queries ourselves. Exactly the same design.

[0:14:16] L: Cool. Cool. That's how you keep from inaccurate hits. What about increasing hit rates with predictive queries? You'll pre-cache queries expecting to receive a query. Why don't you talk about that mechanism a bit?

[0:14:36] BH: Yeah. What we do there is, as I've mentioned before, we look at similar queries. It could be a semantically different query, with different parameters. Let's say, select salary from table where user ID equals five. It's the same type of query as select salary from table where user ID equals 10, but semantically different. We build intelligence at that level, and then we can share it across the queries. That means that if we see a brand-new query coming in that's semantically different, we already know that it's similar to something we've seen in the past, and we can go ahead and reliably cache that and get high hit rates, because it's unlikely to be changing, based on the knowledge from the other queries that are similar to it. You could have high cardinality queries coming in and still get very high hit rates, because of that intelligence that's shared between them. Then the other part, and I think the piece you're mentioning as well, is that what we want to do is be able to expose a completely predictive environment. This is really the personalization use case.
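The smart invalidation Ben described a moment ago – watching writes pass through and evicting only the affected cached reads – could be pictured like this, simplified to table-level granularity for brevity (PolyScale tracks the changed data at a finer grain; all names here are hypothetical):

```python
import re

cache = {}           # cached read: sql text -> result rows
tables_to_keys = {}  # table name -> set of cached sql keys touching it

def record_read(sql, table, result):
    """Store a read result and remember which table it depends on."""
    cache[sql] = result
    tables_to_keys.setdefault(table, set()).add(sql)

def handle_write(sql):
    """On INSERT/UPDATE/DELETE, evict cached reads touching the same table.
    The write itself would pass straight through to the database."""
    m = re.search(r"(?:INSERT\s+INTO|UPDATE|DELETE\s+FROM)\s+(\w+)", sql, re.I)
    if m:
        for key in tables_to_keys.pop(m.group(1), set()):
            cache.pop(key, None)

record_read("SELECT * FROM orders WHERE id = 1", "orders", [("widget", 3)])
handle_write("UPDATE orders SET qty = 4 WHERE id = 1")
# The cached read on `orders` has now been evicted; reads on other
# tables would be left untouched.
```

The CDC path through Debezium feeds this same eviction logic for writes that never pass through the proxy.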
All the time, we're logging into personalized apps, and I don't know about you, but certainly my experience of logging into my cellphone provider, or my iTunes account, or my bank, or whatever it is, is that the performance, because it's all personalized, really suffers. What we're working on is the ability to personalize and cluster those queries. We know you log into your cellphone account, your cellphone provider. We know you're likely to go and do your typical behavior, which may be to see your latest bill, or look at your latest SMS messages, or whatever it may be, and we preload those and preempt that.

[0:16:32] L: Or the rows from other tables that have the same account ID, for instance?

[0:16:38] BH: Exactly. That really changes the dynamics of what a cache is, I think. One of the things that we think about a lot is that if you think about data distribution and different access patterns and query patterns, they typically vary wildly depending on use cases. I think that's just another use case whereby we want to be able to preload that data in. We know the access patterns of that user individually, or we can crowdsource that across all the users and work out statistically what their most likely path through the site will be, and then cache that data where it needs to be. Right at the edge, right next to the user.

[0:17:19] L: Let's assume I have a standard application, which of course, we all know is a well-defined and well-understood thing.

[0:17:27] BH: Indeed.

[0:17:28] L: I say that sarcastically, of course. But, no. I mean, you can make up whatever shape you want and I'll go along with it. I just want an idea as a customer with nothing unusual – an application like an e-commerce site. That's a good example. Let's just do that. What type of optimization could I expect using this algorithm? Are we talking about 10% here?
Are we talking about higher hit rates, and then what type of hit rates and what type of ultimate performance optimizations, realistically, do you expect your customers to see?

[0:18:07] BH: Yeah. There are two bits. I mean, just taking a step back, there are two focus areas of PolyScale. What do we solve? One is just the raw cache hit performance, the query performance. PolyScale will serve any cached SQL query sub-millisecond at massive concurrency. You've got the sub-millisecond execution of a query. Then the second part is you've got the latency to go and actually execute that query network-wise – speed of light. You could, for example, have your database hosted in US East and you've got customers in US West, and you've got latency between the two. Those two primary focal points are what we do. If you take an e-commerce application, it's a great candidate for caching, because obviously, it's very read-centric. The other thing about e-commerce is there's a direct correlation between dollars and performance, right? We know if we can serve pages faster, we can serve customers better, we get brand retention, we get faster times to check out, and general dollar increases. One of the nice things about the platform is that in a traditional caching environment, typically, a developer will approach this by trying to accelerate the slowest queries that they can find. Taking that e-commerce application, a developer may say, "Hey, I'll study the slow logs, or whatever it may be, take the top 10 queries, and look to cache those in some way within the application," and build that from scratch. Now, what PolyScale does, obviously, because we look at all of the traffic flowing between the application and the database tier, is it has the opportunity to cache all of the cacheable traffic. We call that the long tail. I can give you an example.
We have an e-commerce customer running on the platform, and there are 10,000 to 20,000 unique queries running through the platform every day. PolyScale inspects and looks at all of those, including things like session tables and stuff you would never want to cache – you want that to be fresh data. There we're getting about 90% hit rates – 92%, 94% hit rates – because there's a huge amount of read traffic in those applications. Again, the cool thing about that is we've done no configuration. That's just plug it in, literally. You can expect those hit rates very, very quickly. They're not things that need to mature over a week, or several days. As I mentioned, you'll be getting hits on the second and third queries. Yeah, I mean, 90% plus hit rates are far from uncommon in those types of use cases, where they are quite read-centric. We debate this internally quite a lot. I've been asked before, what number do you consider a good hit rate? Then I step through use cases that we work with. I think it's impossible to answer, because there's one customer, for example – they're a Web3 gaming environment where they run, basically, complex leaderboards for their global statistics. That's a very small proportion of their overall database traffic, but it's also a critical one to their business, right? It really is a core part of their entire business. Let's say we're getting a 10% hit rate. Is that good, bad, or indifferent? It really does depend on the use case.

[0:21:54] L: Yeah. I imagine with different types of applications and different types of caching needs, you're going to have different statistical analysis that's necessary to get the results. It's one thing if the analysis just returns different results, but are there adjustments to the algorithms that you either can make, or do make, based on the type of application?

[0:22:18] BH: On the automation side, I think the answer is no.
You're running the same algorithm for all types of applications. I mean, there are certain situations that are triggered by certain behavior that you may see in different use cases. Overall, it's a single algorithm that's being used. I mean, the funny thing, I think, about this is that every single query is being analyzed in real-time, always. It's an always-on, always real-time adjusting algorithm. This is the other part of traditional caching, whereby I may, as a developer, set a TTL on a specific query of 15 minutes. Who knows if 15 minutes is actually optimum? Is it 13 minutes, 24 seconds, or is it six hours? We've got no idea as developers. We're just guessing, until you start analyzing that data. That's what PolyScale does 24/7 on every single query – always adjusting the algorithm based on behavior on the site, based on how often that traffic's being requested, based on how much the data's changing in the database. All of that information is always being gathered. That's something that's really hard to do for a human.

[0:23:38] L: Yeah, it is. It is. Yeah, that's just constant database tuning – or rather, constant cache tuning. I'm sorry, not database.

[0:23:45] BH: Exactly, exactly. You can come in and you can set manual TTLs, and you can do that at a query level. You can do that even at a database table level. If you've got use cases where you want to be very specific about TTLs, you can go and override that if you so choose. But I think 99% of our customers never touch it – they're running full auto mode.

[0:24:13] L: I'm intrigued by the use case you mentioned, where you are using the caching to do geo-diverse queries. You have a database in US East, and you're accessing it from EU East, or from US West, or wherever – being able to optimize queries that way. Talk about that use case a little bit, because obviously, there's a much larger impact from caching for those geo-diverse locations.
The results of what is cacheable and what is not cacheable are pretty much the same in US East versus US West, presumably. From a global standpoint, how does the cache coordinate and make all that work?

[0:25:03] BH: Yeah. Think about the tremendous choices people have now for hosting an application. They can host multi-region really easily now. Just with your hyper-scaler, I can pick multiple regions and deploy my app. Or I'm using something like Lambda – maybe I've got that running in a few locations. You've got all those DNS tools wrapped in there as well for latency-based routing. It makes it super easy to distribute your app. It's much, much harder to do that at the database tier, right? You end up with things like read replicas. You start splitting your code to work out what's a read versus a write, and then which node it should hit, and you get a bunch of complexity there. We designed PolyScale to focus on being distributed from day one. There's that use case in what we classify as the hyper-scalers. Then there are the more edge-facing use cases that we work a lot with, with companies like Cloudflare and Cloudflare Workers, and Netlify and Vercel and Deno – the ability to deploy into multiple regions as default. I think the database challenge there – that data distribution – is a lot harder to solve when you're thinking about running across 15 regions, or 20 regions. How do you solve that without having the latency issues? I think the reality is a lot of enterprises just have latency problems. It's slow to connect from certain locations. I think the CDN industry paved the way here – we can serve HTML and binaries and static data really close to people at the edge. PolyScale's doing something similar, but with dynamic data, a step back from where the actual application tier is being hosted. We want to get as close as possible to the app tier.
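As a back-of-envelope illustration of why an edge cache helps here, the expected query latency is just a weighted average of the hit path and the miss path. The numbers below are illustrative assumptions, not PolyScale benchmarks:

```python
def expected_latency_ms(hit_rate, edge_ms, origin_ms):
    """Back-of-envelope expected query latency with an edge cache in front:
    hits are served locally, misses pay the full cross-region round trip."""
    return hit_rate * edge_ms + (1 - hit_rate) * origin_ms

# Assumed figures: ~1 ms for a local edge hit, ~80 ms cross-region round trip.
without_cache = expected_latency_ms(0.0, 1, 80)  # every query crosses regions
with_cache = expected_latency_ms(0.9, 1, 80)     # 90% of queries stay local
```

At a 90% hit rate, the average drops from 80 ms to under 9 ms, which is why the hit rates discussed earlier translate so directly into user-facing latency for geo-distributed apps.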
PolyScale needs to be as close as possible to the application tier to keep that latency low. Yeah, on the edge use cases, we see some interesting challenges for developers. Firstly, things like initiating a new TCP connection is actually pretty slow, with TLS handshakes and things of that nature. Then you've got the actual speed of light, plus network latencies. Then you actually execute the query at the database, and then you return that payload, which will vary depending on the size of that payload. If you drop PolyScale into that mix right next to your application tier, we do things like – we also maintain connection pools. We can maintain a hot connection. You're making a much shorter connection from your application tier, just to PolyScale, which is obviously very much faster than going all the way back to the database. Then with the pooling in place, if we have a cache hit as well, you can initiate a brand-new TCP connection from your app, or your serverless function, or whatever it may be, to PolyScale, get a cache hit, and return that response sub-millisecond, or in one or two milliseconds. It can be very effective. Likewise, that's all tied into the invalidation side of things. When something gets updated in US East, that actually ripples through to every PolyScale location and invalidates globally as well.

[0:28:33] L: How have your customers taken to this? Obviously, you've got something here that's valuable for improving back-end performance of databases for large key customers. But presumably, smaller customers have a lot to benefit from using this algorithm as well. What do your customers say?

[0:28:59] BH: Yeah. I mean, we definitely have lots of small customers, who can be anything from people running hobby sites, blog sites – and maybe they're deploying using Deno, or one of the services I've mentioned before, deploying globally. Why not just plug in PolyScale? That's going to remove, or lower, that latency.
It's going to accelerate performance. Drop it in. No configuration and you're good to go. Then on the other side, the larger customers – what they're really enjoying is that you can literally deploy from scratch in less than 30 minutes. As an example, last week we had a pretty large customer sign up. They were using Cloudflare Workers for a global use case. They serve between about 20 and 40 million requests per day – database queries – and they're doing that using a serverless HTTP API, and they were up and running in under 30 minutes. Getting decent hit rates, deploying globally, and they're hitting 10, 15 regions around the planet across our edge network. It really does solve that challenge very, very quickly. Certainly, developers tell us it's really refreshing to not have to be writing this stuff from scratch. I think that's the biggest win.

[0:30:31] L: Cool. Great. Well, this has been a great conversation, Ben. As always, I loved talking to you about PolyScale. Just as a refresher to everyone, I wrote a small eBook for PolyScale on the comparison between caching with Redis versus caching with PolyScale, and the advantages of the PolyScale architecture. I believe you can still get that eBook from the PolyScale website. Is that correct?

[0:31:00] BH: That's right. Yeah. Yeah, it's definitely available on the site. Yup, free to download and definitely worth a read, I think, Lee.

[0:31:08] L: Yeah, great. I appreciate the conversation here, Ben. This has been great. Once again, Ben Hagan is the Founder and CEO of PolyScale, a high-performance, easily configured and set up database caching service for any application. Literally any application, it sounds like. Ben, thank you very much for being on Software Engineering Daily.

[0:31:33] BH: Pleasure. Thanks for your time, Lee.

[END]