EPISODE 1713 [INTRO] [0:00:00] ANNOUNCER: Redis is an in-memory database that can be used for caching, vector search, and as a message broker. Brian Sam-Bodden is a Senior Applied AI Engineer at Redis. He joins the show to talk about his work in AI at the company. This episode of Software Engineering Daily is hosted by Sean Falconer. Check the show notes for more information on Sean's work and where to find him. [EPISODE] [0:00:34] SF: Brian, welcome to the show. [0:00:36] BSB: Good morning. Thank you for having me. I'm pretty excited to do this. It's probably the first time I'm in a real professional podcast. [0:00:44] SF: Well, thank you. It's good to start off the show with compliments for the host. I appreciate that. So, thanks for being here. Why don't we start off with an introduction for the audience? Who are you? What do you do? [0:00:54] BSB: Thanks for having me. I'm Brian Sam-Bodden, Senior Applied AI Engineer at Redis. I was formerly a developer advocate at Redis, a job that I held for almost four years, which allowed me to basically explore the extroverted side of my personality. I've been attending conferences since the late nineties, so it gave me a paid job that actually supported my conference going. It was really great to basically interact with developers on a frequent basis, and kind of get an idea of what they were doing, the issues they were struggling with. Then, I brought that DevRel advocacy into our AI applied engineering, which kind of gives me a better connection with the community. How they're building things, what they're running into, and how to make those tools better. Before that, I ran a consultancy for almost 20 years. We served the likes of Walmart Labs, Visa, Intuit, and a bunch of other Bay Area and international companies. My venture into AI started in the nineties with basically perceptrons and things of that sort, but it was kind of the beginning of an AI winter, I believe, late nineties. Everybody would basically tell you that AI wasn't going anywhere. Then, I focused on distributed computing. That's kind of what I started doing at that time. For the last almost 30 years, I've been doing everything from server side, distributed computing, to rule engines. That's kind of AI adjacent, and I've been doing rule engine work for almost 20 years. But I have also done tons of front end. I have an artistic flair to my personality too. So, I got into basically building beautiful responsive UIs, and kind of got into the whole UI/UX world for a while, did mobile applications, web applications, and now I'm back on the server side, but I never left. I'm a Java champion, which is basically a developer or programmer that's cheering on and uplifting the platform everywhere we go. So, at Redis, I've been actually doing a lot of the Java in AI stuff. Python is definitely the 400-pound gorilla of AI, and we're trying to also make Java a good representative, or a good candidate, for that data science, MLOps, and ML engineering crowd. [0:03:20] SF: Yes. I mean, you mentioned the AI winter. I also, I was maybe a little bit later to the game than you. It was the early 2000s when I was first sort of doing my machine learning, AI type of coursework, and also did a master's degree in machine learning. Back then, I remember writing a neural network for the first time, and being really - I was kind of blown away by what you could do, but it was also super simplistic, kind of like, toy problems.
Then what ended up happening was there wasn't a lot of research funding for anything that related to neural networks. Most research institutions just stopped investing and doing research in that space, and everything became more about simplistic statistical-based models that you could run within the compute environments that we had available at that time. Naive Bayes classifiers, trees, all these types of things that are fairly low cost to develop and actually execute. Then, we needed to do a lot of work, and we needed certain breakthroughs to happen, essentially, for coming out of the AI winter and into the AI summer, I guess. [0:04:25] BSB: Exactly. I agree, totally. That statistically powered AI era required people that basically were statisticians. Otherwise, if you were just a user of a model and didn't have a strong background in stats, you were basically doing linear regression most of the time. And yet, I actually got into the rule engines world. That kind of was based on the [inaudible 0:04:50] algorithm. It was kind of like what forests and random forests and decision trees are like today. I started seeing from that, basically, how - once we combined these decision tree-based models with statistical models, things were going to advance much faster. It really wasn't until - it felt to me that the AI world was asleep, but people were working on it and making great progress. But once the autonomous driving revolution started, computer vision models were the ones that first wowed us. I mean, I was wowed with like an MNIST classifier back in the day, and then once we saw object detection, object classification and segmentation, that's where I was like, "Wow, I got to jump back onto this world." [0:05:39] SF: Yes. I remember we chatted, I think it was early last year, so a little over a year ago. We met at a conference that we were both at, and you had mentioned at the time that you were sort of transitioning to do some of the AI-focused work at Redis. In a lot of ways, looking back at it, that was fairly early on in this Generative AI craze that we are in right now. So, I'm kind of curious, since you were fairly early into the wave, what are some of your general thoughts on where we are at and how fast things are moving, and some of the work that you're doing in the space? [0:06:10] BSB: Certainly. At Redis, Gen AI has been in the context of Redis as a vector database. In the Gen AI world, RAG has become the predominant architecture. RAG started from super simple, basically just kind of a few guardrails. People were doing stuff with regular expressions, just to pick up keywords and stuff like that, and say, "Oh, now I can get the user that answer." Some semblance of prompt engineering, which has also advanced significantly. But in my mind, in the context of RAG, the retrieval part is something that people jump into without having that information retrieval background. I didn't have that background, but I had a little bit more than a lot of people did, mostly because of my work with rule engines, where there was always a retrieval component over facts and knowledge bases and things of that sort. At Redis, we see that as the retrieval portion, optimizing that to the point that you can mitigate some of the issues with LLMs, like hallucinations and repetitive answers and things of that sort. [0:07:17] SF: Yes. You mentioned information retrieval, which now it's like, what is old is new again.
It's become a huge part of, well, RAG, which is now, in many ways, becoming the default architecture for building LLM-based applications to deal with things like hallucinations, real-time information, make it more context-aware to the problem that you're solving. Actually, my sort of background was more on the information retrieval side, so I did a lot of sort of TF-IDF back in the day and latent semantic analysis and things like that. Now, in terms of RAG, does it sort of surprise you that from those humble beginnings, with using things like a RegEx or something like that, it's become this big thing where there's now a huge number of companies that are specializing in even just small parts of the RAG pipeline? We're just going to work on getting your data ready for RAG. Or we're going to work on storing the data properly and retrieving the data properly, basically breaking up this pipeline into these small, sort of minute pieces and then building companies out of it. Has that surprised you at all? [0:08:19] BSB: Not really. I mean, when you have this type of pipeline architecture, basically, people start focusing in stages on the different nodes of that pipeline. At the beginning, it was just retrieval. Okay, let's just try to find the best information available in our knowledge base to beef up the context that we're sending to the LLM. But now, you're seeing things like, for example, re-ranking. There's companies that just do re-ranking - Cohere does a lot of stuff with re-ranking, and there's other, even more specific companies that focus on re-ranking. That's now becoming a plugin type of feature that you basically will plug into that node of your RAG pipeline. Also, there was the concept that RAG was a linear pipeline at the beginning. Now, we're basically going more into - this is another application workflow. We used to see that a lot in the rule engine world, where people thought the rule engine was going to be the one thing, the heart of everything, that basically controlled everything. But there were things that were somewhat non-deterministic with rule engines, and really hard to test. That is actually magnified with LLMs. The LLM is this thing that can do amazing things, or the dumbest possible thing that you've ever heard of. So, definitely guardrails and a lot of tweaking of the stuff going in and out of the LLM is required. I see this architecture, it's not going to change, it's just going to evolve, basically. I see a lot of the things that LLMs do poorly right now, like hallucinations, probably being folded back into the LLMs, with the LLMs handling those better. I see the retrieval part of the RAG pipeline growing. So, some things will be absorbed back into the LLMs as we progress in that field, things like quantization, for example. By having longer context windows, having smaller vectors, all that stuff is going to impact how much the LLM can also do that RAG was doing on its behalf. So, RAG is going to evolve in some areas, and some of those modules in RAG will probably disappear completely. [0:10:23] SF: Yes. In terms of, where does Redis sort of fit into this emerging LLM stack? Is it essentially focused primarily on the storage and retrieval of vectors? [0:10:33] BSB: So, that is the core power that Redis provides. But Redis is not really new to this game at all. We've been doing search for six or seven years. RediSearch is a search engine that was added to Redis almost a decade ago, and then vectors were added way before ChatGPT came out.
Redis, the core tenet of the company is performance. We might not do all the bells and whistles that everybody else does, but in the ones that we do, we typically beat the competition in terms of performance, throughput, and things of that sort. So, I see, for example, Gen AI on mobile devices, or small form factor devices, and LLMs getting smaller, having a layer that basically enables that performance and throughput and kind of plays along with all the advances in there. Because right now, the LLM is the bottleneck for pretty much every application, whether it's RAG or not. Redis has always been kind of like the band-aid that sometimes you put on bad architectures, and then you say, "Voila, it's working great," even though you architected some monstrosity. The same thing applies in this environment. Redis will always make your application faster as, basically, an in-memory step to keep the data in a format that makes sense for your operational environment. But with vectors, actually, it's even more important, because now you're searching over a very large space of vectors, and you need, for example, retrieval speeds and inference speeds - everything can be magnified with Redis. [0:12:11] SF: What was involved with extending Redis to support like a vector type? When did that actually happen? [0:12:16] BSB: So, that happened about four years, four or five years ago. I could be wrong, but it's around that range. Yes, I mean, adding vectors. Redis stands for remote dictionary server. It's a remote, distributed data structure dictionary. The data structure part means that Redis, as long as you implement it, can host a variety of data structures. It's a key-value store, but the value could be anything that can be implemented with C or Rust. The vectors just became another sub-data type that we added to our JSON data structure and our hash data structure. In JSON, we have vectors that are part of the JSON ecosystem, so an array of floats at different positions. For our hashes, we basically have binary blobs that represent the vectors, which makes vector searches with hashes incredibly fast. So, adding all this stuff took time, took time to basically bolt it onto the search engine, but we found a pretty good compromise of simplicity of usage and performance. But performance was always number one. So, even from the early days, our vector search typically beats the competition. [0:13:40] SF: If I understand correctly, the key in the key-value store, when you're using vectors, is a hash of the vector. Is that right? [0:13:48] BSB: No. In Redis, there's two workhorse data types that most developers use for most use cases. One is hashes. The hash would be the value in the key-value pair. Imagine you have your key and it's pointing to a map or a dictionary in Python. Inside of that dictionary, you support different data types. So, the value in that dictionary could be anything that Redis supports. In the case of vectors, that would be a binary blob. You would annotate a field in a search schema as being a vector field, and then you can store bytes in it. Our libraries will basically take the embedding representation from the embedding provider and turn it into bytes to be stored as one of the values in a hash. [0:14:41] SF: Okay. Can you walk me through, how does vector search work in this scenario? I want to look up, essentially, like, I don't know, the 10 most related objects in my vector database related to some sort of input string that I'm going to create an embedding for.
[0:14:56] BSB: Of course. So, everything starts with basically defining an index. In Redis, you will define an index that is either going to operate on a collection of JSON documents or a collection of hashes. Redis is very schema-free in most cases. For example, if you have a collection of hashes, there's going to be a naming pattern in how those keys are created. In that case, that's what determines the collection. You tell the Redis index, basically, you're going to index this collection of hashes based on this prefix in the key, and these are the fields that we have. So, you can say we have a field that's a numeric field. We have a field that's going to be treated as a tag, which basically means it's good for exact keyword searches. Or we have text, which basically allows the search engine to treat it as a full-text search, a natural language search on that field, just like you do with traditional databases or search engines. Now, the new addition to that was that you can now say this other field is going to be a vector field, and then you can decide how you're going to index it, whether you use HNSW or flat indexing, and then you are also going to tell it what distance metric you're going to use to basically search for those. Now, let's say you loaded all these hashes with data, with their field that basically corresponds to an embedding from a model somewhere. Your search basically does a k-nearest neighbor search on that vector field. So, you would say, given an input vector - let's say that we're talking about text, and somebody gives you an input query. What was the main outcome of the Civil War, or something like that? We would take that input text and vectorize it with the same model that was used to vectorize those fields in the hashes. Then, you pass that input vector to the Redis search command, which is FT.SEARCH, and you describe things like how many neighbors you want to retrieve, the k value in KNN, and what similarity metric you want to use, like cosine or something like that. Once you do that, you're going to get back a collection of those hashes, and you can determine whether you want to get the vectors back or just get the hashes, or just get specific fields, so you have fine-grained control of what you get back from the search engine. You can also stack regular searches on top of that. You can do pre-filtering by, for example, saying there's a field in the hash that's basically the model of a car, or something like that. You can say, I want to do this vector search based on this input vector that, let's say, could be an image. Here's an image of a car. Find me a car that looks like this, but only find cars that are Fords, for example. So, you can do pre-filtering before you do your vector search, and you end up reducing the search space for the vector search. [0:17:49] SF: Then, in terms of how this works in the distributed nature of Redis, do you have to maintain these indices across nodes? [0:17:58] BSB: So, the indices are maintained automatically, basically. On Redis Enterprise, the distribution and all that stuff is transparent. So, when you're using, for example, Redis Cloud or Redis Enterprise, either on the cloud, our cloud, or on-premise, or in a managed environment, all the distribution happens magically under the covers. For developers, for most cases, it's the same as working with your local Redis. [0:18:25] SF: How big can this get? So, Redis is an in-memory data store.
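To make the walkthrough above concrete, here is a minimal sketch of the pattern Brian describes using redis-py against the Redis search module: a hash per document with the embedding stored as a binary blob, an index defined over a key prefix, and a KNN query with a tag pre-filter. The key names, field names, 384-dimension vectors, and random stand-in embeddings are illustrative assumptions, not anything from the conversation.

```python
import numpy as np
import redis
from redis.commands.search.field import TagField, TextField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

r = redis.Redis(host="localhost", port=6379)

# 1. Define an index over all hashes whose keys start with "car:".
r.ft("idx:cars").create_index(
    fields=[
        TagField("make"),                      # exact-match filtering
        TextField("description"),              # full-text search
        VectorField(
            "embedding", "HNSW",               # or "FLAT" for exact search
            {"TYPE": "FLOAT32", "DIM": 384, "DISTANCE_METRIC": "COSINE"},
        ),
    ],
    definition=IndexDefinition(prefix=["car:"], index_type=IndexType.HASH),
)

# 2. Store a document: the embedding is written as a binary blob of float32s.
vec = np.random.rand(384).astype(np.float32)   # stand-in for a real embedding
r.hset("car:1", mapping={
    "make": "Ford",
    "description": "Blue hatchback, low mileage",
    "embedding": vec.tobytes(),
})

# 3. KNN query: 10 nearest neighbors, pre-filtered to Fords, cosine distance.
query_vec = np.random.rand(384).astype(np.float32)  # embed the user query with the same model
q = (
    Query("(@make:{Ford})=>[KNN 10 @embedding $vec AS score]")
    .sort_by("score")
    .return_fields("make", "description", "score")
    .dialect(2)
)
results = r.ft("idx:cars").search(q, query_params={"vec": query_vec.tobytes()})
for doc in results.docs:
    print(doc.make, doc.description, doc.score)
```

The pre-filter in step 3 is the "only find cars that are Fords" case from the answer: the tag clause narrows the candidate set before the KNN runs over the vector field.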
Depending on the size of your vectors, you're going to be constrained by the size of the memory available. Obviously, there is flash memory and things like that that basically use that storage to extend memory, and we provide that in our enterprise offerings. But in most cases, the use cases where Redis fits really well are where speed is a primary issue. Also, we're seeing a lot of novel use cases where our indexes are very cheap to actually build compared to other search engines. We see a lot of ephemeral indexes used. For example, if you log in into, let's say, some kind of chat-based knowledge base for a company that has a hierarchy of permissions, or knowledge that should be exposed to different levels of employee, you can create an index on the fly that basically takes those parameters, does a search against your overall data store, pulls out the data that would only be available to the context of your conversation, and uses that as the knowledge base, an impromptu knowledge base for an LLM-based conversation. So, you can basically have indexes that appear, get used, and then get destroyed. That kind of rides on Redis's use as a cache. For example, pretty much everything in Redis has a built-in TTL, time to live. So, for example, you can say, "Hey, after the conversation is done and five hours have passed, just get rid of all that stuff." Yes, there's a lot of flexibility there for, basically, what we call real-time interactions with Gen AI systems and RAG systems. When we say real-time, we mean real time. We mean very, very, very fast interactions with Redis. [0:20:13] SF: Because of the focus on performance, does this mean that there's specific types of AI applications that make more sense to build on Redis versus perhaps other vector-type stores that exist? [0:20:26] BSB: Certainly, yes. There would be systems where, for example, the indexing, obviously, can go over a larger data set than we could handle in memory. There are architectural patterns to deal with that in Redis, and it's typically kind of like a composition in terms of a hierarchy of data. But yes, there are use cases where other vector databases might be better suited. What we're seeing here is that, for example, nowadays Mongo is a vector database. [0:20:57] SF: Yes, everything's a vector database. [0:21:01] BSB: Yes. I think probably Paradox has been revamped to support vectors, if it's still around, which I don't think it is. But we're still seeing, for example, Redis and Mongo. It's a very common combination, because what Mongo can't do at the speeds necessary, Redis fronts Mongo to handle. So, we see cases like that where, since Redis can store vectors and Mongo can store vectors, you will, for example, offload some of those vectors to Redis for a session-specific interaction and to get that speed that you might not be getting from something like Mongo. [0:21:36] SF: Yes. So, in that case, you're talking about combining essentially Redis with maybe another type of vector store? [0:21:42] BSB: Yes. That's definitely an option, since for ephemeral indexing, in use cases where performance is critical, Redis might still play the role it used to play before, which is a cache for your critical applications. In my mind, when the cache is the main interaction between you and your users, then that cache has become your database. [0:22:06] SF: Yes. You talked about how the indexing automagically happens for a user. But what's going on behind the scenes in order to actually support the vector indices?
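A minimal sketch of the ephemeral-index-plus-TTL pattern described above: per-session hashes that expire on their own, an index created on the fly over just that session's prefix, and the index dropped when the conversation ends. The session prefix, field names, dimensions, and the five-hour TTL are illustrative assumptions.

```python
import numpy as np
import redis
from redis.commands.search.field import TextField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType

r = redis.Redis(host="localhost", port=6379)
session = "session:42"

# Copy only the documents this user is allowed to see under a session prefix,
# and let Redis expire them automatically once the conversation goes stale.
vec = np.random.rand(384).astype(np.float32)         # stand-in embedding
r.hset(f"{session}:doc:1", mapping={"text": "Q3 revenue summary",
                                    "embedding": vec.tobytes()})
r.expire(f"{session}:doc:1", 5 * 60 * 60)            # built-in TTL: five hours

# Impromptu index scoped to just that session's keys.
r.ft(f"idx:{session}").create_index(
    fields=[
        TextField("text"),
        VectorField("embedding", "FLAT",
                    {"TYPE": "FLOAT32", "DIM": 384, "DISTANCE_METRIC": "COSINE"}),
    ],
    definition=IndexDefinition(prefix=[f"{session}:"], index_type=IndexType.HASH),
)

# ... run KNN queries against idx:session:42 for the duration of the chat ...

# When the conversation ends, drop the index; the keys expire on their own.
r.ft(f"idx:{session}").dropindex(delete_documents=False)
```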
[0:22:18] BSB: Yes. So, Redis was always famous for being basically a single-threaded environment where every operation would block the next operation until you got a response from that operation. For indexing, this has changed a little bit. Indexing is actually a multi-threaded process that happens very fast in the background, and we support a few indexing schemes. So, one of them is the hierarchical navigable small world index, HNSW, which is an ANN approach, an approximate nearest neighbor approach to indexing. We also support flat indexing. So, when you want maximum precision, you would use flat. When you want higher performance, you would use HNSW. In most cases, unless your data set is massive, I actually haven't perceived that much of an impact from going from flat to HNSW, but a lot of our customers that are basically dealing with the top of the range of database sizes have experienced that when switching from one to the other. But the precision loss is typically negligible, because you have such a variable environment, which is the LLM, that those perceived precision issues with indexing are typically trivial compared to, for example, the impact that a hallucination or a bad context retrieval might have on the whole pipeline. [0:23:48] SF: What about like sharding in this distributed environment? How does that work for these indices across like multiple nodes? [0:23:56] BSB: So, in Redis right now, the indexes have to live in a single node. That is a constraint right now. Your index size should be able to fit in the memory that that one node has available. But it hasn't been - and again, for the type of use cases that Redis typically is used for, it hasn't been an issue in many cases, and we rarely get a complaint about that as a vector database. [0:24:23] SF: What is semantic caching for RAG applications? [0:24:26] BSB: Now, semantic caching is where I see Redis being the dominant player in the field. So, the cost of basically hitting the LLM is piling up very rapidly for people. You see even YouTube videos where it's like, "Hey, I forgot to cap my usage of OpenAI APIs, and now I have to mortgage my house." So, semantic caching, basically, is the idea that, just like regular caching, if you've seen this query before, and you have the answer cached, give them that answer. Now, the semantic part is that the query doesn't have to be exactly the same query that came in, as long as the meaning of that query is very similar. Obviously, you can fine-tune how close in the vicinity of the original query you want to be to consider it a cache hit. But that is the place where Redis and libraries like RedisVL, which is our Python vector library, can help you mitigate those costs and also improve the performance. I mean, the response coming from the cache is going to come back at Redis cache speeds versus hitting the whole pipeline of the LLM, which is very likely not even co-located with the application. It's probably an API call to OpenAI or some other provider. So, we're talking about several orders of magnitude of accelerating the response to the user. [0:25:49] SF: In the context of like, a copilot or chatbot scenario, where I can ask it essentially anything, how common is it that I can actually get a cache hit when people could essentially ask whatever they want? [0:26:01] BSB: Yes. So, I believe the statistics right now, based on several studies, are that something like 33% of answers could be answered from a cache, which is a pretty high number.
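A hand-rolled illustration of the semantic caching idea discussed above: cache LLM answers keyed by the embedding of the prompt, and serve a cached answer whenever a new prompt lands within a distance threshold of one already seen. RedisVL packages this pattern up as a library feature; this sketch just reuses the vector-index machinery shown earlier. It assumes an index named idx:llmcache already exists over the llmcache: prefix, and embed(), call_llm(), and the 0.1 threshold are placeholders you would supply.

```python
import numpy as np
import redis
from redis.commands.search.query import Query

r = redis.Redis(host="localhost", port=6379)
THRESHOLD = 0.1  # max cosine distance to count as a semantic cache hit (tune per use case)

def ask(prompt: str) -> str:
    vec = embed(prompt).astype(np.float32)           # embed() is a placeholder: same model used for cached prompts
    q = (
        Query("*=>[KNN 1 @embedding $vec AS dist]")  # nearest previously seen prompt
        .sort_by("dist")
        .return_fields("response", "dist")
        .dialect(2)
    )
    hits = r.ft("idx:llmcache").search(q, query_params={"vec": vec.tobytes()}).docs
    if hits and float(hits[0].dist) <= THRESHOLD:
        return hits[0].response                      # cache hit: answer at Redis speeds, no LLM call

    answer = call_llm(prompt)                        # call_llm() is a placeholder: cache miss, pay for inference
    key = f"llmcache:{abs(hash(prompt))}"
    r.hset(key, mapping={"prompt": prompt, "response": answer,
                         "embedding": vec.tobytes()})
    r.expire(key, 24 * 60 * 60)                      # let stale answers age out
    return answer
```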
But I think that is a very kind of broad-stroke opinion. You probably have to look at specific use cases and industries. For example, there is a lot of variability in something that, for example, has a more abstract background. Let's say you have an art collector's chatbot of some sort. Everything in that art world is heavily reliant on opinions, so the variability in questions and answers is going to be very broad. But, for example, in something like a FinTech environment, some answers are going to be so specific, and the questions can be boiled down to a semantic vector that is going to be very close to questions that are similar. So, in environments like that, I think you're going to have a higher cache hit ratio than in environments that are more abstract, opinion-driven types of environments. [0:27:04] SF: So, beyond essentially, constraints around like the cost of inference for an organization, what are some of the other constraints in building RAG applications today? [0:27:12] BSB: The cost savings is typically one of the things that our sales folks will basically bring up immediately, because it's kind of like the decision maker's mindset. But from an engineering point of view, having Redis in your RAG pipeline or your semantic AI application really gives you that reliability and that response time that you expect from just hard-coded code. And that is refreshing when you're building RAG applications, where, even as a developer, when you're testing applications, when you're basically using Redis in that pipeline, everything is just so much snappier, and you progress faster. Remember when, for example, our CI/CD systems were extremely slow and were API-based, and you had to send everything to the one and only provider, or you had to run your own? It became a roadblock to progress. So, Redis enables, by basically providing speed and higher throughput, a better experience as a developer, and obviously it translates directly into your runtime environments. [0:28:25] SF: What about another trend that we're seeing? I think, most recently this year, a lot of buzz around AI agents. In a lot of ways, it feels like maybe this is the next, like, real use case of Generative AI beyond the chatbot experience. Do you see this approach that Redis is taking kind of playing a role there as well? [0:28:43] BSB: Yes, definitely. I mean, all these agents, so the autonomous AI agent systems, they all are going to need a memory and knowledge base for those agents. An agent does a set of tasks, and maybe it's done those tasks before, and the only difference is the input data that was used for the task. There are parts of the agent functionality - the script of how to follow those steps for this specific type of query - that apply the same way they apply in the RAG pipeline. So, having an agentic memory. I don't know that I like the word agentic, but I think it's being used right now. I don't even think that's in the dictionary. As an ESL kid, making up words always felt pretty insulting to the language. But I'm going to continue using agentic now that it's become part of the industry lingo. For agents, now you have - and we're talking about basically RAG pipelines that were linear at the beginning - now you're having basically more workflow engines that need to manage your pipeline. Those workflow engines, all these agent frameworks, that's what they are.
They're workflow engines where some of those nodes happen to be agent invocations. Agents are, in a sense, similar to an LLM, because they're typically directed by an LLM on how to proceed. So, all the same issues with basically semantic caching of responses and all that stuff apply at those node levels. [0:30:11] SF: Right. Then, what do you think in terms of the AI agents, what are the initial applications that we'll see in that space? [0:30:19] BSB: We're seeing a lot of, for example, marketing. There's a lot of mundane tasks in marketing that are required, and typically are manual things that involve emails, that involve basically opening an app and creating a ticket for something. A lot of that stuff. We've had these web robots for ages now that were used for all kinds of purposes. So, I see agents that basically automate tasks on behalf of industries first, specifically where there are no APIs available - doing things through a website, but in a much smarter fashion than we used to. Pipelines that are much more complex, that require human-in-the-middle interactions, some of those things are going to be much easier to automate in the future. Actually, the very near future. A lot of those things are being done right now, and it's going to be a combination of, is there an API to do this efficiently? Or here is an agent that basically can navigate through even some server-side application that's sitting on an AS/400. So, we're going to see a lot of - and this is going to be the equivalent of what APIs did for development. Now the bridge between systems that were hard to use or inaccessible is going to be the agent. [0:31:37] SF: Yes. Essentially, you can now wrap an API around the agent, so that an engineer can essentially interface with the agent as they would any API system, but then the agent can go off and do sort of this manual work that historically took a person to actually accomplish. [0:31:53] BSB: Exactly. Even things like - I mean, you remember building something and then somebody changed the endpoint API from under you, or you got to notice that things had changed too late. When you have an agent in there, now you have the LLM reasoning capabilities to basically say, "Hey, something changed in the response from that endpoint, so I need to adjust that." Or when I send my request, it told me, "Hey, you're missing something in the request," and can the agent create that piece of data that's missing? So, some of those mitigations are going to be there, but having all this reasoning and processing at each one of those nodes also implies that now we're basically spending way more time at each one of those nodes. Again, that's another place where I see Redis playing a big role: agentic memory. [0:32:39] SF: Right.
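One simple way to picture the agent memory Brian mentions: keep an agent session's recent steps in a Redis list, trimmed and expiring so the memory doesn't grow without bound, and pull the tail back out before each LLM call. This is only an illustrative sketch of short-term agent memory, not a Redis feature or product; the key names, size limit, and TTL are assumptions.

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def remember(session_id: str, role: str, content: str) -> None:
    """Append one agent step (tool call, observation, LLM turn) to the session memory."""
    key = f"agent:{session_id}:memory"
    r.rpush(key, json.dumps({"role": role, "content": content}))
    r.ltrim(key, -50, -1)          # keep only the last 50 steps
    r.expire(key, 60 * 60)         # drop the whole memory after an idle hour

def recall(session_id: str, last_n: int = 10) -> list:
    """Return the most recent steps, ready to prepend to the next LLM call."""
    key = f"agent:{session_id}:memory"
    return [json.loads(m) for m in r.lrange(key, -last_n, -1)]

# remember("sess-1", "tool", "Fetched 3 job postings from the careers page")
# recall("sess-1")  # -> recent steps for the agent's next prompt
```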
Years ago, at my own company, I worked on a system where - we did a lot of web scraping of jobs and stuff like that, and then we would have to encode essentially the rules of how the layout of the web page worked in order to pull out things like the title, job description, all these types of things, and then bring that into a more structured form, because there were no APIs in some of these systems, and we were distributing that to mobile applications. But we had to essentially encode the rules server side so that if the website did make an update, we could push a server-side update to the rules that would then go down to the mobile apps, rather than encoding it directly into the mobile app, where, if the rules are encoded there and the website changes, you break the mobile app, essentially. So now, with these systems, you wrap all that stuff with an API and have it sort of dynamically react regardless of what the structure of the website is. [0:33:27] BSB: Yes. From a developer's point of view, I see a lot of things that used to be inside of a try-catch block. Like, try the way it should work, and then in the catch part, you will have all this retry logic. All that stuff - our programming definitely is going to become simpler for some of these edge cases where we can offload, within a set of confined rules, the reaction to these exceptional conditions to an LLM. [0:33:55] SF: What do you think are some of the like current tool gaps that exist right now for building Gen AI-powered applications? [0:34:04] BSB: I think we're pretty early in this world. I mean, you see platforms like Bedrock and Google's offering where they're trying to basically create a more holistic end-to-end environment, where you have your data acquisition, cleaning - data coming from different form factors, different databases, different services. Then, a lot of them now kind of have a built-in RAG pipeline workflow in them. I think that's just the beginning. Obviously, the RAG architecture is changing so fast with the addition of agents and all that stuff. It's making this architecture much more flexible, but also hard to codify, as some of these providers are doing. I see that there's definitely going to be the need for more standardized interfaces between components. For example, right now, how you invoke an embedding model varies from model to model. You're seeing the evolution of some libraries in Python and Java to handle that. It's like, "Hey, let's" - in a lot of cases - "let's put a wrapper on every other model so they all behave like the first model that we ever used," which is OpenAI, or something like that. But I see more consolidation in general, basically, along those areas, like re-rankers. How does a re-ranker work? In platforms like Java, for example, it's very likely that there's going to be a movement on the Java platform to basically create standardized interfaces so people can actually have a service provider interface and then an API for users. The service providers will have to basically code to those service provider interfaces to be able to provide their services in a standardized fashion. I would love to see that in the Python world too. Typically, the Java world is more strict when it comes to that type of enterprise-y type of thinking. In the Python world, it's more of a wild west. It's like, "Hey, I wrote it. It works. Now, you get to use it." I see a need for this formalization of how Gen AI components or building blocks interact with each other.
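A sketch of the kind of standardized "service provider interface" Brian is describing, expressed in Python terms: a small Protocol that any re-ranker provider could implement so application code stays decoupled from vendors. This is entirely hypothetical - not an existing library's API - and the toy provider exists only to show the shape of the contract.

```python
from typing import Protocol

class Reranker(Protocol):
    """The shared interface every re-ranking provider would implement."""
    def rerank(self, query: str, documents: list, top_k: int = 10) -> list:
        """Return the indices of the top_k documents, best match first."""
        ...

class KeywordOverlapReranker:
    """Toy provider: ranks documents by shared words with the query."""
    def rerank(self, query: str, documents: list, top_k: int = 10) -> list:
        q = set(query.lower().split())
        scored = sorted(range(len(documents)),
                        key=lambda i: -len(q & set(documents[i].lower().split())))
        return scored[:top_k]

def rerank_candidates(query: str, candidates: list, reranker: Reranker) -> list:
    """Application code depends only on the interface, not on any one provider."""
    return [candidates[i] for i in reranker.rerank(query, candidates, top_k=3)]
```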
[0:36:09] SF: I'm kind of curious about where things will go from, like, a programming language perspective. A lot of traditional sort of data science, machine learning work has come out of Python, the de facto standard there. But the nature of what you were doing in terms of building new models or building small tests was very different than what we're doing now, where we're building like actual AI-powered applications for enterprise and at large scale. Python might not be the best tool to do that. Then additionally, I think when we look at transformations in technology that have happened, whether it's like an API or a particular standard getting really popular, a lot of times the language that becomes dominant is the one that is best suited to whatever that technology is. I'm sure that you had this experience too, since you're a similar age to me: if you told me back in the nineties or early 2000s that JavaScript was going to become the dominant programming language in the world, we would have thought you were insane. But a big part of that was because of, I think, a lot of the like built-in support for handling JSON, and things become really natural as we move to REST APIs and these types of web infrastructure choices that we made. So, I'm curious - I know you're a big Java guy, you might have a bias - but where do you think things in terms of programming languages might go? [0:37:29] BSB: In terms of - Python, it's a beautiful language to deal with. I'm also a Ruby-ist. I did Ruby for more than 15 years. So, Ruby and Python are very similar languages. Obviously, loosely typed languages versus strongly typed languages, that's going to be a debate that's going to rage on forever. But if you see most of the enterprise-level packages, for example, for Python, like LangChain or LlamaIndex, basically, we bolted type safety onto them. So, for things that you're building for others, I feel that strongly typed languages with strong contracts between components are the way to go in terms of maintenance and evolution. I think Java can basically still make a big impact in Gen AI, in the AI world in general, and you're seeing that, for example, with platforms like Spring AI. Spring AI is a latecomer to the Gen AI world, but has done things in a very controlled evolution fashion, which allows people to basically - for example, there's not even, I think, the concept of re-ranking in Spring AI, because they haven't figured out yet how to do a clean API for it, or a clean SPI. That's specific to the Spring ecosystem, but the same thing happens in the Java platform in general, and I'm pretty sure that there's going to be - it's a slow evolution, but a very controlled evolution. The combination of that with the rapid evolution, for example, in the Python world, where you basically put something out there very quickly - that combination of those two worlds is where we're going to get systems that are very robust and reliable. [0:39:07] SF: So, as we start to look to wrap up, what's next for you? [0:39:11] BSB: I'm having the time of my life. That late-1990s dream of basically working on AI and machine learning has become reality. I'm having a lot of fun. I'm actually doing a Master's in Data Science too. I'm a couple classes away from being done, and I'm having a great time basically getting deep into research papers and how things are moving in the industry.
At Redis, even though we're in the applied engineering department, we are also basically going deep into, for example, the latest and greatest research in all the fields that impact what we're doing. We even do paper reviews, like we were a college class, which is pretty cool. [0:39:53] SF: Well, I think you have to, at this point. [0:39:54] BSB: Yes. Exactly. There's some nuggets of wisdom sometimes hidden so deep in a research paper that are immediately implementable in what you're doing. Other stuff is like pipe dreams and even researcher hallucinations, which is kind of an apt analogy here. But yes, I see myself going deeper and deeper into the more theoretical aspects of Gen AI and AI in general. I was super into graph computing too, which we don't do much of at Redis. But I see that also as being a very fun field to explore, and I will continue to basically be out there with the community and try to bring clarity to some of these topics. You and I have been to some of these conferences where I make the assumption that everybody knows this stuff now. Then when you go there and you talk to the folks on the ground, they are barely beginning to explore things, so they need a lot of guidance, clarification, and not to be talked over in terms of all this Gen AI stuff. [0:41:03] SF: Yes. I mean, I think when you're in it, you're so used to hearing things like RAG, vectors, like all this type of stuff that we were talking about today, and even beyond that, that you just kind of get to a place where you assume everybody that's working in technology must be as well-versed as you are. But the reality is, this is all super, super new. I really like what you said about how much fun you're having, because I keep saying that: it's a really fun time to be involved in technology. There's so much going on right now, and there's endless opportunity to go and try things out, and carve your niche, sort of, whatever you're interested in, in this space. [0:41:39] BSB: It's a brave new world. I see developers just starting their careers right now. Obviously, there's a lot of tools that you and I didn't have when we got started. I mean, we didn't have the Internet. It's pretty sad. [0:41:53] SF: You just had Vim in a terminal. [0:41:55] BSB: Exactly. But at the same time, the amount of stuff and layers of abstraction that are there to learn also makes it a very hard entry bar for a lot of people. So, just the cognitive load of basically becoming not even a computer scientist - just, obviously, we know the distinction between a computer scientist and a programmer, and one might not be good at the other job. But the same thing applies here. Now, you have to be part computer scientist for a lot of things, part machine learning expert, part data scientist, with all that statistics that nobody wanted to learn. Now, I had to go back and refresh all that stuff, because that statistics knowledge is primordial to building up a good intuition and understanding of all the things that are happening. So, yes, there's a high bar of entry right now, or of staying in the field once you enter. Because there's so much to learn. When we started, you could basically learn about microcontroller programming and basically build stuff on a breadboard with a Motorola chip, and understand gates and all that stuff. Now, I don't think you have the luxury of living in so many of those layers.
They have to basically jump to the vicinity of the abstraction that they're working on if they want to be productive right away. So, in some cases, we had great advantages by having a simpler world to deal with. [0:43:19] SF: Yes. I think, whatever your abstraction layer is, it's good to go one or two levels below it, but you don't need to know - if your abstraction layer is essentially front-end web applications, does it make sense to get down to essentially the microcontroller level? Probably not going to help you that much, but it does help to go a couple of layers deeper. And like you said, there's so much to know right now. There's no way that you could - you're just going to, essentially, I think, lose yourself trying to grasp everything, and it's going to become daunting and a barrier to entry. I think we could probably do a whole show talking about some of this stuff. [0:43:51] BSB: Totally. [0:43:53] SF: This was awesome, Brian. Thanks so much for coming on and I really enjoyed this. [0:43:56] BSB: Thank you so much, Sean. [0:43:57] SF: Cheers. [END]