EPISODE 1601

[INTRODUCTION]

[0:00:00] ANNOUNCER: Knowledge graphs are an intuitive way to define relationships between objects, events, situations and concepts. Their ability to encode this information makes them an attractive database paradigm. Hume is a graph-based analysis solution developed by GraphAware. It represents data as a network of interconnected entities and provides analysis capabilities to extract insights from the data. 

Luanne Misquitta is the VP of Engineering at GraphAware and she joins the show today to talk about graph databases and the engineering of Hume. 

This episode of Software Engineering Daily is hosted by Jocelyn Byrne Houle. Check the show notes for more information on Jocelyn's work and where to find her.

[INTERVIEW]

[0:00:51] JBH: Hi, everybody. Welcome to the show. We're super excited to get started talking about graph today. We have with us Luanne Misquitta. She is one of the world's leading experts and consultants on graph. And, for 20 years, she's been an expert. I've seen some of her Talks on YouTube and really enjoyed them. She's currently the VP of Engineering at GraphAware. And we're just super excited to have you, Luanne. Welcome.

[0:01:13] LM: Thank you. I'm really happy to be a guest on Software Engineering Daily. Thanks for having me.

[0:01:18] JBH: I thought what we could do is start with a little bit about yourself, and then we'll focus a little bit on some technical questions and then talk about GraphAware more. Tell us a little bit about your journey with before and after graph.

[0:01:30] LM: It's been a very long journey. Before graphs, I knew I always wanted to code. Always knew I wanted to do that as a child. Thanks to my dad bringing home the Atari in the early 1980s. I was hooked on that. And that was it. It was very clear to me. 

I've got currently about 24 years of actually formal working experience. But before that, tried to do a lot of things. I was around the age when the internet hit. In the 1990s, built HTML websites. Horrible-looking ones. But, yeah, through that whole journey. 

As I did more and more of working as a software engineer, working with teams. I love working with teams. I came across graphs in around, I would say, 2008, 2009. It was very, very, very, very early stage. It came around the era of NoSQL databases when that explosion happened. And there were firmly four categories. Graphs was one of them. 

The moment I tried it, I absolutely loved it. And we'll talk some more about that today. But that's been it. Then I joined GraphAware, of course. I've been with them – this is my ninth year. The company is 10 years old this year. It's been an amazing journey. And there's lots we have to do still.

[0:02:39] JBH: You know, you've been a longtime developer. When you found graph, what was it that was that aha moment for you? 

[0:02:44] LM: Yeah. It's just a very logical and natural way of looking at your data, right? Because, graph, it really is the model of the world today, right? It's very simple, right? It doesn't have too much jargon. In fact, it's super simple. And I can tell you like what it is in like a minute, right? You have entities that represent – 

[0:03:03] JBH: Yeah, let's do that. Because we have a technical audience. But is there so much tech to know? Often, you may not know as much as you want to about graphs. Give us an architectural starting place or base. The pillars. 

[0:03:16] LM: What does it represent? First of all, it takes off from the mathematical concept, right? It's nothing new. In fact, there was a version of graph databases. They were called network databases in the 70s, right? They didn't take off much because there wasn't much networking happening. Your data wasn't so connected. And the purpose of data was more tabulation, aggregating. Those kinds of analytics. It was just too early. 

A graph basically contains entities and how they're connected. An entity can be a person, a place, a thing, a vehicle, an address. Just things in the world, right? And then the relationships connect them. You live in a house, on a street, in a country. You drive a car to your office or your workplace. You eat at a restaurant. You know you have friends. You meet from time to time, et cetera. Right? 

And what a graph does is it just models those relationships explicitly. If you think of the relational database world, which almost everyone is familiar with, you have rows and columns of data. Again, very easy to understand rows and columns. But it's not so easy to look at that and understand how that data is connected. You would have to go through the concept of a key, a foreign key maybe, an associative table which tells you that this kind of entity is linked to this kind of entity. The relationships are not explicit. 

In a graph, you can imagine them as direct pointers from one piece of information to the other piece of information. And what makes this so amazing is that, typically, when you design or you model your data, or you talk about your – let's not go to data for a second. Let's talk about your business or your domain. It's super easy for everyone. And when I say everyone, I don't mean just engineers. It's also people from the business, your stakeholders to walk up to a whiteboard and draw your domain. You would just draw it out with arrows and things connecting. 

And that really is your graph. There is a one-to-one mapping. It's very – there's no impedance mismatch that you have with, let's say, a relational database, right? That's the first thing I love about it, the simplicity. You don't have to think too much. It's a model that can be communicated very, very, very easily. It narrows this gap that we traditionally have between engineers and what they do and how the business views the world. It's really one-to-one.

[0:05:43] JBH: You're saying, in terms of just from – it's a modality that helps you work with your subject matter experts too. Because they're not thinking in rows and columns. When you have to go interview people because you have to model the data for a new graph database, it's easier. You don't have to explain anything. You can kind of meet the subject matter experts where they are and then you create your – this is something I always had a question about. You're creating the model and then you also are adding in nodes and edges?

[0:06:10] LM: Yes. Exactly. If you were to draw on your whiteboard that Jocelyn, you're a person, right? You would be a node. You would have a circle on the whiteboard. You would have Jocelyn. Here's your name and a few details about yourself. Maybe those are properties. And then you're a host at Software Engineering Daily. Software Engineering Daily would be another node. It represents a podcast, right? And then you would have an arrow from Jocelyn to Software Engineering Daily saying that you host the show. And maybe on the relationship, which would be "hosts", maybe you would have another property saying since this year, right?
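
The whiteboard example above can be sketched in a few lines of Python. This is only an illustration of the node, relationship, and property idea, not how any particular graph database represents data. The class names `Node` and `Relationship`, the `HOSTS` relationship type, and the `since` year are assumptions made for the sketch; the conversation does not state a year.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity on the whiteboard: a label plus a bag of properties."""
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed arrow between two nodes, with its own properties."""
    start: Node
    rel_type: str
    end: Node
    properties: dict = field(default_factory=dict)


jocelyn = Node("Person", {"name": "Jocelyn"})
podcast = Node("Podcast", {"name": "Software Engineering Daily"})

# The "since" value is a placeholder year chosen for illustration only.
hosts = Relationship(jocelyn, "HOSTS", podcast, {"since": 2023})


def describe(rel: Relationship) -> str:
    """Render a relationship in an arrow notation similar to a whiteboard drawing."""
    return (
        f"({rel.start.properties['name']})"
        f"-[:{rel.rel_type}]->"
        f"({rel.end.properties['name']})"
    )


print(describe(hosts))
```

The point of the sketch is that the relationship is a first-class object with its own type and properties, rather than something implied by matching keys.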

It's a very readable model. It expresses exactly the state of the world. This is the modeling aspect. But then it comes to, okay, what do I do with this model? The ability to query how data is connected is where a graph excels. You're looking at really the connections between the data. If you have data that's not really connected, or maybe you're not interested in the connections, then a graph database isn't suitable.

You're looking at the fact that Jocelyn hosted Software Engineering Daily. And maybe there are other hosts of the same podcast. Maybe some of you live in the same city. Maybe some of you like the same cuisines. And so, therefore, now you have a deeper connection that maybe you visit the same restaurants. Or you look for patterns in your graph that you talk about these subjects. Here are similar podcasts that talk about similar subjects. 

It's a lot of – the value of the graph database I would say is really in the connections. How is data related? That enables you to uncover patterns or match patterns depending on what's your use case. And really be able to navigate that context. What's the context that binds this data together? That is I think where graphs excel. And that's the position that it holds amongst all the other kinds of databases. 

[0:08:14] JBH: That's a helpful starting point. Let's talk through some architectural and use case examples. Because a couple of questions I have. One, the traditional knock – and we'll address these as we go through these examples. But the traditional knock is it's too hard to integrate. Too hard to learn and not fast enough. Let's walk through a couple examples and you can sort of help me understand the architectural trade-offs at those intersection points. 

Let's take the example of – because I have a finance background, we'll go with that first. I'm Jocelyn Houle. I'm a credit card customer and I want you to raise my limits. I'm calling in. And what you guys want – what the organization wants to do is let's see if Jocelyn has had some billing problems. We're going to go and see how likely it is that it'll be fine giving her this extra $500,000. We're going to base that decision off of the way this account's been serviced.

There's, I would expect, a graph from the customer data. There's a notion of me that lives in a graph. But then, there's like a – way in the bowels of the building, there's a transactional, linear data warehouse somewhere that's just like grabbing servicing calls and payments all day long. How would those work together at speed? 

[0:09:27] LM: Yeah. Okay. Your traditional data warehouse, let's say. Or data lake. Whichever way you look at it. As you said, this has historically a lot of information from a lot of data sources. Not only your business with the bank, but maybe related to the bank. Loans, insurances, spend patterns. Everything related to you. Let's just go with that. 

Now to be able to crunch that data, it would be really easy to tell you with a relational database answers to analytical type of queries such as what is your total spend or your average spend. The loans you have outstanding. Do you pay them on time? Et cetera. But as discrete pieces of information. 

Now with a graph, what you would end up doing is you would end up modeling yourself in a different way. You would have Jocelyn the customer of the bank holding a credit card and maybe past credit cards also related to different kinds of financial instruments. You have taken a loan or maybe multiple insurance, everything else that relates to your spend. Maybe your spend patterns where you typically spend money. Do you spend it on – do you have like weekly weekend shopping sprees? Do you gamble a lot? All these kinds of things. Because today the information is really – 

[0:10:48] JBH: I do all those things, by the way. 

[0:10:50] LM: Okay.

[0:10:51] JBH: I do all those things.

[0:10:52] LM: The information is readily available. And what a graph helps you to do is to really navigate those connections in real-time really, really at speed. To be able to make a connection between Jocelyn and either a pattern that you are looking for that shouldn't exist. Let's say that the bank has several places that they don't take very kindly to when it comes to extending your credit limit.

If you're a frequent spender at certain kinds of establishments, then maybe your credit score is a bit lower than it should be. Is there a path from Jocelyn to any of those establishments? Whether it's direct, right? Or whether it's like multi-hop. Those queries are extremely fast.

Other than that, it also can encapsulate more of your network. Are you connected to known fraudsters? Are you connected to people that are involved in offshore scams or things like this? It's not only you, but it's also the neighborhood around you. 

What are the associations you have? Are you like super – do you have like no associations? Or have you had associations to people that were involved in criminal activities in the past? It's the patterns. And how fast can you join the dots? That's where the graph comes in. 
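
The "is there a path, direct or multi-hop" question can be sketched as a breadth-first search with a hop limit. Everything here is hypothetical: the adjacency list, the node names, and the set of flagged nodes are invented for illustration, and a real graph database would run this kind of traversal through its own query language rather than hand-written Python.

```python
from collections import deque

# Hypothetical neighborhood around one customer.
graph = {
    "Jocelyn": ["Credit Card", "Friend A"],
    "Credit Card": ["Grocery Store", "Casino X"],
    "Friend A": ["Restaurant Y"],
    "Grocery Store": [],
    "Casino X": [],
    "Restaurant Y": [],
}

# Hypothetical nodes the bank treats as a warning sign.
flagged = {"Casino X"}


def find_path_to_flagged(graph, start, flagged, max_hops):
    """Return the first path of at most max_hops edges from start to a flagged node."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in flagged and node != start:
            return path
        if len(path) - 1 >= max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(find_path_to_flagged(graph, "Jocelyn", flagged, max_hops=3))
print(find_path_to_flagged(graph, "Jocelyn", flagged, max_hops=1))
```

With a limit of three hops the search finds the route through the credit card; with a limit of one hop it finds nothing, which mirrors the direct versus multi-hop distinction.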

[0:12:15] JBH: I do love that. And everybody wants more context. I think all of the smarty pants people in tech are like, "Yes. We want all the context around the data. And we want it to be apples to apples." It's very appealing. 

Putting myself in the role, though, of somebody who's implementing in a really heterogeneous data environment. Let's say you're at a massive insurer or you have a lot of your medical data. One of these really big operations. And you're saying to yourself, "Oh, I definitely want this context. However, I have every kind of data store. I have all kinds of requirements for privacy, for lineage." I guess my question would be what do you see as a common implementation path in a like complex organization? How do you get started with graph? You can't just turn it on tomorrow.

[0:13:00] LM: No. Yeah. This is a very classic question, I think, and really fit for, really, the enterprises. Because if you're a smaller company just getting started, you can be up and running tomorrow. Because it's super easy to get your data and model it. You can ask a few queries and you're done. 

[0:13:18] JBH: So, Greenfields, brand-new.

[0:13:20] LM: Yeah.

[0:13:21] JBH: Easy yes.

[0:13:21] LM: Yeah. Very, very, very short time to market. Extremely short. For an enterprise, as you said, the biggest barrier is to figure out, A, what data do you need? Because there is data and then there's data that you want to answer certain kinds of questions. You don't necessarily need every scrap of information that you have if you think that it's not valuable to the kinds of questions you want to answer. If it's medical insurance, you need data around maybe medical history, past claims.

[0:13:56] JBH: This is such an interesting point. I'm sorry to interrupt you, but I have to say like this is – we should do – I know we all talk – I want to talk about AI as well, but in the old world of like predictive analytics and machine learning. It's not querying the context, but it's sort of close to it. It's in the same neighborhood. Because you're trying to just quickly render repeatable meaning out of the data. 

And what ends up happening is, actually, there are only a few big levers in the data. You could. And you can pull in 2,000 elements for your model. The reality is, is probably eight or 10 that are driving.

[0:14:26] LM: Yeah. Yeah. Yeah. No. This is really important. Because if you try to like model everything, you kind of get stuck into – you get stuck into that modeling cycle where you're just modeling and modeling and you're not sure if you're going to get the value from that. And I think that's where a lot of the confidence starts to dip because you have these very long cycles. You're not really getting value from your data just yet. But somehow, you've been promised that you will get value from the data. And then it kind of goes downhill from there. And that's where a lot of the disillusionment kind of creeps in. With a graph or with – I think almost all of our customers over all the years, or when consulting with companies, how to get your data into a graph?

[0:15:09] JBH: How to get started? Implementation wise. Okay, I've got some budget. I want to get my feet wet. I don't want to slow anybody down. But where to start? Because it seems to me like things like fraud, customer 360, those are great, but they're also so high stakes. 

[0:15:24] LM: Yes, they are. They are. What we always advise is, really, don't try to boil the ocean, essentially, with everything. What is the most important use case for you at this moment of time in your business? Where will you get the most value? 

Because graphs have this – I – I don't think it was by design, but it allows you to really build a model very incrementally. It lends itself to allow you to start small. What is a use case that you want to start with? Where are you going to get value? Bring in the data.

[0:15:58] JBH: Let me ask. Do you have any examples of types of companies or types of projects that have been good starting points? 

[0:16:03] LM: Yeah. There are a lot. I will try and pick some relatable ones. Impact analysis, I think everyone can or most people can relate to. Whether it's infrastructure. Whether it is logistics or supply chain. Yeah, let's go with these. What's the impact of something happening? What is the impact of a particular data center going down? What systems are going to go down?

The telecommunications domain is a really, really good case. If I'm going to take a part of the network down for maintenance, what's going to be affected? Are there any of my major customers there? Logistics, supply chain: if there's a storm that's going to impact a certain area which is a route to most of my delivery centers, what's the impact? Can I find alternative routes?

Impact analysis is a very, very popular graph use case. Because it's kind of like cause and effect. And because they're dependent on each other, maybe not directly, but indirectly, you are looking for paths between things, paths between a piece of infrastructure and another one. And you know that because they are linked, either dependent or supplying, if any part of that chain breaks, then you actually know that you will sort of like disconnect that network. Where's your impact?
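
Impact analysis of this kind amounts to following dependency links outward from the thing that failed. The sketch below assumes a made-up infrastructure graph in which `supports[x]` lists everything that depends directly on `x`; the system names are hypothetical and are not taken from any customer example in the conversation.

```python
from collections import deque

# Hypothetical dependency graph: supports[x] = things that directly depend on x.
supports = {
    "data-center-1": ["db-cluster", "cache"],
    "db-cluster": ["billing-service", "orders-service"],
    "cache": ["orders-service"],
    "billing-service": ["customer-portal"],
    "orders-service": ["customer-portal"],
    "customer-portal": [],
    "data-center-2": ["analytics"],
    "analytics": [],
}


def impacted_by(supports, failed):
    """Collect everything that depends on `failed`, directly or indirectly."""
    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in supports.get(current, []):
            if dependent not in impacted:
                impacted.add(dependent)
                queue.append(dependent)
    return impacted


print(sorted(impacted_by(supports, "data-center-1")))
print(sorted(impacted_by(supports, "cache")))
```

Note that the analytics system never shows up for a failure in the first data center, because there is no path connecting the two.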

[0:17:22] JBH: Let me ask a dumb question here, which is something like let's say you know that if you give – if you open a new account and the person never uses it, they never put any deposits in, that most likely one of two things is going to happen. 80% chance they're never going to be customers and 20% chance they'll do it within a 120-day window or something like that. Let's say there's two basic outcomes. Do you have to – I'm going to use a language that might be triggering for the data world. But do you have to hardcode that or really declare those as possible outcomes? Or is there a separate system that you would set up to infer what the outcomes are and feed it into a graph?

[0:17:58] LM: I think you would approach it from a different way from a graph perspective. And this is one of the other things I was going to get to, is that first – not a barrier, but something puzzling to a lot of people that get onto graphs is that first mental model shift. You think about it from – 

[0:18:16] JBH: Yes. I'm still struggling with it. Yes. 

[0:18:19] LM: Once your mind shifts, you will never look back. And that's the power of graphs. Your mind starts thinking that way. The question for me to you would be what do you want to achieve from this? Do you want to be able to predict customers that are likely to – or, rather, people that are likely to never be customers? 

[0:18:37] JBH: My experience in the industry is, the faster they start working with this account, they're going to be real customers. The longer they wait to use this account, I may not get them and I won't be able to recoup my cost of customer acquisition. What I want to do is sort of see when they're sliding out to maybe not being customers and do something about it.

[0:18:56] LM: At a super high and basic level, where we would start is we would say, "Okay, the things that are important to detect this sort of pattern." The customer, the account and the deposits, let's say, at this stage. 

[0:19:08] JBH: Yeah. At least that. Yeah. 

[0:19:10] LM: You probably just need those. Once you have that, you would be looking at matching this pattern across the graph to find all customers that satisfy this pattern within, let's say, a specific time frame. Because it's not all time. It's within some particular critical period of time. That encompasses the pattern matching. To your question about would it be hardcoded or not? This would be really a query. A pattern matching query that you would probably have set up to alert you. 
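
The pattern described here is "a customer owns an account, and nothing has been deposited into that account, within a critical window after opening." A minimal sketch of that match follows. The relationship lists, customer names, dates, and the 45 to 120 day window are all invented for illustration; in practice this would be a pattern-matching query in the graph database's own query language, run on a schedule to raise alerts.

```python
from datetime import date

# Hypothetical relationships, written as (start, end) pairs.
owns = [("Ana", "A-1"), ("Ben", "B-1"), ("Cy", "C-1")]
deposited_into = [("dep-1", "A-1")]

# Hypothetical property on each account node.
opened_on = {
    "A-1": date(2023, 1, 10),
    "B-1": date(2023, 3, 1),
    "C-1": date(2023, 3, 20),
}


def customers_at_risk(today, min_days, max_days):
    """Match customers whose account has no deposit and whose age falls in the window."""
    funded = {account for _deposit, account in deposited_into}
    matches = []
    for customer, account in owns:
        age_days = (today - opened_on[account]).days
        if account not in funded and min_days <= age_days <= max_days:
            matches.append(customer)
    return matches


print(customers_at_risk(date(2023, 4, 30), min_days=45, max_days=120))
```

Nothing about the outcome is hardcoded into the data. The pattern lives in the query, so the window can be changed without touching the model.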

[0:19:37] JBH: Yeah. Yeah. As you light up each of these entity networks, then you have the ability to query them in context of each entity network. 

[0:19:46] LM: Exactly. Exactly. Yeah. And once you get that, that would be like a very basic sort of pattern that would alert you to something that you need to care about. That you've signed on all these customers and they're kind of reaching this boundary of not really being customers. What can you do about it? 

The "what can you do about it" then forms, I would say, almost a second use case. Because if there are ways to help customers to make that first deposit and therefore ensure that they are true customers and not the ones that will just drop off. Maybe there have been strategies that the bank has used in the past. Maybe there's a phone call that's been made. Maybe there's been some sort of incentive. Maybe there's been some sort of outreach that's been done. And then you have a conversion rate. These sorts of methods worked for this category of customers. Because maybe not the same thing applies to all kinds of customers. Maybe you will then extend your use cases to say, "Here's the demographic of these customers that successfully converted based on these outreach methods."

Now you shift into like the next set of use cases. And that's what I mean by incremental. Your first one was find me all these customers that are going to churn most likely. Then you can extend that data to say, "Okay, now how do I classify these customers? Or how do I cluster these customers?" Is there a certain demographic of customers that tend to churn? Are they from a certain profession? Age group? Living – 

[0:21:14] JBH: You need to use a separate classifying engine of some kind. 

[0:21:17] LM: Yeah. Or you would do it with graph data science algorithms within the graph. It's really the –

[0:21:23] JBH: Okay. 

[0:21:25] LM: Yeah, it's like network analysis. You start to basically cluster. And not just on properties, but on relationships. If we were doing just properties, then again – and I'm sure we'll talk about this later. I would say don't use a graph. Because you don't actually exploit – 

[0:21:40] JBH: Oh, let's talk about it now. Because, see, one of the questions I was going to ask you in just a minute was let's assume we're buyers at one of these big organizations and there's just a ton of choices in this area. And maybe I should know the answer to this, but you've got Orient, Neo4j, Arango. You've got all this kind of – and so, what do – if you were to sit down with me and say, "All right. Let's just create a short checklist of questions to ask yourself as you're trying to directionally pick a tool." You don't have to say anything about different specific tools. But, directionally, I guess one of the first questions you would ask is are you really in the market for a graph? 

[0:22:13] LM: Yes. This is a really fundamental question. I think you will find a lot of material out there saying, "Sure. You can put any kind of data into the graph." And it's true. But are you really going to get the best value out of putting that data in the graph? This is a question you need to ask.

[0:22:32] JBH: I want all of the engineering listeners to really listen up to that. Because that's a great way to manage the business people who are going to show up and say, "We want to put everything in graph."

[0:22:40] LM: It's never easy. And no matter what I say today, you're still going to get that. But the question is why do you want a graph? And there are, I would say, a couple of very basic questions that will help you to eliminate the idea that you really need a graph just because everyone has a graph. Do you care about the connections between the data? Do you have any use cases at all that care about why the data is connected the way it is?

I will use a really, really old example that we would use, I don't know, about 10 or 11 years ago when NoSQL was new and we had this standard way of explaining the difference between key-value store, column store, document store and graph. We said it so many times, I remember the example.

Let's say that you have a town. Okay? And in that town, you have people living in the town and they all live in houses. And they're all supplied electricity. You have utilities. And they drive their cars to work. Just picture that you're modeling this thing. 

If the question you are trying to answer out of this data about this town is what's the average income of every household? You don't need a graph. Okay? If you want to answer what's the average size of every household? How many houses are there in the perimeter around the school or the supermarket? You don't need a graph. Because you're really looking at tabulating data. Aggregating data. None of those questions relate to why that town is connected. 

But now you change your question and say what's the shortest way I can get from the school to the hospital? Now that is a path problem. You care about the route of travel. That's a graph. The electricity has gone down in this particular cluster of houses, why? What's the root cause? You need a graph. Okay?
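
The school-to-hospital question is a shortest-path problem. As a rough sketch, here is Dijkstra's algorithm over a tiny, made-up road network; the place names other than the school and the hospital, and all of the distances, are assumptions for the example.

```python
import heapq

# Hypothetical road network: roads[a][b] = length of the road between a and b.
roads = {
    "school": {"market": 2, "park": 5},
    "market": {"school": 2, "hospital": 6, "park": 1},
    "park": {"school": 5, "market": 1, "hospital": 2},
    "hospital": {"market": 6, "park": 2},
}


def shortest_route(roads, start, goal):
    """Dijkstra's algorithm: return (total_length, path) or (None, []) if unreachable."""
    best = {start: 0}
    previous = {}
    heap = [(0, start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == goal:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry, a shorter route was already found
        for neighbor, length in roads[node].items():
            candidate = dist + length
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if goal not in best:
        return None, []
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return best[goal], path[::-1]


print(shortest_route(roads, "school", "hospital"))
```

In this toy network the shortest route is not the one with the fewest roads: going through the market and the park is shorter than either two-road alternative.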

When you're looking at how things are connected. Because, obviously, you have an electrical grid, you have telecoms, you have road networks, you have people getting from one place to another. What restaurant should I go to eat at tonight? Because I love pizza. Graph question. Recommendations. Social networks. Who am I likely to know? And these all rely on the connections in your data. 

If you have any of this set of questions. Who's the most influential person in the town? Very popular graph question. It relates back to betweenness centrality, which is really just a fancy way of saying who's the influencer? And either positively or negatively. Because, positive, if you want to like amplify your message, you need to find the one person who is so connected that can amplify your message. That's a positive case. The negative case would be in law enforcement, for example. 

And when you look at the social networks of how organized crime is or gangs, you'd really want to find that central figure to take out of the picture. Because what happens then is then you disconnect your network. You disconnect the groups and you're able to tackle them differently. These are all graph problems. 
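
Betweenness centrality counts how often a node sits on the shortest paths between other pairs of nodes. The sketch below implements Brandes' algorithm for a small unweighted, undirected graph. The five-person town is hypothetical, and graph analytics libraries ship tuned implementations that would normally be used instead of hand-written code like this.

```python
from collections import deque


def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph given as an adjacency dict."""
    centrality = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths (sigma) from the source.
        stack = []
        predecessors = {node: [] for node in graph}
        sigma = {node: 0 for node in graph}
        sigma[source] = 1
        distance = {node: -1 for node in graph}
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating each node's dependency.
        delta = {node: 0.0 for node in graph}
        while stack:
            w = stack.pop()
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # Each unordered pair was counted once from each endpoint, so halve the totals.
    return {node: value / 2 for node, value in centrality.items()}


# Hypothetical town: two friend groups that only connect through Dana.
town = {
    "Ana": ["Ben", "Dana"],
    "Ben": ["Ana", "Dana"],
    "Dana": ["Ana", "Ben", "Eli", "Fay"],
    "Eli": ["Dana", "Fay"],
    "Fay": ["Dana", "Eli"],
}

scores = betweenness_centrality(town)
print(max(scores, key=scores.get), scores)
```

Dana scores highest because every shortest path between the two groups runs through her. Removing that node would split the network in two, which is the law enforcement scenario described above.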

[0:25:43] JBH: Yeah. That's interesting.

[0:25:45] LM: That's to answer the first question do I need a graph? Let's assume that you do need a graph. Now, as you said, there are many graphs on the market today. The questions that you need to ask yourself are, first of all, there are – I would say the first question is, is that the database? Because at the end of the day, it's a database. How graph native is it? And by graph native, I mean the data that goes into that database. How is it stored? And what's it optimized for? 

With NoSQL, every category of databases was optimized for something different. The column stores, for example, could withstand really, really, really high-speed ingestion. From data ingestion, from sensor points and so on. Massive scale. That was what they optimized for, speed. 

What graphs do is they optimize for reading, for pattern matching, for querying. How is your graph database built? Now if it's native – and I will now speak very specifically about graph native databases and some of the multi-modal databases out there. Because you do have databases that can be many things. It's a good thing. Because, obviously, for engineers or for businesses, there's only so many databases you want to manage at the end of the day. We went through the phase where everything was polyglot and you had all these databases. One for my documents. One for my graph. One for something.

[0:27:08] JBH: Right. Right. 

[0:27:09] LM: Yeah. Thankfully, that craze has kind of settled a bit. Because it just became a maintenance nightmare. But among graph native databases, Neo4j was one of the first graphs on the market. They were the people that actually created this whole category of graph databases. And they are primarily one of the graph native databases.

What I mean by that is the way the data is stored on disk is optimized for pattern matching. You basically have your nodes collocated with how they are related to other nodes. When you're traversing this data, what you are essentially doing in a really simplified form is you're literally following pointers. You're not doing a join. You're not doing an index lookup, which is why relational databases, even though they have relational in the name, they actually suffer with deep multi-hop queries. Because you always have that layer of indirection in between to say, "Okay, I need to consult something to look up how it's related to something else, or find what's related."
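
The contrast between consulting a lookup structure and following pointers can be illustrated with a toy example. This is a deliberately simplified sketch and says nothing about how Neo4j or any other database lays out data on disk. The `knows_rows` table, the `PersonNode` class, and the third person "Sam" are assumptions made for the example.

```python
# Relational style: relationships live in a separate table of id pairs,
# and every hop has to consult that table again.
people = {1: "Jocelyn", 2: "Luanne", 3: "Sam"}
knows_rows = [(1, 2), (2, 3)]  # (person_id, knows_person_id)


def two_hops_via_join(person_id):
    """Find friends of friends by scanning the relationship table once per hop."""
    first_hop = [b for a, b in knows_rows if a == person_id]
    second_hop = [b for a, b in knows_rows if a in first_hop]
    return [people[p] for p in second_hop]


# Graph-native style: each node holds direct references to its neighbors,
# so a hop is just a pointer dereference.
class PersonNode:
    def __init__(self, name):
        self.name = name
        self.knows = []


jocelyn, luanne, sam = PersonNode("Jocelyn"), PersonNode("Luanne"), PersonNode("Sam")
jocelyn.knows.append(luanne)
luanne.knows.append(sam)


def two_hops_via_pointers(node):
    """Find friends of friends by following references from node to node."""
    return [fof.name for friend in node.knows for fof in friend.knows]


print(two_hops_via_join(1))
print(two_hops_via_pointers(jocelyn))
```

Both functions return the same answer. The difference is in the work per hop: the first rescans (or, in a real system, re-probes an index over) the relationship table, while the second only touches the nodes it actually visits.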

But in a graph native database, it is really designed to traverse these relationships at lightning speed. And that's graph native. Then you have multi-modal databases, which kind of offer a bit of everything. You can store documents. You can do graph-based traversals on them. And the question we always ask is what do you want to get out of it? Do you want a general-purpose solution? Or maybe it will fit. But if you want to really optimize for those fast graph traversals to be able to detect patterns, then the mul – it's kind of like the jack of all trades. Because it's really not – 

[0:28:49] JBH: Yeah, which can be handy. It depends on your use case and the type of company you're in. But for very large, complicated organizations that need – I think what I'm hearing you say is if you were talking to like a massive Procter & Gamble. That type of organization. The first thing you'd say is, "Hey, do you really think you need a graph?" And then the second one is it should be graph native. And the reason you care about that is because speed – the queries are faster. It's going to move. 

And I think that's really interesting. As I've been doing research in this space, there are a number of – I'll do a search and see companies come up that I was like, "Oh, I thought you did time series? Or I thought –" everyone's kind of getting in the graph game. 

And so, the other thing that's interesting about what you're saying is it sort of seems like graph and generative AI would sort of be peanut butter and jelly, right? They go together. And that makes sense. But what – just briefly. And then we're going to get into GraphAware. But vector or graph, there's so much discussion around that right now.

[0:29:43] LM: Yeah.

[0:29:44] JBH: How should I think about that in like an index card worth of information? 

[0:29:48] LM: It's a really new space. I mean, everyone and everything is talking about vector databases. 

[0:29:53] JBH: And every technical person I've talked to is like there's a bunch of reading I haven't done. This is a safe place. You haven't read it all. 

[0:29:59] LM: Yeah. It's really funny. Because it's one of these things that we've seen a lot of crazes over the years. But it's one of these things that I think is going to stay around. Actually, the whole generative AI. The way LLMs have sort of like just taken over everything. And now the vector databases. To really support this whole kind of model. 

I think it's really – it's super interesting with graphs. Because they're very complementary and they do converge at a point. With graphs, you have your context. You have your very explicit relationships. And you are looking at things like the context. How things are related to each other? How they match?

And then you look at the vector space, which is, to me, currently – and as you said, there's a ton of stuff going on out there that I think I would just have to catch up with it every night to be on top of it – it's closer, to me, to search: the ability to basically say, "Okay, here are the characteristics of a particular entity and the neighborhood around it." Either properties or relationships.

And then it's really encoded into a high-dimension vector. And it tends to capture really the semantics, which is kind of different from traditional keyword-based search. Search, before all this happened, typically keyword-based. Your Elasticsearches and everything else, you have a keyword. You find things that match. And then you use the graph database to add some context. Out of the thing that you're matching, what is the additional context? Does it make sense to you at this particular moment of time? In this particular space that you're in? 

The vector basically adds, as well, that level of semantics. And the number one problem, and I think probably the most clear problem that they solve today, is the nearest neighbor search. What is semantically close to this thing that you are looking for or asking? 
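As an illustration of the nearest neighbor search described above, here is a minimal Python sketch. The entity names, the three-dimensional vectors, and the helper functions are made up for the example and are not part of Hume or of any vector database. Real embeddings typically have hundreds of dimensions, and production systems usually rely on approximate indexes rather than the full scan shown here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, embeddings, k=2):
    """Return the k entity names whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in embeddings.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Hypothetical toy embeddings: semantically related items point in similar directions.
EMBEDDINGS = {
    "stolen vehicle report": [0.9, 0.1, 0.0],
    "car theft incident": [0.8, 0.2, 0.1],
    "bank account opening": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    query = [0.85, 0.15, 0.05]  # assumed embedding of a question about vehicle theft
    print(nearest_neighbors(query, EMBEDDINGS, k=2))
```

The two vehicle-related items rank above the unrelated one even though they share no keywords with each other, which is the difference from keyword-based search that the conversation points to.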

And what's really cool about these two converging is that you actually have the piece in the middle. The whole LLM in the middle, which has gone through the phase of being able to answer everything. Then kind of figuring out that, okay, it does hallucinate from time to time. And now the realization, at least at this point in time, that you really want to try and ground that LLM a bit in actual, maybe private domain-specific data. 

And so, the convergence of all these now tends to actually make a lot of sense. Because you have the accessibility of the whole generative AI space to people that are not necessarily data scientists or very, very technical people. It kind of opens it up to say any analyst can now have access to query all this data that you have. But you still want to ground it in facts and information from your knowledge graph. And that's where the vectors come in this space.

Right now, I think it's a very narrow space. They are fulfilling a need that just has to be filled given the pace of this whole category. But there are a lot of companies, Neo4j included, that – I don't know. Within the month, I think. They already added vector indexes. Because it kind of – it's really in the space. It's not just something that's thrown on top of it. But it kind of makes sense in the space of a graph. We will see more of those. And then it remains – 

[0:33:17] JBH: Well, I know you're a graph evangelist. So, you're going to always say that. 

[0:33:20] LM: I think it's very, very early days. Whether it's going to unfold that the vector capabilities will be added onto systems of databases such as graph databases, Elasticsearch. Because it makes a ton of sense that it would have that capability. Or whether vector databases would – well, they're called databases now, but there's a huge gap between really being a database. Are they actually going to build themselves out to be the database that we sort of expect? Especially OLTP. Are you going to be ACID-compliant? Are you going to have the performance and the scale? 

I think that gap is like pretty large as compared to the amount of work that other databases would take to add it on. We have to see how this uncovers.

[0:34:04] JBH: Yeah. 100%. Sorry. I didn't mean to interrupt you. But, yeah, that makes perfect sense. I like the way you're thinking about that. Is it going to be a full category? Or is it going to be a feature or an element of a category? Hard to say. I don't know. But that's interesting. 

Let's get to talking a little bit about GraphAware. I want to hear a little bit more about the product. But you were a consultancy that evolved into a product company. 

[0:34:26] LM: Yes. We started consulting at the time graphs were very new. What we did at the time was actually try to explain why you need a graph and what's the meaning of a graph. That was the kind of consultancy that we did. We saw the problem. We knew people that knew a bit about graphs and they wanted to know, "Okay, how can this help my business?" That's the space we operated on. 

There was really very, very, very high evangelism going on at the time to help customers really discover the value of graphs. And we were consultants. As we kind of like spent more and more time, we kind of felt that a lot of stuff that we were doing, a lot of the problems that we were solving were very common. 

We started actually developing our own tools and libraries to actually help make sense of this data. And the purpose of GraphAware or kind of our vision is really to make sense of connected data. That's what we stand for. It's very simple. 

Down this path, we gathered enough to say, "You know what? We actually have so much experience." And so many of the patterns that we see repeat from customer to customer. And so many of the tools we've built that really work and they're really helpful, it kind of makes sense to now wrap it into our own product. That's how Hume was born. About five years ago, I would say. We're still very, very young as a product, as a company. But we started there to say, "Okay, we want to help customers that have this data, especially the enterprises, especially governments." 

And our current focus is very much on the law enforcement space. And they all have the characteristics that you spoke of earlier. A lot of data, multiple systems, siloed data. And you really want to bring them together. Because the value in querying that data and finding patterns in real time is so valuable and so important, especially in the law enforcement space or any intelligence analysis space. With intelligence, it could even be like fraud detection, as closer to your background. 

[0:36:28] JBH: Let me interrupt you for a second. It is real-time analytics layer on top of graph. 

[0:36:33] LM: Yeah. Yeah. 

[0:36:34] JBH: What is it? Let's just say what it is. And you're solving mostly problems in law enforcement. 

[0:36:39] LM: Yeah. Hume is a product. We have two target personas. We have the analyst who is really interested in looking at the data, visualizing it, detecting patterns, querying it. Figuring out how different pieces of data are connected. To keep it really short. That's one of our personas. And then you have like the data scientist, the data engineer that's more interested in doing a deep dive analysis on that data, coming up with everything that goes into network analysis. 

Hume as a product. We basically give you a single platform which helps you get that to orchestrate your data from various sources into that single graph. And then the ability to visualize it, analyze it, look at it in different ways, geospatially or in time, right? Style that data so that you can visually distinguish how clusters look. Find paths between different parts of data. That's really in a super tiny nutshell what Hume is. 

[0:37:40] JBH: If I wanted to start a company or an initiative to say like I'm going to do anti-human trafficking, right? Then this is the type of product that I would take to put – 

[0:37:49] LM: Yeah. Yeah. Yeah. Yeah. It's really important. We actually value being in law enforcement. Because the value is extremely high. In every crime movie or series, the picture that you usually see is the corkboard, where you have like these little pins and little pieces of evidence and then the thread. 

[0:38:06] JBH: The murder board. I think that's called a murder board technically.

[0:38:09] LM: The murder board. Okay. It's true. And it's really sometimes the information is all there, but you're just missing that link. And that's really what the graph surfaces. We know from customers – and I cannot name those customers, but we know from customers that many of the times the data is there, it's in spreadsheets and tons of documents. And sometimes it's really – and why we call ourselves mission critical is because sometimes it really is a matter of life and death for, let's say, a missing person case or a potential attack. 

You have the data. But because the pattern isn't obvious or because you haven't found that one hidden connection, you're still in investigative mode. This is what we solve with Hume. We give you that window into your data so that you're able to find that missing link much, much, much faster. We have reports of analysis that would traditionally take days that are now happening in hours. And it's massive, right? 

[0:39:12] JBH: I mean, that's a big deal. And you feel good getting up in the morning to do that. 

[0:39:17] LM: Exactly. Exactly. Exactly. Because it can solve many things. It's not a solution that is built with law enforcement in mind. Of course, we add features that aid the law enforcement space over everything else. But the impact we feel from this is super high. 

For us, and our team and our company, we really know that it's not this superficial, emotional thing of, yeah, we're helping make the world safe. We actually know that there are cases where Hume has helped to make that difference. And that itself is I think a reward. And then because it's built on graphs, it's even better. It's really amazing. 

[0:39:53] JBH: I have a million questions. From a business perspective, I bet your background in consulting has helped you with law enforcement. Because, often, they have like kind of low maturity, I bet.

[0:40:01] LM: Yeah. Yeah. Absolutely. Absolutely. The journey we go through with a lot of customers, it's not identical, but it's similar. It's usually about, "Okay, how do I get to answer these questions very quickly?" 

We go through phases of figuring out what data you have. Where it's located? How easy is it to bring into the graph? How do you model that data? Super, super, super important. Because you can have two similar domains and each of those domains have different problems to solve. Different use cases. Different questions to answer. And you would actually model them differently. 

This is some of the mind-bending stuff that happens for people that first get into graphs. Because there is no rule book of third normal form, where everybody that needs to design a relational database will apply third normal form and you will end up with the same schema. With a graph, yeah, well, it depends on your use case, what you want to get out of it. And that's the fun of it. Or, at least I think it's fun. 

But we start with that. How do you model your data? And then very, very quickly, we get the first use case implemented. What's your use case? How can you query that data? Are you getting the results you want? Now this phase – and that's why we like to do it incrementally. Because if you were going to ingest every piece of information or every silo that a law enforcement agency would have, you would be doing it for years. 

You would start with like a very small portion. Maybe a specific type of crime. Or maybe you would start with the forensics department. Or maybe you would start with missing persons, for example. You would model that piece of data. You would ingest it into your graph. You would craft the queries or enable an analyst to actually explore that data, find patterns. 

And then what always happens is, when you actually look at your own data in a graph, what pops into mind is, "Oh, do you know what? It can actually answer this, and this and this. And if I had that piece of data, it would help me answer that." And from then on, it's a very organic journey. It kind of – 

[0:41:54] JBH: It suggests itself from there. From there, you're going to start – yeah. Yeah. 

[0:41:55] LM: Yeah. That's typically our journey. And then you get into the deeper areas. Okay. Now how am I going to help you find clusters in your graph? How am I going to find you patterns? How am I going to detect things like – especially on law enforcement. The fact that two people met. This is a conceptual notion. But it can be actually expressed in a graph if you say that both people had a cellphone and then your cell tower actually picked up the fact that you were in that location. Because your cellphone pings that location within an amount of time. And maybe your number plate was caught on one of the traffic radar. These are like some of the not-obvious, direct use cases for graphs. Because you have the data. It's connected. 

But by putting together this meaning of the world or this understanding of your world from a law enforcement perspective, you are now starting to answer questions to help you analyze your cases to say, "Okay, what happened in this perimeter?" We know that this car was here. This person was here. It's all pattern matching.
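The "two people met" pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sighting records, the identifiers such as `person_a` and `tower_17`, the 15-minute window, and the `possible_meetings` function are all hypothetical and do not reflect how Hume models or queries this data. In a graph database the same idea would typically be expressed as a pattern match over person, location and time relationships.

```python
from datetime import datetime, timedelta
from itertools import combinations

# Hypothetical observations: (person, location, timestamp).
# A location could be a cell tower that picked up a phone or a camera that read a number plate.
SIGHTINGS = [
    ("person_a", "tower_17", datetime(2023, 5, 1, 14, 0)),
    ("person_b", "tower_17", datetime(2023, 5, 1, 14, 10)),
    ("person_c", "tower_17", datetime(2023, 5, 1, 18, 0)),
    ("person_a", "tower_42", datetime(2023, 5, 1, 16, 0)),
]

def possible_meetings(sightings, window=timedelta(minutes=15)):
    """Return sorted (person, person, location) triples for distinct people
    observed at the same location within `window` of each other."""
    meetings = set()
    for (p1, loc1, t1), (p2, loc2, t2) in combinations(sightings, 2):
        if p1 != p2 and loc1 == loc2 and abs(t1 - t2) <= window:
            first, second = sorted((p1, p2))
            meetings.add((first, second, loc1))
    return sorted(meetings)

if __name__ == "__main__":
    for first, second, location in possible_meetings(SIGHTINGS):
        print(f"{first} and {second} may have met near {location}")
```

Comparing every pair of sightings is fine for a toy list but grows quadratically; a real system would index by location and time first.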

[0:42:55] JBH: Yeah. That's so interesting. And I've been talking with a lot of people about the importance of the analytical thinking in the new generative AI world. And it sounds like this is a perfect example, this kind of modeling. 

[0:43:07] LM: Yeah. Yeah. Yeah. 

[0:43:08] JBH: It takes a set of skills. Not just one set of skills. It's a whole combined thing. That's really interesting. But the other thing – we're coming to the end. So, I want to make sure that I ask you too. I guess all this data has to be certified as scrubbed for privacy before it goes into the – yeah. That probably takes a bit of time.

[0:43:22] LM: Yeah. Well, it depends. Because we work with law enforcement customers, all of our deployments are on-premise, obviously. Or private cloud. We don't have – 

[0:43:31] JBH: Okay. Or they meet some internal – 

[0:43:33] LM: Exactly. And then, of course, you have to implement various layers of security, which also relates back to one of your earlier questions, is if I'm choosing a graph database, you want to make sure that, of course, it has basic guarantees out of the box. Typical ACID compliance. 

If you're going to put something into the graph, you don't want to wake up tomorrow and lose it, which is a lot of what some of the NoSQL databases had, which is fine. But just not for this particular domain. There's nothing wrong with it. 

ACID-compliant. Do you have the level of security that large organizations and sensitive organizations need to depend on? It's not like a trivial matter where it's kind of okay to have data leak. Are you able to set up access control around all that data? The guarantees that your database – like any mature database should provide is super, super important. And we deal with customers like this. And that's why it's really important as to which database you pick. Where's your reliability? Do you have the right support? Do you have the right guarantees? Have they been around long enough? Are they trustworthy? Yeah.

[0:44:36] JBH: Yeah. I'm glad you listed that out. I'm going to put that in the show notes too. Because I think that's an important add-on to the list of questions. Even though you know, it's always good to have that out there. 

In terms of your growth pattern for this company, I wanted to just give a shout-out. I think this is such a great model. And people should think about this when funding is tight or closing customers takes a long time. This consultancy side of your business evens out the uneven revenue, potentially, of getting started. 

[0:45:01] LM: It does. 

[0:45:02] JBH: And you have the expertise and the relationships. It's not easy, but it's an interesting path that we haven't – it's been so much money in the venture markets in the last few years that we didn't talk about this path for a long time. But it's such a great path into evolving your startup out of consultancy. I just want to underline that for the listeners. I think it's a great way to go.

[0:45:24] LM: Yeah. It's been an interesting path. It wasn't pre-planned. It's just really how it unfolded along. But I think it's worked out extremely well for us. Definitely, a lot of the people that we consulted for are our top customers and our best customers. It just worked. Thankfully, it just worked for us in a very positive way.

[0:45:43] JBH: Well, I think it's great if you don't – a lot of people can't quit their day job. And so, it is a way that you could, as a consultant, continue to support yourself, support your small team and continue to evolve into your sort of classic Silicon Valley startup without blowing up your family life or whatever. You can get started. I like that. 

What is the future for GraphAware? Where are you guys taking the product next? If you can share? 

[0:46:10] LM: Well, there is a ton of work to be done. We are nowhere near the finish line. I would say we're at the start. We have a lot to do in terms of now that the analyst can analyze the graph. We are past that point. You can visualize what's in a graph. But now we want to attack the harder parts of challenges that anyone getting into a graph would have. Things like how do you know that the piece of information is the same piece of information? The same person? Entity resolution in a way, which has been around for a long time in many, many different forms. It lends itself really well to graphs. And it's always been in the realm of something that a separate team of specialists, data scientists would do. 

In law enforcement, it's like super important. Because you, Jocelyn, as a person, are in the system probably multiple times. From maybe the fact that you have a vehicle registered at the motor department. You're a person there. You're a person because you have an account with a bank. You're a person because you are registered into the voting system. You're a citizen. You have a passport. 

And, honestly, all these things are not the same. They are really – every system provides a different view of that person. How do you know that you're the same person? And this is where a lot of companies struggle. Even in the financial world, how do you know that that customer – 

[0:47:30] JBH: Oh, definitely. 

[0:47:30] LM: It's the same. Right? Entity resolution. Bringing more of the graph data science into the product, really important. We couldn't do that all this time. Because you need to start at the basics first. But now we're ready to bring really more advanced capabilities to allow analysts to have the freedom to be able to manage data that they happen to collect from a source. Very popular. 

Let's say you are investigating a case, you seize a phone, a laptop, a pile of papers, whatever. It's always been a challenge to kind of bring that small piece of information that doesn't really comply to maybe the whole schema, right? When you think about the schema. But it's like extra information. But it's still important to solve your case piece of forensics. How can you bring that in seamlessly into the graph? 

Now we start talking about the graph being able to ad hoc connect to other disparate sources of data, maybe for a short-lived amount of time to analyze how that data is connected, maybe for a long-lived amount of time. And these are kind of challenging problems. Because it's sort of organized, but yet ad hoc. And how do you find the middle ground between them? These are the spaces that we want to look into. And, yeah, plenty of work ahead of us.
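To make the entity resolution problem raised earlier in this answer concrete, here is a deliberately simple Python sketch. Everything in it is an assumption made for illustration: the records, the source system names, and the rule that two records refer to the same person when their normalized name and date of birth agree. It is not a description of how Hume or any other product resolves entities; real entity resolution generally needs fuzzy or probabilistic matching and handles missing or conflicting fields.

```python
from collections import defaultdict

# Hypothetical records for the same people as seen by different systems.
RECORDS = [
    {"source": "vehicle_registry", "name": "Jane  Doe", "dob": "1980-02-03"},
    {"source": "bank", "name": "jane doe", "dob": "1980-02-03"},
    {"source": "voter_roll", "name": "DOE, JANE", "dob": "1980-02-03"},
    {"source": "passport_office", "name": "John Roe", "dob": "1975-11-30"},
]

def match_key(record):
    """Build a comparison key: lowercase the name, drop commas and extra spaces,
    sort the name tokens so word order does not matter, and append the date of birth."""
    tokens = sorted(record["name"].lower().replace(",", " ").split())
    return (" ".join(tokens), record["dob"])

def resolve(records):
    """Group source systems by match key; each group is treated as one real-world person."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["source"])
    return dict(groups)

if __name__ == "__main__":
    for key, sources in resolve(RECORDS).items():
        print(key, "->", sources)
```

In a graph, each resolved group could become a single person node linked to the source records it was derived from, which keeps every system's original view available.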

[0:48:41] JBH: That sounds amazing. It's so interesting. I especially like this idea of why sort of anomaly detection for law enforcement is so weird and different. Because it could be the key to the whole thing.

[0:48:53] LM: Yeah. Exactly.

[0:48:53] JBH: That's interesting. It's been really great to catch up with you and understand a little bit more about your role, about GraphAware. I'm going to put a link in the show notes as well. You have an ebook, Applied Graph ebook, that we all know these ebooks are partly marketing. But I also found it to be very helpful in terms of laying out the basics for implementation.

[0:49:12] LM: Yeah. Some of the stuff that we have on our website is also because, GraphAware, when we were formed, the way we hired or the way we joined – I forget now. It's been a long time. But the people that joined the company really early were really experts in their domain. Our chief scientist, our CTO, they all came from the world of graph. 

We started off with being really experts in that field. We continued to be part of the community thought leaders in the space. A lot of the books and a lot of the material you see is actually kind of accumulated experience over the years. It does serve as marketing, of course. But you'll find that, yeah, there's combined years and years of experience that kind of go into this. 

And I think that's why, as well, we are a small company. We're nowhere near a large corporation. But we have a super high concentration of experience in this space. And I think that's as well pretty critical to why we've been able to continue on this path for this amount of time.

[0:50:09] JBH: Well, it's amazing. Thank you so much. We started out with an intent to just lay out some groundwork for a technical person who maybe didn't know a ton about graph. And I feel like it was so great to hear your opinions on that and kind of lay out some big categories to think about as we all are engaging more to understand more I think especially with generative AI. I'm a data person at heart. So, I'm always like, "Oh, you got to start with the where's the data first. And then go to the model." I really appreciate that point of view. 

Last things last. As VP of engineering working, you were one of the first – are you founder first? Early employee? I can't remember. 

[0:50:41] LM: Early employee. Yeah. 

[0:50:43] JBH: Yeah. Any advice for engineering people who are maybe in big companies and looking to move to a small one? 

[0:50:50] LM: It's one of the hardest questions you've asked me all day today, which is funny. The best I can think about it, I think, for me, a slightly non-conformist view. What kind of company you work for really depends on what kind of person you are and what kind of environment you thrive in. 

I went through a large company, services company, product company, corporate company startup and then another startup. I've kind of seen them all. And this is the size that really fits me. 

For me, it's important to know, really, what kind of environment you thrive in? Secondly, what does leadership mean? I think it's a very overloaded term and a very aspirational term. I think engineering has kind of gotten better over the years where I think if you were to think 8, 9, maybe 10 years ago, the only way to kind of make a mark for yourself is to be in management, right? Even if you were the best engineer. You were the engineer at the end of the day. 

I think a lot of people kind of got forced into the management side of things and really found it hard. Because as an engineer, it's really hard to get away from the engineering if you truly, truly, truly love engineering. It kind of created this split. I think my message would be like it's more forgiving today. You don't have to – you can still lead on different parts. You can still stay very true to your engineering roots and have a form of leadership. And you can still branch out into management and have a form of leadership. 

[0:52:09] JBH: I love that. 

[0:52:09] LM: But for me, the number one thing, I think if I look back, you could say, "Okay, read all these books and get all these techniques." And I love reading. Don't get me wrong. There are a lot of things that you can – experiences and ideas you can draw from. But for me, the number one thing is, for you to be in leadership or for me to be in leadership, is to really be able to genuinely care about people and the people you work with. To have that sense of understanding what drives people to really care about it. And I don't mean superficially. To really care about it. To really be interested in making people in your team and around you successful. And therefore, the company's successful. 

I find that I always come back to this principle. It doesn't matter how you operate things and how you run things. Because that changes over the years. Depending on the group, and the day, and the time, and the age and what kind of period we're going for. 

For me, I think that if you have this, it makes everything else much easier. Because it's easy for me to align with this. It's easy for me to derive a sense of satisfaction to really have that gratitude at the end that my team built something. I was there to help them enable them. Not kind of tell them what to do. 

For me, there's a big difference between being a manager and being a leader. Are the people being managed? Or are they kind of like behind – have you created that sense of purpose, which sort of is just self-rewarding then. You don't have to do much except be there to guide, to offer a bit of experience that you might have gathered simply because you've just been around too long. It's not about being smarter than the other. For me, caring I think is very, very, very important.

[0:53:46] JBH: I have to tell you – you hear this, right? You have to genuinely care about people. And I'm just going to share this, even though this is going to be published on a broad platform, is like I'm always like, "How do I do that?" Because I think that's an area of opportunity for me. I'm always like, "Let's just talk about the work. I don't want to talk about feelings." 

However, I like what you're saying. And if you cast it as part of a leadership mode rather than getting stuff done, project mode, that never made sense to me. How do I care genuinely about people when we're just like in the meat grinder? You're saying something different, which is how do you kind of level it up.

[0:54:17] LM: Yeah. Because there is a balance. I mean, it's also a very thin and dangerous line, which I have fallen into this trap many times. Because sometimes it can be an extremely emotionally draining or tough job. Because there's only so much you can deal with. 

But I think, for me, my perspective is the job, the work, the product, the company. This is a central focal point, right? I mean, that's why we are there in the context of the company. Of course. 

For me, that's a central point. And then it's about caring about how people are able to be the best version of themselves on the job. Not necessarily outside of the job. Because I think if you expand to that, it's tough. And you don't want to. You absolutely don't want to. 

It's kind of like the sense of everyone is different. Every single person has a different motivation, a different strength, a different way of working. Something different that drives them. Something that they're good at. Something that they're not. And I think for me to kind of understand this composition and then assemble it in a way that's most effective to move the team, the product, the company forward is key. 

And you will find the same analogy – if you read a lot of the books especially around the – well, any of these books that talk about team formation, they always draw the analogy of a sports team, right? Where you have all these different players, and they all have different roles and they all have different strengths. But you really know how to put them together into that formation, into that play. 

Because at the end of the day, the goal is to win for a sports team. For a company, it's like winning but in a different form. For me, this really fascinates me. And the dynamics between people and team, it really interests me. I think it's also probably a factor of me actually liking how things play out and how people dynamics work that kind of like interplay with this. 

[0:56:07] JBH: Nice. You kind of brought it all back to graph. I love it. You brought it all back to context. 

[0:56:10] LM: Okay. That wasn't intentional. But thank you. Yeah. 

[0:56:13] JBH: Listen, it's great to chat with you, Luanne. We'll have you come back to talk a little bit more about caring about people's feelings, which is an area of opportunity for me.

[0:56:19] LM: It would be great. 

[0:56:20] JBH: But it was awesome to check in with you. Thank you so much. And we'll talk soon.

[0:56:25] LM: Thank you so much. It was a pleasure.

[END]