EPISODE 1612

[INTRODUCTION]

[0:00:01] ANNOUNCER: Maritime logistics is the process of organizing the movement of goods across the ocean. Historically, this has been a challenging problem, because of the multinational nature of shipping, as well as piracy, smuggling, and legacy technology. It's also profoundly important for security reasons, and because 90% of what we buy travels over the oceans. Ocean vessels produce a lot of CO2, which adds climate change and energy dimensions to maritime logistics.

Windward AI is a maritime logistics platform that was started 13 years ago by two ex-Israeli naval officers. The idea for the company came from the observation that, at the time, it was hard, or impossible to know what's happening on the deep sea. Benny Keinan is the VP of R&D, and Lior Resisi is the data platform group leader at Windward AI. They joined the podcast today to talk about the technical and practical challenges of maritime logistics, why Rockset was the right database for their unique data sets, the impact of the Ukraine war, and more.

This episode is hosted by Lee Atchison. Lee Atchison is a software architect, author, and thought leader on cloud computing and application modernization. His best-selling book, Architecting for Scale, is an essential resource for technical teams looking to maintain high availability and manage risk in their cloud environments. Lee is the host of his podcast, Modern Digital Business, produced for people looking to build and grow their digital business. Listen at mdb.fm. Follow Lee at softwarearchitectureinsights.com, and see all his content at Leeatchison.com.

[INTERVIEW]

[0:01:50] LA: Benny Keinan is the VP for R&D at Windward AI, and Lior Resisi is the data platforms group lead, and they are my guests today. Welcome to Software Engineering Daily.

[0:02:01] LR: Hi. Happy to be here.

[0:02:02] BK: Hi. Thank you for having us.

[0:02:04] LA: Great. Thank you. I know to many people, the merger between software engineering and something like maritime operations might not seem very interesting, or very obvious, but you have an interesting story behind you and your company. Can you give me a little bit of background about how Windward AI was started?

[0:02:25] LR: Sure. Windward is a 13-year-old company that was actually started by two founders, Ami and Matan. Both of them are ex-Israeli naval officers who spent a lot of time at sea. As part of what they've done, they saw that it's pretty hard, if not impossible, to know what's going on in the deep sea. When they came out of the army, they said, "Okay, we need to see how we can solve this problem," and started to build a platform that will assist different parties, whether these are governments.

Later on, we got into the commercial aspects, but how we can provide eyes on what's happening in the deep seas, which until that point was almost impossible to know. That was the starting point of Windward. Later on, as things evolved, we started to understand that security aspects and things like that are only one part of what's happening and what is interesting, because eventually, most of the global trade is happening overseas. More than 90% of everything that we buy travels at this point, or that point in time through the sea.

If you think about the amount of money that is exchanging hands overseas, if you think about things related to environmental aspects. Vessels are producing a lot of CO2 emissions. There are a lot of aspects on that side. As things evolved, and we saw a few years ago on COVID, what COVID did to the global supply chain, so this is another aspect. If you think about that, going from the early days of security related things into everything that is happening in the world, tightly coupled with what's going on in seas and global trade, and so on, and this is how Windward evolved from something very narrow into touching so many aspects of our daily lives and providing more value and more insights to additional types of customers, use cases, and so on.

[0:04:31] LA: When we think about logistics, we think about detecting motion, right? Tracking, in this case, ships that are moving, where they're moving to and from and what cargo they're carrying and where they're going to and where they're going from and all those sorts of things. What you are really doing is you're trying to detect patterns of interest in all of this traffic moving over the international waters. Is that a fair statement?

[0:04:55] BK: It's partial, because eventually, let's start one step back. Logistics is one aspect. We are doing everything around global trade going on in maritime. Logistics is one aspect. You have additional aspects that are not only around logistics. When you try to understand what's happening, tracking the location of the vessel is one aspect, but you also need to know things about the relations between the vessels, who is the owner of the vessel, what type of vessel it is, what is it that they're doing, what is the weather in the sea? Because maybe sometimes you can see a vessel going on a specific route, but suddenly, doing something else.

If you can explain that by the weather, you have good reasoning on why they're doing what they're doing. We're collecting a lot of data coming from the vessel, coming from satellites, coming from additional data sources. Everything is fused together. On top of that, we build our own understanding of the domain, our own entities, and eventually, translate into what we call activities.

An activity is the notion that a vessel is doing something. You can think about that like the facts of what's happening. You have the vessel and the vessel went into a port to discharge what they have onboard. Being inside the port is a port call activity, and this is a fact. Some of the things that vessels are doing are not that obvious to understand. When a vessel is going into a port, pretty easy to understand. Even that is not that easy. It's not like, "Yeah, I've been here, port call." No, no, no. It takes a lot of understanding about what it means. Does it make sense that a port call happened for two minutes? Most likely not, because you cannot do anything in two minutes. There is a lot of understanding of the domain.
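The two-minute port call remark can be illustrated with a minimal sketch. It is not Windward's implementation: the `Position` record, the precomputed `in_port` flag, and the 30-minute threshold are all assumptions chosen for illustration. The sketch groups consecutive in-port position reports into stays and keeps only the stays long enough to be a plausible port call.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Position:
    timestamp: datetime
    in_port: bool  # assumed to be precomputed, e.g. by a point-in-polygon test


def port_calls(positions, min_duration=timedelta(minutes=30)):
    """Return (start, end) pairs for in-port stays lasting at least min_duration."""
    calls = []
    start = None         # first in-port report of the current stay
    last_in_port = None  # most recent in-port report of the current stay
    for p in sorted(positions, key=lambda p: p.timestamp):
        if p.in_port:
            if start is None:
                start = p.timestamp
            last_in_port = p.timestamp
        elif start is not None:
            # The vessel left the port: keep the stay only if it was long enough.
            if last_in_port - start >= min_duration:
                calls.append((start, last_in_port))
            start = None
    # Close a stay that is still open at the end of the track.
    if start is not None and last_in_port - start >= min_duration:
        calls.append((start, last_in_port))
    return calls
```

With this rule, a vessel that reports from inside a port polygon for two minutes produces no activity, while a two-hour stay does. A real system would presumably also weigh speed, berth location, and vessel type.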

One of the things that Windward is doing, it's not like a generic AI platform. It's a vertical AI platform that combines AI capabilities with a lot of domain understanding, and the combination between the two of them can create those activities, which eventually, when you collect everything together, we have several tens of different activities in the system. Some of them are straightforward, just like I mentioned before, the port call, but some are more sophisticated.

For example, if you think about something that we call dark activity. This is a term that Windward invented. A dark activity is a vessel that you actually don't see. It's not something that we're saying is happening because we see something is happening. This is an activity that we're saying exists, because nothing is happening. For example, if a vessel was in a specific location and a few minutes afterwards, it's gone. Now that location is very suspicious in terms of smuggling drugs, for example, like next to the shore of a specific country that might be famous for its drug industry and stuff like that.

Understanding that the vessel is gone for a specific time in a specific location, and what other vessel might be in that area, and what's the count route, and when it was lost, and for how long, and when it appeared again, where was it? Connecting all the dots, the missing pieces, different layers on top of AIS information, which is the information about the location, connecting satellite images that we can match together and see, okay, we have no transmission in a specific area at a specific time. We have a satellite image that we can actually see a vessel in that location at that time. Probably, this is the vessel that's not supposed to be here, or at least they are not saying that they're here. We can connect everything together and say, "Okay. We have an activity, which is dark activity." These are all the facts that collectively create, we can call it a behavioral profile of that vessel.
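The idea of an activity that exists because nothing is happening can be sketched roughly as follows. This is an illustrative simplification, not the method described by the speakers: the function names and the six-hour silence threshold are hypothetical, and satellite detections are reduced to bare timestamps, with the location matching left out entirely.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_silence=timedelta(hours=6)):
    """Return (last_seen, seen_again) pairs where the vessel was silent too long."""
    ts = sorted(timestamps)
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > max_silence]


def dark_activity_candidates(timestamps, satellite_detections,
                             max_silence=timedelta(hours=6)):
    """Pair each transmission gap with the satellite detections that fall inside it.

    Only gaps with at least one detection are returned. Matching on location
    is assumed to have happened upstream.
    """
    candidates = []
    for start, end in find_gaps(timestamps, max_silence):
        seen = [d for d in sorted(satellite_detections) if start < d < end]
        if seen:
            candidates.append(((start, end), seen))
    return candidates
```

A gap on its own is only a hint, since coverage can simply be poor in some areas. The combination of a gap with an independent sighting during that gap is what makes the candidate interesting.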

[0:09:10] LA: Okay. You're not really tracking vessels, you're tracking activities, and these activities can be both fact-based activities, like we know this vessel went into a dock, or we know this vessel is located here, because it's telling us that information. But they could be dark activities, which is activities that you are presuming based on other information, and what's going on with it.

[0:09:33] BK: Exactly.

[0:09:34] LA: Okay, that makes sense. Now you've got all of these activities, and you're using AI to correlate those activities and do various things with - Let's get into the data collection side first, and then let's - we can talk about the AI side as we go along. You obviously need tons and tons of data from a variety of different sources to make all of this work. You need to know where ships are located, at least ships that are wanting to be located. You need to know where they're located. Obviously, you're collecting that information. You already mentioned weather information, satellite information, I'm assuming information from ports of call about what ships are there, or what things are going on there. What other sources of information do you have that you use to take advantage of to create these activities?

[0:10:24] LR: As I said, I think, indeed, the most important part of our work is to get the locations of the vessels. For that we're using AIS data. AIS stands for automatic identification system, which is a protocol that vessels are obligated to use, and to transmit their location, and this is one of the sources that we buy. I think one of the challenges that we have is even in this very basic piece of information is to make sure that it's valid. We have a lot of noisy data. We have a lot of other kinds of issues with the data that we're getting. Some of the data is even manipulated, because as Benny said, if you're going to do some dark activity, or participate in a smuggling action, you usually don't want to tell the good guys where you're at, so you're going to manipulate the data in many cases. Even getting this kind of basic data layer of the entities and the players in the ecosystem is not straightforward. That's the basic layer of data that we have.

To add on that, we have a lot of other kinds of data from GIS layers to nautical charts, weather, data about companies and ownership data, port state control. We have our domain knowledge with different layers of ports and exclusive economic zones and maritime areas. The secret sauce is actually taking all of those data points, which are very different from one another, and blending them into one understanding of what's actually going on.

As I said, the very basic layers, understanding those entities and who are the players and where they're located over time. Then we'll try to take and understand the behavior of those entities and add what we call a semantic layer of activity, as Benny said earlier. That's the entry point for a whole new world of models and AI models, which you have from different kinds, we have from good, old specific rules to machine learning and deep learning models, which basically gives us the understanding of the risk that each vessel has, or gives to the top clients.

[0:12:40] LA: You import lots of data from lots of different sources, and that's probably your primary source of data. But do you also collect any data yourself? In other words, do you have sensors of your own that you use to collect data, or is it all data from other sources? Like you say, your secret sauce is correlating all that data together.

[0:13:00] LR: We do have our own data, which is both, let's say, that we gathered it over the years about, for example, polygons of the world and our understanding of ports. We also have our own data about ownership. We have our own collection mechanism for getting and managing the relations between vessels and the different kinds of ownership we have for each one of them.

Other than that, we have a lot of different other sources that we have, we blend and we need to fuse from different sources, because we need to have a lot of redundancy. As we said, we get it from different sources. We have that on our side.

[0:13:40] LA: Now, let's talk about the dark vessels, or the dark activities for a little bit here. Obviously, legitimate ships want to be detected. They want to be tracked. It's important from a logistics standpoint and all that stuff. You track illegitimate activity as well and apply that into your AI as well, which means you're tracking entities, ships, etc., that are not legitimate and don't want you to know where they're located.

You're using different techniques to find those vessels. I'm assuming that's things like satellite images, but also, probably port data and other things like that. How do you do that correlation? What are some of the tricks, without giving away what you guys do? You know what I'm saying? What are some of the things that you can do to track illegitimate, or dark ships?

[0:14:31] BK: I think that eventually, as I said, we can tell you everything, but then we'll have to probably do something to you, so you won't be able to tell it. In general, as we mentioned before, there is an expected behavior from a vessel. A vessel if you think about that, it's like an economic unit. If it's not working in an efficient way, something is fishy. If you're like a taxi driver and you need to take someone from point A to point B, and now you're in point C. You want to get to point A in the best and most efficient way, because you're not earning money on this part of the road, because until you're going to pick up the passenger, this is on you. You want to do it in the most efficient way, the shortest path, and so on and so forth.

If you're starting to do all kinds of detours and stuff like that, something is very fishy. This is like understanding the behavior of what is the expected behavior of a vessel and trying to identify things that conflict with that. If you see a vessel that's supposed to go on a route between two points and suddenly, they take a lot of time to get to some point in the middle, you can assume that, okay, maybe they're not really going that way. Maybe they're doing some detour. They might shut down their transmitter of AIS. But this is old school, because if you are shutting down the AIS transmission, it's like waving a flag, "I'm doing something that is not legit. Watch me. Watch me."
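One simple way to put a number on "doing all kinds of detours" is to compare the distance a vessel actually sailed with the shortest distance between its start and end points. The sketch below is only an illustration under strong assumptions: the function names are hypothetical, the shortest path is taken as the great-circle distance (ignoring land, canals, and weather), and any threshold for what counts as suspicious is left to the reader.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def detour_ratio(track):
    """Distance sailed along the track divided by the direct distance.

    A value near 1.0 means an efficient route; larger values mean a detour.
    """
    sailed = sum(haversine_km(p, q) for p, q in zip(track, track[1:]))
    direct = haversine_km(track[0], track[-1])
    return sailed / direct if direct > 0 else float("inf")
```

A track along the equator from (0, 0) to (0, 2) gives a ratio of about 1.0, while the same trip via (5, 1) gives a ratio above 5. In practice the baseline would more plausibly be the route similar vessels usually take than a straight line.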

The more sophisticated vessel that want to do something that is not legit, they are not telling about that. In that aspect, they keep transmitting, but not necessarily they're going to transmit the right information. If you want to go to a specific place and you're going to take your cellphone with you by locating where your cellphone is, I can know where you are. But if you're going to give your cellphone to your kid and tell him, "Okay, you go over there," and in the same time you're going a different way, I can know where your cellphone is, which is with your kid, but I have no idea where you are. You keep transmitting information.

[0:16:56] LA: It's just the wrong information.

[0:16:58] BK: Exactly. This is a simple way to keep transmitting information, but not related to what you're doing. Obviously, there are more sophisticated ways to hide what is it that you're doing. Luckily for us, we are smarter than at least the things that we know about. Okay, we are not -

[0:17:17] LA: You're smarter than the bad people you know about.

[0:17:18] BK: We can't say that we know everything. Exactly.

[0:17:23] LR: As always, it's a competition between the good guys and the bad guys. You always need to be a step ahead and do something new. I think that it's important to understand, or one of the core understandings that we have is that very much like a human being, a vessel has its own pattern, its own behavior. More than that, it has a similar behavior to similar vessels.

One of the most important things that we have in our system is the similarity between different kinds of vessels. Between the vessel to itself and what it did in the past and between a vessel to similar, or to a very much like other vessels. Of course, being an economical unit and knowing, or predicting what those vessels should do, versus what they're actually doing is the key to understand what's legal and what's not.

[0:18:18] LA: Makes sense. Makes sense. You obviously have a lot of knowledge and information. Your product is knowledge and information. Who are your customers?

[0:18:28] BK: We have a wide range of customers, starting from the early days of Windward, which could be like, government related. It could be a Navy, it could be the Coast Guard, all kind of law enforcement. As we evolved, we have more on the commercial side, it could be major oil companies, cargo, whether it's a dry bulk, or wet bulk. It could be containers, whether it's the vessels that carry the containers, like carriers, or the vessel owners that move cargo from place to place. It could be banks that want to understand the financial viability of a specific vessel, if they want to finance that. It could be an insurance company that wants to understand what is the risk profile of a specific vessel in order to understand what is the premium that they want to charge them for how they behave.

If you think about the behavior of a vessel, if you have a drunk captain, most likely, you don't want to insure him, or you want to take a very high premium. You want to understand the behavior of the vessel. According to that, define the premium. It's different in every aspect of how you look at the system and the risk profiles, the different risk profiles that we provide.

[0:19:47] LR: I think that that's one of the most interesting parts about Windward, is that the same data can be used in different points of views and be interesting for different customers in different aspects. It can be oil companies that are very much interested in the same data that the United Nations, for example, is interested in, in order to, for example, draft sanctions over North Korea, or things like that.

The same data layer that we have and the same source that we talked about and the same activity that we have and the risk models can be used and can be analyzed from different point of views, and for different customers in different verticals. That's really interesting.

[0:20:28] LA: Just even listening to you, this is interesting. It's like a whole new world that I've never been exposed to is all of this that goes on with global shipping of goods and services. There's so much that goes on in that space that you don't think about. I wouldn't have thought of the insurance aspect, for instance. But you're right, that data is incredibly valuable to insurance companies and banks who are underwriting these ships and the large corporations that are actually doing the transportation.

Of course, the one that makes the most sense, at least to me initially, before this interview was the governmental aspects of piracy and all that stuff. And are tariffs being paid correctly and all that stuff. You can imagine all of that. You're talking about a ton of data here now. How many ships are there, by the way? How many ships do you track? I mean, I'm sure you get that question a lot. I'm sure, the answer is very vague, as far as what actually exists. But how many ships in general are we talking about here? Thousands, hundreds of thousands? Am I way off by orders of magnitude in either direction? What are we talking about?

[0:21:38] LR: I think it goes back to the very definition of what is a vessel. Because the same vessel can be used in different aspects by different facilitators, different times of history. You need to define whether it's the same vessel or not, because it has now different behavior and different use. We're talking about hundreds of thousands of vessels operating in the world at any given time.

[0:22:01] LA: I'm assuming this is vessels of a certain size, or above. I mean, my pontoon boat isn't being tracked by you, but what size vessel are we talking about here is the smallest vessel in general?

[0:22:13] BK: I think the smallest vessel is about maybe 15 to 20 meters.

[0:22:17] LA: Oh, that's small. Wow.

[0:22:19] BK: Yeah, that's small. I think that every vessel that is transmitting data, using the AIS receiver will be tracked by us. We do have some pontoons, maybe not yours, but some others. We have a lot of small vessels, as well as very big ones.

[0:22:35] LA: Got it. There's certain legal requirements for vessels of a certain size, I'm assuming. My pontoon doesn't have any requirements when it's on my lake in my backyard. Obviously, anything I'm assuming that's ocean going worthy is required to have, be tracked. Is that correct?

[0:22:54] LR: Yeah. It's usually vessels over 300 gross tonnage, that goes to international voyages that are obligated to use this protocol, which is basically used for safety measures, because when you're in deep seas, you don't want to crash into a different vessel, if you can't see. It's kind of a radio transmission that every vessel can transmit and receive and understand the vessel that you have around you.

[0:23:20] LA: It's not just commercial vessels, but it's also privately-owned vessels for entertainment purposes. But also military vessels? Or is that a whole different area that we don't get into?

[0:23:34] BK: We do have data on some vessels that are military vessels. We don't have any classified information within our system. We rely on publicly available sources.

[0:23:47] LA: Okay, so it's all publicly available information. Where the Fifth Fleet is located is not something that you're going to be able to get from your data, which is fine. I happen to like that. But it might be illegal traders, it might be commercial ships, it might be recreational craft, and some governmental craft, anyway, depending on the purposes, etc.

Anyway, we're talking about tons of data here, hundreds of thousands of vessels. I'm assuming megabytes of data per vessel and potentially, and we're talking about tons and tons of data here. Data management has to be a core part of what you guys do. I know you've made some changes recently in the data management side, and that's one of the things we want to talk about. Let's talk a little bit about data storage and data management. You've changed database strategies recently. Can you tell me about what you were using before, what you're using now and why you made that change?

[0:24:45] BK: Windward has been around for about 12 years. We had, as you said, a lot of data. We have a lot of, let's say, legacy databases that at some point of time, we got to the understanding that we can't use them for new use cases. They're limiting us. We needed to do a shift from the traditional databases that we used, like Mongo and PostgreSQL and Elasticsearch and Cassandra and similar databases to something that can be more robust, can give us our analytical usage, can support it better, which can answer our very unique requirements that we have.

[0:25:29] LA: What are some of those requirements? What specifically makes it hard for databases, like PostgreSQL, or MySQL to do that work for you?

[0:25:37] LR: I don't think it's one requirement that we have, but the combination of the different requirements that we have. First of all, our data is mutable. Unlike a lot of other systems, our data changes, and our understanding of vessels and what they did can change over time in retrospect. Once we get new data, we can actually understand their past better. We keep mutating and changing our data. That is a big challenge that we have, and that every database that we want to use, or we may want to use should support.

[0:26:14] LA: Just to make sure I understand this correctly, you've got all this historical data about what's been going on with the ship. Based on the information you just got, you might have to adjust what that history of what the vessel actually did was. I think I used too many adjectives there. Basically, you change history. Not just changing the location of where the ship is located right now, but you go back and say, "Well, we thought it stopped at this port, but this data suggests that it didn't stop, or stopped someplace else, or did something else." You adjust the history of what the ship did based on that new data. That's what you mean by mutable. Is that correct?

[0:26:54] LR: Exactly. Yeah.

[0:26:55] LA: Okay. Continue from there. Thank you.

[0:26:57] LR: Other than that, I think that data mutability has many challenges. One of them being that denormalizing the data is something that becomes very difficult. Because if you want to have different copies of your data in a denormalized manner, you will need to keep updating it. Managing the update process for so many different dimensions of the data that can change over time becomes a real burden. That's why, like in the good old days, we need support for join operations between different entities. The data should be normalized, because we cannot handle all of the updates over time for the data that is mutated.

We need, of course, geofencing support, which is not straightforward. We want everything to be sub-second. Taking all of that requirement set makes our choice difficult, and the candidates list pretty small.
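The argument for normalized data plus joins can be shown with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in, not Rockset, and the table and column names are hypothetical. Ownership is stored once, so a retroactive correction is a single update, and every query that joins to it sees the new value without any copies to refresh.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vessels (
        vessel_id INTEGER PRIMARY KEY,
        name      TEXT,
        owner     TEXT
    );
    CREATE TABLE port_calls (
        call_id   INTEGER PRIMARY KEY,
        vessel_id INTEGER REFERENCES vessels(vessel_id),
        port      TEXT
    );
    INSERT INTO vessels VALUES (1, 'Example Star', 'Owner A');
    INSERT INTO port_calls VALUES (10, 1, 'Port X'), (11, 1, 'Port Y');
""")


def calls_with_owner(conn):
    """Join each port call to the current ownership record of its vessel."""
    return conn.execute(
        "SELECT c.port, v.owner "
        "FROM port_calls c JOIN vessels v ON v.vessel_id = c.vessel_id "
        "ORDER BY c.call_id"
    ).fetchall()


before = calls_with_owner(conn)

# New information arrives: the ownership record was wrong. One row changes.
conn.execute("UPDATE vessels SET owner = 'Owner B' WHERE vessel_id = 1")

after = calls_with_owner(conn)
```

In a denormalized layout the owner would be copied onto every port call row, and the same correction would have to find and rewrite each copy. That is the update burden described above.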

[0:27:56] LA: Which databases did you look at and ultimately, we're going to get to which one you chose?

[0:28:01] LR: It was back in the good old days, before ChatGPT that we actually had to go to Google and to ask our friends and colleagues and listen to podcasts and try to find this list on our own. We didn't manage to just ask ChatGPT, "Please, give me a database that answers all of these requirements," and so on.

We were examining different databases with different flavors from CockroachDB and Snowflake and Firebolt and RocksetDB, which is the one that eventually we chose. We had a few more. Benny -

[0:28:40] LA: DynamoDB, probably, from AWS.

[0:28:44] BK: Of course, DynamoDB. Atlas.

[0:28:47] LA: Atlas.

[0:28:48] LR: Druid, Redshift, all kinds of different databases that can be either data warehouses, data lakes, databases. Different kinds and different aspects, because each database, of course, has its own pros and cons, and we needed to have some unique blend that will fit our needs and will make our future use cases that we're not aware of yet, become possible and make our development fast and our data layer accurate and consistent.

[0:29:21] LA: What you chose was Rockset. Is that correct?

[0:29:23] LR: Yeah. Eventually, we chose RocksetDB from the founders of RocksDB, which is an analytical database that supports all of our requirements. I heard about it for the first time in a podcast in Israel named the Data Swamp. They talked about one of the companies in Israel that is doing a POC with this database and I was intrigued. We started discussing and reading about this database. We met with their solutions engineers and with their CTO and engineering team. We found a partner. With time, of course, we needed to go through a lot of deep validations and comparisons and define the entire process. They were with us all along, hand in hand.

[0:30:11] LA: Have you completed the migration over to them?

[0:30:16] LR: I think it depends on how you want to define completed.

[0:30:18] LA: How you define. Yeah.

[0:30:19] LR: It's always like that. There are no easy answers. What we did is separate, or define, our world based on the entity. We have domain driven architecture and we moved one by one the different domains from the original database to Rockset, which made our lives much easier. One by one, we can move more and more workflows and use cases to be on top of Rockset.

[0:30:49] LA: What was the biggest challenge in those migrations?

[0:30:52] LR: I think even before the migration, we had many challenges with the comparison itself, because obviously each database, as we said, has its pros and cons. The results of your tests may be very much affected by your test set, or the specific workloads that you are going to choose. That was a major challenge that we needed to face in the very beginning. Even before narrowing down the list to a few databases that we tested thoroughly, we needed to carefully decide what we want to test and how we're going to do that.

For that, we had a dedicated squad that had the representative from the different teams and groups within R&D. We made sure that we are going to cover everything that we have in our system, so nothing will be left behind. That was the first and one of the biggest challenges that we faced. Once we decided to migrate, I think one of the biggest challenges was that we needed to move, or to adjust our code to being domain driven.

So far, unfortunately, we didn't have well-defined data access layers. Most of our clients simply read directly from databases. We needed to go and have a deep understanding of each one of the flows and tune it accordingly. There will be one service that is going to read the data and be aware that the entity under the hood is managed in Rockset. To tune all of the usages, all of the queries, all of the performance issues, be it the UI flows, or Spark job flows, make it use the new technology with the best possible performance.

Of course, synchronous flows have very different behavior and requirements from Spark jobs, in which latency is less important, but your ability to parallelize the load is something that you need to consider.

[0:32:56] LA: I'm assuming, given the nature of the data that you receive and the customers are using it, you had to do all of this while your application was live, right? You couldn't bring the application down to do a data migration for a few days. You had to keep collecting data, you had to keep giving data to your customers. Did you have to do any downtime to make this transition? How did you manage the transition from an availability standpoint?

[0:33:23] BK: If you think about that in a way, this is like changing the engine of a boat while you need to keep sailing. You need to make sure that the engine is the same in terms of it has enough power to drive the vessel forward, that you can connect it to all the other systems on the vessel, so everything will work smoothly, and so on.

Thinking about everything that we had to do, taking everything that we all said so far, we need to make sure that during this transition, we are not losing any information. Every data point was written both to the original database and to the new one. We need to make sure that it's not only being written to those two databases, but that it's the same one, because as we mentioned before, we have different processes which are not append-only; we also update records and we delete records. It's not enough to write the new record into two places. If you update that, you need to make sure that it's going to be updated on both ends.

Creating a lot of monitors and understanding the discrepancies between the databases, making sure that we have enough time that we run all the data, which is in sync on the two databases that we can transition every single application from one database to the other one. It's not something that you're doing in a single day, or a single minute. This is something that you are migrating application as you go. Every additional day, or two, or three or weeks, another application moves, and some of them working with the old one and some with the new one. Everything needs to be in sync. Otherwise, you're going to have tons of discrepancies in the system.
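The dual-write and discrepancy-monitoring approach described here can be sketched in a few lines. This is a toy model under stated assumptions, not the speakers' system: the two databases are plain in-memory dictionaries, and `DualWriter` and `find_discrepancies` are hypothetical names. It shows the two points made above: updates and deletes must reach both stores, not only inserts, and a separate check has to compare the stores, because writing twice does not prove they agree.

```python
_MISSING = object()  # sentinel for "key not present in this store"


class DualWriter:
    """Apply every mutation to both the old and the new store."""

    def __init__(self, old_store, new_store):
        self.old_store = old_store
        self.new_store = new_store

    def upsert(self, key, record):
        # Store independent copies so the two stores never share an object.
        self.old_store[key] = dict(record)
        self.new_store[key] = dict(record)

    def delete(self, key):
        self.old_store.pop(key, None)
        self.new_store.pop(key, None)


def find_discrepancies(old_store, new_store):
    """Return the sorted keys whose records differ or exist in only one store."""
    keys = old_store.keys() | new_store.keys()
    return sorted(
        key for key in keys
        if old_store.get(key, _MISSING) != new_store.get(key, _MISSING)
    )
```

A monitor would run a check like `find_discrepancies` on a schedule and alert when the result is non-empty. An application is only safe to move to the new store once the check has stayed empty for long enough. Real stores also introduce partial failures and ordering issues that this sketch ignores.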

Taking that into account, taking the general load on the system, taking all the reasoning of why we did this migration to begin with, because our database didn't manage to support all the use cases, and so on. We're creating additional load on top of that. Doing everything together, I think this is the complex part of this project. Understanding when it is okay to shut down the old database and you can fully depend on the new one, making sure that you have backups. Every single aspect that you can think about. "Okay, did I have a backup for this one, or I forgot about that?" You need to cover every single aspect. This is a very specific process where you need to make sure that every single aspect is being understood and measured and monitored and alerted. The combination of the complexity of all those aspects together.

I think in general, if you ask what is the complex thing, this is the complex thing. The collection of all those small things that you need to keep checking and checking again and again and again.

[0:36:19] LA: Makes sense. Yeah. That's often true with most data migrations: your first job is to make sure you can keep data synchronized as much as you can. Well, you can't always do this, but as much as you can, keep data synchronized across all the different sources. If you're able to do that between the old and the new, I mean, if you're able to do that and keep it synchronized in both directions, it makes the rest of the migration a lot easier, or at least a lot more possible in some cases. Because you can move things back and forth and have it work equivalently on old technology, or new technology.

Not all applications can do that level of synchronization, but in cases like what you're doing, I can see how that makes sense. You have a tremendous amount of inbound data that you take in and process, and then a separate set of activities to use that data and provide information to your customers, etc. I imagine you can import it in multiple locations and keep it synced that way and other things like that. Are there any specific strategies that you use to keep your data synced that were unique, or different than the norm?

[0:37:27] BK: Before touching this point, I think that Lior mentioned that before. Since we have, as part of our system, the ability, or more than that, the need to recalculate historical data, our system is working in two modes. The first one is the ongoing one: we get new data, we calculate whatever that means, we add additional data points, and so on. At the same time, we have a different mode in which the system is recalculating everything from scratch, bootstrapping everything. You have all the raw data, and assuming that you now have all the raw data of the last two years, you can make better decisions compared to what you could have done a year ago when you only had one year of data.

We keep updating the data in an ongoing way, but in parallel to that, we keep recalculating everything that we have in the system periodically. When thinking about those two modes and transitioning everything to a new database, it's making things even harder. You need to make sure that everything is being done on both sides, whether you're going to do it twice, or you're going to have the source of truth and then replicate from that to the new system.

One of the things that we did was defining the current system as the source of truth, and everything is going to be migrated from this system to the new system. There is inherent latency within that process; we know that eventually, everything is going to move, but we are not going to do two things in parallel and keep checking whether we have everything in sync, because it's really hard to do that at a specific point in time, because the calculations are going on all the time. You can check and compare two things, but one of them was already updated and the other one wasn't. So, is it okay, or is it not okay?
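One common way to replicate from a single source of truth is to ship an ordered change log to the target and track a checkpoint. The sketch below assumes that approach purely for illustration; the interview does not say which mechanism was used, and `Change` and `apply_changes` are hypothetical names. The gap between the latest change in the log and the checkpoint corresponds to the inherent latency mentioned above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Change:
    seq: int                               # position in the source's change log
    op: str                                # "upsert" or "delete"
    key: str
    record: Optional[Dict[str, Any]] = None


def apply_changes(target: Dict[str, Dict[str, Any]],
                  log: List[Change],
                  checkpoint: int) -> int:
    """Apply, in order, every change newer than `checkpoint`.

    Returns the new checkpoint. Re-running with the returned value
    is a no-op, so a retry after a crash does not double-apply.
    """
    for change in sorted(log, key=lambda c: c.seq):
        if change.seq <= checkpoint:
            continue  # already replicated
        if change.op == "upsert":
            target[change.key] = dict(change.record or {})
        elif change.op == "delete":
            target.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.op}")
        checkpoint = change.seq
    return checkpoint


log = [
    Change(1, "upsert", "vessel-1", {"risk": "low"}),
    Change(2, "upsert", "vessel-2", {"risk": "high"}),
    Change(3, "upsert", "vessel-1", {"risk": "medium"}),  # recalculated value
    Change(4, "delete", "vessel-2"),
]
new_db: Dict[str, Dict[str, Any]] = {}
checkpoint = apply_changes(new_db, log, checkpoint=0)
```

Because the target only ever follows the source, a comparison taken mid-stream can legitimately show differences that disappear once the checkpoint catches up.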

In addition to that, we got to the understanding that it's not going to be something that is completely equal on both sides, and we defined thresholds for what is close enough to say that we are good. Again, because of the nature of the data, that it keeps being updated and so on, you can't get to the point in time where you're stopping everything and you have data that is frozen on one side, then migrated to the other side, and then you can compare and get a perfect match.

The system is running. As I mentioned before, we didn't stop the system for a second, and everything was done while getting new data, while users are making additional changes, and so on and so forth. We had to get to a decision, saying that if the difference between the old and the new is less than 0.01%, this is good enough for us.

One of the things that the team did is created those measurements for each kind of entity type, because not everything is the same. You need to have different measures for different aspects of the system. I think these are the main tools, or strategies that we used.
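A per-entity-type threshold check of the kind described could look roughly like this. It is a sketch under assumptions: the interview gives only the 0.01% figure, so the mismatch definition (a key whose record is missing or different on either side), the function names, and the sample entity type are all hypothetical.

```python
from typing import Any, Dict, Tuple

Row = Dict[str, Any]
Table = Dict[str, Row]           # key -> record
Database = Dict[str, Table]      # entity type -> table

DEFAULT_THRESHOLD = 0.0001       # 0.01% expressed as a fraction


def mismatch_rate(old: Table, new: Table) -> float:
    """Fraction of keys whose record differs or is missing on one side."""
    keys = set(old) | set(new)
    if not keys:
        return 0.0
    mismatched = sum(1 for k in keys if old.get(k) != new.get(k))
    return mismatched / len(keys)


def check_sync(old_db: Database, new_db: Database,
               thresholds: Dict[str, float]) -> Dict[str, Tuple[float, bool]]:
    """Return {entity_type: (rate, within_threshold)} for every type."""
    report: Dict[str, Tuple[float, bool]] = {}
    for entity_type in set(old_db) | set(new_db):
        rate = mismatch_rate(old_db.get(entity_type, {}),
                             new_db.get(entity_type, {}))
        limit = thresholds.get(entity_type, DEFAULT_THRESHOLD)
        report[entity_type] = (rate, rate < limit)
    return report


# Hypothetical data: 20,000 vessel records, one of which differs.
old_db = {"vessel": {str(i): {"flag": "PA"} for i in range(20000)}}
new_db = {"vessel": {k: dict(v) for k, v in old_db["vessel"].items()}}
new_db["vessel"]["0"] = {"flag": "LR"}
report = check_sync(old_db, new_db, {"vessel": 0.0001})
```

Keeping a separate limit per entity type lets stricter or looser tolerances be set where the data behaves differently, which matches the point that not everything can be measured the same way.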

[0:40:47] LA: That's a technique that would be very useful in your case, where a little bit of inaccuracy isn't going to kill you. You're not like a financial institution, where a few hundred billion dollars here or there is going to be a problem. You can't do that in some use cases. But in your use case, that works very well. That helped you be able to migrate the source of truth from one to the other with as little variation as possible, but it didn't have to be zero variation.

[0:41:18] LR: Zero variation is not necessarily good, and changes are not necessarily bad, because when you look at the data with a fresh perspective, sometimes you calculate the data better than what it was before. If you have a different result in the target database, it's not necessarily wrong. It might be even better than what you had before, and that makes the comparison process even more challenging, because it's not that you have something that is concrete, like the source of truth of all truth. It's challenging your source of truth with maybe new decisions, or new processes, or new data that you have along the way.

[0:41:57] LA: A variation from the source of truth is potentially good, not just bad.

[0:42:03] LR: Exactly.

[0:42:04] LA: That makes sense. This is great. This is great. Obviously, there's a huge geopolitical involvement in the sorts of things that you do. How has the current geopolitical climate, I'm talking specifically about the war in Ukraine, the war in Israel, or the situation of trade with China and North Korea, how does all of that affect what you do?

[0:42:31] BK: Let's take, for example, the war in Ukraine. As I mentioned before, we have different types of customers. Let's focus in this example on the commercial customers. Let's say I'm an oil company that wants to transfer oil, or buy oil, or ship oil from one place to another place. As part of the war in Ukraine, the EU and the US defined different sanctions on what is legit in terms of trading. Is it okay to buy oil from Russia, or not? Is it okay to use specific vessels with specific flags, or ownership, to do this business?

What we provide to our customers is the ability to understand and mitigate the risk around whether they want to do business with an entity that they know for certain is, let's call it clean, or they should avoid doing business with vessels that are sanctioned, whether because someone added that specific vessel to a specific list of sanctioned vessels, or because that specific vessel violated some of the sanctions, and some of it could be, again, behavioral aspects.

If you are not allowed to take oil from a port in Russia and we have an understanding that that vessel was indeed taking oil from a port in Russia, which means that it doesn't comply with the sanctions enforcement, our customers would like to avoid doing business with that vessel, because otherwise they are going to be exposed to such sanctions.

The situation in Russia is something that is happening all over the world, in different situations and for different reasons, whether this is war, like physical war, or economic war, or whatever political issues exist, and unfortunately, in our world, we have many issues all over the place. Things keep evolving, and things that are okay today are not going to be okay two days from now. Part of what we are doing is adjusting our understanding of the legal aspects of every new definition of what is legit and what is not. Baking that into our system, adjusting all the measurements that we are doing and how we calculate risk, and exposing that to our users.

Eventually, it is the decision of our user to decide whether they want, or don't want to do business with a specific entity. We provide our assessment according to all the laws, all the guidelines provided by the UN, by the US, by the EU, and so on. We provide you the information, you eventually need to make a decision whether you want, or don't want to do business with a specific entity.

[0:45:35] LA: It makes sense. This has been a really intriguing conversation. I hate to say we're out of time right now, but I want to thank my guests today. My guests have been Benny Keinan, who's the VP of R&D at Windward AI, and Lior Resisi, who's the data platform group lead at Windward AI. This has been a great conversation and I've learned a lot about an area I really didn't know a lot about before, and I appreciate that. I want to thank you both for being with me today on Software Engineering Daily.

[0:46:05] LR: Thank you very much.

[0:46:06] BK: Thank you.

[END]