EPISODE 1730 [INTRODUCTION] [00:00:00] ANNOUNCER: Google needs no introduction and is renowned for its data and analytics capabilities. Gerrit Kazmaier is the VP and GM for Database, Data Analytics and Looker at Google. He has a long history in this space, and in this episode, he speaks with Sean Falconer about data and analytics in the AI era. This episode is hosted by Sean Falconer. Check the show notes for more information on Sean's work and where to find him. [INTERVIEW] [00:00:37] SF: Gerrit, welcome to the show. [00:00:38] GK: Thank you. Great to be here, Sean. [00:00:40] SF: Yeah. Thanks so much for being here. I've been looking forward to this. Always good to connect with - I'm ex-Google. But you're at Google now. But I still have a ton of friends there. Go visit the campus once in a while. A lot of fun memories there. But before we get into the weeds, why don't we start with some intros and we can introduce you to the audience? Who are you? What do you do? [00:00:58] GK: Right. My name is Gerrit. I head Google's data and analytics services, the engineering and the product management. Before that, I worked at a company called SAP, heading their data technology. And this is where I started my career as a junior developer way back when in their databases team. I have been with and around data all my professional career. And, well, really enjoying it. [00:01:25] SF: Yeah. You kind of grew up, I guess, in some sense career-wise at SAP. And you were there for quite a number of years. What was that journey? I mean, you must have seen a lot of growth at the company over the time that you were there. [00:01:36] GK: Yeah. 10 years. As I said, I started back then as a junior dev in the database development team. It was just, I think, the serendipity of timing that I came in on the data side as it was just starting to develop the first iteration of its in-memory engine, which later became SAP HANA. I was part of the in-memory development team, part of the in-memory revolution in the database space really, which is when column stores and in-memory processing really made it into enterprise-grade databases. Stayed there for 10 years working across all engineering functions. Eventually, as many or some engineers do, got on the management track. I don't know what I've done wrong to get on that path. But kind of stuck with it. And before I then joined Google about three and a half years ago, I was leading SAP's data technology: SAP ASE, databases, SAP HANA, data warehousing. And now here at Google, leading also the data and analytics services. Similar space. But as you know, different technology stack. [00:02:45] SF: Yeah. Absolutely. Yeah. I mean, there's times where I certainly have asked myself, would my life be so much easier if I had just stuck with being an IC rather than going into managing people? [00:02:54] GK: The answer is probably. [00:02:56] SF: There is of course - yeah, there's a lot of personal satisfaction and pain that comes with that responsibility. For someone who's at your level at a place like Google, it's a very, very big company, what does your sort of day-to-day end up looking like? [00:03:10] GK: I guess a little bit of everything, I want to say. Just looking at this Monday when we're recording this podcast, this morning we had business reviews for specific products, an engineering review, an operations review. We also have a hardware capacity forecast because of the upcoming Olympics, since many of the companies involved are using Google products. I think what does a day look like?
I think there is a broad range. And I think that's maybe what characterizes the role best. In the past, on the tech side, it was really being super deep in one or two technical domains and spending most of the time in a vertical shape. And now I became more of a T, if you will. I have a large horizontal bar, which ranges from finance to product operations reviews. And the day basically coordinates around that entire T, mostly figuring out how I can help the team to get unblocked and execute, give them resources or support to make sure they all have the environment which they need to do their best possible job. Whatever that entails. [00:04:25] SF: Yeah. Absolutely. You have this rich history in data from your time at SAP to Google. That's hard to find. There's only a fraction of companies in the world that have the level of data that those types of companies are working with and that sort of technology sophistication. But in terms of where data starts to intersect with the world of AI, I've always said that data is like the love language of AI. Essentially, every AI project is a data project. And we wouldn't have things like LLMs without those massive amounts of digitized data. And when I look at the huge number of foundation models that have been released over the past year, I think one of the true separators there is data and the quality that goes into those models. How do you think about data quality and data preparation when you're starting to intersect these different worlds of analytics, AI, and also the data systems? [00:05:23] GK: Yeah, it's fascinating. Because it comes back to data quality, but I think in a different shape, and to data preparation. In the past, when everyone was kind of developing their own models, high-quality data was so important. If you have a bias in your data, you transport it directly into the model, like you said. There was a popular saying that every AI project is a data project in disguise. It's only a question of when you uncover it. And now with these large foundation models, it turns out they're pretty good at reasoning even about poor data. It is not the same anymore. Because you're not really training your own model. You build maybe your own adapters, or you're figuring out what is the right prompting strategy for a given model or for a scenario that you're building. But it turns out models are actually helping you quite a bit in working even with messy data. Models are good at cleaning dirty data. But on the other hand, what we're seeing now is that asking it the right question - your prompting strategy and what context you can give it - becomes such an incredible predictor of accuracy and of performance. And now, suddenly, your data preparation and data quality come in in a new way. What is your RAG strategy? What data do you index? Given that everyone's data is quite large, how do you reduce it to something that fits even a very large context window of the Google models? And what I think is even the most interesting application right now is tapping into new data sources. Models are really good at reasoning about unstructured data, for instance. Specifically, in the enterprise domain, that has been one of the biggest pain points. Because unstructured data pipelines are hard to develop, super brittle. It was really difficult to get any of these scenarios to success.
But now with large foundation models, like you said, with very manageable effort, you get really good outcomes for classification, entity extraction, summarization, and so forth. And now, suddenly, there is this new sea of data which is completely untapped. And now preparing that and making it part of an enterprise data landscape so you can use it in generative AI scenarios - I think that's a supreme factor right now and where I see the most interesting applications, frankly. [00:07:53] SF: I think every enterprise in the world has historically been sitting on massive amounts of unstructured data that they can't really do that much with. It's just hard to leverage it. And, yes, things like data lake and lakehouse technologies have been around for a while. But that takes a tremendous amount of expertise and manual work to catalog and build up the metadata associated with it. I guess in terms of these worlds coming together, do you see AI, or generative AI in particular, being sort of the technology innovation that we needed to be able to really truly make things like a data lake valuable? [00:08:31] GK: I think so. Yeah. I think 100%. I would be hard-pressed to name something in the data space as significant as the advent of gen AI models or agentic workflows. And it does change quite a bit. Maybe in the broad picture, I think, like we just said, new types of data become super relevant. Unstructured data is going to be a part of an enterprise data landscape. I do think workflows are changing quite a bit. Because you get from assistive functionality to autonomous functionality. A whole different set of capabilities that you can leverage. And if you think about it, really, one of the scarcest resources in data projects is actually people's time. And data teams, data engineers, software engineers, those are all scarce resources. And one of the best uses of gen AI is actually scaling them up by automating their work or making them more effective. And then thirdly, I think a big bucket now for us on the databases side, if you will, is actually re-engineering or reinventing some parts of the data stack. Just one example, which I think is a good example of that, is SQL was more or less the most important language in the data space. And, boy, it was there for a long time. But it turns out that now large language models, for instance, they do struggle with SQL. SQL is quite tricky to generate at high quality. And it's not only about binding intent to structure. It's also how the language itself operates. It's not easy to generate. But Python, for instance, is imperative and procedural. There is a ton of training code out there. It's just so much more suitable. And now if you think about it, basically it means we have to execute Python at the same scale in distributed frameworks as we did SQL, which was one of the main engineering efforts if I think about my career - making SQL work in-memory, or in BigQuery, making SQL work in a highly distributed execution model. And now a lot of that flips over to how do we do that with Python in a way which is sandboxed, and safe, and secure, to be basically the runtime for gen AI queries, or queries generated by gen AI models or agents. [00:10:56] SF: Yeah. I mean, SQL is one of those languages that I think people have tried to kill ever since it was invented. And it just keeps going. I think, eventually, you just have to accept that people want - if you have a database, they want to interact with it with SQL.
Maybe gen AI is starting to change that. Can you talk a little bit more about why you see Python essentially as something that provides more advantages for automatic generation in order to tap into the data? I agree, generating SQL automatically that understands the schema and can execute without errors is a tricky problem for sure. But why is Python something that's easier to do? [00:11:34] GK: You know, you're asking a really good question. And I think top-notch, bleeding-edge research is also trying to answer that. But I think there are some obvious facts. One is there is just a ton of Python code out there which you can use to train models on. And that is far more than SQL code. The second thing, which I think really helps, is that Python encapsulates a lot in reusable functions and modules. You don't have to generate everything, right? You basically just have to know what function or what library to call. And in SQL, none of these things exist. You don't have much training code and you don't really have the modularity. You basically regenerate everything from scratch. And then I think there are some more hypotheses about declarative versus imperative languages and how that factors into generating code. I don't think this is proven out yet. But I think, clearly, just the amount of code and the reusability of packages is beneficial. Plus, there is one factor which I also think is quite helpful: the control flow in Python is clear. Usually, on the data side, you create a data frame, you apply operations, you create a data frame. And that's fairly linear. In SQL, everything is more like a giant ball the way you can write it, which is much more intertwined and much less clear. You can have a line of SQL at the end impacting everything at the top again. And I think that just makes it hard. [00:13:04] SF: Right. When you think about writing something like Python code to do some sort of analysis, it's going to be probably broken up into really small statements. And you can tap into lots of libraries that are deterministically going to run and execute. You remove some of the non-determinism factor that you might have when you are interacting with an LLM. Whereas with SQL, you could have something that's like a thousand-character SQL statement that has a bug somewhere at the 497th character. Something like that. And it's a little bit difficult to disentangle, versus being able to point to that single line of Python code that's not executing and needs a human to come in and maybe adjust and fix it. [00:13:42] GK: That's right. Yeah. That's one of the fun challenges of how do you execute Python at the same scale as SQL for data agents. [00:13:50] SF: Is this something that Google is investing time into building new products around? [00:13:57] GK: Yeah. A lot. Well, dare I say, the most exciting part of the job. Because if you're a product builder - and my career was always about building products, writing code myself earlier, and then building products - it is just so exciting when you have a big technology shift. And I frankly think everything in the software industry is kind of a reset almost. At SAP specifically, I have seen how difficult it is for organizations to work with data to solve business problems and what challenges they experience. And also, on the Google side, seeing what potential the technology holds, I think ultimately this is a big change. And, yes, we are working specifically on, "Hey, how can we rethink BI?
What will change when we have assistive functionalities or even partly autonomous behaviors that can address typical weaknesses in human analytics?" And the problems are clear to everyone. One is we all carry our biases. And self-service BI was mostly challenged by people finding what they were looking for, right? Basically, proving their own beliefs. Another big problem of BI is you just have this tiny canvas of a dashboard, which means you have to build these very big aggregates. Otherwise, for humans, there is no comprehension anymore. But many of the really interesting signals are at a much more microscopic level, you almost want to say. And it's really hard for people to analyze it like that. Even if you think about real-time monitoring and action systems, the problem is just exacerbated. And this is all an area where these very strong foundation models come in. And I think, for the first time, they offer credible technology to solve these problems. [00:16:03] SF: Yeah. And I can see how - I think one of the challenges with traditional approaches to BI is someone goes and does a bunch of work. They produce a report. They come into a meeting. They show the report. And they've got 20 reports to show. And the meeting never progresses past the first slide or representation of the report because people start asking questions. And a lot of times, you maybe didn't think of that question beforehand. So you don't necessarily know the answer. It sort of grinds or slows down the learning cycle, and they have to go off and produce some sort of new report. And that delays the project. But if you can suddenly make all this stuff much more adaptive and faster to iterate on, and faster to ask questions in real time by leveraging the LLM to generate new reports, then this becomes more of a brainstorming exercise that you can do in real time and really speed up those learning cycles. [00:16:52] GK: I totally agree. And I think most organizations, I would even say all organizations I know, it's not like they are running out of questions. They're just running out of time and resources to answer them, which means that many of the questions never get answered, or even asked. And even more so, when you look at the activation rate of data in companies - how much data is actually being used as part of a decision process or an application workflow - it's a small percentage. And if you have these unbiased autonomous agents, they can basically harness all of it. All of that lost signal. And, like you said, they can also really help people come to better decisions faster, because they can get to clearer answers without going through a data analyst who's then doing a custom modeling project. [00:17:43] SF: Yeah. And I think, also, because a lot of times it's easier for us to build reports or do analysis around purely quantitative metrics, we sort of do a disservice to qualitative signals. Because it takes more effort to essentially dig into those things. But there's a lot of value in doing the sort of thematic analysis that you might do in the social sciences or even in disciplines like human-computer interaction, where it traditionally takes a lot of human power to dig through transcripts and try to find patterns. But, suddenly, if you can actually leverage generative AI to do some of that pattern matching and thematic analysis, you can really start to take advantage of some of these qualitative signals to inform what you're maybe looking at from a quantitative perspective.
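The qualitative-to-quantitative bridge described above - classification, entity extraction, and summarization over transcripts - usually reduces to a short prompting loop. Below is a minimal Python sketch; the call_llm helper, the theme list, and the JSON fields are illustrative assumptions, not a specific product API.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for whatever foundation-model client is in use;
    assumed to take a prompt and return the model's text response."""
    raise NotImplementedError("wire up your model client here")

# Illustrative theme labels; in practice these come from the business domain.
THEMES = ["billing issue", "cancellation risk", "feature request", "praise"]

def tag_transcript(transcript: str) -> dict:
    """Turn one unstructured support-call transcript into structured fields
    (themes, sentiment, one-line summary) that can be joined to tables."""
    prompt = (
        "You label customer support transcripts.\n"
        f"Allowed themes: {THEMES}\n"
        'Return JSON with keys "themes" (list), "sentiment" '
        '("pos"/"neg"/"neutral"), and "summary" (one sentence).\n\n'
        f"Transcript:\n{transcript}"
    )
    return json.loads(call_llm(prompt))
```

The structured rows this produces are what make a previously untapped signal - call notes, emails, interview transcripts - joinable with the quantitative metrics already in the warehouse.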
[00:18:33] GK: Yeah. I like that. And I think one of the best ways to describe this is that it was all about big data for a while. And big data mostly meant just having a lot of it. And I think that with gen AI, what we learn is it's not about the big data. It's more about the wide data records. Exactly to your point, having the ability now to tap into other data signals which are more qualitative in nature. For instance, HCA Healthcare comes to mind, one of the customers that we are working with. And for them, it was really important to take their patient data and combine it with imagery and other medical records which are not part of the structured patient data, and really come up with that wide patient record that combines all of these signals, exactly to your point. That actually allows them much deeper reasoning. [00:19:22] SF: Yeah. It gives you sort of the ability to ask secondary questions beyond just what you get from the statistics. Where do you think things like traditional databases fit? We talked about BI tools. But what about transactional databases, analytical databases? Where are they fitting into this transformation that's happening with the development of the AI stack? [00:19:44] GK: I think they're tremendously important for two reasons. One is, in the beginning, you said that these large foundation models compress so much of the world's knowledge. But they compress nothing of any enterprise's knowledge. Basically, they are not aware of anyone's specific context. And that context, that state, is usually actually stored in an operational database system. A big element, when we talk about something like RAG and vector indexing, is actually taking these operational systems and making them interface with foundation models. I think that's a huge part of it. And the second part goes a bit in the direction of Python as a data language. It depends now on what your hypothesis is on how applications will change. My take is that a lot of what has been written as application code would actually be more flexibly and better done by an agentic workflow with AI models, which then also means these models need to interface again with a database to retain state, to reason about state, to basically progress any type of workflow again. And so, that means that these models will operate on operational databases. Right now, they are stateless; you prompt them and they give you an answer. But when you think of them being used in a workflow, basically everything that we know from an application, of which the database is such a critical part, starts applying again, just in a different way. The interfaces change a lot. The metadata changes. Basically, how you architect an application changes. But the database is going to be a crucial part of it, one building block. That's also why Google developed AlloyDB with such strong AI capabilities, to be the operational database for all of these future-oriented workloads. [00:21:38] SF: Yeah. I mean, I think that historically we've kind of seen database systems, analytics, and AI as these siloed, separate things. And even if you look at things like RAG and the way that we do that now, the LLM doesn't know anything really about the vector database. And the vector database doesn't really know anything about the LLM. We're just kind of smashing them together through some orchestration code that we've written and jamming that into the context window. How do we start to bring those worlds together?
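For reference, the "smashing together through orchestration code" Sean describes is typically just this loop: embed the question, fetch nearest neighbors from a vector index, and paste them into the prompt. A minimal sketch follows; embed, vector_search, and call_llm are hypothetical stand-ins for whichever embedding model, vector store, and LLM client are actually in use.

```python
from typing import List

def embed(text: str) -> List[float]:
    """Placeholder for the embedding model in use."""
    raise NotImplementedError

def vector_search(query_vec: List[float], top_k: int = 5) -> List[str]:
    """Placeholder for a nearest-neighbor lookup against the vector index."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Placeholder for the LLM client in use."""
    raise NotImplementedError

def answer_with_rag(question: str) -> str:
    """Classic RAG glue: neither the model nor the index knows about the
    other; orchestration code joins them via the context window."""
    passages = vector_search(embed(question), top_k=5)
    context = "\n---\n".join(passages)
    prompt = (
        "Answer using only the context below. If it is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```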
[00:22:09] GK: Yeah. I think this is really speaking to the next step in the evolution, if you will, where today it's mostly about integrating. And like you said, there is the context window, or maybe some plugin concepts, information retrieval. And that's the API we're coding against currently. But what's on the horizon is really - right now, most of the stuff is, let's just call it, assistive. Assistive in the sense of applying gen AI more as function calling, if that's a good metaphor, used as part of an application. But as we go forward, the next step is going to be, well, let's rather express this as a workflow, a series of steps. And you've got to give it context. Basically, what you said needs to change from "I don't know anything. I just know how to make a call to a vector index to basically get likeness for a keyword," to "I actually understand the database schema I'm working with. And I understand reasonably well the semantics of that. And I understand what are, if you will, golden prompts or golden steps that I can use to operate on top of that." And permissioning, and having an evaluation function to understand what progress looks like towards the goal. And once you basically take that step from the current integration to what's called an agentic workflow - this idea of I have instructions and I have a goal, and I have an autonomous model making progress towards that goal, performing a predefined set of steps - you naturally get to the step where that context I was just talking about changes from prompting the model to actual table schemas, evaluation functions, permissioning, an integrated part of the data stack itself. Metadata is a primary example of that. We need to make sure that the database or the data schema encodes much richer semantics for the model to understand it. Something as benign as synonyms, right? What is that thing called in different ways? So you know what to find when you're looking for it. [00:24:26] SF: Is AI starting to impact the way that databases get developed or built themselves? Are we seeing new innovations introduced in traditional databases that haven't been there, essentially, despite 50-plus years of research? [00:24:41] GK: Oh, yeah. I definitely think so. But you also made a key point there. You ended your question with what we haven't seen in the 50-plus years of research. You asked a broad question. I mean, we definitely do see things that haven't been as relevant before. We just talked about different runtime languages than SQL. I think that's a big shift. Also, we are architecting the systems differently because we are designing them for different actors and constraints. What I mean by that is, just take as an example, typically in databases and analytics, we considered messaging and streaming something which was more sitting on the side. You had a system like Kafka, or Google Pub/Sub and Dataflow, being more on the fringes and ingesting data. But we weren't really seeing this as part of a database system. Streaming was something else. But now with the shift that we talked about, you don't have a dashboard being updated once an hour. You have an agent that is basically operating in real time. Suddenly, this state that you have in the database, plus the stream that you have, become super relevant to be accessible to that agent at the same point in time. That was one of the reasons why we developed a streaming engine inside of BigQuery under one SQL interface.
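As a rough illustration of stream and state under one SQL interface, from the caller's side it can look like a single query that joins live events against history, along the lines of the fraud-scoring flow described next. The table names, helper, and query here are assumptions for illustration, not a reference to specific BigQuery syntax.

```python
from typing import Dict, Iterable

def run_query(sql: str) -> Iterable[Dict]:
    """Hypothetical stand-in for whatever SQL client is in use
    (for example, a warehouse client library). Returns rows as dicts."""
    return []  # placeholder

# One query over both data in movement and data at rest: join the live
# transaction stream against 90 days of history to flag suspicious activity.
FRAUD_CHECK_SQL = """
SELECT
  live.transaction_id,
  live.amount,
  live.amount > 5 * hist.avg_amount_90d AS looks_suspicious
FROM payments_stream AS live          -- streaming ingest (illustrative name)
JOIN customer_history_90d AS hist     -- data at rest (illustrative name)
  ON live.customer_id = hist.customer_id
"""

for row in run_query(FRAUD_CHECK_SQL):
    if row["looks_suspicious"]:
        pass  # emit an event to drive the application downstream
```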
So you basically have both data in movement and data at rest together, accessible by one query. And that enables, for instance, a fraud model, right? You basically have someone doing a transaction online. You also need to query past behavior to get some qualification. Does it look good or bad? And then you have to emit an event again to basically drive the application. That's just one example there. I think there's a big shift in what we say, "Hey, this is a database or not." And integrating these concepts with each other is quite substantial. Another example would be the use of unstructured data, which in the past we really thought about at the file-system level as, "Oh, permissioning on an object or at the directory level is fine. Just having a blob reference in a table, that's fine. We don't need anything beyond that." But now with databases as the foundation for AI models, suddenly, you get to the point where you're saying, "Actually, I do want to have one security domain across structured and unstructured data. I do want to have consistent metadata across both. I do want to have fine-grained access controls over data in object storage." And, basically, all of that maybe is not entirely new, groundbreaking research. But it really evolves our thinking. What is a database architecture? What does a database do? [00:27:34] SF: Yeah. There's been, I think, more and more consolidation of things that maybe historically had existed in different systems down into the database. Even at the base level, you had a SQL database and a NoSQL database. And then NoSQL had to introduce SQL, kind of going back to what we said before. Because everybody wanted to use SQL despite what the NoSQL database was there for. But then you're talking about bringing streaming into the conventional database. We also had specialized vector databases, and vectors now exist in, essentially, all these different databases. You can bring unstructured data into the world of structured data and issue one query, because there's a lot of value in that instead of having these things considered separate entities. And it seems like more and more consolidation is happening at the data layer. [00:28:19] GK: Yeah. I think all of these functions are becoming a critical part. And there is an interesting dichotomy almost. Because at the same point in time, we're also unbundling functions. You mentioned in the beginning the lakehouse, external storage, thinking about open file formats. It's interesting, because both are happening at the same point in time: the engines become more complete, being able to express much more of the value chain of data, if you will. And the flip side of that is an increasing trend where customers are basically saying, "Because of the importance of data, I want to manage it independently of any specific data use." Both things are happening at the same point in time. [00:29:04] SF: Yeah. We've been talking a little bit about AI agents. That's been a big topic of conversation this year. Kind of like how RAG was - I mean, it's still a thing. But, of course, it was the big thing last year. Everyone was talking about RAG. But were there particular breakthroughs that happened that have led to people thinking about what that next AI application is beyond just the copilots or chatbot systems? [00:29:31] GK: Yeah. Of course, as a preface, I speak about the data domain. When I say agents, I specifically talk about data agents.
And some of the things that I think are breakthroughs for us may not be applicable to other agent domains. Of course, the big breakthroughs for us were the base capabilities of the models themselves. Undeniably a big breakthrough. And serving efficiency, and doing all of that at scale. But I think on the data side, what really has been the biggest enabler was evolving our thinking on the system architecture to support it. Specifically with data agents, working with data is really tough, because it really goes back to that basic fact that a large model trained on public data does not know anything about your enterprise data. Now, you want it to start interacting with it. Take a simple example: you want to build a data analysis agent, right, which you just give data, and the agent should figure out how to wrangle and shape it, how to analyze it, how to summarize its findings, and then present it; something relatively simple, if you will, at least on the surface. It already starts with how do we teach the models how they should operate with a given set of tables. It starts with understanding - which I think is one of the most important parts - that with these data agents, it's not about getting them perfect on the initial shot. You want to develop and iterate on them really, really fast. How do you get from a completely untrusted state, through a governance process, to a fully trusted autonomous state? You can think of that as like a control room, right? The data agent does something. It looks basically at the action it wants to perform. It checks against its own repository: can I execute that query on this data schema? Because of security, right? You want to make sure it's a safe query, for instance. If the answer is yes, it can execute it. If the answer is no, it basically goes into a control flow to a human, to a data analyst, who can say, "Yeah, that looks right." Or, "I'm going to modify it a bit. I'm going to change that," and basically makes that part of the trusted corpus again. That is just an example that tells you how we need to think about developing these data agents to move them progressively from initial, to semi-autonomous, to fully autonomous over time. That's one example where I think it's more about changing how we think, from a system architecture perspective, about developing these systems. Another example would be, like where we started our conversation, saying what are the right languages and what do we give agents? Agents - maybe we should break it down, right? For me, agent means that there is a workflow, right? You have the idea of a structured set of steps the agent ought to perform. You do have an objective, right? You do know what task completion looks like. You do have a way of measuring progress, and you give it context to operate in, right? I think those are the building blocks. In order then to make the agent architecture work, you do need things like the control flow for the governance aspect. You do need to develop tools that you basically give an agent as reusable building blocks - how do I wrangle data, what does that mean, how do I analyze data, what is aggregation, and so forth. That becomes basically this composable set of building blocks through which you can then assemble an agent that automates, ideally, a full workflow end to end. As I said, I really think that governance flow that progressively matures them is the key.
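A minimal sketch of that control-room flow, assuming a trusted corpus of approved queries and an analyst review step; the names and the toy safety check are illustrative, not a particular Google service.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Decision:
    approved: bool
    sql: str
    reviewer: str  # "auto" if it matched the trusted corpus, else "analyst"

# Corpus of human-approved query shapes; grows as analysts review proposals.
TRUSTED_QUERIES: set = set()

def looks_safe(sql: str) -> bool:
    """Toy guardrail: only allow read-only statements."""
    lowered = sql.lower().strip()
    return lowered.startswith("select") and not any(
        word in lowered for word in ("drop", "delete", "update", "insert")
    )

def route_agent_query(sql: str, ask_human: Callable[[str], Optional[str]]) -> Decision:
    """Route an agent's proposed query: auto-approve if already trusted,
    otherwise escalate to a data analyst; approved edits join the corpus."""
    if looks_safe(sql) and sql in TRUSTED_QUERIES:
        return Decision(True, sql, reviewer="auto")
    reviewed = ask_human(sql)      # analyst rejects (None) or returns an edited query
    if reviewed is None:
        return Decision(False, sql, reviewer="analyst")
    TRUSTED_QUERIES.add(reviewed)  # trusted next time, moving toward autonomy
    return Decision(True, reviewed, reviewer="analyst")
```

Each pass through the human branch enlarges the trusted corpus, which is what moves the agent from fully reviewed toward the semi-autonomous and autonomous states described above.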
One terrific example: I'm sure the listeners of the podcast, the data fans, know something like BIRD-SQL. BIRD-SQL is a benchmark for measuring models on the accuracy of taking a natural-language question and generating the right SQL out of it in a one-step operation. That shows that the state of the art right now, I think, with many-shot prompting and other attempts, is, I don't know, 68%, 69% accuracy. We would agree that's too low, right? You don't want to be wrong 30% of the time, not in an enterprise context. So, your question on breakthroughs - it's taking these powerful models and now constraining them in a new system architecture. For instance, telling them, "This is the schema you operate on." Specifically, it's not the endless sea of possibility. This is the schema. These are metadata annotations that will allow you to understand: is this a trusted operation, a trusted metric? Has someone verified that you can calculate it like that? Or is it something that needs to be checked and is potentially unsafe? Teaching a model, as I said, to operate various tools - tools that have basically been made safe - so it can stack them as building blocks in its own operations and then compose them as steps in a workflow again. I think that's really, in a nutshell, the breakthrough, right? It's really how do we use them in sophisticated architectures. [00:35:02] SF: Yes. I think one of the big challenges there is it's still hard to get an LLM to admit when it doesn't know something. It's overconfident in many ways. It's going to come up with an answer. When we start to incorporate these into an autonomous workflow, we have to build essentially around that - the knowledge that it could make mistakes that are not desirable. I guess that's kind of the phased approach that you're talking about, of having something where a human is in the loop to be able to check some of these things. [00:35:34] GK: Exactly. This is where you need that control plane or that governance plane that, like you said, builds the constraints around it, so it does not get to arbitrary and wrong answers. Basically, you make sure it's either a guaranteed correct answer - in the case of BI, either I give you an answer and I know it's correct - or I tell you, "You've got to help me here. I don't know exactly what you mean. Can you please re-specify?" That's, for instance, what we are working on with the BI product, Looker, where we are specifically spending our engineering innovation on it. What metadata do we need? How do we specialize a model to operate on that metadata? What is the governing system around it, so that if a user prompts it with a question, we either give them a validated, 100% correct answer, or we basically expose it back to the user and say, "I'm not sure what you mean here, right? You need to specify that element for me. What did you mean with that part of your prompt, so we can get to the trusted state again?" [00:36:38] SF: Where do you think we are in terms of the enterprise adoption curve? When you work with leading technology companies like Google, or even work in the Bay Area, I think it's easy to think the entire world is running on cloud, has five nines of availability, global scale, all these types of things. But in reality, this is actually a small percentage of businesses. Now, we're talking about AI adoption, which is even more out there. Based on your experience there and the products you oversee, there's a lot of interest. But how many people are actually building?
[00:37:10] GK: Is that a leading question, Sean? Look, what is fascinating is that I think everyone is experimenting with it. I think that's something different about generative AI, because it is so accessible. Building a chat agent is so accessible. It's something that I think everyone right now plays with in use cases. Even though penetration may not be deep, I think it's very broad. It's unusually broad. Also, what I find is that the level of proficiency of the people on our customers' side who are deeply looking into this and thinking about it is also unusually high. I do think we see success in interesting pockets that you wouldn't expect initially. For instance, the highest number of implemented use cases that I see is actually, hey, how can we now activate unstructured data, because it's something they have. It's mostly around business scenarios they already run, and it's just making them better. For instance, we have examples of companies using call center and support data to improve their churn prediction and customer retention, right? That's just by virtue of understanding what people mean when they write them an email or talk on the phone. Or we work with companies who are building new services, for instance, on stock trading - analyzing voice data and extracting what trades have been done, which wasn't possible with language models previously because it's quick language and very heavy on jargon. In that whole unstructured domain, there is an incredible amount of movement. I do think that's happening now because it's more about augmenting. It's not starting new. I think, on the other hand, what you said is equally true: for many companies, there is a realization now that if we want to do this, we have to get our data in shape. Now, it depends on where you are at, right? There are some companies who have invested in data consistently, and they have a really good starting condition. And, like in your leading question, right, we have customers who are sitting on on-premises legacy systems. For them, job number one right now is actually getting control and a handle on the data landscape. That really is the limiting factor for them for broader-scale AI activation. To answer your question, maybe in summary, I think it's happening in unusual, unexpected spaces already. It's still early. I think we are vastly underestimating the impact in the long run. Right now, it's mostly assistive functionality that everyone is working with, either consuming it from products like Gemini or providing it to their customers, like an agent that helps you to pick out the right product from your favorite brand. I think the big shift is actually coming through that next generation of business applications that are certainly going to be way more autonomous than we would have ever imagined them to be. [00:40:19] SF: Do you think that there has to be a new breakthrough in order to get there? We had this breakthrough, I think, with LLMs where suddenly it feels transformative. It feels like you're chatting with a person. That's something that I think everybody who studied computer science sort of envisioned at one point in their life but maybe didn't think that we would ever necessarily achieve in our lifetime. Now, we're basically there for the most part. You can have a conversation with these systems, and that's pretty magical.
But at the same time, is there another sort of step function that has to happen in order for us to reach these fully autonomous systems? It's kind of like how we've been talking about autonomous vehicles for a very long time. I think most people that knew anything about that probably would have predicted that they would be on the streets more predominantly than they are, than the handful of Waymo cars that you see driving around San Francisco. But it's taken a lot to get to that point. [00:41:17] GK: Yes. Actually, I have maybe two hearts in my chest. On the one side, I do think one of the reasons why we see it having such a big and quick impact is because, obviously, it has been such a strong intuition of humans to interact with computers in that way - from, I don't know, Star Trek to Knight Rider. You pick it, right? There was always that intuition that people had that this would be the simplest form of technology: if you had an intelligent entity to talk with which understands your intent, which doesn't need you to specify each and every operation, which deals with ambiguity and applies judgment and gets to an outcome. I think it feels very intuitive and very natural. I think that's why we see this tremendous impact, right? Because it's not abstract. It's not as other technology shifts have been, which felt much more technical and alien for people to get into. But I think, as with everything, the big adoption cycle is going to really come with the next generation of business applications. I think right now, basically what we're seeing is that we're making tremendous advancements in foundation models, in hardware, in energy efficiency. I think that's like building the base infrastructure, maybe something like the public web, right? Now, the public web also had a lot of excitement, right? But it took some time for the true innovation of the Internet to materialize and for all of these new applications to be created and done successfully. I think that's definitely going to be a part of this journey as well, right? I think right now, here in the Bay Area as much as in India and in Asia or in Europe, companies and individuals are working on how do we take these technologies now to really build an entirely new generation of services and applications? I think only once they materialize will we look at this the way we look at the Internet today and say it drives the economy. It will take many innovators and many problem solvers to run it to its full potential. [00:43:36] SF: Yes. I mean, I think that's when you know that you're sort of through the hype cycle - when this stuff just becomes sort of the default experience of every application. Just like eventually every application sort of existed in an online form. It wasn't just siloed within your desktop and disconnected from the world. That just became sort of table stakes, and I think it's going to take some time. But that's eventually what we'll reach with AI. I think you made an interesting point, too, around how these technologies are sort of more human in a weird way. If you look at the history of technology, going back to, I don't know, running a computer program with punch cards, that is sort of as alien for a human as you can get. Then you got to terminal-style UIs. Then we got to Windows and graphical user interfaces. Each time, we still had to figure out or sort of train ourselves how to use the technology.
The training has gotten easier, but we still have taught ourselves essentially how to use the technology, whereas this is hopefully something that doesn't require training. You can kind of put it in the hands of anybody. They may not even need to type. They can just talk to it, essentially. Suddenly, that really lowers the barrier to entry for access to technology and probably changes the types of things that we can build substantially. [00:44:50] GK: I am incredibly optimistic about it. That's what makes me so excited: there are so many challenges, big and small, in a company, as well as in economies and societies. I do genuinely believe that for humans - maybe it's just me, but I actually think it's all humans, right - reasoning about data and complexity is just so hard. It's difficult. When we just look at the advancements that AI has brought to material science, that it has brought to things like protein folding in the health space, to climate models, I think there is just a huge opportunity right now for us to be able to take the technology and really solve the biggest, hardest problems, in ways that hopefully bring prosperity to us, to the children of my children, and to the generations to come, through all of the merit that it hopefully generates by letting us understand and reason so much better about the world. [00:45:55] SF: Yes, absolutely. I mean, I think that's a great place to start to wrap up. But as we do, is there anything else that you would like to share? GK: I think, well, if the great listeners of this podcast are not doing so already, gen AI I think is definitely a domain to study and to be a part of. As I said, I'm a strong believer that the data system is the key foundation to it. I actually dare to make the claim that when I was developing, it was mostly about writing the functions, right? It was very function-oriented, and data was more an outcome of the function. I actually think in the next generation of systems, everything will be starting with driving insights from data, and that is going to drive functions downstream. The application-data warehouse power equation gets inverted, right? In the past, the application came first and the data warehouse was downstream. I think in the future, the data platform will be the primary, and the functions will be secondary. I think that's going to be a huge change in the space. Lastly, you said it initially and I just want to end on that: Google is an awesome place to work. In case you haven't checked it out, it's definitely worthwhile exploring. [00:47:10] SF: Yes, yes. It's a small company that maybe you haven't heard of, but you should check it out if it's new to you, absolutely. Awesome. Well, Gerrit, thanks so much for being here. I really enjoyed it. [00:47:21] GK: Likewise, Sean. It was great. Have a good day. [00:47:23] SF: Yes. You, too. Cheers. [END]