EPISODE 1598

[EPISODE]

[0:00:00] ANNOUNCER: Machine learning model research requires running expensive, long-running experiments where even a slight miscalibration can cost millions of dollars in underutilized compute resources. Once a model is trained, deployment, production monitoring, and observability requirements all present unique operational challenges. Chris Van Pelt is the Chief Information Security Officer of Weights & Biases, which is the industry standard in experiment tracking and visualization, and has expanded that expertise into a comprehensive suite of MLOps tooling including model management, deployment, and monitoring. Chris joins us today to discuss the state of the machine learning ecosystem at large, as well as some of their more recent work around production LLM tracing and monitoring.

This episode of Software Engineering Daily is hosted by Sean Falconer. Check the show notes for more information on Sean's work and where to find him.

[INTERVIEW]

[0:01:05] SF: Chris, welcome to the show.

[0:01:06] CVP: Sean, thanks for having me.

[0:01:07] SF: All right. Well, let's start off with some basics. Who are you and what do you do?

[0:01:10] CVP: Sure, my name is Chris Van Pelt. I'm a co-founder and Chief Information Security Officer at Weights & Biases. This is actually my second startup. Lucas, the CEO of Weights & Biases, and I started another company, gosh, almost 15 years ago, also in the machine learning space, called CrowdFlower, which rebranded as Figure Eight, and helped people label data that was often used to train machine learning models.

[0:01:35] SF: Yes, awesome. I know you've been working in the ML space for a long time. The company that you started 15 years ago, and before that, I believe you also worked at Powerset, and now, Weights & Biases. What have been some of the biggest changes that have happened during that time? That's a long time to be working in a singular space, let alone one like machine learning and AI that is really exploding right now. So, I'm sure there's been a significant amount of change.

[0:02:01] CVP: Yes. Well, one of the changes is we started calling it AI. That's a buzzier, more exciting word. But in my career, it's always been machine learning, which is really using statistics to model relationships between features. A big thing that changed is that deep learning really started to take off. Certainly, more traditional ML models are still very useful and being used in lots of contexts. But about five years ago, around the time we started Weights & Biases, we were seeing that deep learning was becoming really big, especially in autonomous vehicles and computer vision. Since then, we've seen this explosion in natural language processing and these generative vision models, which is kind of wild, full circle, because that Powerset job, my first job in Silicon Valley, was all about natural language processing and how to return better search results using machine learning methods. The methods we were using then are very different from this new wave of machine learning.

[0:03:00] SF: Yes. It's interesting that you mentioned the change that we're now referring to it as AI. During my master's degree, which was actually in the machine learning space as well, my master's thesis supervisor always referred to it as pattern recognition. And he would say that the name would change every decade or so whenever the funding started to dry up for one particular name.
And then they would go and change it to kind of reposition it and secure new funding in academia.

[0:03:24] CVP: Yes. That's totally true. But I mean, what we're seeing now doesn't feel like simple pattern recognition, right? It feels like something more magical for sure. So, I don't know if we should call it AI, or just really good automation. But it's definitely a big deal.

[0:03:38] SF: Absolutely. So, you're the co-founder of Weights & Biases, which was founded now over six years ago, way before, I think, the world became completely obsessed with this field of AI in the last year. And my understanding of Weights & Biases is that it's an ML platform geared towards helping developers and researchers build better models faster. So, first of all, is that a fair description of the product?

[0:04:02] CVP: Yes. I think that's a great description. It's not easy to describe a product that's as broad and complex as ours, but that's a good high-level overview.

[0:04:12] SF: Going back to when the company was originally founded, what was the problem you were originally focused on trying to solve?

[0:04:20] CVP: Yes, so back in 2017, Lucas, my co-founder, actually got an internship at OpenAI. He did this because we saw that deep learning was really taking off and our personal ML chops were getting a little rusty, having been running a separate company for the last 10 years. Lucas got his master's degree in machine learning from Stanford and was able to work on a problem at OpenAI, a specific robotics grasping problem, to figure out how to move a specific robotic hand-like thing to pick up an arbitrary object. As he was working on this problem, he was just having trouble keeping track of all of his experiments. The actual models that he was creating, he was just putting them in folders on his desktop, and then maybe writing down notes in an arbitrary document somewhere. So, we thought, well, there should be better ways to do this. There's a bunch of rich information we can collect about the experiments automatically. So, we set forth to create a nice little Python SDK and web application to just keep track of his own experiments.

[0:05:27] SF: Then, from there, how has the vision for the company, and the breadth of things that you do today and the different workflows that you might serve, evolved over time? You mentioned that it's a pretty wide product; there's a lot of things that you can do with it.

[0:05:44] CVP: Yes. Well, we really focused on this core experiment tracking problem first. Then, when we built the product, we thought it was cool and thought it was useful, but it wasn't clear if we were going to be able to convince other people that it was useful. So, in the early days, it was just talking to as many people as would listen to us, and then working as closely with them, if they did engage, to figure out how we could improve the product and address any pain points that we hadn't anticipated. Naturally, as we did this, we started to hear pain points from other parts of the machine learning workflow or pipeline. So, one of the first features we built just outside of experiment tracking was making it easier for teams to automate hyperparameter sweeps. When you're building these models, you have some arbitrary number of settings, and you don't know which settings are going to be the best.
The state of the art is to just try as many of them as you can, hopefully intelligently, so that you can find the correct settings to make your model perform the best. So, we launched Sweeps, which helps users do that. We knew that when you're doing machine learning, the data coming in and the models coming out, it's really important to keep track of them and understand what code produced them and what hyperparameters were used. So, we created Artifacts, which allows you to track data lineage across a pipeline. Originally, we weren't making a super visual product. But we quickly saw users say, "Hey, we want to see all these charts and compare different metrics and different evaluations against each other." So, we've invested a lot in our data analysis functionality, so users can create rich dashboards, compare evaluation results, and share essentially reports with other team members to collaborate. The newest functionality we've been working on is production monitoring. So, after you've done all this experimentation and chosen a model, now, how do we make sure that it's doing a good job out in the wild?

[0:07:39] SF: Yes. So, you're basically serving the full lifecycle of the model, essentially, to actually operationalize it?

[0:07:46] CVP: Yes. I mean, our mission is to build the best developer tools we can for machine learning engineers. So, we're looking for any pain points developers have in this journey, and how we can make those as painless as possible.

[0:07:58] SF: Outside of a product like yours, how do people, historically or even today, manage experiments and measure results if they aren't using something like Weights & Biases? Is it going back to what you mentioned with your co-founder, sort of a bespoke solution of using a bunch of folders and cobbling something together that works for that particular experiment, but maybe isn't a very scalable solution?

[0:08:23] CVP: Yes. That's still something we see today, especially if you're just working on a project yourself. You can kind of come up with whatever way you want to keep track of those experiments. So, we often see very ad hoc approaches where things are just going into a folder on a desktop somewhere, or into a cloud storage bucket somewhere. And then there's a Word document or a Google spreadsheet that kind of writes down notes about different things in that folder. That starts to fall apart really quickly when you're working with multiple people. So, it's similar to regular software development. People didn't realize how important source control was until we had these teams of tens or hundreds of developers needing to make changes to the same underlying system. So, we thought when we started Weights & Biases, this is like the early days of software development. There aren't great tools to help developers collaborate, so that's what we set out to create.

[0:09:17] SF: Yes. So, it sounds like there's essentially a growing investment, and also, the scale of these models now is much bigger than it was 10 or 20 years ago. Also, we've moved beyond this being something that was happening in an academic lab to now being productized, essentially, whether you're in the LLM space and building foundation models, or you're building essentially a company that leverages a lot of these different things.
So, you need a completely different tool chain, because there's so many people involved in those products. You're going beyond the sole engineer working on a singular product on their desktop to hundreds or thousands of people contributing to one single source of truth of a code repository.

[0:09:58] CVP: Yes, exactly, man.

[0:09:58] SF: So, when it comes to working with ML workloads and ML pipelines, and the entire lifecycle of the model, is that different than traditional engineering workloads? Are the types of products that they need to support that fundamentally different?

[0:10:14] CVP: Yes, for sure. I mean, let's take the regular software development workflow. I'm probably using Git. I check out a Git repository, I branch and make some changes. I push that up to some central repository, maybe GitHub, and I write some tests. Then, when I put that up, people can look at the exact lines of code that changed. We can have some discussion around it, and the tests will run, and I'll know if things are passing or not. With machine learning, you have code, so you're probably still going to be using GitHub and you'll be making changes to the code. But it's not the code itself that you evaluate. You're actually going to evaluate this model that is, well, made up of a bunch of weights and biases, or parameters, that are learned through the course of training it. You're going to then take that model and do an evaluation on it. Test it on some data you know the answer to and see how well it's doing. And it's always going to get some wrong. It's never going to be 100% correct. So, the big shift in mindset here is, as a software developer, I write logic, and it's either right or it has a bug. That's it. Those are the only two options. With machine learning development, it's going to be kind of right, or it's going to be right with some probability. So, you need to start designing systems that take that into account. What do you do when the model predicts something at a low probability? Or how do you inform the user about the confidence of such a model? Then, as you're making changes, you need to compare against some baseline. Anytime you retrain these models and your dataset changes, or the code that interacts with that model changes, you're going to want to evaluate how good this is doing versus some existing baseline, which is probably the model you currently have deployed. That tends to be a much more visual process. It's not just, okay, that model is 96% accurate, this one is 97% accurate, let's ship it. It's usually like, all right, now let's look into the examples and see which ones changed, which ones did the old model get right that this new model is getting wrong. Are those actually more important or critical to the actual use case? In that case, maybe the higher accuracy isn't better and I need some new metric that takes into account these important edge cases. So, it's a very different workflow that requires very different tools.

[0:12:31] SF: You mentioned evaluating how good a model is. How does using Weights & Biases help me evaluate the different iterations or versions of my model and determine whether one version is outperforming the prior version?

[0:12:48] CVP: Yes. So, the Weights & Biases interface is very visual. It allows users to describe whatever metrics are important to their use case, and every single model is going to have some different set of metrics.
The simplest to think about is just accuracy. Let's say we're building a model that classifies documents as either spam or not spam. So, you can measure a metric that says, "All right, I got 98% of my classifications correct." By logging those metrics to Weights & Biases, you can then compare all of your experiments to see, all right, which model or which set of hyperparameters actually ended up giving us the highest accuracy? In addition, Weights & Biases has these rich graphical ways of logging data. So, you can log essentially a data frame, maybe of actual prediction results. You hold out some test set of important documents that you definitely want to classify as either spam or not spam, and you know what the answers to those are. You can log those as tables, and then we allow you to pivot and dive in and see individual examples, and maybe compare two different experiments side by side to see how the model is changing.

[0:13:52] SF: How do I diagnose where a problem exists?

[0:13:58] CVP: Yes, I mean, by looking at the data, generally. That's the only way. Machine learning engineers often spend more time analyzing the data that the model was trained on, or analyzing the evaluation results coming out of the model, to understand its failure modes. Now, especially with deep learning, I say "understand" lightly. You can understand that it's getting it wrong. It's going to be hard for you to figure out exactly why it's getting it wrong, like on the inside of the model. The best you can often do is say, "Okay, well, it keeps on getting this example wrong. Let's provide more examples of what I want it to do in the training set, and see if we can spin the model more towards the direction that we need for our business." I mean, the most important thing is just measuring what you want to evaluate. It's not always clear, especially with these large language models, how to measure the quality of them. Nowadays, it's often conciseness or correctness. These metrics are kind of hard to define mathematically but really important to have, because you need a stable way to compare things to be able to ship code that doesn't cause downtime or bugs or end-user issues.

[0:15:05] SF: Then, is it a similar approach? Does it essentially just come down to really diving into the data and running experiments to determine whether something is a bug versus an anomaly when testing the output of a model?

[0:15:19] CVP: Yes. Well, I mean, the Weights & Biases system keeps track of any of the source code that was used. That's the other issue with machine learning models: it's hard to know if it was something wrong with the data, or something wrong with your code, or something else. The machine learning model is happily going to output stuff, unless it's really broken, in which case you know the model itself is broken. But if there's some mistake in the way you loaded the data or evaluated the data, the model is still going to make predictions. They just might be really bad, which is why you need this second level of checking, which is to evaluate it on data it's never seen before, to ensure that there isn't a bug in the code or in the data itself. By using a tool like Weights & Biases, you can dive in and see the difference between the code of two runs to see what changed, and then you also get to keep track of each version of the data.
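For a concrete sense of the logging workflow Chris describes, here is a minimal sketch using the public wandb Python SDK. The project name, hyperparameters, metric value, and table rows are all invented for illustration; a real run would log whatever metrics and examples matter for the use case.

```python
import wandb

# Start a run and record this experiment's hyperparameters.
run = wandb.init(
    project="spam-classifier",
    config={"learning_rate": 1e-3, "epochs": 5},
)

# ... train and evaluate the model here ...
accuracy = 0.98  # placeholder value from a held-out test set

# Log scalar metrics so runs can be compared in the dashboard.
wandb.log({"accuracy": accuracy})

# Log individual predictions as a table to inspect examples side by side.
table = wandb.Table(columns=["document", "label", "prediction"])
table.add_data("Win a free cruise!!!", "spam", "spam")
table.add_data("Meeting moved to 3pm", "not spam", "spam")  # a miss worth inspecting
wandb.log({"eval_samples": table})

run.finish()
```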
With that versioning, you can see what records were added or removed that might be responsible for a regression.

[0:16:12] SF: Then what about reproducibility? I think that's a challenge with a lot of ML experiments, especially with LLMs, because the same input might not always produce the same output, even if I run the same input over and over again.

[0:16:25] CVP: Yes, and especially with settings like temperature, you're adding randomness into the output. So, it's literally never going to say the same thing again, which is why choosing that performance metric is important. It can't just be an exact string match; it has to take into account correctness in whatever form that might be for your use case. In terms of reproducibility, as I mentioned before, Weights & Biases has always tried to make it as easy as possible to reproduce these experiments, because that's the only way you can go and debug, or find what's wrong with something that changed. So, we capture the code, we capture exactly what arguments were passed to the script when you ran it, so you could run that exact script again. It's still up to the developer to try to reduce any randomness in their code, if they really want the same randomness as last time. PyTorch, for instance, lets you set a random seed, so that the randomness is at least something that you can have happen the same each time. But with things like temperature, you're going to get some different answers. So, the way to counter that is to ensure your evaluation metrics are robust enough to handle that.

[0:17:33] SF: Yes. So, it's not a matter of simply checking whether the output is an exact match each time, unless you're essentially fixing the experiment to do that. It might be a little bit more complex in terms of the evaluation criteria.

[0:17:47] CVP: Yes. I mean, you can make the LLM into a classifier, where you know what your classes are. So, you can just literally check to see if the classes end up in the string. And if they don't, then it's wrong. But usually, the more interesting tasks are much more nuanced.

[0:18:01] SF: Yes. Absolutely. And are the types of problems that you're seeing and helping people solve in the space, the challenges of the ML engineer working maybe in private industry or the enterprise, different from those in the research community?

[0:18:18] CVP: Yes, for sure. I mean, in the research community, the work is often done by individuals, so there's less need for broader collaboration. But if you look at the more recent, bigger papers, there are generally big teams working on them. So, there's some overlap in that sense. The big difference is the class of data, and the risk around that, when you're in business. In academia, you're using some academically licensed data that's open in some way and isn't all that sensitive. In industry, you're often working on sensitive data, or data that is important for the company's intellectual property. So suddenly, all these additional controls need to be put in place to ensure that data isn't leaking, and that it's being used in compliance with things like GDPR and other regulatory frameworks. The need for governance and reduction of risk in the enterprise is a big change. And the environment is just different. The ways in which you collaborate and the different stakeholders that need to give buy-in in academia versus industry are very different.
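Circling back to the reproducibility point above: a minimal sketch of how a PyTorch training script might pin its controllable sources of randomness. The seed value is arbitrary, and none of this touches the sampling randomness of a hosted LLM API; that has to be handled by robust evaluation metrics, as discussed.

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Pin the controllable sources of randomness in a training script."""
    random.seed(seed)                  # Python's built-in RNG
    np.random.seed(seed)               # NumPy, e.g. for data shuffling
    torch.manual_seed(seed)            # PyTorch CPU RNG
    torch.cuda.manual_seed_all(seed)   # all GPUs; a no-op on CPU-only machines


set_seed(42)
```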
[0:19:26] SF: You mentioned privacy, security, and governance. I think that's definitely something that's very front of mind right now in the Generative AI world, where only a few months ago, Italy temporarily banned ChatGPT; they've since removed the ban. And there have been companies that have locked down access because of fear of essentially leaking core IP, or, in Italy's case, whether the situation was compliant with GDPR or not. And a number of different countries are now moving fast to build regulations around AI. So, that's something that I think is going to be a growing concern for anybody that's investing in these technologies in private industry moving forward. The other thing outside of that that comes to mind with some of the differences could be around failure rate. In a research experiment, or something that's just a demo, having some level of failure might not be as big a deal. But when you essentially take that to scale, potentially serving millions of users, then some element of failure, or an error rate, or something like that, could be pretty detrimental, essentially, to the product offering. So, what are some of the things or strategies that you're seeing from companies that handle or combat that, or potentially reduce the scale of a flaw within their model?

[0:20:48] CVP: Yes. I mean, well, number one is evaluating. You need to really understand how your model fails. So not just the overall rate, but in what ways? Then, often in business, there are specific cases where it's really important not to fail. Even if overall accuracy is 99%, the 1% of the time that it does fail might be the case where it's really not good for the business for it to fail. You need to find ways to work around it. So, I mean, what we saw oftentimes at my previous company was people would put machine learning models into place that weren't that accurate. But the nice thing about machine learning models is that they output their probabilities. They tell you how confident they are. So, they're going to output an answer, but they'll also say, "I'm only 20% sure." So, to mitigate these failure modes, often in the cases where the false positive or the false negative is really critical and you have a low confidence, you change the user experience. Either that gets routed to an actual human being somewhere to make a choice, or there's special UI or ways in which the user is informed, like, "Hey, this is a guess. It might be wrong. You need to really double-check here." I mean, there are similar problems in just the regular chat interface. Ideally, the UI could say, "We might be hallucinating here," and certainly they've done a bunch for safety already to ensure that the models aren't outputting content that would be really offensive or dangerous.

[0:22:18] SF: Yes, absolutely. I think context is also a big factor of what problem you're applying the ML to. If I'm using, I don't know, some sort of personal device that is going to automatically recognize the exercise that I'm doing, and once in a while when I do a sit-up, it thinks it's a push-up or something like that, that's probably not that detrimental. The consequence of that error is not that bad.
But if I'm using some machine learning to read a scan of my eye to detect whether I have some cancer or some other health issue, the consequence of a false positive or a false negative in that situation is much more detrimental to the individual involved.

[0:22:56] CVP: Yes. This is why we don't have mass market self-driving cars, even though Elon told us we'd have them four years ago.

[0:23:03] SF: Yes. Exactly. The consequences of a mistake are very, very high when it comes to things like self-driving cars. We touched a little bit on Generative AI and LLMs. Do you think the tooling involved with LLM work is potentially different than the tooling needed for the sort of more traditional ML?

[0:23:21] CVP: Yes, for sure. When we started this company, we were really focusing on the researcher. The people that studied machine learning and were interested in modern machine learning approaches. So, the product is really designed with that user in mind, very math heavy, very advanced in that regard. Now, with LLMs and these APIs that any software developer can use, the product really needs to start looking a little different for that type of user. The core problems are the same. We need to evaluate these models. We need to keep track of maybe what prompts or chains we're using as we build out our applications. But the end user isn't as excited about being able to write [inaudible 0:24:01] tech in a report to describe some novel loss function. They want clean APIs and a lightweight SDK that helps them get their work done. Our team has been working a lot to focus on building out more functionality that is targeted at this much broader ML generalist that we think is only going to continue to grow.

[0:24:22] SF: Yes. So, what are some of the investments that you're making there? Essentially, there's a democratization going on around ML where, as you mentioned, any software engineer can hit essentially an API endpoint. If you've ever used REST APIs, you can essentially take advantage of these massive models to do all kinds of amazing things to enhance products or create new types of products.

[0:24:45] CVP: Yes. Well, one of the big ones that we just released is our production monitoring for LLM APIs, essentially. We actually created a proxy which complies with the OpenAI spec, so you can use it in front of OpenAI's APIs or any of the third-party open source APIs that are now serving up all of these amazing models that are open to the world, generally using that same OpenAI schema and - I always mix up OpenAI and OpenAPI. There's an OpenAPI spec for the OpenAI API. By using a proxy, what this means is that engineers can just change a couple of environment variables, which the OpenAI SDK will respect, and say, "Now send all the traffic through this thing first, and then go to OpenAI." What this then gives companies is the ability to have visibility into how their entire team is leveraging these APIs. Is PII leaking? Do we have a hallucination problem? How much are we spending? Which teams are the biggest users of this? So, really important questions that I think all companies want to be asking that we unlocked with our production monitoring suite that was released just a couple of weeks ago.
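For a rough sense of what that proxy setup looks like from the application side, here is a minimal sketch using the OpenAI Python SDK (0.x style). The proxy URL is a placeholder rather than a real endpoint, and the exact configuration a given proxy expects may differ.

```python
import os

import openai

# Point the SDK at an OpenAI-compatible proxy instead of api.openai.com.
# The URL below is a placeholder; substitute whatever endpoint your
# monitoring proxy actually exposes.
os.environ["OPENAI_API_BASE"] = "https://llm-proxy.example.internal/v1"
openai.api_base = os.environ["OPENAI_API_BASE"]
openai.api_key = os.environ["OPENAI_API_KEY"]

# Application code stays the same; every request now flows through the proxy
# before being forwarded to OpenAI (or any OpenAI-compatible model server).
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello from behind the proxy."}],
)
print(response["choices"][0]["message"]["content"])
```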
The other thing is, we have a really cool integration around traces. Just like in Datadog, or other APM, application performance monitoring, tooling, you can see these detailed traces of where time is being spent and what services are being called. We allow users to keep traces of their LLM chains. Whether you're using, say, the official LangChain SDK, or building out custom chain-of-thought or agent-based programs, you can capture all of those different spans, and then quickly debug and see where time is being spent, or maybe where errors are occurring in those complex chains, which has proven to be a really popular feature as well.

[0:26:30] SF: Do you think that as this area of LLMs develops, there'll be some consolidation that happens? I feel like if you look back to the beginning of the cloud era, there were people who resisted the use of the public cloud, and they were like, "We're going to build our own cloud." And then, eventually, in the last few years, even banks have started to move onto AWS and these different services, recognizing the value that's there. But I feel like at the current time, there's a lot of people who are maybe resistant to some of the open source models out there, and they're trying to build essentially their own foundation models. Do you think the future of this is that we'll end up with a handful of clear winners, and essentially, most things, in terms of customization, will come down to fine-tuning or augmentation, or maybe prompt engineering?

[0:27:15] CVP: Yes. I mean, it's hard to predict these things. I would say, personally, I'm really excited about the open-source ecosystem and the models that are being released there. Just going to the Hugging Face Hub and seeing what's popular or trending this week is now kind of a part of my routine. And especially when you can run these models on the edge, or on more constrained devices, it unlocks a whole bunch of very interesting use cases that just aren't there when you're calling out to some third-party API, or when you need to run a really big model using multiple GPUs, which costs a lot and is tricky. I think there's definitely going to be a place for players like OpenAI for these really big models that are way more advanced, but they can be used to actually evaluate the smaller models or to generate training data for the smaller models. So, they're definitely a part of the chain. But I think we're going to see more and more growth on the open-source side. Companies building their own foundation model from scratch, we work with a lot of those companies, but I don't get the sense that that is going to grow substantially. I think building on top of well-licensed, well-trained, more open models feels like where things are going, but I could very well be wrong.

[0:28:28] SF: Yes. I mean, I think that if anything is consistent with human history when technology innovation is involved, whatever our predictions are today, they're probably massively wrong about the future. Actually, going back to that, as I mentioned earlier, you've been in the space for a long time, and you mentioned at the top of this interview that in some ways, what we're seeing from systems like ChatGPT, and what we're seeing in the LLM space, is like we almost need a new word for it, because it's so impressive. It feels like something new and something transformative.
Based on where you started your journey in the space, do you feel like we're ahead of what you would have imagined where we'd be today from an AI/ML sense? Or is it surprising to you where we are right now, relative to where you started?

[0:29:12] CVP: Yes, I think I'm impressed. I tend to not have big expectations, because that's a recipe to be disappointed. But I think what I've seen, especially the progress with these large language models - so, Weights & Biases has been working with OpenAI for many years, and we were early users of GPT-2, and then we played with GPT-3, and we were pretty excited about these things. But I don't think even then we understood how big of a deal they were. It wasn't until ChatGPT was released about a year ago that the world understood how big of a deal these things are. Knowing that, GPT-2 was like, okay, it's okay, it's interesting. GPT-3 was a little more interesting, cool. And then there's this inflection point, I think largely due to just the beautiful interface that ChatGPT has and how you can easily interact with it. There's no indication that that progress is going to slow. So, it seems like, as Moore's Law continues to unfold and we double the number of transistors in our chips, the models are just going to get better. Now, it's been a very common thing historically, especially in AI, for us to say things like this, and then we hit some unforeseen roadblock and things kind of peter out. But everything today is showing continued improvement. So, I think if we're impressed with what we have today, in a couple of years, there's a high likelihood that we'll continue to be impressed.

[0:30:43] SF: Yes, you mentioned the beautiful UI of ChatGPT, and I always say that's really, at least in my experience, what brought AI to my parents, who are retired, in their seventies, and never worked in technology. The fact that they even know what Generative AI is, and that it's a thing that people are talking about, feels like a step function to me, because it's really in the zeitgeist of essentially everybody at this point, if it's hit the retirement community of rural Canada. But the other thing you mentioned there was that there have been numerous times throughout history in the ML space where we maybe thought we were going to see an explosion, and then there's a roadblock that we hit. What is an example of a roadblock that was hit in the past that had to be overcome?

[0:31:26] CVP: Well, I mean, the original idea for deep learning, or neural networks, came about in the 1950s with Frank Rosenblatt. And there's an article in The New York Times with Frank saying, yes, this is the beginning; in the next few years, we're going to have these automated machines that can do these various tasks. And that certainly did not happen. In fact, when my co-founder, Lucas, was at Stanford learning machine learning, they were considering taking deep learning out of the curriculum, because it didn't seem as promising as the other algorithmic methods. So, it's really easy to be wrong and not know what's actually going to work here.

[0:32:09] SF: Yes, neural network research was essentially defunded in a lot of countries for years. It kind of had spikes in the late seventies, early eighties. Then, for a long time, people just stopped doing it, probably because of computation limitations, as well as other things.
Instead, people focused on more pure statistical methods and a variety of other tools to build models and build classifiers. And then, Canadian researchers kept some of that going through the nineties and early 2000s, leading to today. So, this is your second startup, and you've been at this for a little over half a decade now. What is one of the biggest surprises that you've learned along the way as a founder, or two-time founder? Or maybe a way that people ended up using Weights & Biases that was completely unexpected from what you thought your vision for it was?

[0:33:03] CVP: Well, actually, we have this feature that allows people to create custom visualizations, because the product is very visualization heavy, and it's built on top of this technology called Vega, a visualization grammar which ultimately compiles down to D3. One of our engineers at Weights & Biases actually took Vega and made an actual RPG game, just writing Vega, which was kind of insane. So, you can navigate with your keyboard and jump around using this kind of obscure language that few people knew. That was really impressive to me. I think, for some of the coolest use cases in the platform itself, there have been some gaming companies that are using Generative AI to do some really cool stuff, and kind of seeing and talking to those teams, and how they want to use the product. The thing I like the most about this company is that our customers are so diverse. We have people in agriculture, in bio, in medicine, in automotive, in e-commerce. In every industry, there are use cases here, and our tools can hopefully help them achieve their goals. So, it's just been awesome to work with such a diverse set of companies.

[0:34:11] SF: Yes. It's not like it's a vertical play. You're basically serving the engineering needs of ML engineers, which is an ever-expanding, and actually exploding, market at this point.

[0:34:21] CVP: Yes. The timing on this company is definitely better than the timing on the last company. We thought machine learning was really cool, but the rest of the world didn't 15 years ago. This time around, it seems like the timing couldn't have been better.

[0:34:33] SF: Awesome. Well, as we wrap up, is there anything else that you'd like to share? Any updates that people should go check out?

[0:34:39] CVP: Yes. I mean, I think the big one is, for any organizations that want to monitor their LLM usage, check out our new LLM monitoring functionality. And for teams doing chain of thought, or reasoning, or retrieval-augmented generation with these models, definitely check out our LangChain integration and use our cool new tracing dashboards.

[0:35:01] SF: Awesome. Well, Chris, thanks so much for being here. I really enjoyed this, and best of luck with everything. I agree, I think the timing is right. So, I think that things are going to be only exploding from here. Cheers.

[0:35:10] CVP: Appreciate it, Sean. Thanks for having me.

[END]