EPISODE 1840

[INTRODUCTION]

[0:00:00] KB: Anaconda is a software company that's well known for its solutions for managing packages, environments, and security in large-scale data workflows. The company has played a major role in making Python-based data science more accessible, efficient, and scalable. Anaconda has also invested heavily in AI tool development. Greg Jennings is the VP of Engineering for AI at Anaconda. He joins the podcast with Kevin Ball to talk about the tooling ecosystem around AI app development, the Anaconda Toolbox, the rapidly evolving role of AI in engineering, and more.

Kevin Ball, or Kball, is the vice president of engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript Meetup, and organizes the AI In Action discussion group through Latent Space. Check out the show notes to follow Kball on Twitter or LinkedIn, or visit his website, kball.llc.

[INTERVIEW]

[0:01:12] KB: Greg, welcome to the show.

[0:01:13] GJ: Hi, Kevin. Very nice to meet you.

[0:01:14] KB: Yeah, excited to have you on here. Let's maybe start with a little bit about you. Can you share some of your background and how you got involved with Anaconda?

[0:01:24] GJ: Sure. I came out of graduate school with a background in physics and materials science and went to work at a large consulting organization, where we were building complex models and simulations, mostly for government organizations, to help them do things like figure out complex dependencies in procurement schedules, anticipate the capabilities they would gain from picking platform A versus platform B, and balance costs. I'd written mostly simulation code in graduate school, but there I started writing a lot more user-facing code to interact with end users and expose some of those capabilities to them. During my time there, I briefly encountered Python. We were writing things in Java Swing at the time. It was a painful experience.

We grew a team there, then a few colleagues and I ultimately decided to set out and forge our own path in the startup world, as some people do, and started to explore Python. At that point, stepping away, I really dove into Python much more deliberately and found that I could be way more productive with it. Of course, some people have found that they love typed languages for large projects. But for me, trying to move very fast, it was extremely helpful to be writing in Python. And so I started writing a lot of things in Python. Even when I started doing consulting work that leveraged my Java background, I would oftentimes write the algorithm in Python first and then port that algorithm over to Java. Very often, that was actually faster. The thing that really made it faster was the ability to work in the REPL, and especially, at that time, in IPython notebooks. The ability to really iterate on things like that made it so powerful.

And then from there, I formed a product startup and we built the entire stack in Python. Again, starting off as an individual contributor where I was building everything end-to-end, from the web framework all the way down to writing custom machine learning code, it was extremely powerful.
There was really no other single language that I could have used that would have enabled me to do all of that individually. And as part of that, I found a magical package called Numba, which helped to accelerate a lot of the computationally intensive workloads that were a core part of the application we built. So when the time came for me to move on from that startup, I had interacted with and gotten to know a number of folks there, and had made a small pull request at one point to the Numba library. I had a very positive impression of Anaconda because it had been such a transformational product in my own workflow and had solved so many problems for me. I came to Anaconda just about three years ago - not quite three years ago, it'll be three years at the end of April. And now I am the VP of Engineering for AI at Anaconda, where I lead all of our AI initiatives from the engineering side.

[0:04:39] KB: That's awesome. Yeah, I think a lot of people have had that experience with Python. There was an XKCD comic at some point about this, right? Where it's like, "I'm flying. How? Python." It just feels like you can move so quickly and productively in it.

[0:04:52] GJ: Absolutely.

[0:04:53] KB: Let's actually talk a little bit about Anaconda for folks who are not deeply in the Python world. What is Anaconda and how does it differ compared to, say, the Python that just comes installed on a MacBook?

[0:05:05] GJ: Right. Well, I can leverage my own personal experience and talk a little bit about this. When I started using Python, I started using it on a Windows machine. And one of the first big challenges I had was trying to figure out how to install NumPy on a Windows machine with the existing standard Python. One of the challenges that Python has historically had is that it didn't work really well with complex binary dependencies, or with having different semi-isolated environments on the same machine. Anybody who started doing any sort of heavy numerical work hit this. As NumPy came along - which Travis Oliphant, one of the original founders of Anaconda, wrote - that was a core problem they identified that people kept having. They realized that a huge opportunity existed to solve that problem for people.

Anaconda was created around this package manager, the conda ecosystem, with the idea that they could create a way to build the packages and manage all of those dependencies so that people could stand those things up super easily and get started. And in fact, that was my experience. Thinking back, this might even have been before the days of Stack Overflow - I guess Stack Overflow was around back then - but at some point I found a forum post where somebody said, "Just use conda." Right? I did just use conda to install NumPy, was able to get everything up and running, and was like, "This is magical." And I think that initial magical experience where everything just worked is the core of the practitioner experience that Anaconda really tries to deliver to all of our users and all of the folks who are customers as well.

[0:06:48] KB: Yeah. I love that description. Let's spend a tiny bit more time on this and then move on a little bit into AI, which is, as I understand it, kind of your bread and butter there. But we have Anaconda the distro and then Anaconda the company that is running it.
What is the relationship? How does Anaconda the company sustain itself? What does that look like?

[0:07:09] GJ: Anaconda the company maintains what's called the defaults distribution. This is a set of curated packages that we build ourselves. We take those packages, we take the upstream build recipes, we bundle that all together, we run the build process, and we package that into the distribution which is available through the Anaconda defaults channel. That's the primary distribution that all of our customers use. And that is where Anaconda makes its money: by providing that distribution, a layer of enterprise support for it, and a level of capability on top of it, which is mainly around observability and governance for large organizations. Now, Anaconda, the larger organization, is also a core part of the conda foundation - we're a big sponsor of that. We host and maintain all of the packages that, for instance, conda-forge serves, which is on the anaconda.org channel. And many people, especially practitioners outside of enterprises, also use those packages.

[0:08:18] KB: Got it. Okay. As I understand it, there's a bunch of stuff you can get as an individual. But if you want to use this as an enterprise, you're paying Anaconda the company and getting support on all these different pieces.

[0:08:28] GJ: Correct. Yep.

[0:08:30] KB: Okay, cool. Let's now dive into what you are focused on. You said AI is your area - you are VP of AI applications and tooling. Let's talk a little bit about what we mean by AI, because you've been in the ML space for a long time. A lot of people hear AI right now and they think the latest in LLMs and the LLM craze. But what are the different parts of the AI ecosystem that you focus on?

[0:08:57] GJ: Most of what we are focused on internally is around how to use pre-trained AI models. And that's distinct from, as you noted, a lot of the traditional machine learning models, which is where my historical background is. Way back, when people built machine learning models, typically the model wouldn't do anything without your own data that you needed to bring to it. Oftentimes, it was problem-specific data. For instance, if I wanted to train a decision tree model, or a classical regressor model, or a classifier model, I needed to bring my own data to it. I needed to apply that and train an algorithm. And then once I had that, I could run inference, but it only really worked on this very specific, narrow problem. Those models were often kind of brittle. You had to do a lot of monitoring associated with them. Their utility was limited to places where you had both problem stability and enough value in the problem you were applying it to that it was worth throwing those resources at it.

The era we find ourselves in now is quite different. GPT stands for generative pre-trained transformer, and the pre-trained is really the key part of that, because that step involves taking a huge amount of information - tons of information pulled from all over the internet and data sources like The Pile - and feeding it into the model in advance. What that creates is a situation where you have these much bigger, huge models that have a lot of information baked in and available out of the box.
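To make that distinction concrete, here is a minimal sketch in Python; the specific libraries and toy data are illustrative choices, not anything Anaconda ships. A classical model does nothing until you bring it labeled data, while a pre-trained model is useful out of the box.

```python
# Illustrative contrast between the two eras described above.
from sklearn.tree import DecisionTreeClassifier

# Classical workflow: bring your own problem-specific data, train, then infer.
X_train = [[5.1, 3.5], [6.2, 2.9], [4.7, 3.2], [6.9, 3.1]]  # your own features
y_train = [0, 1, 0, 1]                                       # your own labels
clf = DecisionTreeClassifier().fit(X_train, y_train)
print(clf.predict([[6.0, 3.0]]))  # only useful for this narrow problem

# Pre-trained workflow: the knowledge was baked in during pre-training.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")     # downloads a pre-trained model
print(sentiment("This release is fantastic"))  # useful with no training step
```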
And the primary place that we focus at Anaconda is how we can use those kinds of models to help enhance people's workflows, and how people are incorporating them into the experiences they're building with our tools.

[0:10:53] KB: That makes sense. There's a lot of different takes going on in that space. And I think Python was kind of the first in this space - most of those tools were in Python. You had LangChain and all these other different tools for doing it. What is the tooling that your team is focused on? What parts are you building out? Is this application frameworks, or is this integrations into notebooks, or how are you thinking about it?

[0:11:14] GJ: We're thinking about it in a couple of different ways - actually three major ways, which we'll kind of go through. The first is that we at Anaconda work through a lot of Python-related issues, and there are ways that we can apply AI internally to help us address some of those issues. One of the things that we do, which is complex, is building packages - building packages for our distribution. And so we've done some work to try to determine how we can use the AI tooling that has emerged to help us do that job more efficiently. That internally focused effort, especially around our core product offerings, is one place where we're targeting our efforts.

Then on the external-facing, specific product efforts, there's something we call Anaconda Toolbox. Anaconda Toolbox exists within Jupyter Notebook, and it also exists within the Python in Excel ecosystem, which we might talk about. It's basically a context-aware assistant that is customized to be able to help people with Python-specific problems. It's designed to have better context awareness of where the user is in their workflow and provide direct assistance there. For instance, one of the initial things we noticed when we started building Anaconda Assistant was that, if we were operating in a notebook, in that REPL environment, we were constantly copy-pasting information from the notebook over to the chat window, then looking at what the chat window said: "Oh, I didn't provide the right context, or it got a variable name wrong. The column name in my data frame is different from the one I thought was there." You were not only copying and relaying what problem you wanted the model to solve, you were also copying and relaying a lot of the context that the model needed to do a good job of solving it. And this process went back and forth. If you had an error, now I have to relay what the error is. I copy the error message and paste it over here. And then it might need to ask for more information in order to get a good handle on how to solve it.

We thought, "Well, any place there's a situation where I'm copy-pasting context back and forth between windows on a routine basis is a really good place for me to see if I can provide more direct, inline, in-workflow AI assistance." Anaconda Assistant, the concept, was really born out of that. And one of the core problems that we wanted to solve initially - we had these user flows that we wanted to make easier, these key pain points. One of them was around data visualization, for instance. Data visualization is sort of notorious. It's very powerful in notebooks. It's an amazing use case, but there's a bunch of different data visualization libraries - there's Seaborn, there's Matplotlib, there's Bokeh.
Count them up, there's tons of them, depending on what you want to do. And depending on how complex the graph you want to make is, they can have complex APIs. Even people who are experts and work with those all the time often find themselves going back to the documentation, trying to figure out the right things to do in order to format the graph in exactly the way they want. And so we thought, "I bet we can make that specific workflow a lot more efficient." That was one of the primary goals that we had with Anaconda Assistant.

It's grown. We now have over 50,000 active users on it, and we are now expanding some of those capabilities. It obviously does a lot more now than it did at the very beginning, when it was really designed to focus on just a handful of workflows. But also, the models that we have running inference workloads behind it are much more powerful as well. It's kind of interesting to see those capabilities - where we have an embedded AI workflow right alongside what the user's already doing - get additional value really just by adding a better model behind it. And suddenly, workflows that didn't really work before just kind of work. We discover what some of those things are every time we update a model.

[0:15:26] KB: Yeah, I think this is a great example because this is something we're seeing show up in a lot of places in the general software development ecosystem. Cursor is doing something like this where it's like, "Okay, we can have the chat connected deeply to your code." And simply by being able to automatically load the right context in the right places, we can go a tremendous distance. I'm curious to dive in a little deeper about how you implemented, for example, this graph approach. One challenge that folks consistently run into is, "Is this in the training data of the model or not?" How are you approaching it? When you were implementing those very first graph assistant workflows, what did that look like? Is that some additional context being loaded in? Is it a little fine-tuning on top of the foundational models? How are you approaching making that happen?

[0:16:13] GJ: Yeah, great question. On that particular case, we actually haven't done any fine-tuning. We've done a lot of prompt optimization. We watched very carefully what some of the cases were. We had a lot of internal use of the application, and we have some folks who've consented to have their anonymized information improve the system as well. And so we look at what the situations are where it generates errors. We track what errors get generated, and we can go and look and see what specific requests led to those cases. Then we try to figure out, well, what ways can we prompt the system so that doesn't happen, right? And so a lot of the layer that's sitting in between Anaconda and the raw inference layer behind it, which now runs on Bedrock, is about adjusting and adding things to the prompt.

And then the other thing, as you mentioned, is that there's a lot of information that exists within the developer workflow. For Cursor, or for normal coding workflows, it might be that they have snippets of code, and they're trying to inject those snippets into the context window alongside the prompt so the model can reason about them more effectively.
For us, it's: what are all of the different variables that exist within the global scope of the notebook? If I have a data frame, what are the columns of that data frame? What are the things that the user has inside of the notebook in terms of the packages and the things that are available to them? A lot of these are things that we've already done, and a lot of them are ongoing. We try to figure out what's the next piece of context that we can add to the system that's going to reduce the odds that somebody gets an error. And we track the number of snippets of code that we've generated that create an error, and we try to manage to that.

The goal, obviously, is that in a lot of cases people can now do full interactive workflows - things that are really great in notebooks, like interactive data exploration and other great REPL-type workflows - directly in something like Andrej Karpathy's vibe-coding style, where you don't necessarily have to write a line of code. I would say that's not quite at a fully ready state just yet - you still have to know a little bit about how Python works and how notebooks work in order to get that kind of experience. And obviously, somebody who really knows a lot about what they're doing inside of those systems can use it to drive more complex behaviors. But for a lot of cases, like interactive data exploration and asking questions about data sets, a notebook is actually the perfect user interface for that. It just so happens that it was kind of inaccessible previously for a lot of people who might have been business analysts and relied on other tools. We're optimistic that going forward we'll have the ability to bring the power of Jupyter Notebooks to a lot more people, because we think that AI is helping us unlock that and surface it in a much more accessible way.

[0:19:21] KB: I love that. Well, one of the things that's really interesting is, if you're hosting the notebooks and running the execution, you can do things. A common loop I've seen in Cursor is you're trying to debug something, and you'll ask it for help, and it will suggest logs that you could insert that will help with debugging. But there's still then a copy-and-paste loop because you have to say, "Okay, here's where the logs are going. Go in here." In your environment, you could potentially even introduce logging that isn't exposed to the end user that you intercept and just feed to the assistant for help.

[0:19:50] GJ: Oh yeah, absolutely. And the way the system actually works, it has that awareness. When the system generates an error, we intercept it and we pipe it to the assistant. We catch the error and we actually offer, directly inline to the user, "I see you got an error here. Would you like us to try to fix it?" And most of the time, it will suggest a fix that works. Obviously, that's another thing we track, and we want to make sure that that experience continues to get better: if they have an error, we're able to fix it. Ideally, we identify enough of them and categorize them well enough that we start to minimize the odds they'll have an error in the first place. But those things, I think, are coming.
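As a concrete illustration of the context-gathering described here, the sketch below shows the kind of information a notebook assistant could collect before calling a model. The function name and structure are hypothetical, chosen for illustration; this is not Anaconda Assistant's actual implementation.

```python
# Hypothetical sketch: gather notebook state (variables, DataFrame columns,
# loaded packages) so a model sees real names instead of guessing them.
import sys
import pandas as pd

def collect_notebook_context(user_ns: dict) -> dict:
    """Summarize what the user already has in scope in the notebook."""
    context = {
        "packages": sorted(m for m in sys.modules
                           if "." not in m and not m.startswith("_")),
        "variables": {},
    }
    for name, value in user_ns.items():
        if name.startswith("_"):
            continue  # skip IPython's internal bookkeeping variables
        if isinstance(value, pd.DataFrame):
            # Column names and shapes are exactly the details that get lost
            # when copy-pasting snippets into a separate chat window.
            context["variables"][name] = {
                "type": "DataFrame",
                "columns": list(value.columns),
                "shape": value.shape,
            }
        else:
            context["variables"][name] = {"type": type(value).__name__}
    return context

# In a live Jupyter session, user_ns would typically come from
# get_ipython().user_ns, and the resulting dict would be attached to the
# prompt alongside the user's request.
```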
But I think the notebook environment is just such a rich, powerful environment for those kinds of workflows, especially when you're doing things interactively and you're exploring. And my personal feeling is that, actually, in many cases, it's a better test bed, a better approach for a lot of types of problems than starting off in a full IDE environment or starting off with a full code base. A lot of times, I may just simply want to explore how well a particular function does, right? A single function - run a particular type of data against it, swap out a particular function and see if that performs better or worse, right? Notebooks have a lot of the notebook magics, things like the ability to do timings of different runs. And, as you mentioned, seeing a lot of the output directly inline and having it attached to the specific cell. There's a lot of value there that I think is underexplored. A lot of what we want to do with Anaconda Assistant is help drive some of that capability into the community. Because I really think there is potentially a possibility to get a lot more people to use and understand the value of notebooks and the REPL-based development approach when combining them with AI.

[0:21:48] KB: Yeah, I think you're absolutely right. I mean, one of the things that to me stands out about notebooks is that they take this iteration loop of exploration that previously was really only accessible to programmers working in environments that exposed it. I came up in the Ruby on Rails days, and the fact that you had the console that you could actively explore your code base in and have that interactive loop was an incredible innovation. More and more environments provide that now, but notebooks are what gives that to data scientists: "Oh, I can do this and see my data and do those things." I'm kind of curious to explore that direction a little bit more: how do you see LLM-based assistants and agents working in the data science world and really expanding accessibility to understanding of data?

[0:22:36] GJ: Yeah, I think you're correct that there's accessibility and understanding of data to unlock. First of all, the way people pull data in to start the exploration in the first place is, I think, something that AI can help to unlock. Organizations - large organizations, and not just large organizations, but even small organizations - often have tons of data sets sitting out there in different formats, different tables, maybe in different structured data sets. Maybe they're sitting in an Excel spreadsheet somewhere. And a lot of the time people wouldn't really try to access it, because you don't really know what led to that data. What was the provenance of that data? How did it get created? I think that, now, having better use of AI throughout an organization can help people answer some of those questions, which previously were sort of left to tribal knowledge. If I, for instance, have a legacy code base and I see, "Well, it wrote this database in this form," it's actually reasonably straightforward now for me to put aspects of that code base into an LLM that has a giant context window, ask it to explain it, and even ask it to make something like a Mermaid diagram that shows me the process that led to that data being created and then explain to me the meaning behind those tables.
And then once it has that context, it can help me, for instance, write a query to get information out of the data sources that matter. Or it can help me think about things like complex cross joins. Maybe I have multiple tables and multiple data sets within my organization that wouldn't have been connected before. But now, I think the ability for people to use language models to rapidly ingest a lot of information about how those data sets were created in the first place can help facilitate that and get access to it.

The second thing is within the notebook environment itself: pulling data in from an external database. For a software engineer, it's mostly not too difficult, usually. But for someone who's a business analyst, sometimes that can be really problematic. They don't necessarily understand how to set up all the things to be able to connect to the database. They don't necessarily know how to write the SQL query. They don't know what package they should pull in to do the SQL query. They don't know what package they should use to set up a version of that database locally that is appropriate for analytic queries - a local DuckDB or something. Those are things I think we can treat as enhancements to the developer workflow that are also very likely to really improve people's day-to-day working lives using code and using Python.

[0:25:26] KB: I love those examples because they're doing a lot of what I see as one of the most powerful things with LLMs: allowing you to translate intent into implementation, even without necessarily having to understand all the different underlying pieces in that implementation. Looking at that then, what are you working on now in notebooks or Anaconda Assistant to bring that future to life?

[0:25:52] GJ: Well, one thing - I'll circle back on the last point you just made, which is, yes, you can absolutely do things now where you express the intent in more ambiguous terms, and the language model will fill in the gap as to what it thinks you meant or what probably makes sense to it. The old adage in machine learning, I think, was that all models are wrong, but some models are useful. I think the analog in the AI space is that all gen AI models hallucinate, but some hallucinations are valuable. And in this case, most of the time, the hallucinations are really good, rational decisions, right? But there's always a danger, when you are expressing your request in natural language against something that is code, that it's going to misinterpret things in such a way that the code will run, but it maybe gives you an unintended result.

One of the things that we've put directly into the notebook experience is the ability to immediately explain, inline, any code snippets that the system generates. One of the things that I think Anaconda has really internalized as its mission is not just arming people to do their jobs more effectively in data science and with Python, but also helping them understand and learn about the Python ecosystem, and how it works, and improve as developers and as practitioners. This is one of the places where we have felt like there's an opportunity to do that directly inline, where people's focus and attention already is. That's one area that I wanted to make sure that I touched on in answering your question.
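As a small illustration of the "local analytic database" step mentioned above, here is a sketch using DuckDB to query a flat file in place. The file and column names are hypothetical; this is the kind of snippet an assistant might generate for an analyst, not a specific Anaconda feature.

```python
# Illustrative sketch: query a local file with DuckDB - no database server to
# set up. The file and column names are made up for the example.
import duckdb

monthly_sales = duckdb.sql("""
    SELECT product_line,
           date_trunc('month', order_date) AS month,
           SUM(amount) AS total_sales
    FROM 'orders.csv'   -- DuckDB can read CSV or Parquet files in place
    GROUP BY product_line, month
    ORDER BY month
""").df()  # hand the result to pandas for plotting in the notebook

print(monthly_sales.head())
```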
[0:27:41] KB: I want to dig in on this because I think this is a really key and important thing to understand about LLMs, and also to incorporate in application design, right? A fundamental thing about how these things work is they're going to make things up, because they don't have a core understanding of truth. I like to say they should not be your system of record, right? They're an interpretation environment. Tools like Perplexity do a really nice job of, "I'm going to show you all the sources that I'm using to create an answer for you," and then let you follow through, verify with sources, and do other types of validation of the correctness of the LLM-generated output. It sounds like, if I'm understanding you correctly, in the notebooks you're doing something similar, where here's the generated code, and it's going to explain it and link to things that will help you understand what it's actually doing.

[0:28:32] GJ: Yeah. There's a control within each cell, if you are running Anaconda Assistant, that gives you the ability to explain any cell that you've written in the context of the notebook. Say it generates a SQL query, right? You might say, "Well, I want a SQL query that gives me sales by month," right? And it interprets what you said in terms of sales by month as being aggregate sales of all products. What you really meant was sales by month of your product, because you are individually only responsible for one business line. Well, it might not be apparent to you that that's the result it's giving when it gives you the graph. If you don't look at the code and you just take the code it generates at face value - it will probably generate working code, because models are really good at generating working code now; they've been trained and tuned specifically to do that - the code it generates might not be the right answer. It's a little bit like they're going to give you the most high-probability response given the data they were trained on. Whatever that was, that's what's going to guide the response. And that just may not be appropriate in all cases. We think it's very important to give people tools to help understand what those things actually mean, especially in cases where it's a talk-to-your-data kind of application, right?

[0:30:04] KB: It's actually a really interesting area to explore, because data has this interesting property where the naive interpretation of a particular set of data may not always be correct. To pick an example: if you're looking at the outcomes of an experiment and you just look at the percentage difference between the experiment and the control group, and you don't look at how many samples were taken, you actually don't have any idea whether a 2% difference is hugely significant or not significant at all. I'm kind of curious - if you're exposing data analysis to more and more non-data-trained people, which is great, right? I think it's really valuable to broaden the access. How do you help them put that into context when they, for example, ask a question like, "Hey, is my experiment successful or not?"

[0:30:55] GJ: This is a great question, and I think it would be disingenuous for me to say that we've figured that out.
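For readers who want to see Kevin's point in code, here is a small illustrative check using a standard two-proportion z-test from statsmodels. The numbers are made up, and this is not something either speaker describes shipping.

```python
# Illustrative only: the same 2% lift can be noise or a real effect depending
# on sample size, which is the trap being described here.
from statsmodels.stats.proportion import proportions_ztest

def lift_p_value(conversions_a, n_a, conversions_b, n_b):
    """Two-proportion z-test comparing control (a) against variant (b)."""
    _, p_value = proportions_ztest([conversions_b, conversions_a], [n_b, n_a])
    return p_value

# 10% vs 12% conversion with 200 users per arm: p is large, not significant.
print(lift_p_value(conversions_a=20, n_a=200, conversions_b=24, n_b=200))

# The same 10% vs 12% with 20,000 users per arm: p is tiny, clearly significant.
print(lift_p_value(conversions_a=2000, n_a=20000, conversions_b=2400, n_b=20000))
```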
I think this is probably going to be one of the big challenges people have with providing additional capabilities to people with LLMs in general. I remember reading Judea Pearl's book and learning about Simpson's paradox, for instance. That's another common, misleading case that can happen in the social sciences and medicine where, because you've taken different-sized samples of different subpopulations, you can wind up with these really interesting things they call reversals, or paradoxes. You might say, "Well, medicine A is better than medicine B for both men and for women, but medicine B is better overall." And you say, "Well, how can that possibly be, right?" And I think it's going to be really difficult to get an LLM to understand that it should explain those things to people as they're doing data analysis, that it should look for those things, and that it should not draw the wrong interpretations from the data. Because drawing the wrong interpretations from data is really, really easy to do.

One of my favorite statistics quotes - somebody I worked with a long time ago said it - is that if you torture the data long enough, it will confess. And it's true, you can manipulate numbers in a lot of different ways. It's not necessarily that people would do it intentionally. Perhaps some people do sometimes, but it's more that it's very easy to have the wrong interpretation of data if you make certain mistakes: cherry-pick your endpoints incorrectly in a time series, or don't look at the subpopulations that you're sampling from and then make a larger assessment of how those things actually roll up. I think these are all situations where we're going to have to, as a field, be very careful and cautious, and just be aware that - I won't necessarily call this the era of AI-generated slop, but there's a lot of possibility for people to do their own hobbyist data science, and maybe they haven't seen these kinds of issues before.

And for these specific purposes - I won't say we're nearly there yet - one of our goals is, in fact, to make sure that the AI assistance we provide does a great job of this, because Anaconda is one of the core organizations that helped unlock data science for the world. We think helping people learn about the things that they're writing, and giving an explanation of that as a second level of validation, is part of that. Giving people a better sense of all the things that could go wrong, or might go wrong, with the way they're thinking about interpreting the data is another part of that. But ultimately, yeah, I think we're a long way from not having a human expert at all to make these complex evaluations and draw conclusions, right? You can certainly draw conclusions, but I think we're a long way away from just turning the keys over to the LLMs.

[0:34:13] KB: Going a little bit more in this direction of gaps. We've talked some about this goal of democratizing data and the ability to do things with it, and how the assistant is exposing more and more things to folks who might not otherwise do them. What other gaps do you see that you're trying to address and fill in this space?
[0:34:33] GJ: I think one of the other gaps that we've seen is that the emergence of AI models is changing the kinds of applications that people write. More and more, we see those applications being written in such a way that they incorporate an AI model into the workflow. I think maybe another part of your question is, well, how do we as a field think about organizing and architecting our applications in a resilient way to account for the fact that - for the most part, people write tests that are basically deterministic, and now, all of a sudden, I have a stochastic system that I've injected into the middle of my previously deterministic workflow. I don't think that we've really reckoned with what that means yet, and how tests themselves are going to have to change, how things are going to have to adapt.

I will say, though, one of the things that we want to do to support that is, as people start to incorporate models into their local workflow, we want to provide a lot of information and metadata around the capabilities of those models, to help them at least start to make better decisions and develop an understanding of where models are potentially useful, and for which use cases. We have internally started to curate some models and provide those alongside our distribution. That's one of the things we're going to be rolling out in a more deliberate way this year. And as part of that, we want to make sure that we also bring information about where they're good and where they're not. That can help a little bit for that type of problem. But I think the whole field is going to have to understand this much more clearly, right? We're still in this era where AI - at least LLMs, generative language models - is mostly used in chat applications, where people bring them into their workflows with some type of human control still in place for anything that generates external, user-facing output that's going to go out to a lot of folks. But when we start to directly bring those language models into applications, it's going to be different. We're going to have to think about how to change the way we're testing. Evaluations of those models, and having a better idea of how to do structured generation and outputs, are going to be, I think, a lot more important.

[0:37:14] KB: Evaluation is a good topic to go down, because I think one of the things that comes up all the time is, if you're trying to build applications around these, you need evals for how they're behaving. And it almost reminds me of how everyone's having to grapple with the stuff that previously only MLOps folks had to grapple with: "Okay, we're putting a new model in. How are we validating that it works for our data distribution and all these different pieces?" Are you all building tooling for that as well? How do you do it internally, even?

[0:37:45] GJ: Internally, we have an evaluation framework that we've used on a number of different things. I mentioned that we have internally captured a lot of information about the specific use cases that people have and the way they use Anaconda Assistant. We've done some topic modeling, organized some of those questions, and structured some evaluations around that, so that when we update the inference back end that sits behind Anaconda Assistant, we have a good sense of where it's good and where it's bad.
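To make the idea of such an evaluation framework concrete, here is a hypothetical sketch of the simplest possible version: replay representative requests and measure how often the generated code even executes. The ask_assistant and make_namespace functions and the example prompts are assumptions for illustration, not Anaconda's actual framework.

```python
# Hypothetical sketch of a minimal eval harness: replay representative prompts
# and track how often the generated snippet runs without raising an exception.
import traceback

EVAL_PROMPTS = [
    "plot a histogram of the 'age' column of df",
    "add a 'month' column derived from the 'order_date' column of df",
]

def run_eval(ask_assistant, make_namespace):
    """Return the error rate and the failing (prompt, traceback) pairs."""
    failures = []
    for prompt in EVAL_PROMPTS:
        code = ask_assistant(prompt)      # model call under test
        namespace = make_namespace()      # fresh globals seeded with test data
        try:
            exec(code, namespace)         # crudest possible check: does it run?
        except Exception:
            failures.append((prompt, traceback.format_exc()))
    return len(failures) / len(EVAL_PROMPTS), failures
```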
Even for a better model, a lot of times the prompts that you've chosen - and most real applications have very specific ways that they do prompt injection and add context to the request - might need to be different in order to get the right level of performance out of it. I think you're correct: any organization that's building any kind of application like this is going to have to have some kind of evaluation. It's going to have to be specific to the problems they see, and I don't think they'll be able to understand or get it all right out of the box. Because even the best AI models can fail in unexpected ways and unexpected places. You just won't have the ability to know a priori where to focus your evaluations. You have to do the manual back-and-forth testing, although people have done a good job now of figuring out ways to automate some of that. We've done a little bit of this too - automating some of that with LLMs themselves, asking what some of the questions might be that users would ask in a particular situation. So there's a possibility of using LLMs themselves to help with this. But, again, I think it's still going to require the domain expertise. It's going to require a lot of classical data science - looking at the data, understanding where the challenges are, and redirecting your efforts to address them.

[0:39:48] KB: Taking things in a slightly different direction, you said even the best models have things they're good at and things they're not good at. And I remember I read a review at some point of all of the different Microsoft tooling that they put out. One of the things that was completely panned was the Excel assistant. Now, you mentioned you have Python in Excel, you're digging in there. Are you bringing your Assistant into that environment as well, and how are you navigating that?

[0:40:13] GJ: We have our Assistant inside of Excel in the form of an Anaconda Assistant integration with the Anaconda Toolbox for Python in Excel. The Toolbox is something that we created. Now, I should say we have a great relationship and a great partnership with Microsoft. And there's a whole capability set around Python in Excel which allows people now to run Python code within an individual cell inside of Excel. The Toolbox is a capability that sits alongside the Excel user experience, very similar to how the Jupyter Notebook Assistant sits alongside the user as they're going through a Jupyter Notebook. It can do things like help you understand data visualizations and how to make Python-based data visualizations. We have the ability to share code snippets back and forth between people working in Python and people working in Excel. We have something we've developed internally, which you may be familiar with, called PyScript, which is a WASM-based Python - the ability to run Python in the web, natively, sort of. And that is driving a lot of the capability there.

And yes, the Assistant is a part of that. But we're really focusing the Assistant inside of the Excel experience on that. We want to have a first-class Python experience inside of Excel. We don't want to compete with Microsoft around making a generalized Excel assistant. I think they're much better positioned to do that than we are. But anybody who's used Excel for any length of time starts to hit certain barriers with what it can and can't do natively.
And this is where Python comes in - a lot of people who are more advanced Excel users have asked for a long time, "Can we bring Python code into the Excel workspace?" This is really where we want to provide that level of additional assistance, for all of those more advanced users who are engaging with Python and engaging with those more advanced use cases inside of that system.

[0:42:20] KB: Got it. That makes a ton of sense. And thank you for clarifying the constrained scope. I am curious from the implementation side then, how similar can that assistant be to the one that's running in notebooks? Is the context layer the only thing that's different? Or how does that work?

[0:42:37] GJ: There actually is a lot of similarity. We have a lot of the same front-end pieces we've leveraged. Now, building an Excel plugin is quite a lot different than building a Jupyter extension - not all of it could be directly reused, but some of it overlaps. And on the back end, there are different endpoints that we have, and they both accept context in different ways, right? The context for a Jupyter Notebook might be things like: What is the data frame? What are all of the different fields, the columns of the data frame? What's in globals? What packages are available? In Excel, it might be much more about what the user is looking at. What data? What cells are on their page? And I should mention as well that in that system, because of PyScript, we also have the ability to let people run Python code internally, in that little sandbox inside of Excel, and that would also be something that we would get into the context window. But the way the system mostly works is around data frames. Think about a lot of people who are using an Excel table - that's a common use case. We would take that Excel table and we would just read it directly into a data frame. And at that point, a lot of the same things that we've done to optimize the performance in answering questions or building data visualizations - for how users should interact with data frames - just translate.

[0:44:08] KB: Cool. Well, we're getting close to the end of our time together. Are there any things that we haven't talked about yet that you think would be important to discuss for our audience?

[0:44:18] GJ: Well, for one, just that we recognize the entire software stack is changing. There are some other things we didn't talk about. Our work around AI Navigator, which is kind of a local AI access point and control plane to allow people to do work locally with some of those models that we're curating. We're excited about where we can take that. We're excited about being able to work with developers, work with practitioners, to figure out ways to better streamline a lot of those workloads. Some of the problems that we talked about are around how people are going to be incorporating models directly into their applications to a much larger degree. We think that package management - the whole way we think about package management - is probably going to have to change, right? And its scope is going to have to grow a little bit to accommodate that. Because now, all of a sudden, to run an application effectively, I might not just download a single conda package plus its dependencies. There might be external dependencies that I need to rely upon. Some of those external dependencies may be things that run on a remote somewhere, you know?
And more and more, I think people will want the same kind of experience that they've gotten with conda, where I can just conda install stuff and it just works. They'll want that same kind of experience, but translated to a world where they have more than just Python packages - or even the other packages we also support - where they'll also want the ability to run external models as a part of that, and potentially multiple external models. And I think this will be much more important as we think about a future world where models and those AI-enabled applications run as agents, and I can have an agent which calls another agent. All of those things, I think, are just tremendously exciting areas of exploration for us, and we're working towards very interesting things in all of them.

[0:46:21] KB: Let me double-click really quickly on AI Navigator. Is this essentially an Anaconda equivalent to something like Ollama? Or how would that fit into the mental model?

[0:46:31] GJ: In some ways, you could think about it that way, right? It has a lot of the same kinds of capabilities as something like an Ollama. It's really designed to be used as a little bit of a control plane. I think Ollama is a fantastic tool. I use Ollama myself in many cases. It's similar in that way, in that I can download a model and I can very easily stand it up as a server. We have that capability. We also have internally the ability to run agents, some specific agents, on top of those models. We're driving toward an environment where there are Python applications which include some type of model that you might run locally, plus the runtime for that model, and you can incorporate all of that together. We think that because we've been a trusted source of a lot of Python packages and have helped to drive a lot of those capabilities into the open source community, we're in a unique position to provide a lot of value to users by connecting that to model workflows as well. We want to pull that together. And AI Navigator is this local control plane that's designed to do that, which we're developing.

[0:47:39] KB: Got it. Got it. So it's not just, "Here's a model I can stand up and get access to." You can even package up an application that has an embedded model, or references a model, and stand up a little endpoint that you can talk to. Now, is that connected to the package management as well, so I could have a dependency on, "Okay, AI Navigator with this agent and that model and this thing"?

[0:48:02] GJ: Yes. Yeah. Absolutely. This is what we're working towards. This is exactly what we're working towards. And we have a number of things working internally. Those capabilities aren't quite ready for full release yet. But absolutely, that's exactly what we're working towards.

[0:48:14] KB: That is super cool. Awesome. Well, this has been super fun and interesting. I love diving into this stuff. I don't think I have any more questions. Anything else before we wrap?

[0:48:25] GJ: I don't think so, Kevin. It was a pleasure talking to you. And I will definitely look forward to coming on again at some point and chatting with you again.

[0:48:33] KB: Awesome.

[END]