EPISODE 1883 [INTRODUCTION] [0:00:00] Announcer: Python's popularity in data science and back-end engineering has made it the default language for building AI infrastructure. However, with the rapid growth of AI applications, developers are increasingly looking for tools that combine Python's flexibility with the rigor of production-ready systems. Pydantic began as a library for type-safe data validation in Python and has become one of the language's most widely adopted projects. More recently, the Pydantic team created Pydantic AI, a type-safe agent framework for building reliable AI systems in Python. Samuel Colvin is the creator of Pydantic and Pydantic AI. In this episode, he joins the podcast with Gregor Vand to discuss the origins of Pydantic, the design principles behind type safety and AI applications, the evolution of Pydantic AI, the Logfire observability platform, and how open-source sustainability and engineering discipline are shaping the next generation of AI tooling. Gregor Vand is a security-focused technologist, having previously been a CTO across cybersecurity, cyber insurance, and general software engineering companies. He is based in Singapore and can be found via his profile at vand.hk or on LinkedIn. [INTERVIEW] [0:01:34] GV: Hello, and welcome to Software Engineering Daily. My guest today is Samuel Colvin. We're really excited to have you here today, Samuel. [0:01:42] SC: Thanks so much for having me. Yeah, I'm really excited to be here. [0:01:46] GV: Yeah. Samuel, just to get this completely correct, you are the founder of Pydantic. Is that correct? [0:01:52] SC: I am the founder of Pydantic. I have to say, since I recently moved to the Bay Area, people have started asking me for the first time, "Did you also create the library?" which seems like a slightly weird question. I feel like I'd be a fraud if I was running Pydantic, the company, and hadn't created the library. But yeah, I created the original library way back, and now run the company by the same name. [0:02:12] GV: Good. Yeah. Well, I'm glad we cleared that one up at the start. I'm glad I didn't ask did you only found the company? As we like to do on Software Engineering Daily, just getting a sense of where you've come from from a developer standpoint. I've seen on LinkedIn you've worked through some interesting companies, and I think it would be really interesting to understand how Pydantic came about. We're obviously here to talk about Pydantic AI, but we're going to just hear about the story to this point in time. [0:02:37] SC: Yeah. I was a mechanical engineer way back, and I've been a software engineer since 2014. Worked in a number of different roles, ran a bootstrapped, self-funded company before, but then started - I don't know. Really got into open source in 2016, 2017. And around then, type hints were just coming to Python in, I guess, 3.5, 3.6. And they seemed really powerful, but it seemed to me then, and still seems to me today, completely ludicrous that they don't do anything at runtime. I totally understand the history of why that's the case. It makes sense once you understand that. But imagine week one of learning to code, and you're told you're writing software, it's going to be interpreted by a computer. Everything needs to be exactly correct. And then you're told, "Oh yeah, by the way, these type hints in Python, although you might get a squiggly line, everything will initially continue to work when you pass the wrong types. You'll just get an error later on."
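To make that gap concrete, here is a minimal illustration (not from the episode): plain type hints do nothing at runtime, while a Pydantic model enforces the same annotations and raises a validation error when they can't be satisfied.

```python
from pydantic import BaseModel, ValidationError


def double(x: int) -> int:
    # Type hints alone do nothing at runtime: passing a string here
    # only fails (or silently misbehaves) later, when the value is used.
    return x * 2


print(double("2"))  # prints "22" - silently wrong, no error at the call site


class Point(BaseModel):
    x: int
    y: int


# Pydantic enforces the same annotations at runtime, coercing where it
# sensibly can and raising a ValidationError where it can't.
print(Point(x="1", y=2))  # x="1" is coerced to the int 1

try:
    Point(x="not a number", y=2)
except ValidationError as exc:
    print(exc)
```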
It's completely weird. Pydantic came from that: did it even make sense? Was it possible to enforce those types? It worked. It obviously worked spectacularly well relative to my initial experiment of whether that was possible. Yeah, that was like 2017. And then the library just took off. I mean, took off gradually. But relative to other open source I had done at that point, kind of took off. [0:03:53] GV: Yeah. I mean, it's had like - is this number correct? 300 million downloads monthly. [0:03:57] SC: Yeah, we're about 460 million downloads a month now. We're hoping to cross half a billion downloads a month sometime, I guess, end of this year, early next year. Yeah. [0:04:08] GV: Yeah. It's used by, I mean, if you just look at any kind of logo on the Pydantic website, all the big, big, big people, everybody, NVIDIA, Meta, NASA. Yeah, everyone. [0:04:18] SC: All the companies who are writing Python are using it somewhere. [0:04:21] GV: Yeah. Is it fair to say, though, when it was introduced, it was a little bit controversial to introduce types to Python? Is that a fair statement? [0:04:29] SC: As I say, types obviously came to Python long after Python existed. They were there for static typing and for things like documentation. Arguably, it was almost an oversight when they were created that they were left around at runtime. And I think there were those who wished they had got rid of them at runtime and stopped people like me doing odd things with them at runtime. Because, obviously, once people like me started using them at runtime and once they were found to be really useful, it did limit what you can go and do with them at static typing time, right? There's a world where, as in TypeScript, they are not part of the actual runtime at all, and you have much more flexibility about what you do with them. In Python, they're not only part of the AST, they're actually there in the runtime. We can use them, but that puts some constraints on what you can do with them, as I say, in static typing. But I think it's incredibly valuable, right? I mean, it's the only language where you can do this trick effectively. Pydantic is obviously not the only library that does it, but it's kind of the preeminent one, I guess, at this point. [0:05:27] GV: Yeah. And you mentioned constraints there. We're going to touch on constraints in a little bit. Just sort of the philosophy behind that. We're going to move on from pure Pydantic in a second. V3, where's that at? [0:05:38] SC: The first thing to say about V3, because I know obviously the transition from V1 to V2 was quite painful for a lot of people - we fixed a lot of broken edge cases that we should have fixed before V1, but we also probably made some mistakes in V2 - is that the V2 to V3 transition will be much, much smoother. We will mostly be changing some config defaults that we haven't been able to change because we've been very careful about breaking changes since. And we can probably telegraph most of them. The biggest change coming soon in Pydantic is what I think we're going to call struct. So it will be a new primitive type in Pydantic, probably used as a decorator. It should be pretty much dataclass compliant. But the big difference is that under the hood, the data will be held as a Rust type rather than as a Python type, particularly if you're loading the data from JSON or from a binary format. So, we should be able to get a 3x-ish improvement in performance out of that, which will be significant. Obviously, Pydantic is already very, very fast.
It's 50-ish times faster than Pydantic V1, which was already faster than some of the libraries that went before. But there are a number of interesting things you can do if the data is fundamentally in Rust. One of them is going straight to Parquet data without ever having to go through the Python types. We will also have an array type, which will kind of go with it, to allow you to basically define a table. That's probably the biggest thing. And then there are some other things we're discussing, like whether or not we add a binary input type, which would probably be Protobuf, to make it very easy to basically serialize Pydantic models over gRPC or something like that. Not quite sure about that. But yeah, there are some cool things you could do if we remove those constraints. Obviously, David Hewitt, who now does a lot of work on Pydantic, is also the maintainer of PyO3, the Rust bindings for Python. We're kind of pushing the limits of what you can do with Python and Rust in Pydantic. [0:07:27] GV: Awesome. Yeah, very exciting. Yeah, hopefully, for anyone who's seen Pydantic in the title of this and came to hear about that, then there you go. There's the update. So, let's move on. The kind of next, I guess, library product that came along was Logfire, I believe. And am I right in saying that was also when, I guess, Pydantic as a company got venture backing as well? Is that fair to say? [0:07:51] SC: Yes. End of '22, beginning of '23, Sequoia wonderfully reached out to me. It wasn't a company before that. It was just me working on it. And so I started the company as I got the seed round. So yeah, raised the seed round beginning of 2023. Going back a little bit on Pydantic, early 2022, I started working full-time on Pydantic, doing the rewrite to Rust. About eight months into that three-month project, I was halfway done. Wondering how I was ever going to finish the Sisyphean task of rewriting the whole thing while it was blowing up in its usage in the background. Yeah, the first thing we did when we raised money was hire a team and go and release V2, which we did in the middle of 2023. And then we started looking around for what to build. I had actually owned the Logfire domain name since 2019. I had felt that logging, as I would have called it then, or tracing in Python was broken or at least nothing like as nice as it should be. We were trying to work out what we were going to build on the commercial side, and we settled on building this observability platform, which is now Logfire. [0:08:50] GV: Awesome. Yeah. As you say, it's observability. Let's just take sort of five minutes on it. You've kind of touched on it, but what is Logfire? How does it work? Why is it there? [0:09:00] SC: It exists because I wanted the experience of instrumenting your Python application to be as simple as writing the rest of Python. And OpenTelemetry had come out. OpenTelemetry is a wonderful open standard for doing observability. It means that there are SDKs out there for every language, basically, that you might want to use. And there are things like the OTel collector that can proxy the data, spin it out to multiple different backends. Almost every platform out there supports OpenTelemetry now. The problem was that they made this pragmatic decision early on in the development of the SDKs to have the same API in every language. And that makes a lot of sense in some ways, but it means you can't do all the neat things you can do in Python.
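For readers who want to see what that "as simple as writing Python" instrumentation looks like, here is a rough sketch using the Logfire SDK described next; the calls shown are the SDK's basic public API, but treat the details as illustrative rather than exact.

```python
import logfire
from pydantic import BaseModel


class Order(BaseModel):
    order_id: int
    total: float


# configure() reads credentials (e.g. LOGFIRE_TOKEN) from the environment;
# spans and logs are also echoed to the console while you develop.
logfire.configure()

with logfire.span("processing order"):
    order = Order(order_id=123, total=42.5)
    # Structured arguments - including Pydantic models - are recorded as
    # attributes on the underlying OpenTelemetry span.
    logfire.info("validated order {order_id}", order_id=order.order_id)
```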
And I think it's also fair to say OpenTelemetry is sort of managed by teams within the hyperscalers and the big observability companies. There's never been anyone else who has been particularly interested in making it easy to use as a library. And so, the first thing we have is the Logfire SDK, the pip install logfire. It's an incredibly nice experience for tracing. I think it's fair to say the nicest way of doing tracing in Python. That's just emitting OpenTelemetry data. We do some clever things to make it better than normal OpenTelemetry. We allow you to serialize a Pydantic model or a data class or even a datetime, and we'll record data about that which isn't supported by default in OpenTelemetry. And then the commercial bit is the Logfire platform on the back-end, which is a closed-source observability platform, which is what we charge for. Although, actually, we have an amazingly generous free tier. Possibly too generous. But now that we've set it, I think we're not going to change that limit. You can look at your logs or your traces - technically, it's all tracing data, but we make it look and feel like logs. It's instant. They'll come through as your application is running. And then you can just basically click expand to dive into what's going on within a particular task or HTTP request or whatever else. But we also do metrics and logs. We have like full observability. I think the other thing that has changed - there are two other things that I guess make Logfire unusual. One, we let you write full SQL to go and query your data. So it's ultimately an analytical database with a nice UI on it and an SDK and all that stuff. That's useful for developers who want to write some SQL. It's easier than having to learn a new DSL. But the really powerful bit is AIs love writing SQL. The single best - they call it like AI as an SRE. And there's a whole industry of companies trying to do this. Honestly, the best experience I have seen for that is to connect Claude Code to Logfire via MCP and ask it, "Go fix a bug, or go and investigate the slowest endpoints, or go and find out why my users are churning." And suddenly, Claude Code has the visibility into all of your actual application data, and it can go and investigate. That's an incredibly nice outcome, for us, of supporting SQL. I don't think any of us knew that's where AI was going to go when we started that back in 2023, but it's definitely super powerful. Then the other thing that's different about us is we have first-class support for the AI observability stuff. Things like evals, things like token usage and pricing, but we're also general observability. Because I don't think that in 5 years' time anyone will talk about AI observability. It just won't be a thing. In the same way, no one talks about cloud observability or web observability. It's just going to be a required feature of any observability platform. [0:12:21] GV: Yeah, lots of interesting kind of nuggets there. I mean, as you touch on AI as SRE, there's obviously a bunch of companies just taking that on. And I think it's then very interesting where, actually, if you just connect Claude Code to a good library, then there you go. You've kind of got it. And I think we're just seeing that, where these sort of low-level platforms are still beating out anyone who comes along with a sort of specific product around that. Yeah, super interesting. We're going to move on to Pydantic AI, which is the main topic for today. Again, let's just talk about where did it come from.
And I'm sure there's maybe a bunch of people listening right now thinking, "Oh, well, of course. Pydantic, of course, they're just going to do an AI thing." But I know that's not the story. Let's talk about it. Where did it come from? What is it? [0:13:08] SC: I mean, in some ways, we were doing the opposite. We were reasonably cynical about some of the AI stuff for a long time. And that's probably why we didn't build an agent framework in 2023 as others did. In some ways, that probably turned out to be a good decision because we waited for the patterns to settle, and we were able to build something that has probably influenced the patterns a bit, but we've also been able to read what others are doing. Whereas those who created agent frameworks or equivalents in 2022, '23 are kind of stuck. Either they have to go and break their API again, or they're stuck with primitives that I think we've now moved on from. So - yeah, come late last year, we were starting to build AI functionality into Logfire. I knew all these agent frameworks out there used Pydantic. Obviously, LangChain, LangGraph, CrewAI, LlamaIndex, all of these guys used Pydantic. And I assumed there was going to be a good one that I could go and use. Started looking at them and was super disappointed by what I found. They're not type safe. I think type safety is incredibly important and only getting more important with AIs writing code. If you look at the standard of engineering among the top 100 or 200 Python packages, it's pretty high. Everything has coverage. Everything has pretty thorough unit testing. They have CI that does releases. They have typed documentation, tested documentation, stuff like this. None of those low-level things - sure, no single one is a showstopper, but they're kind of indicators of quality - seemed to be the case with any of the other agent frameworks or LLM libraries. And so we thought for a long time - or thought for a bit - about whether it was really worth us going and building another of these things. It seems like a kind of gold rush. Don't we want to make the spades? But when we realized people weren't doing it the way we would, we decided to go and build Pydantic AI. So we try to keep it relatively unopinionated and low-level. Try to do the things that you definitely don't want to have to reimplement again, and leave the kind of opinionated "How am I actually going to make this thing work with an LLM?" up to the end user. Because we're not the AI engineers. We're the people who are good at building libraries, and we want to let you go and innovate on how exactly you're going to use an LLM. Is that a good starting point? [0:15:22] GV: Yeah. I mean, I think just to make sure we're not breezing past and assuming knowledge from the audience. I mean, Pydantic AI ultimately can do things, it can call LLMs, it can create agents, do function calling, do evals. Is it agent orchestration, would you call it, or not exactly? [0:15:40] SC: Agent framework, agent orchestration. We also have a graph library. We're about to have a new version of our graph library, which has a bit less boilerplate than the current graph implementation. I mean, I think there's some debate about how valuable graphs are. They don't do anything particularly special you can't do in other code. But they definitely can be a nice way of thinking about it. So, we have that support.
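A minimal sketch of getting going with an agent and a typed output. Keyword names have shifted a little between Pydantic AI releases, so treat the exact parameters as illustrative.

```python
from pydantic import BaseModel
from pydantic_ai import Agent


class SupportReply(BaseModel):
    answer: str
    escalate: bool


# The agent is generic in its output type: result.output is a SupportReply
# for the type checker, and is validated with Pydantic at runtime. If the
# model returns something that fails validation, the validation error is
# sent back to the model and it is asked to try again.
agent = Agent(
    "openai:gpt-4o",
    output_type=SupportReply,
    system_prompt="You are a support agent. Decide whether to escalate.",
)

result = agent.run_sync("My last invoice was charged twice.")
print(result.output.answer, result.output.escalate)
```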
I mean, for the most part, if you're building an application with LLMs, Pydantic AI will let you get going much more quickly, but unlike some of the other agent frameworks, it will go on to be usable in production and will let you do the customization that you want to do. What we probably don't have is necessarily all of the integrations or all of the batteries included. Here's a button to add support for whatever database or whatever RAG service. We would rather let you build that. Because in production, that's probably what you want to go and do anyway. And as I said earlier, I think type safety is absolutely critical. We have a fairly unusual but type-safe way of doing dependency injection so that you can access dependencies within tool calls, which is - I mean, a lot of it's inspired by FastAPI. I worked with Sebastian a fair bit on - not so much working on FastAPI, but we talked to him a fair bit within the team. And definitely a bit inspired by FastAPI. But actually, given that there are new typing concepts like Concatenate available in Python now, we're using them to give the most type-safe experience you can. [0:17:02] GV: Yeah, and I wanted just to touch on, kind of before we get into more of the agentic and tooling side of things, type safety. Obviously, huge. That is Pydantic. And, I mean, that is by definition constraints. And how have you thought about just that way of approaching things when it comes to Pydantic AI? I mean, has the concept of constraints come into it in a way or - [0:17:23] SC: Yeah. I mean, I think one of the things I'm realizing is people sometimes blur what they mean by type safety. I would say there's data validation or type validation. That's what Pydantic does. I have some untrusted data. I have some Python types. I will guarantee to give you an instance of that object that matches those types or raise a validation error. There's that thing. And obviously, we support that within Pydantic AI. I think we have some of the most advanced support for different ways of doing structured outputs. We support tool calling for structured outputs, built-in structured output that some models support, and then what we call prompted outputs, where you basically give the model a JSON schema and say try and match this. But when I talk about type safety, I'm actually talking about static typing, using the types that are available in Python to do relatively complex stuff. For example, Agent is generic in the output type. That means that when you access the result or output from an agent run, that will be, at typing time, an instance of the output type. But we also guarantee it's an instance of that at runtime with Pydantic. But we also go much further. Like I say, dependency injection, type-safe graphs. Which, again, the biggest downside of other graph libraries is basically, sure, you have this possibly useful mental model of a graph to describe things, but you lose all of the type safety that you would expect in other bits of your codebase. We have a way of supporting graphs that is type safe. [0:18:46] GV: Got it. So let's move on to actually kind of, I guess, the usage, so to speak. But I'll kind of just throw out - it's a very generic question, but I think it's maybe something that can lead to more discussion around how Pydantic AI is approaching this. If I was to say, what is the correct number of tools to expose an agent to? Very, very big question, right? How do we think about that? [0:19:08] SC: I think people talk about like 10 to 15 max.
I think it's interesting that that number has not moved this year, although the same people claim models have got way brighter. And I think what has actually happened is the models have got cleverer, but our idea of how big an agent should be has decreased this year. This is like everyone. Remember, in February, I was at AI Engineer. Everyone was saying this is the year of the agent. Well, that's true. I don't think that means we're going to stop using agents. But one of the things that's changed is our definition of how big an agent is. There are ballpark three definitions of what an agent is. There is the AI definition, which is an LLM calling tools in a loop until some condition is met. There is the engineering definition of an agent, which is effectively a microservice. And then the joke is there's the business definition of an agent, which is something that can replace an employee. Ignoring the third one for a minute, think about the first two: an LLM calling tools in a loop, and a microservice. At the beginning of this year, we thought we would have our agent, which would be a microservice. And inside it, it would have one agent in the code sense. It would be given all of the tools, all of the context, and it would iterate until it magically arrived at the answer. I think we have, for the most part, moved away from that idea, down to the idea that we have multiple different agents that you piece together to give some constraints on what your application is able to do, and therefore make it more deterministic while still giving the LLM the kind of space to innovate. Concrete example, let's say we have a deep research agent. We think of that at the business level or the infrastructure level as one agent. If you look inside, what's happening is you might have a planning agent which generates a plain-text description of the plan that you're going to execute. Then you have an agent which will extract structured data from that plan. Turn those bullet points into some structured Pydantic model of, like, here are the steps that we're going to go and execute. Then we might use one agent for each of those substeps. And then we'll have a final agent that basically takes all of that context and outputs our final summary of what's happened - the research. And now, if you think about that system - sure, you can think about that as agent orchestration. Again, AI people love inventing new words for existing concepts, for the most part. Agent orchestration - we have ways of modularizing code. We've had them for 40 years. They're called functions and classes. And we don't need to invent new ones, it turns out. But yeah, that thing that at the beginning of the year people would have said, "Oh, yeah. I've got deep research. It's just one agent that goes off and runs with access to these many tools until it magically arrives at an answer." I think we've moved away from that. And there are lots of reasons for that. We can switch which LLM we use for each of those different tasks, even which provider. We can switch in and out which search we want to use. And we can debug it more easily. We can work out which of those things went wrong. If you just give all of your context to an LLM and hope it gets it right, it's magic when it does, and it's unsolvable when it doesn't. [0:22:02] GV: And maybe this is seen as one of these AI faddy terms, but I think it's probably something that developers have heard: the idea of swarms or agent teams. I'm sure you think of it in a much more nuanced way than that.
I mean, in relation to what you've just been saying. [0:22:19] SC: I think most of these terms are pretty much bullshit. I mean, our big thing in Pydantic is AI is still just engineering. Sure, what LLMs can do is borderline magical, extraordinary. If you had told us 5 years ago this is where we would be, probably none of us would believe it. But how do we go and use that? We apply the engineering principles that we have learned and improved on over the last 20 years, 40 years, however long you want to think about it. Yeah, I mean, I'm not going to show code now, but I had a deep research implementation that I wrote the other day where, yeah, you could think about an agent swarm. That is, like, I call the same agent many times in parallel to go and do research. Then I take all the results and I pass them to a more powerful model, and I get it to summarize the result. There are more complex workflows, but it's very rarely actually a complex graph. I mean, I think that's one of the things you notice. If you try and go through LangGraph's documentation, go and find an interesting, genuinely innovative graph example. Doesn't exist. [0:23:23] GV: I mean, yeah, you said you're not going to bring up code. That's good because we're an audio-only output here. I'm not going to narrate code. But let's talk about how you would think about this. You're somebody who's obviously working in this realm day in, day out. So if you were going to be building, let's just take the obvious example, a customer service agent. Do you have a framework for what you are going to expose, if we're talking across a bunch of tools? And how do you think about adding and removing, in that sense? What's your kind of experimentation process on that as well? [0:23:53] SC: I'd say a few things. First of all, unlike in traditional applications, where any experienced engineer can basically eyeball what's going to be performant and implement it first time, that is not going to be the case with AI. You're going to have to go and try a bunch of things and throw stuff out and try again. And so the things that, in my opinion, matter there are, first, type safety, because you want to go and refactor. It's a heck of a lot easier to refactor if you've got type safety. If you want to tell Claude, "Go rewrite this to work in a different way," it'll do a heck of a lot better job with that if you've got type safety. Second thing is observability, and observability from day one. Observability is not something that you shoehorn into your application the day before launch because someone told you you should. It's genuinely useful from day one, trying to work out what's going on. And then thirdly, I think evals are important. Evals are a powerful mechanism to kind of allow you to have at least a chance of systematically improving rather than a kind of random walk. But also, just digging in and trying to understand what it is that the LLM is actually doing. It's extraordinary what they can do, but their processes are reasonably easy to follow for a human. They're not meaningfully more intelligent than us. So we can go and read through it and understand how they've arrived at a decision, in general. I mean, there's a great talk from Barry Zhang at AI Engineer at the beginning of this year called 'Think like your agent,' or something like that, or 'Think like your model.' The idea is it can be really hard to work out what data your model has access to versus what data you have access to.
And therefore, understanding what mistakes it's likely to make, which is basically a lack of context. One of the examples I like to use with this is if you give an LLM data in the form of some bullet points, it does a pretty good job of understanding the different bullet points. If you give it that same data in the form of a markdown table or a CSV table, it does a much, much worse job. And yet, you and I can look at the CSV file and open it up in Excel or look at it as a markdown table. Really easy to see what's happening. We can look down this column. But if you imagine how an LLM sees your data, which is effectively as one long line of bytes, now trying to correlate where ",73.4" is, and go back all the way to work out which column that relates to, is incredibly hard. And so you try not to give it access to tables in that form. If you can give it access to basically XML or JSON, where it has the key each time, it'll do way better. But that's just one example. There are many examples where ultimately the problem is you have failed to give the agent or the LLM access to some key data it needs to solve the problem, data that you implicitly have but haven't realized you have because it's kind of so obvious to you. [0:26:26] GV: I guess, then, taking that, I want to maybe go slightly back to, if we think about multi-agent, that question around are we talking about a number of tools in one agent, or are we talking about "multi-agents"? Again, how do you think about that? I mean, it's sort of organizational design, almost. You've already touched on the idea of think of it like a human. Are we talking - from a human perspective, it'd be like, "Oh. Are we talking silos? Are we talking about communication of teams who get to talk to each other about something?" Again, how do you think about that? And this is also then thinking about, well, "How do we look at shared memory, or messaging, or the concept of voting between them?" How do you look at that? [0:27:04] SC: I mean, I think the first thing to say is if you have an isolated task and you can take the context that's required for that task and move that into a separate agent, and call that within a tool or call that in a separate step, it can be a great way of reducing the amount of context that the main agent has. Let's say you're building a research agent that has access to go and query some big SQL database. Now, the obvious thing to do is to go and smash the whole of your schema into the main agent, and it now has access to all of the information it might need. And it can go and write SQL to query that data. Well, fine. But now if you're combining that with some other significant tasks, you've got an awful lot of stuff in your context. If you have a tool that is called run aggregation, let's say, and it takes a natural language description of the data that it's trying to find, and then within it, it calls a separate agent which has access to all of the schema and the database context and examples and stuff like that - then one, we have a system that's way easier to debug, because we can go and run the SQL agent, see when it works, and see when it doesn't, and write evals on that in particular. But two, the main agent that's got to do that, and also a bunch of other tasks - look up some RAG database, worry about memory, access some NoSQL database, blah, blah, blah - it doesn't have to think about that at all. It just gets this plain-text tool: describe the data that you're looking for, a bit of context on the kinds of attributes you'd like.
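A rough sketch of that run-aggregation pattern, with hypothetical names: a narrow SQL sub-agent carries the schema, and the main agent only ever sees one plain-text tool. In a real system the tool would execute the generated SQL against the database rather than return it.

```python
from pydantic_ai import Agent, RunContext

# A narrow sub-agent: the only place the full schema lives.
sql_agent = Agent(
    "openai:gpt-4o-mini",
    output_type=str,
    system_prompt=(
        "Write a single SQL query for the orders database.\n"
        "Schema: orders(id, customer_id, total, created_at), "
        "customers(id, name, country)"
    ),
)

# The main agent only sees a cheap, plain-text tool.
research_agent = Agent(
    "openai:gpt-4o",
    system_prompt="Answer questions about our business, using tools for data.",
)


@research_agent.tool
async def run_aggregation(ctx: RunContext[None], request: str) -> str:
    """Describe the data you need in plain English."""
    result = await sql_agent.run(request)
    sql = result.output
    # Illustrative only: a real implementation would execute `sql` against
    # the database and return the rows; here we just return the query text.
    return sql


answer = research_agent.run_sync("What was our total order value last month?")
print(answer.output)
```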
And you've gone down from thousands of tokens of context for SQL, down to a few hundred or tens, even, to describe that get-data endpoint. [0:28:36] GV: Okay. We're going to move on to graph theory, or graph theory meets AI, if you want to call it that. I just did some sort of pre-research, obviously, as I do before all interviews. I think you've talked about the concept of graph theory meets AI, if you like. I believe there's this concept. And whilst I'm not in this domain - audience, forgive me on this one - but DAGs, directed acyclic graphs. Could you talk to us a bit about that, and sort of how this is looking at the process by which an agent might move along its kind of steps? I think in theory, one is kind of quite linear, and the other is you can kind of have a cycle. And I believe Pydantic kind of prefers one approach over the other. [0:29:18] SC: Like I say, I think the jury is out to some extent on graphs and their use. I mean, I think most people, when they talk about DAGs, mean effectively a graph with some dependencies between the nodes. The graphs that you end up building if you're using an LLM are quite often cyclic, as in there's no reason why you can't have cycles in there. We have Pydantic Graph, which is part of Pydantic AI. It's used under the hood by agents. I'm a bit torn on how valuable they are. Someone said to me recently that the most useful thing that graphs do is make people who want graphs happy. And that is a very clear definition of why a graph is useful. How much value are they after that? I think not that much. To be rude for a moment about a competitor, I think LangChain had taken a lot of heat for LangChain, the library, and how it didn't have any functionality. They chose to go and build LangGraph because that seemed like the right thing to do. And bluntly, they didn't understand how to do durable execution. They had graphs as a way of snapshotting. And now they're stuck saying graphs are the right way of doing it because they can't go and build a third library that's their new way of doing it. What's happened - I mentioned this earlier - is that we were able to adopt the new way of doing things. One regard in which I have sympathy for them is that since they released LangGraph, everyone else - Anthropic, OpenAI, Google, us, and a bunch of other agent frameworks - have all centered on this model of agents as LLMs calling tools in a loop, which is a very powerful primitive. Now you can implement that with LangGraph, but it's a lot simpler just to go and use our agent or even OpenAI's agent implementation. There's a reason that one is so similar to ours, which is that it's, I'll say, inspired by our agent implementation. The other thing that LangGraph lets you do is snapshot at the end of each node. And so if something fails, you can go back to that point. Now, that works. But if you want to have that and parallel node execution - so, run multiple different nodes in parallel - you basically have to abandon type safety completely. You have to manually check that your data is consistent. Our approach is different. We support durable execution frameworks like Temporal and DBOS. And we have a bunch of others coming soon. And they let you get the durable execution - this idea of an agent that can run for minutes or hours and resume. Basically, pick up from where it left off if it stops or if you get errors, which I think is a much, much more powerful way of getting the longevity part of long-running agents.
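As a rough illustration of the durable-execution idea, here is a hand-rolled sketch using the plain temporalio SDK rather than Pydantic AI's own Temporal integration; the names, prompts, and timeouts are made up, and running it also needs a Temporal server, a client, and a worker.

```python
from datetime import timedelta

from temporalio import activity, workflow


@activity.defn
async def run_research_step(prompt: str) -> str:
    # In a real application this is where you'd call your agent / LLM.
    # If it raises, or the worker dies, Temporal retries just this step.
    return f"result for: {prompt}"


@workflow.defn
class ResearchWorkflow:
    @workflow.run
    async def run(self, topic: str) -> str:
        # Each completed activity is recorded in the workflow history, so a
        # crashed or paused run resumes here without redoing finished steps.
        plan = await workflow.execute_activity(
            run_research_step,
            f"plan research on {topic}",
            start_to_close_timeout=timedelta(minutes=5),
        )
        summary = await workflow.execute_activity(
            run_research_step,
            f"summarize: {plan}",
            start_to_close_timeout=timedelta(minutes=5),
        )
        return summary
```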
But I was giving a talk yesterday on Temporal and durable execution. Obviously, you can use durable execution with graphs, to run your graph within a durable execution framework. And now you effectively get that same snapshotting behavior, restarting a graph from where you left off. It's much more fine-grained. It's literally snapshotting at every async call. And the code should be much easier to write because you don't have to worry about this snapshotting that gets in your way. [0:32:02] GV: I mean, just, I guess, talking in sort of slight layman's terms here. But if we're talking about failure modes as such, and, okay, something fails, and there's this concept of snapshotting. But is there a concept of being able to kind of move on even though some part of the chain failed? [0:32:19] SC: Our graph implementation has some basic snapshotting. I don't think we'll retire it because people are using it. But I think my approach would be durable execution is the way forward. These are solved problems. People like Temporal - and they're by no means the only one, there's a whole space of those companies - have done an amazing job of giving you ways of writing what feels like normal procedural Python code. But if you get a failure, it will automatically be retried. If you want to go and sleep for 3 weeks before the next task needs to be run, you just sleep for 3 weeks, and it will take care of restarting the process from the right point. That stuff is, I think, really powerful. I think it is a far better solution to the same problems as graphs. [0:32:57] GV: Let's talk about the general DevX. How did you think about that? I mean, I guess, were there any things that you've learned through, especially, Pydantic V1, 2, and maybe even sort of how you've been thinking about V3, as to how the DevX feels and looks for someone coming into Pydantic AI? [0:33:14] SC: I mean, I think that we know we're battle-scarred by introducing the wrong API and having to maintain it for a long time. And so we're very careful about the surface area and not just adding in any old thing that someone wants to add. I think others who are earlier on in the arc of maintaining open source perhaps don't necessarily think that way. I think that the extraordinarily powerful thing about code and about good open source libraries is people can go and use them in ways you never thought of, right? Pydantic is used for a myriad of things I never thought of when I started it, and lots of things to this day. There are hedge funds, for example, and other large organizations who do stuff with Pydantic that I had never thought of. And that is the point: if you build a really powerful tool, it is universal enough that people can go and do things that hadn't occurred to you. We want to build an agent framework where we build those fundamental things you don't want to have to go and repeat. The very simple thing an agent will do is, if you're doing structured data extraction and the model gets the response wrong and you get a validation error, it will return that validation error to the model and say, "Please try again." That is a very neat, very nice thing that will very often catch intermittent bugs. You do not want to have to go and implement that again. It may as well exist in a library where you're sharing that implementation with everyone else. But then, how you go and implement RAG, for example, is far more opinionated, far more context-specific, with far more room to go and innovate and try doing unusual things.
And we don't want to get in your way of letting you do that. We're, I think, yeah, strongly of the opinion that we're trying to build the right foundations, rather than give you these high-level abstractions that tell you what to do but constrain you in doing it well. [0:34:55] GV: You touched on it in reference to pure Pydantic. But when we think about case studies or things you've seen Pydantic AI being used for, I guess, could you just pull out some of those that come to top of mind right now, in terms of either things that you've just been very impressed by, or, I would say maybe more interesting, the things that even you hadn't thought Pydantic AI was going to be used for? [0:35:16] SC: I mean, I'm trying to think. There's someone in our public Slack who's written a coding agent with Pydantic AI, which is a neat working implementation that works very nicely. I think, again, coming back to the Logfire and the SQL thing, it's amazing how powerful agents can be if you just hook them up to a SQL connection and let them retry a bunch when they get the SQL wrong. We've done some stuff with Pydantic AI, but I've seen others do it, where you're effectively doing data analysis with a SQL tool. The other interesting thing is how much has come and been implemented by the community or pulled from the community. So, AG-UI integration. AG-UI is a protocol for talking to a UI - to a chat interface, basically, but with rich components. That was implemented by someone else. But also, some of the model implementations are either implemented, or maintained, or improved by the community. We have an awful lot of people coming and contributing to it. I think one of the unfortunate things about maintaining open source is you often don't see the most interesting things people are doing with it, because those things end up being proprietary. But I hear quite often from people, "Once I found Pydantic AI, suddenly, I had found an agent framework I could actually bear. And now, that's the only one I will use." And I hear roughly that line from experienced engineers all over the place, and that gives me - that feels great, because that's where I come from, right? I'm not a Cursor-type developer. By the time I started developing with Cursor, I'd spent many years learning it the hard way. And there are lots of other people who come from that deep engineering experience for whom Pydantic AI resonates. [0:36:49] GV: Yeah, I've spoken to, especially through this podcast, many open source maintainers. And I'm not one. And I always just assumed that they kind of know all the projects that are using their libraries, especially sort of the top ones. And they're like, "No, I actually have no idea." Not no idea, but sort of it's very hard to keep on top of the web of things that it's being used for. State management. Do you have any recommendations for where that should be managed if using Pydantic AI? [0:37:13] SC: We have some neat examples in our demo repo of managing memory, either with tools - you have a record-memory tool and a retrieve-memory tool; that works surprisingly well - or just recording all messages and then doing a bit of work to cut off some older messages if you have very long-running conversations going on. Both of those work fairly well. I think the fundamental - again, I come back to it. But I said it before, but I'll keep saying it.
We're not trying to give you the high-level opinionated, "Here is -" I know one of the other agent frameworks has three different memory implementations, for short-term memory, long-term memory, contextual memory. It's very unclear what they do under the hood. We let you do tool calls. We return you structured Python objects as messages. You go implement the thing you want. If it's a simple demo you want to build, the simple thing will work. If it's a production application where you've got lots of nuance, those pre-built ones aren't going to work for you anyway. I think we might move a little bit more in the direction of having support for storing messages easily in a database. Basically, an abstract base class and a few implementations of that. Because it's a common enough pattern, it makes sense. And I think we're still thinking about how we'll have embedding support soon. At the moment, there's no embedding support in Pydantic AI because it honestly hasn't come up. It's one of the most upvoted issues, but it hasn't been a burning need for the most part. [0:38:37] GV: Is that as in, like, actually creating the embeddings? [0:38:40] SC: Yeah, an API for generating the embeddings. Because then once we have that, which is a relatively simple API, do we then go further and have a concept of RAG and hybrid search, or vector search? Or do we just say, "Yeah. Here's an API for generating embeddings. How you go and implement the next phase is up to you." Yeah. I mean, the other place - I mentioned AG-UI already. We're also about to have support for Vercel AI Elements, which is another protocol, effectively, for communicating with chat UIs. And so the principle is you should be able to build a chat UI with really a few lines of JavaScript, or no JavaScript at all, just using a pre-built UI. And then you can go and do your innovation within the agent, however you like, in Python. [0:39:22] GV: And then just to kind of, I guess, tie a bow on it, observability. I guess, Logfire, that's the sort of maybe batteries-included piece if you want to - [0:39:30] SC: But again, we work really hard to follow open standards. Pydantic AI emits standards-compliant OpenTelemetry data. A couple of us are reasonably involved in the OTel SIG for GenAI. We push OTel to work the right way for GenAI, and then we support that. At my last count, there are about 13 different observability platforms that support Pydantic AI one way or another. We obviously think Logfire is the best of those. But unlike some of our competitors, we're not trying to use a proprietary protocol to kind of lock you into using our one. We think that we'll win because we have the best agent framework and the best observability platform, and we can kind of guarantee they work well together. But we're not trying to stop you bringing a different agent framework or bringing a different observability platform. [0:40:12] GV: Yeah. And I definitely see that trend with a whole bunch of platforms, where ultimately they are open source and they have maybe different sort of arms, but the key part of their success, I would say, is just maintaining that thing as an open standard. And if somebody wants to swap out a bit, that's completely fine. But at the same time, you've got three to four bits of the ecosystem that you still know will work well together if you just want to kind of default to something. [0:40:40] SC: I think it's valuable if you're an enterprise that you have one company you can come and shout at when the two don't work well together.
The powerful bit of what we have is that we control an awful lot of the stack, right? As in, people on our team - Marcelo maintains Starlette and Uvicorn, which are basically the modern networking stack for Python, although they're not directly part of Pydantic. From Pydantic, to Starlette, to Uvicorn, to Pydantic AI, through the Logfire SDK to Logfire, most of those bits are literally under the Pydantic umbrella. And even if they're not, we're pretty involved in the ecosystem. And that is, I think, where companies of all sizes with some engineering taste, but particularly enterprises, find the value in the one solution, or the one point of contact for many different solutions. [0:41:22] GV: Yeah, absolutely. Being the one shouted at on that basis - well, that's a good place to be. We're going to hear about something kind of exciting in a minute. I believe there's a new product in the wings. But just to kind of wrap this up - again, I'm just the voice of the audience here - in terms of LLM gateways? [0:41:39] SC: Yes. So, we're about to release Pydantic AI Gateway. I think, by the time this goes out, it should be launched. In particular from enterprises, it's a feature we just hear immediate need for. It's the hair-on-fire problem right now. There seems to be no good go-to solution. I mean, it makes sense, right? You're a financial services company. You're expecting to spend 5 million a month on OpenAI. Are you going to give everyone in the company an OpenAI key, where technically they can spend $5 million? And then, internally, something runs overnight, and you've spent a big chunk of that? Obviously, you're not. And you want observability into what's going on. And that's if you only have one model. Now, what if we're doing some research and we have Anthropic, Gemini, Mistral, Groq going on? It obviously makes sense to have a single platform to manage those things. And once you have that platform, it's a very useful place to do a number of things, whether that be caching or security or fallback. And so we just kept speaking to enterprises who didn't have a good solution for this, but needed it. And so, yeah, we're about to release Pydantic AI Gateway. It will obviously have a very nice, easy integration with Pydantic AI. So you can basically set gateway, and then the provider and the model, and your one API key will then let you connect to all of those different models without having to go and put a credit card into each of them if you're getting started. Yeah, also, we'll let you use all the big models through one gateway. At the initial launch, we won't have many of the shinier features, but they will come very soon afterwards: the caching, and the fallback, and the security stuff. [0:43:12] GV: Very exciting. [0:43:13] SC: Yeah. And then the actual gateway itself is open source, but the console, the platform for managing it, is closed source, and that we will sell. And part of it comes back to - it's all very well that we can make the process of getting going with building with LLMs incredibly simple, but at the moment, there is this barrier, in that a lot of the LLM providers' platforms are just not that easy to use and understand. And we want a really nice way of allowing developers, as they get started, to go from zero to "I have an app running with an agent in it, and I have observability" very, very quickly. And that makes complete sense. And obviously, one of the neat things we can do in the gateway is we can emit OpenTelemetry from within the gateway.
If you're a large organization and you desperately want - you definitely want to record all prompts, and you want to look for phishing or for prompt injection, you can do all of that stuff in the gateway with Logfire, which is why they kind of play well together. [0:44:03] GV: Yeah. Yeah, just to call out, we're recording this the third week of October. I believe it's sort of the end of October that's the release on that one? [0:44:11] SC: I've got a big night ahead of me, because we were hoping to get to the private beta tomorrow. I think it's now going to be Monday. And then, yeah. Hopefully, 31st of October, we'll do the public announcement and let you sign up. Put a credit card in if you want to use the models we resell, or bring your own key if you want to - put your own key in and that'll be free. [0:44:33] GV: Yeah. Yeah, just to call out, thank you so much for coming on given that this is your evening over on the West Coast. And as you just said, you've got a big night ahead of you. I misheard you as saying you've got a big night ahead of you at the pub, but you were about to say public. Yeah, thank you for making the time. I thought it might be nice just to kind of round out with a little bit of Hacker News feedback, actually. Because I think it's kind of fun, and there are no kind of gotchas here. I think it's kind of interesting. 6 months ago, someone said, "I found that the Pydantic AI framework strikes a perfect balance between control and abstraction." What would you say to that? [0:45:07] SC: I think that's exactly how we've tried to think about it. I mean, I remember speaking to people at the beginning of this year who would say, "I don't need an agent framework. I don't want all those abstractions. The one thing that I want is the model agnosticism, to be able to plug into any of the big models, not have to go and use the OpenAI, or Anthropic, or Google SDK, and then be stuck with them." And so, to some extent, we tried to build it as model agnosticism without too much on top of it. We've added a little bit, but it's all opt-in. And fundamentally, the agent is pretty simple. Some pretty minimal behavior on top of the standard LLM calls, but with that nice unification. You can switch models very quickly. [0:45:47] GV: One more: "I've been building an integration with Pydantic AI, and the experience has been great. Questions usually get answered within a few hours. And the team is super responsive and supportive for external contributors." [0:45:57] SC: Yeah. I mean, I think we take that very seriously. We care about the kind of response rate on GitHub, even on the open source, or replying on Slack. One of us will reply to your message on public Slack almost always within an hour or two in most time zones. We've been there. We're developers, right? I'm writing code most of my day still, luckily. Maybe I should be doing more sales. And we care about that stuff. And because we are ourselves open source developers, that is one of the things we care about most. We had a sales call earlier with someone from an enterprise, and they were compelled by what we showed them in Logfire. But really, the reason they came to us was the solution they were using before had 290-something open pull requests that no one ever responded to. And that was the thing that had actually driven them to be like, "What else can we use?" and come to us. I think that responsiveness and engagement with the community is a big part of what makes us different. [0:46:49] GV: Awesome. Yeah.
And obviously, I said no gotchas. But also, why not pull out something that maybe someone in the audience is saying in their head anyway. And there's always going to be people that aren't just throwing out all positive comments, but someone saying, "I really wish Pydantic invested in Pydantic instead of some AI API wrapper." [0:47:07] SC: Fair. And I think probably one of the reasons we haven't done that much there recently is that when we made new releases of Pydantic, mostly people were annoyed that we broke things because we fixed someone else's thing. And so Pydantic is a big, established library. David Hewitt, as I said, is working an awful lot on new stuff within Pydantic. You will see big new features come out. The other thing, though, to say is go and look at the top 100 most downloaded Python packages. Look for autonomous companies in that list. You will see four companies. You will see Google, Amazon, Microsoft, and Pydantic. If you look in the top 25, you will see, I think, only Amazon, maybe Google, and us. Now, per head, per dollar, or however you want to measure it, we are one of the most impactful companies in terms of how much open source we do. But we have to make money, right? We're a startup. We've been given money by some big VCs to go out there and make a profit. We're not a charity. There was someone who, hilariously, worked in the CTO's office at Google who said, "Oh, I'm not sure about Pydantic. Now they've raised money. I'm not really sure about these open source projects that are trying to make a profit." Saying the same thing about Astral and Ruff. He ended up deleting his comment because he obviously realized he was on the wrong side of history on that one. But there are a certain number of millionaire communists, particularly in California, who would love open source to be entirely benevolent, but they get paid an awful lot of money by big tech companies who don't particularly contribute back. So we subscribe to the Open Source Pledge. We give $2,000 per year per developer to open source. That's above and beyond all the open source we do. We think that stuff really matters. I see very few of the other bigger companies doing that. And I suspect the person who made that comment doesn't work for a company who does that. I think we do enough for open source. I'm pretty proud of what we do. And I'm pretty robust in rebutting anyone claiming that we don't do enough. [0:49:05] GV: Yeah, absolutely. I wanted to bring that out because I very much stand with you on that one. Both seeing what Pydantic does, as well as many other companies that get the same heat just because they switch focus a little bit to something that happens to have a commercial arm to it, and somebody gets all rubbed up the wrong way. [0:49:22] SC: And I'll tell you that if, in the unlikely event, Pydantic doesn't work out and we end up shutting the company down or being acquired by someone that people don't like, we'll get a lot more heat for it then. Well, the reason we're trying to make money is so that those things don't happen, right? Because maybe I'm not as nice as Guido. I'm not going to maintain - and obviously, Pydantic is nothing like as big as Python itself. But I am not going to go and maintain Pydantic on my own for the rest of my life, being paid - I was probably getting towards $40,000 a year when I was maintaining it on my own. That's not me. I'm not going to do that for the rest of my life.
The company needs to make money to support both our open source and the wider ecosystem. [0:49:59] GV: Absolutely. Just kind of talking about, I guess, the company today. How many people do you have in the team now? And are you hiring? Is this a good place to plug any hiring? [0:50:09] SC: We will be hiring a little bit over the next few months. The blunt truth is that - I think it's got easier and easier to apply for every single job you can think of. And given the number of pretty poor applications we get now whenever we put a job ad up, we're being more and more targeted in looking for particular people. Unfortunately, we're a small team. We don't have capacity for junior people or interns in general. And so I'm afraid, if you email me being like, "I'm a big fan. I'd love to do an internship with you when I finish university," unfortunately, I'll try and get back to you, but the answer will be no. We're always looking for really bright, experienced engineers who have a proven track record in open source in Python, Rust, or TypeScript. But if you haven't got a pretty impressive record in that direction, it's probably not going to work with us. But yeah, we will be hiring. We put everything on social media, and would love you to apply if you match the conditions on the application. But please don't email me. Our first rule for anyone that we hire is they need to follow the hiring rules, which start with: email careers@, not Samuel. [0:51:14] GV: Yep. I've been there, as both a past-life owner of a company where developers were applying, as well as then being a CTO and having people sort of try and get round the process. And I just politely say, "Please follow the process. You're not going to get anywhere just by emailing me." Unfortunately, all those stories from back in the day of, "Oh, just guess the email address" - that's just nonsense now, or it should be. There's a process, usually for a reason. [0:51:36] SC: I'll just add one thing, though. I mean, a lot of the great people we've hired have done significant stuff in open source. Now, don't think that, 3 weeks into your Python career, you can go and start using Cursor to generate pull requests on Pydantic, and we'll go and hire you for loads of money. But we have found some incredibly talented people, who probably would have been overlooked by bigger companies, by finding people who have been working away and maintaining awesome Python libraries for a long time. And so I do think that if you can get into open source, if you can go and do the hard yards of building up a reputation in open source, both us and many other companies will hire you. But it's not a quick fix. You can't just go and buy the right crypto coin and suddenly be a millionaire, right? It takes 5 years of learning. How long does it take to get 10 years of experience? It takes 10 years. And you can't really accelerate that. Arguably, AI makes that even harder, because the discipline you need to learn it yourself, when an LLM will probably get something approximately right instead, is getting harder to maintain. [0:52:37] GV: Yeah, it's something we've discussed a few times. We have an SED News monthly. And Sean and I have discussed this a few times. Just sort of, where's the tipping point between people coming in now as developers and - not to in any way dissuade people or - but yeah, there is no point in just firing up Cursor and then thinking that you're a programmer. It doesn't work that way.
And it's not, "Oh, we're the old guard saying you can't come into the club." It's not that, whatsoever. It's just that it's engineering is a fundamental - there's all these principles and concepts that just helps if you understand them before the code that's written that you're then reviewing or editing. [0:53:11] SC: It's a bit like whether you're flying a plane or driving a truck. Sure, you can put it into cruise control when you're going down a motorway. But when you get to that narrow lane where you need to reverse around a corner, you still need those expertise. And in the end, sure, I use core code a whole lot. And it does lots of things for me that I don't want to have to go and write all of those React components. But when it runs into some weird bug, when you need to set up exactly how you're going to share type safety between the front end and the back end, that still requires me. And I don't think there was a - and maybe we're about to have AGI. We're all redundant. But like until that point, however little time it is, you still fundamentally need a truck driver in that truck to reverse around the corner, even if they spend a bunch of their time in cruise control. And the same is true with code. And in some ways, it's a multiplayer, right? Like we all have the resources of a team lead, but you still need the knowledge of a team lead to be able to deploy that team, whether it is human or AI, effectively. [0:54:09] GV: Yeah, absolutely. Yeah, just kind of final closing question, and this is more like a personal one to you, I guess, which is more just inspiration. I guess, have you got any people, whether it's in the community right now or even living or dead, who's kind of inspired you and does inspire you, I guess? [0:54:26] SC: I'll have to say Guido van Rossum, the creator of Python. I've met Guido a number of times now. He can be reasonably blunt, but he's always friendly and fun to talk to. And he is so humble. I mean, I have seen him walk into rooms where no one knows who he, and just stand there and listen to what's going on. He so often describes himself as the author of PEP 482, not the BDFL, and the creator of Python. I'm so impressed by what he has built in Python, both as a community and as a language. He got so much stuff right long before his time in terms of realizing that it was something that programming languages are for humans to use, not for computers to use. And prioritizing the human bit. And I think the success of Python, you look at every other successful language, with the possible exception of Rust, which I'm also a big fan of, they've all had been anointed somehow. They've all had a reason why they're going to win. JavaScript had the browser, Go had Google, C# had Microsoft, etc., etc. Python had one random Dutch guy and an amazing community. And I'm ever impressed by what Python has become. And you look now, right? In AI, sure, some people say TypeScript might take over. But it's TypeScript and Python. It's a two-horse race. And Python's doing amazingly. And I'm proud to be a part of that community and have had a little bit of impact on it over the years. [0:55:40] GV: Yeah. I mean, I have to say, I am more on the TypeScript side, but that's not because I dislike Python. It's just that's just the route I took in life. And I'm really impressed by how Python has - as you say, it's the leader clearly in AI programming. And it's been fascinating to kind of watch that. 
Well, look, it's been absolutely a pleasure to have you, and I feel very lucky to have you on SE Daily, especially as, as you've called out in the episode, this is sort of towards the end of your night, and it's not finished yet by any means. So, yeah, thank you so much for coming on. And obviously, we look forward to the Pydantic AI Gateway release, which is probably going to be out by the time that this is airing. Yeah, thanks so much. [0:56:17] SC: No problem. Thanks so much for having me. It's been a pleasure. [END]