EPISODE 1797

[INTRODUCTION]

[0:00:01] ANNOUNCER: LangChain is a popular open-source framework for building applications that integrate LLMs with external data sources, like APIs, databases, or custom knowledge bases. It's commonly used for chatbots, question-answering systems, and workflow automation. Its flexibility and extensibility have made it something of a standard for creating sophisticated AI-driven software.

Erick Friis is a founding engineer at LangChain, and he leads their integrations and open-source efforts. Erick joins the podcast to talk about what inspired the creation of LangChain, agentic flows versus chain flows, emerging patterns of agentic AI design, and much more.

This episode is hosted by Sean Falconer. Check the show notes for more information on Sean's work and where to find him.

[INTERVIEW]

[0:00:59] SF: Erick, welcome to the show.

[0:01:01] EF: Hey, Sean.

[0:01:01] SF: Glad to have you here. I've used LangChain a bunch, so I'm excited to get into it and talk about it.

[0:01:07] EF: Yeah, me too.

[0:01:08] SF: You were there from the beginning. I just want to go back to that point, from what you can recall. For you and the other people that were part of that initial founding team, what inspired you to create LangChain? How did you identify this need for a framework that chained together LLMs and created the abstraction layer?

[0:01:26] EF: Yeah, definitely. Some important context there is that there's a distinction between the start of the open-source project, which Harrison launched in October of 2022, and when the company formed around it, which was the first half of 2023. The open-source project came out at the right place, right time, right before ChatGPT really launched and everyone started building with these LLMs. That was really Harrison's brainchild. He was working with GPT-3, which, I don't know if you interacted with some of the text completion models, as opposed to the chat completion ones, but they were relatively challenging to work with. It was a string in, and then it just continued the string as output. There were a lot more manual steps you had to do as a user of those. The concept of output parsing was one of the big reasons that people used LangChain in the early days, all the way through, how do we turn this into some agentic loop using some of the current research at the time?

We've continued that effort as the company, where we're keeping up with the latest models and integrations that people want to work with. That's primarily the area that I work in, as well as making usable implementations, both for prototypes and production applications, of the latest and greatest research. Like, how do you build software with these new models?

[0:02:40] SF: Back in the early days of GPT-3, the APIs were not necessarily the easiest thing to use, but as a whole, I think the foundation model companies have gotten a lot better, and you can now use their APIs directly. How does that change LangChain's focus in terms of the value add that you're bringing above and beyond just having, essentially, a cleaner API into some of this stuff?

[0:03:03] EF: Totally. Yeah. Let's compare October 2022 to now. The landscape in October 2022 was, you used GPT-3. Maybe you used one of the early cloud text completion models as well. There were, I think, three integrations in the original LangChain library, which were those LLMs.
At that point, you had to, as a user, handle all of your message formatting and all of your output parsing into messages manually. The simplification of what these chat models are doing is that it's still just a text completion model, but they're trained on very specific formats of alternating human message, AI message, human message, AI message. That allows the API providers to actually guarantee that the next message is going to be an AI message, in a stricter way than we can as just observers of the model's output. With them handling more of that, our focus moves a lot further up the stack, if that makes sense.

Our main focus right now is really on LangGraph. How do you orchestrate these kinds of agents as state machines? The LLMs are clearly very powerful, but they're not quite powerful enough yet to build functional software with just the simple ReAct loop, where the ReAct loop is just giving the LLM access to all the tools that you want to give it access to, going and executing those tools, and then piping that output back into the model. That's the simplest ReAct loop. It doesn't work all that well, because as you increase the number of tools that you provide to the model, it starts calling the wrong tool. Sometimes it doesn't call it with the relevant parameters, those kinds of things.

What we've been doubling down on is, okay, give your LLM access to, and it depends a lot on the model, but give your LLM access to five tools, or have some flow where at different steps it might have access to different ones. Let's use an email assistant as an example. You might classify an email as recruiting inbound, either from a recruiter, or from a candidate reaching out to you. In those two different instances, you might want to give it access to different tools. Okay, I want to give it access to nothing and just respond, "Hey, write a draft of interested or not interested, depending on the company's background." Or for a candidate coming in, you might want to attach that to something like Greenhouse, or your applicant tracking system, in order to track that they emailed you. You can actually segment that request in this nice graph flow that we visualize in LangGraph Studio.

[0:05:35] SF: Is that really about having an opinion, I guess, as a framework for how people need to stitch together these agents? Rather than someone making the mistake of creating almost a monolith agent that's going to have access to a thousand tools, you're saying, don't do that; if you use this opinionated flow, it will essentially force you to break it up into the modular components that make more sense.

[0:06:01] EF: Precisely. That actually is a fun part of LangChain's evolution, where the first agent abstraction we came out with was the LangChain agent executor, which really just implemented that broad ReAct loop. To be clear, lots of people are using that and being very successful with it, as long as they're engineering the tools that it has access to in the right way, where you have to have a limited number of tools, and you have to have really good prompting and descriptions for how to call them, such that the agent actually ends up calling them. It obviously performs better with the latest and greatest, larger models when you do that. Some of these LangGraph flows really enable you to use smaller models as well for cost savings, or maybe you want to run on hardware that isn't as powerful.

[0:06:44] SF: Yeah. Let's stay on agents.
But maybe before we go deeper on that topic, can you explain what is different about agentic versus what we've maybe seen previously with these fixed-flow architectures, through things like RAG and so on?

[0:07:00] EF: Yeah. The distinction in my mind is really whether it's a feed-forward application or a cyclic application. We distinguish them as chains versus agents, or graphs, if you're building them with LangGraph. A chain always finishes. It always just goes through the steps. Maybe for the RAG case, it does a retrieval step, looks up some documents that it wants to paste into your prompt, passes that prompt to the LLM, and generates some nice description that you might get out of a Perplexity, or something like that. With the agentic version of RAG, you really do that retrieval step, generate that output, and then you might even fact check that output and say, "Hey, is this factually accurate?" Or you might do some other steps that filter that output. If you don't like it, you can actually just bounce back to the beginning and say, "Hey, regenerate this based on this feedback that our editor node, or editor sub-agent, gave."

[0:07:54] SF: How do you, essentially, avoid a situation where you're running an endless loop of reflection and planning, and this cycle never actually finishes?

[0:08:04] EF: Yeah, there are a few different strategies. By default, LangGraph has a recursion limit. You can really think of this as the same problem you end up in when you recursively call a function too many times in Python, or in any language, where it'll hit that stack limit. It's the equivalent concept for a graph. It's really designed to hit that, and there are ways that you can handle that case such that we can gracefully exit when we hit it, based on the artifacts that we've generated through all of those steps.

We've also seen a lot of people implement just tracking in the state of the graph. Some important background on LangGraph is that the model is all these nodes and edges that you connect to each other, but all of the nodes operate on the same schema. We call that TypedDict the state of the graph. You can have a state field for the number of times I've fact checked the answer. It starts at zero, and you just increment it each time. It's just a different way of writing a for loop. When we hit three, we say, okay, we're done fact checking this. Now, let's just respond to the user and say, "Hey, I'm not completely sure if this is the right answer, but here's what we ended up with." That caps the amount of time that your agent can spend producing an answer.

[0:09:16] SF: Then in terms of patterns of behavior, what are some of the agentic patterns that people are using and that are supported out of the box with LangGraph?

[0:09:25] EF: Yeah, great question. The first one that everyone starts with, well, I shouldn't say everyone, but many people start with, is that ReAct agent that I mentioned before. It's so simple, it's just two nodes. One of them is the LLM-calling node, and one of them is the tool node, as we call it, which just executes the code associated with the tool and produces the output that can be passed back to the model. That is the quickest dopamine hit when you're getting started building agents, where you can really build something that goes and sends an email for you, or sends a Slack message to you based on some input that came in.
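As a rough illustration of the capped loop Erick describes above, here is a minimal sketch in Python, assuming the langgraph package is installed; the node bodies, state fields, and limits are illustrative placeholders, not code from the episode.

from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class RagState(TypedDict):
    question: str
    answer: str
    fact_checks: int  # how many times we've fact checked so far


def generate(state: RagState) -> dict:
    # Placeholder: retrieve documents and call an LLM here.
    return {"answer": f"draft answer to: {state['question']}"}


def fact_check(state: RagState) -> dict:
    # Placeholder: ask an LLM (or run code) to critique the draft.
    return {"fact_checks": state["fact_checks"] + 1}


def should_continue(state: RagState) -> str:
    # Bounce back to generation at most three times, then finish.
    return "generate" if state["fact_checks"] < 3 else END


builder = StateGraph(RagState)
builder.add_node("generate", generate)
builder.add_node("fact_check", fact_check)
builder.add_edge(START, "generate")
builder.add_edge("generate", "fact_check")
builder.add_conditional_edges("fact_check", should_continue)
graph = builder.compile()

# recursion_limit is the built-in backstop mentioned above; the counter in
# state is the explicit "different way of writing a for loop" version.
result = graph.invoke(
    {"question": "What is LangGraph?", "answer": "", "fact_checks": 0},
    config={"recursion_limit": 25},
)

The counter caps the cycle deliberately, while the recursion limit catches anything that loops longer than expected.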
We very quickly see people start adding human-in-the-loop type steps where, okay, whenever I call my send email node, I really want to review that first. Before actually executing the send step, interrupt, which is the concept in LangGraph, and show the user the draft that I've written. They can choose to give some feedback, which can then be edited and shown as a new draft, or just hit send and send it away. ReAct is where people get started. Human-in-the-loop is one of the patterns that we see recurring, and it's now a first-party concept in LangGraph.

Another one that people are building a lot with, we now have a concept of a global state store. Whenever you start a conversation with an agent, we consider that a thread, similar to how you would have alternating messages in ChatGPT. You might want to have some memory that's actually tracked between multiple interactions with your agent. That's where we get into that global state store and the checkpointer. We've experimented with a lot of different versions of memory, and what we've come to is a realization that less is more, right? Just being able to set and get keys, potentially with some added features around filtering by sessions, or other threads that are associated with a certain one, those are useful abstractions. But maybe automatically editing and trimming conversation histories and things like that, maybe not as useful.

[0:11:25] SF: Yeah. I think another simple pattern, too, is basic reflection. I mean, you can even - if you just go to ChatGPT as a user, you can experiment with this, where you say, "Write me an email that touches on these things," then copy that email output, paste it back in and say, "Analyze this and improve it." You'll get a better version, and you can do that a couple of times. Essentially, that is the idea of an automated reflection agent. You're just doing it manually.

[0:11:47] EF: Totally.

[0:11:48] SF: Then in terms of creating these abstractions, given how fast everything is moving all the time, is it hard to create these types of abstractions? How do you choose the right abstraction when things are - I mean, you've been doing this since 2023, and now we're nearing 2025, but in the life of Gen AI, two years is really 40 years, or something like that. Things are moving so quickly. How can you choose the right abstraction and not get into a place where you end up having to make a ton of breaking changes as you learn new things and new things come out?

[0:12:21] EF: Yeah, great question. This is a constant struggle for us, as I'm sure you, as a LangChain user, have experienced, as well as lots of the users listening to this podcast. We have gone through a lot of iterations, where the 2022 version of LangChain was really about these all-encompassing, opaque chains. The simplest one was the LLMChain class, which actually did a lot of magic under the hood, and it was really difficult to debug. Then, in 2023, we really focused on the LangChain Expression Language, where you would compose these chains as distinct steps, but a lot of the steps were still a little bit opaque. You had to know that the JSON output parser would take a string in and output some dictionary, those kinds of things. Then this year, we've really gone towards LangGraph. Even though we still support all the old things, from a user's perspective, every single time that changes, it can feel jarring, right?
Because the front-and-center quick start is now something that I didn't learn when I first learned LangChain. I think we've done a pretty good job of announcing those changes and then still supporting the old models, because, obviously, we have a lot of users operating on the LangChain Expression Language in particular.

I think the philosophy has really just become more and more bare bones, where everyone who comes to LangChain is either a Python developer or a JavaScript developer, as long as we're talking about the two packages that we maintain. There are some community-driven efforts in Go and Kotlin, and a few other ones. For the two main ones, everyone knows those two languages. The more just raw Python that we can let people write, the better, because it's something they already understand, and there's no magic included there. With LangGraph, everything is really just a Python function. The main abstraction is the same as NetworkX, if you've used that before, where you're saying, create my graph, graph.add_node, graph.add_node, and then connect these two nodes to each other with an edge, or connect these two nodes with a conditional edge, those kinds of things. I'm sure there are lots of bells and whistles on the side that you can use for interrupts, if you throw a particular error, or these checkpointer features where you're storing state, or memory. But in order to get started, seeing some LangGraph code makes a lot more sense than seeing some LangChain Expression Language code, if you've never seen it before. I don't know if you've used it, but it's a lot of pipe operators. It looks a lot more like Bash than Python. That has been really the philosophy. That's been the big change in my mind.

[0:14:57] SF: Okay. When you're creating these abstractions, is it hard as well to think about how different models are going to have different limitations? Depending on if I switch the model suddenly from GPT-4 to a different version of GPT, or Claude, or whatever, then the size of the context window could be impacted by that. Maybe other types of features could be affected by that.

[0:15:19] EF: Definitely. As the industry has evolved, the constraints have actually changed a lot, I would say. When I first joined LangChain, the main difference between models was the context window, right? I'm going to forget the actual numbers, but I think GPT-3.5 Turbo had a 4,000-token context window to start, and maybe it came up to 16,000 later. A lot of the smaller models were 1,024 tokens, and so you could barely fit in the messages that you wanted to send to them. They terminated with these really jarring errors where it's like, okay, you exceeded the token window, sometimes in the middle of the output it was generating, and then you'd just get these partial things that weren't that useful.

Nowadays, the main distinction, I would call it, is probably tool calling. Tool calling is easily the most important feature that LangChain and LangGraph users are using out of the models. It's really useful both for calling tools and for structured output, where you provide a schema, and the LLM generates all the fields that you ask for, which is a really nice interface point between code and these LLMs. Different models perform very differently.
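To make the graph.add_node and conditional-edge style Erick describes concrete, here is a minimal sketch of the email-routing flow from earlier in the conversation, assuming the langgraph package; the classifier and node bodies are illustrative placeholders rather than anything from the episode.

from typing import Literal, TypedDict

from langgraph.graph import StateGraph, START, END


class EmailState(TypedDict):
    email: str
    category: str
    draft: str


def classify(state: EmailState) -> dict:
    # Placeholder: in practice an LLM with structured output decides this.
    category = "recruiter" if "role" in state["email"].lower() else "candidate"
    return {"category": category}


def draft_reply(state: EmailState) -> dict:
    # Recruiter branch: no tools, just draft interested / not interested.
    return {"draft": "Thanks for reaching out; not interested right now."}


def log_to_ats(state: EmailState) -> dict:
    # Candidate branch: placeholder for an applicant-tracking-system tool.
    return {"draft": "Thanks for applying; we've logged your application."}


def route(state: EmailState) -> Literal["draft_reply", "log_to_ats"]:
    # The conditional edge: pick the next node based on the classification.
    return "draft_reply" if state["category"] == "recruiter" else "log_to_ats"


builder = StateGraph(EmailState)
builder.add_node("classify", classify)
builder.add_node("draft_reply", draft_reply)
builder.add_node("log_to_ats", log_to_ats)
builder.add_edge(START, "classify")
builder.add_conditional_edges("classify", route)
builder.add_edge("draft_reply", END)
builder.add_edge("log_to_ats", END)
graph = builder.compile()

print(graph.invoke({"email": "Hi, I have a role you might like", "category": "", "draft": ""}))

Every node here is a plain Python function that takes the shared state and returns only the keys it wants to update, which is the "no magic" style described above.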
Even with the same model, and we can chat a little bit about this in a second, but with the open-source Llama line of models from Meta, the tool calling performance is actually markedly different with different providers, depending on how they've implemented parsing of those tool calls, which is another fascinating thing that we're working with right now, where it's difficult to call that out on some of the provider pages in our documentation.

To answer your original question about how we manage a lot of that, the first step is really just documentation, right? We add notes to the provider pages where it's like, hey, this model has a check mark for tool calling, or an X for tool calling, or maybe even a warning sign in some cases where it's like, hey, this model says it has tool calling, but it never actually calls tools. The next step is really building abstractions that make sense around those. Now in the library, we have these bind_tools and with_structured_output methods that, when you call them, either work, right? They give you some structured output, or they throw a NotImplementedError, like, hey, this provider doesn't offer tool calling, and so you can't bind tools to it.

[0:17:40] SF: Mm-hmm. In some ways, and you're probably too young to remember this, but it feels like the early days of the web, when there was no standardization around HTML and JavaScript and stuff like that, and you'd have to have these control loops, essentially, or control statements, like, okay, if it's this specific browser of this specific version, this is the way that it needs to behave, or this is the call that I can make. It's probably just a byproduct of early days. Things are moving quickly, everybody's trying to push things out to production, and there's going to be, essentially, a lack of standardization across all these things. Then when you bring in the open-source models and they're served by different people, there are going to be potentially different interpretations of how to respond to something like tools, for example, and people are going to have their own takes on those.

[0:18:24] EF: Totally. Actually, if any of the listeners want to dig into the lore here, look at some of the source code for the original Anthropic integration. Actually, AWS Bedrock still has one version of its integration that's like this, where everything is still a text completion. A lot of that message parsing logic actually happens in the LangChain integration, which is a crazy world that we lived in once upon a time, where there was an if statement, in Bedrock in particular, right? It's like, if it's an Anthropic model, parse it in this way. If it's a Cohere model, parse the output in this way, because the message tokens that are actually outputted are different, which is, yeah, obviously, a speed-to-market type thing that you see across all these different providers.

[0:19:08] SF: In terms of implementing agents with LangGraph, can you walk through what that process is? I want to get started with LangGraph. I want to build a basic agent. What do I need to do?

[0:19:19] EF: Yeah. I'll actually start from the very beginning, where we have a lot of different media formats to get people started, because we realized that some people like following video tutorials, some people like following written documentation where you can copy-paste code, and some people really like starting from a complete template that they just design themselves.
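The bind_tools and with_structured_output methods Erick mentions look roughly like this in practice. This is a sketch assuming langchain-openai is installed; the tool, the schema, and the model name are made up for illustration.

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


@tool
def get_weather(city: str) -> str:
    """Return the current weather for a city."""
    return f"It is sunny in {city}."  # placeholder implementation


class Classification(BaseModel):
    category: str = Field(description="recruiter or candidate")
    confidence: float


llm = ChatOpenAI(model="gpt-4o-mini")

# Tool calling: the model may return structured tool calls instead of text.
llm_with_tools = llm.bind_tools([get_weather])
ai_msg = llm_with_tools.invoke("What's the weather in Oslo?")
print(ai_msg.tool_calls)

# Structured output: the model fills in the schema's fields.
classifier = llm.with_structured_output(Classification)
print(classifier.invoke("Hi, I'm a recruiter at Acme reaching out about a role."))

On a provider that doesn't support tool calling, the bind step is where you would see the NotImplementedError behavior described above.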
For those three, we have LangChain Academy, which is academy.langchain.com, which is a video format for this. We have the LangGraph documentation; just Google that and there's a quick start. Or we have LangGraph Studio, which has five templates to get you started, and which is actually used in Academy, if you end up doing that. Definitely, if you want to dive in further, we'd recommend going to one of those sources from my co-workers, who have made much better content than I can describe right now.

For the brief answer right now, it's all just Python. You either open up a Jupyter Notebook, or you open up a text editor, and you create a bunch of Python functions representing the different steps you want your graph to take. Funnily enough, the graph interface for this tends to be more intuitive for people who've interacted with no-code type editors. The whole orchestration of it actually reminds me a lot of LabVIEW, or some of those old, connect-a-bunch-of-edges-between-different-nodes-for-robotics type things. You define those operations, you connect them up, and then you really just run it and see what the output is.

One important step I left out is defining that schema. By default, we recommend just storing a message history. The most simple agent actually doesn't even have any tools, where it's just accumulating. I send a message, and it appends it to the messages state, and then the LLM sends a message, and it appends that to the messages state. You just go back and forth in an interaction. Then you could also store something like the number of turns of conversation and just increment that at each node. You could store which tools the LLM should have access to, and then modify that over time. You can really store any of that in there, with the caveat that if you store anything that's not serializable, you won't be able to use some of the checkpointer, or hosted features, if that makes sense.

[0:21:30] SF: In terms of the workflow and orchestration, that's all happening within my environment, where I'm essentially hosting my code?

[0:21:37] EF: Totally. Yeah, so actually, yeah, important distinction. LangGraph is mostly an open-source project, but then we also have LangGraph Platform, which is our hosting. If you've used Next.js and Vercel, it's a similar model, where LangGraph is all the orchestration; it knows how to execute everything. We have some open-source versions of checkpointers that allow you to serialize that state and fast-forward and rewind through some of your execution using, essentially, database features for that. Then LangGraph Platform is really about hosting everything as a REST API and also visualizing it. We actually have some features in LangSmith, which is our debugging and observability commercial product, that let you visualize your graph, interact with the state manually through these kinds of interrupts, and things like that. Then overall, it just makes it easier to build some of these things over time. Both of them have a generous free tier, but I have to call out that they are not part of the open-source offering.

[0:22:36] SF: For the nodes in the graph, is memory shared across nodes?

[0:22:42] EF: The state memory is shared across all the nodes, and it's identical across all the nodes. That's the whole reason for the abstraction. I think if that weren't the case, it would probably make sense to just write everything as raw Python. To be clear, lots of developers still do that.
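The schema definition Erick describes might look roughly like this. It is a sketch only; the extra fields are examples of the kinds of things you could track, not anything prescribed by LangGraph.

from typing import Annotated, TypedDict

from langchain_core.messages import AnyMessage
from langgraph.graph.message import add_messages


class AgentState(TypedDict):
    # add_messages is a reducer: node updates are appended to the
    # history instead of replacing it.
    messages: Annotated[list[AnyMessage], add_messages]
    # Illustrative extra fields; anything stored here should be
    # serializable if you want checkpointing or hosting to work.
    turns: int
    allowed_tools: list[str]

Every node receives this whole state and returns only the keys it wants to update, for example {"turns": state["turns"] + 1}.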
Then the checkpointer, the global state, is stored across all threads as well. You might send a single message, which ends up kicking off a sequence of six nodes just from that one message before it returns something that's meant to be shown to the user, which is typically a message on the message history. But the execution of all those nodes does not affect your conversation with it, if I was the one to start that thread. Whereas that global state is accessible for both. There are a few layers of grouping of data.

[0:23:35] SF: In terms of scaling these, I guess it's essentially on the developer who is going to be hosting this to take on the scale challenges that they might run into with running one of these agent flows, right?

[0:23:48] EF: Scale in which regard?

[0:23:50] SF: Well, in terms of, if I build something in LangGraph, some agentic workflow, and I put it up on a server somewhere, and someone starts hitting it, essentially, one, it's going to be hitting my hosting infrastructure, but the orchestration and workflow, if I'm using the open-source version, that's also hosted within my environment. I'm assuming it's on me to essentially meet whatever the scale requirements are, and also to architect it for scale by potentially breaking this up as needed.

[0:24:14] EF: I'll answer this in two ways. LangGraph itself is the one that's helping execute those nodes just for a single request. If you have some architecture where, for example, you use all the synchronous APIs instead of the async APIs, that's going to hog a thread for much longer than if you use something like Python async for that. There are decisions in terms of how you implement your graph that developers will always have to take responsibility for, because that's their graph code.

Then once I have that packaged into, essentially, a FastAPI endpoint, or something like that, that's when you have a decision of whether you want to, or actually, right before you have a FastAPI endpoint, you have a decision of whether you want to build that FastAPI endpoint yourself and host it just in a Docker container, or on EC2, or some hosting service, or you can go with LangGraph Platform. We have both a cloud version of it, which is packaged with LangSmith, or you can self-host that LangGraph Platform container, where we have a free tier. Then after that, we enforce an enterprise license for it. I think that, getting to your point, there are challenges associated with building infrastructure that hosts these at scale. We are getting lots and lots of queries per second. That's really where LangGraph Platform comes into play. We take that on for you.

[0:25:38] SF: What's happening on LangGraph Platform, in terms of what you can share about the backend infrastructure, to handle that scale?

[0:25:45] EF: Yeah. A lot of it is segmenting the storage from the compute. As mentioned before, we have this concept of nodes and edges, so that's the execution. Then we have this concept of a checkpointer, which is the storage state, allowing one worker to execute the nodes until it hits the first interrupt, or it ends for whatever reason. Then another compute node can actually pick that up and work on it later, if a different request comes in. There are some load balancing challenges associated with that. There are even challenges in just implementing a checkpointer in a way that handles infrastructure failure as well. Like, the database connection goes down, those kinds of things. All of those are part of the hosted offering.
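A minimal sketch of the thread and checkpointer idea discussed above, assuming langgraph and langchain-openai are installed; the model choice and messages are arbitrary.

from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END, MessagesState

llm = ChatOpenAI(model="gpt-4o-mini")


def chat(state: MessagesState) -> dict:
    # The returned message is appended to this thread's history.
    return {"messages": [llm.invoke(state["messages"])]}


builder = StateGraph(MessagesState)
builder.add_node("chat", chat)
builder.add_edge(START, "chat")
builder.add_edge("chat", END)
graph = builder.compile(checkpointer=MemorySaver())

# Each thread_id is its own conversation; the checkpointer persists it
# so a later request (or a different worker) can pick it back up.
thread_a = {"configurable": {"thread_id": "a"}}
graph.invoke({"messages": [("user", "My name is Sean.")]}, thread_a)
out = graph.invoke({"messages": [("user", "What's my name?")]}, thread_a)
print(out["messages"][-1].content)  # same thread, so the history is there

thread_b = {"configurable": {"thread_id": "b"}}
graph.invoke({"messages": [("user", "What's my name?")]}, thread_b)  # fresh history

The in-memory checkpointer here stands in for the PostgreSQL or SQLite backends mentioned in the conversation.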
Right now, we handle all of those in PostgreSQL. The checkpointer for the hosted version is there. The local one is all based on SQLite, which works well, but is not as scalable.

[0:26:44] SF: What about in terms of tool integration? You're going to have a certain potential fragility with calling out to some third-party tools. I can write the function that maybe pulls data from my CRM, or something like that. Then, that call could fail. Is it really, I guess, on the developer to follow the best practices around distributed systems and retries and things like that? Or is some of that offloaded by LangChain?

[0:27:12] EF: Great question. Node retries, we have some features that make it easy to use those. For the most part, we're seeing people implement their retries themselves with something like tenacity, or one of these retry libraries, just because you actually want different retry behavior in different situations. With a send email function, you probably don't want to retry that indefinitely, because an empty response to me could mean it still sent the email. I'm just not sure. You need some custom logic to check if the email was sent or not. Something like retrieving results from Google, or something like that, that can be retried pretty indefinitely, because it's just going to - you might hit rate limits, or things like that, but you're not affecting any external state. Long-term, potentially, but no plans for that currently.

We do work with a few providers that make working with tools easier. One of the companies we started working with recently is this company ArcadeAI, that does a lot of stuff around the auth for tools, where, with multiple users, handling all the different permissions to their different services can sometimes be a challenge, and they handle that.

[0:28:21] SF: I'm just curious what your thoughts are on optimization around inference, both from a cost perspective and also a performance perspective. Because it's great that I can build these agentic workflows that can do really complicated, amazing things, but every time I'm relying on an API call to a model to perform some inference cycle, there's not only a financial cost associated with that, but there's also a performance cost.

[0:28:46] EF: Yeah, great point. We have a few fun anecdotes from a few of our customers on this. At least on the cost side, the philosophy most seem to be following is that we're betting that the cost of these things comes down.

[0:29:02] SF: Yeah, economies of scale.

[0:29:03] EF: Exactly. OpenAI inference cost has gone down by 50X, or something, in the last - it's more than an order of magnitude in the last year. That'll probably continue to happen, at least in the short term, as we get smarter about how we execute these models. Then speed also tends to come with that. A lot of the model performance side is just making the models smaller, either through decreasing precision or doing different kinds of sparsity strategies. That gets you the benefit of both, while not really sacrificing quality.

There is still a benefit to using smaller models, especially on the open-source side, right? You're going to get better speed and cost characteristics out of a Llama 7B model than a Llama 70B model. That's the main area where we see people pulling that lever in the short term, where you might have a classification step that's run on a 7B model, just because it's a lot faster and a lot cheaper.
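The per-tool retry approach with tenacity that Erick describes earlier might look roughly like this. The functions, limits, and failure behavior are illustrative assumptions, not code from the episode.

import random

from tenacity import retry, stop_after_attempt, wait_exponential


@retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=1, max=30))
def web_search(query: str) -> str:
    """Safe to retry aggressively: it changes no external state."""
    if random.random() < 0.5:  # stand-in for a flaky provider
        raise TimeoutError("provider timed out")
    return f"results for {query!r}"


def send_email(to: str, body: str) -> str:
    """Deliberately not retried blindly: a failed or empty response might
    still have sent the mail, so verify delivery before deciding to retry."""
    return f"sent to {to}"  # placeholder: call the provider, then verify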
Then, typically, when you're generating output for users and things like that, you tend to go back up to the larger models. For important tool calls that decide the control flow of the application, especially for more complicated ones, you tend to use a larger model. We have this new iron triangle, where I'd actually group cost and speed together in one corner, you have accuracy characteristics in another, and then you have reliability, which is always the concerning one.

[0:30:36] SF: For those that are building on LangGraph, or even the older version of LangChain, what are some of the typical challenges that they run into and have to navigate? What are some of the things that they should know about, essentially?

[0:30:51] EF: Yeah, great question. Well, first of all, we'd love to hear from folks. We now have comments on all of our docs pages. We monitor our GitHub, obviously, through pull requests and issues. This week has largely been spent working through the backlog of pull requests there.

In terms of challenges that users see getting started, I think the main one is information. This is a problem across the industry. This is actually, I think, part of the reason that we've been relatively successful in this space: we document a lot of these new strategies in our documentation in a way that you can immediately implement. With something new coming out every day, it is often challenging to drink from that fire hose of information. This is also something that we struggle with internally, where it's like, okay, what is the quick start to LangGraph, right? I think right now, we have three or four different ones, depending on whether you want to target more of a hosted production application on LangGraph Platform, or you're just playing and want to build some chatbot. I think the main thing that I would like to work on at the beginning of next year is, how do I decide which happy path I want to take as a new user? It's challenging. It's something that we rework every three months, just because the state of the art is always different.

[0:32:11] SF: What are some of the most surprising, or creative, applications that you've seen built on top of LangChain?

[0:32:18] EF: Ooh, good question. I think, okay, surprising and creative. One of the ones that surprised me earlier this year, some engineers from Uber actually gave a presentation, this was at GitHub Universe, on writing unit tests with a code assistant built on top of LangGraph. Code assistants are obviously super popular right now. There are very distinct productivity gains from them, and so a lot of different companies are working on them. It was really cool to see the step-by-step process of how you actually lay out nodes in LangGraph to do this. The talk is on YouTube. I highly recommend giving it a listen. That one was a real-life code assistant being used by some portion of their engineering org.

I really like Elastic's security assistant. That's been a fun one. They've been working with us for a long time. They built the first iteration of that on the agent executor, which was that ReAct loop that I mentioned before, with some extensions to it. Then they've recently migrated that to LangGraph as well. That's really about generating security rules for, I forget what the term is, the automated quarantining of software, and monitoring logs and writing rules for that has been a cool one. Then all the customer support ones.
Just the variance in what steps different folks want to build into their customer support systems is quite surprising to me. I don't know, I think it's a domain that I haven't done as much work in myself. Just the number of different tools that you might have to integrate for those kinds of flows has been relatively surprising, and the different ways that people want to do it.

[0:34:03] SF: Yeah. I mean, I think if you can get to a place where we can automate unit tests, certain parts of security, governance, on-call, documentation, the types of things that are not always the funnest jobs, but have to be done, I think that would make a lot of people happy.

[0:34:18] EF: I agree.

[0:34:20] SF: What about zooming out even beyond LangChain? What are some of the key innovations that are happening in AI right now that are interesting to you and that you're keeping an eye on?

[0:34:32] EF: First and foremost, tool calling performance is definitely the big one. The way people are using tool calling in LangGraph is really as a reasoning model. An open-domain reasoning model, where you have some context and data, and you ask it a question like, hey, is this an important customer, or a not-important customer, that just reached out to my inbound form? There's a threshold that we haven't hit yet, where you just trust the model to make the right decision out of the box. I think that's probably the one that throws gasoline on this fire in terms of what people can build with them. Right now, even if you successfully run your first 100 things in your test set through it, the 101st that fails in an unexpected way is a big hit in terms of your ability to release that. People are using LangGraph in cool ways to guardrail that and make sure those kinds of unexpected ones don't go out. The reasoning capability is definitely something that leaves something to be desired.

I'm personally very excited about these multimodal input and output models that we're starting to see. We have seen text and image modalities as inputs to models for a little while now, starting with GPT-4 Vision preview, and then 4o is now capable of that out of the box. Now we're seeing more models that are actually outputting mixes of text, images, and audio. They're not necessarily the most reliable yet, but they show a vision for what the future can look like once training those models produces something that's very good.

Potentially a hot take, but personally, I'm a little bit less excited about some of these video models and things like that. I think they make really cool art, and they're really good for brainstorming in the creative space. I guess, for me personally, maybe a little bit less useful. We're totally going to see people generating prompts for those in LangChain and then generating critiques of them to edit over time. I'm sure that'll be a use case that we see.

Back to the modalities one now. I'm really excited about audio. I think OpenAI's real-time work, Advanced Voice in the app and then the Realtime API, is an area that I'm personally very excited about. I actually got my start in NLP back in, what was it, 2015. My first internship was at a company called Jibo, which is this little white robot that came out of the MIT Media Lab. We were doing some strategies for open-ended text there. The ability to interact with something over voice, I think, is a pretty magical experience when it works well. I was actually using Advanced Voice Mode to practice my Mandarin for the last few months, for a trip I took to Taiwan two weeks ago.
It's pretty fun to be able to have a teacher in your pocket for anything you want.

[0:37:26] SF: Yeah. Also, I think, it opens the door to real-time translation and not being that far from the vision that Star Trek put out years ago of the universal translator and things like that. It's pretty amazing. I worked on Google Assistant for a while, so I know the perils of and frustrations around voice when it doesn't work. It's incredible, the step functions that are happening. I think, going back to your point around multimodal and how it's not perfect performance today, if you even look at image generation from 2022, and where it went from there to 2023, to then videos in 2024, the speed with which things are getting better is essentially exponential. It probably won't take that long until multimodal performance is significantly better than where it is now.

[0:38:14] EF: That's the hope.

[0:38:17] SF: Yeah. In terms of productionizing LLM-based applications, what is the biggest hurdle that people typically run into?

[0:38:23] EF: I would probably say, reliability is the first one where -

[0:38:26] SF: Standard system stuff.

[0:38:28] EF: Exactly. But not infrastructure reliability. In this case, I think it's probably more that the outputs are non-deterministic. Say you want, you need - even just defining the criteria for your customer support application, like, what percent of my emails have to not piss off a customer in order for me to release this in production? Defining criteria like that is something that, I guess, we've had to do to some extent with training human operators of these kinds of things, where it's like, okay, as soon as we feel someone has enough, I don't know, credibility to respond to these things. The amount of evals that are still happening just with vibe checks - Harrison actually did a talk on Tuesday with James from Character AI. He was talking about how a lot of the early evals on Character were literally just the researchers playing with the systems and vibe checking them. I think there's still a lot of that going on, just because defining concrete evaluation criteria is really challenging. Now we have lots of systems. LangSmith has some great systems for running evals, both online as well as before you release new versions of your application. But in order to get any of those evals, you really do need to put in the effort to define what they are.

[0:39:45] SF: Yeah, and there's a whole new crop of companies that are investing in building products there. Braintrust I'm excited about. I think we're having them on the show sometime soon. There are people trying to address that issue. It is something that fundamentally has to be addressed, because with the non-deterministic nature, it's very hard to tell whether, when you make a change, you're actually moving things in the right direction or not. It is a lot of just putting your finger up in the air and seeing which way the wind is blowing at the moment. I think there is a lot of this vibe checking going on.

[0:40:14] EF: Totally. I think LangSmith's evals have actually done a really good job of helping turn those vibe checks into real evals. We launched an annotation queue, I think, about a year ago at this point. The way people have used that to curate data sets from live data, or even just from internal data, interacting with things and converting those into evals, has been great to see.
We obviously have lots of features, from using LLMs as a judge, to just running code to evaluate those kinds of things. It's been really interesting working with lots and lots of different customers on that, to just see how different organizations think about it.

[0:40:50] SF: Awesome. Well, Erick, thanks so much for being here. I really enjoyed it.

[0:40:53] EF: Thanks so much, Sean. Have a good day.

[0:40:54] SF: Cheers.

[END]