EPISODE 1863

[INTRODUCTION]

[0:00:01] ANNOUNCER: Dynamic languages, like Ruby, Python, and JavaScript, determine the types of variables at runtime rather than at compile time. This flexibility allows for rapid development and concise code, but it also makes it harder to catch certain classes of bugs before execution. Type checkers for dynamic languages add structure and safety without compromising their expressive power. Sorbet is a static type checker developed by the Stripe team and designed specifically for Ruby. The motivation behind Sorbet stemmed from the growing complexity of production Ruby applications, where developers needed stronger guarantees and more scalable code quality tools than dynamic typing alone could offer.

Jake Zimmerman is a software engineer at Stripe and leads development on Sorbet. He joins the podcast with Josh Goldberg to discuss his background, the challenges of typing in Ruby, the motivation behind Sorbet, its architecture, performance optimizations, and more.

This episode is hosted by Josh Goldberg, an independent full-time open-source developer. Josh works on projects in the TypeScript ecosystem, most notably typescript-eslint, the tooling that enables ESLint and Prettier to run on TypeScript code. Josh is also the author of the O'Reilly Learning TypeScript book, a Microsoft MVP for developer technologies, and a live code streamer on Twitch. Find Josh on Bluesky, Mastodon, Twitter, Twitch, YouTube, and .com as Joshua K. Goldberg.

[INTERVIEW]

[0:01:45] JG: With me today is Jake Zimmerman, developer at Stripe on Sorbet and Ruby infrastructure. Jake, welcome to Software Engineering Daily.

[0:01:51] JZ: Awesome. Thanks for having me, Josh.

[0:01:52] JG: We're really excited. You do a lot of really interesting work, and you have a long history of very fascinating blog posts. Before we get into Sorbet and type-checking in Ruby, can you tell us, how did you get into tech?

[0:02:03] JZ: Oh, sure. It's a long story. It goes back all the way to seventh grade. My school had an elective for computer science and an elective for mechanical engineering. I took both of them and I liked the software side of computer science more. I've been working on tech stuff, basically, since middle school and then all through college. It's been a very long love of mine.

[0:02:26] JG: In university, based on your website, you were involved in student groups, and you took quite a few interesting classes. Was there anything that jumped out to you then that's now relevant to your work on Sorbet today?

[0:02:35] JZ: Yeah. I think the thing that's relevant today is how much being a teaching assistant in college feels like helping out with the onboarding program in the industry. There's a lot of overlap in terms of just trying to come up with exciting ways to keep people engaged, because the people who just started at your company are similarly checked out in the first week, where there's just so much going on, so much to learn. You want to give them all the tools that they need for their job, but you recognize that they're not going to remember everything that you say. If you can use some of the whimsy that you might have developed as a teaching assistant in school, maybe the people will remember the onboarding material a little bit better.

[0:03:09] JG: That's excellent advice. Let's put it into practice. I'd like to give you two prompts. The first prompt is going to be, suppose I'm an overloaded engineer on my first day, how would you introduce Sorbet to me? The second prompt will be, I'm a much more relaxed engineer in my second week. How would you introduce Sorbet to me? Can you get started on the first?

[0:03:29] JZ: Sure. I think the way that we introduce Sorbet to people on their first day is that you're going to have a lot of tools that help you write good code at the company. One of these tools is Sorbet. Sorbet is a type checker for Ruby, which means it's constantly running in the background, looking for type errors in your code. When it finds them, it will flag them with little red squiggles, but it can do a lot more than that. It will also help you understand the relationships between your code, lets you jump to definitions and find references really quickly, and shows you autocomplete suggestions. It's this tool that's helping you get your job done faster and, hopefully, make fewer mistakes.

Then, maybe the longer-winded example is, maybe after a few weeks of writing code, you've seen all the things that it can do, but you've probably only scratched the surface. Type systems are this very powerful tool that if you really lean into it, can help shape the larger design of your programs. Sorbet itself has some fairly interesting type system features that are somewhat unique, but also, somewhat shared with other type systems. The more that you know about what's possible to express and how to use the type system, the better you're going to be able to use those features I mentioned before, about catching errors and navigating through the code base and stuff like that.

[0:04:37] JG: That sounds exciting. Let's say I know very little about types or type systems. I've only ever programmed in, say, Python, JavaScript, or Ruby without types. What would you describe as some of the entry, or intro level features for a type checker like Sorbet?

[0:04:51] JZ: Yeah. One of the biggest features that you get when you're just stepping into the type system, and one of the things that was a key motivating example of why we built Sorbet in the first place, was you have a lot of code where you've got a function and it says, "I accept some parameter called merchant." You don't really know what that merchant parameter is. It sometimes refers to some database ID, meaning that you could load that ID from the database and get back a real merchant object. Sometimes it is the merchant object itself.

Whether it's a string or an actual database model is something that's really important to know, because they're going to support different operations. If you can go through your code and annotate, yes, this merchant parameter is an ID and this merchant parameter is an object, you get a lot of things for free. You get those type-checking errors that I mentioned before, you get the ability to query what methods are available, because maybe a string is only going to have make this uppercase, or make this into two strings, or get the first character, or whatever. A database model is going to be able to actually get you the fields that are on a merchant object. That's the pitch for why you might want type checking. It just makes it easier to understand the relationships in your code.
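As an illustration of the ambiguity described above (not code from the conversation), here is a minimal plain-Ruby sketch. The names `Merchant`, `MERCHANTS`, `load_merchant`, and `merchant_name` are all hypothetical. Without annotations, passing an ID where an object is expected is only discovered when the code runs; a type checker such as Sorbet is meant to report that kind of mismatch earlier, once the parameters are annotated.

```ruby
# Hypothetical names throughout: Merchant, MERCHANTS, and both helpers are
# invented for illustration and are not Stripe's code.
Merchant = Struct.new(:id, :name)

MERCHANTS = { "merch_123" => Merchant.new("merch_123", "Acme Co.") }.freeze

# Expects a merchant *ID* (a String) and loads the matching record.
def load_merchant(merchant)
  MERCHANTS.fetch(merchant)
end

# Expects a merchant *object* and reads a field from it.
def merchant_name(merchant)
  merchant.name
end

merchant_name(load_merchant("merch_123")) # => "Acme Co."

# Both parameters are called `merchant`, so nothing stops a caller from
# passing the ID here. It fails only at runtime, because String has no
# `name` method.
begin
  merchant_name("merch_123")
rescue NoMethodError => e
  puts "runtime failure: #{e.class}"
end
```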

[0:05:58] JG: Sure. All that sounds lovely. However, Ruby has a bit of a reputation for being a bit of a Wild West as a language, where there's a lot of wacky stuff you can do in Ruby. It makes JavaScript look like C#. How do you represent all those wild and wacky overrides and added dynamism in Ruby in a static type system the way Sorbet has things set up?

[0:06:21] JZ: Yeah. I think that's definitely the case. JavaScript and Ruby are both dynamically typed languages, and you can do the same things in both, but languages are not just their syntax and their features. They're also the communities that spring up around them, and those communities develop different patterns around how to use the features of the language. I think that you're absolutely right that in Ruby, people have tended to really, really lean into that dynamism. What that means in practice is that you will find Ruby code that uses a lot of metaprogramming. By that, we mean it's dynamically defining methods at runtime. You'll have a bunch of logic factored out into these helper functions whose sole job is to define other methods. This can be done in JavaScript, too, but my impression is that a lot of the time when you're dealing with methods in JavaScript, they just show up syntactically at the top level of your class. They don't really get hidden so much, such that they're finding their way onto an object or a class at runtime.

Sorbet has to deal with the fact that there are all of these definitions that are hidden from the static system. It's not perfect. I think that Sorbet has mechanisms to deal with this in certain cases, but that is definitely a limitation. If the reason why you really like Ruby is all of this flexibility that it provides you with metaprogramming and runtime introspection and stuff like that, in some sense, adding a type checker might just be a strict loss for you, because it removes the ability to use these things.

The selling point for why this might actually be a blessing in disguise is that those sorts of programming patterns, especially in large code bases, tend to be extremely confusing and hard to teach people about if they weren't actually the one to write that metaprogramming facility in the first place. The number of developers at Stripe is in the thousands at this point. I don't even know what the number is, but when you have that many people working on one code base, you really benefit from having guardrails to say, if you're considering implementing new forms of metaprogramming, maybe don't, because it will be harder for people to understand and also harder for the type checker to provide this intelligence for you.
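To make "helper functions whose sole job is to define other methods" concrete, here is a small plain-Ruby sketch. The `Flags` module, the `flag` helper, and the `Account` class are hypothetical and invented for illustration. Note that no `def verified?` appears anywhere in the source, which is exactly what makes such methods hard for a reader or a static tool to find.

```ruby
# Hypothetical helper: Flags, flag, and Account are invented for illustration.
module Flags
  # Defines a `<name>?` predicate method on the calling class at runtime.
  def flag(name)
    define_method("#{name}?") do
      @flags.include?(name)
    end
  end
end

class Account
  extend Flags

  # These two lines create `verified?` and `frozen?` when the class body runs.
  flag :verified
  flag :frozen

  def initialize(*flags)
    @flags = flags
  end
end

account = Account.new(:verified)
account.verified? # => true
account.frozen?   # => false
```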

[0:08:20] JG: Sure. You've been working on Sorbet since 2017. When was it first started as a development project?

[0:08:26] JZ: Sure. Yeah. I started at Stripe in 2017. Sorbet also started in 2017, and then I joined the project one year later in 2018. It has a pretty long history. The interesting part that people ask us about this history is like, how did you get started? How did you convince the company to start this type-checking project? The funny thing is that it was the opposite, that when we were going about our job, just maintaining Ruby infrastructure at Stripe, we continually asked people, okay, we've built this thing for Ruby. What's the next thing that you really want us to work on? What's the next thing that would really make you more productive in your job?

At some point, the overwhelming answer to this question was, it would really help if we had a type checker that provided all these benefits that we've already been talking about. In 2017, they started evaluating some of the options for how to proceed, whether that might mean picking an off-the-shelf Ruby type checker, which didn't really exist at the time beyond some hobby projects and research projects. Or whether we might want to rewrite into a different language, which had its obvious downsides of the complexity involved in that, or whether we wanted to hire and build a team internally to actually create a new type checker from scratch.

[0:09:33] JG: Jake, that means you've been giving similar, or very introductory overviews of Sorbet and its features for seven or eight years now. As someone who works deeply on it, there've got to be quite a few architectural nuances, or interesting type system features that you're excited about. Is there anything in particular you'd like to bring up?

[0:09:49] JZ: Yeah. I'm so glad you asked this question, because this is how I wish that every podcast would go, is we just dive straight into the super advanced details and architectural stuff. There's so much that I could talk about. For the sake of keeping the podcast short, one of the things that I'll focus on, one of my personal favorite features, is how Sorbet models certain kinds of metaprogramming that you can do with classes as objects.

In Ruby and most dynamically typed programming languages, really, classes themselves are expressions. You can pass them around as values and stuff like that, which also means that you can accept a class object and then dynamically instantiate whatever the user happened to give you. Maybe if you pass in a class that creates houses, then you can instantiate it and get back a house. If you pass in a class that creates cars, you can do the same thing.

In a statically typed language like Java, this ends up being really clunky, because you have all of these weird factory patterns. In Ruby, it's super natural, because you just pass the classes around themselves. Technically, you're using the factory pattern, but it's super transparent because it's just operating on the runtime class values themselves. Sorbet also has support for this. It can know that when you pass a class object and then dynamically instantiate whatever you happen to be given, you get back an instance of that class. It makes it easy to model the types of Ruby programs that you see in practice. I think that the way that it works is super cool. It uses generics, like class-based generics, in a way that you might not have expected it to, and it falls out really nicely from what you could think of as the theory of Sorbet. I think it's just a really cool feature that also makes it possible to write some really cool code.
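The houses-and-cars example above can be sketched in a few lines of plain Ruby. The `House` and `Car` classes and the `build` method are hypothetical names for illustration. The Sorbet signature that would tie the return type to the class passed in is deliberately not shown, since its exact syntax is not given in the conversation.

```ruby
# Hypothetical classes, invented for illustration.
class House
  def describe
    "a house"
  end
end

class Car
  def describe
    "a car"
  end
end

# Accepts a class object and instantiates whatever it was given. No separate
# factory class is needed: the class value itself plays that role.
def build(klass)
  klass.new
end

build(House).describe # => "a house"
build(Car).describe   # => "a car"
```

The property a checker wants to capture here is that `build(House)` yields a `House` and `build(Car)` yields a `Car`, rather than some common supertype.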

[0:11:23] JG: That was an explanation that very clearly described not just the Ruby and Sorbet features, but also what it means for a language to have first-class classes, say, or first-class functions. Ruby has a lot of features that other programming languages don't have. Do you ever feel that there's an added difficulty in describing them in the type system, compared to some of the more traditional languages?

[0:11:46] JZ: Yeah, absolutely. I think that a lot of the features that it has are just what's possible to do at runtime. Ruby has this concept of private methods, but these private methods aren't actually private methods like you would have expected them to be in a statically typed system. They're this really weird hybrid of actually protected methods and some visibility modifiers that only allow you to call things on itself. There are examples where, because the language and all of its features were designed around what was possible to build into the VM, no one really ever thought about the static semantics of these language features. Yeah, there are absolutely places like that, where trying to retrofit some type system feature to model what the VM makes it possible to do is basically the entire job.
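A short plain-Ruby sketch of the "weird hybrid" described above, using hypothetical `Wallet` and `SavingsWallet` classes. A private method can be called by a subclass (as `protected` would allow in many statically typed languages), yet it cannot be called on another instance of the very same class, because Ruby's rule is about the receiver rather than the enclosing class.

```ruby
# Hypothetical classes, invented for illustration.
class Wallet
  def initialize(balance)
    @balance = balance
  end

  # Implicit receiver (self): allowed.
  def summary
    "balance: #{balance}"
  end

  # Explicit receiver, even though `other` is also a Wallet: raises
  # NoMethodError when called.
  def richer_than?(other)
    balance > other.balance
  end

  private

  def balance
    @balance
  end
end

class SavingsWallet < Wallet
  # A subclass can call the inherited private method on itself.
  def report
    "savings #{balance}"
  end
end

Wallet.new(10).summary      # => "balance: 10"
SavingsWallet.new(3).report # => "savings 3"

begin
  Wallet.new(10).richer_than?(Wallet.new(5))
rescue NoMethodError => e
  puts "private call rejected: #{e.class}"
end
```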

[0:12:30] JG: I'd like to dive down a little bit more into areas, like the VM now. Could you give an overview of how Sorbet actually works, the programming product Sorbet?

[0:12:38] JZ: Yup. Sorbet is a C++ program that, basically, reads all of the source files off of disk, parses them into some abstract syntax tree, and then type checks them. The type checking itself is somewhat different from what you might expect a typical static analysis pass to do, because most by-the-book compilers courses will tell you that you should parse to an AST and then type check that AST.

Sorbet goes one step further, because the Sorbet type system has this notion of control flow sensitive typing, where if you branch in some if condition, or case statement, or something like that, Sorbet will know that in one branch, if a certain condition is truthy, a certain assertion holds about different variables' types. To be able to model this control flow sensitive typing, you can do one of two things. What Sorbet does is it actually compiles the AST into a control flow graph, where these control flow branches are explicit in the representation of the program, and then it does type checking on that. That's Sorbet's high-level algorithm: it parses all of your code, builds some explicit representation of the control flow in the program, and then type checks that.
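As a small illustration of what control flow sensitive typing means for everyday code, consider this plain-Ruby method. The method name `shout` is hypothetical, and the comments describe what a flow-sensitive checker is expected to conclude in each branch; they are not output from Sorbet.

```ruby
# Hypothetical method, invented for illustration.
def shout(name)
  # At this point `name` may be a String or nil.
  if name.nil?
    # In this branch a flow-sensitive checker knows `name` is nil.
    "ANONYMOUS"
  else
    # In this branch it knows `name` is not nil, so calling `upcase` is safe.
    name.upcase
  end
end

shout(nil)    # => "ANONYMOUS"
shout("jake") # => "JAKE"
```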

[0:13:46] JG: When you say AST, is that A for abstract, Syntax Tree?

[0:13:50] JZ: Yup. Sorry about that. Abstract Syntax Tree.

[0:13:52] JG: Interesting. Why do you need to do it this way? Is it because of how dynamic the types are in Ruby?

[0:13:57] JZ: I think that it's actually for convenience of implementation. When you take an abstract syntax tree, you're going to have anywhere from dozens to hundreds of different syntactic nodes for every different language feature that you have. If you have one construct that represents conditional branches, then you only have to implement the control flow sensitive logic once. You don't have to implement it once for if nodes and once for unless nodes and once for while nodes and once for rescue nodes and once for break nodes and continue nodes and so on. You just model all control flow as this one node in your control flow graph, and then you implement a very standardized algorithm for modeling that control flow. It's less because Ruby makes it difficult and more because this model makes it easier for Sorbet.
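The "implement it once" idea can be sketched with a toy lowering step. Everything here is hypothetical: the array-based mini-AST, the `Branch` struct, and the `lower` and `facts` methods are a teaching sketch and do not reflect Sorbet's actual data structures or a full control flow graph. Several syntactic forms are lowered into a single branch construct, and the flow-sensitive logic is then written against that one construct.

```ruby
# Hypothetical mini-AST: nodes are arrays such as [:if, cond, then_arm, else_arm].
# This is a teaching sketch, not Sorbet's actual representation.
Branch = Struct.new(:cond, :if_true, :if_false)

# Lowers several syntactic forms into the single Branch construct.
def lower(node)
  kind, cond, first_arm, second_arm = node
  case kind
  when :if, :ternary then Branch.new(cond, first_arm, second_arm)
  when :unless       then Branch.new(cond, second_arm, first_arm) # arms swapped
  else raise ArgumentError, "unsupported node: #{kind}"
  end
end

# The flow-sensitive logic is written once, against Branch only. Here it
# simply records what is known about the condition in each arm.
def facts(branch)
  [
    [branch.if_true,  "#{branch.cond} is truthy"],
    [branch.if_false, "#{branch.cond} is falsy"]
  ]
end

facts(lower([:unless, "x", :body, :alternative]))
# => [[:alternative, "x is truthy"], [:body, "x is falsy"]]
```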

[0:14:39] JG: That's fascinating. You mentioned that there are quite a few different constructs in every programming language for control flow, but what you're describing, if I'm understanding right, is that you've unified and abstracted away the language specific details of if, rescue, and so on, and just turned it into, this is how the code might branch through.

[0:14:56] JZ: Exactly. Yup, exactly.

[0:14:58] JG: That's great. Tangent, how come C++? Why not say, Ruby?

[0:15:03] JZ: Yeah, that's a great question. We have a whole internal design doc that I think would be really fun to publish one day, but it was a very explicit choice early on in the project. The choice of Ruby was definitely considered. I wasn't a part of this decision, but I know that the original team actually sat down and chatted with the team building Mypy, which is a static type checker for Python, that actually is written in Python. There were some people on the early team who were very excited about that, because it means that you might be able to attract the wider Python community to work on the Python type checker, or the wider Ruby community to work on the Ruby type checker.

The Mypy team had explicitly advised the early members of the Sorbet team not to take that approach, because what they found from implementing this in Mypy was that the performance of it was very, very hard to tune. Given that we knew that the whole reason why we wanted to build a type checker was to be able to give people fast feedback about their code locally and in CI and in their editors, and given the scale of Stripe's codebase at the time, building a type checker that was fast and in Ruby was going to be a Herculean task, basically.

That left all of these compiled languages, like C++, or Rust, or Go, or whatever. I think even OCaml was mentioned in the design doc, because Flow is implemented in OCaml and there was a history of other type checkers being implemented in these functional programming languages. The decision to go with C++ was actually just a very practical one. The three people on the team at the time had tons of experience working on large C++ codebases from previous employers and previous experience. Rust at the time was up and coming and not necessarily the technology that it was obvious to bet on, and definitely no one on the team had experience using it in a large codebase, or a large type checker project.

Go, I think, was not considered, because of the founding members of the team's experience and understanding that the key operation that you need to control in a type checker is minimizing allocations. The actual computational complexity is not actually the thing that drowns a type checker. It's whether you have too many spurious allocations that really dominates the performance characteristics.

In Go you don't really have nearly as many controls over whether you're allowed to allocate in a certain spot. But in C++ and obviously, Rust, you have very, very fine-grained controls to guarantee that an allocation isn't happening in a hot path. It ended up being just a checklist of various conditions, and C++ ended up checking the most boxes.

[0:17:31] JG: It's fascinating how the positioning of the project's people, the year it's released, and the company around it can so drastically influence a major project in the ecosystem. For reference, TypeScript was originally written in TypeScript upon release, and just recently announced a very large effort to rewrite in Go, because now, almost a decade later, the characteristics of the Go ecosystem have changed and the needs of TypeScript, like you said, have evolved from less of we need to bootstrap and experiment and more, we need performance and a better memory profile. A lower-level language like Go, C++ and so on can do that with a lot less engineering effort than say, Ruby or JavaScript.

[0:18:10] JZ: Yeah. I'm also very excited about the TypeScript Go rewrite. I've been following it a little bit, mostly because working on Ruby infrastructure at Stripe, I also sit very closely next to the people who work on JavaScript infrastructure at Stripe, and the TypeScript type checking job inside of Stripe CI is, I want to say, dozens of minutes long, because of the performance architecture of TypeScript.

To my understanding, TypeScript, the one written in TypeScript, is single-threaded, and with a JavaScript codebase the size of Stripe's, you have to go really out of your way to break up your code into separate packages to be able to get any parallel type checking. I'm very excited about the Go rewrite, just because I think it will unblock just tons and tons of performance optimizations for them. Also, I think one of the reasons why TypeScript ended up choosing Go was because their plan for how to port it was to actually make it a port, verbatim almost, where you have a function called foo in TypeScript and you port it over to a function called foo in Go.

That's the thing about TypeScript and Go is that they're very similar. They're both implementing this structural duck-typing type system where classes just happen to implement interfaces as long as all of the methods have been implemented. You don't have to explicitly include an interface, or explicitly declare implements to get that type relationship to show up. It makes a ton of sense for them to pick Go, because it's the easiest to implement that porting strategy. You don't have to fundamentally rethink the core types and objects and functions in your type checker.

[0:19:38] JG: Speaking of performance, you've talked recently about the different performance strategies that you've taken with Sorbet. Are there any particular initiatives you'd like to spotlight that have happened recently, or ongoing?

[0:19:48] JZ: There's always performance work ongoing. That is basically my whole job. This is why I lit up so much when you asked me to talk about type system features, because I spend so much of my time in the performance mindset that I sometimes don't get to talk about type system features anymore. The main performance work that we're doing right now is a somewhat fundamental re-architecture of Sorbet's internals. What I mean by this is the approach to performance that Sorbet has always taken is we are going to squeeze as hard as we can to micro-optimize various parts of Sorbet. We're going to make all the internal data structures use crazy bit-flipping hacks to squeeze every last ounce of performance that we can get out of the assembly.

That worked really well for a long time, but at some point, you just run into scaling laws, where you've basically optimized it as much as you can. The only way to get further performance is to rethink the algorithms and operations that you're doing at a larger level. That's the work that we're doing right now. The model Sorbet has always taken until now is it will read every file in your codebase. It will do some global fixed-point analysis to figure out where all of the definitions are, where all of the classes are, what inherits from what, and it'll do this all at once. It'll read everything and then do this global analysis to figure out where everything is defined.

Then after that, it will type check everything in parallel. Having built up this global model of all the classes and methods and types and whatever, it will type check each file in parallel. Which means that there is at some point, 100% of the codebase in memory. Given how big Stripe's codebase is, keeping 100% of the codebase in memory is anywhere from 15 to 20 gigabytes right now. If you have a 64-gigabyte dev box development machine, that's something that you can do. But if the codebase doubles one more time, now you're talking about over half of the memory doing nothing but just type checking the code.

What we're doing is trying to figure out how we can type check only a subset of the codebase at any given time and then move on to the next subset. You figure out this particular bit of code is entangled and all needs to be type checked at once, we're going to read that small subset, type check it, and then page it out back to the file system, move on and read the next thing. This requires having a much better idea of explicit dependencies from one piece of code to another. That involves some other internal tooling that Stripe has built for modeling these dependencies. The core algorithm is we fundamentally cannot read 100% of the codebase anymore. We need to figure out a way to make it only read a subset, which is a fun time basically, because it means rethinking a lot of the core assumptions. It almost feels like a green field project, even though the project is seven years old.
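One way to picture "this bit of code is entangled and all needs to be type checked at once" is as strongly connected components of a file dependency graph. The sketch below is an assumption about how such batching could be modeled, not a description of Sorbet's or Stripe's actual tooling; the `DEPS` map, the file names, and the `batches` method are hypothetical. It uses `TSort.strongly_connected_components` from Ruby's standard library.

```ruby
require "tsort"

# Hypothetical dependency map: each file lists the files it depends on.
DEPS = {
  "charge.rb"   => ["merchant.rb"],
  "merchant.rb" => ["charge.rb", "money.rb"], # forms a cycle with charge.rb
  "money.rb"    => [],
  "report.rb"   => ["charge.rb"]
}.freeze

# Returns batches of files. Files in the same batch depend on each other
# (directly or transitively), so they would have to be checked together.
# Batches come out dependencies-first, so each one could be loaded, checked,
# and released before moving on to the next.
def batches(deps)
  each_node  = ->(&blk) { deps.each_key(&blk) }
  each_child = ->(file, &blk) { deps.fetch(file).each(&blk) }
  TSort.strongly_connected_components(each_node, each_child)
end

batches(DEPS).each do |batch|
  puts "check together: #{batch.sort.join(', ')}"
end
```

In this toy graph, `money.rb` forms its own batch, `charge.rb` and `merchant.rb` must be checked together because they refer to each other, and `report.rb` comes last.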

[0:22:24] JG: How easy, or how doable is it to feel confident that you're reading a subset of the codebase and you understand fully what other parts of the codebase it relates to? Do you ever have magic hidden, implicit dependencies that make this difficult?

[0:22:37] JZ: Those are exactly the problems that we're running into: realizing all of the places where our dependencies are getting circumvented. It ties back to the fact that the language itself was untyped, right? You didn't have to be explicit about these dependencies. People just wrote code and it just happened to work. As long as it loaded in the right order at runtime, then it didn't matter that those static dependencies weren't explicitly written down somewhere.

The good thing is that the system that we have for tracking these explicit imports of one piece of code into another piece of code is pretty robust. We think that it's mostly correct, and that it's mostly possible to capture all of these relationships. No one was ever using it with the fidelity that we need for building this explicit traversal of the dependencies of the codebase inside of Sorbet, so no, it wasn't 100% faithful. But we think that it is expressive enough that we can just go find the places where the relationships weren't being written down and write them down. Then once it's powering Sorbet, that will be a check that it remains correct going into the future.

[0:23:40] JG: With the codebase, the size of Stripe's, I imagine you have probably a pretty good representation of most all of the wild and wacky stuff people might be doing in the wild.

[0:23:50] JZ: It's actually a little bit of both. We definitely have a lot of wild and wacky stuff, if it's possible to write it inside of a method body. Every kind of combination of control flow, or language feature one plus language feature two, we probably do. There's a lot of stuff where Stripe has decided, we really do not want this feature in use in the codebase anywhere. For example, one of these features is Ruby lets you dynamically get a constant based on a string. You can say like, there's a constant somewhere in the codebase called foo, and I'm just going to ask the VM to give me that thing, even though I don't have it in scope.

This is really nice if you're doing some sort of metaprogramming that we talked about earlier, but it's also incredibly hard to understand if that foo is not a string literal, but rather a variable that was passed through 12 levels of indirection. You don't actually know what constant you're dynamically accessing at that spot. That's an example of something that Stripe's codebase explicitly disallows: you're not allowed to call the const_get method in the Ruby standard library.
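For readers who have not used it, here is what dynamic constant lookup looks like in plain Ruby. `Module#const_get` is a real core method; the `Payments` module, its classes, and the `lookup` method are hypothetical names for illustration.

```ruby
# Hypothetical constants, invented for illustration.
module Payments
  class Refund; end
  class Charge; end
end

# With a string literal, a reader can still see which constant is meant.
Payments.const_get("Refund") # => Payments::Refund

# Once the name arrives as a variable, nothing at the call site says which
# constant will come back, or whether it exists at all.
def lookup(kind)
  Payments.const_get(kind)
end

lookup("Charge") # => Payments::Charge

begin
  lookup("Payout")
rescue NameError => e
  puts "no such constant: #{e.name}"
end
```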

[0:24:53] JG: Are there any collaborations ongoing with the Ruby core team, or the Ruby language itself that might be useful for Sorbet, or even working with you in Sorbet?

[0:25:03] JZ: Yeah. We actually meet with the members of the Ruby core team every few months and talk about improvements that we can make to just the wider typing ecosystem. The way that the Ruby community more largely has decided to come up with type annotations is to have these type annotation files that live alongside your code called RBS files. They're very similar to TypeScript's .d.ts files. That's the main interface that the Ruby community has right now for type checker agnostic type annotations. We've been working with them a lot on trying to come up with the best way to move that specification forward.

Another big collaboration that we have is we work a lot with Shopify, who has a similarly large Ruby codebase, on trying to make sure that we can build type system features and type annotation features that make it easier to interoperate with the wider Ruby ecosystem, especially the part of the Ruby ecosystem that is enthusiastic to adopt typing.

[0:26:03] JG: That brings up a couple of follow-up questions. I'd like to start with that section. Are there things that are happening, or going to be added to Sorbet and/or Ruby soon that you think the people who are excited about typing will be excited about?

[0:26:16] JZ: One of these things that I think people might be excited about is just the seemingly renewed interest in coming up with better syntax for defining types. Early in the development of Sorbet, we didn't really lean too heavily into, let's come up with a super, super pretty syntax, because the things people were asking for were all solved by having a type system at all, not necessarily having a type system that had a great type annotation syntax. As Sorbet and typing in Ruby have become more popular, the people who are on the margin, who might want to adopt the type checker or might not, maybe their last deciding feature is whether the syntax is tolerable, or palatable.

For a lot of people, it's a tough pill to swallow, to deal with Sorbet's type annotations syntax. That's where I think a lot of the energy is in the community right now is trying to come up with better type annotation syntaxes. I think that's probably for the people who are on the fringe, the thing to pay the most attention to is how much people are working on thinking and implementing better type system, or better type annotation syntaxes.

[0:27:20] JG: I'd meant for the second question to be, what do you have for the people who are on the fence, but let's switch a little bit to the people who are very excited. Let's say that I'm already a Sorbet user. I'm at a company equivalently large to a Stripe, or Shopify, or maybe I just have a large codebase. I know there are performance improvements coming down. I've started making my dependencies explicit, very exciting stuff. What else do you have in store that you think I will be juiced about?

[0:27:47] JZ: There are a couple of long-standing big type system feature gaps that we've never really had a chance to invest in, that I think we're going to get the chance to invest in somewhat soon. One of these, actually, we have an intern on the Sorbet team working on right now. Sorbet has interfaces, where you can have these abstract methods and then classes that implement them. It also has syntax for declaring records, or structs, where you have some simple syntax for declaring getter and setter fields with a certain type. But these two features really do not play well together at all.

If you try to implement an interface with a record, or a struct, the override checking vanishes. You don't actually get the interface override checking that you thought you were getting. I think that's probably the most exciting feature that we're working on right now is finally getting that feature gap closed. I remember, the last time I had an intern was 2019 and I was looking at the list of intern projects and it was also in this list. It's been this long, long standing problem, and it's really satisfying to finally be able to find the time to go fix it.

Part of the reason why it was tricky to fix is just because of how widely used these two features are. I think the last time we checked, when we actually implemented the code to check that if you implement an interface via a struct, this causes 30,000 type errors on Stripe's codebase. We had kept punting the problem down the road, because we knew that if we were going to build this feature, we're going to have to burn down that list of violations. That's really the hardest part of my job is dealing with the fact that we have so much code. If there was ever a bug in the codebase, probably someone was implicitly relying on it. That's super satisfying to find that we're finally getting a strategy for burning these down and then being able to turn that feature on for the rest of the community.

There's also just, yeah, a handful of other features. I think one of the things that people have been asking for a while is better support for shapes and tuples. Sorbet doesn't really have nearly as good support for shapes and tuples as, for example, TypeScript does with its object types, where you can just have a type that says, "I happen to have this key with this value and this other key with this other value." Sorbet technically has syntax for describing this, but the type-system-level support for it is almost non-existent. I think that relatively soon, we might actually be able to really focus 100% of our time on fixing this problem and building a solution that actually works.
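For readers who have not used them, the TypeScript side of that comparison can be sketched roughly as follows. This is only an illustration of object ("shape") and tuple types; `Charge`, `Point`, and the two helper functions are hypothetical names, not anything from Sorbet or Stripe.

```typescript
// Object ("shape") type: a fixed set of keys, each with its own value type.
type Charge = { id: string; amountCents: number };

// Tuple type: fixed length, and each position has its own type.
type Point = [number, number];

// The checker knows exactly which keys exist and what type each one holds.
function describeCharge(charge: Charge): string {
  return `${charge.id}: ${charge.amountCents}`;
}

// Destructuring a tuple gives each element its positional type.
function manhattanLength([x, y]: Point): number {
  return Math.abs(x) + Math.abs(y);
}

const charge: Charge = { id: "ch_123", amountCents: 500 };
const point: Point = [3, -4];

console.log(describeCharge(charge));
console.log(manhattanLength(point));
```

Misspelling a key, or passing a three-element array where `Point` is expected, is rejected at type-check time; that is the kind of support the answer above describes as mostly missing from Sorbet today.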

[0:30:14] JG: This is going to be very joyous for you now, to be able to focus on type system features.

[0:30:18] JZ: Absolutely. Yeah, exactly. I think that's one of the nice things about this performance architecture that we've chosen: it should scale as the codebase scales. That frees us up a lot to be able to focus on the features themselves, because we're no longer getting pulled back every six or 12 months to work on performance. We can finally think a little bit more long-term about type system features, or editor features, or Ruby ecosystem features and stuff like that.

[0:30:41] JG: In terms of bang for buck, where something might take not that much work to do, but you are very excited about it, what is the most bang for buck feature that is now, or will soon be worked on for the type system?

[0:30:53] JZ: Yeah. I think the shapes and tuples is hard to say is the best bang for buck. I think that the bang will be there, but the buck will also be similarly high. I think that it will be a very impactful feature, but also a very expensive feature to come and figure out a solution for. I don't actually know necessarily what the highest ROI feature would be for us to build, because a lot of the really impactful features that we would like to build, the reason why we haven't built them is because we know that once we implement them, it will involve doing a lot of work internally in Stripe's codebase to get it adopted. It's a great question. I actually don't know necessarily that I have a list of features that are super impactful, but also quick to build. Because I think in a lot of cases, we've actually built them.

[0:31:36] JG: That's a good sign for your team that you're prioritizing the things that get the most impact out soonest. You mentioned that you're going to be working on ecosystem, or community areas as well. I understand that there are at least one or two different ways that say, the community can write type definitions, or types for packages that aren't written in Sorbet. How is that area of the type system in Sorbet shaping up these days?

[0:31:58] JZ: I think that's a huge gap. I think part of it is because Stripe's codebase doesn't routinely publish open-source gems that then get consumed internally. We don't exercise any of these flows. We're not necessarily dogfooding the features that you would need in Sorbet to have really great ecosystem-level support. I notice this most acutely when people leave Stripe, go start a startup and then they complain to me about all of these features that I had never heard someone complaining about when they were in Stripe, where you just want to interoperate with a gem, or you just want to interoperate with Rails, or you want to publish a gem and have people consume the types in the gem that you published. These things are workflows that basically don't get used inside of Stripe.

There's ways to address it. There's ways to just go through and say like, "Okay. Well, this is a problem. There's an easy fix for it. This is a problem. There's an easy fix for it." I guess, that's the thing I'm most excited about is the chance to finally have space to go do that. One example of a feature like this that's missing is if you have a generic class and you declare that generic class as being generic in maybe it's a container and it's generic in some element type, and in your source file you declare that generic element type. If you also then write an RBI file to declare the types for people to consume your library and you declare the class's name and you declare the generic element type, that on its own Sorbet says has a conflict, because you've redundantly declared the generic type.

Simply putting a generic type inside of an RBI file means that you get this spurious error. That's not a complicated fix. It's just something where we have to go think through what the ergonomics of it are like: what does it mean that you have this type declared in one spot and redundantly declared in another spot? It's the thing that blocks you from being able to publish types for your generic classes and your gems that people using your gem could use, and it makes it tricky to have this ecosystem of typing tooling spring up around Sorbet.

[0:33:57] JG: Heading towards the end of the interview, and I have a few lightning round questions for you at the end, but before we go there, are there any other areas, such as the community work, or upcoming initiatives that you're excited to talk about?

[0:34:09] JZ: I don't really know. I think that the stuff that we've been talking about so far is the stuff that I'm excited to be talking about. I think that the other pitch that I might give is that Sorbet ties really closely into code quality tooling. I think that there's been a big push recently, at least inside of Stripe, but also in the larger community to try and figure out how we can measure productivity and how we can measure code quality and stuff like that. I think that a lot of the ways that we measure this right now are just vibes, and I think that one of the things that we've seen success with at Stripe in terms of deciding what good code quality looks like is figuring out what's important inside of the codebase.

Sometimes what's important is a very standard metric, like maybe test coverage, or something like that. But sometimes maybe the thing that's important about code quality is just whether you're using a specific library that was old and legacy and that you're not supposed to use anymore, and how many people are still using that old legacy library. Or another thing is, you started in an untyped Ruby codebase and you're trying to migrate to a typed Ruby codebase: how many files are not typed yet?

I think that, yeah, one thing that I'm also excited about is just how each individual codebase can craft a metric for what productivity looks like in that codebase, and what code quality looks like in that codebase, which I believe is basically the only way to do it; these metrics don't necessarily transfer from one codebase to another. But I think once you realize that the way to measure these things is hyper-local and hyper-specific to individual projects, it unblocks the ability to make progress on actually measuring the thing, because it'll be what matters for you.
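As a rough illustration of the "how many files are not typed yet" metric: Sorbet files carry a `# typed:` sigil comment (`ignore`, `false`, `true`, `strict`, or `strong`), and a file with no sigil is treated as `typed: false`. The sketch below is a minimal, hypothetical tally over file contents already loaded into memory. The function names, the input format, and the choice to count `ignore` and `false` as "not typed yet" are all assumptions made for illustration, not a description of tooling used at Stripe.

```typescript
type Sigil = "ignore" | "false" | "true" | "strict" | "strong";

// Hypothetical input format: a path plus the file's full source text.
type SourceFile = { path: string; source: string };

// Find a `# typed: <level>` comment; files without one count as "false".
function sigilOf(source: string): Sigil {
  const match = source.match(/^\s*#\s*typed:\s*(ignore|false|true|strict|strong)\s*$/m);
  return match ? (match[1] as Sigil) : "false";
}

// Tally how many files sit at each strictness level.
function countBySigil(files: SourceFile[]): Record<Sigil, number> {
  const counts: Record<Sigil, number> = {
    "ignore": 0, "false": 0, "true": 0, "strict": 0, "strong": 0,
  };
  files.forEach((file) => {
    counts[sigilOf(file.source)] += 1;
  });
  return counts;
}

// Assumption for this sketch: "not typed yet" means `ignore` or `false`.
function notYetTyped(counts: Record<Sigil, number>): number {
  return counts["ignore"] + counts["false"];
}

const sampleFiles: SourceFile[] = [
  { path: "a.rb", source: "# typed: strict\nclass A; end\n" },
  { path: "b.rb", source: "class B; end\n" },
  { path: "c.rb", source: "# typed: false\nclass C; end\n" },
];
const sampleCounts = countBySigil(sampleFiles);
console.log(sampleCounts, notYetTyped(sampleCounts));
```

The point of the passage is that the right threshold is local: another codebase might decide that only `strict` files count as typed.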

[0:35:47] JG: I really like the way you're phrasing that. There have been a lot of efforts to make these one-size-fits-all metrics, for example, test or typing coverage as you described, but, A, those might change in each codebase. And B, each codebase's value, or weighting, for each of those could be completely different. You also mentioned a little, yeah, harder to measure ones, like you're on a very outdated gem or dependency, how do you measure the quality impact of that? That's really interesting.

Let's enter the lightning round. Jake, I'd like to give you two features of a type system that I know you like, and then I will ask you for one to two of your own that you know are very useful and theoretically interesting, but you wish people used more, and I'm going to ask you to explain them. Are you ready?

[0:36:28] JZ: Oh, boy. All right, let's hear it.

[0:36:30] JG: Union types. What is that?

[0:36:32] JZ: Yeah, union types is basically, you can say, I have a value of this type, or a value of this type. It's this disjunction. It's like an either-or choice.

[0:36:40] JG: Why is that useful or good, or how could I even work with that if I don't know what type my value is?

[0:36:44] JZ: The most common case where this is really good is for representing the possibility of failure. If you say, either my method returns a successful result, or it failed to produce a result with one of these known classes of problems, this is something that every time that I work in a language without union types, or without sum types, I always notice that it's hard to represent this failure condition and propagate it through the codebase. What ends up happening is people tend to get lazy and then they'll just raise an exception, and that exception won't necessarily be tracked in the type system, and you end up with these cases where the happy path works really well, but the error conditions have not been thought through and the software ends up not being very robust.
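A minimal TypeScript sketch of that idea, with failure represented as a union member instead of a raised exception. `ParseResult`, `parseAmount`, and `describeResult` are hypothetical names chosen for the example.

```typescript
// The known classes of problems, each as its own variant.
type ParseError =
  | { kind: "empty" }
  | { kind: "not_a_number"; input: string };

// Either a successful result, or one of the known failures.
type ParseResult = { kind: "ok"; value: number } | ParseError;

function parseAmount(input: string): ParseResult {
  const trimmed = input.trim();
  if (trimmed === "") return { kind: "empty" };
  const value = Number(trimmed);
  if (isNaN(value)) return { kind: "not_a_number", input: input };
  return { kind: "ok", value: value };
}

// Callers have to look at `kind` before they can touch `value`,
// so the failure cases cannot be silently skipped.
function describeResult(result: ParseResult): string {
  switch (result.kind) {
    case "ok":
      return `parsed ${result.value}`;
    case "empty":
      return "no input";
    case "not_a_number":
      return `not a number: ${result.input}`;
  }
}

console.log(describeResult(parseAmount("42")));
console.log(describeResult(parseAmount("abc")));
```

Because the failure variants appear in the return type, adding a new one makes every unhandled `switch` a type error rather than a surprise at runtime.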

[0:37:25] JG: The not being able to track exceptions, or thrown errors in the type system is, I think, one of the biggest sources of issues in modern codebases, or previous modern codebases that is just prevalent. It's everywhere if it's not addressed.

[0:37:39] JZ: Yeah. I think, specifically, because you mentioned union types, there's a very minor nuance here between what you might call union types and sum types. I think that the difference here is that a sum type traditionally is when you know explicitly, it's this closed union, where you only have either one thing, or another thing. When you talk about union types, you can bucket more things and make these ad hoc unions. You can say, in my method, it's either X or Y. The one method up from that, it's either X, or Y, or Z, and you don't have to necessarily declare a new type to capture that third alternative.

I think that one of the nice things about languages like Sorbet and TypeScript that have this ad hoc union type is that it's really easy to just add one more thing to your error stack, and you don't necessarily have to define upfront at the top of your whole codebase that you have one of 10 possible failures. You get this very fine-grained tracking.
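The "X or Y, then X or Y or Z one method up" pattern might look something like this in TypeScript. The error classes and function names are hypothetical; the only point is that the outer function widens the union inline, without declaring a new named type.

```typescript
// Three unrelated failure classes. The literal `tag` fields keep them
// distinguishable under TypeScript's structural typing.
class NotFound { readonly tag = "not_found"; }
class PermissionDenied { readonly tag = "permission_denied"; }
class RateLimited { readonly tag = "rate_limited"; }

// Inner method: a result, or one of two failures (X or Y).
function loadRecord(id: string): string | NotFound | PermissionDenied {
  if (id === "") return new NotFound();
  if (id.indexOf("secret") === 0) return new PermissionDenied();
  return `record ${id}`;
}

// One method up: the same union plus one more alternative (Z),
// written ad hoc in the signature.
function loadRecordThrottled(
  id: string,
  callsSoFar: number
): string | NotFound | PermissionDenied | RateLimited {
  if (callsSoFar >= 3) return new RateLimited();
  return loadRecord(id);
}

console.log(loadRecordThrottled("abc", 0));
console.log(loadRecordThrottled("abc", 3) instanceof RateLimited);
```

With a closed sum type, adding `RateLimited` would typically mean defining a second error type or extending a shared one that every caller sees.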

[0:38:35] JG: This touches on what you were describing earlier about the pedagogy, the teaching approach of it, or the onboarding, where you can build all these fascinating useful features into the type system, but you also need to make it approachable. You need to make it, so that people enjoy having this stuff added on. It's not an added chore for them.

[0:38:51] JZ: Exactly. Yeah. I often think that it would be super useful, at least inside of Stripe, to give a whole undergraduate curriculum about how to use the type system. We start with this Stripe 101 session in their first or second week at the company. It covers really surface-level things; it doesn't really cover all of the really fancy things that you can do if you lean into every aspect of the type system. I think it would be really cool to, yeah, think through what those super useful type system features are and relate them to specific programming patterns and specific programming problems that the type system helps alleviate.

[0:39:25] JG: Well, let's dig in then. Here's the second of the two prompted type system features. Can you describe or explain what is sometimes called branded or opaque types?

[0:39:34] JZ: Sure. An opaque type is when all that you know about the type is its name. You don't know necessarily how it's implemented under the hood. The classic example of an opaque type is the Unix file pointer. When you open a file, that's backed by some C struct, but the operating system only gives you a pointer to it. It doesn't tell you what that C struct's members are. It doesn't tell you that it happens to have an inode pointer in there, or that it happens to have a pointer to a character string of its file path, or something like that. The only thing that it gives you is just, this is opaquely a file, which gives you the ability to then craft your own set of explicit operations that you're allowing to happen on this opaque type.

You say, explicitly, there are two public functions that accept a value of this opaque type, and maybe one of them is get the inode and maybe one of them is get the file name. But if there were any other fields in there, you don't expose functions that let you access that field. This is a way to get information hiding, and then you get to control your public interface against your private implementation.
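The file-handle example can be approximated in TypeScript with a branded type. This is a sketch under stated assumptions: `FileHandle`, `FileRecord`, `openFile`, `getInode`, and `getFileName` are hypothetical names, the "file" is just an in-memory object, and the brand exists only at type-check time, so the hiding is enforced by the checker rather than at runtime.

```typescript
// Public, opaque type: callers only ever see the brand, never the fields.
type FileHandle = { readonly __brand: "FileHandle" };

// Private representation, known only to the functions below.
type FileRecord = { inode: number; path: string; bufferedBytes: number };

function openFile(path: string, inode: number): FileHandle {
  const record: FileRecord = { inode: inode, path: path, bufferedBytes: 0 };
  // The casts are the single place where the representation is hidden.
  return record as unknown as FileHandle;
}

// The two operations deliberately exposed on the opaque type.
function getInode(handle: FileHandle): number {
  return (handle as unknown as FileRecord).inode;
}

function getFileName(handle: FileHandle): string {
  return (handle as unknown as FileRecord).path;
}

const handle = openFile("/tmp/example.txt", 42);
console.log(getInode(handle), getFileName(handle));
// `handle.bufferedBytes` is rejected by the type checker: no function
// exposes that field, so it stays an implementation detail.
```

In a real module only `FileHandle` and the three functions would be exported, which is what lets the private representation change without breaking callers.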

[0:40:34] JG: Great. Thank you. That's an excellent explanation. No notes. But now I require from you one last piece of technical content. What's a type system feature you wish people knew more about, or used more?

[0:40:44] JZ: I think a lot of people know about abstract methods and interfaces, but I wish people used them more. I think that, specifically in Ruby and Sorbet, people have this stigma around, "Oh, I should only have an interface if I'm going to implement it multiple times." An interface is a way to say, we have the test mode implementation of some interface and the production implementation of the interface, and maybe an implementation that's based on binary trees and an implementation that's based on arrays, or something like that, right?

People assume that an interface is only useful if you have multiple implementations backing the interface. I think that interfaces also make it really easy to get some of the aspects we were just talking about with opaque types, where you can essentially only expose certain things in the interface and hide all of the other things that the class that implements that interface would have needed to get its job done. Even if there's only one implementation of that interface, the data hiding and implementation hiding aspects of interfaces are something people tend to overlook. I think people should use interfaces and abstract methods more frequently.
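A small TypeScript sketch of an interface with exactly one implementation, used purely for hiding. `RateLimiter`, `FixedWindowLimiter`, and `makeLimiter` are hypothetical names for illustration.

```typescript
// The public surface: one method, nothing about how it works.
interface RateLimiter {
  allow(key: string): boolean;
}

// The only implementation. It needs extra state and helpers to do its job,
// but none of that appears in the interface.
class FixedWindowLimiter implements RateLimiter {
  counts: { [key: string]: number } = {};
  limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  allow(key: string): boolean {
    const used = this.counts[key] || 0;
    if (used >= this.limit) return false;
    this.counts[key] = used + 1;
    return true;
  }

  // Implementation detail: not reachable through `RateLimiter`.
  reset(): void {
    this.counts = {};
  }
}

// Callers receive the interface type, so `counts`, `limit`, and `reset`
// are invisible to them at type-check time.
function makeLimiter(limit: number): RateLimiter {
  return new FixedWindowLimiter(limit);
}

const limiter = makeLimiter(2);
console.log(limiter.allow("a"), limiter.allow("a"), limiter.allow("a"));
```

Even with a single implementation, callers can only depend on `allow`, which keeps the class free to change its internals later.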

[0:41:46] JG: I really appreciate that you, for the sake of the audience, didn't dive into some extremely difficult convoluted topic. Actually, you just brought up something that most developers who are experienced in these areas already know and understand, just a way to use it more effectively.

[0:42:02] JZ: I'd probably go into the example of using more abstract methods and interfaces every week. It's an answer that I give all the time, and I think that people still underuse them.

[0:42:12] JG: Well, you're doing good work. Jake, I have one last question for you. I'd like to end the episode on something not technical. Can you tell us, A, what does STP stand for? B, what is STP? And C, what is it like to partake, or to undergo STP?

[0:42:26] JZ: That's awesome. Yeah, so STP stands for Seattle to Portland. It's a bike ride, where you start in Seattle and you finish in Portland. The total distance is something like, 207 miles. Yeah, it's just an organized ride that, I think, the Cascade cycling group puts on every summer. It's a really great event. If you're ever thinking about doing it, I strongly recommend it. I recently did it, I think it might have been last weekend if I'm not mistaken. Maybe it was the weekend before. Yeah, it's a super fun event.

I was really excited to get the chance to challenge myself, because it was the first time that I had ever ridden anywhere near as far as that. The longest ride that I'd ever done before this was maybe 100 miles. Doing 207 was, it felt very accomplished to finish it. I had a pretty nice time, but I also think that the speed that I finished at was largely a function of the tailwind that I had the whole time. It's always nice when you get a tailwind. Yeah.

[0:43:21] JG: Did you think on work at all during the trip? Did you come up with any features, or bug fixes while riding the bike?

[0:43:27] JZ: I've noticed this. I do a lot of outdoorsy hobbies, so cycling, hiking, skiing and stuff like that. I noticed that when I'm hiking, I have tons of work-related thoughts. I'm letting my brain wander and think about Sorbet in the background, or something like that. But for some reason when I'm on the bike, it's like, I may as well have no recollection of the last hour, or however long I was on the bike. I feel like I just go a little bit numb and I'm just riding my bike and it's blissful. I did not think about work at all for basically the entire ride.

[0:43:56] JG: When you're, say, hiking, you're more of a Sorbet state. But when you're biking, you're more of a flow state.

[0:44:01] JZ: Yeah. Yeah.

[0:44:03] JG: Great. Well, Jake, thank you so much for hanging out and talking about Sorbet. You work on an awesome project that's doing a lot of really interesting work and very beneficial stuff for Ruby developers. We talked about the opening of the start of Sorbet, how it became such a powerful and big type checker, some of the architectural improvements you all are making in it, why it's in C++, some of the upcoming type system features, and of course, lots of great type system features for folks to use today. Jake, is there anywhere that you would direct people to find out more about you, or the project, or the stuff that you work on?

[0:44:34] JZ: Sure, yeah. You can go to sorbet.org to learn more about Sorbet. Specifically, I would encourage you to go to sorbet.org/slack if you want to join the Slack community, where people that are using Sorbet discuss and share their experiences. I think that it's pretty vibrant, and we always love new people showing up there.

[0:44:54] JG: Great. Well, for Software Engineering Daily, this has been Jake Zimmerman and Josh Goldberg. Thanks for listening, everyone. Have a good day. Cheers.

[0:45:00] JZ: Thanks, Josh.

[END]
SED 1863		Transcript

	(c) 2025 Software Engineering Daily