EPISODE 1641 [INTRODUCTION] [00:00:00] ANNOUNCER: Sphinx Bio develops computational tools to accelerate scientific discovery. The company is focused on addressing the computational data analysis bottleneck by enabling scientists to do the analysis themselves. Nicholas Larus-Stone is the founder of Sphinx. He joins the show to talk about being a computer scientist at the interface with biology, the data analysis bottleneck in biology, designing a software tool for scientists, their go-to-market strategy and more. Nicholas also started Bits in Bio, which is a popular community for people building software for science. You can check out their upcoming meetups and hackathons at bitsinbio.org.  This episode of Software Engineering Daily is hosted by Sean Falconer. Check the show notes for more information on Sean's work and where to find him. [INTERVIEW] [00:00:59] SF: Nicholas, welcome to the show.  [00:01:00] NLS: Thank you. I'm glad to be here.  [00:01:03] SF: Yeah, thanks so much for being here. You are the founder of Sphinx Bio and the creator of Bits in Bio. But when I was doing some digging into your background I saw that your education is in computer science. I guess how did this crossover into biology actually start?  [00:01:17] NLS: Yeah. I've been interested in bio for quite some time actually. Worked in a lab in high school over the summer. And I'd say always had these dual interests of computer science, biology. And then when I got to college, I took bio classes. I took CS classes and just enjoyed doing the CS homework more.  I continued to work in labs in my undergrad, in my master's. But just felt like I actually liked writing code more. And the more time I spent in labs, the more I found that I wasn't actually great at the wet lab part of it. The actual physical manual labor there. 
And so, have been fortunate enough to like find myself at this intersection where I can do what I like, which is writing code and building software but apply it to a domain that I'm really interested in, which is science.  [00:02:09] SF: Yeah. Amazing. And then I guess as you're building out a company in the space, if you need a wet lab, you can hire people to do that part that are maybe more interested and enjoy it.  Yeah. I found when I was doing my computer science degree, I took physics classes and I took chemistry classes. And I think the unappealing side of sort of the physical sciences for me was just I found the labs just super lengthy. And I was just kind of not engaged. Even though I could spend hours in front of my computer actually coding. I don't know. That was just somehow more enjoyable and felt like less of a waste of time than spending hours in a lab on campus for some reason. It just didn't jibe with me.  [00:02:46] NLS: That means you found what you're excited about, right? I've worked with scientists who kind of look back fondly at their PhDs when they're doing 12, 14-hour time courses in the lab and they have to be in the lab doing something every 30 minutes. And so, I think to me it was just a good signal that, okay, I don't want to be a kind of hands-on biologist. I'd much rather be a computer scientist software engineer. Because like you, I'd much rather spend hours building something out of software than spend hours sitting in the lab pipetting liquids around.  [00:03:21] SF: Yeah. It's good that you were able to recognize that about yourself as well. Tell me about Sphinx Bio. What is it and what problem are you trying to solve?  [00:03:30] NLS: We're a software platform for scientists at biotech companies to manage and analyze their data. 
And the core problem that we're working on solving is this idea of an analysis bottleneck, which is where you can run experiments in the lab but you're actually waiting on the results of your analysis of the data. And that in my view should never happen, right? The hard part is actually doing the experiment in the lab.  And so, what we see here is that the data is maybe complex to analyze or it takes a long time to actually gather both the data from the experiment as well as all the kind of context around that experiment. Or you have to wait for someone who knows how to code in order to analyze that data. And so, we give the tools to scientists to actually do that analysis themselves in a streamlined easy manner so that they don't have to spend time waiting. They can spend their time actually doing science, which is what they're paid to do and what they like to do.  [00:04:31] SF: What's the main bottleneck? What stops essentially the flow of data getting to them for analysis?  [00:04:38] NLS: Sometimes it actually depends on the type of experiment. But sometimes they don't have the capabilities to analyze that. Because the tools that they're often trained with are Excel. Some other kind of like older legacy applications as well which might not be a good fit for the more modern experiments that they're doing. That's like one barrier to entry I would say.  Another barrier is maybe not that they can't do it but that each analysis takes a couple of hours to do. And they're running dozens of these experiments a week, right? And so, they just don't have the time to do it. And so, if we can take that down from, "Hey, I have this 12-tab Excel sheet that takes me a couple hours to analyze," to, "Hey, I can do this in Sphinx in five minutes." Then we can actually remove the bottleneck of old experiments piling up.  And then the last piece is a lot of experiments in science are run by an individual but the context might be shared across a company. 
And so, if you are testing – if you're trying to design a new drug and you're testing a specific molecule that you've designed, the analysis of that molecule is important to put in the context of everyone else who's tested it. And you might not have access to that data either from a permissioning perspective or from just they did this only on their computer and it wasn't shared with anyone else. And so, we work hard to try and bring that shared experimental context to everyone as well. [00:06:06] SF: What does the data actually look like? What is kind of the shape of the data that comes out of these experiments?  [00:06:12] NLS: It's wide-ranging. It depends on the experiment. We work with mostly tabular or semi-structured data is how I would describe it. There are other companies who do a lot with sequencing data or imaging data. We've kind of deliberately stayed away from those domains to start. But these often look like, again, pretty messy Excel spreadsheets.  When you're thinking Excel spreadsheet, you might think of something that a human has designed with reasonably named columns, and data in all the rows. Oftentimes, these are not that. They're coming off machines where it's laid out in kind of a visual manner to make it easy for a scientist to look at. But it's not great for software to analyze that.  And so, there is a large piece of our platform that we do in order to help kind of standardize and harmonize that data. And sometimes we do get kind of CSVs, TSVs, a little bit more friendly formats that you can just upload and start working with right away.  [00:07:15] SF: How do you go from this kind of semi-structured tabular format data into something that's actually standardized? Presumably, I'd love to get into sort of like what's happening behind the scenes. 
But, presumably, that probably ends up in some sort of structured data store somewhere behind the scenes so that you can actually provide tooling to perform the analysis down the road as the scientist is interacting with the product.  [00:07:37] NLS: Maybe answering that last piece first. We do make everything tabular at the end of the day to fit with existing analysis tools. And this maps pretty well onto what scientists do. They just typically do it by hand. People are literally copying and pasting values from one Excel sheet into another. And that's not great because there's often mistakes made in that transcription. And it's also extremely time-consuming and painful.  And so, what we do on the kind of turning semi-structured into structured side is we first start by seeing if it's like an experiment type that we're familiar with essentially. And so, we basically pattern match to figure out, "Okay. Do we have a clean deterministic way to do this?" We've built out some state machines on the back end actually to help parse that data.  And then at the end of the day, there's a lot of machines out there that are producing data in a lot of different formats. And so, we know we, as a small startup, can't capture everything. And so, we've started looking into using LLMs to help parse some of that semi-structured data and turn it into these structured tables that we can use for downstream analysis. [00:08:51] SF: In terms of the pattern recognition that you're doing, is that sort of – basically, you're applying different heuristics to identify the patterns in the data? Or are you doing something more complicated using a machine learning and some sort of classification engine?  [00:09:04] NLS: No. I mean, heuristics and asking users are the two main ways. We will ask for some user input on kind of more complex tasks. But we do use heuristics. Maybe to give a concrete example. We work a lot with plate data. 
Plates are a type of experimental setup in biology where you have a physical matrix of different wells. Each well contains either an experiment or a part of an experiment in it.  And so, that data often comes back in a matrix format. It might not be neatly returned in a matrix format. There's a lot of data around that that we can use to help identify where that plate is. One heuristic would be we know what the sizes of plates are. And so, if the matrix matches that size, that's a pretty good indicator that we have plate-based data that we can then turn into a more structured data source. [00:09:59] SF: Okay. Once you first perform sort of the pattern recognition to figure out what kind of data that you're dealing with, what happens next? What's sort of this whole pipeline that's going on behind the scenes?  [00:10:09] NLS: Yeah. What happens next is, once we figure out where the data is essentially, then, in this case of the plate-based data for instance, we can then convert that into a true tabular format. And so, we'll do that on the back end and then we bring up a confirmation for our users to say, "Hey, here's what we were able to identify. And here's what it's going to look like. Does that make sense?" And then if they say, "That looks good," give us the go-ahead, we'll store that and save that in its structured form for them. [00:10:42] SF: You also mentioned that you're now exploring using LLMs to do some of the pattern recognition on essentially situations where you might not know the signature of the tool or signature of the data. Can you talk a little bit about the work that you're doing there? What's that sort of LLM tool chain look like?  [00:10:59] NLS: We released an open-source package called plate chain a few weeks ago which uses LangChain, the kind of popular open-source toolkit, pretty heavily. And so, what we've done there is provided a set of examples for people to work off, first of all. 
Second of all, an actual chain in the terms of LangChain that will take in this kind of semi-structured data and output coordinates essentially of where the structured data is, which you can then build the kind of like back end of that, which is taking those coordinates and using that to extract the data and do whatever you want downstream. For us, that's turning into a structured table. Other people might have different use cases.  And so, I don't know how sophisticated our LLM setup is in comparison to some of these others. But we're pretty heavy LangChain users in terms of using their LangChain expression language. We have a demo with them of setting this up as an API server so people can deploy this in their lab if they want to do this for all of their experiments. And then, of course, we use them for evaluation as well.  Because in our case, the experimental setups change pretty often for our customers. And so, it's important to tell if an experimental setup changes. Is our prompt still going to work here? I think a lot of the magic there actually is in some of the prompting logic and basically rules that we give the LLM to help it identify where the plate-based data is. [00:12:30] SF: How long did it take to figure out the right combination of prompts to get what you needed?  [00:12:36] NLS: I mean, it's definitely been an iterative process. I think, honestly, just a couple of weeks of playing around with it. Not full-time of course. But just kind of end-to-end. It's been pretty stable in terms of prompts. There have been minor tweaks here and there. But we're able to figure this out pretty quickly. It's not perfect. We're always looking for new contributors and people who want to help out there. But we haven't done extensive prompt engineering in comparison because it actually works pretty well for the use cases that we've been applying it to. 
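The coordinate-based flow described here (an LLM chain says where the structured data sits, and deterministic code extracts it) can be sketched in plain Python. This is an illustrative sketch, not plate chain's actual code; the grid layout, coordinates, and function name are hypothetical:

```python
# Hypothetical sketch of the downstream step described above: given
# coordinates (as a chain like plate chain might return) locating a
# plate block inside a messy spreadsheet grid, slice out the block
# and turn it into a tidy "well -> value" table.

def extract_plate(grid, top, left, n_rows, n_cols):
    """Slice an n_rows x n_cols plate out of a raw spreadsheet grid."""
    rows = [grid[top + r][left:left + n_cols] for r in range(n_rows)]
    # Label wells A1, A2, ... like a physical plate, producing tidy records.
    records = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            well = f"{chr(ord('A') + r)}{c + 1}"
            records.append({"well": well, "value": value})
    return records

# A messy export: metadata rows above, plate block starting at row 2, col 1.
raw = [
    ["Run date", "2023-11-02", "", ""],
    ["", "1", "2", "3"],
    ["A", 0.12, 0.34, 0.56],
    ["B", 0.78, 0.90, 0.11],
]
table = extract_plate(raw, top=2, left=1, n_rows=2, n_cols=3)
```

The point is that the LLM only has to emit a few numbers (top, left, shape), and everything after that stays deterministic and testable.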
[00:13:11] SF: Do you think without being able to use something like LLMs and prompt engineering solving this problem would be possible?  [00:13:17] NLS: It's theoretically possible. It's just very hard. There are other companies in this space, TetraScience and Ganymede are two of the kind of newer ones who build what are called instrument connectors. And so, what they have to do is they have to go and figure out how many instruments there are and figure out what those instrument formats are. And then they build connectors for each and every one of those.  And let me tell you, there's a very long table of instruments in the biological space. And so, they end up having to hire massive implementation teams to do all that. And so, again, we're a pretty small startup. We can't build connectors for thousands and thousands of instruments. And so, we use this as a good catchall. And I'm excited to think about more use cases where you can apply LLMs in this way, right? It's not a perfect solution. It doesn't do everything for you. But it can get you pretty far without having to hire hundreds of engineers to build all this for you. [00:14:20] SF: Yeah. Even if you did get to a place where you were building out some of these integrations, I would think that even leveraging LLMs could be a way to sort of fast-track that. Because it's going to give you insight into how to do the pattern recognition step that you need to do and actually maybe help you develop some of these heuristics or algorithms that you need to actually do the less expensive, non-GPU-based pattern recognition to figure out the data structure.  [00:14:45] NLS: It's interesting you say that. I feel like a lot of the prompts came out of the heuristics we had built. But I can imagine going the other way, too, right? Where you could just – as you work on the prompt, that actually kind of tells you a little bit about the heuristics you should be building and turning into more deterministic code as well. 
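The plate-size heuristic mentioned earlier is a good example of exactly this kind of deterministic code. A minimal sketch, assuming the standard microplate layouts (96 wells as 8x12, 384 as 16x24, 1536 as 32x48); the function name is hypothetical, not Sphinx's API:

```python
# A minimal sketch of the plate-size heuristic described above: standard
# microplates come in fixed layouts (96 wells = 8x12, 384 = 16x24,
# 1536 = 32x48), so a matrix matching one of those shapes is a strong
# signal that we're looking at plate-based data.

STANDARD_PLATE_SHAPES = {(8, 12): 96, (16, 24): 384, (32, 48): 1536}

def looks_like_plate(matrix):
    """Return the plate size if the matrix shape matches a standard plate."""
    n_rows = len(matrix)
    if n_rows == 0 or any(len(row) != len(matrix[0]) for row in matrix):
        return None  # ragged or empty: not a rectangular plate block
    return STANDARD_PLATE_SHAPES.get((n_rows, len(matrix[0])))

# An 8x12 block of readings matches a 96-well plate.
block = [[0.0] * 12 for _ in range(8)]
```

In practice a real parser would combine this shape check with other context (row/column labels, surrounding metadata) before committing, which is where the confirmation step for users comes in.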
[00:15:03] SF: Besides plate chain, I know that you also have a product or a feature that is focused on the folding playground.  [00:15:10] NLS: Yeah.  [00:15:11] SF: I guess, to start with, why is the comparison between folding models an important problem to solve?  [00:15:18] NLS: Yeah. Maybe just to take a step back about what is a folding playground and what is protein folding. The core problem here is we have a lot of amino acid sequences. Protein sequences. And we often don't know what the 3D structure is. To actually experimentally determine that is quite expensive.  And so, about two – I guess, three years ago now, DeepMind from Google came up with this algorithm called AlphaFold, which, depending on who you ask, more or less solved this problem. They basically got an accuracy that was almost as good as experimental determination in a lot of different use cases. And this is way better than anything that people had developed before.  And so, that unlocked the floodgates for structural biology. And a lot of new use cases in this area rely on using AlphaFold or some folding algorithm to take a sequence and turn it into a structure. And so, what we found is, as we were working with some more customers who were interested in these types of pipelines, they would make statements, "Well, we have to use ESMFold, which is a different algorithm, because this is an orphan protein and it's going to work better for that." And then we'd hear other customers say the opposite, right? Like, "AlphaFold actually works better in this use case."  And so, what it kind of became clear to us is that people didn't have a good basis for making these statements. And I think at least part of the reason was it's a bit of a pain to set up all these tools and to compare them effectively. And so, that's why we wanted to launch the folding playground: to give people a way to answer those questions themselves.  
With all the other LLMs, there's been a lot of LLM playgrounds and ways to compare, "Hey, which model actually works best for me?" I think in bio, that's equally if not more important. Because a lot of these models, which are fairly general in theory, turn out to work much better on certain sub-problems than others. And so, we want to make sure that scientists are using the right tool for the job. And we want to make it accessible to them. And that was the goal behind the folding playground.  [00:17:29] SF: And the value of understanding the 3D structure in the context of something like drug design, it gives you insights essentially into how it might bind to the protein. Is that right? And its effectiveness or its level of activity?  [00:17:42] NLS: Yeah. There's a couple of different ways that this could be useful. One is, actually, if you're trying to make a drug against a certain protein, oftentimes you might not have a structure of that protein that you're trying to make a drug against. There's a bunch of techniques that you can use once you have a structure.  Now there's been some recent work that shows that the AlphaFold predicted structures still aren't quite good enough to make use of some of these other structure-based methods. And so, more often, what people use it for is what you're saying. If you're making a protein drug, then as part of that kind of design pipeline, you'll use the structure prediction as basically a measure of fitness of whether or not you think this drug that you're making is going to fold into a real protein or whether it's just going to be a mess of goop essentially. And so, this has become very common in the kind of protein design world by using these folding methods as a filter step. [00:18:43] SF: And then for the different techniques or modeling techniques for creating this 3D structure, what tends to cause the variance in terms of the predicted 3D structures?  [00:18:54] NLS: Some of them have pretty big discrepancies in design. 
The main one is some of these are MSA-enabled versus not MSA-enabled. MSA stands for multiple sequence alignment. And so, this is what the original AlphaFold algorithm brought, which is it looks at a database of kind of all known protein sequences and it takes your input sequence and it aligns it against those databases. And it uses that as kind of a good initialization step, let's think of it, to help design that 3D structure.  But some of these other algorithms like ESMFold, like OmegaFold, don't do that step at all. They're just trained on sequences. And so, they're actually much faster to run inference. But the concern is that they're less grounded in biological reality. And therefore, their performance is worse.  And so, there have been a number of papers that have come out with these different techniques. And so, there's kind of more coming out all the time. DeepMind keeps releasing improvements there. We think they're significantly different enough that it's worth comparing them on your specific problem. [00:20:03] SF: And then what are some of the other areas that you're focused on in terms of improving where there are bottlenecks kind of in the analysis process when it comes to biologists and being able to do their work?  [00:20:14] NLS: Yeah. Some of the kind of more concrete experiment types that we work a lot with are qPCR. For a number of RNA therapeutics companies. Think Moderna for instance. They'll often run qPCR, which is quantitative PCR (polymerase chain reaction), where they're trying to get a quantitative readout on expression of RNA.  And so, one of our kind of core workflows is helping scientists calculate the fold change of their qPCR experiment. Seeing, "Hey, I've designed this RNA therapeutic, this drug, I'd like to see how much better it is than a known therapeutic that has this amount of expression." 
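The fold-change calculation for qPCR is commonly done with the 2^-ΔΔCt (Livak) method. A minimal sketch of that standard formula, not Sphinx's actual implementation; the example Ct values are made up:

```python
# The Livak 2^-ddCt method: compare cycle-threshold (Ct) values for a
# target gene against a housekeeping reference gene, in treated vs
# control samples, to get relative expression.

def fold_change(ct_target_treated, ct_ref_treated,
                ct_target_control, ct_ref_control):
    """Relative expression of the target gene, treated vs control."""
    d_ct_treated = ct_target_treated - ct_ref_treated    # normalize treated
    d_ct_control = ct_target_control - ct_ref_control    # normalize control
    dd_ct = d_ct_treated - d_ct_control
    return 2 ** -dd_ct

# Reaching threshold two cycles sooner (relative to the reference gene)
# corresponds to roughly 4x the expression: dd_ct = -2 -> 2^2 = 4.0.
fc = fold_change(22.0, 18.0, 24.0, 18.0)
```

The arithmetic is trivial; the pain Nicholas describes is in wiring up dozens of these per week from 12-tab spreadsheets, which is where the tooling earns its keep.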
Streamlining that is a bit different than the folding protein problem but we think fits in the same overall paradigm of making these tools accessible to scientists. [00:21:10] SF: And how do you figure out where to make these investments that could have meaningful impact? Are you doing – conducting essentially user studies with different labs? Or are you doing this based on your own experience and other people at the company's experience?  [00:21:26] NLS: It's a combination of both. I've spent the last several years working at drug discovery companies and building these types of tools internally. And so, there's definitely a lot that is drawn from what I've built and seen there. And then a lot of it is also kind of in concert with users. We have a number of design partners who we work closely with.  And I think the nice kind of happy middle that we've found is a lot of the core ideas have come from my previous experiences and the team's previous experience working with these types of companies. And then the specific implementation is best done hand-in-hand with the actual users for their specific experiments.  And so, that's been where we found the most success so far. And we're always looking to talk to more people and meet new users and hear about their use cases. But it's a combination. I don't think you can do just one or just the other. It really has to be talking to users and then using your own kind of intuition here. [00:22:27] SF: Mm-hmm. And then how do you actually test that you're moving in the right direction? Are you building sort of minimal viable products? What is sort of the design process to actually make sure that you're learning? I would think that one of the hard things about go-to-market or product development in the space is just the learning cycles are potentially slower than they might be when you think about a traditional B2C product that you might be just putting out there on Twitter and hoping that people engage with.  
[00:22:55] NLS: Yeah. I think the cycles are definitely long. Biotech is a slow industry. I mean, it takes 10 to 15 years to get a drug approved. And so, when the companies are thinking at that time scale, it definitely trickles down even into their software buying practices. I think it requires us to have a bit of patience. It requires us to have conviction. And, yeah, it's not necessarily that you can ship a lean startup MVP on Twitter. See if anyone gives you feedback. I think it requires us to work really closely with the design partners we have and try and build conviction with them. And then, also, look for other maybe noisier signals around, "Hey, if we're pitching to a new client, do they seem really excited by this? When we give them access to it, where are they spending their time in the application?"  But, yeah, I think it's hard to just apply the traditional Silicon Valley playbook here. I don't think that's been very successful in this space. But, yeah, it's a combination of all of those things. I'm sorry if that's a waffling answer but that's the reality for us. [00:24:03] SF: Yeah. I mean, it doesn't feel like a move fast, break things kind of go-to-market or product-building experience. Actually, I want to talk a little bit about the company and your journey as a founder. The company's been around for a little over a year.  And first off, speaking as a former founder myself, founder years are like dog years. How's it been being a founder of a company? What's the experience been like for you?  [00:24:27] NLS: I mean, it's been good. It's something that I'm excited by. I have been working in this space for a long time. I care very deeply about this problem and seeing this problem solved. And we had the opportunity to raise a seed round earlier this year and build out a team. And so, that's been really fun. I think it's also been hard as well. Biotech's going through a bit of a rough patch to be honest. There's been a lot of layoffs. 
And so, it feels like the industry is moving especially slowly. I think the upside to that is it really forces us to build something useful. There's not a lot of kind of money floating around. And so, when we have conviction in something, it's going to be a much deeper conviction than we might have had in 2021.  But I think it's been a whirlwind year. Dog years is a great way to describe it. I think it's been really cool to see the progress both on the team side and the product side. And we're never moving quite as fast as we want to. But I think you got to take both the ups and the downs.  [00:25:33] SF: Yeah. I think that's probably something that every founder or anybody involved in an early-stage company feels: like they're never moving quite as fast as they'd like. Things always take longer. Or they feel like they take longer than they really should. But it's hard to build companies and build them successfully.  I do think what you're saying in terms of there's not necessarily a lot of money floating around and so forth. And you have to be really intentional about what you're investing in. Now the good thing about that is that I think some of the best companies in history always have come out of these downturns in the market. Because it does force you to focus and really figure out what's working and what's not working and be somewhat conservative maybe in the types of investments that you're making. You raised a round earlier this year, a seed round. What is the size of the company today? And how are things structured?  [00:26:20] NLS: Yeah. Right now, there's three of us. We have a fourth joining early next year. We're all engineers. We're all technical. It's a very flat structure. We all work together. Work in a pretty collaborative manner I would say. I spend a fair amount of my time doing the, I guess, less technical aspects of building the business, right? Doing the sales and product work. But we all sit in the same room. 
We're all in-person in San Francisco sitting around a table, writing on whiteboards and working together. [00:26:54] SF: And then what's the actual tech stack behind the product today?  [00:26:58] NLS: Yeah. We have a Next.js front end and then basically all Python on the back end for a combination of ML and bio libraries. Actually, there's a bunch of scientific libraries written in Python. FastAPI for our server. We use Temporal for some of our longer-running job queues. And then we use Modal for our GPU kind of ML jobs as well as some kind of parallelization of kind of massive jobs as well. [00:27:27] SF: Fantastic. Yeah. Actually, I talked to the founder of Temporal recently on the show.  [00:27:32] NLS: Okay. Very cool. [00:27:34] SF: Yeah. It's a great product. It's great to see that you're incorporating it. Yeah. I would think like there's probably a lot of – when you're talking about data analysis and so forth, you're going to have a lot of these kind of traditionally asynchronous long-running jobs that become a nightmare to try to manage through, I don't know, cron jobs or something like that. Right?  [00:27:50] NLS: Yeah. No. That would be a nightmare. Yeah. There's definitely a lot that – I think one of the interesting things about the kind of biospace is that it is much more of a batch-processing world than a stream-processing world. And so, yeah, we do have a lot of kind of asynchronous, long-running jobs that would be a nightmare to manage in pure cron. And so, definitely need some sort of kind of task queue behind us. And Temporal has been nice in terms of a lot of the DevEx around retries and being able to deal with flakes as well.  [00:28:27] SF: Is it hard to recruit engineering talent that can work in a biotech startup?  [00:28:31] NLS: I think it's always hard to recruit good engineering talent. I think, yeah, it is challenging. I don't think as many engineers are open to the idea of working in the biospace as they should be. 
Our kind of philosophy here is you don't need to have a bio background in any way, shape or form to be valuable here. Most of the work that we do is just software engineering, right? And you need to be a good engineer to be productive.  But what is important I think is to have an interest in the space, right? Because you're going to learn some biology whether you want to or not. And so, if you're not excited by that, then it's probably not a good fit. And so, I think that's like a good rule of thumb for any kind of vertical software company in this space is you generally don't need a bio background actually to be productive here. But you should be excited by the space and interested in it.  And I think more engineers should be excited and interested in this space. It's kind of clear positive impact. Really interesting technical problems. And you can still build software, right? We're a software company. We don't have a wet lab or anything like that. At the end of the day, we make money by selling software tools. [00:29:42] SF: Do you think that it's also in some ways like a relatively green field space when it comes to technology innovation from like a software sense? Essentially, these labs that are being run, are they using sort of the most cutting-edge software? Or are they a little bit behind when it comes to software innovation in comparison to maybe, I don't know, a SaaS company or something like that?  [00:30:04] NLS: Yeah. I think in general, you're right. They're a little behind. It depends on the lab. There's some very technically sophisticated companies and labs out there who are using the cutting-edge tooling. But there's kind of far more biotech companies and labs which might not be using kind of any software tooling at all.  A lot of the companies out there have very minimal kind of computational expertise is what we would say and might not have any software engineers on their staff at all. 
Maybe they have some PhD bioinformaticians who are working with them. But they're probably not thinking about software in the same way that a software engineer in tech would be. There's definitely that aspect of it.  What I would say is it's not just about having the more modern technology though. You really do have to solve an important problem for scientists. I think scientists are amazing users. They're highly technical. They're working on really interesting problems but they're not generally the most enthusiastic new adopter, unlike a lot of developers.  And so, they're great loyal users if you can prove to them that you're adding value. But it can be hard to convince them to try a new tool. They don't care that we're using Temporal or any new piece of technology. What they care about is are we solving an important problem for them?  [00:31:24] SF: What does go-to-market look like in the space though? How do you reach those scientists? It's not a, I assume, buy some ads on Google and people see the ads as they're writing their queries on Google to try to find a software solution to this. You probably have to meet them where they are and get in front of them in order for them to even understand that they can solve whatever problem they need to solve through something that you can offer them. [00:31:49] NLS: Yeah. It's a pretty traditional enterprise sales process. We do try and talk to individual scientists at companies to get feedback on the product. That's important product feedback for us. But generally, the purchasing decisions of these companies are coming from the executive level. And so, that does lend itself to a much more traditional enterprise sales process.  I'd say, for me, a lot of the go-to-market has been through my personal network and from working at these companies and kind of meeting scientists and engineers who have gone off to work or start different companies. 
And then, also, a community that I started called Bits in Bio has allowed me to meet people at all sorts of different biotech companies. And so, so far, our go-to-market has been very network-driven, I would say.  [00:32:43] SF: Yeah. That makes sense. I mean, for the stage that you're at, where you're also probably making a lot of these folks design partners, so you can figure out what it is that you need to build in unison with hopefully solving a problem for them.  [00:32:53] NLS: Yeah. I'd say the one company who's had something like a product-led growth motion in this space has been Benchling. They're kind of the most modern, big company in this space. They started about 10 years ago. And they gave away their kind of initial product for free to academics and were able to get a lot of usage and product feedback that way. But then if you look at them today, or even a bit earlier, it is still traditional enterprise sales into these biotech companies.  [00:33:26] SF: Yeah. Do you think that there's unique tooling that needs to be built for software engineers that are working on biotech software?  [00:33:35] NLS: I would say there's unique tooling that needs to be built for scientists. I don't agree that there needs to be unique tooling for software engineers. I've been a software engineer in biotech. And we also used kind of best-in-class tooling. We were on Kubernetes and used Kubeflow for some of our ML inference when I was at Benevolent. I brought in Dagster when I was at Octant.  And so, I think the challenge of trying to build vertical-specific tooling for developers in this space is that they have access to kind of best-in-class tooling and can customize that tooling with code to help them with whatever task they're trying to solve.  Now the one potential exception here is bioinformatics pipelines. And so, there's a number of players in this space who are building very specific tooling for bioinformaticians.
And I think that's a reasonable area to build in. But in general, I'm a bit more skeptical of people building tools for bio developers. Because, oftentimes, just the best general-purpose tool for that task is going to be enough. [00:34:40] SF: You mentioned Bits in Bio, which is a community for people building software in science. What inspired you to start that community?  [00:34:49] NLS: Yeah. I was at BenevolentAI, which is a computational drug discovery company. It was a few hundred people by the time I left. And very tech-heavy. We had a really strong engineering team, ML engineering team, ML research team. And so, I felt I had a community where I could go talk to co-workers about which bioinformatics pipeline should we use? Or what LIMS vendors are we excited about?  And then I went to Octant, which was a smaller startup when I joined. I was the first software engineer there. We had two bioinformaticians. And I didn't really have those people I could go to and ask those questions of. And so, coming out of the pandemic, I'd seen a few other Slack communities in the health tech space that I was inspired by. And I thought, "Well, why don't we have this in the biotech space?"  I just posted on Twitter and said, "Hey, who's interested in this? I'm going to start a Slack community." And then people started joining. And it's kind of taken off from there.  [00:35:47] SF: How big is it? What's it look like?  [00:35:49] NLS: Yeah. We have almost 5,500 members now. We've grown quite a bit in the past two years, which has been exciting. And we're organized around an online Slack community. Everyone should join. It's bitsinbio.org. And then we also have in-person meetups in cities all across the world. I think we're up to 12 or 13 cities, mostly in North America and Europe. But we've been growing that as well. We're likely to open up a Bits in Bio chapter near you soon enough.  [00:36:20] SF: Fantastic. And is there one in the Bay Area, I'm assuming?  [00:36:23] NLS: Yeah.
A big one in the Bay Area. We do events about once a month, I would say. Anything from just kind of networking and meeting other people in the space, to panels, lightning talks, hackathons. Once you join the Slack, we have location-specific channels that people can join to learn about the events that are near them. [00:36:43] SF: Awesome. And then as we start to wrap up, is there anything else you'd like to share?  [00:36:47] NLS: I'd say you asked a question earlier about software engineers getting into bio. I recently wrote a blog post talking about how you don't need a bio background to get into bio as an engineer. And I think this podcast is a great audience to promote that to. There's a lot of open problems in bio. It's incredibly impactful. We're working with companies who are trying to develop life-saving drugs or develop new materials to help with the climate crisis.  And so, I would encourage everyone who's maybe thought about bio or is excited about bio to either reach out to me or reach out to friends who are in the bio space and figure out where their talents can be best applied. Because they say the 21st century is going to be the century of biology. And you'd rather join early than late. [00:37:35] SF: Yeah. I agree. It does feel like we're on the cusp of major, impactful innovation happening in biotech. Whether it's drug design or some of the other things that are happening in computational biology. And I think it is a really exciting time to be in biology, even if it might feel like there are some resource constraints going on. I don't think that's going to last, especially with everything that we're seeing from the world of gen AI and the potential impact that it can have on some of the hard problems that exist in biology. [00:38:07] NLS: Yeah. I mean, super excited by gen AI as a process enabler, right? We talked about extracting semi-structured data. And then, also, the models.
Those model architectures translate to bio. That's the kind of folding playground that we were talking about earlier as well. It's definitely an exciting time to be in bio. And, yeah, all those advances we see in the tech industry are going to make their way to bio and accelerate that process as well.  [00:38:35] SF: Awesome. Well, Nicholas, thanks so much for coming on and talking about your work at Sphinx and being an early-stage founder. I'm sure you have a million things going on. I appreciate you taking the time to share your journey and share your vision for the company.  [00:38:47] NLS: No. Thanks for having me. This was great. Always happy to chat. [00:38:50] SF: All right. Thank you. Cheers.  [00:38:51] NLS: Thank you. Bye. [END]