EPISODE 1786

[INTRO]

[0:00:00] ANNOUNCER: QuantStack is an open-source technology software company specializing in tools for data science, scientific computing, and visualization. They're known for maintaining vital projects such as Jupyter, the Conda-Forge package channel, and the Mamba package manager.

Sylvain Corlay is the CEO of QuantStack. He joins the podcast to talk about his company, Conda, Mamba, the new Mamba 2.0 release, software supply chain security, and more.

Gregor Vand is a security-focused technologist and is the founder and CTO of Mailpass. Previously, Gregor was a CTO across cybersecurity, cyber insurance, and general software engineering companies. He has been based in Asia Pacific for almost a decade and can be found via his profile at vand.hk.

[EPISODE]

[0:00:59] GV: Hi, Sylvain. Welcome to Software Engineering Daily.

[0:01:02] SC: Hi, Gregor. Thanks for having me here.

[0:01:04] GV: Yes, very exciting to have you here today, Sylvain. I think a lot of the listeners will know quite a bit about you and the projects you work on, and you might not need any introduction to them, but equally, we'll have a lot of listeners today who know nothing, which is also exciting, and you get to talk to some completely new people in terms of what we're here to talk about today. Generally speaking, it's Mamba, through the company QuantStack.

I think the main thing is just to set the scene here. Actually, just to ask what is, you're the CEO of QuantStack and based in Paris. What is QuantStack and what's the relationship to Mamba? How did it come to be, I think, a key maintainer of Mamba? Maybe just start there?

[0:01:46] SC: Sure. So yes, so QuantStack, it's a team, mostly, it's a team of open-source maintainers of key projects of the scientific computing ecosystem. So, some of the main projects that we are active in are Jupyter. We're very active in the Jupyter project. The team comprises over 10 people working full-time on the project and we've been some of the main drivers in the recent innovations in Jupyter such as collaborative editing, the visual debugger for JupyterLab, the new version of like the new flavor of the Jupyter Notebook that came out recently. So, Jupyter has been one of the main projects that we've been active in the past years.

We're also very active in the open-source package management ecosystem for science, mostly with the Mamba project that we're going to talk about today and Conda-Forge. Finally, there is a new chapter that we've started actually a few weeks ago with the Apache Arrow project in the wake of the recent layoffs at Voltron Data. A bunch of maintainers of Apache Arrow showed up at PyData Paris and they told us that they were looking for a new home. So, we are starting this new chapter, which is very exciting.

So, QuantStack, more than a team of open-source developers. We're not a startup. So, we operate under more of a service consultancy model, where people and companies that depend on these tools in their operations or in their products contract us out to do bug fixes, maintenance, and sometimes add new features.

It really started as this form of self-employment for myself and a couple of others. So, at the very beginning, we did not have any sort of strategy for growth. Nearly all of the business was inbound. This was like this for a few years. Eventually, it became a bit more deliberate. Now, we are a team of about 30 people. The team is not just in France. The biggest group is in France, about half of it, and then we have a significant team in Germany and also folks in Austria, the UK, and Spain now.

[0:03:59] GV: Awesome. Just to sort of clarify, has it always been sort of quite scientific leaning in terms of the focus or did that come out of sort of a project?

[0:04:10] SC: It's always been indeed focused on sciences since the very beginning. Just from our professional backgrounds and the projects that we were focused on.

[0:04:18] GV: I think that's a good sort of just distinction to make in terms of where any of this has come from. So again, for anyone not familiar with any of these sort of companies or packages that we'll be talking about today. We're here to talk about Mamba and it has a sort of intertwined relationship with some other things like Conda-Forge, Anaconda. Do you want to just maybe speak at a high level to kind of what the interrelationship is between them?

[0:04:45] SC: Yes. So, assuming that people don't necessarily know what Conda is, maybe I should define it. So Conda is a general-purpose package manager that works on multiple platforms like Windows, Linux, OSX, and it's very popular in the scientific computing ecosystem. So, maybe I should better define what it's not because there is a lot of confusion about it.

Conda is not a Python package manager, in that it's not a package manager for the Python programming language. It's more similar to YUM or dpkg, like, you know, apt-get and RPM, the classical Linux package managers. What is installed is binary packages and already pre-built assets. It's different from Linux package managers in that we can create multiple software environments in multiple locations on the file system, and it's cross-platform.

[0:05:43] GV: Yes. Again, sort of in the context of a lot of science-based projects, is it fair to say that they're often working with multiple environments and actually that could be using quite different packages or languages, and that's sort of part of the inspiration behind all of this?

[0:05:59] SC: Yes. So, Conda came out of the Python scientific computing community, even though it's not a Python package manager. It started from the observation that a lot of the popular Python packages were actually built upon Fortran or C++ code bases for efficiency and had a thin layer of Python for usage in the Python interpreter. These posed significant distribution challenges. This whole thing started before Python had a real format for distributing packages that would embed binaries in the Python packages.

This was the reason of the project to really enable a better story for distributing binary packages. The other thing is, I think the notion of environment is really key. I mean, people use Docker as some kind of way to bundle a bunch of stuff together that you can easily distribute. But it's still more of a way to distribute something where there could be a lot of mess. Package managers are really key in distributing software in a way that's reproducible, in my opinion. Reproducibility is also another key problem in scientific computing and in science in general. So, being able to switch back and forth, like if you have, let's say, a Jupyter Notebook that you've written maybe like 10 years ago, which does some number crunching and produces a few plots that you use for a scientific paper. How do you run it today? How do you reproduce the set of packages and datasets that you were using to produce these plots? This is one of the reasons why having a strong environment story for creating such bundles of packages is really important in scientific computing.

[0:07:57] GV: Yes. It makes a lot of sense. So, I feel we've got a lot to cover today. We're going to dive straight in. Mamba 2.0, I believe, has not that long been released. What would you describe as kind of the big changes there? I mean, I guess Mamba generally is written in C++, is that correct?

[0:08:18] SC: So, Mamba is really meant to be a drop-in replacement for Conda, initially. And it started as, so basically when the Conda community grew, there was this community channel of packages called Conda-Forge that was starting to outgrow the rest of the ecosystem and had tens of thousands of packages. The way the Conda solver was devised was actually crumbling under the weight of Conda-Forge. Mamba was created by Wolf Vollprecht, who was an employee of QuantStack at the time, initially just as a hack. Basically, I'm going to delegate the solving of the environment and the dependency resolution to another library written in C++ and use Conda for everything else, which was still fragile, but proved very promising.
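[ILLUSTRATION]

A minimal sketch of what "solving an environment" involves, not taken from Mamba or Conda. The package names, versions, and the `INDEX` layout are hypothetical, and the naive backtracking search is only meant to show why resolution gets expensive as a channel grows; real solvers use far more sophisticated techniques.

```python
# Hypothetical toy index: package -> version -> {dependency: allowed versions}
INDEX = {
    "app": {1: {"numlib": {1, 2, 3}, "plotlib": {1, 2}}},
    "plotlib": {1: {"numlib": {1}}, 2: {"numlib": {2}}},
    "numlib": {1: {}, 2: {}, 3: {}},
}


def solve(requirements, index, chosen=None):
    """Return {package: version} satisfying every requirement, or None.

    requirements is a list of (name, allowed_versions) pairs still to satisfy.
    """
    chosen = dict(chosen or {})
    if not requirements:
        return chosen
    (name, allowed), rest = requirements[0], requirements[1:]
    if name in chosen:
        # Already pinned: the pin must be compatible with this requirement.
        return solve(rest, index, chosen) if chosen[name] in allowed else None
    # Try the newest candidate first, and backtrack when it leads to a conflict.
    for version in sorted(allowed & set(index[name]), reverse=True):
        dependencies = list(index[name][version].items())
        result = solve(rest + dependencies, index, {**chosen, name: version})
        if result is not None:
            return result
    return None


environment = solve([("app", {1})], INDEX)
print(environment)
```

Here the newest `numlib` (version 3) has to be abandoned because no `plotlib` release accepts it, so the search backtracks to `numlib` 2. With tens of thousands of packages, the number of such dead ends is what makes solver performance matter.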

So, our first approach was to reach out to the Conda project, Anaconda Inc., and tell them about it and ask if they would be willing to fund QuantStack to make this a reality. At the time, they were not so interested yet in Mamba, so we decided to make it its own thing. What Mamba is today is really an alternative to Conda to install the packages of the same ecosystem that's fully compatible with Conda, and even supports the same command line options.

But unlike Conda, it's written in C++, which really helps with speed in some areas. There is another flavor of Mamba that is also very popular called Micromamba. And Micromamba is a single statically linked bundle of Mamba. It's essentially the same code base, but with different linkage. If you want to install Micromamba, it's just a four-megabyte download, while an installer for a basic installation of Conda requires a Python interpreter and a bunch of dependencies in the base environment. And download sizes are in the tens of megabytes, or maybe 80 or 90 megabytes, or over a hundred on some platforms.

So, Micromamba has also proven very useful in CI workflows where we want to bootstrap an environment very quickly and fully self-contained and can be used to create Conda environments from scratch. 

[0:10:48] GV: Yes, exactly. So, number two release in just the last couple of months, is that right?

[0:10:52] SC: Yes, that's right.

[0:10:53] GV: Okay, so talk to us about that.

[0:10:55] SC: Yes, so Mamba 2 is, first of all, it's the result of a very boring refactor of Mamba. Really, Mamba was, I would say that the Mamba first major release was sort of the result of a rush. Basically, there was such demand for it in the community that we were rushing to cover the features of Conda. It was entirely built to be used as a command line utility. But then some people started using it as a toolkit. They wanted to use the internal components of Mamba, and these were not necessarily written in a way that was very solid.

Because if we assume that people use it as a command line utility, we can make all kinds of assumptions that are not necessarily satisfied if you are making a toolkit or a library that ought to be used in web services and be thread-safe and whatnot. So basically, Mamba 2 is in many ways almost a rewrite of Mamba with a more deliberate, bottom-up software engineering approach.

There are a bunch of new features, though. First, as of Mamba 2, we can specify mirrors for package channels, which is really important for things I'm going to talk about later. This will soon be utilized in popular installers, such as those of Conda-Forge and others. Also, we support more protocols for downloading packages. One key protocol that we wanted to support and is enabled in Mamba 2 behind an experimental flag is getting packages from OCI registries and other cloud storage solutions.

[0:12:35] GV: Gotcha. So, let's dive into one of the key topics I think would be great to cover today. I'd be curious, I think the thread throughout this would just always be what has, or maybe there's been no change, but what has version two maybe brought to this in any different way over version one. But let's talk about vendor neutrality effectively. What can you say on the basis of like, for example, let's take Mamba specifically for now, and then if you would like to branch out into any other areas, like what's going to prevent a single organization, I guess in this case, it might be QuantStack, for example, dominating the project's direction? Like what would you say to that?

[0:13:19] SC: Before we get to Mamba, specifically, I think I just wanted to take a step back and talk a bit more about why open source at all. I think there is a specific reason in the case of scientific computing that is not necessarily as present for other areas of computing. And that is that, in my opinion, there would be a very deep contradiction for physicists, for example, to try to understand nature with a tool that they don't have the right to understand. So, this contradiction is the key reason why scientists, long before the open source movement was a thing, were sharing code and sending descriptions and in-depth documentation of their code to each other.

This is really the origin of the World Wide Web, which was started at CERN and started by a physicist. Then if we look at more recent events, the reason why the Python open source scientific computing community is so big is that it was also built as a reaction to the scientific computing world being held by a few corporations that were imposing very high-cost licenses for computing tools. These costs were preventing people from certain countries from engaging because they couldn't afford the licenses. They also prevented, in some cases, students from using the tools, and so they were really building a walled garden around scientific computing.

So, everyone in this field has this atavistic sort of need for openness and making sure that we are not giving ourselves up to one corporation. Distribution of packages and package management is really one of the key areas in which having one actor dominating how you distribute and install packages can be very harmful. Obviously, here in the case of Conda, we're talking about Anaconda Inc., which in many ways is a company that I really look up to and has done great achievements in this area, making scientific software more usable and accessible.

But Conda has challenges with respect to vendor neutrality. So, obviously, if you check out the Conda documentation, it's pointing to the Anaconda distribution and channels. In some areas, Anaconda channels are hard-coded in the Conda code base. One more, I would say, problematic thing is that there are some root cryptographic keys that are also hard-coded in the code base and are used in the package signing protocol, which uses The Update Framework, which is some kind of extension of - it uses asymmetric key pairs for signing packages and ensuring their authenticity. But at the moment, only Anaconda Inc. can sign packages that could be verified with a Conda client.

So first, Anaconda Inc. and the Conda community, they have taken steps to resolve this. Some, for example, some of the documentation has been fixed recently. There are open issues and maybe PRs open as well, removing the hard coding of Conda channels. But I think in the end, the elephant in the room is the branding and identity of the project, which is that Anaconda and Conda really share a non-trivial part of the name. Like the visual identity and the logos of the company and the project are very similar. Even if they really wanted to solve this, they are really facing a difficult situation.

So yes, this is one of the main reasons why just the existence of Mamba is important, because it allows projects like Conda-Forge to potentially rely on another implementation of the Conda content trust protocol. In case of takeover or failure of the company, we have fallbacks that we can also use. That's without even getting into the technical benefits of Mamba, which people could disagree with.

[0:17:40] GV: Yes. So, I think talking about, I guess, you've laid out a very clear reason why there are aspects of Conda-Forge plus Anaconda that their approach, it can work, but there are just various obvious drawbacks if you just take it sort of at face value. In terms of then how Mamba has approached this, if we think about things like community engagement, and I'm very curious about what policies, like how have you set out policies in terms of ensuring contributors regardless of affiliation, for example, have equal opportunity, which I think is kind of what you're getting at there. Maybe then just a follow-on from that is things like the signing of packages, what policies are then around that if it's not this one entity who can effectively design their own policy around that and you can use it or not. Where has Mamba gone with that?

[0:18:33] SC: So, it's not as much who can put code into the code base as much as what's in the code base in terms of hard coding of channels and hard coding of keys. I think these two things are really important if we want people to be able to, for example, start a new community channel and have a key signing ceremony for starting a well-devised software supply chain security policy for the channel and then use Mamba. There is no place where we would prevent this from happening at the moment.

Yes, I think that's the main thing. For example, if you download the main Micromamba binary, everything can be overwritten, and it's not hard-coding any channel by default, any package source by default. So, you have to configure it and choose where you're going to get your packages from.

[0:19:30] GV: I think an obvious kind of place to go next is really supply chain security. We've covered a couple of products and frameworks in the past on this topic. But Mamba, I believe, this is kind of a core tenet as well of what Mamba is supposed to be helping with. So, maybe could you just speak a bit to sort of what does it bring in, in that way? Again, if there's any, I guess, comparisons, effectively with Conda-Forge or otherwise as to what is different and why, I think that's very interesting to understand. 

[0:20:05] SC: So, broad subject, so Conda, supply chain security. One thing that is currently in the Mamba codebase is an implementation of the Conda content trust protocol, but in a way that would allow anyone to provide their - install the public keys on the system so that we could check packages. It's going to be really important for a community channel like Conda-Forge to be able to sign their packages in the future as it's increasingly becoming almost a regulatory constraint. So, with the recent laws that were enacted in the US and also coming to the EU, state agencies won't be able to use packages that are not implementing these kinds of good practices.

As a consequence, it's going to percolate in the entire industry. Preventing Conda-Forge, which is the de facto main source of packages for scientific computing, from signing the packages, I think can be really harmful. That's why I think having an independent implementation of the current protocol, like without even getting into any kind of innovation, is really important. It's really bound to vendor neutrality.

Getting back to this question of vendor neutrality, actually, I think one thing I didn't say is, I think that we have a path forward that should probably allow everyone to continue operating in a way that's satisfactory, including Anaconda and including Conda-Forge and everyone, and would also resolve this kind of branding issue around the Conda project. Because what I really think is that Anaconda is really trying to resolve the situation. There is acknowledgment that there is a problem at the moment. So, they have opened up Conda for more community-led governance. They have transferred over the Conda trademark to the NumFOCUS foundation, and a proposal for the future would be, rather than trying to have the Conda project become more open but still be bound by this kind of weird situation with the name, to create a broader organization that would encompass Mamba and Conda, but also other clients like Pixi, and that would not be named after either of these projects. Conda would just be one of the members of this broader community, and this community should be built upon common standards and have an open governance for designing what the future should be.

[0:22:46] GV: I mean, I believe, at least at the moment, you have like bi-weekly dev meetings that you actually host for the project that people can come and be a part of, is that correct? Does that sort of play into what you've just been talking about? Is that almost like a grassroots thing there, where how can the people that have the same vision can come together and not just talk about that, but is that a sort of forum for talking about this as well?

[0:23:13] SC: So, Conda has their own sort of governance and regular meetings, and they also now have their really nice system for proposing changes in Conda with the CEP process. We also host public meetings for the Mamba project. Conda-Forge is its own thing as well and is really focused on the tools for building packages, but obviously they have a vested interest in the tooling. So, yes, this is why my take is we need to actually acknowledge all of that and have a broader community movement and to not continue with the snake-based names. We should call it something else completely, like not Rattler, not Conda, not Mamba. Maybe call it, I don't know, sci-kit packaging or whatever, like something that really conveys the idea that we are about package management in a generic way and we have these scientific roots.

[0:24:18] GV: Yes. I mean, this is obviously a bit of a sidebar, but I was kind of curious here why Mamba. But just based on, I mean, I think, obviously, at the beginning of the episode, you said you had approached Anaconda to suggest this concept. But at the same time, what was then the thinking to continue the snake naming if the idea was to have something that is, not completely opposed to it, but meant to represent a different direction?

[0:24:46] SC: So first, in the very beginning, we weren't really thinking too much about it. Like Mamba was probably the result of a name search for fast snake. Speed was the main reason why it was started. We were just continuing the series of snakes in that ecosystem around Python and Conda and whatnot. Since it was just a demo in the very beginning, we went to Anaconda as a natural outlet for this demo, and as a potential client that could fund this work. Then we continued working on it, but outside of billable cycles and as a side project for pressed at QuantStack and a bunch of others. And eventually, the problems that Conda was facing became key for some of our clients and we managed to put some Conda and Mamba-related deliverables in some client contracts and it became a thing.

So, it's only recently that we realized that we needed to be more thoughtful about it and make it have this standing in the community that, "Oh, maybe this should be a thing." It should be part of a bigger thing that includes Conda and Pixi and whatnot.

[0:26:08] GV: So, just hopping back in from that, I'd like to just touch on the supply chain security bit a little bit more, and then we're going to switch gears a bit to WebAssembly because I think that's an interesting place to go and leave this stuff behind. But yes, I mean, just in terms of the software supply chain side of things, again, what can you kind of speak to in terms of, I mean, for example, I'm more familiar with, for example, node package manager and NVM and all of that kind of ecosystem and sort of understanding what policies and what sort of decisions have been made around that to by no means fix a lot of problems, but at least enhance certain issues that have come up in the last couple of years. So again, what is Mamba doing to that end?

[0:26:50] SC: Yes. So, one of the features that actually came with Mamba 2 was the support of package mirrors, which is great from a vendor neutrality standpoint, right? But it actually brings more challenges with software supply chain security, in that if you have a network of mirrors, could there be a bad actor? Could there be a mirror that is compromised? And there are a number of ways to be a bad actor in a network of mirrors for package channels. This is actually, sort of by chance, a good reason why the Conda content trust protocol, which is based on The Update Framework, is a really good fit, because it really addresses some of the key challenges that could happen in software distribution.

For example, how could someone managing a mirror freeze their packages at a certain date before a security update? All of the packages that they host would still be legitimate. Everything would be real. But how can we prevent that? The same signature should presumably work, right? So, TUF addresses this by requiring some kind of cryptographic heartbeat, in that when you try to get a package from this channel, it will download a cryptographic key that will actually have expired unless the root origin of the package sources has re-signed it. If it has expired, it's not going to consider this content as valid.
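[ILLUSTRATION]

A minimal sketch of the "cryptographic heartbeat" idea described above, under loud assumptions: it is not the Conda content trust implementation or TUF's actual metadata format, the function names and payload fields are hypothetical, and an HMAC with a shared demo key stands in for the asymmetric signatures that TUF really uses. It only shows why a frozen mirror ends up serving metadata that clients reject.

```python
import hashlib
import hmac
import json
from datetime import datetime, timedelta, timezone

DEMO_KEY = b"demo-key"  # stand-in only; real TUF roles sign with asymmetric keys


def sign_heartbeat(snapshot_hash, issued_at, ttl, key):
    """Origin side: sign a short-lived statement about the current channel state."""
    payload = {"snapshot": snapshot_hash, "expires": (issued_at + ttl).isoformat()}
    body = json.dumps(payload, sort_keys=True).encode()
    return {"payload": payload, "sig": hmac.new(key, body, hashlib.sha256).hexdigest()}


def check_heartbeat(meta, now, key):
    """Client side: reject metadata that is forged or that has not been re-signed in time."""
    body = json.dumps(meta["payload"], sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, meta["sig"]):
        return "bad signature"
    if now >= datetime.fromisoformat(meta["payload"]["expires"]):
        return "expired"
    return "ok"


issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
heartbeat = sign_heartbeat("abc123", issued, timedelta(days=1), DEMO_KEY)

# A mirror that keeps syncing serves a fresh heartbeat.
print(check_heartbeat(heartbeat, issued + timedelta(hours=1), DEMO_KEY))
# A mirror frozen a month ago can only replay the old one.
print(check_heartbeat(heartbeat, issued + timedelta(days=30), DEMO_KEY))
```

The frozen mirror cannot extend the expiry date itself, because changing the payload invalidates the signature, and only the origin holds the signing key.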

So, there are a number of attacks that could be done on a network of mirrors that Conda content trust really addresses well. That's why it was important for us to implement it. Now, I really wish that we could fix the upstream code base in Conda so that we can install alternative keys for Conda-Forge and other channels, and then we will be able to enable package mirrors for these key channels used by the community.

But really, distribution is just one part of supply chain security. There is another entire field that Mamba is not addressing at all, which is reproducible builds and guaranteeing that the content of the package that is shipped is legitimate, which is also a concern for many organizations. So, we've worked with companies that build their own Conda-based distribution from source and don't get any binary from the Internet, but use effectively Conda-Forge as some kind of Wikipedia of how to build stuff. Because it's actually really hard to build the entire scientific computing stack from scratch. So, Conda-Forge is a really good source of information at this.

[0:29:38] GV: Yes, I think we had an episode on that not that long ago with a company, Chainguard. That's sort of their goal, reproducible builds, containers that they build from source. I don't know what, if any, interaction they have with Anaconda, Conda-based packages at the moment. But yes, just if any listeners are interested in that pure topic generally, that's maybe an episode to go to.

What would you say in terms of other plans to deliberately work with more, whether it's open-source projects or companies that are aiming to go this direction on reproducible builds? Is it a major concern at the moment?

[0:30:17] SC: Not specifically reproducible builds because we don't have funding for this. It's a hard enough problem that we can't just hack our way around it. We're talking about making sure that the packages that are uploaded on channels have been produced by the people we think they are produced by. So, really signing at the build time and uploading them to Conda package servers.

[0:30:42] GV: Let's switch gears to WebAssembly, kind of exciting that there's quite a large amount of support now for WebAssembly from the Mamba ecosystem. Could you maybe just speak to what inspired the decision to add this support and what has been that journey of evolution for this support and what's being produced as a result of this?

[0:31:04] SC: So, this is probably the thing that I'm the most enthusiastic about these days, in my work overall. It all started from a grant proposal. So, we were actually writing a grant for the French government, because we're a French company, to develop a Jupyter-based platform for secondary education, so high school education, to get kids to learn Python.

Around the same time, a group of high school teachers here in Paris worked on an interesting project called Basthon, which is a pun in French, which kind of sounds like Python, but it's a slang word for fist fight. Basthon was a sort of fork of the classic Jupyter Notebook, but instead of calling out to a server for executing the code that you would type, it would use this Python distribution in the browser called Pyodide. So, this whole thing started in 2019. And they got Basthon to be in a workable state, basically, and they deployed it for the first school district, and it worked really well. So, other districts in the country started showing interest. They signed agreements with the other districts and progressively they expanded this project to the entire country.

So, the thing has matured a bit, became more solid over the years. Now, this project, the deployment is called Capytale. Capytale has half a million registered users and they have over 200,000 user sessions per week. All of it is entirely served from one machine. So, how come? The main reason is that by using WebAssembly, by running user code in the browser, we can become free of having to run, have a Docker image running in the cloud for each user session, right? So, this scalability is crazy. Just to give a comparison, UC Berkeley runs a data science course called Data 8. It's a really big one. I think they have upwards of 10,000 registered students. But there is a team of DevOps engineers that operates the Kubernetes-based deployment of Jupyter enabling this. It costs over $100,000 per year to run in cloud compute.

So, there is a team of DevOps engineers, and then there are significant hosting costs for allowing this, which is essentially having one Docker image per user session. Now, if you don't need this, all there is on this server hosted literally in the basement of a high school here in Paris is a content management system for the user notebooks. And that's it. So now, if we start making multiplications, you realize, "Oh wait, France only has so many high school students. It's not a very big country." Not all of them learn Python anyways. But if you consider a bigger country like Nigeria, they have over 200 million people at the moment. But the forecast is that the population is probably going to grow by at least 100 million in the next 25 years. Most of these kids are going to go to high school. Presumably, in the 21st century, they will want to learn programming.

At this scale, is it even feasible to have a Kubernetes-based deployment of Jupyter to learn Python? I'm not so sure, right? And if Nigeria wanted to do this, they are not the home of Microsoft or AWS or Alibaba, they would probably need to rent that space on someone else's cloud, right? While, if they use a system based on what I just described in WebAssembly, they will be able to host the platform in a sovereign fashion.
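[ILLUSTRATION]

A back-of-the-envelope version of the multiplication alluded to above. The inputs are the round figures quoted in the conversation, treated as approximations, and the linear scaling of hosting cost with user count is purely an assumption for illustration; the function and constant names are hypothetical.

```python
def projected_yearly_cost(reference_cost_usd, reference_users, target_users):
    """Scale a yearly hosting cost linearly with the number of users (an assumption)."""
    cost_per_user = reference_cost_usd / reference_users
    return cost_per_user * target_users


# Round figures from the conversation, not measured data.
DATA8_YEARLY_COST_USD = 100_000   # "over $100,000 per year" in cloud compute
DATA8_STUDENTS = 10_000           # roughly 10,000 registered students
FRENCH_DEPLOYMENT_USERS = 500_000 # "half a million registered users"

estimate = projected_yearly_cost(
    DATA8_YEARLY_COST_USD, DATA8_STUDENTS, FRENCH_DEPLOYMENT_USERS
)
print(f"${estimate:,.0f} per year")
```

Under these assumptions, a server-side deployment for the French user base would land around $5 million per year in compute alone, compared with a single machine when the code runs in the users' browsers.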

So, to me, this is really enormous because this model can be used to teach programming to a billion kids. It just works, right? Because basically, we run everything on the browsers of the end users, right?

[0:35:27] GV: Just a bit in context, what is, just to take super basic points here, what is like a very base level of machine needed to run that just in a nice way? On the client side, that's what I mean, yes.

[0:35:40] SC: Well, it depends on what you want to run, obviously.

[0:35:42] GV: The example you gave with, for example, the high school kids learning Python.

[0:35:46] SC: Okay, so what kind of Python is a high school kid going to write? They are going to learn how to compute the greatest common divisor. The complexity of that code is probably lesser than the rendering of the UI. For very basic things, we don't need much. You can already expose this kind of interactive computing environment for them. So, the problem with Capytale, this deployment that I talked about earlier, is that they forked the original classic notebook from 2016, and this codebase obviously is not maintained anymore and poses many challenges in terms of accessibility, in terms of security. So, this was the reason why we proposed to build a JupyterLab-based solution that would also use this new JupyterLab-based notebook, and we called it JupyterLite. As soon as we started the JupyterLite project, it also grew in adoption very quickly.
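[ILLUSTRATION]

For a sense of scale, the kind of exercise mentioned above fits in a few lines. This is a generic textbook version of Euclid's algorithm, not code from any of the platforms discussed.

```python
def gcd(a, b):
    """Greatest common divisor of two non-negative integers (Euclid's algorithm)."""
    while b:
        # Replace (a, b) with (b, a mod b) until the remainder is zero.
        a, b = b, a % b
    return a


print(gcd(48, 18))  # 6
```

A program of this size runs comfortably in an in-browser Python interpreter on modest hardware.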

Now, if you go to numpy.org and scroll a bit, you will find a JupyterLite console. You can try NumPy in the browser and there is nothing running in the cloud. If you go to the scikit-learn documentation, you will find runnable code snippets, and now they are going to run the MOOC on JupyterLite as well. If you visit the SymPy project documentation, you will also find a console there to try out SymPy in the browser.

After we developed JupyterLite, we realized that we wanted to start expanding a bit what you can do in the browser. Pyodide is really meant to be about Python and the Python packaging ecosystem, and we wanted to do a lot more. We started this project called Emscripten-Forge, which is a distribution of Conda packages for the browser that goes way beyond the Python ecosystem.

One thing, for example, is that we recently released a JupyterLab terminal with an emulator of bash that runs in the browser. Then you can start typing bash commands like grep, sed, touch, cat, less, whatever, and see it reflected in your file system and access these files in notebooks running in your browser. There is another ongoing project to build R packages for WebAssembly, and all of this uses the same Conda-based and Mamba-based package manager.

Even beyond education, I think this is going to be really important for scientific publishing and long-term reproducibility. For example, today, any binary that runs on your computer, like one for ARM or x86, is probably going to require some kind of emulator to run on a machine in 20 years, while WebAssembly is a web standard. So presumably, WebAssembly binaries should be runnable by web browsers in 20 years. So, what I think is that a bundle of a Jupyter Notebook, a bunch of WebAssembly packages, and a small dataset, all served statically at a given URL, is like a time capsule. It's like a website from the nineties that we can still see today.

So, this time capsule, as a research paper doing some number crunching, data analysis, and some discovery, could still be runnable in 20 years. This, to me, is a real revolution. So, we're trying to push the boundaries of what's possible in the browser as much as possible.

[0:39:18] GV: Yes. It's a really fascinating way of looking at it. I mean, at the end of the day, the browser has kind of become almost the OS of the average user these days, generally. I would say most non-power users are basically just working through the browser for most things. But that said, there's still a way to go for what could in theory be run in the browser, and obviously, you've given this great example of JupyterLite, made possible by Emscripten-Forge. I guess my question is, for Mamba as a whole, is there going to be a fork in the road if the WebAssembly side of things, I don't want to say takes off, but you know what I mean. Is there going to be a point where you have to actually make a choice about what the focus of Mamba is, potentially?

[0:40:05] SC: So, this refactor that I spoke about earlier, where we tried to make Mamba more of a toolkit, is really paying off here because we are writing JavaScript bindings to some of the components of that toolkit so that we can actually do the dependency resolution in the browser, download things from CDNs from the browser directly, and create Conda environments in the browser.
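
As a rough illustration of what "dependency resolution" means here, the sketch below is a toy backtracking resolver over a made-up in-memory package index. It is not Mamba's solver and does not use any Mamba or Conda API; the package names, version numbers, and the `resolve` function are all hypothetical, chosen only to show the idea of picking one version per package so that every constraint is satisfied.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical index: package name -> list of (version, dependencies),
# newest version first. Each dependency maps a name to its allowed versions.
Index = Dict[str, List[Tuple[int, Dict[str, Set[int]]]]]
Request = Tuple[str, Set[int]]

INDEX: Index = {
    "notebook": [(2, {"kernel": {2}}), (1, {"kernel": {1, 2}})],
    "kernel": [(2, {"wasm-runtime": {1}}), (1, {})],
    "wasm-runtime": [(1, {})],
}


def resolve(
    requests: List[Request],
    index: Index,
    chosen: Optional[Dict[str, int]] = None,
) -> Optional[Dict[str, int]]:
    """Return a {name: version} assignment satisfying all requests, or None."""
    chosen = dict(chosen or {})
    if not requests:
        return chosen
    (name, allowed), rest = requests[0], requests[1:]
    if name in chosen:
        # Already pinned: the pinned version must satisfy this request too.
        return resolve(rest, index, chosen) if chosen[name] in allowed else None
    for version, deps in index.get(name, []):
        if version not in allowed:
            continue
        # Tentatively pick this version and queue its dependencies.
        result = resolve(rest + list(deps.items()), index, {**chosen, name: version})
        if result is not None:
            return result
    return None  # No candidate worked: the caller backtracks.


if __name__ == "__main__":
    print(resolve([("notebook", {1, 2})], INDEX))
    print(resolve([("notebook", {1, 2}), ("kernel", {1})], INDEX))
```

In the second call, the newest `notebook` conflicts with the request for `kernel` version 1, so the resolver backtracks to the older `notebook`. Production solvers handle far larger indexes and richer version constraints, which is part of why running them inside a browser is a non-trivial engineering effort.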

So yes, in our team at the moment, there are probably more people focused on WebAssembly than on the core C++ code base of Mamba. But that is for obvious reasons: the initial lift is really significant, and there is a lot of code to be written for pieces that are simply missing.

[0:40:56] GV: Gotcha. So, I mean, we are slightly cruising towards the end of the episode today. I think just in general, it would be, I just want you to have a sort of a platform to be able to sort of speak to developers here as much as being able to sort of speak on behalf of Mamba and QuantStack generally. I mean, what would you say, what's the most important thing that you think people should know about your vision for the future of package management in terms of picking up Mamba and versus anything else or just generally?

[0:41:29] SC: For me, this WebAssembly story is the crossover between Jupyter and our effort on package management. Just having this idea of providing a platform that can be used to teach programming to a billion kids potentially and that will just work at this scale. If I have the opportunity to do this, I think it's probably the greatest thing that I could do professionally in my life. So, I'm taking it. I want to try it, right? Maybe there's going to be another platform that's going to be that. But I think we have a shot. So, I want to do this upholding the principles that we laid out in the very beginning about the open source movement and the fact that the tools should be open and openly governed.

So, join us. That's the message to the people. I think there is something happening now in the package management ecosystem as well as in the Jupyter ecosystem that's really important. And I've read somewhere that there was a survey, I'm not sure. They were trying to estimate the number of users of Jupyter and the answer was probably in the order of magnitude of 10 million people in the world, which I think is probably fair, like certainly more than a million and it's probably not 100 million, right? So, like 10 million is probably not a crazy number. If we ever get to 100 million, it's going to be with a tool like JupyterLite. That's what I think. And yes, we need help.

[0:42:57] GV: Yes, well just off the back of that. Where's the best place to go? What's the best sort of in-road for someone to come? When you talk about joining you like where should they start?

[0:43:06] SC: We have a number of easy-fix issues in the relevant repositories on GitHub. Both Jupyter and Mamba have public meetings that you can find references to on our websites. So, yes, it's easy to engage. But just a pull request fixing the tiny usability thing that you find annoying when using it is already super welcome.

[0:43:34] GV: Awesome. I mean, I think that's just great advice for anyone looking to get interested in open source, full stop. But I think you've heard it from the source today, which is, it's a very welcoming community, it sounds like, and exactly, just lend a small hand and then who knows where that can go. So, Sylvain, it's been fantastic to have you here today. I think, again, some quite meaty topics, and obviously Mamba is doing some pretty huge things. Obviously, there's always going to be some different groups in every area of tech and everyone has their different approaches. I think it's always great for our listener base to get to hear from those. They might have seen some online discussions, or maybe they've read a pull request or an issue thread, and now they get to hear the voice behind it. So, I think it's been a really valuable discussion.

[0:44:23] SC: Thanks. Before we leave, I want to give a shout-out to Wolf Vollprecht. So, Wolf is a former employee of QuantStack, and he's the person who started the Mamba project while he was at QuantStack and did a lot of the initial work. Now, Mamba is maintained by a team of four or five people working on the code base, and Wolf actually is still a big driver behind this broader community. He founded a company called prefix.dev, which is the company behind Pixi. It's a set of tools built in Rust for addressing other package management issues, and it also seeks compatibility with the Conda ecosystem in terms of package format. I would say that Wolf, in the past few years, not just with Mamba but also with what's going on with Pixi at the moment, has been one of the main drivers for change and innovation in that space. Maybe that's another message for people: they should really follow what's going on there.

[0:45:22] GV: Fantastic. Yes, well, thanks for calling that one out. And I hope we get to catch up again, who knows, maybe in a year's time or something like that. We'd love to see where things are going, especially on the WebAssembly side. That just sounds like a very exciting place to be, kind of the epicenter. So, yes.

[0:45:40] SC: Thank you, Gregor.

[0:45:41] GV: Thanks so much.

[END]