EPISODE 1679

[INTRODUCTION]

[00:00:00] ANNOUNCER: The InterPlanetary File System, or IPFS, is a peer-to-peer network that uses a distributed and decentralized model. Functionally, IPFS allows users to store and share files without having to rely on a single source of truth for those files. 

Matt Ober is the co-founder and CTO of Pinata. He joins the show to talk about IPFS and Pinata. 

This episode is hosted by Lee Atchison. Lee Atchison is a software architect, author, and thought leader on cloud computing and application modernization. His best-selling book Architecting for Scale is an essential resource for technical teams looking to maintain high availability and manage risk in their cloud environments. 

Lee is the host of his podcast Modern Digital Business produced for people looking to build and grow their digital business. Listen at mdb.fm. Follow Lee at softwarearchitectureinsights.com. And see all his content at leeatchison.com.

[INTERVIEW]

[00:01:14] LA: Matt, welcome to Software Engineering Daily.

[00:01:16] MO: It's great to be here.

[00:01:17] LA: Let's start out with some more details about IPFS. I'll be honest, I'd heard of it before. But I didn't know a lot about it. I'm sure other people who are listening in, there's some who know it very well and some who are just hearing about it now. Why don't you give a quick summary of what IPFS is? 

[00:01:37] MO: Sure. Yeah. I would be happy to. At its core, IPFS is just a file transfer protocol. Similarly to other protocols out there on the internet, it's a way of exchanging information back and forth between multiple parties. In this case, it's designed for files specifically. 

Where IPFS shines is it tries to tackle the problem of being able to verify that the content you are expecting to get is the content that you're actually getting. The classic example that we like to use is to think of your classic S3 bucket that many software engineers will be familiar with, or even a Dropbox link.

When you upload something to that, you're going to get back a link. In our case, we'll use example.com/dog.jpeg. You would expect that to be a dog. But the owner of that bucket could at any time return say a cat. Or if we're getting a little bit more malicious, they could upload something nefarious. And that's not what we want. We want to be able to download exactly what we're looking to download. And IPFS looks to solve that problem. 

What it does to solve that problem is it utilizes something called content identifiers. We talked about the dog.jpeg link. In IPFS, it's a little bit different. Instead of dog.jpeg, you're going to actually run your file through a hashing algorithm. And there's a few that are supported. But you're going to get back a string of characters that represent that file that you just added to IPFS. And this is what's called a CID or a Content Identifier, if you want to use its full and proper name. And this is deterministic. 

If you upload that same file again and again, you're going to get back that same CID again and again. And on the flip side, when you're wanting to retrieve content from the IPFS network, you don't ask for it via a url/dog.jpeg. You ask for it by its CID. And there's a few different ways that you can retrieve content on IPFS. And we can certainly jump into that. But at its core, that's really the selling point for IPFS: you get what you expect and you're able to do it in a peer-to-peer, I'll use the word distributed, way to accomplish those goals.
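As a rough illustration of the determinism Matt describes, here is a minimal Python sketch that builds a CIDv1-style identifier for a single raw block using SHA-256. It is a simplification and not how any particular node works internally: implementations such as Kubo typically split larger files into chunks and link them together, so the CID of a real file may differ from what this produces. The function names `raw_cid_v1` and `matches_cid` and the sample byte strings are made up for the example.

```python
import base64
import hashlib


def raw_cid_v1(data: bytes) -> str:
    """Build a CIDv1 string for a single raw block hashed with SHA-256."""
    digest = hashlib.sha256(data).digest()
    # Multihash: 0x12 = sha2-256, 0x20 = digest length (32 bytes).
    multihash = bytes([0x12, 0x20]) + digest
    # CIDv1 prefix: 0x01 = version 1, 0x55 = "raw" codec.
    cid_bytes = bytes([0x01, 0x55]) + multihash
    # Multibase: a leading "b" marks lowercase base32 without padding.
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded


def matches_cid(data: bytes, expected_cid: str) -> bool:
    """Re-hash the received bytes and compare with the CID that was requested."""
    return raw_cid_v1(data) == expected_cid


dog = b"pretend these bytes are dog.jpeg"
cat = b"pretend these bytes are cat.jpeg"
dog_cid = raw_cid_v1(dog)

print(dog_cid == raw_cid_v1(dog))  # True: same bytes always give the same CID
print(matches_cid(cat, dog_cid))   # False: swapped content no longer matches
```

The second check is the dog-versus-cat case from the example above: whoever serves the bytes cannot substitute different content without the mismatch being detectable by the requester.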

[00:03:57] LA: If we try and break that up, we see a couple things in there. One of them is the naming aspect. And so, rather than using a domain name and a file or a standard URL syntax, which is essentially just a symbolic link to where the content's located at. That could be changed at any point in time to some other content without you knowing it. Instead of using that sort of an algorithm, you use the hash, essentially, of the file. Which means if the file is ever changed, it ends up with a new ID. And so, therefore, it's a different file on the internet. Is that a safe way to say this? 

[00:04:32] MO: Yeah. Spot on. 

[00:04:34] LA: That's the same sort of process that Git uses. Correct? With source code for storing files. Don't they store files and using the hash name of the object and refer to files that way? I don't remember the details there. But it's very similar to how Git uses within its repositories. Does that sound about right? 

[00:04:51] MO: Yeah. IPFS itself right is, again, just a protocol. How the people interacting on the IPFS network actually serve and retrieve those files, that's IPFS. But how they store those files is really kind of up to them. And you'll see this through a lot of different IPFS implementations that exist out in the ecosystem. 

There's a very prominent Golang-based implementation which is called Kubo. There is a JavaScript-based implementation called Helia. And there's a few more other ones under the hood, mainly one called Elastic IPFS, which is a more cloud-based, natively horizontal, vertically scalable implementation that's built on a little bit more cloud primitives versus something that you would run natively on a desktop. But all of these work together in this kind of harmony where they're all serving and retrieving files together. And then how they choose to store them on their individual systems is kind of up to them. Really, the important thing is, if somebody's asking for a CID, whoever has that CID is able to serve it to them in kind of the IPFS language, so to speak.

[00:06:02] LA: Let's talk about that. One of the aspects is that not all nodes have all the content. And so, some content is on some nodes. And it might be multiple nodes. But it's not on all the nodes within a network. What's the process that's used for taking a CID? And from that, figuring out the closest node or a node that has that content and can get the content for you? 

[00:06:27] MO: Yeah. It's a great question. You touched on something that I think is kind of important that I'll touch on before diving fully into it, which is that not all IPFS nodes on the network have the piece of content at the same time. And this is very different than blockchain-based systems, which you'll typically see used hand-in-hand with IPFS. A term I often use is I say IPFS is the peanut butter to blockchain jelly. 

What I mean by that is blockchains are really good at timestamping data. But they're never good at doing that with a lot of data. And then on the flip side, IPFS is very good at having immutable, you know, verifiable data, which is another thing that blockchains have as well. But it's much better at scale, right? And then the downside to IPFS is that it can't do timestamping. If you combine these two systems together, it creates a very powerful combination to kind of serve a lot of these Web3 applications.

Then getting back to your initial question here, you say not everybody has the content on the network. How do you find it? It's a great question. IPFS at its core was based on a Kademlia-inspired DHT. For those of you that aren't familiar with the DHT, it's what's called a distributed hash table for the longer term. 

And kind of breaking that down to a very simple way of thinking about it is a DHT is a bunch of peer-to-peer participants that are all kind of storing broken address books, if you will, for content that may exist on the network. Not everybody has the full address book. But if you ask around enough, you'll probably find somebody that does have the piece of content that you're looking for provided you ask enough people. Or if that piece of content is very well advertised on the network. That's the main way that IPFS nodes have historically advertised content. And, vice versa, people have been able to find content on the IPFS network. 
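The "broken address books" idea can be sketched with a toy model. The Python below is only an illustration under strong assumptions: every peer is known up front and lookups are done in one step, whereas a real Kademlia-style DHT discovers peers iteratively through routing tables. The names `ToyDHT`, `key_for`, and `closest_peers`, along with the peer and CID strings, are hypothetical.

```python
import hashlib


def key_for(text: str) -> int:
    """Map a peer name or CID string into a shared 256-bit keyspace."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest(), "big")


def closest_peers(target: int, peers: list[str], k: int = 2) -> list[str]:
    """Return the k peers whose keys have the smallest XOR distance to the target."""
    return sorted(peers, key=lambda peer: key_for(peer) ^ target)[:k]


class ToyDHT:
    def __init__(self, peers: list[str]):
        self.peers = list(peers)
        # Each peer holds only its own partial address book: cid -> providers.
        self.records: dict[str, dict[str, set[str]]] = {p: {} for p in self.peers}

    def provide(self, cid: str, provider: str) -> None:
        """Advertise that `provider` has `cid` on the peers closest to the CID's key."""
        for peer in closest_peers(key_for(cid), self.peers):
            self.records[peer].setdefault(cid, set()).add(provider)

    def find_providers(self, cid: str) -> set[str]:
        """Ask the peers closest to the CID's key who has the content."""
        found: set[str] = set()
        for peer in closest_peers(key_for(cid), self.peers):
            found |= self.records[peer].get(cid, set())
        return found


dht = ToyDHT(["peer-a", "peer-b", "peer-c", "peer-d", "peer-e"])
dht.provide("example-cid", "matts-laptop")
print(dht.find_providers("example-cid"))  # the advertised provider
print(dht.find_providers("unknown-cid"))  # empty set: nobody advertised it
```

Only two of the five peers store the record, yet anyone can find it, because both the advertiser and the searcher derive the same "closest peers" from the CID itself.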

Now, more recently, in the past year or so, there have been other mechanisms for discovering content. Namely, what's known as IPNI. Or cid.contact is an implementation of this. But you'll see kind of larger scale traditional key value lookup systems that are run by more centralized parties. But which do allow far more scale of announcing and then discovering content on the network. 

The way I like to think about that is very similar to how IPFS operates without content discovery is that things can be as decentralized or centralized as you want. And, often, it's a mix between the two. And you can have multiple systems working together to kind of make a balance of both decentralization distribution and also speed and efficiency. 

[00:09:21] LA: Let's talk more, then, about that distribution. In order for a piece of content to be in the IPFS, it needs to be stored on at least one of the nodes that make up the IPFS system. It could be stored on multiple nodes. How and who decides which nodes it's on?

[00:09:38] MO: Yeah. Yeah. Great question. IPFS is, again, a little bit different than blockchain for some of the people that might be coming from that ecosystem, which is, in the blockchain world, every node has every piece of data and every node has to have every piece of data. 

[00:09:54] LA: It's one of the problems with blockchain is it keeps it from scaling better because of that requirement.

[00:10:00] MO: Yeah. Exactly. Exactly. And IPFS was very intentional in doing it kind of almost the reverse. It gets that content immutability and that verifiability. But in order to reach those levels of scale and performance that are quite frankly expected of most modern systems, it intentionally makes storing of content an opt-in thing.

To add content to IPFS, you would either add it to a node that you are running yourself. Or you would utilize another service out there, such as Pinata, to host that content for you. And in doing so, you get to choose how that content is stored. How many times it is stored? And really kind of choose that level of decentralization, versus scale, versus performance. And find the mix that works best for you. To kind of summarize, there are as many replications of your content on IPFS as you want. And, ultimately, that's up to you as the owner of that content to decide how well-dispersed it is.

[00:11:07] LA: Oh. You as the owner decides which other nodes it goes to. Is that what you're saying? 

[00:11:13] MO: Sort of. Yeah. You can decide to put content on a node that you own. And this is a frequent on-ramp into IPFS. If I'm running a local version of IPFS on my computer, such as like, say, IPFS desktop. Or if we want to get a little bit more into cloud, we can be running a server on a modern cloud provider and run a version of Kubo, which is that Golang client I referenced earlier. And you could use its built-in API to add a piece of content to it. And then at that point, it's going to be discoverable on the IPFS network. 

However, a common problem that people run into is what happens if I shut my laptop off? Or what happens if my online server goes down? And that is where services, or pinning services as they're called, will utilize kind of a more dedicated approach to keep that content online 24/7. And those are often paid-for services. And many do have free plans as well. But that is kind of, think of it like your paid-for backup layer. Except in the case of IPFS, your backup layer can also be your performance layer. And it can be your, like, always-online layer. They'll work together in harmony. If the node that you're running on your computer is online, for example, and the pinning service is online, you have two copies that are existing simultaneously on the IPFS network.

And if we want to go back to asking for the content by its ID versus where it's at, if you go to an IPFS node or gateway, and we can touch on that after this, and ask, "Hey, get me this content," it'll find the content from the first place it can find it. That may be your laptop or it may be the pinning service. But the important thing here is that it can grab it from either of those locations in that same content-addressable way. If either of them goes offline, the content still lives. And this is a really important selling point for IPFS.
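The laptop-plus-pinning-service scenario can be sketched as a small simulation. This is a toy model, not real network code: the `Provider` class, the `fetch` function, and the provider names are all hypothetical, and a real node would locate providers through content discovery rather than a hard-coded list.

```python
from typing import Optional


class Provider:
    """A toy stand-in for any machine holding content (laptop, pinning service)."""

    def __init__(self, name: str, online: bool = True):
        self.name = name
        self.online = online
        self.blocks: dict[str, bytes] = {}

    def get(self, cid: str) -> Optional[bytes]:
        # An offline provider cannot answer, even if it has the content.
        if not self.online:
            return None
        return self.blocks.get(cid)


def fetch(cid: str, providers: list[Provider]) -> tuple[str, bytes]:
    """Return the content from the first provider that can serve the CID."""
    for provider in providers:
        data = provider.get(cid)
        if data is not None:
            return provider.name, data
    raise LookupError(f"no online provider has {cid}")


laptop = Provider("laptop")
pinning_service = Provider("pinning-service")
for provider in (laptop, pinning_service):
    provider.blocks["example-cid"] = b"hello"

print(fetch("example-cid", [laptop, pinning_service])[0])  # laptop
laptop.online = False  # the laptop lid closes
print(fetch("example-cid", [laptop, pinning_service])[0])  # pinning-service
```

The request never names a location, only the identifier, which is why losing one copy does not break retrieval as long as another copy remains online.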

[00:13:16] LA: Got it. You decide how many nodes it's on from the standpoint of nodes that you own. And you can put it on as many of those nodes as you want to. And that's either you own or you're renting a service that does that. Or however that is. 

[00:13:28] MO: Yeah. Yeah. 

[00:13:30] LA: But on the other side, if somebody - Taylor Swift comes on board and puts a piece of content on, it's going to be a very popular piece of content. Sorry if you're not a Taylor Swift fan. We'll think of some other name. Some very popular piece of content.

[00:13:42] MO: No. That's totally fine. Yeah.

[00:13:45] LA: And find a significant fraction of the people who are on IPFS actually want that piece of content. It can actually be distributed to other nodes and stored in other nodes beyond the ones that you control. Is that correct?

[00:13:58] MO: Yeah. Yeah. To an extent. I mentioned IPFS gateways earlier. And IPFS gateways play a very important role in the IPFS ecosystem, which is they provide an HTTP onramp for people that are looking to pull content from the IPFS network. It's not feasible for everybody that wants to participate in this kind of ecosystem to run a dedicated client on their machine at all times or have to install specialized software. 

And even in the case of some application developers as well, they may not want to make a dedicated daemon or implementation part of their software stack. They just may want to hit a URL endpoint. Gateways provide a really nice tool for that. What a gateway is, is it's an IPFS node. Many of them are running Kubo under the hood. And when somebody comes in and they say, "Hey, I want this piece of content," they do that by going to the URL of the gateway and then they just pass in /ipfs/cid. And then it gets passed through that proxy layer. And the IPFS node under the hood will go search the network, grab the content, and return it back to the user via that HTTP layer they're so used to. 
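For developers who just want to hit a URL endpoint, the path convention described above amounts to simple string construction. A minimal sketch follows; the gateway hostname and the CID string are placeholders, and `gateway_url` is a hypothetical helper rather than part of any library.

```python
def gateway_url(gateway: str, cid: str, path: str = "") -> str:
    """Build a path-style gateway URL: <gateway>/ipfs/<cid>[/<path>]."""
    base = gateway.rstrip("/")
    suffix = "/" + path.lstrip("/") if path else ""
    return f"{base}/ipfs/{cid}{suffix}"


# Placeholder gateway and CID, used only to show the shape of the URL.
print(gateway_url("https://gateway.example.com", "bafyexamplecid"))
print(gateway_url("https://gateway.example.com/", "bafyexamplecid", "images/dog.jpeg"))
```

The resulting URL can be requested with any ordinary HTTP client, which is what makes gateways a convenient on-ramp.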

Why do I mention that? Going back to your point of kind of the Taylor Swift analogy and caching the content on multiple nodes, every time a node pulls content from the network for whatever purpose they want, it could be they want to download it to their machine, or it could be some gateway that an end user requested the content through. It will often temporarily exist there at a caching layer so that it can be utilized again and again for faster content retrieval. 

Now, unless that content is pinned, which is the terminology we use for saving that content to an IPFS node, it will eventually get garbage collected. It's not a forever thing. But it is kind of a built-in caching layer for, like you mentioned, popular content. 
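The difference between cached and pinned content can be shown with a toy model of a node's storage. This is a sketch under simplifying assumptions, not how any real implementation manages its datastore: `ToyNode` and its methods are hypothetical, and real garbage collection is triggered by storage policies rather than a manual call.

```python
class ToyNode:
    """A toy node: everything fetched is stored, but only pins survive cleanup."""

    def __init__(self):
        self.blocks: dict[str, bytes] = {}  # cid -> content currently on disk
        self.pins: set[str] = set()         # cids the operator chose to keep

    def cache(self, cid: str, data: bytes) -> None:
        """Store content fetched on someone's behalf, with no promise to keep it."""
        self.blocks[cid] = data

    def pin(self, cid: str, data: bytes) -> None:
        """Store content and mark it as exempt from garbage collection."""
        self.blocks[cid] = data
        self.pins.add(cid)

    def garbage_collect(self) -> list[str]:
        """Delete every stored block that is not pinned; return what was removed."""
        removed = [cid for cid in self.blocks if cid not in self.pins]
        for cid in removed:
            del self.blocks[cid]
        return removed


node = ToyNode()
node.cache("popular-song-cid", b"fetched for a visitor")
node.pin("my-own-cid", b"content I care about")
print(node.garbage_collect())  # ['popular-song-cid']
print(sorted(node.blocks))     # ['my-own-cid']
```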

[00:15:59] LA: Yeah. That sounds like a traditional caching. But with the ability to pin means you can essentially take cached content and not let it expire is basically what that - 

[00:16:09] MO: Yeah. Yeah. Effectively. If you were to go out as an end user and you were, let's say, grabbing the newest Taylor Swift song and you wanted to pull it from one of those gateways for IPFS, that gateway would cache that content for a brief period of time. And that content would be more discoverable on the general IPFS network for a period of time as a result. 

If multiple gateways were all experiencing this, right, that piece of content would be very hot, very discoverable, and very performant for the most part. But those gateways themselves are very much likely not going to pin that content. This is where we talk about kind of that persistence layer that we touched on earlier, which is either your own computer. Or it could be a cloud server that you're running. Or it could be a pinning service that you are paying to keep that content online for you 24/7.

[00:17:04] LA: Got it. But it doesn't have to be either geographically or network connectivity near where the content was generated. It can be anywhere in the IPFS network. And you can pin it all over the place if you so desire. But, generally, it's only pinned to places that you, the owner of the content, would want it to be pinned.

[00:17:26] MO: Yeah. With IPFS, you're not able to tell other nodes to pin your content for you. You can only tell your own nodes that you control to pin that content for you. The exception is that - 

[00:17:36] LA: Can you tell them not to pin it? 

[00:17:38] MO: No. You can't. It's going to be - anybody can pin any content they want. And nobody can make you pin content for them. Pinning is a purely voluntary thing that you choose to opt into as a node runner. Or if in the case of a pinning service somebody is paying you to pin that content for them, at which point that is kind of more of a business transaction.

[00:18:01] LA: Right. Right. Yeah. The reason why I'm asking these questions is I'm thinking use cases. And I know we really haven't talked about, you know, what the valuable use cases are and some examples of it. But one that comes to mind is distributing public data without worrying about censorship. The ability to put information on the network. That even if your government asks you to take it down, you can guarantee or you can have a reasonable level of guarantee that the information will still be out there. That seems reasonable and possible with this sort of a network. But you need these additional pinners in order to guarantee the content stays around.

[00:18:40] MO: Yeah. It's an important topic of conversation. And to your overall point, yes, you do see IPFS utilized in that way in some cases. There is an important caveat, and that is going to be that any businesses that are participating in IPFS, such as pinning services that may pin content for their users, or gateway operators that are serving that content through domains that they operate, they are very much still responsible for following local jurisdictional law. Or else they put themselves in legal jeopardy. 

Yes, IPFS can be utilized to kind of keep important content online. But it is very important to keep in mind that the degree to which that is able to happen very much still falls under legal jurisdiction where applicable. If you care about keeping a piece of content online on the IPFS network, it is ultimately going to be up to you as an owner of that content to make sure that that content is pinned in places that will keep it online and that have jurisdictional law that aligns with your needs.

[00:19:50] LA: Yeah. That makes sense. It's not the anti-censorship OS that you see other people trying to get - trying to find and use for various purposes. It still is - it's as free from regulation as the rest of the internet is, which basically means not very free.

[00:20:09] MO: Yeah. Effectively. I mean, IPFS, the CID layer of it, right? You can ask for content based on the content itself versus where it's at. If somebody does have that content, you can get it. It's just somebody has to have that content. If you are wanting to take that risk, then that is something that you can do. But, again, it's - 

[00:20:33] LA: But it is a risk. It's not designed for that use case.

[00:20:37] MO: Yeah. It's not designed - I always tell people, IPFS is not - the goal of it is not censorship-evading. Or it's not getting around legal issues. It's more making sure that important pieces of content don't go offline due to, say, dead links, or server outages, or something like that. And that this content-addressable system is able to kind of stay resilient regardless of the faults or limitations of one provider. 

If we go back to what I initially said is IPFS can be as centralized or as decentralized as you want. Ultimately, it's up to every individual that's participating in the ecosystem. How distributed they want their content? Or how performant they want their content? If you're optimizing for performance, you're going to want to choose something like a pinning service in most regards to get that content to your users as fast as possible. But if you are optimizing for resilience, you want to pin that in as many places as possible. 

And I want to take a brief moment to kind of mention here that many pinning services, such as Pinata, are often not a great place to store content that you're looking to kind of evade censorship with. We as a service, as I mentioned, have to abide by our local laws, and any sort of malicious content that gets uploaded to us is automatically flagged, taken down, blocked. Same thing with piracy. Same thing with really any content that can get us in trouble. We've had to invest significantly in systems that protect us from this type of content as it does create a domain risk for us. And you'll see many other IPFS providers have invested in similar detection systems.

[00:22:31] LA: Yeah. Okay. That makes a lot of sense. And that actually answers a different question I had too, kind of the opposite of the anti-censorship, is, you know, if someone puts content on the network that you don't want to have on. You own the copyright of it. Someone else put it on. Whatever. Is it possible to get that content off the network? And the answer really is probably the same as it is with the rest of the HTTP internet is that, yes, because of local laws and jurisdictions. But that's not a surefire guarantee that all copies go away.

[00:23:05] MO: Correct. Yeah. That's very correct.

[00:23:06] LA: Okay. Let's talk about the advantages of IPFS over just straight standard client-server HTTP sort of content distribution. And I think a lot of it is focused around the identification of the content itself. Is that a fair statement? Do you want to go into that in a little bit more detail? 

[00:23:26] MO: Yeah. Yeah. It is. On the use case side there, like you mentioned, it's all around that CID. And one of the reasons why IPFS has found itself to be so powerful from the blockchain ecosystem, which is where you'll see a lot of these use cases, is you're able to effectively kind of break free from those blockchain limitations we talked about earlier from like the data size standpoint. 

And what may be a 5-megabyte image that would be very much financially infeasible to store on a modern blockchain, you can kind of get around that by utilizing IPFS. Many people are probably familiar with NFTs. And what NFTs will do is, rather than host an entire image on the blockchain itself, they will instead utilize an IPFS pointer. And the form they'll typically take is it'll look like ipfs://cid. Kind of similar to a URL but a little bit more designed for IPFS.

And then all of the providers in the ecosystem kind of have this collective understanding of, "Oh, if I see an IPFS protocol URL in a smart contract, I know that I need to go grab that content from IPFS itself versus try and find it on-chain." What that allows is all of these application developers and kind of that Web3 blockchain ecosystem to have this immutable, time-stampable data that is much larger than what otherwise would have been feasible on a blockchain just by itself.

[00:25:08] LA: That makes sense. Now the way you describe, it would be useful if browsers supported IPFS natively. Is that a fair statement? In other words, just treat IPFS as another protocol like HTTP. And it would make those sorts of use cases of just popping up content when you see a URL that looks like an IPFS URL versus an HTTP URL, it would make those use cases much smoother. Do you see that as the future of IPFS? Or do you see gateways as more the future? 

[00:25:44] MO: I see it as a little bit of both. There are browsers right now that utilize kind of built-in IPFS implementations inside of them to actually recognize and pull content from IPFS when it is noticed. Brave is a good example of this. You can go to the Brave browser and you can - if you put in an IPFS URL, it will automatically detect that and pull it using a built-in implementation.

And there are a few other browsers that do this. And then most of, if not all of the major browsers have an IPFS companion tool that operates as an extension that you can install and kind of get similar behavior. This is super powerful for people that want to kind of bring it, you know, to the - kind of get to the bare metal layer and do things themselves. The hobbyists, I would say. Or the people that really care about decentralization. 

And it's a great fit for them. And we are really happy that that use case exists. Because as we mentioned earlier, sometimes domains can be taken down. And it provides people with the ability to still participate in this ecosystem if they are under a more - we'll call it strict jurisdiction. Or we've even seen this from a firewall perspective is some firewall providers at local ISPs, if they ever decide to block anything, an IPFS URL or a gateway domain, you can still access content. 

But speaking of gateways, to go back to your question, do I see one or the other? I think it's both. You'll often hear that people will say IPFS is meant to replace HTTP. And I don't really think that's the case. That's really never been the case or the stance that Pinata's taken as a company. We're very much for operation in harmony. Kind of choosing the right tool for the right problem. 

HTTP is great for a lot of things. And we're not here to say anything different. But we also think IPFS is great for a lot of things. And on that same note, we think that utilizing IPFS gateways is a fantastic way to onboard people from the traditional web ecosystem into the world of IPFS. 

The beautiful thing is, no matter what gateway you want to use, if that gateway goes down, you can just use another one and it'll work the exact same way. We have some open-source software that we've written on our end called IPFS Gateway Tools, which is a great example of this, which you as a developer of your application, you can choose what IPFS gateway that you want to use for fetching content. 

Let's say that you encounter an IPFS link or even a full gateway link that uses somebody else's gateway, we can automatically replace that with the gateway that you want to use and pull content via a gateway that you control and that you trust. We think that gateways are a very powerful tool that kind of provides the best of both worlds, if you will.
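The kind of rewriting described here can be approximated in a few lines. The Python below is an independent sketch of the idea and does not reproduce the API or behavior of Pinata's IPFS Gateway Tools. It handles only `ipfs://` links and path-style gateway links; subdomain-style gateway links and query strings are ignored. The function name and all hostnames are placeholders.

```python
from urllib.parse import urlparse


def to_preferred_gateway(link: str, preferred: str) -> str:
    """Rewrite an ipfs:// link or a path-style gateway link to a chosen gateway."""
    parsed = urlparse(link)
    if parsed.scheme == "ipfs":
        # ipfs://<cid>/<optional path>: the CID lands in the netloc position.
        cid_and_path = parsed.netloc + parsed.path
    elif parsed.scheme in ("http", "https") and parsed.path.startswith("/ipfs/"):
        # https://<some gateway>/ipfs/<cid>/<optional path>
        cid_and_path = parsed.path[len("/ipfs/"):]
    else:
        raise ValueError(f"not a recognized IPFS link: {link}")
    return f"{preferred.rstrip('/')}/ipfs/{cid_and_path}"


mine = "https://my-gateway.example"
print(to_preferred_gateway("ipfs://bafyexamplecid/dog.jpeg", mine))
print(to_preferred_gateway("https://other-gateway.example/ipfs/bafyexamplecid", mine))
```

Because the CID is the same no matter which gateway serves it, swapping the host does not change what content is being requested.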

[00:28:48] LA: That's cool. And so, I see for static content, it really can replace a lot of what HTTP is doing. But it doesn't require replacing it. It actually more supplements it is a better way of describing it.

[00:29:01] MO: Yeah. It's an excellent way of thinking about it. The old example we constantly use is I imagine a lot of listeners to this podcast will probably be very familiar with a dead GitHub link. And that's not fun. Right? I've been there hundreds, if not thousands of times. And, ultimately, the reason that's the case is because GitHub doesn't use something like IPFS or content addressability. It uses that traditional server-based, path-based methodology. And that's where kind of IPFS comes into play is if you ever want to link to something that you want the link to kind of stay persistent, not change over time, you're getting exactly what you expected, IPFS is a great use case for this. 

And you start to see this. There's a lot of examples of people kind of building almost like Git-based IPFS systems and using IPFS in this way. It's made for content that you want people to, again, just know exactly what they're getting. If I'm clicking on an old GitHub link, even if it's not 404'd, the content may change. They may have updated it. I'd like to know that. But I don't know when that link was put there. It's kind of hard to verify that stuff. And IPFS provides a really powerful primitive for just ensuring you're getting what you're expecting.

[00:30:23] LA: Right. Right. Let's talk about non-static data. Simple use cases of a piece of content that was updated and redistributed. Is there any way in IPFS to relate those two pieces so you know that this is an outdated version of the content? This is a newer related version of the content. I know you can get to both of them because they have unique IDs. But is there a way to correlate them in that sort of way so you know which one is the most recent version?

[00:30:54] MO: Yeah. There are a few things that have attempted this. And it kind of depends on what layer of the stack you're looking to operate on. For those people that are looking at the URL, HTTP layer, there's something called DNSLink, which utilizes gateways. And I can do softwareengineeringdaily.com and I can point that to an IPFS CID and I can update that using DNS records. But underlying, that's still using DNS in kind of a very similar way. It is a nice tool. But it's still kind of utilizing a lot of those traditional Web2 technologies, if you will.
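To make the DNS-record idea concrete: a DNSLink entry is a TXT record, conventionally published under a `_dnslink` subdomain, whose value has the form `dnslink=/ipfs/<cid>`. The sketch below parses such values from an in-memory dictionary instead of performing real DNS queries. The domain, the CID strings, and the function names are placeholders for illustration.

```python
def parse_dnslink(txt_value: str) -> str:
    """Extract the content path from a DNSLink TXT value like 'dnslink=/ipfs/<cid>'."""
    prefix = "dnslink="
    if not txt_value.startswith(prefix):
        raise ValueError("not a DNSLink record")
    target = txt_value[len(prefix):]
    if not target.startswith(("/ipfs/", "/ipns/")):
        raise ValueError(f"unexpected DNSLink target: {target}")
    return target


def resolve(domain: str, txt_records: dict[str, list[str]]) -> str:
    """Look up _dnslink.<domain> in a dict that stands in for DNS."""
    for value in txt_records.get(f"_dnslink.{domain}", []):
        try:
            return parse_dnslink(value)
        except ValueError:
            continue  # skip unrelated TXT records
    raise LookupError(f"no DNSLink record for {domain}")


# Simulated DNS zone: updating the site means editing this one record.
zone = {"_dnslink.example.com": ["some-other-txt-value", "dnslink=/ipfs/bafyexamplecid-v1"]}
print(resolve("example.com", zone))  # /ipfs/bafyexamplecid-v1

zone["_dnslink.example.com"] = ["dnslink=/ipfs/bafyexamplecid-v2"]
print(resolve("example.com", zone))  # /ipfs/bafyexamplecid-v2
```

The mutable part lives entirely in DNS, which is the tradeoff noted above: the name can move, but you are trusting whoever controls the DNS records.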

There is something called IPNS, which is the InterPlanetary Name System, which was meant to solve this problem. And what that does is it allows node operators to kind of publish a CID. Except it's not really a CID. It's more of a pointer to a CID on the IPFS network. And they can update things from there.

This works for the most part. But there are a lot of performance issues, which have historically come from that. And at the time of this podcast, I'll be honest, I'm not exactly sure where those performance standards lie. From our perspective, from my personal perspective, I've often steered people that if you're wanting to work with mutable content, you're often just best using something like S3 in my personal opinion. Because a lot of the tradeoffs of something like IPNS, you still have to rely on somebody running a node. You have to trust that person that's running that node to get that content.

And if you're updating a pointer there, you may be doing it on kind of a peer-to-peer network. But you're losing a lot of those immutable benefits that IPFS comes with. If you're giving up those benefits, oftentimes, for many use cases, I see that it's better for people to just use something like S3, which is really good at what it does. Right? I'm not here to say that S3 is a bad product. Because, ultimately, it's done incredible things for the world. It's just kind of using the right tool for the right problem. 

[00:33:06] LA: That makes sense. And I think that the next logical question that comes out of this, and I think it's a very similar answer to that last answer you just gave, is what about fully dynamic content? Is there any role for IPFS with fully dynamic content? Or is that just no? That's a case where you want to use a different tool. 

[00:33:25] MO: Yeah. It depends on two things. There's two cases that I'll break that into. And the one is people that aren't using a blockchain. To them, I say I would honestly just use S3 or a similar traditional server-based system. 

However, if you are utilizing a blockchain, which has that immutable ledger of changes, IPFS actually becomes a good complement there for that dynamic content as - are you familiar with smart contracts, Lee? 

[00:33:56] LA: Yep. Yeah, I am.

[00:33:57] MO: Perfect. For those people that are listening that may not be, effectively, a smart contract is think of it as code that lives on a blockchain that you can execute. And blockchains are really good at being time-stampable ledgers. And you may be familiar with them in the context of, say, cryptocurrency where money is going back and forth between some wallets on one big global monetary ledger. Smart contracts kind of break that down to the computer layer. So, you can write some code. 

And as things happen on-chain, there's a record of that that exists permanently on that blockchain. You can see a historical change of things that have happened on that smart contract basically since it was created. And for your purpose where you say dynamic content, what you can do here with a smart contract is you can have a pointer in that contract that points to a piece of content on IPFS. And then if you need to make an update to that, you can. Granted you need the code, the smart contract to be able to update it. But if you do make that choice and kind of plan for that, you can update that content that people are going to find when they're interacting with your smart contract. And you'll see this in a variety of different use cases. Almost too many to count. 

But when they do update that piece of content, the important thing there is that that immutable change record exists on-chain. For your listeners that are familiar with Git, which I assume many of them are, think of how Git keeps that change log of your code as it's changed over time. Blockchains do a very similar thing. And for people that are wanting dynamic content with IPFS, that acts as a change log where you can see, "Okay, here was the content at this point in time. Here was the content at this point in time and this point in time." It's a good match for that use case.
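The pointer-plus-history idea described above can be sketched in a few lines of Python. This is only a toy model that runs in memory, not real smart contract code: the class name `ContentPointer`, its methods, the owner check, and the placeholder CID strings are all hypothetical and chosen for illustration. On an actual blockchain, the chain itself would provide the permanent record that the `history` list imitates here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ContentPointer:
    """Toy stand-in for a smart contract that points at content on IPFS."""

    owner: str
    # Append-only record of (block_number, cid). Old entries are never edited,
    # which imitates the immutable change record a blockchain would keep.
    history: List[Tuple[int, str]] = field(default_factory=list)

    def update(self, sender: str, block_number: int, cid: str) -> None:
        """Point the contract at new content; only the owner may do this."""
        if sender != self.owner:
            raise PermissionError("only the owner may update the pointer")
        if self.history and block_number <= self.history[-1][0]:
            raise ValueError("block numbers must increase")
        self.history.append((block_number, cid))

    def current(self) -> str:
        """Return the CID that users would be directed to right now."""
        if not self.history:
            raise LookupError("no content has been set yet")
        return self.history[-1][1]

    def at_block(self, block_number: int) -> str:
        """Return the CID that was current as of the given block."""
        found = None
        for recorded_at, cid in self.history:
            if recorded_at <= block_number:
                found = cid
        if found is None:
            raise LookupError("no content existed at that block")
        return found


# Hypothetical usage: the CID strings are placeholders, not real CIDs.
pointer = ContentPointer(owner="alice")
pointer.update("alice", 100, "placeholder-cid-v1")
pointer.update("alice", 250, "placeholder-cid-v2")
latest = pointer.current()
earlier = pointer.at_block(120)
```

Each piece of content stays immutable under its own CID; only the pointer moves, and every move leaves an entry behind, much like a commit in a Git log.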

[00:35:52] LA: That makes a lot of sense. And so, when you think about security, I'm trying to get an idea for what IPFS thinks about security and its role in keeping content secure and safe. Now there's a couple of aspects to security. There's security from the standpoint of guaranteeing that it's not changed by somebody else. And IPFS, I think, we've shown with the whole idea of how pointers work or how CIDs work is pretty good at that. 

But what about security from the aspect of keeping content invisible to other people? It's not very good from that standpoint other than from the idea that CIDs themselves are hard to guess and the security associated with that. What are your thoughts on security in the world of IPFS? Where does security fit into this whole model? 

[00:36:43] MO: Yeah. There's two things I'll touch on there. Towards the end, you mentioned finding content by the CID. And, ultimately, IPFS is an entirely public network by design. If you're wanting to work with private content, we really don't recommend IPFS as a solution for that. It's just not the right tool. And I don't think it will ever be the right tool for that. Because in order to be the right tool, it would have to make a bunch of tradeoffs that kind of degrade it for all the other things that it's good at. 

But on the malicious content side of things, I touched on this a little bit earlier, which is, ultimately, it's kind of up to you as the retriever of that content to trust where it's coming from. You can trust that the content is not changed. But, I mean, similarly to random files on the internet, most people, it's advised that they don't go downloading random things on the internet that they're not sure what they are. Similar thing applies to IPFS. 

And we as Pinata and like other IPFS ecosystem providers, we do maintain lists of malicious content, of illegal content. And we do prevent our systems from serving those pieces of content. In a sense, that offers some protection layer for end users. But we still, as a general rule, recommend don't go downloading things when you're not sure what they are or where they're coming from.
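The two ideas in this answer, that a retriever can check content was not changed and that a provider can refuse to serve listed content, can be sketched roughly as follows. This is an illustrative assumption, not how Pinata or any IPFS gateway is actually implemented. The names `content_id`, `FilteringGateway`, and `verify` are hypothetical, and a plain SHA-256 hex digest stands in for a real CID, which wraps the hash in additional encoding.

```python
import hashlib
from typing import Dict, Set


def content_id(data: bytes) -> str:
    # Simplified stand-in for a CID: the identifier is derived from the bytes.
    return hashlib.sha256(data).hexdigest()


def verify(cid: str, data: bytes) -> bool:
    # The retriever re-hashes what it received and compares it to what it asked for.
    return content_id(data) == cid


class FilteringGateway:
    """Toy provider that stores content and refuses to serve denylisted IDs."""

    def __init__(self, denylist: Set[str]) -> None:
        self.blocks: Dict[str, bytes] = {}
        self.denylist = set(denylist)

    def add(self, data: bytes) -> str:
        cid = content_id(data)
        self.blocks[cid] = data
        return cid

    def serve(self, cid: str) -> bytes:
        # The provider-side check: listed content is never served.
        if cid in self.denylist:
            raise PermissionError("content is on the denylist")
        return self.blocks[cid]


# Hypothetical usage with made-up bytes.
gateway = FilteringGateway(denylist=set())
dog_cid = gateway.add(b"dog picture bytes")
unchanged = verify(dog_cid, gateway.serve(dog_cid))
```

Note that the hash check only proves the bytes match the identifier. It says nothing about whether the content is safe, which is why the denylist and the retriever's own judgment still matter.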

[00:38:13] LA: Yeah. It's security due to the fact that the main nodes that make up the network, like you guys, care about security. Not security built into the protocol itself.

[00:38:26] MO: Yep.

[00:38:26] LA: Well, let's move on to a different topic but very much related and something I know you're very much connected to. And I want you to talk about how this fits into the whole IPFS blockchain ecosystem, the whole Web3 ecosystem in general. And that is the social media service Farcaster, which I know you guys are involved with. Why don't you start out by telling me what Farcaster is and how it relates? 

[00:38:55] MO: Yeah. Absolutely. Farcaster is, to quote one of their founders, a sufficiently decentralized social media layer. What they mean by that is Farcaster is building a social network and a social protocol layer that is not going to be as fully decentralized as something like, say, a traditional blockchain. But it's going to be performant enough for the average user to utilize. And you can see this through the many applications that are building on top of it. 

We as Pinata, we've always tried to be pragmatic about how we approach things. We always viewed IPFS as a great intermediary between fully decentralized and fully centralized. It's kind of like pick and choose as much as you want. And the same concept applies very much to Farcaster. 

Farcaster, you'll see a centralized team is building it. They are making clients for it. But the really amazing thing about Farcaster is that it's built so that all data on it is public and kind of ingestible by any developer that wants to build on that ecosystem. 

Why do I bring this up? It's in the context of a lot of the social media services that you've seen nowadays, you've seen instances where we find out that users don't really own their data. And, oftentimes, that data is just being either sold off or locked behind very expensive paywalls as we've seen recently with many social media services trying to kind of prevent those AI systems from scraping all of their data. 

Farcaster takes a little bit of a different approach here. And it's that that data is kind of open for everybody and it's intended to be that way from the ground up. And what this has allowed is rather than Twitter owning the only client that you can interact with Twitter with or Reddit running the only app that you can interact with it with, you're seeing this almost kind of explosion of applications and builder tools on how to interact with the Farcaster ecosystem and do really cool things in the Farcaster ecosystem. 

One of the things that recently was released was called Frames. And it kind of sounds simple. But, I mean, it's effectively an app in your timeline, which, why didn't we think of that before? But it's kicking off some really, really creative things that a lot of the developers that are coming from this Web3 ecosystem really embrace. And we're just excited to kind of be a part of it. 

We are offering our IPFS tooling as part of that ecosystem. But you don't have to use IPFS to be a part of it. And it's ultimately up to you, right? It's how centralized or decentralized you want to be. And, again, going back to kind of just choosing that for yourself, Farcaster allows for that. It allows the user to be in control of their data. It allows applications to be in control of what things they want to support and what they don't want to support. It's kind of just building this base ground social layer that anybody can participate in, which is really exciting for us.

[00:42:16] LA: The time-based record of a blockchain, the immutability of large quantities of data of IPFS, and the social network and interconnections of Farcaster. Those three things can kind of work together as needed to solve whatever problems a particular application developer wants.

[00:42:35] MO: Yeah. Yeah. And so, another way to look at it is Farcaster is kind of another extension in the Web3 toolkit ecosystem. Blockchains were very good from the monetization layer. IPFS solved a really important role at the file layer. And then what we're seeing with Farcaster is Farcaster is kind of providing that next layer of social interactivity or like messaging between people if you will. And interactions between people themselves. Not just applications. But like the people themselves. 

A lot of those primitives that I talked about just before, such as blockchains or IPFS, they're finding themselves being natively part of this Farcaster ecosystem where people are able to do things like interact with blockchain applications from their Farcaster timeline through one of these frames. Or they're able to load content from the IPFS network into their feeds. And it's all just kind of working together in this trinity if you will. I guess to kind of fully summarize is we're seeing Farcaster as kind of like this next building layer for the Web3 ecosystem.

[00:43:52] LA: Cool. That actually sounds pretty exciting. I've been trying to think about what base technologies do we need for Web3. And I'll be honest, I didn't initially think of social network as one that would fit into that. Obviously, distributed file system, blockchain atomic recording was an important part. But I never thought that the social network was part of that fundamental core. But it really is. It's an important part of the core that's going to make up the Web3 changes that are coming.

[00:44:25] MO: Yeah. And, ultimately, Farcaster, at its core, is a protocol. The primary use case of it right now is the Farcaster social media network, if you will. 

The way we're seeing things moving is that that's pretty rapidly getting extended upon to other layers and other types of applications. I'll shill out our VP of product here. He's making something called ReadCaster, which is - have you heard of like Goodreads before? It's kind of very similar to that where people are able to post books, and review them, and find similar interests. And that's just one example of what can be built here. It could be Twitter-esque things, Reddit-esque things. 

But what I mean by like a social layer, it's a layer where the human user gets to interact with things, and play with things, and basically communicate with other people on the network in whatever format best fits them and whatever format kind of takes off to support that. 

If the Warpcast client, which is a popular client for Farcaster, kind of supports that Twittery, Reddity experience, then on the flip side, maybe ReadCaster supports that find-the-books-that-you-love type of experience. And I'm sure there's going to be many more examples I can speak to here.

[00:45:52] LA: This is going to be an exciting next 3, to 5, to 10 years as Web3 really begins to mature here.

[00:46:01] MO: Yeah. Yeah. We're extremely excited about it on our end. Yeah, just the general energy that we've seen from that ecosystem right now is reminiscent of kind of some of the past years we've seen in the Web3 ecosystem. There's a lot of excitement. A lot of really smart people trying to do some really cool things. And I can't help but be excited about that.

[00:46:21] LA: That's cool. Well, thank you very much. This has really been a great conversation. I appreciate your time, Matt, and talking to us. My guest today has been Matt Ober, who is the co-founder and CTO of Pinata, an onramp to the InterPlanetary File System and Farcaster. Matt, thank you for joining me today on Software Engineering Daily.

[00:46:43] MO: Yeah. Thank you so much for having me here. I enjoyed the conversation.

[END]