EPISODE 1790

[INTRODUCTION]

[0:00:01] ANNOUNCER: OpenTofu is an open-source alternative to Terraform, designed for managing infrastructure as code. It enables users to define, provision, and manage their cloud and on-premises resources using a declarative configuration language. OpenTofu was created to ensure an open and community-driven approach to infrastructure tooling, and it emphasizes compatibility and extensibility for diverse deployment scenarios. Cory O'Daniel is the CEO of Massdriver, and he's a founding member of OpenTofu. Malcolm Matalka is a Co-Founder at Terrateam, and he's a founding member of OpenTofu. They join the podcast to talk about the OpenTofu Project.

This episode is hosted by Sean Falconer. Check the show notes for more information on Sean's work and where to find him.

[INTERVIEW]

[0:01:00] SF: Cory and Malcolm, welcome to the show.

[0:01:02] COD: Yeah, thanks for having us.

[0:01:03] MM: Yeah, thank you.

[0:01:04] SF: Yeah, I was super jealous of both of your setups. You got a lot of band equipment going on. I just got a bedroom setup. Not that anybody can see this on the audio, but if you could see this, I'm definitely the person that's lacking in background to Cory here.

[0:01:17] COD: I mean, if you want, we prepared an alternate show. We can actually do Stairway to Heaven and just do a jam sesh, if you want.

[0:01:25] SF: Well, we'll see if things start to get stale, we might have to go that direction.

[0:01:29] COD: No stairway. Denied.

[0:01:32] SF: All right. Well, so we're talking OpenTofu today. The project's a little over a year old. Could we just start off with a bit of background on the project, in terms of what it is, how it got started, how both of you got involved? Maybe Cory, you want to jump that off?

[0:01:46] COD: Yeah, so it was a fateful, I think, Thursday or Friday morning. I hopped on Reddit to see what people were mad about today. Lo and behold, the license has changed for a number of HashiCorp products. It's funny, being in the OpenTofu community. Malcolm and I are buds, despite the fact that we're competitors. I've got friends at Spacelift, despite the fact that we're competitors. I immediately reach out to one of my buddies at Spacelift. I'm like, I send him a link, I'm like, "Did you see this?" He's like, "Yeah, I saw this. We're putting together a Google Meet later today."

We got six, seven, eight people got on a call from all the different orgs, like day one. We just talked through what that meant for each of our businesses, what that meant for the community. We decided, looking at the landscape of IAC, every single major IAC tool is owned by either a venture-backed company, or a public company. There's none that's actually truly open source, owned by a foundation, not completely being run by one org. We decided to put our efforts into figuring out if it was plausible to do a fork of a project as big as Terraform and went on from there. I mean, there's a lot more detail, but that was the early gist of it.

[0:03:01] MM: Yeah. I mean, so that's early moments. Terrateam is a pretty small company. I don't know, when was Massdriver founded, Cory?

[0:03:08] COD: Oh, I think 2021, but I swear all the time that Spacetime doesn't work the same for me. I'm just not good with time and dates. I think it was 2021.

[0:03:17] MM: I think we're all the same age in terms of companies, but different sizes, different backing and all that. At least from my perspective, what was really wild was everyone just jumped into working together from there. I don't even want to say it was a shared enemy in a sense, but it was more like a shared positive goal in that we thought that this should all be open and that's the right choice for people. We all just got together, worked together, like really no drama or anything like that on our side. It just happened.

Cory and I met through that. We had our first podcast date together shortly after that. It's to the point where I think, I consider Massdriver and Terrateam sort of sister companies, like you have sister cities. He's the Toledo, Ohio to our Toledo, Spain.

[0:04:03] SF: Nice. I like that analogy. You get this Avengers of IAC together on a meet call. You all agree that there needs to be, essentially, a truly open-source version of this. What happens after that? How do you actually go? I mean, there's a ton of companies, ton of project ideas that essentially die at the stage of that idea. It takes a lot of execution from there to actually turn it into something. What happened next?

[0:04:29] COD: Yeah. I think the thing that happened to us immediately is one org forked Terraform pretty early. What was wild is like, right? I mean, I think a lot of times, we talk about star counts and whatnot and it is definitely just a fluffy number. What was interesting is as soon as that fork happened, the count of people following it started going up pretty significantly. I don't know, are people starring it so they can follow the drama, or are they starring it because they're interested in the project, right?

I feel like, if you're upvoting something on Reddit, you're probably there for maybe the drama, see what's going on, but starring something, you're not going to get that drama feedback, right? A lot of people say like, yeah, people were jumping in early on because they wanted to see what was going on. Like, we got our first piece of drama in the IAC space, right? I don't know. To me, it was like, no, these are probably people that are interested in seeing a project like this be open source, right? I mean, BUSL is not an open-source license. Then it was essentially making the manifesto. We wanted something a little bit more than just some stars. We wanted companies to actually come out and be like, "Yes, we will support this. We think this is a good idea." We tossed our names on there first, because we wanted to show them that we were in on it. We put what we were going to try to give to the project.

We just started getting people, like literally the amount of people that started showing up and modifying that HTML page on GitHub, sending pull requests, that was our immediate first problem was, okay, this already is a lot of people to manage just for this one HTML page. We do have a feat in front of us, but then it was immediately trying to figure out like, okay, how do we go about this? How do we get a good lead for the team? What does this actually look like? What's the right foundation for it? That was the next steps was taking that from just this idea that had a little bit of a spark in GitHub actions to how do we start to get people actually involved and invested in what we're doing?

[0:06:17] MM: Yeah, the interest was really palpable too, not just from individuals, but other companies. Some of them, because they use Terraform at that point and really felt it was the right thing to do. Others, just because they want the infrastructure world to be more open, more open source, and they just want to support that. We had support from pretty much every direction, every angle, every size organization as well.

[0:06:41] SF: What does supporting the project entail? What do you gain from it by supporting them?

[0:06:48] COD: I'll let you answer first, because my answer is very different from everybody else. I'll let you go first.

[0:06:56] MM: All right. The way it's organized at this point is various companies have pledged full-time employees. Then they hire those employees. I think it works a little bit different for each organization. The rough idea is you are not really a part of that company, you're getting paid for it, but you're working on OpenTofu. In that sense, it's just giving money to this organization. Everyone works more with Linux Foundation. As a small organization, our support is less monetary and person time and more about almost marketing, talking about it, providing feedback. 

We have customers using it, so we bring that feedback to the devs on there and give them some thoughts on that and how people are using it in the real world. What we get out of it is just a good tool, a tool that's getting better, a tool that actually responds to what the users are interested in more so than necessarily what a particular organization is interested in.

[0:07:53] COD: Yeah, I think from our side, so mine's a little spicier. Massdriver is not a Terraform Cloud competitor, right? A part of this whole change of the license was the FAQ attached to it. The FAQ is just a laundry list of this obviously not being thought through. When they have to add a changed timestamp to a single question, like you guys didn't think this one through at all. One of the cutouts -

[0:08:18] MM: Still updating it too, man.

[0:08:20] COD: I mean, because people are emailing that licensing at HashiCorp and they're getting new questions. I ask if people are getting new bills, right? For us, we actually fell in this tiny little cutout in the BUSL, where if you're using the tool, but not a direct competitor to the service, you can continue using it. Massdriver competes more with their Waypoint product. We're not a direct competitor to Terraform Cloud. We joined a little bit out of spite, honestly, and we've done a bit of writing about this. You can find it on the Massdriver blog. When we saw this happen, hearing that bit about the community not giving back, there's a bunch of companies taking from what we've built. It's like, dude, this is open source. That's the point of the thing. I've watched videos of the HashiCorp founders talk about this exact thing, the importance of this being open source and being owned by the community and not by a company.

To me, this was very obviously, I called it very early. I'm like, they're getting acquired and they're trying to capture IP. Before the IBM thing hit, pure speculation at the time. We joined out of A, this needs to be open source, but B, I felt honestly a little backstabbed. Community isn't just adding something to the repo. Community is, I've sold millions of dollars of Terraform Cloud as a consultant. I've come into companies and said, this is what you need to use. That's how I've been treating that. I've built modules. I've convinced teams that that is the technology to use. I've done my part helping build this thing over the past 10 years into here. No, no, no, no. You haven't made a single commit to the Go repo. It's like, that wasn't cool. That's why we got involved in it. It was like, this was just a pure passion play for us, because we're like, this was not a cool move. This does need to be open-source technology, whether we compete with them or not, we want to be a part of this. Honestly, I saw the end of Terraform as they made that decision. I think to date, they're still putting nails in their coffin around this project.

[0:10:09] MM: The thing you really have to appreciate about Terraform as well, which is different from the other tools that came in through that license change is Terraform really lives and dies by the community contribution in terms of providers, modules, tooling around it. You have all this other thing. You really have to think of Terraform more like a language, like Python or C, and all these different implementations exist. C or whatever is not really valuable in that the language exists. It's valuable in that you can get all these libraries to accomplish a task, or that the implementations are robust enough, you can write production software. That's really where the value is.

Terraform in particular, I think, feels really negative on the community that are providing all this other value to make it useful to everyone else in the world, beyond just going to this Go code base.

[0:11:02] SF: Right. Yeah. Sticking with that analogy of programming languages, there hasn't really been a programming language that is a closed source programming language in a long time. Even languages like C# that were historically closed source are now open source, because essentially, I think everyone recognized that a language alone within its own ecosystem only has limited value, because it's really about other people contributing, essentially, to the greater ecosystem, to your point about the community contributions that have been made to Terraform historically. In terms of OpenTofu itself, if I'm interested in investing my time into, I want to do infrastructure as code, how do I think about OpenTofu even outside of the sourcing, but just from a feature perspective versus things like Terraform, or CloudFormation, or any of the other available options on the market?

[0:11:57] COD: Yeah. I mean, as far as investing in it, I think, honestly, when you look at the state of CD report and you see the adoption of all the tools that are out there, Ansible by far is still the most widely adopted tool, but I think it's for a different era and different problem that you're trying to solve. When you look at the things for configuring and managing cloud resources, Terraform is by far the leader.

Now, the catch of this is there's still a lot of companies that haven't even adopted infrastructure as code, and that's a huge existential problem for all of us. Our data is in their software. Thinking of OpenTofu versus some of these other tools, you got the CloudFormations of the world, you got the Bicep, which is the Azure equivalent. I think Google used to have cloud connector, cloud config, and they're doing a lot of Terraform now, it just generates the Terraform. I think that what's nice about where Terraform is, is it has become the lingua franca of how you manage all of these APIs at scale.

The thing that sucks about the CloudFormations and the Biceps, they're great technologies if you're 100% in AWS, but almost everybody is hybrid cloud today, whether they realize it or not. As soon as you reach out and grab Snowflake, guess what? You're a hybrid cloud. You're running in two places. You need to reproduce environments. You need to reproduce configs. That's where you immediately end up in this place, where, as a team that may be trying to adopt infrastructure as code, if I lean into a CloudFormation, I've backed my future self into a corner already, and I don't even know it yet. There's very few tools that are out there that allow you to manage any cloud easily, and you can make your own provider super easy, too. You see the Pulumis and the, oh it's Upbound's. I always forget Upbound's, Crossplane. Also, great tools, also both derivatives of the original Terraform providers. Thinking about some of these cloud-specific tools versus the things that are built on the Terraform provider ecosystem, to me, I'm just like, any of my customers, I'm like, that's a bad choice. It's going to back you into a corner. You need to use Pulumi, or Terraform, or OpenTofu, or Upbound's thing. I already forgot the name again.

I think that's key for just having a team that's nimble and agile and can support all the things they need to. Now, why OpenTofu over the rest is, again, I'm just saying this as a hypothetical. You don't know when an organization is going to be forced to change a license, and this keeps happening again and again and again and again and again in our industry. I'm honestly over it. It's happened with Mongo, happened with Terraform.

[0:14:17] SF: Redis.

[0:14:18] COD: Redis kerfuffle. It just keeps happening, and it blows up in the faces of operations people, DevOps platform engineers. We don't have enough time, we don't have enough budget, and then these license things explode in our face. The idea of something that is built on this provider ecosystem that can work with any cloud and can be extended to support any cloud that is not owned by anybody besides the foundation, a technical oversight committee, and a governance body. That is the most risk-averse decision I think any operations, or DevOps engineer can make when they're starting to take the IAC journey.

[0:14:51] MM: Yeah. I think you really can't stress the importance of OpenTofu being part of a foundation enough, because yes, some people cynically look at it, say, "Oh, those engineers are being hired by these other providers." But the reality is it's the foundation all that work happens in, and it's the foundation organizing all that, and the foundation has principles and I think even bylaws that it has to go by. OpenTofu can never become closed source, or source available only. It has to stick with its license. I think that is just really valuable, given, as Cory said, the prevalence of Terraform-style code in OpenTofu.

I mean, we talk a lot about people managing their cloud resources, but you can manage your GitHub configuration with it. You can manage your PagerDuty configuration with it. It really covers all of this, and you can put all of this in one tool, and that's super valuable, and knowing you can depend on it in the future in perpetuity is super valuable, too.

[0:15:44] SF: Yeah, and also, Cory, to your point, by using something, whether it's Terraform, or OpenTofu, but with OpenTofu you have the advantage that it's not going to be company controlled, you're avoiding that vendor lock-in that you could get with the CloudFormations of the world, because you might be all in on AWS today, but for whatever reason sometime in the future, you got to go multi-cloud, or you start doing workloads on Snowflake, or something like that, where you end up in this essentially hybrid system.

Do you think there could have been a world where the fork of this, you open sourced it, but it just didn't take off. It didn't get the love that it needed to be successful. What do you think you did, or happened to avoid that chain of events? 

[0:16:30] COD: I honestly, I think people were just hungry for it. I don't think it's anything we did. I think it was that there was a group of people, not a single org. I think that was the key, and the fact that we're actually seeing progress, right? We got the fork in quick. We immediately had to start building up a lot of stuff. There was a lot of criticism on Reddit and LinkedIn and whatnot in the early months of like, "Oh, you guys are forking this. Where are you?" It's like, well, we're dealing with all the legal fallout of forking something.

There's not a lot of legal fallout from just hitting that fork button on GitHub until you piss off a bunch of lawyers, right? Then there's legal fallout. There wasn't just a BUSL change. They changed the licensing for the Terraform registry. If you're not using that Terraform user agent anymore, guess what? You're in violation of the Terra - If you curl, if you curl something, you're in violation of the license for the Terraform Registry. We had to build a new one, right? Day one, it's like, oh, hey, we want this thing, we've got to build a registry. I think that right there showed the dedication of the teams that were involved. We said, okay, well, before we can do this thing, which we're going to do, we're going to go build an open-source registry for all the modules and providers.

That wasn't us just waving the flag of ah, we got our own Terraform. That was us saying, we have to do this work so that this project's even valid in the first place. I think just people seeing that dedication, I think is what got them excited about it and got people to start investing. Now, the other thing that's interesting here is a lot of the early criticism of what we're doing is like, oh, this license change only affects eight to 12 companies. That was, I don't know if I can cuss on here, but that was a big old load of cow poopy, right?

I'm personally friends with multiple people in FAANG that have pinned to 1.5.7, because they're not sure what to do. They're technically a cloud, HashiCorp is getting bought by IBM. That makes it very, very, very complicated as to whether or not they're allowed to use it internally. Almost every single CNCF project started gutting HashiCorp stuff immediately. There was not just us doing this, but you were seeing a lot of organizations that weren't affected by the BUSL change being affected by the BUSL change, right?

I think the engineers adjacent to that are also seeing is like, oh, this is bigger than what HashiCorp is saying it is. It's not eight to 12 companies. This is actually causing a ripple effect everywhere. I have a lot of feelings about that, so I'll shut up. I think that was it. It was the dedication and seeing these ripple effects across the industry, not just across 12 companies.

[0:18:56] SF: Malc, do you have anything to add?

[0:18:58] MM: Yeah. I think, again, I totally agree it wasn't anything that we did. It was very obvious that as soon as this happened, there were people that were interested in an open-source solution. I think HashiCorp just made it worse on themselves, too, with the license having an FAQ that kept on changing. If you have any litigious company, anyone with serious value at stake, it's just too much unknown there. Then you have someone saying, "Hey, we're a foundation. It's open. Come use us and you don't have to worry about it." It's very compelling for anyone who's just not sure what the future holds.

[0:19:27] SF: Yeah. Then, how does the community help drive which features get built? You forked it, you had to work through all these details, you're getting to feature parity, but now it's its own thing, right? How do you continue to evolve it and how does community get factored into that?

[0:19:45] COD: Yeah. The roadmap is, I think, mainly two things. One, it is maintaining that compatibility with Terraform, right? Our release cadence is a little behind. They'll release 1.10. Our 1.10 comes later. We have to do a lot of reverse engineering for - for anybody who doesn't believe, we do reverse engineering. I'll tell you what, we'd be releasing this shit much faster if we weren't. Unfortunately, we have to. I know. It's just like, oh, you guys are just copying shit. It's like, dude, we'd have 1.10 out the same day. It's like, we got to reverse engineer the APIs to get that compatibility in there. Yeah, it's just a little bit of thought.

Then, it's also, I think the more important part of the roadmap is the RFC process. That right there, that is the real roadmap of where OpenTofu goes. It's like, we have this support that we're going to have for Terraform for the foreseeable future. Then we have our RFC process. The importance of that is there is stuff in OpenTofu that as a CEO of a company that's in the infrastructure space that I don't think should be in there. But guess what? Doesn't matter what I think. It matters what the community thinks, right? That roadmap being shaped by people opening RFCs, having engagement around it, that's community, man. If somebody opens an RFC and pitches an idea and they don't write a single line of code, I consider them a part of the community.

It's extremely disappointing that HashiCorp doesn't, right? To me, that's it. That's the roadmap is that compatibility, that confidence, that risk aversion of it's going to work with what you have today. Cutting it over, we cut 120 customers over by changing a single word in our code base. Done. Nothing ever broke. Right? That guarantee right there, plus what goes in here is what the community wants. That's the roadmap. That's what people are after.

[0:21:26] MM: Yeah. I actually went and talked to the OpenTofu devs before coming on here saying, "Hey, is there any future vision you want me to share, or where the project is going?" The answer was just, we do whatever the community wants. If you want to be involved in that, go and vote on things and go and write RFCs. That's it.

[0:21:42] SF: How do those get prioritized? What is the process, essentially, for what - you have to come up with your stack rank, essentially, because you can't do everything. How do you decide what you're doing? How does that process work?

[0:21:52] MM: There's a few paths. One is if something exists already, anyone can vote on it through GitHub. That way, it tracks. It's a one-to-one relationship there. There's not a lot of gaming that. It's a mix of a vote. Then, you have to have a developer go in and be like, "Is this reasonable to do? Is it reasonable to do in a time period where we want to release this?" That sort of thing. Really voting gets stuff to the top and at least evaluated as soon as possible. Before that point, though, if you have an idea, you can submit an RFC, which is just a GitHub issue. That will get back and forth from some of the community. Martin is, or may - I've submitted two RFCs, I think. Within two days, it's just full of posts from Martin, I think, just going over it and blasting out ideas, because he's so full of ideas. Then that goes through a box, once it gets more finalized, it goes into a point where it can start being voted on.

Really, yeah, if you're interested in something, go vote on it, comment on it. If you submit a pull request, that is absolutely going to get something going faster, because the things that get voted on are what the devs in OpenTofu directly work on. If you have an idea and you say, "Hey, I did the pull request. Can someone review it?" That'll also get it in there fast as well.

[0:23:02] SF: Okay. Then can you tell me a little bit about state file encryption? Am I correct that that feature is a deviation from Terraform? Was that the first feature that went out that was a new one? I know there's a lot of them today.

[0:23:17] MM: Yeah. That came in the premiere release.

[0:23:20] COD: Yeah. That one's important too, right? If you're using any of our tools, like A, it's important to secure. It might have a password or whatever. Honestly, if you're using any - if you're not running an open-source tool yourself for managing your state, your Terraform, or OpenTofu runs and you're using something like env0, Harness, Massdriver, Spacelift, Terrateam, Terramate, anybody, you don't want us to know your usage details of the cloud, right? People are always worried about the passwordness, but the ability to sell the usage data and the information behind the scenes, encrypting that is more than just encrypting some secrets. It's a really important one. That was one that sat stagnant and open for years, and there was excuse after excuse as to why it couldn't come to be.

It's like, well, it was pretty easy. I mean, for eight companies that never contributed to anything, we got it out pretty quickly. It's pretty easy, right? It's like, it wasn't as hard as it was made out to be, or impossible to integrate into systems that are managing state like it was made out to be. Just, do you want that data visible or not? I feel like, was the real question. Somebody was like, "Yes, we want to be able to see it." That was one of the first ones. It's an awesome feature. It's very easy to use. It's driven by a variable, so you can pass in that key for encryption. There's a lot of different mechanisms for decrypting and rotating keys and whatnot.

I'd say, if you're flipping to OpenTofu today, it's easy to just throw in a GitHub action for a plan and see that it's all working. Now the one thing, the one little bit of lock in that you'll get as soon as you bite on that is security. If you want to go back to Terraform, guess what? Your state's in the air.

[0:25:00] SF: In terms of the encryption key management, is that done through a cloud KMS?

[0:25:05] COD: Yeah, there's a whole - geez, I can't remember exactly how many drivers there are behind the scenes, but there are a whole host of them.

[0:25:11] MM: Yeah, there's three major ones right now. One of them is just you present the key to OpenTofu, however you got it on your own, and it does it there. Then there are GCP and AWS ones as well.

[0:25:25] SF: Okay. So, I can bring my own KMS if I want, but you can also keep your line.

[0:25:30] MM: Also, you can go back and forth between an encrypted state and an unencrypted state. If you decide the key management is not where you are right now, or is not important enough to you, you can always go back after you've played with it a little while. That's one thing that's really great about the OpenTofu mentality is there's also, we have migration docs and then switching out is easy enough as well. There's never this point where you switch over to OpenTofu and then you're stuck with it.

[0:25:55] SF: You mentioned this earlier, Cory, the OpenTofu registry and how that was one of the first big challenges of the fork. Can you talk about the process to build the registry, how you went about discovering and versioning modules and providers within the registry?

[0:26:11] COD: Yeah. I mean, the module spec is pretty well defined. You've got to go around the Terraform site a bit to figure out how it works. Under the hood, it's mostly just redirects. Most of what you see in the Terraform registry is just mappings back to a Git repo, right? We just had to go through Terraform, figure out how that worked. Then it was like, okay, well, how are we going to host this thing? Who's going to run it? That was another concern is we didn't want necessarily like, oh, Massdriver will dedicate the infrastructure now in the registry. I mean, I would have loved that as a business owner, right? It was a great little thing to get on the bottom of the site and a bunch of information to get at the same time. We don't want, again, any company having information about the download rates of these modules and providers. It's important that A, that registry was open source so people can see how it works. B, that we get it hosted by somebody besides us.

That may have been our "big first partnership." I'm throwing that in air quotes with, I don't want to say the wrong name. Cloudflare, right? Sorry, I mix up them and one other company, Fastly, all the time. I think that was the first big partnership, right? It's funny to look at it, but the amount of thought and passion for the community and how they will perceive it goes into this. I think it's one of our big problems is we haven't done a great job of talking about that passion that we have for building that trust. It would have been extremely easy for any of us to have made just a registry and thrown some compute at it. I've got more AWS credits than I can deal with. Massdriver could have thrown credits at it. The importance of that being in the public eye and being owned by the foundation and hosted by somebody that's not one of us was also key. That is a part of the decision-making process. I think that's one of the things that builds this trust, even though we don't make a stink about it. I honestly, I think we, again, I think that's where we haven't done a great job is talking about like, thinking through that process and making sure that it works for everyone outside of the organizations that are involved.

[0:28:08] MM: Yeah, definitely. I mean, independence is so important for OpenTofu. I don't think we do as good of a job as we should to show it's really, yes, there's money coming from various companies to pay for these employees, but it's really independent in every other way.

[0:28:23] SF: In terms of populating and keeping the registry up to date, did you run into any rate limit challenges with the GitHub APIs?

[0:28:32] COD: I don't think so. The registry isn't - it's not something that is updated super-fast, or it's not like Google indexing the web all the time. It's really more of a batch process. The GitHub rate limits are actually pretty high as long as you identify yourself.
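As an illustration of the "mostly just redirects" description of the registry above, here is a minimal Python sketch of an index that maps a module address to a Git repository and turns a version into a Git-style source string. It is not the OpenTofu registry's actual code or data format: the `ModuleAddress` type, the `INDEX` contents, the `example-org` repository URL, the `v` tag prefix, and the shape of the returned string are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleAddress:
    """Hypothetical three-part module address: namespace/name/target system."""
    namespace: str
    name: str
    target_system: str


# Hypothetical index: each registry entry only points back at a Git repository.
INDEX = {
    ModuleAddress("example-org", "network", "aws"):
        "https://github.com/example-org/terraform-aws-network",
}


def resolve(address: str, version: str) -> str:
    """Translate 'namespace/name/system' plus a version into a Git source string.

    The registry in this sketch stores no module code; it only redirects the
    caller to a tagged ref in the upstream repository.
    """
    namespace, name, target_system = address.split("/")
    repo = INDEX[ModuleAddress(namespace, name, target_system)]
    return f"git::{repo}?ref=v{version}"


print(resolve("example-org/network/aws", "1.2.0"))
```

Because the index holds only pointers, refreshing it can be a periodic batch job over upstream tags rather than a continuous crawl, which fits the batch-process description in the answer above.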

[0:28:48] SF: You're continuing to use HashiCorp's configuration language. Are there considerations, or future plans around enhancing, or evolving HCL specifically for OpenTofu?

[0:29:00] COD: I'm not positive. I think that would be, I mean, honestly, somebody on the OpenTofu team submitting PRs to HCL, I could see that if we need something, or if we find a bug. I would honestly hope that in the idea of community that they would accept that. I mean, I don't think that we're going to diverge from HCL. That's maintained its license. I think changing that is a very nuclear option. I think HCL and the providers both fall under that umbrella of like, if those licenses are changed, the ripple effects there are catastrophic. I think we can just depend on that for the foreseeable future. I mean, if we have to make changes to HCL, we'll obviously submit PRs to it. Hopefully, that goes through.

[0:29:45] MM: Yeah. If you go to OpenTofu on GitHub and you look at the project, you can see what's planned for the various releases. They're usually one or two out. If you look at what's limiting in the existing Terraform world, and now Tofu, it's really not language specific. It's functionality. A big one coming out now is dynamic providers, which let you say, "Hey, I want AWS across all of these different regions. I want to make a new provider for each one. I just want to reference it like you would a dictionary in Python, or something like that." All of that fits well within HCL. It's really about the semantics of how you interpret that HCL and then how you turn that into an actual plan and all that. Really, I think most people are content with what HCL can do. It's just about getting more functionality in there.
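
The dictionary comparison can be made concrete in plain Python. This is only the analogy Malcolm describes, not OpenTofu syntax. The region list and the helper names `make_provider` and `provider_for` are invented for illustration.

```python
# Hypothetical set of regions an organization deploys to.
REGIONS = ["us-east-1", "eu-west-1", "ap-southeast-2"]


def make_provider(region: str) -> dict:
    """Stand-in for one provider configuration scoped to a single region."""
    return {"type": "aws", "region": region}


# One provider per region, built from the list rather than written out by hand.
providers = {region: make_provider(region) for region in REGIONS}


def provider_for(region: str) -> dict:
    """Look a provider up by key, the way you would index a dictionary."""
    return providers[region]
```

The point of the analogy is that adding a region means adding one entry to the list, not copying another provider block.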

[0:30:37] SF: In terms of some of the deviations that you've made from Terraform, we talked about state file encryption, but what are some of the other changes that you made?

[0:30:46] COD: Yeah. There's a bunch. I can go through a couple of them. One of the things that I want to point out first, before talking about the rest of them, is another one of these ones that's important, right? As Malcolm mentioned, not only are there guides for how to get into OpenTofu, but also for how to get out if it's not the right thing for you. One of the things we introduced fairly early on is the .tofu extension. The idea being that Terraform will only touch a .tf file. If you want to start using OpenTofu-specific functionality, you can put it in a .tofu file. OpenTofu reads all the files, right? So, you can actually have your plans running side by side, and Terraform will only see the Terraform stuff. Tofu will see the entire thing.

That's also important for adoption. It's important for easing that onboarding story for an ops team that's just overwhelmed with a million things to do. I can start trying out OpenTofu without just absolutely blowing up all my pipelines and having to do all this painful stuff. It's just another place where it's like, we don't want to make this hard for anybody to adopt, because we know that these teams already have enough stuff on their plate. But the state file encryption, obviously, super cool.
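
The side-by-side behavior can be modeled with a small Python sketch. It assumes the file-selection rules as they are commonly documented for OpenTofu: Terraform loads only .tf files, while OpenTofu loads .tf and .tofu files and prefers a .tofu file over a .tf file with the same base name. JSON variants and override files are left out, and the function name and filenames are hypothetical.

```python
from pathlib import PurePath


def files_loaded(filenames: list[str], tool: str) -> list[str]:
    """Return the configuration files each tool would read, sorted by name."""
    paths = [PurePath(name) for name in filenames]
    if tool == "terraform":
        # Terraform ignores .tofu files entirely.
        return sorted(str(p) for p in paths if p.suffix == ".tf")
    if tool == "tofu":
        # Assumption: a .tofu file shadows the .tf file with the same base name.
        shadowed = {p.stem for p in paths if p.suffix == ".tofu"}
        loaded = []
        for p in paths:
            if p.suffix == ".tofu":
                loaded.append(str(p))
            elif p.suffix == ".tf" and p.stem not in shadowed:
                loaded.append(str(p))
        return sorted(loaded)
    raise ValueError(f"unknown tool: {tool}")


# Hypothetical module directory during a gradual migration.
module_files = ["main.tf", "network.tf", "network.tofu", "encryption.tofu"]
```

With that directory, Terraform would see `main.tf` and `network.tf`, and OpenTofu would see `encryption.tofu`, `main.tf`, and `network.tofu`, so OpenTofu-only features can live in the .tofu files without breaking an existing Terraform pipeline.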

We talked about provider defined functions, but one of the other things I think is so rad, I'm a big, big, big, big, big fan of using Terraform, OpenTofu now, obviously, as building operational abstractions for my engineering partners. When you look at the cloud, every API is half ops, half dev, right? I don't like teams having to conflate these things in their heads. I like building really abstractful modules. One of the things that landed was we have a Go and a Lua provider. You can actually just toss in a main.go, or a Lua file, and you can actually write Go that will now execute as a part of your OpenTofu module executing, which allows you to start doing some really interesting expressions. To build out logic, so you can actually codify your processes in, right?

I'm already seeing things where, like, okay, based off of the types of data and the team, it'll hit an API and pull that team's cost center, right? Something dumb, something silly, but who remembers their cost center ID and remembers to enter it? Being able to hide that away from engineers was just like, you just get what you want and your cost center stuff's going to be in there. We don't have to worry about associating costs for bigger projects. It's done for you. You're able to do that because you have this richer programming language that can sit behind the scenes, but still in this declarative model. That was one that, when I saw it, I was like, oh, my gosh. If you see my Terraform code, I just have local blocks. They're just huge. There's just loops and maps and all this logic that I would try to build into locals. Now I'm just like, oh, I can just write some Go for my complex expression. That was one that I was stoked about. Malcolm, go. I don't know what Malcolm's favorite one is, but I was dancing the day that landed.

[0:33:39] MM: My favorite is actually ridiculously mundane, but I like it because I think it opens up a lot more automated functionality. That is early variable evaluation. One of the things that you do is say, you're referencing a module, you used to have to hard code the entire path to the module in your HCL. But now you can make that a variable, possibly environment variable, or environment file, which has a very simple syntax, and reference that variable in the module source path. What's that let you do? Well, now you can do very easily more dependabot type things, where it's really easy to automatically update these versions now.

If your organization updates a module, you can feed it through your entire organization automatically with very little logic and very little smarts associated with it. I really like that one. It's one of those ones too that was just on the HashiCorp issue tracker forever. It was like, no, no, no, no, no, no, no, no, no, no. Then OpenTofu is like, "Oh, it's done. Here it is. Have fun."
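
As a minimal sketch of the Dependabot-style update this enables: once the module version lives in a variables file with simple `name = "value"` lines, an automated job only has to rewrite one line. The variable name `network_module_version` and the file contents are hypothetical, and the regular expression handles only plain quoted string values.

```python
import re


def bump_variable(tfvars_text: str, name: str, new_value: str) -> str:
    """Replace the quoted value assigned to `name` in a tfvars-style text."""
    pattern = re.compile(
        rf'^(\s*{re.escape(name)}\s*=\s*)"[^"]*"', re.MULTILINE
    )
    # Keep the `name = ` prefix and swap only the quoted value.
    updated, count = pattern.subn(
        lambda m: f'{m.group(1)}"{new_value}"', tfvars_text
    )
    if count == 0:
        raise KeyError(f"variable not found: {name}")
    return updated


# Hypothetical variables file referenced from a module's source or version.
original = 'region = "us-east-1"\nnetwork_module_version = "1.4.0"\n'
bumped = bump_variable(original, "network_module_version", "1.5.0")
```

Because the change is a one-line edit to a data file rather than a rewrite of HCL source, it needs very little logic to apply across many repositories.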

[0:34:41] SF: Yeah. I mean, a lot of these things sound, on the surface, fairly simple, but make people's lives so much better. You're removing that death by 1,000 cuts situation that you tend to run into with some of this configuration. In terms of using infrastructure as code, OpenTofu or otherwise, how do teams typically go about debugging and error handling infrastructure changes?

[0:35:10] COD: It's hard. There's not great tooling for it across the board. Many teams, they have a staging version and their prod and they'll just deploy it and yeah. But the thing that sucks about all infrastructure is none of your configuration matters if the apps that are running on top of it don't work. You can stand something up and be like, oh, OpenTofu apply was green. I can see the thing in the cloud and it's healthy, but until you put an app on it and verify the functionality that you're trying to get to happen, none of that matters. There's levels to testing. It's like, hey, does the code validate? Is the code valid? Does it apply?

A lot of people that are newer to infrastructure and cloud sometimes think, "Oh, it applied. Everything is good." It doesn't mean you didn't have an outage for 45 minutes while it was applying. That stuff is just very hard to catch. A pattern that we use internally at Massdriver, how we do our own, is we define use cases for each piece of infrastructure, like what it's going to do. There's Terraform and OpenTofu test now that'll test your code, but the Terragrunt team made Terratest. You write all your tests in Go, but you can stand up other things besides your infrastructure.

You can run it. It'll do your Terraform apply. We'll do things like, hey, we made this queue and we're actually going to push data into it with Terratest and then pull data off of it just to make sure: is the IAM right? You can make a queue all day long, but if the IAM isn't what you expected it to be, that app's going to fail when it goes into prod. We do a lot of rigorous testing from the developer view of that infrastructure world. It's more rigorous, but we don't have a lot of outages and we build libraries for all the different things that we want to do, so it makes it real nice and reusable. I think that's one of the best ways to think about it. I mean, the number one cause of outages today is configuration changes.
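
The push-then-pull check can be sketched in Python. This is not Terratest (which is written in Go) and it talks to no real cloud API. `FakeQueue` is an invented in-memory stand-in whose `allowed_actions` set plays the role of the IAM policy, so the sketch shows only the shape of a use-case test.

```python
from collections import deque


class FakeQueue:
    """In-memory stand-in for a cloud queue guarded by a permission set."""

    def __init__(self, allowed_actions: set[str]):
        self.allowed = set(allowed_actions)
        self.messages: deque[str] = deque()

    def send(self, body: str) -> None:
        if "send" not in self.allowed:
            raise PermissionError("send denied")
        self.messages.append(body)

    def receive(self) -> str:
        if "receive" not in self.allowed:
            raise PermissionError("receive denied")
        return self.messages.popleft()


def queue_round_trip_works(queue: FakeQueue, payload: str = "ping") -> bool:
    """Use-case test: can the app identity both publish and consume?"""
    try:
        queue.send(payload)
        return queue.receive() == payload
    except PermissionError:
        return False
```

A queue created with only the send permission would still show up as healthy after apply, yet this check fails, which is the gap between "it applied" and "the app can use it."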

Maybe we should be thinking a little bit higher than just, did it apply green? Did it exit zero? How does it function from the intended use case? We use those use cases as drivers for our test suite. We'll say like, hey, we support PostgreSQL in three versions internally. We have a development version, which is a single zone, fairly straightforward. We have development serverless, if we're spinning up something for a preview environment. And then we have a production config that does our multi-zone and stuff like that.

Our test for our production one will nuke a zone. It'll nuke one of the instances in one of the zones to make sure that we're still able to read and write. That right there, the developer view of the world, the operational day two of the world, is the test that I want to assert, not simply, did it make a PostgreSQL? That doesn't matter. Can I use a PostgreSQL? Does it have the resiliency that as an ops person I'm trying to guarantee? You have to reach outside of pretty much any IAC tool to do that.
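
The zone-failure assertion has roughly the following shape, sketched here in Python against an invented in-memory `FakeCluster` rather than a real PostgreSQL deployment. Replication is modeled as naively copying every write to every surviving zone, which is an assumption made purely so the example runs.

```python
class FakeCluster:
    """Toy database cluster with one replica per availability zone."""

    def __init__(self, zones: list[str]):
        self.replicas: dict[str, dict] = {zone: {} for zone in zones}

    def nuke_zone(self, zone: str) -> None:
        # Simulate losing every instance in one zone.
        del self.replicas[zone]

    def write(self, key: str, value: int) -> None:
        if not self.replicas:
            raise ConnectionError("no replicas available")
        for data in self.replicas.values():
            data[key] = value

    def read(self, key: str) -> int:
        if not self.replicas:
            raise ConnectionError("no replicas available")
        return next(iter(self.replicas.values()))[key]


def survives_zone_loss(cluster: FakeCluster, zone: str) -> bool:
    """Day-two test: can we still read and write after a zone disappears?"""
    cluster.write("before", 1)
    cluster.nuke_zone(zone)
    try:
        cluster.write("after", 2)
        return cluster.read("before") == 1 and cluster.read("after") == 2
    except ConnectionError:
        return False
```

Under this model a multi-zone cluster passes and a single-zone cluster fails, which mirrors the difference between the production and development configurations described above.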

[0:38:09] MM: Yeah. I think you really have to appreciate one thing about IAC, or infrastructure as well is that it corresponds to a physical thing in the world. One of those physical things can only exist at one point in time. I know it's very convenient for us to think about infrastructure as code, like software, but software we generate an artifact and we can run tests against it. If those tests pass, we can deploy it to our infrastructure and it'll act the same in those two cases. Dev infrastructure will never be and can never be the same thing as your prod infrastructure. Even if everything works great in dev, you don't really know if it works in prod.

If you are doing tests, just be aware of that and understand what you're going to get out of doing tests in dev and what places they will, or won't apply to in prod. It's just a slightly different mentality in how all of that works.

[0:39:03] COD: Yeah. I'll spare everyone the details, but a really good example of what was just said is, I have a buddy who works for a big public company and they have hit the IP limits of Kubernetes in AWS. You can't test that, right? That is a very hard thing to test in an environment, right? The reality is like, yeah, when you get to prod, prod's prod, baby. Like, prod is going to do what prod's going to do. We can do as good as possible to test it, but you're never going to get prod. Prod is the one and only source of truth. No matter what anybody tells you, prod is the source of truth. What's happening there is happening there. To get an exact replica of what you have requires you standing up all the services and getting the traffic, right? You might have something fail because of the amount of ingress that you have. That stuff's hard to test.

[0:39:52] SF: Yeah. Well, given that the number one reason for outages is configuration issues, and I think something like 70% of data breaches, or leakage of information, is due to some misconfiguration that led to overprivileged access. I think, Cory, you mentioned this earlier, that most organizations still today are not using these infrastructure as code approaches. Why is that? What is the blocker to adoption of this approach, given that it can help solve some of these, or help reduce the risk of some of these challenges for organizations?

[0:40:26] COD: I want to make sure Malcolm has time to answer this, because I think we have very different user bases. I think we both have very interesting opinions here. At Massdriver, where we sit in the space, we have a lot of people that use the platform that bring their own IAC, but a whole bulk of our customers do not have infrastructure as code. I actually get to talk to these folks that are starting to make that decision and they're like, "Hey, which IAC tool should I use when I start using Massdriver?"

The biggest thing, time and time again, is how little as an industry we invest in our operations teams and our DevOps teams. Our platform teams, I think, are getting a bit more love. When we don't invest in those teams - any ops person that's listening to this, you are the force multiplier in your business. It's just waiting to happen. When you invest in those teams, that's where you get that 10x engineer, right? If you think about software and infrastructure as a manufacturing line and the product's at the end, ops and DevOps is at the beginning. If that's shitty and not working well, the whole thing's wrecked. If that's performance tuned, the world is amazing.

When we don't invest in these teams, they don't have the time to start looking at tools. They're literally just dealing with fires all day, right? I look at Reddit. I know Malcolm's there too, because I see him on the same threads, but r/devops, r/Terraform, r/kubernetes, you see these people discussing what's going on in their business, and it's harrowing to read. I was like, "Oh, my God." You forget how good you have it sometimes when you're at an org with well-established DevOps principles. Many of these companies are just like, their ops teams, they're just underwater. They don't have the time. It's like, "Hey, why aren't you guys doing infrastructure as code?" It's like, "Dude, I work 60 hours a week. I hate my job. I can't work another 80 hours a week."

That's the thing that sucks: if you put in that little bit of effort into automation and reproducibility, it pays dividends, right? I think it's all about just finding the time. I think the one thing that we've been brutally bad at as an industry, through the ops to DevOps transition, is marketing what we do within a business. I remember one of my first startups, it was a video streaming platform in 2008. Imagine that. 2008, video streaming platform in the browser. Our CEO walked by one day and was like, "I don't understand what you do here." I was like, "You know how this thing has never been down? That's what I do." That was the moment that I realized, it was like, I work my ass off.

I remember, very early on, we had 5,000 people watching a video stream of the Mythbusters on our platform, and the thing stayed up. I was just like, I didn't realize that we were that good. I thought this thing was going to fall apart. This person that was literally two seats away from me was like, "I don't know what you do." I never talked about it, right? I think that on these teams, we are these people that maybe might be a little less social. We're definitely less boastful. Especially versus a front-end engineer. Sorry guys, I love you guys, too. But I think we don't do enough to show the importance and the impact that we have in businesses. That's why a lot of these teams get loaded with all this debt. They get loaded with so much death by 1,000 cuts all day long. They just don't get the time to embrace that original idea of DevOps, that Kaizen, that continuous improvement.

If you don't have the time to do it, you just don't have the time to do it. That's where you got to find is you got to find that small project, that little thing that has a big impact in your business, whatever it is that's going to take some of the stuff off of your plate, so you have the time to start investing in it. That's what we see almost every single time that somebody comes to Massdriver and they're like, "We haven't picked an IAC tool yet. We haven't started doing IAC. How do we even get started?"

I think the other thing for them that is also a part of this death by 1,000 cuts is, okay, we're going to adopt IAC. Well, guess what? The business has been around for eight years and we have about 40,000 things in the cloud. How do we even go backwards? That itself is harrowing, right? I think the right answer is don't - just pause on that. Because that is a huge endeavor and it is painful. What do you have coming up next that you can start to use IAC for that's net new? Great. Now, you can reproduce it. Now, you can understand it, right? That's going to buy you time. Let's do that again. Then when we get to the point where we've got a really good established baseline for how our business works around infrastructure as code, now let's go look at the big bag of stuff that we've got to bring in. Looking at everything you have today and saying, that's how we're going to start? That's a death knell. You're doomed.

[0:45:03] SF: Yeah. I mean, I think ops probably needs to hire a new PR firm. I mean, the big challenge, I think, of a lot of the back-end work that happens is it's always easier for organizations to justify clearly customer-facing applications, right? If it's front end, I don't need to understand the engineering work that goes into it to see that, okay, this is actually touching one of my customers. I have no idea about the magic black box that's happening behind the scenes and who does what there and why it's important and stuff like that. It's really on, I think, engineering leaders in the organization to make those business cases. Anyway, Malcolm, over to you. What are your thoughts on this?

[0:45:42] MM: Yeah. Our user base, I would say, we tend to work a lot with people who are somewhat established, but deciding to transition into IAC, or they're up and coming and maybe they just created an ops team, or something like that. A lot of what we find is it's really just this cultural inertia of I'm used to clicking around the UI, or we want to cut to more of a structured workflow, but half of our team just doesn't want to get their credentials revoked and they'll be really upset about it if they do, so we got to ease into it.

A lot of what we see is just definitely cultural, and even for the ones with buy-in from above, it's just getting the rest of the team in on it. Especially in the larger organizations that decided to adopt IAC, they can have groups in other countries and on different time zones. Once you revoke those credentials and someone's really upset, you don't want to deal with that in the morning, or you maybe have a language barrier, or a cultural barrier, so that it's hard to explain what's going on there. It's fractal, right? We were talking about the Reddit posts, and you have some people coming in just saying, "Hey, I'm trying to do this, but I can't figure out how to do it." If you're converting to IAC and you don't have someone who can whip up those examples really quick when someone is, for example, trying to set up their networking layer, the person is just going to get frustrated and click around and be like, "Fine, it's done. I figured it out. I did it in half a day." Their manager is like, "Great. That's awesome." Rather than spending three days working through how all the providers work.

It just takes a lot of time. That's why I think what Cory said about just start the new things and how you want it to be and go from there is absolutely fantastic advice for anyone doing this. It's just too big of an elephant, if you look at all the things you have right now.

[0:47:33] SF: Yeah, you don't have to boil the ocean. You can walk around, essentially. Well, Cory, Malcolm, this was awesome. Thanks so much for being here.

[0:47:40] COD: Yeah, we appreciate it.

[0:47:42] MM: Yeah, thank you. It was awesome.

[0:47:43] SF: Cheers.

[END]