[00:00:00] LA: Tyson, welcome to Software Engineering Daily.

[00:00:03] TK: Lee, it's always a pleasure. Thank you so much for having me. I'm really excited for our conversation today.

[00:00:08] LA: I'm glad you're here too. This is what, the second time we've gone through this process. Once for my other podcast, and then now we're doing it here. So, I'm glad that we can catch up and maybe make some enhancements from our previous one.

[00:00:23] TK: Likewise. Can't wait.

[00:00:25] LA: Cool. Cool. Let's start out with complexity. Given the complexity of IT solutions, human error can play a huge role in the quality of a cloud-based infrastructure. It can dramatically impact how your IT infrastructure systems work in the cloud.

So, infrastructure as code solutions in general, things like Terraform, et cetera, are tools that can help reduce the role of human error, or at least identify when it occurs. How are we as an industry helping to reduce the cost of human error in cloud-based infrastructures?

[00:01:11] TK: I think that when you look at infrastructure as code, when you look at kind of the promise of Pulumi and Terraform, and even the CSP native tools, ARM (Azure Resource Manager), CloudFormation, right? The goal of infrastructure as code is to help eliminate choice, because with the cloud, there are just so many choices you can make. You talked about complexity, and just the overall complexity of building resources on the cloud. I have a couple of interesting statistics to back up just why this is so complex. If you look at all the industry analysts, Gartner, Forrester, even the annual Stack Overflow developer polls that we're seeing, it's clear that every company is trying to become this cloud-enabled software company. But less than 8% of developers globally say they have true proficiency in cloud, right? Expertise. And only 3% say they have proficiency in security.

So, what happens, what we've seen is you have all these organizations migrating to the cloud or using it. But in most cases, they really didn't have all the necessary expertise to do that, and they end up making all these mistakes along the way. Because of that, they're now paying for all these tools to help catch and fix these mistakes. They've got six, seven figures of budget to spend on a security tool, because their security isn't good. It has to be, but they just have a hard time staying on top of it. They're buying cloud cost reduction tools, because the spend is too high, and now the CFO is unhappy, and the CTO is unhappy with the head of cloud. You get these compliance tools, because people are changing resource config when they shouldn't be. Tools for governance. You get the point.

But all this budget these companies are allocating for these tools, that need only exists because with cloud, as I talked about, there are just these endless choices and configurations. So it's easy to make mistakes, unless everybody has complete intentionality behind their actions, and that intentionality, catch-22, only comes after years of experience.

[00:03:15] LA: You have to be an expert in order to know what you're not an expert on. Until you become an expert, you create all these problems. So, you can potentially create all these problems.

[00:03:26] TK: So many problems, right? That is the root problem right there. It's balancing the need for cloud expertise and controlling choice with getting to market quickly. Everybody wants to get to market quickly, so we can innovate. All these tools out there, they're fantastic tools. Don't get me wrong. But instead of solving the root problem of just getting things right from the start, they're finding these clever ways to put Band-Aids on things in order to cut cloud cost. Your spend is too high, great. We'll help you optimize your workloads. When we think about it, the root cause of cloud overspending is that the system was just built wrong in the first place. Why not build it correctly? Ninety-nine percent of all cloud security breaches are caused by resource misconfigurations, which are entirely preventable. All you need is the expertise to not mess it up.

If we take this back to infrastructure as code and managing cloud complexity, our view at AutoCloud, and I think just generally for the industry, is instead of spending all this time and this money putting Band-Aids on things after the fact, and beating your head on the wall for how to do cloud correctly at scale, why not just do this right to start? If we use infrastructure as code, and we take away console write access, so you can't go into the GUIs or the cloud consoles and do anything, you're able to understand ahead of time what your compliance posture is, how much things are going to cost, and who created something. You can codify all this complexity and have a single pane of glass. Because in 2023, every cloud resource you have must be represented in infrastructure as code state. Otherwise, it's shadow IT. It might break your rules. You can't have that. It's an unacceptable risk.

So, by being able to represent everything as infrastructure as code, and then having the tools necessary to understand those insights, you can de-risk cloud adoption, scale, and cloud usage ahead of time, and infrastructure as code is just so central to making that happen. So, it's an incredibly important tool in our arsenal, especially in this day and age.

[00:05:23] LA: Early in the cloud world, when people first started using the cloud, they'd go to the console, create the resources they needed, make the configuration changes they needed to figure out what was going on, get it working, and then realize, "I have no idea what I did, and I can't reproduce it anymore." But that's really the way most people start, whether we're talking about individuals who are new to the cloud, or organizations that are new to the cloud. That's kind of the model that they originally start with. They very quickly move to adopting infrastructure as code, because it becomes a way they can programmatically create the resources, and do change management, with all the goodness that comes with change management, to manage the infrastructure.

But that by itself doesn't get you an infrastructure that is secure and safe, and does all the things you want it to. You need something on top of infrastructure as code, and that's really what AutoCloud is doing, right? It provides the mechanisms for ensuring that what you're building in infrastructure as code is reasonable. Is that a safe way to say it?

[00:06:32] TK: That's a really good summation. Yes, I think there are two major things to talk about here. You brought up the fact that a lot of organizations start off using the cloud by doing what we call ClickOps, which is like, log in to AWS, Azure, or Google Cloud. “Hey, I don't really know what I'm doing. I'm not a cloud expert. I'm going to mess around and build some stuff. Okay, I'm going to get it working.” Great. But then what, right? How do you take all of those clicks that you've made, and somehow capture those? The scary thing is, you can't. The even scarier thing is, this leads to infrastructure sprawl. This is the root cause of that complexity that we're talking about, not having a centralized way to manage everything.

So, in order to make sure that you're moving forward, and setting your organization up for success, you not only have to do everything right moving forward, and I'll talk about what that might look like in a second. But you also have to capture all this existing legacy stuff out there, right? Because if it's not in your single pane of glass from a state perspective, with infrastructure as code, with a tool like Terraform, or something like Pulumi, well, it might as well not exist. It’s shadow IT. Then, as you capture those things, it's really important to turn those things you have into the standards and practices that your organization can use moving forward, and clean up any gaps with security, or compliance, or governance, or cost, before you codify those patterns.

To your point, with infrastructure as code, there's a ton of complexity. You have to know cloud, you have to know security, you have to know how to declaratively state the end state you want these resources to spin up in. So, what happens today is you have this fragmented ecosystem of tools. You have IaC tools. You have security tools. You have cost tools. You have all this process that needs to happen to be able to stitch these together. Our philosophy at AutoCloud is to let anybody create a bespoke marketplace of infrastructure as code blueprints that allow you to create different cloud resources in different cloud providers with built-in guardrails. Because if you think of where platform engineering is headed, we have DevOps, and we have platform engineering. And as we look to the future, we think that in the next two to three years, most companies will build self-service developer portals, because it's really annoying for the platform engineers and the DevOps teams to go and spin up this new three-tier web application for Azure, for their application developers to run their systems on. It would be much easier if they could take all that institutional knowledge they have and the best practices, put in place the guardrails, and then give the application developers the ability to say, “Hey, I need to go and develop something. Let me go and create my own infrastructure. I don't know Terraform. I don't know cloud. I just know this thing needs to run.”

But if we empower folks with the tools necessary to do these activities, then we have a much faster deployment rate. We have a much cleaner process overall, fewer headaches, less burnout for developers and engineers, fewer overworked people, and we're able to de-risk everything ahead of time by knowing that we're good to go from a cost, from a compliance, and from a governance perspective.
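
[Editor's note: The idea above, that any cloud resource missing from infrastructure as code state counts as shadow IT, can be sketched in a few lines of Python. This is a minimal illustration, not AutoCloud's implementation. It assumes a state document shaped like a Terraform state file (a "resources" list whose entries carry "instances" with an "attributes" map), and it assumes the set of live resource IDs has already been collected from the cloud provider by some other means. The example IDs are made up.]

```python
def managed_ids(state: dict) -> set[str]:
    """Collect the 'id' attribute of every managed resource instance in a state document."""
    ids = set()
    for resource in state.get("resources", []):
        if resource.get("mode") != "managed":
            continue  # data sources are read-only lookups, not managed resources
        for instance in resource.get("instances", []):
            resource_id = instance.get("attributes", {}).get("id")
            if resource_id:
                ids.add(resource_id)
    return ids


def find_shadow_it(state: dict, live_ids: set[str]) -> set[str]:
    """Return the live resource IDs that the state document does not track."""
    return live_ids - managed_ids(state)


# Hypothetical state document and hypothetical inventory of live resources.
EXAMPLE_STATE = {
    "version": 4,
    "resources": [
        {"mode": "managed", "type": "aws_vpc", "name": "main",
         "instances": [{"attributes": {"id": "vpc-0a1"}}]},
        {"mode": "managed", "type": "aws_instance", "name": "web",
         "instances": [{"attributes": {"id": "i-0b2"}}]},
    ],
}
LIVE_IDS = {"vpc-0a1", "i-0b2", "i-0c3"}

if __name__ == "__main__":
    for resource_id in sorted(find_shadow_it(EXAMPLE_STATE, LIVE_IDS)):
        print(f"untracked resource: {resource_id}")
```

[In this sketch, the instance that was created by hand in the console shows up as untracked, which is the point at which a team would either import it into state or delete it.]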

[00:09:41] LA: That's great. I think one of the problems here is that Terraform isn't easy, right? It's a learning curve on top of the cloud. So now you have to learn cloud just to do ClickOps, as you call it. I love that term, and I have heard it before, but I love that term. Just to do ClickOps in the cloud, there's a learning curve that's all based on the cloud provider. But to add Terraform on top of that means you've got a learning curve for Terraform on top of the learning curve for the cloud, and it's another level of complexity there initially. So, the learning curve is steeper, but obviously, the benefits come later.

Can AutoCloud help with that learning curve, too? Is that part of what it does, that it reduces the need for that learning curve?

[00:10:31] TK: Absolutely. We want to demolish that learning curve. We want it to simply evaporate and not exist anymore. The way that we do that is we can take any existing code that exists. We have a vast library of patterns. If you have your own patterns that exist, as most organizations do, you can simply upload those to AutoCloud, and then we can put on top of that all the guardrails for you automatically to make this easy to use. Now, your experience of making a new VPC, or a new VNet or AKS in Azure, or GKE, or EKS on AWS, and I'm throwing a lot of acronyms out there, is as simple as filling out a form. You don't need to know what Terraform is, the syntax, the best practices. AutoCloud can help you automatically generate that infrastructure.

Because to your point, learning AWS itself is tricky enough. I think of the solutions architect exams that I've taken. Those are really tricky, right? It requires a huge amount of time and knowledge and many years to be able to effectively pass those certifications, and feel like you have expertise. Now, double whammy. All right, now go do this in code. Write code and understand, “Hey, not just any code”, not imperative code, like application developers write, but declarative code that is going to go and spin this stuff up. It's almost impossible, and that's why so few developers globally say that they truly have the expertise to make that happen.

We're trying to democratize this and make it easy for teams, regardless of their cloud experience, to use the cloud with complete intentionality, remove the expertise barriers for both the cloud providers themselves and infrastructure as code, and let you have your cake and eat it too. The best of both worlds, allowing you to build different workloads in different cloud providers, without necessarily needing to be the world's foremost cloud experts. It's something that's very near and dear to our hearts, and we're really focused on that developer experience and that quick time to value.
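
[Editor's note: To make the "filling out a form" idea concrete, here is a small Python sketch of turning form values into infrastructure code with guardrails applied first. It is an illustration under stated assumptions, not how AutoCloud works internally. The approved regions, the node-count limit, the module path "./modules/kubernetes-cluster", and its input names are all hypothetical. The output uses Terraform's JSON configuration syntax, where a module call is written as a nested "module" object.]

```python
import json

# Hypothetical guardrails that a platform team might define once for everyone.
ALLOWED_REGIONS = {"us-east-1", "eu-west-1"}
MAX_NODE_COUNT = 10


def validate_form(form: dict) -> list[str]:
    """Return a list of guardrail violations for the submitted form values."""
    errors = []
    if form.get("region") not in ALLOWED_REGIONS:
        errors.append(f"region {form.get('region')!r} is not approved")
    if not 1 <= form.get("node_count", 0) <= MAX_NODE_COUNT:
        errors.append(f"node_count must be between 1 and {MAX_NODE_COUNT}")
    return errors


def render_blueprint(form: dict) -> str:
    """Render a module call in Terraform JSON syntax, or raise if a guardrail fails."""
    errors = validate_form(form)
    if errors:
        raise ValueError("; ".join(errors))
    config = {
        "module": {
            form["name"]: {
                "source": "./modules/kubernetes-cluster",  # hypothetical module path
                "region": form["region"],
                "node_count": form["node_count"],
            }
        }
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # The application developer only supplies these three values.
    print(render_blueprint({"name": "dev_cluster", "region": "us-east-1", "node_count": 3}))
```

[The design choice worth noting is that validation runs before any code is generated, so a request that breaks a rule never produces infrastructure code at all.]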

[00:12:32] LA: I recently went through a process of taking an application and building a production deployment environment in AWS cloud in this case. I chose CloudFormation to do it. Because it was an experiment, I hadn't used CloudFormation as extensively for this large of a project before. But I ended up creating, oh, 3,000, 4,000 lines of CloudFormation code, in order to create this infrastructure. It took me about two weeks to create this thing, because of how complex it is.

Now, I know AWS pretty well. I was there in 2005 when it started, and I built part of AWS back in those days. So, I know how these things work. But yet, even with that, this was a complex process to build. It would have been so much easier to do it in the console, but I decided I didn't want to do any of that. I wanted to do it all the right way in CloudFormation. Even though it's a smaller project, I wanted to do it just to find out what the startup experience is like in those environments. And I was shocked at how difficult it really was. I'd had a little bit more experience with Terraform in its earlier days, and some earlier experience with CloudFormation as well.

But I found that CloudFormation for this complex application I was building was more – it seemed more complex than it needed to be. Is it correct that I could have used AutoCloud and gone in and done something like, I need to create three services with deployment pipelines for those set up using CodeBuild, and these services are ECS Fargate, with a load balancer front end? Do all these things descriptively in a much simpler way than those 3,000 lines of code. I could have done that in AutoCloud and AutoCloud would have generated that code for me?

[00:14:45] TK: Lee, I feel bad. We could have helped you with that process to make it so much simpler. Don't get me wrong. CloudFormation is a powerful tool. You're able to do a lot, but there are a lot of drawbacks to using things like CloudFormation. CloudFormation doesn't really have the concept of state. It's not as friendly a language to work with. At the end of the day, this is a very common story that we hear, where people go write thousands of lines of code to do these things. If we look at Terraform, or Pulumi, or more modern infrastructure as code tools, what we would do in this case is we would create a series of best practice modules that would be able to help you facilitate what you need to do, and then make those really easy to use.

Think about what you tried to do with your services, right? There needs to be a VPC module. There needs to be an ECS module with Fargate. There need to be load balancer modules, right? And then, not only would the experience be much simpler, and much more modular, but you have these best practices that you can then deploy over and over again, and we have the added benefit of having the stitched-together components that allow you to apply security, compliance, policy, all that stuff to that code, because I'm guessing with your CloudFormation code, it's hard to validate and do the static code analysis and understand, “Hey, am I making any mistakes from a cost perspective here? Am I making any mistakes from a compliance perspective here? Did I accidentally set my security group to 0.0.0.0/0? Is it open?” There's a lot of risk to doing things in these ways, where you're writing these large, multi-thousand-line files to go and spin up this infrastructure.

So short answer, yes, that's definitely something we could have helped you with, to make your life a whole lot easier.
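
[Editor's note: The kind of static check described in this answer, catching a security group that is open to the whole internet (CIDR 0.0.0.0/0) before anything is deployed, can be sketched as follows. This is a simplified illustration in Python. The resource dictionaries are a made-up, already-parsed representation, not the actual CloudFormation or Terraform format, and real policy tools check far more than this one rule.]

```python
# CIDR blocks that mean "anyone on the internet" for IPv4 and IPv6.
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_ingress_findings(resources: list[dict]) -> list[str]:
    """Report every security group ingress rule that allows traffic from anywhere."""
    findings = []
    for resource in resources:
        if resource.get("type") != "security_group":
            continue  # this check only applies to security groups
        for rule in resource.get("ingress", []):
            exposed = OPEN_CIDRS.intersection(rule.get("cidr_blocks", []))
            if exposed:
                findings.append(
                    f"{resource['name']}: ports {rule['from_port']}-{rule['to_port']} "
                    f"open to {', '.join(sorted(exposed))}"
                )
    return findings


# Hypothetical parsed resources: one misconfigured group, one restricted group.
EXAMPLE_RESOURCES = [
    {"type": "security_group", "name": "web_sg",
     "ingress": [{"from_port": 22, "to_port": 22, "cidr_blocks": ["0.0.0.0/0"]}]},
    {"type": "security_group", "name": "db_sg",
     "ingress": [{"from_port": 5432, "to_port": 5432, "cidr_blocks": ["10.0.0.0/16"]}]},
    {"type": "load_balancer", "name": "front_end"},
]

if __name__ == "__main__":
    for finding in open_ingress_findings(EXAMPLE_RESOURCES):
        print(finding)
```

[Run against a multi-thousand-line template, a check like this surfaces the mistake in seconds, which is much harder to do by reading the file.]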

[00:16:32] LA: In this case, I was trying explicitly, as an experiment on CloudFormation, just to see what CloudFormation was like. But certainly, if I had gone to Terraform, I would have gone with you and talked to you about that. I highly regret that now. Because I think Terraform, with AutoCloud, would have been so much easier than just raw CloudFormation. I was actually disappointed at how difficult it was to create this in CloudFormation. But the advantage of CloudFormation, of course, is that it's tied closely to AWS, and if all you're doing is AWS, it's the closest to the metal, so to speak. But yes, Terraform has a lot more features and a lot more capabilities. Certainly, with AutoCloud on top of that, that would have been great.

But I noticed recently, there's been some debate in the industry, right? There's been some debate, Terraform recently changed their open source licensing language. I'm wondering, I know you've got some thoughts on this, and I wonder if you could give me what are your thoughts on that licensing change? First of all, can you describe to our listeners what changed?

[00:17:44] TK: Yes, for sure. So, for those of you that are not familiar with Terraform, for the last nine years, Terraform has been under the MPL license, the Mozilla Public License, which essentially allowed them to build this vibrant open source community of contributors, third-party tools, all of these different things that made Terraform arguably what it is today. Obviously, the founders of Terraform, Armon and Mitchell, are really smart guys, I have a ton of respect for them, and they were able to take this amazing idea that still honestly doesn't really have a rival today, and build it into this massive, massive public company.

But recently, what happened about a couple of weeks ago is HashiCorp decided to adopt the BUSL, the Business Source License, which is a less permissive license. I'm no lawyer, so I'm just doing my best to interpret this here. But this basically makes it so that there are exceptions to how one can use Terraform, and you're not allowed to use Terraform and offer it in any product offering that might be considered competitive to HashiCorp. And again, talk to a lawyer here. But our understanding and interpretation of this is HashiCorp basically gets to decide how that is enforced. This really created strong debates in the open source community, and ruffled a lot of folks' feathers. Because when you look at how HashiCorp has become the company that it is today, arguably a lot of that has been driven by the open source community and contributors, right? Look at the Terraform binaries. Terraform core has over 1,700 contributors. The AWS provider has over 2,800. The Azure provider, 1,300. And the vast majority of these contributors do not work at HashiCorp. Right? That's not even counting the thousands of other providers in the Terraform registry. If you look at Terraform modules, there are 14,000 alone in the Terraform registry. The vast majority of these are not built by HashiCorp employees. Look at all the tools out there, the wonderful tools that a lot of companies have relied on for free to make their lives easier, things like Terratest, tfsec, TF for cost, terraform-docs, not built by HashiCorp employees, not to mention the learning resources. Everything from the books Terraform Up and Running, Terraform Best Practices, the courses, right?

So, this is all to say that it's totally Hashi's right to do this. They're building a profitable business. But it really kind of rankled the community and broke a lot of good trust for folks that are out there. From a HashiCorp perspective, still not a profitable company, you're wondering, “Hey, is HashiCorp even going to be around in the next three to five years? Is, I don't know, Oracle going to buy them because their business is struggling or they're not able to kind of be profitable?” So, I understand the reasons and rationale for why they changed the license. But there are a lot of scary implications from this, and it's gone so far that there's a consortium of over 106 companies, and I believe over 400 individuals, that have now signed what is called the OpenTF Manifesto, which proposes, if HashiCorp doesn't change their licensing back to the MPL license, to actually fork Terraform and maintain a completely free and open source fork of Terraform, in perpetuity, in order to allow folks to keep building on this ecosystem. Because it's kind of sad to see, hey, all these folks made so many contributions, these great companies were started. Now, HashiCorp says, hey, we see these as competitors, we're just going to try and shut them down, and it kind of feels like the big man stomping on open source a little bit. So, I have a lot more thoughts that I can share there. But that's basically a summation of what's happened in the last few weeks, and the community's response to this license change.

[00:21:53] LA: Yes. I was going to say, as you were describing that, it sounds like a fork is coming. We've seen that happen before. You've seen that very early on, in the days of Red Hat and Linux, and a lot of forks have occurred for licensing reasons. It'd be a shame if that happened here. But, I think, the million-dollar question is, where is the majority of the support for Terraform? Is it in the open source community, or is it through HashiCorp? I think it's in the open source community. But obviously, HashiCorp has a different point of view.

[00:22:32] TK: Yes. I think, imagine if the creators of Linux or Kubernetes suddenly switched to a non-open-source license that only permitted noncompetitive usage. Imagine what the hell it would do.

I think, to your point, yes, Hashi and their full-time employees are doing a great job of stewarding and moving the project forward. But there's been a lot of contribution, and a lot that the open source community has done. It's just sad to see it come to this. I think that there's plenty of room for everybody to succeed. By virtue of creating this ecosystem, they're at the very center of it, right? They're able to take advantage and do everything that they need to do. But you look at the innovations that were made around Terraform Cloud. Companies like, I'll mention Spacelift, and other TACOS (Terraform automation and collaboration software) back-end systems really came up with those innovations, which HashiCorp is now copying and putting into their product as well. So, it'll be a really interesting next few months to see how this shakes out. But it was just an absolutely earth-shattering announcement that was made a couple of weeks ago in the community.

[00:23:38] LA: Yes, we'll have to come back and do another episode talking just about licensing, to get that covered, because it's – and once we know a little bit more. Because that's obviously going to be a big deal for the industry. This is going to affect a lot of things.

Let's move on from that a little bit. The cloud's been around since 2005, essentially – I mean, you can argue with the starting date of the cloud. But I agree with you. I think we talked about this before, that 2005, with the advent of EC2 in AWS, was about the time when the cloud really, really got its start. Over the years, it's changed and grown. Automation was always a central tenet of the cloud. Now, we're talking about automating the automation, right? Can you elaborate more on what "automate the automation" means to you and means to AutoCloud?

[00:24:36] TK: Yes, absolutely. It's funny, I think this was before my time, but I think back to the Loudcloud days, the precursor to the modern cloud era, when Amazon decided to do this. The cloud has been around for a long time, and automation has been around for a long time. I think that when we think about the future of cloud and where things are going, we have the absolute privilege to work with somebody who I would consider one of the foremost experts in cloud today. I mean Kelsey Hightower, who was for a very long time a distinguished engineer at Google, and has a very amazing history prior to that. He actually just left to focus more on things that he likes to do.

But when I was talking with Kelsey a couple of weeks ago, I was talking with him about the future of cloud, and where he sees the industry going overall from an automation perspective. I think we're in the early days of generative AI and LLMs writing code, though there are solutions out there like Copilot and various others that are pretty good at doing this stuff. I definitely would not trust ChatGPT to write production Kubernetes code or any Helm charts, or things of that nature, just due to the subtle mistakes that you won't really understand unless you know what you're doing. I was talking with Kelsey about this and his philosophy, and I think this is really cool, in terms of where we're going with automating the automation, is we're eventually going to get to a place where there are going to be systems that know your organization and your standards so well that you'll simply be able to state what you want to do, and the risk profile, and other metadata for that given workload that you're looking to create. That will be automatically created.

Again, I don't think we're there yet. I think this is just the very early stage of this thing. But I think when we think of automating the automation, we have all these automation tools like Terraform, Helm, and all of these other things that are making it so we don't have to go and manually make changes and do things in that way. I think the future of automation is being able to take those tools that we have, those wonderful infrastructure as code tools that we have today, and making them even more powerful. So, I can simply have all these Lego blocks and best practices and standards in my organization, and then come up with a use case, and have my code generated automatically in a way that will work. If I had to throw out a date, I think that by 2026, this is going to be reality. This is going to be where things are at.

At AutoCloud, we're trying to get ahead of the curve here and make sure that we're able to understand everything about your organization in terms of how it works, the policies you care about, the standards you have, and to automatically generate the cloud infrastructure for you. Today, we do that on kind of a use-case-by-use-case basis, by coming up with a bespoke collection of infrastructure as code blueprints that anybody can use, regardless of expertise. We're moving to this place where I think the automation is going to be even more powerful in the future, to the extent where we're going to see a giant shift in what DevOps, platform, and cloud engineers are doing. Instead of going and fulfilling all the service requests, “Hey, I'm one of the app team developers. Can you build me a pattern? Can you build me a three-tier web stack for GCP to go host my application?” That will all be automated. Now, the really cool thing is all the cloud experts and the platform team and the DevOps engineers can focus on the interesting work, not that monotonous work of having to go and support ticket requests for creation of new infrastructure, but the cutting-edge stuff that's going on.

So, I think that we shouldn't be worried about the machine overlords taking our jobs. Ultimately, this is going to be in service of allowing people to do higher-level, more important work. That's where we see that going.

[00:28:30] LA: Yes, I always talk about generative AI as the next step from a standpoint of computing, of developer experiences, the next step up from platforms, the next step up from high level programming languages, which is the next step up from programming languages, which is the next step up from assembly language. None of us, well, I did my very early days, but none of us did do machine language programming and I don't mean assembly language, I mean, machine language, right? We always use tools to make the experience easier for us to develop what we're trying to accomplish, and start relying on those tools to do the grunt work underneath, to turn our thoughts and our ideas and the code that we want to produce into something that's usable.

All that generative AI is going to do is give us one level above that, too, where we're still able to decide, like you say, the interesting parts of what we're trying to build and let generative AI do the mindless work of actually creating the infrastructure to make that happen. One of the things we need for that to happen, and I think you touched on this a little bit, is this: generative AI is 95% good. But the problem is, that 5% is so hard to detect. Finding mistakes in code that's created, finding the wrong concept in an article that was written using generative AI, something that's stated incorrectly, or is insulting, or is just wrong. Same thing in a movie script, I'm sure. The 5% is going to be so hard to resolve. And the way we do that is through policies, procedures, guidance, that are probably also AI driven, by the way, that help us rein in the generative AI to produce standard results.

So, I think we've got that step yet to do, and that step is in process. But I think 2026 is a reasonable timeframe to be assuming that we're going to be able to depend on this higher-level experience. But as you say, it doesn't get rid of the need for us. It changes the type of work that we do, and changes it to be higher-level work. It just makes the industry that much better.

[00:31:08] TK: Oh, yes. I couldn't agree more, and there are clever things that we can do to help make this process even better. It might be 95% correct with a 5% error rate. But what if you could just give an example of the end state that you're wanting to achieve, and have that end state be essentially the test that things are tested against, right? So, right now, you don't want your production Kubernetes code generated by ChatGPT, for obvious reasons. But over time, as we begin to get more examples of good configurations and what things should look like, we can generate the code and have the code update itself on the fly as it's being generated, so that we get a system that can generate a good end result.

So, I won't get too much into it now. But there are a lot of clever things that we can do from an overall generative AI and training perspective to make these models even more sophisticated, and that's what I'm really excited about as we look to the next couple of years, and all this becoming even more mainstream.
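
[Editor's note: The idea of using a desired end state as the test for generated code can be sketched as a generate-and-check loop. This Python sketch is illustrative only. The `fake_generator` function is a stand-in for a real model call and is purely hypothetical, as are the configuration keys. The point is the control flow: generate a candidate, compare it to the desired end state, feed the mismatches back, and stop when nothing differs or the attempt budget runs out.]

```python
from typing import Callable, Optional


def mismatches(desired: dict, actual: dict) -> dict:
    """Map each key that differs to a (wanted, got) pair."""
    return {key: (value, actual.get(key))
            for key, value in desired.items() if actual.get(key) != value}


def generate_until_valid(generate: Callable[[dict], dict], desired: dict,
                         max_attempts: int = 3) -> Optional[dict]:
    """Call the generator repeatedly, feeding back mismatches, until the end state matches."""
    feedback: dict = {}
    for _ in range(max_attempts):
        candidate = generate(feedback)
        feedback = mismatches(desired, candidate)
        if not feedback:
            return candidate  # the candidate passes the end-state test
    return None  # give up and hand the request to a human


def fake_generator(feedback: dict) -> dict:
    """Hypothetical stand-in for a model: starts from a flawed draft and applies corrections."""
    draft = {"replicas": 1, "public": True, "encrypted": True}
    for key, (wanted, _got) in feedback.items():
        draft[key] = wanted
    return draft


DESIRED_END_STATE = {"replicas": 3, "public": False, "encrypted": True}

if __name__ == "__main__":
    print(generate_until_valid(fake_generator, DESIRED_END_STATE))
```

[Returning nothing after a fixed number of attempts is a deliberate choice in this sketch: output that never passes the test is rejected rather than deployed.]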

[00:32:09] LA: I absolutely agree with you. Thank you, Tyson. We're at time here now. My guest today has been Tyson Kunovsky, the Founder and CEO of AutoCloud. Tyson, once again, thank you very much for joining me today on Software Engineering Daily.

[00:32:26] TK: Lee, it's been an absolute pleasure. Thank you so much for having me.

[END]