EPISODE 1743 [INTRODUCTION] [00:00:00] ANNOUNCER: Argo is an open-source suite of tools to enhance continuous delivery and workflow orchestration in Kubernetes environments. The project had its start at Applatix and was accepted to the Cloud Native Computing Foundation in 2020. Michael Crenshaw and Zach Aller are both lead maintainers for Argo. They join the show with Lee Atchison to talk about the origins of the project, what problems Argo solves, the four core tools in Argo, and more. This episode is hosted by Lee Atchison. Lee Atchison is a software architect, author, and thought leader on cloud computing and application modernization. His bestselling book Architecting for Scale is an essential resource for technical teams looking to maintain high availability and manage risk in their cloud environments. Lee is the host of his podcast, Modern Digital Business, produced for people looking to build and grow their digital business. Listen at mdb.fm. Follow Lee at softwarearchitectureinsights.com. And see all his content at leeatchson.com. [INTERVIEW] [00:01:15] LA: Michael, Zach, welcome to Software Engineering Daily. [00:01:18] MC: Thank you. Good to be here. [00:01:19] ZA: Thanks for having us. [00:01:20] LA: What led to the creation of Argo? What was the problem that you were trying to solve that caused Argo to come into existence? [00:01:28] MC: The creation of Argo actually predates when Zach and I joined the project. There was a startup called Applatix. And the problem that they were trying to solve is orchestrating pipelines in Kubernetes. Kubernetes is really good at stateless applications, handling API requests, things like that. But where it struggled was when you have some process that loads in data, transforms it, puts it out on an S3 bucket. Long-running stuff. The developers at Applatix started writing Argo Workflows. And at the time, it was just called Argo. And as that product matured, Intuit noticed it. 
And Intuit was busy trying to get into the Kubernetes space. So, they acquired Applatix. And the developers went on to add to the Argo suite. They added Argo CD, then Events, and Argo Rollouts. [00:02:24] LA: Yeah. Let's go through a little bit more of the overview of Argo itself. You kind of alluded to it. But there are actually four separate tools that make up Argo. I think you said it was Workflows, CD, Rollouts, and Events. Can you tell me what those four tools are at a high level and how they fit into the overall CI/CD process? [00:02:43] MC: Sure. I'll cover the ones I'm most familiar with and then I'll toss it to Zach for the one that he's a lead on. When I first started working with Argo, it was Workflows. That's the one that handles long-running processes. While working on Argo Workflows, the developers realized that, a lot of times, these workflows were being triggered by events from cloud-based systems. SQS queues, Kafka, etc. And they needed a Kubernetes-native event broker to take in those events and then trigger workflows. That's the Argo Events product. The third one, the one that I'm a lead developer on, is Argo CD. And it is, let's say, a GitOps operator. Its job is to take Kubernetes manifests, which are defined in Git, and apply those to a Kubernetes cluster. And it has all kinds of tooling; UI, CLI, etc., around making that process very, very user-friendly. That's three of them. And then Rollouts is the one that Zach works on. [00:03:45] ZA: Yeah. Argo Rollouts is kind of meant to be a drop-in replacement for the native Kubernetes Deployment object with some more complex deployment patterns. The two big ones being blue/green as a style and then canary, where we can shift traffic over to your new version over a period of time, while Argo Rollouts in the background can run things like analysis, querying various providers to get metrics to determine whether or not to abort that canary or let it continue. 
Those are kind of the four big projects. [00:04:16] LA: Got it. Okay. That actually helps a little bit for me. Because I originally thought that all four were tied into the CI/CD workflow process. But it sounds like only two of them are tied into the CI/CD process and two of them are operational services. Let's focus on the operational ones first since they're the ones I'm the least familiar with. Let's focus on Workflows and Events. Let's go one level deeper on what each of those do and how they fit into - what's an example application that would make use of Workflows and Events? [00:04:51] MC: One that I worked with Zach on actually at a previous job was CI/CD. Argo Workflows is a generic pipeline system. It'll do whatever you tell it to do in a series of steps. And that's a very intuitive way to handle CI/CD pipelines. Build an image, bump an image tag in a Kubernetes manifest Git repo, run unit tests, things like that. That was a project I worked on with Zach. Prior to that, I worked on a project that did computer vision work on satellite imagery. We would use Argo Events to see, "We have a new image that's been dropped into an S3 bucket." We'd pull it down. And then Argo Workflows has the ability to do MapReduce-style stuff. Step one, break that satellite image up into smaller pieces. Because these images are massive. And then for each chip, they would call it, of the large image, we would run CV over it. Detect presence of some object. And then at the end, it would collapse all of those detections down into one step. Write it into one JSON file and then push that back up to S3. Basically, anything that takes a long time and needs sort of a directed acyclic graph style processing capability, Argo Workflows is great for that. [00:06:10] LA: Cool. And you used CD as an example. But it's not tied to deployments at all. It can be any sort of workflow project that you need. In the case of satellite images, it was processing the images. 
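[EDITOR'S NOTE: The fan-out/fan-in pattern Michael describes could be sketched as an Argo Workflows DAG along these lines. The image names, scripts, and template names here are hypothetical, not the project he worked on:]

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: satellite-cv-
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: split
            template: split-image            # emits a JSON list of chip IDs
          - name: detect
            template: detect-objects
            dependencies: [split]
            withParam: "{{tasks.split.outputs.result}}"  # fan out: one pod per chip
            arguments:
              parameters:
                - name: chip
                  value: "{{item}}"
          - name: merge
            template: merge-results          # fan in: collapse detections into one JSON file
            dependencies: [detect]
    - name: split-image
      container:
        image: example/splitter:latest       # hypothetical image
        command: [python, split.py]
    - name: detect-objects
      inputs:
        parameters:
          - name: chip
      container:
        image: example/detector:latest       # hypothetical image
        command: [python, detect.py, "{{inputs.parameters.chip}}"]
    - name: merge-results
      container:
        image: example/merger:latest         # hypothetical image
        command: [python, merge.py]
```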
And it can split into multiple independent tasks and then rejoin them together later. The flow-through process is what Workflows deals with. I understand Workflows. That makes sense to me now. How does Events differ from just basic webhooks? What is Events doing for you? [00:06:41] ZA: Events was created basically as a - now that we have these workflows and they exist, a lot of people wanted to trigger them with various systems. Whether it's a Kafka stream, an SQS queue, various event-processing, event-consuming, and event-producing systems. And Argo Events basically acts as the event broker. It knows how to connect to SQS. And then it knows how to do something with that message. Like start a workflow or do some other task. It can actually do things that aren't related to Argo. But, basically, it sits in the middle as kind of like an event middleman that knows how to talk to various services and knows how to produce various outputs. And so, that tool gets used a lot to either listen to a queue and fire off a workflow, or listen to a queue and produce another message somewhere else. And it kind of goes hand-in-hand and is tightly coupled with Workflows because of that. [00:07:33] LA: It'll do polling, and then event generation from polling, and things like that. [00:07:37] ZA: Correct. Yep. It has a bunch of providers that it knows how to talk to. So, it knows how to consume Kafka messages. It knows how to consume NATS messages, RabbitMQ. Various systems it knows how to consume. And then it knows how to produce various outputs as well. [00:07:54] LA: Would that be a capability that you build into your service? You tie this in as a library? Or is this a standalone service that does this work? [00:08:03] ZA: It's a standalone service that is basically configured via Kubernetes manifests. The idea being you would create an Argo Events manifest file that told it how to, say, connect to SQS and run a workflow every time it saw a message, as an example. 
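[EDITOR'S NOTE: The SQS-to-workflow wiring Zach describes could look roughly like this pair of Argo Events manifests: an EventSource that polls the queue and a Sensor that submits a workflow per message. Queue, region, and resource names are hypothetical:]

```yaml
apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
  name: aws-sqs-source
spec:
  sqs:
    new-image:
      region: us-east-1          # hypothetical region
      queue: image-queue         # hypothetical queue name
      waitTimeSeconds: 20        # long-poll interval
---
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: sqs-workflow-trigger
spec:
  dependencies:
    - name: msg
      eventSourceName: aws-sqs-source
      eventName: new-image
  triggers:
    - template:
        name: start-workflow
        argoWorkflow:
          operation: submit      # submit a new Workflow per message
          source:
            resource:
              apiVersion: argoproj.io/v1alpha1
              kind: Workflow
              metadata:
                generateName: process-image-
              spec: {}           # workflow spec elided for brevity
```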
[00:08:20] LA: It appears as a separate service within Kubernetes. [00:08:25] ZA: Correct. Yeah. It's its own whole deployment. [00:08:28] LA: Okay. And does Workflows do the same thing? [00:08:31] ZA: Workflows has recently added support to accept webhooks. But, no. It doesn't do the same event brokering that Events can handle. [00:08:40] LA: What I mean is, does it run as a separate Kubernetes service independently? [00:08:43] ZA: Oh, yes. Yep. It's its own controller. Its own Kubernetes CRDs. Yeah. All four Argo projects, you can actually run them all independently of each other. There's really no linking between any of them. [00:08:55] LA: How does Workflows talk to the rest of your systems then without webhooks or other mechanisms? What type of service calls does it make? [00:09:06] ZA: Argo Workflows is very generic in the sense that it just orchestrates communication between pods. If you have a particular step that correlates to a pod that's running some Python code, you basically use Workflows to orchestrate those steps, passing possibly artifacts between different steps and spinning up the pods, tearing down the pods, and doing the orchestration of your pipeline with the pod as the unit of work. If you want to make a call to some external system, you just do it in your Python code for that particular step. [00:09:42] LA: Okay. Okay. That makes sense. I see. I see. Okay. Those are those two. Now, the two that you're the most familiar with are CD and Rollouts. Let's talk about CD first. Now, these two are CI/CD-related, very specific to that process. We're now talking about the DevOps workflow here. Michael, I think you were the one that focused on CD. Why don't you start with CD? And how would you use CD? And what features would it give you over any other CD system? [00:10:12] MC: Sure. I recently saw someone say that you don't use Argo CD so that you can use Kubernetes. You use Kubernetes so that you can use Argo CD. 
And that's a pretty strong statement. But I think the argument it's trying to make is that the declarative API that Kubernetes provides is an extremely powerful and intuitive way to manage systems. The fact that you can look at a file and say, "Here's the state of my system." And something else is doing all the hard work of making sure that that state is actualized in servers and processes, that's someone else's job. Argo CD takes advantage of that declarative system by saying you can define the desired state of an application, of an infrastructure, whatever you want. We don't really care. But you define it in a Git repo as a YAML file. And then Argo CD will do the job of taking that YAML file and applying it to Kubernetes. That's step one. You've got to get it out there. That's the process of being a GitOps operator. The second stage that Argo provides you is it has a beautiful user interface that says here is the commit that we're currently pointing at in Git. Here is whether we're currently synchronizing it. Whether it's already been synchronized out. And then once your state has been synced to the Kubernetes cluster, Argo CD constantly watches that cluster. And it sees, "Okay. Did someone else change something? If so, we're going to highlight that drift for you." And it'll also just show you the current state of the system. If I've deployed a stateful set, I can drill into Argo CD and see if that stateful set is healthy and currently running the way it's supposed to run. Argo CD basically becomes a user interface for Kubernetes. How it differs from other CI/CD systems is mostly the separation between CI and CD. Most of the prior systems; Jenkins, Spinnaker, etc., they are imperative. They're step-by-step. They say a user made a code change. Now they run all the steps the user has defined. We're going to go deploy this to dev. We're going to deploy this to prod. Build images. Run tests. All those are steps. 
Argo CD says we're going to focus on CD and be very good at it. Our job is to take the state that you have in Git and get it onto your Kubernetes cluster. It is another system's job to do things like building images, running unit tests, etc. And decoupling those things makes it much easier, in my experience, for developers to reason about how their changes are getting out onto a live environment. [00:13:04] LA: It does CD very well. It doesn't touch CI at all. [00:13:07] MC: Correct. You can hack CI stuff into Argo CD. But, generally, it doesn't work as well as just separating them cleanly. [00:13:14] LA: Yeah. Yeah. Does Argo CD work with a variety of different CI systems then? Or is it independent of the CI system and will work with pretty much anything? [00:13:25] MC: Argo CD is independent of your CI system. If you're using Jenkins, great. If you're using GitHub Actions, great. If you're using a shell script running on your laptop, we don't care. Just as long as that system ends with pushing something to Git that we can then synchronize out to your Kubernetes cluster. [00:13:40] LA: The final push after you've done all your testing and everything. You need to do all the approval workflows. All that stuff. The final push-out is an update to Git, and you take that and you deploy whatever changes are specified in there. Whether they're infrastructure changes, or code changes, or whatever. [00:13:57] MC: Yep. Precisely. [00:13:59] LA: Cool. Cool. That's great. And I imagine you could probably build the CI system out of Workflows if you wanted to. But, again, that's more work. You're more likely in this situation to use something like Jenkins or something to do that. Are there any good declarative CI systems out there? Or are they pretty much all imperative? [00:14:20] MC: A declarative CI system, it's an interesting - [00:14:23] LA: I'm not even sure what that would look like if I think about it. [00:14:26] MC: Right. 
People think of Jenkins as being declarative. But a Groovy script, it has some declarative characteristics. But, still, you're going step-by-step. If something breaks, you just stop. Honestly, I really like Argo Workflows for CI/CD processes. It is imperative in a sense. It's step-by-step. But you still get to use that declarative expression in Kubernetes of those imperative actions. And you get to take advantage of some of the declarative niceties of Kubernetes, like retries, health checks, stuff like that. Yeah. But I don't think I know of a truly 100% declarative CI system. [00:15:06] LA: That's fine. That's fine. Let's talk about - well, we have CD now, which will make your production system look like the Git repository says it should look. But in order to do that, it's got to make changes. And this is where Rollouts comes in. Is that a fair statement? And do you want to move on with that, Zach? [00:15:24] ZA: Yeah. Your CI pipeline has, let's say, pushed a new image of code, right? You deployed a new version of your software. You're using Argo CD and Argo Rollouts. Your CI pipeline makes that commit to Git. CD has deployed it. Now, the Argo Rollouts controller is going to kind of notice that this change has happened. As end users, being software developers, you generally don't want to have your change rolled out to all your users at the same time. It would be nice if you could catch an error. Catch some problem early with a small percentage of users and then take some action on it. What Argo Rollouts gives you then is the ability to configure your deployment model, let's call it. Your strategy to either be blue/green or canary. Let's say we have a canary deployment configured. And we have a set of steps where we can do setWeight 10%, pause for 5 minutes, setWeight 20%, pause for however long you want, until you get to 100% of your traffic moved over to the new version. That's mainly what Argo Rollouts' main task is. 
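[EDITOR'S NOTE: The canary steps Zach walks through could be expressed as a Rollout spec along these lines. Service and image names are hypothetical:]

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-service                 # hypothetical service
spec:
  replicas: 10
  selector:
    matchLabels:
      app: my-service
  strategy:
    canary:
      steps:
        - setWeight: 10            # ~10% of traffic to the new version
        - pause: {duration: 5m}
        - setWeight: 20
        - pause: {duration: 5m}
        - setWeight: 50
        - pause: {}                # pause indefinitely until manually promoted
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: my-service
          image: example/my-service:v2   # hypothetical new image
```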
While it's doing these incremental traffic-shifting operations, you also can configure it to query Prometheus, query various metric providers, AWS's CloudWatch, a whole slew of them, to look for things like error rates. If you notice a tick in error rates going up, you can say - if it's above X percent, abort the rollout and roll back to the old version. That's kind of where Argo Rollouts sits within the CD pipeline, is kind of a progressive delivery system. [00:17:03] LA: Okay. It sounds like Rollouts and CD have to work closely together. [00:17:08] ZA: Argo Rollouts makes its runtime state changes in such a way that they don't necessarily affect Argo CD. The answer then being that they're kind of unrelated. You could use Argo Rollouts without Argo CD. [00:17:28] LA: Oh, okay. Okay. Yeah. That makes sense. Let's go through a very small but specific example. You've got a Kubernetes cluster running an application. It's hundreds of services. One of the services is this one container. And you've got several nodes running in this one service of this one container. Something really simple. Now, CD detects that it needs a new version of the container. There's a new version available. What CD would do, without any sort of rollout process, is CD would launch new containers, connect them into the service, terminate old containers, and be done with it. I would presume that's the sort of thing that CD would do to set up the declarative state. It's a new version. We need to replace the old with the new. First of all, am I correct with that assumption in general? [00:18:21] MC: In a way. CD, it's a bit more naive than maybe that. CD says, "Okay. I see that you want this new container image version. I'm going to tell Kubernetes here is the specification." Say it's an Argo Rollouts spec that contains that container version. 
And then Kubernetes uses whatever operators are in place to see that declaration and then act on it. Argo CD basically quits after it says here's the new spec. [00:18:53] LA: CD tells Kubernetes what to do. Kubernetes does it by reading the instructions that are essentially a Rollouts spec. That's the integration point between the two. [00:19:02] MC: Exactly. [00:19:04] LA: Okay. I see. Okay. Great. Let's go forward with that. You've got a Rollout spec. In my naive example, what Rollouts could do is something like turn on a new set of instances of the new version. And then blue/green deploy from the old set to the new set and then turn off the old set. Or turn on a few of the new containers and then migrate traffic at a service level, a Kubernetes service level, between the old and the new based on something, which we'll talk about in a second. And then eventually, you move more and more traffic over. You load balance them all appropriately. And eventually, you turn off the old one. [00:19:41] ZA: Yep. [00:19:42] LA: Okay. [00:19:43] ZA: That's the high-level gist. Yep. [00:19:44] LA: And all that was triggered by CD. But CD didn't do the work. Rollouts, with Kubernetes' help, did the work. Okay. That's how those fit together. Let's talk about how you do your deployment strategy detection mechanism. You send traffic to one set of containers or another set of containers in my naive example or whatever it ends up being. How do you make that decision? Is it based on load balancer information from the request? Or what's making the decision? [00:20:15] ZA: There's a handful of options there that Rollouts has. At a high level, we have to kind of step back and take a look at how Rollouts operates. Argo Rollouts always tries to get to the defined state. You basically only ever have a stable version. 
And whatever the newest version is, whether you're in the middle of a rollout and you deploy a new one, you're always basically only having your new version and your stable version. And the handful of methods that you have of controlling traffic starts with what I call a basic canary. It basically uses pod counts. If you have a service with 10 pods and you start a new rollout and you have setWeight 10% as one of your first steps in your rollout spec, we will spin up one new pod and tear down one old pod. Now we have one new canary pod and nine stable pods. [00:21:05] LA: How do you route then to the new versus the old? [00:21:10] ZA: That just depends on - basically, the basic canary only uses Kubernetes' built-in networking support. We'll have a Kubernetes service that selects two replica sets, the new replica set and the old replica set, and the rollout controller is managing the pod counts on those replica sets. The basic canary is just pure Kubernetes services, label selectors on the services. You have a service that selects the two replica sets to route traffic. Argo Rollouts also supports what we call traffic routers. If you are a team that has Istio or some type of service mesh installed in your cluster, you can configure Argo Rollouts to use - let's just use Istio as an example - an Istio virtual service, which Argo Rollouts will then go configure to route traffic. In that particular mode of operation, instead of using pod counts - if we have 10 pods up, we keep 10 pods up. At 10%, we'll spin up one new one. But we keep the old 10 around. And then we configure the traffic router to send 10% to just the new pod. And then it's up to Istio to control that traffic. That way, if we need to abort for whatever reason, we can just tell Istio to switch the traffic back right away. And there's capacity there with the old pods to handle that load. 
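[EDITOR'S NOTE: The Istio mode Zach describes is configured by pointing the Rollout's canary strategy at services and an Istio VirtualService, roughly like this fragment; all names are hypothetical:]

```yaml
spec:
  strategy:
    canary:
      canaryService: my-service-canary     # hypothetical Service for new pods
      stableService: my-service-stable     # hypothetical Service for stable pods
      trafficRouting:
        istio:
          virtualService:
            name: my-service-vsvc          # Rollouts rewrites this VirtualService's weights
            routes:
              - primary
      steps:
        - setWeight: 10                    # Istio sends 10% to canary; old pods stay up
        - pause: {duration: 5m}
```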
[00:22:35] LA: The traffic routing is done by the router, such as something that's built into Istio. You don't do that work. And you don't make the decisions about where it routes to. You just make this decision of how much traffic goes where. Not which traffic goes where. [00:22:53] ZA: Yep. Correct. There are some steps within Rollouts where - we've just talked about setWeight 10%. There are some steps to do set-header and set-mirroring where you don't control the percentage that way. But you can say if traffic has some particular header, send all that traffic to the canary. [00:23:13] LA: You could send one user to a canary and not other - [00:23:17] ZA: You could via headers, yeah. If you have a step there that uses the set-header step. And then Argo Rollouts is also coming up here. We're going to add step plugins, which is a feature that a coworker, Alex, at Intuit has implemented for 1.8 that will allow people to define custom steps, so people could implement their own form of configuring a traffic router to do some particular thing. [00:23:42] LA: Got it. Got it. That's all handled within Rollouts integrating with Istio or whatever traffic routers you have. Again, it will use CD - I mean, CD can use it, but it's not tied to CD in any way, shape, or form. It also sounds like it's not tied to Workflows or Events or anything at all. [00:24:01] ZA: Correct. [00:24:03] LA: How do these four tools fit together into one thing? Why are they all under Argo and nothing else is under Argo? And besides historical. Or if that's the answer, that's fine. But what puts these four things together? [00:24:19] ZA: Well, I'll probably answer historical. But I'll add some flavor to it. The masterminds behind the four products are all the same. They share similar designs, similar command line interfaces, user interfaces, and some very light integrations. For example, an easy way to link from an application in Argo CD to the associated workflow. 
Those integrations make them feel familiar when, say, an Argo Workflows user goes and uses Argo CD for the first time. They sort of exist in their own small ecosystem which ties them together. But technically speaking, each of them can be completely independently deployed. They just happen to really harmonize well together, I guess. [00:25:11] LA: Do you find most people use all of them? Or is there a lot of pick and choose? [00:25:17] ZA: There's a lot of pick and choose. For example, in the satellite imagery use case, we used Argo Events and Argo Workflows. Those two pair quite nicely. But we didn't use Argo CD. We used the Helm CLI to do deployments. And it just really depends on where your developers and your platform team, where they're all at to determine whether they will find all four products compelling and useful at the same time. [00:25:44] LA: Now, you are really natively tied to Kubernetes, right? I mean, you require Kubernetes for all four of those services for them to function. Not only for them to run themselves, but their interconnections and how they work with things are all very tied with Kubernetes. [00:25:58] ZA: 100%. [00:25:59] LA: That all makes sense. Is that your tie to CNCF? I mean, is that why CNCF became interested in you? [00:26:06] ZA: I think that's probably a fair characterization. Yeah. I think that as Intuit was trying to build its presence in the Kubernetes world and starting to use Kubernetes internally, Intuit of course wanted to have a home where the Argo project was happy, and secure, and stable. And CNCF was just a really perfect fit for that. [00:26:27] LA: It was really Intuit that drove CNCF. Or I may not have said that quite right. But Intuit was the sponsor, the corporate sponsor that really promoted CNCF to adopt Argo. [00:26:43] MC: The main one. There are a couple other folks. For example, BlackRock donated the Argo Events project. I may have mischaracterized that earlier. 
Argo Events is originally completely a BlackRock product. And they donated it to Argoproj. But, yeah. Kind of the way graduation typically works is some organization has a project that they're really proud of and that fits well into the cloud-native ecosystem. They'll approach CNCF and start the process to get it adopted. [00:27:11] LA: Where are you in the adoption process now? [00:27:15] MC: All four Argo projects were graduated together. It is the terminal stage aside from sort of being - I suppose some projects can be sort of shuttered at end of life. But we are at the highest level of active project in the CNCF. [00:27:30] LA: Got it. Got it. And for the people who aren't familiar, can you briefly talk about what those steps are within CNCF? There's different levels of adoption within CNCF. And for you to be at the top level, is that important? [00:27:44] MC: To the extent that I can remember them. Zach, jump in if I get any wrong. But I think there's sandbox, and then incubating, and then graduated. I believe there are two stages before you get to graduated. And, basically, at the earliest stage, you're just a project. You're just loosely related to cloud-native or closely related. But you're just in the world. That's what gets your foot in the door. As your project progresses, it'll move up to the next stage, which basically means this is a project that's really showing some life. It's getting some adoption. And then if you want to go to graduated, it's a pretty high bar. You need to show that multiple organizations, big organizations typically, are interested in continuing to maintain this project. You need to show that there's a diversity of maintainers. It can't just be like three organizations say they support this project, but all the development is done by Intuit or something like that. You need to show that you have a governance process in place. 
So that when there are big decisions to be made about the project, there's some way to do it without people fighting or anyone just stepping in and being like, "My way or the highway." And there's a lot of security stuff. Argo went through a really rigorous security audit of all four products and produced a report to basically say this is something that makes sense for production use. [00:29:03] LA: And the benefit of that, of course, is you're now a mainstream CNCF capability that's promoted and adopted as part of CNCF. Not just in the fringe tail of CNCF. It's a core capability of CNCF. [00:29:20] MC: Correct. We had a booth at the conference. There's a Jira page that our maintainers can go to if they need some tooling support. Yeah, being graduated comes with some really great perks that really help us stay productive. [00:29:35] LA: How involved is the CNCF community in general with Argo? [00:29:41] MC: The CNCF community is a really broad term. CNCF itself - well, it involves sort of the corporate side. There's CNCF as an organization. There are people who are employed by it. There are sort of the volunteers who help with certain aspects of CNCF. And there are just people who are out there using CNCF projects. Maybe they submit a PR. Something like that. I would say sort of the corporate side, their role is mostly to be available to us as maintainers when we need something. For example, suppose we want to try some new AI tool and we need to make sure it's legal to use. Whether we can turn it on in our GitHub project. Whether we have funding to use it. They'll help with that sort of thing. The rest of the CNCF community; developers, volunteers, users, their main contact point with us is they open an issue. Like Argo CD isn't behaving the way they expect. Or they open a pull request. They need a new feature. Some bug fixed. And the volume of that contact is extremely high. 
I think that Argo CD, out of all Linux Foundation projects, is maybe number four in terms of number of authors or contributors. Tons of interaction with that side of the CNCF community. [00:30:59] LA: And not just reporting problems. But contributors who are actually fixing issues and improving the product as you go along. [00:31:07] MC: Yes. Massively. Tons of contributions come from people who have never been part of our Argo ecosystem before. They just show up. They have a problem. They have a fix. And we review it and get it merged for them. [00:31:19] LA: Were you both around at the time when CNCF brought it up to the highest level? Let's talk about what it was like before then and after that. What level of involvement did you have in the community before the final promotion versus after? And what level of support did you get from the community at large? I'm talking about both from the standpoint of people wanting to use the project as well as people willing to help maintain it. [00:31:47] ZA: I'll just give my firsthand experience, which is maybe a little bit different than Michael's. But to be honest, I feel like there was just kind of this gradual growth that has continued from even the beginning. I would say that, personally, there was maybe a small uptick just from graduation. I think companies see a little bit more - are maybe a little bit more confident in adopting the tooling. But, overall, as far as the community, the user side, PR-submitted bug fixes, interactions with people, there's just been this kind of steady growth curve that's just kind of continued. That's kind of been my experience, I guess, on the before and after. [00:32:29] MC: The one anomaly I guess I'd point out is Argo graduated a few months, I think, after I joined it. The before space is fairly small in my mind. But after it graduated and then we had that first KubeCon after graduation, we saw a pretty big spike in pull requests being opened. 
And I think that's because there were probably organizations that were holding back and they were like, "Do we adopt? Do we not?" They finally adopted when it graduated and realized there are things that we want to fix. Features we want to add. And those sort of piled in all at once. But, overall, I would say, similar to what Zach had said, just a pretty steady incline in community involvement. [00:33:10] LA: That answers the question. I think what I was trying to get to is, was the growth driven by CNCF? Or was CNCF driven by the growth? And it sounds like it was almost the latter, right? Your interest in CNCF and the recognition within CNCF came because it was a highly popular application versus the other way around. [00:33:33] MC: I think that's mostly fair. I think it's a bit of a feedback loop. When you have your developers on stages at KubeCon presenting talks that get promoted or you have your project featured on stage during KubeCon, that all goes a long, long way to just getting name recognition and getting contributors to just have their foot in the door, drop into the Slack channel. That's a huge step towards becoming an active contributor. And a lot of that is initiated in those CNCF-promoted events and communications. [00:34:09] LA: It makes sense. What's next for Argo? [00:34:11] MC: Well, for Argo CD, I've got a big smile on my face and Zach does, too. Because we're working on something that we find very exciting. It's related mostly to Argo CD. And it has to do with promoting changes from environment to environment. I talked about how Argo CD, and Kubernetes generally, is all about declaring your system state. And then other systems are in charge of getting you to that state. The problem is, if you want to declare the states of your dev environment, your test environment, and the prod environment, then something has to be in charge of saying, "I see that coder Billy made a change to the dev environment. Now we need to run some tests. Okay. Is everything fine? 
Now we're moving it to the test environment. We're running performance tests maybe. Something more rigorous. Great. We see that that's passed. We're moving on to the prod environment." Today, the GitOps ecosystem does not have a really robust way for people to declare that process and let some other system take care of it. People write, really, honestly, pretty janky CI pipelines a lot of times. They use what they already know. They'll write a Jenkins pipeline that sees a change made to code. And it writes the change to a dev folder for your manifests. And then it runs "argocd app sync dev". Runs tests. And it goes and does another yq edit on another manifest file. It's very flaky. If some step fails, you're just sort of broken. And developers don't like working with it. They don't like reading thousands of lines of Jenkins output to understand whether their change has made it up to the prod environment. Zach and I are working on a system that allows you to define, in one small YAML file, "Here's the series of environments that I want to promote to." And that system will automatically open PRs and merge PRs of the rendered final-state Kubernetes manifests for you, obeying any rules, any gates you want to set on quality checks, security checks, whatever else, to eventually get to the state where your change has made it to prod. And this is all going to be orchestrated in a way that allows users to stay in the Git ecosystem and not rely on some external tool like Jenkins or GitHub Actions. Everything they want to do, they can see in the GitHub, GitLab, etc., user interface. We think that's the next big wave for GitOps and Argo CD. [00:36:56] LA: And it sounds like that's pretty close to being ready? [00:37:00] MC: Oh. Well, that's one way to put it. The design is in really, really good shape. We've had some great help already from people in the community, people at Intuit. The code is out there. I've got a draft PR for the Argo CD piece open. 
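[Editor's note: as an illustration of the "janky pipeline" pattern Michael describes above, here's a rough sketch of what such an ad-hoc promotion job often looks like. This is not from the episode; the app names, file paths, and test scripts are all hypothetical.]

```yaml
# Hypothetical CI job sketching the ad-hoc promotion pattern described above.
# App names, manifest paths, and scripts are made up for illustration.
promote-dev:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Bump the image tag in the dev manifests
      run: |
        yq -i '.spec.template.spec.containers[0].image = "registry.example.com/app:${{ github.sha }}"' \
          manifests/dev/deployment.yaml
        git commit -am "promote ${{ github.sha }} to dev" && git push
    - name: Sync dev and run tests
      run: |
        argocd app sync dev-app
        ./run-smoke-tests.sh dev
# ...followed by a near-copy of this job for test and then prod. If a step in
# the middle fails, the environments are left half-promoted, which is the
# flakiness being described.
```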
Zach has a new repo for sort of the separate promoter component. All that code's visible. But it's still very POC stage. [00:37:24] LA: Got it. Got it. What about new applications beyond the base four? [00:37:29] MC: I think we're sitting pretty in terms of Argo projects. Well, you've got the question of how the CNCF will feel if we're like, "Oh, there's this fifth thing that we now want to be considered part of Argo." Does it have to go through the graduation steps? Does it not? The way we typically manage incorporating new things is we'll say, "Okay, you write your standalone project. Put it in our GitHub organization called argoproj-labs. Let people tinker with it for a while. If it's looking really good, it's getting adoption, great. We'll just find a way to make it a first-class component of an existing Argo product and merge it just like we would any feature PR." [00:38:08] LA: As opposed to making a new application within the Argo umbrella. [00:38:13] MC: Exactly. [00:38:14] LA: And so, it sounds like that's mostly just to avoid the perception that it's brand-new and having to go through the whole process all over again to get it approved and all that sort of stuff. I mean, it's an easier way to accomplish the same thing. [00:38:28] ZA: The promoter is probably the first big project that that question would come up with. It's a little bit new for that. But from the CNCF's perspective, it might look like shoehorning in a product, which is probably frowned upon, right? [00:38:44] LA: Got it. It sounds like CI might be another thing that might fit under the umbrella, though. I see some smiles there. I know you can't see it in a podcast. But I see some smiles and a little bit of laughter there. [00:38:57] MC: I feel super strongly about this one, too. I think that Argo Workflows is actually a fantastic tool for doing CI. At my last job, when I used it for CI, I really enjoyed it. One of the most prolific contributors to Argo Workflows, Alex Collins, I think mostly agrees with that sentiment. 
Argo Workflows needs polish in order to be a viable CI tool. People are used to being able to go into GitHub Actions, write "run:", a vertical bar, hit enter, and type out your bash code. Argo Workflows is just syntactically a bit heavier than that. You're essentially writing a pod spec, which involves a lot more fields. Small syntactic-sugar things, I think, could make Argo Workflows a beautiful CI system. Tons of people use it for CI today anyway and really enjoy it and like it. But I would love to see it go that extra last little bit and become a competitor to a Jenkins or a GitHub Actions. [00:40:00] ZA: I agree with that. I do feel that having just a little bit of integration on standard steps would go far. There are some patterns within CI that are pretty generic, like building a Docker image, right? Having a built-in library or a built-in step that can help ease some of that redundancy that people end up with when they use Workflows for CI, I think, would also go a long way. But I share the same sentiment. I think Workflows does have a pretty nice fit for CI. [00:40:35] LA: That actually makes me feel better. Because when I started this, I assumed that Workflows was essentially a CI solution. And, yeah, I was thinking that this was all part of the CI/CD system. [00:40:46] ZA: At conferences, that question comes up all the time. [00:40:50] LA: I'm not alone. Okay. [00:40:51] ZA: No. Not at all. A lot of people do use it for that, too. Orgs do use Workflows for CI quite a bit. [00:41:00] LA: Right. [00:41:01] MC: It's sort of like you probably want a senior engineer writing your CI in Argo Workflows today. Where I'd really love for it to be is where we could have a junior engineer writing CI in Argo Workflows. And it's just not simplified enough for that yet. [00:41:15] LA: Got it. Okay. If new contributors wanted to get involved with Argo, what should they do? 
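[Editor's note: to make the syntactic gap Michael described concrete, here is a minimal side-by-side sketch. The GitHub Actions step and the Argo Workflows manifest below both run the same one-line test command; the image and command are illustrative choices, not from the episode.]

```yaml
# GitHub Actions: a CI step is just a shell snippet under "run:".
steps:
  - name: test
    run: |
      go test ./...

---
# Argo Workflows: the equivalent step is a Workflow resource whose template
# is essentially a pod spec, with image, command, etc. spelled out.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: ci-          # Workflows are Kubernetes resources
spec:
  entrypoint: test
  templates:
    - name: test
      container:
        image: golang:1.22   # you choose and manage the build image yourself
        command: [sh, -c]
        args: ["go test ./..."]
```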
[00:41:23] MC: The main point of contact, probably the most active one, is CNCF Slack. If you join the CNCF Slack, there are channels for all of the different Argo projects. And there are contributors' channels for all of the different projects. Just start a conversation there. All of the GitHub repositories have issues tagged "good first issue". And that's a great way to start engaging. Generally, you're going to get the most feedback the most quickly if you're a new developer joining the project and you just open a PR. You're like, "Okay. I took a stab at this bug maybe. It's not the best effort. It's not perfect yet." But having a PR open will get that initial engagement. Then, over time, if you continue to contribute, the maintainers will get to know you and you'll just get more interaction there. [00:42:10] LA: Makes sense. Is there anything else you'd like to tell us about Argo in general or what you're working on in particular? [00:42:16] MC: Well, obviously, super excited about the promoter stuff. If you want to ask about that, if you've had trouble with GitOps and doing promotions, hit me up on CNCF Slack. I'd love to send you the proposal. In general, get involved in the Argo ecosystem, even if it's just dropping into Slack and chatting with people about the problems they're facing. It's a super welcoming community. It's a super active community of a lot of really smart, really interesting and fun people. Yeah, Argo is just a wonderful space. And we'd love to see everyone get involved in it. [00:42:47] LA: And if you're not yet on CNCF Slack, you can get information about that on the CNCF website. Is that correct? [00:42:54] MC: Yep. [00:42:54] ZA: Yes. [00:42:55] LA: Great. Well, thank you very much. Michael Crenshaw and Zach Aller are both lead maintainers for various parts of Argo. And they've been my guests today. Michael, Zach, this was great. A lot of fun. Thank you so much. And thank you for being on Software Engineering Daily. [00:43:11] MC: Thanks so much. 
[00:43:12] ZA: Thanks for having us. [END]