Towards EngOps: Scaling Engineering Orgs with Data
Most engineering organizations are full of highly analytical people with STEM degrees. This is why it’s not at all surprising that the most data-driven organizations in any company are … Finance, Sales, and Marketing. Right? No, but seriously, when was the last time your engineering organization used data to make a decision?
When we were building the Einstein machine learning platform at Salesforce, we experienced all the regular struggles of a rapidly growing engineering org. We went from a small team of five people to dozens of teams and hundreds of engineers in the span of a couple of years. With this growth came all the typical growing pains. Some teams ground to a halt as tech debt piled up; some teams became the central bottleneck for everyone else; others were overwhelmed with on-call duties. As leaders, we struggled to get a grasp of our operations and to ensure that our teams had the support they needed when they needed it.
Even simple process changes that would make everyone happier were difficult to uncover. One time, an accidental configuration change in our GitHub organization more than tripled our time to merge pull requests, and it was only after weeks of low-level grumbling from the engineers that we realized there was a problem and fixed it.
While we struggled with visibility, we noted that our counterparts in Sales, Marketing and Finance were incredibly data-informed about their operations, and were generally pretty good at modeling and measuring the impact of changes.
Engineering, on the other hand, was flying blind. Seemingly simple questions about engineering velocity, security, compliance, or cost required non-trivial effort: cobbling together data from various sources, digging through logs, writing ad hoc scripts, and more. Relevant data would take weeks to compile, and by the time analyses were complete, the data would be stale. We were not alone. When we talked to teams in other organizations, it was the same story everywhere.
And so we built Faros.
A new norm necessitates new tools
The extreme fragmentation of the tech stack is primarily to blame for this struggle that engineering organizations face. The explosion in developer tooling has increased operational surface area 100x. Every organization’s tech stack has a unique fingerprint. Tech stacks typically spin out of control as organizations grow.
At the same time, COVID has made remote engineering the new norm, and the shift is accelerating. Opportunities for informal data collection and correlation are lost along with the communal water cooler.
Engineering teams simply do not have the right tools to deal with this new reality. Bottlenecks in processes take a long time to discover. Hiring more engineers is an expensive solution that often hurts productivity more than it helps. Decisions rely on the loudest voices in the room (or on Zoom) or on gut feel, rather than on data. It shouldn’t be this way.
Unlocking EngOps
We believe that with the right tools, engineering leaders will finally be able to scale their operations in a more data-informed way: using data to identify bottlenecks, measure progress towards organizational goals, better support teams with the right resources, and accurately assess the impact of interventions over time. Further, any solution that truly unlocks a data-informed culture in engineering will provide value by:
1. Connecting the dots
2. Maximizing flexibility
3. Highlighting what’s important
Introducing Faros AI
The Faros platform has been designed from the ground up with these three tenets in mind to provide immediate visibility, no matter the tech stack. The Faros platform is:
1. Connected: Faros connects with dozens of different engineering systems across source control, task management, incident management, CI/CD, and HR. Not only does it connect to these systems, but it also infers connections between them, correlating events and identities to provide holistic visibility across the organization. It can trace changes from idea to production and beyond, follow incidents from discovery to recovery to resolution, and reconcile identities across the different systems.
2. Extensible: The Faros APIs were designed with customizability and extensibility as first-class concerns. In addition to known vendors, connecting custom home-grown systems to Faros is easy with the Faros SDK. We also embedded a full-blown BI tool within the platform, to allow teams to measure what matters most to them. This, together with APIs to inspect the data and even export it, allows engineering teams to integrate Faros into their regular workflows without changing their existing processes.
3. Intelligent: Faros correlates events, resolves identities, and infers team attribution to power operational metrics around software delivery (DORA metrics), engineering velocity, program management, and onboarding; with more to come around security, compliance, and cost optimization. For instance, Faros can measure the lead time for changes to go from idea to production and every stage in between – broken down by team, by application, and over time. But metrics are just the beginning, as we design towards fully automated insights with anomaly detection and root cause analysis to help teams quickly make sense of their data.
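To make the lead-time metric mentioned above concrete, here is a minimal sketch of how such a number could be computed once change events have been correlated across systems. It is not the Faros implementation or API: the record layout, the team names, and the `lead_time_hours_by_team` function are all hypothetical, and it narrows "idea to production" down to the span from first commit to production deploy.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical, already-correlated change records:
# (team, first commit time, production deploy time). Not real data.
CHANGES = [
    ("platform", "2021-03-01T09:00", "2021-03-02T09:00"),
    ("platform", "2021-03-03T10:00", "2021-03-03T22:00"),
    ("ml-infra", "2021-03-01T08:00", "2021-03-04T08:00"),
]


def lead_time_hours_by_team(changes):
    """Return the median commit-to-deploy lead time, in hours, per team."""
    durations = defaultdict(list)
    for team, committed_at, deployed_at in changes:
        start = datetime.fromisoformat(committed_at)
        end = datetime.fromisoformat(deployed_at)
        durations[team].append((end - start).total_seconds() / 3600)
    # Median is used here because a single slow change can skew the mean.
    return {team: median(hours) for team, hours in durations.items()}


RESULT = lead_time_hours_by_team(CHANGES)
```

The hard part in practice is not this arithmetic but producing the input: linking a commit in source control to a deployment in CI/CD and to a team in the HR system, which is the correlation work described in the list above.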
In the weeks to come, stay tuned for more blog posts on how we designed the Faros platform to deliver on its values at scale.
Why should you care?
Your engineering teams need to create and deliver quality software quickly, efficiently, and reliably, and that’s where your engineers should be spending their time. Better visibility allows you to scale your operations effectively, identify frustrating bottlenecks, and resolve issues before they become fires. Fewer fires and bottlenecks make for happier teams that can focus on what’s most important.