Towards EngOps: Scaling Engineering Orgs with Data

Most engineering organizations are full of highly analytical people with STEM degrees. This is why it’s not at all surprising that the most data-driven organizations in any company are … Finance, Sales, and Marketing. Right? No, but seriously, when was the last time your engineering organization used data to make a decision?

When we were building the Einstein machine learning platform at Salesforce, we experienced all the regular struggles of a rapidly growing engineering org. We went from a small team of five people one day, to dozens of teams and hundreds of engineers in the span of a couple of years. With this growth came all the typical growing pains. Some teams ground to a halt as tech debt piled up; some teams became the central bottleneck for everyone else; others were overwhelmed with on-call duties. As leaders, we struggled to get a grasp of our operations, and ensure that our teams had the support they needed when they needed it.

Even simple process changes that would make everyone happier were difficult to uncover. One time, an accidental configuration change in our GitHub organization more than tripled our time to merge pull requests, and it was only after weeks of low-level grumbling from the engineers that we realized there was a problem and fixed it.

While we struggled with visibility, we noted that our counterparts in Sales, Marketing and Finance were incredibly data-informed about their operations, and were generally pretty good at modeling and measuring the impact of changes.

Engineering, on the other hand, was flying blind. Seemingly simple questions about engineering velocity, security, compliance, or cost required non-trivial effort cobbling data from various sources, digging through logs, writing ad hoc scripts, and more. Relevant data would take weeks to compile, and by the time analyses were complete, the data would be stale. We were not alone. When we talked to other teams in other organizations, it was the same story everywhere.

And so we built Faros.

A new norm necessitates new tools

The extreme fragmentation of the tech stack is primarily to blame for this struggle that engineering organizations face. The explosion in developer tooling has increased operational surface area 100x. Every organization’s tech stack has a unique fingerprint. Tech stacks typically spin out of control as organizations grow.

At the same time, COVID has made remote engineering the new norm, and the shift is accelerating. Opportunities for informal data collection and correlation are lost along with the communal water cooler.

Engineering teams simply do not have the right tools to deal with this new reality. Bottlenecks in processes take a long time to discover. Hiring more engineers is an expensive solution that often hurts productivity more than it helps. Decisions rely on the loudest voices in the room (or on Zoom) or on gut feel, rather than on data. It shouldn’t be this way.

Unlocking EngOps

We believe that with the right tools, engineering leaders will finally be able to scale their operations in a more data-informed way, using data to identify bottlenecks, measure progress towards organizational goals, better support teams with the right resources, and accurately assess the impact of interventions over time. Further, any solution that truly unlocks a data-informed culture in engineering will provide value by:

1. Connecting the dots

For data to be at the core of an organization’s decision-making processes, it needs to be easily accessible and cannot live in silos. This requires a platform that brings all engineering data into one place and connects the dots. It should collate data and metadata from all the different operational sources into a standardized data model that gives leaders a single-pane view of their engineering operations.

2. Maximizing flexibility

Every engineering organization is unique, and an EngOps platform should adapt to the organization’s needs rather than the other way around. Engineers love using best-of-breed software, and this is never going to change. Therefore, any EngOps solution must allow engineers to continue using the tools they love and meet them where they are. In other words, the platform needs to be super easy to customize, extend, and integrate with. For example, adding new data sources (whether external vendors or homegrown) should be a breeze, the canonical data model needs to be easy to extend, the analytics need to be customizable, and the entire platform needs to be API-driven, so that engineers can integrate it into their regular workflows and query the data they need from wherever it’s needed.

3. Highlighting what’s important

A massive amount of data flows through engineering organizations, and the number of metrics and insights that can be derived from that data is overwhelming. The ideal platform would be intelligent, highlighting what is relevant and explaining why it is important. It would point out trends to follow and anomalies to explore. It would correlate events from disparate systems to help with root cause analysis. It would allow leaders to concentrate on the most important insights their data can provide and take action, instead of getting lost in the weeds.
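
To make "anomalies to explore" concrete, here is a minimal sketch of one simple way to flag an unusual value in an engineering metric, such as the jump in time to merge pull requests described earlier. It compares each week against the mean and standard deviation of the preceding few weeks. The function name, the window size, the threshold, and the sample numbers are all assumptions made for illustration; this is not how any particular platform implements anomaly detection.

```python
from statistics import mean, stdev


def flag_anomalies(values, window=4, threshold=3.0):
    """Return the indices of values that deviate from the trailing window.

    A value is flagged when it lies more than `threshold` sample standard
    deviations away from the mean of the `window` values before it.
    """
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history: any change at all is unusual.
            if values[i] != mu:
                flagged.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical weekly median hours to merge a pull request.
weekly_hours_to_merge = [10.0, 11.0, 9.5, 10.5, 10.0, 31.0, 33.0]
anomalous_weeks = flag_anomalies(weekly_hours_to_merge)
```

With this sample data only the first week of the jump is flagged: once the high value enters the trailing window it inflates the baseline, so the following week no longer stands out. A more robust approach would need to account for that, for example by excluding flagged points from the baseline.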

Introducing Faros AI

The Faros platform has been designed from the ground up with these three tenets in mind to provide immediate visibility, no matter the tech stack. The Faros platform is:

1. Connected: Faros connects with dozens of different engineering systems across source control, task-management, incident-management, CI/CD, and HR systems. Not only does it connect to these systems, but it also infers connections between them – correlating events and identities to provide holistic visibility across the organization. It can trace changes from idea to production and beyond; incidents from discovery to recovery to resolution; and reconcile identities across the different systems.

2. Extensible: The Faros APIs were designed with customizability and extensibility as first-class concerns. In addition to known vendors, connecting custom home-grown systems to Faros is easy with the Faros SDK. We also embedded a full-blown BI tool within the platform, to allow teams to measure what matters most to them. This, together with APIs to inspect the data and even export it, allows engineering teams to integrate Faros into their regular workflows without changing their existing processes.

3. Intelligent: Faros correlates events, resolves identities, and infers team attribution to power operational metrics around software delivery (DORA metrics), engineering velocity, program management, and onboarding; with more to come around security, compliance, and cost optimization. For instance, Faros can measure the lead time for changes to go from idea to production and every stage in between – broken down by team, by application, and over time. But metrics are just the beginning, as we design towards fully automated insights with anomaly detection and root cause analysis to help teams quickly make sense of their data.
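
As a rough illustration of the lead time metric mentioned above, the sketch below computes the median lead time per team from a list of change records. It does not use the Faros API or data model; the record fields (`team`, `created_at`, `deployed_at`), the function name, and the sample data are hypothetical, and it assumes each change has already been attributed to a team.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical change records, one per change that reached production.
changes = [
    {"team": "platform", "created_at": "2021-03-01T09:00:00", "deployed_at": "2021-03-03T09:00:00"},
    {"team": "platform", "created_at": "2021-03-02T09:00:00", "deployed_at": "2021-03-03T09:00:00"},
    {"team": "ml", "created_at": "2021-03-01T12:00:00", "deployed_at": "2021-03-02T00:00:00"},
]


def lead_time_hours_by_team(records):
    """Return the median hours from creation to deployment for each team."""
    durations = defaultdict(list)
    for record in records:
        created = datetime.fromisoformat(record["created_at"])
        deployed = datetime.fromisoformat(record["deployed_at"])
        hours = (deployed - created).total_seconds() / 3600
        durations[record["team"]].append(hours)
    return {team: median(hours) for team, hours in durations.items()}


lead_times = lead_time_hours_by_team(changes)
```

The same grouping could be keyed by application or by week instead of by team to get the other breakdowns. The median is used here rather than the mean so that a single very slow change does not dominate the figure, which is a design choice rather than a requirement of the metric.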

In the weeks to come, stay tuned for more blog posts on how we designed the Faros platform to deliver on its values at scale.

Why should you care?

Your engineering teams need to quickly, efficiently, and reliably create and deliver quality software, and that’s where your engineers should be spending their time. Better visibility allows you to effectively scale your operations, identify frustrating bottlenecks and resolve issues before they become fires. Fewer fires and bottlenecks make for happier teams that can focus on what’s most important.

See Faros in Action

Request a demo and we will be happy to set up time to walk you through the platform.
Unlock the power of data-driven EngOps at Faros.ai.
Shubha Nabar

Shubha Nabar is the Co-Founder of Faros AI, an engineering operations platform. Prior to Faros AI, she was part of the founding team of the Einstein machine learning platform at Salesforce and built data products at companies like LinkedIn and Microsoft. Shubha has a PhD in Computer Science from Stanford University and was recognized by Forbes as one of the top 20 women in AI.
