Stripe Observability Pipeline with Cory Watson

Stripe processes payments for thousands of businesses. A single payment could involve 10 different networked services. If a payment fails, engineers need to be able to diagnose what happened. The root cause could lie in any of those services.

Distributed tracing is used to find the causes of failures and latency within networked services. In a distributed trace, each period of time associated with a request is recorded as a span. The spans can be connected together because they share a trace ID.

The spans of a distributed trace are one element of observability. Others include metrics and logs. Each of these components of observability make their way into services like Lightstep and Datadog. The path traveled by different elements of observability is called the observability pipeline.

In an episode last year, Cory Watson explained how observability works at Stripe. In today’s episode, Cory describes how observability is created and aggregated. It’s a useful discussion for anyone working at a company that is figuring out how to instrument their systems for better monitoring.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Sponsors


The TEALs program is looking for engineers from across the country to volunteer to teach computer science in high schools. Work with a computer science teacher in the classroom to bring development concepts to life through teamwork and determination. If you’d like to learn more about the Microsoft’s TEALs program or submit your volunteer application, go to tealsk12.org/sedaily.


There’s no need to reinvent the wheel when it comes to making your app “realtime.” PubNub makes it simple, enabling you to build immersive and interactive experiences on the web, on mobile phones, embedded into hardware, and any other device connected to the Internet. With powerful APIs, and a robust global infrastructure, you can stream geolocation data, send chat messages, turn on your sprinklers, or rock your baby’s crib when they start crying (PubNub literally powers IoT cribs). 70 SDKs for web, mobile, IoT, and more means you can start streaming data in realtime without a ton of compatibility headaches, and no need to build your own SDKs from scratch. Go to PubNub.com/sedaily to get started. They offer a generous sandbox tier that’s free forever (until your app takes off).



Azure Container Service simplifies the deployment, management and operations of Kubernetes. You can continue to work with the tools you already know, such as Helm, and move applications to any Kubernetes deployment. Integrate with your choice of container registry, including Azure Container Registry. Also, quickly and efficiently scale to maximize your resource utilization without having to take your applications offline. Isolate your application from infrastructure failures and transparently scale the underlying infrastructure to meet growing demands—all while increasing the security, reliability, and availability of critical business workloads with Azure. Check out the Azure Container Service at aka.ms/sedaily.


GoCD is a continuous delivery tool created by ThoughtWorks. GoCD agents use Kubernetes to scale as needed. Check out gocd.org/sedaily and learn about how you can get started. GoCD was built with the learnings of the ThoughtWorks engineering team, who have talked about building the product in previous episodes of Software Engineering Daily. It’s great to see the continued progress on GoCD with the new Kubernetes integrations–and you can check it out for yourself at gocd.org/sedaily.