Tag: monitoring

High Volume Logging with Steve Newman

http://traffic.libsyn.com/sedaily/Scalyr.mp3

Google Docs is used by millions of people to collaborate on documents together. With today’s technology, you could spend a weekend coding and build a basic version of a collaborative text editor. But in 2004 it was not so easy. In 2004 Steve Newman built a product called Writely, which allowed users to collaborate on documents together. Initially, Writely was hosted on a

Continue reading…

High Volume Event Processing with John-Daniel Trask

http://traffic.libsyn.com/sedaily/HighVolumeEventProcessing.mp3

A popular software application serves billions of user requests. These requests could be for many different things. These requests need to be routed to the correct destination, load balanced across different instances of a service, and queued for processing. Processing a request might require generating a detailed response to the user, or making a write to a database, or the creation of a

Continue reading…

Alerting and Metrics with Clement Pang

http://traffic.libsyn.com/sedaily/ClementPang.mp3

An alert is a signal of problematic application behavior. When something unusual happens to your application, an alert can bring that anomaly to your attention. In order to detect unusual events, you need to define the norm. In order to define both normal and problematic behavior, you need metrics. Metrics are measurements of the behavior in your application. Metrics get created from logs

Continue reading…

Dashboarding and Query Latency with Tom O’Neill

http://traffic.libsyn.com/sedaily/PeriscopeData.mp3

A dashboard is a data visualization that aggregates metrics in a way that we can quickly understand. In a modern software company, everyone uses dashboards–from salespeople to DevOps to HR. Each dashboard represents a query that must be updated frequently, so that anyone looking at it is getting up-to-date information. The data set being queried might be getting updated quickly in the case

Continue reading…

Error Diagnosis with James Smith

http://traffic.libsyn.com/sedaily/ErrorDiagnosis.mp3

When a user experiences an error in an application, the engineers who are building that application need to find out why that error occurred. The root cause of that error may be on the user’s device, or within a piece of server-side logic, or hidden behind a black box API. To fix a complex error, we need a stack trace of contextual information

Continue reading…

Stripe Observability with Cory Watson

http://traffic.libsyn.com/sedaily/stripe_observability_edited.mp3

Observability allows engineers to understand what is going on inside their systems. In its most raw form, observability comes from log data. Modern systems have many layers of logs–virtualized cloud infrastructure, container orchestration, the container runtime itself, and the application logic running within the container. With all of these layers, it is not practical for a developer to have to sift through layers

Continue reading…

Parse and Operations with Charity Majors

http://traffic.libsyn.com/sedaily/OperationswithCharityMajors.mp3

Parse was a backend as a service company built in 2011 before being acquired by Facebook in 2013. Building a backend as a service for developers requires walking a thin line between giving engineers lots of control and preventing those engineers from shooting themselves in the foot. While she was at Parse, Charity Majors learned about the operational burdens of managing a service

Continue reading…

Performance Monitoring with Andi Grabner

http://traffic.libsyn.com/sedaily/monitoring_edited.mp3

Application performance monitoring helps an engineer understand what is going on with an application. An application on a single machine is often monitored by inserting bytecode instructions into the application after it has been interpreted. Distributed cloud applications with functionality broken up across multiple servers often use distributed tracing. Andi Grabner from Dynatrace joins today’s show to explain how monitoring software is built,

Continue reading…

Monitoring Architecture with Theo Schlossnagle

http://traffic.libsyn.com/sedaily/monitoring_theo_edited_fixed.mp3

Building a monitoring system is a complex distributed systems problem. Events are produced from different points in an application and must be aggregated in order to form metrics. These events are often ingested by a time series database, which forms the backbone of our monitoring system. Theo Schlossnagle is the CEO of Circonus, where he has been working on architecting the company’s monitoring

Continue reading…

Prometheus Monitoring with Brian Brazil

http://traffic.libsyn.com/sedaily/prometheusbrian_edited_fixed.mp3

Prometheus is a tool for monitoring our distributed applications. It allows us to focus on the services we are deploying rather than the individual machines that make up instances of that service. The monitoring service itself is a portion of a distributed system that is treated differently than the services we are monitoring. We don’t want to use a consensus-based tool like

Continue reading…

Prometheus with Julius Volz

http://traffic.libsyn.com/sedaily/Prometheus_Edited.mp3

Prometheus is an open-source monitoring tool built at SoundCloud. It can be used to produce detailed time-series data about a distributed architecture. Prometheus is based on the monitoring system inside Google’s infrastructure, called Borgmon. Julius Volz is the creator of Prometheus, and he joins the show to explain why he built Prometheus and how it differs from previous monitoring tools. Prometheus is

Continue reading…

The Art of Monitoring with James Turnbull

http://traffic.libsyn.com/sedaily/The_Art_of_Monitoring_Edited.mp3

Monitoring translates machine data into actionable business metrics, and is a key component of a modern software company. James Turnbull’s new book “The Art of Monitoring” describes how organizations can build their monitoring infrastructure. James joins the show today to outline the strategies that a company can use to proactively monitor their systems. We talk about pull- vs. push-based monitoring, events,

Continue reading…