Incident Response Machine Learning with Chris Riley

Software bugs cause unexpected problems at every company. 

Some problems are small. A website goes down in the middle of the night, and the outage triggers a phone call to an engineer who has to wake up and fix the problem. Other problems are significantly larger. A major problem can cause millions of dollars in losses and require hours of work to fix.

When software breaks unexpectedly, the event is called an incident. To triage an incident, an engineer relies on a combination of tools, including Slack, GitHub, cloud providers, and continuous deployment systems. These tools emit updates that an incident response platform can collect, which gives the on-call engineer the information they need in one place and makes it easier to work through the incident.
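As a rough illustration of that centralization, here is a minimal sketch that assumes each tool's update has already been converted into a common record shape and merges the per-tool streams into one chronological timeline. The `IncidentEvent` schema, the `build_timeline` helper, the source names, and the sample events are all hypothetical; a real incident response platform defines its own data model and ingestion.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class IncidentEvent:
    """One update from a tool, normalized to a common (hypothetical) shape."""
    timestamp: datetime
    source: str   # hypothetical source label, e.g. "deploy", "monitoring", "slack"
    summary: str


def build_timeline(*streams):
    """Merge event lists from several tools into one list ordered by time."""
    merged = [event for stream in streams for event in stream]
    return sorted(merged, key=lambda event: event.timestamp)


def ts(minute):
    """Helper for the example: a UTC timestamp at 03:<minute>."""
    return datetime(2020, 1, 1, 3, minute, tzinfo=timezone.utc)


# Hypothetical updates from three different tools.
deploys = [IncidentEvent(ts(2), "deploy", "Deployed build 481 to production")]
alerts = [IncidentEvent(ts(5), "monitoring", "Error rate above 5% on checkout service")]
chat = [IncidentEvent(ts(9), "slack", "On-call engineer acknowledged the page")]

# The streams are passed in arbitrary order; the timeline comes out sorted.
for event in build_timeline(chat, alerts, deploys):
    print(event.timestamp.strftime("%H:%M"), event.source, event.summary)
```

Sorting the merged list, rather than assuming each stream is already in order, keeps the sketch correct even when a tool delivers its updates late or out of sequence.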

An on-call rotation means that different people are responsible for different incidents as they occur. When an incident happens, the engineer currently on call may not know that a similar incident happened last week. That engineer might triage the issue more easily with insight into how the earlier incident was handled.

Chris Riley is a DevOps advocate with Splunk. He joins the show to discuss the application of machine learning to incident response. We discuss the different data points that are created during an incident, and how that data can be used to build models of different incident types that give the engineer the information needed to respond appropriately. Full disclosure: Splunk is a sponsor of Software Engineering Daily.
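To make the idea of surfacing a similar past incident concrete, here is a minimal sketch that ranks past incident descriptions by bag-of-words cosine similarity to a new one. This is a deliberately simple stand-in, not the modeling approach discussed in the episode or used by any Splunk product; the incident IDs, descriptions, and the `most_similar` helper are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(new_incident, past_incidents, top_k=1):
    """Return the top_k (incident_id, score) pairs, most similar first."""
    query = Counter(tokenize(new_incident))
    scored = [
        (incident_id, cosine_similarity(query, Counter(tokenize(text))))
        for incident_id, text in past_incidents.items()
    ]
    # Highest score first; the sort is stable, so ties keep insertion order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical history of resolved incidents.
past = {
    "INC-101": "Checkout service returning 500 errors after deploy",
    "INC-102": "Database disk full on reporting cluster",
    "INC-103": "Login page slow due to expired cache certificate",
}

# INC-101 ranks first because it shares the most words with the new report.
print(most_similar("500 errors on checkout service after latest deploy", past))
```

Word overlap misses incidents described in different terms; a production system would more likely draw on richer signals such as alert metadata, affected services, and timelines.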

Sponsorship inquiries: sponsor@softwareengineeringdaily.com

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.


Sponsors

Sumo Logic is a continuous intelligence platform that builds tools for operations, security, and cloud-native infrastructure. The company has studied thousands of businesses to get an understanding of modern continuous intelligence, and then compiled that information into the Continuous Intelligence Report, which is available at softwareengineeringdaily.com/sumologic.

Monday.com is a team management platform that brings all your work, external tools, and communication into one place, making cross-team collaboration easy. You can try Monday.com and get a 14-day trial by going to monday.com/sedaily. And if you decide to become a customer, you will get 10% off by using coupon code SEDAILY.

There is probably a way that Zapier could make your software run more smoothly. If you are a technical person, you probably juggle enough spreadsheets, Gmail accounts, and social media management tasks that Zapier could save you some time. So check out Zapier.com/sedaily right now through November, and learn how your API integrations could be managed more easily.

Jaspersoft offers embeddable reports, dashboards, and data visualizations that developers love. Give users intuitive access to data in the ideal place for them to take action: within your application. To check out a sample application with embedded analytics, go to softwareengineeringdaily.com/jaspersoft.

Software Weekly

Subscribe to Software Weekly, a curated weekly newsletter featuring the best and newest from the software engineering community.