On January 31, 2017, GitLab experienced a major outage of its online repository hosting service. The primary database server lost data through a combination of malicious spam attacks and engineering mistakes made while responding to those attacks.
GitLab responded to the event transparently. The company put up a postmortem describing the event in detail. In subsequent posts, GitLab expressed sympathy for the employee who made engineering mistakes that led to the deletion of data. The employee was not judged or disciplined for an understandable error.
The response from the developer community was very positive. Engineers know that building cloud services is hard. Engineering is as much about avoiding errors as it is about appropriately responding to the inevitable mistakes.
GitLab is a developer platform that combines repository hosting with several other features: issue tracking, code review, and CD. Today’s guest is Pablo Carranza, who works on infrastructure at GitLab. In this episode, he walks us through GitLab’s product, the engineering stack, and a postmortem of the outage. We also discuss working at Amazon and the importance of postmortems, which I first encountered at Amazon.
Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.
Oracle Dyn provides DNS that is as dynamic and intelligent as your applications. Dyn DNS gets your users to the right cloud service, CDN, or data center, using intelligent response to steer traffic based on business policies, as well as real-time internet conditions, like the security and performance of the network path. Get started with a free 30-day trial for your application by going to dyn.com/sedaily. After the free trial, Dyn’s developer plans start at just $7 a month for world-class DNS. Rethink DNS. Go to dyn.com/sedaily to learn more and get your free trial of Dyn DNS.
Datadog brings you visibility into every part of your infrastructure, plus APM for monitoring your application’s performance. Dashboarding, collaboration tools, and alerts let you develop your own workflow for observability and incident response. Datadog integrates seamlessly with all of your apps and systems, from Slack to Amazon Web Services, so you can get visibility in minutes. Go to softwareengineeringdaily.com/datadog to get started with Datadog and get a free t-shirt.
Deep learning promises to dramatically improve how our world works. To make deep learning easier and faster, we need new kinds of hardware and software, which is why Intel acquired Nervana Systems, a platform for deep learning. Intel Nervana is hiring engineers to help develop a full stack for AI, from chip design to software frameworks. Go to softwareengineeringdaily.com/intel to apply for a job at Intel Nervana. If you don’t know much about the company, check out the interviews I have conducted with its engineers. You can find these at softwareengineeringdaily.com/intel