Database Chaos with Tammy Butow

Tammy Butow has worked at DigitalOcean and Dropbox, where she built out infrastructure and managed engineering teams. Both companies serve customers at massive scale.

At Dropbox, Tammy worked on the database that holds the metadata Dropbox users rely on to access their files. Calling this metadata system simply a "database" is an understatement: it is actually a multi-tiered system of caches and databases. This metadata is extremely sensitive, because it records where objects across Dropbox are located, so it has to be highly available.

To improve that reliability, Tammy helped institute chaos engineering: inducing random failures across the Dropbox infrastructure and making sure that Dropbox systems could respond to those failures automatically. If you are unfamiliar with the topic, we have covered chaos engineering in two previous episodes of Software Engineering Daily.

Tammy now works at Gremlin, a company that provides chaos engineering as a service. In this show we talked about her experiences at Dropbox and how to institute chaos engineering across databases. We also explored how her work at Gremlin, a smaller startup, compares to her work at Dropbox and DigitalOcean, which are larger companies.

Show Notes

Tammy Butow Chaos Engineering Bootcamp

Information to run your own Chaos Day

How to Create a Kubernetes Cluster on Ubuntu 16.04 with kubeadm and Weave Net | Gremlin Community


Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.


