An Introduction to Database Reliability
Gone are the days of the monolithic SQL database application. Startups and enterprises alike are leveraging distributed datastores spread across the cloud to solve their data problems. Businesses demand greater velocity, and data requirements are skyrocketing in complexity. Today’s data problems require a new approach to planning, managing, and scaling our datastores.
Enter the Database Reliability Engineer (DBRE): a modern data administrator who enables a company to innovate with its data while keeping that data safe and accessible. The DBRE uses software and tooling to automate manual tasks. A DBRE is an enabler. They allow engineers to move fast without breaking things! They evaluate service level objectives and use their strong domain knowledge to anticipate capacity needs.
Laine Campbell joins SE Daily to discuss the role and field of Database Reliability Engineering. Laine has a strong background in data engineering. Laine works at Fastly on one of the largest distributed key-value stores in the world: the Fastly CDN. Laine and co-author Charity Majors coined the term “Database Reliability Engineering” in their book Database Reliability Engineering.
Data is an organization’s most important asset, but its critical nature often brings fear and reservation when working with it. An organization will often hold back on innovation to avoid risk. The DBRE allows the organization to innovate with its data instead of being stifled by gatekeeping and risk-averse policies.
One way a DBRE alleviates fear and increases velocity is through automation. A DBRE will often wear multiple hats, including the hat of an SRE. Laine says, “The job of the SRE is about automating yourself out of any manual work if possible.”
Automating crucial operations like failovers, backups, and resource provisioning is an important responsibility of a DBRE. Automation enables software engineers to move fast without worrying about losing data. Listen to the conversation with Mike Hiraga to learn more about Site Reliability Management.
Laine frequently mentions the importance of backups and failover automation. These are the tasks that often save an organization from catastrophic data loss. Automating data safety nurtures a culture of innovation. Listen to the episode on data backups with Kenny To to learn more.
For example, DBREs are comfortable applying chaos engineering techniques because they have confidence in their failover and back-pressure mechanisms. SE Daily covers chaos engineering in more depth with Tammy Butow.
Risk is an important factor when evaluating new tools, databases, and even further automation. A DBRE uses service level objectives (SLOs) to help make decisions, analyze tradeoffs, and plan for capacity. The DBRE chooses their problems wisely and automates the rest.
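To make the SLO idea concrete, here is a minimal Python sketch of how an availability SLO can be turned into an "error budget" that informs a risk decision. The error-budget framing is a common SRE convention rather than something the episode spells out, and the class name, the function name, and the 25% threshold are all hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class AvailabilitySLO:
    """Hypothetical availability SLO for a datastore (illustrative only)."""
    target: float      # e.g. 0.999 means the datastore is available 99.9% of the time
    window_days: int   # length of the window over which the SLO is evaluated

    def error_budget_minutes(self) -> float:
        """Total minutes of unavailability the SLO tolerates in the window."""
        return (1.0 - self.target) * self.window_days * 24 * 60

    def budget_remaining(self, downtime_minutes: float) -> float:
        """Fraction of the error budget still unspent (negative if overspent)."""
        budget = self.error_budget_minutes()
        return (budget - downtime_minutes) / budget


def safe_to_take_risk(slo: AvailabilitySLO, downtime_minutes: float,
                      min_remaining: float = 0.25) -> bool:
    """Assumed policy: allow a risky change only while at least
    `min_remaining` of the error budget is left."""
    return slo.budget_remaining(downtime_minutes) >= min_remaining


if __name__ == "__main__":
    slo = AvailabilitySLO(target=0.999, window_days=30)
    print(f"Error budget: {slo.error_budget_minutes():.1f} minutes")
    print("Risky migration allowed:", safe_to_take_risk(slo, downtime_minutes=10.8))
```

With a 99.9% target over 30 days, the budget works out to roughly 43 minutes of downtime. A team that has already spent most of it might postpone a risky migration, while a team with plenty left has room to experiment.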
“The DBRE should be focusing on eliminating the gatekeeping, and that means building guardrails, providing education, collaboration, and teaching.”
The DBRE educates other data and software engineers on best practices, database technologies, and the organization’s domain. Knowledge sharing is what enables a single DBRE to support a large organization.
Laine points out that newer “Database as a Service” (DBaaS) offerings are enticing to software engineers, but they often abstract away important information that helps engineers and DBREs measure the constraints on their data solutions. Nonetheless, the proliferation of DBaaS indicates a shift towards a better data development experience. See Eliot Horowitz’s talk on MongoDB Atlas to learn more.
Laine ends the talk with a positive declaration: engineers should be able to work within their datastores. Guardrails are needed, but there should be no fear of experimenting and innovating with the lifeblood of the company.