DataOps with Christopher Bergh
Podcast: Play in new window | Download
Subscribe: Apple Podcasts | RSS
Every company with a large set of customers has a large set of data, whether that company is 5 years old or 50 years old. That data is valuable whether you are an insurance company, a soft drink manufacturer, or a ridesharing company. All of these large companies know that their data is valuable, but some of them are not sure how to standardize access patterns for that data or build a culture around data.
The larger the company is, the more the data is spread throughout the company, and the more heterogeneous the data sources are. Older companies often have older pieces of data infrastructure, which may not be well documented.
It is hard to make data-driven decisions when an organization cannot effectively query its own data. For example, consider a simple question about marketing. An insurance company wants to know how its spending on TV advertising has correlated with sales in California over the last 25 years.
The VP of marketing sends an email to a business analyst, asking for a historical report of this marketing data. The business analyst knows how to present the data with a business intelligence tool, but the analyst needs to ask the data scientist how to construct that query. The data scientist needs to ask the data engineer where to find those records in a large Hadoop distributed file system cluster. And the data engineer joined the company last week and has no idea where anything is.
These are the problems of DataOps. Like DevOps, DataOps is the recognition that a set of problems has crept into organizations over time and slowed down productivity.
The story of the DevOps movement is that old infrastructure, lack of testing, and complicated monolithic backends slowed down everyone in an old, big enterprise. The slow pace of change destroys morale and erodes trust. The DevOps movement is about revamping organizations through tooling and organizational behavior. We have covered this in lots of episodes, such as in a great episode with Gene Kim who wrote “The Phoenix Project.”
When an organization wants to reinvent itself with DevOps, it often begins with testing and continuous delivery. DataOps encourages data-driven organizations to begin with a similar practice of testing their data pipelines to build trust and evolve best practices. There are other similarities between DataOps and DevOps, such as continuous delivery and the breaking down of silos between different organizational roles.
Chris Bergh joins the show to talk about the data problems encountered by large companies, the practices of DataOps, and his company DataKitchen, which builds tools to help companies move toward more productive data practices.
Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.