DataOps with Christopher Bergh
Every company with a large set of customers has a large set of data, whether that company is 5 years old or 50 years old. That data is valuable whether you are an insurance company, a soft drink manufacturer, or a ridesharing company. All of these large companies know that their data is valuable, but some of them are not sure how to standardize access to that data, or how to build a culture around data.
The larger the company, the more widely its data is spread throughout the organization, and the more heterogeneous the data sources are. Older companies often have older pieces of data infrastructure, and that infrastructure might not be well documented.
It is hard to make data-driven decisions when an organization cannot effectively query its own data. For example, consider a simple question about marketing: an insurance company wants to know how its spending on TV advertising correlates with sales in California over the last 25 years.
The VP of marketing sends an email to a business analyst, asking for a historical report of this marketing data. The business analyst knows how to present the data with a business intelligence tool, but the analyst needs to ask the data scientist for how to make that query. The data scientist needs to ask the data engineer where to find those records in a large Hadoop distributed file system cluster. And the data engineer joined the company last week and has no idea where anything is.
These are the problems of DataOps. Like DevOps, DataOps is the recognition that a set of problems has crept into organizations over time and slowed down productivity.
The story of the DevOps movement is that old infrastructure, lack of testing, and complicated monolithic backends slowed down everyone in big, old enterprises. The slow pace of change destroys morale and erodes trust. The DevOps movement is about revamping organizations through tooling and organizational behavior. We have covered this in many episodes, such as a great episode with Gene Kim, who wrote “The Phoenix Project.”
When an organization wants to reinvent itself with DevOps, it often begins with testing and continuous delivery. DataOps encourages data-driven organizations to begin with a similar practice of testing their data pipelines to build trust and evolve best practices. There are other similarities between DataOps and DevOps, such as continuous delivery and the breaking down of silos between different organizational roles.
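To make the pipeline-testing idea concrete, here is a minimal sketch of what a data-quality check might look like. The record fields (`state`, `tv_spend_usd`) and thresholds are hypothetical examples, not anything specific to DataKitchen's tooling; the point is that a batch of data is asserted against before it flows downstream, much like a unit test gates a code deploy.

```python
# A minimal sketch of a data-pipeline quality test, in the spirit of DataOps.
# The field names and rules below are hypothetical examples.

def check_batch(rows):
    """Run simple data-quality assertions on a batch of records
    before it is allowed to flow downstream. Returns a list of
    human-readable problems; an empty list means the batch passes."""
    errors = []
    if not rows:
        errors.append("batch is empty")
    for i, row in enumerate(rows):
        # Every record should say which state it came from.
        if row.get("state") is None:
            errors.append(f"row {i}: missing state")
        # Advertising spend should be present and non-negative.
        spend = row.get("tv_spend_usd")
        if spend is None or spend < 0:
            errors.append(f"row {i}: invalid tv_spend_usd: {spend!r}")
    return errors

batch = [
    {"state": "CA", "tv_spend_usd": 120000.0},
    {"state": "CA", "tv_spend_usd": -50.0},  # bad record: negative spend
]
problems = check_batch(batch)
print(problems)  # flags the negative spend in row 1
```

In a real pipeline, checks like these would run automatically on every batch, and a non-empty result would halt the pipeline and alert the team rather than silently corrupt downstream reports.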
Chris Bergh joins the show to talk about the data problems encountered by large companies, the practices of DataOps, and his company DataKitchen, which builds tools to help companies move toward more productive data practices.
Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.
Citus is worry-free Postgres that is built to scale out. Made for SaaS and enterprises, Citus is an extension to Postgres that transforms Postgres into a distributed database. Whether you need to scale out a multi-tenant app or build real-time analytics dashboards that require sub-second responses, Citus makes it simple to shard Postgres. Go to citusdata.com/sedaily to learn more about how Citus can scale your Postgres database.
ActiveState gives your engineers a way to bake security right into your language's runtime. It identifies security vulnerabilities, out-of-date packages, and restrictive licenses (e.g. GPL, LGPL). Get more info at activestate.com/sedaily
DoiT International helps startups optimize the costs of their workloads across Google Cloud and AWS, so that they can spend more time building new software and less time reducing cost. DoiT International helps clients optimize their costs, and if your cloud bill is over $10,000 per month, you can get a free cost-optimization assessment by going to doit-intl.com/sedaily.
GoCD is a continuous delivery tool created by ThoughtWorks. It's great to see the continued progress on GoCD with the new Kubernetes integrations, and you can check it out for yourself at gocd.org/sedaily.