Rockset Data Platform with Venkat Venkataramani
At Facebook, Venkat Venkataramani saw how large volumes of data were changing software infrastructure. Applications such as logging servers and advertising were creating fast-moving, semi-structured data. The user base, the traffic, and the volume of data were all growing, and the popular methods for managing this data were insufficient for the applications that developers wanted to build on top.
In previous episodes about data platforms, we have covered similar difficulties as experienced by Uber and DoorDash. Incoming data is often in JSON, which is hard to query. The data is transformed to a file format like Parquet, which requires an ETL job. Once it is in a Parquet file on disk in a data lake, the access time is slow. To query the data efficiently, it must be loaded into a data warehouse, which loads the data into memory, often in a columnar format that is easy to aggregate.
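To make the row-versus-columnar point concrete, here is a minimal sketch in plain Python. It is not how any particular warehouse works; the sample events and the helper names `to_columns` and `average` are made up for illustration. It pivots semi-structured JSON documents into one list per field so that an aggregate only has to touch the column it needs.

```python
import json

# Hypothetical semi-structured events, one JSON document per line.
# Note that the documents do not all share the same fields.
raw_lines = [
    '{"user": "a", "ms": 120, "tags": ["ad"]}',
    '{"user": "b", "ms": 80}',
    '{"user": "a", "ms": 100, "country": "US"}',
]


def to_columns(lines):
    """Pivot row-oriented JSON documents into one list per field."""
    rows = [json.loads(line) for line in lines]
    fields = sorted({key for row in rows for key in row})
    # Missing fields become None so every column has one slot per row.
    return {field: [row.get(field) for row in rows] for field in fields}


def average(column):
    """Aggregate a single column, skipping rows where the field was absent."""
    values = [v for v in column if v is not None]
    return sum(values) / len(values)


columns = to_columns(raw_lines)
avg_ms = average(columns["ms"])  # reads only the "ms" column
```

In the row-oriented form, computing the same average means parsing every document in full. In the columnar form, the aggregate reads a single list, which is the property that makes columnar layouts easy to aggregate.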
Imagine being a developer at Facebook, Uber, or DoorDash, and trying to build a simple dashboard, or a machine learning application on top of this data platform. Where do you find the right data? How do you know it is up to date? And what if you don’t know the shape of your queries ahead of time, and you haven’t defined indexes over your data? The access speed will be too slow to do exploratory analysis.
There are many steps in this process, and each one creates friction for application developers who want to build on top of “big data.” Since even Facebook had trouble managing its data platform, Venkat saw an opportunity to build a company that solves the data platform problem for other software companies.
Venkat is the CEO of Rockset, a data system that is built to make it easy for developers to build data-driven apps. In Rockset, data can be ingested from data streams, data lakes, and databases. Rockset creates multiple indexes and schemas across the data. Because there are multiple models for querying, Rockset can analyze an incoming query and create an intelligent query plan for serving it.
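The idea of keeping several indexes over the same data and choosing one per query can be sketched as a toy in Python. This is an illustration only, not Rockset's implementation or API: the `ToyCollection` class, its three index structures, and the `plan` rules are all assumptions made for the example.

```python
from collections import defaultdict


class ToyCollection:
    """Hypothetical collection that indexes every document three ways."""

    def __init__(self):
        self.rows = []                     # row store: doc id -> full document
        self.columns = defaultdict(dict)   # column store: field -> {doc id: value}
        self.inverted = defaultdict(set)   # inverted index: (field, value) -> doc ids

    def ingest(self, doc):
        """Index one flat document (scalar values only) at write time."""
        doc_id = len(self.rows)
        self.rows.append(doc)
        for field, value in doc.items():
            self.columns[field][doc_id] = value
            self.inverted[(field, value)].add(doc_id)
        return doc_id

    def plan(self, where=None, aggregate=None):
        """Pick an access path from the shape of the query (toy rules)."""
        if where is not None:
            return "inverted_index"   # selective lookup: jump straight to matches
        if aggregate is not None:
            return "column_scan"      # aggregate: read one field across all docs
        return "row_scan"             # otherwise read whole documents

    def find(self, field, value):
        """Equality lookup served from the inverted index."""
        ids = sorted(self.inverted.get((field, value), ()))
        return [self.rows[i] for i in ids]

    def total(self, field):
        """Sum served from the column store."""
        return sum(self.columns[field].values())


events = ToyCollection()
events.ingest({"user": "a", "ms": 120})
events.ingest({"user": "b", "ms": 80})
events.ingest({"user": "a", "ms": 100})

matches = events.find("user", "a")   # planner would choose the inverted index
total_ms = events.total("ms")        # planner would choose the column scan
```

The trade-off in this sketch is extra work and storage at ingest time in exchange for not having to define indexes or know the query shape ahead of time.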
Venkat joins the show to discuss his time working on data at Facebook, the untapped opportunities of using that data, and the architecture of Rockset.
Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.
Transifex is a SaaS-based localization and translation platform that easily integrates with your agile development process. If you’re a developer who is ready to reach a global audience, check out Transifex by visiting transifex.com/sedaily and sign up for a free 15-day trial.
Manifold makes your life easier by providing a single workflow to organize your services, connect your integrations, and share with your team. While Manifold is completely free to use, if you head over to manifold.co/sedaily you’ll get a coupon code for $10 which you can use to try out any service on the Manifold marketplace.
Triplebyte is a company that connects engineers with top tech companies. We’re running an experiment and our hypothesis is that Software Engineering Daily listeners will do well above average on the quiz. Go to triplebyte.com/sedaily.
GoCD is a continuous delivery tool created by ThoughtWorks. It’s great to see the continued progress on GoCD with the new Kubernetes integrations, and you can check it out for yourself at gocd.org/sedaily.