Postgres Sharding and Scalability with Marco Slot
Relational databases have been popular since the 1970s, but in the last 20 years the amount of data that applications need to collect and store has skyrocketed. The raw cost to store that data has decreased. There is a common phrase in software companies: “it costs you less to save the data than to throw it away.”
Saving the data is cheap, but accessing that data in a useful way can be expensive. Developers still need rapid row-wise and column-wise access to the data. Accessing an individual row of a database is useful when a user logs in and you want to load all of that user’s data, or when you want to update a banking system with a new financial transaction. Accessing an entire column of a database is useful when you want to aggregate over all of the entries in a system, such as summing all of the financial transactions in a bank.
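To make the two access patterns concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for any SQL database. The table name, column names, and sample values are assumptions made up for illustration; they do not come from the episode.

```python
import sqlite3

# In-memory database standing in for a bank's transaction store (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, amount_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO transactions (user_id, amount_cents) VALUES (?, ?)",
    [(1, 1000), (2, 2500), (1, -400)],
)


def load_user_transactions(conn, user_id):
    """Row-wise access: fetch only the rows that belong to one user."""
    cursor = conn.execute(
        "SELECT id, amount_cents FROM transactions WHERE user_id = ? ORDER BY id",
        (user_id,),
    )
    return cursor.fetchall()


def total_of_all_transactions(conn):
    """Column-wise access: aggregate one column across every row."""
    return conn.execute("SELECT SUM(amount_cents) FROM transactions").fetchone()[0]


user_rows = load_user_transactions(conn, 1)
total = total_of_all_transactions(conn)
```

The first query touches a handful of rows and benefits from an index on the lookup column, while the second has to read one column from every row, which is why the two patterns tend to stress a database in different ways.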
These different kinds of access patterns are nothing new, but with the growing scale of data, companies are changing their mentality from thinking in terms of individual databases to thinking about distributed “data platforms.”
In a data platform, the data across a company might be put into a variety of storage systems (distributed file systems, databases, in-memory caches, search indexes), but the API for the developer is kept simple. And the simplest, most commonly understood language is SQL.
Marco Slot is an engineer with Citus Data, a company that makes Postgres scalable. Postgres is one of the most common relational databases, and in this episode Marco describes how Postgres can be used to service almost all of the needs of a data platform.
This isn’t easy to do, as it requires sharding your growing relational database into clusters and orchestrating distributed queries between those shards. In this show, Marco and I discuss Citus’s approach to the distributed systems problems of a sharded relational database. This episode is a nice complement to previous episodes we have done with Ozgun and Craig from Citus, in which they gave a history of relational databases, and explained how Postgres compares to the wide variety of relational databases out there. Full disclosure: Citus Data is a sponsor of Software Engineering Daily.
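As a rough illustration of what sharding and distributed queries involve, the sketch below hash-partitions rows across a few in-memory "shards" and answers an aggregate by combining per-shard partial results. It is a toy model under stated assumptions, not a description of how Citus works: the shard count, the CRC32-based routing, and every function name here are hypothetical.

```python
import zlib

NUM_SHARDS = 4  # assumed shard count, chosen only for illustration

# Each shard is a plain list of (user_id, amount_cents) rows; in a real
# system each one would be a table on a separate database node.
shards = [[] for _ in range(NUM_SHARDS)]


def shard_for(user_id):
    """Map a user to a shard with a stable hash of the sharding key."""
    return zlib.crc32(str(user_id).encode("utf-8")) % NUM_SHARDS


def insert_transaction(user_id, amount_cents):
    """Route a write to the single shard that owns this user."""
    shards[shard_for(user_id)].append((user_id, amount_cents))


def transactions_for_user(user_id):
    """Single-shard query: only the owning shard needs to be consulted."""
    return [row for row in shards[shard_for(user_id)] if row[0] == user_id]


def total_amount():
    """Scatter-gather query: compute a partial sum per shard, then combine."""
    partial_sums = [sum(amount for _, amount in shard) for shard in shards]
    return sum(partial_sums)


for user_id, amount in [(1, 1000), (2, 2500), (1, -400), (3, 50)]:
    insert_transaction(user_id, amount)
```

Queries that filter on the sharding key stay on one shard, while aggregates fan out to every shard and merge the partial results, which is the orchestration problem the paragraph above refers to.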
Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.
VictorOps empowers teams to minimize downtime. Use VictorOps to manage on-call schedules, contextually alert teams when something goes wrong, and collaborate around an incident. Head over to victorops.com/sedaily to learn how VictorOps helps teams continuously deliver and maintain uptime.
Datadog is a platform for cloud-scale infrastructure and applications. With rich dashboards, algorithmic alerts, and collaboration tools, Datadog provides your team with the tools they need to quickly troubleshoot and optimize modern applications. See for yourself – start a 14-day free trial today and Datadog will send you a free T-shirt! softwareengineeringdaily.com/datadog
Segment allows us to gather customer data from anywhere and send that data to any analytics tool. To get a free 90-day trial, sign up for Segment at segment.com and enter SEDaily in the “How did you hear about us?” box during signup.
Rookout Rapid Production Debugging allows developers to track down issues in production without any additional coding, re-deployment or restarting the app. Go to rookout.com/sedaily to start a free trial and see how much debugging time you can save.