TiDB: Distributed NewSQL with Kevin Xu

When a user orders a ride with a ridesharing app, the data for that interaction is written to a “transactional” database: a database where specific rows must be written and read quickly and consistently.

Speed and consistency matter for an application like ridesharing because, from the moment the user orders a car until the ride ends, the user’s client frequently communicates with the database to update the session. Other applications backed by a transactional database include messaging systems, banking applications, and document editing software.
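To make “quickly and consistently” concrete, here is a minimal sketch of one transactional write. It uses Python’s built-in `sqlite3` module as a stand-in for a production transactional database. The `rides` and `drivers` tables, their columns, and the `assign_driver` helper are all hypothetical and are not part of TiDB. The sketch assigns a driver to a ride as a single atomic unit, so either both rows change or neither does.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical tables for a ridesharing app's transactional store.
    conn.execute(
        "CREATE TABLE rides (id INTEGER PRIMARY KEY, rider TEXT, driver TEXT, status TEXT)"
    )
    conn.execute("CREATE TABLE drivers (name TEXT PRIMARY KEY, available INTEGER)")


def assign_driver(conn: sqlite3.Connection, ride_id: int, driver: str) -> bool:
    """Assign a driver to a ride atomically; return False if the driver is busy."""
    try:
        # The connection context manager commits if the block succeeds and
        # rolls back (then re-raises) if an exception escapes it.
        with conn:
            # Step 1: mark the ride as assigned (uncommitted until the block ends).
            conn.execute(
                "UPDATE rides SET driver = ?, status = 'assigned' WHERE id = ?",
                (driver, ride_id),
            )
            # Step 2: claim the driver only if they are still available.
            cur = conn.execute(
                "UPDATE drivers SET available = 0 WHERE name = ? AND available = 1",
                (driver,),
            )
            if cur.rowcount != 1:
                # Aborting here undoes step 1, so the ride is never half-assigned.
                raise ValueError(f"driver {driver!r} is not available")
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    with conn:
        conn.execute("INSERT INTO drivers VALUES ('dana', 1)")
        conn.executemany(
            "INSERT INTO rides (id, rider, status) VALUES (?, ?, 'requested')",
            [(1, "ann"), (2, "bob")],
        )
    print(assign_driver(conn, 1, "dana"))  # True
    print(assign_driver(conn, 2, "dana"))  # False: dana is already on ride 1
    print(conn.execute("SELECT status FROM rides WHERE id = 2").fetchone())  # ('requested',)
```

Updating the ride first and claiming the driver second is deliberate: it shows that a failure in the second step rolls back the first. A real system also has to handle many concurrent clients, which this single-process sketch does not attempt.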

The data from a transactional database is often reused in “analytic” databases. An analytic database can be used for performing large scale analysis, aggregations, averages, and other data science queries.

The requirements for an analytic database differ from those of a transactional database because the data is not being used for an active user session. To populate an analytic database, data is copied from the transactional database in a process called ETL (extract, transform, load).
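A minimal sketch of the three ETL steps, assuming two in-memory SQLite databases stand in for the transactional and analytic stores. The `rides` and `city_stats` tables, their columns, and the sample data are hypothetical. This illustrates the general ETL idea only; it is not how TiDB or any particular pipeline tool works.

```python
import sqlite3


def extract(oltp: sqlite3.Connection) -> list[tuple[str, float]]:
    # Extract: read completed rides out of the transactional store.
    return oltp.execute(
        "SELECT city, fare FROM rides WHERE status = 'completed'"
    ).fetchall()


def transform(rows: list[tuple[str, float]]) -> dict[str, tuple[int, float]]:
    # Transform: aggregate fares per city into (ride count, average fare).
    fares_by_city: dict[str, list[float]] = {}
    for city, fare in rows:
        fares_by_city.setdefault(city, []).append(fare)
    return {
        city: (len(fares), sum(fares) / len(fares))
        for city, fares in fares_by_city.items()
    }


def load(olap: sqlite3.Connection, summary: dict[str, tuple[int, float]]) -> None:
    # Load: rewrite the summary table in the analytic store in one transaction.
    olap.execute(
        "CREATE TABLE IF NOT EXISTS city_stats "
        "(city TEXT PRIMARY KEY, rides INTEGER, avg_fare REAL)"
    )
    with olap:
        olap.execute("DELETE FROM city_stats")
        olap.executemany(
            "INSERT INTO city_stats VALUES (?, ?, ?)",
            [(city, n, avg) for city, (n, avg) in summary.items()],
        )


if __name__ == "__main__":
    oltp = sqlite3.connect(":memory:")
    oltp.execute(
        "CREATE TABLE rides (id INTEGER PRIMARY KEY, city TEXT, fare REAL, status TEXT)"
    )
    with oltp:
        oltp.executemany(
            "INSERT INTO rides (city, fare, status) VALUES (?, ?, ?)",
            [
                ("Beijing", 12.0, "completed"),
                ("Beijing", 18.0, "completed"),
                ("Shanghai", 20.0, "completed"),
                ("Shanghai", 9.0, "in_progress"),  # still active, so not extracted
            ],
        )
    olap = sqlite3.connect(":memory:")
    load(olap, transform(extract(oltp)))
    print(olap.execute("SELECT * FROM city_stats ORDER BY city").fetchall())
    # [('Beijing', 2, 15.0), ('Shanghai', 1, 20.0)]
```

Because the copy runs as a separate job, the analytic store is only as fresh as the last run. That lag between the two stores is part of the data engineering problem that the next paragraph describes.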

Separating the transactional data store from the analytic data store causes problems for data engineering. To address these problems, some newer databases combine transactional and analytic functionality in the same database. These databases are often called “NewSQL”.

TiDB is an open source database that uses RocksDB for storage and can run on Kubernetes. TiDB is widely used in China by high-volume applications such as bike sharing and massively multiplayer online games. Kevin Xu works at PingCAP, a company built around TiDB. He joins the show to talk about modern databases, distributed systems, and the architecture of TiDB.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.


Sponsors

OpenShift is a Kubernetes platform from Red Hat. OpenShift takes the Kubernetes container orchestration system and adds features that let you build software more quickly. OpenShift includes service discovery, CI/CD, built-in monitoring and health management, and scalability. With OpenShift, you avoid getting locked into any particular cloud provider. Check out OpenShift from Red Hat by going to softwareengineeringdaily.com/redhat.

Triplebyte is a company that connects engineers with top tech companies. We’re running an experiment, and our hypothesis is that Software Engineering Daily listeners will do well above average on the quiz. Go to triplebyte.com/sedaily.

Digital Ocean is the easiest cloud platform to run and scale your application. Try it out today and get a free $100 credit at do.co/sedaily. Digital Ocean is a complete cloud platform to help developers and teams save time when running and scaling their applications.

Deploy infrastructure faster, simplify life cycle maintenance for your servers, and give IT the ability to deliver infrastructure to developers as a service, like the public cloud. Go to softwareengineeringdaily.com/hpe and learn how HPE OneView can improve your infrastructure operations.

Software Daily

Subscribe to Software Daily, a curated newsletter featuring the best and newest from the software engineering community.