Data Intensive Applications with Martin Kleppmann

A new programmer learns to build applications using data structures like a queue, a cache, or a database. Modern cloud applications are built using more sophisticated tools like Redis, Kafka, or Amazon S3. These tools do multiple things well and often have overlapping functionality, so choosing an application architecture becomes less straightforward.

The applications we are building today are data-intensive rather than compute-intensive. Netflix needs to know how to store and cache large video files, and stream them to users quickly. Twitter needs to update user news feeds with a fanout of the president’s latest tweet. These operations are simple with small amounts of data, but become complicated with a high volume of users.

Martin Kleppmann is the author of Designing Data-Intensive Applications, an O’Reilly book about how to use modern data tools to solve modern data problems. His book includes high-level discussions of architectural strategy as well as lower-level discussions, such as how leader election algorithms can create problems for a data-intensive application.





  • Tomer Ben David

    Hi, this was a really excellent episode. A question, please: the LinkedIn profile was said to be best modelled as a document because a single LinkedIn profile has multiple jobs. However, I see it as relational, because each job is uniquely identified, so it is like a reference to a company and a reference to a university. What do you think?

    • Taylor Murphy

      Document is probably easier in this case. You can’t know beforehand how many jobs a person has, so instead of having to do `job_1`, `job_2`, etc. in a relational DB, you can just specify a `jobs` array in the document and add as many items as needed.

      • Shogunz

        In this case you would normally (imo, not a DBA) have a job table where one field points back to the person table and one field points to the company table… Standard stuff really.
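
The two approaches discussed in this thread can be sketched side by side. This is only an illustration under assumed names: the sample data, the `person`, `company`, and `job` tables, and the `jobs_for` helper are hypothetical and do not come from the episode or the book. It uses Python's built-in `sqlite3` module as a stand-in for any relational database.

```python
import sqlite3

# Document model: the whole profile is one record, and the variable-length
# list of jobs is nested inside it (the `jobs` array from the comment above).
# All names and values here are made up for illustration.
profile_doc = {
    "name": "Example Person",
    "jobs": [
        {"title": "Engineer", "company": "Acme"},
        {"title": "Manager", "company": "Globex"},
    ],
}
doc_titles = [job["title"] for job in profile_doc["jobs"]]

# Relational model: a job table whose rows point back to the person table
# and to the company table. No job_1, job_2 columns are needed; each job is
# simply another row.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE job (
        id         INTEGER PRIMARY KEY,
        person_id  INTEGER NOT NULL REFERENCES person(id),
        company_id INTEGER NOT NULL REFERENCES company(id),
        title      TEXT NOT NULL
    );
    INSERT INTO person  VALUES (1, 'Example Person');
    INSERT INTO company VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO job     VALUES (1, 1, 1, 'Engineer'), (2, 1, 2, 'Manager');
    """
)


def jobs_for(person_id):
    """Return (title, company name) pairs for one person, oldest row first."""
    return conn.execute(
        """
        SELECT job.title, company.name
        FROM job
        JOIN company ON company.id = job.company_id
        WHERE job.person_id = ?
        ORDER BY job.id
        """,
        (person_id,),
    ).fetchall()
```

In the document version, reading a profile is a single lookup, but the company name is copied into every job entry. In the relational version, each company is stored once and shared by reference, at the cost of a join when reading a profile.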