http://traffic.libsyn.com/sedaily/cosmosdb_edited.mp3
Different databases have different access patterns. Key-value, document, graph, and columnar databases are useful under different circumstances. For example, if you are a bank, and you have a database of customers and the transactions they have performed, the ideal access pattern for aggregating the total amount of all transactions might be a columnar store. If the transaction amounts are all in one column,
http://traffic.libsyn.com/sedaily/uber_database_edited.mp3
When Uber’s engineering team published a blog post about moving to MySQL from Postgres, Markus Winand started receiving lots of email. Markus writes about databases on his blog “Use The Index, Luke,” a guide to database performance for developers. The people emailing Markus wanted to know–if Postgres doesn’t work well for Uber, is it safe to use for anyone? Markus wrote a detailed
http://traffic.libsyn.com/sedaily/Ringpop_edited.mp3
Uber has a software architecture with unique requirements. Uber does not have the firehose of user engagement data that Twitter or Facebook has, but each transaction on Uber is both high value and time-sensitive. Users are paying for transportation that they expect to be available and reasonably close by. When Uber’s system is trying to match a rider with a driver, availability is
http://traffic.libsyn.com/sedaily/Troy_Hunt_Edited_2.mp3
When you hear about massive data breaches like the recent ones from LinkedIn, MySpace, or Ashley Madison, how can you find out whether your own data was compromised? Troy Hunt created the website HaveIBeenPwned.com to answer this question. When a major data breach occurs, Troy acquires a copy of the stolen data and provides a safe way for individuals to check if
http://traffic.libsyn.com/sedaily/Scaphold.io_Edited.mp3
GraphQL was open sourced out of Facebook, and gave developers a way to unify their different data sources into a single endpoint. Although the promise of GraphQL is appealing, the process of setting up a GraphQL server that can communicate with each disparate data source can prove to be complex. Scaphold.io provides GraphQL as a service, and today’s guests are the creators of
http://traffic.libsyn.com/sedaily/database_crisis_edited_fixed.mp3
Breakthroughs in modern data research tend to come from companies like Google, Facebook, and Amazon, with projects like MapReduce, Cassandra, and Dynamo. Twenty years ago, these types of breakthroughs would have happened in academia, which causes today’s guest Peter Bailis to ask: is the academic data community having an identity crisis? Peter is an assistant professor at Stanford University, where he
http://traffic.libsyn.com/sedaily/cockroachdb_Edited.mp3
“Eventual consistency is really kind of a marketing term from some of these NoSQL systems – it’s not really consistent in any strong sense of the term.” Google has published papers on distributed systems such as BigTable, Chubby, and the Google File System. During this episode, we focus on a product that takes inspiration from Google’s Spanner project, a database that is built
http://traffic.libsyn.com/sedaily/Filodb_Edited.mp3
“The world is becoming more and more interactive, and people want answers right away, so you’re seeing the rise of stream processing and real-time.” Big data is yesterday–fast data is now. FiloDB is a reactive columnar OLAP database that is built on Cassandra and Spark. Today’s guest is Evan Chan, creator of FiloDB. In our discussion today, we talk about the use cases
From Eric Tschetter’s answer via Quora: The difference you are asking about though is ParAccel vs. Druid. ParAccel is the software that Amazon is licensing for RedShift. Aside from just potential differences in performance, there are some functional differences (these are all based on a cursory understanding of what ParAccel does, I’ve read what I could find on it, but a lot of my understanding is extracted from interpretations of marketing
http://traffic.libsyn.com/sedaily/Cassandra_Edited.mp3
“There isn’t any central node in Cassandra. Every node is a peer, there is no master – there is no single point of failure.” Apache Cassandra can serve as both the real-time data store for online transactional applications and the read-intensive database for data warehousing operations. In order to combine these two use cases into a single database, Apache Cassandra required
http://traffic.libsyn.com/sedaily/Airbnb_Edited.mp3
“One big transformation we’re seeing right now is the slow agonizing death of MapReduce.” When a company gets big enough, there is so much data to be processed that an entire data engineering team becomes responsible for managing this data and making it available to other teams. Airbnb is one such company. Max Beauchemin works on the data engineering team at Airbnb, where
http://traffic.libsyn.com/sedaily/Voltdb_Edited.mp3
“There’s a lot of value in moving logic to the data rather than moving data to the logic. And the issue here is the data is a lot bigger than the logic.” NewSQL is a class of modern relational databases that seek to provide the same scalable performance as NoSQL systems for OLTP, while still maintaining the ACID guarantees of a traditional database
From Chris Schrader’s answer via Quora: Someone could write a 5000 page book on this subject but I’ll do my best at a high level. SQL Databases: I break these down into three basic groups: Traditional, MPP, columnar, and an emerging technology called NewSQL. Traditional: These are the usual databases that we’ve seen for years. Some vendors might include MySQL, PostgreSQL, SQL Server (product), Sybase, Oracle Database, etc. They comply with
“The world is increasingly disconnected, if you think about dealing with things like mobile devices that flap in and out of connectedness.”
“The more you’re comfortable with this idea that everything is going to fail, the more you realize that it’s a natural process of distributed systems, and it helps you write and architect better code.”
“Everybody that sees SQL thinks it’s ugly and dirty and they want to try and rewrite it to be better. There’s a bazillion attempts to do this – I’ve tried it several times myself. But somehow, everybody always comes back to SQL.”
“Creativity never comes to you – she will only meet you halfway.”
Derek Sivers is a programmer, musician, and writer. He has created several companies and products, including CD Baby, which became the largest seller of independent music online.
Data science is saving and improving lives by leveraging sensor data and machine learning. Pivotal makes software platforms and database products to enable enterprises to make use of their data.
Sarah Aerni is principal data scientist at Pivotal.
Database Week began with a set of fundamental questions. What is a database?
"Every interviewee during Database Week has given a different answer to the question of 'What is a database?'" (SE Daily, @software_daily, August 21, 2015)
One definition: “an application component for storing and retrieving data”. All of the different database companies have this functionality. But similarities end there.
RethinkDB pushes data to the application
MemSQL is a faster, proprietary version
http://traffic.libsyn.com/sedaily/voltdb_rbetts.mp3
Streaming pipelines and in-memory analytics are difficult to support with old database systems. VoltDB provides streaming analytics with transactions.
Questions
How does VoltDB exemplify Michael Stonebraker’s thesis that one size does not fit all?
What is the difference between OLTP and Streaming?
How does VoltDB serve the common Zookeeper-Kafka-Storm-Cassandra stack?
What trends and requirements among OLTP and OLAP systems are changing most