“The more you’re comfortable with this idea that everything is going to fail, the more you realize that it’s a natural process of distributed systems, and it helps you write and architect better code.”
“Everybody that sees SQL thinks it’s ugly and dirty and they want to try and rewrite it to be better. There’s a bazillion attempts to do this – I’ve tried it several times myself. But somehow, everybody always comes back to SQL.”
“Creativity never comes to you – she will only meet you halfway.”
Derek Sivers is a programmer, musician, and writer. He has created several companies and products, including CD Baby, which became the largest seller of independent music online.
Data science is saving and improving lives by leveraging sensor data and machine learning. Pivotal makes software platforms and database products to enable enterprises to make use of their data.
Sarah Aerni is principal data scientist at Pivotal.
Database Week began with a set of fundamental questions. What is a database? As SE Daily (@software_daily) put it on August 21, 2015: “Every interviewee during Database Week has given a different answer to the question of ‘What is a database?’” One definition: “an application component for storing and retrieving data”. All of the different database companies offer this functionality. But the similarities end there. RethinkDB pushes data to the application. MemSQL is a faster, proprietary version
http://traffic.libsyn.com/sedaily/voltdb_rbetts.mp3
Streaming pipelines and in-memory analytics are difficult to support with old database systems. VoltDB provides streaming analytics with transactions. Questions: How does VoltDB exemplify Michael Stonebraker’s thesis that one size does not fit all? What is the difference between OLTP and streaming? How does VoltDB serve the common ZooKeeper-Kafka-Storm-Cassandra stack? What trends and requirements among OLTP and OLAP systems are changing most
http://traffic.libsyn.com/sedaily/neo4j_ryan.mp3
Graph databases use graph structures for semantic queries. Ryan Boyd is a developer advocate for Neo4j, an open-source graph database. Questions: Why does Monsanto use graph databases? In a social network graph, how would you query for “people you may know”? What CAP tradeoffs does Neo4j make? Why isn’t BASE good enough? Links: Hadoop and Graph Databases for Bioinformatics; Neo4j availability discussion (explores ZooKeeper option)
http://traffic.libsyn.com/sedaily/influxdb_pauldix.mp3
InfluxDB is an open-source time-series database. Time-series data can be used for metrics and analytics. Paul Dix is the CEO of InfluxDB. Questions: What differentiates InfluxDB from a regular database with a timestamp on every entry? What is the full-stack architecture of a typical user of InfluxDB? Why are distributed time-series databases so hard? What CAP tradeoffs does InfluxDB make? Does Go’s
http://traffic.libsyn.com/sedaily/pipelinedb_derek.mp3
PipelineDB is a streaming SQL database. Derek Nelson is the CEO of PipelineDB. Questions: What are continuous views? Why is PipelineDB a good fit for the Kafka+Storm+HBase-type architecture? How does PipelineDB affect the application tier or the browser tier? What are the latency guarantees for how long it takes raw data streams to be converted into the refined queries provided by a continuous view?
http://traffic.libsyn.com/sedaily/rethinkdb_slava.mp3
RethinkDB is an open-source database for the realtime web. RethinkDB pushes changes to the application rather than waiting for a request. Slava Akhmechet is the CEO of RethinkDB. Questions: RethinkDB supports a “push” model rather than request handling. Why? What are some use cases for pushing data? What does the full-stack architecture look like when the database has push? What did you learn from the
http://traffic.libsyn.com/sedaily/memsql_nikita_2.mp3
MemSQL is a high-performance, in-memory database that combines the horizontal scalability of distributed systems with the familiarity of SQL. Nikita Shamgunov is co-founder and CTO of MemSQL. Questions: What types of data does a user want to keep on disk versus on an in-memory database? How does MemSQL compare to MySQL? How do MemSQL users leverage Apache Spark? How does a user onboard with
Database Week is the fourth theme of Software Engineering Daily. A database is an organized collection of data. It is the collection of schemas, tables, queries, reports, views and other objects. Some modern databases are doing much more than this. As applications grow to have new types of responsibilities, common patterns and functionality are being folded into the database layer. Other new databases adhere to the classic description, and provide the classic, desired
Fundamental questions as big as data itself loomed at the beginning of Big Data Week. Some answers: How do customers of multiple managed big data companies deal with the heterogeneity? Confluent provides Kafka, Rocana provides ops, Databricks gives you data science, Cloudera and Hortonworks give you everything else. Each company has a proprietary layer meshed with open-source software. Generally, the more proprietary software you are running, the more you will need
The high number of database products is a result of the amount of data we generate. (Yad Faeq via Quora) You’ve hinted at the term long tail for databases, which leads to a very interesting discussion. Chris Anderson explains the long tail in the entertainment industry in this talk; the same basis may apply to technology and specifically data. Here are just a few applications of data that I
http://traffic.libsyn.com/sedaily/presto_chris.mp3
Presto is a low-latency SQL query engine built for interactive analysis. Christopher Berner works on Presto at Facebook. Questions: Is Presto for data scientists, developers, or everyone? What are the problems with Hive? How does Hive break a query into MapReduce jobs? How do the clients, coordinators, and workers interact? Is Presto both fast and cheap? How does Presto tune Java to get speed