Category: Data

Instacart Data Science with Jeremy Stanley

http://traffic.libsyn.com/sedaily/InstacartDataScience.mp3

Instacart is a grocery delivery service. Customers log onto the website or mobile app and pick their groceries. Shoppers at the store get those groceries off the shelves. Drivers pick up the groceries and drive them to the customer. This is an infinitely complex set of logistics problems, paired with a rich data set given by the popularity of Instacart. Jeremy Stanley is

Continue reading…

Data Teams with Rya Sciban

http://traffic.libsyn.com/sedaily/datateams_edited.mp3

A data-driven organization is more efficient because the company can learn what to focus on. In this episode, Edaena Salinas from The Women in Tech Show interviews Rya Sciban, Product Manager at Periscope Data, who explains the needs of data teams in an organization. We talked about what data analysis is and how this changes as the amount of data grows. Rya explained what

Continue reading…

CosmosDB with Andrew Hoh

http://traffic.libsyn.com/sedaily/cosmosdb_edited.mp3

Different databases have different access patterns. Key-value, document, graph, and columnar databases are useful under different circumstances. For example, if you are a bank, and you have a database of customers and the transactions they have performed, the ideal access pattern for aggregating the total amount of all transactions might be a columnar store. If the transaction amounts are all in one column,

Continue reading…
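
The bank example in the CosmosDB excerpt above can be made concrete with a small sketch. This is only an illustration under stated assumptions: the customer names, amounts, and the helper functions `total_from_rows` and `total_from_columns` are hypothetical and do not come from the episode. It contrasts a row-oriented layout with a column-oriented one when summing every transaction amount.

```python
# Hypothetical transactions for an illustrative bank (not real data).
row_store = [
    {"customer": "alice", "amount": 120.0},
    {"customer": "bob", "amount": 35.5},
    {"customer": "alice", "amount": 44.5},
]

# Column-oriented layout: one list per column instead of one record per row.
column_store = {
    "customer": [row["customer"] for row in row_store],
    "amount": [row["amount"] for row in row_store],
}


def total_from_rows(rows):
    # Visits every full record even though only "amount" is needed.
    return sum(row["amount"] for row in rows)


def total_from_columns(columns):
    # Reads only the "amount" column; the other columns are never touched.
    return sum(columns["amount"])


print(total_from_rows(row_store), total_from_columns(column_store))
```

Both functions return the same total. The difference is in what has to be read: the columnar version scans a single list, which is the access pattern the excerpt describes for aggregation.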

Data Skepticism with Kyle Polich

http://traffic.libsyn.com/sedaily/dataskeptic_edited.mp3

With a fast-growing field like data science, it is important to keep some amount of skepticism. Tools can be overhyped, buzzwords can be overemphasized, and people can forget the fundamentals. If you have bad data, you will get bad results in your experimentation. If you don’t know what statistical approach you want to take to your data, it doesn’t matter how well you

Continue reading…

Data Intensive Applications with Martin Kleppmann

http://traffic.libsyn.com/sedaily/dataintensive_edited_fixed.mp3

A new programmer learns to build applications using data structures like a queue, a cache, or a database. Modern cloud applications are built using more sophisticated tools like Redis, Kafka, or Amazon S3. These tools do multiple things well, and often have overlapping functionality. Application architecture becomes less straightforward. The applications we are building today are data-intensive rather than compute-intensive. Netflix needs to

Continue reading…

RealmDB with Brian Munkholm

http://traffic.libsyn.com/sedaily/realmdb_edited.mp3

Expectations for mobile apps have gone up steadily since the iPhone was released. But the choice of databases built for mobile apps has remained limited mostly to SQLite. RealmDB was created as a new option for mobile developers on iOS, Android, or any other mobile platform. Realm is not just a database. It is a database platform, offering a variety of systems

Continue reading…

Columnar Data: Apache Arrow and Parquet with Julien Le Dem and Jacques Nadeau

http://traffic.libsyn.com/sedaily/columnardata_edited_fixed.mp3

Column-oriented data storage allows us to access all of the entries in a database column quickly and efficiently. Columnar storage formats are mostly relevant today for performing large analytics jobs. For example, if you are a bank, and you want to get the sum of all of the financial transactions that took place on your system in the last week, you don’t want

Continue reading…

Data Engineering with Pete Soderling

http://traffic.libsyn.com/sedaily/hakkalabs_edited.mp3

In the last five years, companies started hiring data engineers. A data engineer creates the systems that manage and access the huge volumes of data that are accumulating on cheap cloud servers. As the saying goes, “it’s more expensive to throw out the data than to store it.” Pete Soderling joins the show to discuss the rise of the data engineer, and how

Continue reading…

Database as a Service with Eliot Horowitz

http://traffic.libsyn.com/sedaily/mongoservice_editedfixed1.mp3

Eight years ago, MongoDB was an internal project at 10gen, a company that was trying to build a platform-as-a-service out of open-source components. The team at 10gen realized that the platform-as-a-service play would be too complex and difficult to build. Since MongoDB was the most valuable component of that project, they narrowed their focus to this new document-oriented database. In today’s episode, MongoDB

Continue reading…

Database Choices and Uber with Markus Winand

http://traffic.libsyn.com/sedaily/uber_database_edited.mp3

When Uber’s engineering team published a blog post about moving to MySQL from Postgres, Markus Winand started receiving lots of email. Markus writes about databases on his blog “Use The Index, Luke,” a guide to database performance for developers. The people emailing Markus wanted to know: if Postgres doesn’t work well for Uber, is it safe to use for anyone? Markus wrote a detailed

Continue reading…

Uber’s Postgres Problems with Evan Klitzke

http://traffic.libsyn.com/sedaily/Uber_DBs.mp3

When a company switches the relational database it uses, you wouldn’t expect the news of the switch to go viral. Most engineers are not interested in the subtle differences between MySQL and Postgres, right? Uber recently switched from having Postgres as its main relational database to using MySQL. Evan Klitzke wrote a detailed blog post about the migration, and the post got very

Continue reading…

Relational Databases with Craig Kerstiens

http://traffic.libsyn.com/sedaily/RelationalDBs.mp3

Relational databases are used by most applications. MySQL, Postgres, Microsoft SQL Server, and other products implement the core features of a relational database in different ways. A developer who has never studied this space in detail may not know the differences between these databases, and in this episode we describe some tradeoffs that relational databases can make. Craig Kerstiens is an engineer at

Continue reading…

Peter Bailis on the Data Community’s Identity Crisis

http://traffic.libsyn.com/sedaily/database_crisis_edited_fixed.mp3

Breakthroughs in modern data research tend to come from companies like Google, Facebook, and Amazon, with projects like MapReduce, Cassandra, and Dynamo. Twenty years ago, these types of breakthroughs would have happened in academia, which causes today’s guest Peter Bailis to ask: is the academic data community having an identity crisis? Peter is an assistant professor at Stanford University, where he

Continue reading…

Apache Arrow with Uwe Korn

http://traffic.libsyn.com/sedaily/arrow_edited_fixed.mp3

In a typical data analytics system, a variety of technologies interact: HDFS for storing files, Spark for distributed machine learning, pandas for data analysis in Python. Each of these technologies has a different format for how data is represented. Serialization and deserialization between these different formats cause significant latency across the overall system. Apache Arrow is a tool for improving

Continue reading…

Cassandra Data Modeling with Jon Haddad

http://traffic.libsyn.com/sedaily/Cassandra_jon_haddad_Edited.mp3

Data modeling is the process of creating relationships and rules about objects, so that we can decide how to store them in a database. Data modeling defines how we store and query our database systems. Today’s episode features a discussion of data modeling in Cassandra with Jon Haddad, an evangelist at Datastax. The distributed nature of Cassandra creates some unique rules around

Continue reading…

Cassandra Compliant ScyllaDB with Dor Laor

http://traffic.libsyn.com/sedaily/ScyllaDB_Edited.mp3

Apache Cassandra is a distributed database that can handle large amounts of data with no single point of failure. Since 2008, Cassandra has been widely adopted and the software and the community around it have grown steadily. A software developer interacting with Cassandra uses CQL, the Cassandra Query Language. ScyllaDB is another open-source database that has been created to be totally compatible with

Continue reading…

Scaling PostgreSQL with Citus Data’s Ozgun Erdogan

http://traffic.libsyn.com/sedaily/Citus_Data.mp3

Ten years ago, databases were much simpler. Most companies would only have one or two types of databases in production. Today, the age of one-size-fits-all is over. Companies have multiple databases to deal with different types of use cases, and databases have become distributed to multiple nodes in order to be scalable. Ozgun Erdogan of Citus Data joins the show to give us

Continue reading…

Kafka, Storm, and Cassandra: Keen IO’s Analytics Architecture with Dan Kador

http://traffic.libsyn.com/sedaily/Keen_Edited_3.mp3

The process of building a software project requires us to make so many architectural decisions. Which programming languages should be used? Which cloud service provider? Which database? A newer type of building block is the analytics platform. Companies need to track events, aggregate metrics, and change the user’s experience based on aggregated data. Dan Kador is a co-founder of Keen IO, and he

Continue reading…

Netflix’s Data Pipeline with Steven Wu

http://traffic.libsyn.com/sedaily/Netflix.mp3

At Netflix, 500 billion events and 1.3 petabytes of data are ingested by the system per day. This includes video viewing activities, error logs, and performance events. On today’s episode, we dive deep into the data pipeline of Netflix, and how it evolved from their 1.0 version to the modern 2.0 version. Before listening to this episode, check out the blog post that

Continue reading…

Crate.io and Distributed SQL with Jodok Batlogg

http://traffic.libsyn.com/sedaily/Crate.mp3

Distributed databases are difficult to operate, and Crate.io wants to change that. Crate is a fast, scalable, easy-to-use SQL database that is built to run in containerized environments. An average software company runs several databases: MySQL for a relational store, MongoDB for a document database, HDFS for blob storage and data warehouse, Elasticsearch for search. On today’s show, Jodok Batlogg from Crate discusses the

Continue reading…
