Category Data

Spring Data with John Blum

In the 1980s and 1990s, most applications used only a relational database for their data management. In the early 2000s, software projects started to use an ever-increasing number of data sources. MongoDB popularized the document database, which allows storage of objects that do not have a consistent schema. The Hadoop Distributed File System enabled the redundant storage and efficient querying of

Continue reading…

Protocol Buffers with Kenton Varda

When engineers are writing code, they are manipulating objects. You might have a user object represented on your computer, and that user object has several fields: a name, a gender, and an age. When you want to send that object across the network to a different computer, the object needs to be turned into a sequence of 1s and 0s that will travel

Continue reading…
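To make "turning an object into 1s and 0s" concrete, here is a minimal Python sketch of how the user object above could be laid out in the Protocol Buffers wire format. It hand-rolls the encoding rather than using the protobuf library, and the `User` schema, its field numbers, and the helper names are hypothetical, chosen only for illustration. Each field is written as a key (field number plus wire type) followed by either a varint or a length-prefixed byte string.

```python
from __future__ import annotations

# Hypothetical schema, for illustration only:
#   message User { string name = 1; string gender = 2; int32 age = 3; }

VARINT = 0            # wire type for integers such as int32
LENGTH_DELIMITED = 2  # wire type for strings and nested messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, low 7 bits first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: this is the last byte
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    """A field key packs the field number and the wire type into one varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_user(name: str, gender: str, age: int) -> bytes:
    """Serialize the hypothetical User message into bytes."""
    out = bytearray()
    for field_number, text in ((1, name), (2, gender)):
        data = text.encode("utf-8")
        if data:  # like proto3, skip fields left at their default (empty) value
            out += encode_key(field_number, LENGTH_DELIMITED)
            out += encode_varint(len(data)) + data
    if age:  # likewise, an age of 0 is not written at all
        out += encode_key(3, VARINT) + encode_varint(age)
    return bytes(out)


print(encode_user("Ann", "f", 30).hex())  # prints 0a03416e6e120166181e
```

The receiving computer reverses the process: it reads each key, looks up the field number in its own copy of the schema, and decodes the value that follows. In practice this code is generated from a `.proto` file rather than written by hand.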

Data Science Mindset with Zacharias Voulgaris

A company’s approach to data can make or break the business. In the past, data was static: there was not much of it, it sat in Excel, and people worked with it on a nightly or monthly basis. Now, data is dynamic, real-time, and huge. To tap into the available data, many industries have reoriented themselves toward becoming data-intensive. With many new industry

Continue reading…

BigQuery with Jordan Tigani

Google pioneered large-scale data analysis with the MapReduce paper. Since then, Google’s approach to analytics has evolved rapidly, marked by papers such as Dataflow and Dremel. Dremel combined column-oriented storage on a distributed file system with a novel way of processing queries. A single Dremel query is distributed across a tree of servers, starting with the root server, splitting into the intermediate servers,

Continue reading…
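The tree-shaped query execution described above can be sketched in a few lines of Python. This is a simplified, single-process illustration in the spirit of Dremel's serving tree, not Google's implementation; the `Leaf`, `Inner`, and `Partial` classes and the sample data are hypothetical. Each leaf scans only its own shard of one column and returns a partial aggregate, and the intermediate servers and the root merge those partials on the way up. An average is carried as a (count, sum) pair because partial sums and counts can be combined, while partial averages cannot.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Partial:
    """A mergeable partial aggregate for AVG: keep count and sum, not the average."""
    count: int = 0
    total: float = 0.0

    def merge(self, other: Partial) -> Partial:
        return Partial(self.count + other.count, self.total + other.total)


@dataclass
class Leaf:
    shard: list[float]  # this leaf server's slice of a single column


@dataclass
class Inner:
    children: list[Leaf | Inner] = field(default_factory=list)


def execute(node: Leaf | Inner) -> Partial:
    """Evaluate the query bottom-up: leaves scan their shards, inner nodes merge."""
    if isinstance(node, Leaf):
        return Partial(len(node.shard), sum(node.shard))
    result = Partial()
    for child in node.children:
        result = result.merge(execute(child))
    return result


# A root with two intermediate servers, each fronting one or two leaf servers.
root = Inner([
    Inner([Leaf([1.0, 2.0]), Leaf([3.0])]),
    Inner([Leaf([4.0, 5.0, 6.0])]),
])
partial = execute(root)
print(partial.total / partial.count)  # prints 3.5
```

Only the small partial results travel up the tree, which is what lets a wide fan-out of leaf servers scan a large column in parallel.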

Kafka at NY Times with Boerge Svingen

The New York Times is a newspaper that evolved into a digital publication. Across its 166-year history, The Times has been known for the quality of its long-form journalism, as well as its ability to churn out news stories quickly. Some content on the New York Times website is old but timeless “evergreen” content. Readers of the New York Times website are not only looking for the

Continue reading…

Dremio with Tomer Shiran

Google published the MapReduce paper in 2004. MapReduce is a programming model for large-scale data processing on clusters of commodity hardware. The MapReduce paper marked the beginning of the “big data” movement. The Hadoop project is an open-source implementation of the ideas in the MapReduce paper. Doug Cutting and Mike Cafarella wrote software that allowed anybody to use MapReduce,

Continue reading…
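The MapReduce paper's running example is counting word occurrences across a large collection of documents. The sketch below runs the three stages, map, shuffle (grouping by key), and reduce, in a single Python process; on a real cluster the framework spreads the map and reduce calls across many machines and performs the shuffle over the network. The function names are illustrative and are not Hadoop's API.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input document."""
    for word in document.split():
        yield word.lower(), 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every intermediate value by its key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a final count."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


print(word_count(["the quick fox", "The lazy dog", "the end"]))
# prints {'the': 3, 'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1, 'end': 1}
```

Because each map call sees only one document and each reduce call sees only one key, both stages can run independently on different machines; the shuffle is the only step that needs to move data between them.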

Alerting and Metrics with Clement Pang

An alert is a signal of problematic application behavior. When something unusual happens to your application, an alert can bring that anomaly to your attention. To detect unusual events, you need to define the norm, and to define both normal and problematic behavior, you need metrics. Metrics are measurements of your application’s behavior. Metrics get created from logs

Continue reading…
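As a rough illustration of defining the norm from metrics, here is a minimal Python sketch of one common rule of thumb: treat recent history as normal, and alert when a new measurement lands more than `k` standard deviations from the historical mean. The function name, the default threshold of 3, and the sample latencies are hypothetical; real alerting systems often need more robust baselines, for example ones that account for daily traffic cycles.

```python
from __future__ import annotations

import statistics


def is_anomalous(history: list[float], latest: float, k: float = 3.0) -> bool:
    """Flag `latest` if it is more than k standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # too little data to define what "normal" looks like
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation
    if stdev == 0:
        return latest != mean  # perfectly flat history: any change is unusual
    return abs(latest - mean) > k * stdev


# Hypothetical request latencies in milliseconds over the last few minutes.
latencies = [100.0, 102.0, 98.0, 101.0, 99.0]
print(is_anomalous(latencies, 101.0))  # prints False
print(is_anomalous(latencies, 120.0))  # prints True
```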

Dashboarding and Query Latency with Tom O’Neill

A dashboard is a data visualization that aggregates metrics in a way we can quickly understand. In a modern software company, everyone uses dashboards, from salespeople to DevOps to HR. Each dashboard represents a query whose results must be refreshed frequently, so that anyone looking at it sees up-to-date information. The data set being queried might be updated quickly in the case

Continue reading…

Sales Software with Jean-Baptiste Escoyez

Most products do not sell themselves. Salespeople bridge the gap between a product and the customer who purchases it. People can make a good living on the internet selling niche products, if they can find their customers. The process of taking a large group of potential customers and narrowing it down to only the subset of those customers who will buy your product

Continue reading…

CosmosDB with Andrew Hoh

Different databases are suited to different access patterns. Key-value, document, graph, and columnar databases are useful under different circumstances. For example, if you are a bank with a database of customers and the transactions they have performed, a columnar store might be the ideal fit for aggregating the total amount of all transactions. If the transaction amounts are all in one column,

Continue reading…

Data Skepticism with Kyle Polich

In a fast-growing field like data science, it is important to maintain some skepticism. Tools can be overhyped, buzzwords can be overemphasized, and people can forget the fundamentals. If you have bad data, you will get bad results in your experimentation. If you don’t know what statistical approach you want to take to your data, it doesn’t matter how well you

Continue reading…

Data Intensive Applications with Martin Kleppmann

A new programmer learns to build applications using building blocks like a queue, a cache, or a database. Modern cloud applications are built using more sophisticated tools like Redis, Kafka, or Amazon S3. These tools do multiple things well and often have overlapping functionality, so application architecture becomes less straightforward. The applications we are building today are data-intensive rather than compute-intensive. Netflix needs to

Continue reading…

RealmDB with Brian Munkholm

Expectations for mobile apps have risen steadily since the iPhone was released, but the choice of databases built for mobile apps has remained largely limited to SQLite. RealmDB was created as a new option for mobile developers on iOS, Android, or any other mobile platform. Realm is not just a database. It is a database platform, offering a variety of systems

Continue reading…

Columnar Data: Apache Arrow and Parquet with Julien Le Dem and Jacques Nadeau

Column-oriented data storage lets us access all of the entries in a database column quickly and efficiently. Today, columnar storage formats are mostly relevant for large analytics jobs. For example, if you are a bank and want to sum all of the financial transactions that took place on your system in the last week, you don’t want

Continue reading…
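The difference between row and column layouts can be sketched in Python. The example below uses a hypothetical `Transaction` record and stores each column as one contiguous typed `array`, roughly the way columnar formats lay values out; it is an illustration only, not the Parquet or Arrow API. Summing a range of days then reads only the `day` and `amount_cents` columns and never touches `customer_id`.

```python
from __future__ import annotations

from array import array
from dataclasses import dataclass


@dataclass
class Transaction:
    customer_id: int
    day: int           # hypothetical: day number on which the transaction happened
    amount_cents: int  # money as integer cents, to avoid float rounding


def to_columns(rows: list[Transaction]) -> dict[str, array]:
    """Pivot row records into one contiguous array of 64-bit integers per column."""
    return {
        "customer_id": array("q", (r.customer_id for r in rows)),
        "day": array("q", (r.day for r in rows)),
        "amount_cents": array("q", (r.amount_cents for r in rows)),
    }


def total_for_days(columns: dict[str, array], first_day: int, last_day: int) -> int:
    """Sum amounts for first_day..last_day inclusive, reading only two columns."""
    return sum(
        amount
        for day, amount in zip(columns["day"], columns["amount_cents"])
        if first_day <= day <= last_day
    )


rows = [
    Transaction(customer_id=1, day=10, amount_cents=500),
    Transaction(customer_id=2, day=12, amount_cents=250),
    Transaction(customer_id=1, day=20, amount_cents=1_000),
]
columns = to_columns(rows)
print(total_for_days(columns, first_day=10, last_day=16))  # prints 750
```

Keeping each column contiguous also tends to help compression, since values of the same type and similar magnitude sit next to each other.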

Data Engineering with Pete Soderling

In the last five years, companies have started hiring data engineers. A data engineer creates the systems that manage and access the huge volumes of data accumulating on cheap cloud servers. As the saying goes, “it’s more expensive to throw out the data than to store it.” Pete Soderling joins the show to discuss the rise of the data engineer, and how

Continue reading…

Database as a Service with Eliot Horowitz

Eight years ago, MongoDB was an internal project at 10gen, a company that was trying to build a platform-as-a-service out of open-source components. The team at 10gen realized that the platform-as-a-service play would be too complex and difficult to build. Since MongoDB was the most valuable component of that project, they narrowed their focus to this new document-oriented database. In today’s episode, MongoDB

Continue reading…

Database Choices and Uber with Markus Winand

When Uber’s engineering team published a blog post about moving from Postgres to MySQL, Markus Winand started receiving lots of email. Markus writes about databases on his blog “Use The Index, Luke,” a guide to database performance for developers. The people emailing Markus wanted to know: if Postgres doesn’t work well for Uber, is it safe for anyone to use? Markus wrote a detailed

Continue reading…

Uber’s Postgres Problems with Evan Klitzke

When a company switches the relational database it uses, you wouldn’t expect the news to go viral. Most engineers are not interested in the subtle differences between MySQL and Postgres, right? Uber recently switched its main relational database from Postgres to MySQL. Evan Klitzke wrote a detailed blog post about the migration, and the post got very

Continue reading…

Relational Databases with Craig Kerstiens

Most applications use relational databases. MySQL, Postgres, Microsoft SQL Server, and other products implement the core features of a relational database in different ways. A developer who has never studied this space in detail may not know the differences between these databases, so in this episode we describe some of the tradeoffs that relational databases make. Craig Kerstiens is an engineer at

Continue reading…

Peter Bailis on the Data Community’s Identity Crisis

Breakthroughs in modern data research tend to come from companies like Google, Facebook, and Amazon, with projects like MapReduce, Cassandra, and Dynamo. Twenty years ago, these types of breakthroughs would have happened in academia, which leads today’s guest Peter Bailis to ask: is the academic data community having an identity crisis? Peter is an assistant professor at Stanford University, where he

Continue reading…
