Tag data lakes

Spark and Streaming with Matei Zaharia

http://traffic.libsyn.com/sedaily/2018_02_26_SparkDelta.mp3

Apache Spark is a system for processing large data sets in parallel. The core abstraction of Spark is the resilient distributed dataset (RDD), a working set of data that sits in memory for fast, iterative processing. Matei Zaharia created Spark with two goals: to provide a composable, high-level set of APIs for performing distributed processing; and to provide a unified engine for running

Continue reading…

Streaming Architecture with Ted Dunning

http://traffic.libsyn.com/sedaily/2018_02_19_TedDunning.mp3

Streaming architecture defines how large volumes of data make their way through an organization. Data is created on a user’s smartphone, or by a sensor inside a conveyor belt at a factory. That data is sent to a set of backend services that aggregate and organize it, making it available to business analysts, application developers, and machine learning algorithms. The

Continue reading…