data lakes

Sort by:

Spark and Streaming with Matei Zaharia

Apache Spark is a system for processing large data sets in parallel. The core abstraction of Spark is the resilient distributed dataset (RDD), a working set of data that sits in memory

Streaming Architecture with Ted Dunning

Streaming architecture defines how large volumes of data make their way through an organization. Data is created at a user’s smartphone, or on a sensor inside of a conveyor belt at a