Apache Spark

Notebooks at Netflix with Matthew Seal

Netflix has petabytes of data and thousands of workloads running across that data every day. These workloads generate movie recommendations for users, create dashboards for data analysts

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

This article was originally written by Reza Shiftehfar on Uber’s Engineering Blog. Reposted with permission from Uber Engineering. Uber is committed to delivering safer and more

Spark Geospatial Analytics with Ram Sriharsha

Phones are constantly tracking the location of a user in space. Devices like cars, smart watches, and drones are also picking up high volumes of location data. This location data is also

Spark and Streaming with Matei Zaharia

Apache Spark is a system for processing large data sets in parallel. The core abstraction of Spark is the resilient distributed dataset (RDD), a working set of data that sits in memory

MemSQL with Nikita Shamgunov

MemSQL is a high-performance, in-memory database that combines the horizontal scalability of distributed systems with the familiarity of SQL. Nikita Shamgunov is co-founder and CTO of