Apache Spark

Sort by:

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

This article was originally written by Reza Shiftehfar on Uber’s Engineering Blog. Reposted with permission from Uber Engineering. Uber is committed to delivering safer and more

Spark Geospatial Analytics with Ram Sriharsha

Phones are constantly tracking the location of a user in space. Devices like cars, smart watches, and drones are also picking up high volumes of location data. This location data is also

Spark and Streaming with Matei Zaharia

Apache Spark is a system for processing large data sets in parallel. The core abstraction of Spark is the resilient distributed dataset (RDD), a working set of data that sits in memory

MemSQL with Nikita Shamgunov

MemSQL is a high-performance, in-memory database that combines the horizontal scalability of distributed systems with the familiarity of SQL. Nikita Shamgunov is co-founder and CTO of