Spark

Apache Beam with Frances Perry

Unbounded data streams create difficult challenges for our application architectures. The data never stops coming, and we are forced to assume that we will never know if or when we have

Peter Bailis on the Data Community’s Identity Crisis

Breakthroughs in modern data research tend to come from companies like Google, Facebook, and Amazon, with projects like MapReduce, Cassandra, and Dynamo. Twenty years ago, this

Apache Arrow with Uwe Korn

In a typical data analytics system, there are a variety of technologies interacting. HDFS for storing files, Spark for distributed machine learning, pandas for data analysis in

Stream Processing at Uber with Danny Yuan

“Be aggressive in vision, but conservative in operation.” Uber is a transportation company with a high volume of temporal and spatial data, constantly being collected from the devices of

Alluxio and Memory-centric Distributed Storage with Haoyuan Li

“It’s not really about removing disk from the picture per se – it’s more like saying, ‘how do we leverage more and more resources from DRAM?’” Memory is king. The cost of