Apache Arrow with Uwe Korn

In a typical data analytics system, there are a variety of technologies interacting. HDFS for storing files, Spark for distributed machine learning, pandas for data analysis in Python–each of these different technologies has a different format for how data is represented.

 

Serialization and deserialization between these different formats causes significant latency across the overall system. Apache Arrow is a tool for improving performance of in-memory analytics systems, and today’s guest Uwe Korn explains how Arrow enables these systems with interoperability.

Software Daily

Software Daily

 
Subscribe to Software Daily, a curated newsletter featuring the best and newest from the software engineering community.