Apache Arrow with Uwe Korn

In a typical data analytics system, there are a variety of technologies interacting. HDFS for storing files, Spark for distributed machine learning, pandas for data analysis in Python–each of these different technologies has a different format for how data is represented.

 

Serialization and deserialization between these different formats causes significant latency across the overall system. Apache Arrow is a tool for improving performance of in-memory analytics systems, and today’s guest Uwe Korn explains how Arrow enables these systems with interoperability.

Sponsors

If you want to learn about Kubernetes, go to Apprenda.com/sedaily for a free webinar series about Kubernetes. Apprenda provides cloud platform software that works with your existing applications and infrastructure. 
wealthfront-logo Wealthfront is the automated investment service that manages your investments online. Check out wealthfront.com/sedaily to get your first $15,000 managed for free, as a listener of Software Engineering Daily.

Comments