Apache Arrow with Uwe Korn
In a typical data analytics system, there are a variety of technologies interacting. HDFS for storing files, Spark for distributed machine learning, pandas for data analysis in Python–each of these different technologies has a different format for how data is represented.
Serialization and deserialization between these different formats causes significant latency across the overall system. Apache Arrow is a tool for improving performance of in-memory analytics systems, and today’s guest Uwe Korn explains how Arrow enables these systems with interoperability.
|If you want to learn about Kubernetes, go to Apprenda.com/sedaily for a free webinar series about Kubernetes. Apprenda provides cloud platform software that works with your existing applications and infrastructure.|
|Wealthfront is the automated investment service that manages your investments online. Check out wealthfront.com/sedaily to get your first $15,000 managed for free, as a listener of Software Engineering Daily.|