Apache Arrow with Uwe Korn

Podcast Sunday, July 17 2016

A typical data analytics system combines a variety of technologies: HDFS for storing files, Spark for distributed machine learning, pandas for data analysis in Python. Each of these technologies represents data in its own format.

Serializing and deserializing data between these formats adds significant latency across the overall system. Apache Arrow is a tool for improving the performance of in-memory analytics systems, and today’s guest Uwe Korn explains how Arrow makes these systems interoperable.
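
To make the cost concrete, here is a minimal sketch of the difference between handing data over by serialization and handing it over by sharing memory. It uses only the Python standard library and is not Arrow's API; the `prices` column and its values are made up for illustration. The point is only the general idea: a serialized hand-off produces a separate copy through an encode and a decode step, while a shared buffer lets the consumer read the producer's bytes directly.

```python
import pickle
from array import array

# A hypothetical "column" of 64-bit floats stored contiguously in memory,
# roughly how a columnar format lays out values.
prices = array("d", [1.5, 2.5, 3.5, 4.5])

# Hand-off 1: serialization. The consumer receives an independent copy and
# the system pays for one encode step and one decode step.
copied = pickle.loads(pickle.dumps(prices))

# Hand-off 2: shared buffer. memoryview exposes the same bytes without
# copying them (an illustration of the idea, not Arrow itself).
shared = memoryview(prices)

# Changing the original shows which consumer holds a copy and which one is
# looking at the same memory.
prices[0] = 99.0
print(copied[0])  # 1.5, the copy is unaffected
print(shared[0])  # 99.0, the view reads the producer's memory
```

In a real pipeline the two sides would usually be different tools or processes, which is where an agreed-upon in-memory layout matters: both sides need to interpret the same bytes in the same way.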

Jeff
