Spark in Practice with Holden Karau

Podcast Thursday, January 21, 2016

“I found Spark and I was really excited because I’m a functional programming nerd, and it was written in Scala.”

Apache Spark has skyrocketed in popularity, arguably surpassing Hadoop as the hottest big data technology. Even IBM has thrown its weight behind the framework, calling it the “most important new open source project in a decade”. In this episode, Holden Karau joins Software Engineering Daily to discuss why Spark is growing in popularity and how developers can begin learning the framework.

Holden Karau is a principal engineer at IBM working with Apache Spark. She is also an author of Learning Spark, a technical guide for developers new to the data processing framework.

Questions

How do RDDs achieve fault tolerance?
How does the Spark API compare to the MapReduce API?
What is it like to work with DataFrames in Spark?
Are there significant differences between what a data scientist should learn when working with Spark versus an engineer?
When did you originally get involved with Spark and what got you excited about it?
How does the perception of Spark and its ecosystem vary across the technology world?