Spark in Practice with Holden Karau

“I found Spark and I was really excited because I’m a functional programming nerd, and it was written in Scala.”

Apache Spark has skyrocketed in popularity, arguably surpassing Hadoop as the hottest big data technology of late. Even IBM has thrown its weight behind the framework, calling it the “most important new open source project in a decade”. In this episode, Holden Karau joins Software Engineering Daily to discuss why Spark is growing in popularity and how developers can begin learning the framework.

Holden Karau is a principal engineer at IBM working with Apache Spark. She is also an author of Learning Spark, a technical guide for developers new to the data processing framework.

Questions

  • How do RDDs achieve fault tolerance?
  • How does the Spark API compare to the MapReduce API?
  • What is it like to work with DataFrames in Spark?
  • Are there significant differences between what a data scientist should learn when working with Spark versus an engineer?
  • When did you originally get involved with Spark and what got you excited about it?
  • How does the perception of Spark and its ecosystem vary across the technology world?

Links
