Alluxio and Memory-centric Distributed Storage with Haoyuan Li

“It’s not really about removing disk from the picture per se – it’s more like saying, ‘how do we leverage more and more resources from DRAM?’”

Memory is king. The costs of memory and disk capacity are both decreasing every year, but only the throughput of memory is increasing exponentially. This trend is creating opportunity in big data processing.

Alluxio is an open source, memory-centric, distributed, and reliable storage system that enables data sharing across clusters at memory speed. Alluxio was formerly known as Tachyon. Haoyuan Li is the creator of Alluxio. Haoyuan was a member of the Berkeley AMPLab, the same research lab that produced Apache Mesos and Apache Spark. In this episode, we discuss Alluxio, Spark, Hadoop, and the evolution of data center software architecture.

Questions

  • Why is the growing throughput of memory so important to the big data stack?
  • How has memory hierarchy evolved over time?
  • Should we start migrating all of the functionality of disk to RAM?
  • What are the problems with needing to replicate to disk?
  • What is underFS?
  • How often do nodes fail in a typical cluster?
  • What is lineage-based storage?
  • How does the workflow of a data scientist or data engineer change with the addition of Alluxio?

Sponsors

Hired.com is the job marketplace for software engineers. Go to hired.com/softwareengineeringdaily to get a $2000 bonus upon landing a job through Hired.
Wealthfront is the automated investment service that manages your investments online. Check out wealthfront.com/sedaily to get your first $15,000 managed for free, as a listener of Software Engineering Daily.
