Tag: Apache Spark

Spark Geospatial Analytics with Ram Sriharsha

http://traffic.libsyn.com/sedaily/2018_05_04_GeospatialAnalytics.mp3

Phones are constantly tracking the location of a user in space. Devices like cars, smart watches, and drones are also picking up high volumes of location data. This location data is also called “geospatial data.” The amount of geospatial data is rapidly increasing, and there is a growing demand for software to perform operations over that data. Geospatial data sets are often massive–so

Continue reading…

Spark and Streaming with Matei Zaharia

http://traffic.libsyn.com/sedaily/2018_02_26_SparkDelta.mp3

Apache Spark is a system for processing large data sets in parallel. The core abstraction of Spark is the resilient distributed dataset (RDD), a working set of data that sits in memory for fast, iterative processing. Matei Zaharia created Spark with two goals: to provide a composable, high-level set of APIs for performing distributed processing; and to provide a unified engine for running

Continue reading…

MemSQL with Nikita Shamgunov

http://traffic.libsyn.com/sedaily/memsql_nikita_2.mp3

MemSQL is a high-performance, in-memory database that combines the horizontal scalability of distributed systems with the familiarity of SQL. Nikita Shamgunov is co-founder and CTO of MemSQL.

Questions: What types of data does a user want to keep on disk versus on an in-memory database? How does MemSQL compare to MySQL? How do MemSQL users leverage Apache Spark? How does a user onboard with

Continue reading…