Spark Geospatial Analytics with Ram Sriharsha

Phones are constantly tracking the location of a user in space. Devices like cars, smart watches, and drones are also picking up high volumes of location data. This location data is also called “geospatial data.”

The amount of geospatial data is rapidly increasing, and there is a growing demand for software to perform operations over that data. Geospatial data sets are often massive, so performing operations over them is non-trivial.

Geospatial data can consist of something as simple as a set of latitude/longitude data points. A single lat/long coordinate pair can be enriched with information about what ZIP code it is in, how far that data point is from the other data points in the set, and where the nearest coffee shop is in relation to that data point.
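To make that kind of enrichment concrete, here is a minimal single-machine sketch in plain Python. It is not Magellan's API and does not use Spark. The function names (`haversine_km`, `point_in_polygon`, `enrich`), the ZIP polygon, and the shop coordinates are all hypothetical, chosen only for illustration. It assumes a spherical Earth for distances and treats lat/long as flat coordinates for the polygon test, which is only reasonable for small regions away from the poles and the antimeridian.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def point_in_polygon(point, polygon):
    """Ray-casting test: is (lat, lon) inside a polygon given as (lat, lon) vertices?"""
    lat, lon = point
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed by the ray.
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside
    return inside


def enrich(point, zip_polygons, shops):
    """Attach a ZIP label and the nearest shop to a single (lat, lon) point."""
    zip_code = next(
        (z for z, poly in zip_polygons.items() if point_in_polygon(point, poly)),
        None,
    )
    nearest = min(shops, key=lambda name: haversine_km(point, shops[name])) if shops else None
    return {
        "point": point,
        "zip": zip_code,
        "nearest_shop": nearest,
        "nearest_shop_km": haversine_km(point, shops[nearest]) if nearest else None,
    }


# Hypothetical sample data: one square "ZIP" region and two shops.
ZIP_POLYGONS = {"ZIP-A": [(37.0, -123.0), (37.0, -122.0), (38.0, -122.0), (38.0, -123.0)]}
SHOPS = {"shop_near": (37.51, -122.5), "shop_far": (37.9, -122.1)}

result = enrich((37.5, -122.5), ZIP_POLYGONS, SHOPS)
```

This brute-force version compares every point against every polygon and every shop. On massive data sets that cost is what makes the problem non-trivial, and it is why distributed engines and spatial indexing come into the picture.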

Ram Sriharsha created Magellan, a geospatial analytics library for Spark. In today’s show, Ram describes the set of problems within the domain of geospatial analytics engineering. Ram also works as a product manager for Apache Spark at Databricks.


Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.
