Spark Geospatial Analytics with Ram Sriharsha

Phones constantly track where their users are. Devices like cars, smart watches, and drones also collect high volumes of location data. This location data is also called “geospatial data.”

The amount of geospatial data is rapidly increasing, and there is a growing demand for software to perform operations over that data. Geospatial data sets are often massive, so performing operations over them is non-trivial.

Geospatial data can consist of something as simple as a set of latitude/longitude data points. A single lat/long coordinate pair can be enriched with information about what ZIP code it is in, how far that data point is from the other data points in the set, and where the nearest coffee shop is in relation to that data point.
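To make that kind of enrichment concrete, here is a minimal single-machine sketch in plain Python. It is not Magellan's API and not how Spark would distribute the work. The function names, the region polygon, and the coordinates are all hypothetical, chosen only for illustration. It shows a great-circle (haversine) distance between two lat/long points, a nearest-place lookup, and a ray-casting test for whether a point falls inside a region such as a ZIP code boundary, treating coordinates as planar for that test.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest(point, places):
    """Return the name of the place closest to point.

    places is a dict mapping a name to a (lat, lon) pair.
    """
    return min(places, key=lambda name: haversine_km(point, places[name]))


def point_in_polygon(point, polygon):
    """Ray-casting test: is the (lat, lon) point inside the polygon?

    polygon is a list of (lat, lon) vertices. Coordinates are treated as
    planar, which is a simplification that only suits small regions.
    """
    y, x = point
    inside = False
    n = len(polygon)
    for i in range(n):
        y1, x1 = polygon[i]
        y2, x2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

Checking every point against every region or every other point this way scales poorly, which is part of why large geospatial data sets call for distributed tools and spatial indexing.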

Ram Sriharsha created Magellan, a geospatial analytics library for Spark. In today’s show, Ram describes the set of problems within the domain of geospatial analytics engineering. Ram also works as a product manager for Apache Spark at Databricks.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.

Sponsors

Casper combines supportive memory foams for a sleep surface that’s got just the right sink and just the right bounce. Plus, its breathable design sleeps cool to help you regulate your temperature through the night. And, buying a Casper mattress is completely risk free. Casper offers free delivery and free returns with a 100-night home trial. If you don’t love it, they’ll pick it up and give you a full refund. As a special offer to Software Engineering Daily listeners, get $50 toward select mattresses by visiting casper.com/sedaily and using code SEDAILY at checkout. Terms and conditions apply.

Triplebyte is a company that connects engineers with top tech companies. We’re running an experiment and our hypothesis is that Software Engineering Daily listeners will do well above average on the quiz. Go to triplebyte.com/sedaily and take the multiple-choice quiz, and in a few episodes we’ll share some stats about how you all did. Try it yourself at triplebyte.com/sedaily.

Azure Container Service simplifies the deployment, management and operations of Kubernetes. You can continue to work with the tools you already know, such as Helm, and move applications to any Kubernetes deployment. Integrate with your choice of container registry, including Azure Container Registry. Also, quickly and efficiently scale to maximize your resource utilization without having to take your applications offline. Isolate your application from infrastructure failures and transparently scale the underlying infrastructure to meet growing demands—all while increasing the security, reliability, and availability of critical business workloads with Azure. Check out the Azure Container Service at aka.ms/sedaily.

GoCD is a continuous delivery tool created by ThoughtWorks. GoCD agents use Kubernetes to scale as needed. Check out gocd.org/sedaily and learn about how you can get started. GoCD was built with the learnings of the ThoughtWorks engineering team, who have talked about building the product in previous episodes of Software Engineering Daily. It’s great to see the continued progress on GoCD with the new Kubernetes integrations, and you can check it out for yourself at gocd.org/sedaily.