SafeGraph with Auren Hoffman

Machine learning tools are rapidly maturing. TensorFlow gave developers an open source version of Google’s internal machine learning framework. Cloud computing provides a cost effective, accessible way of training models. Edge computing allows for low latency deployments of models.

But even if you are a kid with a laptop who has learned all the machine learning algorithms, read all of the deep learning textbooks, and figured out how to use AWS, all of the tooling and education in the world doesn’t change the fact that you still need data to build models.

This illustrates why we need data-as-a-service.

A kid with a laptop has access to infrastructure-as-a-service, platform-as-a-service, and software-as-a-service. As these tools build on each other, there has been an explosion of high-leverage software products. But the world of data sets remains crude and underdeveloped.

Think about some data sets you could take advantage of: the number of emergency room patients that come into a hospital with chest pain; the size of the average coffee mug; the principal component breakdown of sidewalk concrete in San Francisco.

SafeGraph is a company that offers data sets as a service. Auren Hoffman is the CEO of SafeGraph, and he joins the show to discuss why he started building SafeGraph and how he thinks about the state of publicly accessible data.

Auren was previously on the podcast, and I always enjoy talking to him. This was a great episode, and I think you will like it as well. Full disclosure: LiveRamp, the company that Auren created prior to SafeGraph, is a sponsor of Software Engineering Daily.

Show Notes

Raj Chetty economic papers

Paul Graham “Keep Your Identity Small”

Auren Hoffman on Quora


Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.


There’s a new open source project called Dremio that is designed to simplify analytics. It’s also designed to handle some of the hard work, like scaling performance of analytical jobs. Dremio is the team behind Apache Arrow, a new standard for in-memory columnar data analytics. Arrow has been adopted across dozens of projects, like Pandas, to improve the performance of analytical workloads on CPUs and GPUs. It’s free and open source, designed for everyone, from your laptop to clusters of over 1,000 nodes. You can get started with Dremio for free. If you like it, be sure to tweet @dremiohq and let them know you heard about it from Software Engineering Daily. Thanks again to Dremio.

A thank you to our sponsor, Datadog, a cloud monitoring platform bringing full visibility to dynamic infrastructure and applications. Create beautiful dashboards, set powerful, machine learning–based alerts, and collaborate with your team to resolve performance issues. Datadog integrates seamlessly with more than 200 technologies, including Google Cloud Platform, AWS, Docker, PagerDuty, and Slack. With fast installation and setup, plus APIs and open source libraries for custom instrumentation, Datadog makes it easy for teams to monitor every layer of their stack in one place. But don’t take our word for it: start a free trial today and Datadog will send you a free T-shirt!

Azure Container Service simplifies the deployment, management, and operations of Kubernetes. You can continue to work with the tools you already know, such as Helm, and move applications to any Kubernetes deployment. Integrate with your choice of container registry, including Azure Container Registry. Also, quickly and efficiently scale to maximize your resource utilization without having to take your applications offline. Isolate your application from infrastructure failures and transparently scale the underlying infrastructure to meet growing demands, all while increasing the security, reliability, and availability of critical business workloads with Azure. Check out the Azure Container Service.

There’s no need to reinvent the wheel when it comes to making your app “realtime.” PubNub makes it simple, enabling you to build immersive and interactive experiences on the web, on mobile phones, embedded into hardware, and on any other device connected to the Internet. With powerful APIs and a robust global infrastructure, you can stream geolocation data, send chat messages, turn on your sprinklers, or rock your baby’s crib when they start crying (PubNub literally powers IoT cribs). 70 SDKs for web, mobile, IoT, and more mean you can start streaming data in realtime without a ton of compatibility headaches and with no need to build your own SDKs from scratch. PubNub offers a generous sandbox tier that’s free forever (until your app takes off).

