Pinterest is a visual feed of ideas, products, clothing, and recipes. Millions of users browse Pinterest to find images and text that are tailored to their interests.
Like most companies, Pinterest started with a large monolithic application that served all requests. As Pinterest’s engineering resources expanded, some of the architecture was broken up into microservices and Dockerized, which make the system easier to reason about.
To serve users with better feeds, Pinterest built a machine learning pipeline using Kafka, Spark, and Presto. User events are generated from the frontend, logged onto Kafka, and aggregated to build machine learning models. These models are deployed into Docker containers much like the production microservices.
Kinnary Jangla is a senior software engineer at Pinterest, and she joins the show to talk about her experiences at the company–breaking up the monolith, architecting a machine learning pipeline, and deploying those models into production.
Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.
Sumo Logic is a cloud-native, machine data analytics service that helps you Run and Secure your Modern Application. If you are feeling the pain of managing your own log, event, and performance metrics data, check out sumologic.com/sedaily. Even if you have tools already, it’s worth checking out Sumo Logic and seeing if you can leverage your data even more effectively, with real-time dashboards and monitoring, and improved observability – to improve the uptime of your application and keep your day-to-day runtime more secure. Check out sumologic.com/sedaily for a free 30-day Trial of Sumo Logic, to find out how Sumo Logic can improve your productivity and your application observability–wherever you run your applications. That’s sumologic.com/sedaily.
There’s a new open source project called Dremio that is designed to simplify analytics. It’s also designed to handle some of the hard work, like scaling performance of analytical jobs. Dremio is the team behind Apache Arrow, a new standard for in-memory columnar data analytics. Arrow has been adopted across dozens of projects – like Pandas – to improve the performance of analytical workloads on CPUs and GPUs. It’s free and open source, designed for everyone, from your laptop, to clusters of over 1,000 nodes. At dremio.com/sedaily you can find all the necessary resources to get started with Dremio for free. If you like it, be sure to tweet @dremiohq and let them know you heard about it from Software Engineering Daily. Thanks again to Dremio, and check out dremio.com/sedaily to learn more.
Amazon Redshift powers the analytics of your business–and Intermix.io powers the analytics of your Redshift. Intermix.io gives you the tools you need to analyze your Amazon Redshift performance and improve the toolchain of everyone downstream from your data warehouse. The team at Intermix has seen so many Redshift clusters, they are confident they can solve whatever performance issues you are having. Go to intermix.io/sedaily
to get a free 30-day trial. Intermix collects all your Redshift logs and makes it easy to figure out what’s wrong so you can take action. All in a nice, intuitive dashboard. Go to intermix.io/sedaily
to start your free 30-day trial.