Thumbtack Infrastructure with Nate Kupp

Thumbtack is a marketplace for real-world services. On Thumbtack, people get their house painted, their dog walked, and their furniture assembled. With 40,000 daily marketplace transactions, the company handles significant traffic.

On yesterday’s episode, we explored how one aspect of Thumbtack’s marketplace recently changed, going from asynchronous matching to synchronous “instant” matching. In this episode, we zoom out to the larger architecture of Thumbtack, and how the company has grown through its adoption of managed services from both AWS and Google Cloud.

The word “serverless” has a few definitions. In the context of today’s episode, serverless is all about managed services like Google BigQuery, Google Cloud PubSub, and Amazon ECS. The majority of infrastructure at Thumbtack is built using services that automatically scale up and down. Application deployment, data engineering, queueing, and databases are almost entirely handled by cloud providers.

For the most part, Thumbtack is a “serverless” company. And it makes sense: if you are building a high-volume marketplace, you are not in the business of keeping servers running. You are in the business of improving your matching algorithms, your user experience, and your overall architecture. Paying for lots of managed services is more expensive than running virtual machines, but Thumbtack saves money by not having to hire site reliability engineers.

Nate Kupp leads the technical infrastructure team, and we met at QCon in San Francisco to talk about how to architect a modern marketplace. This was my third time attending QCon and as always I was impressed by the quality of presentations and conversations I had there. They were also kind enough to set up some dedicated space for podcasters like myself.

The most widely used cloud provider is AWS, but more and more companies that come on the show are starting to use some of the managed services from Google. The great news for developers is that integration between these managed services is pretty easy.

At Thumbtack, the production infrastructure on AWS serves user requests. The log of transactions gets pushed from AWS to Google Cloud, where the data engineering occurs. On Google Cloud, the transaction records are queued in Cloud PubSub, a message queueing service. Those transactions are pulled off the queue and stored in BigQuery, a system for storing and querying high volumes of data.
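To make the flow above concrete, here is a minimal in-memory sketch of the queue-then-store pattern. It does not use the real Cloud PubSub or BigQuery client libraries: a standard-library `queue.Queue` stands in for the PubSub topic and a plain list stands in for a BigQuery table. The function names and the record fields (`request_id`, `service`) are hypothetical, chosen only for illustration, and are not taken from Thumbtack's code.

```python
import json
import queue


def publish_transactions(topic, transactions):
    """Serialize each transaction and enqueue it (stand-in for publishing to a topic)."""
    for tx in transactions:
        topic.put(json.dumps(tx).encode("utf-8"))


def drain_to_table(topic, table):
    """Pull every pending message off the queue and append it as a row
    (stand-in for loading records into a warehouse table)."""
    loaded = 0
    while True:
        try:
            message = topic.get_nowait()
        except queue.Empty:
            break
        table.append(json.loads(message.decode("utf-8")))
        loaded += 1
    return loaded


# Hypothetical transaction records produced on the serving side.
topic = queue.Queue()
table = []
publish_transactions(topic, [
    {"request_id": 1, "service": "house painting"},
    {"request_id": 2, "service": "dog walking"},
])
loaded = drain_to_table(topic, table)
```

The queue decouples the two sides: the serving side only needs to publish records, and the data engineering side can pull and load them at its own pace.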

BigQuery is used as the data lake to pull from when orchestrating machine learning jobs. These machine learning jobs are run in Cloud Dataproc, a managed service that runs Apache Spark. After training a model in Google Cloud, that model is deployed on the AWS side, where it serves user traffic. On the Google Cloud side, the orchestration of these different managed services is done by Apache Airflow, an open source tool that is one of the few pieces of infrastructure that Thumbtack has to manage itself on Google Cloud.
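The orchestration idea can be sketched as a dependency graph whose tasks run in order. The example below is not Airflow code and does not use Airflow's API; it uses Python's standard-library `graphlib` to illustrate the same concept. The task names are hypothetical labels for the steps described above, and the `actions` mapping is an assumed placeholder for whatever would actually trigger each managed service.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
# Task names are illustrative labels, not real job identifiers.
dependencies = {
    "load_into_bigquery": {"queue_in_pubsub"},
    "train_on_dataproc": {"load_into_bigquery"},
    "deploy_model_to_aws": {"train_on_dataproc"},
}


def run_pipeline(deps, actions):
    """Run tasks in dependency order and return the order in which they ran."""
    executed = []
    for task in TopologicalSorter(deps).static_order():
        # Look up the callable for this task; default to a no-op placeholder.
        actions.get(task, lambda: None)()
        executed.append(task)
    return executed


executed = run_pipeline(dependencies, {})
```

An orchestrator adds scheduling, retries, and monitoring on top of this ordering, which is why it remains a piece of infrastructure worth running even when everything it coordinates is a managed service.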

To find out more about the Thumbtack infrastructure, check out the video of the talk Nate gave at QCon San Francisco, or check out the Thumbtack Engineering Blog.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.

Sponsors


Women 2.0 is a company with a vision of gender equality in the tech world. Women 2.0 is a community, a media company, and a jobs platform that connects top female talent with engineering jobs around the world. At the new Women 2.0 jobs platform, find vetted jobs for women engineers, data scientists, and product managers. To find a job that is right for you, go to women2.com/sedaily. And if you are an engineering company, you can connect with top female tech talent on Women 2.0. Companies like Twitter, MongoDB, and Craigslist use Women 2.0 to find new hires. Go to women2.com/sedaily to find out how to post your company’s jobs to Women 2.0. Thanks to Women 2.0 for being a new sponsor of Software Engineering Daily.


Simplify continuous delivery with GoCD, the on-premise, open source, continuous delivery tool by ThoughtWorks. With GoCD, you can easily model complex deployment workflows using pipelines and visualize them end-to-end with the Value Stream Map. You get complete visibility into and control of your company’s deployments. At gocd.org/sedaily, find out how to bring continuous delivery to your teams. Say goodbye to deployment panic and hello to consistent, predictable deliveries. Visit gocd.org/sedaily to learn more about GoCD. Commercial support and enterprise add-ons, including disaster recovery, are available.


Apica System helps companies with their end-user experience, focusing on availability and performance. Test, monitor, and optimize your applications with Apica System. Apica is hosting an upcoming webinar about API basics for big data analytics. You can also find past webinars, such as how to optimize websites for fast load time.