High Volume Logging with Steve Newman

Google Docs is used by millions of people to collaborate on documents. With today’s technology, you could spend a weekend coding and build a basic version of a collaborative text editor. But in 2004 it was not so easy.

In 2004 Steve Newman built a product called Writely, which allowed users to collaborate on documents. Initially, Writely was hosted on a single server that Steve managed himself, and all reads and writes to the documents went through that one server. Writely rapidly grew in popularity, and Steve got a crash course in distributed systems as he tried to keep up with the user base.

In 2006, Writely was acquired by Google, and Steve spent the next four years turning Writely into Google Docs. Eventually he moved on to other projects within Google, including “Cosmo” and “Megastore Replication.” When Steve left the company in 2010, he took with him lessons about the logging and monitoring that keep Google’s infrastructure observable.

Large organizations have terabytes of log data to manage. This data streams off the servers that run our applications, gets processed in a “metrics pipeline,” and is turned into monitoring data. Monitoring data summarizes log data in a more presentable format.
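To make the idea of a metrics pipeline concrete, here is a minimal Python sketch that rolls raw log lines up into per-service monitoring numbers (request count, error rate, median latency). The log format, the `logs_to_metrics` function, and the choice of metrics are hypothetical assumptions for illustration, not how Google or Scalyr actually build their pipelines; a production pipeline would typically stream data and aggregate over time windows rather than process one list in a single pass.

```python
import re
from collections import Counter, defaultdict
from statistics import median

# Hypothetical log format (an assumption for this sketch):
# "<timestamp> <service> <http_status> <latency_ms>"
LOG_LINE = re.compile(r"^(\S+) (\S+) (\d{3}) (\d+)$")


def logs_to_metrics(lines):
    """Aggregate raw log lines into per-service monitoring metrics."""
    requests = Counter()
    errors = Counter()
    latencies = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that don't fit the assumed format
        _, service, status, latency_ms = match.groups()
        requests[service] += 1
        if int(status) >= 500:  # treat 5xx responses as errors
            errors[service] += 1
        latencies[service].append(int(latency_ms))
    # Keep only small summaries; the raw lines can be compressed or discarded.
    return {
        service: {
            "requests": requests[service],
            "error_rate": errors[service] / requests[service],
            "median_latency_ms": median(latencies[service]),
        }
        for service in requests
    }


logs = [
    "2024-01-01T12:00:00Z docs 200 35",
    "2024-01-01T12:00:01Z docs 500 120",
    "2024-01-01T12:00:02Z docs 200 40",
    "2024-01-01T12:00:02Z auth 200 12",
]
print(logs_to_metrics(logs))
```

With the sample lines above, the `docs` service reports 3 requests, an error rate of one third, and a median latency of 40 ms. Only those small summaries need to reach a dashboard, which is one reason most raw log lines are never read by a person.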

Most of the log messages that get created will never be seen by human eyes. These logs get aggregated into metrics, then compressed, and (in many cases) eventually thrown away. Different companies have different requirements around retaining their logs, so some companies may never garbage collect any of their logs!

When a problem occurs in our infrastructure, we need to be able to dig into terabytes of log data and quickly find its root cause. If our log data is compressed and stored on disk, it takes longer to access. But if we keep all of our logs in memory, it can get expensive.
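The disk-versus-memory tradeoff can be sketched in a few lines of Python. The hypothetical `CompressedLogStore` below keeps log lines as zlib-compressed blocks: storage shrinks, but every search has to decompress each block before it can scan the text. The block size and the brute-force substring search are assumptions chosen for simplicity; real log systems add caching, indexing, and tiered storage on top of this basic idea.

```python
import zlib


class CompressedLogStore:
    """Hypothetical store that keeps log lines as zlib-compressed blocks."""

    def __init__(self, block_size=1000):
        self.block_size = block_size  # lines per compressed block (assumed value)
        self.blocks = []              # compressed bytes, oldest block first
        self.buffer = []              # recent lines, still uncompressed in memory

    def append(self, line):
        self.buffer.append(line)
        if len(self.buffer) >= self.block_size:
            self.flush()

    def flush(self):
        """Compress the buffered lines into a new block."""
        if self.buffer:
            raw = "\n".join(self.buffer).encode("utf-8")
            self.blocks.append(zlib.compress(raw))
            self.buffer = []

    def search(self, needle):
        """Return every line containing `needle`, in the order it was appended."""
        matches = []
        for block in self.blocks:
            # The cost of compression: each block is decompressed to be searched.
            text = zlib.decompress(block).decode("utf-8")
            matches.extend(line for line in text.split("\n") if needle in line)
        # Recent lines are still in memory and can be scanned directly.
        matches.extend(line for line in self.buffer if needle in line)
        return matches

    def compressed_bytes(self):
        return sum(len(block) for block in self.blocks)


store = CompressedLogStore(block_size=100)
for i in range(1000):
    status = 500 if i % 250 == 0 else 200
    store.append(f"request_id={i} service=docs status={status}")
store.flush()
print(len(store.search("status=500")))  # prints 4
print(store.compressed_bytes())
```

Because this sketch scans every block, query time grows with the total volume of stored logs. Keeping recent lines uncompressed in `buffer` (or in a separate memory cache) makes them cheaper to search, at the cost of more memory.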

To review: if I wanted to build a logging system from scratch today, I would need a metrics pipeline for converting log data into monitoring data; a complicated caching system; a way to store and compress logs; a query engine that knows how to ask questions of the log storage system; a user interface so I don’t have to inspect these logs via the command line…

The list of requirements goes on and on, which is why there is a huge industry around log management. And logging keeps evolving! One example we covered recently is distributed tracing, which is used to diagnose requests that travel through multiple services.

After Steve Newman left Google, he started Scalyr, a product that allows developers to consume, store, and query log messages. I was looking forward to talking to Steve about data engineering and the query engine that Scalyr has architected, but we actually spent most of our conversation talking about the early days of Writely and his time at Google, particularly the operational challenges of Google’s infrastructure. Full disclosure: Scalyr is a sponsor of Software Engineering Daily.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.

Sponsors


Amazon Redshift powers the analytics of your business, and Intermix.io powers the analytics of your Redshift. Intermix.io gives you the tools you need to analyze your Amazon Redshift performance and improve the toolchain of everyone downstream from your data warehouse. The team at Intermix has seen so many Redshift clusters, they are confident they can solve whatever performance issues you are having. Go to intermix.io/sedaily to get a free 30-day trial. Intermix collects all your Redshift logs and makes it easy to figure out what’s wrong so you can take action, all in a nice, intuitive dashboard. Go to intermix.io/sedaily to start your free 30-day trial.


IBM Cloud gives you all the tools you need to build cloud native applications. Use IBM Cloud Container service to easily manage the deployment of your Docker containers. For serverless applications, use IBM Cloud Functions for low-cost, event-driven scalability. If you like to work with a fully managed platform as a service, IBM Cloud Foundry gives you a cloud operating system to control your distributed application. IBM Cloud is built on top of open source tools, and integrates with all the third party services that you need to build, deploy, and manage your application. To start building with IBM today, go to softwareengineeringdaily.com/IBM and sign up for a free Lite account. With the Lite account, you can start building apps for free, and try numerous cloud services with no time restrictions. Check it out at softwareengineeringdaily.com/IBM.


ConsenSys is the largest blockchain company focused on building software on the Ethereum platform. They’ve developed Truffle, the most popular Ethereum development framework. Truffle is your Ethereum Swiss Army knife, and it is available for free by going to softwareengineeringdaily.com/consensys. Nearly 200,000 developers are working with Truffle, and you can download it today and start building your own software on Ethereum. Learn about Truffle and download it directly from softwareengineeringdaily.com/consensys to get going on Ethereum development. And if you want to hear a show about one of these topics, send me a tweet @software_daily and tag @consensys with the topic you would like to hear about.


Who do you use for log management? I want to tell you about Scalyr, the first purpose-built log management tool on the market. Most tools on the market utilize text indexing search, which is great… for indexing a book. But if you want to search logs, at scale, fast… it breaks down. Scalyr built their own database from scratch, and the system is fast: 99% of their queries execute in under 1 second. Companies like OKCupid, Giphy and CareerBuilder use Scalyr. It was built by one of the founders of Writely (aka Google Docs). Scalyr has a consumer-grade UI that scales infinitely. You can monitor key metrics, trigger alerts, and integrate with PagerDuty. It’s easy to use and, did we mention, lightning fast. Give it a try today. It’s free for 90 days at softwareengineeringdaily.com/scalyr.