Streaming Analytics with Scott Kidder

Podcast Friday, February 16 2018

When you go to a website where a video is playing, and your video lags, how does the website know that you are having a bad experience?

Problems with video are often not complete failures. Maybe part of the video loads and plays just fine, and then the rest of the video keeps buffering. You have probably experienced sitting in front of a video, waiting for it to load as the loading wheel mysteriously spins.

Since problems with video are often not complete failures, troubleshooting a problem with a user’s video playback is not as straightforward as just logging an error whenever a crash occurs. You need to continuously monitor the video playback on every client device and aggregate that data in a centralized system for analysis.

The centralized logging system will allow you to separate problems with a specific user from problems with the video service itself. A single user could have bad wifi, or have 50 tabs open with different videos. To identify problems that are caused by the video player rather than the user, you need to capture the playback from every video and every user.
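As a rough illustration of separating user-specific problems from service-wide ones, the sketch below labels each stalled session by looking at how the other viewers of the same video fared. The tuple layout, the `attribute_stalls` name, and the 50% threshold are all assumptions made for this example; none of them come from the episode or from Mux.

```python
from collections import defaultdict

def attribute_stalls(sessions, service_threshold=0.5):
    """Label each stalled session as a likely "user" or "service" problem.

    sessions: list of (user_id, video_id, stalled) tuples (hypothetical schema).
    A stall is blamed on the service when at least `service_threshold` of the
    OTHER sessions for the same video also stalled.
    """
    totals = defaultdict(int)   # video_id -> number of sessions
    stalls = defaultdict(int)   # video_id -> number of stalled sessions
    for _user, video, stalled in sessions:
        totals[video] += 1
        if stalled:
            stalls[video] += 1

    labels = {}
    for user, video, stalled in sessions:
        if not stalled:
            continue
        # Leave this session out so one user cannot implicate the video alone.
        others_total = totals[video] - 1
        others_stalled = stalls[video] - 1
        rate = others_stalled / others_total if others_total else 0.0
        labels[(user, video)] = "service" if rate >= service_threshold else "user"
    return labels
```

With no other sessions to compare against, this sketch defaults to "user", which is an arbitrary choice; a real system would more likely report such cases as unknown.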

Scott Kidder works at Mux, where he builds a streaming analytics system for video monitoring. In this episode, Scott explains how events make it from a video player onto the backend analytics system running on Kinesis and Apache Flink.

Events from the browser are constantly added to Kinesis (which is much like Kafka). Apache Flink reads those events off of Kinesis and map-reduces them to discover anomalies. For example, if 100 users watch a 20-minute cat video, and the video stops playing at minute 12 for all 100 users, there is probably some data corruption in that video. You would only be able to discover that by assessing all users.
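The cat video example can be sketched as a small batch aggregation in plain Python. This is not Flink or Kinesis code and not how Mux implements it. The event fields (`user_id`, `video_id`, `type`, `position_s`), the function name, and the thresholds are hypothetical, chosen only to show the map step (bucket each stall by video and minute) and the reduce step (count distinct users per bucket).

```python
from collections import defaultdict

def find_stall_hotspots(events, min_viewers=10, threshold=0.9):
    """Return (video_id, minute, stalled_users, total_viewers) tuples for every
    minute of a video where nearly all of its viewers stalled.

    events: iterable of dicts with hypothetical keys
            user_id, video_id, type, position_s.
    """
    viewers = defaultdict(set)   # video_id -> users who sent any event
    stalled = defaultdict(set)   # (video_id, minute) -> users who stalled there

    # "Map": key each event by video, and each stall by (video, minute).
    for e in events:
        viewers[e["video_id"]].add(e["user_id"])
        if e["type"] == "stall":
            minute = int(e["position_s"] // 60)
            stalled[(e["video_id"], minute)].add(e["user_id"])

    # "Reduce": compare distinct stalled users with distinct viewers per video.
    hotspots = []
    for (video_id, minute), users in stalled.items():
        total = len(viewers[video_id])
        if total >= min_viewers and len(users) / total >= threshold:
            hotspots.append((video_id, minute, len(users), total))
    return sorted(hotspots)
```

A streaming engine would compute the same counts incrementally over time windows instead of over a finished list, but the grouping keys would be similar.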

Scott and I discussed the streaming infrastructure that he works on at Mux, as well as other streaming systems like Spark, Apache Beam, and Kafka.

This episode is the first in a short series about streaming data infrastructure. I wanted to do some shows in preparation for the Strata Data Conference in San Jose in March, which I will be attending thanks to a complimentary ticket from O’Reilly. O’Reilly has been kind enough to give me free tickets since Software Engineering Daily started, when the show did not have the money to attend any conferences. If you want to attend Strata, you can use promo code PCSED to get 20% off.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.