Slack Messaging Architecture with Keith Adams

Podcast Wednesday, November 28 2018

Subscribe: RSS

Slack is a real-time messaging system for work communication. On Slack, chat rooms as big as 100,000 people have productive conversations. This might sound like the same problem solved by social networks like Facebook, where billions of users communicate over a newsfeed. But the engineering constraints of a messaging system are different than that of a social network.

On a newsfeed, the order in which events appear is not chronological. Events can be out of order. You can miss events. When a user posts a message to a social network, there are not strict guarantees around when other people will see that message.

On Slack, messages have strong guarantees around arrival. When I send a message, everyone else who is in the room and connected should quickly receive that message as well. The messages need to be ordered and delivered exactly once. All messages on Slack are persisted.

We have covered the architecture and security model of Slack in previous shows. In today’s show, Keith Adams returns to discuss how messages are processed and broadcast in Slack. The problem of Slack’s messaging system is similar to the distributed systems problem of “atomic broadcast”, in which a single process broadcasts a message which needs to be received by all other processes correctly–or else received by none of them.

In Keith’s last show, he talked through the benefits of building a large system on PHP. He worked on infrastructure at Facebook, which was also a PHP application. It’s worth noting that both Slack and Facebook have scaled a monolithic architecture.

Show Notes

SE Daily

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.