Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2qFyFKh.
Bing Wei examines the limitations that Slack's backend ran into and how they overcame them to scale from supporting small teams to serving large organizations of hundreds of thousands of users. She tells stories about the edge cache service and the real-time messaging system, and how they evolved for major product efforts including Grid and Shared Channels. Filmed at qconsf.com.
Bing Wei is a software engineer on the infrastructure team at Slack, working on its edge cache service. Before Slack, she was at Twitter, where she contributed to the open source RPC library Finagle, worked on core services for Tweets and Timelines, and led the migration of Tweet writes from the monolithic Rails application to the JVM-based microservices.
Watch the video with slide synchronization on InfoQ.com!
https://www.infoq.com/presentations/slack-scalability
Presented at QCon San Francisco
www.qconsf.com
6. Our Mission:
To make people’s working lives simpler, more pleasant, and more productive.
7. From supporting small teams
to serving gigantic organizations of hundreds of thousands of users
8. Slack Scale
◈ 6M+ DAU (daily active users), 9M+ WAU (weekly active users)
5M+ peak simultaneously connected users
◈ Average 10+ hours per weekday connected
Average 2+ hours per weekday in active use
◈ 55% of DAU are outside the US
14. Login Flow in 2015
Components: User, WebApp, MySQL, Messaging Server
1. User → WebApp: HTTP POST with the user’s token
2. WebApp → User: HTTP response with a snapshot of the team and the WebSocket URL
3. User ↔ Messaging Server: WebSocket carrying real-time events
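The 2015 login flow can be sketched as a single request handler. This is a minimal Python sketch, not Slack's code: `login`, `LoginResponse`, `TEAM_DB`, the token value and the WebSocket URL are all hypothetical, and an in-memory dict stands in for MySQL. It shows why the response grows with the team: every user and channel is copied into the snapshot.

```python
from dataclasses import dataclass


@dataclass
class LoginResponse:
    snapshot: dict       # copy of the whole team's state
    websocket_url: str   # where the client opens its real-time connection


# Hypothetical in-memory stand-in for the team data stored in MySQL.
TEAM_DB = {
    "T1": {
        "users": {f"U{i}": {"id": f"U{i}"} for i in range(5)},
        "channels": {"C1": {"id": "C1", "members": ["U0", "U1"]}},
    }
}
TOKENS = {"token-demo": "T1"}  # hypothetical token -> team lookup


def login(token: str) -> LoginResponse:
    """Steps 1-2: accept the POSTed token, answer with a snapshot and a WebSocket URL."""
    team_id = TOKENS.get(token)
    if team_id is None:
        raise PermissionError("invalid token")
    team = TEAM_DB[team_id]
    # Every user and channel is copied into the response, so the payload
    # grows with the size of the team.
    snapshot = {"users": dict(team["users"]), "channels": dict(team["channels"])}
    return LoginResponse(snapshot, f"wss://messaging.example.invalid/{team_id}")
```

The client then connects to `websocket_url` for step 3; for a team of hundreds of thousands of users, this one response carries all of them.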
15. Real-time Events on WebSocket
Messaging Server → User over WebSocket: 100+ types of events, e.g. chat messages, typing indicators, file uploads, file comments, thread replies, user presence changes, user profile changes, reactions, pins, stars, channel creations, app installations, etc.
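The client has to route each of these event types to code that updates its local state. A minimal sketch of such a dispatch table follows; the type strings (`message`, `user_typing`, `presence_change`) and the handler names are illustrative assumptions, not the actual wire protocol.

```python
from typing import Callable, Dict

Handler = Callable[[dict, dict], None]
HANDLERS: Dict[str, Handler] = {}  # event type -> handler


def on(event_type: str):
    """Decorator that registers a handler for one (hypothetical) event type."""
    def register(fn: Handler) -> Handler:
        HANDLERS[event_type] = fn
        return fn
    return register


@on("message")
def handle_message(state: dict, event: dict) -> None:
    state.setdefault("messages", []).append(event["text"])


@on("user_typing")
def handle_typing(state: dict, event: dict) -> None:
    state.setdefault("typing", set()).add(event["user"])


@on("presence_change")
def handle_presence(state: dict, event: dict) -> None:
    state.setdefault("presence", {})[event["user"]] = event["presence"]


def dispatch(state: dict, event: dict) -> bool:
    """Route an event to its handler; return False for unknown types."""
    handler = HANDLERS.get(event.get("type"))
    if handler is None:
        return False  # ignore event types this client does not understand
    handler(state, event)
    return True
```

Ignoring unknown types lets older clients keep working as new event types are added to the stream.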
16. Login Flow in 2015
◈ Client Architecture
○ Download a snapshot of the entire team
○ Updates trickle in through the WebSocket
○ Eventually consistent snapshot of the whole team
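An eventually consistent copy can be modeled as the login snapshot plus updates applied in stream order. The sketch below assumes, hypothetically, that every event carries a monotonically increasing `seq` and that the snapshot records the last `seq` it includes; it also assumes in-order delivery and does not handle gaps. `ClientTeamCopy` and the `user_change` type are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class ClientTeamCopy:
    """Client-side copy of the team: the login snapshot plus trickled-in updates."""
    users: dict      # user_id -> user object, seeded from the snapshot
    last_seq: int    # hypothetical: stream position the copy currently reflects

    def apply(self, event: dict) -> None:
        # Events already reflected in the copy are skipped, so a replayed or
        # stale update cannot overwrite newer data.
        if event["seq"] <= self.last_seq:
            return
        if event["type"] == "user_change":
            user = event["user"]
            self.users[user["id"]] = user
        self.last_seq = event["seq"]
```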
30. Flannel: Edge Cache Service
A query engine backed by a cache at edge locations
31. What Is in Flannel’s Cache
◈ Support big objects first
○ Users
○ Channel membership
○ Channels
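One way to picture these three object kinds as a queryable cache is a per-team structure with a reverse membership index, so that a question like "which channels is this user in?" is answered without touching the database. `TeamCache` and its methods are hypothetical names used only for this sketch.

```python
from collections import defaultdict


class TeamCache:
    """Hypothetical per-team cache holding users, channels and channel membership."""

    def __init__(self):
        self.users = {}                        # user_id -> user object
        self.channels = {}                     # channel_id -> channel object
        self.members = defaultdict(set)        # channel_id -> set of user_ids
        self.user_channels = defaultdict(set)  # user_id -> set of channel_ids (reverse index)

    def add_member(self, channel_id: str, user_id: str) -> None:
        # Keep both directions of the membership index in sync.
        self.members[channel_id].add(user_id)
        self.user_channels[user_id].add(channel_id)

    def channels_for(self, user_id: str) -> list:
        """Query: channel objects the user belongs to, answered from the cache alone."""
        ids = sorted(self.user_channels.get(user_id, ()))
        return [self.channels[c] for c in ids if c in self.channels]
```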
32. Login and Message Flow with Flannel
Components: User, Flannel, WebApp, MySQL, Messaging Server
1. User → Flannel: WebSocket connection
2. Flannel → WebApp: HTTP POST to download a snapshot of the team
3. Messaging Server → Flannel: WebSocket streaming JSON events
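The flow above can be sketched as a cache that loads a team's snapshot from the WebApp only on a miss, so later connections from the same team are served from memory. This is a hedged sketch: `Flannel`, `fetch_snapshot` (a stand-in for the HTTP POST in step 2) and the shape of the returned payload are assumptions for illustration.

```python
from typing import Callable


class Flannel:
    """Hypothetical edge cache: loads a team's snapshot once, then serves logins from memory."""

    def __init__(self, fetch_snapshot: Callable[[str], dict]):
        self.fetch_snapshot = fetch_snapshot  # stands in for the HTTP POST to the WebApp
        self.teams = {}                       # team_id -> cached snapshot
        self.backend_calls = 0                # how often the WebApp was actually hit

    def connect(self, team_id: str, user_id: str) -> dict:
        """Step 1: a client opens a WebSocket; step 2 runs only on a cache miss."""
        if team_id not in self.teams:
            self.backend_calls += 1
            self.teams[team_id] = self.fetch_snapshot(team_id)
        team = self.teams[team_id]
        # Illustrative payload: the connecting user's own object and a channel
        # count, rather than a full copy of the team.
        return {"self": team["users"][user_id], "channel_count": len(team["channels"])}
```

With this design, the cost of building a team snapshot is paid once per team per cache location instead of once per login.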
33. A Man in the Middle
User ↔ (WebSocket) ↔ Flannel ↔ (WebSocket) ↔ Messaging Server
Flannel uses real-time events to update its cache, e.g. user creation, user profile change, channel creation, user joins a channel, channel converted to private.
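Sitting between clients and the Messaging Server, Flannel can apply each event to its cache before relaying it. In this minimal sketch the event type names (`user_created`, `user_changed`, `channel_created`, `member_joined`, `channel_converted_to_private`) and the `FlannelRelay` class are hypothetical, and every event is forwarded unchanged to every connected client of the team.

```python
from typing import Callable, List


class FlannelRelay:
    """Hypothetical relay: update the cache from each event, then forward it unchanged."""

    def __init__(self):
        self.users = {}                               # user_id -> user object
        self.channels = {}                            # channel_id -> channel object
        self.clients: List[Callable[[dict], None]] = []  # deliver an event to one client

    def on_event(self, event: dict) -> None:
        kind = event["type"]
        if kind in ("user_created", "user_changed"):
            self.users[event["user"]["id"]] = event["user"]
        elif kind == "channel_created":
            self.channels[event["channel"]["id"]] = dict(event["channel"], members=set())
        elif kind == "member_joined":
            self.channels[event["channel_id"]]["members"].add(event["user_id"])
        elif kind == "channel_converted_to_private":
            self.channels[event["channel_id"]]["is_private"] = True
        # Clients still receive every event, whether or not it touched the cache.
        for deliver in self.clients:
            deliver(event)
```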
44. Web Client Iterations
Flannel Just-In-Time Annotation
Right before web clients are about to access an object, Flannel pushes that object to them.
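Just-in-time annotation can be sketched with per-connection bookkeeping: before a message event that references a user the client has not received yet, the connection pushes that user object. `ClientConnection`, the `user_object` frame type and the event fields are assumptions made for this example; only message authors are tracked here.

```python
class ClientConnection:
    """Hypothetical per-client state used for just-in-time annotation."""

    def __init__(self, users: dict):
        self.users = users          # Flannel's cached user objects: user_id -> object
        self.sent_user_ids = set()  # users this client has already received
        self.outbox = []            # frames written to this client's WebSocket, in order

    def send_message_event(self, event: dict) -> None:
        author = event["user"]
        if author not in self.sent_user_ids:
            # Push the object first so the client never renders an unknown user.
            self.outbox.append({"type": "user_object", "user": self.users[author]})
            self.sent_user_ids.add(author)
        self.outbox.append(event)
```

Each user object is sent at most once per connection, so the client receives only the objects it actually needs, when it needs them, instead of the whole team up front.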