1
Erlang at Facebook
Eugene Letuchy
Apr 30, 2009
2
Agenda
1 Facebook ... and Erlang
2 Story of Facebook Chat
3 Facebook Chat Architecture
4 Key Erlang Features
5 Then and Now
3
Facebook ... and Erlang
4
The Facebook Environment
▪ The Site
▪ More than 200 million active users
▪ More than 3.5 billion minutes are spent on Facebook each day
▪ Fewer than 900 employees
▪ The Engineering Team
▪ Fast iteration: code gets out to production within a week
▪ Polyglot programming: interoperability is key
▪ Practical: high-leverage tools win
5
Erlang Projects
▪ Chat: the biggest and best known user
▪ AIM Presence: a JSONP validator
▪ Chat Jabber support (ejabberd)
6
Facebook Chat
7
2007: Facebook needs Chat
Messages, Wall, Links aren’t enough
8
Enter a Hackathon (Jan 2007)
▪ Chat started in one night of coding
▪ Floating conversation windows
▪ No buddy list
▪ One server (no distribution)
▪ Erlang was there!
9
Enter Eugene (Feb 2007)
▪ I joined Facebook after Chat Hackathon
▪ What is this Erlang?
▪ Spring 2007:
▪ Learning Erlang from Joe Armstrong's thesis
▪ Lots of prototyping
▪ Evaluating infrastructure needs
▪ Summer 2007:
▪ Chris Piro works on Erlang Thrift bindings
10
Let’s do this!
▪ Mid-Fall 2007: Chat becomes a “real” project
▪ 4 engineers, 0.5 designer
▪ Infrastructure components get built and improved
▪ Feb 2008: “Dark launch” testing begins
▪ Simulates load on the Erlang servers ... they hold up
▪ Apr 6, 2008: First real Chat message sent
▪ Apr 23, 2008: 100% rollout (Facebook has 70M users at the time)
11
Launch: April 2008
▪ Apr 6, 2008: gradual live rollout starts
▪ First message: "msn chat?"
▪ Apr 23, 2008: 100% rollout (to Facebook’s 70M users)
▪ Graph of sends in the first days of launch
[Figure: millions of sends per hour (y-axis, 0 to 15), Tuesday 00:00 through Wednesday 12:00]
12
Chat ... one year later
▪ Facebook has 200M active users
▪ 800+ million user messages / day
▪ 7+ million active channels at peak
▪ 1GB+ in / sec at peak
▪ 100+ channel machines
▪ ~9-10 times the work at launch; ~2 times as many machines
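The figures on this slide can be turned into two rough derived numbers. The sketch below is plain arithmetic in Python and rests on stated assumptions: it treats "800+ million" as exactly 800 million (a lower bound), averages evenly over a day (so it says nothing about peak load), and reads the machine figure as roughly twice the launch count.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Lower bound taken from the slide: "800+ million user messages / day".
messages_per_day = 800_000_000
avg_messages_per_second = messages_per_day / SECONDS_PER_DAY

# "~9-10 times the work" on roughly twice the machines.
work_growth = (9, 10)
machine_growth = 2
per_machine_growth = tuple(w / machine_growth for w in work_growth)

print(round(avg_messages_per_second))  # 9259
print(per_machine_growth)              # (4.5, 5.0)
```

Under those assumptions, each machine handles about 4.5 to 5 times the work it did at launch.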
13
Chat Architecture
14
System challenges
▪ How does synchronous messaging work on the Web?
▪ “Presence” is hard to scale
▪ Need a system to queue and deliver messages
▪ Millions of connections, mostly idle
▪ Need logging, at least between page loads
▪ Make it work in Facebook’s environment
15
System overview
16
System overview - User Interface
Chat in the browser?
▪ Chat bar affixed to the bottom of each Facebook page
▪ Mix of client-side JavaScript and server-side PHP
▪ Works around transport errors, browser differences
▪ Regular AJAX for sending messages, fetching conversation history
▪ Periodic AJAX polling for list of online friends
▪ AJAX long-polling for messages (Comet)
17
System Overview - Back End
How does the back end service requests?
▪ Discrete responsibilities for each service
▪ Communicate via Thrift
▪ Channel (Erlang): message queuing and delivery
▪ Queue messages in each user’s “channel”
▪ Deliver messages as responses to long-polling HTTP requests
▪ Presence (C++): aggregates online info in memory (pull-based presence)
▪ Chatlogger (C++): stores conversations between page loads
▪ Web tier (PHP): serves our vanilla web requests
18
System overview
19
Message send
[Diagram: "Me" sends "Lunch?"; "Eugene" receives "Lunch?"]
1 - ajax
2a - thrift
2b - thrift
3 - long poll
20
Channel servers (Erlang)
21
Channel servers
Architectural overview
▪ One channel per user
▪ Web tier delivers messages for that user
▪ Channel State: short queue of sequenced messages
▪ Long poll for streaming (Comet)
▪ Clients make an HTTP request
▪ Server replies when a message is ready
▪ One active request per browser tab
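The channel model above (a short queue of sequenced messages, drained by a long poll that returns as soon as a message is ready) can be sketched as follows. This is a minimal illustration in Python, not the Erlang implementation the slides describe. The class name, the `max_queued` bound, and the `after_seq` parameter are all hypothetical, and a thread condition variable stands in for a held-open HTTP request.

```python
import threading
from collections import deque


class Channel:
    """One user's channel: a short queue of sequenced messages."""

    def __init__(self, max_queued=10):
        # max_queued is a hypothetical bound; the slides only say "short queue".
        self._messages = deque(maxlen=max_queued)  # (seq, payload) pairs
        self._next_seq = 1
        self._cond = threading.Condition()

    def publish(self, payload):
        """Queue a message for this user and wake any waiting long poll."""
        with self._cond:
            seq = self._next_seq
            self._next_seq += 1
            self._messages.append((seq, payload))
            self._cond.notify_all()
            return seq

    def poll(self, after_seq, timeout):
        """Block until a message newer than after_seq exists, or time out.

        Returns a list of (seq, payload) pairs; the list is empty on timeout.
        """
        def pending():
            return [m for m in self._messages if m[0] > after_seq]

        with self._cond:
            self._cond.wait_for(lambda: bool(pending()), timeout=timeout)
            return pending()
```

A client would remember the highest sequence number it has seen and pass it as `after_seq` on its next request, so a message that arrives between two polls is still delivered as long as it remains in the short queue.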
22
[Diagram: channel application; labels: messages, authentication, online list messages]
23
Channel servers
Architectural details
▪ Distributed design
▪ User id space is partitioned (division of labor)
▪ Each partition is serviced by a cluster (availability)
▪ Presence aggregation
▪ Channel servers are authoritative
▪ Periodically shipped to presence servers
▪ Open source: Erlang, Mochiweb, Thrift, Scribe, fb303, et al.
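The distributed design above (partition the user id space, serve each partition from a cluster) can be sketched as a routing function. The slides do not say how ids map to partitions, so the modulo scheme here is an assumption. The node names are made up, and the random pick within a cluster is only one way to read "naive load balancing".

```python
import random

# Hypothetical layout: one inner list of channel nodes per partition.
CLUSTERS = [
    ["channel-a1", "channel-a2"],
    ["channel-b1", "channel-b2"],
    ["channel-c1", "channel-c2"],
]


def partition_for(user_id, num_partitions):
    """Map a user id to a partition (assumed scheme: simple modulo)."""
    return user_id % num_partitions


def pick_node(user_id, clusters=CLUSTERS, rng=random):
    """Pick any node in the cluster that owns this user's partition."""
    nodes = clusters[partition_for(user_id, len(clusters))]
    return rng.choice(nodes)
```

Any node in the owning cluster is an acceptable answer, so losing one node leaves the partition reachable through the others.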
24
Key Erlang Features we love
25
Concurrency
▪ Cheap parallelism at massive scale
▪ Simplifies modeling concurrent interactions
▪ Chat users are independent and concurrent
▪ Mapping onto traditional OS threads is unnatural
▪ Locality of reference
▪ Bonus: carries over to non-Erlang concurrent programming
26
Distribution
▪ Connected network of nodes
▪ Remote processes look like local processes
▪ Any node in a channel server cluster can route requests
▪ Naive load balancing
▪ Distributed Erlang works out-of-the-box (all nodes are trusted)
27
Fault Isolation
▪ Bugs in the initial versions of Chat:
▪ Process leaks in the Thrift bindings
▪ Unintended multicasting of messages
▪ Bad return state for presence aggregators
▪ (Horrible) bugs don’t kill a mostly functional system:
▪ C/C++ segfault takes down the OS process and your server state
▪ Erlang badmatch takes down an Erlang process
▪ ... and notifies linked processes
28
Error logging (Crash Reports)
▪ Any proc_lib-compliant process generates crash reports
▪ Error reports can be handled out of band (not where generated)
▪ Stacktraces point the way to bugs (functional languages win big here)
▪ ... but they could be improved with source line numbers
▪ Writing error_log handlers is simple:
▪ gen_event behavior
▪ Allows for massaging of the crash and error messages (binaries!)
▪ Thrift client in the error log
▪ WARNING: error logging can OOM the Erlang node
29
Hot code swapping
▪ Restart-free upgrades are awesome (!)
▪ Pushing new functional code for Chat takes ~20 seconds
▪ No state is lost
▪ Test on a running system
▪ Provides a safety net ... rolling back bad code is easy
▪ NOTE: we don’t use the OTP release/upgrade strategies
30
Monitoring and Error Recovery
▪ Supervision hierarchies
▪ Organize (and control) processes
▪ Organize thoughts
▪ Systematize restarts and error recovery
▪ simple_one_for_one for dynamic child processes
▪ net_kernel (Distributed Erlang)
▪ sends nodedown, nodeup messages
▪ any process can subscribe
▪ heart: monitors and restarts the OS process
31
Remote Shell
▪ To invoke:
> erl -name hidden -hidden -remsh <node_name> -setcookie <cookie>
Eshell V5.7.1 (abort with ^G)
(<node_name>)1>
▪ Ad-hoc inspection of a running node
▪ Command-and-control from a console
▪ Combines with hot code loading
32
Erlang top (etop)
▪ Shows Erlang processes, sorted by reductions, memory and message queue
▪ OS functionality ... for free
33
Hibernation
▪ Drastically shrink memory usage with erlang:hibernate/3
▪ Throws away the call stack
▪ Minimizes the heap
▪ Enters a wait state for new messages
▪ “Jumps” into a passed-in function for a received message
▪ Perfect for a long-running, idling HTTP request handler
▪ But ... not compatible with gen_server:call (and gen_server:reply)
▪ gen_server:call has its own receive() loop
▪ hibernate() doesn’t support an explicit timeout
▪ Fixed with a few hours and a look at gen.erl
34
Symmetric MultiProcessing (SMP)
▪ Take advantage of multi-core servers
▪ erl -smp runs multiple scheduler threads inside the node
▪ SMP is emphasized in recent Erlang development
▪ Added to Erlang R11B
▪ Erlang R12B-0 through R13B include fixes and perf boosts
▪ Smart people have been optimizing our code for a year (!)
▪ Upgraded to R13B last night with about 1/3 less load
35
hipe_bifs
Cheating single assignment
▪ Erlang is opinionated:
▪ Destructive assignment is hard because it should be
▪ hipe_bifs:bytearray_update() allows for destructive array assignment
▪ Necessary for aggregating Chat users’ presence
▪ Don’t tell anyone!
36
Then and Now: Erlang in Progress
37
Then ... a steep learning curve
▪ Start of 2007:
▪ Few industry-focused English-language resources
▪ Few blogs (outside of Yariv’s and Joel Reymont’s)
▪ Code examples spread out and disorganized
▪ U.S. Erlang community limited in number and visibility
38
Now ...
▪ Programming Erlang (Jun 2007)
▪ Erlang Programming (upcoming...)
▪ More blogs and blog aggregators:
▪ Planet Erlang, Planet TrapExit
▪ Erlang Factory aggregates Erlang developments
▪ More code available:
▪ GitHub, CEAN
▪ More general-purpose Open Source Libraries
▪ U.S.-located conference and ErlLounges
39
(c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved. 1.0
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 

Facebook chat architecture

  • 2. Erlang at Facebook, Eugene Letuchy, Apr 30, 2009
  • 3. Agenda
      1 Facebook ... and Erlang
      2 Story of Facebook Chat
      3 Facebook Chat Architecture
      4 Key Erlang Features
      5 Then and Now
  • 4. Facebook ... and Erlang
  • 5. The Facebook Environment
      ▪ The Site
      ▪ More than 200 million active users
      ▪ More than 3.5 billion minutes are spent on Facebook each day
      ▪ Fewer than 900 employees
      ▪ The Engineering Team
      ▪ Fast iteration: code gets out to production within a week
      ▪ Polyglot programming: interoperability is key
      ▪ Practical: high-leverage tools win
  • 6. Erlang Projects
      ▪ Chat: the biggest and best known user
      ▪ AIM Presence: a JSONP validator
      ▪ Chat Jabber support (ejabberd)
  • 8. 2007: Facebook needs Chat. Messages, Wall, Links aren’t enough
  • 9. Enter a Hackathon (Jan 2007)
      ▪ Chat started in one night of coding
      ▪ Floating conversation windows
      ▪ No buddy list
      ▪ One server (no distribution)
      ▪ Erlang was there!
  • 10. Enter Eugene (Feb 2007)
      ▪ I joined Facebook after Chat Hackathon
      ▪ What is this Erlang?
      ▪ Spring 2007:
      ▪ Learning Erlang from Joe Armstrong's thesis
      ▪ Lots of prototyping
      ▪ Evaluating infrastructure needs
      ▪ Summer 2007:
      ▪ Chris Piro works on Erlang Thrift bindings
  • 11. Let’s do this!
      ▪ Mid-Fall 2007: Chat becomes a “real” project
      ▪ 4 engineers, 0.5 designer
      ▪ Infrastructure components get built and improved
      ▪ Feb 2008: “Dark launch” testing begins
      ▪ Simulates load on the Erlang servers ... they hold up
      ▪ Apr 6, 2008: First real Chat message sent
      ▪ Apr 23, 2008: 100% rollout (Facebook has 70M users at the time)
  • 12. Launch: April 2008
      ▪ Apr 6, 2008: gradual live rollout starts
      ▪ First message: "msn chat?"
      ▪ Apr 23, 2008: 100% rollout (to Facebook’s 70M users)
      ▪ Graph of sends in the first days of launch (y-axis: millions of sends per hour, 0 to 15; x-axis: Tue 00:00 through Wed 12:00)
  • 13. Chat ... one year later
      ▪ Facebook has 200M active users
      ▪ 800+ million user messages / day
      ▪ 7+ million active channels at peak
      ▪ 1GB+ in / sec at peak
      ▪ 100+ channel machines
      ▪ ~9-10 times the work at launch; ~2 times as many machines
  • 15. System challenges
      ▪ How does synchronous messaging work on the Web?
      ▪ “Presence” is hard to scale
      ▪ Need a system to queue and deliver messages
      ▪ Millions of connections, mostly idle
      ▪ Need logging, at least between page loads
      ▪ Make it work in Facebook’s environment
  • 17. System overview - User Interface: Chat in the browser?
      ▪ Chat bar affixed to the bottom of each Facebook page
      ▪ Mix of client-side Javascript and server-side PHP
      ▪ Works around transport errors, browser differences
      ▪ Regular AJAX for sending messages, fetching conversation history
      ▪ Periodic AJAX polling for list of online friends
      ▪ AJAX long-polling for messages (Comet)
  • 18. System Overview - Back End: How does the back end service requests?
      ▪ Discrete responsibilities for each service
      ▪ Communicate via Thrift
      ▪ Channel (Erlang): message queuing and delivery
      ▪ Queue messages in each user’s “channel”
      ▪ Deliver messages as responses to long-polling HTTP requests
      ▪ Presence (C++): aggregates online info in memory (pull-based presence)
      ▪ Chatlogger (C++): stores conversations between page loads
      ▪ Web tier (PHP): serves our vanilla web requests
  • 20. Message send (diagram)
      ▪ Chat windows: "Me: Lunch?" and "Eugene: Lunch?"
      ▪ Steps: 1 - ajax; 2a - thrift; 2b - thrift; 3 - long poll
  • 22. Channel servers: Architectural overview
      ▪ One channel per user
      ▪ Web tier delivers messages for that user
      ▪ Channel State: short queue of sequenced messages
      ▪ Long poll for streaming (Comet)
      ▪ Clients make an HTTP request
      ▪ Server replies when a message is ready
      ▪ One active request per browser tab
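The channel idea on this slide (a short queue of sequenced messages, with a reply sent only once a message is ready) can be sketched in Erlang roughly as follows. This is a minimal illustration under stated assumptions, not the code from the talk: the module name `channel_sketch`, the message shapes, the queue length of 10, the 30-second poll timeout, and the single-waiter simplification are all hypothetical, and the HTTP layer (Mochiweb in the real system) is left out.

```erlang
-module(channel_sketch).
-export([start/0, send/2, poll/2, loop/3]).

%% Hypothetical sketch: one Erlang process per user channel.
%% State: last sequence number, newest-first queue, optional waiting poller.
start() ->
    spawn(?MODULE, loop, [0, [], none]).

%% Stand-in for the web tier delivering a message into the channel.
send(Channel, Text) ->
    Channel ! {send, Text},
    ok.

%% Long poll: returns messages newer than LastSeen, waiting if none exist yet.
%% The 30000 ms timeout is an assumed value.
poll(Channel, LastSeen) ->
    Ref = make_ref(),
    Channel ! {poll, self(), Ref, LastSeen},
    receive
        {Ref, Messages} -> Messages
    after 30000 -> []
    end.

loop(Seq, Queue, Waiter) ->
    receive
        {send, Text} ->
            NewSeq = Seq + 1,
            %% Keep only the 10 newest messages (assumed "short queue" size).
            NewQueue = lists:sublist([{NewSeq, Text} | Queue], 10),
            case Waiter of
                {Pid, Ref, LastSeen} ->
                    Pid ! {Ref, newer(NewQueue, LastSeen)},
                    loop(NewSeq, NewQueue, none);
                none ->
                    loop(NewSeq, NewQueue, none)
            end;
        {poll, Pid, Ref, LastSeen} ->
            case newer(Queue, LastSeen) of
                [] ->
                    %% Nothing ready: park the request until the next send.
                    loop(Seq, Queue, {Pid, Ref, LastSeen});
                Messages ->
                    Pid ! {Ref, Messages},
                    loop(Seq, Queue, none)
            end
    end.

%% Messages with a sequence number above LastSeen, oldest first.
newer(Queue, LastSeen) ->
    lists:reverse([M || {S, _} = M <- Queue, S > LastSeen]).
```

A client would remember the highest sequence number it has seen and pass it to the next `poll/2` call, so a reconnecting tab can pick up whatever is still in the short queue.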
  • 24. Channel servers: Architectural details
      ▪ Distributed design
      ▪ User id space is partitioned (division of labor)
      ▪ Each partition is serviced by a cluster (availability)
      ▪ Presence aggregation
      ▪ Channel servers are authoritative
      ▪ Periodically shipped to presence servers
      ▪ Open source: Erlang, Mochiweb, Thrift, Scribe, fb303, et al.
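One way to picture the partitioned user id space is the Erlang sketch below. The talk does not say how user ids map to partitions, so the modulo scheme, the module name `partition_sketch`, and the random choice of a node inside a cluster are assumptions made purely for illustration.

```erlang
-module(partition_sketch).
-export([partition/2, pick_node/2]).

%% Assumed scheme: user ids are spread over partitions by remainder.
partition(UserId, NumPartitions)
  when is_integer(UserId), UserId >= 0,
       is_integer(NumPartitions), NumPartitions > 0 ->
    UserId rem NumPartitions.

%% Clusters is a list of node lists, one list per partition.
%% Any node in the owning cluster is acceptable, so pick one at random.
pick_node(UserId, Clusters) ->
    Nodes = lists:nth(partition(UserId, length(Clusters)) + 1, Clusters),
    lists:nth(rand:uniform(length(Nodes)), Nodes).
```

With `Clusters = [[a, b], [c]]`, user id 1 falls into the second partition, so `pick_node(1, Clusters)` can only return `c`.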
  • 25. Key Erlang Features we love
  • 26. Concurrency
      ▪ Cheap parallelism at massive scale
      ▪ Simplifies modeling concurrent interactions
      ▪ Chat users are independent and concurrent
      ▪ Mapping onto traditional OS threads is unnatural
      ▪ Locality of reference
      ▪ Bonus: carries over to non-Erlang concurrent programming
  • 27. Distribution
      ▪ Connected network of nodes
      ▪ Remote processes look like local processes
      ▪ Any node in a channel server cluster can route requests
      ▪ Naive load balancing
      ▪ Distributed Erlang works out-of-the-box (all nodes are trusted)
  • 28. Fault Isolation
      ▪ Bugs in the initial versions of Chat:
      ▪ Process leaks in the Thrift bindings
      ▪ Unintended multicasting of messages
      ▪ Bad return state for presence aggregators
      ▪ (Horrible) bugs don’t kill a mostly functional system:
      ▪ C/C++ segfault takes down the OS process and your server state
      ▪ Erlang badmatch takes down an Erlang process
      ▪ ... and notifies linked processes
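The badmatch behaviour described on this slide can be reproduced in a few lines of Erlang. The sketch below is illustrative only; the module name `isolation_sketch` and the failing map lookup are made up for the example. A linked worker crashes on a failed pattern match, and the parent, which traps exits, receives an `'EXIT'` message instead of dying with it.

```erlang
-module(isolation_sketch).
-export([run/0]).

%% Spawn a linked worker that hits a badmatch, and observe the exit signal.
run() ->
    Old = process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> worker(#{}) end),
    Result =
        receive
            {'EXIT', Pid, {{badmatch, error}, _Stacktrace}} ->
                {worker_crashed, badmatch}
        after 1000 ->
            timeout
        end,
    %% Restore the caller's original trap_exit setting.
    process_flag(trap_exit, Old),
    Result.

%% maps:find/2 returns the atom 'error' for a missing key,
%% so this match fails and only the worker process crashes.
worker(Users) ->
    {ok, _Channel} = maps:find(eugene, Users).
```

Calling `isolation_sketch:run()` returns `{worker_crashed, badmatch}`; the runtime also logs an error report for the crashed worker, while the calling process keeps running.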
  • 29. Error logging (Crash Reports)
      ▪ Any proc_lib-compliant process generates crash reports
      ▪ Error reports can be handled out of band (not where generated)
      ▪ Stacktraces point the way to bugs (functional languages win big here)
      ▪ ... but they could be improved with source line numbers
      ▪ Writing error_log handlers is simple:
      ▪ gen_event behavior
      ▪ Allows for massaging of the crash and error messages (binaries!)
      ▪ Thrift client in the error log
      ▪ WARNING: error logging can OOM the Erlang node
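As a rough illustration of how small such a handler can be, here is a gen_event callback module that keeps a bounded list of crash-report summaries. It is a sketch, not the handler used at Facebook: the module name `crash_handler_sketch`, the `recent` call, the depth limit of 20, and the bounded list are assumptions, chosen with the OOM warning in mind. The event shape `{error_report, Gleader, {Pid, crash_report, Report}}` is the one error_logger passes to report handlers.

```erlang
-module(crash_handler_sketch).
-behaviour(gen_event).
-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

%% State: {maximum number of summaries to keep, newest-first summaries}.
init(MaxKept) ->
    {ok, {MaxKept, []}}.

%% Summarize crash reports into a depth-limited binary so that one huge
%% report cannot balloon the handler's memory.
handle_event({error_report, _Gleader, {Pid, crash_report, Report}},
             {Max, Kept}) ->
    Text = io_lib:format("~p crashed: ~P", [Pid, Report, 20]),
    Summary = iolist_to_binary(Text),
    {ok, {Max, lists:sublist([Summary | Kept], Max)}};
handle_event(_Other, State) ->
    {ok, State}.

%% Hypothetical query used to read back the retained summaries.
handle_call(recent, {_Max, Kept} = State) ->
    {ok, Kept, State};
handle_call(_Request, State) ->
    {ok, {error, unknown_request}, State}.

handle_info(_Info, State) ->
    {ok, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

On a live node the handler would be attached with `error_logger:add_report_handler(crash_handler_sketch, 50)`; it can also be exercised against a plain `gen_event` manager, which is convenient for testing.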
  • 30. Hot code swapping
      ▪ Restart-free upgrades are awesome (!)
      ▪ Pushing new functional code for Chat takes ~20 seconds
      ▪ No state is lost
      ▪ Test on a running system
      ▪ Provides a safety net ... rolling back bad code is easy
      ▪ NOTE: we don’t use the OTP release/upgrade strategies
  • 31. Monitoring and Error Recovery
      ▪ Supervision hierarchies
      ▪ Organize (and control) processes
      ▪ Organize thoughts
      ▪ Systematize restarts and error recovery
      ▪ simple_one_for_one for dynamic child processes
      ▪ net_kernel (Distributed Erlang)
      ▪ sends nodedown, nodeup messages
      ▪ any process can subscribe
      ▪ heart: monitors and restarts the OS process
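A `simple_one_for_one` supervisor for dynamically started children might look like the sketch below. Everything specific here is an assumption for illustration: the module name `channel_sup_sketch`, the restart intensity (5 restarts in 10 seconds), the `transient` restart type, and the trivial child loop. The child is started through `proc_lib`, so it also produces crash reports.

```erlang
-module(channel_sup_sketch).
-behaviour(supervisor).
-export([start_link/0, start_channel/1, init/1]).
-export([channel_start_link/1, channel_init/2]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Start one child per user on demand; UserId is appended to the
%% argument list given in the child specification.
start_channel(UserId) ->
    supervisor:start_child(?MODULE, [UserId]).

init([]) ->
    %% Allow at most 5 restarts within 10 seconds (assumed values).
    RestartStrategy = {simple_one_for_one, 5, 10},
    ChildSpec = {channel,
                 {?MODULE, channel_start_link, []},
                 transient, brutal_kill, worker, [?MODULE]},
    {ok, {RestartStrategy, [ChildSpec]}}.

%% proc_lib:start_link/3 blocks until the child acknowledges with init_ack.
channel_start_link(UserId) ->
    proc_lib:start_link(?MODULE, channel_init, [self(), UserId]).

channel_init(Parent, UserId) ->
    proc_lib:init_ack(Parent, {ok, self()}),
    channel_loop(UserId).

%% Placeholder child: ignores everything except 'stop'.
channel_loop(UserId) ->
    receive
        stop -> ok;
        _Other -> channel_loop(UserId)
    end.
```

Node-level events are a separate mechanism: a process that calls `net_kernel:monitor_nodes(true)` starts receiving `{nodeup, Node}` and `{nodedown, Node}` messages.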
  • 32. Remote Shell
      ▪ To invoke:
          > erl -name hidden -hidden -remsh <node_name> -setcookie <cookie>
          Eshell V5.7.1 (abort with ^G)
          (<node_name>)1>
      ▪ Ad-hoc inspection of a running node
      ▪ Command-and-control from a console
      ▪ Combines with hot code loading
  • 33. Erlang top (etop)
      ▪ Shows Erlang processes, sorted by reductions, memory and message queue
      ▪ OS functionality ... for free
  • 34. Hibernation
      ▪ Drastically shrink memory usage with erlang:hibernate/3
      ▪ Throws away the call stack
      ▪ Minimizes the heap
      ▪ Enters a wait state for new messages
      ▪ “Jumps” into a passed-in function for a received message
      ▪ Perfect for a long-running, idling HTTP request handler
      ▪ But ... not compatible with gen_server:call (and gen_server:reply)
      ▪ gen_server:call has its own receive() loop
      ▪ hibernate() doesn’t support an explicit timeout
      ▪ Fixed with a few hours and a look at gen.erl
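The basic hibernate pattern looks roughly like this in Erlang. It is a minimal sketch with a made-up module name (`hibernate_sketch`) and message format, and it does not attempt the gen_server:call workaround mentioned on the slide. The process discards its stack and shrinks its heap while idle, then resumes in the exported function `wait/1` when a message arrives.

```erlang
-module(hibernate_sketch).
-export([start/0, wait/1]).

%% Start an idle handler that hibernates immediately.
start() ->
    spawn(fun() -> erlang:hibernate(?MODULE, wait, [0]) end).

%% erlang:hibernate/3 never returns; when a message arrives the process
%% wakes up and calls wait(Count), so wait/1 must be exported.
wait(Count) ->
    receive
        {deliver, From, Text} ->
            From ! {delivered, Count + 1, Text},
            %% Go back to sleep until the next message.
            erlang:hibernate(?MODULE, wait, [Count + 1]);
        stop ->
            ok
    end.
```

Because the call stack is thrown away, all state has to travel in the argument list passed to `erlang:hibernate/3`, here just the delivery counter.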
  • 35. Symmetric MultiProcessing (SMP)
      ▪ Take advantage of multi-core servers
      ▪ erl -smp runs multiple scheduler threads inside the node
      ▪ SMP is emphasized in recent Erlang development
      ▪ Added to Erlang R11B
      ▪ Erlang R12B-0 through R13B include fixes and perf boosts
      ▪ Smart people have been optimizing our code for a year (!)
      ▪ Upgraded to R13B last night with about 1/3 less load
  • 36. hipe_bifs: Cheating single assignment
      ▪ Erlang is opinionated:
      ▪ Destructive assignment is hard because it should be
      ▪ hipe_bifs:bytearray_update() allows for destructive array assignment
      ▪ Necessary for aggregating Chat users’ presence
      ▪ Don’t tell anyone!
  • 37. Then and now: Erlang in Progress
  • 38. Then ... a steep learning curve
      ▪ Start of 2007:
      ▪ Few industry-focused English-language resources
      ▪ Few blogs (outside of Yariv’s and Joel Reymont’s)
      ▪ Code examples spread out and disorganized
      ▪ U.S. Erlang community limited in number and visibility
  • 39. Now ...
      ▪ Programming Erlang (Jun 2007)
      ▪ Erlang Programming (upcoming...)
      ▪ More blogs and blog aggregators:
      ▪ Planet Erlang, Planet TrapExit
      ▪ Erlang Factory aggregates Erlang developments
      ▪ More code available:
      ▪ GitHub, CEAN
      ▪ More general-purpose Open Source Libraries
      ▪ U.S.-located conference and ErlLounges
  • 40. (c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved. 1.0