SlideShare une entreprise Scribd logo
1  sur  22
Messaging Architecture @FB

      Joydeep Sen Sarma
Background
• WAS:
  – key participant design team (10/2009-04/2010)
  – vocal proponent of final consensus architecture

• WAS NOT:
   – Part of actual implementation
   – Involved in FB chat backend

• IS NOT: Facebook Employee since 08/2011
Problem
• 1Billion Users

• Volume:
  – 25 messages/day * 4KB (exclude attachments)
  – 10 TB per day


• Indexes/Summaries:
  – Keyword/Threadid/Label Index
  – Label/Thread message counts
Problem – Cont.
• Must have: Cross continent copy
  – Ideally concurrent updates across regions
  – At least disaster recoverable


• No Single Failure Points for Entire Service
  – FB has downtime on a few MySql databases/day
  – No one cares


• Cannot Cannot Cannot lose Data
Solved Problem
• Attachment Store
  – HayStack
  – Stores FB Photos
  – Optimized for Immutable Data


• Hiring best programmers available 
  – Choose best design, not implementation
  – But get things done Fast
Write Throughput
• Disk:
  – Need Log Structured container
  – Can store small messages inline
  – Can store keyword index as well
  – What about read performance?


• Flash/Memory
  – Expensive
  – Only metadata
LSM Trees




• High write throughput

• Recent Data Clustered
   – Nice! Fits a mailbox access pattern

• Inherently Snapshotted
   – Backups/DR should be easy
Reads?
• Write-Optimized => Read-Penalty

• Cache working set in App Server
  – At-Most one App Server per User.
  – All mailbox updates via Application Server
  – Serve directly from cache

• Cold-Start
  – LSM tree clustering should make retrieving recent
    messages/threads fast.
SPOF?
• Single Hbase/HDFS cluster?

• NO!
  – Lots of 100 node clusters
  – HDFS Namenode HA
Cassandra vs. HBase (abridged)
• Tested it out (c. 2010)
  – HBase held up, (FB Internal) Cassandra didn’t

• Tried to understand internals
  – HBase held up, Cassandra didn’t

• Really Really trusted HDFS
  – Stored PB of data for years with no loss

• Missing features in Hbase/HDFS can be added
Disaster Recovery (HBase)
1. Ship HLog to Remote Data Center real-time
2. Every-day update Remote Snapshot
3. Reset remote HLog

• No need to synchronize #2 and #3 perfectly
  – HLog replay is idempotent
Test!




    Try to avoid writing a cache in Java
What about Flash?
• In HBase:
  – Store recent LSM tree segments in Flash
  – Store HBase block cache
  – Inefficient in Cassandra! (3x LSM trees/cache)


• In App Server
  – Page /in out User cache from Flash
Lingering Doubts
• Small Components vs. Big Systems
  – Small Components are better
  – Is HDFS too big?
     • Separate DataNode, BlockManager, NameNode
     • HBase doesn’t need NameNode


• Gave up on Cross-DC concurrency
  – Partition Users if required
  – Global user->DC registry needs to deal with
    partitions and conflict resolution
  – TBD
Cassandra vs. HBase
Cassandra: Flat Earth
• The world is hierarchical
  – PCI Bus, Rack, Data Center, Region, Continent ..
  – Odds of Partitioning differ


vs.

• Symmetric hash ring spanning continents
  – Odds of partitioning considered constant
Cassandra – No Centralization
• The world has central (but HA) tiers:
  – DNS servers, Core-Switches, Memcache-Tier, …


• Cassandra: all servers independent
  – No authoritative commit log or snapshot
  – Do Repeat Your Reads (DRYR) paradigm
Philosophies have Consequences
• Consistent Reads are expensive
  – N=3, R=2, W=2
  – Ugh: why are reads expensive in write optimized
    system?


• Is Consistency foolproof ?
  – Edge cases with failed writes
  – Internet still debating
  – If Science has Bugs – then imagine Code!
Distributed Storage vs. Database
• How to recover failed block or disk?

• Distributed Storage (HDFS):
  – Simple - Find other replicas for that block.


• Distributed Database (Cassandra):
  – A ton of my databases lived on that drive
  – Hard: Let’s merge all the affected databases
Eventual Consistency
• Read-Modify-Write pattern problematic
  1. Read value
  2. Apply Business Logic
  3. Write value
  Stale Read leads to Junk


• What about atomic increments?
Conflict Resolution
• Easy to resolve conflicts in Increments

• Imagine multi-row transactions
  – Pointless resolving conflicts at row level


Solve conflicts at highest possible layer
  – Transaction Monitor
How did it work out?
• Ton of missing Hbase/HDFS features added
  – Bloom Filters, Namenode HA
  – Remote Hlog shipping
  – Modified Block Placement Policy
  – Sticky Regions
  – Improved Block Cache
  –…
• User -> AppServer via Zookeeper
• App Server worked out

Contenu connexe

Tendances

HBaseCon2017 Community-Driven Graphs with JanusGraph
HBaseCon2017 Community-Driven Graphs with JanusGraphHBaseCon2017 Community-Driven Graphs with JanusGraph
HBaseCon2017 Community-Driven Graphs with JanusGraphHBaseCon
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopHadoop User Group
 
Big Data and Hadoop Ecosystem
Big Data and Hadoop EcosystemBig Data and Hadoop Ecosystem
Big Data and Hadoop EcosystemRajkumar Singh
 
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon
 
Keynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseKeynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseHBaseCon
 
HBase at Mendeley
HBase at MendeleyHBase at Mendeley
HBase at MendeleyDan Harvey
 
Hadoop and Cassandra at Rackspace
Hadoop and Cassandra at RackspaceHadoop and Cassandra at Rackspace
Hadoop and Cassandra at RackspaceStu Hood
 
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...huguk
 
HUG August 2010: Best practices
HUG August 2010: Best practicesHUG August 2010: Best practices
HUG August 2010: Best practicesHadoop User Group
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop OperationsOwen O'Malley
 
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBaseHBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBaseMichael Stack
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop OverviewBrian Enochson
 
Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010Cloudera, Inc.
 
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsightHBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsightHBaseCon
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreModern Data Stack France
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Hadoop User Group
 

Tendances (20)

HBaseCon2017 Community-Driven Graphs with JanusGraph
HBaseCon2017 Community-Driven Graphs with JanusGraphHBaseCon2017 Community-Driven Graphs with JanusGraph
HBaseCon2017 Community-Driven Graphs with JanusGraph
 
Hadoop Primer
Hadoop PrimerHadoop Primer
Hadoop Primer
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with Hadoop
 
Big Data and Hadoop Ecosystem
Big Data and Hadoop EcosystemBig Data and Hadoop Ecosystem
Big Data and Hadoop Ecosystem
 
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
 
Keynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseKeynote: The Future of Apache HBase
Keynote: The Future of Apache HBase
 
HBase at Mendeley
HBase at MendeleyHBase at Mendeley
HBase at Mendeley
 
Hadoop and Cassandra at Rackspace
Hadoop and Cassandra at RackspaceHadoop and Cassandra at Rackspace
Hadoop and Cassandra at Rackspace
 
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
 
HUG August 2010: Best practices
HUG August 2010: Best practicesHUG August 2010: Best practices
HUG August 2010: Best practices
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop Operations
 
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBaseHBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop Overview
 
Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010
 
Hadoop - How It Works
Hadoop - How It WorksHadoop - How It Works
Hadoop - How It Works
 
Hadoop and Distributed Computing
Hadoop and Distributed ComputingHadoop and Distributed Computing
Hadoop and Distributed Computing
 
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsightHBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScore
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
 

En vedette

Facebook Messenger: The Path to Monetization
Facebook Messenger: The Path to Monetization Facebook Messenger: The Path to Monetization
Facebook Messenger: The Path to Monetization Alan Alden
 
Facebook plateform architecture presentation
Facebook plateform architecture   presentationFacebook plateform architecture   presentation
Facebook plateform architecture presentationInam Soomro
 
WhatsApp architecture
WhatsApp architectureWhatsApp architecture
WhatsApp architectureMahesh Bitla
 
9 Examples of the Inspiring Architecture of Zaha Hadid
9 Examples of the Inspiring Architecture of Zaha Hadid9 Examples of the Inspiring Architecture of Zaha Hadid
9 Examples of the Inspiring Architecture of Zaha HadidDr. Ehsan Bayat
 
Patterns of Enterprise Application Architecture (by example)
Patterns of Enterprise Application Architecture (by example)Patterns of Enterprise Application Architecture (by example)
Patterns of Enterprise Application Architecture (by example)Paulo Gandra de Sousa
 

En vedette (6)

Facebook Messenger: The Path to Monetization
Facebook Messenger: The Path to Monetization Facebook Messenger: The Path to Monetization
Facebook Messenger: The Path to Monetization
 
Facebook plateform architecture presentation
Facebook plateform architecture   presentationFacebook plateform architecture   presentation
Facebook plateform architecture presentation
 
WhatsApp architecture
WhatsApp architectureWhatsApp architecture
WhatsApp architecture
 
9 Examples of the Inspiring Architecture of Zaha Hadid
9 Examples of the Inspiring Architecture of Zaha Hadid9 Examples of the Inspiring Architecture of Zaha Hadid
9 Examples of the Inspiring Architecture of Zaha Hadid
 
Zaha hadid
Zaha hadidZaha hadid
Zaha hadid
 
Patterns of Enterprise Application Architecture (by example)
Patterns of Enterprise Application Architecture (by example)Patterns of Enterprise Application Architecture (by example)
Patterns of Enterprise Application Architecture (by example)
 

Similaire à Messaging architecture @FB (Fifth Elephant Conference)

Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's EvolutionDataWorks Summit
 
Ozone and HDFS’s evolution
Ozone and HDFS’s evolutionOzone and HDFS’s evolution
Ozone and HDFS’s evolutionDataWorks Summit
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and FutureDataWorks Summit
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
Distributed Data processing in a Cloud
Distributed Data processing in a CloudDistributed Data processing in a Cloud
Distributed Data processing in a Cloudelliando dias
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduceDerek Chen
 
Intro to big data choco devday - 23-01-2014
Intro to big data   choco devday - 23-01-2014Intro to big data   choco devday - 23-01-2014
Intro to big data choco devday - 23-01-2014Hassan Islamov
 
Overview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational DatabasesOverview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational DatabasesAndrew Kandels
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemVaibhav Jain
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestHBaseCon
 
Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big DataJoe Alex
 
Hw09 Practical HBase Getting The Most From Your H Base Install
Hw09   Practical HBase  Getting The Most From Your H Base InstallHw09   Practical HBase  Getting The Most From Your H Base Install
Hw09 Practical HBase Getting The Most From Your H Base InstallCloudera, Inc.
 
HBase Introduction
HBase IntroductionHBase Introduction
HBase IntroductionHanborq Inc.
 
HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.pptvijayapraba1
 
NoSQL A brief look at Apache Cassandra Distributed Database
NoSQL A brief look at Apache Cassandra Distributed DatabaseNoSQL A brief look at Apache Cassandra Distributed Database
NoSQL A brief look at Apache Cassandra Distributed DatabaseJoe Alex
 

Similaire à Messaging architecture @FB (Fifth Elephant Conference) (20)

Evolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage SubsystemEvolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage Subsystem
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's Evolution
 
Ozone and HDFS’s evolution
Ozone and HDFS’s evolutionOzone and HDFS’s evolution
Ozone and HDFS’s evolution
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Chapter2.pdf
Chapter2.pdfChapter2.pdf
Chapter2.pdf
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and Future
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Distributed Data processing in a Cloud
Distributed Data processing in a CloudDistributed Data processing in a Cloud
Distributed Data processing in a Cloud
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce
 
Intro to big data choco devday - 23-01-2014
Intro to big data   choco devday - 23-01-2014Intro to big data   choco devday - 23-01-2014
Intro to big data choco devday - 23-01-2014
 
Overview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational DatabasesOverview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational Databases
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ Pinterest
 
Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big Data
 
Evolving HDFS to a Generalized Storage Subsystem
Evolving HDFS to a Generalized Storage SubsystemEvolving HDFS to a Generalized Storage Subsystem
Evolving HDFS to a Generalized Storage Subsystem
 
Hw09 Practical HBase Getting The Most From Your H Base Install
Hw09   Practical HBase  Getting The Most From Your H Base InstallHw09   Practical HBase  Getting The Most From Your H Base Install
Hw09 Practical HBase Getting The Most From Your H Base Install
 
HBase Introduction
HBase IntroductionHBase Introduction
HBase Introduction
 
HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.ppt
 
NoSQL A brief look at Apache Cassandra Distributed Database
NoSQL A brief look at Apache Cassandra Distributed DatabaseNoSQL A brief look at Apache Cassandra Distributed Database
NoSQL A brief look at Apache Cassandra Distributed Database
 
No sql databases
No sql databasesNo sql databases
No sql databases
 

Dernier

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 

Dernier (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 

Messaging architecture @FB (Fifth Elephant Conference)

  • 1. Messaging Architecture @FB Joydeep Sen Sarma
  • 2. Background • WAS: – key participant design team (10/2009-04/2010) – vocal proponent of final consensus architecture • WAS NOT: – Part of actual implementation – Involved in FB chat backend • IS NOT: Facebook Employee since 08/2011
  • 3. Problem • 1Billion Users • Volume: – 25 messages/day * 4KB (exclude attachments) – 10 TB per day • Indexes/Summaries: – Keyword/Threadid/Label Index – Label/Thread message counts
  • 4. Problem – Cont. • Must have: Cross continent copy – Ideally concurrent updates across regions – At least disaster recoverable • No Single Failure Points for Entire Service – FB has downtime on a few MySql databases/day – No one cares • Cannot Cannot Cannot lose Data
  • 5. Solved Problem • Attachment Store – HayStack – Stores FB Photos – Optimized for Immutable Data • Hiring best programmers available  – Choose best design, not implementation – But get things done Fast
  • 6. Write Throughput • Disk: – Need Log Structured container – Can store small messages inline – Can store keyword index as well – What about read performance? • Flash/Memory – Expensive – Only metadata
  • 7. LSM Trees • High write throughput • Recent Data Clustered – Nice! Fits a mailbox access pattern • Inherently Snapshotted – Backups/DR should be easy
  • 8. Reads? • Write-Optimized => Read-Penalty • Cache working set in App Server – At-Most one App Server per User. – All mailbox updates via Application Server – Serve directly from cache • Cold-Start – LSM tree clustering should make retrieving recent messages/threads fast.
  • 9. SPOF? • Single Hbase/HDFS cluster? • NO! – Lots of 100 node clusters – HDFS Namenode HA
  • 10. Cassandra vs. HBase (abridged) • Tested it out (c. 2010) – HBase held up, (FB Internal) Cassandra didn’t • Tried to understand internals – HBase held up, Cassandra didn’t • Really Really trusted HDFS – Stored PB of data for years with no loss • Missing features in Hbase/HDFS can be added
  • 11. Disaster Recovery (HBase) 1. Ship HLog to Remote Data Center real-time 2. Every-day update Remote Snapshot 3. Reset remote HLog • No need to synchronize #2 and #3 perfectly – HLog replay is idempotent
  • 12. Test!  Try to avoid writing a cache in Java
  • 13. What about Flash? • In HBase: – Store recent LSM tree segments in Flash – Store HBase block cache – Inefficient in Cassandra! (3x LSM trees/cache) • In App Server – Page /in out User cache from Flash
  • 14. Lingering Doubts • Small Components vs. Big Systems – Small Components are better – Is HDFS too big? • Separate DataNode, BlockManager, NameNode • HBase doesn’t need NameNode • Gave up on Cross-DC concurrency – Partition Users if required – Global user->DC registry needs to deal with partitions and conflict resolution – TBD
  • 16. Cassandra: Flat Earth • The world is hierarchical – PCI Bus, Rack, Data Center, Region, Continent .. – Odds of Partitioning differ vs. • Symmetric hash ring spanning continents – Odds of partitioning considered constant
  • 17. Cassandra – No Centralization • The world has central (but HA) tiers: – DNS servers, Core-Switches, Memcache-Tier, … • Cassandra: all servers independent – No authoritative commit log or snapshot – Do Repeat Your Reads (DRYR) paradigm
  • 18. Philosophies have Consequences • Consistent Reads are expensive – N=3, R=2, W=2 – Ugh: why are reads expensive in write optimized system? • Is Consistency foolproof ? – Edge cases with failed writes – Internet still debating – If Science has Bugs – then imagine Code!
  • 19. Distributed Storage vs. Database • How to recover failed block or disk? • Distributed Storage (HDFS): – Simple - Find other replicas for that block. • Distributed Database (Cassandra): – A ton of my databases lived on that drive – Hard: Let’s merge all the affected databases
  • 20. Eventual Consistency • Read-Modify-Write pattern problematic 1. Read value 2. Apply Business Logic 3. Write value Stale Read leads to Junk • What about atomic increments?
  • 21. Conflict Resolution • Easy to resolve conflicts in Increments • Imagine multi-row transactions – Pointless resolving conflicts at row level Solve conflicts at highest possible layer – Transaction Monitor
  • 22. How did it work out? • Ton of missing Hbase/HDFS features added – Bloom Filters, Namenode HA – Remote Hlog shipping – Modified Block Placement Policy – Sticky Regions – Improved Block Cache –… • User -> AppServer via Zookeeper • App Server worked out