Building Mission Critical Messaging System
              On Top Of HBase
              Guoqiang Jerry Chen, Liyin Tang, Facebook




Hadoop China 2011, Beijing
Facebook Messages

 Messages   Chats   Emails   SMS
Facebook Messages: brief history
▪   Project started in Dec 2009
▪   Oct 2010, selected users are in production
▪   Jan 2011, meaningful number of users are in production.
▪   Mid 2011, fully rolled out to all users (800M+), legacy messages for
    1B+ accounts are migrated over to HBase as well.
Facebook Messages: Quick Facts
▪   Unstable vs. stable storage:
    ▪   This is the permanent storage system for all private messages of
        users of Facebook, and it is the single source of truth → stable
        storage system, backups!
▪   Online vs. offline storage:
    ▪   It is online! Any problems with HBase are visible to the users → high
        availability, low latency.
Facebook Messages: Quick Stats
▪   8B+ messages/day


▪   Traffic to HBase
     ▪ 75+ Billion R+W ops/day

     ▪ At peak: 1.5M ops/sec

     ▪ ~ 55% Read vs. 45% Write ops

     ▪ Avg write op inserts ~16 records across multiple

       column families.
Facebook Messages: Quick Stats (contd.)

▪   2PB+ of online data in HBase (6PB+ with replication;
    excludes backups)
    ▪ message data, metadata, search index

▪   All data LZO compressed
▪   Growing at 250TB/month
Why we chose HBase
▪   High write throughput
▪   Good random read performance
▪   Horizontal scalability
▪   Automatic Failover
▪   Strong consistency
▪   Benefits of HDFS
    ▪   Fault tolerant, scalable, checksums, MapReduce
    ▪   internal dev & ops expertise
What do we store in HBase
▪   HBase
    ▪   Small messages
    ▪   Message metadata (thread/message indices)
    ▪   Search index


▪   Haystack (our photo store)
    ▪   Attachments
    ▪   Large messages
How do we make it work?
▪   Cell based architecture
▪   Schema Design/Iteration
▪   Stabilize HBase
▪   Monitoring and operations
Facebook Messages Architecture
[Architecture diagram]
Clients (Front End, MTA, etc.) ask the User Directory Service: “What’s the cell for this user?”
Each cell (Cell 1, Cell 2, Cell 3) contains Application Servers backed by an HBase/HDFS/ZooKeeper
cluster storing message data, metadata, and the search index.
Attachments are stored in Haystack.
Cluster Layout For a HBase Cell
 ▪   Multiple clusters/cells for messaging
     ▪   20 servers/rack; 5 or more racks per cluster
 ▪   Controllers (master/Zookeeper) spread across racks
[Rack layout diagram]
Each rack holds one controller node plus 19 worker nodes (Region Server + Data Node + Task Tracker).
Controller nodes, one per rack: ZooKeeper Peer + HDFS Namenode (Rack #1), ZooKeeper Peer +
Backup Namenode (Rack #2), ZooKeeper Peer + Job Tracker (Rack #3), ZooKeeper Peer + HBase Master
(Rack #4), ZooKeeper Peer + Backup HBase Master (Rack #5).
Why multiple cells?

▪   If HBase goes down, it affects only a portion of the user base
    ▪   Upgrades that need a full cluster restart
    ▪   Bugs resulting in cluster outages


▪   Easy to move to new datacenters
▪   Enables us to do shadow testing – a cell with the same load as production
    cells.
Messages Schema - Row Keys
•       Each user is one row:
    •    The entire messaging history of a user is in one HBase row. For
         some users, this can be as big as a few GB.
    •    Makes good use of HBase’s row-level atomicity!
    •    The message search index is also in the same row. This enables
         us to atomically update the search index.
•       The row key is: md5(userid) + userid
    •    Randomly distributes users and avoids the hot-region problem.
    •    This also enables us to presplit the table.
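A minimal sketch of how such a salted row key could be constructed, assuming a numeric user id and HBase’s Bytes utility; the class and method names here are ours, not the production code:

  import java.security.MessageDigest;
  import org.apache.hadoop.hbase.util.Bytes;

  public class MessageRowKey {
    // Prefix the user id with its MD5 digest so rows spread uniformly across
    // presplit regions, while the trailing id keeps the key reversible for
    // debugging and for reading a single user's row.
    static byte[] buildRowKey(long userId) throws Exception {
      byte[] id = Bytes.toBytes(userId);
      byte[] md5 = MessageDigest.getInstance("MD5").digest(id);
      byte[] row = new byte[md5.length + id.length];
      System.arraycopy(md5, 0, row, 0, md5.length);
      System.arraycopy(id, 0, row, md5.length, id.length);
      return row;
    }
  }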
Messages Schema & Evolution
▪   The “Actions” (data) column family is the source of truth
    ▪   Log of all user actions (addMessage, markAsRead, etc.)

▪   Metadata (thread index, message index, search index) etc. in other
    column families
▪   Metadata portion of schema underwent 3 changes:
        ▪   Coarse grained snapshots (early development; rollout up to 1M users)
        ▪   Hybrid (up to full rollout – 1B+ accounts; 800M+ active)
        ▪   Fine-grained metadata (after rollout)
▪   MapReduce jobs against production clusters!
        ▪   Ran in throttled way
        ▪   Heavy use of HBase bulk import features
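For illustration, here is the stock Apache HBase bulk-load path (HFileOutputFormat plus LoadIncrementalHFiles, as named in the 0.90/0.92-era API); the table name, job name, and output path are made up, and the slides do not say whether Facebook’s internal migration pipeline used exactly these classes:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
  import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class MetadataBulkLoad {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      Job job = new Job(conf, "metadata-migration");
      // ... set the mapper class and input path for the job here ...
      HTable table = new HTable(conf, "messages");
      // Configures the reducer, partitioner, and output format so the HFiles
      // come out sorted and aligned with the table's existing region boundaries.
      HFileOutputFormat.configureIncrementalLoad(job, table);
      Path out = new Path("/tmp/messages-hfiles");
      FileOutputFormat.setOutputPath(job, out);
      if (job.waitForCompletion(true)) {
        // Atomically hands the generated HFiles over to the region servers.
        new LoadIncrementalHFiles(conf).doBulkLoad(out, table);
      }
    }
  }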
Get/Put Latency (Before)
Get/Put Latency (After)
Stabilizing HBase
▪   Right before launch (Oct 2010), we created a production branch from
    Apache trunk, close to 0.89.20100924. Now the code base is on
    Apache under 0.89-fb.
▪   The refactored master code was coming in around the same time and
    we decided not to take it due to imminent product release.
▪   Continuously iterate on this branch and contribute back patches to
    Apache trunk.
▪   Sometimes, knowing what to avoid can also help
    ▪   Disable automatic splits; instead, presplit your table and set a huge
        split limit.
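A minimal sketch of that advice using the classic HBaseAdmin API: create the table with explicit split points and raise the max file size so automatic splits effectively never trigger. The table and column family names mirror the slides; the split count is illustrative, and a production cell would presplit into far more regions.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  public class CreatePresplitTable {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTableDescriptor desc = new HTableDescriptor("messages");
      desc.addFamily(new HColumnDescriptor("actions"));
      // A huge per-table max file size effectively disables automatic splits;
      // the cluster-wide equivalent is hbase.hregion.max.filesize in hbase-site.xml.
      desc.setMaxFileSize(100L * 1024 * 1024 * 1024 * 1024); // ~100 TB

      // Because row keys start with an md5 prefix, evenly spaced values of the
      // first key byte make reasonable split points (16 regions shown here).
      byte[][] splits = new byte[15][];
      for (int i = 1; i <= 15; i++) {
        splits[i - 1] = new byte[] { (byte) (i * 16) };
      }
      new HBaseAdmin(conf).createTable(desc, splits);
    }
  }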
Stabilizing HBase
HBase had never been put to the test at this scale.
The only way was to roll it out and iterate rapidly.
Shadow Testing: Before Rollout
▪   Both product and infrastructure were changing.
▪   Shadowed the old messages product + chat while the new one was
    under development

[Diagram]
Production: the Messages V1 Frontend and Chat Frontend write to Cassandra, MySQL, and ChatLogger.
Shadow: data for a percentage of users is backfilled from MySQL, and their production traffic is
mirrored through Application Servers into an HBase/HDFS cluster.
Shadow Testing: After Rollout
▪   Shadows the live traffic of a production cell to a shadow cell.
▪   All backend changes go through the shadow cluster before the prod push

[Diagram]
The Messages Frontend sends traffic to the production Application Servers and HBase/HDFS cluster,
and the same traffic is mirrored to shadow Application Servers backed by a separate shadow
HBase/HDFS cluster.
Shadow Testing
▪   A high % of issues were caught in shadow/stress testing
▪   Sometimes we tried things which hadn’t been tested in the shadow cluster!
     ▪   Added a rack of servers to help with a performance issue
         ▪   Pegged the top-of-rack network bandwidth!
             ▪   Had to add the servers at a much slower pace. Very manual.
             ▪   Intelligent load balancing is needed to make this more automated.
Monitoring and Operation
▪   Monitoring/ops is a huge part of HBase reliability.
    ▪   Alerts (HBCK, memory alerts, perf alerts, health alerts)
    ▪   auto detecting/decommissioning misbehaving machines
    ▪   Dashboards
▪   HBase is not very DBA/application-developer friendly yet. We’ve spent
    quite a bit of effort there (slow query log, job monitor, etc.). You have to go
    through master/region server logs in order to identify issues. Even
    worse, these logs do not make much sense unless you know the inner
    details of HBase.
▪   We are learning along the way.
Scares & Scars!
▪   Not without our share of scares and incidents:
    ▪   1/3 of the region servers in many cells went down around the same time due to a bug in
        config management.
    ▪   Multiple region servers died around the same time due to DFS timeouts. It turned out the
        NFS filer where we save the namenode log was slow.
    ▪   HDFS Namenode – SPOF. It happens more often than you would think.
    ▪   s/w bugs (e.g., deadlocks, incompatible LZO used for bulk-imported data, etc.)
        ▪   Found an edge-case bug in log recovery a few weeks ago, which can cause data loss!
    ▪   Performance spikes every 6 hours (even off-peak)!
        ▪   Cleanup of HDFS’s Recycle bin was sub-optimal! Needed code and config fixes.
    ▪   Transient rack switch failures
Recent HBase Development At
Facebook
HBase Development at Facebook
▪   History with Apache HBase
    ▪   Based on Apache HBase 0.89
    ▪   Need time to stabilize one branch for production.


▪   Release Early, Release Soon
    ▪   Benefit from Open Source
    ▪   Contribute back to the Apache HBase 0.90, 0.92, 0.94 …


▪   HBase Engineer and Ops Team
    ▪   3 committers and 9 active contributors at Facebook
HBase Applications at Facebook
▪   Facebook Messages
▪   Realtime Data Streams and Analytics
    ▪   Ads Insight
    ▪   Open Graph


▪   More and more applications in the future…
Major Contributions from Facebook in 0.92
▪ HFile V2
▪ Compaction Improvements

    ▪   Off-peak compactions
    ▪   Multi-threaded compactions
    ▪   Improved compaction selection algorithms




▪ Distributed Log Splitting
▪ CacheOnWrite and EvictOnClose

▪ Online Schema Changes
Major Contributions from Facebook (Contd.)
▪   Rolling Region Server Restarts
▪   String-based Filter Language
▪   Per Table/CF Metrics
▪   Delete Optimizations
▪   Lazy Seek Optimizations (31%)
▪   Delete Family Bloom Filters (41%)
▪   Create Table with Presplit
▪   Backups
    ▪   RBU, CBU, DBU
    ▪   Multi CF Bulk Loader
HFile V2 Motivation
Opening an HFile
 ▪   Loading at region server startup
     ▪   Block indexes and Bloom filters may take a while to load
     ▪   Dark launch #1: 250 MB × 20 regions = 5 GB of block index data to
         load before a region server is up
     ▪   No need to waste memory on unused or infrequently used CFs


 ▪   Cache misses / age-out of Bloom filters cause huge latency outliers
     ▪   Need a cheaper way to load a small chunk of a Bloom filter
HFile V2 Motivation (Contd.)
Writing an HFile
 ▪   Memory footprint when writing an HFile
     ▪   No need to hold the whole Bloom filter and block index in memory while writing
     ▪   Write block index data and Bloom filter data along with keys/values
     ▪   More important with multithreaded compactions
Ease of Bloom configuration
 ▪   No more guessing how many keys will be added to an HFile
     ▪   (previously required in order to allocate Bloom space and other config up front)
 ▪   New Bloom filter settings:
     ▪   Target false positive rate
     ▪   Allocate Bloom blocks at a fixed size (e.g. 128 KB)
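A minimal sketch of those knobs as they appear in the 0.92-era code (exact property and enum names may differ between versions); the column family name is illustrative, and the two properties are normally set server-side in hbase-site.xml rather than on a client Configuration:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.regionserver.StoreFile;

  public class BloomSettings {
    public static void main(String[] args) {
      Configuration conf = HBaseConfiguration.create();
      // Size blooms from a target false-positive rate instead of guessing key counts.
      conf.setFloat("io.storefile.bloom.error.rate", 0.01f);
      // Write Bloom data in fixed-size chunks so a single chunk can be loaded
      // (and block-cached) on demand instead of the whole filter.
      conf.setInt("io.storefile.bloom.block.size", 128 * 1024);

      // Per-column-family choice of what the Bloom filter keys on (row vs. row+column).
      HColumnDescriptor metadata = new HColumnDescriptor("metadata");
      metadata.setBloomFilterType(StoreFile.BloomType.ROWCOL);
    }
  }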
Write Operation
[Diagram: HFile V2 write path]
HFile on disk: data blocks interleaved with inline Bloom filter and inline block index chunks,
followed by a multi-level block index, the Bloom filter index, and other file info.
In memory while writing: only the root-level block index, the Bloom filter index, and other file info.
Steps: 1) write data blocks; 2) write Bloom/index chunks inline; 3) hold only the root-level
index in memory.
Open/Read Operation
[Diagram: HFile V2 open/read path]
Open operation: 1) check version / backward compatibility; 2) load the root-level block index,
the Bloom filter index, and other file info into memory.
Read operation: 1) look up the root-level index / Bloom index; 2) go to the leaf-level index
on demand (via the block cache).
Compactions Improvement
▪   More problems!
    ▪   Read performance dips during peak
    ▪   Major compaction storms
    ▪   Large compactions bottleneck
▪   Enhancements/fixes (see the sketch below):
    ▪   Staggered major compactions
    ▪   Multi-thread compactions; separate queues for small & big compactions
    ▪   Aggressive off-peak compactions
[Diagram: separate small and big compaction queues, each with its own backlog of pending compactions]
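A minimal sketch of the separate-queues idea, assuming a simple size threshold for routing compactions; it is not the actual CompactSplitThread code:

  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;

  // Large compactions get their own pool so they can never starve small ones.
  public class CompactionDispatcher {
    private final ExecutorService smallCompactions = Executors.newFixedThreadPool(2);
    private final ExecutorService largeCompactions = Executors.newFixedThreadPool(1);
    private final long largeThresholdBytes;

    public CompactionDispatcher(long largeThresholdBytes) {
      this.largeThresholdBytes = largeThresholdBytes;
    }

    public void submit(Runnable compaction, long totalInputBytes) {
      // Route by the total size of the store files being compacted.
      if (totalInputBytes > largeThresholdBytes) {
        largeCompactions.execute(compaction);
      } else {
        smallCompactions.execute(compaction);
      }
    }
  }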
Compactions Improvement
▪   Critical for read performance
▪   Old algorithm (the “Was” chart):
    #1. Start from the newest file (file 0); include the next file if:
        ▪   size[i] < size[i-1] * C (good!)
    #2. Always compact at least 4 files, even if rule #1 isn’t met.

▪   Solution (the “Now” chart):
    #1. Compact at least 4 files, but only if eligible files are found.
    #2. Also, new file selection based on the summation of sizes:
            size[i] < (size[0] + size[1] + … + size[i-1]) * C
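A minimal sketch of the new selection rule exactly as stated above, assuming files are ordered newest first and C is the compaction ratio (e.g. ~1.2, with a 4-file minimum); the real Apache implementation adds further conditions such as file count caps and off-peak ratios:

  public class CompactionSelection {
    // sizes[0] is the newest store file. Returns how many files to compact,
    // or 0 if fewer than minFiles qualify.
    static int filesToCompact(long[] sizes, double ratio, int minFiles) {
      if (sizes.length == 0) {
        return 0;
      }
      long sumOfNewer = sizes[0];   // file 0 always starts the candidate run
      int eligible = 1;
      for (int i = 1; i < sizes.length; i++) {
        // Include file i only if it is smaller than C times the combined size
        // of the newer files already selected.
        if (sizes[i] < sumOfNewer * ratio) {
          eligible++;
          sumOfNewer += sizes[i];
        } else {
          break;
        }
      }
      return eligible >= minFiles ? eligible : 0;
    }
  }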
Metrics, metrics, metrics…
▪   Initially, only had coarse level overall metrics (get/put latency/ops;
    block cache counters).
▪   Slow query logging
▪   Added Per Table/Column Family stats for:
    ▪   ops counts, latency
    ▪   block cache usage & hit ratio
    ▪   memstore usage
    ▪   on-disk file sizes
    ▪   file counts
    ▪   bytes returned, bytes flushed, compaction statistics
    ▪   stats by block type (data block vs. index blocks vs. bloom blocks, etc.)
    ▪   bloom filter stats
Metrics (contd.)
▪   HBase Master Statistics:
    ▪   Number of region servers alive
    ▪   Number of regions
    ▪   Load balancing statistics
    ▪   ..
▪   All stats stored in Facebook’s Operational Data Store (ODS).
▪   Lots of ODS dashboards for debugging issues
    ▪   Side note: ODS planning to use HBase for storage pretty soon!
Backups
▪   Stage 1 – RBU or in-Rack Backup Unit
    ▪   Backup stored on the live DFS used by HBase


▪   Stage 2 – CBU or in-Cluster Backup Unit
    ▪   Backup on another DFS in the same DC


▪   Stage 3 – DBU or cross-DataCenter Backup Unit
    ▪   Backed up on another DFS in a different DC
    ▪   Protects against DC level failures
Backups
 Protection against     Stage 1   Stage 2    Stage 3
                        (RBU)      (CBU)    (DBU/ISI)
   Operator errors        X         X          X
    Fast Restores         X         X
   Stable Storage                   X          X
Heterogeneous Racks                 X          X
Datacenter Separation                          X
Backups Tools
▪   Backup one, multiple, or all CFs in a table
▪   Back up to Stage 1 (RBU), Stage 2 (CBU), and Stage 3 (DBU)
▪   Retain only the last n versions of the backups
▪   Restore backups to a point in time
    ▪   Restore to a running or offline cluster
    ▪   Restore to the same table name or a new one
▪   Restore tables on the Stage 2 (CBU) as needed
▪   Verify a percentage of data in the backups (this may not be doable
    without app knowledge)
▪   Alerts and dashboards for monitoring
Need to keep up as data grows on you!
▪   Rapidly iterated on several new features while in production:
    ▪   Block indexes grew up to 6 GB per server! Cluster startup took longer and
        longer. Block cache hit ratio was on the decline.
        ▪   Solution: HFile V2
                ▪   Multi-level block index, sharded Bloom filters
    ▪   Network pegged after restarts
        ▪   Solution: Locality on full & rolling restarts
    ▪   High disk utilization during peak
        ▪   Solution: Several “seek” optimizations to reduce disk IOPS
            ▪   Lazy seeks (use time hints to avoid seeking into older HFiles)
            ▪   Special bloom filter for deletes to avoid an additional seek
            ▪   Utilize off-peak IOPS to do more aggressive compactions during
                off-peak hours
Future Work
▪   Reliability, Availability, Scalability!
▪   Lots of new use cases on top of HBase are in the works.
      ▪   HDFS Namenode HA
      ▪   Recovering gracefully from transient issues
      ▪   Fast hot-backups
      ▪   Delta encoding in the block cache
      ▪   Replication (Multi-Datacenter)
      ▪   Performance (HBase and HDFS)
      ▪   HBase as a service / Multi-tenancy
      ▪   Region Assignment based on Data Locality
Thanks! Questions?
facebook.com/engineering
Open Operation
                 Open Operation
                 1) Version / backward compatible
                 2) Loading data

                 Write Operation
                 1) Data Block
                 2) Write bloom/ index inline
                 3) Only hold root level index in memory

                 Read Operation
                 1) Lookup Root level index/Bloom Index
                 2) Go to leaf level index on demand (Block Cache)

                 Scan Operation
                 1) SeekTo
                 2) Next
                 3) ReSeek
HBase-HDFS System Overview
[Diagram]
Database Layer (HBASE): Master and Backup Master managing multiple Region Servers.
Coordination Service: a ZooKeeper Quorum of ZK Peers.
Storage Layer (HDFS): Namenode and Secondary Namenode with multiple Datanodes.
