4. Need to Scale
▪ Scale to Petabytes for Near Real-time
▪ Multi-tenant clusters still growing
▪ Web Crawl Cache
› ~2.3PB Table
› Batch Processing workload
› 80GB regions -> 20GB regions
5. Region
▪ Subset of a table’s key space
▪ Unit of work
▪ Load distribution
▪ Availability
6. Unit of Work
▪ MapReduce split per region
› Parallelism
› Compute/Recovery Time
› Skew
▪ Filters & Coprocessors
› Region boundaries
› Sparse filters -> scan timeouts
• 30 mins to scan an 80GB region
▪ Custom Applications
› Storm grouping, etc.
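The scan-timeout figure above is simple throughput arithmetic; a minimal sketch, where the ~45 MB/s effective scan throughput is an assumption chosen to match the quoted 30 minutes:

```python
# Estimate how long a full scan of one region takes at a given
# effective throughput. The 45 MB/s figure is an assumption used to
# illustrate the "~30 mins for an 80GB region" number above.
def scan_minutes(region_gb: float, throughput_mb_s: float = 45.0) -> float:
    region_mb = region_gb * 1024
    return region_mb / throughput_mb_s / 60

print(round(scan_minutes(80)))  # → 30 (an 80GB region)
print(round(scan_minutes(20)))  # → 8  (a 20GB region, ~4x faster)
```

This is why shrinking regions from 80GB to 20GB also shrinks the worst-case time a sparse filter can spend inside a single region before the scanner RPC times out.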
7. Load Distribution
▪ Load balancing granularity
▪ Only as fast as the slowest region server
▪ Tasks per region server (e.g. MapReduce)
› Limit running tasks (MAPREDUCE-5583)
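The per-server cap on running tasks can be sketched as a simple admission check (the class and method names below are hypothetical; MAPREDUCE-5583 itself adds job-level limits on concurrently running tasks):

```python
from collections import defaultdict

# Hypothetical admission check: admit a new task for a region server
# only while that server is below a concurrent-task cap, so one hot
# server does not become the bottleneck for the whole job.
class TaskLimiter:
    def __init__(self, max_tasks_per_server: int):
        self.max_tasks = max_tasks_per_server
        self.running = defaultdict(int)

    def try_start(self, server: str) -> bool:
        if self.running[server] >= self.max_tasks:
            return False
        self.running[server] += 1
        return True

    def finish(self, server: str) -> None:
        self.running[server] -= 1

limiter = TaskLimiter(max_tasks_per_server=2)
assert limiter.try_start("rs1") and limiter.try_start("rs1")
assert not limiter.try_start("rs1")  # third concurrent task rejected
limiter.finish("rs1")
assert limiter.try_start("rs1")      # slot freed, admitted again
```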
8. Compaction
▪ Optimization for reads
▪ The fewer files to read, the better
▪ Contend for I/O
▪ Cache Misses
▪ Write amplification
▪ Too many store files
› Blocked flushes (90 secs)
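The flush-blocking behaviour above can be sketched as a simplified model (in HBase the relevant settings are `hbase.hstore.blockingStoreFiles` and the 90-second `hbase.hstore.blockingWaitTime`; the threshold of 16 and the compaction policy below are illustrative):

```python
# Simplified model: when a store accumulates more files than the
# blocking threshold, further memstore flushes wait (up to the
# blocking wait time, 90s by default) until compaction reduces the
# file count. Compaction rewrites bytes: write amplification.
BLOCKING_STORE_FILES = 16  # cf. hbase.hstore.blockingStoreFiles

def flush_blocked(store_file_count: int) -> bool:
    return store_file_count >= BLOCKING_STORE_FILES

def compact(store_files: list[int], max_files: int = 10) -> list[int]:
    # Merge the smallest files into one until at most max_files
    # remain, at the cost of rewriting their bytes.
    if len(store_files) <= max_files:
        return store_files
    store_files = sorted(store_files)
    cut = len(store_files) - max_files + 1
    return [sum(store_files[:cut])] + store_files[cut:]

files = [64] * 20                     # 20 store files of 64MB each
assert flush_blocked(len(files))      # flushes would block here
files = compact(files)
assert not flush_blocked(len(files))  # compaction unblocks flushes
```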
9. Regions and Compaction
11. What size then?
▪ As a general rule, keep regions small-ish
▪ HDFS block size? (not there yet)
12. Scaling Region Count
▪ Master Region Management
› Creation, assignment, balancing, etc.
› Meta table
▪ Metadata
› HDFS scalability
› Zookeeper
› Region Server density
13. ZK Region Assignment
▪ Master orchestrates region assignment
▪ Region mapping tracked in master memory, the meta table, and ZooKeeper znodes
[Diagram: Master, ZooKeeper, the meta table, and region servers hosting regions]
14. Region transition example
1. Master tries to assign the region
2. RS transitions the region to open
3. Master updates its in-memory state
4. RS persists the region state to META
[Diagram: the four steps flowing between Master, RS, ZooKeeper, and Meta]
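The four-step transition above can be sketched as a tiny model (heavily simplified; the real assignment manager handles many more states and failure paths):

```python
# Simplified model of the region transition: the master asks a region
# server to open the region, updates its in-memory map, and the final
# state is persisted to the META table.
class Cluster:
    def __init__(self):
        self.master_state = {}  # master's in-memory region -> server map
        self.meta = {}          # persisted region state (META table)

    def assign(self, region: str, server: str):
        state = self.rs_open(region)                 # 1-2: master asks, RS opens
        self.master_state[region] = (server, state)  # 3: master memory updated
        self.meta[region] = (server, state)          # 4: state persisted to META

    @staticmethod
    def rs_open(region: str) -> str:
        return "OPEN"

c = Cluster()
c.assign("region-1", "rs1")
assert c.master_state["region-1"] == ("rs1", "OPEN")
assert c.meta["region-1"] == ("rs1", "OPEN")
```

With ZooKeeper in the loop, the same state also has to be kept consistent in znodes, which is the three-way communication the next slide calls out.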
15. Observations with 1M regions
▪ Complex
› 3 way communication
› Split brain problem
▪ Zookeeper
› More storage
› Operations like listing a znode are not efficient
[Diagram: Master, ZooKeeper, the meta table, and region servers hosting regions]
16. Enhancements - Assignment
▪ Assignment
› ZK-less assignment (HBASE-11059)
› No involvement of ZK
› Region assignment is controlled by the Master
› Better APIs, e.g. scanning meta vs. ls on a znode
▪ Unlock region states (HBASE-11290)
› Reduce CPU utilization
[Diagram: Master and meta region with region servers hosting regions; ZooKeeper no longer in the assignment path]
18. Single HOT meta
▪ Assignment info is persisted to meta
▪ ~7GB in size for 1M regions
▪ Meta cannot split
▪ Large compactions
▪ Longer failover times
[Diagram: a single meta region on one RS, with the Master and region servers hosting user regions]
19. Enhancements - Split Meta
▪ Split meta (HBASE-11288)
› Distributed I/O load
› Distributed caching
› Shorter scan time
› Distributed compaction
[Diagram: meta regions spread across multiple region servers alongside user regions]
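With meta split, a client can locate the right meta region the same way it locates a user region: by start-key boundaries. A minimal sketch (the boundaries and server names below are hypothetical):

```python
import bisect

# Hypothetical split of meta across servers: each meta region covers a
# range of user-region keys, so lookups, caching, and compactions are
# spread over many servers instead of one hot meta region.
meta_start_keys = ["", "g", "n", "t"]           # sorted start keys
meta_servers    = ["rs1", "rs2", "rs3", "rs4"]  # hosting server per range

def meta_region_for(row_key: str) -> str:
    # Find the last meta region whose start key <= row_key.
    idx = bisect.bisect_right(meta_start_keys, row_key) - 1
    return meta_servers[idx]

assert meta_region_for("apple") == "rs1"
assert meta_region_for("horse") == "rs2"
assert meta_region_for("zebra") == "rs4"
```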
20. Performance comparison
▪ Split size: 200 MB
▪ Meta split across 10 servers, 5 meta regions per server
▪ Assignment time for 3M regions:
› Single meta: 18 mins
› Split meta: 10 mins
21. Scaling NameNode operations
▪ Longer time to create all region dirs under a single table dir
▪ NameNode limitation: a single directory can hold at most ~6.3 million files
[Diagram: TableDir containing RegionDir1, RegionDir2, RegionDirN...]
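The "4k buckets" layout measured below avoids one flat table directory by hashing region directories into intermediate bucket directories; a minimal sketch (the hash choice and path layout are assumptions, only the 4096 bucket count comes from the slides):

```python
import hashlib
from collections import Counter

# Hypothetical bucketing scheme: hash each region into one of 4096
# intermediate bucket directories so that no single directory under
# the table dir accumulates millions of entries.
NUM_BUCKETS = 4096

def region_path(table: str, region: str) -> str:
    digest = hashlib.md5(region.encode()).hexdigest()
    bucket = int(digest, 16) % NUM_BUCKETS
    return f"{table}/{bucket:03x}/{region}"

# 100k regions spread over 4096 buckets: ~25 entries per directory
# instead of 100,000 entries in one flat table directory.
counts = Counter(region_path("t1", f"region-{i}").split("/")[1]
                 for i in range(100_000))
assert len(counts) == NUM_BUCKETS
assert max(counts.values()) < 100
```

Keeping every directory small sidesteps both the per-directory entry limit and the cost of creating millions of children under one parent.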
25. Performance results: region dir creation time (4k buckets)

                  1M regions        5M regions        10M regions
normal table      20 mins           4 hours 23 mins   doesn't finish
humongous table   15 mins 48 secs   1 hour 27 mins    2 hrs 53 mins