HBaseCon 2013: Compaction Improvements in Apache HBase

© Hortonworks Inc. 2011
Compaction Improvements in Apache HBase
Sergey Shelukhin
sergey@hortonworks.com

About me
•HBase committer since February 2013
•Member of Technical Staff at Hortonworks
•Twitter @sershe84
Architecting the Future of Big Data

Overview
•What are compactions?
•Default algorithm and improvements
•Enabling different implementations
•Algorithms for various scenarios
•Conclusions

What are compactions?
•HBase writes out immutable files as data is added
–Each Store (CF+region) consists of these rowkey-ordered files
–Immutable => more files accumulate over time
–More files => slower reads
•Compaction rewrites several files into one
–Fewer files => faster reads
•Major compaction rewrites all files in a Store into one
–Can drop deleted records, tombstones and old versions
•In minor compaction, files to compact are selected based on a heuristic

Compactions example
•Memstore fills up, files are flushed
•When enough files accumulate, they are compacted
[Diagram: writes go into the MemStore, which is flushed to HDFS as HFiles; accumulated HFiles are then compacted]

Reads slow down w/o compactions
•If too many files accumulate, reads slow down
•Read latency over time without compactions:
[Chart: read latency (ms, 0-25) vs. load test time (sec, 0-14400), without compactions]

But compactions cause slowdowns
•Looks like lots of I/O for no apparent benefit
•Example effect on reads (note better average)
[Chart: read latency (ms, 0-25) vs. load test time (sec, 0-10800), with compactions]

Default algorithm and improvements

Compaction tradeoffs
•HBase resolves key conflicts by file age
–Therefore, can only compact contiguous files
•Large compactions are more efficient (less total I/O)
–However, they can cause long slowdowns for clients
•Small compactions have less effect on clients
–However, in total you do more rewriting
•We want to compact files of similar size

Default algorithm in 0.94
•Ratio-based selection
–Look for files no more than "ratio" times larger than the total size of the following (newer) files
–Also allows limiting file numbers and sizes
•Higher ratio => more aggressive (default 1.2)
•Example: 2 files minimum, 3 maximum, ratio 1.2
[Diagram: five HFiles, with candidate selections labeled "Too big!", "Too many files!", and "OK."]
•Usually good for typical accumulation of flushed files
•Not good for bulk load – unpredictable file sizes!
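The ratio check above can be sketched in Python. This is a simplified, hypothetical rendering of the 0.94 selection loop, not HBase's actual Java code: `ratio_select` and its parameters are illustrative names, and file sizes are assumed to be listed oldest to newest.

```python
def ratio_select(sizes, ratio=1.2, min_files=2, max_files=3, min_compact_size=0):
    """Simplified sketch of 0.94-style ratio-based selection (hypothetical helper).

    sizes: store file sizes ordered oldest -> newest.
    Returns the indices of the selected contiguous files, or [] if nothing qualifies.
    """
    n = len(sizes)
    start = 0
    # Skip old files that are more than `ratio` times larger than the sum of
    # the (up to max_files - 1) newer files that would be compacted with them.
    while n - start >= min_files:
        following = sizes[start + 1:start + max_files]
        if sizes[start] <= max(min_compact_size, ratio * sum(following)):
            break
        start += 1
    # The first file that passes starts the selection; only that file was
    # compared against the ratio.
    selection = list(range(start, min(n, start + max_files)))
    return selection if len(selection) >= min_files else []


# Five files, oldest first: the 100-unit file is "too big" next to the newer ones.
print(ratio_select([100, 30, 20, 12, 10]))  # -> [1, 2, 3]
```

Because only the first selected file is checked, the remaining files in the selection can still differ widely in size.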

Off-peak compactions
•Good if you have variable load through the day
•HBASE-4463 - present in 0.94 (since 2011)
•Compact more aggressively during certain hours of
the day, when load is lower
•Set off-peak period via
– hbase.offpeak.start.hour, hbase.offpeak.end.hour (0-23)
•Then, set ratio via
– hbase.hstore.compaction.ratio.offpeak (default is 5)
•Only one "off-peak" compaction runs at a time, so the extra load is not prohibitive
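A minimal sketch of how the off-peak settings could be interpreted when choosing a ratio. The function names are hypothetical, and treating the window as half-open and allowing it to wrap past midnight are assumptions; check your HBase version's behavior.

```python
def is_off_peak(hour, start_hour, end_hour):
    """True if `hour` (0-23) is inside the half-open window [start_hour, end_hour).

    A window such as 22 -> 6 is assumed to wrap around midnight.
    """
    if start_hour <= end_hour:
        return start_hour <= hour < end_hour
    return hour >= start_hour or hour < end_hour


def selection_ratio(hour, start_hour, end_hour, ratio=1.2, offpeak_ratio=5.0,
                    offpeak_compaction_running=False):
    """Pick the ratio for a new selection (hypothetical helper).

    Only one off-peak compaction may run at a time, so if one is already in
    progress the normal ratio is used even during off-peak hours.
    """
    if not offpeak_compaction_running and is_off_peak(hour, start_hour, end_hour):
        return offpeak_ratio
    return ratio


print(selection_ratio(hour=3, start_hour=22, end_hour=6))  # -> 5.0
```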

Inefficiencies in default algorithm
•First valid selection is chosen
•Ratio is only considered for the first selected file
–Thus, other files in compaction may not be similar
•The solution found may not be the best one
–especially for bulk load, with unpredictable file sizes
[Diagram: HFiles with a selection that matches the ratio, but is a bad selection]

Exploring compaction selection
•There are usually few files, so looking at all valid selections (contiguous runs of files) and comparing their quality is viable
•HBASE-7842 - "exploring" compaction selection
–Ratio is checked for each file in a candidate selection
–When the store is healthy, try to compact the most files
–When the store has too many files, try to eliminate some as fast as possible
•On by default in 0.95/0.96
•Works with your old configuration settings
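The exploring idea can be sketched as a brute-force search over contiguous selections. This is a simplified, hypothetical illustration, not the HBASE-7842 code: `exploring_select`, its parameters, and the exact scoring are assumptions, and the real policy applies additional size limits.

```python
def files_in_ratio(window, ratio):
    """Every file must be at most `ratio` times the total size of the other files."""
    total = sum(window)
    return all(size <= ratio * (total - size) for size in window)


def exploring_select(sizes, ratio=1.2, min_files=3, max_files=10, stuck=False):
    """Hypothetical sketch of "exploring" selection.

    sizes: file sizes ordered oldest -> newest (assumed positive).
    stuck: True when the store has too many files.
    Returns indices of the best contiguous selection, or [].
    """
    best, best_size = [], 0
    n = len(sizes)
    for start in range(n):
        for end in range(start + min_files, min(n, start + max_files) + 1):
            window = sizes[start:end]
            if not files_in_ratio(window, ratio):
                continue  # ratio is checked for every file, not just the first
            size = sum(window)
            if not best:
                better = True
            elif stuck:
                # Too many files: remove the most files per byte rewritten
                # (cross-multiplied to avoid division).
                better = len(window) * best_size > len(best) * size
            else:
                # Healthy store: compact the most files, then prefer less I/O.
                better = len(window) > len(best) or (
                    len(window) == len(best) and size < best_size)
            if better:
                best, best_size = list(range(start, end)), size
    return best


sizes = [500, 40, 10, 12, 11, 9]  # oldest -> newest
print(exploring_select(sizes))              # -> [1, 2, 3, 4, 5]
print(exploring_select(sizes, stuck=True))  # -> [2, 3, 4, 5]
```

On this input the healthy-store mode compacts five files, while the too-many-files mode leaves out the 40-unit file to remove more files per byte rewritten.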

Examples and results
•In the previous example:
[Diagram: HFiles with selections labeled "Not in ratio, dissimilar files" and "In ratio, may be valid… But this has more files!"]
•On bulk loads of random size, depending on settings:
–loses only 0-10% efficiency in reducing file count,
–while reducing I/O 3-10 times
•Best results with ratio 1.3-1.4, 4 minimum files

Enabling different implementations

Making compactions pluggable
•To allow further improvements, the code should be
easy to replace; not the case as of 0.94
•Initial implementation – part of HBASE-7055, HBASE-7516
–make just the selection pluggable
•This is called "policy" (CompactionPolicy)
•Example usages
–exploring selection, mentioned previously
–tier-based selection (port from Facebook)

Making compactions more pluggable
• Other potential improvements are more involved
• Need to change other things (HBASE-7678)
• The meta-structure of the files (StoreFileManager, HBASE-7603)
–Group files by some key/time/… based scheme
–In memory/metadata only - filesystem structure or file format
changes would be a compatibility nightmare
–Example – LevelDB-style compactions, stripes
• Compactor to compact the files (Compactor)
–Example – large object store, levels, stripes
• Can replace parts together or separately (StoreEngine)
–E.g. level compactor only makes sense with level-aware store

Enabling compaction tuning
•Different tables (or even column families) have
different data and access patterns
•Compactions already have a large number of knobs
•Starting with 0.96, they can be configured on table/CF
level (HBASE-7236)
•Example from the shell:
alter 'table1', CONFIGURATION => {'hbase.hstore.engine.class' =>
'org.apache.hadoop.hbase.regionserver.StripeStoreEngine', ... }

Algorithms for various scenarios

Key ways to improve compactions
• Read from fewer files
–Separate files by row key, version, time, etc.
–Allows large number of files to be present, uncompacted
• Don't compact the data you don't need to compact
–For example, old data in OpenTSDB-like systems
–Obviously, results in less I/O
• Make compactions smaller
–Without too much I/O amplification or too many files
–Results in fewer compaction-related outages
• HBase works better with few large regions; however, large
compactions cause unavailability

How to avoid large compactions
•LevelDB compactions
–Files live on multiple levels
–Files on each level have non-overlapping row-key ranges
–…except level 0 (L0), where memstore flushes go
–Compact overlapping subsets of 2 levels; data goes up a level
–Most read requests need only one file per level, plus all of L0
•Small compactions, few files per read, however...
–More I/O, as the data moves from level to level
–No major compactions – dropping deletes is not trivial
–Messes up file ordering due to file boundary overlaps between levels; cannot be read correctly by the default store
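A toy model of the leveled layout described above, where each file is reduced to its inclusive (first_key, last_key) range. The helper names are hypothetical and this is not the HBASE-7519 code; it only shows why a point read touches all of L0 but at most one file per higher level, and which files a level-to-level compaction must read.

```python
import bisect

# A file is represented only by its inclusive (first_key, last_key) range.


def overlaps(a, b):
    return a[0] <= b[1] and b[0] <= a[1]


def compaction_inputs(src, next_level):
    """Inputs for moving `src` up one level: src plus every overlapping file there."""
    return [src] + [f for f in next_level if overlaps(src, f)]


def files_for_get(levels, key):
    """Files a point read must consult.

    levels[0] (L0) may contain overlapping files, so all of them are checked;
    every higher level is sorted by first key with non-overlapping ranges, so
    at most one file per level can contain `key`.
    """
    result = [f for f in levels[0] if f[0] <= key <= f[1]]
    for level in levels[1:]:
        firsts = [f[0] for f in level]
        i = bisect.bisect_right(firsts, key) - 1
        if i >= 0 and key <= level[i][1]:
            result.append(level[i])
    return result


levels = [
    [("a", "z"), ("c", "f")],              # L0: overlapping flushes
    [("a", "d"), ("e", "k"), ("m", "p")],  # L1
    [("a", "h"), ("i", "z")],              # L2
]
print(files_for_get(levels, "e"))
print(compaction_inputs(("e", "k"), levels[2]))
```

Compacting the L1 file ("e", "k") here must rewrite both L2 files, which illustrates the extra I/O as data moves from level to level.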

Stripe compactions (HBASE-7667)
• Somewhat like LevelDB, partition the keys inside each region/store
• But, only 1 level (plus optional L0)
• Compared to regions, partitioning is more flexible
–The default is a number of ~equal-sized stripes
• To read, just read relevant stripes + L0, if present
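[Diagram: the row-key axis of one region, from region start key ccc to region end key iii, divided into stripes at eee and ggg, each with its own HFiles, plus L0 HFiles; a get for 'hbase' reads only the stripe containing that key, plus L0]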
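A minimal sketch of the stripe read path, assuming each stripe covers a half-open key range like a region does (start inclusive, end exclusive). The helper names and file names are hypothetical; the example uses the boundaries from this slide (region ccc to iii, stripes split at eee and ggg).

```python
import bisect


def stripe_for_key(boundaries, key):
    """Index of the stripe holding `key`.

    boundaries: sorted inner stripe boundaries; stripe i is assumed to cover
    [boundaries[i-1], boundaries[i]), with the region's start and end keys as
    the outer limits.
    """
    return bisect.bisect_right(boundaries, key)


def stripe_files_for_get(stripes, l0_files, boundaries, key):
    """A get reads all L0 files plus the files of a single stripe."""
    return list(l0_files) + list(stripes[stripe_for_key(boundaries, key)])


# Region [ccc, iii) with inner boundaries eee and ggg.
boundaries = ["eee", "ggg"]
stripes = [["s0-a"], ["s1-a", "s1-b"], ["s2-a", "s2-b", "s2-c"]]
print(stripe_files_for_get(stripes, ["l0-a"], boundaries, "hbase"))
# -> ['l0-a', 's2-a', 's2-b', 's2-c']
```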

Stripe compactions – writes
•Data is flushed from the MemStore into several files, split by stripe boundaries
•Each stripe compacts separately most of the time
[Diagram: the MemStore is flushed to HDFS as HFiles placed into the individual stripes]
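The striped flush can be sketched as routing each MemStore cell to the stripe whose key range contains it. This is a hypothetical helper under the same half-open boundary assumption, not HBase's flusher; in this sketch, stripes that received no new data produce no output.

```python
import bisect
from collections import defaultdict


def flush_to_stripes(sorted_cells, boundaries):
    """Split a flush into one output per stripe (hypothetical helper).

    sorted_cells: (row_key, value) pairs in row-key order, as in a MemStore.
    boundaries: sorted inner stripe boundaries (stripe i covers
    [boundaries[i-1], boundaries[i])).
    Returns {stripe_index: cells}; stripes with no new data get no entry.
    """
    out = defaultdict(list)
    for row, value in sorted_cells:
        out[bisect.bisect_right(boundaries, row)].append((row, value))
    return dict(out)


cells = [("ccc1", 1), ("ddd", 2), ("eee", 3), ("fff", 4), ("hbase", 5)]
print(flush_to_stripes(cells, ["eee", "ggg"]))
```

Each stripe can then be compacted on its own, without touching the files of other stripes.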

Stripe compactions – other
•Why L0?
–Bulk loaded files go to L0
–Flushes can also go into single L0 files (to avoid tiny files)
–Several L0 files are then compacted into striped files
•Can drop deletes if compacting one entire stripe + L0
–No need for major compactions, ever
•Compact 2 stripes together – rebalance if unbalanced
–Very rare, however - unbalanced stripes are not a huge deal
• Boundaries could be used to improve region splits in the future
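The delete-dropping rule follows from which files can hold a given key: L0 files are not partitioned, so they may contain keys from any stripe. A hypothetical check, with files identified by name:

```python
def can_drop_deletes(selected, stripe_files, l0_files):
    """True if a compaction over `selected` may drop delete markers (hypothetical).

    A delete marker can only be dropped safely when the compaction reads every
    file that might hold that key range: the whole stripe plus all of L0.
    """
    chosen = set(selected)
    return set(stripe_files) <= chosen and set(l0_files) <= chosen


stripe = ["s1-a", "s1-b"]
l0 = ["l0-a"]
print(can_drop_deletes(["s1-a", "s1-b", "l0-a"], stripe, l0))  # -> True
print(can_drop_deletes(["s1-b", "l0-a"], stripe, l0))          # -> False
```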

Stripe compactions - performance
•EC2, c1.xlarge, preload; then measure random read perf
–LoadTestTool + deletes + overwrites; measure random reads
[Chart: random gets per second (0-2500) vs. test time (sec, 3500-8500); default vs. stripe gets-per-second, 30-sec moving average]

Stripe compactions - performance
• At the individual request level, median latency is the same (1.6ms)
• However, 90th pct – 15% improvement (~13ms to ~11ms)
• 99th pct – 20% improvement (~60ms to ~47ms)
• While also sending ~18% more reads in ~4% less time
[Chart: CDF of request latency (ms, 0-20), Default vs. Stripes (12)]

Other stripe boundary schemes
•For sharded sequential keys (like OpenTSDB), compacting
old data again and again is not useful
•What if stripes split dynamically as they grow?
–If data is sequential, only a subset of stripes will grow
–Non-growing stripes never need to be compacted
[Diagram: stripes of HFiles along the row-key space; a stripe that becomes "Too big!" is split, and a stripe that stops growing is labeled "Now this will hardly ever compact"]
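A minimal sketch of dynamically splitting stripes as they grow. The `Stripe` class and `split_oversized` are hypothetical; row count stands in for stripe size, and splitting at the median row is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stripe:
    start: str        # inclusive start key ("" = region start)
    end: str          # exclusive end key ("" = region end)
    rows: List[str]   # sorted row keys; a stand-in for the stripe's files


def split_oversized(stripes, max_rows):
    """Split any stripe holding more than max_rows rows at its median row.

    Stripes that are not growing are returned unchanged (the same objects), so
    with sequential keys the old stripes are never rewritten again.
    """
    out = []
    for s in stripes:
        if len(s.rows) <= max_rows:
            out.append(s)
            continue
        mid = s.rows[len(s.rows) // 2]
        out.append(Stripe(s.start, mid, [r for r in s.rows if r < mid]))
        out.append(Stripe(mid, s.end, [r for r in s.rows if r >= mid]))
    return out


# Sequential keys: only the last stripe keeps receiving new rows.
stripes = [Stripe("", "m", ["a", "b", "c"]), Stripe("m", "", ["m", "n", "o", "p", "q"])]
for s in split_oversized(stripes, max_rows=4):
    print(repr(s.start), repr(s.end), s.rows)
```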

Others in development – tier-based
•Tier-based compaction selection (HBASE-7055; originally developed at Facebook)
–Old data may not be read as frequently, new data may all
be in cache so # of files does not matter, etc.
–So, during selection, dynamically arrange files into
tiers, and apply different rules (ratios, etc.) to them
•Simple example (only 2 tiers)
[Diagram: two tiers of HFiles with a candidate selection that "Looks like a good selection…"; however, if old files are rarely read, it's better to compact new files first]
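A simple two-tier idea can be sketched as follows. This is a hypothetical illustration, not the HBASE-7055 design (its attached documents describe the real rules): files are grouped into tiers by age, each tier has its own ratio and minimum file count, and the youngest tier is tried first so that new files are compacted before old, rarely read ones.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    max_age_hours: float  # files younger than this fall into the tier
    ratio: float
    min_files: int


def tiered_select(ages, sizes, tiers):
    """Hypothetical tier-based selection sketch.

    ages, sizes: parallel lists ordered newest -> oldest (ages ascending), so
    each tier is a contiguous run of files. Tiers are tried youngest first,
    and each applies its own ratio and minimum file count.
    Returns indices of the selected files, or [].
    """
    lo = 0
    for tier in tiers:
        hi = lo
        while hi < len(ages) and ages[hi] < tier.max_age_hours:
            hi += 1
        window = sizes[lo:hi]
        total = sum(window)
        if len(window) >= tier.min_files and all(
                s <= tier.ratio * (total - s) for s in window):
            return list(range(lo, hi))
        lo = hi
    return []


tiers = [Tier(24, 1.2, 3), Tier(float("inf"), 1.2, 3)]
print(tiered_select([1, 2, 3, 30, 40, 50], [10, 10, 12, 100, 90, 110], tiers))  # -> [0, 1, 2]
```

Note that the indices here run newest to oldest; keeping each tier contiguous in age respects the constraint that only contiguous files can be compacted.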

Others in development, or considered
•Large Object store (HBASE-7949)
•Partition files based on versions, timestamp, etc.
•LevelDB compactions (HBASE-7519)
•…more to come?

Resources
•HBase book section contains a lot of details on tuning
the default selection
–http://hbase.apache.org/book.html#compaction
–There are other knobs that may be poorly documented
•JIRAs to track the work done for compactions
–https://issues.apache.org/jira/browse/HBASE/component/12319905
•Design and configuration documentation for the new
compactions are attached to JIRAs
–Tier-based: HBASE-7055, stripe: HBASE-7667
–The book will be updated as things make it into trunk

Summary
•Compactions are a way to reduce the number of files to
read when getting data
•Compactions are expensive, so efficiency is important
•HBase 0.96 compactions
–contain automatic improvements to the default algorithm
–are easier to improve, build upon, and configure
•Work in progress to improve compactions for Big Data
•Scenario-specific compaction algorithms are also
possible, and being worked on

Q & A
