If you’re running an HBase cluster in production, you’ve probably noticed that HBase shares a number of useful metrics about everything from your block cache performance to your HDFS latencies over JMX (or Ganglia, or just a file). The problem is that it’s sometimes hard to know what these metrics mean to you and your users. Should you be worried if your memstore SizeMB is 1.5GB? What if your regionservers have a hundred stores each? This talk will explain how to understand and interpret the metrics HBase exports. Along the way we’ll cover some high-level background on HBase’s internals, and share some battle tested rules-of-thumb about how to interpret and react to metrics you might see.
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
1. HBase Metrics: What they mean to you
David S. Wang
dsw@cloudera.com
5/22/12
HBaseCon 2012. 5/22/12
Copyright 2012 Cloudera Inc. All rights reserved
2. Agenda
• Motivation
• Where to get metrics
• Collection methods
• Metrics:
• HBase Operations
• Memory-related
• Latency-related
• Host-level
• Takeaways
HBaseCon 2012. 5/22/12 2
Copyright 2012 Cloudera Inc. All rights reserved
3. Motivation
• To understand the steady-state
behavior of your HBase cluster
• To debug when something bad
happens
• To help evaluate changes
• To figure out when you need
to buy more hardware
HBaseCon 2012. 5/22/12 3
Copyright 2012 Cloudera Inc. All rights reserved
4. Where to get metrics
JVM JVM
Region server
Master
Region
Region
JVM
HDFS
Metrics can be obtained DataNode
from all of these areas
Host
CPUs Memory Disks Network
HBaseCon 2012. 5/22/12 4
Copyright 2012 Cloudera Inc. All rights reserved
5. Collection methods
• Dump to file
• web UI /jmx
• jconsole
• Ganglia
• Cloudera Manager
• Many other tools out
there – suggestions
welcome
HBaseCon 2012. 5/22/12 5
Copyright 2012 Cloudera Inc. All rights reserved
6. HBase Operations: compactions
• hbase.regionserver.compactionQueueSize – size of queue of compactions
waiting to be processed
• h.r.compactionSize_avg_time – average bytes processed per compaction
• h.r.compactionTime_avg_time – average time per compaction in
milliseconds
• h.r.compactionSize_num_ops, h.r.compactionTime_num_ops – total
number of compactions so far
• TIP: Spikes in h.r.compactionQueueSize could mean all of your regions
are growing at the same rate and need to split/compact at around the
same time – time to presplit your regions or turn off auto-compactions
HBaseCon 2012. 5/22/12 6
Copyright 2012 Cloudera Inc. All rights reserved
7. HBase Operations: flushes
• h.r.flushQueueSize – size of queue of flushes waiting to be
processed
• h.r.flushSize_avg_time – average bytes processed per flush
• h.r.flushTime_avg_time – average time per flush in milliseconds
• h.r.flushSize_num_ops, h.r.flushTime_num_ops – total number of
flushes so far
• TIP: Small h.r.flushSize_avg_time and large flushQueueSize mean
premature flushes: may indicate that you need more RAM, or you
are ingesting and flushing faster than your disks can handle
HBaseCon 2012. 5/22/12 7
Copyright 2012 Cloudera Inc. All rights reserved
8. HBase Operations: region splits
• h.r.regionSplitFailureCount – number of unsuccessful splits
• h.r.regionSplitSuccessCount – number of successful splits
• TIP: Spikes in h.r.regionSplitSuccessCount may mean that you are
spending too much time splitting – think about pre-splitting your
regions
• TIP: Sustained high rates of h.r.regionSplitSuccessCount may mean
that you are ingesting fast enough to exceed your region size
configuration – change the configuration and/or presplit/disable
automatic splits
HBaseCon 2012. 5/22/12 8
Copyright 2012 Cloudera Inc. All rights reserved
9. HBase Operations: stores and storefiles
• h.r.stores – total number of stores
• h.r.storefiles – total number of storefiles
• TIP: Large ratio between number of h.r.stores and
h.r.storefiles may indicate it’s time to compact more
frequently – otherwise read performance can suffer
HBaseCon 2012. 5/22/12 9
Copyright 2012 Cloudera Inc. All rights reserved
10. Memory: memstores
• h.r.memstoreSizeMB – sum of all the memstore sizes in megabytes
• h.r.numPutsWithoutWAL – number of puts with
setWriteToWAL(false)
• h.r.mbInMemoryWithoutWAL – amount of data from puts with
setWriteToWAL(false) in megabytes
• TIP: Anytime you have non-zero values in h.r.numPutsWithoutWAL
and h.r.mbInMemoryWithoutWAL, you risk data loss if the RS
crashes, until the next flush when they go back down to zero.
Useful to detect applications that setWriteToWAL(false).
HBaseCon 2012. 5/22/12 10
Copyright 2012 Cloudera Inc. All rights reserved
11. Memory: block cache
• h.r.blockCacheCount - number of blocks in block cache
• h.r.blockCacheEvictedCount – number of blocks evicted in the last period
• h.r.blockCacheFree, h.r.blockCacheSize – free/occupied space in block
cache in bytes
• TIP: High sustained levels of h.r.blockCacheEvictedCount mean your block
cache is turning over due to heap size constraints – possibly more GCs
• Get more memory
• Repartition heap amongst the various processes on the host – more for RS
HBaseCon 2012. 5/22/12 11
Copyright 2012 Cloudera Inc. All rights reserved
12. Memory: block cache
• h.r.blockCacheHitCount, h.r.blockCacheMissCount – number of cache
hits/misses
• h.r.blockCacheHitRatio/h.r.blockCacheHitRatioPastNPeriods – hit ratio (cache
hits/total requests) for past period, past N periods
• h.r.blockCacheHitCachingRatio/h.r.blockCacheHitCachingRatioPastNPeriods
– hit caching ratio (cache hits from requests set to use block cache/total
requests set to use block cache) for past period, past N periods
• TIP: With sustained traffic, in general you want your block cache to be fully
utilized for maximum performance – high h.r.blockCacheHitRatio and
h.r.blockCacheHitCachingRatio
HBaseCon 2012. 5/22/12 12
Copyright 2012 Cloudera Inc. All rights reserved
13. Memory: JVM-specific metrics
• GC
• TIP: Correlate with “concurrent mode failure” or “promotion failed” in the
GC logs for GC pauses
• Heap usage
• TIP: Do not want this to max out – add more memory or assign more heap
to your RS
• Number of logs categorized by level
• TIP: Be careful of any WARN or ERROR logs – helpful first indicator to look
more deeply in logs
• Threads blocked/running
HBaseCon 2012. 5/22/12 13
Copyright 2012 Cloudera Inc. All rights reserved
14. Latency: FS read
• h.r.fsReadLatency_avg_time – average time per sequential read in ms
• h.r.fsReadLatency_num_ops – number of sequential reads
• h.r.fsPreadLatency_avg_time – average time per positional read in ms
• h.r.fsPreadLatency_num_ops – number of positional reads
• Various histograms based on this – 75th, 95th percentile, mean, standard
deviation, etc., in nanoseconds
• TIP: Spikes in h.r.fsReadLatency_avg_time/h.r.fsPreadLatency_avg_time
can indicate HDFS/disk/network problems
HBaseCon 2012. 5/22/12 14
Copyright 2012 Cloudera Inc. All rights reserved
15. Latency: FS write/sync
• h.r.fsWriteLatency_avg_time – average time per HLog edit write in ms
• h.r.fsWriteSize_avg_time – average size per HLog edit write in bytes
• h.r.fsWriteLatency_num_ops, h.r.fsWriteSize_num_ops – number of HLog
edit writes
• h.r.fsSyncLatency_avg_time – average time to sync HLogs to the filesystem
in ms
• h.r.fsSyncLatency_num_ops – number of HLog syncs
• h.r.slowHLogAppendCount – number of HLog edits that took longer than 1
second
TIP: Search for a WARN-level log message with the text “appending an edit to hlog”;
use that time to spelunk through logs to see what else is going on
• Various histograms based on this – 75th, 95th percentile, mean, standard
deviation, etc., in nanoseconds
• TIP: Spikes in write or sync latencies can also be due to HDFS/bad
disks/bad network
HBaseCon 2012. 5/22/12 15
Copyright 2012 Cloudera Inc. All rights reserved
16. Latency, region: per-operation-type, per-region
• From regionserver metrics: h.r.*RequestLatency - latency histograms for
various client operations in nanoseconds
• Also general RPC metrics for all requests from clients
• rpc.metrics.RPCQueueTime is time it takes from receipt of the RPC until it starts
being processed
• rpc.metrics.RPCProcessingTime is time it takes from start to end of processing
• TIP: These are some of the first places to look if your client seems to be running
slow
• h.r.readRequestsCount – requests for gets, scanners. Not # of rows.
• h.r.writeRequestsCount – requests for puts, deletes, etc. Not # of rows.
• hbase.RegionServerDynamicStatistics.* contains some metrics on a per-
region and per-column-family basis
• Subset of block cache, compaction, flush, store/storefile sizes, etc.
• TIP: These metrics can be used to answer the questions:
• Is the problem only in one region or in all regions?
• Is the problem only with one operation type or all operations?
HBaseCon 2012. 5/22/12 16
Copyright 2012 Cloudera Inc. All rights reserved
17. Host-level metrics
• CPU:
• Load averages
• % idle/system/user/wio (waiting for block I/O)
• Running/total processes
• Correlate with “top”, “sar”, “mpstat” to figure out which process is
doing what.
• TIP: If a process is eating a lot of CPU, that is a clue to look at its logs
• Memory:
• swap_free/swap_total
• cached/free/shared
• TIP: Compare with logs, other metrics to help determine causes of
OOMs
HBaseCon 2012. 5/22/12 17
Copyright 2012 Cloudera Inc. All rights reserved
18. Host-level metrics
• Network:
• Bytes/packets sent/received
• TIP: Correlate with ZK timeouts from logs if RSes are going down
• TIP: Also look at Ethernet frame errors from ifconfig if you are
experiencing unexpected dips in network traffic
• Disk I/O:
• Read/write latencies
• Slow disks make HBase performance very bad, especially on .META.
• TIP: Check dmesg for SCSI errors
• disk space available
• swap usage
• TIP: Should be 0 – otherwise you will have timeouts and RSes going
down. Add RAM and/or buy more boxes.
HBaseCon 2012. 5/22/12 18
Copyright 2012 Cloudera Inc. All rights reserved
19. Takeaways
• Know what you are looking for
• Know what metrics are available, what they mean
• Collect a lot of different metrics, but don’t drown in them
• Metrics are not all you need
• Correlate metrics with logs, workload
• Metrics are another tool in your toolbox. You still have to do
the work to monitor/debug/tune.
HBaseCon 2012. 5/22/12 19
Copyright 2012 Cloudera Inc. All rights reserved
20. Takeaways
• Take a baseline of your
system in steady-state
• For later comparison if things
go bad
• Try to make this as apples-to-
apples as possible, e.g. same
hardware/workload/config
• Spikes or dips from
baseline can indicate
problems
• But also depends on which
metric and if you can explain
it (e.g. increased workload)
HBaseCon 2012. 5/22/12 20
Copyright 2012 Cloudera Inc. All rights reserved
21. Thank you
(Especially to Jon Hsieh and Todd Lipcon
for their helpful reviews and suggestions)
Cloudera is hiring
E-mail me: dsw@cloudera.com
HBaseCon 2012. 5/22/12 21
Copyright 2012 Cloudera Inc. All rights reserved
Notes de l'éditeur
To understand the steady-state behavior of your HBase cluster:Establish a baselineTo debug when something bad happens:Compare with baselineTo help evaluate changes:ConfigurationWorkloadHardwareHelps point out bottlenecks
Most metrics reside in the region serverMetrics for various categories (e.g. Stores, compactions, flushes)Metrics per operation type (e.g. get, put, delete)Remember HDFS,JVM, and host metrics as well
Use FileContext in hadoop-metrics.properties to have metrics dumped to a file periodicallyweb UIs’ /jmx web pages dump metrics in JSONjconsole for interactive browsingGanglia: has built-in support in Hadoop/HBase, use GangliaContext or GangliaContext31
avg_time metrics name is confusing when you are measuring something that isn’t time. Multiple num_ops metrics that mean the same thing are also confusing. Both are legacy of how code is writtenor use time-based keys, salted keys, reduce data (less columns/smaller column names, compression)
again can change data schema (salted keys, row key composite of time and sha1/md5)
Unsuccessful splits are rolled back. If the rollback fails, the RS aborts.
Stores represent a CF, and can contain one or more StoreFiles. Reads have to go through all of the StoreFiles to get an overall view of the dataset for reads.Bloom filters and time range predicates may preclude storefiles from being culled.
Remember that there is one memstore per store/CF
http://hbase.apache.org/book/regionserver.arch.html contains a thorough explanation of how the block cache is used and what you can do to utilize it better.Turn off block caching for tables that have full table scans
http://hbase.apache.org/book/trouble.log.html#trouble.log.gc explains how to configure GC logging and what they meanhttp://hbase.apache.org/book/jvm.html for what to do about GCs
Network problems can come into play here if HDFS needs to fetch data for the read from somewhere else.
HFile latencies are handled by the compaction and flush metricsHLog edits affect client-facing system latency more visiblyHDFS will have a pipeline per write, so any hiccups along that pipeline in either network or disk will affect latency.
Hotspotting can be solved by methods such as hashing your rowkeys and pre-splitting regionshdfsBlocksLocalityIndex (HBASE-4114) may also be useful as higher indices normally mean lower latencies
Easier to screen out metrics if you know what each of them mean
Keep a library of baselines, changing one thing at a time