Graham Baecher & Patrick Dignan (HubSpot)
At HubSpot, all HBase clusters run with G1GC and are highly multi-tenant, powering hundreds of unique APIs, Hadoop jobs, daemons, and crons. This two-part talk will cover challenges and solutions involving HBase multi-tenancy and G1GC tuning at HubSpot, including an overview of our request-by-request monitoring and analysis tools and how we identify and address G1 settings and behaviors that might be causing performance or stability problems.
32. Why G1GC?
● Designed for large heaps.
○ Divides heap into many smaller G1 regions.
○ G1 regions scanned and collected independently.
● Instead of occasional very long pauses,
G1GC has more frequent, shorter pauses.
If tuned properly, G1GC can provide performant GC
that scales well for large RegionServer heaps.
33. The Need for Tuning
Out of the box, G1GC hurt our HBase
clusters’ performance:
● Too much time spent in GC pauses.
● Occasional very long GC pauses.
● “To-space Exhaustion”, leading to Full GCs,
which led to slow RegionServer deaths.
38. Necessary Tuning: Method
A. Find the max block cache size, memstore size, and static index size from the past month.
B. Sum 110% of each (A) max, then add heap waste.
C. Set IHOP (-XX:InitiatingHeapOccupancyPercent) and heap size such that the Initiating Heap Occupancy exceeds (B) by at least 10% of the heap.
D. Ensure IHOP + G1NewSizePercent < 90%.
– 90% = 100% - G1ReservePercent (default 10)
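Steps A through D reduce to a little arithmetic. Below is a minimal Python sketch, assuming all sizes are in GB, that "heap waste" means -XX:G1HeapWastePercent at its default of 5, and that G1NewSizePercent is at its JDK 8 default of 5. The function names and the whole-GB heap search are hypothetical illustrations, not HubSpot's actual tooling.

```python
import math

def min_ihop_pct(max_block_cache_gb, max_memstore_gb, max_static_index_gb,
                 heap_gb, heap_waste_pct=5.0, margin_pct=10.0):
    """Steps B and C: smallest whole-number IHOP for a given heap size."""
    # Step B: 110% of the past month's maxes, as a % of heap, plus heap waste
    # (assumed here to be G1HeapWastePercent, default 5).
    peak_gb = max_block_cache_gb + max_memstore_gb + max_static_index_gb
    step_b_pct = 100.0 * 1.10 * peak_gb / heap_gb + heap_waste_pct
    # Step C: IHOP must exceed (B) by at least margin_pct of the heap.
    return math.ceil(step_b_pct + margin_pct)

def fits_step_d(ihop_pct, new_size_pct=5, reserve_pct=10):
    """Step D: IHOP + G1NewSizePercent < 100 - G1ReservePercent."""
    return ihop_pct + new_size_pct < 100 - reserve_pct

def smallest_heap_gb(max_block_cache_gb, max_memstore_gb, max_static_index_gb,
                     start_gb=8, stop_gb=128):
    """Try whole-GB heap sizes and return the first (heap_gb, ihop) passing D."""
    for heap_gb in range(start_gb, stop_gb + 1):
        ihop = min_ihop_pct(max_block_cache_gb, max_memstore_gb,
                            max_static_index_gb, heap_gb)
        if ihop <= 100 and fits_step_d(ihop):
            return heap_gb, ihop
    return None  # no heap in range satisfies all four steps

print(smallest_heap_gb(12, 8, 1))  # (34, 83)
```

With monthly maxes of 12 GB block cache, 8 GB memstore, and 1 GB static index, a 33 GB heap fails step D (IHOP + 5 reaches 90 or more), while a 34 GB heap passes with IHOP = 83.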
39. Necessary Tuning: cont.
In hbase-site.xml:
● Set the hfile.block.cache.size ratio (a fraction of heap) to 110% of the past month's max block cache size.
● Set the hbase.regionserver.global.memstore.size ratio (also a fraction of heap) to 110% of the past month's max memstore size.
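Because both hbase-site.xml settings are fractions of the RegionServer heap rather than absolute sizes, the 110% targets have to be divided by the heap size. A minimal sketch, assuming sizes in GB and rounding each ratio up to three decimals so the cap never falls below 110%; the helper names are hypothetical. The 0.8 check reflects, as far as we know, HBase's startup requirement that block cache plus memstore leave at least 20% of the heap free.

```python
import math

def hbase_memory_ratios(heap_gb, max_block_cache_gb, max_memstore_gb):
    """Return hbase-site.xml heap fractions covering 110% of the monthly maxes."""
    def ratio(max_gb):
        # Round up to 3 decimals so the cap is never below 110% of the max.
        return math.ceil(1000 * 1.10 * max_gb / heap_gb) / 1000

    block_cache = ratio(max_block_cache_gb)
    memstore = ratio(max_memstore_gb)
    # Assumed HBase limit: the two fractions together may not exceed 0.8.
    if block_cache + memstore > 0.8:
        raise ValueError("block cache + memstore exceed 0.8 of heap; grow the heap")
    return {
        "hfile.block.cache.size": block_cache,
        "hbase.regionserver.global.memstore.size": memstore,
    }

def to_site_xml(props):
    """Render the properties as <property> entries for hbase-site.xml."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{name}</name>")
        lines.append(f"    <value>{value}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)

print(to_site_xml(hbase_memory_ratios(34, 12, 8)))
```

For a 34 GB heap with monthly maxes of 12 GB block cache and 8 GB memstore, this yields 0.389 and 0.259.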
42. Further Tuning & Considerations
● -XX:G1ReservePercent
○ Accommodating bursty usage.
● -XX:G1HeapRegionSize
○ Reducing occurrence of humongous objects.
○ Reducing long tail of slow GCs in some cases.
● -XX:G1NewSizePercent
○ Tuning individual pause time vs. % time in GC.
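As a rough illustration of how these three flags fit together, here is a Python sketch. It relies on three G1 facts: an allocation of roughly half a region or larger is treated as humongous; on JDK 8, -XX:G1HeapRegionSize accepts powers of two from 1 MB to 32 MB; and G1NewSizePercent is an experimental option that must be preceded by -XX:+UnlockExperimentalVMOptions. The helper names, the 3 MB example allocation, and setting -Xms equal to -Xmx are assumptions for illustration.

```python
MB = 1024 * 1024
# Valid -XX:G1HeapRegionSize values on JDK 8 (powers of two, 1 MB to 32 MB).
REGION_SIZES_MB = [1, 2, 4, 8, 16, 32]

def is_humongous(alloc_bytes, region_mb):
    """Treat allocations of half a region or more as humongous (conservative)."""
    return alloc_bytes >= region_mb * MB // 2

def smallest_region_mb(alloc_bytes):
    """Smallest region size that keeps this allocation non-humongous, or None."""
    for region_mb in REGION_SIZES_MB:
        if not is_humongous(alloc_bytes, region_mb):
            return region_mb
    return None  # humongous at every region size: fix the client instead

def g1_flags(heap_gb, ihop_pct, region_mb, reserve_pct=10, new_size_pct=5):
    """Assemble the JVM options discussed above (Xms == Xmx is an assumption)."""
    return [
        "-XX:+UseG1GC",
        f"-Xms{heap_gb}g",
        f"-Xmx{heap_gb}g",
        f"-XX:InitiatingHeapOccupancyPercent={ihop_pct}",
        f"-XX:G1HeapRegionSize={region_mb}m",
        f"-XX:G1ReservePercent={reserve_pct}",
        # G1NewSizePercent is experimental and must be unlocked first.
        "-XX:+UnlockExperimentalVMOptions",
        f"-XX:G1NewSizePercent={new_size_pct}",
    ]

region = smallest_region_mb(3 * MB)  # a 3 MB value is humongous with 1, 2, or 4 MB regions
print(" ".join(g1_flags(34, 83, region)))
```

Raising G1HeapRegionSize trades fewer humongous allocations for fewer, larger regions; an allocation that is humongous even at the largest region size is the kind of client behavior the next slide says to fix at the source.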
43. HBase Usage & Tuning Limits
A Full GC isn’t necessarily G1GC’s fault. There’s
a level of “bad usage” that’s unreasonable to
tune around:
● Unexpected, excessively bursty traffic.
● Too many humongous objects, or excessively large ones.
In either of these cases, the real solution is to
fix the client code.
45. ...to Summarize:
● Tune heap size, IHOP, & HBase memory
caps based on HBase memory usage.
● Tune Eden size based on % time in GC &
average Young GC pause times.
● Make adjustments as needed, based on
cluster usage.
● Look for suboptimal usage in your HBase
clients to further improve HBase GC.
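The Eden-sizing trade-off can be made concrete by computing the two metrics it names from parsed GC pauses. A minimal sketch, assuming pauses have already been extracted from GC logs into (kind, duration) records; the Pause type, the kind labels, and the 5% / 100 ms thresholds are hypothetical placeholders, not values from the talk. The direction follows the general rule that a larger Eden means fewer but longer young pauses, and a smaller Eden the reverse.

```python
from dataclasses import dataclass

@dataclass
class Pause:
    kind: str          # hypothetical labels: "young", "mixed", or "full"
    duration_ms: float

def gc_metrics(pauses, window_s):
    """Return (% of wall-clock time spent paused, average young pause in ms)."""
    total_ms = sum(p.duration_ms for p in pauses)
    young = [p.duration_ms for p in pauses if p.kind == "young"]
    pct_in_gc = 100.0 * total_ms / (window_s * 1000.0)
    avg_young_ms = sum(young) / len(young) if young else 0.0
    return pct_in_gc, avg_young_ms

def eden_hint(pct_in_gc, avg_young_ms, max_pct=5.0, max_young_ms=100.0):
    """Suggest a direction for G1NewSizePercent (thresholds are placeholders)."""
    too_much_time = pct_in_gc > max_pct
    too_long = avg_young_ms > max_young_ms
    if too_much_time and too_long:
        return "both high: look for suboptimal client usage"
    if too_much_time:
        return "grow Eden: fewer young GCs"
    if too_long:
        return "shrink Eden: shorter young pauses"
    return "leave Eden as-is"

pauses = [Pause("young", 50), Pause("young", 70), Pause("young", 60), Pause("mixed", 120)]
pct, avg = gc_metrics(pauses, window_s=60)
print(f"{pct:.2f}% in GC, {avg:.0f} ms avg young pause -> {eden_hint(pct, avg)}")
```

Here 300 ms of pauses over a 60 s window is 0.5% time in GC with a 60 ms average young pause, so neither placeholder threshold is crossed.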
46. Links & References
Blog Post — http://bit.ly/hbasegc
G1GC CollectD Plugin — http://bit.ly/collectdgc
G1GC Log Visualizer — http://bit.ly/gclogviz