Chicago-Java-User-Group-Meetup-Some-Garbage-Talk-2015-01-14

Chicago Java User Group Meetup: Lightning Talks
January 14, 2016
Some Garbage Talk…..
Finding a Suitable Garbage Collector for OpenTSDB
Presented by: Jayesh Thakrar
jthakrar@conversantmedia.com

What Does Conversant do? (www.conversantmedia.com)
• Uses programmatic advertising for personalized messaging on the
internet across browsers and devices (phones, tablets, etc.)
• Facilitates targeted, measurable audience campaigning for
customers with demonstrable effectiveness
• Links in-store (offline) and online activity of anonymized
individuals and evaluates messaging effectiveness
My Role
Sr. Software and Data Engineer - get to build, play, tinker, tweak and
manage big data toys (data systems and pipelines)


What, Why and How of OpenTSDB
[Diagram: Application Services, HA Proxy Load Balancer, TSDB Daemon + HBase + Hadoop]
• OpenTSDB = time-series datastore
• No caching within TSDB daemons
• 12 OpenTSDB servers, each with TSDB + HBase + Hadoop
• 2.5 years data retention
• Automatic data purge via HBase column family TTL setting
• Metrics from 1200+ application services across US and Europe
• 550+ million metric data points created daily
• 20-30 concurrent users

How It All Began…
Problem: Long GC pauses in OpenTSDB daemons, annoying users with frequent long waits
Java Version: java version "1.7.0_40"
Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)
Initial Tuning: Increasing the heap from 6 GB to 12 GB in increments of 2 GB significantly reduced long GC pauses
The improvement was "good enough", but the investigation continued to better understand the interaction between the various GC types and OpenTSDB characteristics.

Tested 3 Garbage Collector Types: Parallel, CMS, G1
All collectors below are "generational", i.e. heap memory has separate areas for young and old objects
Young generation area = Eden space (new objects since last GC) + Survivor 0 (from) + Survivor 1 (to)
Old generation area = objects that have survived a number of GC cycles
Parallel GC:
- Young generation: stop-the-world, parallel threads
- Old generation: stop-the-world, serial mark-sweep-compact
- Performs compaction
- ParallelOldGC (-XX:+UseParallelOldGC) collects the old generation in parallel
Concurrent Mark-Sweep (CMS) GC:
- Young generation: same as Parallel GC
- Old generation: mix of stop-the-world and concurrent steps
- No compaction, and an occasional stop-the-world full GC of the heap
G1 (Garbage-First) GC:
- Young generation: parallel, stop-the-world
- Old generation: similar to young generation + snapshot-based marking
- Dynamic old and young area sizes, performs compaction
- Better young generation pointer/reference management
- Supposedly better "goal management" - GC pause or throughput

Garbage Collector Shootout
Deciding Metrics: GC events - count, max/avg time, total time (clock or real time)
Tools Used:
- jmap -heap <pid>
- jstat -gcutil <pid> OR jstat -gccause <pid>
- jstat -gcutil -t <pid> <interval in ms> <number of samples>
- jconsole?
How to Set Each Collector:
- Parallel GC = no flag required (or -XX:+UseParallelOldGC)
- CMS GC = -XX:+UseConcMarkSweepGC
- G1 GC = -XX:+UseG1GC
java -verbose:gc <flag-to-set-specific-GC> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UnlockExperimentalVMOptions -Xloggc:/opt/logs/opentsdb/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M
Approach: Run the OpenTSDB daemon with each GC type and examine the jmap, jstat and GC log output
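When cycling the daemon through each collector, the flag sets listed above can be assembled in one place so that only the collector flag changes between runs. A minimal Python sketch: the helper name `gc_jvm_args` and its `heap_gb` and `log_path` parameters are illustrative assumptions, the Parallel entry uses the optional -XX:+UseParallelOldGC flag from the slide, and -Xms/-Xmx are the standard heap-size options rather than something taken from the slide's command line.

```python
# Hypothetical helper (not part of OpenTSDB): builds the JVM arguments for one GC run.
GC_FLAGS = {
    "parallel": ["-XX:+UseParallelOldGC"],
    "cms": ["-XX:+UseConcMarkSweepGC"],
    "g1": ["-XX:+UseG1GC"],
}

# GC logging flags exactly as listed on the slide.
LOGGING_FLAGS = [
    "-verbose:gc",
    "-XX:+PrintGCDetails",
    "-XX:+PrintGCTimeStamps",
    "-XX:+PrintGCDateStamps",
    "-XX:+UnlockExperimentalVMOptions",
    "-XX:+UseGCLogFileRotation",
    "-XX:NumberOfGCLogFiles=10",
    "-XX:GCLogFileSize=10M",
]


def gc_jvm_args(collector, heap_gb=12, log_path="/opt/logs/opentsdb/gc.log"):
    """Return the JVM argument list for the given collector name."""
    if collector not in GC_FLAGS:
        raise ValueError("unknown collector: %s" % collector)
    # Identical min/max heap sizes avoid heap resizing during the test.
    heap = ["-Xms%dg" % heap_gb, "-Xmx%dg" % heap_gb]
    return heap + GC_FLAGS[collector] + LOGGING_FLAGS + ["-Xloggc:" + log_path]


if __name__ == "__main__":
    for name in sorted(GC_FLAGS):
        print(name, " ".join(gc_jvm_args(name)))
```

Keeping the logging flags identical across runs makes the three gc.log files directly comparable.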

Parallel Collector: jmap Output
$ jmap -heap 23181
Attaching to process ID 23181, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.0-b56
using thread-local object allocation.
Parallel GC with 10 thread(s)
…..
Heap Usage:
PS Young Generation
Eden Space:
capacity = 3668967424 (3499.0MB)
used = 1571211720 (1498.4242630004883MB)
free = 2097755704 (2000.5757369995117MB)
42.82435733067986% used
From Space:
capacity = 312999936 (298.5MB)
used = 89260032 (85.125MB)
free = 223739904 (213.375MB)
28.517587939698494% used
To Space:
capacity = 296222720 (282.5MB)
used = 0 (0.0MB)
free = 296222720 (282.5MB)
0.0% used
PS Old Generation
capacity = 8589934592 (8192.0MB)
used = 3749497432 (3575.79940032959MB)
free = 4840437160 (4616.20059967041MB)
43.64989502355456% used

CMS Collector: jmap Output
$ jmap -heap 3061
using parallel threads in the new generation.
Concurrent Mark-Sweep GC
…….
Heap Usage:
New Generation (Eden + 1 Survivor Space):
capacity = 785186816 (748.8125MB)
used = 269009024 (256.5469970703125MB)
free = 516177792 (492.2655029296875MB)
34.260512086845836% used
Eden Space:
capacity = 697958400 (665.625MB)
used = 181780608 (173.3594970703125MB)
free = 516177792 (492.2655029296875MB)
26.044619278169016% used
From Space:
capacity = 87228416 (83.1875MB)
used = 87228416 (83.1875MB)
free = 0 (0.0MB)
100.0% used
To Space:
capacity = 87228416 (83.1875MB)
used = 0 (0.0MB)
free = 87228416 (83.1875MB)
0.0% used
concurrent mark-sweep generation:
capacity = 12012486656 (11456.0MB)
used = 6924832160 (6604.034576416016MB)
free = 5087654496 (4851.965423583984MB)
57.64694986396662% used

G1 Collector: jmap Output
$ jmap -heap 13183
Garbage-First (G1) GC with 10 thread(s)
…..
Heap Usage:
G1 Heap:
regions = 3072
capacity = 12884901888 (12288.0MB)
used = 9139494440 (8716.101112365723MB)
free = 3745407448 (3571.8988876342773MB)
70.9318124378721% used
G1 Young Generation:
Eden Space:
regions = 1127
capacity = 7407140864 (7064.0MB)
used = 4726980608 (4508.0MB)
free = 2680160256 (2556.0MB)
63.81653454133635% used
Survivor Space:
regions = 6
capacity = 25165824 (24.0MB)
used = 25165824 (24.0MB)
free = 0 (0.0MB)
100.0% used
G1 Old Generation:
regions = 1049
capacity = 5452595200 (5200.0MB)
used = 4387348008 (4184.101112365723MB)
free = 1065247192 (1015.8988876342773MB)
80.46348293011005% used
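The jmap figures can be cross-checked with a little arithmetic: "% used" is used/capacity, and for G1 the region size is heap capacity divided by region count. A minimal Python sketch using numbers copied from the jmap output above; the function names `parse_space` and `percent_used` are illustrative, not part of any JDK tool.

```python
import re

# Matches lines such as "used = 1571211720 (1498.42MB)" and captures the byte count.
FIELD_RE = re.compile(r"^\s*(capacity|used|free)\s*=\s*(\d+)", re.MULTILINE)


def parse_space(section_text):
    """Parse the capacity/used/free byte counts of one jmap -heap space."""
    return {name: int(value) for name, value in FIELD_RE.findall(section_text)}


def percent_used(space):
    """Utilization of a space in percent, as jmap reports it."""
    return 100.0 * space["used"] / space["capacity"]


# Eden space of the Parallel collector run, copied from the jmap output.
PARALLEL_EDEN = """\
capacity = 3668967424 (3499.0MB)
used = 1571211720 (1498.4242630004883MB)
free = 2097755704 (2000.5757369995117MB)
"""

# G1 heap figures, copied from the jmap output.
G1_HEAP_CAPACITY = 12884901888
G1_REGIONS = 3072
G1_EDEN_USED = 4726980608

eden = parse_space(PARALLEL_EDEN)
region_bytes = G1_HEAP_CAPACITY // G1_REGIONS  # size of one G1 region in bytes

if __name__ == "__main__":
    print("Parallel eden used: %.2f%%" % percent_used(eden))
    print("G1 region size: %d MB" % (region_bytes // (1024 * 1024)))
    print("G1 eden used, in regions: %d" % (G1_EDEN_USED // region_bytes))
```

With these inputs the G1 region size works out to 4 MB, and the used eden bytes correspond to the 1127 eden regions that jmap reports.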

jstat Output
$ jstat -gcutil -t <pid> (run once per collector, e.g. pid 23181 for the Parallel collector)
Collector  Timestamp  S0    S1      E      O      P      YGC   YGCT     FGC  FGCT   GCT
Parallel   40759.5    0.00  2.50    77.07  69.04  50.09  475   44.447   5    1.278  45.725
CMS        41819.4    0.00  100.00  63.34  47.13  59.70  2771  133.708  13   4.700  138.407
G1         41762.6    0.00  100.00  15.22  80.44  72.74  396   40.286   0    0.000  40.286
Key Points:
• S0/S1 = Survivor Space 0/1 (% used)
• E/O = Eden Space / Old Gen Space (% used)
• P = Permanent Generation (% used)
• YGC = Young Garbage Collection count
• FGC = Full Garbage Collection count
• YGCT/FGCT = cumulative YGC/FGC time (seconds)
• GCT = cumulative total GC time (seconds)
• Compare:
  • YGC and FGC count (YGC & FGC)
  • Total GC time (YGCT, FGCT, GCT)
  • Avg. YGC and FGC time (YGCT/YGC and FGCT/FGC)
  • Max GC pause time: need to examine gc log output
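The comparisons listed above can be computed directly from the three jstat rows. A minimal Python sketch: the rows are copied from the jstat output, the function name `summarize` is illustrative, and the "overhead" figure (GCT divided by the Timestamp column, which jstat -t reports as seconds since JVM start) is an added convenience metric, not one from the slide.

```python
JSTAT_HEADER = "Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT".split()

# One jstat -gcutil -t row per collector, copied from the output above.
ROWS = {
    "Parallel": "40759.5 0.00 2.50 77.07 69.04 50.09 475 44.447 5 1.278 45.725",
    "CMS": "41819.4 0.00 100.00 63.34 47.13 59.70 2771 133.708 13 4.700 138.407",
    "G1": "41762.6 0.00 100.00 15.22 80.44 72.74 396 40.286 0 0.000 40.286",
}


def summarize(row):
    """Derive average pause times and GC overhead from one jstat row."""
    v = dict(zip(JSTAT_HEADER, (float(x) for x in row.split())))
    return {
        # Average seconds per young / full collection (0.0 if none occurred).
        "avg_ygc_s": v["YGCT"] / v["YGC"] if v["YGC"] else 0.0,
        "avg_fgc_s": v["FGCT"] / v["FGC"] if v["FGC"] else 0.0,
        # Share of JVM uptime spent in GC, in percent.
        "overhead_pct": 100.0 * v["GCT"] / v["Timestamp"],
    }


if __name__ == "__main__":
    for name, row in ROWS.items():
        s = summarize(row)
        print("%-8s avg YGC %.3fs  avg FGC %.3fs  overhead %.3f%%"
              % (name, s["avg_ygc_s"], s["avg_fgc_s"], s["overhead_pct"]))
```

These are averages only; as noted above, the maximum pause still has to come from the GC log.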

Conclusion
Why is G1GC Better?
• TSDB has a lot of "object churn" due to traffic activity (see HAProxy stats below)
• Most of the objects are short-lived
• In all the collectors, young gen collections are more efficient than old gen collections
• So the more churn data that fits in eden, the better/faster the GC events
[Charts: Incoming Metrics Traffic; UI Traffic]

Memory Churn and Steady-State Live Size in G1GC
$ strings gc.log.0 | grep ' Heap: ' | less
……
[Eden: 768.0M(528.0M)->0.0B(7120.0M) Survivors: 80.0M->80.0M Heap: 3264.0M(12.0G)->2535.0M(12.0G)]
……..
[Eden: 6592.0M(6592.0M)->0.0B(5184.0M) Survivors: 240.0M->496.0M Heap: 10.2G(12.0G)->4128.0M(12.0G)]
…….
…..
……
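The "Heap:" entries shown above can be turned into numbers: heap occupancy before and after each collection, and the amount reclaimed. A minimal Python sketch, assuming the Java 7 G1 log format shown above; the names `heap_transition` and `to_mb` are illustrative. The after-collection figure only approximates the live size, because a young collection leaves uncollected old-generation garbage in place.

```python
import re

# Conversion factors from the size suffixes used in the G1 log to megabytes.
SIZE_UNITS_MB = {"B": 1.0 / (1024 * 1024), "K": 1.0 / 1024, "M": 1.0, "G": 1024.0}

# Captures "Heap: <before>(<capacity>)-><after>(<capacity>)".
HEAP_RE = re.compile(
    r"Heap: ([\d.]+)([BKMG])\([\d.]+[BKMG]\)->([\d.]+)([BKMG])\([\d.]+[BKMG]\)"
)


def to_mb(value, unit):
    """Convert a size such as ('10.2', 'G') to megabytes."""
    return float(value) * SIZE_UNITS_MB[unit]


def heap_transition(line):
    """Return (before_mb, after_mb, reclaimed_mb) for a G1 ' Heap: ' log line."""
    m = HEAP_RE.search(line)
    if m is None:
        return None
    before = to_mb(m.group(1), m.group(2))
    after = to_mb(m.group(3), m.group(4))
    return before, after, before - after


if __name__ == "__main__":
    sample = ("[Eden: 6592.0M(6592.0M)->0.0B(5184.0M) Survivors: 240.0M->496.0M "
              "Heap: 10.2G(12.0G)->4128.0M(12.0G)]")
    print(heap_transition(sample))
```

Feeding every line matched by the grep above through `heap_transition` gives a series of reclaimed amounts (the churn) and post-collection sizes to chart over time.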

Aberrations in G1GC (as gleaned from gc.log)
44998.882: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 414 regions, survivors: 10 regions, old: 0 regions,
predicted pause time: 96.40 ms, target pause time: 250.00 ms]
, 1.3608460 secs]
[Parallel Time: 1359.1 ms, GC Workers: 10]
[GC Worker Start (ms): Min: 44998882.5, Avg: 44998882.5, Max: 44998882.5, Diff: 0.1]
[Ext Root Scanning (ms): Min: 0.9, Avg: 1.0, Max: 1.2, Diff: 0.3, Sum: 10.0]
[Update RS (ms): Min: 17.9, Avg: 18.1, Max: 18.2, Diff: 0.2, Sum: 180.9]
[Processed Buffers: Min: 38, Avg: 43.5, Max: 52, Diff: 14, Sum: 435]
[Scan RS (ms): Min: 0.2, Avg: 0.3, Max: 0.3, Diff: 0.1, Sum: 2.6]
[Object Copy (ms): Min: 1326.2, Avg: 1332.9, Max: 1339.6, Diff: 13.4, Sum: 13328.6]
[Termination (ms): Min: 0.0, Avg: 6.8, Max: 13.4, Diff: 13.4, Sum: 67.7]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.3]
[GC Worker Total (ms): Min: 1359.0, Avg: 1359.0, Max: 1359.1, Diff: 0.1, Sum: 13590.1]
[GC Worker End (ms): Min: 45000241.5, Avg: 45000241.5, Max: 45000241.5, Diff: 0.0]
[Code Root Fixup: 0.0 ms]
[Clear CT: 0.8 ms]
[Other: 0.9 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.1 ms]
[Ref Enq: 0.0 ms]
[Free CSet: 0.6 ms]
[Times: user=1.11 sys=12.48, real=1.36 secs]
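Pauses like the one above can be picked out of gc.log mechanically by comparing the actual pause with the target pause time and by looking at the user/sys split on the "Times" line. A minimal Python sketch over an excerpt of the entry above; the function name `inspect_pause` is illustrative and the regular expressions assume the Java 7 G1 log format shown here. A sys time far above user time is commonly read as a hint that the time went to the operating system rather than to collection work; the slide itself does not state the cause.

```python
import re

TARGET_RE = re.compile(r"target pause time: ([\d.]+) ms")
PAUSE_RE = re.compile(r"^\s*, ([\d.]+) secs\]", re.MULTILINE)
TIMES_RE = re.compile(r"\[Times: user=([\d.]+) sys=([\d.]+), real=([\d.]+) secs\]")

# Excerpt of the log entry shown above.
SAMPLE = """\
predicted pause time: 96.40 ms, target pause time: 250.00 ms]
, 1.3608460 secs]
[Parallel Time: 1359.1 ms, GC Workers: 10]
[Times: user=1.11 sys=12.48, real=1.36 secs]
"""


def inspect_pause(log_text):
    """Compare one G1 pause against its target and check the user/sys split."""
    target = TARGET_RE.search(log_text)
    pause = PAUSE_RE.search(log_text)
    times = TIMES_RE.search(log_text)
    if not (target and pause and times):
        return None
    target_ms = float(target.group(1))
    pause_ms = float(pause.group(1)) * 1000.0
    user_s, sys_s, real_s = (float(x) for x in times.groups())
    return {
        "target_ms": target_ms,
        "pause_ms": pause_ms,
        "over_target": pause_ms > target_ms,
        "sys_exceeds_user": sys_s > user_s,
    }


if __name__ == "__main__":
    print(inspect_pause(SAMPLE))
```

For this entry the pause is more than five times the 250 ms target and sys time exceeds user time, which is what marks it as an aberration.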

What About GC Tuning?
• GC is unavoidable - it's a fact of life
• Not memory constrained? First make "reasonable" increases to heap size
• Use identical values for max/min heap sizes to reduce memory resizing
• Memory constrained? Focus on sizing heap, new and old generations
• CPU constrained? Focus on reducing total GC time
• Latency sensitive (GC pauses)? Focus on reducing max/avg GC pause times
• Understand GC causes and the time spent across the different GC activities
• Understand memory churn - "steady-state size" and "live size"
• Good tunables: -XX:MaxGCPauseMillis, -XX:InitiatingHeapOccupancyPercent

References
• Parallel and CMS Garbage Collectors
http://www.oracle.com/technetwork/java/javase/memorymanagement-whitepaper-150215.pdf
• G1 Garbage Collector
http://www.oracle.com/technetwork/tutorials/tutorials-1876574.html#t5
http://www.infoq.com/presentations/java-g1
http://www.oracle.com/technetwork/articles/java/g1gc-1984535.html
• Several blog articles at Code Centric, e.g.
https://blog.codecentric.de/en/2012/08/useful-jvm-flags-part-5-young-generation-garbage-collection/
• Comparison #1
http://www-public.tem-tsp.eu/~thomas_g/research/biblio/2015/carpen-amarie15pmam-gcanalysis.pdf
• Comparison #2
http://research.ijcaonline.org/volume43/number11/pxc3878524.pdf
