How long can you afford to Stop The World?

Strategies to overcome long application pause times caused by Java GC
Berlin, May 22. 2013 | Eric Hubert - Strategy & Architecture

Disclaimer
The information and evaluations expressed in this presentation are based on the author's
personal experiences and knowledge.
They do not necessarily reflect the views of Jesta Digital.
The author makes no warranties of any kind regarding the accuracy and veracity of information
and data provided.
No one shall rely on any of the published test results, which are inherently environment-specific.
Readers are strongly encouraged to conduct their own testing in their specific environment, which
may or may not show different results.
All mentioned trademarks are property of their respective owners.

About me
• Developing software for about 20 years
• More than 10 years of experience in the Enterprise Java world (since JDK 1.2)
• Before Jesta, worked for debis Systemhaus, T-Systems and Adesso
• Working for Jesta since 2007 (formerly Jamba!, Fox Mobile)
• Currently leading the „Strategy & Architecture“ team, focused on
– Strategic development of platform infrastructure and middleware
– Automation of software build, packaging, testing, deployment, release
and application monitoring processes
– Close collaboration with cross-functional teams and the central system
administration/operations team
• Contact: eric.hubert@jestadigital.com, XING, LinkedIn

Agenda
• Motivation and Scope
• Summary of Java Memory Management Basics / GC Analysis
• Discussion of Different Strategies to overcome GC Pause Time issues
• Future Perspectives
• Open Discussion

Motivation
• The demand to process large quantities of data in memory is steadily
increasing:
– More and more data to process and analyze in shorter times
(near-time/real time business requirements)
– Availability of commodity servers with up to 2 TB of RAM
(over the past decades available memory grew ≈ 100x every 10 years)
– Memory is still by far the fastest storage technology

Motivation
• Java GC can heavily impact application performance, especially in terms of
latency / responsiveness (multi-second pause times on multi GB heaps)
• The runtime of most GC algorithms is proportional to the size of the live
set of objects → the larger the heap (and its live set), the longer the pause times
Storage Technology              Random Access Latency
Registers                       1-3 ns
CPU L1 Cache                    2-8 ns
CPU L2 Cache                    5-12 ns
Memory (RAM)                    10-60 ns
High-speed network              10,000-30,000 ns (10-30 µs)
Solid State Disk (SSD) Drives   70,000-120,000 ns (70-120 µs)
Hard Disk Drives                3,000,000-10,000,000 ns (3-10 ms)
[REF_01] – Random Access Latencies of Storage Technologies

Motivation / Scope
• There are multiple strategies to overcome/minimize application pause
time issues related to Garbage Collection
• Nevertheless most talks, blog posts and other information sources center
on JVM tuning
(choice of collector algorithms, hints to improve promotion between
generations etc.)
• Most people know at least the most frequently used JVM GC tuning
arguments, but only some know basics of the automated memory
management and alternative strategies
• OOME causes, memory leaks, heap analysis etc. out of scope
• Not going to stress you with excessive JVM tuning options either

Out of Scope - HotSpot GC Tuning Details
[REF_02] Devoxx FR 2012 „Death by Pauses“ by Frank Pavageau

Scope
• Ensure all attendees are aware of enough of the memory management
basics in order to at least understand
– The reasons for long GC pauses
– How to verify that application unresponsiveness was caused by GC
• The main goal of my talk is to provide you with a broader view on
strategies to solve application pause times due to GC activity
• Will not deep dive into any of those strategies, but explain each approach
and discuss the pros and cons (as well as limitations and side effects)
• If applicable, will provide pointers to information sources covering more
details

Automatic Memory Management Basics
• Responsibilities of Automatic Memory Management
• Basic Garbage Collector Algorithms / Important Terms
• Concept of Generations
• Common Triggers of Full GC
• Analysis of GC behavior / Information Sources
• Memory Performance Triangle

Responsibilities of Automatic Memory Management
• Service provided by a “managed runtime” (e.g. the Java Virtual Machine)
in which the program executes
– Assisted allocation
– Managed access to objects and their fields
– Automatic de-allocation of objects (Garbage Collection)
• Ensures that objects remain as long as they are in use
• Deems objects with no incoming references from other live objects as
garbage
• Ensures that objects that are no longer required are thrown away to free
up the memory they occupy for new objects
• Ensures any finalize method is run before the object is thrown away

Garbage Collectors – Classification (1)
• Serial versus Parallel
– Serial: Only one GC task at a time (only single CPU core used)
– Parallel: Multiple GC tasks are performed in parallel (multiple CPU core usage)
• Stop-The-World (STW) versus (Mostly) Concurrent
– STW: app threads are suspended during whole GC
– Concurrent: app threads are executed while GC tasks are performed
• Incremental
– Performs a garbage collection operation or phase as a series of smaller
operations with gaps in between

[REF_03] Memory Management in the Java HotSpot VM

• Reference Counting / Tracing
– Ref. Counting: not in practical use in JVMs due to the reference cycle problem
– Tracing: currently most common; either single-phase copy or multiple phases
(mark and optionally sweep and/or compact)
• Copying versus Non-compacting versus Compacting

Garbage Collectors – Simplified View (Tracing)
• Find and reclaim unreachable objects
> Trace the heap starting at the roots
(thread stacks, static fields, operands of executed expression)
> Visits every live object
> Anything not visited is unreachable
> Therefore garbage
• If you can follow a chain of references from a root to a particular object,
then that object is "strongly" referenced. It will not be collected.
• Referenced objects are also called „live objects“; together they form the „live set“
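
The tracing idea above can be sketched in a few lines of Java. This is only a toy model under stated assumptions: `Node` and `TracingSketch` are hypothetical names, and a real collector walks raw heap memory rather than Java collections. It does show why unreachable cycles are still garbage for a tracing collector.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Toy stand-in for a heap object with outgoing references (hypothetical). */
final class Node {
    final String name;
    final List<Node> refs = new ArrayList<Node>();
    Node(String name) { this.name = name; }
}

final class TracingSketch {
    /** Returns the live set: every node reachable from the given roots. */
    static Set<Node> liveSet(List<Node> roots) {
        Set<Node> visited =
            Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<Node>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (visited.add(n)) {        // first visit: mark as live
                pending.addAll(n.refs);  // follow outgoing references
            }
        }
        return visited;                  // anything not in here is garbage
    }
}
```

Two nodes that only reference each other, but are not reachable from any root, never enter the returned set.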

Garbage Collection – Basic Algorithms (1)
• Copy/Scavenge
– Copy all live objects starting from the roots in a single pass operation from a
source space to a target space and reclaim source space (effectively a move
operation)
• At the beginning all objects are in source space and all references point to source
space
• Start at the roots, copy any reachable object to target space and correct references
while doing so
• At the end of copy all objects are in target space and all references point to target
space; source space can be completely cleared
– Amount of work is generally linear to the „live set“
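
As an illustration of the copy steps listed above, here is a minimal sketch that models the target space as a Java list. `Cell` and `CopySketch` are hypothetical names, and the forwarding table is an assumption of this toy model (real collectors usually store forwarding pointers in the old object itself).

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Toy heap object (hypothetical). */
final class Cell {
    final String name;
    final List<Cell> refs = new ArrayList<Cell>();
    Cell(String name) { this.name = name; }
}

final class CopySketch {
    /** Copies everything reachable from roots into toSpace; returns the relocated roots. */
    static List<Cell> scavenge(List<Cell> roots, List<Cell> toSpace) {
        Map<Cell, Cell> forwarding = new IdentityHashMap<Cell, Cell>();
        List<Cell> newRoots = new ArrayList<Cell>();
        for (Cell r : roots) {
            newRoots.add(copy(r, forwarding, toSpace));
        }
        // toSpace grows while we scan it; each live object is visited exactly once
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Cell c = toSpace.get(scan);
            for (int i = 0; i < c.refs.size(); i++) {
                c.refs.set(i, copy(c.refs.get(i), forwarding, toSpace)); // correct reference
            }
        }
        return newRoots;
    }

    private static Cell copy(Cell from, Map<Cell, Cell> forwarding, List<Cell> toSpace) {
        Cell to = forwarding.get(from);
        if (to == null) {                 // not yet moved
            to = new Cell(from.name);
            to.refs.addAll(from.refs);    // still source-space references; fixed during scan
            forwarding.put(from, to);
            toSpace.add(to);
        }
        return to;
    }
}
```

Unreachable objects are never touched, which is why the work depends on the live set and not on the size of the source space.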

• Mark / Sweep / (Compact)
– Mark any reachable object as live
– Scan heap for objects not marked live (tracked in a kind of free-list)
(the sweep step is generally linear to the entire heap size, not just the live set)
– Over time, memory fragments
• Slower allocation
• Longer sweep phases
• Risk of not having a large enough contiguous space for allocation of large objects; can
result in OOME
– Compaction moves (relocates) live objects together to reclaim contiguous
empty space; all object references need to be corrected (remap); compacting
is an expensive / time-consuming operation
– A mark/sweep collector would not be a good choice for the young generation, as
it does not gain efficiency from the sparseness of live objects

• Mark / Sweep / (Compact)
[REF_04] Mark-Sweep-Compact – Keith D. Gregory

• Mark / Compact
– Reachable objects are marked
– Compacting step relocates the reachable (marked) objects either towards the
beginning of the heap area (in-place compaction) or to another location
(evacuating compaction)
– Mark and compact work are both linear to live set, while sweep work is linear
to heap size
– Consequently, a mark/compact collector is linear to live set only, giving it
similar efficiency characteristics to copying collectors
– Example: Azul C4

Concept of Generations / Generational GC
• Incorporate the typical object lifetime structure (most objects die young) into GC
– Different heap areas for objects with different lifetime
– Mostly different GC algorithms for objects with different lifetime
[REF_05] The Art of Garbage Collection Tuning – Angelika Langer & Klaus Kreft

• Generations separate new objects from objects that survived collections
• Heap divided in zones by age of the objects
[Figure: heap zones ordered by object lifetime]
– Young (nursery) generation: objects collected by Minor and Full GC
• eden: allocation of objects in eden space (experienced no GC)
• survivor: 2 alternately used copy target spaces (experienced several GCs)
– Tenured generation (old generation): objects survived multiple GCs;
objects collected only by Full GC
[REF_06] Based on Java 7 Garbage Collector G1 by Antons Kranga

• Focus collection efforts on young generation
– Normally live objects represent only relatively small percentage of space
– Promote objects living long enough to older generations
• Tends to be much more efficient; great way to keep up with high allocation
rate
• Only collect older generation as it fills up
• Requires a “Remembered set”: a way to track all references into the young
generation from the outside
• Usually want to keep surviving objects in young generation for a while
before promoting them to the old generation:
– Immediate promotion can dramatically reduce generational filter efficiency
– Waiting too long to promote can dramatically increase copying work

Common Triggers of Full GC
• Completely JVM implementation specific; it also depends
on the selected GC algorithms
• „Common“ triggers in Oracle HotSpot JVM are:
– Old generation or permanent generation filled to a certain percentage
– Calling System.gc() (unless JVM option -XX:+DisableExplicitGC is set)
– Not enough free space in survivor space to copy objects from eden space
– Space expansion or shrinkage (also applies to PermGen)
• Verification via gc logs and/or Java MBeans

Analysis of GC behavior / Information Sources (1)
• GC traces from JVM
-XX:+PrintGC (same as -verbose:gc )
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps (since JDK 6 Update 6)
-Xloggc:logs/gc.log
-XX:GCLogFileSize=50M
-XX:NumberOfGCLogFiles=3
-XX:+UseGCLogFileRotation
Additional diagnostic options (troubleshooting / tuning)
-XX:+PrintTenuringDistribution
-XX:+PrintHeapAtGC
-XX:+TraceClassLoading / -XX:+TraceClassUnloading
-XX:+PrintGCApplicationStoppedTime (Warning: misleading – all JVM safepoints)
-XX:+PrintSafepointStatistics (can be used for verification)

• Example of STW Pause caused by Full GC using Parallel GC
[Full GC [PSYoungGen: 8523K->0K(188160K)]
[PSOldGen: 574126K->428865K(575808K)] 582650K->428865K(763968K)
[PSPermGen: 115404K->115404K(246144K)], 4.8381260 secs]
• Example of STW Pauses caused by non-concurrent phases of CMS GC
[GC [1 CMS-initial-mark: 1477934K(1835008K)] 1521654K(2053504K),
0.0902490 secs]
[1 CMS-remark: 3717734K(4023936K)] 3810187K(4177280K), 1.0523700 secs]
• Examples of STW Pauses caused by fallback of CMS GC to Serial Old GC
(concurrent mode failure): 1784934K->1309805K(1926784K), 3.6729840 secs]
1927090K->1309805K(2080128K), [CMS Perm : 94690K->93690K(131072K)],
3.7968250 secs]
Hint: -XX:CMSInitiatingOccupancyFraction=<xx> and -XX:+UseCMSInitiatingOccupancyOnly
GC ParNew (promotion failed): 153344K->153344K(153344K), 0.1724000
secs]1574.221: [CMS: 2786273K->2531770K(4023936K), 5.5668010 secs]
2926282K->2531770K(4177280K), [CMS Perm : 94733K->92808K(131072K)],
5.7397890 secs]
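
To pull pause durations out of log lines like the ones above, a small parser can help. The sketch below is an assumption-laden illustration, not an official tool: `GcLogPause` is a hypothetical name, and it simply takes the last `<seconds> secs]` value on a line, which matches the outermost duration in the examples shown here. Durations of concurrent CMS phases are not STW pauses and need separate handling.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GcLogPause {
    // Matches e.g. "4.8381260 secs]" and captures the number
    private static final Pattern SECS = Pattern.compile("(\\d+\\.\\d+) secs\\]");

    /** Returns the last "<seconds> secs]" value on the line, or -1 if there is none. */
    static double totalPauseSeconds(String logLine) {
        Matcher m = SECS.matcher(logLine);
        double last = -1;
        while (m.find()) {
            last = Double.parseDouble(m.group(1)); // later matches enclose earlier ones
        }
        return last;
    }
}
```

Applied to the Parallel GC example above, this yields the 4.838 second Full GC pause; applied to the promotion-failed example, it yields the enclosing 5.74 seconds rather than the inner ParNew and CMS timings.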

• Standard Java GC-related Management Beans (JMX)
– Can be used for (remote) real-time monitoring of GC behavior and memory
usage
– MBean names are Garbage Collector specific
→ we use custom code for normalization to streamline monitoring config
– java.lang:type=GarbageCollector,name=<collector name>
• Young Gen: Copy, ParNew, PS Scavenge, G1 Young Generation
• Old Gen: MarkSweepCompact , PS MarkSweep, ConcurrentMarkSweep,
G1 Old Generation
• Metrics: CollectionCount, CollectionTime, LastGcInfo (Composite)
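
The collector beans listed above can also be read in-process through the standard `java.lang.management` API. A minimal sketch (the class name `GcBeans` is hypothetical) that prints name, collection count and accumulated collection time for whatever collectors the running JVM exposes:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcBeans {
    /** One line per collector: name, CollectionCount, CollectionTime in ms. */
    static String describe() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

The names printed depend on the selected collector, which is exactly why some normalization is useful for monitoring.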

• Standard Memory Management Beans
– java.lang:type=Memory
• Metrics: HeapMemoryUsage (Composite: init, committed, used, max)
– java.lang:type=MemoryPool,name=<space name>
• Eden: Eden Space, Par Eden Space, PS Eden Space, G1 Eden
• Survivor: Survivor Space, Par Survivor Space, PS Survivor Space, G1 Survivor
• Old: Tenured Gen, PS Old Gen, CMS Old Gen, G1 Old Gen
• Perm: Perm Gen, PS Perm Gen, G1 Perm Gen
• Metrics: Usage (init, committed, max, used)
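
The same data is available in-process. The following sketch (hypothetical class name `MemoryPools`) reads the overall heap usage and lists every memory pool of the running JVM; which pool names appear depends on the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

final class MemoryPools {
    /** Currently used heap in bytes (HeapMemoryUsage.used). */
    static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        System.out.println("heap used: " + heapUsedBytes() + " bytes");
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage();   // null if the pool is no longer valid
            if (u != null) {
                // max is -1 when the pool has no defined maximum
                System.out.println(pool.getName() + ": used=" + u.getUsed()
                    + " committed=" + u.getCommitted() + " max=" + u.getMax());
            }
        }
    }
}
```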

• Custom GC Management Bean
– MajorCollectionCount
– MajorCollectionTime
– MinorCollectionCount
– MinorCollectionTime
– CumulatedCollectionTime
– LastMajorCollectionDuration
– LastMajorCollectionMemoryReduction
– LastMajorCollectionStartTime
– LastMajorCollectionEndTime
– TenuredCollector
– YoungCollector
– Uptime
• Warning: CollectionTime <> STW Pause Time for Concurrent Collectors

GC Analysis – Command Line Tools
1) jstat/jstatd/jps
Usage: jstat -help|-options
jstat -<option> [-t] [-h<lines>] <vmid> [<interval> [<count>]]
Example output – Live Server instance:
jstat -gc 30731 1s 10
S0C     S1C     S0U     S1U  EC       EU      OC        OU
3520.0  3520.0  2593.7  0.0  28224.0  2688.8  261728.0  184199.8
PC       PU       YGC     YGCT     FGC   FGCT   GCT
43172.0  25851.9  168136  978.259  1021  4.195  982.454
– Can only be used to calculate averages or with short update intervals
– Use option „-gcutil“ if you rather want to see space usage percentages
– jps helps to determine vmid (but mostly maps to PID anyway)
– jstatd is required for remote usage of jstat
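
The averages mentioned above follow directly from the counters: accumulated time divided by number of collections. A small worked sketch with the values from the example output (`JstatAverages` is a hypothetical helper, not part of jstat):

```java
final class JstatAverages {
    /** Average collection time in milliseconds from a jstat time (seconds) and count column. */
    static double avgMillis(double totalSeconds, long count) {
        return count == 0 ? 0.0 : totalSeconds * 1000.0 / count;
    }

    public static void main(String[] args) {
        // YGCT / YGC and FGCT / FGC from the example output above
        System.out.printf("avg young GC: %.2f ms%n", avgMillis(978.259, 168136));
        System.out.printf("avg full GC:  %.2f ms%n", avgMillis(4.195, 1021));
    }
}
```

For this server that is roughly 5.8 ms per young collection and 4.1 ms per full collection. Averages hide outliers, so a single multi-second pause would not be visible here.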

GC Analysis – GUI Tools (1)
GCViewer – GC trace analyzer
• Originally developed by tagtraum industries (only maintained until 2008)
• Fork: https://github.com/chewiebug/GCViewer/downloads

HPjmeter
• GC log analyzer and monitoring (the latter for HP UX)
• Download: www.hp.com/go/hpjmeter


JConsole
• Part of JDK
• Can be used for monitoring of local (jvmid/pid) or remote (JMX RMI) JVM
service:jmx:rmi:///jndi/rmi://<remote-machine>:<port>/jmxrmi
or if behind firewall and using custom jmx rmi proxy:
service:jmx:rmi://<remote-machine>:<proxyport>/jndi/rmi://<remote-
machine>:<port>/jmxrmi
• Integrated MBean browser
• Shows active JVM options, GC info and more runtime information
• Can be used to verify memory/gc behavior in realtime


Visual VM with Visual GC Plugin
• Part of JDK
• Based on jvmstat (local monitoring via jvmid, remote requires jstatd)
• Many other plugins available (also MBean browser)
• Shows active JVM options and other runtime information (not GC algos)
• Can be used to verify memory/gc behavior in realtime
• Very detailed view including information regarding survivor space usage
as well as age information (histogram – not available for all algorithms)


IBM GCMV
• Eclipse RCP Application (can also be installed as plugin in Eclipse)
• Loads gc log file similar to gcviewer and provides statistics and graphs
• nice capability to zoom into pause time graph area
• Mainly written for IBM J9, but most parts also work for Oracle JVMs
• Update Site:
http://download.boulder.ibm.com/ibmdl/pub/software/isa/isa410/produ
ction/


JHiccup
• Small Java tool from Azul Systems to demonstrate application hiccups
(primarily caused by GC, or any other JVM safepoint/OS jitter etc.)
• Either run from Command Line (script wrapping java command) or as
javaagent
• Writes logfiles which can later be loaded in Excel to render nice diagrams
by hitting a button (macros need to be active)
• Possibility to compare percentile values against expectations (SLA)

JHiccup – Example Graphs of Telco App
• More details/examples later in this presentation …
• Download: http://www.azulsystems.com/downloads/jHiccup

GC Tuning – Memory Performance Triangle
[Figure: triangle with the corners Memory footprint, Throughput and Latency]

Strategies to overcome GC Pause Time issues
1. Tuning of the JVM Runtime Behavior
2. Reduce memory footprint of the application
3. More powerful hardware (more RAM/CPU cores)
4. Distribute processing to multiple JVMs (with Remote Communication)
5. Custom Off-heap memory management
6. Switch to JVM implementation with more efficient Memory Management

1. Tuning of the JVM Runtime Behavior (1)
• Structured Approach - Preconditions
– Knowledge about Java Memory Management
• Understanding how the memory is organized in the JVM to be tuned
• Knowing the options to change GC related runtime behavior and their limits
– Effect of Garbage Collector Choice
– Effect of Memory Space Sizing
– Effect of other collector-specific configuration switches
– Knowledge about GC Analysis
• Knowing what to measure
• Knowing how to measure
• Knowing how to interpret metrics
– Knowledge about GC Tuning
• Know at least how to approach a tuning
– Know your Operational Requirements
– Have one or multiple concrete Tuning Goals prior to any modification!

• If motivator is concrete performance issue (e.g. large pause time)
– First ensure the problem is really GC-related!
– Verify current GC configuration and analyze current GC behavior
– Evaluate your chances of improvement by runtime configuration tuning
• Verify your hardware and OS resources
• Verify object allocation rate
• Verify occupancy of tenured generation after Full GC
• Set those measures in relation to your goals
• Decide whether to proceed
– Use a comparable test environment with comparable workload to replicate
your issue (automate tests!)
– Capture baseline data, do small changes at a time and compare with baseline

• Where are the large pauses? → typically old gen
• Start to tune young gen!
• First verify reasons of promotion (young → old), depending on outcome
– Think of increasing new size (decrease NewRatio value); do it stepwise and
verify result
– Think of increasing survivor spaces (decrease SurvivorRatio value)
– Think of increasing age threshold (MaxTenuringThreshold) to avoid too early
promotion to old gen
• Verify total heap size, think of decreasing it (if possible)
• Proceed with old gen tuning
• Switch collector, try to use CMS (-XX:+UseConcMarkSweepGC)
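
The effect of the two ratio switches is easier to judge with the arithmetic written out. In HotSpot, NewRatio is the old:young ratio and SurvivorRatio is the eden:survivor ratio, with two survivor spaces. The helper below (`HeapSizing`, a hypothetical name) ignores alignment and adaptive sizing, so real values reported by the JVM can differ slightly.

```java
final class HeapSizing {
    /** Young generation size for a heap size and -XX:NewRatio (old : young). */
    static long youngBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    /** Size of ONE survivor space for a young size and -XX:SurvivorRatio (eden : survivor). */
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);   // eden plus two survivor spaces
    }

    /** Eden size: whatever the two survivor spaces leave over. */
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }
}
```

For example, a 3 GB heap with NewRatio=2 gives a 1 GB young generation, and a 240 MB young generation with SurvivorRatio=4 gives two 40 MB survivor spaces and 160 MB of eden.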

• Generational Oracle HotSpot JVM (6 collector choices/combinations)
[Figure: young and old generation collectors and their possible combinations]
– Young generation collectors
• Serial Young (DefNew : Copy), -XX:+UseSerialGC
• Parallel Young (ParNew : ParNew), -XX:+UseParNewGC
• Parallel Scavenge (PSYoungGen : PS Scavenge), -XX:+UseParallelGC
– Old generation collectors
• Serial Old (MarkSweepCompact)
• Concurrent Old (ConcurrentMarkSweep), -XX:+UseConcMarkSweepGC
• Parallel Old (PS OldGen - PS MarkSweep), -XX:+UseParallelOldGC
– Both generations
• G1 (G1 Young Generation / G1 Old Generation), -XX:+UseG1GC
– Edge labels in the figure: Fallback (twice), -XX:-UseParNewGC

• JVM attempts to use reasonable defaults in all areas, but also offers a large
number of feature switches (currently more than 600)
• Useful resources to gather details:
– Official Oracle JVM documentation (lists about 90 options)
http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
– Use the JVM's built-in listing functions
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal
For the adventurous, add (please don't use any of those in production!)
-XX:+UnlockExperimentalVMOptions
– Choose one of the unofficial „complete references“ (gathered from source), e.g.
http://www.pingtimeout.fr/2012/05/jvm-options-complete-reference.html

• Assessment
Pros
– can drastically improve performance (e.g. reduce maximum pause times
and/or improve throughput)
– relatively quick to apply (depending on knowledge and experience)
Cons
– quite a lot of knowledge about memory management and implementation
specific switches required
– danger to optimize for the moment (many variants: load, functionality
used, software changes)
– needs to be carefully monitored and repeated with each redeployment /
changed use
– heavily implementation dependent / can change with each minor JVM
version update

2. Reduce Memory Footprint of the application
• Sometimes easier said than actually done
• Generally one should avoid too much premature optimization
• Rather frequently use heap dumps with memory analyzer or memory
profiler and verify proper data structure usage in development iterations
• Look out for usage of wrong scopes, e.g. mistakenly declared variables
within loops although not needed (unnecessary allocation pressure)
• Only load the amount of data in memory you need to process (e.g. from
some persistent store)
• For large amounts of objects carefully select data structures (verify
overhead – fixed and per entry)

2. Reduce Memory Footprint of the application
• Assessment
Pros
– high memory savings possible (e.g. reduction of allocation rate and/or
long-lived objects by up to >50%), also resulting in much smaller pause times
– can have bigger positive impact than any runtime tuning
Cons
– rather high, consistent effort
– can negatively impact execution time if not properly applied
– can introduce bugs if existing code needs to be changed

3. More Powerful Hardware (RAM/CPU cores)
• Very much depending on starting situation whether more computing
resources can help to solve GC issues
– e.g. application is heavily CPU bound and not enough CPU cycles to properly
run GC concurrently
– or maximum heap sizes should be increased, but not enough RAM
• The VM implementation and the chosen GC algorithms have a big impact
as well
• If live set > 1 or 2 GB and currently using parallel GC on only two CPU
cores, increasing the heap size, switching to CMS and increasing the
number of CPU cores (thus GC threads) can have a large effect

3. More Powerful Hardware (RAM/CPU cores)
• Assessment
Pros
– not much working effort to realize
Cons
– involved costs
– may only mask underlying problems until a later stage (e.g. increased load)

4. Distribute Memory Processing to multiple JVMs
• Sometimes easy, sometimes harder
• If we are talking about a mostly stateless application with a rather small
amount of long lived objects horizontal scaling is quite easy (using proper
loadbalancing and failover)
• Long-lived data needs to be somehow partitioned/sharded in order to
improve efficiency (either manually or by using products supporting
distributed memory structures aka. DataGrids – like Hazelcast, Infinispan,
Terracotta, GridGain, Coherence etc.)

4. Distribute Memory Processing to multiple JVMs
• Assessment
Pros
– depending on the nature of the application, Java heap usage per instance can be
drastically reduced
Cons
– if existing application, distribution may need some re-architecture
– if memory issues are the only reason to massively scale horizontally, one
shouldn't forget about increased complexity, operational overhead and
total memory overhead

5. Custom Off-Heap Memory Management (1)
• sun.misc.Unsafe
(internal implementation, dangerous, non-portable, and volatile)
• java.nio.ByteBuffer#allocateDirect (since JDK 1.4)
• Maximum size to be set with -XX:MaxDirectMemorySize=
• You have to use some serialization/deserialization mechanism
• Java‘s default serialization/deserialization is not very fast
• Two sub strategies:
– Dynamic size and merging: no memory wasted, but suffers fragmentation
(synchronization at allocation/deallocation)
– Fixed size buffer allocation: no fragmentation, but memory wasted
• Proper cleanup is not elegant to achieve (without reflection it relies on the
GC collecting the buffer object)
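
A minimal sketch of the fixed-size sub-strategy, using only `java.nio.ByteBuffer#allocateDirect`. `OffHeapSlots` is a hypothetical class; it stores length-prefixed UTF-8 strings in equally sized slots, is not thread-safe, and wastes whatever part of a slot the value does not fill.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class OffHeapSlots {
    private final ByteBuffer buffer;   // native memory outside the Java heap
    private final int slotSize;

    OffHeapSlots(int slots, int slotSize) {
        this.slotSize = slotSize;
        this.buffer = ByteBuffer.allocateDirect(slots * slotSize);
    }

    /** Serializes the value into the slot: 4-byte length prefix, then UTF-8 bytes. */
    void put(int slot, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > slotSize - 4) {
            throw new IllegalArgumentException("value too large for slot");
        }
        int base = slot * slotSize;
        buffer.putInt(base, bytes.length);              // absolute write, no position change
        for (int i = 0; i < bytes.length; i++) {
            buffer.put(base + 4 + i, bytes[i]);
        }
    }

    /** Deserializes the slot content back into an on-heap String. */
    String get(int slot) {
        int base = slot * slotSize;
        int len = buffer.getInt(base);
        byte[] bytes = new byte[len];
        for (int i = 0; i < len; i++) {
            bytes[i] = buffer.get(base + 4 + i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Every read creates a short-lived on-heap copy, so this trades long-lived heap objects for serialization cost and young generation garbage.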

// Excerpt: look up the non-public cleaner hooks of direct buffers via reflection
static {
    Method directBufferCleanerX = null;
    Method directBufferCleanerCleanX = null;
    boolean v;
    try {
        // DirectByteBuffer#cleaner() gives access to the buffer's sun.misc.Cleaner
        directBufferCleanerX =
            Class.forName("java.nio.DirectByteBuffer").getMethod("cleaner");
        directBufferCleanerX.setAccessible(true);
        // Cleaner#clean() releases the native memory right away
        directBufferCleanerCleanX =
            Class.forName("sun.misc.Cleaner").getMethod("clean");
        directBufferCleanerCleanX.setAccessible(true);
        v = true;
    } catch (Exception e) {
        // hooks are not available on this JVM
        v = false;
    }
    CLEAN_SUPPORTED = v;
    directBufferCleaner = directBufferCleanerX;
    directBufferCleanerClean = directBufferCleanerCleanX;
}
Lucene/Elastic Search code: within an inner class of the interface
org.apache.lucene.store.bytebuffer.ByteBufferAllocator;
forked to other projects like JBoss Netty

Custom Off-Heap Memory Management (3)
• Projects and products using this strategy:
– Oracle Coherence
– GigaSpaces, (to be validated)
– Hazelcast (Enterprise Edition)
– GridGain
– Terracotta BigMemory
– Lucene / Elastic Search
• Open Frameworks to use this strategy
– Apache DirectMemory (Serialization via Protostuff)
– FST - Fast Serialization

• Assessment:
Pros
– can reduce GC overhead / max pause times
Cons
– quite tricky to get right, extreme implementations end up in own GC
– „useful“ usage limited to simple/flat data structures (key-value) access;
usage as medium speed tier in caches
– off-heap allocation is a lot slower than on-heap allocation
– either memory waste or fragmentation
– standard heap analysis tooling does not apply, second set of tooling required

6. JVM with more efficient Memory Management (1)
The Ultimate JVM GC Tuning Guide
java -Xmx40g
ZING

• Azul Zing Practical Evaluation / Comparison against Oracle HotSpot JVM
• Preparation:
– selected real software system as part of our platform showing
some GC issues in production
– setup test environment (single JVM instance) on VM with 16 GB, 8 cores
– created load test using real data captured from live systems
– single test run designed to last about one and a half hours
• Test Execution:
– incrementally increased load (concurrent users) until Oracle HotSpot with
a default memory configuration showed severe issues
– changed memory sizing as well as GC algorithm in order to demonstrate issues
known from real life
– switched to untuned Azul Zing

• Oracle HotSpot 1.6.0_43-b01, 64bit – 1 GB MaxHeap - ParallelGC:
-Xms768m -Xmx1024m -XX:PermSize=128m -XX:MaxPermSize=128m


• Oracle Hotspot 1.6.0_43-b01, 64bit – 4 GB MaxHeap, ParallelGC
-Xms2048m -Xmx4096m -XX:PermSize=128m -XX:MaxPermSize=128m


• Oracle Hotspot 1.6.0_43-b01, 64bit – 4 GB MaxHeap, CMS
-Xms4096m -Xmx4096m -XX:+UseConcMarkSweepGC //PermGen unchanged


• Oracle Hotspot 1.6.0_43-b01, 64bit – 2 GB MaxHeap, CMS (tuned):
-Xms2g -Xmx2g -Xmn256m -XX:SurvivorRatio=4 -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=16
-XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly // PermGen unchanged


• Azul Zing 1.6.0_33-ZVM_5.5.3.0-b5, 64bit – 10 GB MaxHeap, C4:
-Xmx10g


• The efficiency of memory and thread management is up to each JVM
implementation
• Azul Systems offers a highly optimized commercial JVM called Zing, which is
designed for low-latency use with large multi-GB heaps (1 GB to more than 300 GB)
• It uses a special read barrier (Loaded Value Barrier) to support concurrent
compaction, concurrent remapping, and concurrent incremental update
tracing
• Zing uses generational GC, but the same base algorithm for both young and
old gen: C4 (Continuously Concurrent Compacting Collector)
• Zing is built on top of a proprietary loadable Linux kernel module
(multiple Linux distributions supported: RedHat, CentOS, SLES, Ubuntu, etc.)

• Zing has a custom memory and thread management implementation and
adds a production monitoring and management platform
• Comparison of peak mremap rates for 16 GB of remaps
Active Threads  Stock Linux            Modified Linux         Speedup
0               43.58 GB/sec (360ms)   4734.85 TB/sec (3us)   >100,000x
1               3.04 GB/sec (5s)       1488.10 TB/sec (11us)  >480,000x
2               1.82 GB/sec (8s)       1166.04 TB/sec (14us)  >640,000x
4               1.19 GB/sec (13s)      913.74 TB/sec (18us)   >750,000x
8               897.65 MB/sec (18s)    801.28 TB/sec (20us)   >890,000x
12              736.65 MB/sec (21s)    740.52 TB/sec (22us)   >1,000,000x
[REF_07] C4: The Continuously Concurrent Compacting Collector

[REF_08] Understanding Zing LX Memory Use

• Assessment
Pros
– incredibly low pause times
– no GC tuning required if not aiming for microsecond pause times
– predictable worst case pause times
– supports multi-GB large heaps
– sophisticated monitoring in production
Cons
– not available in a free, unsupported form; license costs as with other
supported JVMs
– requires Azul Linux Kernel Module (dependent on Linux distribution and
Kernel ABI/signature change policy, some update restrictions and increased
operational effort possible)
– requires more memory to work efficiently (reserved for Zing usage even if
JVM is not running)
– a delay in which features in the major Oracle Java HotSpot JVM releases
are available in Zing, which is aimed to be further reduced in the future
– (certain CPU requirements, but fulfilled by all modern commodity server CPUs)

Future Perspectives (1)
• Garbage Collection and performance on virtualized environments is
among the hot future topics
• Oracle currently busy with GC „convergence“
(merging sources of former Bea JRockit and Sun Hotspot JVMs and
tooling), stabilization and performance improvements of G1 (future
standard?, only one GC „framework“ instead of currently three)
• General Goals according to talks to vendors like Oracle and IBM:
– Solve linear scaling problem (200ms @ 1GB → 20s @ 100GB)
(partly caused as result of deferring of “expensive operations”)
– Scale using result-based, concurrent and incremental (through partitioned
heap) garbage collection
– More flexible utilization of hardware (different data stores, SSD etc.)

Future Perspectives (2)
• No interest from the Linux kernel community in integrating Azul's improvements
(consequently Azul gave up on this)
• The future will show to what extent Azul Systems will actively participate in
the OpenJDK project and how this will influence upcoming Java SE
versions
• The gap between Azul Zing and all other JVM implementations in terms of
memory management efficiency seems to be large and not likely to be closed
anytime soon

Discussion / Questions

References
• [REF_01] Systems Computing - Understanding CPU Caching and Performance
by Jon "Hannibal" Stokes
• [REF_02] Death By Pauses - Devoxx France 2012
by Frank Pavageau
• [REF_03] Memory Management in the Java HotSpot Virtual Machine
by Sun Microsystems
• [REF_04] Java Reference Objects
by Keith D. Gregory
• [REF_05] The Art of Garbage Collection Tuning
by Angelika Langer & Klaus Kreft
• [REF_06] Java7 Garbage Collector G1
by Antons Kranga
• [REF_07] C4: The Continuously Concurrent Compacting Collector
by Azul Systems (Gil Tene, Balaji Iyengar, Michael Wolf)
• [REF_08] Understanding Zing LX Memory Use
by Azul Systems

Further Reading
• Official Oracle JVM options documentation (subset)
• Official Oracle JVM GC Tuning Documentation
• Java Garbage Collection Analysis and Tuning
• JavaOne 2012 - Gil Tene - Azul Systems - Understanding GC
• Alexey Ragozin - HotSpot JVM GC Options Cheat Sheet (v2)
• Alexey Ragozin - Understanding GC pauses in JVM, HotSpot's minor GC
• Alexey Ragozin - Understanding GC pauses in JVM, HotSpot's CMS collector
• Alexey Ragozin - Surviving 16GiB heap and greater
• Java OutOfMemoryError – Eine Tragödie in sieben Akten
• JavaOne 2012 - G1 Garbage Collector Tuning
• How to Monitor Java Garbage Collection | CUBRID Blog
• Everything I ever learned about jvm performance tuning (twitter)
• Displaying Java’s Memory Pool Statistics with VisualVM
