3. Agenda
● 42 minutes of
– Fun (Theory)
– Fun (Practice)
– Fun (Feedback)
– Fun (Questions/Answers)
– Fun (Trolls)
● Because performance is fun !
@pingtimeout
Scala.IO – 24&25 oct 13
4. Disclaimer
● Be critical of the information contained in this talk
● JVM tuning is always done on a case-by-case basis. There is no magic, no special set of flags that produces good results on every project.
● The resemblance of any opinion, recommendation or comment made during this presentation to performance tuning advice is merely coincidental.
14. Theory – Memory pools
● Java Heap – 2 memory pools (Except for G1 GC)
● Young Generation for... young objects
● Old Generation for... old objects !!!
Amazing, right ?
16. Theory – Memory pools
● Young Generation = Eden + Survivors
● Every object is created in Eden*
* : except when it is too big to fit in Eden
* : except in special cases for G1 GC
18. Why memory pools ?!
● Always 2 GCs per JVM*
* Except for G1 GC
● Young GC
– Cheap
– Duration mostly ≈ O(Live data in YG)
● Old GC
– Expensive
– Duration mostly ≈ O(Live data in OG)
19. Why memory pools ?!
Common GC name    Young Gen GC    Old Gen GC
“Parallel GC”     PSYoungGen      ParOldGen
“CMS”             ParNew          CMS
“G1 GC”           G1              G1
21. App with small live set
22. App with big live set
23. Experiment 1
● 1st run (SmallLiveSet)
– 50 GB heap (-ms50g -mx50g)
– 49.9GB Young Gen (-Xmn49900m)
– GC logs
24. Experiment 1
● 1st run (SmallLiveSet)
– 50 GB heap
● -ms50g -mx50g
– 49.9GB Young Gen
● -Xmn49900m
– GC logs
● Result :
– 6ms YGC pauses to free 38GB of memory
25. Experiment 1 : Result
[PSYoungGen: 38329728K->6496K(44710400K)]
38329744K->6512K(46041600K),
0.0067050 secs] //...
● 38,329,728K data before GC in YG, 6,496K after
● YG size is 44,710,400K
● 38,329,744K data before GC in heap, 6,512K after
● Heap size is 46,041,600K
● Total pause time : 6.7ms
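The reading of this log line can be sketched in a few lines of Python. This is only an illustration: the function name `parse_young_gc` and the dictionary keys are made up for this example, and the pattern assumes the exact ParallelGC layout shown on the slide.

```python
import re

# "before->after(capacity)" triplets, all expressed in KB in this log format.
SIZES = re.compile(r"(\d+)K->(\d+)K\((\d+)K\)")
PAUSE = re.compile(r"([\d.]+) secs")

def parse_young_gc(line):
    """Return sizes in KB and the pause in milliseconds for one young GC line."""
    # First triplet is the Young Generation, second one is the whole heap.
    young, heap = SIZES.findall(line)[:2]
    pause_secs = float(PAUSE.search(line).group(1))
    return {
        "young_before_kb": int(young[0]),
        "young_after_kb": int(young[1]),
        "young_capacity_kb": int(young[2]),
        "heap_before_kb": int(heap[0]),
        "heap_after_kb": int(heap[1]),
        "heap_capacity_kb": int(heap[2]),
        "pause_ms": pause_secs * 1000,
    }

line = ("[PSYoungGen: 38329728K->6496K(44710400K)] "
        "38329744K->6512K(46041600K), 0.0067050 secs]")
event = parse_young_gc(line)

# Memory reclaimed from the Young Generation, in decimal GB as on the slide.
freed_gb = (event["young_before_kb"] - event["young_after_kb"]) / 1_000_000
print(f"freed {freed_gb:.1f} GB in {event['pause_ms']:.1f} ms")
```

Almost nothing survives this collection (about 6 MB out of 38 GB), which is consistent with a pause that depends on live data rather than on the amount of garbage.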
26. Experiment 2
● 2nd run (SmallLiveSet)
– 50 GB heap (-ms50g -mx50g)
– 10MB Young Gen (-Xmn10m)
– GC logs
27. Experiment 2
● 2nd run (SmallLiveSet)
– 50 GB heap
● -ms50g -mx50g
– 10MB Young Gen
● -Xmn10m
– GC logs
● Result :
– 322ms Full GC pauses to free 52GB of memory
28. Experiment 2 : Result
[Full GC
[PSYoungGen: 3072K->0K(7168K)]
[ParOldGen: 52418151K->30287K(52418560K)]
52421223K->30287K(52425728K)//...
0.3229410 secs]
● 52,418,151K data before GC in OG, 30,287K after
● OG size is 52,418,560K
● 52,421,223K data before GC in heap, 30,287K after
● Heap size is 52,425,728K
● Total pause time : 322.9ms
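A small sketch, assuming the Full GC entry has exactly the layout shown above, can pull the per-pool figures out of the log and show how full the Old Generation was. The helper name `pools` is hypothetical.

```python
import re

# Matches "[PoolName: before->after(capacity)]" sections, sizes in KB.
POOL = re.compile(r"\[(\w+): (\d+)K->(\d+)K\((\d+)K\)\]")

def pools(log_text):
    """Map each pool name to (before_kb, after_kb, capacity_kb)."""
    return {name: (int(before), int(after), int(capacity))
            for name, before, after, capacity in POOL.findall(log_text)}

full_gc = """[Full GC
[PSYoungGen: 3072K->0K(7168K)]
[ParOldGen: 52418151K->30287K(52418560K)]
52421223K->30287K(52425728K)//...
0.3229410 secs]"""

p = pools(full_gc)
old_before, old_after, old_capacity = p["ParOldGen"]

# Fraction of the Old Generation in use when the Full GC started.
occupancy = old_before / old_capacity
# What was still alive afterwards, in decimal MB.
live_mb = old_after / 1000
print(f"Old Gen {occupancy:.3%} full before GC, {live_mb:.1f} MB live after")
```

With a 10MB Young Gen, the Old Generation was essentially full before the collection, yet only about 30 MB of it was still live afterwards.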
29. Experiments 1->4, Wrap up
● 1st and 2nd runs with BigLiveSet
– Ran out of time* :-(
*: Stopped measuring at Heap occupancy ≈ 22GB
● GC Pauses :
Live set    1st run (49.9GB Young Gen)    2nd run (10MB Young Gen)
Small       6 millis                      322 millis (Full GC)
Big         55 secs (Full GC)*            250 secs (Full GC)*
33. Immutability
● What does this code do ?
– Creates more temporary objects that die young
– Respects the Weak Generational Hypothesis
34. Immutability
● Consequences compared to mutable state
– GC will run more frequently
– GC time will be short: O(Live data in YG)
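Why the pause stays short can be illustrated with a toy model. Everything here is an assumption made for the sketch: a constant allocation rate, a fixed lifetime for every object, and the function name `live_bytes_at_gc`. Real applications are messier, but the shape of the result is the point.

```python
def live_bytes_at_gc(allocation_rate, object_lifetime, gc_interval):
    """Bytes still reachable in Eden when a young GC starts.

    Toy model: objects are allocated at a constant rate (bytes/second)
    and each one becomes garbage exactly `object_lifetime` seconds later,
    so only the objects allocated during the last `object_lifetime`
    seconds are still alive.
    """
    return allocation_rate * min(object_lifetime, gc_interval)

MB = 1_000_000
rate = 200 * MB      # bytes allocated per second (illustrative value)
lifetime = 0.05      # every object dies after 50 ms (illustrative value)

for interval in (0.5, 1, 4):
    eden = rate * interval                      # Eden needed for this interval
    live = live_bytes_at_gc(rate, lifetime, interval)
    print(f"Eden {eden / MB:>4.0f} MB -> {live / MB:.0f} MB live at GC")
```

In this model the amount of live data at GC time is the same whether Eden holds 100 MB or 800 MB, so a bigger Eden makes collections rarer without making them longer.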
35. Tuning for immutability
● Reduce YGC frequency (for ParallelGC and CMS)
– Identify the allocation rate (MB/second)
– Define the GC interval (seconds between GCs)
=> Set Eden = Allocation rate * GC interval
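One possible way to identify the allocation rate is to look at consecutive young GC events in the GC logs. The sketch below assumes the timestamps and Young Gen occupancies have already been extracted; the function name and the sample events are hypothetical and not taken from the talk's logs.

```python
def allocation_rate_mb_per_s(samples):
    """Estimate the allocation rate from consecutive young GC events.

    `samples` is a list of (timestamp_s, young_before_mb, young_after_mb)
    tuples, one per young GC, in chronological order. Between two GCs the
    application filled the Young Generation from the previous "after"
    value up to the current "before" value.
    """
    allocated = 0.0
    for previous, current in zip(samples, samples[1:]):
        allocated += current[1] - previous[2]
    elapsed = samples[-1][0] - samples[0][0]
    return allocated / elapsed

# Hypothetical events: (seconds since JVM start, YG before GC, YG after GC).
events = [(10.0, 810, 12), (14.0, 812, 10), (18.0, 810, 11)]
rate = allocation_rate_mb_per_s(events)
print(f"allocation rate ~ {rate:.0f} MB/s")
```

At least two events are needed, and averaging over many GCs smooths out bursts.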
36. Tuning for immutability
● Reduce YGC frequency (for ParallelGC and CMS)
– AR = 200 MB/s
– Desired interval = 1 YGC every 4 seconds
=> Set Eden to 800 MB (Young to 1 GB)
-Xmn1g
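The arithmetic behind these numbers can be written down as a short sketch. It assumes that the Young Generation is Eden plus two survivor spaces and that each survivor space is one eighth of Eden (a survivor ratio of 8); with adaptive sizing the real split may differ, and the function name is made up for this example.

```python
import math

def young_gen_size_mb(allocation_rate_mb_s, gc_interval_s, survivor_ratio=8):
    """Return (eden_mb, young_mb) for a target young GC interval.

    Eden must absorb everything allocated between two young GCs.
    The Young Generation is Eden plus two survivor spaces, each
    Eden / survivor_ratio in size (the ratio of 8 is an assumption).
    """
    eden = allocation_rate_mb_s * gc_interval_s
    young = eden + 2 * eden / survivor_ratio
    return eden, young

eden, young = young_gen_size_mb(200, 4)
# Round up to a whole number of gigabytes for the -Xmn flag.
xmn_gb = math.ceil(young / 1024)
print(f"Eden = {eden} MB, Young = {young:.0f} MB, flag: -Xmn{xmn_gb}g")
```

Under these assumptions 800 MB of Eden leads to a Young Generation of about 1 GB, which matches the flag on the slide.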
38. G1 GC time !
39. G1 GC
● Idea
– Split the heap into 2048 regions
– Assign each region to a memory pool on the fly
– Grow/shrink memory pools at runtime
http://www.infoq.com/articles/G1-One-Garbage-Collector-To-Rule-Them-All
40. G1 GC
● Memory pools :
– Young (Eden, Survivors)
– Old
– Humongous
● Humongous:
– Objects >= 50% of a region
http://www.infoq.com/articles/G1-One-Garbage-Collector-To-Rule-Them-All
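To get a feeling for what "humongous" means in practice, here is a rough sketch. It assumes that the region size is the heap size divided by 2048, rounded down to a power of two and kept between 1 MB and 32 MB. This is an approximation of what the JVM does, not its actual code, and both function names are invented for the example.

```python
MB = 1024 * 1024

def g1_region_size(heap_bytes, target_regions=2048):
    """Approximate G1 region size: a power of two between 1 MB and 32 MB."""
    size = MB
    # Double the region size while the heap still yields ~2048 regions.
    while size * 2 <= heap_bytes // target_regions and size < 32 * MB:
        size *= 2
    return size

def is_humongous(object_bytes, heap_bytes):
    """An object is humongous when it fills at least half a region."""
    return object_bytes >= g1_region_size(heap_bytes) // 2

heap = 100 * MB
print(g1_region_size(heap) // MB, "MB regions")
print(is_humongous(600 * 1024, heap))   # a 600 KB array on a 100 MB heap
```

On small heaps the threshold is surprisingly low: with 1 MB regions, any object of 512 KB or more is treated as humongous.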
41. G1 GC
● 1 ½ GC algorithm:
– Always collect Young Gen
– Collect Old Gen if possible
● Best regions only
● Time budget large enough
● Preconditions
● “mixed” collection
● G1 is self-tuning
http://www.infoq.com/articles/G1-One-Garbage-Collector-To-Rule-Them-All
42. G1 GC Tuning
● Define GC time budget
-XX:MaxGCPauseMillis=<N>
-XX:GCPauseIntervalMillis=<M>
● Set Xms == Xmx
● Drop all other GC-related flags
-Xmn, -XX:MaxTenuringThreshold, -XX:NewRatio
-XX:InitiatingHeapOccupancyPercent, …
● Don't try to outsmart the GC
43. G1 GC Tuning
● Enable GC logs
-Xloggc:gc.log
-XX:+PrintGCDetails
-XX:+PrintTenuringDistribution
-XX:+PrintGCCause
-XX:+PrintAdaptiveSizePolicy
● Wait and see
44. G1 GC Tuning – Low-hanging fruit
● Eliminate Humongous allocations
– Humongous regions collected only at Full GC
– Or when empty
[G1Ergonomics (Concurrent Cycles) request concurrent cycle
initiation, reason: occupancy higher than threshold, occupancy: 0
bytes, allocation request: 79012360 bytes, threshold: 47185920 bytes
(45.00 %), source: concurrent humongous allocation]
[G1Ergonomics (Concurrent Cycles) request concurrent cycle
initiation, reason: requested by GC cause, GC cause: G1 Humongous
Allocation]
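Spotting these entries by eye gets tedious on a long log. A minimal sketch, assuming the G1Ergonomics wording shown above (it may differ between JVM versions), could extract the size of each humongous allocation request. The function name `humongous_requests` is hypothetical.

```python
import re

ENTRY = re.compile(
    r"allocation request: (\d+) bytes, threshold: (\d+) bytes "
    r"\([\d.]+ %\), source: concurrent humongous allocation"
)

def humongous_requests(log_text):
    """Return (request_bytes, threshold_bytes) for each humongous entry."""
    # Entries may be wrapped over several lines, so collapse whitespace first.
    flat = " ".join(log_text.split())
    return [(int(request), int(threshold))
            for request, threshold in ENTRY.findall(flat)]

sample = """[G1Ergonomics (Concurrent Cycles) request concurrent cycle
initiation, reason: occupancy higher than threshold, occupancy: 0
bytes, allocation request: 79012360 bytes, threshold: 47185920 bytes
(45.00 %), source: concurrent humongous allocation]"""

requests = humongous_requests(sample)
for request, threshold in requests:
    print(f"humongous request of {request / 1_000_000:.0f} MB "
          f"(threshold {threshold / 1_000_000:.0f} MB)")
```

Here a single allocation of about 79 MB is larger than the occupancy threshold on its own, which is why a concurrent cycle is requested while the heap is still empty.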
45. G1 GC Tuning – Low-hanging fruit
● Eliminate Humongous allocations
– Humongous regions collected only at Full GC
– Or when empty
2013-10-21T19:23:48.758+0200:
[GC pause (G1 Humongous Allocation) (young) (initial-mark)
Desired survivor size 1572864 bytes, new threshold 15 (max 15)
, 0.0015120 secs]
46. G1 GC Tuning – Low-hanging fruit
● Eliminate Humongous allocations
– Track your big allocations
– Kill'em !
● Why ?
– Fragments the heap
– Can cause evacuation failures
47. G1 GC Tuning – Low-hanging fruit
● Get rid of “mixed collections”
– Increase heap size
– Set a higher threshold for mixed collections
-XX:InitiatingHeapOccupancyPercent=<N>
● Why ?
– Some phases of G1 are STW (like “baaaaad”)
– G1 goal : find the best candidates among all old regions
48. G1 GC Tuning – Low-hanging fruit
● Eliminate “Evacuation/Allocation failures”
– They are our good old Full GCs
[GC pause (G1 Evacuation Pause) (young)
//...
[Full GC (Allocation Failure)
5860M->2690M(7000M), 0.9824032 secs]
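Since these Full GCs are the ones to eliminate, it helps to know how many there are and how long they take. The following is a sketch only: it assumes Full GC entries look exactly like the excerpt above, and the function name `full_gcs` is invented for the example.

```python
import re

# "[Full GC (cause) <before>M-><after>M(<heap>M), <pause> secs]"
FULL_GC = re.compile(
    r"\[Full GC \(([^)]+)\)\s+(\d+)M->(\d+)M\((\d+)M\), ([\d.]+) secs\]"
)

def full_gcs(log_text):
    """Return one dict per Full GC entry found in the log text."""
    return [
        {"cause": cause, "before_mb": int(before), "after_mb": int(after),
         "heap_mb": int(heap), "pause_s": float(secs)}
        for cause, before, after, heap, secs in FULL_GC.findall(log_text)
    ]

sample = """[GC pause (G1 Evacuation Pause) (young)
//...
[Full GC (Allocation Failure)
5860M->2690M(7000M), 0.9824032 secs]"""

found = full_gcs(sample)
total_pause = sum(gc["pause_s"] for gc in found)
print(f"{len(found)} Full GC(s), {total_pause:.2f} s paused in total")
```

Young pauses are ignored on purpose, so any non-empty result is a sign that the heap or the allocation pattern needs attention.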
49. Summary
● Performance is fun !
● Understand what you do
● Immutability is not an issue (by itself)
– Bad code is.
● GC Duration ≈ O(Live data)
● G1 is self-tuning
– Try it :-)
50. Thank you for listening !
For more information :
http://www.pingtimeout.fr
@pingtimeout
pierre@pingtimeout.fr