2. AGENDA
o What is JIT
o Types - Client, Server, Tiered
o Main optimization approaches
o JIT tuning
o Conclusions
3. WHAT IS JIT
o Just In Time compiler
o Compilation is done during execution of a program (at run time) rather than prior to execution
o First introduced in 1960 for LISP
o Java, .NET, JS…
o Oracle HotSpot, IBM J9, Azul…
4. WHAT IS JIT
o JIT separates optimization from software development (SD): you get faster code by updating the JVM, without changing your code or tuning it for your platform
o JIT'ing requires Profiling
• Because you don't want to JIT everything
o Profiling allows better code-gen
• Inline what’s hot
• Loop unrolling, range-check elimination, etc
• Branch prediction, spill-code-gen, scheduling
6. JIT CLIENT (C1)
o Produces compilations quickly
o Generated code runs relatively slowly
7. HOTSPOT JIT SERVER (C2) WORKFLOW
[Diagram. Boxes: Java source, bytecode compiler, bytecode, run time, profiler (HotSpot info), JIT compiler (optimization), optimized native code, JIT compiler (deoptimization). Compilation trigger shown: 10K invocations.]
8. HOTSPOT JIT SERVER (C2)
o Produces compilations slowly (long warm-up)
o Generated code runs fast
o Profiler guided
o Speculative
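To make "profiler guided" and "speculative" concrete, here is a minimal sketch of a call site that C2 can speculate on. All class and method names are hypothetical. While only Square reaches the call, the profile is monomorphic and the JIT may inline area() behind a type guard; once Circle shows up, that assumption fails and the compiled code may be deoptimized and recompiled. Whether this actually happens depends on the JVM version and flags.

```java
public class SpeculationDemo {
    public interface Shape { double area(); }

    public static final class Square implements Shape {
        private final double side;
        public Square(double side) { this.side = side; }
        @Override public double area() { return side * side; }
    }

    public static final class Circle implements Shape {
        private final double radius;
        public Circle(double radius) { this.radius = radius; }
        @Override public double area() { return Math.PI * radius * radius; }
    }

    // Hot method with a single virtual call site: shape.area().
    public static double totalArea(Shape shape, int iterations) {
        double total = 0;
        for (int i = 0; i < iterations; i++) {
            total += shape.area();
        }
        return total;
    }

    public static void main(String[] args) {
        double squares = 0;
        // Phase 1: the profiler only ever sees Square at the call site.
        for (int round = 0; round < 20_000; round++) {
            squares += totalArea(new Square(2.0), 100);
        }
        // Phase 2: a new receiver type breaks the earlier assumption.
        double circles = 0;
        for (int round = 0; round < 20_000; round++) {
            circles += totalArea(new Circle(1.0), 100);
        }
        System.out.println(squares + " " + circles);
    }
}
```

Running it with -XX:+PrintCompilation should show totalArea being compiled, and possibly a "made not entrant" line when the speculation is invalidated.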
9. HOTSPOT JIT TIERED (C1 + C2)
o Available from Java 7
o Default in Java 8
o Best of C1 and C2 approaches
o Level 0 = Interpreter
o Levels 1-3 = C1
• Level 1: C1 without profiling
• Level 2: C1 with basic profiling (invocations)
• Level 3: C1 with full profiling (~35% overhead)
o Level 4 = C2
10. FLAGS FOR SELECTING THE JIT VERSION
o -client
o -server (-d64)
o -server (-d64) -XX:+TieredCompilation
11. DEFAULT JIT VERSION
Install bits      -client                   -server                   -d64
Linux 32-bit      32-bit client compiler    32-bit server compiler    Error
Linux 64-bit      64-bit server compiler    64-bit server compiler    64-bit server compiler
Mac OS X          64-bit server compiler    64-bit server compiler    64-bit server compiler
Windows 32-bit    32-bit client compiler    32-bit server compiler    Error
Windows 64-bit    64-bit server compiler    64-bit server compiler    64-bit server compiler
OS                                           Default compiler
Windows, 32-bit, any number of CPUs          -client
Windows, 64-bit, any number of CPUs          -server
MacOS, any number of CPUs                    -server
Linux/Solaris, 32-bit, 1 CPU                 -client
Linux/Solaris, 32-bit, 2 or more CPUs        -server
Linux, 64-bit, any number of CPUs            -server
* In Java 8 the server compiler is the default in all of these cases
Information about the default compiler:
% java -version
java version "1.7.0"
Java(TM) SE Runtime Environment (build 1.7.0-b147)
Java HotSpot(TM) Server VM (build 21.0-b17, mixed mode)
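The same information can also be read from inside a running program. The sketch below uses the standard java.lang.management API; the class and method names (CompilerInfo, describeCompiler) are made up for illustration, and the exact strings returned differ between JVM vendors and versions.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class CompilerInfo {
    // Returns "<VM name> / <JIT name>" for the JVM running this code.
    public static String describeCompiler() {
        String vmName = System.getProperty("java.vm.name");
        // getCompilationMXBean() returns null when the VM has no JIT compiler.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        String jitName = (jit == null) ? "none (interpreter only)" : jit.getName();
        return vmName + " / " + jitName;
    }

    public static void main(String[] args) {
        System.out.println(describeCompiler());
    }
}
```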
12. OPTIMIZATIONS IN HOTSPOT JVM
• compiler tactics
  – delayed compilation
  – tiered compilation
  – on-stack replacement
  – delayed reoptimization
  – program dependence graph representation
  – static single assignment representation
• proof-based techniques
  – exact type inference
  – memory value inference
  – memory value tracking
  – constant folding
  – reassociation
  – operator strength reduction
  – null check elimination
  – type test strength reduction
  – type test elimination
  – algebraic simplification
  – common subexpression elimination
  – integer range typing
• flow-sensitive rewrites
  – conditional constant propagation
  – dominating test detection
  – flow-carried type narrowing
  – dead code elimination
• language-specific techniques
  – class hierarchy analysis
  – devirtualization
  – symbolic constant propagation
  – autobox elimination
  – escape analysis
  – lock elision
  – lock fusion
  – de-reflection
• speculative (profile-based) techniques
  – optimistic nullness assertions
  – optimistic type assertions
  – optimistic type strengthening
  – optimistic array length strengthening
  – untaken branch pruning
  – optimistic N-morphic inlining
  – branch frequency prediction
  – call frequency prediction
• memory and placement transformation
  – expression hoisting
  – expression sinking
  – redundant store elimination
  – adjacent store fusion
  – card-mark elimination
  – merge-point splitting
• loop transformations
  – loop unrolling
  – loop peeling
  – safepoint elimination
  – iteration range splitting
  – range check elimination
  – loop vectorization
• global code shaping
  – inlining (graph integration)
  – global code motion
  – heat-based code layout
  – switch balancing
  – throw inlining
• control flow graph transformation
  – local code scheduling
  – local code bundling
  – delay slot filling
  – graph-coloring register allocation
  – linear scan register allocation
  – live range splitting
  – copy coalescing
  – constant splitting
  – copy removal
  – address mode matching
  – instruction peepholing
  – DFA-based code generator
13. INLINING – MOTHER OF OPTIMIZATION
o Frequency and size matter
* Uses JVM devirtualization if needed
Before:
    int addAll(int max) {
        int accum = 0;
        for (int i = 0; i < max; i++) {
            accum = add(accum, i);  // method call on every iteration
        }
        return accum;
    }
    int add(int a, int b) { return a + b; }
After:
    int addAll(int max) {
        int accum = 0;
        for (int i = 0; i < max; i++) {
            accum = accum + i;  // body of add() inlined at the call site
        }
        return accum;
    }
    int add(int a, int b) { return a + b; }
14. OSR – ON-STACK REPLACEMENT
o Running method never exits?
o But it's getting really hot?
o Generally means loops, back-branching
o Compile and replace while running
o Not typically useful in large systems
o Looks great on benchmarks!
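The situation described above can be sketched as a method that is entered once but spins in a loop for a long time. The class and method names are hypothetical. Because sumTo is invoked only once, its invocation counter never gets hot; only the loop's back-branch counter does, which is the case OSR exists for.

```java
public class OsrDemo {
    // A single long-running call: hot because of the loop, not because of call count.
    public static long sumTo(int max) {
        long accum = 0;
        for (int i = 0; i < max; i++) {
            accum += i;
        }
        return accum;
    }

    public static void main(String[] args) {
        // One invocation only; the JVM may swap in compiled code mid-loop.
        System.out.println(sumTo(100_000_000));
    }
}
```

With -XX:+PrintCompilation, OSR compilations are marked with a % sign in the output, so this is an easy way to check whether the loop was replaced on the stack in a given JVM.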
15. ESCAPE ANALYSIS
o What if an object is referenced only inside some loop and no other code can ever access it?
o The JVM needn't acquire a synchronization lock when calling methods on that object
o It needn't store the object's fields in memory; it can keep their values in registers
o Similarly, it can keep the object's references in registers
16. ESCAPE ANALYSIS
import java.math.BigInteger;
import java.util.ArrayList;

public class Factorial {
    private BigInteger factorial;
    private int n;

    public Factorial(int n) {
        this.n = n;
    }

    // Lazily computes the value; synchronized, so each call takes the object's lock
    public synchronized BigInteger getFactorial() {
        if (factorial == null) factorial = ...;
        return factorial;
    }
}

// Each Factorial instance is used only inside one loop iteration
ArrayList<BigInteger> list = new ArrayList<BigInteger>();
for (int i = 0; i < 100; i++) {
    Factorial factorial = new Factorial(i);
    list.add(factorial.getFactorial());
}
17. ESCAPE ANALYSIS (SIMPLE CASE)
o It needn't acquire a synchronization lock when calling the getFactorial() method.
o It needn't store the field n in memory; it can keep that value in a register.
o It can just keep track of the individual fields of the object.
o Sometimes it needn't execute the code at all.
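A smaller, self-contained variant of the same idea is sketched below, with hypothetical names. Each Point is created, used, and dropped within one loop iteration, so it never escapes the method. Under those conditions the JIT is allowed to skip the lock in sum() and keep x and y in registers instead of allocating an object; whether it does so in a particular run is up to the JVM.

```java
public class EscapeDemo {
    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // synchronized on purpose: the lock is a candidate for elision
        synchronized int sum() {
            return x + y;
        }
    }

    public static long sumPoints(int count) {
        long total = 0;
        for (int i = 0; i < count; i++) {
            // p is never stored in a field, passed elsewhere, or returned
            Point p = new Point(i, i + 1);
            total += p.sum();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumPoints(50_000_000));
    }
}
```

On HotSpot, comparing a run with -XX:-DoEscapeAnalysis against the default gives a rough feel for the effect, keeping in mind the usual caveats about microbenchmarks.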
18. JIT TUNING
(THESE MIGHT SAVE YOU)
o -client , -server or -XX:+TieredCompilation
o -XX:ReservedCodeCacheSize=, -XX:InitialCodeCacheSize=
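Before changing the code cache sizes it helps to see how much of the cache is in use. The sketch below lists the memory pools whose name contains "Code" through the standard java.lang.management API. The name filter is an assumption: HotSpot on Java 8 reports a pool called "Code Cache", newer versions may report several "CodeHeap ..." pools, and other JVMs may use different names. CodeCacheInfo and codeCachePools are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class CodeCacheInfo {
    // One line per code-related memory pool: name, bytes used, max bytes.
    public static List<String> codeCachePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            MemoryUsage usage = pool.getUsage(); // may be null for an invalid pool
            if (name.contains("Code") && usage != null) {
                lines.add(name + ": used=" + usage.getUsed() + " max=" + usage.getMax());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : codeCachePools()) {
            System.out.println(line);
        }
    }
}
```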
19. JIT TUNING
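o -XX:CompileThreshold=<n> - invocation count that triggers compilation
o -XX:CICompilerCount=<n> - number of compiler threads
o -XX:FreqInlineSize=<n> - maximum size of a hot method that may be inlined (default value 325 bytes of bytecode)
o -XX:MaxInlineSize=<n> - methods smaller than this are candidates for inlining even if they are not hot (default value 35 bytes of bytecode)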
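To check what these settings resolve to on a given JVM, the current values can be read at run time. This sketch relies on the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean, so it is an assumption that the code runs on a HotSpot-based JVM; JitFlags and flagValue are hypothetical names. FreqInlineSize is the HotSpot name of the hot-method inlining limit. Unknown flags are reported as "n/a" rather than failing.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class JitFlags {
    // Returns the current value of a VM option, or "n/a" if it cannot be read.
    public static String flagValue(String name) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (hotspot == null) {
            return "n/a"; // not a HotSpot-style JVM
        }
        try {
            return hotspot.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            return "n/a"; // this JVM build does not know the option
        }
    }

    public static void main(String[] args) {
        String[] names = {"CompileThreshold", "CICompilerCount", "FreqInlineSize", "MaxInlineSize"};
        for (String name : names) {
            System.out.println(name + " = " + flagValue(name));
        }
    }
}
```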
20. WANT TO GET MORE DETAILS?
(BE CAREFUL USING THEM IN PRODUCTION)
o -XX:+UnlockDiagnosticVMOptions
o -XX:+TraceClassLoading
o -XX:+LogCompilation
o -XX:+PrintAssembly
o -XX:+PrintCompilation - info about compiled methods
o -XX:+PrintInlining - info about inlining decisions
o -XX:CompileCommand=… - to control compilation policy
23. CONCLUSIONS
o KISS, SOLID, DRY, YAGNI: all the well-known principles help the JIT do its job
o Your code will be compiled, optimized, and sometimes deoptimized
o The JVM contains many different algorithms for doing this
o You need to reserve memory for compiled code (the code cache, a native memory area separate from the heap and from Metaspace/PermGen)
o To reach full speed, the JVM needs to warm up
o Microbenchmarks lie to you. All the time
24. WHAT WE DIDN’T TOUCH
o Deoptimization
o Specific benchmark for compilers
o Specific compiled code examples
o …