Inside the JVM
Follow the white rabbit!
Sylvain Wallez - @bluxte
Toulouse JUG - 2017-04-26
Who’s this guy?
Software engineer at Elastic (cloud team)
Previously:
● IoT tech lead at OVH
● CEO at Actoboard
● Backend architect at Sigfox
● CTO at Goojet/Scoop it
● Lead architect at Joost
● Member of the Apache Software Foundation
● Cofounder & CTO at Anyware Technologies (now part of Sierra Wireless)
Agenda
● How it started: let’s optimize 6 lines of (hot) code!
● Profiling memory usage
● What’s in a class file?
● Micro-benchmarking with JMH
● Exploration of the OpenJDK source code
How it started
Let’s optimize 6 lines of (hot) code
On the CouchBase blog...
“JVM Profiling - Lessons from the trenches”: optimize the conversion of a protocol
error code into a readable message.
public enum KeyValueStatus {
    UNKNOWN((short) -1, "Unknown code"),
    SUCCESS((short) 0x00, "The operation completed successfully"),
    ERR_NOT_FOUND((short) 0x01, "The key does not exists"),
    ERR_EXISTS((short) 0x02, "The key exists in the cluster"),
    ERR_TOO_BIG((short) 0x03, "The document exceeds the maximum size"),
    ERR_INVALID((short) 0x04, "Invalid request"),
    ERR_NOT_STORED((short) 0x05, "The document was not stored"),
    ...

    private final short code;
    private final String description;

    KeyValueStatus(short code, String description) {
        this.code = code;
        this.description = description;
    }

    public static KeyValueStatus valueOf(final short code) {
        for (KeyValueStatus value: values()) {
            if (value.code() == code) return value;
        }
        return UNKNOWN;
    }
    ...
}
On the CouchBase blog...
Finding: values() is allocating memory
public static KeyValueStatus valueOf(final short code) {
    for (KeyValueStatus value: values()) {
        if (value.code() == code) return value;
    }
    return UNKNOWN;
}
Optimization: fast path on common values
public static KeyValueStatus valueOf(final short code) {
    if (code == SUCCESS.code) {
        return SUCCESS;
    } else if (code == ERR_NOT_FOUND.code) {
        return ERR_NOT_FOUND;
    } else if (code == ERR_EXISTS.code) {
        return ERR_EXISTS;
    } else if (code == ERR_NOT_MY_VBUCKET.code) {
        return ERR_NOT_MY_VBUCKET;
    }
    for (KeyValueStatus value : values()) {
        if (value.code() == code) {
            return value;
        }
    }
    return UNKNOWN;
}
If something goes wrong, it’ll make it worse: uncommon error codes still fall through to the allocating values() loop!
xkcd #386
Oh well,
blog post
Profiling memory usage
Various kinds of memory optimization
● Memory usage / memory leaks
○ My application needs tons of heap
○ How many objects are held active?
→ Memory profiler / jmap
● Garbage collection pressure
○ My application spends a lot of time in the GC
○ How often are objects allocated?
→ Java Mission Control / jmap
jmap histograms
jmap -histo
num #instances #bytes class name
1: 4217124 674740720 [Lnet.bluxte.experiments.couchbase_keyvalue.KeyValueStatus;
2: 486 14947912 [I
3: 5855 493864 [C
4: 1461 166752 java.lang.Class
5: 5848 140352 java.lang.String
6: 503 136440 [B
7: 968 62480 [Ljava.lang.Object;
8: 1255 40160 java.util.HashMap$Node
9: 991 39640 java.util.LinkedHashMap$Entry
10: 258 30720 [Ljava.util.HashMap$Node;
11: 259 22792 java.lang.reflect.Method
12: 441 20952 [Ljava.lang.String;
13: 229 16488 java.lang.reflect.Field
14: 171 9576 java.util.LinkedHashMap
15: 291 9312 java.util.concurrent.ConcurrentHashMap$Node
16: 160 7680 java.util.HashMap
17: 178 7120 java.lang.ref.SoftReference
18: 89 7120
code available on GitHub
jmap histograms
jmap -histo:live – perform a full GC first
num #instances #bytes class name
1: 5855 493864 [C
2: 1461 166752 java.lang.Class
3: 5848 140352 java.lang.String
4: 503 136440 [B
5: 967 62456 [Ljava.lang.Object;
6: 1255 40160 java.util.HashMap$Node
7: 991 39640 java.util.LinkedHashMap$Entry
8: 258 30720 [Ljava.util.HashMap$Node;
9: 259 22792 java.lang.reflect.Method
10: 441 20952 [Ljava.lang.String;
11: 283 19272 [I
12: 229 16488 java.lang.reflect.Field
51: 35 1400
52: 3 1360 [Lnet.bluxte.experiments.couchbase_keyvalue.KeyValueStatus;
53: 55 1320$Entry
54: 29 1304 [Ljava.lang.reflect.Field;
55: 36 1152 net.bluxte.experiments.couchbase_keyvalue.KeyValueStatus
Java Mission Control / Java Flight Recorder
Lightweight monitoring agent
● Integrated into the (Oracle) JVM
● Very low overhead
Continuously samples diagnostics data
● Thread activity
● GC activity
● Memory allocations
Java Mission Control / Java Flight Recorder
Available only with Oracle JDK
● Free for development
● Commercial for use in production
How to enable it?
● at launch time: java -XX:+UnlockCommercialFeatures
● after launch: jcmd <pid> VM.unlock_commercial_features
Original code
Simple loop on the enum values
public static KeyValueStatus valueOf(final short code) {
    for (KeyValueStatus value: values()) {
        if (value.code() == code) return value;
    }
    return UNKNOWN;
}
Original code - Memory stats
Looks good! No leak!
Hmm… growing fast!
Original code - Allocations
Original code - GC activity
Iteration on constant array
Still trivial, but reuse the values array
private static final KeyValueStatus[] VALUES = values();
public static KeyValueStatus valueOf(final short code) {
    for (KeyValueStatus value: VALUES) {
        if (value.code() == code) return value;
    }
    return UNKNOWN;
}
Constant array - Allocations
Constant array - GC activity
GC pressure collateral damages
Full GC clears weak references
→ clears some caches
→ additional load to repopulate them!
Enum.values() ?
Exploring the bytecode
Enum.values() – a generated method
The compiler automatically adds
some special methods when it
creates an enum. For example, they
have a static values method that
returns an array containing all of the
values of the enum in the order they
are declared.
– The Java Tutorial
/**
 * Returns an array containing the constants of this enum
 * type, in the order they're declared. This method may be
 * used to iterate over the constants as follows:
 *
 *   for(E c : E.values())
 *       System.out.println(c);
 *
 * @return an array containing the constants of this enum
 * type, in the order they're declared
 */
public static E[] values();
– The Java Language Specification
Show me the (byte)code!
public class SimpleMain {
    public static void main(String[] args) {
        System.out.println("Hello world!");
    }
}
javap -c SimpleMain.class
or IntelliJ’s bytecode plugin
public class net.bluxte.experiments.talk.SimpleMain {
  public net.bluxte.experiments.talk.SimpleMain();   // Default constructor
    Code:
       0: aload_0
       1: invokespecial #1 // Method java/lang/Object."<init>":()V
       4: return
  public static void main(java.lang.String[]);
    Code:
       0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc #3 // String Hello world!
       5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
       8: return
}
Show me the (byte)code!
public class SimpleMain {
    static String hello = "Hello";
    static String world = "world";
    public static void main(String[] args) {
        System.out.println(hello + " " + world);
    }
}
public class net.bluxte.experiments.talk.SimpleMain {
  static java.lang.String hello;
  static java.lang.String world;
  public net.bluxte.experiments.talk.SimpleMain();
    Code:
       0: aload_0
       1: invokespecial #1 // Method java/lang/Object."<init>":()V
       4: return
  public static void main(java.lang.String[]);   // String concat with StringBuilder
    Code:
       0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
       3: new #3 // class java/lang/StringBuilder
       6: dup
       7: invokespecial #4 // Method java/lang/StringBuilder."<init>":()V
      10: getstatic #5 // Field hello:Ljava/lang/String;
      13: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      16: ldc #7 // String " "
      18: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      21: getstatic #8 // Field world:Ljava/lang/String;
      24: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      27: invokevirtual #9 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      30: invokevirtual #10 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      33: return
  static {};   // Static initializer
    Code:
       0: ldc #11 // String Hello
       2: putstatic #5 // Field hello:Ljava/lang/String;
       5: ldc #12 // String world
       7: putstatic #8 // Field world:Ljava/lang/String;
      10: return
}
Show me the (byte)code!
public enum SimpleEnum {
    FIRST_ENUM, SECOND_ENUM
}
...
  public static net.bluxte.experiments.talk.SimpleEnum[] values();
    Code:
       0: getstatic #1 // Field $VALUES:[Lnet/bluxte/experiments/talk/SimpleEnum;
       3: invokevirtual #2 // Method "[Lnet/bluxte/experiments/talk/SimpleEnum;".clone:()Ljava/lang/Object;
       6: checkcast #3 // class "[Lnet/bluxte/experiments/talk/SimpleEnum;"
       9: areturn
  public static net.bluxte.experiments.talk.SimpleEnum valueOf(java.lang.String);
    Code:
       0: ldc #4 // class net/bluxte/experiments/talk/SimpleEnum
       2: aload_0
       3: invokestatic #5 // Method java/lang/Enum.valueOf:(Ljava/lang/Class;Ljava/lang/String;)Ljava/lang/Enum;
       6: checkcast #4 // class net/bluxte/experiments/talk/SimpleEnum
       9: areturn
...
Aha! We found the culprit: values() clones the $VALUES array on every call!
But why the clone?
Java arrays are mutable
The caller can mess with it, which would break other users
→ Perform a defensive copy every time
How could it be prevented?
Return an immutable List, but probably too high level here
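A minimal sketch of the defensive copy and of the read-only List alternative. The Color enum and its ALL field are hypothetical names used only for illustration; they are not part of the talk's code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class EnumValuesDemo {
    // Hypothetical enum, standing in for SimpleEnum from the slides.
    enum Color {
        RED, GREEN, BLUE;

        // Assumed alternative: one shared, read-only view instead of a fresh array per call.
        static final List<Color> ALL =
            Collections.unmodifiableList(Arrays.asList(values()));
    }

    // values() hands out a new array each time, so two calls never share storage.
    static boolean returnsDistinctArrays() {
        return Color.values() != Color.values();
    }

    // Writing into one copy leaves later callers unaffected.
    static Color firstAfterTampering() {
        Color[] copy = Color.values();
        copy[0] = Color.BLUE;
        return Color.values()[0];
    }

    // The shared list refuses writes, so it needs no copy at all.
    static boolean listRejectsWrites() {
        try {
            Color.ALL.set(0, Color.BLUE);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(returnsDistinctArrays());
        System.out.println(firstAfterTampering());
        System.out.println(listRejectsWrites());
    }
}
```

The list is built once, so iterating over it allocates no array, at the cost of an Iterator object per for-each loop unless the JIT removes it.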
More on the bytecode
A class file is composed of:
● constant pool: strings, fields/methods name+type, class names, etc.
● fields and methods definitions and code
○ Access flags and attributes
○ Code
○ Line number table
○ Local variable table (type and name)
○ Exception table
But wait…
...why would I want to know about this?
● Better understand low level diagnostics
● Check generated code
○ Java: enum values (!), for loops, etc
○ Scala, Kotlin: implementation of higher level constructs
○ Hibernate & co: how do they mangle your code?
● Grasping low level stuff allows writing better high-level code
#1 = Methodref #6.#20 // java/lang/Object."<init>":()V
#2 = Fieldref #21.#22 // java/lang/System.out:Ljava/io/PrintStream;
#3 = String #23 // Hello world
#4 = Methodref #24.#25 // java/io/PrintStream.println:(Ljava/lang/String;)V
#5 = Class #26 // net/bluxte/experiments/talk/SimpleMain
#6 = Class #27 // java/lang/Object
#7 = Utf8 <init>
#8 = Utf8 ()V
#9 = Utf8 Code
#10 = Utf8 LineNumberTable
#11 = Utf8 LocalVariableTable
#12 = Utf8 this
#13 = Utf8 Lnet/bluxte/experiments/talk/SimpleMain;
#14 = Utf8 main
#15 = Utf8 ([Ljava/lang/String;)V
#16 = Utf8 args
#17 = Utf8 [Ljava/lang/String;
#18 = Utf8 SourceFile
#19 = Utf8
#20 = NameAndType #7:#8 // "<init>":()V
#21 = Class #28 // java/lang/System
#22 = NameAndType #29:#30 // out:Ljava/io/PrintStream;
#23 = Utf8 Hello world
#24 = Class #31 // java/io/PrintStream
#25 = NameAndType #32:#33 // println:(Ljava/lang/String;)V
#26 = Utf8 net/bluxte/experiments/talk/SimpleMain
#27 = Utf8 java/lang/Object
Constant pool for SimpleMain
public class net.bluxte.experiments.talk.SimpleMain {
  public net.bluxte.experiments.talk.SimpleMain();
    Code:
       0: aload_0
       1: invokespecial #1 // Method java/lang/Object."<init>":()V
       4: return
  public static void main(java.lang.String[]);
    Code:
       0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc #3 // String Hello world!
       5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
       8: return
}
Type encoding
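What is (Ljava/lang/String;)V ???
(parameter types)return type
L<class path>; → class name
I, J, S, B, C → int, long, short, byte, char
F, D → float, double
Z → boolean
V → void
[ → array of the type that follows
public String foo(int a, char[] b, List<Integer> c, boolean d)
(I[CLjava/util/List;Z)Ljava/lang/String;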
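The encoding of the foo signature above can be checked against the JDK itself. This is a small sketch using java.lang.invoke.MethodType; the class name DescriptorDemo and the helper fooDescriptor are made up for the example.

```java
import java.lang.invoke.MethodType;
import java.util.List;

public class DescriptorDemo {
    // Builds the JVM descriptor for: String foo(int, char[], List<Integer>, boolean)
    static String fooDescriptor() {
        MethodType type = MethodType.methodType(
            String.class,                                        // return type
            int.class, char[].class, List.class, boolean.class); // parameter types
        return type.toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Generic type arguments are erased: List<Integer> is encoded as plain List.
        System.out.println(fooDescriptor());
    }
}
```

Note that parameter names never appear in a descriptor; they only live in the optional local variable table.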
The bytecode “language”
Stack-based machine
● Easier to target a large variety of CPUs
(Android/Dalvik is register based)
Object-oriented assembler
● Method calls (static / virtual / interface / special)
Controlled memory access
● Local variables
● Object fields
The bytecode “language”
Very simple instruction set: about 200 instructions
Instruction groups:
● Load and store
● Arithmetic and logic
● Type conversion
● Object creation and manipulation
● Operand stack management
● Control transfer
● Method invocation and return
Only addition since 1996: invokedynamic in Java 7
The bytecode “language”
public static void main(String[] args) throws InterruptedException {
    long start = System.nanoTime();
    while(System.nanoTime() - start < MAX_NANOS) {
        for (int i = 0; i < 1_000_000; i++) {
            resolved = resolve((short)rnd.nextInt(0x100));
        }
        Thread.sleep(100);
    }
}
0: invokestatic #2 // Method java/lang/System.nanoTime:()J
3: lstore_1
4: invokestatic #2 // Method java/lang/System.nanoTime:()J
7: lload_1
8: lsub
9: getstatic #3 // Field MAX_NANOS:J
12: lcmp
13: ifge 55
16: iconst_0
17: istore_3
18: iload_3
19: ldc #4 // int 1000000
21: if_icmpge 46
24: getstatic #5 // Field rnd:Ljava/util/Random;
27: sipush 256
30: invokevirtual #6 // Method java/util/Random.nextInt:(I)I
33: i2s
34: invokestatic #7 // Method resolve:(S)Lnet/bluxte/experiments/couchbase_keyvalue/KeyValueStatus;
37: putstatic #8 // Field resolved:Lnet/bluxte/experiments/couchbase_keyvalue/KeyValueStatus;
40: iinc 3, 1
43: goto 18
46: ldc2_w #9 // long 100l
49: invokestatic #11 // Method java/lang/Thread.sleep:(J)V
52: goto 4
55: return
LocalVariableTable:
Start Length Slot Name Signature
18 28 3 i I
0 56 0 args [Ljava/lang/String;
4 52 1 start J
Benchmarking with JMH
(Back to good old Java)
Improving our solution
We fixed the memory issue but it’s clearly non optimal
Let’s benchmark it!
private static final KeyValueStatus[] VALUES = values();
public static KeyValueStatus valueOf(final short code) {
    for (KeyValueStatus value: VALUES) {
        if (value.code() == code) return value;
    }
    return UNKNOWN;
}
O(n) on constant data!
JMH: an OpenJDK project
● Provides drivers and guidance for writing tests
● Takes care of pre-warming the JVM, collecting results and computing stats
● Provides a Maven artifact type for benchmarking projects
“JMH is a Java harness for building, running,
and analysing nano/micro/milli/macro
benchmarks written in Java and other
languages targetting the JVM.”
Benchmark code
// The JMH annotations below are assumed: they are missing from the slide text
@State(Scope.Benchmark)
public class ValueOfBenchmark {
    @Param({
        "0",    // 0x00, Success
        "1",    // 0x01, Not Found
        "134",  // 0x86 Temporary Failure
        "255",  // undefined
        "1024"  // undefined, out of bounds
    })
    public short code;

    @Benchmark
    public KeyValueStatus loopNoFastPath() {
        return KeyValueStatus.valueOfLoop(code);
    }

    @Benchmark
    public KeyValueStatus loopFastPath() {
        return KeyValueStatus.valueOf(code);
    }
}
mvn clean install
java -jar target/benchmarks.jar
# VM invoker:
# VM options: <none>
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: net.bluxte.experiments.couchbase_keyvalue.
# Parameters: (code = 0)
# Run progress: 0,00% complete, ETA 04:53:20
# Fork: 1 of 10
# Warmup Iteration 1: 152063982,769 ops/s
# Warmup Iteration 2: 149808416,787 ops/s
# Warmup Iteration 3: 210436722,740 ops/s
# Warmup Iteration 4: 202906403,960 ops/s
# Warmup Iteration 5: 204518647,481 ops/s
# Warmup Iteration 6: 209602101,373 ops/s
# Warmup Iteration 7: 204717066,594 ops/s
# Warmup Iteration 8: 209156212,425 ops/s
# Warmup Iteration 9: 215544157,049 ops/s
# Warmup Iteration 10: 213919676,979 ops/s
# Warmup Iteration 11: 211316588,650 ops/s
Benchmark-driven optimization
public static KeyValueStatus valueOfLoop(final short code) {
    for (KeyValueStatus value: values()) {
        if (value.code() == code) return value;
    }
    return UNKNOWN;
}
Benchmark (code) Mode Samples Score Score error Units
loopNoFastPath 0 avgt 10 19.383 0.331 ns/op
loopNoFastPath 1 avgt 10 19.243 0.376 ns/op
loopNoFastPath 134 avgt 10 24.855 0.651 ns/op
loopNoFastPath 255 avgt 10 30.587 0.833 ns/op
loopNoFastPath 1024 avgt 10 30.619 1.209 ns/op
Time grows linearly with the constant's position in the enum,
even for out-of-bounds values
Initial implementation
Benchmark-driven optimization
private static final KeyValueStatus[] VALUES = values();
public static KeyValueStatus valueOf(short code) {
    for (KeyValueStatus value: VALUES) {
        if (value.code() == code) return value;
    }
    return UNKNOWN;
}
Benchmark (code) Mode Samples Score Score error Units
loopOnConstantArray 0 avgt 10 2.975 0.086 ns/op
loopOnConstantArray 1 avgt 10 3.035 0.080 ns/op
loopOnConstantArray 134 avgt 10 10.215 0.269 ns/op
loopOnConstantArray 255 avgt 10 16.856 0.679 ns/op
loopOnConstantArray 1024 avgt 10 17.015 0.577 ns/op
Still linear, removed ~15 ns
allocation overhead
Reuse the constant array
Benchmark-driven optimization
private static final Map<Short, KeyValueStatus> code2statusMap =
    new HashMap<>();
static {
    for (KeyValueStatus value: values()) {
        code2statusMap.put(value.code(), value);
    }
}
public static KeyValueStatus valueOf(final short code) {
    return code2statusMap.getOrDefault(code, UNKNOWN);
}
Benchmark (code) Mode Samples Score Score error Units
lookupMap 0 avgt 10 4.954 0.134 ns/op
lookupMap 1 avgt 10 4.036 0.125 ns/op
lookupMap 134 avgt 10 5.597 0.157 ns/op
lookupMap 255 avgt 10 4.006 0.144 ns/op
lookupMap 1024 avgt 10 6.752 0.228 ns/op
More or less constant
Worse on small values
Way better on larger values
Prepare a hashmap,
then simple lookup
Oh wait… autoboxing!
public static net.bluxte.experiments.couchbase_keyvalue.KeyValueStatus valueOfLookupMap(short);
0: getstatic #17 // Field code2statusMap:Ljava/util/HashMap;
3: iload_0
4: invokestatic #18 // Method java/lang/Short.valueOf:(S)Ljava/lang/Short;
7: getstatic #15 // Field UNKNOWN:Lnet/bluxte/experiments/couchbase_keyvalue/KeyValueStatus;
10: invokevirtual #19 // Method java/util/HashMap.getOrDefault:
13: checkcast #4 // class net/bluxte/experiments/couchbase_keyvalue/KeyValueStatus
16: areturn
public static KeyValueStatus valueOf(final short code) {
    return code2statusMap.getOrDefault(code, UNKNOWN);
}
Oh wait… autoboxing!
Using Carrot HPPC (high performance primitive collections) avoids this
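The hidden boxing can be made visible in plain Java. In this sketch the class BoxingDemo, the NAMES map and the helper methods are illustrative names, not the talk's code; only Short.valueOf and HashMap.getOrDefault come from the JDK.

```java
import java.util.HashMap;
import java.util.Map;

public class BoxingDemo {
    // Hypothetical stand-in for code2statusMap, with String values for brevity.
    static final Map<Short, String> NAMES = new HashMap<>();
    static {
        NAMES.put((short) 0, "SUCCESS");
        NAMES.put((short) 1, "ERR_NOT_FOUND");
    }

    // What the source says: the primitive is boxed implicitly by the compiler.
    static String implicitBoxing(short code) {
        return NAMES.getOrDefault(code, "UNKNOWN");
    }

    // What the bytecode does: an explicit Short.valueOf call before the lookup.
    static String explicitBoxing(short code) {
        return NAMES.getOrDefault(Short.valueOf(code), "UNKNOWN");
    }

    // Short.valueOf is documented to cache -128..127, so small codes reuse one instance.
    static boolean smallCodesAreCached() {
        return Short.valueOf((short) 1) == Short.valueOf((short) 1);
    }

    public static void main(String[] args) {
        System.out.println(implicitBoxing((short) 1));
        System.out.println(explicitBoxing((short) 1024));
        System.out.println(smallCodesAreCached());
    }
}
```

Codes outside the cached range may allocate a new Short on every lookup, which could be one reason the 1024 case measures slowest in the table above; that link is a guess, not something the benchmark proves.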
Benchmarking variations
private static final KeyValueStatus[] code2status =
    new KeyValueStatus[0x100];
static {
    Arrays.fill(code2status, UNKNOWN);
    for (KeyValueStatus keyValueStatus : values()) {
        if (keyValueStatus != UNKNOWN) {
            code2status[keyValueStatus.code()] = keyValueStatus;
        }
    }
}
public static KeyValueStatus valueOfLookupArray(short code) {
    if (code >= 0 && code < code2status.length) {
        return code2status[code];
    } else {
        return UNKNOWN;
    }
}
Benchmark (code) Mode Samples Score Score error Units
lookupArray 0 avgt 10 3.061 0.126 ns/op
lookupArray 1 avgt 10 3.048 0.127 ns/op
lookupArray 134 avgt 10 3.070 0.084 ns/op
lookupArray 255 avgt 10 3.035 0.113 ns/op
lookupArray 1024 avgt 10 3.034 0.113 ns/op
Constant fast time
No GC overhead
Prepare a lookup array,
then simple lookup
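Before comparing timings, it helps to confirm that the variants return the same result for every possible short. The sketch below does that on a hypothetical, trimmed-down Status enum (the constant ERR_TEMP_FAIL and all method names are assumptions for the example, not the Couchbase code).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class LookupVariants {
    // Hypothetical, shortened stand-in for KeyValueStatus.
    enum Status {
        UNKNOWN((short) -1), SUCCESS((short) 0x00),
        ERR_NOT_FOUND((short) 0x01), ERR_TEMP_FAIL((short) 0x86);

        private final short code;
        Status(short code) { this.code = code; }

        private static final Status[] VALUES = values();            // cloned once
        private static final Map<Short, Status> BY_CODE = new HashMap<>();
        private static final Status[] TABLE = new Status[0x100];
        static {
            Arrays.fill(TABLE, UNKNOWN);
            for (Status s : VALUES) {
                BY_CODE.put(s.code, s);
                if (s != UNKNOWN) TABLE[s.code] = s;
            }
        }

        // Linear scan over the constant array.
        static Status byLoop(short code) {
            for (Status s : VALUES) if (s.code == code) return s;
            return UNKNOWN;
        }

        // Hash lookup; boxes the code into a Short.
        static Status byMap(short code) {
            return BY_CODE.getOrDefault(code, UNKNOWN);
        }

        // Direct index with a bounds check; no boxing, no scan.
        static Status byTable(short code) {
            return (code >= 0 && code < TABLE.length) ? TABLE[code] : UNKNOWN;
        }
    }

    // Exhaustively compares the three strategies over the whole short range.
    static boolean allAgree() {
        for (int c = Short.MIN_VALUE; c <= Short.MAX_VALUE; c++) {
            short code = (short) c;
            Status expected = Status.byLoop(code);
            if (Status.byMap(code) != expected || Status.byTable(code) != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allAgree());
    }
}
```

The input space is only 65536 values, so an exhaustive check is cheap and guards against a faster variant that is subtly wrong on negative or out-of-range codes.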
Dangers of JMH
Benchmark-driven iterations
● Can drive you to partial incremental improvements
● Take a step back, think outside of the box
Optimizing for the sake of optimizing
● Time consuming
● No real effect if not on “hot” code
Diving into OpenJDK
(This gets scary!)
The VM does a lot of things
C1 “client”
C2 “server”
Garbage collector
Finding your way in OpenJDK
Main website
Get the code:
hg clone
hg clone
still alive!
garbage collectors
bytecode interpreter
server compiler (c2 / opto)
client compiler (c1)
LLVM-based JIT
OS and/or CPU specific code
root of shared code
CPU-independent target (works
with shark JIT)
Additional support in JDK9:
● ARM 32 & 64 bits
● PowerPC
● S390
Intrinsic methods
What you see is not what you get
● The JVM “intercepts” some method calls
○ String / StringBuffer methods, Math, Unsafe, array manipulation, etc.
● Replaced inline with native (assembly) code
○ Extremely fast and optimized
○ Not even JNI overhead
● Find them in hotspot/src/share/vm/classfile/vmSymbols.hpp
Intrinsic methods
// IndexOf for constant substrings with size >= 8 chars
// which don't need to be loaded through stack.
void MacroAssembler::string_indexofC8(Register str1, Register str2,
Register cnt1, Register cnt2,
int int_cnt2, Register result,
XMMRegister vec, Register tmp) {
ShortBranchVerifier sbv(this);
assert(UseSSE42Intrinsics, "SSE4.2 is required");
// This method uses pcmpestri inxtruction with bound registers
// inputs:
// xmm - substring
// rax - substring length (elements count)
// mem - scanned string
// rdx - string length (elements count)
// 0xd - mode: 1100 (substring search) + 01 (unsigned shorts)
// outputs:
// rcx - matched index in string
assert(cnt1 == rdx && cnt2 == rax && tmp == rcx, "pcmpestri");
Example: String.indexOf on x86
In the JDK 9 beta, String.indexOf(String) is faster than String.indexOf(char)!
This is because indexOf(String) is an intrinsic, while indexOf(char) is not yet one
Intrinsic methods
Benchmark Mode Cnt Score Error Units
# JDK 8u121
IndexOfBenchmark.StringIndexOfChar thrpt 5 141857.332 ± 5530.472 ops/s
IndexOfBenchmark.StringIndexOfString thrpt 5 113091.517 ± 2241.533 ops/s
# JDK 9b152
IndexOfBenchmark.StringIndexOfChar thrpt 5 154525.343 ± 3796.818 ops/s
IndexOfBenchmark.StringIndexOfString thrpt 5 185917.059 ± 3391.230 ops/s
(from the jdk9-dev mailing-list)
Intrinsic methods
● “I can do it better than JDK source” – think twice!
→ Have a look at vmSymbols.hpp first!
● Can sometimes be indirect (esp with strings and arrays)
● When in doubt, benchmark (with the same JVM)
● Know your tools
● Be curious, and follow the white rabbit from time to time, you’ll learn a lot
● However… don’t go overboard and waste (too much) time!
Bonus links to dive deeper
Java MissionControl & FlightRecorder docs
What the JIT!? Anatomy of the OpenJDK HotSpot VM
Intrinsic Methods in HotSpot VM
Zero and Shark (LLVM JIT)

Inside the JVM - Follow the white rabbit!

  • 1. Inside the JVM Follow the white rabbit! Sylvain Wallez - @bluxte Toulouse JUG - 2017-04-26
  • 2. Who’s this guy? Software engineer at Elastic (cloud team) Previously: ● IoT tech lead at OVH ● CEO at Actoboard ● Backend architect at Sigfox ● CTO at Goojet/Scoop it ● Lead architect at Joost ● Member of the Apache Software Foundation ● Cofounder & CTO at Anyware Technologies (now part of Sierra Wireless)
  • 3. Agenda ● How it started: let’s optimize 6 lines of (hot) code! ● Profiling memory usage ● What’s in a class file? ● Micro-benchmarking with JMH ● Exploration of the OpenJDK source code
  • 4. How it started Let’s optimize 6 lines of (hot) code
  • 5. On the CouchBase blog... “JVM Profiling - Lessons from the trenches”: optimize the conversion of a protocol error code into a readable message. ... private final short code; private final String description; KeyValueStatus(short code, String description) { this.code = code; this.description = description; } public static KeyValueStatus valueOf(final short code) { for (KeyValueStatus value: values()) { if (value.code() == code) return value; } return UNKNOWN; } public enum KeyValueStatus { UNKNOWN((short) -1, "Unknown code"), SUCCESS((short) 0x00, "The operation completed successfully"), ERR_NOT_FOUND((short) 0x01, "The key does not exists"), ERR_EXISTS((short) 0x02, "The key exists in the cluster"), ERR_TOO_BIG((short) 0x03, "The document exceeds the maximum size"), ERR_INVALID((short) 0x04, "Invalid request"), ERR_NOT_STORED((short) 0x05, "The document was not stored"), ...
  • 6. On the CouchBase blog... Finding: values() is allocating memory public static KeyValueStatus valueOf(final short code) { for (KeyValueStatus value: values()) { if (value.code() == code) return value; } return UNKNOWN; } public static KeyValueStatus valueOf(final short code) { if (code == SUCCESS.code) { return SUCCESS; } else if (code == ERR_NOT_FOUND.code) { return ERR_NOT_FOUND; } else if (code == ERR_EXISTS.code) { return ERR_EXISTS; } else if (code == ERR_NOT_MY_VBUCKET.code) { return ERR_NOT_MY_VBUCKET; } for (KeyValueStatus value : values()) { if (value.code() == code) { return value; } } return UNKNOWN; } Optimization: fast path on common values If something goes wrong, it’ll make it worse!
  • 10. Various kinds of memory optimization ● Memory usage / memory leaks ○ My application needs tons of heap ○ How many objects are held active? → Memory profiler / jmap ● Garbage collection pressure ○ My application spends a lot of time in the GC ○ How often are objects allocated? → Java Mission Control / jmap
  • 11. jmap histograms jmap -histo num #instances #bytes class name ---------------------------------------------- 1: 4217124 674740720 [Lnet.bluxte.experiments.couchbase_keyvalue.KeyValueStatus; 2: 486 14947912 [I 3: 5855 493864 [C 4: 1461 166752 java.lang.Class 5: 5848 140352 java.lang.String 6: 503 136440 [B 7: 968 62480 [Ljava.lang.Object; 8: 1255 40160 java.util.HashMap$Node 9: 991 39640 java.util.LinkedHashMap$Entry 10: 258 30720 [Ljava.util.HashMap$Node; 11: 259 22792 java.lang.reflect.Method 12: 441 20952 [Ljava.lang.String; 13: 229 16488 java.lang.reflect.Field 14: 171 9576 java.util.LinkedHashMap 15: 291 9312 java.util.concurrent.ConcurrentHashMap$Node 16: 160 7680 java.util.HashMap 17: 178 7120 java.lang.ref.SoftReference 18: 89 7120 code available on GitHub
  • 12. jmap histograms jmap -histo:live – perform a full GC first num #instances #bytes class name ---------------------------------------------- 1: 5855 493864 [C 2: 1461 166752 java.lang.Class 3: 5848 140352 java.lang.String 4: 503 136440 [B 5: 967 62456 [Ljava.lang.Object; 6: 1255 40160 java.util.HashMap$Node 7: 991 39640 java.util.LinkedHashMap$Entry 8: 258 30720 [Ljava.util.HashMap$Node; 9: 259 22792 java.lang.reflect.Method 10: 441 20952 [Ljava.lang.String; 11: 283 19272 [I 12: 229 16488 java.lang.reflect.Field .................... 51: 35 1400 52: 3 1360 [Lnet.bluxte.experiments.couchbase_keyvalue.KeyValueStatus; 53: 55 1320$Entry 54: 29 1304 [Ljava.lang.reflect.Field; 55: 36 1152 net.bluxte.experiments.couchbase_keyvalue.KeyValueStatus
  • 13. Java Mission Control / Java Flight Recorder Lightweight monitoring agent ● Integrated into the (Oracle) JVM ● Very low overhead Continuously samples diagnostics data ● Thread activity ● GC activity ● Memory allocations
  • 14. Java Mission Control / Java Flight Recorder Available only with Oracle JDK ● Free for development ● Commercial for use in production How to enable it? ● at launch time: java -XX:+UnlockCommercialFeatures ● after launch: jcmd <pid> VM.unlock_commercial_features
  • 15. Original code Simple loop on the enum values public static KeyValueStatus valueOf(final short code) { for (KeyValueStatus value: values()) { if (value.code() == code) return value; } return UNKNOWN; }
  • 16. Original code - Memory stats Looks good! No leak! Hmm… growing fast!
  • 17. Original code - Allocations
  • 18. Original code - GC activity
  • 19. Iteration on constant array Still trivial, but reuse the values array private static final KeyValueStatus[] VALUES = values(); public static KeyValueStatus valueOf(final short code) { for (KeyValueStatus value: VALUES) { if (value.code() == code) return value; } return UNKNOWN; }
  • 20. Constant array - Allocations
  • 21. Constant array - GC activity
  • 22. GC pressure collateral damages Full GC clears weak references → clears some caches → additional load to repopulate them!
  • 24. Enum.values() – a generated method The compiler automatically adds some special methods when it creates an enum. For example, they have a static values method that returns an array containing all of the values of the enum in the order they are declared. – The Java Tutorial /** * Returns an array containing the constants of this enum * type, in the order they're declared. This method may be * used to iterate over the constants as follows: * * for(E c : E.values()) * System.out.println(c); * * @return an array containing the constants of this enum * type, in the order they're declared */ public static E[] values(); – The Java Language Specification
  • 25. Show me the (byte)code! public class SimpleMain { public static void main(String[] args) { System.out.println("Hello world!"); } } public class { public; Code: 0: aload_0 1: invokespecial #1 // Method java/lang/Object."<init>":()V 4: return public static void main(java.lang.String[]); Code: 0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream; 3: ldc #3 // String Hello world! 5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V 8: return } Default constructor javap -c SimpleMain.class or IntelliJ’s bytecode plugin
  • 26. Show me the (byte)code! public class SimpleMain { static String hello = "Hello"; static String world = "world"; public static void main( String[] args ) { System.out.println( hello + " " + world ); } } public class { static java.lang.String hello; static java.lang.String world; public; Code: 0: aload_0 1: invokespecial #1 // Method java/lang/Object."<init>":()V 4: return public static void main(java.lang.String[]); Code: 0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream; 3: new #3 // class java/lang/StringBuilder 6: dup 7: invokespecial #4 // Method java/lang/StringBuilder."<init>":()V 10: getstatic #5 // Field hello:Ljava/lang/String; 13: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder; 16: ldc #7 // String “ “ 18: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder; 21: getstatic #8 // Field world:Ljava/lang/String; 24: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder; 27: invokevirtual #9 // Method java/lang/StringBuilder.toString:()Ljava/lang/String; 30: invokevirtual #10 // Method java/io/PrintStream.println:(Ljava/lang/String;)V 33: return static {}; Code: 0: ldc #11 // String Hello 2: putstatic #5 // Field hello:Ljava/lang/String; 5: ldc #12 // String world 7: putstatic #8 // Field world:Ljava/lang/String; 10: return } String concat with StringBuilder Static initializer
  • 27. Show me the (byte)code! public enum SimpleEnum { FIRST_ENUM, SECOND_ENUM } ... public static[] values(); Code: 0: getstatic #1 // Field $VALUES:[Lnet/bluxte/experiments/talk/SimpleEnum; 3: invokevirtual #2 // Method "[Lnet/bluxte/experiments/talk/SimpleEnum;".clone:()Ljava/lang/Object; 6: checkcast #3 // class "[Lnet/bluxte/experiments/talk/SimpleEnum;" 9: areturn public static valueOf(java.lang.String); Code: 0: ldc #4 // class net/bluxte/experiments/talk/SimpleEnum 2: aload_0 3: invokestatic #5 // Method java/lang/Enum.valueOf:(Ljava/lang/Class;Ljava/lang/String;)Ljava/lang/Enum; 6: checkcast #4 // class net/bluxte/experiments/talk/SimpleEnum 9: areturn ... Aha! We found the culprit!
  • 28. But why the clone? Java arrays are mutable The caller can mess with it, which would break other users → Perform a defensive copy every time How to could it be prevented? Return an immutable List, but probably too high level here
  • 29. More on the bytecode A class file is composed of: ● constant pool: strings, fields/methods name+type, class names, etc. ● fields and methods definitions and code ○ Access flags and attributes ○ Code ○ Line number table ○ Local variable table (type and name) ○ Exception table
  • 30. But wait… ...why would I want to know about this? ● Better understand low level diagnostics ● Check generated code ○ Java: enum values (!), for loops, etc ○ Scala, Kotlin: implementation of higher level constructs ○ Hibernate & co: how do they mangle your code? ● Grasping low level stuff allows writing better high-level code
  • 31. #1 = Methodref #6.#20 // java/lang/Object."<init>":()V #2 = Fieldref #21.#22 // java/lang/System.out:Ljava/io/PrintStream; #3 = String #23 // Hello world #4 = Methodref #24.#25 // java/io/PrintStream.println:(Ljava/lang/String;)V #5 = Class #26 // net/bluxte/experiments/talk/SimpleMain #6 = Class #27 // java/lang/Object #7 = Utf8 <init> #8 = Utf8 ()V #9 = Utf8 Code #10 = Utf8 LineNumberTable #11 = Utf8 LocalVariableTable #12 = Utf8 this #13 = Utf8 Lnet/bluxte/experiments/talk/SimpleMain; #14 = Utf8 main #15 = Utf8 ([Ljava/lang/String;)V #16 = Utf8 args #17 = Utf8 [Ljava/lang/String; #18 = Utf8 SourceFile #19 = Utf8 #20 = NameAndType #7:#8 // "<init>":()V #21 = Class #28 // java/lang/System #22 = NameAndType #29:#30 // out:Ljava/io/PrintStream; #23 = Utf8 Hello world #24 = Class #31 // java/io/PrintStream #25 = NameAndType #32:#33 // println:(Ljava/lang/String;)V #26 = Utf8 net/bluxte/experiments/talk/SimpleMain #27 = Utf8 java/lang/Object Constant pool for SimpleMain public class { public; Code: 0: aload_0 1: invokespecial #1 // Method java/lang/Object."<init>":()V 4: return public static void main(java.lang.String[]); Code: 0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream; 3: ldc #3 // String Hello world! 5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V 8: return }
  • 32. Type encoding What is (Ljava/lang/String;)V ??? L<class path>; → class name I, J, S, B, C → integer, long, short, byte, char F, D → float, double Z → boolean public String foo(int a, char[] b, List<Integer> c, boolean d) (I[CLjava/util/List;Z)Ljava/lang/String;
  • 33. The bytecode “language” Stack-based machine ● Easier to target a large variety of CPUs (Android/Dalvik is register based) Object-oriented assembler ● Method calls (static / virtual / interface / special) Controlled memory access ● Local variables ● Object fields
  • 34. The bytecode “language” Very simple instruction set (about 200 instructions) Instruction groups: ● Load and store ● Arithmetic and logic ● Type conversion ● Object creation and manipulation ● Operand stack management ● Control transfer ● Method invocation and return Only addition since 1996: invokedynamic in Java 7
  • 35. The bytecode “language” public static void main(String[] args) { long start = System.nanoTime(); while(System.nanoTime() - start < MAX_NANOS) { for (int i = 0; i < 1_000_000; i++) { resolved = resolve((short)rnd.nextInt(0x100)); } Thread.sleep(100); } } 0: invokestatic #2 // Method java/lang/System.nanoTime:()J 3: lstore_1 4: invokestatic #2 // Method java/lang/System.nanoTime:()J 7: lload_1 8: lsub 9: getstatic #3 // Field MAX_NANOS:J 12: lcmp 13: ifge 55 16: iconst_0 17: istore_3 18: iload_3 19: ldc #4 // int 1000000 21: if_icmpge 46 24: getstatic #5 // Field rnd:Ljava/util/Random; 27: sipush 256 30: invokevirtual #6 // Method java/util/Random.nextInt:(I)I 33: i2s 34: invokestatic #7 // Method resolve:(S)Lnet/bluxte/experiments/couchbase_keyvalue/KeyValueStatus; 37: putstatic #8 // Field resolved:Lnet/bluxte/experiments/couchbase_keyvalue/KeyValueStatus; 40: iinc 3, 1 43: goto 18 46: ldc2_w #9 // long 100l 49: invokestatic #11 // Method java/lang/Thread.sleep:(J)V 52: goto 4 55: return LocalVariableTable: Start Length Slot Name Signature 18 28 3 i I 0 56 0 args [Ljava/lang/String; 4 52 1 start J
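To make the stack-based model concrete, the sketch below replays the `while` condition of the listing (offsets 4 to 13) on a toy operand stack. It is only an illustration under simplifying assumptions: every slot is a `long`, and the class `StackMachineSketch` and its method names are invented for this example. It is not how HotSpot implements the interpreter.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the operand stack, invented for this illustration.
final class StackMachineSketch {
    // Simplification: every slot holds a long (the real JVM distinguishes int and long slots).
    private final Deque<Long> stack = new ArrayDeque<>();

    // Stands in for any instruction that pushes a value (lload, getstatic, a method result).
    void push(long value) { stack.push(value); }

    // lsub: pop b, pop a, push a - b
    void lsub() { long b = stack.pop(); long a = stack.pop(); stack.push(a - b); }

    // lcmp: pop b, pop a, push -1, 0 or 1
    void lcmp() { long b = stack.pop(); long a = stack.pop(); stack.push((long) Long.compare(a, b)); }

    // ifge: pop the comparison result, report whether the branch is taken
    boolean ifge() { return stack.pop() >= 0; }

    // Replays offsets 4..13: the loop exits when now - start >= maxNanos.
    static boolean loopExits(long now, long start, long maxNanos) {
        StackMachineSketch m = new StackMachineSketch();
        m.push(now);       //  4: invokestatic System.nanoTime
        m.push(start);     //  7: lload_1
        m.lsub();          //  8: lsub
        m.push(maxNanos);  //  9: getstatic MAX_NANOS
        m.lcmp();          // 12: lcmp
        return m.ifge();   // 13: ifge 55
    }

    public static void main(String[] args) {
        System.out.println(loopExits(150, 100, 1000));   // elapsed 50: keep looping
        System.out.println(loopExits(1200, 100, 1000));  // elapsed 1100: jump to 55
    }
}
```

Reading the listing this way also explains why the compiled test is `ifge` while the source says `<`: the bytecode jumps out of the loop when the condition is false.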
  • 36. Benchmarking with JMH (Back to good old Java)
  • 37. Improving our solution We fixed the memory issue but it’s clearly not optimal Let’s benchmark it! private static final KeyValueStatus[] VALUES = values(); public static KeyValueStatus valueOf(final short code) { for (KeyValueStatus value: VALUES) { if (value.code() == code) return value; } return UNKNOWN; } O(n) on constant data!
  • 38. JMH: an OpenJDK project ● Provides drivers and guidance for writing tests ● Takes care of pre-warming the JVM, collecting results and computing stats ● Provides a Maven artifact type for benchmarking projects “JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks written in Java and other languages targetting the JVM.”
  • 39. Benchmark code @State(Scope.Benchmark) public class ValueOfBenchmark { @Param({ "0", // 0x00, Success "1", // 0x01, Not Found "134", // 0x86 Temporary Failure "255", // undefined "1024" // undefined, out of bounds }) public short code; @Benchmark public KeyValueStatus loopNoFastPath() { return KeyValueStatus.valueOfLoop(code); } @Benchmark public KeyValueStatus loopFastPath() { return KeyValueStatus.valueOf(code); } ... } mvn clean install java -jar target/benchmarks.jar # VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home/jre/bin/java # VM options: <none> # Warmup: 20 iterations, 1 s each # Measurement: 20 iterations, 1 s each # Threads: 1 thread, will synchronize iterations # Benchmark mode: Throughput, ops/time # Benchmark: net.bluxte.experiments.couchbase_keyvalue.ValueOfBenchmark.loopNoFastPath # Parameters: (code = 0) # Run progress: 0,00% complete, ETA 04:53:20 # Fork: 1 of 10 # Warmup Iteration 1: 152063982,769 ops/s # Warmup Iteration 2: 149808416,787 ops/s # Warmup Iteration 3: 210436722,740 ops/s # Warmup Iteration 4: 202906403,960 ops/s # Warmup Iteration 5: 204518647,481 ops/s # Warmup Iteration 6: 209602101,373 ops/s # Warmup Iteration 7: 204717066,594 ops/s # Warmup Iteration 8: 209156212,425 ops/s # Warmup Iteration 9: 215544157,049 ops/s # Warmup Iteration 10: 213919676,979 ops/s # Warmup Iteration 11: 211316588,650 ops/s
  • 40. Benchmark-driven optimization public static KeyValueStatus valueOfLoop(final short code) { for (KeyValueStatus value: values()) { if (value.code() == code) return value; } return UNKNOWN; } Benchmark (code) Mode Samples Score Score error Units loopNoFastPath 0 avgt 10 19.383 0.331 ns/op loopNoFastPath 1 avgt 10 19.243 0.376 ns/op loopNoFastPath 134 avgt 10 24.855 0.651 ns/op loopNoFastPath 255 avgt 10 30.587 0.833 ns/op loopNoFastPath 1024 avgt 10 30.619 1.209 ns/op Time grows linearly with value, even with out of bound values Initial implementation
  • 41. Benchmark-driven optimization private static final KeyValueStatus[] VALUES = values(); public static KeyValueStatus valueOf(short code) { for (KeyValueStatus value: VALUES) { if (value.code() == code) return value; } return UNKNOWN; } Benchmark (code) Mode Samples Score Score error Units loopOnConstantArray 0 avgt 10 2.975 0.086 ns/op loopOnConstantArray 1 avgt 10 3.035 0.080 ns/op loopOnConstantArray 134 avgt 10 10.215 0.269 ns/op loopOnConstantArray 255 avgt 10 16.856 0.679 ns/op loopOnConstantArray 1024 avgt 10 17.015 0.577 ns/op Still linear, removed ~15 ns allocation overhead Reuse the constant array
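The allocation removed here comes from `values()` itself: for enums compiled by javac, each call hands back a fresh copy of the constants array so that callers cannot corrupt the original. A minimal sketch of that behaviour, using a hypothetical `Status` enum as a stand-in for `KeyValueStatus`:

```java
// Hypothetical demo class; Status is a stand-in for KeyValueStatus.
final class ValuesAllocationSketch {

    enum Status { UNKNOWN, SUCCESS, ERR_NOT_FOUND }

    // Copied once at class initialization, then shared by every lookup.
    private static final Status[] VALUES = Status.values();

    // Two calls to values() give two distinct array instances.
    static boolean freshArrayPerCall() {
        return Status.values() != Status.values();
    }

    // The cached array is always the same instance: no allocation per call.
    static Status[] cached() {
        return VALUES;
    }

    public static void main(String[] args) {
        System.out.println("values() allocates a new array: " + freshArrayPerCall());
        System.out.println("cached array is reused: " + (cached() == cached()));
    }
}
```

The cached array must be treated as read-only, since every caller now shares the same instance.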
  • 42. Benchmark-driven optimization private static final Map<Short, KeyValueStatus> code2statusMap = new HashMap<>(); static { for (KeyValueStatus value: values()) { code2statusMap.put(value.code(), value); } } public static KeyValueStatus valueOf(final short code) { return code2statusMap.getOrDefault(code, UNKNOWN); } Benchmark (code) Mode Samples Score Score error Units lookupMap 0 avgt 10 4.954 0.134 ns/op lookupMap 1 avgt 10 4.036 0.125 ns/op lookupMap 134 avgt 10 5.597 0.157 ns/op lookupMap 255 avgt 10 4.006 0.144 ns/op lookupMap 1024 avgt 10 6.752 0.228 ns/op More or less constant Worse on small values Way better on larger values Prepare a hashmap, then simple lookup
  • 43. Oh wait… autoboxing! public static net.bluxte.experiments.couchbase_keyvalue.KeyValueStatus valueOfLookupMap(short); Code: 0: getstatic #17 // Field code2statusMap:Ljava/util/HashMap; 3: iload_0 4: invokestatic #18 // Method java/lang/Short.valueOf:(S)Ljava/lang/Short; 7: getstatic #15 // Field UNKNOWN:Lnet/bluxte/experiments/couchbase_keyvalue/KeyValueStatus; 10: invokevirtual #19 // Method java/util/HashMap.getOrDefault:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; 13: checkcast #4 // class net/bluxte/experiments/couchbase_keyvalue/KeyValueStatus 16: areturn public static KeyValueStatus valueOf(final short code) { return code2statusMap.getOrDefault(code, UNKNOWN); }
  • 44. Oh wait… autoboxing! Using Carrot HPPC (high performance primitive collections) avoids this
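A rough sketch of the idea behind primitive collections: keys are stored in a `short[]`, so a lookup never goes through `Short.valueOf`. The class below is a hand-rolled, fixed-capacity open-addressing table written for this illustration. Its name and API are hypothetical and are not HPPC's, and the status names used in `main` are placeholders.

```java
import java.util.Objects;

// Hypothetical minimal short-keyed map; not HPPC's API, no resizing, no removal.
final class ShortKeyMapSketch<V> {
    private final short[] keys;
    private final Object[] values;  // a null value marks an empty slot
    private final int mask;

    ShortKeyMapSketch(int capacityPowerOfTwo) {
        if (Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        keys = new short[capacityPowerOfTwo];
        values = new Object[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    void put(short key, V value) {
        Objects.requireNonNull(value, "null values are not supported");
        int i = key & mask;  // the primitive key is hashed directly, no boxing
        for (int probes = 0; probes <= mask; probes++) {
            if (values[i] == null || keys[i] == key) {
                keys[i] = key;
                values[i] = value;
                return;
            }
            i = (i + 1) & mask;  // linear probing
        }
        throw new IllegalStateException("map is full");
    }

    @SuppressWarnings("unchecked")
    V getOrDefault(short key, V defaultValue) {
        int i = key & mask;
        for (int probes = 0; probes <= mask; probes++) {
            if (values[i] == null) return defaultValue;  // empty slot: key is absent
            if (keys[i] == key) return (V) values[i];
            i = (i + 1) & mask;
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        ShortKeyMapSketch<String> map = new ShortKeyMapSketch<>(16);
        map.put((short) 0x00, "SUCCESS");
        map.put((short) 0x01, "ERR_NOT_FOUND");
        map.put((short) 0x86, "ERR_TEMP_FAIL");
        System.out.println(map.getOrDefault((short) 0x86, "UNKNOWN"));
        System.out.println(map.getOrDefault((short) 1024, "UNKNOWN"));
    }
}
```

A real primitive-collections library adds resizing, removal and better hashing; the point here is only that the `short` argument stays a primitive from the call site down to the array access.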
  • 45. Benchmarking variations private static final KeyValueStatus[] code2status = new KeyValueStatus[0x100]; static { Arrays.fill(code2status, UNKNOWN); for (KeyValueStatus keyValueStatus : values()) { if (keyValueStatus != UNKNOWN) { code2status[keyValueStatus.code()] = keyValueStatus; } } } public static KeyValueStatus valueOfLookupArray(short code) { if (code >= 0 && code < code2status.length) { return code2status[code]; } else { return UNKNOWN; } } Benchmark (code) Mode Samples Score Score error Units lookupArray 0 avgt 10 3.061 0.126 ns/op lookupArray 1 avgt 10 3.048 0.127 ns/op lookupArray 134 avgt 10 3.070 0.084 ns/op lookupArray 255 avgt 10 3.035 0.113 ns/op lookupArray 1024 avgt 10 3.034 0.113 ns/op Constant fast time No GC overhead w00t! Prepare a lookup array, then simple lookup
  • 46. Dangers of JMH Benchmark-driven iterations ● Can drive you to partial incremental improvements ● Take a step back, think outside of the box Optimizing for the sake of optimizing ● Time consuming ● No real effect if not on “hot” code
  • 48. The VM does a lot of things C1 “client” compiler C2 “server” compiler Interpreter Garbage collector
  • 49. Finding your way in OpenJDK Main website Get the code: hg clone hg clone Mercurial still alive!
  • 50. garbage collectors bytecode interpreter server compiler (c2 / opto) client compiler (c1) LLVM-based JIT OS and/or CPU specific code root of shared code
  • 51. CPU-independent target (works with shark JIT) Additional support in JDK9: ● ARM 32 & 64 bits ● PowerPC ● S390 ● AIX
  • 52. Intrinsic methods What you see is not what you get ● The JVM “intercepts” some method calls ○ String / StringBuffer methods, Math, Unsafe, array manipulation, etc. ● Replaced inline with native (assembly) code ○ Extremely fast and optimized ○ Not even JNI overhead ● Find them in hotspot/src/share/vm/classfile/vmSymbols.hpp
  • 53. Intrinsic methods // IndexOf for constant substrings with size >= 8 chars // which don't need to be loaded through stack. void MacroAssembler::string_indexofC8(Register str1, Register str2, Register cnt1, Register cnt2, int int_cnt2, Register result, XMMRegister vec, Register tmp) { ShortBranchVerifier sbv(this); assert(UseSSE42Intrinsics, "SSE4.2 is required"); // This method uses pcmpestri inxtruction with bound registers // inputs: // xmm - substring // rax - substring length (elements count) // mem - scanned string // rdx - string length (elements count) // 0xd - mode: 1100 (substring search) + 01 (unsigned shorts) // outputs: // rcx - matched index in string assert(cnt1 == rdx && cnt2 == rax && tmp == rcx, "pcmpestri"); Label RELOAD_SUBSTR, SCAN_TO_SUBSTR, SCAN_SUBSTR, RET_FOUND, RET_NOT_FOUND, EXIT, FOUND_SUBSTR, MATCH_SUBSTR_HEAD, RELOAD_STR, FOUND_CANDIDATE; Example: String.indexOf on x86
  • 54. Intrinsic methods In JDK9 beta, String.indexOf(String) is faster than String.indexOf(char)! This is because the former is already an intrinsic and the latter is not yet. Benchmark Mode Cnt Score Error Units # JDK 8u121 IndexOfBenchmark.StringIndexOfChar thrpt 5 141857.332 ± 5530.472 ops/s IndexOfBenchmark.StringIndexOfString thrpt 5 113091.517 ± 2241.533 ops/s # JDK 9b152 IndexOfBenchmark.StringIndexOfChar thrpt 5 154525.343 ± 3796.818 ops/s IndexOfBenchmark.StringIndexOfString thrpt 5 185917.059 ± 3391.230 ops/s (from the jdk9-dev mailing-list)
  • 55. Intrinsic methods ● “I can do it better than JDK source” – think twice! → Have a look at vmSymbols.hpp first! ● Can sometimes be indirect (esp with strings and arrays) ● When in doubt, benchmark (with the same JVM)
  • 57. Conclusion ● Know your tools ● Be curious, and follow the white rabbit from time to time, you’ll learn a lot ● However… don’t go overboard and waste (too much) time!
  • 58. Thanks! Questions? Sylvain Wallez - @bluxte Toulouse JUG - 2017-04-26
  • 59. Bonus links to dive deeper Java MissionControl & FlightRecorder docs What the JIT!? Anatomy of the OpenJDK HotSpot VM Intrinsic Methods in HotSpot VM Zero and Shark (LLVM JIT)