How do we go from your Java code to the CPU assembly that actually runs it? High-level constructs have made us forget what happens behind the scenes, yet understanding it is key to writing efficient code.
Starting from a few lines of Java, we explore the different layers that contribute to running your code: JRE, bytecode, structure of the OpenJDK virtual machine, HotSpot, intrinsic methods, benchmarking.
An introductory presentation of these low-level concerns, based on the practical use case of optimizing 6 lines of code, so that hopefully you will want to explore further!
Presentation given at the Toulouse (FR) Java User Group.
Video (in French) at https://www.youtube.com/watch?v=rB0ElXf05nU
Slideshow with animations at https://docs.google.com/presentation/d/1eIcROfLpdTU2_Z_IKiMG-AwqZGZgbN1Bs2E0nGShpbk/pub?start=true&loop=false&delayms=60000
Inside the JVM - Follow the white rabbit!
1. Inside the JVM
Follow the white rabbit!
Sylvain Wallez - @bluxte
Toulouse JUG - 2017-04-26
2. Who’s this guy?
Software engineer at Elastic (cloud team)
Previously:
● IoT tech lead at OVH
● CEO at Actoboard
● Backend architect at Sigfox
● CTO at Goojet/Scoop it
● Lead architect at Joost
● Member of the Apache Software Foundation
● Cofounder & CTO at Anyware Technologies (now part of Sierra Wireless)
3. Agenda
● How it started: let’s optimize 6 lines of (hot) code!
● Profiling memory usage
● What’s in a class file?
● Micro-benchmarking with JMH
● Exploration of the OpenJDK source code
5. On the CouchBase blog...
“JVM Profiling - Lessons from the trenches”: optimize the conversion of a protocol
error code into a readable message.
public enum KeyValueStatus {
UNKNOWN((short) -1, "Unknown code"),
SUCCESS((short) 0x00, "The operation completed successfully"),
ERR_NOT_FOUND((short) 0x01, "The key does not exists"),
ERR_EXISTS((short) 0x02, "The key exists in the cluster"),
ERR_TOO_BIG((short) 0x03, "The document exceeds the maximum size"),
ERR_INVALID((short) 0x04, "Invalid request"),
ERR_NOT_STORED((short) 0x05, "The document was not stored"),
...
private final short code;
private final String description;
KeyValueStatus(short code, String description) {
this.code = code;
this.description = description;
}
public static KeyValueStatus valueOf(final short code) {
for (KeyValueStatus value: values()) {
if (value.code() == code) return value;
}
return UNKNOWN;
}
...
}
6. On the CouchBase blog...
Finding: values() is allocating memory
public static KeyValueStatus valueOf(final short code) {
for (KeyValueStatus value: values()) {
if (value.code() == code) return value;
}
return UNKNOWN;
}
public static KeyValueStatus valueOf(final short code) {
if (code == SUCCESS.code) {
return SUCCESS;
} else if (code == ERR_NOT_FOUND.code) {
return ERR_NOT_FOUND;
} else if (code == ERR_EXISTS.code) {
return ERR_EXISTS;
} else if (code == ERR_NOT_MY_VBUCKET.code) {
return ERR_NOT_MY_VBUCKET;
}
for (KeyValueStatus value : values()) {
if (value.code() == code) {
return value;
}
}
return UNKNOWN;
}
Optimization: fast path on common values
If something goes wrong, it’ll make it worse!
10. Various kinds of memory optimization
● Memory usage / memory leaks
○ My application needs tons of heap
○ How many objects are held active?
→ Memory profiler / jmap
● Garbage collection pressure
○ My application spends a lot of time in the GC
○ How often are objects allocated?
→ Java Mission Control / jmap
13. Java Mission Control / Java Flight Recorder
Lightweight monitoring agent
● Integrated into the (Oracle) JVM
● Very low overhead
Continuously samples diagnostics data
● Thread activity
● GC activity
● Memory allocations
14. Java Mission Control / Java Flight Recorder
Available only with Oracle JDK
● Free for development
● Commercial for use in production
How to enable it?
● at launch time: java -XX:+UnlockCommercialFeatures
● after launch: jcmd <pid> VM.unlock_commercial_features
15. Original code
Simple loop on the enum values
public static KeyValueStatus valueOf(final short code) {
for (KeyValueStatus value: values()) {
if (value.code() == code) return value;
}
return UNKNOWN;
}
16. Original code - Memory stats
Looks good! No leak!
Hmm… growing fast!
19. Iteration on constant array
Still trivial, but reuse the values array
private static final KeyValueStatus[] VALUES = values();
public static KeyValueStatus valueOf(final short code) {
for (KeyValueStatus value: VALUES) {
if (value.code() == code) return value;
}
return UNKNOWN;
}
24. Enum.values() – a generated method
“The compiler automatically adds some special methods when it creates an enum. For example, they have a static values method that returns an array containing all of the values of the enum in the order they are declared.”
– The Java Tutorial
/**
* Returns an array containing the constants of this enum
* type, in the order they're declared. This method may be
* used to iterate over the constants as follows:
*
* for(E c : E.values())
* System.out.println(c);
*
* @return an array containing the constants of this enum
* type, in the order they're declared
*/
public static E[] values();
– The Java Language Specification
25. Show me the (byte)code!
public class SimpleMain {
public static void main(String[] args) {
System.out.println("Hello world!");
}
}
public class net.bluxte.experiments.talk.SimpleMain {
public net.bluxte.experiments.talk.SimpleMain();
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
3: ldc #3 // String Hello world!
5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
8: return
}
Default constructor
javap -c SimpleMain.class
or IntelliJ’s bytecode plugin
26. Show me the (byte)code!
public class SimpleMain {
static String hello = "Hello";
static String world = "world";
public static void main(
String[] args
) {
System.out.println(
hello + " " + world
);
}
}
public class net.bluxte.experiments.talk.SimpleMain {
static java.lang.String hello;
static java.lang.String world;
public net.bluxte.experiments.talk.SimpleMain();
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
3: new #3 // class java/lang/StringBuilder
6: dup
7: invokespecial #4 // Method java/lang/StringBuilder."<init>":()V
10: getstatic #5 // Field hello:Ljava/lang/String;
13: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
16: ldc #7 // String " "
18: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
21: getstatic #8 // Field world:Ljava/lang/String;
24: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
27: invokevirtual #9 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
30: invokevirtual #10 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
33: return
static {};
Code:
0: ldc #11 // String Hello
2: putstatic #5 // Field hello:Ljava/lang/String;
5: ldc #12 // String world
7: putstatic #8 // Field world:Ljava/lang/String;
10: return
}
String concat with StringBuilder
Static initializer
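The bytecode above can be read back as ordinary Java. The following is a minimal sketch, with a hypothetical class name, of what the Java 8 compiler effectively turns the `+` expression into. Newer compilers (JDK 9 and later) typically emit an invokedynamic call instead, so this only mirrors the listing shown on the slide.

```java
// Hand-written equivalent of the Java 8 bytecode for: hello + " " + world
// (ConcatDesugared is a hypothetical name used for illustration only).
final class ConcatDesugared {
    static String hello = "Hello";
    static String world = "world";

    // What the source says.
    static String withPlus() {
        return hello + " " + world;
    }

    // What the javap listing does: new StringBuilder, three appends, toString.
    static String withBuilder() {
        return new StringBuilder()
                .append(hello)
                .append(" ")
                .append(world)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus().equals(withBuilder())); // prints "true"
    }
}
```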
27. Show me the (byte)code!
public enum SimpleEnum {
FIRST_ENUM,
SECOND_ENUM
}
...
public static net.bluxte.experiments.talk.SimpleEnum[] values();
Code:
0: getstatic #1 // Field $VALUES:[Lnet/bluxte/experiments/talk/SimpleEnum;
3: invokevirtual #2 // Method "[Lnet/bluxte/experiments/talk/SimpleEnum;".clone:()Ljava/lang/Object;
6: checkcast #3 // class "[Lnet/bluxte/experiments/talk/SimpleEnum;"
9: areturn
public static net.bluxte.experiments.talk.SimpleEnum valueOf(java.lang.String);
Code:
0: ldc #4 // class net/bluxte/experiments/talk/SimpleEnum
2: aload_0
3: invokestatic #5 // Method java/lang/Enum.valueOf:(Ljava/lang/Class;Ljava/lang/String;)Ljava/lang/Enum;
6: checkcast #4 // class net/bluxte/experiments/talk/SimpleEnum
9: areturn
...
Aha! We found the culprit!
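To make the getstatic / clone / checkcast / areturn sequence concrete, here is a rough hand-written approximation of what the compiler generates for SimpleEnum. The class name SimpleEnumDesugared is hypothetical, and a real enum also extends java.lang.Enum and gets more generated members; the sketch only illustrates why every call to values() allocates.

```java
// Rough approximation of the class javac generates for SimpleEnum.
// SimpleEnumDesugared is a hypothetical name; real enums extend java.lang.Enum.
final class SimpleEnumDesugared {
    static final SimpleEnumDesugared FIRST_ENUM = new SimpleEnumDesugared("FIRST_ENUM");
    static final SimpleEnumDesugared SECOND_ENUM = new SimpleEnumDesugared("SECOND_ENUM");

    // Synthetic field holding all constants, in declaration order.
    private static final SimpleEnumDesugared[] $VALUES = { FIRST_ENUM, SECOND_ENUM };

    private final String name;

    private SimpleEnumDesugared(String name) {
        this.name = name;
    }

    // Mirrors the bytecode: load $VALUES, clone it, cast, return.
    // Each call therefore returns a freshly allocated array.
    static SimpleEnumDesugared[] values() {
        return $VALUES.clone();
    }

    @Override
    public String toString() {
        return name;
    }

    public static void main(String[] args) {
        System.out.println(values() == values()); // prints "false"
    }
}
```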
28. But why the clone?
Java arrays are mutable
The caller can mess with it, which would break other users
→ Perform a defensive copy every time
How could it be prevented?
Return an immutable List, but probably too high level here
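Both points can be checked in a few lines. The sketch below uses a hypothetical Suit enum: mutating the array returned by values() only affects the caller's copy, while an unmodifiable List built once can be shared without copying. This is an illustration of the alternative mentioned above, not something the JDK does.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical enum used only for illustration.
enum Suit {
    HEARTS, SPADES, CLUBS;

    // Built once; the wrapper rejects writes, so it can be shared safely.
    static final List<Suit> ALL = Collections.unmodifiableList(Arrays.asList(values()));
}

final class DefensiveCopyDemo {
    public static void main(String[] args) {
        Suit[] copy = Suit.values();
        copy[0] = null;                          // only damages our private copy
        System.out.println(Suit.values()[0]);    // prints "HEARTS"
        System.out.println(Suit.ALL == Suit.ALL); // prints "true": same list every time
        try {
            Suit.ALL.set(0, Suit.CLUBS);
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only");     // prints "read-only"
        }
    }
}
```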
29. More on the bytecode
A class file is composed of:
● constant pool: strings, fields/methods name+type, class names, etc.
● fields and methods definitions and code
○ Access flags and attributes
○ Code
○ Line number table
○ Local variable table (type and name)
○ Exception table
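The very beginning of a class file is easy to inspect by hand. As a minimal sketch (the class and field names are hypothetical), the following reads the fixed header that precedes the constant pool: the magic number, the minor and major version, and the constant pool count. It assumes the class file of java.lang.Object can be loaded as a resource, which is the case on the JDKs I am aware of.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Reads the fixed-size header found at the start of every class file.
// ClassFileHeader is a hypothetical helper written for this illustration.
final class ClassFileHeader {
    final int magic;
    final int minor;
    final int major;
    final int constantPoolCount;

    private ClassFileHeader(int magic, int minor, int major, int constantPoolCount) {
        this.magic = magic;
        this.minor = minor;
        this.major = major;
        this.constantPoolCount = constantPoolCount;
    }

    static ClassFileHeader read(String resource) {
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("class file not found: " + resource);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();             // u4 magic
            int minor = data.readUnsignedShort();   // u2 minor_version
            int major = data.readUnsignedShort();   // u2 major_version
            int cpCount = data.readUnsignedShort(); // u2 constant_pool_count
            return new ClassFileHeader(magic, minor, major, cpCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassFileHeader h = read("java/lang/Object.class");
        System.out.printf("magic=%08X version=%d.%d constant_pool_count=%d%n",
                h.magic, h.major, h.minor, h.constantPoolCount);
    }
}
```

The magic number is always CAFEBABE; the version depends on the JDK that compiled the class (major version 52 corresponds to Java 8).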
30. But wait…
...why would I want to know about this?
● Better understand low level diagnostics
● Check generated code
○ Java: enum values (!), for loops, etc
○ Scala, Kotlin: implementation of higher level constructs
○ Hibernate & co: how do they mangle your code?
● Grasping low level stuff allows writing better high-level code
32. Type encoding
What is (Ljava/lang/String;)V ???
L<class path>; → class name
I, J, S, B, C → integer, long, short, byte, char
F, D → float, double
Z → boolean
public String foo(int a, char[] b, List<Integer> c, boolean d)
(I[CLjava/util/List;Z)Ljava/lang/String;
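The same descriptor can be produced from Java itself. This small sketch (class and method names are hypothetical) uses java.lang.invoke.MethodType to build the signature of foo and print its descriptor; note that the generic parameter of List<Integer> disappears because of erasure.

```java
import java.lang.invoke.MethodType;
import java.util.List;

// DescriptorDemo is a hypothetical name used for illustration.
final class DescriptorDemo {
    // Descriptor of: String foo(int a, char[] b, List<Integer> c, boolean d)
    static String fooDescriptor() {
        // Return type first, then parameter types. Generics are erased: List<Integer> is just List.
        return MethodType
                .methodType(String.class, int.class, char[].class, List.class, boolean.class)
                .toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(fooDescriptor()); // prints "(I[CLjava/util/List;Z)Ljava/lang/String;"
    }
}
```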
33. The bytecode “language”
Stack-based machine
● Easier to target a large variety of CPUs
(Android/Dalvik is register based)
Object-oriented assembler
● Method calls (static / virtual / interface / special)
Controlled memory access
● Local variables
● Object fields
34. The bytecode “language”
Very simple instruction set: about 200 instructions
Instruction groups:
● Load and store
● Arithmetic and logic
● Type conversion
● Object creation and manipulation
● Operand stack management
● Control transfer
● Method invocation and return
Only addition since 1996: invokedynamic in Java 7
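A tiny method is enough to see the stack machine at work. In the sketch below (hypothetical class name), the comment lists the instructions that javap -c shows for a static two-argument addition: operands are pushed from local variable slots, the operation pops them and pushes its result.

```java
// StackDemo is a hypothetical name used for illustration.
final class StackDemo {
    // javap -c shows this static method as:
    //   iload_0   // push local variable 0 (a)
    //   iload_1   // push local variable 1 (b)
    //   iadd      // pop both ints, push a + b
    //   ireturn   // pop the int on top of the stack and return it
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // prints "5"
    }
}
```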
37. Improving our solution
We fixed the memory issue but it’s clearly non optimal
Let’s benchmark it!
private static final KeyValueStatus[] VALUES = values();
public static KeyValueStatus valueOf(final short code) {
for (KeyValueStatus value: VALUES) {
if (value.code() == code) return value;
}
return UNKNOWN;
}
O(n) on constant data!
38. JMH: an OpenJDK project
● Provides drivers and guidance for writing tests
● Takes care of pre-warming the JVM, collecting results and computing stats
● Provides a Maven artifact type for benchmarking projects
“JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks written in Java and other languages targetting the JVM.”
39. Benchmark code
@State(Scope.Benchmark)
public class ValueOfBenchmark {
@Param({
"0", // 0x00, Success
"1", // 0x01, Not Found
"134", // 0x86 Temporary Failure
"255", // undefined
"1024" // undefined, out of bounds
})
public short code;
@Benchmark
public KeyValueStatus loopNoFastPath() {
return KeyValueStatus.valueOfLoop(code);
}
@Benchmark
public KeyValueStatus loopFastPath() {
return KeyValueStatus.valueOf(code);
}
...
}
mvn clean install
java -jar target/benchmarks.jar
# VM invoker:
/Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: net.bluxte.experiments.couchbase_keyvalue.ValueOfBenchmark.loopNoFastPath
# Parameters: (code = 0)
# Run progress: 0,00% complete, ETA 04:53:20
# Fork: 1 of 10
# Warmup Iteration 1: 152063982,769 ops/s
# Warmup Iteration 2: 149808416,787 ops/s
# Warmup Iteration 3: 210436722,740 ops/s
# Warmup Iteration 4: 202906403,960 ops/s
# Warmup Iteration 5: 204518647,481 ops/s
# Warmup Iteration 6: 209602101,373 ops/s
# Warmup Iteration 7: 204717066,594 ops/s
# Warmup Iteration 8: 209156212,425 ops/s
# Warmup Iteration 9: 215544157,049 ops/s
# Warmup Iteration 10: 213919676,979 ops/s
# Warmup Iteration 11: 211316588,650 ops/s
40. Benchmark-driven optimization
public static KeyValueStatus valueOfLoop(final short code) {
for (KeyValueStatus value: values()) {
if (value.code() == code) return value;
}
return UNKNOWN;
}
Benchmark       (code)  Mode  Samples   Score  Score error  Units
loopNoFastPath       0  avgt       10  19.383        0.331  ns/op
loopNoFastPath       1  avgt       10  19.243        0.376  ns/op
loopNoFastPath     134  avgt       10  24.855        0.651  ns/op
loopNoFastPath     255  avgt       10  30.587        0.833  ns/op
loopNoFastPath    1024  avgt       10  30.619        1.209  ns/op
Time grows linearly with the value, even for out-of-bounds values
Initial implementation
45. Benchmarking variations
private static final KeyValueStatus[] code2status =
new KeyValueStatus[0x100];
static {
Arrays.fill(code2status, UNKNOWN);
for (KeyValueStatus keyValueStatus : values()) {
if (keyValueStatus != UNKNOWN) {
code2status[keyValueStatus.code()] = keyValueStatus;
}
}
}
public static KeyValueStatus valueOfLookupArray(short code) {
if (code >= 0 && code < code2status.length) {
return code2status[code];
} else {
return UNKNOWN;
}
}
Benchmark    (code)  Mode  Samples  Score  Score error  Units
lookupArray       0  avgt       10  3.061        0.126  ns/op
lookupArray       1  avgt       10  3.048        0.127  ns/op
lookupArray     134  avgt       10  3.070        0.084  ns/op
lookupArray     255  avgt       10  3.035        0.113  ns/op
lookupArray    1024  avgt       10  3.034        0.113  ns/op
Constant fast time
No GC overhead
w00t!
Prepare a lookup array, then simple lookup
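The lookup-array technique can be tried in isolation. Below is a self-contained miniature using a hypothetical three-constant enum (MiniStatus and fromCode are illustrative names, not part of the original code base): the table is filled once in a static initializer, and each lookup is a bounds check plus an array read.

```java
import java.util.Arrays;

// Hypothetical miniature of KeyValueStatus, for illustration only.
enum MiniStatus {
    UNKNOWN((short) -1),
    SUCCESS((short) 0x00),
    ERR_NOT_FOUND((short) 0x01);

    private final short code;

    MiniStatus(short code) {
        this.code = code;
    }

    // One slot per possible code in 0x00..0xFF, defaulting to UNKNOWN.
    private static final MiniStatus[] BY_CODE = new MiniStatus[0x100];

    static {
        Arrays.fill(BY_CODE, UNKNOWN);
        for (MiniStatus status : values()) { // values() is called once, at class init
            if (status != UNKNOWN) {
                BY_CODE[status.code] = status;
            }
        }
    }

    // Constant-time lookup: no loop, no allocation.
    static MiniStatus fromCode(short code) {
        return (code >= 0 && code < BY_CODE.length) ? BY_CODE[code] : UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(fromCode((short) 1));    // prints "ERR_NOT_FOUND"
        System.out.println(fromCode((short) 1024)); // prints "UNKNOWN"
    }
}
```

The trade-off is a 256-entry array kept in memory for the lifetime of the class, which is negligible here but worth keeping in mind when codes span a much larger range.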
46. Dangers of JMH
Benchmark-driven iterations
● Can drive you to partial incremental improvements
● Take a step back, think outside of the box
Optimizing for the sake of optimizing
● Time consuming
● No real effect if not on “hot” code
48. The VM does a lot of things
C1 “client” compiler
C2 “server” compiler
Interpreter
Garbage collector
49. Finding your way in OpenJDK
Main website http://openjdk.java.net/
Get the code:
hg clone http://hg.openjdk.java.net/jdk8/jdk8/hotspot/
hg clone http://hg.openjdk.java.net/jdk8/jdk8/jdk/
Mercurial still alive!
52. Intrinsic methods
What you see is not what you get
● The JVM “intercepts” some method calls
○ String / StringBuffer methods, Math, Unsafe, array manipulation, etc.
● Replaced inline with native (assembly) code
○ Extremely fast and optimized
○ Not even JNI overhead
● Find them in hotspot/src/share/vm/classfile/vmSymbols.hpp
53. Intrinsic methods
// IndexOf for constant substrings with size >= 8 chars
// which don't need to be loaded through stack.
void MacroAssembler::string_indexofC8(Register str1, Register str2,
Register cnt1, Register cnt2,
int int_cnt2, Register result,
XMMRegister vec, Register tmp) {
ShortBranchVerifier sbv(this);
assert(UseSSE42Intrinsics, "SSE4.2 is required");
// This method uses pcmpestri inxtruction with bound registers
// inputs:
// xmm - substring
// rax - substring length (elements count)
// mem - scanned string
// rdx - string length (elements count)
// 0xd - mode: 1100 (substring search) + 01 (unsigned shorts)
// outputs:
// rcx - matched index in string
assert(cnt1 == rdx && cnt2 == rax && tmp == rcx, "pcmpestri");
Label RELOAD_SUBSTR, SCAN_TO_SUBSTR, SCAN_SUBSTR,
RET_FOUND, RET_NOT_FOUND, EXIT, FOUND_SUBSTR,
MATCH_SUBSTR_HEAD, RELOAD_STR, FOUND_CANDIDATE;
Example: String.indexOf on x86
54. Intrinsic methods
In JDK9 beta, String.indexOf(String) is faster than String.indexOf(char)!
This is because one is intrinsic, and the other is not yet
Benchmark                              Mode  Cnt       Score      Error  Units
# JDK 8u121
IndexOfBenchmark.StringIndexOfChar    thrpt    5  141857.332 ± 5530.472  ops/s
IndexOfBenchmark.StringIndexOfString  thrpt    5  113091.517 ± 2241.533  ops/s
# JDK 9b152
IndexOfBenchmark.StringIndexOfChar    thrpt    5  154525.343 ± 3796.818  ops/s
IndexOfBenchmark.StringIndexOfString  thrpt    5  185917.059 ± 3391.230  ops/s
(from the jdk9-dev mailing-list)
55. Intrinsic methods
● “I can do it better than JDK source” – think twice!
→ Have a look at vmSymbols.hpp first!
● Can sometimes be indirect (especially with strings and arrays)
● When in doubt, benchmark (with the same JVM)
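As an illustration of the first point, here is the kind of hand-rolled replacement one might be tempted to write (naiveIndexOf is a hypothetical helper). It returns the same results as String.indexOf(String), but the JIT sees it as ordinary Java code, whereas the JDK method may be replaced by hand-tuned assembly depending on the JVM version and CPU. Only a benchmark on the target JVM can tell which one is faster.

```java
// IndexOfDemo and naiveIndexOf are hypothetical names used for illustration.
final class IndexOfDemo {
    // Straightforward substring search, functionally equivalent to String.indexOf(String).
    static int naiveIndexOf(String haystack, String needle) {
        int last = haystack.length() - needle.length();
        for (int start = 0; start <= last; start++) {
            int matched = 0;
            while (matched < needle.length()
                    && haystack.charAt(start + matched) == needle.charAt(matched)) {
                matched++;
            }
            if (matched == needle.length()) {
                return start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "follow the white rabbit";
        // Same answer from both; relative speed must be measured, not guessed.
        System.out.println(naiveIndexOf(text, "white") == text.indexOf("white")); // prints "true"
    }
}
```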
57. Conclusion
● Know your tools
● Be curious, and follow the white rabbit from time to time, you’ll learn a lot
● However… don’t go overboard and waste (too much) time!
59. Bonus links to dive deeper
Java MissionControl & FlightRecorder docs
What the JIT!? Anatomy of the OpenJDK HotSpot VM
Intrinsic Methods in HotSpot VM
Zero and Shark (LLVM JIT)