This document discusses the Java Memory Model (JMM). It begins by introducing the goals: familiarizing the attendee with the JMM, how processors work, and how the Java compiler and JVM work. It then covers key topics such as data races, synchronization, and atomicity, illustrated with examples. The document contrasts correctly synchronized programs with programs containing data races, and explains concepts like happens-before ordering, volatile variables, and atomic operations. It also discusses weaknesses in common multithreading constructs such as double-checked locking, and shows how final fields enable safe publication of shared objects. The document concludes by mentioning planned improvements to the JMM in JEP 188.
3. Goal
• Familiarize with the JMM,
• How do processors work?
• Recall how the Java compiler and JVM work,
• JIT in action,
• Explain what a data race is and what a correctly synchronized program is,
• Talk about synchronization and atomicity,
• Based on examples...
• Next-gen JMM...
8. RAM / Cache / System Bus: program execution

public class Example {
    int i, j;

    public void myDummyMethod() {
        i += 1;
        j += 1;
        i += 1;
        ...
    }
}

The Java Memory Model for Practitioners: http://bit.ly/2cMXklJ

9.–15. (Animation over the same code) Starting from RAM holding i = 0, j = 0: i = 0 is loaded over the system bus into the core's cache, incremented to i = 1 there, and eventually written back (RAM: i = 1, j = 0); the same load / increment / write-back cycle then runs for j, leaving RAM with i = 1, j = 1.

16. Sequentially consistent execution: the writes become visible in RAM in an order consistent with the program order of the single thread.
25. Core i7 Xeon 5500 Series Data Source Latency (approximate)
local L1 CACHE hit, ~4 cycles ( 2.1 - 1.2 ns )
local L2 CACHE hit, ~10 cycles ( 5.3 - 3.0 ns )
local L3 CACHE hit, line unshared ~40 cycles ( 21.4 - 12.0 ns )
local L3 CACHE hit, shared line in another core ~65 cycles ( 34.8 - 19.5 ns )
local L3 CACHE hit, modified in another core ~75 cycles ( 40.2 - 22.5 ns )
remote L3 CACHE (Ref: Fig.1 [Pg. 5]) ~100-300 cycles ( 160.7 - 30.0 ns )
local DRAM ~60 ns
remote DRAM ~100 ns
Performance Analysis Guide for Intel® Core™ i7 Processor and Intel® Xeon™ 5500 processors: http://intel.ly/2cV1ZFZ
Cache latency
26. Weak vs. Strong hardware Memory Models
Weak vs. Strong Memory Models: http://bit.ly/2cC4avk
27. x86/x64 processor memory model
R-R R-W
W-R W-W
Intel® 64 and IA-32 Architectures Software Developer’s Manual: http://intel.ly/2csMyB2
Processor P can read B
before its write to A is seen
by all processors
(a processor can move its
own reads in front of its
own writes)
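The store-load relaxation above can be observed from Java. The following is a minimal sketch (class and variable names are illustrative, not from the deck) of the classic store-buffer litmus test: each thread stores to one shared field and then reads the other. Because a processor may satisfy its own read before its earlier store is visible to other cores, the outcome r1 == 0 && r2 == 0 is possible on x86 (and is allowed by the JMM too, since the program has data races).

```java
public class StoreBufferLitmus {
    static int a, b;     // shared, intentionally non-volatile
    static int r1, r2;   // values observed by each thread

    public static void runOnce() throws InterruptedException {
        a = 0; b = 0;
        Thread t1 = new Thread(() -> { a = 1; r1 = b; });
        Thread t2 = new Thread(() -> { b = 1; r2 = a; });
        t1.start(); t2.start();
        t1.join(); t2.join();
    }

    public static void main(String[] args) throws InterruptedException {
        runOnce();
        // All four combinations of 0/1 are legal outcomes;
        // (r1 == 0, r2 == 0) demonstrates W-R (store-load) reordering.
        System.out.println("r1=" + r1 + " r2=" + r2);
    }
}
```

A single run usually shows one of the "expected" interleavings; the reordered outcome typically needs many repetitions to appear.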
29. How does the Java compiler work?
• javac compiles source code to bytecode,
• the JVM loads the bytecode (class loader), checks it (bytecode verifier), interprets it, and JIT-compiles hot paths to native code that runs on the OS,
33. Tiered compilation (C1 compiler)
(Chart: throughput over time: interpreted at startup, then C1 with sampling, then C2 at full speed; deoptimization bails back to the interpreter.)
C1
• client,
• fast but simple,
• does the profiling,
• e.g.: branches, typechecks,
34. Tiered compilation (C2 compiler)
(Chart: throughput over time: interpreted at startup, then C1 with sampling, then C2 at full speed; deoptimization bails back to the interpreter.)
C2
• server,
• slow but clever,
• aggressively optimizing,
• based on profile,
• e.g.: loop optimizations (unswitching, unrolling), implicit null checking,
35. Why do we need a JMM?
• Different platform memory models (none of them matches the JMM!!!),
• Many JVM implementations,
• People don’t know how to program concurrently,
• Programmers: write reliable multithreaded code,
• Compiler writers: implement optimizations that are legal according to the JLS,
• Compiler: produce fast and optimal native code,
36. JMM
• Actions: reads and writes of variables, locks and unlocks of monitors, starting and joining with threads,
• Happens-before is a partial order,
• For a thread executing action B to see the result of action A (performed by any thread), there must be a happens-before relationship between A and B,
• Otherwise the JVM is free to reorder,
37. Happens-before orderings
• Unlock of a monitor / lock of that monitor,
• Write to a volatile variable / read of that variable,
• Call to start() / any action in the started thread,
• All actions in a thread / any other thread successfully returning from join() on that thread,
• Writing default values to variables, and writes to final fields in the constructor / the end of the constructor,
• Write to an atomic variable / read of that variable,
• Many java.util.concurrent methods,
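The start()/join() edges from the list above can be demonstrated directly. A minimal sketch (class name is illustrative): the main thread sees the worker's write to a plain field, with no volatile and no locks, purely because join() establishes happens-before.

```java
public class StartJoinVisibility {
    static int result;   // plain field: no volatile, no synchronization

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> result = 42);
        worker.start();  // main's actions so far happen-before the worker's
        worker.join();   // all worker actions happen-before join() returning
        System.out.println(result);  // guaranteed to print 42
    }
}
```

Without the join() the read would be a data race and could legally observe 0.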
38. JMM
• A promise for programmers: sequential consistency must be sacrificed to allow
optimizations, but it will still hold for data race free programs. This is the data
race free (DRF) guarantee.
• A promise for security: even for programs with data races, values should not
appear “out of thin air”, preventing unintended information leakage.
• A promise for compilers: common hardware and software optimizations should
be allowed as far as possible without violating the first two requirements.
Java Memory Model Examples: Good, Bad and Ugly: http://bit.ly/2cZfF1I
40. How can this happen?
• Processors can reorder statements (out-of-order execution, HT),
• Lazy synchronization between caches and main memory,
• Compilers can reorder statements (or keep values in registers),
• Aggressive optimizations in the JIT,
41. Example

@NotThreadSafe
class DataRace {
    int a, b;
    int x, y;

    void thread1() {
        y = a;
        b = 1;
    }

    void thread2() {
        x = b;
        a = 2;
    }
}

One possible execution (in time order):
Thread 1: y = a;
Thread 1: b = 1;
Thread 2: x = b;
Thread 2: a = 2;
42. Example (same code): Thread 1's statements reordered:
Thread 1: b = 1;
Thread 1: y = a;
Thread 2: x = b;
Thread 2: a = 2;

43. Example (same code): both threads' statements reordered:
Thread 1: b = 1;
Thread 1: y = a;
Thread 2: a = 2;
Thread 2: x = b;

44. Example (same code): reordered and interleaved:
Thread 1: b = 1;
Thread 2: a = 2;
Thread 2: x = b;
Thread 1: y = a;
Result: y == 2, x == 1
51. Visibility between threads

@ThreadSafe
public class DataRace {
    int a, b;
    int x, y;

    void thread1() {
        synchronized (this) {
            y = a;
            b = 1;
        }
    }

    void thread2() {
        synchronized (this) {
            x = b;
            a = 2;
        }
    }
}
52. Visibility between threads
(Th2 starts after Th1. Same code as slide 51.)
Program order within each thread, combined with the synchronization order between the threads, yields a happens-before order: every operation that happens before an unlock (release) is visible to an operation that happens after a later lock (acquire).

Execution:
<enter this>
y = a;
b = 1;
<exit this>
<enter this>
x = b;
a = 2;
<exit this>

Possible results:
y == 0, x == 1
y == 2, x == 0
54. Volatile

@NotThreadSafe
public class Looper {
    static boolean done;

    public static void main(String[] args)
            throws InterruptedException {
        new Thread(new Runnable() {
            @Override
            public void run() {
                int count = 0;
                while (!done) {
                    count++;
                }
                System.out.println("Ending this task");
            }
        }).start();
        Thread.sleep(1000);
        System.out.println("Waiting done");
        done = true;
    }
}
55. Volatile

@ThreadSafe
public class Looper {
    static volatile boolean done;

    public static void main(String[] args)
            throws InterruptedException {
        new Thread(new Runnable() {
            @Override
            public void run() {
                int count = 0;
                while (!done) {
                    count++;
                }
                System.out.println("Ending this task");
            }
        }).start();
        Thread.sleep(1000);
        System.out.println("Waiting done");
        done = true;
    }
}

(Diagram) Thread 1's write done = true and Thread 2's read in while (!done) are linked by program order plus synchronization order, forming a happens-before order.
56. More about volatile
• Volatile reads are very cheap (no locks, compared to synchronized),
• Volatile increment is not atomic (!!!),
• Elements of an array held in a volatile field are not themselves volatile (e.g. volatile int[]),
• Consider using java.util.concurrent,
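The second bullet is worth seeing in code. A minimal sketch (class name and iteration count are illustrative): counter++ on a volatile int is a read-modify-write of three separate steps, so concurrent increments can be lost, while AtomicInteger.incrementAndGet() performs the whole update atomically.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class VolatileIncrement {
    static volatile int volatileCount;
    static final AtomicInteger atomicCount = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                volatileCount++;                // read-modify-write: NOT atomic
                atomicCount.incrementAndGet();  // atomic read-modify-write
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // The atomic counter is always exactly 200000; the volatile
        // counter is often less, because concurrent ++ can lose updates.
        System.out.println("volatile: " + volatileCount);
        System.out.println("atomic:   " + atomicCount.get());
    }
}
```

This is exactly the case where "consider using java.util.concurrent" applies.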
57. What operations in Java are atomic?
• Reads/writes of variables of primitive types (except long and double, whose non-volatile reads/writes may be split into two 32-bit halves),
• Reads/writes of volatile variables of primitive types (including long and double),
• All reads/writes of references (http://bit.ly/2c8kn8i),
• All operations on java.util.concurrent.atomic types,
59. Double-checked locking

@ThreadSafe
public class DoubleCheckedLocking {
    private volatile Helper helper = null;

    public Helper getHelper() {
        if (helper == null) {
            synchronized (this) {
                if (helper == null)
                    helper = new Helper();
            }
        }
        return helper;
    }
}

The "Double-Checked Locking is Broken" Declaration: http://bit.ly/2cIDBnA
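The declaration cited above also describes a simpler alternative to volatile double-checked locking for lazy singletons. A minimal sketch (class names are illustrative): the initialization-on-demand holder idiom, which relies on the JVM's guarantee that class initialization is performed safely and exactly once.

```java
public class HolderIdiom {
    private HolderIdiom() { }

    // Not initialized until first referenced.
    private static class Holder {
        static final HolderIdiom INSTANCE = new HolderIdiom();
    }

    public static HolderIdiom getInstance() {
        // First call triggers Holder's class initialization; the JVM
        // serializes that, so no volatile or explicit lock is needed.
        return Holder.INSTANCE;
    }
}
```

It is lazy, thread-safe, and has no per-call synchronization cost after initialization.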
60. Final

@NotThreadSafe
class UnsafePublication {
    private int a;
    private static UnsafePublication instance;

    private UnsafePublication() {
        a = 1;
    }

    void thread1() throws InterruptedException {
        instance = new UnsafePublication();
    }

    void thread2() {
        if (instance != null) {
            System.out.println(instance.a);
        }
    }
}

What state can thread 2 see? instance == null, a == 0, or a == 1
61. Final

@ThreadSafe
class SafePublication {
    private final int a;
    private static SafePublication instance;

    private SafePublication() {
        a = 1;
    }

    void thread1() throws InterruptedException {
        instance = new SafePublication();
    }

    void thread2() {
        if (instance != null) {
            System.out.println(instance.a);
        }
    }
}
62. Next-JMM
• JEP 188,
• Improve formalization,
• JVM coverage,
• Extend scope,
• Testing support,
• Tool support,
• Enhancement: atomic reads/writes for long and double,
63. To sum up...
• Concurrent programming isn’t easy,
• Design your code for concurrency (make it right before you make it fast),
• Do not code against the implementation; code against the specification,
• Use high-level synchronization wherever possible,
• Watch out for useless synchronization,
• Use thread-safe immutable objects,
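The last point can be sketched in a few lines (the Point class is a hypothetical example, not from the deck): final fields and no mutators make the object immutable, so it is safely publishable and freely shareable between threads.

```java
final class Point {
    private final int x, y;   // final fields: safe publication guarantee

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int x() { return x; }
    int y() { return y; }

    // "Mutation" returns a new object instead of changing this one.
    Point translate(int dx, int dy) {
        return new Point(x + dx, y + dy);
    }
}
```

Because no thread can ever observe a Point changing, no synchronization is needed to share one.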
64. Further reading
• Aleksey Shipilëv: One Stop Page
(http://bit.ly/2cqBt4x),
• Rafael Winterhalter: The Java Memory Model for
Practitioners (http://bit.ly/2cMXklJ),
• Brian Goetz: Java Concurrency in Practice
(http://amzn.to/2cloe76)