This document discusses the Java Memory Model (JMM). It begins by introducing the goals: familiarizing the attendee with the JMM, how processors work, and how the Java compiler and JVM work. It then covers key topics such as data races, synchronization, and atomicity, illustrated with examples. The document contrasts correctly synchronized programs with programs containing data races, and explains concepts like happens-before ordering, volatile variables, and atomic operations. It also discusses weaknesses in common multithreading constructs such as double-checked locking, and shows how final fields enable safe publication of shared objects. The document concludes by mentioning planned improvements to the JMM in JEP 188.
3. Goal
• Familiarize with the JMM,
• How do processors work?
• Recall how the Java compiler and JVM work,
• JIT in action,
• Explain what a data race is and what a correctly synchronized program is,
• Talk about synchronization and atomicity,
• Based on examples...
• Next-gen JMM...
8. RAM / Cache / System Bus: program execution

public class Example {
    int i, j;

    public void myDummyMethod() {
        i += 1;
        j += 1;
        i += 1;
        ...
    }
}

The Java Memory Model for Practitioners: http://bit.ly/2cMXklJ

9.–15. (Animation over the same code) Starting from RAM holding i = 0, j = 0: i = 0 is loaded over the system bus into the core's cache, incremented to i = 1 there, and eventually written back (RAM: i = 1, j = 0); the same load / increment / write-back cycle then runs for j, leaving RAM with i = 1, j = 1.

16. Sequentially consistent execution: the writes become visible in RAM in an order consistent with the program order of the single thread.
25. Core i7 Xeon 5500 Series Data Source Latency (approximate)
local L1 CACHE hit, ~4 cycles ( 2.1 - 1.2 ns )
local L2 CACHE hit, ~10 cycles ( 5.3 - 3.0 ns )
local L3 CACHE hit, line unshared ~40 cycles ( 21.4 - 12.0 ns )
local L3 CACHE hit, shared line in another core ~65 cycles ( 34.8 - 19.5 ns )
local L3 CACHE hit, modified in another core ~75 cycles ( 40.2 - 22.5 ns )
remote L3 CACHE (Ref: Fig.1 [Pg. 5]) ~100-300 cycles ( 160.7 - 30.0 ns )
local DRAM ~60 ns
remote DRAM ~100 ns
Performance Analysis Guide for Intel® Core™ i7 Processor and Intel® Xeon™ 5500 processors: http://intel.ly/2cV1ZFZ
Cache latency
26. Weak vs. Strong hardware Memory Models
Weak vs. Strong Memory Models: http://bit.ly/2cC4avk
27. x86/x64 processor memory model
R-R R-W
W-R W-W
Intel® 64 and IA-32 Architectures Software Developer’s Manual: http://intel.ly/2csMyB2
Processor P can read B
before its write to A is seen
by all processors
(a processor can move its
own reads in front of its
own writes)
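The store-load relaxation above can be observed from Java. The following is a minimal sketch (class and variable names are illustrative, not from the deck) of the classic store-buffer litmus test: each thread stores to one shared field and then reads the other. Because a processor may satisfy its own read before its earlier store is visible to other cores, the outcome r1 == 0 && r2 == 0 is possible on x86 (and is allowed by the JMM too, since the program has data races).

```java
public class StoreBufferLitmus {
    static int a, b;     // shared, intentionally non-volatile
    static int r1, r2;   // values observed by each thread

    public static void runOnce() throws InterruptedException {
        a = 0; b = 0;
        Thread t1 = new Thread(() -> { a = 1; r1 = b; });
        Thread t2 = new Thread(() -> { b = 1; r2 = a; });
        t1.start(); t2.start();
        t1.join(); t2.join();
    }

    public static void main(String[] args) throws InterruptedException {
        runOnce();
        // All four combinations of 0/1 are legal outcomes;
        // (r1 == 0, r2 == 0) demonstrates W-R (store-load) reordering.
        System.out.println("r1=" + r1 + " r2=" + r2);
    }
}
```

A single run usually shows one of the "expected" interleavings; the reordered outcome typically needs many repetitions to appear.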
29. How does the Java compiler work?
• javac compiles source code to bytecode,
• the JVM loads the bytecode (class loader), checks it (bytecode verifier), interprets it, and JIT-compiles hot paths to native code that runs on the OS,
33. Tiered compilation (C1 compiler)
(Chart: throughput over time: interpreted at startup, then C1 with sampling, then C2 at full speed; deoptimization bails back to the interpreter.)
C1
• client,
• fast but simple,
• does the profiling,
• e.g.: branches, typechecks,
34. Tiered compilation (C2 compiler)
(Chart: throughput over time: interpreted at startup, then C1 with sampling, then C2 at full speed; deoptimization bails back to the interpreter.)
C2
• server,
• slow but clever,
• aggressively optimizing,
• based on profile,
• e.g.: loop optimizations (unswitching, unrolling), implicit null checking,
35. Why do we need a JMM?
• Different platform memory models (none of them matches the JMM!!!),
• Many JVM implementations,
• People don’t know how to program concurrently,
• Programmers: write reliable multithreaded code,
• Compiler writers: implement optimizations that are legal according to the JLS,
• Compiler: produce fast and optimal native code,
36. JMM
• Actions: reads and writes of variables, locks and unlocks of monitors, starting and joining with threads,
• Happens-before is a partial order,
• For a thread executing action B to see the result of action A (performed by any thread), there must be a happens-before relationship between A and B,
• Otherwise the JVM is free to reorder,
37. Happens-before orderings
• Unlock of a monitor / lock of that monitor,
• Write to a volatile variable / read of that variable,
• Call to start() / any action in the started thread,
• All actions in a thread / any other thread successfully returning from join() on that thread,
• Writing default values to variables, and writes to final fields in the constructor / the end of the constructor,
• Write to an atomic variable / read of that variable,
• Many java.util.concurrent methods,
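The start()/join() edges from the list above can be demonstrated directly. A minimal sketch (class name is illustrative): the main thread sees the worker's write to a plain field, with no volatile and no locks, purely because join() establishes happens-before.

```java
public class StartJoinVisibility {
    static int result;   // plain field: no volatile, no synchronization

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> result = 42);
        worker.start();  // main's actions so far happen-before the worker's
        worker.join();   // all worker actions happen-before join() returning
        System.out.println(result);  // guaranteed to print 42
    }
}
```

Without the join() the read would be a data race and could legally observe 0.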
38. JMM
• A promise for programmers: sequential consistency must be sacrificed to allow
optimizations, but it will still hold for data race free programs. This is the data
race free (DRF) guarantee.
• A promise for security: even for programs with data races, values should not
appear “out of thin air”, preventing unintended information leakage.
• A promise for compilers: common hardware and software optimizations should
be allowed as far as possible without violating the first two requirements.
Java Memory Model Examples: Good, Bad and Ugly: http://bit.ly/2cZfF1I
40. How can this happen?
• Processors can reorder statements (out-of-order execution, HT),
• Lazy synchronization between caches and main memory,
• Compilers can reorder statements (or keep values in registers),
• Aggressive optimizations in the JIT,
41. Example

@NotThreadSafe
class DataRace {
    int a, b;
    int x, y;

    void thread1() {
        y = a;
        b = 1;
    }

    void thread2() {
        x = b;
        a = 2;
    }
}

One possible execution (in time order):
Thread 1: y = a;
Thread 1: b = 1;
Thread 2: x = b;
Thread 2: a = 2;
42. Example (same code): Thread 1's statements reordered:
Thread 1: b = 1;
Thread 1: y = a;
Thread 2: x = b;
Thread 2: a = 2;

43. Example (same code): both threads' statements reordered:
Thread 1: b = 1;
Thread 1: y = a;
Thread 2: a = 2;
Thread 2: x = b;

44. Example (same code): reordered and interleaved:
Thread 1: b = 1;
Thread 2: a = 2;
Thread 2: x = b;
Thread 1: y = a;
Result: y == 2, x == 1
51. Visibility between threads

@ThreadSafe
public class DataRace {
    int a, b;
    int x, y;

    void thread1() {
        synchronized (this) {
            y = a;
            b = 1;
        }
    }

    void thread2() {
        synchronized (this) {
            x = b;
            a = 2;
        }
    }
}
52. Visibility between threads
(Th2 starts after Th1. Same code as slide 51.)
Program order within each thread, combined with the synchronization order between the threads, yields a happens-before order: every operation that happens before an unlock (release) is visible to an operation that happens after a later lock (acquire).

Execution:
<enter this>
y = a;
b = 1;
<exit this>
<enter this>
x = b;
a = 2;
<exit this>

Possible results:
y == 0, x == 1
y == 2, x == 0
54. Volatile

@NotThreadSafe
public class Looper {
    static boolean done;

    public static void main(String[] args)
            throws InterruptedException {
        new Thread(new Runnable() {
            @Override
            public void run() {
                int count = 0;
                while (!done) {
                    count++;
                }
                System.out.println("Ending this task");
            }
        }).start();
        Thread.sleep(1000);
        System.out.println("Waiting done");
        done = true;
    }
}
55. Volatile

@ThreadSafe
public class Looper {
    static volatile boolean done;

    public static void main(String[] args)
            throws InterruptedException {
        new Thread(new Runnable() {
            @Override
            public void run() {
                int count = 0;
                while (!done) {
                    count++;
                }
                System.out.println("Ending this task");
            }
        }).start();
        Thread.sleep(1000);
        System.out.println("Waiting done");
        done = true;
    }
}

(Diagram) Thread 1's write done = true and Thread 2's read in while (!done) are linked by program order plus synchronization order, forming a happens-before order.
56. More about volatile
• Volatile reads are very cheap (no locks, compared to synchronized),
• Volatile increment is not atomic (!!!),
• Elements of an array held in a volatile field are not themselves volatile (e.g. volatile int[]),
• Consider using java.util.concurrent,
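The second bullet is worth seeing in code. A minimal sketch (class name and iteration count are illustrative): counter++ on a volatile int is a read-modify-write of three separate steps, so concurrent increments can be lost, while AtomicInteger.incrementAndGet() performs the whole update atomically.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class VolatileIncrement {
    static volatile int volatileCount;
    static final AtomicInteger atomicCount = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                volatileCount++;                // read-modify-write: NOT atomic
                atomicCount.incrementAndGet();  // atomic read-modify-write
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // The atomic counter is always exactly 200000; the volatile
        // counter is often less, because concurrent ++ can lose updates.
        System.out.println("volatile: " + volatileCount);
        System.out.println("atomic:   " + atomicCount.get());
    }
}
```

This is exactly the case where "consider using java.util.concurrent" applies.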
57. What operations in Java are atomic?
• Reads/writes of variables of primitive types (except long and double, whose non-volatile reads/writes may be split into two 32-bit halves),
• Reads/writes of volatile variables of primitive types (including long and double),
• All reads/writes of references (http://bit.ly/2c8kn8i),
• All operations on java.util.concurrent.atomic types,
59. Double-checked locking

@ThreadSafe
public class DoubleCheckedLocking {
    private volatile Helper helper = null;

    public Helper getHelper() {
        if (helper == null) {
            synchronized (this) {
                if (helper == null)
                    helper = new Helper();
            }
        }
        return helper;
    }
}

The "Double-Checked Locking is Broken" Declaration: http://bit.ly/2cIDBnA
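The declaration cited above also describes a simpler alternative to volatile double-checked locking for lazy singletons. A minimal sketch (class names are illustrative): the initialization-on-demand holder idiom, which relies on the JVM's guarantee that class initialization is performed safely and exactly once.

```java
public class HolderIdiom {
    private HolderIdiom() { }

    // Not initialized until first referenced.
    private static class Holder {
        static final HolderIdiom INSTANCE = new HolderIdiom();
    }

    public static HolderIdiom getInstance() {
        // First call triggers Holder's class initialization; the JVM
        // serializes that, so no volatile or explicit lock is needed.
        return Holder.INSTANCE;
    }
}
```

It is lazy, thread-safe, and has no per-call synchronization cost after initialization.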
60. Final

@NotThreadSafe
class UnsafePublication {
    private int a;
    private static UnsafePublication instance;

    private UnsafePublication() {
        a = 1;
    }

    void thread1() throws InterruptedException {
        instance = new UnsafePublication();
    }

    void thread2() {
        if (instance != null) {
            System.out.println(instance.a);
        }
    }
}

What state can thread 2 see? instance == null, a == 0, or a == 1
61. Final

@ThreadSafe
class SafePublication {
    private final int a;
    private static SafePublication instance;

    private SafePublication() {
        a = 1;
    }

    void thread1() throws InterruptedException {
        instance = new SafePublication();
    }

    void thread2() {
        if (instance != null) {
            System.out.println(instance.a);
        }
    }
}
62. Next-JMM
• JEP 188,
• Improve formalization,
• JVM coverage,
• Extend scope,
• Testing support,
• Tool support,
• Enhancement: atomic reads/writes for long and double,
63. To sum up...
• Concurrent programming isn’t easy,
• Design your code for concurrency (make it right before you make it fast),
• Do not code against the implementation; code against the specification,
• Use high-level synchronization wherever possible,
• Watch out for useless synchronization,
• Use thread-safe immutable objects,
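The last point can be sketched in a few lines (the Point class is a hypothetical example, not from the deck): final fields and no mutators make the object immutable, so it is safely publishable and freely shareable between threads.

```java
final class Point {
    private final int x, y;   // final fields: safe publication guarantee

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int x() { return x; }
    int y() { return y; }

    // "Mutation" returns a new object instead of changing this one.
    Point translate(int dx, int dy) {
        return new Point(x + dx, y + dy);
    }
}
```

Because no thread can ever observe a Point changing, no synchronization is needed to share one.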
64. Further reading
• Aleksey Shipilëv: One Stop Page
(http://bit.ly/2cqBt4x),
• Rafael Winterhalter: The Java Memory Model for
Practitioners (http://bit.ly/2cMXklJ),
• Brian Goetz: Java Concurrency in Practice
(http://amzn.to/2cloe76)