3. Philip Yankov
• Sofia University – CS and AI
• Previous experience:
– SAP Labs Bulgaria, Microsoft, VMware
– Chobolabs and other startups
– Toptal
• Global winner of NASA Space Apps Challenge
with a prototype for Smart Glove
• x8academy – an Academy for BETTER software
engineers
– soon AI and Multi-threading courses
4. x = y = 0
Thread 1:
    x = 1
    j = y
Thread 2:
    y = 1
    i = x
What could be the result?
• Compiler can reorder instructions.
• Compiler can keep values in registers.
• Processor can reorder instructions.
• Values may not be synchronized to main memory.
5. • i = 1; j = 1
• i = 0; j = 1
• i = 1; j = 0
• i = 0; j = 0
So in order to develop a multi-threaded application we need to understand:
• In what order the actions are executed in the application;
• How data is shared between threads.
Answer(s)
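The two-thread program above can be tried out with a small harness. This is only a sketch: the class and method names (ReorderingDemo, runOnce, histogram) are hypothetical and not from the slides. Because starting a thread is slow compared to the two assignments, most runs will show the same one or two outcomes, and the rarer ones may never appear on a given machine. The point is only that every run ends in one of the four listed results.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; the names are illustrative and not from the slides.
class ReorderingDemo {
    // Plain (non-volatile) shared fields, as in the two-thread example.
    static int x, y, i, j;

    // Runs the two-thread program once and returns the observed outcome as text.
    static String runOnce() {
        x = 0; y = 0; i = 0; j = 0;
        Thread t1 = new Thread(() -> { x = 1; j = y; });
        Thread t2 = new Thread(() -> { y = 1; i = x; });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // join() makes the threads' writes to i and j visible here.
        return "i=" + i + "; j=" + j;
    }

    // Repeats the experiment and counts how often each outcome was observed.
    static Map<String, Integer> histogram(int runs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int n = 0; n < runs; n++) {
            counts.merge(runOnce(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(histogram(10_000));
    }
}
```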
18. Two actions can be ordered by a happens-
before relationship. If one action happens-
before another, then the first is visible to and
ordered before the second.
Java Language Specification, Java SE 7 Edition
Happens-before order
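A minimal sketch of a happens-before edge, assuming a volatile field is used to create it (the quoted definition itself does not name a mechanism). The class HappensBeforeDemo and its methods are hypothetical names. The write to answer is ordered before the volatile write by program order, the volatile write happens-before the volatile read that observes it, and that read is ordered before the read of answer, so the reader is guaranteed to see 42.

```java
// Hypothetical demo class; names are illustrative and not from the slides.
class HappensBeforeDemo {
    int answer = 0;                  // plain field
    volatile boolean ready = false;  // volatile field that carries the edge

    void writer() {
        answer = 42;   // (1) ordered before (2) by program order
        ready = true;  // (2) volatile write
    }

    int reader() {
        while (!ready) {
            // (3) volatile read; spin until the write (2) becomes visible
        }
        return answer; // (4) sees 42: (1) hb (2) hb (3) hb (4)
    }

    // Runs reader and writer on separate threads and returns what the reader saw.
    static int run() {
        HappensBeforeDemo demo = new HappensBeforeDemo();
        int[] seen = new int[1];
        Thread r = new Thread(() -> seen[0] = demo.reader());
        Thread w = new Thread(demo::writer);
        r.start();
        w.start();
        try {
            r.join();
            w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```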
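33. class ArrayTest {
    // Only the reference to the array is volatile, not its elements.
    volatile boolean[] ready = new boolean[] { false };
    int answer = 0;

    void thread1() {
        while (!ready[0]);      // volatile read of the reference, plain read of the element
        assert answer == 42;    // not guaranteed
    }

    void thread2() {
        answer = 42;
        ready[0] = true;        // volatile read of the reference, plain write of the element
    }
}
Declaring an array to be volatile does not make its elements volatile! In the above
example, there is no write-read edge because both threads only ever read the volatile
array reference; neither thread writes to it.
For such volatile element access, use java.util.concurrent.atomic.AtomicIntegerArray.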
Array Elements
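A sketch of the suggested fix, assuming the flag is encoded as an int (0 = not ready, 1 = ready), since AtomicIntegerArray stores int elements. The class name AtomicArrayTest and the run() helper are hypothetical. The get and set methods of AtomicIntegerArray have volatile read and write semantics per element, which restores the write-read edge.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical variant of ArrayTest; names are illustrative.
class AtomicArrayTest {
    // One element; 0 means "not ready", 1 means "ready" (an assumed encoding).
    final AtomicIntegerArray ready = new AtomicIntegerArray(1);
    int answer = 0;

    int thread1() {
        while (ready.get(0) == 0) {
            // spin: get() is a volatile-style read of the element
        }
        return answer;   // guaranteed to be 42
    }

    void thread2() {
        answer = 42;
        ready.set(0, 1); // volatile-style write of the element
    }

    // Runs both methods on separate threads and returns what thread1 observed.
    static int run() {
        AtomicArrayTest test = new AtomicArrayTest();
        int[] seen = new int[1];
        Thread t1 = new Thread(() -> seen[0] = test.thread1());
        Thread t2 = new Thread(test::thread2);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```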
38. class ThreadLifeCycle {
    int foo = 0;

    void method() {
        foo = 42;
        new Thread() {
            @Override
            public void run() {
                // start() happens-before every action of the started thread
                assert foo == 42;
            }
        }.start();
    }
}
Thread start
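A runnable sketch of the same guarantee, extended with join(), which is an addition not shown on this slide: the Java Language Specification also orders all actions of a thread before another thread's successful return from join() on it. The class name ThreadStartJoinDemo and the field bar are hypothetical.

```java
// Hypothetical demo class; names are illustrative and not from the slides.
class ThreadStartJoinDemo {
    int foo = 0;
    int bar = 0;

    int run() {
        foo = 42;                       // written before start()
        Thread t = new Thread(() -> {
            bar = foo + 1;              // sees foo == 42 because of the start() edge
        });
        t.start();
        try {
            t.join();                   // the thread's actions happen-before join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return bar;                     // sees the value written by the thread
    }

    public static void main(String[] args) {
        System.out.println(new ThreadStartJoinDemo().run());
    }
}
```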
43. Constructing thread (constructor):
    instance = <allocate>;
    instance.foo = 42;
    <freeze instance.foo>
Reading thread:
    if (instance != null) {
        assert instance.foo == 42;
    }
[Diagram: both code fragments on a time axis, connected by happens-before order and dereference order edges.]
When a thread creates an instance, the instance’s final fields are frozen at the end of
the constructor. The Java memory model requires a final field’s initial value to be
visible in its initialized form to other threads.
This requirement also holds for properties that are dereferenced via a final field,
even if the field value’s properties are not final themselves (memory-chain order).
This does not apply to (reflective) changes made outside of a constructor / class initializer.
Final fields operations order
44. class FinalFieldExample {
    final int x;
    int y;
    static FinalFieldExample f;

    public FinalFieldExample() {
        x = 3;
        y = 4;
    }

    static void writer() {
        f = new FinalFieldExample();
    }

    static void reader() {
        if (f != null) {
            int i = f.x;  // guaranteed value: 3
            int j = f.y;  // 4 or 0 !!
        }
    }
}
Final fields
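The memory-chain rule for final fields can be sketched as follows. The class FinalHolder and its members are hypothetical names. The array elements are not final, but they are written inside the constructor and reached only through the final field, so a reader that sees a non-null reference is guaranteed to see the initialized elements even though the reference itself is published through a data race.

```java
// Hypothetical demo class; names are illustrative and not from the slides.
class FinalHolder {
    final int[] values;          // final reference; the elements are ordinary array slots

    FinalHolder() {
        // Written inside the constructor, before the final field is frozen.
        values = new int[] { 1, 2, 3 };
    }

    static FinalHolder shared;   // deliberately plain: published without synchronization

    static void writer() {
        shared = new FinalHolder();
    }

    // Returns -1 if nothing has been published yet, otherwise the element sum.
    static int reader() {
        FinalHolder h = shared;  // read the racy reference exactly once
        if (h == null) {
            return -1;
        }
        // Reached via the final field, so the initialized elements are visible.
        return h.values[0] + h.values[1] + h.values[2];
    }

    public static void main(String[] args) {
        writer();
        System.out.println(reader());
    }
}
```

The guarantee only covers state written before the constructor finishes; elements modified later through the same reference need their own synchronization.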
50. Source: http://shipilev.net/blog/2014/safe-public-construction/
Publication idiom | x86, 1 thread | x86, 8 threads | ARM, 1 thread | ARM, 4 threads
final wrapper | 2.256 | 2.485 | 28.228 | 28.237
enum holder | 2.257 | 2.415 | 13.523 | 13.530
double-checked | 2.256 | 2.475 | 33.510 | 29.412
synchronized | 18.860 | 302.346 | 77.560 | 1291.585
measured in ns/op; continuous instance requests
Problem: how to publish an instance of a class that does not define its fields to be final?
Besides plain synchronization and the double-checked locking idiom, Java offers:
1. Final wrappers: Where double-checked locking requires volatile field access, this
access can be avoided by wrapping the published instance in a class that stores
the singleton in a final field.
2. Enum holder: A singleton stored as a field of an enumeration is guaranteed to be
initialized, because enumerations guarantee full initialization.
Safe initialization and publication
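The two idioms can be sketched as below. All names (Service, FinalWrapperFactory, Wrapper, ServiceHolder) are hypothetical, and the enum variant is one possible reading of "enum holder", not necessarily the exact code that was benchmarked.

```java
// Hypothetical singleton type whose field is deliberately not final.
class Service {
    int value = 42;
}

// 1. Final wrapper: the wrapper reference is a plain field, but the instance
//    is reached through a final field, which makes the publication safe.
class FinalWrapperFactory {
    private static final class Wrapper {
        final Service instance;            // frozen at the end of the constructor
        Wrapper(Service instance) { this.instance = instance; }
    }

    private Wrapper wrapper;               // no volatile needed

    Service get() {
        Wrapper w = wrapper;               // racy read, done exactly once
        if (w == null) {
            synchronized (this) {
                w = wrapper;               // re-check under the lock
                if (w == null) {
                    w = new Wrapper(new Service());
                    wrapper = w;
                }
            }
        }
        return w.instance;
    }
}

// 2. Enum holder: the instance is created while the enum is initialized.
enum ServiceHolder {
    INSTANCE;
    final Service service = new Service();
}

class SafePublicationDemo {
    public static void main(String[] args) {
        FinalWrapperFactory factory = new FinalWrapperFactory();
        System.out.println(factory.get().value);
        System.out.println(ServiceHolder.INSTANCE.service.value);
    }
}
```

The local variable w matters in the wrapper variant: reading the field a second time on the return path could observe a different value than the null check did.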
52. class Externalization {
    int foo = 0;

    void method() {
        foo = 42;
        jni();
    }

    native void jni(); /* {
        assert foo == 42;
    } */
}
A JIT-compiler cannot determine the side-effects of a native operation. Therefore,
external actions are guaranteed to not be reordered.
External actions include JNI, socket communication, file system operations or
interaction with the console (non-exclusive list).
[Diagram label: program order]
External action
54. Required barriers (rows: 1st operation; columns: 2nd operation; "-" marks an empty cell)
1st operation | Normal Load | Normal Store | Volatile Load / MonitorEnter | Volatile Store / MonitorExit
Normal Load | - | - | - | LoadStore
Normal Store | - | - | - | StoreStore
Volatile Load / MonitorEnter | LoadLoad | LoadStore | LoadLoad | LoadStore
Volatile Store / MonitorExit | - | - | StoreLoad | StoreStore
Source: The JSR-133 Cookbook for Compiler Writers
Memory Barriers
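The table can be read as a function from a pair of consecutive operations to the barrier listed between them. The sketch below just encodes the table as a lookup; the class BarrierTable, the enum Op and the method required are hypothetical names, and MonitorEnter and MonitorExit are folded into the volatile load and volatile store entries as in the table.

```java
// Hypothetical lookup over the barrier table; names are illustrative.
class BarrierTable {
    // VOLATILE_LOAD also stands for MonitorEnter, VOLATILE_STORE for MonitorExit.
    enum Op { NORMAL_LOAD, NORMAL_STORE, VOLATILE_LOAD, VOLATILE_STORE }

    // Rows are the 1st operation, columns the 2nd; "" is an empty table cell.
    private static final String[][] TABLE = {
        /* NORMAL_LOAD    */ { "",         "",          "",          "LoadStore"  },
        /* NORMAL_STORE   */ { "",         "",          "",          "StoreStore" },
        /* VOLATILE_LOAD  */ { "LoadLoad", "LoadStore", "LoadLoad",  "LoadStore"  },
        /* VOLATILE_STORE */ { "",         "",          "StoreLoad", "StoreStore" },
    };

    // Returns the barrier the table lists between the two operations.
    static String required(Op first, Op second) {
        return TABLE[first.ordinal()][second.ordinal()];
    }

    public static void main(String[] args) {
        // answer = 42; ready = true;  (normal store followed by volatile store)
        System.out.println(required(Op.NORMAL_STORE, Op.VOLATILE_STORE));
        // ready = true; ... = otherFlag;  (volatile store followed by volatile load)
        System.out.println(required(Op.VOLATILE_STORE, Op.VOLATILE_LOAD));
    }
}
```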
56. Can the processor reorder these operations?
Reordering | ARM | PowerPC | SPARC TSO | x86 | AMD64
load-load | yes | yes | no | no | no
load-store | yes | yes | no | no | no
store-store | yes | yes | no | no | no
store-load | yes | yes | yes | yes | yes
Source: Wikipedia
Processor optimizations
61. • Loads are not reordered with other loads.
• Stores are not reordered with other stores.
• Stores are not reordered with older loads.
• Loads may be reordered with older stores to different locations but not with
older stores to the same location.
• In a multiprocessor system, memory ordering obeys causality (memory ordering
respects transitive visibility).
• In a multiprocessor system, stores to the same location have a total order.
• In a multiprocessor system, locked instructions have a total order.
• Loads and stores are not reordered with locked instructions.
Intel x86/64 Memory model details
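The one reordering x86 allows, a load moving ahead of an older store to a different location, is exactly what can produce i = 0; j = 0 in the two-thread program from the beginning of the deck. A sketch of ruling that outcome out from Java, assuming x and y are declared volatile: the Java memory model then places all accesses to x and y in a single total order consistent with each thread's program order, so both loads cannot precede both stores. The class StoreLoadDemo and its methods are hypothetical names.

```java
// Hypothetical demo class; names are illustrative and not from the slides.
class StoreLoadDemo {
    // volatile: the store in each thread may not be reordered with the load after it
    static volatile int x, y;
    static int i, j;   // results, read only after join()

    // Runs the two-thread program once and returns the observed outcome as text.
    static String runOnce() {
        x = 0; y = 0; i = 0; j = 0;
        Thread t1 = new Thread(() -> { x = 1; j = y; });
        Thread t2 = new Thread(() -> { y = 1; i = x; });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return "i=" + i + "; j=" + j;
    }

    // Returns true if any run produced the forbidden outcome i = 0; j = 0.
    static boolean sawBothZero(int runs) {
        for (int n = 0; n < runs; n++) {
            if (runOnce().equals("i=0; j=0")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("both zero observed: " + sawBothZero(10_000));
    }
}
```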
67. ("-" marks an empty cell)
Processor | LoadStore | LoadLoad | StoreStore | StoreLoad | Data dependency orders loads? | Atomic Conditional | Other Atomics | Atomics provide barrier?
sparc-TSO | no-op | no-op | no-op | membar (StoreLoad) | yes | CAS: casa | swap, ldstub | full
x86 | no-op | no-op | no-op | mfence or cpuid or locked insn | yes | CAS: cmpxchg | xchg, locked insn | full
ia64 | combine with st.rel or ld.acq | ld.acq | st.rel | mf | yes | CAS: cmpxchg | xchg, fetchadd | target + acq/rel
arm | dmb (see below) | dmb (see below) | dmb-st | dmb | indirection only | LL/SC: ldrex/strex | - | target only
ppc | lwsync (see below) | lwsync (see below) | lwsync | hwsync | indirection only | LL/SC: ldarx/stwcx | - | target only
alpha | mb | mb | wmb | mb | no | LL/SC: ldx_l/stx_c | - | target only
pa-risc | no-op | no-op | no-op | no-op | yes | build from ldcw | ldcw | (NA)
Source: The JSR-133 Cookbook for Compiler Writers
* The x86 processors supporting "streaming SIMD" SSE2 extensions require LoadLoad "lfence" only in connection with
these streaming instructions.
Memory barriers - architecture