2. Memory...
● When programming we use memory all the
time
– Reading/Writing data structures on the heap, stack
or data segment.
– Reading/Writing from/to hardware
– By “Memory” I do not refer to registers in this
presentation (since every core has its own
registers)
3. What kinds of guarantees do we
want from memory operations?
● That the operation is not optimized away completely
● That the operation does not take place in registers (that are not
visible to other cores by definition)
● Visibility to other cores (bypass/flush/sync CPU caches)
● Visibility to hardware
● Atomicity
● Order between two or more such memory operations
● Any combination of the above (possibly none of them...)
● In regular C programming (that is, without using special features), the
compiler and the CPU provide none of the above guarantees
4. So what guarantees does volatile
provide?
● It's just not clear!
● The first two, yes
● The others, maybe. Not on a lot of architectures.
● Depends on your compiler, its version, your compilation flags,
the astrological sign of the compiler author's best friend...
● To be specific: most volatile implementations do not imply
atomicity or ordering.
● And what does volatile mean for structures bigger than a word, int or
long? Pass me the joint as things are getting hazy...
● Don't use volatile (true in most cases!)
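● A minimal sketch of what to reach for when atomicity is what you actually need, assuming a C11 compiler with <stdatomic.h> and POSIX threads. The names counter, worker, run_two_incrementers and INCREMENTS are made up for this illustration. Two threads increment a shared atomic_int; with a volatile int instead, increments could be lost.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define INCREMENTS 100000

/* Shared counter: atomic_int, not volatile int. */
static atomic_int counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++)
        atomic_fetch_add(&counter, 1); /* one indivisible read-modify-write */
    return NULL;
}

/* Runs two incrementing threads and returns the final counter value. */
int run_two_incrementers(void)
{
    pthread_t a, b;
    int rc;

    atomic_store(&counter, 0);
    rc = pthread_create(&a, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, worker, NULL);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return atomic_load(&counter);
}
```

Build with -pthread. run_two_incrementers() returns 2 * INCREMENTS because no increment can be lost.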
5. Memory reordering
● Imagine the following code (with no compiler
optimizations):
    X=5
    Y=6
    X=7
    Y=8
● What states do you expect other cores to see?
    X=5, Y=6
    X=7, Y=6
    X=7, Y=8
● Or maybe
    X=5, Y=6
    X=5, Y=8
    X=7, Y=8
● Yes! The CPU does this. (well, not Intel, but others)
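● A sketch of how the surprising X=5, Y=8 state can be ruled out, assuming C11 atomics and POSIX threads. The function names writer and observe_x_after_y are invented for this example. The writer stores Y with release ordering and the observer loads Y with acquire ordering, so an observer that has seen Y==8 must also see X==7.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int X, Y;

static void *writer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&X, 7, memory_order_relaxed);
    /* Release: the store to X above may not be reordered after this store. */
    atomic_store_explicit(&Y, 8, memory_order_release);
    return NULL;
}

/* Returns the value of X observed after Y == 8 has been seen. */
int observe_x_after_y(void)
{
    pthread_t t;
    int rc, x;

    atomic_store(&X, 5);
    atomic_store(&Y, 6);
    rc = pthread_create(&t, NULL, writer, NULL);
    assert(rc == 0);

    /* Acquire: pairs with the release store in writer(); spin until Y == 8. */
    while (atomic_load_explicit(&Y, memory_order_acquire) != 8)
        ;
    x = atomic_load_explicit(&X, memory_order_relaxed);

    pthread_join(t, NULL);
    return x;
}
```

Build with -pthread. With relaxed ordering on both accesses to Y, the language would no longer promise that the observer sees X==7.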
6. Is the Compiler/CPU allowed to do
that?
● Yes. Actually there are many types of reordering that the
Compiler/CPU is allowed to perform
● Common CPU reorderings include:
– Load reordered after load
– Load reordered after store
– Store reordered after load
– Store reordered after store
– Store reordered after atomics
– Load reordered after atomics
– Dependent load reordered (YES! Alpha does this, they should all
be locked up...)
7. So what do the compiler/CPU
guarantee?
● They guarantee results in one thread.
● This means that they may alter your code, reorder it, discard parts of it, use
different operations than the ones you wrote and more.
● But all of these changes preserve the results as seen from within the
thread they are applied to.
● But sometimes you want your code to be left unaltered.
● This is especially true when other threads or hardware are involved.
● In these cases the order matters, the specific operations matter, etc.
8. Enter memory barrier/fence
● A machine memory barrier is a special machine instruction or a
special type of memory access instruction that guarantees order
of execution between memory instructions before it and after it.
● __sync_synchronize() in gcc (user space).
● asm volatile ("mfence" ::: "memory") (gcc inline assembly, x86 only)
● (smp_?)mb(),(smp_?)rmb(),(smp_?)wmb() in kernel development.
● In most cases atomic operations imply a memory barrier of some sort,
and C++11 has a nice API with a memory model included.
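● A sketch of standalone fences in portable C11, assuming <stdatomic.h> and POSIX threads. The names payload, ready, producer and consume_payload are hypothetical. The release fence plays roughly the role of a write barrier and the acquire fence roughly that of a read barrier. atomic_thread_fence(memory_order_seq_cst) would be the portable counterpart of a full barrier such as __sync_synchronize().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;      /* plain, non-atomic data */
static atomic_int ready; /* flag that publishes the payload */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Write-side fence: the payload store is ordered before the flag store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
    return NULL;
}

/* Starts the producer, waits for the flag and returns the payload it sees. */
int consume_payload(void)
{
    pthread_t t;
    int rc, seen;

    payload = 0;
    atomic_store(&ready, 0);
    rc = pthread_create(&t, NULL, producer, NULL);
    assert(rc == 0);

    while (!atomic_load_explicit(&ready, memory_order_relaxed))
        ; /* spin until the flag is set */
    /* Read-side fence: the payload load is ordered after the flag load. */
    atomic_thread_fence(memory_order_acquire);
    seen = payload;

    pthread_join(t, NULL);
    return seen;
}
```

Build with -pthread. Without the two fences the read of payload would be a data race, and the reader could observe the stale value.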
9. OK, prove it to me...
● Time for a demo.
● Two threads, when we start we have:
    X=0
    Y=0
● Thread 1:
    X=1
    R1=Y
● Thread 2:
    Y=1
    R2=X
● Could it be that R1==R2==0 at the end?
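● A rough sketch of this experiment, assuming C11 atomics and POSIX threads. It is not the demo from the talk, and the names count_both_zero and use_fence are invented here. With a sequentially consistent fence between the store and the load in each thread, R1==R2==0 cannot happen. Without the fence it is allowed, even on x86.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int X, Y;
static int R1, R2;
static int use_fence; /* set before the threads are created */

static void *thread1(void *arg)
{
    (void)arg;
    atomic_store_explicit(&X, 1, memory_order_relaxed);
    if (use_fence)
        atomic_thread_fence(memory_order_seq_cst); /* full barrier */
    R1 = atomic_load_explicit(&Y, memory_order_relaxed);
    return NULL;
}

static void *thread2(void *arg)
{
    (void)arg;
    atomic_store_explicit(&Y, 1, memory_order_relaxed);
    if (use_fence)
        atomic_thread_fence(memory_order_seq_cst); /* full barrier */
    R2 = atomic_load_explicit(&X, memory_order_relaxed);
    return NULL;
}

/* Runs the experiment `iterations` times; returns how often R1 == R2 == 0. */
int count_both_zero(int iterations, int with_fence)
{
    int hits = 0;

    use_fence = with_fence;
    for (int i = 0; i < iterations; i++) {
        pthread_t a, b;
        int rc;

        atomic_store(&X, 0);
        atomic_store(&Y, 0);
        rc = pthread_create(&a, NULL, thread1, NULL);
        assert(rc == 0);
        rc = pthread_create(&b, NULL, thread2, NULL);
        assert(rc == 0);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (R1 == 0 && R2 == 0)
            hits++;
    }
    return hits;
}
```

Build with -pthread. count_both_zero(n, 1) is always 0. count_both_zero(n, 0) may be non-zero, but because this sketch creates fresh threads for every iteration the race window is small and it may well stay at 0 on a given run.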
10. Hey, but I need volatile to overcome
the compiler!
● No, you don't
● There is something called a “compiler barrier”
● Compiler barriers usually offer several features:
– Forces the compiler to sync unsynchronized registers with memory so that memory writes
before the barrier will go to memory (no cache flush, no memory barrier)
– Forces the compiler to read from memory after the barrier even if the compiler thinks it knows
the value of certain memory locations.
– Forces order of memory operations at the compiler level (not machine level) in relation to the
barrier location in the code
● A compiler barrier is not a machine instruction (as opposed to a memory barrier)
● It is a compiler directive, influencing how the compiler will generate machine code
after the directive is given.
● The compiler may emit machine instructions or it may not (depends on many factors)
● Time for another demo...
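● A sketch of a compiler barrier in gcc or clang, using an empty asm statement with a "memory" clobber. The macro name compiler_barrier and the functions set_flag and wait_for_flag are made up for this example. The barrier forces flag to be re-read from memory on every pass through the loop, although flag is not volatile. It shows the compiler-level effect only; real cross-thread code would also need atomics or machine barriers.

```c
#include <assert.h>

/* gcc/clang compiler barrier: emits no machine instruction of its own. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

static int flag; /* deliberately not volatile */

/* Hypothetical stand-in for whatever else sets the flag. */
void set_flag(void)
{
    flag = 1;
}

/* Polls flag up to max_spins times; returns 1 if it was seen set, else 0. */
int wait_for_flag(int max_spins)
{
    assert(max_spins >= 0);
    for (int i = 0; i < max_spins; i++) {
        if (flag)
            return 1;
        /* Without this, the compiler may hoist the read of flag out of the loop. */
        compiler_barrier();
    }
    return 0;
}
```

The kernel's barrier() macro is built on the same idea.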
11. References
● “What every programmer should know about memory” by
Ulrich Drepper
● “memory-barriers.txt” from the Linux kernel.
● The example for memory barriers shown is derived from
“Memory Reordering Caught in the Act” by Jeff Preshing
● “Volatile_variable” from Wikipedia
● “Memory_barrier” from Wikipedia
● All examples can be found in my linuxapi project on
GitHub.