3. Achieved since LCA13
● Switched to gcc-4.8
  – Lots of backports from trunk
● gcc-4.7 is now in maintenance
● Improved epilogues of leaf functions (can now use LR)
● Shrink-wrapping
● Progress on conditional compare support
● Progress on VRP (Value Range Propagation)
● Progress on divmod optimisation
● Progress on disabling loop peeling
● Address sanitizer
4.
● Shrink-wrapping: move prologue/epilogue inside function body
● Conditional compare support: where possible, implement short-circuit &&/|| with a conditional compare
● VRP: helps remove useless sign/zero extensions
● Divmod: the ARM runtime library contains a routine computing div & mod at the same time
    x = a / b;  // call div()
    y = a % b;  // call mod()
  becomes a single call:
    (x, y) = divmod(a, b)
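As a rough illustration of the divmod idea, the sketch below uses the standard C `div()` function, which returns quotient and remainder together. It is only a source-level analogue: the ARM runtime routine (`__aeabi_idivmod` in the ARM run-time ABI) is called from compiler-generated code, not written by hand. The helper names `div_then_mod` and `div_and_mod` are made up for this example.

```c
#include <stdlib.h>

/* Two separate operations, as in the unoptimised form on the slide. */
void div_then_mod(int a, int b, int *x, int *y)
{
    *x = a / b;   /* first division  */
    *y = a % b;   /* second division */
}

/* Hypothetical combined form: one call yields both results. */
void div_and_mod(int a, int b, int *x, int *y)
{
    div_t r = div(a, b);   /* standard C: quotient and remainder together */
    *x = r.quot;
    *y = r.rem;
}
```

Both helpers give the same results; the second form only shows that a single routine can serve both `/` and `%` on the same operands.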
VRP example:
    short foo(unsigned char c) {
      c = c & (unsigned char)0x0F;
      if (c > 7)
      {
        return c - 6;
      }
      return c;
    }
  Before:
    foo:
      and r0, r0, #15
      cmp r0, #7
      subhi r0, r0, #6
      uxthhi r0, r0
      sxth r0, r0
      bx lr
  After (redundant extensions removed):
    foo:
      and r0, r0, #15
      cmp r0, #7
      subhi r0, r0, #6
      bx lr
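The reason the `uxth`/`sxth` instructions can be dropped is that the result always lies in a small range. The sketch below checks this exhaustively for the `foo` function from the slide; `foo_raw` and `extensions_are_redundant` are hypothetical helpers added for this illustration, and the check only mimics the range reasoning, not GCC's actual VRP pass.

```c
#include <stdint.h>

/* The function from the slide. */
short foo(unsigned char c)
{
    c = c & (unsigned char)0x0F;
    if (c > 7) {
        return c - 6;
    }
    return c;
}

/* Hypothetical model of the 32-bit register value with no extensions. */
int foo_raw(unsigned char c)
{
    int r = c & 0x0F;      /* and   r0, r0, #15 : r in [0, 15] */
    if (r > 7)
        r -= 6;            /* subhi r0, r0, #6  : r in [2, 9]  */
    return r;              /* overall range: [0, 9]            */
}

/* Returns 1 if zero- and sign-extending to 16 bits never changes the value. */
int extensions_are_redundant(void)
{
    for (int i = 0; i < 256; i++) {
        int raw = foo_raw((unsigned char)i);
        if (raw < 0 || raw > 9)
            return 0;
        if ((int16_t)(uint16_t)raw != raw)   /* uxth then sxth */
            return 0;
        if (foo((unsigned char)i) != raw)
            return 0;
    }
    return 1;
}
```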
Shrink-wrapping example:
    void test (int a)
    {
      if (a == 0) return;
      ….
    }
  Before:
    push {….}
    if (a == 0) goto Lx;
    …..
    Lx:
    pop {…}
    return
  After:
    if (a == 0) return;
    push {…}
    ….
    pop {…}
    return
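A small C function of the shape that benefits from shrink-wrapping might look like the sketch below. It adapts the slide's `test` to return a value so the behaviour can be checked; `helper` and `calls_made` are made up for this illustration. In GCC the transformation is controlled by `-fshrink-wrap`; whether it actually fires depends on the target and optimisation level, so the comments describe the intent rather than guaranteed output.

```c
/* Hypothetical callee; calling it is what forces the caller to save
   registers (at least LR on ARM). */
int calls_made = 0;

__attribute__((noinline))
int helper(int v)
{
    calls_made++;
    return v * 2;
}

int test(int a)
{
    if (a == 0)      /* early exit: no registers need saving on this path */
        return 0;

    /* Only this path needs the prologue/epilogue (push/pop), so a
       shrink-wrapping compiler can place them around this part alone. */
    return helper(a) + helper(a + 1);
}
```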
Conditional compare example:
    if (a == b && c == d)
  compiles to:
    cmp   a, b
    cmpeq c, d
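The `cmp`/`cmpeq` pair can be read as follows: the second compare runs only if the first one set the "equal" flag, and the final flag answers the whole `&&` test. The sketch below models that in C; `both_equal` and `both_equal_ccmp` are hypothetical names, and the variable `z` stands in for the processor's Z flag.

```c
/* Plain short-circuit form, as written in the source. */
int both_equal(int a, int b, int c, int d)
{
    return a == b && c == d;
}

/* Hypothetical model of the cmp / cmpeq sequence. */
int both_equal_ccmp(int a, int b, int c, int d)
{
    int z = (a == b);    /* cmp   a, b : sets Z if a == b             */
    if (z)
        z = (c == d);    /* cmpeq c, d : executed only when Z is set  */
    return z;            /* Z is set only if both comparisons matched */
}
```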
5.
● Loop peeling: run a few iterations outside the loop so that the loop body makes aligned memory accesses for vectorization.
  – Mostly useless on ARM, which supports unaligned memory accesses.
● Address sanitizer: new GCC framework to identify NULL pointer accesses, invalid memory references....
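To make the loop peeling bullet concrete, here is a hand-written sketch of what peeling for alignment does. The function name `sum_peeled`, the 16-byte alignment target, and the group size of four are assumptions chosen for illustration; the compiler performs the equivalent transformation on its own when vectorizing.

```c
#include <stddef.h>
#include <stdint.h>

int32_t sum_peeled(const int32_t *p, size_t n)
{
    int32_t s = 0;
    size_t i = 0;

    /* Peeled iterations: scalar steps until p + i is 16-byte aligned. */
    while (i < n && ((uintptr_t)(p + i) % 16) != 0) {
        s += p[i];
        i++;
    }

    /* Main loop: each group of four elements starts on an aligned
       address, which is what an aligned vector load would require. */
    for (; i + 4 <= n; i += 4)
        s += p[i] + p[i + 1] + p[i + 2] + p[i + 3];

    /* Remaining elements that do not fill a whole group. */
    for (; i < n; i++)
        s += p[i];

    return s;
}
```

On a target that handles unaligned accesses, the first loop is extra code with little benefit, which is the motivation for disabling peeling on ARM.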
6. Next iteration
● Spec2k analysis
  – Comparison with x86
  – Looking for hot spots
  – Identify and prioritise actions
● Shrink wrapping improvements
● Conditional compares
● Finalize loop peeling improvements
● Neon intrinsics improvements
● GCC trunk backports
● Compiler target hooks audit