SlideShare une entreprise Scribd logo
1  sur  76
Télécharger pour lire hors ligne
“Quantum” Performance Effects
v 3.0; February 2015
Sergey Kuksenko
sergey.kuksenko@oracle.com, @kuksenk0
The following is intended to outline our general product direction. It
is intended for information purposes only, and may not be
incorporated into any contract. It is not a commitment to deliver any
material, code, or functionality, and should not be relied upon in
making purchasing decisions. The development, release, and timing
of any features or functionality described for Oracle’s products
remains at the sole discretion of Oracle.
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 2/52
Intro
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 3/52
Intro: performance engineering
1. Computer Science → Software Engineering
– Build software to meet functional requirements
– Mostly don’t care about HW and data specifics
– Abstract and composable, “formal science”
2. Performance Engineering
– “Real world strikes back!”
– Exploring complex interactions between hardware, software, and data
– Based on empirical evidence, i.e. “natural science”
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 4/52
Intro: what’s the difference?
architecture vs microarchitecture
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 5/52
Intro: what’s the difference?
architecture vs microarchitecture
x86
AMD64(x86-64/Intel64)
ARMv7
....
Nehalem
Sandy Bridge
Bulldozer
Bobcat
Cortex-A9
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 5/52
Intro: SUTs1
∙ Intel® Core— i5-4300M [2.6 GHz] 1x2x2
– 𝜇arch: Haswell
– launched: Q4’2013
– OS: Xubuntu 14.04 (64-bits)
1System Under Test
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 6/52
Intro: SUTs1
∙ Intel® Core— i5-4300M [2.6 GHz] 1x2x2
– 𝜇arch: Haswell
– launched: Q4’2013
– OS: Xubuntu 14.04 (64-bits)
∙ Samsung Exynos 4412, ARMv7 [1.6 GHz] 1x4x1
– 𝜇arch: Cortex-A9
– launched: 2011
– OS: Linaro 12.11
1System Under Test
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 6/52
Intro: SUTs (cont.)
∙ AMD Opteron— 4274HE [2.5 GHz] 2x8x1
– 𝜇arch: Bulldozer/Valencia
– launched: Q4’2011
– OS: Oracle Linux Server release 6.0 (64-bits)
∙ Intel® Xeon® CPU E5-2680 [2.70 GHz] 2x8x2
– 𝜇arch: Sandy Bridge
– launched: Q1’2012
– OS: Oracle Linux Server release 6.3 (64-bits)
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 7/52
Intro: JVM
∙ Java HotSpot— “1.8.0_25” 32-bits
∙ Java HotSpot— “1.8.0_25” 64-bits
∙ Java HotSpot— Embedded “1.8.0-ea-b79”
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 8/52
Intro: Demo code
https://github.com/kuksenko/quantum
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 9/52
Intro: Demo code
https://github.com/kuksenko/quantum
∙ Required: JMH (Java Microbenchmark Harness)
– http://openjdk.java.net/projects/code-tools/jmh/
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 9/52
Core
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 10/52
Demo1: double sum
private double [] A = new double [2048];
@Benchmark
public double test1 () {
double sum = 0.0;
for (int i = 0; i < A.length; i++) {
sum += A[i];
}
return sum;
}
@Benchmark
public double manualUnroll () {
double sum = 0.0;
for (int i = 0; i < A.length; i += 4) {
sum += A[i] + A[i + 1] + A[i + 2] + A[i + 3];
}
return sum;
}
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 11/52
Demo1: double sum
private double [] A = new double [2048];
@Benchmark
public double test1 () {
double sum = 0.0;
for (int i = 0; i < A.length; i++) {
sum += A[i];
}
return sum;
}
@Benchmark
public double manualUnroll () {
double sum = 0.0;
for (int i = 0; i < A.length; i += 4) {
sum += A[i] + A[i + 1] + A[i + 2] + A[i + 3];
}
return sum;
}
426 𝑜𝑝𝑠
𝑚𝑠
1120 𝑜𝑝𝑠
𝑚𝑠
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 11/52
Demo1: looking into asm, test1
loop: vaddsd 0x10(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x18(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x20(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x28(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x30(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x38(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x40(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x48(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x50(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x58(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x60(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x68(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x70(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x78(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x80(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x88(%edi ,%eax ,8),%xmm0 ,%xmm0
add $0x10 ,%eax
cmp %ebx ,%eax
jl loop:
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 12/52
Demo1: looking into asm, manualUnroll
loop: vmovsd 0x48(%eax ,%edx ,8),% xmm0
vmovsd %xmm0 ,(% esp)
vmovsd 0x40(%eax ,%edx ,8),% xmm0
vmovsd %xmm0 ,0x8(%esp)
vmovsd 0x78(%eax ,%edx ,8),% xmm0
vaddsd 0x70(%eax ,%edx ,8),%xmm0 ,%xmm1
vmovsd 0x80(%eax ,%edx ,8),% xmm2
vmovsd 0x88(%eax ,%edx ,8),% xmm0
vmovsd %xmm0 ,0x10(%esp)
vmovsd 0x38(%eax ,%edx ,8),% xmm0
vaddsd 0x30(%eax ,%edx ,8),%xmm0 ,%xmm0
vmovsd %xmm0 ,0x18(%esp)
vmovsd 0x58(%eax ,%edx ,8),% xmm0
vaddsd 0x50(%eax ,%edx ,8),%xmm0 ,%xmm3
vmovsd 0x28(%eax ,%edx ,8),% xmm4
vmovsd 0x60(%eax ,%edx ,8),% xmm5
vmovsd 0x68(%eax ,%edx ,8),% xmm6
vmovsd 0x20(%eax ,%edx ,8),% xmm7
vmovsd 0x18(%eax ,%edx ,8),% xmm0
vaddsd 0x10(%eax ,%edx ,8),%xmm0 ,%xmm0
vaddsd %xmm2 ,%xmm1 ,%xmm1
vaddsd %xmm7 ,%xmm0 ,%xmm0
vaddsd 0x10(%esp),%xmm1 ,%xmm1
vaddsd %xmm4 ,%xmm0 ,%xmm0
vaddsd %xmm5 ,%xmm3 ,%xmm2
vaddsd 0x20(%esp),%xmm0 ,%xmm3
vaddsd %xmm6 ,%xmm2 ,%xmm2
vmovsd 0x18(%esp),%xmm0
vaddsd 0x8(%esp),%xmm0 ,%xmm0
vaddsd (%esp),%xmm0 ,%xmm0
vaddsd %xmm0 ,%xmm3 ,%xmm0
vaddsd %xmm0 ,%xmm2 ,%xmm0
vaddsd %xmm0 ,%xmm1 ,%xmm0
vmovsd %xmm0 ,0x20(%esp)
add $0x10 ,%edx
cmp %ebx ,%edx
jl loop:
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 13/52
Demo1: measure time
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation (2048)
public double test1 () {
double sum = 0.0;
for (int i = 0; i < A.length; i++) {
sum += A[i];
}
return sum;
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation (2048)
public double manualUnroll () {
double sum = 0.0;
for (int i = 0; i < A.length; i += 4) {
sum += A[i] + A[i + 1] + A[i + 2] + A[i + 3];
}
return sum;
}
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 14/52
Demo1: measure time
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation (2048)
public double test1 () {
double sum = 0.0;
for (int i = 0; i < A.length; i++) {
sum += A[i];
}
return sum;
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation (2048)
public double manualUnroll () {
double sum = 0.0;
for (int i = 0; i < A.length; i += 4) {
sum += A[i] + A[i + 1] + A[i + 2] + A[i + 3];
}
return sum;
}
𝑡𝑖𝑚𝑒 = 1.15 𝑛𝑠
𝑜𝑝
𝑡𝑖𝑚𝑒 = 0.44 𝑛𝑠
𝑜𝑝
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 14/52
Demo1: measure time
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation (2048)
public double test1 () {
double sum = 0.0;
for (int i = 0; i < A.length; i++) {
sum += A[i];
}
return sum;
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation (2048)
public double manualUnroll () {
double sum = 0.0;
for (int i = 0; i < A.length; i += 4) {
sum += A[i] + A[i + 1] + A[i + 2] + A[i + 3];
}
return sum;
}
𝑡𝑖𝑚𝑒 = 1.15 𝑛𝑠
𝑜𝑝
𝐶𝑃 𝐼 =∼ 2.5
𝑡𝑖𝑚𝑒 = 0.44 𝑛𝑠
𝑜𝑝
𝐶𝑃 𝐼 =∼ 0.5
Cycles Per Instruction
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 14/52
𝜇arch: x86
CISC vs RISC
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 15/52
𝜇arch: x86
CISC and RISC
modern x86 CPU is not what it seems
All instructions (CISC) are dynamically translated into RISC-like
microoperations ( 𝜇ops).
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 15/52
𝜇arch: Intel’s internals
http://commons.wikimedia.org/wiki/
File:Intel_Nehalem_arch.svg
(c) Appaloosa, CC BY-SA 3.0
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 16/52
𝜇arch: simplified scheme
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 17/52
𝜇arch: looking into instruction tables2
Operation Latency 1
𝑇ℎ𝑟𝑜𝑢𝑔ℎ𝑝𝑢𝑡
addition (floating-point) 3 1
multiplication (floating-point) 5 0.5
addition (integer) 1 0.25
multiplication (integer) 3 1
2Haswell 𝜇arch
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 18/52
Demo1: test1, looking into asm again
loop: vaddsd 0x10(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x18(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x20(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x28(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x30(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x38(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x40(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x48(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x50(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x58(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x60(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x68(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x70(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x78(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x80(%edi ,%eax ,8),%xmm0 ,%xmm0
vaddsd 0x88(%edi ,%eax ,8),%xmm0 ,%xmm0
add $0x10 ,%eax
cmp %ebx ,%eax
jl loop:
𝑡𝑖𝑚𝑒 = 1.15 𝑛𝑠
𝑜𝑝
𝐶𝑃 𝐼 =∼ 2.5
∼ 3 𝑐𝑦𝑐𝑙𝑒𝑠
𝑜𝑝
𝑢𝑛𝑟𝑜𝑙𝑙𝑒𝑑 𝑏𝑦 16
19 𝑖𝑛𝑠𝑡𝑟𝑢𝑠𝑡𝑖𝑜𝑛𝑠
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 19/52
Demo1: test1, structural view
𝑡𝑖𝑚𝑒 = 1.15 𝑛𝑠
𝑜𝑝
𝐶𝑃 𝐼 =∼ 2.5
∼ 3 𝑐𝑦𝑐𝑙𝑒𝑠
𝑜𝑝
𝑢𝑛𝑟𝑜𝑙𝑙𝑒𝑑 𝑏𝑦 16
19 𝑖𝑛𝑠𝑡𝑟𝑢𝑠𝑡𝑖𝑜𝑛𝑠
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 20/52
Demo1: manualUnroll
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 21/52
Demo1: manualUnroll, structural view
𝑡𝑖𝑚𝑒 = 0.44 𝑛𝑠
𝑜𝑝
𝐶𝑃 𝐼 =∼ 0.5
∼ 1.14 𝑐𝑦𝑐𝑙𝑒𝑠
𝑜𝑝
𝑢𝑛𝑟𝑜𝑙𝑙𝑒𝑑 𝑏𝑦 4 * 4
37 𝑖𝑛𝑠𝑡𝑟𝑢𝑠𝑡𝑖𝑜𝑛𝑠
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 22/52
𝜇arch: Dependences
Performance ILP
3
of many programs is limited by natural data
dependencies.
3Instruction Level Parallelism
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 23/52
𝜇arch: Dependences
Performance ILP
3
of many programs is limited by natural data
dependencies.
What to do?
Break Dependency Chains!
3Instruction Level Parallelism
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 23/52
Demo1(cont.): breaking chains in a “right” way
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 24/52
Demo1(cont.): breaking chains in a “right” way
...
for (int i = 0; i < A.length; i++) {
sum += A[i];
}
return sum;
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 24/52
Demo1(cont.): breaking chains in a “right” way
...
for (int i = 0; i < A.length; i++) {
sum += A[i];
}
return sum;
...
for (int i = 0; i < A.length; i += 2) {
sum0 += A[i];
sum1 += A[i + 1];
}
return sum0 + sum1;
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 24/52
Demo1(cont.): breaking chains in a “right” way
...
for (int i = 0; i < A.length; i++) {
sum += A[i];
}
return sum;
...
for (int i = 0; i < A.length; i += 2) {
sum0 += A[i];
sum1 += A[i + 1];
}
return sum0 + sum1;
...
for (int i = 0; i < array.length; i += 4) {
sum0 += A[i];
sum1 += A[i + 1];
sum2 += A[i + 2];
sum3 += A[i + 3];
}
return (sum0 + sum1) + (sum2 + sum3);
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 24/52
Demo1(cont.): double sum final results
Haswell AMD ARM
manualUnroll 0.44 0.45 3.30
test1 1.15 1.50 6.60
test2 0.58 0.80 4.25
test4 0.39 0.43 4.25
test8 0.39 0.25 2.55
time, ns/op
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 25/52
Demo2: results
Haswell AMD ARM
DoubleMul.test1 2.84 2.52 8.17
DoubleMul.test2 2.50 2.37 4.25
DoubleMul.test4 0.48 0.49 3.15
DoubleMul.test8 0.25 0.30 2.53
IntMul.test1 1.14 1.16 10.04
IntMul.test2 0.58 0.75 7.38
IntMul.test4 0.38 0.67 4.64
IntSum.test1 0.39 0.32 8.92
IntSum.test2 0.24 0.48 6.12
time, ns/op
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 26/52
Branches: to jump or not to jump
public int absSumBranch(int a[]) {
int sum = 0;
for (int x : a) {
if (x < 0) {
sum -= x;
} else {
sum += x;
}
}
return sum;
}
loop: mov 0xc(%ecx ,%ebp ,4),%ebx
test %ebx ,%ebx
jl L1
add %ebx ,%eax
jmp L2
L1: sub %ebx ,%eax
L2: inc %ebp
cmp %edx ,%ebp
jl loop
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 27/52
Branches: to jump or not to jump
public int absSumPredicated(int a[]) {
int sum = 0;
for (int x : a) {
sum += Math.abs(x);
}
return sum;
}
loop: mov 0xc(%ecx ,%ebp ,4),%ebx
mov %ebx ,%esi
neg %esi
test %ebx ,%ebx
cmovl %esi ,%ebx
add %ebx ,%eax
inc %ebp
cmp %edx ,%ebp
jl Loop
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 28/52
Demo3: results
Regular Pattern = (+, –)*
Nehalem Haswell AMD ARM
branch_sorted 0.9 0.5 1.0 5.0
branch_regular 0.9 0.5 0.8 5.0
branch_shuffled 6.4 1.0 2.8 9.4
predicated_sorted 1.3 0.8 0.9 5.6
predicated_regular 1.3 0.8 0.9 5.3
predicated_shuffled 1.3 0.8 0.9 9.3
time, ns/op
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 29/52
Demo3: results
Regular Pattern = (+, +, –, +, –, –, +, –, –, +)*
Nehalem Haswell AMD ARM
branch_sorted 0.9 0.5 1.0 5.0
branch_regular 1.6 0.9 1.0 5.0
branch_shuffled 6.4 1.0 2.3 9.5
predicated_sorted 1.3 0.8 0.9 5.6
predicated_regular 1.3 0.8 0.9 5.3
predicated_shuffled 1.3 0.8 0.9 9.3
time, ns/op
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 30/52
Demo4: && vs &
public int countConditional(boolean [] f0 , boolean [] f1) {
int cnt = 0;
for (int j = 0; j < SIZE; j++) {
for (int i = 0; i < SIZE; i++) {
if (f0[i] && f1[j]) {
cnt ++;
}
}
}
return cnt;
}
&&
shuffled 1.8 ns/op
sorted 0.6 ns/op
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 31/52
Demo4: && vs &
public int countLogical(boolean [] f0 , boolean [] f1) {
int cnt = 0;
for (int j = 0; j < SIZE; j++) {
for (int i = 0; i < SIZE; i++) {
if (f0[i] & f1[j]) {
cnt ++;
}
}
}
return cnt;
}
&&
shuffled 1.8 ns/op
sorted 0.6 ns/op
&
shuffled 1.2 ns/op
sorted 1.2 ns/op
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 32/52
Demo5: interface invocation cost
public interface I { public int amount (); }
...
public class C0 implements I { public int amount (){ return 0; } }
public class C1 implements I { public int amount (){ return 1; } }
public class C2 implements I { public int amount (){ return 2; } }
public class C3 implements I { public int amount (){ return 3; } }
...
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(SIZE)
public int sum(I[] a) {
int s = 0;
for (I i : a) {
s += i.amount ();
}
return s;
}
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 33/52
Demo5: results
1 target 2 targets 3 targets 4 targets
sorted 0.8 0.8 4.9 5.0
regular 0.8 4.9 5.0
shuffled 1.0 17.5 19.1
time, ns/op
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 34/52
Not a Real Core
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 35/52
Not a Real Core: HW Multithreading
∙ Simultaneous multithreading, SMT
e.g. Intel® Hyper-Threading Technology
∙ Fine-grained temporal multithreading
e.g. CMT, Sun/Oracle ULTRASparc T1, T2, T3, T4, T5 ...
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 36/52
Back to Demo1: Execution Units Saturation
1 thread 2 threads 2 threads 4 threads
-cpu 1,3 -cpu 2,3
DoubleSum.test1 426 850 840 1660
DoubleSum.test2 845 1690 1260 2500
DoubleSum.test4 1260 2513 1260 2520
DoubleSum.manualUnroll 1120 2240 1260 2504
overall throughput, ops/ms
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 37/52
Back to Demo1: Execution Units Saturation
1 thread 2 threads 2 threads 4 threads
-cpu 1,3 -cpu 2,3
DoubleSum.test1 426 850 840 1660
DoubleSum.test2 845 1690 1260 2500
DoubleSum.test4 1260 2513 1260 2520
DoubleSum.manualUnroll 1120 2240 1260 2504
overall throughput, ops/ms
Max single core throughput
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 37/52
Back to Demo1: Execution Units Saturation
1 thread 2 threads 2 threads 4 threads
-cpu 1,3 -cpu 2,3
DoubleSum.test1 426 850 840 1660
DoubleSum.test2 845 1690 1260 2500
DoubleSum.test4 1260 2513 1260 2520
DoubleSum.manualUnroll 1120 2240 1260 2504
overall throughput, ops/ms
Max system throughput
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 37/52
Demo6: Map.get()
private Map <Integer , Integer > jdk_map;
private int[] keys;
@Benchmark
public int testJdkPrimitive () {
int s = 0;
for (int key : keys) {
s += jdk_map.get(key);
}
return s;
}
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 38/52
Demo6: Map.get()
private Map <Integer , Integer > jdk_map;
private Integer [] boxedKeys;
@Benchmark
public int testJdkBoxed () {
int s = 0;
for (Integer key : boxedKeys) {
s += jdk_map.get(key);
}
return s;
}
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 39/52
Demo6: Map.get()
private TIntIntMap third_party_map;
private int[] keys;
@Benchmark
public int test3dParty () {
int s = 0;
for (int key : keys) {
s += third_party_map.get(key);
}
return s;
}
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 40/52
Demo6: Map.get() results
-cpu 1
JdkPrimitive
JdkBoxed
3dParty
throughput, ops/ms
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
Demo6: Map.get() results
-cpu 1
JdkPrimitive 47
JdkBoxed
3dParty
throughput, ops/ms
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
Demo6: Map.get() results
-cpu 1
JdkPrimitive 47
JdkBoxed 71
3dParty
throughput, ops/ms
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
Demo6: Map.get() results
-cpu 1
JdkPrimitive 47
JdkBoxed 71
3dParty 74
throughput, ops/ms
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
Demo6: Map.get() results
-cpu 1 -cpu 2,3
JdkPrimitive 47
JdkBoxed 71
3dParty 74
throughput, ops/ms
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
Demo6: Map.get() results
-cpu 1 -cpu 2,3
JdkPrimitive 47 (25, 25)
JdkBoxed 71
3dParty 74
throughput, ops/ms
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
Demo6: Map.get() results
-cpu 1 -cpu 2,3
JdkPrimitive 47 (25, 25)
JdkBoxed 71 (40, 40)
3dParty 74
throughput, ops/ms
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
Demo6: Map.get() results
-cpu 1 -cpu 2,3
JdkPrimitive 47 (25, 25)
JdkBoxed 71 (40, 40)
3dParty 74 (43, 43)
throughput, ops/ms
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
Demo6: Map.get() results
-cpu 1 -cpu 2,3 -cpu 2
(?) on -cpu 3
JdkPrimitive 47 (25, 25)
JdkBoxed 71 (40, 40)
3dParty 74 (43, 43)
throughput, ops/ms
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
Demo6: Map.get() results
-cpu 1 -cpu 2,3 -cpu 2
(?) on -cpu 3
JdkPrimitive 47 (25, 25) (30, ?)
JdkBoxed 71 (40, 40)
3dParty 74 (43, 43)
throughput, ops/ms
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
Demo6: Map.get() results
-cpu 1 -cpu 2,3 -cpu 2
(?) on -cpu 3
JdkPrimitive 47 (25, 25) (30, ?)
JdkBoxed 71 (40, 40) (50, ?)
3dParty 74 (43, 43)
throughput, ops/ms
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
Demo6: Map.get() results
-cpu 1 -cpu 2,3 -cpu 2
(?) on -cpu 3
JdkPrimitive 47 (25, 25) (30, ?)
JdkBoxed 71 (40, 40) (50, ?)
3dParty 74 (43, 43) (16, ?)
throughput, ops/ms
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
Demo6: Hyper.troll()
public static double d0;
public static double d1;
public static double d2;
@Benchmark
@OperationsPerInvocation (5)
public double troll () {
return (d0 / d2) / ((d1 / d2) / (d0 / d1));
}
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 42/52
Demo6: division results on Haswell
1 thread
int 250
double 180
throughput, ops/ 𝜇s
-cpu 1,3 -cpu 2,3 -cpu 3
(int, int) (250, 250) (125, 125) (125, 125)
(double, double) (180, 180) (90, 90) (90, 90)
(double, int) (180, 250) (150, 57) (90, 125)
throughput, ops/ 𝜇s
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 43/52
Demo6: division results on AMD
1 thread
int 128
double 300
throughput, ops/ 𝜇s
-cpu 0,1 -cpu 0,2 -cpu 0,8 -cpu 0
(int, int) (92, 92) (128, 128) (128, 128) (64, 64)
(double, double) (150, 150) (300, 300) (300, 300) (150, 150)
(double, int) (280, 120) (290, 128) (300, 128) (120, 64)
throughput, ops/ 𝜇s
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 44/52
Conclusion
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 45/52
Enlarge your knowledge with these simple
tricks!
Reading list:
∙ “Computer Architecture: A Quantitative Approach”
John L. Hennessy, David A. Patterson
∙ CPU vendors documentation
∙ http://www.agner.org/optimize/
∙ etc.
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 46/52
Thanks!
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 47/52
Q & A ?
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 48/52
Appendix
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 49/52
Appendix: Frequency Variance
∙ Dynamic CPU Frequency
– TurboBoost and similar
"The processor must be working in the power, temperature, and
specification limits of the thermal design power (TDP)."©Intel
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 50/52
Appendix: TurboBoost in action
max normal freq.
measured freq.
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 51/52
Appendix: Set Fixed Frequency!
e.g.
cpufreq-set -u 2600000 -g performance
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 52/52

Contenu connexe

Tendances

Eric Lafortune - The Jack and Jill build system
Eric Lafortune - The Jack and Jill build systemEric Lafortune - The Jack and Jill build system
Eric Lafortune - The Jack and Jill build systemGuardSquare
 
Non-blocking synchronization — what is it and why we (don't?) need it
Non-blocking synchronization — what is it and why we (don't?) need itNon-blocking synchronization — what is it and why we (don't?) need it
Non-blocking synchronization — what is it and why we (don't?) need itAlexey Fyodorov
 
Exploring OpenFaaS autoscalability on Kubernetes with the Chaos Toolkit
Exploring OpenFaaS autoscalability on Kubernetes with the Chaos ToolkitExploring OpenFaaS autoscalability on Kubernetes with the Chaos Toolkit
Exploring OpenFaaS autoscalability on Kubernetes with the Chaos ToolkitSylvain Hellegouarch
 
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDK
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDKEric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDK
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDKGuardSquare
 
Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams
Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streamsTech talks annual 2015 kirk pepperdine_ripping apart java 8 streams
Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streamsTechTalks
 
ProGuard / DexGuard Tips and Tricks
ProGuard / DexGuard Tips and TricksProGuard / DexGuard Tips and Tricks
ProGuard / DexGuard Tips and Tricksnetomi
 
Eric Lafortune - The Jack and Jill build system
Eric Lafortune - The Jack and Jill build systemEric Lafortune - The Jack and Jill build system
Eric Lafortune - The Jack and Jill build systemGuardSquare
 
20181106 arie van_deursen_testday2018
20181106 arie van_deursen_testday201820181106 arie van_deursen_testday2018
20181106 arie van_deursen_testday2018STAMP Project
 
Kernel Recipes 2013 - Automating source code evolutions using Coccinelle
Kernel Recipes 2013 - Automating source code evolutions using CoccinelleKernel Recipes 2013 - Automating source code evolutions using Coccinelle
Kernel Recipes 2013 - Automating source code evolutions using CoccinelleAnne Nicolas
 
Kotlin+MicroProfile: Enseñando trucos de 20 años a un nuevo lenguaje
Kotlin+MicroProfile: Enseñando trucos de 20 años a un nuevo lenguajeKotlin+MicroProfile: Enseñando trucos de 20 años a un nuevo lenguaje
Kotlin+MicroProfile: Enseñando trucos de 20 años a un nuevo lenguajeVíctor Leonel Orozco López
 
Real-time Computer Vision With Ruby - OSCON 2008
Real-time Computer Vision With Ruby - OSCON 2008Real-time Computer Vision With Ruby - OSCON 2008
Real-time Computer Vision With Ruby - OSCON 2008Jan Wedekind
 
Appium Automation with Kotlin
Appium Automation with KotlinAppium Automation with Kotlin
Appium Automation with KotlinRapidValue
 
Jdk 7 4-forkjoin
Jdk 7 4-forkjoinJdk 7 4-forkjoin
Jdk 7 4-forkjoinknight1128
 
Non-blocking Michael-Scott queue algorithm
Non-blocking Michael-Scott queue algorithmNon-blocking Michael-Scott queue algorithm
Non-blocking Michael-Scott queue algorithmAlexey Fyodorov
 
Improving Android Performance at Mobiconf 2014
Improving Android Performance at Mobiconf 2014Improving Android Performance at Mobiconf 2014
Improving Android Performance at Mobiconf 2014Raimon Ràfols
 
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDK
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDKEric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDK
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDKGuardSquare
 
Fabio Collini - Async code on Kotlin: RxJava or/and coroutines - Codemotion M...
Fabio Collini - Async code on Kotlin: RxJava or/and coroutines - Codemotion M...Fabio Collini - Async code on Kotlin: RxJava or/and coroutines - Codemotion M...
Fabio Collini - Async code on Kotlin: RxJava or/and coroutines - Codemotion M...Codemotion
 
A comparison of apache spark supervised machine learning algorithms for dna s...
A comparison of apache spark supervised machine learning algorithms for dna s...A comparison of apache spark supervised machine learning algorithms for dna s...
A comparison of apache spark supervised machine learning algorithms for dna s...Valerio Morfino
 

Tendances (20)

Eric Lafortune - The Jack and Jill build system
Eric Lafortune - The Jack and Jill build systemEric Lafortune - The Jack and Jill build system
Eric Lafortune - The Jack and Jill build system
 
Non-blocking synchronization — what is it and why we (don't?) need it
Non-blocking synchronization — what is it and why we (don't?) need itNon-blocking synchronization — what is it and why we (don't?) need it
Non-blocking synchronization — what is it and why we (don't?) need it
 
First fare 2010 java-beta-2011
First fare 2010 java-beta-2011First fare 2010 java-beta-2011
First fare 2010 java-beta-2011
 
Exploring OpenFaaS autoscalability on Kubernetes with the Chaos Toolkit
Exploring OpenFaaS autoscalability on Kubernetes with the Chaos ToolkitExploring OpenFaaS autoscalability on Kubernetes with the Chaos Toolkit
Exploring OpenFaaS autoscalability on Kubernetes with the Chaos Toolkit
 
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDK
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDKEric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDK
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDK
 
Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams
Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streamsTech talks annual 2015 kirk pepperdine_ripping apart java 8 streams
Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams
 
ProGuard / DexGuard Tips and Tricks
ProGuard / DexGuard Tips and TricksProGuard / DexGuard Tips and Tricks
ProGuard / DexGuard Tips and Tricks
 
Eric Lafortune - The Jack and Jill build system
Eric Lafortune - The Jack and Jill build systemEric Lafortune - The Jack and Jill build system
Eric Lafortune - The Jack and Jill build system
 
20181106 arie van_deursen_testday2018
20181106 arie van_deursen_testday201820181106 arie van_deursen_testday2018
20181106 arie van_deursen_testday2018
 
Kernel Recipes 2013 - Automating source code evolutions using Coccinelle
Kernel Recipes 2013 - Automating source code evolutions using CoccinelleKernel Recipes 2013 - Automating source code evolutions using Coccinelle
Kernel Recipes 2013 - Automating source code evolutions using Coccinelle
 
Kotlin+MicroProfile: Enseñando trucos de 20 años a un nuevo lenguaje
Kotlin+MicroProfile: Enseñando trucos de 20 años a un nuevo lenguajeKotlin+MicroProfile: Enseñando trucos de 20 años a un nuevo lenguaje
Kotlin+MicroProfile: Enseñando trucos de 20 años a un nuevo lenguaje
 
Completable future
Completable futureCompletable future
Completable future
 
Real-time Computer Vision With Ruby - OSCON 2008
Real-time Computer Vision With Ruby - OSCON 2008Real-time Computer Vision With Ruby - OSCON 2008
Real-time Computer Vision With Ruby - OSCON 2008
 
Appium Automation with Kotlin
Appium Automation with KotlinAppium Automation with Kotlin
Appium Automation with Kotlin
 
Jdk 7 4-forkjoin
Jdk 7 4-forkjoinJdk 7 4-forkjoin
Jdk 7 4-forkjoin
 
Non-blocking Michael-Scott queue algorithm
Non-blocking Michael-Scott queue algorithmNon-blocking Michael-Scott queue algorithm
Non-blocking Michael-Scott queue algorithm
 
Improving Android Performance at Mobiconf 2014
Improving Android Performance at Mobiconf 2014Improving Android Performance at Mobiconf 2014
Improving Android Performance at Mobiconf 2014
 
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDK
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDKEric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDK
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDK
 
Fabio Collini - Async code on Kotlin: RxJava or/and coroutines - Codemotion M...
Fabio Collini - Async code on Kotlin: RxJava or/and coroutines - Codemotion M...Fabio Collini - Async code on Kotlin: RxJava or/and coroutines - Codemotion M...
Fabio Collini - Async code on Kotlin: RxJava or/and coroutines - Codemotion M...
 
A comparison of apache spark supervised machine learning algorithms for dna s...
A comparison of apache spark supervised machine learning algorithms for dna s...A comparison of apache spark supervised machine learning algorithms for dna s...
A comparison of apache spark supervised machine learning algorithms for dna s...
 

En vedette

Железные счётчики на страже производительности
Железные счётчики на страже производительностиЖелезные счётчики на страже производительности
Железные счётчики на страже производительностиSergey Kuksenko
 
Poznaj lepiej swoje srodowisko programistyczne i zwieksz swoja produktywnosc ...
Poznaj lepiej swoje srodowisko programistyczne i zwieksz swoja produktywnosc ...Poznaj lepiej swoje srodowisko programistyczne i zwieksz swoja produktywnosc ...
Poznaj lepiej swoje srodowisko programistyczne i zwieksz swoja produktywnosc ...MarcinStachniuk
 
Zarządzanie zamianami w relacyjnych bazach danych
Zarządzanie zamianami w relacyjnych bazach danychZarządzanie zamianami w relacyjnych bazach danych
Zarządzanie zamianami w relacyjnych bazach danychMarcinStachniuk
 
Liquibase - Zarządzanie zmianami w relacyjnych bazach danych
Liquibase - Zarządzanie zmianami w relacyjnych bazach danychLiquibase - Zarządzanie zmianami w relacyjnych bazach danych
Liquibase - Zarządzanie zmianami w relacyjnych bazach danychMarcinStachniuk
 
Poznaj lepiej swoje srodowisko programistyczne i zwieksz swoja produktywnosc ...
Poznaj lepiej swoje srodowisko programistyczne i zwieksz swoja produktywnosc ...Poznaj lepiej swoje srodowisko programistyczne i zwieksz swoja produktywnosc ...
Poznaj lepiej swoje srodowisko programistyczne i zwieksz swoja produktywnosc ...MarcinStachniuk
 
Upieksz swoje testy! Testowanie jednostkowe dla sredniozaawansowanych.
Upieksz swoje testy! Testowanie jednostkowe dla sredniozaawansowanych.Upieksz swoje testy! Testowanie jednostkowe dla sredniozaawansowanych.
Upieksz swoje testy! Testowanie jednostkowe dla sredniozaawansowanych.MarcinStachniuk
 
JPoint 2016 - Валеев Тагир - Странности Stream API
JPoint 2016 - Валеев Тагир - Странности Stream APIJPoint 2016 - Валеев Тагир - Странности Stream API
JPoint 2016 - Валеев Тагир - Странности Stream APItvaleev
 
Stream API: рекомендации лучших собаководов
Stream API: рекомендации лучших собаководовStream API: рекомендации лучших собаководов
Stream API: рекомендации лучших собаководовtvaleev
 
Java 8 Puzzlers as it was presented at Codemash 2017
Java 8 Puzzlers as it was presented at Codemash 2017Java 8 Puzzlers as it was presented at Codemash 2017
Java 8 Puzzlers as it was presented at Codemash 2017Baruch Sadogursky
 

En vedette (11)

Железные счётчики на страже производительности
Железные счётчики на страже производительностиЖелезные счётчики на страже производительности
Железные счётчики на страже производительности
 
Poznaj lepiej swoje srodowisko programistyczne i zwieksz swoja produktywnosc ...
Poznaj lepiej swoje srodowisko programistyczne i zwieksz swoja produktywnosc ...Poznaj lepiej swoje srodowisko programistyczne i zwieksz swoja produktywnosc ...
Poznaj lepiej swoje srodowisko programistyczne i zwieksz swoja produktywnosc ...
 
Zarządzanie zamianami w relacyjnych bazach danych
Zarządzanie zamianami w relacyjnych bazach danychZarządzanie zamianami w relacyjnych bazach danych
Zarządzanie zamianami w relacyjnych bazach danych
 
Liquibase - Zarządzanie zmianami w relacyjnych bazach danych
Liquibase - Zarządzanie zmianami w relacyjnych bazach danychLiquibase - Zarządzanie zmianami w relacyjnych bazach danych
Liquibase - Zarządzanie zmianami w relacyjnych bazach danych
 
Retrospekcja warsztat Agile3M
Retrospekcja warsztat Agile3MRetrospekcja warsztat Agile3M
Retrospekcja warsztat Agile3M
 
Poznaj lepiej swoje srodowisko programistyczne i zwieksz swoja produktywnosc ...
Poznaj lepiej swoje srodowisko programistyczne i zwieksz swoja produktywnosc ...Poznaj lepiej swoje srodowisko programistyczne i zwieksz swoja produktywnosc ...
Poznaj lepiej swoje srodowisko programistyczne i zwieksz swoja produktywnosc ...
 
Upieksz swoje testy! Testowanie jednostkowe dla sredniozaawansowanych.
Upieksz swoje testy! Testowanie jednostkowe dla sredniozaawansowanych.Upieksz swoje testy! Testowanie jednostkowe dla sredniozaawansowanych.
Upieksz swoje testy! Testowanie jednostkowe dla sredniozaawansowanych.
 
JPoint 2016 - Валеев Тагир - Странности Stream API
JPoint 2016 - Валеев Тагир - Странности Stream APIJPoint 2016 - Валеев Тагир - Странности Stream API
JPoint 2016 - Валеев Тагир - Странности Stream API
 
Stream API: рекомендации лучших собаководов
Stream API: рекомендации лучших собаководовStream API: рекомендации лучших собаководов
Stream API: рекомендации лучших собаководов
 
Javaland keynote final
Javaland keynote finalJavaland keynote final
Javaland keynote final
 
Java 8 Puzzlers as it was presented at Codemash 2017
Java 8 Puzzlers as it was presented at Codemash 2017Java 8 Puzzlers as it was presented at Codemash 2017
Java 8 Puzzlers as it was presented at Codemash 2017
 

Similaire à "Quantum" Performance Effects

HeapStats: Troubleshooting with Serviceability and the New Runtime Monitoring...
HeapStats: Troubleshooting with Serviceability and the New Runtime Monitoring...HeapStats: Troubleshooting with Serviceability and the New Runtime Monitoring...
HeapStats: Troubleshooting with Serviceability and the New Runtime Monitoring...Yuji Kubota
 
Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...
Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...
Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...Thomas Wuerthinger
 
Graal and Truffle: One VM to Rule Them All
Graal and Truffle: One VM to Rule Them AllGraal and Truffle: One VM to Rule Them All
Graal and Truffle: One VM to Rule Them AllThomas Wuerthinger
 
JavaOne San Francisco 2013 - Servlet 3.1 (JSR 340)
JavaOne San Francisco 2013 - Servlet 3.1 (JSR 340)JavaOne San Francisco 2013 - Servlet 3.1 (JSR 340)
JavaOne San Francisco 2013 - Servlet 3.1 (JSR 340)Shing Wai Chan
 
Java code coverage with JCov. Implementation details and use cases.
Java code coverage with JCov. Implementation details and use cases.Java code coverage with JCov. Implementation details and use cases.
Java code coverage with JCov. Implementation details and use cases.Alexandre (Shura) Iline
 
GlassFish BOF
GlassFish BOFGlassFish BOF
GlassFish BOFglassfish
 
Compile ahead of time. It's fine?
Compile ahead of time. It's fine?Compile ahead of time. It's fine?
Compile ahead of time. It's fine?Dmitry Chuyko
 
Double the Performance of Oracle SOA Suite 11g? Absolutely!
Double the Performance of Oracle SOA Suite 11g? Absolutely!Double the Performance of Oracle SOA Suite 11g? Absolutely!
Double the Performance of Oracle SOA Suite 11g? Absolutely!Revelation Technologies
 
“Quantum” Performance Effects: beyond the Core
“Quantum” Performance Effects: beyond the Core“Quantum” Performance Effects: beyond the Core
“Quantum” Performance Effects: beyond the CoreC4Media
 
Batch Applications for the Java Platform
Batch Applications for the Java PlatformBatch Applications for the Java Platform
Batch Applications for the Java PlatformSivakumar Thyagarajan
 
OSI_MySQL_Performance Schema
OSI_MySQL_Performance SchemaOSI_MySQL_Performance Schema
OSI_MySQL_Performance SchemaMayank Prasad
 
HTTP/2 Comes to Java - What Servlet 4.0 Means to You
HTTP/2 Comes to Java - What Servlet 4.0 Means to YouHTTP/2 Comes to Java - What Servlet 4.0 Means to You
HTTP/2 Comes to Java - What Servlet 4.0 Means to YouDavid Delabassee
 
Java is Container Ready - Vaibhav - Container Conference 2018
Java is Container Ready - Vaibhav - Container Conference 2018Java is Container Ready - Vaibhav - Container Conference 2018
Java is Container Ready - Vaibhav - Container Conference 2018CodeOps Technologies LLP
 
Introduction to JavaFX on Raspberry Pi
Introduction to JavaFX on Raspberry PiIntroduction to JavaFX on Raspberry Pi
Introduction to JavaFX on Raspberry PiBruno Borges
 
Spring Boot Revisited with KoFu and JaFu
Spring Boot Revisited with KoFu and JaFuSpring Boot Revisited with KoFu and JaFu
Spring Boot Revisited with KoFu and JaFuVMware Tanzu
 
Project Helidon Overview (Japanese)
Project Helidon Overview (Japanese)Project Helidon Overview (Japanese)
Project Helidon Overview (Japanese)Logico
 

Similaire à "Quantum" Performance Effects (20)

HeapStats: Troubleshooting with Serviceability and the New Runtime Monitoring...
HeapStats: Troubleshooting with Serviceability and the New Runtime Monitoring...HeapStats: Troubleshooting with Serviceability and the New Runtime Monitoring...
HeapStats: Troubleshooting with Serviceability and the New Runtime Monitoring...
 
Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...
Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...
Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...
 
Graal and Truffle: One VM to Rule Them All
Graal and Truffle: One VM to Rule Them AllGraal and Truffle: One VM to Rule Them All
Graal and Truffle: One VM to Rule Them All
 
JavaOne San Francisco 2013 - Servlet 3.1 (JSR 340)
JavaOne San Francisco 2013 - Servlet 3.1 (JSR 340)JavaOne San Francisco 2013 - Servlet 3.1 (JSR 340)
JavaOne San Francisco 2013 - Servlet 3.1 (JSR 340)
 
Java code coverage with JCov. Implementation details and use cases.
Java code coverage with JCov. Implementation details and use cases.Java code coverage with JCov. Implementation details and use cases.
Java code coverage with JCov. Implementation details and use cases.
 
GlassFish BOF
GlassFish BOFGlassFish BOF
GlassFish BOF
 
Compile ahead of time. It's fine?
Compile ahead of time. It's fine?Compile ahead of time. It's fine?
Compile ahead of time. It's fine?
 
Double the Performance of Oracle SOA Suite 11g? Absolutely!
Double the Performance of Oracle SOA Suite 11g? Absolutely!Double the Performance of Oracle SOA Suite 11g? Absolutely!
Double the Performance of Oracle SOA Suite 11g? Absolutely!
 
“Quantum” Performance Effects: beyond the Core
“Quantum” Performance Effects: beyond the Core“Quantum” Performance Effects: beyond the Core
“Quantum” Performance Effects: beyond the Core
 
Batch Applications for the Java Platform
Batch Applications for the Java PlatformBatch Applications for the Java Platform
Batch Applications for the Java Platform
 
Java Cloud and Container Ready
Java Cloud and Container ReadyJava Cloud and Container Ready
Java Cloud and Container Ready
 
OSI_MySQL_Performance Schema
OSI_MySQL_Performance SchemaOSI_MySQL_Performance Schema
OSI_MySQL_Performance Schema
 
HTTP/2 Comes to Java - What Servlet 4.0 Means to You
HTTP/2 Comes to Java - What Servlet 4.0 Means to YouHTTP/2 Comes to Java - What Servlet 4.0 Means to You
HTTP/2 Comes to Java - What Servlet 4.0 Means to You
 
JavaMicroBenchmarkpptm
JavaMicroBenchmarkpptmJavaMicroBenchmarkpptm
JavaMicroBenchmarkpptm
 
Java is Container Ready - Vaibhav - Container Conference 2018
Java is Container Ready - Vaibhav - Container Conference 2018Java is Container Ready - Vaibhav - Container Conference 2018
Java is Container Ready - Vaibhav - Container Conference 2018
 
MySQL Quick Dive
MySQL Quick DiveMySQL Quick Dive
MySQL Quick Dive
 
MySQL JSON Functions
MySQL JSON FunctionsMySQL JSON Functions
MySQL JSON Functions
 
Introduction to JavaFX on Raspberry Pi
Introduction to JavaFX on Raspberry PiIntroduction to JavaFX on Raspberry Pi
Introduction to JavaFX on Raspberry Pi
 
Spring Boot Revisited with KoFu and JaFu
Spring Boot Revisited with KoFu and JaFuSpring Boot Revisited with KoFu and JaFu
Spring Boot Revisited with KoFu and JaFu
 
Project Helidon Overview (Japanese)
Project Helidon Overview (Japanese)Project Helidon Overview (Japanese)
Project Helidon Overview (Japanese)
 

Dernier

Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
How To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROHow To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROmotivationalword821
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...Akihiro Suda
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identityteam-WIBU
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 

Dernier (20)

Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
How To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROHow To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTRO
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identity
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 

"Quantum" Performance Effects

  • 1. “Quantum” Performance Effects v 3.0; February 2015 Sergey Kuksenko sergey.kuksenko@oracle.com, @kuksenk0
  • 2. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 2/52
  • 3. Intro Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 3/52
  • 4. Intro: performance engineering 1. Computer Science → Software Engineering – Build software to meet functional requirements – Mostly don’t care about HW and data specifics – Abstract and composable, “formal science” 2. Performance Engineering – “Real world strikes back!” – Exploring complex interactions between hardware, software, and data – Based on empirical evidence, i.e. “natural science” Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 4/52
  • 5. Intro: what’s the difference? architecture vs microarchitecture Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 5/52
  • 6. Intro: what’s the difference? architecture vs microarchitecture x86 AMD64(x86-64/Intel64) ARMv7 .... Nehalem Sandy Bridge Bulldozer Bobcat Cortex-A9 Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 5/52
  • 7. Intro: SUTs1 ∙ Intel® Core— i5-4300M [2.6 GHz] 1x2x2 – 𝜇arch: Haswell – launched: Q4’2013 – OS: Xubuntu 14.04 (64-bits) 1System Under Test Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 6/52
  • 8. Intro: SUTs1 ∙ Intel® Core— i5-4300M [2.6 GHz] 1x2x2 – 𝜇arch: Haswell – launched: Q4’2013 – OS: Xubuntu 14.04 (64-bits) ∙ Samsung Exynos 4412, ARMv7 [1.6 GHz] 1x4x1 – 𝜇arch: Cortex-A9 – launched: 2011 – OS: Linaro 12.11 1System Under Test Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 6/52
  • 9. Intro: SUTs (cont.) ∙ AMD Opteron— 4274HE [2.5 GHz] 2x8x1 – 𝜇arch: Bulldozer/Valencia – launched: Q4’2011 – OS: Oracle Linux Server release 6.0 (64-bits) ∙ Intel® Xeon® CPU E5-2680 [2.70 GHz] 2x8x2 – 𝜇arch: Sandy Bridge – launched: Q1’2012 – OS: Oracle Linux Server release 6.3 (64-bits) Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 7/52
  • 10. Intro: JVM ∙ Java HotSpot— “1.8.0_25” 32-bits ∙ Java HotSpot— “1.8.0_25” 64-bits ∙ Java HotSpot— Embedded “1.8.0-ea-b79” Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 8/52
  • 11. Intro: Demo code https://github.com/kuksenko/quantum Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 9/52
  • 12. Intro: Demo code https://github.com/kuksenko/quantum ∙ Required: JMH (Java Microbenchmark Harness) – http://openjdk.java.net/projects/code-tools/jmh/ Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 9/52
  • 13. Core Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 10/52
  • 14. Demo1: double sum private double [] A = new double [2048]; @Benchmark public double test1 () { double sum = 0.0; for (int i = 0; i < A.length; i++) { sum += A[i]; } return sum; } @Benchmark public double manualUnroll () { double sum = 0.0; for (int i = 0; i < A.length; i += 4) { sum += A[i] + A[i + 1] + A[i + 2] + A[i + 3]; } return sum; } Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 11/52
  • 15. Demo1: double sum private double [] A = new double [2048]; @Benchmark public double test1 () { double sum = 0.0; for (int i = 0; i < A.length; i++) { sum += A[i]; } return sum; } @Benchmark public double manualUnroll () { double sum = 0.0; for (int i = 0; i < A.length; i += 4) { sum += A[i] + A[i + 1] + A[i + 2] + A[i + 3]; } return sum; } 426 𝑜𝑝𝑠 𝑚𝑠 1120 𝑜𝑝𝑠 𝑚𝑠 Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 11/52
  • 16. Demo1: looking into asm, test1 loop: vaddsd 0x10(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x18(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x20(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x28(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x30(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x38(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x40(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x48(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x50(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x58(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x60(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x68(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x70(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x78(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x80(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x88(%edi ,%eax ,8),%xmm0 ,%xmm0 add $0x10 ,%eax cmp %ebx ,%eax jl loop: Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 12/52
  • 17. Demo1: looking into asm, manualUnroll loop: vmovsd 0x48(%eax ,%edx ,8),% xmm0 vmovsd %xmm0 ,(% esp) vmovsd 0x40(%eax ,%edx ,8),% xmm0 vmovsd %xmm0 ,0x8(%esp) vmovsd 0x78(%eax ,%edx ,8),% xmm0 vaddsd 0x70(%eax ,%edx ,8),%xmm0 ,%xmm1 vmovsd 0x80(%eax ,%edx ,8),% xmm2 vmovsd 0x88(%eax ,%edx ,8),% xmm0 vmovsd %xmm0 ,0x10(%esp) vmovsd 0x38(%eax ,%edx ,8),% xmm0 vaddsd 0x30(%eax ,%edx ,8),%xmm0 ,%xmm0 vmovsd %xmm0 ,0x18(%esp) vmovsd 0x58(%eax ,%edx ,8),% xmm0 vaddsd 0x50(%eax ,%edx ,8),%xmm0 ,%xmm3 vmovsd 0x28(%eax ,%edx ,8),% xmm4 vmovsd 0x60(%eax ,%edx ,8),% xmm5 vmovsd 0x68(%eax ,%edx ,8),% xmm6 vmovsd 0x20(%eax ,%edx ,8),% xmm7 vmovsd 0x18(%eax ,%edx ,8),% xmm0 vaddsd 0x10(%eax ,%edx ,8),%xmm0 ,%xmm0 vaddsd %xmm2 ,%xmm1 ,%xmm1 vaddsd %xmm7 ,%xmm0 ,%xmm0 vaddsd 0x10(%esp),%xmm1 ,%xmm1 vaddsd %xmm4 ,%xmm0 ,%xmm0 vaddsd %xmm5 ,%xmm3 ,%xmm2 vaddsd 0x20(%esp),%xmm0 ,%xmm3 vaddsd %xmm6 ,%xmm2 ,%xmm2 vmovsd 0x18(%esp),%xmm0 vaddsd 0x8(%esp),%xmm0 ,%xmm0 vaddsd (%esp),%xmm0 ,%xmm0 vaddsd %xmm0 ,%xmm3 ,%xmm0 vaddsd %xmm0 ,%xmm2 ,%xmm0 vaddsd %xmm0 ,%xmm1 ,%xmm0 vmovsd %xmm0 ,0x20(%esp) add $0x10 ,%edx cmp %ebx ,%edx jl loop: Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 13/52
  • 18. Demo1: measure time @Benchmark @BenchmarkMode(Mode.AverageTime) @OperationsPerInvocation (2048) public double test1 () { double sum = 0.0; for (int i = 0; i < A.length; i++) { sum += A[i]; } return sum; } @Benchmark @BenchmarkMode(Mode.AverageTime) @OperationsPerInvocation (2048) public double manualUnroll () { double sum = 0.0; for (int i = 0; i < A.length; i += 4) { sum += A[i] + A[i + 1] + A[i + 2] + A[i + 3]; } return sum; } Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 14/52
  • 19. Demo1: measure time @Benchmark @BenchmarkMode(Mode.AverageTime) @OperationsPerInvocation (2048) public double test1 () { double sum = 0.0; for (int i = 0; i < A.length; i++) { sum += A[i]; } return sum; } @Benchmark @BenchmarkMode(Mode.AverageTime) @OperationsPerInvocation (2048) public double manualUnroll () { double sum = 0.0; for (int i = 0; i < A.length; i += 4) { sum += A[i] + A[i + 1] + A[i + 2] + A[i + 3]; } return sum; } 𝑡𝑖𝑚𝑒 = 1.15 𝑛𝑠 𝑜𝑝 𝑡𝑖𝑚𝑒 = 0.44 𝑛𝑠 𝑜𝑝 Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 14/52
  • 20. Demo1: measure time @Benchmark @BenchmarkMode(Mode.AverageTime) @OperationsPerInvocation (2048) public double test1 () { double sum = 0.0; for (int i = 0; i < A.length; i++) { sum += A[i]; } return sum; } @Benchmark @BenchmarkMode(Mode.AverageTime) @OperationsPerInvocation (2048) public double manualUnroll () { double sum = 0.0; for (int i = 0; i < A.length; i += 4) { sum += A[i] + A[i + 1] + A[i + 2] + A[i + 3]; } return sum; } 𝑡𝑖𝑚𝑒 = 1.15 𝑛𝑠 𝑜𝑝 𝐶𝑃 𝐼 =∼ 2.5 𝑡𝑖𝑚𝑒 = 0.44 𝑛𝑠 𝑜𝑝 𝐶𝑃 𝐼 =∼ 0.5 Cycles Per Instruction Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 14/52
  • 21. 𝜇arch: x86 CISC vs RISC Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 15/52
  • 22. 𝜇arch: x86 CISC and RISC modern x86 CPU is not what it seems All instructions (CISC) are dynamically translated into RISC-like microoperations ( 𝜇ops). Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 15/52
  • 23. 𝜇arch: Intel’s internals http://commons.wikimedia.org/wiki/ File:Intel_Nehalem_arch.svg (c) Appaloosa, CC BY-SA 3.0 Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 16/52
  • 24. 𝜇arch: simplified scheme Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 17/52
  • 25. 𝜇arch: looking into instruction tables2 Operation Latency 1 𝑇ℎ𝑟𝑜𝑢𝑔ℎ𝑝𝑢𝑡 addition (floating-point) 3 1 multiplication (floating-point) 5 0.5 addition (integer) 1 0.25 multiplication (integer) 3 1 2Haswell 𝜇arch Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 18/52
  • 26. Demo1: test1, looking into asm again loop: vaddsd 0x10(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x18(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x20(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x28(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x30(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x38(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x40(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x48(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x50(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x58(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x60(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x68(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x70(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x78(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x80(%edi ,%eax ,8),%xmm0 ,%xmm0 vaddsd 0x88(%edi ,%eax ,8),%xmm0 ,%xmm0 add $0x10 ,%eax cmp %ebx ,%eax jl loop: 𝑡𝑖𝑚𝑒 = 1.15 𝑛𝑠 𝑜𝑝 𝐶𝑃 𝐼 =∼ 2.5 ∼ 3 𝑐𝑦𝑐𝑙𝑒𝑠 𝑜𝑝 𝑢𝑛𝑟𝑜𝑙𝑙𝑒𝑑 𝑏𝑦 16 19 𝑖𝑛𝑠𝑡𝑟𝑢𝑠𝑡𝑖𝑜𝑛𝑠 Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 19/52
  • 27. Demo1: test1, structural view 𝑡𝑖𝑚𝑒 = 1.15 𝑛𝑠 𝑜𝑝 𝐶𝑃 𝐼 =∼ 2.5 ∼ 3 𝑐𝑦𝑐𝑙𝑒𝑠 𝑜𝑝 𝑢𝑛𝑟𝑜𝑙𝑙𝑒𝑑 𝑏𝑦 16 19 𝑖𝑛𝑠𝑡𝑟𝑢𝑠𝑡𝑖𝑜𝑛𝑠 Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 20/52
  • 28. Demo1: manualUnroll Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 21/52
  • 29. Demo1: manualUnroll, structural view 𝑡𝑖𝑚𝑒 = 0.44 𝑛𝑠 𝑜𝑝 𝐶𝑃 𝐼 =∼ 0.5 ∼ 1.14 𝑐𝑦𝑐𝑙𝑒𝑠 𝑜𝑝 𝑢𝑛𝑟𝑜𝑙𝑙𝑒𝑑 𝑏𝑦 4 * 4 37 𝑖𝑛𝑠𝑡𝑟𝑢𝑠𝑡𝑖𝑜𝑛𝑠 Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 22/52
  • 30. 𝜇arch: Dependences Performance ILP 3 of many programs is limited by natural data dependencies. 3Instruction Level Parallelism Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 23/52
  • 31. 𝜇arch: Dependences Performance ILP 3 of many programs is limited by natural data dependencies. What to do? Break Dependency Chains! 3Instruction Level Parallelism Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 23/52
  • 32. Demo1(cont.): breaking chains in a “right” way Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 24/52
  • 33. Demo1(cont.): breaking chains in a “right” way ... for (int i = 0; i < A.length; i++) { sum += A[i]; } return sum; Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 24/52
  • 34. Demo1(cont.): breaking chains in a “right” way ... for (int i = 0; i < A.length; i++) { sum += A[i]; } return sum; ... for (int i = 0; i < A.length; i += 2) { sum0 += A[i]; sum1 += A[i + 1]; } return sum0 + sum1; Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 24/52
  • 35. Demo1(cont.): breaking chains in a “right” way ... for (int i = 0; i < A.length; i++) { sum += A[i]; } return sum; ... for (int i = 0; i < A.length; i += 2) { sum0 += A[i]; sum1 += A[i + 1]; } return sum0 + sum1; ... for (int i = 0; i < array.length; i += 4) { sum0 += A[i]; sum1 += A[i + 1]; sum2 += A[i + 2]; sum3 += A[i + 3]; } return (sum0 + sum1) + (sum2 + sum3); Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 24/52
  • 36. Demo1(cont.): double sum final results Haswell AMD ARM manualUnroll 0.44 0.45 3.30 test1 1.15 1.50 6.60 test2 0.58 0.80 4.25 test4 0.39 0.43 4.25 test8 0.39 0.25 2.55 time, ns/op Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 25/52
  • 37. Demo2: results Haswell AMD ARM DoubleMul.test1 2.84 2.52 8.17 DoubleMul.test2 2.50 2.37 4.25 DoubleMul.test4 0.48 0.49 3.15 DoubleMul.test8 0.25 0.30 2.53 IntMul.test1 1.14 1.16 10.04 IntMul.test2 0.58 0.75 7.38 IntMul.test4 0.38 0.67 4.64 IntSum.test1 0.39 0.32 8.92 IntSum.test2 0.24 0.48 6.12 time, ns/op Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 26/52
  • 38. Branches: to jump or not to jump public int absSumBranch(int a[]) { int sum = 0; for (int x : a) { if (x < 0) { sum -= x; } else { sum += x; } } return sum; } loop: mov 0xc(%ecx ,%ebp ,4),%ebx test %ebx ,%ebx jl L1 add %ebx ,%eax jmp L2 L1: sub %ebx ,%eax L2: inc %ebp cmp %edx ,%ebp jl loop Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 27/52
  • 39. Branches: to jump or not to jump public int absSumPredicated(int a[]) { int sum = 0; for (int x : a) { sum += Math.abs(x); } return sum; } loop: mov 0xc(%ecx ,%ebp ,4),%ebx mov %ebx ,%esi neg %esi test %ebx ,%ebx cmovl %esi ,%ebx add %ebx ,%eax inc %ebp cmp %edx ,%ebp jl Loop Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 28/52
  • 40. Demo3: results Regular Pattern = (+, –)* Nehalem Haswell AMD ARM branch_sorted 0.9 0.5 1.0 5.0 branch_regular 0.9 0.5 0.8 5.0 branch_shuffled 6.4 1.0 2.8 9.4 predicated_sorted 1.3 0.8 0.9 5.6 predicated_regular 1.3 0.8 0.9 5.3 predicated_shuffled 1.3 0.8 0.9 9.3 time, ns/op Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 29/52
  • 41. Demo3: results Regular Pattern = (+, +, –, +, –, –, +, –, –, +)* Nehalem Haswell AMD ARM branch_sorted 0.9 0.5 1.0 5.0 branch_regular 1.6 0.9 1.0 5.0 branch_shuffled 6.4 1.0 2.3 9.5 predicated_sorted 1.3 0.8 0.9 5.6 predicated_regular 1.3 0.8 0.9 5.3 predicated_shuffled 1.3 0.8 0.9 9.3 time, ns/op Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 30/52
  • 42. Demo4: && vs & public int countConditional(boolean [] f0 , boolean [] f1) { int cnt = 0; for (int j = 0; j < SIZE; j++) { for (int i = 0; i < SIZE; i++) { if (f0[i] && f1[j]) { cnt ++; } } } return cnt; } && shuffled 1.8 ns/op sorted 0.6 ns/op Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 31/52
  • 43. Demo4: && vs & public int countLogical(boolean [] f0 , boolean [] f1) { int cnt = 0; for (int j = 0; j < SIZE; j++) { for (int i = 0; i < SIZE; i++) { if (f0[i] & f1[j]) { cnt ++; } } } return cnt; } && shuffled 1.8 ns/op sorted 0.6 ns/op & shuffled 1.2 ns/op sorted 1.2 ns/op Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 32/52
  • 44. Demo5: interface invocation cost public interface I { public int amount (); } ... public class C0 implements I { public int amount (){ return 0; } } public class C1 implements I { public int amount (){ return 1; } } public class C2 implements I { public int amount (){ return 2; } } public class C3 implements I { public int amount (){ return 3; } } ... @Benchmark @BenchmarkMode(Mode.AverageTime) @OperationsPerInvocation(SIZE) public int sum(I[] a) { int s = 0; for (I i : a) { s += i.amount (); } return s; } Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 33/52
  • 45. Demo5: results 1 target 2 targets 3 targets 4 targets sorted 0.8 0.8 4.9 5.0 regular 0.8 4.9 5.0 shuffled 1.0 17.5 19.1 time, ns/op Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 34/52
  • 46. Not a Real Core Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 35/52
  • 47. Not a Real Core: HW Multithreading ∙ Simultaneous multithreading, SMT e.g. Intel® Hyper-Threading Technology ∙ Fine-grained temporal multithreading e.g. CMT, Sun/Oracle ULTRASparc T1, T2, T3, T4, T5 ... Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 36/52
  • 48. Back to Demo1: Execution Units Saturation 1 thread 2 threads 2 threads 4 threads -cpu 1,3 -cpu 2,3 DoubleSum.test1 426 850 840 1660 DoubleSum.test2 845 1690 1260 2500 DoubleSum.test4 1260 2513 1260 2520 DoubleSum.manualUnroll 1120 2240 1260 2504 overall throughput, ops/ms Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 37/52
  • 49. Back to Demo1: Execution Units Saturation 1 thread 2 threads 2 threads 4 threads -cpu 1,3 -cpu 2,3 DoubleSum.test1 426 850 840 1660 DoubleSum.test2 845 1690 1260 2500 DoubleSum.test4 1260 2513 1260 2520 DoubleSum.manualUnroll 1120 2240 1260 2504 overall throughput, ops/ms Max single core throughput Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 37/52
  • 50. Back to Demo1: Execution Units Saturation 1 thread 2 threads 2 threads 4 threads -cpu 1,3 -cpu 2,3 DoubleSum.test1 426 850 840 1660 DoubleSum.test2 845 1690 1260 2500 DoubleSum.test4 1260 2513 1260 2520 DoubleSum.manualUnroll 1120 2240 1260 2504 overall throughput, ops/ms Max system throughput Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 37/52
  • 51. Demo6: Map.get() private Map <Integer , Integer > jdk_map; private int[] keys; @Benchmark public int testJdkPrimitive () { int s = 0; for (int key : keys) { s += jdk_map.get(key); } return s; } Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 38/52
  • 52. Demo6: Map.get() private Map <Integer , Integer > jdk_map; private Integer [] boxedKeys; @Benchmark public int testJdkBoxed () { int s = 0; for (Integer key : boxedKeys) { s += jdk_map.get(key); } return s; } Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 39/52
  • 53. Demo6: Map.get() private TIntIntMap third_party_map; private int[] keys; @Benchmark public int test3dParty () { int s = 0; for (int key : keys) { s += third_party_map.get(key); } return s; } Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 40/52
  • 54. Demo6: Map.get() results -cpu 1 JdkPrimitive JdkBoxed 3dParty throughput, ops/ms Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
  • 55. Demo6: Map.get() results -cpu 1 JdkPrimitive 47 JdkBoxed 3dParty throughput, ops/ms Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
  • 56. Demo6: Map.get() results -cpu 1 JdkPrimitive 47 JdkBoxed 71 3dParty throughput, ops/ms Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
  • 57. Demo6: Map.get() results -cpu 1 JdkPrimitive 47 JdkBoxed 71 3dParty 74 throughput, ops/ms Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
  • 58. Demo6: Map.get() results -cpu 1 -cpu 2,3 JdkPrimitive 47 JdkBoxed 71 3dParty 74 throughput, ops/ms Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
  • 59. Demo6: Map.get() results -cpu 1 -cpu 2,3 JdkPrimitive 47 (25, 25) JdkBoxed 71 3dParty 74 throughput, ops/ms Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
  • 60. Demo6: Map.get() results -cpu 1 -cpu 2,3 JdkPrimitive 47 (25, 25) JdkBoxed 71 (40, 40) 3dParty 74 throughput, ops/ms Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
  • 61. Demo6: Map.get() results -cpu 1 -cpu 2,3 JdkPrimitive 47 (25, 25) JdkBoxed 71 (40, 40) 3dParty 74 (43, 43) throughput, ops/ms Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
  • 62. Demo6: Map.get() results -cpu 1 -cpu 2,3 -cpu 2 (?) on -cpu 3 JdkPrimitive 47 (25, 25) JdkBoxed 71 (40, 40) 3dParty 74 (43, 43) throughput, ops/ms Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
  • 63. Demo6: Map.get() results -cpu 1 -cpu 2,3 -cpu 2 (?) on -cpu 3 JdkPrimitive 47 (25, 25) (30, ?) JdkBoxed 71 (40, 40) 3dParty 74 (43, 43) throughput, ops/ms Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
  • 64. Demo6: Map.get() results -cpu 1 -cpu 2,3 -cpu 2 (?) on -cpu 3 JdkPrimitive 47 (25, 25) (30, ?) JdkBoxed 71 (40, 40) (50, ?) 3dParty 74 (43, 43) throughput, ops/ms Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
  • 65. Demo6: Map.get() results -cpu 1 -cpu 2,3 -cpu 2 (?) on -cpu 3 JdkPrimitive 47 (25, 25) (30, ?) JdkBoxed 71 (40, 40) (50, ?) 3dParty 74 (43, 43) (16, ?) throughput, ops/ms Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 41/52
  • 66. Demo6: Hyper.troll() public static double d0; public static double d1; public static double d2; @Benchmark @OperationsPerInvocation (5) public double troll () { return (d0 / d2) / ((d1 / d2) / (d0 / d1)); } Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 42/52
  • 67. Demo6: division results on Haswell 1 thread int 250 double 180 throughput, ops/ 𝜇s -cpu 1,3 -cpu 2,3 -cpu 3 (int, int) (250, 250) (125, 125) (125, 125) (double, double) (180, 180) (90, 90) (90, 90) (double, int) (180, 250) (150, 57) (90, 125) throughput, ops/ 𝜇s Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 43/52
  • 68. Demo6: division results on AMD 1 thread int 128 double 300 throughput, ops/ 𝜇s -cpu 0,1 -cpu 0,2 -cpu 0,8 -cpu 0 (int, int) (92, 92) (128, 128) (128, 128) (64, 64) (double, double) (150, 150) (300, 300) (300, 300) (150, 150) (double, int) (280, 120) (290, 128) (300, 128) (120, 64) throughput, ops/ 𝜇s Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 44/52
  • 69. Conclusion Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 45/52
  • 70. Enlarge your knowledge with these simple tricks! Reading list: ∙ “Computer Architecture: A Quantitative Approach” John L. Hennessy, David A. Patterson ∙ CPU vendors documentation ∙ http://www.agner.org/optimize/ ∙ etc. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 46/52
  • 71. Thanks! Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 47/52
  • 72. Q & A ? Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 48/52
  • 73. Appendix Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 49/52
  • 74. Appendix: Frequency Variance ∙ Dynamic CPU Frequency – TurboBoost and similar "The processor must be working in the power, temperature, and specification limits of the thermal design power (TDP)."©Intel Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 50/52
  • 75. Appendix: TurboBoost in action max normal freq. measured freq. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 51/52
  • 76. Appendix: Set Fixed Frequency! e.g. cpufreq-set -u 2600000 -g performance Copyright © 2014, Oracle and/or its affiliates. All rights reserved. 52/52