Knowing how to set up good benchmarks is invaluable for understanding the performance of a system. Writing correct and useful benchmarks is hard, and verifying the results is difficult and error-prone. When done right, benchmarks guide teams to improve the performance of their systems. When done wrong, hours of effort may result in a worse-performing application, upset customers, or worse! In this talk, we will discuss what you need to know to write better benchmarks. We will look at examples of bad benchmarks and learn which biases can invalidate the measurements, in the hope of correctly applying our new-found skills and avoiding such pitfalls in the future.
4. What's a Benchmark
• How fast?
• Your process vs Goal
• Your process vs Best Practices
5. Today
• How Not to Write Benchmarks
• Benchmark Setup & Results:
  - Wrong about the machine
  - Wrong about stats
  - Wrong about what matters
• Becoming Less Wrong
18. Website Serving Images
• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev machine
[Diagram: Web Request → Server → Cache → S3]
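Spelled out as code, this setup looks innocent, yet every flaw the next slides call out is already in it. Here is a minimal Java sketch of the naive benchmark as described; the URL, the fetchImage() helper, and the class name are illustrative stand-ins, not from the talk:

```java
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// The naive benchmark exactly as described: one image, 1000 accesses,
// timing starts immediately, three runs, report the mean.
public class NaiveImageBenchmark {
    static final String IMAGE_URL = "http://localhost:8080/images/cat.jpg"; // stand-in

    public static void main(String[] args) throws Exception {
        List<Double> runMeans = new ArrayList<>();
        for (int run = 0; run < 3; run++) {
            long totalNanos = 0;
            for (int i = 0; i < 1000; i++) {
                long start = System.nanoTime();   // no warmup: the first
                fetchImage(IMAGE_URL);            // iterations hit cold caches
                totalNanos += System.nanoTime() - start;
            }
            runMeans.add(totalNanos / 1000.0 / 1_000_000.0); // mean ms per access
        }
        System.out.println("Mean latency per run (ms): " + runMeans);
    }

    // Reads and discards the response body.
    static void fetchImage(String url) throws Exception {
        try (InputStream in = new URL(url).openStream()) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) { /* drain */ }
        }
    }
}
```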
19–25. Wrong About the Machine
• Cache, cache, cache, cache!
• Warmup & Timing (see the sketch below)
• Periodic interference
• Different specs in test vs prod machines
• Power mode changes
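A common first fix for the warmup problem: run unmeasured iterations until caches, connection pools, and the JIT reach steady state, and keep every raw sample instead of a single aggregate so periodic interference remains visible. A sketch, reusing the hypothetical fetchImage() from above:

```java
// Added alongside fetchImage() in the class above. Warmup iterations run
// untimed so caches, connection pools, and the JIT reach steady state;
// raw samples are kept so periodic interference stays visible in analysis.
static long[] measure(String url, int warmupIters, int samples) throws Exception {
    for (int i = 0; i < warmupIters; i++) {
        fetchImage(url);                      // not timed
    }
    long[] nanos = new long[samples];
    for (int i = 0; i < samples; i++) {
        long start = System.nanoTime();
        fetchImage(url);
        nanos[i] = System.nanoTime() - start; // keep every sample, not just a mean
    }
    return nanos;
}
```

Note that warmup alone does not fix the "cache, cache, cache" problem: after the first request, one hot image exercises only the cache path and never reaches S3.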
28. Wrong About Stats
[Chart: Convergence of Median on Samples; latency over time for stable vs. decaying samples, each plotted with its running median]
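The chart's message is that a single mean can hide drift and outliers entirely. Reporting the median and tail percentiles over the raw samples is one less-wrong summary; a sketch using nearest-rank percentiles, with no library assumed:

```java
import java.util.Arrays;

// Sketch: summarize raw latency samples with the median and tail
// percentiles instead of a single mean, which drift and outliers distort.
public final class LatencyStats {
    public static void report(long[] nanos) {
        long[] sorted = nanos.clone();
        Arrays.sort(sorted);
        System.out.printf("p50=%.2f ms  p99=%.2f ms  max=%.2f ms%n",
                percentile(sorted, 50) / 1e6,
                percentile(sorted, 99) / 1e6,
                sorted[sorted.length - 1] / 1e6);
    }

    // Nearest-rank percentile over a sorted array.
    static long percentile(long[] sorted, int p) {
        int idx = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, idx)];
    }
}
```

Also plot the samples over time, as the chart does, to see whether the statistic has converged or is still drifting. For production-grade latency recording, Gil Tene's HdrHistogram library implements this idea efficiently.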
37. "Programmers waste enormous amounts of time thinking about … the speed of noncritical parts of their programs … Forget about small efficiencies … 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."
– Donald Knuth
38–39. Wrong About What Matters
• Premature optimization
• Unrepresentative workloads (see the sketch below)
• Memory pressure
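Accessing one image 1000 times is a textbook unrepresentative workload: after the first request it exercises only the cache. A sketch of a more representative driver, assuming a catalog of many image URLs and a skewed access pattern; both the catalog and the skew are illustrative assumptions, and the real distribution should come from production access logs:

```java
import java.util.Random;

// Sketch: drive the benchmark with a skewed access pattern over many
// images instead of one hot image, so cache hits and misses both occur
// in realistic proportions. Catalog size and skew are assumptions.
public final class WorkloadDriver {
    public static void main(String[] args) throws Exception {
        String[] catalog = new String[10_000];
        for (int i = 0; i < catalog.length; i++) {
            catalog[i] = "http://localhost:8080/images/" + i + ".jpg"; // stand-in
        }
        Random rnd = new Random(42);
        for (int i = 0; i < 1000; i++) {
            // Squaring a uniform draw skews access toward low indices,
            // a crude approximation of Zipf-distributed popularity.
            double u = rnd.nextDouble();
            int idx = (int) (u * u * catalog.length);
            NaiveImageBenchmark.fetchImage(catalog[idx]);
        }
    }
}
```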
48–49. Microbenchmarking: Blessing & Curse
• Choose your N wisely
• Measure side effects
• Beware of clock resolution
• Dead Code Elimination (see the sketch below)
• Constant work per iteration
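These pitfalls are easiest to see in a hand-rolled microbenchmark. Below is a sketch of two of the defenses in plain Java: consuming results so dead code elimination cannot remove the measured work, and batching N iterations so clock resolution does not dominate. In practice, prefer a harness such as JMH (see Aleksey Shipilëv's articles in the follow-up material), which handles these and more:

```java
// Sketch of defending a hand-rolled microbenchmark against two pitfalls:
// dead code elimination (the JIT may delete a computation whose result is
// never used) and clock resolution (time a batch of N iterations, not one).
public final class MicrobenchSketch {
    static volatile long sink; // consuming results here defeats dead code elimination

    public static void main(String[] args) {
        final int n = 1_000_000;  // choose N so a batch runs much longer than clock granularity
        long acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            acc += compute(i);    // constant work per iteration
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;               // publish the result so the loop cannot be eliminated
        System.out.printf("%.1f ns/op%n", (double) elapsed / n);
    }

    static long compute(int i) {
        return (long) i * i;      // placeholder for the code under test
    }
}
```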
51. Follow-up Material
• How NOT to Measure Latency by Gil Tene
  – http://www.infoq.com/presentations/latency-pitfalls
• Taming the Long Latency Tail on highscalability.com
  – http://highscalability.com/blog/2012/3/12/google-taming-the-long-latency-tail-when-more-machines-equal.html
• Performance Analysis Methodology by Brendan Gregg
  – http://www.brendangregg.com/methodology.html
• Robust Java benchmarking by Brent Boyer
  – http://www.ibm.com/developerworks/library/j-benchmark1/
  – http://www.ibm.com/developerworks/library/j-benchmark2/
• Benchmarking articles by Aleksey Shipilëv
  – http://shipilev.net/#benchmarking