SlideShare une entreprise Scribd logo
1  sur  48
Télécharger pour lire hors ligne
A Lock-Free Hash Table

The Art of (Java)
Benchmarking
Dr. Cliff Click
Chief JVM Architect & Distinguished Engineer
blogs.azulsystems.com/cliff
Azul Systems
June 14, 2010
Benchmarking is Easy!!!
www.azulsystems.com

•
•
•
•

And Fun!!!
“My Java is faster than your C!!!”
And generally wrong...
Without exception every microbenchmark I've seen has
had serious flaws
─ Except those I've had a hand in correcting

• Serious =
─ “Score” is unrelated to intended measurement or
─ error bars exceed measured values
Split out micro-bench vs macro-bench
www.azulsystems.com

• Micro-benchmarks are things you write yourself
─ Attempt to discover some narrow targeted fact
─ Generally a timed tight loop around some “work”
─ Report score as iterations/sec
─ e.g. allocations/sec – object pooling vs GC

• Macro-benchmarks are supposed to be realistic
─ Larger, longer running
─ e.g. WebServer, DB caching/front-end, Portal App
─ SpecJBB, SpecJAppServer, XMLMark, Trade6
─ Load testing of Your Real App

|

©2007 Azul Systems, Inc.
Some Older Busted Micro-Benchmarks
www.azulsystems.com

• CaffeineMark “logic”
─ trivially made dead by JIT; infinite speedup

• SciMark2 Monte-Carlo
─ 80% of time in sync'd Random.next
─ Several focused tests dead; infinite speedup

• SpecJVM98 _209_db – purports to be a DB test
─ Really: 85% of time in String shell-sort

• SpecJVM98 _227_mtrt
─ Runtime is much less than 1 sec

|

©2007 Azul Systems, Inc.
Some Older Busted Micro-Benchmarks
www.azulsystems.com

• CaffeineMark “logic”
─ trivially made dead by JIT; infinite speedup

• SciMark2 Monte-Carlo
─ 80% of time in sync'd Random.next
─ Several focused tests dead; infinite speedup

• SpecJVM98 _209_db – purports to be a DB test
─ Really: 85% of time in String shell-sort

• SpecJVM98 _227_mtrt
─ Runtime is much less than 1 sec

|

©2007 Azul Systems, Inc.
Dead Loops
www.azulsystems.com

• Timeline:

// how fast is divide-by-10?
long start = Sys.CTM();
for( int i=0; i<N; i++ )
int x = i/10;
return N*1000/(Sys.CTM()-start);

─ 1- Interpret a while, assume 10ms

─ 2- JIT; “x” not used, loop is dead, removed, 10ms
─ 3- “Instantly” execute rest of loop

• Time to run: 20ms – Independent of N!
─ Vary N ==> vary score ==> “Dial-o-Score!”

|

©2007 Azul Systems, Inc.
Dead Loops
www.azulsystems.com

• Sometimes JIT proves “results not needed”
─ Then throws out whole work-loop
─ After running long enough to JIT
─ So loop runs at least a little while first

• Score “ops/sec” not related to trip count 'N'
─ Larger N ==> larger score

• Score can be infinite- or NaN
─ Generally reported as a very large, but valid #
─ And mixed in with other numbers, confusing things
─ (e.g. geomean of infinite's and other more real numbers)

|

©2007 Azul Systems, Inc.
SciMark2 Monte-Carlo
www.azulsystems.com

• 80% of time in synchronize'd Random.next
• 3-letter-company “spammed” it by replacing with intrinsic
doing a CompareAndSwap (CAS)

•
•
•
•
•

|

I was ordered to follow suit (match performance)
Doug Lea said “wait: just make a CAS from Java”
Hence sun.misc.AtomicLong was born
Rapidly replaced by Unsafe.compareAndSwap...
...and eventually java.lang.Atomic*

©2007 Azul Systems, Inc.
Micro-bench Advice: Warmup
www.azulsystems.com

• Code starts interpreted, then JIT'd
─ JIT'd code is 10x faster than interpreter

• JIT'ing happens “after a while”
─ HotSpot -server: 10,000 iterations
─ Plus compile time

• Warmup code with some trial runs
─ Keeping testing until run-times stabilize

|

©2007 Azul Systems, Inc.
Micro-bench Advice: Warmup
www.azulsystems.com

• Not allowing warmup is a common mistake
• Popular failure-mode of C-vs-Java comparisons
─ Found on many, many, many web pages
─ Entire benchmark runs in few milli-seconds
─ There are domains requiring milli-second reboots...

• But most desktop/server apps expect:
─ Reboots are minutes long and days apart
─ Steady-state throughput after warmup is key
─ So a benchmark that ends in <10sec

probably does not measure anything interesting

|

©2007 Azul Systems, Inc.
Micro-bench Advice: “Compile plan”
www.azulsystems.com

• JIT makes inlining & other complex decisions
─ Based on very volatile & random data
─ Inline decisions vary from run-to-run

• Performance varies from run-to-run
─ Stable numbers within a single JVM invocation
─ But could vary by >20% with new JVM launch
─ Bigger apps are more performance-stable

• Micro-benchmarks tend to be “fragile” here
─ e.g. 1 JVM launch in 5 will be 20% slower*

*”S tatistically R igorous Java P erformance E valuation” OOPSLA 2007
|

©2007 Azul Systems, Inc.
Micro-bench Advice: “Compile plan”
www.azulsystems.com

A:
loop1:
…
loop2:
…
call C
…
jne loop2
…
jne loop1
return
C:
…
return
|

©2007 Azul Systems, Inc.

public int a() { for(...) b(); }
public int b() { for(...) c(); }
public int c() { …work... }

20% chance
A inlines B
B calls C

80% chance
A calls B
B inlines C

A:
loop1:
…
call B
jne loop1
return
B:
loop2:
...
jne loop2
…
return
Micro-bench Advice: “Compile plan”
www.azulsystems.com

• Launch the JVM many times
─ Toss 1st launch to remove OS caching effects
─ Average out “good” runs with the “bad”
─ Don't otherwise toss outliers
─ (unless you have good reason: i.e. unrelated load)

• Enough times to get statistically relevant results
─ Might require 30+ runs

• Report average and standard deviation
─ In this case, expect to see a large std.dev

*”Statistically rigorous Java performance evaluation” OOPSLA 2007
|

©2007 Azul Systems, Inc.
Micro-bench Advice: “1st fast, 2nd slow”
www.azulsystems.com

• Timing harness needs to invoke many targets
─ In a loop, repeatedly a few times
─ Else JIT sees 1 hot target in a loop
─ And then does a guarded inline
─ And then hoists the timed work outside of timing loop

class bench1 implements bench { void sqrt(int i); }
class bench2 implements bench { void sqrt(int i); }
static final int N=1000000; // million
...
static int test( bench B ) {
long start = System.currentTimeMillis();
for( int i=0; i<N; i++ )
B.sqrt(i);
// hot loop v-call
return N*1000/(System.currentTimeMillis()-start);
|

©2007 Azul Systems, Inc.
Micro-bench Advice: “1st fast, 2nd slow”
www.azulsystems.com

Pass in one of two
different classes

class bench1 implements bench { void sqrt(int i); }
class bench2 implements bench { void sqrt(int i); }
static final int N=1000000; // million
...
static int test( bench B ) {
long start = System.currentTimeMillis();
for( int i=0; i<N; i++ )
B.sqrt(i);
// hot loop v-call
return N*1000/(System.currentTimeMillis()-start);
|

©2007 Azul Systems, Inc.
Micro-bench Advice: v-call optimization
www.azulsystems.com

• First call: test(new bench1)

long start = Sys.CTM();
for( int i=0; i<N; i++ )
B.sqrt(i);
return N*1000/(Sys.CTM()-start);

|

©2007 Azul Systems, Inc.
Micro-bench Advice: v-call optimization
www.azulsystems.com

• First call: test(new bench1)
─ Single target callsite; JIT does guarded inlining
─ Inlines bench1.sqrt

long start = Sys.CTM();
for( int i=0; i<N; i++ )
B.sqrt(i);
return N*1000/(Sys.CTM()-start);

|

long start = Sys.CTM();
guarded
for( int i=0; i<N; i++ )
inline
Math.sqrt(i); // inline bench1.sqrt
return N*1000/(Sys.CTM()-start);

©2007 Azul Systems, Inc.
Micro-bench Advice: v-call optimization
www.azulsystems.com

• First call: test(new bench1)
─ Single target callsite; JIT does guarded inlining
─ Inlines bench1.sqrt
─ Hoists loop-invariants, dead-code-remove, etc
─ Execution time does NOT depend on N!!!
─ Dreaded “Dial-o-Score!”

long start = Sys.CTM();
long start = Sys.CTM();
// JIT'd loop removed
for( int i=0; i<N; i++ )
return N*1000/(Sys.CTM()-start);
B.sqrt(i);
return N*1000/(Sys.CTM()-start);

|

long start = Sys.CTM();
guarded
for( int i=0; i<N; i++ )
dead code
inline
Math.sqrt(i); // inline bench1.sqrt
return N*1000/(Sys.CTM()-start);

©2007 Azul Systems, Inc.
Micro-bench Advice: v-call optimization
www.azulsystems.com

• Second call: test(new bench2)
─ 2nd target of call; guarded inlining fails
─ Code is incorrect; must be re-JIT'd
─ Measures overhead of N calls to bench2.sqrt
─ Plus guard failure, deoptimization
─ Plus JIT'ing new version of test()
─ Plus virtual call overhead

long start = System.CTM();
for( int i=0; i<N; i++ )
int bench2.sqrt( int i ) {
B.sqrt(i);
// alternate impl
return N*1000/(System.CTM()-start);
}
|

©2007 Azul Systems, Inc.
Micro-bench Advice: v-call optimization
www.azulsystems.com

• Reversing order of calls reverses “good” & “bad”
─ e.g. “test(new

|

©2007 Azul Systems, Inc.

bench2); test(new bench1);”
Micro-bench Advice: v-call optimization
www.azulsystems.com

• Reversing order of calls reverses “good” & “bad”
─ e.g. “test(new

bench2); test(new bench1);”

• Timing harness needs to invoke all targets
─ In a loop, repeatedly a few times

class bench1 implements bench { void sqrt(int i); }
class bench2 implements bench { void sqrt(int i); }
...
// warmup loop
for( int i=0; i<10; i++ ) {
test( new bench1 );
test( new bench2 );
}
// now try timing
printf(test(new bench1));
|

©2007 Azul Systems, Inc.
Micro-bench Advice: v-call optimization
www.azulsystems.com

• Reversing order of calls reverses “good” & “bad”
─ e.g. “test(new

bench2); test(new bench1);”

• Timing harness needs to invoke all targets
─ In a loop, repeatedly a few times

class bench1 implements bench { void sqrt(int i); }
class bench2 implements bench { void sqrt(int i); }
...
// warmup loop
for( int i=0; i<10; i++ ) {
test( new bench1 );
test( new bench2 );
}
// now try timing
printf(test(new bench1));
|

©2007 Azul Systems, Inc.
Micro-bench Advice: GC
www.azulsystems.com

• Avoid GC or embrace it
• Either no (or trivial) allocation, or use verbose:gc to
make sure you hit steady-state

• Statistics: not just average, but also std-dev
• Look for trends
─ Could be creeping GC behavior

• Could be “leaks” causing more-work-per-run
─ e.g. leaky HashTable growing heap or
─ Growing a LinkedList slows down searches

|

©2007 Azul Systems, Inc.
Micro-bench Advice: Synchronization
www.azulsystems.com

• Account for multi-threaded & locking
• I do see people testing, e.g. locking costs on singlethreaded programs

• Never contended lock is very cheap
─ +BiasedLocking makes it even cheaper

• Very slightly contended lock is probably 4x more
• Real contention: Amdahl's Law
─ Plus lots and lots of OS overhead

• java.util.concurrent is your friend

|

©2007 Azul Systems, Inc.
Micro-bench Advice
www.azulsystems.com

• Realistic runtimes
─ Unless you need sub-milli-sec reboots

•
•
•
•
•

|

Warm-up loops - give the JIT a chance
Statistics: plan for variation in results
Dead loops – look for “Dial-o-Score!”, deal with it
1st run fast, 2nd run slow – look for it, deal with it
GC: avoid or embrace

©2007 Azul Systems, Inc.
Macro-bench warnings
www.azulsystems.com

• JVM98 is too small anymore
─ Easy target; cache-resident; GC ignored

• JBB2000, 2005
─ Not much harder target
─ VERY popular, easy enough to “spam”
─ Score rarely related to anything real

• SpecJAppServer, DaCapo, SpecJVM2008, XMLMark
─ Bigger, harder to spam, less popular

|

©2007 Azul Systems, Inc.
Macro-bench warnings
www.azulsystems.com

• Popular ones are targeted by companies
• General idea: JVM engineers are honest
─ But want the best for company
─ So do targeted optimizations
─ e.g. intrinsic CAS for Random.next
─ Probably useful to somebody
─ Never incorrect
─ Definitely helps this benchmark

|

©2007 Azul Systems, Inc.
Typical Performance Tuning Cycle
www.azulsystems.com

•
•
•
•

Benchmark X becomes popular
Management tells Engineer: “Improve X's score!”
Engineer does an in-depth study of X
Decides optimization “Y” will help
─ And Y is not broken for anybody
─ Possibly helps some other program

• Implements & ships a JVM with “Y”
• Management announces score of “X” is now 2*X
• Users yawn in disbelief: “Y” does not help them

|

©2007 Azul Systems, Inc.
SpecJBB2000
www.azulsystems.com

• Embarrassing parallel - no contended locking
• No I/O, no database, no old-gen GC
─ NOT typically of any middle-ware
─ Very high allocation rate of young-gen objects, definitely not

typically
─ But maybe your program gets close?

• Key to performance:

having enough Heap to avoid old-gen GC
during 4-min timed window

|

©2007 Azul Systems, Inc.
SpecJBB2000: Spamming
www.azulsystems.com

• Drove TONS of specialized GC behaviors & flags
─ Across many vendors
─ Many rolled into “-XX:+AggressiveOptimizations”
─ Goal: no old-gen GC in 4 minutes

• 3-letter-company “spammed” - with a 64-bit VM and 12Gig
heap (in an era of 3.5G max heaps)

─ Much more allocation, hence “score” before GC
─ Note that while huge heaps are generically useful to somebody,

12Gig was not typical of the time
─ Forced Sun to make a 64-bit VM

|

©2007 Azul Systems, Inc.
SpecJBB2000: Spamming
www.azulsystems.com

• Drove TONS of specialized GC behaviors & flags
─ Across many vendors
─ Many rolled into “-XX:+AggressiveOptimizations”
─ Goal: no old-gen GC in 4 minutes

• 3-letter-company “spammed” - with a 64-bit VM and 12Gig
heap (in an era of 3.5G max heaps)

─ Much more allocation, hence “score” before GC
─ Note that while huge heaps are generically useful to somebody,

12Gig was not typical of the time
─ Forced Sun to make a 64-bit VM

|

©2007 Azul Systems, Inc.
What can you read from the results?
www.azulsystems.com

• The closer your apps resemble benchmark “X”
─ The closer improvements to X's score impact you

• Huge improvements to unrelated benchmarks
─ Might be worthless to you

• e.g. SpecJBB2000 is a perfect-young-gen GC test
─ Improvements to JBB score have been tied to better young-gen

behavior
─ Most web-servers suffer from OLD-gen GC issues
─ Improving young-gen didn't help web-servers much

|

©2007 Azul Systems, Inc.
SpecJBB2005
www.azulsystems.com

• Intended to fix JBB2000's GC issues
─ No explicit GC between timed windows
─ Penalize score if GC pause is too much

(XTNs are delayed too long)
─ Same as JBB2000, but more XML
─ Needs some Java6-isms optimized

• Still embarrassing parallel – young-gen GC test
• Azul ran up to 1700 warehouse/threads
on a 350Gig heap, allocating 20Gigabytes/sec
for 3.5 days and STILL no old-gen GC

|

©2007 Azul Systems, Inc.
SpecJBB2005
www.azulsystems.com

• Intended to fix JBB2000's GC issues
─ No explicit GC between timed windows
─ Penalize score if GC pause is too much

(XTNs are delayed too long)
─ Same as JBB2000, but more XML
─ Needs some Java6-isms optimized

• Still embarrassing parallel – young-gen GC test
• Azul ran up to 1700 warehouse/threads
on a 350Gig heap, allocating 20Gigabytes/sec
for 3.5 days and STILL no old-gen GC

|

©2007 Azul Systems, Inc.
Some Popular Macro-Benchmarks
www.azulsystems.com

• SpecJVM98 – too small, no I/O, no GC
─ 227_mtrt – too short to say anything
─ Escape Analysis pays off too well here
─ 209_db – string-sort NOT db, performance tied to TLB & cache

structure, not JVM
─ 222_mpegaudio – subject to odd FP optimizations
─ 228_jack – throws heavy exceptions – but so do many appservers; also parsers are popular. Improvements here might carry
over
─ 213_javac – generically useful metric for modest CPU bound
applications

|

©2007 Azul Systems, Inc.
Some Popular Macro-Benchmarks
www.azulsystems.com

• SpecJVM98 – too small, no I/O, no GC
─ 227_mtrt – too short to say anything
─ Escape Analysis pays off too well here
─ 209_db – string-sort NOT db, performance tied to TLB & cache

structure, not JVM
─ 222_mpegaudio – subject to odd FP optimizations
─ 228_jack – throws heavy exceptions – but so do many appservers; also parsers are popular. Improvements here might carry
over
─ 213_javac – generically useful metric for modest CPU bound
applications

|

©2007 Azul Systems, Inc.
SpecJAppServer
www.azulsystems.com

•
•
•
•

Very hard to setup & run
Very network, I/O & DB intensive

Need a decent (not great) JVM (e.g. GC is < 5%)
But peak score depends on an
uber-DB and fast disk or network
• Not so heavily optimized by JVM Engineers
• Lots of “flex” in setup rules (DB & network config)

• So hard to read the results unless your external (non-JVM)
setup is similar

|

©2007 Azul Systems, Inc.
SpecJAppServer
www.azulsystems.com

•
•
•
•

Very hard to setup & run
Very network, I/O & DB intensive

Need a decent (not great) JVM (e.g. GC is < 5%)
But peak score depends on an
uber-DB and fast disk or network
• Not so heavily optimized by JVM Engineers
• Lots of “flex” in setup rules (DB & network config)

• So hard to read the results unless your external (non-JVM)
setup is similar

|

©2007 Azul Systems, Inc.
DaCapo
www.azulsystems.com

•
•
•
•
•
•
•

Less popular so less optimized
Realistic of mid-sized POJO apps
NOT typical of app-servers, J2EE stuff
Expect 1000's of classes loaded & methods JIT'd
Some I/O, more typical GC behavior
Much better score reporting rules
DaCapo upgrades coming soon!
─ New version has web-servers & parallel codes

|

©2007 Azul Systems, Inc.
DaCapo
www.azulsystems.com

•
•
•
•
•
•
•

Less popular so less optimized
Realistic of mid-sized POJO apps
NOT typical of app-servers, J2EE stuff
Expect 1000's of classes loaded & methods JIT'd
Some I/O, more typical GC behavior
Much better score reporting rules
DaCapo upgrades coming soon!
─ New version has web-servers & parallel codes

|

©2007 Azul Systems, Inc.
Some Popular Macro-Benchmarks
www.azulsystems.com

• XMLMark
─ Perf varies by 10x based on XML parser & JDK version
─ Too-well-behaved young-gen allocation
─ Like DaCapo – more realistic of mid-sized POJO apps
─ Very parallel (not a contention benchmark)

unlike most app-servers

• SpecJVM2008
─ Also like DaCapo – realistically sized POJO apps
─ But also has web-servers & parallel apps
─ Newer, not so heavily targeted by Vendors

|

©2007 Azul Systems, Inc.
Some Popular Macro-Benchmarks
www.azulsystems.com

• XMLMark
─ Perf varies by 10x based on XML parser & JDK version
─ Too-well-behaved young-gen allocation
─ Like DaCapo – more realistic of mid-sized POJO apps
─ Very parallel (not a contention benchmark)

unlike most app-servers

• SpecJVM2008
─ Also like DaCapo – realistically sized POJO apps
─ But also has web-servers & parallel apps
─ Newer, not so heavily targeted by Vendors

|

©2007 Azul Systems, Inc.
“Popular” Macro-Benchmark Problems
www.azulsystems.com

• Unrealistic treatment of GC
─ e.g. None in timed window
─ Or perfect young-gen collections
─ Real apps typical trigger full GC every hour or so

• Unrealistic load generation
─ Not enough load to stress system
─ Or very simple or repetitive loads
─ Bottlenecks in getting load to server

|

©2007 Azul Systems, Inc.
“Popular” Macro-Benchmark Problems
www.azulsystems.com

• Benchmark too short for full GC
─ Many real applications leak
─ Broken 3rd party libs, legacy code, etc
─ Leaks accumulate in old-gen
─ Which makes old-gen full GC expensive
─ But benchmark never triggers old-gen full GC

• I/O & DB not benchmarked well
─ But make a huge difference in Real Life
─ Your app might share I/O & DB with others

|

©2007 Azul Systems, Inc.
Summary
www.azulsystems.com

• Macrobenchmarks
─ Targeted by JVM Engineers
─ Buyer Beware!
─ The closer the benchmark is to your problem
─ The more likely improvements will impact you
─ GC is likely to not be typical of real applications
─ Your applications ever go 3.5 days without a full GC?
─ I/O & DB load also probably not typical

|

©2007 Azul Systems, Inc.
Summary
www.azulsystems.com

• Microbenchmarks
Easy to Write, Hard to get Right
Easy to be Fooled
Won't tell you much about macro-code anyways
Warmup – 1's of seconds to 10's of seconds
Statistics – average lots of runs
─ Even out variations in the “compile plan”
─ Call out to many methods in the hot loop
─ Be wary of dead-code super-score results
─
─
─
─
─

|

©2007 Azul Systems, Inc.
www.azulsystems.com

Put Micro-Trust in a Micro-Benchmark!

|

©2007 Azul Systems, Inc.
www.azulsystems.com

Put Micro-Trust in a Micro-Benchmark!

|

©2007 Azul Systems, Inc.

Contenu connexe

Tendances

[GEG1] 10.camera-centric engine design for multithreaded rendering
[GEG1] 10.camera-centric engine design for multithreaded rendering[GEG1] 10.camera-centric engine design for multithreaded rendering
[GEG1] 10.camera-centric engine design for multithreaded rendering
종빈 오
 
Threaded Programming
Threaded ProgrammingThreaded Programming
Threaded Programming
Sri Prasanna
 
A synchronous scheduling service for distributed real-time Java
A synchronous scheduling service for distributed real-time JavaA synchronous scheduling service for distributed real-time Java
A synchronous scheduling service for distributed real-time Java
Universidad Carlos III de Madrid
 

Tendances (14)

[GEG1] 10.camera-centric engine design for multithreaded rendering
[GEG1] 10.camera-centric engine design for multithreaded rendering[GEG1] 10.camera-centric engine design for multithreaded rendering
[GEG1] 10.camera-centric engine design for multithreaded rendering
 
Optimize your game with the Profile Analyzer - Unite Copenhagen 2019
Optimize your game with the Profile Analyzer - Unite Copenhagen 2019Optimize your game with the Profile Analyzer - Unite Copenhagen 2019
Optimize your game with the Profile Analyzer - Unite Copenhagen 2019
 
Threaded Programming
Threaded ProgrammingThreaded Programming
Threaded Programming
 
A synchronous scheduling service for distributed real-time Java
A synchronous scheduling service for distributed real-time JavaA synchronous scheduling service for distributed real-time Java
A synchronous scheduling service for distributed real-time Java
 
Timelapse: interactive record/replay for the web
Timelapse: interactive record/replay for the webTimelapse: interactive record/replay for the web
Timelapse: interactive record/replay for the web
 
The Art Of Performance Tuning
The Art Of Performance TuningThe Art Of Performance Tuning
The Art Of Performance Tuning
 
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernelKernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
 
04 performance
04 performance04 performance
04 performance
 
Performance Portability Through Descriptive Parallelism
Performance Portability Through Descriptive ParallelismPerformance Portability Through Descriptive Parallelism
Performance Portability Through Descriptive Parallelism
 
Symbian OS - Active Objects
Symbian OS - Active ObjectsSymbian OS - Active Objects
Symbian OS - Active Objects
 
Performance tests with Gatling (extended)
Performance tests with Gatling (extended)Performance tests with Gatling (extended)
Performance tests with Gatling (extended)
 
Software Profiling: Java Performance, Profiling and Flamegraphs
Software Profiling: Java Performance, Profiling and FlamegraphsSoftware Profiling: Java Performance, Profiling and Flamegraphs
Software Profiling: Java Performance, Profiling and Flamegraphs
 
NvFX GTC 2013
NvFX GTC 2013NvFX GTC 2013
NvFX GTC 2013
 
Getting started with Burst – Unite Copenhagen 2019
Getting started with Burst – Unite Copenhagen 2019Getting started with Burst – Unite Copenhagen 2019
Getting started with Burst – Unite Copenhagen 2019
 

Similaire à The Art of Java Benchmarking

QA Fest 2019. Антон Молдован. Load testing which you always wanted
QA Fest 2019. Антон Молдован. Load testing which you always wantedQA Fest 2019. Антон Молдован. Load testing which you always wanted
QA Fest 2019. Антон Молдован. Load testing which you always wanted
QAFest
 
谷歌 Scott-lessons learned in testability
谷歌 Scott-lessons learned in testability谷歌 Scott-lessons learned in testability
谷歌 Scott-lessons learned in testability
drewz lin
 
HW Emulators: Does it Belong in your Verification Tool Chest?
HW Emulators: Does it Belong in your Verification Tool Chest?HW Emulators: Does it Belong in your Verification Tool Chest?
HW Emulators: Does it Belong in your Verification Tool Chest?
DVClub
 
Advanced iOS Build Mechanics, Sebastien Pouliot
Advanced iOS Build Mechanics, Sebastien PouliotAdvanced iOS Build Mechanics, Sebastien Pouliot
Advanced iOS Build Mechanics, Sebastien Pouliot
Xamarin
 

Similaire à The Art of Java Benchmarking (20)

Get Lower Latency and Higher Throughput for Java Applications
Get Lower Latency and Higher Throughput for Java ApplicationsGet Lower Latency and Higher Throughput for Java Applications
Get Lower Latency and Higher Throughput for Java Applications
 
QA Fest 2019. Антон Молдован. Load testing which you always wanted
QA Fest 2019. Антон Молдован. Load testing which you always wantedQA Fest 2019. Антон Молдован. Load testing which you always wanted
QA Fest 2019. Антон Молдован. Load testing which you always wanted
 
02 performance
02 performance02 performance
02 performance
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
Moving from Jenkins 1 to 2 declarative pipeline adventures
Moving from Jenkins 1 to 2 declarative pipeline adventuresMoving from Jenkins 1 to 2 declarative pipeline adventures
Moving from Jenkins 1 to 2 declarative pipeline adventures
 
JavaFX Pitfalls
JavaFX PitfallsJavaFX Pitfalls
JavaFX Pitfalls
 
谷歌 Scott-lessons learned in testability
谷歌 Scott-lessons learned in testability谷歌 Scott-lessons learned in testability
谷歌 Scott-lessons learned in testability
 
Tunning mobicent-jean deruelle
Tunning mobicent-jean deruelleTunning mobicent-jean deruelle
Tunning mobicent-jean deruelle
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
Azure Quantum with IBM Qiskit and IonQ QPU
Azure Quantum with IBM Qiskit and IonQ QPUAzure Quantum with IBM Qiskit and IonQ QPU
Azure Quantum with IBM Qiskit and IonQ QPU
 
Cloudsim & greencloud
Cloudsim & greencloud Cloudsim & greencloud
Cloudsim & greencloud
 
Java Micro-Benchmarking
Java Micro-BenchmarkingJava Micro-Benchmarking
Java Micro-Benchmarking
 
Eco-friendly Linux kernel development
Eco-friendly Linux kernel developmentEco-friendly Linux kernel development
Eco-friendly Linux kernel development
 
HW Emulators: Does it Belong in your Verification Tool Chest?
HW Emulators: Does it Belong in your Verification Tool Chest?HW Emulators: Does it Belong in your Verification Tool Chest?
HW Emulators: Does it Belong in your Verification Tool Chest?
 
Load testing with Blitz
Load testing with BlitzLoad testing with Blitz
Load testing with Blitz
 
關於測試,我說的其實是......
關於測試,我說的其實是......關於測試,我說的其實是......
關於測試,我說的其實是......
 
Digital System Design-Switchlevel and Behavioural Modeling
Digital System Design-Switchlevel and Behavioural ModelingDigital System Design-Switchlevel and Behavioural Modeling
Digital System Design-Switchlevel and Behavioural Modeling
 
Advanced patterns in asynchronous programming
Advanced patterns in asynchronous programmingAdvanced patterns in asynchronous programming
Advanced patterns in asynchronous programming
 
Advanced iOS Build Mechanics, Sebastien Pouliot
Advanced iOS Build Mechanics, Sebastien PouliotAdvanced iOS Build Mechanics, Sebastien Pouliot
Advanced iOS Build Mechanics, Sebastien Pouliot
 
How to create SystemVerilog verification environment?
How to create SystemVerilog verification environment?How to create SystemVerilog verification environment?
How to create SystemVerilog verification environment?
 

Plus de Azul Systems Inc.

Push Technology's latest data distribution benchmark with Solarflare and Zing
Push Technology's latest data distribution benchmark with Solarflare and ZingPush Technology's latest data distribution benchmark with Solarflare and Zing
Push Technology's latest data distribution benchmark with Solarflare and Zing
Azul Systems Inc.
 

Plus de Azul Systems Inc. (20)

Advancements ingc andc4overview_linkedin_oct2017
Advancements ingc andc4overview_linkedin_oct2017Advancements ingc andc4overview_linkedin_oct2017
Advancements ingc andc4overview_linkedin_oct2017
 
Understanding GC, JavaOne 2017
Understanding GC, JavaOne 2017Understanding GC, JavaOne 2017
Understanding GC, JavaOne 2017
 
Zulu Embedded Java Introduction
Zulu Embedded Java IntroductionZulu Embedded Java Introduction
Zulu Embedded Java Introduction
 
What's New in the JVM in Java 8?
What's New in the JVM in Java 8?What's New in the JVM in Java 8?
What's New in the JVM in Java 8?
 
DotCMS Bootcamp: Enabling Java in Latency Sensitivie Environments
DotCMS Bootcamp: Enabling Java in Latency Sensitivie EnvironmentsDotCMS Bootcamp: Enabling Java in Latency Sensitivie Environments
DotCMS Bootcamp: Enabling Java in Latency Sensitivie Environments
 
ObjectLayout: Closing the (last?) inherent C vs. Java speed gap
ObjectLayout: Closing the (last?) inherent C vs. Java speed gapObjectLayout: Closing the (last?) inherent C vs. Java speed gap
ObjectLayout: Closing the (last?) inherent C vs. Java speed gap
 
Priming Java for Speed at Market Open
Priming Java for Speed at Market OpenPriming Java for Speed at Market Open
Priming Java for Speed at Market Open
 
Azul Systems open source guide
Azul Systems open source guideAzul Systems open source guide
Azul Systems open source guide
 
Start Fast and Stay Fast - Priming Java for Market Open with ReadyNow!
Start Fast and Stay Fast - Priming Java for Market Open with ReadyNow!Start Fast and Stay Fast - Priming Java for Market Open with ReadyNow!
Start Fast and Stay Fast - Priming Java for Market Open with ReadyNow!
 
Intelligent Trading Summit NY 2014: Understanding Latency: Key Lessons and Tools
Intelligent Trading Summit NY 2014: Understanding Latency: Key Lessons and ToolsIntelligent Trading Summit NY 2014: Understanding Latency: Key Lessons and Tools
Intelligent Trading Summit NY 2014: Understanding Latency: Key Lessons and Tools
 
Understanding Java Garbage Collection
Understanding Java Garbage CollectionUnderstanding Java Garbage Collection
Understanding Java Garbage Collection
 
The evolution of OpenJDK: From Java's beginnings to 2014
The evolution of OpenJDK: From Java's beginnings to 2014The evolution of OpenJDK: From Java's beginnings to 2014
The evolution of OpenJDK: From Java's beginnings to 2014
 
Push Technology's latest data distribution benchmark with Solarflare and Zing
Push Technology's latest data distribution benchmark with Solarflare and ZingPush Technology's latest data distribution benchmark with Solarflare and Zing
Push Technology's latest data distribution benchmark with Solarflare and Zing
 
Webinar: Zing Vision: Answering your toughest production Java performance que...
Webinar: Zing Vision: Answering your toughest production Java performance que...Webinar: Zing Vision: Answering your toughest production Java performance que...
Webinar: Zing Vision: Answering your toughest production Java performance que...
 
Speculative Locking: Breaking the Scale Barrier (JAOO 2005)
Speculative Locking: Breaking the Scale Barrier (JAOO 2005)Speculative Locking: Breaking the Scale Barrier (JAOO 2005)
Speculative Locking: Breaking the Scale Barrier (JAOO 2005)
 
Java vs. C/C++
Java vs. C/C++Java vs. C/C++
Java vs. C/C++
 
What's Inside a JVM?
What's Inside a JVM?What's Inside a JVM?
What's Inside a JVM?
 
The Java Evolution Mismatch - Why You Need a Better JVM
The Java Evolution Mismatch - Why You Need a Better JVMThe Java Evolution Mismatch - Why You Need a Better JVM
The Java Evolution Mismatch - Why You Need a Better JVM
 
Towards a Scalable Non-Blocking Coding Style
Towards a Scalable Non-Blocking Coding StyleTowards a Scalable Non-Blocking Coding Style
Towards a Scalable Non-Blocking Coding Style
 
Experiences with Debugging Data Races
Experiences with Debugging Data RacesExperiences with Debugging Data Races
Experiences with Debugging Data Races
 

Dernier

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Dernier (20)

Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

The Art of Java Benchmarking

  • 1. A Lock-Free Hash Table The Art of (Java) Benchmarking Dr. Cliff Click Chief JVM Architect & Distinguished Engineer blogs.azulsystems.com/cliff Azul Systems June 14, 2010
  • 2. Benchmarking is Easy!!! www.azulsystems.com • • • • And Fun!!! “My Java is faster than your C!!!” And generally wrong... Without exception every microbenchmark I've seen has had serious flaws ─ Except those I've had a hand in correcting • Serious = ─ “Score” is unrelated to intended measurement or ─ error bars exceed measured values
  • 3. Split out micro-bench vs macro-bench www.azulsystems.com • Micro-benchmarks are things you write yourself ─ Attempt to discover some narrow targeted fact ─ Generally a timed tight loop around some “work” ─ Report score as iterations/sec ─ e.g. allocations/sec – object pooling vs GC • Macro-benchmarks are supposed to be realistic ─ Larger, longer running ─ e.g. WebServer, DB caching/front-end, Portal App ─ SpecJBB, SpecJAppServer, XMLMark, Trade6 ─ Load testing of Your Real App | ©2007 Azul Systems, Inc.
  • 4. Some Older Busted Micro-Benchmarks www.azulsystems.com • CaffeineMark “logic” ─ trivially made dead by JIT; infinite speedup • SciMark2 Monte-Carlo ─ 80% of time in sync'd Random.next ─ Several focused tests dead; infinite speedup • SpecJVM98 _209_db – purports to be a DB test ─ Really: 85% of time in String shell-sort • SpecJVM98 _227_mtrt ─ Runtime is much less than 1 sec | ©2007 Azul Systems, Inc.
  • 5. Some Older Busted Micro-Benchmarks www.azulsystems.com • CaffeineMark “logic” ─ trivially made dead by JIT; infinite speedup • SciMark2 Monte-Carlo ─ 80% of time in sync'd Random.next ─ Several focused tests dead; infinite speedup • SpecJVM98 _209_db – purports to be a DB test ─ Really: 85% of time in String shell-sort • SpecJVM98 _227_mtrt ─ Runtime is much less than 1 sec | ©2007 Azul Systems, Inc.
  • 6. Dead Loops www.azulsystems.com • Timeline: // how fast is divide-by-10? long start = Sys.CTM(); for( int i=0; i<N; i++ ) int x = i/10; return N*1000/(Sys.CTM()-start); ─ 1- Interpret a while, assume 10ms ─ 2- JIT; “x” not used, loop is dead, removed, 10ms ─ 3- “Instantly” execute rest of loop • Time to run: 20ms – Independent of N! ─ Vary N ==> vary score ==> “Dial-o-Score!” | ©2007 Azul Systems, Inc.
  • 7. Dead Loops www.azulsystems.com • Sometimes JIT proves “results not needed” ─ Then throws out whole work-loop ─ After running long enough to JIT ─ So loop runs at least a little while first • Score “ops/sec” not related to trip count 'N' ─ Larger N ==> larger score • Score can be infinite- or NaN ─ Generally reported as a very large, but valid # ─ And mixed in with other numbers, confusing things ─ (e.g. geomean of infinite's and other more real numbers) | ©2007 Azul Systems, Inc.
  • 8. SciMark2 Monte-Carlo www.azulsystems.com • 80% of time in synchronize'd Random.next • 3-letter-company “spammed” it by replacing with intrinsic doing a CompareAndSwap (CAS) • • • • • | I was ordered to follow suit (match performance) Doug Lea said “wait: just make a CAS from Java” Hence sun.misc.AtomicLong was born Rapidly replaced by Unsafe.compareAndSwap... ...and eventually java.lang.Atomic* ©2007 Azul Systems, Inc.
  • 9. Micro-bench Advice: Warmup www.azulsystems.com • Code starts interpreted, then JIT'd ─ JIT'd code is 10x faster than interpreter • JIT'ing happens “after a while” ─ HotSpot -server: 10,000 iterations ─ Plus compile time • Warmup code with some trial runs ─ Keeping testing until run-times stabilize | ©2007 Azul Systems, Inc.
  • 10. Micro-bench Advice: Warmup www.azulsystems.com • Not allowing warmup is a common mistake • Popular failure-mode of C-vs-Java comparisons ─ Found on many, many, many web pages ─ Entire benchmark runs in few milli-seconds ─ There are domains requiring milli-second reboots... • But most desktop/server apps expect: ─ Reboots are minutes long and days apart ─ Steady-state throughput after warmup is key ─ So a benchmark that ends in <10sec probably does not measure anything interesting | ©2007 Azul Systems, Inc.
  • 11. Micro-bench Advice: “Compile plan” www.azulsystems.com • JIT makes inlining & other complex decisions ─ Based on very volatile & random data ─ Inline decisions vary from run-to-run • Performance varies from run-to-run ─ Stable numbers within a single JVM invocation ─ But could vary by >20% with new JVM launch ─ Bigger apps are more performance-stable • Micro-benchmarks tend to be “fragile” here ─ e.g. 1 JVM launch in 5 will be 20% slower* *”S tatistically R igorous Java P erformance E valuation” OOPSLA 2007 | ©2007 Azul Systems, Inc.
  • 12. Micro-bench Advice: “Compile plan” www.azulsystems.com A: loop1: … loop2: … call C … jne loop2 … jne loop1 return C: … return | ©2007 Azul Systems, Inc. public int a() { for(...) b(); } public int b() { for(...) c(); } public int c() { …work... } 20% chance A inlines B B calls C 80% chance A calls B B inlines C A: loop1: … call B jne loop1 return B: loop2: ... jne loop2 … return
  • 13. Micro-bench Advice: “Compile plan” www.azulsystems.com • Launch the JVM many times ─ Toss 1st launch to remove OS caching effects ─ Average out “good” runs with the “bad” ─ Don't otherwise toss outliers ─ (unless you have good reason: i.e. unrelated load) • Enough times to get statistically relevant results ─ Might require 30+ runs • Report average and standard deviation ─ In this case, expect to see a large std.dev *”Statistically rigorous Java performance evaluation” OOPSLA 2007 | ©2007 Azul Systems, Inc.
  • 14. Micro-bench Advice: “1st fast, 2nd slow” www.azulsystems.com • Timing harness needs to invoke many targets ─ In a loop, repeatedly a few times ─ Else JIT sees 1 hot target in a loop ─ And then does a guarded inline ─ And then hoists the timed work outside of timing loop class bench1 implements bench { void sqrt(int i); } class bench2 implements bench { void sqrt(int i); } static final int N=1000000; // million ... static int test( bench B ) { long start = System.currentTimeMillis(); for( int i=0; i<N; i++ ) B.sqrt(i); // hot loop v-call return N*1000/(System.currentTimeMillis()-start); | ©2007 Azul Systems, Inc.
  • 15. Micro-bench Advice: “1st fast, 2nd slow” www.azulsystems.com Pass in one of two different classes class bench1 implements bench { void sqrt(int i); } class bench2 implements bench { void sqrt(int i); } static final int N=1000000; // million ... static int test( bench B ) { long start = System.currentTimeMillis(); for( int i=0; i<N; i++ ) B.sqrt(i); // hot loop v-call return N*1000/(System.currentTimeMillis()-start); | ©2007 Azul Systems, Inc.
  • 16. Micro-bench Advice: v-call optimization www.azulsystems.com • First call: test(new bench1) long start = Sys.CTM(); for( int i=0; i<N; i++ ) B.sqrt(i); return N*1000/(Sys.CTM()-start); | ©2007 Azul Systems, Inc.
  • 17. Micro-bench Advice: v-call optimization www.azulsystems.com • First call: test(new bench1) ─ Single target callsite; JIT does guarded inlining ─ Inlines bench1.sqrt long start = Sys.CTM(); for( int i=0; i<N; i++ ) B.sqrt(i); return N*1000/(Sys.CTM()-start); | long start = Sys.CTM(); guarded for( int i=0; i<N; i++ ) inline Math.sqrt(i); // inline bench1.sqrt return N*1000/(Sys.CTM()-start); ©2007 Azul Systems, Inc.
  • 18. Micro-bench Advice: v-call optimization www.azulsystems.com • First call: test(new bench1) ─ Single target callsite; JIT does guarded inlining ─ Inlines bench1.sqrt ─ Hoists loop-invariants, dead-code-remove, etc ─ Execution time does NOT depend on N!!! ─ Dreaded “Dial-o-Score!” long start = Sys.CTM(); long start = Sys.CTM(); // JIT'd loop removed for( int i=0; i<N; i++ ) return N*1000/(Sys.CTM()-start); B.sqrt(i); return N*1000/(Sys.CTM()-start); | long start = Sys.CTM(); guarded for( int i=0; i<N; i++ ) dead code inline Math.sqrt(i); // inline bench1.sqrt return N*1000/(Sys.CTM()-start); ©2007 Azul Systems, Inc.
  • 19. Micro-bench Advice: v-call optimization www.azulsystems.com • Second call: test(new bench2) ─ 2nd target of call; guarded inlining fails ─ Code is incorrect; must be re-JIT'd ─ Measures overhead of N calls to bench2.sqrt ─ Plus guard failure, deoptimization ─ Plus JIT'ing new version of test() ─ Plus virtual call overhead long start = System.CTM(); for( int i=0; i<N; i++ ) int bench2.sqrt( int i ) { B.sqrt(i); // alternate impl return N*1000/(System.CTM()-start); } | ©2007 Azul Systems, Inc.
  • 20. Micro-bench Advice: v-call optimization www.azulsystems.com • Reversing order of calls reverses “good” & “bad” ─ e.g. “test(new | ©2007 Azul Systems, Inc. bench2); test(new bench1);”
  • 21. Micro-bench Advice: v-call optimization www.azulsystems.com • Reversing order of calls reverses “good” & “bad” ─ e.g. “test(new bench2); test(new bench1);” • Timing harness needs to invoke all targets ─ In a loop, repeatedly a few times class bench1 implements bench { void sqrt(int i); } class bench2 implements bench { void sqrt(int i); } ... // warmup loop for( int i=0; i<10; i++ ) { test( new bench1 ); test( new bench2 ); } // now try timing printf(test(new bench1)); | ©2007 Azul Systems, Inc.
  • 22. Micro-bench Advice: v-call optimization www.azulsystems.com • Reversing order of calls reverses “good” & “bad” ─ e.g. “test(new bench2); test(new bench1);” • Timing harness needs to invoke all targets ─ In a loop, repeatedly a few times class bench1 implements bench { void sqrt(int i); } class bench2 implements bench { void sqrt(int i); } ... // warmup loop for( int i=0; i<10; i++ ) { test( new bench1 ); test( new bench2 ); } // now try timing printf(test(new bench1)); | ©2007 Azul Systems, Inc.
  • 23. Micro-bench Advice: GC www.azulsystems.com • Avoid GC or embrace it • Either no (or trivial) allocation, or use verbose:gc to make sure you hit steady-state • Statistics: not just average, but also std-dev • Look for trends ─ Could be creeping GC behavior • Could be “leaks” causing more-work-per-run ─ e.g. leaky HashTable growing heap or ─ Growing a LinkedList slows down searches | ©2007 Azul Systems, Inc.
  • 24. Micro-bench Advice: Synchronization www.azulsystems.com • Account for multi-threaded & locking • I do see people testing, e.g. locking costs on singlethreaded programs • Never contended lock is very cheap ─ +BiasedLocking makes it even cheaper • Very slightly contended lock is probably 4x more • Real contention: Amdahl's Law ─ Plus lots and lots of OS overhead • java.util.concurrent is your friend | ©2007 Azul Systems, Inc.
  • 25. Micro-bench Advice www.azulsystems.com • Realistic runtimes ─ Unless you need sub-milli-sec reboots • • • • • | Warm-up loops - give the JIT a chance Statistics: plan for variation in results Dead loops – look for “Dial-o-Score!”, deal with it 1st run fast, 2nd run slow – look for it, deal with it GC: avoid or embrace ©2007 Azul Systems, Inc.
  • 26. Macro-bench warnings www.azulsystems.com • JVM98 is too small anymore ─ Easy target; cache-resident; GC ignored • JBB2000, 2005 ─ Not much harder target ─ VERY popular, easy enough to “spam” ─ Score rarely related to anything real • SpecJAppServer, DaCapo, SpecJVM2008, XMLMark ─ Bigger, harder to spam, less popular | ©2007 Azul Systems, Inc.
  • 27. Macro-bench warnings www.azulsystems.com • Popular ones are targeted by companies • General idea: JVM engineers are honest ─ But want the best for company ─ So do targeted optimizations ─ e.g. intrinsic CAS for Random.next ─ Probably useful to somebody ─ Never incorrect ─ Definitely helps this benchmark | ©2007 Azul Systems, Inc.
  • 28. Typical Performance Tuning Cycle www.azulsystems.com • • • • Benchmark X becomes popular Management tells Engineer: “Improve X's score!” Engineer does an in-depth study of X Decides optimization “Y” will help ─ And Y is not broken for anybody ─ Possibly helps some other program • Implements & ships a JVM with “Y” • Management announces score of “X” is now 2*X • Users yawn in disbelief: “Y” does not help them | ©2007 Azul Systems, Inc.
  • 29. SpecJBB2000 www.azulsystems.com • Embarrassing parallel - no contended locking • No I/O, no database, no old-gen GC ─ NOT typically of any middle-ware ─ Very high allocation rate of young-gen objects, definitely not typically ─ But maybe your program gets close? • Key to performance: having enough Heap to avoid old-gen GC during 4-min timed window | ©2007 Azul Systems, Inc.
  • 30. SpecJBB2000: Spamming www.azulsystems.com • Drove TONS of specialized GC behaviors & flags ─ Across many vendors ─ Many rolled into “-XX:+AggressiveOptimizations” ─ Goal: no old-gen GC in 4 minutes • 3-letter-company “spammed” - with a 64-bit VM and 12Gig heap (in an era of 3.5G max heaps) ─ Much more allocation, hence “score” before GC ─ Note that while huge heaps are generically useful to somebody, 12Gig was not typical of the time ─ Forced Sun to make a 64-bit VM | ©2007 Azul Systems, Inc.
  • 31. SpecJBB2000: Spamming www.azulsystems.com • Drove TONS of specialized GC behaviors & flags ─ Across many vendors ─ Many rolled into “-XX:+AggressiveOptimizations” ─ Goal: no old-gen GC in 4 minutes • 3-letter-company “spammed” - with a 64-bit VM and 12Gig heap (in an era of 3.5G max heaps) ─ Much more allocation, hence “score” before GC ─ Note that while huge heaps are generically useful to somebody, 12Gig was not typical of the time ─ Forced Sun to make a 64-bit VM | ©2007 Azul Systems, Inc.
  • 32. What can you read from the results? www.azulsystems.com • The closer your apps resemble benchmark “X” ─ The closer improvements to X's score impact you • Huge improvements to unrelated benchmarks ─ Might be worthless to you • e.g. SpecJBB2000 is a perfect-young-gen GC test ─ Improvements to JBB score have been tied to better young-gen behavior ─ Most web-servers suffer from OLD-gen GC issues ─ Improving young-gen didn't help web-servers much | ©2007 Azul Systems, Inc.
  • 33. SpecJBB2005 www.azulsystems.com • Intended to fix JBB2000's GC issues ─ No explicit GC between timed windows ─ Penalize score if GC pause is too much (XTNs are delayed too long) ─ Same as JBB2000, but more XML ─ Needs some Java6-isms optimized • Still embarrassing parallel – young-gen GC test • Azul ran up to 1700 warehouse/threads on a 350Gig heap, allocating 20Gigabytes/sec for 3.5 days and STILL no old-gen GC | ©2007 Azul Systems, Inc.
  • 34. SpecJBB2005 www.azulsystems.com • Intended to fix JBB2000's GC issues ─ No explicit GC between timed windows ─ Penalize score if GC pause is too much (XTNs are delayed too long) ─ Same as JBB2000, but more XML ─ Needs some Java6-isms optimized • Still embarrassing parallel – young-gen GC test • Azul ran up to 1700 warehouse/threads on a 350Gig heap, allocating 20Gigabytes/sec for 3.5 days and STILL no old-gen GC | ©2007 Azul Systems, Inc.
  • 35. Some Popular Macro-Benchmarks www.azulsystems.com • SpecJVM98 – too small, no I/O, no GC ─ 227_mtrt – too short to say anything ─ Escape Analysis pays off too well here ─ 209_db – string-sort NOT db, performance tied to TLB & cache structure, not JVM ─ 222_mpegaudio – subject to odd FP optimizations ─ 228_jack – throws heavy exceptions – but so do many appservers; also parsers are popular. Improvements here might carry over ─ 213_javac – generically useful metric for modest CPU bound applications | ©2007 Azul Systems, Inc.
  • 36. Some Popular Macro-Benchmarks www.azulsystems.com • SpecJVM98 – too small, no I/O, no GC ─ 227_mtrt – too short to say anything ─ Escape Analysis pays off too well here ─ 209_db – string-sort NOT db, performance tied to TLB & cache structure, not JVM ─ 222_mpegaudio – subject to odd FP optimizations ─ 228_jack – throws heavy exceptions – but so do many appservers; also parsers are popular. Improvements here might carry over ─ 213_javac – generically useful metric for modest CPU bound applications | ©2007 Azul Systems, Inc.
  • 37. SpecJAppServer www.azulsystems.com • • • • Very hard to setup & run Very network, I/O & DB intensive Need a decent (not great) JVM (e.g. GC is < 5%) But peak score depends on an uber-DB and fast disk or network • Not so heavily optimized by JVM Engineers • Lots of “flex” in setup rules (DB & network config) • So hard to read the results unless your external (non-JVM) setup is similar | ©2007 Azul Systems, Inc.
  • 38. SpecJAppServer www.azulsystems.com • • • • Very hard to setup & run Very network, I/O & DB intensive Need a decent (not great) JVM (e.g. GC is < 5%) But peak score depends on an uber-DB and fast disk or network • Not so heavily optimized by JVM Engineers • Lots of “flex” in setup rules (DB & network config) • So hard to read the results unless your external (non-JVM) setup is similar | ©2007 Azul Systems, Inc.
  • 39. DaCapo www.azulsystems.com • • • • • • • Less popular so less optimized Realistic of mid-sized POJO apps NOT typical of app-servers, J2EE stuff Expect 1000's of classes loaded & methods JIT'd Some I/O, more typical GC behavior Much better score reporting rules DaCapo upgrades coming soon! ─ New version has web-servers & parallel codes | ©2007 Azul Systems, Inc.
  • 40. DaCapo www.azulsystems.com • • • • • • • Less popular so less optimized Realistic of mid-sized POJO apps NOT typical of app-servers, J2EE stuff Expect 1000's of classes loaded & methods JIT'd Some I/O, more typical GC behavior Much better score reporting rules DaCapo upgrades coming soon! ─ New version has web-servers & parallel codes | ©2007 Azul Systems, Inc.
  • 41. Some Popular Macro-Benchmarks www.azulsystems.com • XMLMark ─ Perf varies by 10x based on XML parser & JDK version ─ Too-well-behaved young-gen allocation ─ Like DaCapo – more realistic of mid-sized POJO apps ─ Very parallel (not a contention benchmark) unlike most app-servers • SpecJVM2008 ─ Also like DaCapo – realistically sized POJO apps ─ But also has web-servers & parallel apps ─ Newer, not so heavily targeted by Vendors | ©2007 Azul Systems, Inc.
  • 42. Some Popular Macro-Benchmarks www.azulsystems.com • XMLMark ─ Perf varies by 10x based on XML parser & JDK version ─ Too-well-behaved young-gen allocation ─ Like DaCapo – more realistic of mid-sized POJO apps ─ Very parallel (not a contention benchmark) unlike most app-servers • SpecJVM2008 ─ Also like DaCapo – realistically sized POJO apps ─ But also has web-servers & parallel apps ─ Newer, not so heavily targeted by Vendors | ©2007 Azul Systems, Inc.
  • 43. “Popular” Macro-Benchmark Problems www.azulsystems.com • Unrealistic treatment of GC ─ e.g. None in timed window ─ Or perfect young-gen collections ─ Real apps typical trigger full GC every hour or so • Unrealistic load generation ─ Not enough load to stress system ─ Or very simple or repetitive loads ─ Bottlenecks in getting load to server | ©2007 Azul Systems, Inc.
  • 44. “Popular” Macro-Benchmark Problems www.azulsystems.com • Benchmark too short for full GC ─ Many real applications leak ─ Broken 3rd party libs, legacy code, etc ─ Leaks accumulate in old-gen ─ Which makes old-gen full GC expensive ─ But benchmark never triggers old-gen full GC • I/O & DB not benchmarked well ─ But make a huge difference in Real Life ─ Your app might share I/O & DB with others | ©2007 Azul Systems, Inc.
  • 45. Summary www.azulsystems.com • Macrobenchmarks ─ Targeted by JVM Engineers ─ Buyer Beware! ─ The closer the benchmark is to your problem ─ The more likely improvements will impact you ─ GC is likely to not be typical of real applications ─ Your applications ever go 3.5 days without a full GC? ─ I/O & DB load also probably not typical | ©2007 Azul Systems, Inc.
  • 46. Summary www.azulsystems.com • Microbenchmarks Easy to Write, Hard to get Right Easy to be Fooled Won't tell you much about macro-code anyways Warmup – 1's of seconds to 10's of seconds Statistics – average lots of runs ─ Even out variations in the “compile plan” ─ Call out to many methods in the hot loop ─ Be wary of dead-code super-score results ─ ─ ─ ─ ─ | ©2007 Azul Systems, Inc.
  • 47. www.azulsystems.com Put Micro-Trust in a Micro-Benchmark! | ©2007 Azul Systems, Inc.
  • 48. www.azulsystems.com Put Micro-Trust in a Micro-Benchmark! | ©2007 Azul Systems, Inc.