This document provides guidance on interpreting and reporting performance test results. It covers collecting metrics such as load, errors, response times, and system resources during testing; aggregating the raw data into meaningful statistics; and visualizing the results in graphs to gain insights. Key steps include interpreting observations and correlations to develop hypotheses, assessing conclusions to make recommendations, and reporting findings to stakeholders in a clear, actionable manner. The overall approach is to turn large amounts of data into a few insightful pictures and conclusions that can guide technical or business decisions.
ABOUT ME
• 20 Years in Software, 14 in Performance, Context-Driven for 12
• Performance Engineer/Teacher/Consultant
• Product Manager
• Board of Directors
• Lead Organizer
• Mentor
DAN DOWNING’S 5 STEPS OF LOAD TESTING
1. Discover
• ID processes & SLAs
• Define use case workflows
• Model production workload
2. Develop
• Develop test scripts
• Configure environment monitors
• Run shakedown test
3. Analyze
• Run tests
• Monitor system resources
• Analyze results
4. Fix
• Diagnose
• Fix
• Re-test
5. Report
• Interpret results
• Make recommendations
• Present to stakeholders
ABOUT THIS SESSION
• Participatory!
• Graphs from actual projects – have any?
• Learn from each other
• Not About Tools
• First Half
• Making observations and forming hypotheses
• Break (~10:00)
• Second Half
• Interpreting and reporting actionable results
WHAT ARE WE LOOKING AT HERE?
When does this become a problem?
When heap space utilization keeps growing, despite garbage collection, and reaches its max allocation
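One way to make the "growing despite garbage collection" observation objective is to check whether the post-GC heap floors trend upward over the run. A minimal sketch in Python, assuming heap samples were already collected at a fixed interval (the function name, window size, and threshold are hypothetical):

```python
# Hedged sketch: heap_mb is a sequence of heap-usage samples (MB) taken at a
# fixed interval, e.g. parsed from GC logs; all names here are illustrative.
import numpy as np

def leak_suspected(heap_mb, window=60):
    """Flag a possible leak: do the post-GC troughs keep rising?"""
    samples = np.asarray(heap_mb, dtype=float)
    # Approximate the post-GC floors with a rolling minimum.
    floors = np.array([samples[i:i + window].min()
                       for i in range(len(samples) - window)])
    # Fit a line through the floors; a persistently positive slope
    # despite garbage collection is the classic leak signature.
    slope = np.polyfit(np.arange(len(floors)), floors, 1)[0]
    return slope > 0
```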
ANY HYPOTHESES ABOUT THIS?
Before: Abnormally high TCP retransmits between Web and App server
After: Network issues resolved
FOR ACTIONABLE PERFORMANCE RESULTS…
…Think “CAVIAR”
• Collecting
• Aggregating
• Visualizing
• Interpreting
• Assessing
• Reporting
COLLECTING
• Objective: Gather all results from the test that
• help gain confidence in results validity
• portray system scalability, throughput & capacity
• provide bottleneck / resource limit diagnostics
• help formulate hypotheses
(a minimal resource-sampling sketch follows the table below)
| Types | Examples | Sources | Granularity | Value |
| --- | --- | --- | --- | --- |
| Load | Users/sessions simulated, files sent, “transactions” completed | Test “scenario”/“test harness” counts, web/app logs, db queries | At appropriate levels (virtual users, throughput) for your context | Correlated with response times, yields “system scalability” |
| Errors | HTTP, application, db, network; “test harness” | Web/app logs, db utility, network trace | Raw data at most granular level | Confidence in results validity |
| Response Times | Page/action, “transaction”, end-to-end times | Test tools/“scripts”, web logs | At various levels of app granularity, linked to objectives | Fundamental unit of measure for performance |
| Resources | Network, “server”, “middleware”, database, storage, queues | OS tools (vmstat, nmon, sar, perfmon), vendor monitoring tools | 5-15 second sampling rates, with logging, to capture transient spikes | Correlated with response times, yields “system capacity” |
| Anecdotes | Manual testing, transient resources, screenshots | People manually testing or monitoring during the test | Manual testing by different people & locations | Confidence / corroboration / triangulation of results |
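For the Resources row, OS tools like vmstat, sar, or perfmon already log at these rates. Where a custom sampler is handy, here is a minimal sketch assuming the Python psutil package is installed; the output file name and field choices are illustrative, not prescriptive:

```python
# Minimal collection sketch: sample CPU, memory, and disk counters every
# SAMPLE_SECONDS and append to a CSV (stop with Ctrl-C).
import csv, time
import psutil

SAMPLE_SECONDS = 10  # within the 5-15 second guidance above

with open("resources.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["epoch", "cpu_pct", "mem_pct",
                     "disk_read_bytes", "disk_write_bytes"])
    while True:
        disk = psutil.disk_io_counters()
        writer.writerow([
            int(time.time()),
            psutil.cpu_percent(interval=None),  # CPU busy since last call
            psutil.virtual_memory().percent,
            disk.read_bytes,
            disk.write_bytes,
        ])
        f.flush()  # log immediately so transient spikes survive a crash
        time.sleep(SAMPLE_SECONDS)
```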
AGGREGATING
• Objective: Summarize measurements using
• time-buckets of various sizes to provide tree & forest views
• consistent time-buckets across types to enable accurate correlation
• meaningful statistics: scatter, min-max range, variance, percentiles
• multiple metrics to “triangulate”, confirm (or invalidate) hypotheses
(an aggregation sketch follows the table below)
Tools (all rows): testing tool graphs, monitoring tools, Excel pivot tables

| Types | Examples | Statistics | Value |
| --- | --- | --- | --- |
| Load | Users/sessions; requests; no. of files/msgs sent/rcvd | avg | Basis for ID’ing load sensitivity of all other metrics |
| Errors | Error rate, error counts; by URL/type; HTTP 4xx & 5xx | avg-max | ID if errors correlate with load or resource metrics |
| Response Times | Workflow end-to-end time; page/action time | min-avg-max-std dev; 90th percentile | Quantify system scalability |
| Network Thruput | Megabits/sec | avg-max | ID if thruput plateaus while load is still ramping, or exceeds network capacity |
| App Thruput | Page view rate; completed transactions by type | avg-max | ID if page view rate can be sustained, or if “an hour’s work can be done in an hour” |
| Resources | % cpu; cpu & disk queue depth; memory usage; IOPS; queued requests; db contention | avg-max | ID limiting resources; provide diagnostics; quantify system capacity |
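As a sketch of the time-bucket idea, assuming pandas and a raw per-request CSV with 'timestamp' and 'response_ms' columns (both names hypothetical):

```python
# Aggregate raw response times into fixed time-buckets with the statistics
# from the table above; re-run with "10s" or "5min" buckets for the
# tree vs. forest views, keeping one bucket size across all metric types
# so later correlation is honest.
import pandas as pd

raw = pd.read_csv("response_times.csv", parse_dates=["timestamp"])
raw = raw.set_index("timestamp")

stats = raw["response_ms"].resample("1min").agg(
    ["min", "mean", "max", "std", lambda s: s.quantile(0.90)]
)
stats.columns = ["min", "avg", "max", "std_dev", "p90"]
print(stats.head())
```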
VISUALIZING
• Objective: Gain a “forest view” of metrics relative to load
• Turn barrels of numbers into a few pictures
• Vary graph scale & summarization granularity to expose hidden facts
• ID the load point where degradation begins
• ID system tier(s) where bottlenecks appear and the limiting resources
VISUALIZING
• My key graphs, in order of importance
• Errors over load (“results valid?”)
• Bandwidth throughput over load (“system bottleneck?”)
• Response time over load (“how does system scale?”)
• Business process end-to-end
• Page level (min-avg-max-SD-90th percentile)
• System resources (“how’s the infrastructure capacity?”)
• Server cpu over load
• JVM heap memory/GC
• DB lock contention, I/O Latency
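As one illustration, the response-time-over-load graph with load on its own y-axis (the style discussed on the scalability slide below) can be sketched with matplotlib; the series here are synthetic stand-ins shaped like that example (0.5 sec/page up to ~20 users, ~5x at max load):

```python
# Minimal plotting sketch; real series would come from the aggregation step.
import numpy as np
import matplotlib.pyplot as plt

minutes = np.arange(60)
users = np.minimum(minutes * 2, 100)                        # ramp to 100 VUs
avg_ms = 500 + np.where(users > 20, (users - 20) * 25, 0)   # degrades past ~20

fig, ax_rt = plt.subplots()
ax_load = ax_rt.twinx()  # load explicitly on its own y-axis
ax_rt.plot(minutes, avg_ms, color="tab:blue", label="avg page time (ms)")
ax_load.plot(minutes, users, color="gray", linestyle="--", label="virtual users")
ax_rt.set_xlabel("elapsed minutes")
ax_rt.set_ylabel("response time (ms)")
ax_load.set_ylabel("virtual users")
ax_rt.legend(loc="upper left")
ax_load.legend(loc="lower right")
plt.title("Response time over load")
plt.show()
```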
INTERPRETING
• Objective: Draw conclusions from observations and hypotheses
• Make objective, quantitative observations from graphs / data
• Correlate / triangulate graphs / data
• Develop hypotheses from correlated observations
• Test hypotheses and achieve consensus among tech teams
• Turn validated hypotheses into conclusions
INTERPRETING
• Observations:
• “I observe that…”; no evaluation at this point!
• Correlations:
• “Comparing graph A to graph B…” – relate observations to each other
• Hypotheses:
• “It appears as though…” – test these with the extended team; corroborate with other information (anecdotal observations, manual tests)
• Conclusions:
• “From observations a, b, c, corroborated by d, I conclude that…”
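To make “Comparing graph A to graph B…” quantitative, a correlation coefficient across aligned time-buckets can support (never prove) a hypothesis. A sketch with pandas, using illustrative numbers:

```python
import numpy as np
import pandas as pd

# Illustrative aligned per-bucket series; real ones come from aggregation.
df = pd.DataFrame({
    "users":   np.arange(10, 101, 10),
    "p90_ms":  [520, 510, 600, 900, 1300, 1700, 2100, 2300, 2450, 2500],
    "cpu_pct": [12, 20, 31, 45, 58, 70, 81, 90, 95, 97],
})
# Pearson correlation matrix: strong load-response and load-cpu correlations
# are evidence for a capacity hypothesis, to be tested with the tech teams.
print(df.corr().round(2))
```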
SCALABILITY: RESPONSE TIME OVER LOAD
• Two styles for system scalability; the top graph shows load explicitly on its own y-axis
• Note the consistent 0.5 sec/page up to ~20 users
• Above that, response time degrades steeply, to 5x at max load
• Is 2.5 sec/page acceptable? Need to drill down to page level to ID key contributors, and look at 90th or 95th percentiles (averages are misleading)
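A tiny numeric illustration of why averages mislead (the numbers are made up): two slow pages triple the mean, but only the percentiles show that most users saw half a second while the unlucky ones waited five:

```python
import numpy as np

page_ms = np.array([480, 510, 495, 505, 490, 500, 4900, 5100])  # two slow hits
print("avg:", page_ms.mean())                # ~1623 ms - hides both stories
print("p90:", np.percentile(page_ms, 90))    # ~4960 ms - exposes the pain
print("p95:", np.percentile(page_ms, 95))    # ~5030 ms
```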
THROUGHPUT PLATEAU WITH LOAD RISING = BOTTLENECK SOMEWHERE!
• Note throughput tracking load through ~45 users, then leveling off
• Culprit was an Intrusion Detection appliance limiting bandwidth to 60 Mbps
In a healthy system, throughput should closely track load
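One way to spot the plateau numerically rather than by eye: throughput per unit of load should hold roughly constant in a healthy system. A sketch with illustrative numbers echoing this example:

```python
import numpy as np

# Hypothetical per-bucket series, aligned in time.
users = np.array([10, 20, 30, 45, 60, 75, 90])
mbps  = np.array([13, 27, 40, 60, 61, 60, 60])

per_user = mbps / users
# A steadily falling throughput-per-user ratio while load still ramps is the
# bottleneck signature seen above (here, the IDS appliance's 60 Mbps cap).
print(np.round(per_user, 2))  # [1.3  1.35 1.33 1.33 1.02 0.8  0.67]
```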
BANDWIDTH TRACKING WITH LOAD = HEALTHY
All 3 web servers show network interface throughput tracking with load throughout the test
A healthy bandwidth graph looks like Mt. Fuji
ERRORS OVER LOAD: MUST EXPLAIN!
• Note relatively few errors
• Largely HTTP 404s on missing resources
• Sporadic bursts of HTTP 500 errors near the end of the test while the customer was “tuning” web servers
An error rate of <1% can be attributed to “noise” and dismissed; >1% should be analyzed and fully explained
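Applying the 1% rule per time-bucket is straightforward; a sketch assuming pandas and a raw request log with 'timestamp' and 'status' columns (both names hypothetical):

```python
import pandas as pd

reqs = pd.read_csv("requests.csv", parse_dates=["timestamp"]).set_index("timestamp")
g = reqs["status"].resample("1min")
per_min = pd.DataFrame({
    "total":  g.count(),
    "errors": g.apply(lambda s: (s >= 400).sum()),  # http 4xx and 5xx
})
per_min["error_rate"] = per_min["errors"] / per_min["total"]
# Buckets above 1% need a full explanation, not a hand-wave.
print(per_min[per_min["error_rate"] > 0.01])
```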
CAPACITY: SYSTEM RESOURCES - INTERPRETED
Monitor resources liberally, provide (and annotate!) graphs selectively: which resources tell the main story?
ASSESSING
• Objective: Turn conclusions into recommendations
• Tie conclusions back to test objectives – were objectives met?
• Determine remediation options at the appropriate level – business, middleware, application, infrastructure, network
• Perform agreed-to remediation
• Re-test
• Recommendations:
• Should be specific and actionable at a business or technical level
• Should be reviewed (and if possible, supported) by the teams that need to perform the actions (nobody likes surprises!)
• Should quantify the benefit, if possible the cost, and the risk of not acting
• The final outcome is management’s judgment, not yours
REPORTING
• Objective: Convey recommendations in stakeholders’ terms
• Identify the audience(s) for the report; write / talk in their language
• Executive Summary – 3 pages max
• Summarize objectives, approach, target load, acceptance criteria
• Cite factual Observations
• Draw Conclusions based on Observations
• Make actionable Recommendations
• Supporting Detail
• Test parameters: date/time executed, business processes, load ramp, think-times, system tested (hw config, sw versions/builds)
• Sections for Errors, Throughput, Scalability, Capacity
• In each section: annotated graphs, observations, conclusions
• Associated Docs (if appropriate)
• Full set of graphs, workflow detail, scripts, test assets
REPORTING
• Step 1: *DO NOT* just press “Print” on the tool’s default report
• Who is your audience?
• Why would they want to see 50 graphs and 20 tables? What will they be able to see?
• Data + Analysis = INFORMATION
REPORTING
• Step 2: Understand What is Important
• What did you learn? Study your results, look for correlations.
• What are the 3 things you need to convey?
• What information is needed to support these 3 things?
• Discuss findings with technical team members: “What does this look like to you?”
REPORTING
• Step 3: So, What is Important?
• Prepare a three-paragraph summary for email
• Prepare a 30-second Elevator Summary for when someone asks you about the testing
• More people will consume these than any test report
• Get feedback
REPORTING
• Step 4: Preparing Your Final Report: Audience
• Your primary audience is usually executive sponsors and the business. Write the Summary at the front of the report for them.
• Language, Acronyms, and Jargon
• Level of Detail
• Correlation to business objectives
REPORTING
• Step 5: Audience (cont.)
• Rich Technical Detail within:
• Observations, including selected graphs
• Include Feedback from Technical Team
• Conclusions
• Recommendations
REPORTING
• Step 6: Present!
• Remember, no one is going to read the report.
• Gather your audience: executive, business, and technical.
• Present your results
• Help shape the narrative. Explain the risks. Earn your keep.
• Call to action! Recommend solutions
A FEW RESOURCES
• WOPR (Workshop On Performance and Reliability)
• http://www.performance-workshop.org
• Experience reports on performance testing
• Spring & Fall facilitated, theme-based peer conferences
• SOASTA Community
• http://cloudlink.soasta.com
• Papers, articles, presentations on performance testing
• PerfBytes Podcast
• Mark Tomlinson’s blog
• http://careytomlinson.org/mark/blog/
• Richard Leeke’s blog (Equinox.nz)
• http://www.equinox.co.nz/blog/Lists/Posts/Author.aspx?Author=Richard Leeke
• Data visualization
• Scott Barber’s resource page
• http://www.perftestplus.com/resources.htm
• STP Resources
• http://www.softwaretestpro.com/Resources
• Articles, blogs, papers on wide range of testing topics