This document provides guidance on interpreting and reporting performance test results. It covers collecting metrics such as load, errors, response times, and system resources during testing; aggregating the raw data into meaningful statistics; and visualizing the results in graphs to gain insights. Key steps include interpreting observations and correlations to develop hypotheses, assessing conclusions to make recommendations, and reporting findings to stakeholders in a clear, actionable manner. The overall approach is to turn large amounts of data into a few insightful pictures and conclusions that can guide technical or business decisions.
ABOUT ME
• 20 Years in Software, 14 in Performance, Context-Driven for 12
• Performance Engineer/Teacher/Consultant
• Product Manager
• Board of Directors
• Lead Organizer
• Mentor
DAN DOWNING’S 5 STEPS OF LOAD TESTING
1. Discover
• ID processes & SLAs
• Define use case workflows
• Model production workload
2. Develop
• Develop test scripts
• Configure environment monitors
• Run shakedown test
3. Analyze
• Run tests
• Monitor system resources
• Analyze results
4. Fix
• Diagnose
• Fix
• Re-test
5. Report
• Interpret results
• Make recommendations
• Present to stakeholders
ABOUT THIS SESSION
• Participatory!
• Graphs from actual projects – have any?
• Learn from each other
• Not About Tools
• First Half
• Making observations and forming hypotheses
• Break (~10:00)
• Second Half
• Interpreting and reporting actionable results
WHAT ARE WE LOOKING AT HERE?
When does this become a problem?
When heap space utilization keeps growing, despite garbage collection, and reaches its max allocation
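One way to make the "growing despite garbage collection" observation objective is to check whether the post-GC heap floors trend upward over the run. A minimal sketch in Python, assuming heap samples were already collected at a fixed interval (the function name, window size, and threshold are hypothetical):

```python
# Hedged sketch: heap_mb is a sequence of heap-usage samples (MB) taken at a
# fixed interval, e.g. parsed from GC logs; all names here are illustrative.
import numpy as np

def leak_suspected(heap_mb, window=60):
    """Flag a possible leak: do the post-GC troughs keep rising?"""
    samples = np.asarray(heap_mb, dtype=float)
    # Approximate the post-GC floors with a rolling minimum.
    floors = np.array([samples[i:i + window].min()
                       for i in range(len(samples) - window)])
    # Fit a line through the floors; a persistently positive slope
    # despite garbage collection is the classic leak signature.
    slope = np.polyfit(np.arange(len(floors)), floors, 1)[0]
    return slope > 0
```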
ANY HYPOTHESES ABOUT THIS?
Before: Abnormally high TCP retransmits between Web and App server
After: Network issues resolved
FOR ACTIONABLE PERFORMANCE RESULTS…
…Think “CAVIAR”
• Collecting
• Aggregating
• Visualizing
• Interpreting
• Assessing
• Reporting
COLLECTING
• Objective: Gather all results from the test that
• help gain confidence in results validity
• portray system scalability, throughput & capacity
• provide bottleneck / resource limit diagnostics
• help formulate hypotheses
(a minimal resource-sampling sketch follows the table below)
| Types | Examples | Sources | Granularity | Value |
| --- | --- | --- | --- | --- |
| Load | Users/sessions simulated, files sent, “transactions” completed | Test “scenario”/“test harness” counts, web/app logs, db queries | At appropriate levels (virtual users, throughput) for your context | Correlated with response times, yields “system scalability” |
| Errors | HTTP, application, db, network; “test harness” | Web/app logs, db utility, network trace | Raw data at most granular level | Confidence in results validity |
| Response Times | Page/action, “transaction”, end-to-end times | Test tools/“scripts”, web logs | At various levels of app granularity, linked to objectives | Fundamental unit of measure for performance |
| Resources | Network, “server”, “middleware”, database, storage, queues | OS tools (vmstat, nmon, sar, perfmon), vendor monitoring tools | 5-15 second sampling rates, with logging, to capture transient spikes | Correlated with response times, yields “system capacity” |
| Anecdotes | Manual testing, transient resources, screenshots | People manually testing or monitoring during the test | Manual testing by different people & locations | Confidence / corroboration / triangulation of results |
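For the Resources row, OS tools like vmstat, sar, or perfmon already log at these rates. Where a custom sampler is handy, here is a minimal sketch assuming the Python psutil package is installed; the output file name and field choices are illustrative, not prescriptive:

```python
# Minimal collection sketch: sample CPU, memory, and disk counters every
# SAMPLE_SECONDS and append to a CSV (stop with Ctrl-C).
import csv, time
import psutil

SAMPLE_SECONDS = 10  # within the 5-15 second guidance above

with open("resources.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["epoch", "cpu_pct", "mem_pct",
                     "disk_read_bytes", "disk_write_bytes"])
    while True:
        disk = psutil.disk_io_counters()
        writer.writerow([
            int(time.time()),
            psutil.cpu_percent(interval=None),  # CPU busy since last call
            psutil.virtual_memory().percent,
            disk.read_bytes,
            disk.write_bytes,
        ])
        f.flush()  # log immediately so transient spikes survive a crash
        time.sleep(SAMPLE_SECONDS)
```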
AGGREGATING
• Objective: Summarize measurements using
• time-buckets of various sizes to provide tree & forest views
• consistent time-buckets across types to enable accurate correlation
• meaningful statistics: scatter, min-max range, variance, percentiles
• multiple metrics to “triangulate”, confirm (or invalidate) hypotheses
(an aggregation sketch follows the table below)
Tools (all rows): testing tool graphs, monitoring tools, Excel pivot tables

| Types | Examples | Statistics | Value |
| --- | --- | --- | --- |
| Load | Users/sessions; requests; no. of files/msgs sent/rcvd | avg | Basis for ID’ing load sensitivity of all other metrics |
| Errors | Error rate, error counts; by URL/type; HTTP 4xx & 5xx | avg-max | ID if errors correlate with load or resource metrics |
| Response Times | Workflow end-to-end time; page/action time | min-avg-max-std dev; 90th percentile | Quantify system scalability |
| Network Thruput | Megabits/sec | avg-max | ID if thruput plateaus while load is still ramping, or exceeds network capacity |
| App Thruput | Page view rate; completed transactions by type | avg-max | ID if page view rate can be sustained, or if “an hour’s work can be done in an hour” |
| Resources | % cpu; cpu & disk queue depth; memory usage; IOPS; queued requests; db contention | avg-max | ID limiting resources; provide diagnostics; quantify system capacity |
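As a sketch of the time-bucket idea, assuming pandas and a raw per-request CSV with 'timestamp' and 'response_ms' columns (both names hypothetical):

```python
# Aggregate raw response times into fixed time-buckets with the statistics
# from the table above; re-run with "10s" or "5min" buckets for the
# tree vs. forest views, keeping one bucket size across all metric types
# so later correlation is honest.
import pandas as pd

raw = pd.read_csv("response_times.csv", parse_dates=["timestamp"])
raw = raw.set_index("timestamp")

stats = raw["response_ms"].resample("1min").agg(
    ["min", "mean", "max", "std", lambda s: s.quantile(0.90)]
)
stats.columns = ["min", "avg", "max", "std_dev", "p90"]
print(stats.head())
```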
VISUALIZING
• Objective: Gain a “forest view” of metrics relative to load
• Turn barrels of numbers into a few pictures
• Vary graph scale & summarization granularity to expose hidden facts
• ID the load point where degradation begins
• ID system tier(s) where bottlenecks appear and the limiting resources
VISUALIZING
• My key graphs, in order of importance
• Errors over load (“results valid?”)
• Bandwidth throughput over load (“system bottleneck?”)
• Response time over load (“how does system scale?”)
• Business process end-to-end
• Page level (min-avg-max-SD-90th percentile)
• System resources (“how’s the infrastructure capacity?”)
• Server cpu over load
• JVM heap memory/GC
• DB lock contention, I/O Latency
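As one illustration, the response-time-over-load graph with load on its own y-axis (the style discussed on the scalability slide below) can be sketched with matplotlib; the series here are synthetic stand-ins shaped like that example (0.5 sec/page up to ~20 users, ~5x at max load):

```python
# Minimal plotting sketch; real series would come from the aggregation step.
import numpy as np
import matplotlib.pyplot as plt

minutes = np.arange(60)
users = np.minimum(minutes * 2, 100)                        # ramp to 100 VUs
avg_ms = 500 + np.where(users > 20, (users - 20) * 25, 0)   # degrades past ~20

fig, ax_rt = plt.subplots()
ax_load = ax_rt.twinx()  # load explicitly on its own y-axis
ax_rt.plot(minutes, avg_ms, color="tab:blue", label="avg page time (ms)")
ax_load.plot(minutes, users, color="gray", linestyle="--", label="virtual users")
ax_rt.set_xlabel("elapsed minutes")
ax_rt.set_ylabel("response time (ms)")
ax_load.set_ylabel("virtual users")
ax_rt.legend(loc="upper left")
ax_load.legend(loc="lower right")
plt.title("Response time over load")
plt.show()
```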
INTERPRETING
• Objective: Draw conclusions from observations and hypotheses
• Make objective, quantitative observations from graphs / data
• Correlate / triangulate graphs / data
• Develop hypotheses from correlated observations
• Test hypotheses and achieve consensus among tech teams
• Turn validated hypotheses into conclusions
INTERPRETING
• Observations:
• “I observe that…”; no evaluation at this point!
• Correlations:
• “Comparing graph A to graph B…” – relate observations to each other
• Hypotheses:
• “It appears as though…” – test these with the extended team; corroborate with other information (anecdotal observations, manual tests)
• Conclusions:
• “From observations a, b, c, corroborated by d, I conclude that…”
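To make “Comparing graph A to graph B…” quantitative, a correlation coefficient across aligned time-buckets can support (never prove) a hypothesis. A sketch with pandas, using illustrative numbers:

```python
import numpy as np
import pandas as pd

# Illustrative aligned per-bucket series; real ones come from aggregation.
df = pd.DataFrame({
    "users":   np.arange(10, 101, 10),
    "p90_ms":  [520, 510, 600, 900, 1300, 1700, 2100, 2300, 2450, 2500],
    "cpu_pct": [12, 20, 31, 45, 58, 70, 81, 90, 95, 97],
})
# Pearson correlation matrix: strong load-response and load-cpu correlations
# are evidence for a capacity hypothesis, to be tested with the tech teams.
print(df.corr().round(2))
```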
SCALABILITY: RESPONSE TIME OVER LOAD
• Two styles for system scalability; the top graph shows load explicitly on its own y-axis
• Note the consistent 0.5 sec/page up to ~20 users
• Above that, response time degrades steeply, to 5x at max load
• Is 2.5 sec/page acceptable? Need to drill down to page level to ID key contributors, and look at 90th or 95th percentiles (averages are misleading)
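A tiny numeric illustration of why averages mislead (the numbers are made up): two slow pages triple the mean, but only the percentiles show that most users saw half a second while the unlucky ones waited five:

```python
import numpy as np

page_ms = np.array([480, 510, 495, 505, 490, 500, 4900, 5100])  # two slow hits
print("avg:", page_ms.mean())                # ~1623 ms - hides both stories
print("p90:", np.percentile(page_ms, 90))    # ~4960 ms - exposes the pain
print("p95:", np.percentile(page_ms, 95))    # ~5030 ms
```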
THROUGHPUT PLATEAU WITH LOAD RISING = BOTTLENECK SOMEWHERE!
• Note throughput tracking load through ~45 users, then leveling off
• Culprit was an Intrusion Detection appliance limiting bandwidth to 60 Mbps
In a healthy system, throughput should closely track load
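One way to spot the plateau numerically rather than by eye: throughput per unit of load should hold roughly constant in a healthy system. A sketch with illustrative numbers echoing this example:

```python
import numpy as np

# Hypothetical per-bucket series, aligned in time.
users = np.array([10, 20, 30, 45, 60, 75, 90])
mbps  = np.array([13, 27, 40, 60, 61, 60, 60])

per_user = mbps / users
# A steadily falling throughput-per-user ratio while load still ramps is the
# bottleneck signature seen above (here, the IDS appliance's 60 Mbps cap).
print(np.round(per_user, 2))  # [1.3  1.35 1.33 1.33 1.02 0.8  0.67]
```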
BANDWIDTH TRACKING WITH LOAD = HEALTHY
All 3 web servers show network interface throughput tracking with load throughout the test
A healthy bandwidth graph looks like Mt. Fuji
ERRORS OVER LOAD: MUST EXPLAIN!
• Note relatively few errors
• Largely HTTP 404s on missing resources
• Sporadic bursts of HTTP 500 errors near the end of the test while the customer was “tuning” web servers
An error rate of <1% can be attributed to “noise” and dismissed; >1% should be analyzed and fully explained
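Applying the 1% rule per time-bucket is straightforward; a sketch assuming pandas and a raw request log with 'timestamp' and 'status' columns (both names hypothetical):

```python
import pandas as pd

reqs = pd.read_csv("requests.csv", parse_dates=["timestamp"]).set_index("timestamp")
g = reqs["status"].resample("1min")
per_min = pd.DataFrame({
    "total":  g.count(),
    "errors": g.apply(lambda s: (s >= 400).sum()),  # http 4xx and 5xx
})
per_min["error_rate"] = per_min["errors"] / per_min["total"]
# Buckets above 1% need a full explanation, not a hand-wave.
print(per_min[per_min["error_rate"] > 0.01])
```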
CAPACITY: SYSTEM RESOURCES - INTERPRETED
Monitor resources liberally, provide (and annotate!) graphs selectively: which resources tell the main story?
ASSESSING
• Objective: Turn conclusions into recommendations
• Tie conclusions back to test objectives – were objectives met?
• Determine remediation options at the appropriate level – business, middleware, application, infrastructure, network
• Perform agreed-to remediation
• Re-test
• Recommendations:
• Should be specific and actionable at a business or technical level
• Should be reviewed (and if possible, supported) by the teams that need to perform the actions (nobody likes surprises!)
• Should quantify the benefit, if possible the cost, and the risk of not acting
• The final outcome is management’s judgment, not yours
REPORTING
• Objective: Convey recommendations in stakeholders’ terms
• Identify the audience(s) for the report; write / talk in their language
• Executive Summary – 3 pages max
• Summarize objectives, approach, target load, acceptance criteria
• Cite factual Observations
• Draw Conclusions based on Observations
• Make actionable Recommendations
• Supporting Detail
• Test parameters: date/time executed, business processes, load ramp, think-times, system tested (hw config, sw versions/builds)
• Sections for Errors, Throughput, Scalability, Capacity
• In each section: annotated graphs, observations, conclusions
• Associated Docs (if appropriate)
• Full set of graphs, workflow detail, scripts, test assets
REPORTING
• Step 1: *DO NOT* just press “Print” on the tool’s default report
• Who is your audience?
• Why would they want to see 50 graphs and 20 tables? What will they be able to see?
• Data + Analysis = INFORMATION
REPORTING
• Step 2: Understand What is Important
• What did you learn? Study your results, look for correlations.
• What are the 3 things you need to convey?
• What information is needed to support these 3 things?
• Discuss findings with technical team members: “What does this look like to you?”
REPORTING
• Step 3: So, What is Important?
• Prepare a three-paragraph summary for email
• Prepare a 30-second Elevator Summary for when someone asks you about the testing
• More people will consume these than any test report
• Get feedback
REPORTING
• Step 4: Preparing Your Final Report: Audience
• Your primary audience is usually executive sponsors and the business. Write the Summary at the front of the report for them.
• Language, Acronyms, and Jargon
• Level of Detail
• Correlation to business objectives
REPORTING
• Step 5: Audience (cont.)
• Rich Technical Detail within:
• Observations, including selected graphs
• Include Feedback from Technical Team
• Conclusions
• Recommendations
REPORTING
• Step 6: Present!
• Remember, no one is going to read the report.
• Gather your audience: executive, business, and technical.
• Present your results
• Help shape the narrative. Explain the risks. Earn your keep.
• Call to action! Recommend solutions
A FEW RESOURCES
• WOPR (Workshop On Performance and Reliability)
• http://www.performance-workshop.org
• Experience reports on performance testing
• Spring & Fall facilitated, theme-based peer conferences
• SOASTA Community
• http://cloudlink.soasta.com
• Papers, articles, presentations on performance testing
• PerfBytes Podcast
• Mark Tomlinson’s blog
• http://careytomlinson.org/mark/blog/
• Richard Leeke’s blog (Equinox.nz)
• http://www.equinox.co.nz/blog/Lists/Posts/Author.aspx?Author=Richard Leeke
• Data visualization
• Scott Barber’s resource page
• http://www.perftestplus.com/resources.htm
• STP Resources
• http://www.softwaretestpro.com/Resources
• Articles, blogs, papers on wide range of testing topics