Measurement and Metrics for Test Managers

MG
PM Tutorial
4/7/2014
1:00 PM

Presented by:
Rick Craig
Software Quality Engineering

Brought to you by:
340 Corporate Way, Suite 300, Orange Park, FL 32073
888‐268‐8770 ∙ 904‐278‐0524 ∙ sqeinfo@sqe.com ∙ www.sqe.com

Rick Craig
Software Quality Engineering

A consultant, lecturer, author, and test manager, Rick Craig has led numerous teams
of testers on both large and small projects. In his twenty-five years of consulting
worldwide, Rick has advised and supported a diverse group of organizations on many
testing and test management issues. From large insurance providers and
telecommunications companies to smaller software services companies, he has
mentored senior software managers and helped test teams improve their
effectiveness. Rick is coauthor of Systematic Software Testing and is a frequent
speaker at testing conferences, including every STAR conference since its inception.

© 2014 SQE Training V3.2 1
Introduction
MEASUREMENT AND
METRICS FOR TEST
MANAGERS
Administrivia
• Course timing
• Meals timing
• Electronic devices
• Facilities
• Smoking
• Breaks

Course Agenda
1. Introduction to Software Measurement
2. Metrics—Rules of Thumb
3. A Tester’s Dashboard
4. Estimation (Optional)
1
INTRODUCTION TO
SOFTWARE
MEASUREMENT

What is software measurement?
“It’s easy to get numbers, what is
hard is to know they are right and
understand what they mean”
— Bill Hetzel
What is software measurement?
“Quantified observations”
about any aspect of software
(product, process, or project)

Lord Kelvin
“To measure is to know”
“If you cannot measure it, you
cannot improve it”
“The more you understand what is
wrong with a figure, the more
valuable that figure becomes”
There Are Lots and Lots of Measures
Primitive:
– Aspirins consumed this week
– Number of staff assigned to project A
– Pages of requirements specifications
– Hours worked to accomplish change request X
– Number of operational failures in system Y this year
– Lines of code in program Z
Computed:
– Defects per 1000 lines of code in program A
– Productivity in function points delivered
by person B
– Quality Score for project C
– Average coffee consumption per line of code
– Accuracy of hours worked per week is ± 20%

Common Metrics
• Test defects
• Defects after release
• Open problems
• Open issues
• Schedule performance
• Process compliance (e.g., ISO)
• Test results
• Reliability
• Time fixing problems
• Defects from fixes
• Lines of code
• Plan and schedule changes
Uncommon Metrics
• Code coverage
• Complexity
• Cost of rework
• Cost of quality
• Defect age

Basic Definitions
The four Ms:
• Measure
• Metric
• Meter
• Meta‐measure
[Figure: primitive measures (raw data, shown as a block of numbers) are turned into computed measures (information)]
What Makes a Good Measure?
• Simple
• Objective
• Easily collected
• Robust
• Valid

What Can Measures Do for You?
• Facilitate estimation
• Identify risky areas
• Measure testing status
• Measure/predict product quality
• Measure test effectiveness
• Identify training opportunities
• Identify process improvement opportunities
• Provide “meters” to flag actions
2
METRICS—RULES OF
THUMB

Metrics--Rules of Thumb
• The Human Element
• The Basics
• KISS
• And a Myth or Two
The Human Element
• Without buy‐in, metrics may be falsified
• Without buy‐in, metrics may be ignored
Buy‐in is key

Class Discussion
How do you obtain buy‐in?
Ways to Obtain Buy-in
• Training
• Metrics
• Feedback loops
• Reviews
• Participation

The Human Element
• Measure processes and products instead of
people if possible
• Beware of the dark side of the Hawthorne
Effect
Two Sides of Measurement
…the information may be used against me.
…the information will help me understand what is going on and do a better job.

The Hawthorne Effect
Measuring people improves their productivity
The Human Element
Tailor metrics to the audience
Users, managers, practitioners all have
different languages
Set the appropriate level of detail
How you present the material matters

Who is your audience?
Users
Developers
Testers
% of Red Cars Soars
[Chart: percentage of red cars for 2008, 2009, and 2010, plotted on a truncated y-axis running from 25.1 to 26.1]

% of Red Cars Soars?
[Chart: the same data for 2008, 2009, and 2010 (values between 25.4 and 26), plotted on a y-axis running up to 100]
The Human Factor
Training is required
Metrics are not second nature
Your metrics are affected by how they are collected
Establish range of expected values
Publish historical values

The Basics
• Use a metric to validate a metric
• Use meta‐measures
• Use meters when possible
• Consistency sometimes trumps accuracy
• Subjective is good; objective is better
KISS ― Keep It Simple Sir
• More is not always better
• All metrics are not forever
– Consider temporary metrics
– Consider sampling
• Automate collection when possible

3
A TESTER’S DASHBOARD
A Dashboard

Establish a Dashboard
• Easy to use/understand at a glance
Quality of product
Status
Test effectiveness
Resources
Issues
* Remember you need at least two metrics per “instrument”
Measures of Quality
• It is difficult to develop practical measures of
quality
• The cost to achieve various quality levels must be
taken into account
• Many quality metrics are relatively subjective
• Quality goals will be affected by the industry and
corporate culture

What Is Quality?
• Meeting requirements
(stated and/or implied)
Sample Quality Factors and Criteria
• Correctness
• Reliability
• Testability
• Flexibility
• Usability
• Portability
• Interoperability
• Efficiency
• Correctability
• Integrity
• Maintainability
• Revisability
• Survivability

Defect Density/Clustering
[Bar chart: # of defects per 1,000 lines of code (y-axis) by module name (x-axis), for modules D, B, A, C, E, F]
Defect Density
Issues
Coverage of tests
Weighting of defects
Weighting by relative risk
What to use as the denominator
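The defect density figure discussed above can be derived from two primitive measures per module. The following is a minimal sketch, assuming lines of code as the denominator; the module counts are made-up illustration data and `defect_density` is a hypothetical helper name, not part of any tool mentioned in the course.

```python
def defect_density(defects: int, lines_of_code: int) -> float:
    """Return defects per 1,000 lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects * 1000 / lines_of_code


# Hypothetical data: module name -> (defects found, lines of code)
modules = {
    "A": (12, 4000),
    "B": (30, 6000),
    "C": (3, 3000),
}

densities = {name: defect_density(d, loc) for name, (d, loc) in modules.items()}

# Rank modules from most to least defect-dense to expose clustering.
ranked = sorted(densities, key=densities.get, reverse=True)

for name in ranked:
    print(f"Module {name}: {densities[name]:.1f} defects/KLOC")
```

Swapping the denominator (function points, requirements, test cases) only changes the second argument, which is why the choice of denominator is listed as an issue.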

Effect of Complexity on Quality
[Chart: probability of post-release defect (y-axis) versus complexity (x-axis)]
Other Measures of Product Quality
• Customer satisfaction
• Repeat customers?
• Referrals?
• Calls to the help desk?
• Timeliness?
• Defect age?
• Complexity?
• Rework?
• Reliability?

Quality of Product
• Record any current measures of product
quality that you are using here. Give them a
grade for effectiveness (A, B, C, etc.)
• Any new metrics you would use?
* Remember you need at least two metrics per instrument
• Easy to use/understand at a glance:
Status
Test effectiveness
Resources
Issues

Status Reporting
• The Master Test Plan should
specify
– What to report
– How often
– To whom
Common Test Status Metrics
% of Test Cases Executed
Issues:
• Weighting of TC by coverage metrics
• Weighting of TC by risk
• Weighting of TC by execution effort
• Weighting of TC by time to execute
What do you really want to know?
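The weighting issues listed above can change the picture considerably. As a rough sketch (the test cases, the `risk` and `effort_hours` fields, and the function name are all assumptions made for illustration), the same execution data gives three different "percent executed" answers:

```python
def percent_executed(test_cases, weight_key=None):
    """Percent of test cases executed, optionally weighted by a numeric field."""
    def weight(tc):
        return tc[weight_key] if weight_key else 1

    total = sum(weight(tc) for tc in test_cases)
    done = sum(weight(tc) for tc in test_cases if tc["executed"])
    return 100 * done / total if total else 0.0


# Hypothetical test cases; "risk" and "effort_hours" are assumed weights.
test_cases = [
    {"id": "TC-1", "executed": True, "risk": 1, "effort_hours": 1},
    {"id": "TC-2", "executed": True, "risk": 1, "effort_hours": 1},
    {"id": "TC-3", "executed": True, "risk": 2, "effort_hours": 2},
    {"id": "TC-4", "executed": False, "risk": 6, "effort_hours": 4},
]

raw = percent_executed(test_cases)                        # unweighted count
by_risk = percent_executed(test_cases, "risk")            # weighted by risk
by_effort = percent_executed(test_cases, "effort_hours")  # weighted by effort

print(f"raw {raw:.0f}%, by risk {by_risk:.0f}%, by effort {by_effort:.0f}%")
```

Here three of four test cases have run, yet most of the risk sits in the one that has not, which is the kind of gap the question "What do you really want to know?" points at.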

Sample Test Status Report (raw data)
Project: Online-Trade    Date: 4/23/2009

Feature Tested   Total Tests   # Complete   % Complete   # Success   % Success
Open Acct        46            46           100          41          89
Sell Order       36            25           69           25          69
Buy Order        19            17           89           12          63
…..
Totals           395           320          81           311         79
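The percentage columns in a report like this are computed measures derived from the raw counts. A small sketch, assuming both percentages are taken against the total number of tests (which is consistent with the figures shown); the `percent` helper is a hypothetical name:

```python
def percent(part: int, whole: int) -> int:
    """Whole-number percentage, rounded to the nearest integer."""
    return round(100 * part / whole) if whole else 0


# (feature, total tests, completed, successful), copied from the sample report
rows = [
    ("Open Acct", 46, 46, 41),
    ("Sell Order", 36, 25, 25),
    ("Buy Order", 19, 17, 12),
]

report = [
    (feature, total, done, percent(done, total), ok, percent(ok, total))
    for feature, total, done, ok in rows
]

for line in report:
    print("{:<12}{:>6}{:>6}{:>5}%{:>6}{:>5}%".format(*line))
```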
Open and Closed Over Time
[Chart: defects (0 to 40) by week (2 to 20); labels: Incoming, Fixed, Released]
[Chart: cumulative defects (0 to 24) by day (0 to 40); labels: Detected, Open]

When Is the Software “Good Enough”?
When to stop testing
• Test exit criteria met
• Return On Investment (ROI) not sufficient
• Defect arrival rate
• Resources exhausted
– Time
– Money
• Profiles (based on failures encountered
using profiles of real data)
• Project cancelled!
Software Psychology
What is “good enough”?
[Chart: # of bugs (y-axis) versus time (x-axis)]

Economics of Test and Failure
Source: IBM Systems Sciences Institute
Stopping Criteria ― Revisited
Abnormal
• Resource exhaustion
– Schedule
– Budget
– System access
– Patience
• Project redirection
Normal
• Test set exit criteria
• Remaining defects estimation criteria
– Defect history of past software
– Defect history of current item
– Software complexity
– Combination of these
• Diminishing return criteria
– Cost to Detect Next Defect
• Combined criteria
“There is no single, valid, rational criterion for stopping. Furthermore, given
any set of applicable criteria, how each is weighed depends very much on
the product, the environment, the culture, and the attitude to risk.”
— Boris Beizer

Test Summary Report
• Report identifier
• References
– Test items (with revision #s)
– Environments
– References
• Variances (deviations)
– From test plan or requirements
– Reasons for deviations
• Summary of incidents
– Resolved incidents
– Defect patterns
– Unresolved incidents
• Adequacy assessment
– Evaluation of coverage
– Identify uncovered attributes
• Summary of activities
– System/CPU usage
– Staff time
– Elapsed time
• Software evaluation
– Limitations
– Failure likelihood
• Approvals
Status
• Record any current test status measures that you
are using here. Give them a grade for effectiveness
(A, B, C, etc.)

Status
Test effectiveness
Resources
Issues
How Do You Measure Test Effectiveness?

A Common Answer
– Coverage
– Defect age (phase or product version)
– # of bugs
– Defect density
– Defect removal efficiency
– Defect seeding
– Mutation analysis
– Customer complaints
Three Major Categories

Customer Satisfaction Measures
Issues
Who to ask
“After the fact”
Difficulty in measuring
Doesn’t differentiate between the
effectiveness of development and testing
Customer Satisfaction Measures
• Subjective is good
• Objective is better

Defect Measures
• Why is it important to track defects?
• What are some ways to analyze defects?
• DDP
• Defect density
• Defect age
Why is it important to track defects?
• Identify process improvement
• Identify training needs
• Identify problematic (high‐risk) areas
• Determine test status

Defect Analysis ― Example
• Phase
• Type
• Severity
• Priority
• Author
• Age
• Module
Defect Detection Percentage (DDP)
DDP = (Defects Discovered / Defects at Start) x 100%
85% is the average DRE for US software
projects greater than 1,000 function
points in size.
— Capers Jones
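The DDP formula can be sketched in a few lines. Because the true number of defects present at the start is never known directly, this sketch approximates it as defects found by testing plus defects reported later; that approximation, the function name, and the sample counts are assumptions for illustration.

```python
def defect_detection_percentage(found_by_testing: int, found_later: int) -> float:
    """DDP = defects discovered / defects at start x 100.

    "Defects at start" is approximated as everything found so far:
    defects found by testing plus defects found afterwards.
    """
    defects_at_start = found_by_testing + found_later
    if defects_at_start == 0:
        raise ValueError("no defects recorded; DDP is undefined")
    return 100 * found_by_testing / defects_at_start


# Hypothetical release: 85 defects found in test, 15 reported after release.
ddp = defect_detection_percentage(85, 15)
print(f"DDP = {ddp:.0f}%")
```

The value keeps falling as more post-release defects are reported, so any DDP figure is provisional until the reporting period closes.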

Defect Detection Percentage (DDP)
Issues
Severity and distribution of defects
How to know when all bugs are found
“After the fact”
What constitutes bug‐finding activities?
Some bugs cannot be found in testing
Defect “Value” (Cost Avoidance)
When discovered          Typical hours to rework/fix
Requirements             1
High level design        1
Detailed design          1
Code                     1
Unit test                3 – 5
Integration test         5 – 10
System/acceptance test   10 – 30
Production               20 – 60+
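One way to turn this table into a cost avoidance figure is to compare the rework hours at the phase where a defect was caught with the hours it would have cost in production. The sketch below makes two simplifying assumptions that are not in the slides: each range is represented by its midpoint, and the open-ended "60+" is treated as 60. The defect counts are invented.

```python
# Typical rework hours (low, high) by discovery phase, from the table above.
# Assumption: the open-ended "60+" for production is capped at 60.
REWORK_HOURS = {
    "Requirements": (1, 1),
    "High level design": (1, 1),
    "Detailed design": (1, 1),
    "Code": (1, 1),
    "Unit test": (3, 5),
    "Integration test": (5, 10),
    "System/acceptance test": (10, 30),
    "Production": (20, 60),
}


def midpoint_hours(phase: str) -> float:
    """Midpoint of the rework range for a phase (a simplifying assumption)."""
    low, high = REWORK_HOURS[phase]
    return (low + high) / 2


def hours_avoided(defects_by_phase: dict) -> float:
    """Estimated hours saved by finding defects before production."""
    production = midpoint_hours("Production")
    return sum(
        count * (production - midpoint_hours(phase))
        for phase, count in defects_by_phase.items()
    )


# Hypothetical counts of defects found per phase.
found = {"Unit test": 10, "Integration test": 4}
print(f"Estimated rework hours avoided: {hours_avoided(found):.0f}")
```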

Defect Age (PhAge)
Phase created (rows) vs. phase discovered (columns, earliest to latest)
Phase created         Defect age by phase discovered
Requirements          0 1 2 3 4 5 6 7 8 9
High level design       0 1 2 3 4 5 6 7 8
Detailed design           0 1 2 3 4 5 6 7
Coding                      0 1 2 3 4 5 6
Defect Age
Issues
Difficult to do root cause
Requires weighting of defects
How to handle latent/masked defects
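Phase age is simply the number of phases between where a defect was created and where it was discovered. A minimal sketch follows; the first four phase names come from the slides, while the six later phases in `PHASES` are assumed names chosen only so that the ages run from 0 to 9, and the defect records are invented.

```python
# First four phases appear in the PhAge matrix; the remaining names are
# assumptions for illustration.
PHASES = [
    "Requirements",
    "High level design",
    "Detailed design",
    "Coding",
    "Unit test",
    "Integration test",
    "System test",
    "Acceptance test",
    "Pilot",
    "Production",
]


def phase_age(created: str, discovered: str) -> int:
    """Number of phases a defect survived before it was discovered."""
    age = PHASES.index(discovered) - PHASES.index(created)
    if age < 0:
        raise ValueError("a defect cannot be discovered before it is created")
    return age


# Hypothetical defect log: (phase created, phase discovered)
defects = [
    ("Requirements", "Coding"),
    ("Coding", "Coding"),
    ("High level design", "Production"),
]

ages = [phase_age(created, discovered) for created, discovered in defects]
average_age = sum(ages) / len(ages)
print(f"ages {ages}, average {average_age:.2f}")
```

A falling average age over successive releases suggests defects are being caught closer to where they are introduced.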

Coverage Measures
Discussion
Requirements vs. design vs. code coverage
Completeness/accuracy of test basis
Coverage of test set vs. coverage of tests
executed (e.g., we don’t always run every test)
Coverage vs. actual results (DDP)
Mapping Test Cases to Requirements
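Requirements spec.: 3.5.1.3.2, ….., 3.5.1.4.7, ….., 3.6.4.2.1, ….., 3.8.2.7.1, …..
Test plan: Test Case #3, ….., Test Case #5, ….., Test Case #12, ….., Test Case #19
[Figure: requirement sections in the specification are mapped to the test cases in the test plan that cover them]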

Requirements/Design Coverage
Conceptual model of requirements/design coverage:
Test Case
1 2 3 Covered?
Requirement A X X Y
B N
C X X Y
Feature A X Y
B X X Y
Design A X X Y
B X Y
C N
D X X Y
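The "Covered?" column of such a matrix can be derived mechanically from a traceability mapping. In this sketch the item names echo the table, but which test cases touch which item is an assumption (the layout of the X marks is not preserved here), and the `TC-n` identifiers are hypothetical.

```python
# Hypothetical traceability data: item -> set of test cases that exercise it.
traceability = {
    "Requirement A": {"TC-1", "TC-2"},
    "Requirement B": set(),
    "Requirement C": {"TC-2", "TC-3"},
    "Feature A": {"TC-3"},
}

# An item counts as covered when at least one test case maps to it.
covered = {item: bool(tests) for item, tests in traceability.items()}
uncovered = sorted(item for item, is_covered in covered.items() if not is_covered)
percent_covered = 100 * sum(covered.values()) / len(covered)

for item, is_covered in covered.items():
    print(f"{item:<15}{'Y' if is_covered else 'N'}")
print(f"{percent_covered:.0f}% covered; uncovered: {uncovered}")
```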
Requirements/Design Coverage
Issues
Only as good as test basis
Relatively low coverage of code
Code coverage achieved with requirements tests
Major bank (20 apps) 20%
Major DBMS vendor 47%
Major h/w s/w vendor 60%
— Source: Bender and Associates

Code Coverage
Conceptual model of code coverage
Test run
1 2 3 Covered?
Statement A X X X Y
B X X Y
C X Y
D N
E X Y
60% 20% 60% 80%
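The percentages in the bottom row can be reproduced from per-run coverage sets. In the sketch below, the assignment of statements to runs is an assumption chosen to match the 60%, 20%, 60%, and 80% figures; real data would come from a coverage tool.

```python
STATEMENTS = {"A", "B", "C", "D", "E"}

# Assumed statements executed by each test run (illustrative placement).
runs = {
    1: {"A", "B", "C"},
    2: {"A"},
    3: {"A", "B", "E"},
}


def coverage_percent(executed: set) -> float:
    """Percent of all statements that appear in the executed set."""
    return 100 * len(executed & STATEMENTS) / len(STATEMENTS)


per_run = {run: coverage_percent(executed) for run, executed in runs.items()}

# Cumulative coverage is the union of everything any run executed.
cumulative = coverage_percent(set().union(*runs.values()))
never_executed = sorted(STATEMENTS - set().union(*runs.values()))

print(per_run, cumulative, never_executed)
```

The cumulative figure is not the sum or the average of the per-run figures, since runs overlap; statement D stays uncovered no matter how often the other statements are re-executed.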
Code Coverage
Issues
Requires a tool
Doesn’t prove the code actually “works”
correctly
Did we test the “right code”?
Statement vs. branch vs. path

Test Effectiveness
• Record any current test effectiveness measures
that you are using here. Give them a grade for
effectiveness (A, B, C, etc.)
Status
Test effectiveness
Resources
Issues

Resources
• Resource estimates/consumption are necessary
in order to do test planning, estimation,
budgeting, and staffing
• You must consider the level of granularity in the
collection of these metrics based on the accuracy
of the required metrics and your ability to
validate them
• Some people choose to exclude the resources
instrument from the dashboard because they feel
it is not a “day to day” metric
Resources
Resource metrics are normally collected
in terms of
Actual/expected budget
Actual/expected engineering hours
Test environment utilization/availability
Staffing levels
Contractor availability
Other hardware/software resources

Resources
• Record any current resource measures that
you are using here. Give them a grade for
effectiveness (A, B, C, etc.)
Quality of Product
Status
Test Effectiveness
Resources
Issues

Issues
• This is included to address any important
items not otherwise included on the
dashboard. These are normally subjective and
not necessarily conducive to systematic
analysis
• Issues could involve training, installation of
new hardware/software, politics—even the
weather
A Sample Tester’s Dashboard
Status
• % completion
• Defect info
Product Quality
• Defect density
• Performance, etc.
Test
Effectiveness
• DDP
• Coverage
Resources
• Engineering hours
• Money
Issues

Avoiding Dysfunction
• Measure processes and products—not people!
• Beware of the dark side of the Hawthorne Effect
• Remember that more is not always better
• Avoid the exclusive use of top‐down metrics
• Provide training—not all metrics are intuitive
• Consider temporary metrics
Avoiding Dysfunction
• Define each metric, its use, who will see it, expected
ranges, etc.
• Remember your audience and tailor to their needs
• Always seek multiple interpretations
• Ask your audience what their interpretation of a
metric is before you offer yours
• Sell, sell, sell, sell

One Truth and One Myth in Closing
The Truth
Gilb’s Law
“Anything you need to quantify can be
measured in some way that is
superior to not measuring it at all”
—Tom Gilb
The Myth
“Some metrics are always better than
no metrics …”
4
ESTIMATION
(OPTIONAL)

Estimation
Estimate:
1. A tentative evaluation or rough calculation
2. A preliminary calculation of the cost of a project
3. A judgment based upon one’s impressions; opinion
—The American Heritage Dictionary
It is very difficult to make a vigorous, plausible, and job‐
risking estimate that is derived by no quantitative method,
supported by little data and certified chiefly by hunches of the
managers.
— Fred Brooks
Test Estimation
Estimation: the creation of an approximate
target for costs and completion dates
The best estimates
Represent the collective wisdom of practitioners and
have their buy‐in
Provide specific, detailed catalogs of the costs,
resources, tasks, and people involved
Present, for each activity estimated, the most likely cost,
effort, and duration

Test Estimation (cont.)
Factors that can influence cost, effort, and duration include:
Required level of quality of the system
Size of the system to be tested
Historical data
Process factors (process maturity, etc.)
Material factors (tools, data, etc.)
People factors (skills, experience, managers, etc.)
Test Estimation (cont.)
• Delivery of estimates should include
justification
• Negotiation and re‐work of estimates is
normal
• Final estimates represent a balance of
organizational and project goals in the areas
of quality, schedule, budget, and features

How Good Is Our Industry (at Estimating)?
• Tata: 62% of projects fail to meet schedule
49% have budget overruns
• Moløkken and Jørgensen: 30-40% overruns
Class Discussion
Why is estimating not done well?
Your top five reasons:
1) Too many variables____________________
2) ____________________________________
3) ____________________________________
4) ____________________________________
5) ____________________________________

Why Estimates Are Inaccurate ― Part I
• Lack of estimating experience
• Lack of historical data on which to base estimates
• Lack of systematic estimation process, sound
techniques, or models suited to the project
• Failure to include essential activities and products
within the scope of the estimates
• Unrealistic expectations or assumptions
• Failure to recognize and address the uncertainty
inherent in project estimates
Practical Software Measurement
Addison‐Wesley, 2001
Why Estimates Are Inaccurate ― Part II
• Lack of education and training
• Confusing the target with the estimate
• Hope‐based planning
• Inability to communicate and support
estimates
• Incomplete, changing, and creeping
requirements
• Quality surprises (test and re‐fix)
—adapted from Linda M. Laird
The Limitations of Estimation

Boehm’s Cone of Uncertainty
NHC Track Forecast Cone

“Testing” Track Forecast Cone
(or why it is important to constantly re-estimate)
Tasks
Time
The Fantasy Factor
[Chart: timeline in weeks, 0 to 9; labels: Today, 1st, 2nd, 3rd]
What would have to happen to deliver this in four
weeks?
What should the estimate have been?

Estimation
1, 2, 3, or 4 Variables + Many Modifiers:
Time
If it’s not variable, then it’s fixed.
Size Resources
Time vs. Resources
=

Test Estimation Techniques ― Examples
• Intuition and guess
• Work‐breakdown‐structures
• Three‐point estimates
• Company standards and norms
• % of project effort or staffing
• Industry averages and predictive models (e.g., FP, TPA)
• Team estimation sessions
– Wideband Delphi
– Story point sizing
– Poker estimation
– T‐shirt sizing
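Of the techniques listed, the three‐point estimate is the easiest to show as arithmetic. The sketch below uses the commonly cited PERT weighting, (optimistic + 4 x most likely + pessimistic) / 6, with (pessimistic - optimistic) / 6 as a rough spread; the slides do not say which weighting the course uses, so treat this as one assumed variant. The task and its numbers are invented.

```python
def three_point_estimate(optimistic: float, most_likely: float, pessimistic: float):
    """Return (expected value, rough standard deviation) using PERT weighting."""
    if not optimistic <= most_likely <= pessimistic:
        raise ValueError("expected optimistic <= most_likely <= pessimistic")
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    spread = (pessimistic - optimistic) / 6
    return expected, spread


# Hypothetical task: system test execution, in person-days.
expected, spread = three_point_estimate(10, 15, 32)

# Report a range rather than a single number.
low, high = expected - spread, expected + spread
print(f"Estimate: {expected:.1f} person-days (roughly {low:.1f} to {high:.1f})")
```

Reporting the range alongside the expected value keeps the uncertainty visible to whoever receives the estimate.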
Karl Wiegers’s Estimation Safety Tips
• A goal is not an estimate
• The estimate you produce should be unrelated
to what you think the requester wants to hear
• The correct answer to any request for an
estimate is “Let me get back to you on that”
• Avoid giving single point estimates
• Incorporate contingency buffers into estimates

Rick Craig’s Tips for Better Estimates
• Do it!
• Collect metrics
• Remember the “fantasy” factor
• Don’t “pad” your estimates*
• Don’t spend a ton of time
• Estimates don’t have to be perfect
– Estimates are just estimates
– They will change constantly as you re‐estimate
– Remember planning risks and contingencies
– Remember Brooks’s Law
• If the date is fixed, estimate something else
• Use tools
• Use ranges of value instead of discrete numbers
